I imagine we’ve all heard about the recent “Sequoia” bug discovered by the Qualys Research Team and tracked as CVE-2021-33909. It’s a fascinating bug caused by a size_t to int conversion. According to the analysis, seq_dentry passes its size_t size to the dentry_path function, which expects a signed integer, so the value is converted implicitly. Assuming size_t is 32 bits wide, its value can range from 0 to 4294967295 since it is unsigned, but a 32-bit int can only hold values from -2147483648 to 2147483647 because it is signed (this means that it can also hold negative values). A size above 2147483647 therefore typically turns negative when it is converted, which results in an out-of-bounds access during the pointer arithmetic in dentry_path that’s done with p = buf + buflen;.
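
To make the arithmetic concrete, here is a minimal C sketch of the conversion. The names end_offset, offset_from_size and demo are made up for illustration and are not kernel code. The sketch only computes the offset that buf + buflen would apply and never touches memory. Converting an out-of-range size_t to int is implementation-defined in C; the wrapped value asserted in demo is what GCC and Clang produce on the usual two's-complement targets.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical callee: like dentry_path(), it takes the length as a signed
 * int. It returns the offset that `buf + buflen` would apply, as a plain
 * number, so nothing is dereferenced here. */
static long long end_offset(int buflen)
{
    return (long long)buflen;
}

/* Hypothetical caller: like seq_dentry(), it holds the length as a size_t and
 * hands it to the signed parameter. The explicit cast only makes the
 * otherwise implicit conversion visible. */
static long long offset_from_size(size_t size)
{
    return end_offset((int)size);
}

/* Two cases: a small size survives the conversion, a size just above
 * INT_MAX wraps to INT_MIN (assumption: GCC/Clang conversion behaviour). */
static void demo(void)
{
    assert(offset_from_size(100) == 100);
    assert(offset_from_size((size_t)INT_MAX + 1) == (long long)INT_MIN);
}
```

With a negative offset, buf + buflen would point far below the start of the buffer, which is exactly the out-of-bounds situation described above.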

This specific bug is interesting because the circumstances it appears in are quite common in the Linux kernel. To examine this further I decided to employ CodeQL. So, precisely what is CodeQL? CodeQL is a query language, originally developed by Semmle (now part of GitHub, which is owned by Microsoft), that drives a semantic analysis engine for static code analysis. In CodeQL, code is treated as data. Security vulnerabilities, bugs, and other issues are modelled as queries that are executed against databases extracted from the code. Queries that discover potential vulnerabilities show their results directly in the source file. As a result, it is a tremendously powerful tool for variant analysis.

Alright, let’s look at the CodeQL query I wrote.

/**
 * @author Jordy Zomer
 * @name unsigned to signed used in pointer arithmetic
 * @description finds unsigned to signed conversions used in pointer arithmetic, potentially causing an out-of-bound access
 * @id cpp/sign-conversion-pointer-arithmetic
 * @kind problem
 * @problem.severity warning
 * @tags reliability
 *       security
 *       external/cwe/cwe-787
 */

import cpp
import semmle.code.cpp.dataflow.DataFlow
import semmle.code.cpp.security.Overflow

from FunctionCall call, Function f, Parameter p, DataFlow::Node sink, PointerArithmeticOperation pao
where
f = call.getTarget() and
p = f.getAParameter() and
p.getUnspecifiedType().(IntegralType).isSigned() and
call.getArgument(p.getIndex()).getUnspecifiedType().(IntegralType).isUnsigned() and
// Here we check that the parameter flows into an operand of a pointer arithmetic expression
pao.getAnOperand() = sink.asExpr() and
DataFlow::localFlow(DataFlow::parameterNode(p), sink)
select call, "This call: $@ passes an unsigned integer to a function that expects a signed integer: $@. The value is then used in pointer arithmetic: $@", call, call.toString(), f, f.toString(), sink, sink.toString()

So what we do here is find a FunctionCall to a Function that has a parameter of a signed integer type. Following that, we keep only the calls that pass an unsigned value for that parameter even though the function expects a signed integer. After that, we use the DataFlow library to follow the parameter, using local data flow inside the callee, to any use as an operand of pointer arithmetic. Running this query on the Linux kernel database successfully identifies the Sequoia vulnerability as well as hundreds of additional instances that may be vulnerable.
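
For illustration, this is roughly the C shape I would expect the query to report. It is a sketch under assumptions: end_of and find_end are hypothetical names, not kernel functions, and I have not run the query against this exact snippet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callee: the signed parameter `len` flows, within this
 * function, into pointer arithmetic. */
static char *end_of(char *buf, int len)
{
    assert(buf != NULL);
    return buf + len;          /* sink: `len` is an operand of `buf + len` */
}

/* Hypothetical caller: the argument is unsigned (size_t) while the
 * parameter is signed, so the value is converted at the call. */
static char *find_end(char *buf, size_t size)
{
    return end_of(buf, size);  /* implicit size_t to int conversion */
}
```

For small sizes this behaves as intended. The problem only appears once size no longer fits in an int, which is why hits like this need manual review.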

Because there are so many results, I refined the query slightly by adding three filters that narrow down the results.

  • Check whether the source is guarded by a size check that tests it as greater than something
  • Check whether the sink is guarded by a check that it is smaller than something
  • Check whether the source is a constant

I configured the query so that it only reports a result if none of these filters match. Below you will find the updated query:

/**
 * @author Jordy Zomer
 * @name unsigned to signed used in pointer arithmetic
 * @description finds unsigned to signed conversions used in pointer arithmetic, potentially causing an out-of-bound access
 * @id cpp/sign-conversion-pointer-arithmetic
 * @kind problem
 * @problem.severity warning
 * @tags reliability
 *       security
 *       external/cwe/cwe-787
 */

import cpp
import semmle.code.cpp.dataflow.DataFlow
import semmle.code.cpp.security.Overflow

from FunctionCall call, Function f, Parameter p, DataFlow::Node sink, PointerArithmeticOperation pao
where
f = call.getTarget() and
p = f.getAParameter() and
p.getUnspecifiedType().(IntegralType).isSigned() and
call.getArgument(p.getIndex()).getUnspecifiedType().(IntegralType).isUnsigned() and
pao.getAnOperand() = sink.asExpr() and
// determine whether there is not a check where the `Sink` < "something" 
not exists(Operation a | guardedLesser(a, sink.asExpr())) and
//  establish whether there is not a size check where the `Source` > "something"
not exists(Operation b | guardedGreater(b, call.getArgument(p.getIndex()))) and
// identify whether the `Source` is not constant
not call.getArgument(p.getIndex()).isConstant() and
DataFlow::localFlow(DataFlow::parameterNode(p), sink)
select call, "This call: $@ passes an unsigned integer to a function that expects a signed integer: $@. The value is then used in pointer arithmetic: $@", call, call.toString(), f, f.toString(), sink, sink.toString()
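
To picture what the three filters are meant to remove, here are some hypothetical C shapes, one per filter. All function names are invented for this sketch. Whether guardedGreater and guardedLesser recognise these exact forms is an assumption on my part and not something I have verified against the library.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical callee with a signed length used in pointer arithmetic. */
static char *end_of(char *buf, int len)
{
    assert(buf != NULL);
    return buf + len;
}

/* Source filter: the size is compared against a limit before the call. */
static char *guarded_source(char *buf, size_t size)
{
    if (size > (size_t)INT_MAX)
        return NULL;
    return end_of(buf, size);
}

/* Sink filter: the signed value is range-checked before the arithmetic. */
static char *guarded_sink(char *buf, int len, int cap)
{
    if (len < 0 || len >= cap)
        return NULL;
    return buf + len;
}

/* Constant filter: the unsigned argument is a compile-time constant. */
static char *constant_source(char *buf)
{
    return end_of(buf, 16u);
}
```

Cases like these are much less likely to be exploitable, so dropping them leaves a shorter list to review by hand.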

Going through the results yielded the following issues and associated patches:

Due to the large number of results, we didn’t check whether every finding was truly vulnerable; we simply wanted the code to be obviously secure. Furthermore, this is a work in progress, so expect additional patches soon. If you wish to help fix these findings, please feel free to reach out to me at jordy [at] pwning.systems and I’ll provide you with the results.

Because of the nature of this query, it may be a good idea for the GitHub Security Lab team to run it on LGTM, as this type of bug may occur in any C application. CodeQL’s potential as a static analysis tool is obvious. I sincerely hope that it will be used in other research and projects.

I’d like to thank Greg and the other developers who contributed for their fantastic collaboration and insights. It was a huge amount of fun!

Cheers!