Fuzzing floating point code

2020-09-21

In this article I demonstrate how to fuzz test floating point code using libFuzzer. I provide custom mutator and crossover functions that allows libFuzzer to efficiently explore the space of interesting floating point inputs. My method can also be considered as a form of property based testing as pioneered by QuickCheck.

If you are not experienced with fuzzing I recommend you to first read the libFuzzer Tutorial.

Fuzz testing or fuzzing is commonly used to find bugs in software that accepts untrusted user inputs, for example text or binary format parsers, compression and network protocols. Open source coverage-guided fuzzing engines libFuzzer, AFL and Hongfuzz are primarily designed for this purpose and are good at generating corresponding inputs. They lack awareness of the structure (or grammar) of the expected inputs and this makes them inefficient at generating interesting inputs for functions requiring inputs with a complex structure.

To fuzz test floating point code I focus on generating interesting IEE 754 floating point values such as NaN, infinity, zero and machine epsilon. I supply libFuzzer with custom mutator and crossover functions to make it aware of the structure of floating point numbers. This is called structure aware or grammar aware fuzzing.

Consider a simple floating point function that returns the sum of the inputs while ignoring all NaNs:

double sum(const double *begin, const double *end) {
  return std::accumulate(
      begin, end, 0.0, [](auto a, auto b) { return std::isnan(b) ? a : a + b; });
}

It’s tempting to assume this function will never return NaN. The following libFuzzer target tests this property:

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  double *begin = (double *)Data;
  double *end = (double *)Data + Size / sizeof(double);

  double res = sum(begin, end);

  if (std::isnan(res)) {
    std::abort();
  }

  return 0;
}

The above fuzz target assumes that it’s input is an array of doubles. To speed up fuzzing I provide libFuzzer with a custom mutator function that randomly generates floating point values that are likely to trigger edge cases:

extern "C" size_t LLVMFuzzerCustomMutator(uint8_t *Data, size_t Size,
                                          size_t MaxSize, unsigned int Seed) {
  double *begin = (double *)Data;
  double *end = (double *)Data + Size / sizeof(double);

  std::minstd_rand gen(Seed);

  auto rfp = [&]() {
    switch (std::uniform_int_distribution<>(0, 10)(gen)) {
    case 0:
      return std::numeric_limits<double>::quiet_NaN();
    case 1:
      return std::numeric_limits<double>::min();
    case 2:
      return std::numeric_limits<double>::max();
    case 3:
      return -std::numeric_limits<double>::min();
    case 4:
      return -std::numeric_limits<double>::max();
    case 5:
      return std::numeric_limits<double>::epsilon();
    case 6:
      return -std::numeric_limits<double>::epsilon();
    case 7:
      return std::numeric_limits<double>::infinity();
    case 8:
      return -std::numeric_limits<double>::infinity();
    case 9:
      return 0.0;
    case 10:
      std::uniform_real_distribution<> dis(-1.0, 1.0);
      return dis(gen);
    }
    return 0.0;
  };

  switch (std::uniform_int_distribution<>(0, 3)(gen)) {
  case 0: { // Change element
    if (begin != end) {
      std::uniform_int_distribution<> d(0, end - begin - 1);
      begin[d(gen)] = rfp();
    }
    break;
  }
  case 1: // Add element
    if (Size + sizeof(double) <= MaxSize) {
      *end = rfp();
      ++end;
    }
    break;
  case 2: // Delete element
    if (begin != end) {
      --end;
    }
    break;
  case 3: // Shuffle elements
    std::shuffle(begin, end, gen);
    break;
  }

  return (end - begin) * sizeof(double);
}

I also provide libFuzzer with a custom crossover function that chooses elements from the two inputs with equal probability:

extern "C" size_t LLVMFuzzerCustomCrossOver(const uint8_t *Data1, size_t Size1,
                                            const uint8_t *Data2, size_t Size2,
                                            uint8_t *Out, size_t MaxOutSize,
                                            unsigned int Seed) {
  std::minstd_rand gen(Seed);
  std::bernoulli_distribution bd(0.5);
  size_t n = std::min({Size1, Size2, MaxOutSize}) / sizeof(double);
  for (size_t i = 0; i < n; ++i) {
    ((double *)Out)[i] = bd(gen) ? ((double *)Data1)[i] : ((double *)Data2)[i];
  }
  return n * sizeof(double);
}

I assemble the code together in a file fpfuzzing.cpp and build it with the following command:

$ clang++ -g -fsanitize=fuzzer fpfuzzing.cpp -o fpfuzzing

Finally I start the fuzzing with the following command:

$ ./fpfuzzing

Almost immediately the fuzzer finds a test case that proves that the sum function returns NaN:

...
SUMMARY: libFuzzer: deadly signal
MS: 4 Custom-CustomCrossOver-CustomCrossOver-Custom-; base unit: 520c19e785a3d29dc10636fa0fb0d2df487c5d6b
0x0,0x0,0x0,0x0,0x0,0x0,0xf0,0x7f,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0xf0,0xff,
\x00\x00\x00\x00\x00\x00\xf0\x7f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xf0\xff
artifact_prefix='./'; Test unit written to ./crash-c7b321858ccd8a7bbb62bec3a371766a39408f2e
Base64: AAAAAAAA8H8AAAAAAAAAAAAAAAAAAPD/

I use gdb to investigate:

$ gdb fpfuzzing
Reading symbols from fpfuzzing...
(gdb) b abort
Breakpoint 1 at 0x475bf0: file fpfuzzing.cpp, line 22.
(gdb) r
Thread 1 "fpfuzzing" hit Breakpoint 1, 0x00007ffff7a7276e in abort () from /lib64/libc.so.6
(gdb) f 1
#1  0x0000000000475c05 in LLVMFuzzerTestOneInput (Data=0xdf75d0 "", Size=32) at fpfuzzing.cpp:22
22          std::abort();
(gdb) p Size/sizeof(double)
$1 = 4
(gdb) p *begin@4
$2 = {inf, nan(0x8000000000000), nan(0x8000000000000), -inf}
(gdb)

The fuzzer discovered that -inf + inf = nan. My initial assumption that the sum function never returns NaN was wrong.

You can fuzz test more complicated functions by extending the mutator and cross over functions. For example generating symmetric matrices or linear systems. I expect my method to be good at identifying issues caused by edge cases.

Download the full example code: fpfuzzing.cpp.

References

“libFuzzer Tutorial”. https://github.com/google/fuzzing/blob/master/tutorial/libFuzzerTutorial.md
“Structure-Aware Fuzzing with libFuzzer”. https://github.com/google/fuzzing/blob/master/docs/structure-aware-fuzzing.md
OWASP. “Fuzzing”. https://owasp.org/www-community/Fuzzing
“OSS-Fuzz”. https://google.github.io/oss-fuzz/