SMBleedingGhost Writeup Part III: From Remote Read (SMBleed) to RCE

SHARE THIS ARTICLE

Follow zecops

Introduction

Previous SMBleedingGhost write-ups: 

In the previous part of the series, SMBleedingGhost Writeup Part II: Unauthenticated Memory Read – Preparing the Ground for an RCE, we described two techniques that allow us to read uninitialized memory from the pool buffers allocated by the SrvNetAllocateBuffer function of the srvnet.sys module. The first technique accomplishes that by crafting a special SMB packet and deducing information from the server’s response. The second technique, which has less limitations, does that by sending specially crafted compressed data and deducing information depending on whether the server drops the connection.

The next thing we had to understand was: what can be done with this reading ability? As a reminder, we began this research with a write-what-where primitive that we demonstrated in our previous research about achieving local privilege escalation. Since most of the memory layout in the modern Windows versions is randomized, we need to have at least one pointer to be able to do something useful with the write-what-where primitive. Unfortunately, memory allocated with the SrvNetAllocateBuffer function is mostly used for network data such as SMB packets and doesn’t contain system pointers. We could try and read uninitialized memory left by a previous allocation that wasn’t done with SrvNetAllocateBuffer, but it would be difficult to predict where to look for a pointer in this case, especially since we can’t run code on the target computer that could help us grooming the pool (unlike in the case of a local privilege escalation, for example). So we started looking for something more reliable.

Hear the news first

  • Only essential content
  • New vulnerabilities & announcements
  • News from ZecOps Research Team
We won’t spam, pinky swear 🤞

SrvNetAllocateBuffer and the allocated buffer layout

As we already mentioned in our local privilege escalation research, the SrvNetAllocateBuffer function doesn’t just return a buffer with the requested size. Instead, it returns a pointer to a struct that is located at the bottom of the pool-allocated memory block, containing information about the allocated buffer. The layout of the pool-allocated memory block is the following:

While our reading technique can only read bytes from the “User buffer” region, we can use the integer overflow bug to copy parts of the SRVNET_BUFFER_HDR struct to the “User buffer” region of another buffer, which we can then read. We can do that by setting the Offset field to point at the SRVNET_BUFFER_HDR struct beyond the data we want to read. We just need to make sure that the data that is located there can be interpreted as valid compressed data, otherwise the copying won’t happen.

Hunting for pointers

Let’s take a look at the fields of the SRVNET_BUFFER_HDR struct and see whether there’s something worth reading:

#pragma pack(push, 1)
struct SRVNET_BUFFER_HDR {
/*00*/  (orange) LIST_ENTRY ConnectionBufferList;
/*10*/  WORD BufferFlags; // 0x01 - no transport header, 0x02 - part of a lookaside list
/*12*/  WORD LookasideListIndex; // 0 to 8
/*14*/  WORD LookasideListLogicalProcessor;
/*16*/  WORD TracingDataCount; // 0, 1 or 2, for TracingPtr1/2, TracingUnknown1/2
/*18*/  (blue) PBYTE UserBufferPtr;
/*20*/  DWORD UserBufferSizeAllocated;
/*24*/  DWORD UserBufferSizeUsed;
/*28*/  DWORD PoolAllocationSize;
/*2C*/  BYTE unknown1[4];
/*30*/  (blue) PBYTE PoolAllocationPtr;
/*38*/  (blue) PMDL pMdl1;
/*40*/  DWORD BytesProcessed;
/*44*/  BYTE unknown2[4];
/*48*/  SIZE_T BytesReceived;
/*50*/  (blue) PMDL pMdl2;
/*58*/  (orange) PVOID pSrvNetWskStruct;
/*60*/  DWORD SmbFlags;
/*64*/  (orange) PVOID TracingPtr1;
/*6C*/  SIZE_T TracingUnknown1;
/*74*/  (orange) PVOID TracingPtr2;
/*7C*/  SIZE_T TracingUnknown2;
/*84*/  BYTE unknown3[12];
};
#pragma pack(pop)

The colored variables are pointers. The blue-colored pointers all point inside the pool-allocated memory block, with offsets which can be calculated in advance, so it’s enough to read one of them. Having an absolute pointer to the pool-allocated memory block will surely be helpful. Regarding the orange-colored pointers:

  • ConnectionBufferList – A linked list of all of the received, unhandled buffers of a connection. The list head is a part of the connection object created by the SrvNetAllocateConnection function in srvnet.sys. A buffer is added to the list by the SrvNetWskReceiveComplete function. In our case, there will be only one buffer in the list, so both pointers (Flink and Blink of the LIST_ENTRY struct) will point to the list head inside the connection object.
  • pSrvNetWskStruct – Initially, a pointer to the connection object mentioned above. The pointer is set by the SrvNetWskReceiveEvent function, but is overridden by the SrvNetWskReceiveComplete function with the pointer to the SRVNET_BUFFER_HDR struct. Thus, reading it is not more useful than reading one of the other blue-colored pointers. By the way, if you search for “pSrvNetWskStruct“ you’ll find out that it played a role in exploiting EternalBlue.
  • TracingPtr1/2 – These pointers are only used when tracing is enabled, as it seems.

As you can see, the only other useful pointer for us to read is one of the pointers from the ConnectionBufferList struct. Both pointers (Blink and Flink of the LIST_ENTRY struct) point to the connection object. The object struct has been named SRVNET_RECV by EternalBlue researchers, so we’ll use this name as well.

Getting a module base address

Now that we know how to get the two pointers – a pointer to a pool-allocated memory block and a pointer to an SRVNET_RECV struct – we can freely modify the two buffers using the write-what-where primitive. There are probably several ways from this point to achieve RCE, but we had a feeling that getting a base address of a module would be the most straightforward option since there are so many things we can modify in a data section of a module. As we’ve seen, none of the pointers in a memory block allocated by SrvNetAllocateBuffer point to a module. We had hopes for the SRVNET_RECV struct, but we didn’t find pointers that point to a module there, too. On the bright side, there are several pointers to modules one additional dereference away:

At this point, we noticed that since we can override those pointers in SRVNET_RECT, we can call an arbitrary function by replacing the HandlerFunctions pointer and triggering one of the events, e.g. closing the connection so that Srv2DisconnectHandler is called. This will come in handy later, but we didn’t have any function pointers to call yet, so we continued with our attempt to get a module base address.

Unlike writing, reading those pointers is not as easy since our technique allows us to read only from the “User buffer” region. So close, yet so far. Since we can get and modify a pool-allocated memory block and an SRVNET_RECV struct, we hoped to find code that we can trigger that does a double-dereference-read followed by a double-dereference-write with two variables that we control, similar to the following:

ptr1 = *(pSrvNetRecv + offset1)
value = *ptr1
ptr2 = *(pSrvNetRecv + offset2)
*ptr2 = value

If we could find such a snippet, we would trigger it to copy the first pointer (e.g. HandlerFunctions) to the “User buffer” region, read it, then copy the second pointer (e.g. the Srv2ConnectHandler function pointer) to the “User buffer” region and read it as well, deducing the module base address from it. We searched for such a snippet for a long time, but didn’t find a good match. Finally, we settled for a sub-optimal option which nevertheless worked. Let’s take a look at the relevant part of the SrvNetFreeBuffer function (simplified):

void SrvNetFreeBuffer(PSRVNET_BUFFER_HDR Buffer)
{
    PMDL pMdl1 = Buffer->pMdl1;
    PMDL pMdl2 = Buffer->pMdl2;

    if (pMdl2->MdlFlags & 0x0020) {
        // MDL_PARTIAL_HAS_BEEN_MAPPED flag is set.
        MmUnmapLockedPages(pMdl2->MappedSystemVa, pMdl2);
    }

    if (Buffer->BufferFlags & 0x02) {
        if (Buffer->BufferFlags & 0x01) {
            pMdl1->MappedSystemVa = (BYTE*)pMdl1->MappedSystemVa + 0x50;
            pMdl1->ByteCount -= 0x50;
            pMdl1->ByteOffset += 0x50;
            pMdl1->MdlFlags |= 0x1000; // MDL_NETWORK_HEADER

            pMdl2->StartVa = (PVOID)((ULONG_PTR)pMdl1->MappedSystemVa & ~0xFFF);
            pMdl2->ByteCount = pMdl1->ByteCount;
            pMdl2->ByteOffset = pMdl1->MappedSystemVa & 0xFFF;
            pMdl2->Size = /* some calculation */;
            pMdl2->MdlFlags = 0x0004; // MDL_SOURCE_IS_NONPAGED_POOL
        }

        Buffer->BufferFlags = 0;

        // ...

        pMdl1->Next = NULL;
        pMdl2->Next = NULL;

        // Return the buffer to the lookaside list.
    } else {
        SrvNetUpdateMemStatistics(NonPagedPoolNx, Buffer->PoolAllocationSize, FALSE);
        ExFreePoolWithTag(Buffer->PoolAllocationPtr, '00SL');
    }
}

Upon freeing the buffer, if buffer flags 0x02 (means the buffer is part of a lookaside list) and 0x01 (means the buffer has no transport header) are set, some operations are made on the two MDL objects to add the transport header before resetting the flags to zero and returning the buffer back to the lookaside list. If we set aside the meaning behind the operations on the MDL objects for a moment and look at the operations in terms of memory manipulation, we can notice that the code does a double-dereference-read followed by a double-dereference-write with two variables that we control (the two MDL pointers), which is what we were looking for. The downside is that the content that we want to read from is also modified (lines 13-16, 29), a side effect we hoped to avoid.

Given the above, here’s how we managed to read the AcceptSocket pointer:

1. Prepare buffer A from a lookaside list such that the “User buffer” region is filled with zeros. This buffer will end up holding the pointer that we’ll eventually read.

2. Prepare buffer B from a different lookaside list such that:

  • The pMdl1 pointer points at the address of the HandlerFunctions pointer minus 0x18, the offset of MappedSystemVa in the MDL struct.
  • The pMdl2 pointer points at the “User buffer” region of Buffer A.
  • The Flags field is set to 0x03.

We can override the SRVNET_BUFFER_HDR struct fields by decompressing them from a larger buffer using the technique described in the Observation #2 section of the previous part of the writeup.

3. When buffer B is freed, the following operations will take place:

  • The MDL flags will be read from the second MDL at buffer A. If the MDL_PARTIAL_HAS_BEEN_MAPPED flag is set, MmUnmapLockedPages will be called and the system will likely crash. That’s why we filled the buffer with zeros in step 1.
  • The HandlerFunctions pointer and the memory around it will be modified as depicted here:
+00 |  00 00 00 00 00 00 00 00
+08 |  __ __ __|10 __ __ __ __
+10 |  __ __ __ __ __ __ __ __
+18 |  [+50..................]  <--  HandlerFunctions
+20 |  __ __ __ __ __ __ __ __
+28 |  [-50......] [+50......]
  • The HandlerFunctions pointer and the memory around it will be read as depicted here:
+00 |  __ __ __ __ __ __ __ __
+08 |  __ __ __ __ __ __ __ __
+10 |  __ __ __ __ __ __ __ __
+18 |  ab cd ef gh ij kl mn op  <--  HandlerFunctions
+20 |  __ __ __ __ __ __ __ __
+28 |  qr st uv wx __ __ __ __
  • The “User buffer” region of buffer A will be modified as depicted here: (The orange-colored bytes contain the pointer we want to read. We just need to order them properly.)
+00 |  00 00 00 00 00 00 00 00
+08 |  ?? ?? 04 00 __ __ __ __
+10 |  __ __ __ __ __ __ __ __
+18 |  __ __ __ __ __ __ __ __
+20 |  00 {c}0 {ef gh ij kl mn op}
+28 |  qr st uv wx {ab} 0{d} 00 00

4. Read the AcceptSocket pointer from the “User buffer” region of buffer A.

The good news: we managed to read the pointer. The bad news: we corrupted some data in the SRVNET_RECT struct. Luckily for us, the corruption doesn’t affect the system as long as nothing happens with the relevant connection. When something does happen, e.g. the connection closes, the system crashes. That’s not a problem since we’ll get RCE soon, and we can fix the corruption if we want to. We didn’t implement such a fix in our POC and such fix was left as an exercise for the reader.

After reading the AcceptSocket pointer, we used the same technique to read the srvnet!SrvNetWskConnDispatch pointer. We read the AcceptSocket pointer and not the HandlerFunctions pointer since the array of handler functions is shared between all connections, while the buffer pointed by AcceptSocket is not shared with other connections. Therefore, we can corrupt the latter, affecting the stability of only a single connection.

If we have a copy of the srvnet.sys file used on the target computer, we can just compute the offset of the SrvNetWskConnDispatch pointer in the module locally and subtract the offset from the pointer we read, getting the srvnet.sys module base address as a result. That’s what we did in our POC to keep things simple. One can improve it to be more general. One option that comes to mind is keeping several versions of srvnet.sys locally, and deducing the correct one by the least significant bytes of the read pointer.

Implementing arbitrary read

From the beginning of this research we had a convenient write-what-where (arbitrary write) primitive, but had nothing that allowed us to read memory. We worked hard until now to gain some memory reading abilities, and at this point we felt that we had enough tools to make our life easier and implement a convenient arbitrary read primitive. We began by exploring the possibilities of calling an arbitrary function.

Given that we have the base address of the srvnet.sys module, we can call any of the module’s functions. But what about the function’s arguments? The srv2!Srv2ReceiveHandler function is called by SrvNetCommonReceiveHandler, and the call looks like this:

HandlerFunctions = *(pSrvNetRecv + 0x118);
Arg1 = *(ULONG_PTR)(pSrvNetRecv + 0x128);
Arg2 = *(ULONG_PTR)(pSrvNetRecv + 0x130);
(HandlerFunctions[1])(Arg1, Arg2, Arg3, Arg4, Arg5, Arg6, Arg7, Arg8);

The first two arguments are read from the SRVNET_RECT struct, so we can control them. We don’t have as much control over the other arguments. The x86-64 calling convention specifies that it’s the caller’s responsibility to allocate and free the stack space for the arguments, so even though a 8-arguments function is intended to be called, we can replace the pointer with a function that expects any other amount of arguments, and it will work.

Here are the steps we used to trigger the function call:

  1. Send a specially crafted message so that the connection’s SRVNET_RECT struct pointer will be copied to a buffer we can read.
  2. Send another, valid message, which will reuse the same SRVNET_RECT struct, but don’t close the connection yet. Note that when a connection is closed, the SRVNET_RECT struct is not freed. The SrvNetPrepareConnectionForReuse function is called to reset the struct so that it can be reused for the next connection.
  3. Read the SRVNET_RECT struct pointer that we copied in step 1.
  4. Replace the HandlerFunctions pointer and the arguments using the write-what-where primitive.
  5. Send an additional message over the connection from step 2 so that the function that took the place of srv2!Srv2ReceiveHandler is called.

Now all we had to do was to find a convenient function to copy memory from one location to another, so that we can copy arbitrary memory to the pool buffer we can read from. memcpy comes to mind, and srvnet.sys does have such a function (memmove, to be precise), but this function requires a third argument, the amount of bytes to be copied, which we don’t control. Failing to find a convenient function that requires one or two arguments, we realized that we’re not limited by functions implemented in srvnet.sys, we can also call functions from srvnet’s import table by pointing HandlerFunctions at the right offset. There, we found the perfect function: RtlCopyUnicodeString.

The RtlCopyUnicodeString function gets two UNICODE_STRING pointers as arguments, and copies the content of the source string to the destination string. Unlike C strings which are NULL-terminated, strings in the kernel are defined by the UNICODE_STRING struct which holds a pointer to the string, and the string’s length in bytes. The string buffer can hold any binary data. If you peek at the implementation of RtlCopyUnicodeString, you can see that the copying is done with the memmove function, i.e. plain binary data copying. All we have to do is prepare our two UNICODE_STRING structs and call RtlCopyUnicodeString, then read the copied data:

Executing shellcode

After achieving a convenient arbitrary read primitive, we moved on to the next challenge towards our goal of remote code execution: running a shellcode. We used the technique that Morten Schenk presented in his Black Hat USA 2017 talk (pages 47-51).

The idea is to write a shellcode below the KUSER_SHARED_DATA structure which is located at a constant address, the only address that is not randomized in the kernel memory layout of the recent Windows versions. Then modify the relevant page table entry, making the page executable. The base address of the page table entries in the kernel is randomized, but can be retrieved from the MiGetPteAddress function in ntoskrnl.exe. Here are the steps we used to execute our shellcode:

  1. Use our arbitrary read primitive to get the base address of ntoskrnl.exe from srvnet’s import table.
  2. Read the base address of the page table entries from the MiGetPteAddress function, as described in Morten’s slides.
  3. Write the shellcode at address KUSER_SHARED_DATA + 0x800 (0xFFFFF78000000800). Note that we could also use one of the pool buffers, using KUSER_SHARED_DATA is just more convenient.
  4. Calculate the relevant page table entry address and clear the NX bit to allow execution, as described in Morten’s slides.
  5. Call the shellcode using our ability to call an arbitrary function.

Launching a reverse shell

Technically, we achieved remote code execution, so we could stop here. But if we’re not popping calc or launching a reverse shell, the POC is not complete, so we went on to fill that gap. Since our shellcode runs in kernel mode, we can’t just run cmd.exe or calc.exe and call it a day. We needed to find a way to get our code to run in user mode. While searching for prior work on the topic we found sleepya’s shellcode, written originally for EternalBlue exploits, which is designed to do just that. 

In short, here’s what the shellcode does:

  1. Hook IA32_LSTAR MSR to lower the IRQL (Interrupt Request Level) from DISPATCH_LEVEL to PASSIVE_LEVEL. The shellcode begins execution at the DISPATCH_LEVEL IRQL which imposes several limitations. For more information see the great explanation of zerosum0x0.
  2. Find a privileged user mode process (lsass.exe or spoolsv.exe) and queue a user mode APC in one of the alertable threads that is in waiting state.
  3. In the APC kernel routine, allocate EXECUTE_READWRITE memory and point the APC normal (user mode) routine there. Then copy the user mode shellcode to the newly allocated memory, prepended with a stub to create a new thread.
  4. In the APC normal routine a new thread is created, executing the user mode shellcode.

Published about three years ago, the shellcode didn’t work right away on recent Windows versions, so we had to make a couple of adjustments:

  1. Incompatibility with the KVA Shadow mitigation. In the blog post Fixing Remote Windows Kernel Payloads to Bypass Meltdown KVA Shadow zerosum0x0 explains why the first part of the shellcode, IA32_LSTAR MSR hooking, isn’t supported when the KVA Shadow mitigation is enabled, and proposes a fix. We tried the proposed fix, but it didn’t work on newer Windows versions – zerosum0x0 targeted Windows 10 version 1809 while we were targeting versions 1903 and 1909. The right thing to do is to improve the fix or find another solution, but we just removed the IRQL lowering part. As a result, the POC can sometimes crash the system while trying to access paged memory (bug check IRQL_NOT_LESS_OR_EQUAL), but it doesn’t happen often, so we left it as is since it’s good enough for a POC.
  2. Fixed finding the base address of ntoskrnl.exe. At first, we tried using zerosum0x0’s method – get an address of the first ISR (Interrupt Service Routine), which is located in ntoskrnl.exe, and search for a nearby PE header. The method didn’t work for us since the ISR pointer points to ntoskrnl’s INITKDBG section which is not mapped. Since we already found the ntoskrnl.exe base address, we fixed it by just passing it as an argument to the shellcode.
  3. Fixed a problem with finding the offset of ETHREAD.ThreadListEntry. The original code looked for the current thread in the thread list of the current process. The thread won’t be found if the current thread is attached to a different process than the one it was originally created in (see KeStackAttachProcess).
  4. Fixed the UserApcPending check in the KAPC_STATE struct for Windows 10 version R5 and newer. Since Windows 10 version R5 UserApcPending shares a byte with the newly added bit value, SpecialUserApcPending.

With the above fixed, we finally managed to make the shellcode work, we just needed to fill in the user mode part of the code to run. We used MSFvenom, the Metasploit payload generator, to generate a user mode shellcode to spawn a reverse shell.

Targets with more than one logical processor

In the Observation #1 section of the previous part of the writeup we assumed that our target has only one logical processor. With this assumption, we could rely on the lookaside lists buffer reusing, knowing that we get the same buffer every time as long as the allocation size is the same. As a reminder, the lookaside lists are created upon initialization, a list for each size and logical processor, as depicted in the following table:

→ Allocation size

Logical Processor
0x1100 0x2100 0x4100 0x8100 0x10100 0x20100 0x40100 0x80100 0x100100
Processor 1 📝 📝 📝 📝 📝 📝 📝 📝 📝
Processor 2 📝 📝 📝 📝 📝 📝 📝 📝 📝
Processor n 📝 📝 📝 📝 📝 📝 📝 📝 📝

Each cell with the “📝” symbol is a separate lookaside list.

With more than one logical processor, things are a bit more complicated – we get the same buffer only as long as the allocation is made on the same logical processor. Our first attempt at overcoming this limitation was redundancy. When writing to one of the lookaside list buffers, write multiple times. When reading from one of the lookaside list buffers, read multiple times and choose the most common value. This approach would work if the logical processor usage was distributed evenly, but we found that it’s not the case. We tested our POC in VirtualBox, and from our observations, some logical processors are preferred over others. For a setup of 4 logical cores, here’s the distribution of handling the incoming packet in a test execution:

Logical processorIncoming packets handled
Logical processor 10.2%
Logical processor 20.8%
Logical processor 37.9%
Logical processor 491.1%

Here’s the distribution of handling the decompression:

Logical processorDecompressions executed
Logical processor 113.3%
Logical processor 25.1%
Logical processor 36.8%
Logical processor 474.8%

As you can see, in this specific case logical processor 4 did most of the work. Logical processor 1 handled only 1 out of every 500 incoming packets!

We tweaked the POC such that it sends several packets simultaneously from multiple threads to improve the logical processor usage distribution. We also added error detection, so that if the data that is read doesn’t make sense, another reading attempt is made instead of proceeding and most likely crashing the system. The changes we made were enough to make the POC work with VirtualBox targets with multiple logical processors, but from a quick test the POC doesn’t work with VMware targets or (at least some) physical computers with multiple logical processors. We didn’t try to improve the POC further to support all targets, which we believe can be achieved with a better strategy for a reading and writing order.

Our POC with the improvements can be found in the GitHub repository.

If you’d like to study the code, we suggest starting with the initial, less noisy version which was designed for a single logical processor. It can be found in a previous commit here.

ZecOps Detection

ZecOps classify forensics logs related to this issue as #SMBGhost and #SMBleed. You can find more information on how to use ZecOps solutions for Endpoints & Servers, Mobile devices, or applications. Besides SMBleed / SMBGhost, ZecOps Crash Forensics solutions can find other, previously unknown vulnerabilities, that are exploited in the wild. If you care about persistent threats – we’ll be happy to assist.

Remediation

You can remediate the impact of both issues by doing one of the following:

  • Applying the latest security issues (recommended)
  • Block port 445 / enforce host-isolation
  • Disable SMBv3.1.1 compression

Summary

This is the third and final part of the writeup, in which we used the findings from the previous parts to achieve RCE using SMBGhost and SMBleed. We hope you enjoyed the read. Here’s a recap of the milestones during our research on the SMB bugs:

  1. A write-what-where primitive, demonstrated in our previous research about achieving local privilege escalation.
  2. The discovery of the SMBleed bug, described in the first part of the writeup.
  3. An ability to read memory from the pool buffers allocated by the SrvNetAllocateBuffer function, demonstrated in Part II: Unauthenticated Memory Read – Preparing the Ground for an RCE.
  4. An ability to get the base address of the srvnet.sys module.
  5. An ability to call an arbitrary function.
  6. Arbitrary memory read.
  7. Shellcode execution.

ZecOps Mobile XDR is here, and its a game changer

Perform automated investigations in minutes to uncover cyber-espionage on smartphones and tablets.

LEARN MORE >

Partners, Resellers, Distributors and Innovative Security Teams

ZecOps provides the industry-first automated crash forensics platform across devices, operating systems and applications.

LEARN MORE >

SHARE THIS ARTICLE