I first encountered the concept of using direct system calls to bypass user-land API hooking a little more than a year ago when I read a blog post by Cornelis de Plaa (@Cn33liz). It is an excellent post and worth a read, as he does an amazing job explaining exactly what system calls are and how they work. At the time, however, it was well beyond my ability to understand, and for a long time I didn’t even try to implement syscalls in my own tools. In the last few months, I have been digging into more of these lower-level concepts and I have realized that this is a technique that anyone can pretty easily implement in their existing tools without too much of a background in C/C++ (I will add some references at the bottom on how to do this in C# as well).
Using direct syscalls in offensive tools is not a new concept at this point. Many EDR products use userland API hooks to monitor suspicious in-memory behavior. As these things go, that means there has been an equal amount of talk about how to unhook, blind, or evade these products. When Cn33liz wrote his blog post and released Dumpert he was writing the syscalls into .asm files that were then included as part of the project.
This method comes with a few issues. First, it is very common for syscall numbers to change between OS versions. @J00ru does an excellent job of tracking these in his online syscall tables, but you still need to track and implement each version that you want to bring along. @Jackson_T went a long way to help ease this by creating SysWhispers, which automates the generation of all the structs and syscalls as well as builds in checks for the operating system you are executing on.
However, the second issue with this method is that you are bringing a lot of extra “stuff” with your PE. This increases the size of your payloads and also adds a lot of detection surface (there really isn’t a “good” reason for you to have any of that in there). Enter the “Hell’s Gate” technique.
“Hell’s Gate” is a term coined by @am0nsec and @RtlMateusz in this paper: https://vxug.fakedoma.in/papers/VXUG/Exclusive/HellsGate.pdf. The basic concept is to find the syscalls dynamically on the host by reading through ntdll.dll and then call them from your own custom implementation.
Let’s look at a common shellcode injection method and translate it into direct system calls using Hell’s Gate.
Above we have a very streamlined version of the most famous shellcode injection method. Open the target process, allocate space for our shellcode, write our shellcode into that space, change the permissions of that space to be executable, and finally start a new thread that kicks off the shellcode.
The first thing we need to do when translating an existing tool (or writing your own) into syscalls is figure out what the underlying syscall is. There are a couple of ways to do this. In this case I set a breakpoint on the API call (OpenProcess), viewed the disassembly, and stepped into it until I reached this point:
Here we end up in NtOpenProcess (you know you are in the right place because of the 0F 05 syscall line). The section in the red box is where we find the syscall number (0x26 for NtOpenProcess).
We will go through this for each API we want to replace. In this case we have:
OpenProcess -> NtOpenProcess
VirtualAllocEx -> NtAllocateVirtualMemory
WriteProcessMemory -> NtWriteVirtualMemory
VirtualProtectEx -> NtProtectVirtualMemory
CreateRemoteThread -> NtCreateThreadEx
Now we need to define our custom functions for each of these. You can find what these should look like at http://undocumented.ntinternals.net/ .
We create our own system calls with the same structure:
To save you a bit of pain (if you have come this far you are going to get a whole bunch of errors with that function, since you need to define the OBJECT_ATTRIBUTES and CLIENT_ID structs), I would recommend finding all the syscalls you need, then using the amazing tool SysWhispers to get JUST the header file that it generates. It will have all the necessary structs and the building blocks for all your syscalls. It is still good to know how and why it generated those things in case you need to troubleshoot something that is not working (I have had to make small changes to SysWhispers output in the past).
Now that we have defined our custom syscalls, we need to map them to the correct code on the host. There are actually a few different ways to do this, all with pros and cons.
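To make the missing-struct errors a little less mysterious, here is a rough sketch of the two structures and a function-pointer type for NtOpenProcess. This is not the SysWhispers header. The typedefs in the `sketch` namespace are stand-ins I am assuming so the snippet compiles without `<windows.h>`, and `make_object_attributes` and `make_client_id` are hypothetical helper names. On Windows you would use the real definitions from the SDK or the generated header instead.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Stand-in typedefs (assumption): on Windows these come from the SDK headers.
using HANDLE      = void*;
using PVOID       = void*;
using ULONG       = std::uint32_t;
using USHORT      = std::uint16_t;
using NTSTATUS    = std::int32_t;
using ACCESS_MASK = std::uint32_t;

struct UNICODE_STRING {
    USHORT    Length;         // length of Buffer in bytes
    USHORT    MaximumLength;  // capacity of Buffer in bytes
    char16_t* Buffer;         // UTF-16 text (WCHAR* on Windows)
};

struct OBJECT_ATTRIBUTES {
    ULONG           Length;  // must be set to sizeof(OBJECT_ATTRIBUTES)
    HANDLE          RootDirectory;
    UNICODE_STRING* ObjectName;
    ULONG           Attributes;
    PVOID           SecurityDescriptor;
    PVOID           SecurityQualityOfService;
};

struct CLIENT_ID {
    HANDLE UniqueProcess;  // target process ID, stored in a handle-sized field
    HANDLE UniqueThread;   // left null when opening a process by ID
};

// Pointer type with the same parameter list as NtOpenProcess.
// On Windows the declaration would also carry the NTAPI calling convention.
using NtOpenProcess_t = NTSTATUS (*)(HANDLE* ProcessHandle,
                                     ACCESS_MASK DesiredAccess,
                                     OBJECT_ATTRIBUTES* ObjectAttributes,
                                     CLIENT_ID* ClientId);

// Hypothetical helper: zeroed attributes with only Length filled in.
inline OBJECT_ATTRIBUTES make_object_attributes() {
    OBJECT_ATTRIBUTES oa{};
    oa.Length = sizeof(oa);
    return oa;
}

// Hypothetical helper: CLIENT_ID that names a process by its ID.
inline CLIENT_ID make_client_id(std::uintptr_t pid) {
    CLIENT_ID cid{};
    cid.UniqueProcess = reinterpret_cast<HANDLE>(pid);
    return cid;
}

}  // namespace sketch
```

The point of the two helpers is only to show which fields matter for this call: `Length` has to match the struct size, and the process ID travels in `UniqueProcess` rather than as a plain integer argument the way it does with OpenProcess.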
The original Hell’s Gate paper uses the ntdll that is loaded into the process on start up to find and map the syscalls. In testing this I found that EDRs have started overwriting the function with their own hook. As an example this is what our syscall op codes for NtOpenProcess should look like on a Win 10 system:
0x4c 0x8b 0xd1 0xb8 0x26 0x0 0x0 0x0 0xf6 0x4 0x25 0x8 0x3 0xfe 0x7f 0x1 0x75 0x3 0xf 0x5 0xc3
This is what it looks like when run on a system running a popular EDR:
0xe9 0x73 0x3d 0xfe 0xbf 0xcc 0xcc 0xcc 0x6 0x4 0x25 0x8 0x3 0xe 0x7f 0x1 0x75 0x3 0xf 0x5 0xc3
It doesn’t matter if that doesn’t mean anything to you, the important thing is that the first 8 bytes are different, and they have overwritten the syscall number (0x26 from above).
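If you do want to see what those bytes mean, the following sketch decodes the two dumps above. It assumes the usual x64 stub layout, where `4c 8b d1` is `mov r10, rcx` and `b8` starts `mov eax, imm32`, so the syscall number sits in bytes 4 through 7 in little-endian order. A leading `e9` is a relative `jmp`, which is what a hook typically looks like. The function and constant names are mine, not from the Hell's Gate paper or any tool, and the hooked bytes are copied exactly as printed in this post.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kStubSize = 21;
using Stub = std::array<std::uint8_t, kStubSize>;

// True when the stub starts with: mov r10, rcx ; mov eax, imm32
bool has_clean_prologue(const Stub& s) {
    return s[0] == 0x4c && s[1] == 0x8b && s[2] == 0xd1 && s[3] == 0xb8;
}

// True when the first instruction is a relative jump (E9), i.e. a likely hook.
bool looks_hooked(const Stub& s) {
    return s[0] == 0xe9;
}

// Reads the little-endian imm32 that follows the B8 opcode.
// Returns false (and leaves `number` untouched) if the prologue is not intact.
bool syscall_number(const Stub& s, std::uint32_t& number) {
    if (!has_clean_prologue(s)) {
        return false;
    }
    number = static_cast<std::uint32_t>(s[4]) |
             (static_cast<std::uint32_t>(s[5]) << 8) |
             (static_cast<std::uint32_t>(s[6]) << 16) |
             (static_cast<std::uint32_t>(s[7]) << 24);
    return true;
}

// The clean NtOpenProcess bytes from the post.
const Stub kCleanNtOpenProcess = {
    0x4c, 0x8b, 0xd1, 0xb8, 0x26, 0x00, 0x00, 0x00, 0xf6, 0x04, 0x25,
    0x08, 0x03, 0xfe, 0x7f, 0x01, 0x75, 0x03, 0x0f, 0x05, 0xc3};

// The bytes seen with an EDR present, copied verbatim from the post.
const Stub kHookedNtOpenProcess = {
    0xe9, 0x73, 0x3d, 0xfe, 0xbf, 0xcc, 0xcc, 0xcc, 0x06, 0x04, 0x25,
    0x08, 0x03, 0x0e, 0x7f, 0x01, 0x75, 0x03, 0x0f, 0x05, 0xc3};

// Small self-check over the two dumps.
void check_post_dumps() {
    std::uint32_t number = 0;
    assert(syscall_number(kCleanNtOpenProcess, number));
    assert(number == 0x26);
    assert(looks_hooked(kHookedNtOpenProcess));
    assert(!syscall_number(kHookedNtOpenProcess, number));
}
```

Calling `check_post_dumps()` confirms the point made above: the clean stub yields 0x26, while the hooked stub no longer contains a recoverable syscall number in its first 8 bytes.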
So, I had to explore other options. In this post we are going to read the bytes of ntdll.dll from disk, thus bypassing any hooks that may have been placed by the EDR. Other options include using NtMapViewOfSection or manually mapping ntdll.dll (credit to @TheRealWover for recommending these). Reading directly from the file on disk is the most straightforward and was not detected when testing, so we will stick with that for this post.
At this point I want to direct you all to an overall excellent blog: ired.team, written by Mantvydas Baranauskas (@spotheplanet). Every post there is worth reading. For what I’m doing now I felt his code was cleaner and easier to understand than anything I would write, so the following PoC code is directly from this page: https://www.ired.team/offensive-security/defense-evasion/retrieving-ntdll-syscall-stubs-at-run-time .
The idea here is that we read ntdll.dll from disk (note that this is very unusual behavior and a pretty good IoC for this technique) and map the various PE sections. It is good to have an understanding of standard PE format but that can be learned from other linked posts or Google as the focus here is implementation.
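One piece of PE background is worth a quick illustration, because it is what "mapping the sections" boils down to when the file is read raw from disk: addresses inside the PE are relative virtual addresses (RVAs), and they have to be translated into file offsets using the section headers. The sketch below is my own simplification, not the ired.team code. The `Section` struct keeps only three fields, named after their `IMAGE_SECTION_HEADER` counterparts, and `rva_to_file_offset` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified view of one section header (only the fields needed here).
struct Section {
    std::uint32_t VirtualAddress;    // RVA of the section once loaded
    std::uint32_t SizeOfRawData;     // bytes the section occupies in the file
    std::uint32_t PointerToRawData;  // file offset where those bytes start
};

// Translates an RVA into an offset within the raw file on disk.
// Returns false if no section's file-backed range contains the RVA.
bool rva_to_file_offset(const std::vector<Section>& sections,
                        std::uint32_t rva,
                        std::uint32_t& offset) {
    for (const Section& s : sections) {
        if (rva >= s.VirtualAddress &&
            rva - s.VirtualAddress < s.SizeOfRawData) {
            offset = s.PointerToRawData + (rva - s.VirtualAddress);
            return true;
        }
    }
    return false;
}
```

For a made-up section that starts at RVA 0x1000 and is stored at file offset 0x400, RVA 0x1010 translates to file offset 0x410. The same arithmetic is applied to the export directory and to each function's address when walking a file that the loader never mapped.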
Now, we need to find the syscall op codes and point our custom function at them. First, our function is assigned to an array which we will fill with the opcodes that we will copy from ntdll.
The opcodes are found by iterating through the exported functions from ntdll.dll (found in .rdata from above). Once we find the function, the first 21 bytes from the pointer to the function are copied into our array.
With me so far? If you didn’t understand any of the last part, that is OK 😁 . I would recommend digging into it as there are a lot of important concepts there from both an offensive and defensive perspective. However, the important thing to get from this is we are reading ntdll.dll from disk to avoid in-memory hooks, finding the system call bytes for our API calls, and then mapping those bytes to our own custom version of that function so that when we call our custom function it executes those bytes.
CALLING OUR FUNCTIONS
Now all we do is call our new shiny custom functions as if we were calling the normal function. Implementing the new functions is sometimes a bit confusing, for instance here is our new NtOpenProcess.
As you can see, this is much more complicated than the original OpenProcess. Thankfully, there are many examples online (game hacking forums and driver development forums can be very helpful). Also, once you have done a few of these they are mostly pretty intuitive.
So... why did we go through all this effort? What exactly is this accomplishing? EDRs frequently use userland hooks of common APIs to detect malicious actions.
As an example, let’s execute the original injection code and monitor the process with API Monitor v2. This will give us a good idea of what can be seen by userland hooks.
As you can see, all of our API calls are visible, along with everything that is passed to them as arguments. Now, let’s do the same thing with our new shiny injector.
Nothing! Yay, we are invisible! Totally FUD! OK, not really. As with all things this is just a piece of the puzzle. However, real threat actors / malware are using these techniques which means you should be too!
This technique does help bypass userland hooks, but does nothing for monitoring done at the kernel level (as a test, run the PoC against Sysmon while monitoring for CreateRemoteThread). Also, every method of using direct syscalls does have IoCs so it is important to understand the pros and cons of each method. As with any PoC, this is not meant for operational use as there is still a fair bit of “OpSec unsafe” code in there.
Join the red team! We have cookies! But, if you hate cookies, kernel level defenses will still see all of this since there is nothing different happening once our code leaves ring 3. Also, each method of bringing / mapping syscalls has its own indicators. A really good post on IoCs for this type of technique is https://www.cyberbit.com/blog/endpoint-security/malware-mitigation-when-direct-system-calls-are-used/ . Also, though I haven’t seen anything on this yet, the syscalls are mapped into executable space so there may be some detection opportunities there?
Special thanks to Tim McGuffin (@NotMedic) for helping me out with testing various mapping / execution methods and @TheRealWover for his input and advice on the same.