OSX Dynamic Libraries and Code-Caves

I recently wrote about reading and writing memory on OSX using the mach_vm operations. I also recently published a tool called dylib_injector that injects dynamic libraries into another application's memory. However, I never explained why you would want to do the latter. Hopefully this post rectifies that.

Scope

In programming, scope generally applies to how you declare and can use your variables:

void foo()
{
    int i;
    
    for( int j = 0; j < 10; j++ )
    {
        //both i and j can be accessed here
    }
    
    //j can no longer be accessed
}

Modifying memory is similar. When using the mach_vm operations, you are executing within your original program's own scope and requesting access to another program's memory. It's powerful, but also limited.

To clarify, the only way to redirect a function call in the target to your program is to write shellcode to a place in the target's memory and then modify the target's code to call that location. Such a method is called a memory code-cave and works for small changes - like setting a register or changing an argument to an internal function - but it becomes unwieldy for more complex operations.

Dynamic Libraries

By injecting a dynamic library into another program, we are working in that program's scope. For example, we no longer need to request access to modify a piece of memory; we can just declare a pointer to it:

int *health = (int*)(0xDEADBEEF);
*health = 100;
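
As a minimal sketch of the same idea in a form that compiles anywhere: once your code runs inside the target, an address is just a number you can cast to a pointer. The helper names write_int_at and read_int_at are hypothetical, and since 0xDEADBEEF above is only a placeholder, a real address has to come from your own analysis of the target.

```c
#include <stdint.h>

/* Hypothetical helper: store a value at a raw address in our own address
   space. volatile keeps the compiler from optimizing the store away. */
void write_int_at( uintptr_t address, int value )
{
    volatile int *target = (volatile int*)address;
    *target = value;
}

/* Hypothetical helper: read the value back from the same raw address. */
int read_int_at( uintptr_t address )
{
    return *(volatile int*)address;
}
```

To try it without a target, pass the address of one of your own variables, e.g. write_int_at( (uintptr_t)&health, 100 ).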

That's a lot easier on its own, but the real power is injecting what's called an external code-cave. Since we are in the target's scope, we can declare functions in the library and have the target call them. You can see an example of a code-cave in the aimbot for CS2D I recently released here. However, to keep the rest of that code from interfering, I will break down the parts below.

Example Target

In these examples, our target will be as follows:

    0x00105c24 cmp dword [0x313820], 0
    0x00105c2b jne 0x105fba
    0x00105c31 call 0x6c038

This is taken from CS2D and is the function responsible for rotating the player.

It's also important to note the different asm syntaxes. Most disassemblers output Intel syntax whereas gcc uses AT&T syntax by default. A good cheatsheet for Intel is here and AT&T here.

The function

The hook function itself is pretty basic:

void hook() 
{
    __asm__( "pushal" );
    
    __asm__( "popal" );
    //__asm__( "original instruction" );
}

pushal and popal are the AT&T equivalents of pushad and popad. Respectively, they save all the general-purpose registers to the stack and then restore them. They aren't strictly necessary, but they help prevent crashes when code inside your code-cave unintentionally modifies a register.

If you do want to change the value of a register, then these will need to be removed.

The "original instruction" is going to be whatever instruction you plan to write over. In this case, it is going to be the call at 0x00105c31:

    __asm__( 
        "mov $0x6c038, %eax\n\t"
        "call *%eax\n\t"
    );

There is no simple way to call an absolute address directly from AT&T inline asm, so we move our address into eax and call through the register.

The hook

To understand the hook, we need a quick primer on opcodes. This only covers 32-bit opcodes and needs to be modified for 64-bit programs.

The general build process of a program is:

  1. The high-level language is compiled to assembly.
  2. The assembly is assembled into machine code and linked against the system libraries to produce an executable.

The executable itself is a series of bytes. The bytes located in the code segment (as opposed to the data segment) together represent instructions. For example, 6a 00 is the machine's representation of push 0.

For our purposes, we are interested in two attributes of this complex system: the length of instructions and how the opcodes for calls are generated.

Length

On Intel-based x86 architectures, instructions are variable-length. That means different instructions encode to different numbers of bytes. For example, push ebp is 55, or 1 byte. As we saw above, push 0 is 6a 00, or 2 bytes.

We care about this because a call instruction is 5 bytes and not accounting for this length will shift all the opcodes and give you invalid instructions. Consider the following example:

push ebp
push 0
mov [esp+0x4], ecx

The opcodes:

55
6a 00
89 4c 24 04

It's important to remember that opcodes are not broken up in memory but sit in one continuous stream. For this example we will keep them broken up to illustrate the problem.

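If we write a call over the first instruction, the following is the result:

e8 00 00 00 00 //we will see how these are calculated later
24 04

The five bytes of the call cover push ebp, push 0, and the first two bytes of the mov, leaving 24 04 behind. The processor decodes those leftover bytes as whatever instruction they happen to form (here and al, 0x4), not the mov we started with, so the program will misbehave or crash.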

To get around this, we have to pad out the leftover bytes of any partially overwritten instruction with no-operations (NOPs). The opcode for a NOP is 0x90 and its only effect is to advance the instruction pointer by 1 to the next instruction.

However, every instruction you overwrite is another instruction you need to account for in your hook, so it is good to try to find five-byte instructions as your hook locations. Failing that, attempt to avoid hooking any instructions that modify the stack or frame pointer (e.g., push ebp, mov ebp, esp, etc.) to avoid the headache of balancing the stack manually.
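
The padding rule can be sketched in portable C against a plain byte buffer. The function names patch_span and write_padded_call are hypothetical, and the instruction lengths are assumed to come from your disassembler; nothing here measures them for you.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CALL_LENGTH 5  /* e8 plus a four-byte offset */

/* Given the lengths of the instructions at the hook location, return how
   many bytes must be overwritten so the patch ends on an instruction
   boundary. Returns 0 if the listed instructions cover fewer than 5 bytes. */
size_t patch_span( const size_t *lengths, size_t count )
{
    size_t total = 0;
    for( size_t i = 0; i < count; i++ )
    {
        total += lengths[i];
        if( total >= CALL_LENGTH )
            return total;
    }
    return 0;
}

/* Write a call with the given relative offset, then fill the rest of the
   span with NOPs (0x90). span must be at least CALL_LENGTH. */
void write_padded_call( uint8_t *code, size_t span, uint32_t offset )
{
    code[0] = 0xe8;
    /* x86 stores the offset least-significant byte first */
    code[1] = (uint8_t)( offset );
    code[2] = (uint8_t)( offset >> 8 );
    code[3] = (uint8_t)( offset >> 16 );
    code[4] = (uint8_t)( offset >> 24 );
    memset( code + CALL_LENGTH, 0x90, span - CALL_LENGTH );
}
```

For the push ebp / push 0 / mov example, the lengths are 1, 2 and 4, so patch_span returns 7 and the patched bytes become e8 followed by the four offset bytes and then 90 90.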

Calls

The opcode for a near relative call is e8 followed by a four-byte signed offset, stored least-significant byte first, that designates the instruction to call. Calls differ from jumps in that they push the return address (the address of the instruction after the call) onto the stack before adjusting the instruction pointer. The offset is measured from the end of the call instruction, so on 32-bit, the equation to calculate these bytes is:

address of function being called - current location - length_of_call_instruction( 5 )
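
As a small worked sketch, the equation translates directly into C. The name call_offset is hypothetical; the addresses in the usage note are the ones from the example target.

```c
#include <stdint.h>

/* target - current location - length of the call instruction (5).
   Unsigned arithmetic wraps around, which yields the correct two's
   complement bytes for a backwards (negative) offset. */
uint32_t call_offset( uint32_t target, uint32_t call_address )
{
    return target - call_address - 5;
}
```

call_offset( 0x6c038, 0x00105c31 ) gives 0xfff66402. Assuming the disassembly of the target is accurate, the original call at 0x00105c31 would therefore be stored as e8 02 64 f6 ff.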

The hook, for real

Knowing this, we can now write our hook.

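    //needs #include <mach/mach.h>
    //a byte pointer, so that + 1 really advances by one byte
    unsigned char *patch_address = (unsigned char*)0x00105c31;

    //make the code page writable before patching it
    vm_protect( mach_task_self(), (vm_address_t)patch_address, 5, FALSE, VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE );

    //first byte: the opcode for a call
    *patch_address = 0xe8;
    //next four bytes: address of hook - patch address - 5
    *(unsigned int*)(patch_address + 1) = (unsigned int)&hook - (unsigned int)patch_address - 5;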

Since we are writing to a code section (which normally has only the VM_PROT_READ and VM_PROT_EXECUTE protections), we first need to allow writes using vm_protect. If we don't, the write will fail with an EXC_BAD_ACCESS exception (KERN_PROTECTION_FAILURE).

Next we set our first byte to 0xe8, the opcode for a call. In this example it is unnecessary because the location already holds a call, but I left it in to demonstrate what to do if you hook a location that is not a call.

Finally we use our equation to calculate the remaining four bytes of the call. The - 5 can be applied at any point in the subtraction.
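
One way to sanity-check a patch is to run the equation in reverse and confirm the bytes really point where you expect. This is only a sketch with a hypothetical function name, call_destination, and it works on any five-byte buffer rather than live code.

```c
#include <stdint.h>

/* Decode a five-byte relative call located at call_address and return
   the address it will transfer control to. Returns 0 if the first byte
   is not the e8 opcode. */
uint32_t call_destination( const uint8_t *bytes, uint32_t call_address )
{
    if( bytes[0] != 0xe8 )
        return 0;

    /* reassemble the offset, least-significant byte first */
    uint32_t offset = (uint32_t)bytes[1]
                    | (uint32_t)bytes[2] << 8
                    | (uint32_t)bytes[3] << 16
                    | (uint32_t)bytes[4] << 24;

    return call_address + 5 + offset;
}
```

Inside a 32-bit library you could pass the patched location and its address, then compare the result against the address of hook before letting the target run.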

With this, you can do anything you want with a target process.