flare-emu

The analysis capabilities of flare-emu have been further expanded

flare-emu is a new IDAPython library developed by the FLARE team. It depends on IDA Pro and the Unicorn emulation framework and lets reverse engineers emulate code through scripts. It supports the x86, x86_64, ARM, and ARM64 architectures.

It provides an easy-to-use, flexible interface for scripted emulation and is designed to handle the groundwork of setting up a flexible, robust emulator for each supported architecture, so that you can focus on solving your code analysis problems.

5 different interfaces

It currently provides five different interfaces to handle your code emulation needs, along with a set of related helper and utility functions.

1. emulateRange

This API emulates a range of instructions, or a function, within a user-specified context. It offers user-defined hooks both for individual instructions and for when a "call" instruction is encountered, and the user decides whether the emulator steps over function calls or steps into them. This interface gives the user an easy way to specify values for given registers and stack arguments. If a byte string is specified, it is written to the emulator's memory and a pointer to it is written to the register or stack variable. After emulation, the user can use flare-emu's utility functions to read data from emulated memory or registers, or use the returned Unicorn emulation object to probe it directly.

flare-emu also provides a small wrapper around emulateRange, named emulateSelection, which emulates the instructions currently highlighted in IDA Pro. If flare-emu does not expose some functionality you need, you can use the returned Unicorn emulator object directly.
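The byte-string convenience described above can be modeled in a few lines of plain Python that run outside IDA. This is an illustrative sketch, not flare-emu's implementation: `ToyMemory` and `materialize` are hypothetical names, and the bump allocator merely stands in for the emulator's real memory management.

```python
class ToyMemory:
    """Bump allocator standing in for emulator memory (illustration only)."""

    def __init__(self, base=0x10000):
        self.next_free = base
        self.blocks = {}  # address -> bytes written there

    def load_bytes(self, data):
        addr = self.next_free
        self.blocks[addr] = bytes(data)
        # Advance to the next 16-byte boundary (an arbitrary choice here).
        self.next_free += (max(len(data), 1) + 15) & ~15
        return addr


def materialize(values, memory):
    """Return a copy of values with every byte string replaced by a pointer."""
    out = {}
    for name, value in values.items():
        if isinstance(value, (bytes, bytearray)):
            out[name] = memory.load_bytes(value)  # write data, keep the pointer
        else:
            out[name] = value                     # plain integers pass through
    return out


mem = ToyMemory()
regs = materialize({"arg1": b"encrypted\x00", "arg2": 10}, mem)
print(hex(regs["arg1"]), regs["arg2"])  # 0x10000 10
```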

2. iterate

This API forces emulation down specific branches within a function in order to reach given targets. The user can specify a list of target addresses, or the address of a function whose cross-references are then used as the targets, along with a callback that is invoked when a target is reached. The targets are reached even if conditions during emulation would otherwise have caused different branches to be taken.

Like the emulateRange API, it offers user-defined hooks for individual instructions and for when a "call" instruction is encountered. An example use case for the iterate API is achieving something similar to what our argtracker tool does.
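Conceptually, reaching a target means finding a chain of basic blocks from the function entry to the block containing the target, then steering each conditional branch along that chain. The sketch below models only the path-finding half, using a breadth-first search over a hypothetical control-flow graph; flare-emu's actual graph search and branch forcing are not reproduced here, and `path_to_target` and `forced_edges` are illustrative names.

```python
from collections import deque


def path_to_target(cfg, entry, target_block):
    """Breadth-first search for one block path from entry to target_block.

    cfg maps a block id to the list of its successor block ids.
    Returns the list of blocks, or None if the target is unreachable.
    """
    parents = {entry: None}
    queue = deque([entry])
    while queue:
        block = queue.popleft()
        if block == target_block:
            path = []
            while block is not None:  # walk parent links back to the entry
                path.append(block)
                block = parents[block]
            return path[::-1]
        for succ in cfg.get(block, []):
            if succ not in parents:
                parents[succ] = block
                queue.append(succ)
    return None


def forced_edges(path):
    """Branch decisions the emulator would have to enforce along the path."""
    return list(zip(path, path[1:]))


# Hypothetical CFG: A branches to B or C; only C leads to the call in D.
cfg = {"A": ["B", "C"], "B": ["E"], "C": ["D"], "D": ["E"], "E": []}
path = path_to_target(cfg, "A", "D")
print(path)                # ['A', 'C', 'D']
print(forced_edges(path))  # [('A', 'C'), ('C', 'D')]
```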

3. emulateBytes

This API provides a simple way to emulate a blob of extraneous shellcode. The provided bytes are not added to the IDB; they are simply emulated as is. This is useful for preparing the emulation environment. For example, flare-emu itself uses this API to manipulate a model specific register (MSR) of the ARM64 CPU that Unicorn does not expose, in order to enable vector floating point (VFP) instructions and register access. As with emulateRange, if flare-emu does not expose some functionality you need, you can use the returned Unicorn emulator object directly.

4. iterateAllPaths

This API is much like iterate, except that instead of target addresses you provide a target function, and it tries to find and emulate every path through that function. This is useful for code analysis that needs to reach every basic block of a function.

5. emulateFrom

This API is useful when function boundaries are not clearly defined, as is often the case with obfuscated binaries. You provide a starting address, and it emulates until there is nothing left to emulate or until you stop emulation from one of your hooks. Call it with the strict parameter set to False to enable dynamic code discovery: flare-emu will have IDA Pro define instructions as they are encountered during emulation.

Installation

To install flare-emu, put flare_emu.py and flare_emu_hooks.py into IDA Pro's python directory, then import flare_emu as a module in your IDAPython scripts. flare-emu depends on Unicorn and its Python bindings.

Caveats

flare-emu is written against the new IDA Pro 7.x API and is not compatible with earlier versions of IDA Pro.

Usage

Although flare-emu can be used to solve many different code analysis problems, one of its more common uses is helping researchers decrypt strings in malware binaries. FLOSS is a great tool that can often do this automatically by trying to identify string decryption functions and emulating them to decrypt the strings passed in at each cross-reference. However, FLOSS cannot always identify these functions and emulate them correctly using its generic approach. Sometimes you have to do a bit more work, and this is where flare-emu shines: once you are comfortable with it, it can save you a lot of time. Below, let's look at some common scenarios malware analysts encounter when dealing with encrypted strings.

Simple string decryption scenario

Suppose you have identified the function that decrypts all the strings in an x86_64 binary. It is called all over the place and decrypts many different strings, and in IDA Pro you have named it decryptString. Below is a flare-emu script that decrypts all of these strings, adds each decrypted string as a comment at every call to the function, and logs each decrypted string together with the address of its call.

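from __future__ import print_function
import idc
import idaapi
import idautils
import flare_emu

def decrypt(argv):
    # Emulate decryptString in a fresh EmuHelper, passing the four arguments
    # captured at this call site; the argN aliases map to the right registers.
    myEH = flare_emu.EmuHelper()
    myEH.emulateRange(idc.get_name_ea_simple("decryptString"), registers = {"arg1":argv[0], "arg2":argv[1],
                           "arg3":argv[2], "arg4":argv[3]})
    # Read the null-terminated result from the buffer passed as the first argument.
    return myEH.getEmuString(argv[0])

def iterateCallback(eh, address, argv, userData):
    # Called once for each cross-reference (call site) to decryptString.
    s = decrypt(argv)
    print("%016X: %s" % (address, s))
    # Annotate the call site with the decrypted string (non-repeatable comment).
    idc.set_cmt(address, s, 0)

if __name__ == '__main__':
    eh = flare_emu.EmuHelper()
    # Emulate up to every call to decryptString, invoking iterateCallback at each.
    eh.iterate(idc.get_name_ea_simple("decryptString"), iterateCallback)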

In __main__, we first create an instance of flare-emu's EmuHelper class, the class through which we do everything with flare-emu. Next, we use the iterate API, giving it the address of the decryptString function and the callback function that EmuHelper will call for each cross-reference it emulates to.

The iterateCallback function receives the EmuHelper instance (named eh here), the address of the cross-reference, the arguments passed to that particular call, and a special dictionary named userData. userData is not used in this simple example, but think of it as a persistent context for the emulator in which you can store your own custom data. Be careful, though, because flare-emu also uses this dictionary to store critical information it needs to perform its tasks. One such piece of data is the EmuHelper instance itself, stored under the "EmuHelper" key. If you are interested, search the source code to learn more about this dictionary. This callback simply calls the decrypt function, prints the decrypted string, and adds a comment containing it at the address of that call to decryptString.

decrypt creates a second EmuHelper instance to emulate the decryptString function itself, which decrypts the string for us. The prototype of decryptString is char * decryptString(char * text, int textLength, char * key, int keyLength); it simply decrypts the string in place. Our decrypt function passes the arguments received by iterateCallback to our call to EmuHelper's emulateRange API. Because this is an x86_64 binary, the calling convention passes arguments in registers rather than on the stack. flare-emu automatically determines which registers represent which arguments based on the architecture and file format of the binary as determined by IDA Pro, so you can write code that is largely architecture-independent. If this were 32-bit x86, you would pass the arguments with the stack parameter instead: myEH.emulateRange(idc.get_name_ea_simple("decryptString"), stack = [0, argv[0], argv[1], argv[2], argv[3]]). In x86 the first stack value is the return address, so we simply use 0 as a placeholder. Once emulation is complete, we call the getEmuString API to retrieve the null-terminated string stored at the memory location pointed to by the first argument passed to the function.
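To make the register-versus-stack distinction concrete, here is a plain-Python model that runs outside IDA. `resolve_arg` and `build_x86_stack` are hypothetical helpers, not flare-emu functions. The register table lists the standard Microsoft x64, System V AMD64, and AAPCS64 argument orders; how flare-emu selects among conventions internally is not shown.

```python
# Illustrative model of the two argument-passing styles discussed above.
# This is NOT flare-emu's code; register orders are the standard ABIs.
ARG_REGISTERS = {
    ("x86_64", "pe"): ["rcx", "rdx", "r8", "r9"],                        # Microsoft x64
    ("x86_64", "elf"): ["rdi", "rsi", "rdx", "rcx", "r8", "r9"],         # System V AMD64
    ("arm64", "elf"): ["x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7"],  # AAPCS64
}


def resolve_arg(alias, arch, file_format):
    """Map 'arg1', 'arg2', ... to a register name, or None if it goes on the stack."""
    index = int(alias[3:]) - 1  # "arg1" -> 0
    regs = ARG_REGISTERS[(arch, file_format)]
    return regs[index] if index < len(regs) else None


def build_x86_stack(values, esp=0x1000):
    """Push values in reverse order onto a 32-bit stack modeled as a dict.

    values[0] lands at [esp] (the return-address slot) and values[1] at
    [esp+4] (the first argument), matching the stack=[0, argv[0], ...] call.
    """
    memory = {}
    for value in reversed(values):
        esp -= 4                          # push: decrement esp first...
        memory[esp] = value & 0xFFFFFFFF  # ...then store the 32-bit value
    return memory, esp


print(resolve_arg("arg1", "x86_64", "pe"))   # rcx
print(resolve_arg("arg1", "x86_64", "elf"))  # rdi

# Return-address placeholder 0, then four hypothetical argument values.
mem, esp = build_x86_stack([0, 0x402000, 16, 0x403000, 8])
print(hex(esp), mem[esp], hex(mem[esp + 4]))  # 0xfec 0 0x402000
```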

Emulation functions

emulateRange(startAddr, endAddr=None, registers=None, stack=None, instructionHook=None, callHook=None, memAccessHook=None, hookData=None, skipCalls=True, hookApis=True, strict=True, count=0): Emulates the range of instructions from startAddr to endAddr, not including the instruction at endAddr. If endAddr is None, emulation stops when a "return" type instruction is encountered in the same function in which emulation started.

registers: a dictionary whose keys are register names and whose values are register values. Some special register names created by flare-emu can also be used here, such as arg1, arg2, ret, and pc.

stack: an array of values to be pushed onto the stack in reverse order, much like the arguments to a function in x86. In x86, remember that the first value in this array is the return address of the function call, not the function's first argument. flare-emu initializes the context and memory of the emulated thread from the values specified in the registers and stack parameters. If a string is specified for any of these values, it is written to a location in memory and a pointer to that memory is written to the specified register or stack location.

instructionHook: a function you can define to be called before each instruction is emulated. Its prototype is: instructionHook(unicornObject, address, instructionSize, userData).

callHook: a function you can define to be called whenever a "call" type instruction is encountered during emulation. Its prototype is: callHook(address, arguments, functionName, userData).

hookData: a dictionary of user-defined data that is made available to your hook functions, giving you a way to persist data throughout emulation. flare-emu also uses this dictionary for its own purposes, so take care not to use keys that flare-emu already defines. Because of its name in Unicorn, this variable is usually called userData in user-defined hook functions.

skipCalls: causes the emulator to skip over "call" type instructions and adjust the stack accordingly. Defaults to True.

hookApis: causes flare-emu to perform simple implementations of some of the more common runtime and OS library functions it encounters during emulation. This saves you from worrying about calls to functions such as memcpy, strcat, and malloc. Defaults to True.

memAccessHook: a function you can define to be called whenever memory is accessed for reading or writing. Its prototype is: memAccessHook(unicornObject, accessType, memAccessAddress, memAccessSize, memValue, userData).

strict: when set to True (the default), branch destinations are checked to ensure the disassembler expects instructions there; otherwise the branch instruction is skipped. If set to False, flare-emu will have IDA Pro define instructions as it emulates them.

count: the maximum number of instructions to emulate. Defaults to 0, which means no limit.
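The three hook prototypes above can share state through userData. The following is a sketch, not code from flare-emu: the hook bodies and the "my"-prefixed keys are hypothetical choices made so they cannot clobber keys flare-emu stores itself, such as "EmuHelper". The fallback value 17 for UC_MEM_WRITE is an assumption used only when the unicorn package cannot be imported.

```python
# Hypothetical hooks that keep their state in userData (the hookData dict).
try:
    from unicorn.unicorn_const import UC_MEM_WRITE
except (ImportError, OSError):
    UC_MEM_WRITE = 17  # assumed value of Unicorn's UC_MEM_WRITE constant


def instructionHook(uc, address, size, userData):
    # Called before each emulated instruction: count them.
    userData["myInsnCount"] = userData.get("myInsnCount", 0) + 1


def callHook(address, argv, funcName, userData):
    # Called at each "call" instruction: record target name and arguments.
    userData.setdefault("myCalls", []).append((address, funcName, list(argv)))


def memAccessHook(uc, accessType, memAccessAddress, memAccessSize, memValue, userData):
    # Called on memory reads and writes: keep only the writes.
    if accessType == UC_MEM_WRITE:
        userData.setdefault("myWrites", []).append(
            (memAccessAddress, memAccessSize, memValue))


# Inside IDA the hooks would be passed to an emulation call, for example:
#   eh.emulateRange(start, instructionHook=instructionHook, callHook=callHook,
#                   memAccessHook=memAccessHook, hookData=state)
# Here they are invoked directly to show the bookkeeping.
state = {}
instructionHook(None, 0x401000, 5, state)
callHook(0x401005, [0x402000, 16], "decryptString", state)
memAccessHook(None, UC_MEM_WRITE, 0x500004, 1, 0x41, state)
print(state)
```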

iterate(target, targetCallback, preEmuCallback=None, callHook=None, instructionHook=None, hookData=None, resetEmuMem=False, hookApis=True, memAccessHook=None): For each target specified by target, performs a separate emulation from the beginning of the function containing it up to the target address. Emulation is forced down the branches necessary to reach each target. target can be the address of a function, in which case the target list is populated with all cross-references to that function. Alternatively, target can be an explicit list of targets.

targetCallback: a function you create that flare-emu calls for each target reached during emulation. Its prototype is: targetCallback(emuHelper, address, arguments, userData).

preEmuCallback: a function you create that flare-emu calls before emulation of each target begins. You can implement any setup code you need here.

resetEmuMem: causes flare-emu to reset the emulation memory before emulation of each target begins. Defaults to False.

iterateAllPaths(target, targetCallback, preEmuCallback=None, callHook=None, instructionHook=None, hookData=None, resetEmuMem=False, hookApis=True, memAccessHook=None, maxPaths=MAXCODEPATHS, maxNodes=MAXNODESEARCH): For the function containing the address target, performs a separate emulation for each path discovered through it, up to maxPaths.

maxPaths: the maximum number of paths through the function that will be searched for and emulated. Some of the more complex functions can cause the graph search to take a very long time or never finish; tune this parameter to meet your needs within a reasonable amount of time.

maxNodes: the maximum number of basic blocks to search when looking for paths through the target function. This is a safety measure to prevent unreasonable search times and hangs.
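The role of maxPaths and maxNodes can be illustrated with a bounded depth-first path enumeration. This is a hypothetical sketch (`enumerate_paths` is not a flare-emu function): back edges are skipped so loops cannot produce endless paths, and the expansion counter approximates a node budget rather than mirroring flare-emu's exact accounting.

```python
def enumerate_paths(cfg, entry, max_paths=10, max_nodes=100):
    """Depth-first enumeration of loop-free block paths starting at entry.

    cfg maps a block id to its successor ids. A path ends at a block with no
    successor outside the path (an exit block, or one whose only edges loop
    back). max_paths caps the paths returned; max_nodes caps how many partial
    paths are expanded, as a guard against runaway searches.
    """
    paths = []
    expanded = 0
    stack = [[entry]]
    while stack and len(paths) < max_paths:
        path = stack.pop()
        expanded += 1
        if expanded > max_nodes:
            break
        succs = [s for s in cfg.get(path[-1], []) if s not in path]  # skip back edges
        if not succs:
            paths.append(path)
        # Push in reverse so the first successor is explored first.
        for succ in reversed(succs):
            stack.append(path + [succ])
    return paths


# Hypothetical diamond-shaped function: A branches to B or C, both reach D.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(enumerate_paths(cfg, "A"))  # [['A', 'B', 'D'], ['A', 'C', 'D']]
```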

emulateBytes(bytes, registers=None, stack=None, baseAddress=0x400000, instructionHook=None, hookData=None): Writes the code contained in bytes to emulator memory at baseAddress, if possible, and emulates the instructions from the beginning of bytes to the end.

emulateFrom(startAddr, registers=None, stack=None, instructionHook=None, callHook=None, memAccessHook=None, hookData=None, skipCalls=True, hookApis=True, strict=True, count=0): This API is useful when function boundaries are not clearly defined, as is often the case with obfuscated binaries. Given a starting address startAddr, it emulates until there is nothing left to emulate or until you stop emulation from one of your hooks. Call it with the strict parameter set to False to enable dynamic code discovery; flare-emu will then have IDA Pro define instructions as they are encountered during emulation.

Utility functions

The following is an incomplete list of some useful utility functions provided by the EmuHelper class.

hexString(value): returns a hexadecimal-formatted string for the value, useful for logging and print statements.

getIDBString(address): returns the string stored at address in the IDB, up to the null terminator. Characters are not necessarily printable. Useful for retrieving strings without an emulation context.

skipInstruction(userData, useIDA=False): call this from an emulation hook to skip the current instruction and move the program counter to the next instruction. The useIDA option was added to handle cases where IDA Pro folds multiple instructions into one pseudo-instruction and you want to skip all of them. This function cannot be called multiple times from a single instruction hook to skip multiple instructions. If you need to skip multiple instructions while emulating ARM code, it is recommended not to write the program counter directly, because this may cause problems in Thumb mode; use EmuHelper's changeProgramCounter API (described below) instead.

changeProgramCounter(userData, newAddress): call this from an emulation hook to change the value of the program counter register. This API takes care of Thumb mode tracking for the ARM architecture.
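Thumb mode needs special care because, in ARM interworking (for example the BX instruction), bit 0 of a branch target selects Thumb state, and Unicorn's own Thumb samples start emulation at an address with that bit set. The helpers below are hypothetical and only show the bit manipulation; they are not how changeProgramCounter is implemented.

```python
def encode_arm_target(address, thumb):
    """Set bit 0 for a Thumb target, clear it for an ARM-state target."""
    return (address | 1) if thumb else (address & ~1)


def decode_arm_target(value):
    """Split a target value into (instruction address, is_thumb)."""
    return value & ~1, bool(value & 1)


print(hex(encode_arm_target(0x8000, True)))  # 0x8001
print(decode_arm_target(0x8001))             # (32768, True)
```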

getRegVal(registerName): retrieves the value of the specified register, respecting sub-register addressing. For example, "ax" returns the lower 16 bits of the EAX/RAX register in x86.
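Sub-register addressing amounts to a shift and a mask over the full register value. A minimal sketch for the RAX family, assuming a 64-bit register value; `sub_register_value` is a hypothetical helper, not part of EmuHelper.

```python
# (shift, mask) for each name in the RAX family.
SUBREGISTERS = {
    "rax": (0, 0xFFFFFFFFFFFFFFFF),
    "eax": (0, 0xFFFFFFFF),
    "ax": (0, 0xFFFF),
    "ah": (8, 0xFF),  # bits 8-15
    "al": (0, 0xFF),  # bits 0-7
}


def sub_register_value(full_value, name):
    shift, mask = SUBREGISTERS[name]
    return (full_value >> shift) & mask


rax = 0x1122334455667788
for name in ("eax", "ax", "ah", "al"):
    print(name, hex(sub_register_value(rax, name)))
# eax 0x55667788, ax 0x7788, ah 0x77, al 0x88
```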

stopEmulation(userData): call this from an emulation hook to stop emulation. Use it instead of calling Unicorn's emu_stop API directly, so that the EmuHelper object can handle the bookkeeping related to the iterate feature.

getEmuString(address): returns the string located at address in emulated memory, up to the null terminator. Characters are not necessarily printable.

getEmuWideString(address): returns the "wide character" string located at address in emulated memory, up to the null terminator. "Wide character" here means any byte sequence with a null byte after every other byte, as in an ASCII string encoded as UTF-16LE. Characters are not necessarily printable.
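Both string readers follow the same pattern: read until a terminator. The sketch below models emulated memory as a plain bytes object in which an address is simply an index. `read_c_string` and `read_wide_string` are hypothetical stand-ins for getEmuString and getEmuWideString, and the wide reader assumes the two-byte null terminator of UTF-16LE text.

```python
def read_c_string(memory, address):
    """Bytes from address up to, not including, the first null byte."""
    end = memory.index(b"\x00", address)
    return bytes(memory[address:end])


def read_wide_string(memory, address):
    """Low bytes of 2-byte characters from address up to a two-byte null."""
    out = bytearray()
    while address + 1 < len(memory) and memory[address:address + 2] != b"\x00\x00":
        out.append(memory[address])  # keep the first (low) byte of each char
        address += 2
    return bytes(out)


memory = b"hi\x00" + "ok".encode("utf-16-le") + b"\x00\x00"
print(read_c_string(memory, 0))     # b'hi'
print(read_wide_string(memory, 3))  # b'ok'
```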

getEmuBytes(address, length): returns the byte string of the given length at address in emulated memory.

getEmuPtr(address): returns the pointer value at the given address in emulated memory.

writeEmuPtr(address, value): writes the pointer value to the given address in emulated memory.

loadBytes(bytes, address=None): allocates memory in the emulator and writes bytes into it.

isValidEmuPtr(address): returns True if the provided address points to valid emulated memory.
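Reading and writing a pointer boils down to packing an integer of the architecture's pointer size. A minimal sketch with Python's struct module, assuming little-endian byte order (as on x86) and using a bytearray in place of emulator memory; `read_ptr` and `write_ptr` are hypothetical names.

```python
import struct


def _fmt(ptr_size):
    # "<" = little-endian; Q = 8-byte, I = 4-byte unsigned integer.
    return "<Q" if ptr_size == 8 else "<I"


def write_ptr(memory, address, value, ptr_size=8):
    struct.pack_into(_fmt(ptr_size), memory, address, value)


def read_ptr(memory, address, ptr_size=8):
    return struct.unpack_from(_fmt(ptr_size), memory, address)[0]


mem = bytearray(16)
write_ptr(mem, 8, 0x140001000)
print(hex(read_ptr(mem, 8)))     # 0x140001000
print(hex(read_ptr(mem, 8, 4)))  # 0x40001000 (low half only)
```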

getEmuMemRegion(address): Returns a tuple containing the start and end addresses of the memory region that contains the address provided, or None if the address is invalid.
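A region lookup like the one behind isValidEmuPtr and getEmuMemRegion can be done with a binary search over sorted region start addresses. This `RegionMap` class is a hypothetical model that uses half-open [start, end) intervals; the end-address convention of the real API may differ.

```python
import bisect


class RegionMap:
    def __init__(self, regions):
        self.regions = sorted(regions)  # [(start, end), ...], non-overlapping
        self.starts = [start for start, _ in self.regions]

    def region_of(self, address):
        """Return (start, end) of the region holding address, or None."""
        i = bisect.bisect_right(self.starts, address) - 1  # last start <= address
        if i >= 0:
            start, end = self.regions[i]
            if address < end:
                return (start, end)
        return None

    def is_valid(self, address):
        return self.region_of(address) is not None


regions = RegionMap([(0x400000, 0x401000), (0x10000, 0x20000)])
print(regions.region_of(0x400800))  # (4194304, 4198400)
print(regions.is_valid(0x20000))    # False: end is exclusive here
```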

getArgv(): call this from an emulation hook at a "call" type instruction to receive an array of the arguments to the function.

addApiHook(apiName, hook): adds a new API hook for this EmuHelper instance. Whenever a call instruction to apiName is encountered during emulation, EmuHelper calls the function specified by hook. If hook is a string, it is expected to be the name of an API already hooked by EmuHelper, in which case EmuHelper calls that API's existing hook function. If hook is a function, EmuHelper calls that function.
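The string-versus-function behavior of hook can be modeled as a name-indexed dispatch table. `ApiHookTable` is a toy model with a simplified one-argument hook signature; real flare-emu hook functions receive more context, so consult flare_emu_hooks.py for the actual prototype.

```python
class ApiHookTable:
    def __init__(self):
        self.hooks = {}  # API name -> hook function

    def add_api_hook(self, api_name, hook):
        if isinstance(hook, str):
            # A string names an already-hooked API: reuse its hook function.
            hook = self.hooks[hook]
        self.hooks[api_name] = hook

    def on_call(self, api_name, argv):
        """Dispatch a call to api_name; None means no hook is registered."""
        hook = self.hooks.get(api_name)
        return hook(argv) if hook is not None else None


table = ApiHookTable()
table.add_api_hook("strlen", lambda argv: len(argv[0]))
table.add_api_hook("lstrlenA", "strlen")  # alias to the existing strlen hook
print(table.on_call("lstrlenA", [b"abc"]))  # 3
```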

allocEmuMem(size, addr=None): allocates enough emulator memory to hold size bytes. It tries to honor the requested address, but if that overlaps an existing memory region, it allocates in an unused region and returns the new address. If the address is not page-aligned, the returned address keeps the same offset within its page in the new region. For example, requesting address 0x1234 when 0x1000 is already allocated may allocate at 0x2000 and return 0x2234.
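The page-offset rule in the example above can be checked with a small allocator model. `alloc_keep_offset` is hypothetical and assumes 4 KiB (0x1000-byte) pages, as in the example; it does not model how flare-emu actually picks the replacement region beyond sliding to the next free page-aligned range.

```python
PAGE_SIZE = 0x1000


def alloc_keep_offset(requested, size, allocated_pages):
    """Return the address handed out for a request of size bytes at requested.

    allocated_pages is a set of page base addresses already in use; it is
    updated in place with the pages this allocation takes.
    """
    offset = requested % PAGE_SIZE  # offset within the page
    num_pages = (offset + max(size, 1) + PAGE_SIZE - 1) // PAGE_SIZE
    base = requested - offset
    # Slide forward one page at a time until every needed page is free.
    while any(base + i * PAGE_SIZE in allocated_pages for i in range(num_pages)):
        base += PAGE_SIZE
    allocated_pages.update(base + i * PAGE_SIZE for i in range(num_pages))
    return base + offset  # same in-page offset, new region


used = {0x1000}
print(hex(alloc_keep_offset(0x1234, 0x10, used)))  # 0x2234
```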