By: current_user
February 4, 2016
Intro to Win64 Assembly and Process Dumping

Recently I checked out the "Intro to Malware Analysis and Reverse Engineering" course by Sean Pierce. Inspired by his contribution, and taking a rest from my current activities, I decided to share something with you as well.

What I noticed is that Sean references rather outdated tools in his videos. OllyDbg? Really? Well, it's a pretty good debugger, I can't argue. But its development is so slow that I'm afraid my grandchildren will turn gray before they see the x64 version go live. There were times long ago when SoftICE ruled the world. Yeah, it was (and surely still is) the undisputed God of all debuggers. Times change, however. Everything becomes history. Import REC, the import table reconstructor, has also grown over with moss, being unable to work in the 64-bit world. What are the alternatives? CHimpREC, Scylla. But they don't always work as expected, either.

What I'm trying to say here is that there may (and definitely will) be a time when your handy tool can't fulfill its purpose any longer; when it just fails, leaving you barehanded. What will you do when your hammer breaks? Will you wait for someone to fix the hammer for you, or will you forge a new, better one yourself? Just remember that a hacker is not the one who has mastered the art of using tools, but the one who is able to build the tools themselves.

For reverse engineering (and malware analysis), knowledge of assembly language is vital. You should have at least a basic understanding of it to follow what's written further. Get the Intel 64 and IA-32 Architectures Software Developer Manuals[1] and the AMD64 Architecture Programmer's Manuals[2]. Read 'em. Learn 'em. Meanwhile, I'll give you a short introduction to Win64 assembly and process memory dumping.
Programming in assembly for Windows is very simple. The operating system provides you with an Application Programming Interface (API): a set of functions scattered across dynamic-link libraries (DLLs) like Kernel32.dll, User32.dll, etc. You just link the required libraries to your application and call the functions they provide. There are a lot of different assemblers in the world, but I would recommend flat assembler[3]. It is a very fast and flexible assembler with extremely powerful macroinstruction support. Just try it. You'll love it, I'm sure. Needless to say, you should get familiar with the PE file format. Get the Microsoft PE and COFF Specification[4]. There's also a great document about portable executables by Bernd Luevelsmeyer[5], the one I myself studied and referenced while learning the subject in the past.
Win64 application source code template for flat assembler
The Win64 application source code template replicates the standard portable executable structure:

    format pe64
    entry start

    section '.text' code readable executable
    start:
            ; {here goes executable code}

    section '.data' data readable writeable
            ; {here goes data}

    section '.idata' import data readable writeable
            ; {here goes import table}

The first directive, `format pe64`, tells fasm to produce a PE32+ executable image. It can be followed by an additional `console` or `gui` keyword to explicitly specify the Windows subsystem: character or GUI respectively. The `entry {label}` directive defines the address of the entry point. The `section` directive, followed by a name and flags, defines a new PE section. For example, `section '.text' code readable executable` from the above template adds a new section named ".text" with the IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_EXECUTE flags. Any number of sections may be defined as needed. If you need an export table, you can define an export section for it:

    section '.edata' export data readable

If you need a section for resources, you can add it, too:

    section '.rsrc' resource data readable

And so on, and so forth. You get the idea. Please refer to the flat assembler programmer's manual (comes with fasm) for more details. Save the above template as "stub.asm" or whatever name you prefer and compile:

    > fasm stub.asm

A 512-byte, empty .exe file containing only the PE header will be produced. Feel free to investigate it with some PE info tool.

fastcall -- Windows x64 calling convention

The first four integer or pointer arguments with a size of 8, 16, 32, or 64 bits are passed (in order, left to right) in registers RCX, RDX, R8, and R9. Arguments five and higher are passed on the stack. All arguments are right-justified in registers.

The caller is responsible for allocating "shadow" space on the stack for the callee's parameters and must always allocate sufficient space for the four register parameters, even if the callee doesn't have that many parameters. This means that you always need to reserve 32 bytes on the stack before calling an API function, even if it has fewer than four parameters. Keep in mind that for functions with more than four parameters, the shadow space must be reserved *after* parameters five and above have been pushed to the stack, so that those parameters end up directly above the shadow space. The stack pointer and malloc or alloca memory must be aligned to 16 bytes. Results are returned in the RAX register. Those are the basic rules. A more detailed description is available on MSDN[6].
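To make the rules above more tangible, here is a minimal sketch of a call with five parameters done entirely by hand. It is only an illustration under a few assumptions of mine: the choice of WriteFile as the five-parameter function, the label names (`message`, `written`, `fn_*`) and the message text are not part of the original tutorial. Instead of pushing the fifth parameter and reserving shadow space afterwards, the sketch reserves the whole area once and stores the parameter at `[rsp + 32]`, which gives the same stack layout.

```asm
; Sketch: calling a five-parameter function (WriteFile) by hand.
; All label names and the message text are illustrative assumptions.
format pe64 console
entry start

section '.text' code readable executable
start:
        sub     rsp, 56                 ; 8 (realign RSP to 16 bytes) + 32 (shadow space)
                                        ; + 8 (parameter 5) + 8 (padding)
        mov     ecx, -11                ; STD_OUTPUT_HANDLE
        call    [GetStdHandle]

        mov     rcx, rax                ; parameter 1: hFile
        lea     rdx, [message]          ; parameter 2: lpBuffer
        mov     r8d, message.size       ; parameter 3: nNumberOfBytesToWrite
        lea     r9, [written]           ; parameter 4: lpNumberOfBytesWritten
        mov     qword [rsp + 32], 0     ; parameter 5: lpOverlapped = NULL,
                                        ; stored right above the shadow space
        call    [WriteFile]

        xor     ecx, ecx                ; uExitCode = 0
        call    [ExitProcess]

section '.data' data readable writeable
message db 'Five parameters, one stack slot', 13, 10
.size   = $ - message
written dd ?

section '.idata' import data readable writeable
        dd 0, 0, 0, rva dll_kernel32, rva imports_kernel32
        dd 0, 0, 0, 0, 0
imports_kernel32:
        ExitProcess     dq rva fn_ExitProcess
        GetStdHandle    dq rva fn_GetStdHandle
        WriteFile       dq rva fn_WriteFile
                        dq 0
dll_kernel32:
        db 'Kernel32.dll', 0
fn_ExitProcess:
        dw 0
        db 'ExitProcess', 0
fn_GetStdHandle:
        dw 0
        db 'GetStdHandle', 0
fn_WriteFile:
        dw 0
        db 'WriteFile', 0
```

The entry point is reached with RSP at 8 modulo 16 (the return address has just been pushed), so subtracting 56 restores 16-byte alignment and leaves room for the shadow space and the single stack parameter at the same time.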
My first Win64 app in assembly
Enough theory, let's code! The first program will simply show a message box (WinAPI function MessageBox) and exit (WinAPI function ExitProcess). The MSDN WinAPI Reference gives the following syntax for MessageBox:

    int WINAPI MessageBox(
        _In_opt_ HWND    hWnd,
        _In_opt_ LPCTSTR lpText,
        _In_opt_ LPCTSTR lpCaption,
        _In_     UINT    uType
    );

Pay attention to the requirements:

    DLL: User32.dll
    Unicode and ANSI names: MessageBoxW (Unicode) and MessageBoxA (ANSI)

"Unicode and ANSI names" gives the actual names of the function as defined in the DLL export table. It means that a function called MessageBox doesn't actually exist in nature. It's an alias for either MessageBoxA or MessageBoxW. You may have already guessed that the "A" suffix in the name of the function stands for "ANSI", while "W" stands for "Wide". You should always consider using the Unicode variants of Windows API functions when coding in year 2016+, unless you absolutely must revert to the ANSI functions for some strange reason. Throughout the text I'll always refer to the Unicode variants, even when not explicitly specifying the W suffix.

As you can see from the syntax, the function accepts four parameters. According to the fastcall calling convention described above, the first four parameters are passed in the RCX, RDX, R8, and R9 registers:

- hWnd is optional and can be NULL (0), goes to RCX;
- lpText is a pointer to a zero-terminated message text string, goes to RDX;
- lpCaption is a pointer to a zero-terminated message box caption string, goes to R8;
- uType defines the type of the message box and the buttons it contains, goes to R9.

First, let's add the message and caption to display on the message box. This is initialized data and should be stored in the corresponding PE section:

    section '.data' data readable writeable
    szText    du 'Hello, Cybrary.it!', 0
    szCaption du 'My 1st Win64 App', 0

To call the function, the following code should be added to a section which contains executable code:

    section '.text' code readable executable
    start:
            sub     rsp, 8          ; align stack to 16-byte boundary.
                                    ; App will crash if stack is not aligned.
            sub     rsp, 32         ; reserve 32 bytes shadow space for parameters
            mov     r9, 0           ; uType = MB_OK
            mov     r8, szCaption   ; save pointer to caption text to R8
            lea     rdx, [szText]   ; other method of saving a pointer
            xor     rcx, rcx        ; the same as 'mov rcx, 0', but smaller code
            call    [MessageBoxW]   ; call the function

The flat assembler package has a set of macros to make life easier. By including those macros in the source code it is possible to simplify, among a variety of other things, calling WinAPI functions, so that the code to call MessageBox looks as follows:

    invoke MessageBox, NULL, szText, szCaption, MB_OK

Pretty high-level, right? The next step is to include references to the external functions in the Import Table.

    section '.idata' import data readable writeable
    ;
    ; Import Directory Table
    ; (see Microsoft PE and COFF Specification, section 5.4.1)
    ;
            dd 0, 0, 0, rva dll_user32, rva imports_user32
            dd 0, 0, 0, 0, 0
    ;
    ; User32 Import Lookup Table
    ; (see Microsoft PE and COFF Specification, section 5.4.2)
    ;
    imports_user32:
            MessageBoxW dq rva user32_fn_MessageBoxW
                        dq 0
    ;
    ; List of linked DLLs
    ;
    dll_user32:
            db 'User32.dll', 0
    ;
    ; User32 Hint/Name Table
    ; (see Microsoft PE and COFF Specification, section 5.4.3)
    ;
    user32_fn_MessageBoxW:
            dw 0
            db 'MessageBoxW', 0

The flat assembler package has macros to simplify Import Table construction, too, so that importing functions from User32.dll and Kernel32.dll looks like this:

    library kernel32, 'KERNEL32.dll', user32, 'USER32.dll'
    include 'api\kernel32.inc'
    include 'api\user32.inc'

What is left is to call the ExitProcess function from Kernel32.dll and add a reference to it to the Import Table. Below is the full source code of the application.
    format pe64 gui
    entry start

    section '.text' code readable executable
    start:
            sub     rsp, 40
            xor     r9, r9
            mov     r8, szCaption
            mov     rdx, szText
            xor     rcx, rcx
            call    [MessageBoxW]
            xor     rcx, rcx
            call    [ExitProcess]

    section '.data' data readable writeable
    szText    du 'Hello, Cybrary.it!', 0
    szCaption du 'My 1st Win64 App', 0

    section '.idata' import data readable writeable
            dd 0, 0, 0, rva dll_kernel32, rva imports_kernel32
            dd 0, 0, 0, rva dll_user32, rva imports_user32
            dd 0, 0, 0, 0, 0
    imports_kernel32:
            ExitProcess dq rva kernel32_fn_ExitProcess
                        dq 0
    imports_user32:
            MessageBoxW dq rva user32_fn_MessageBoxW
                        dq 0
    dll_kernel32:
            db 'Kernel32.dll', 0
    dll_user32:
            db 'User32.dll', 0
    kernel32_fn_ExitProcess:
            dw 0
            db 'ExitProcess', 0
    user32_fn_MessageBoxW:
            dw 0
            db 'MessageBoxW', 0
You can type the above code in a text editor (don't copy-paste code from examples or you won't remember it in 5 minutes), save, and compile with the following command:

    > fasm {filename}

Great! Now you know how to code in assembly! But do you know what happens when you run a program in Windows?
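As a side note on the macro shortcuts mentioned earlier, the same program can be sketched with the macro package that ships with flat assembler. This is only a sketch under one assumption: the INCLUDE environment variable (or the include path in your fasm configuration) points to the INCLUDE directory of the fasm package, so that 'win64w.inc' and the 'api' headers can be found.

```asm
; Sketch: the "Hello" program rewritten with fasm's standard macro package.
; Assumption: the INCLUDE environment variable points to fasm's INCLUDE directory.
format pe64 gui
entry start

include 'win64w.inc'                    ; Unicode flavour: MessageBox resolves to MessageBoxW

section '.text' code readable executable
start:
        sub     rsp, 8                  ; align stack to 16 bytes; invoke reserves
                                        ; the shadow space for each call by itself
        invoke  MessageBox, NULL, szText, szCaption, MB_OK
        invoke  ExitProcess, 0

section '.data' data readable writeable
szText    du 'Hello, Cybrary.it!', 0
szCaption du 'My 1st Win64 App', 0

section '.idata' import data readable writeable
library kernel32, 'KERNEL32.DLL', user32, 'USER32.DLL'
include 'api\kernel32.inc'
include 'api\user32.inc'
```

The macros hide the register loading and the import table layout, which is convenient for larger programs. The hand-written version is still worth typing at least once, because it shows exactly what ends up in the executable.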
Process initialization
I assume you already understand what a process is. In simple words, it is a running program. It consists of a virtual address space, allocated to it by the operating system; an executable image, mapped to that address space; one or more execution threads (units to which the operating system allocates processor time); and a bunch of data structures and handles to various system resources. Each process has a unique identifier: the process ID, or PID.

A new process can be created by calling one of the process-creation functions: CreateProcess, CreateProcessAsUser, CreateProcessWithTokenW, or CreateProcessWithLogonW. Creating a Windows process consists of several stages carried out in three parts of the operating system: the Windows client-side library, the Windows subsystem process, and the Windows executive. The basic process creation flow is shown in Figure 1.

    Windows client-side library
    (Kernel32.dll or Advapi32.dll)
    +-----------------------+
    | Convert and validate  |
    | creation flags and    |
    | parameters            |
    +-----------+-----------+
                |
                v
    +-----------+-----------+
    | Open .exe and create  |
    | section object        |
    |                       |
    +-----------+-----------+
                |
                v
    +-----------+-----------+
    |                       |
    | Create process object |
    |                       |
    +-----------+-----------+
                |
                v
    +-----------+-----------+
    |                       |
    | Create thread object  |
    |                       |
    +-----------+-----------+
                |                          Windows Subsystem
                v                          (Csrss.exe)
    +-----------+-----------+          +-----------------------+
    | Perform specific      |          | Set up new process    |
    | Windows Subsystem     +--------->+ and thread            |
    | process initialization|          |                       |
    +-----------------------+          +-----------+-----------+
                                                   |
                +----------------------------------+
                |
                |                          Windows Executive
                v                          (Ntdll.dll)
    +-----------+-----------+          +-----------+-----------+
    | Start initial         |          | Finalize process      |
    | thread execution      +--------->| initialization        |
    |                       |          |                       |
    +-----------+-----------+          +-----------+-----------+
                |                                  |
                v                                  v
    +-----------+-----------+          +-----------+-----------+
    |                       |          | Jump to Entry Point   |
    | Return to caller      |          | to start execution    |
    |                       |          |                       |
    +-----------------------+          +-----------------------+

    Figure 1 - Process creation flow.
Note that when a process-creating function returns to the caller (the left part of the diagram), the new process may not be fully initialized yet (the right part of the diagram). Fortunately, there's an API function to help you wait for the process to be fully initialized: WaitForInputIdle. I highly recommend you get a copy of the Windows Internals[7] book and study it for details on how Windows works.
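The sequence "create a process, then wait until it is initialized" can be sketched as follows. Treat it as an illustration only: the target program ('notepad.exe'), the 10-second timeout, and the label names are assumptions of mine, and the sketch relies on the fasm macro package being reachable through the INCLUDE environment variable. The STARTUPINFOW and PROCESS_INFORMATION structures are reserved as raw bytes with the x64 sizes (104 and 24 bytes).

```asm
; Sketch: start a process and wait until it is ready for input.
; 'notepad.exe', the timeout and all label names are illustrative assumptions.
format pe64 gui
entry start

include 'win64w.inc'

section '.text' code readable executable
start:
        sub     rsp, 8                          ; align stack to 16 bytes
        mov     dword [sinfo], 104              ; STARTUPINFOW.cb = structure size on x64
        invoke  CreateProcessW, NULL, cmdLine, NULL, NULL, FALSE, 0, NULL, NULL, sinfo, pinfo
        test    eax, eax
        jz      .exit                           ; process creation failed
        invoke  WaitForInputIdle, [pinfo.hProcess], 10000
        ; at this point the new process has finished its initialization
        ; (or the 10-second timeout has elapsed)
        invoke  CloseHandle, [pinfo.hThread]
        invoke  CloseHandle, [pinfo.hProcess]
.exit:
        invoke  ExitProcess, 0

section '.data' data readable writeable
cmdLine du 'notepad.exe', 0                     ; writable buffer: CreateProcessW may modify it
align 8
sinfo   rb 104                                  ; STARTUPINFOW, zero-filled by the loader
pinfo:                                          ; PROCESS_INFORMATION
.hProcess       rq 1
.hThread        rq 1
.dwProcessId    rd 1                            ; the PID of the new process
.dwThreadId     rd 1

section '.idata' import data readable writeable
library kernel32, 'KERNEL32.DLL', user32, 'USER32.DLL'
import  kernel32, \
        CloseHandle, 'CloseHandle', \
        CreateProcessW, 'CreateProcessW', \
        ExitProcess, 'ExitProcess'
import  user32, \
        WaitForInputIdle, 'WaitForInputIdle'
```

After a successful call, `pinfo.dwProcessId` holds the PID of the new process, so a tool that starts its own target does not need to search for it.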
Dumping process memory
To dump a process means to take a snapshot of its address space at a given time. Dumps can later be used for offline code analysis and debugging. There are numerous tools created for this purpose. Some of them are rather sophisticated, allowing you to take snapshots at time intervals or on triggered events, or to rebuild PE images. Actually, creating a memory dump can be considered a basic (if not trivial) task. But it's always good to know how things work, especially the basic ones. With this knowledge you will be able to move on to something more advanced.

To dump the memory of a process you need to get its PID, then gain access to its address space, and, finally, copy that address space and write it to a file. To get the process ID you could create the process from your application. In that case, the function which created the process will return its PID. Please note that before dumping you need to wait until the newly created process is fully initialized. The WaitForInputIdle API function can help you with that.

However, most probably you'll want to dump an already running process. There are a couple of techniques to obtain the ID of an active process. Below I'll describe one of them. There's a tool help library[8] in Windows. As MSDN states, the functions provided by the tool help library make it easier for you to obtain information about currently executing applications. These functions are designed to streamline the creation of tools, specifically debuggers. Given that you know the name of the executable file from which the process originates (the process name), the algorithm for retrieving its PID could be something like the one described in Figure 2 below:
    ( Start )
      1. Call CreateToolhelp32Snapshot.
           INVALID_HANDLE_VALUE returned?
             YES -> "Error getting system snapshot", go to Finish.
             NO  -> continue with step 2.
      2. Call Process32First.
           TRUE  -> go to step 3.
           FALSE -> go to step 5.
      3. Does PROCESSENTRY32.szExeFile match the required process name?
           YES -> get PROCESSENTRY32.th32ProcessID, go to step 6.
           NO  -> continue with step 4.
      4. Call Process32Next.
           TRUE  -> go back to step 3.
           FALSE -> continue with step 5.
      5. Did GetLastError return ERROR_NO_MORE_FILES?
           YES -> "Process not found", continue with step 6.
           NO  -> "Error getting process info", continue with step 6.
      6. Close the snapshot handle with CloseHandle.
    ( Finish )
Figure 2 - Find Process ID Algorithm.
First, you need to create a system snapshot of all running processes using the API function CreateToolhelp32Snapshot with dwFlags = TH32CS_SNAPPROCESS and th32ProcessID = 0. After that, you iterate through the snapshot with the Process32First/Process32Next functions until one of them returns FALSE. In this case, GetLastError returning ERROR_NO_MORE_FILES indicates that there are no more processes left in the snapshot. Upon success, the Process32First/Process32Next functions fill the PROCESSENTRY32 structure with information about a process from the snapshot, which includes the process ID you're looking for.

Once you have the ID of the process, you can get access to its address space by calling the OpenProcess API function with the PROCESS_VM_READ access flag, and then read portions of its memory by calling ReadProcessMemory. However, before you can read memory, you have to know the location (starting address) to read from. The tool help library comes to your aid here again. All the required information is stored in the module information of a process. You need to create a snapshot of the process using the good old CreateToolhelp32Snapshot function, but passing dwFlags = TH32CS_SNAPMODULE | TH32CS_SNAPMODULE32 and th32ProcessID = {PID} this time. After that, you iterate through the snapshot with the Module32First/Module32Next functions until one of them returns FALSE. In this case, GetLastError returning ERROR_NO_MORE_FILES indicates that there are no more modules left in the snapshot. Upon success, the Module32First/Module32Next functions fill the MODULEENTRY32 structure with information about a module belonging to the process (the executable image and all linked DLLs), which includes the base address and size of the module you're looking for.

Having the module base address (MODULEENTRY32.modBaseAddr) and size (MODULEENTRY32.modBaseSize), you can allocate a block of memory large enough to hold the full module dump with some memory allocation function (for example, HeapAlloc) and then copy the module address space to that block with the help of the ReadProcessMemory function. Once that is done, you simply write the memory block to a file.
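The PID lookup described above can be sketched as a small stand-alone program without any tricks. It is a sketch under assumptions of mine, not the dumper itself: the process name is hard-coded ('explorer.exe'), the comparison is case-insensitive (lstrcmpiW), the result is shown in a message box, and the fasm macro package is assumed to be reachable through the INCLUDE environment variable. The structure below reserves the PROCESSENTRY32W fields the sketch does not use as plain bytes.

```asm
; Sketch: find the PID of a process by its executable name.
; The target name, the message box output and the label names are assumptions.
format pe64 gui
entry start

include 'win64w.inc'

section '.text' code readable executable
start:
        sub     rsp, 8                          ; align stack to 16 bytes
        invoke  CreateToolhelp32Snapshot, 2, 0  ; 2 = TH32CS_SNAPPROCESS
        cmp     rax, -1                         ; INVALID_HANDLE_VALUE?
        je      .exit
        mov     r12, rax                        ; R12 = snapshot handle
        mov     [procEntry.dwSize], sizeof.procEntry
        invoke  Process32FirstW, r12, procEntry
.check:
        test    eax, eax
        jz      .close                          ; FALSE: end of snapshot or error
        invoke  lstrcmpiW, procEntry.szExeFile, szTarget
        test    eax, eax
        jz      .found                          ; names are equal
        invoke  Process32NextW, r12, procEntry
        jmp     .check
.found:
        mov     r8d, [procEntry.th32ProcessID]
        invoke  wsprintfW, szBuffer, szFormat, r8
        invoke  MessageBoxW, NULL, szBuffer, szTarget, MB_OK
.close:
        invoke  CloseHandle, r12
.exit:
        invoke  ExitProcess, 0

section '.data' data readable writeable
szTarget du 'explorer.exe', 0
szFormat du 'PID: %u', 0
szBuffer rw 32
align 8
procEntry:                                      ; PROCESSENTRY32W, x64 layout
.dwSize         rd 1
.cntUsage       rd 1
.th32ProcessID  rd 1
                rd 1                            ; alignment padding
.unused         rb 28                           ; th32DefaultHeapID .. dwFlags
.szExeFile      rw 260
                rd 1                            ; tail padding
sizeof.procEntry = $ - procEntry

section '.idata' import data readable writeable
library kernel32, 'KERNEL32.DLL', user32, 'USER32.DLL'
import  kernel32, \
        CloseHandle, 'CloseHandle', \
        CreateToolhelp32Snapshot, 'CreateToolhelp32Snapshot', \
        ExitProcess, 'ExitProcess', \
        lstrcmpiW, 'lstrcmpiW', \
        Process32FirstW, 'Process32FirstW', \
        Process32NextW, 'Process32NextW'
import  user32, \
        MessageBoxW, 'MessageBoxW', \
        wsprintfW, 'wsprintfW'
```

For brevity the sketch does not call GetLastError to tell "process not found" apart from a real failure; it simply closes the snapshot and exits silently in both cases.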
My first memory dumper for Win64
Now that you know how to code in asm for Windows and how to dump process memory, you are able to create your own cool process dumper. However, some practice won't hurt, so below I'm sharing with you the source code of dumpp.exe, a simple, quick and dirty process memory dumper I created as an addition to this tutorial for you to investigate and get more familiar with Win64 assembly. It's a console application which receives one parameter: the name of the process to dump. The dump is saved as a {process_name}.dump file in the current directory.

I tried my best to comment the source code for people who are not very familiar with assembly. If you're one of them, then you should go through it first, because everybody knows that the best way to learn a programming language is to read sources. And so as not to be too boring, I diluted it with some assembly tricks like self-modifying code, a procedure that is entered without a call, multiple exit locations from a procedure, and tips on flat assembler syntax. Save the source as 'dumpp.asm' or whatever name you like, then compile and execute it as "dumpp.exe process_name". To debug the program you may want to try x64dbg (https://x64dbg.com/).
    format pe64 console             ; Create Win64 Console application
    entry _entry                    ; Original Entry Point

    ;
    ; Define PROCESSENTRY32W structure used by Windows Toolhelp functions
    ;
    struc PROCESSENTRY32W
    {
            .dwSize                 rd 1
            .cntUsage               rd 1
            .th32ProcessID          rd 1
                                    rd 1    ; padding: the next field is 8-byte aligned
            .th32DefaultHeapID      rq 1
            .th32ModuleID           rd 1
            .cntThreads             rd 1
            .th32ParentProcessID    rd 1
            .pcPriClassBase         rd 1
            .dwFlags                rd 1
            .szExeFile              rw 260
                                    rw 2    ; tail padding
    }

    ;
    ; Define MODULEENTRY32W structure used by Windows Toolhelp functions
    ;
    struc MODULEENTRY32W
    {
            .dwSize                 rd 1
            .th32ModuleID           rd 1
            .th32ProcessID          rd 1
            .GlblcntUsage           rd 1
            .ProccntUsage           rd 1
                                    rd 1    ; padding
            .modBaseAddr            rq 1
            .modBaseSize            rq 1    ; DWORD value followed by 4 bytes of padding
            .hModule                rq 1
            .szModule               rw 256
            .szExePath              rw 260
    }
    ;
    ; Define PE32+ section containing executable code.
    ;
    section '.text' code readable writeable executable

    quit:
            mov     rcx, r12                ; never forget to close handles and
            call    [CloseHandle]           ; free system resources which are not
            call    [FreeConsole]           ; needed any more
            xor     rcx, rcx
            call    [ExitProcess]

    ;
    ; Start program execution from here
    ;
    _entry:
            ;
            ; Align stack to DOUBLE QUADWORD (16 bytes) and reserve 32 bytes
            ; of shadow space as required by fastcall.
            ; Note: app will crash if stack is not dqword aligned.
            ;
            sub     rsp, 40
            mov     rbp, vars               ; see notes in '.data' section for its purpose
            call    [AllocConsole]
            ;
            ; Get StdOut handle
            ;
            mov     ecx, -11                ; STD_OUTPUT_HANDLE
            call    [GetStdHandle]
            mov     [hStdOut], rax
            ;
            ; get command line string and convert it to array of argument strings
            ;
            call    [GetCommandLineW]
            mov     [lpCmdLine], rax
            lea     rdx, [numArgs]
            mov     rcx, rax
            call    [CommandLineToArgvW]
            mov     [lpArgList], rax
            ;
            ; Check if number of command line arguments passed is correct.
            ; Exit with error if not.
            ;
            cmp     [numArgs], dword 2
            jnz     error.invalid_args
            ;
            ; Take a snapshot of all system processes.
            ; If the function fails with INVALID_HANDLE_VALUE error code,
            ; then display the appropriate error message and quit.
            ;
            xor     edx, edx
            mov     ecx, 2                  ; TH32CS_SNAPPROCESS
            call    [CreateToolhelp32Snapshot]
            cmp     rax, -1                 ; INVALID_HANDLE_VALUE
            jz      error.create_system_snapshot
            mov     r12, rax                ; R12 will keep snapshot handle
                                            ; registers R12-R15, RBP are not destroyed
                                            ; by WinAPI calls, so it's convenient to use
                                            ; them to store frequently used variables.
                                            ; gives less code size and faster access to
                                            ; values compared to when stored in memory
    ;
    ; The following routine iterates through the snapshot of system processes
    ; to retrieve information about each process with the help of Process32First
    ; and Process32Next Windows API functions, then checks process name to see
    ; if the required process is found
    ;
    ; BONUS: This routine contains a very simple example of self-modifying code.
    ; Check how it works using a debugger.
    ;
    get_process_info_loop:
            mov     rbx, Process32FirstW
            lea     r13, [pcEntry]          ; R13 will point to PROCESSENTRY32W
            mov     [r13], dword sizeof.pcEntry
            ;
            ; FASM SYNTAX NOTE:
            ; In flat assembler syntax, the label whose name begins with dot is treated
            ; as local label, and its name is attached to the name of last global label
            ; (with name beginning with anything but dot) to make the full name of this
            ; label. So you can use the short name (beginning with dot) of this label
            ; anywhere before the next global label is defined, and in the other places
            ; you have to use the full name.
            ; .cont and all labels starting with dot below are local for a code block
            ; between two global labels: get_process_info_loop and error. Within this
            ; block they can be addressed with their short names, i.e. jmp .cont
            ; To access these labels from other parts of code use full name.
            ; See, for example, `jnz error.get_process_info` instruction below.
            ;
    .cont:
            mov     rdx, r13
            mov     rcx, r12
            call    qword [rbx]
            test    al, al
            db      0x74                    ; <- SMC part 1. The two bytes are `jz error.get_process_info`
    .smc1:                                  ; originally, but changed to `jz .finish` at runtime
            db      error.get_process_info - $ - 1
            ;
            ; check if process name matches the name passed on command line
            ;
            lea     rdx, [r13 + 44]         ; rdx = pointer to pcEntry.szExeFile
            mov     rcx, [lpArgList]
            mov     rcx, [rcx + 8]          ; rcx = pointer to the first cmd line argument
            call    [lstrcmpW]
            test    eax, eax                ; strings equal?
            jz      stage2
    .smc2:                                  ; <- SMC part 2.
            or      al, al                  ; junk command, just to reserve 2 bytes.
                                            ; changed to `jmp .cont` at runtime
            add     rbx, 8                  ; <- what is this for? you tell
            mov     byte [.smc1], .finish - .smc1 - 1               ; modify SMC part 1
            mov     word [.smc2], (.cont - .smc2 - 2) shl 8 + 0xeb  ; modify SMC part 2
            jmp     .cont
    .finish:
            call    [GetLastError]
            cmp     eax, 18                 ; ERROR_NO_MORE_FILES
            jz      error.process_not_found
    ;
    ; Error handling routine.
    ; Performs preparations for displaying appropriate error messages.
    ;
    error:
    .get_process_info:
            mov     rdx, szErrGetProcessInfo
            mov     r8d, szErrGetProcessInfo.size
            jmp     .show
    .invalid_args:                          ; if you don't understand what's going
            mov     r8, [lpArgList]         ; on here, use debugger to find it out.
            mov     r8, [r8]                ; HINT: run the app without command line
            mov     rdx, szErrInvalidArgs   ; arguments to get here.
    @@:                                     ; <----------------- anonymous label
            lea     rcx, [tmpbuf]
            push    rcx
            call    [wsprintfW]
            pop     rdx
            mov     r8d, eax
            jmp     .show
    .process_not_found:
            mov     r8, [lpArgList]
            mov     r8, [r8 + 8]
            mov     rdx, szErrProcessNotFound
            jmp     @b                      ; <- jump to the nearest preceding anonymous label (above)
                                            ; use `jmp @f` to jump to the nearest following (below)
    .create_system_snapshot:
            mov     rdx, szErrCreateSystemSnapshot
            mov     r8d, szErrCreateSystemSnapshot.size
            jmp     .show
    .create_module_snapshot:
            mov     rdx, szErrCreateModuleSnapshot
            mov     r8d, szErrCreateModuleSnapshot.size
            jmp     .show
    .get_module_info:
            mov     rdx, szErrGetModuleInfo
            mov     r8d, szErrGetModuleInfo.size
            jmp     .show
    .open_process:
            mov     rdx, szErrOpenProcess
            mov     r8d, szErrOpenProcess.size
            jmp     .show
    .allocate_heap:
            mov     rdx, szErrHeapAlloc
            mov     r8d, szErrHeapAlloc.size
            jmp     .show
    .read_process_memory:
            mov     rdx, szErrReadProcessMemory
            mov     r8d, szErrReadProcessMemory.size
            jmp     .show
    .create_file:
            mov     rdx, szErrorCreateFile
            mov     r8d, szErrorCreateFile.size
    .show:
            ;
            ; Call showMessage procedure to display error message, then quit.
            ;
            push    qword quit
            ; ^^^^^^^
            ; Normally the above should have been `call showMessage` instruction,
            ; followed by `jmp quit`, but due to the code structure and design,
            ; showMessage procedure starts directly after the above part of code, so
            ; there's no need in calling it. The return address can simply be pushed
            ; directly on the stack, which reduces code size and speeds up execution.
            ; Ain't asm cool? :)
    ;
    ; Uses WriteConsoleW to display a message.
    ; IN: RDX - pointer to Unicode string to display
    ;     R8D - size of the string
    ;
    showMessage:
            lea     r9, [numCharsWritten]
            mov     rcx, [hStdOut]
            push    rbp
            mov     rbp, rsp
            push    qword 0
            sub     rsp, 32
            call    [WriteConsoleW]
            mov     rsp, rbp
            pop     rbp
            ret

    stage2:
            ;
            ; open process object with PROCESS_VM_READ access
            ;
            mov     r8d, [r13 + 8]          ; [pcEntry.th32ProcessID]
            xor     rdx, rdx
            mov     ecx, 16                 ; PROCESS_VM_READ
            call    [OpenProcess]
            test    rax, rax
            jz      error.open_process
            mov     [hProcess], rax
            ;
            ; close previous (system processes) snapshot, not needed any more
            ;
            mov     rcx, r12
            call    [CloseHandle]
            ;
            ; open snapshot of desired process and include all its modules in it
            ;
            mov     edx, [r13 + 8]          ; [pcEntry.th32ProcessID]
            mov     ecx, 0x00000018         ; TH32CS_SNAPMODULE | TH32CS_SNAPMODULE32
            call    [CreateToolhelp32Snapshot]
            cmp     rax, -1                 ; INVALID_HANDLE_VALUE?
            jz      error.create_module_snapshot
            mov     r12, rax                ; R12 will keep snapshot handle
            ;
            ; loop through the list of modules in the snapshot to find executable.
            ; quit with error message if not found or any other error occurred
            ;
            mov     rbx, Module32FirstW
    @@:
            lea     rdx, [mdEntry]
            mov     [rdx], dword sizeof.mdEntry
            mov     rcx, r12
            call    qword [rbx]
            test    al, al
            jz      .error
            lea     rdx, [mdEntry.szModule]
            mov     rcx, [lpArgList]
            mov     rcx, [rcx + 8]
            call    [lstrcmpW]
            test    eax, eax
            jz      stage3
            mov     rbx, Module32NextW
            jmp     @b
    .error:
            mov     rcx, [hProcess]
            call    [CloseHandle]
            jmp     error.get_module_info
    ;
    ; The following routine can exit to different locations based on which error
    ; occurred. Investigate how it works.
    ;
    stage3:
            ;
            ; allocate a heap of MODULEENTRY32.modBaseSize bytes
            ;
            mov     r15d, dword [mdEntry.modBaseSize]  ; modBaseSize is a DWORD;
                                                       ; writing R15D zero-extends into R15
            mov     rdx, r15
            xor     r8, r8
            xor     ecx, ecx
            call    [HeapCreate]
            test    rax, rax
            jnz     @f
            push    qword error.allocate_heap
            jmp     .ret
    @@:
            mov     [hHeap], rax
            mov     r8, r15
            mov     edx, 8                  ; HEAP_ZERO_MEMORY
            mov     rcx, rax
            call    [HeapAlloc]
            test    rax, rax
            jnz     .read_process_memory
            push    qword error.allocate_heap
            jmp     .ret2
    .read_process_memory:
            mov     [lpHeap], rax
            ;
            ; read (dump) process memory to allocated heap
            ;
            mov     r8, rax
            mov     r9, r15
            mov     rdx, [mdEntry.modBaseAddr]
            mov     rcx, [hProcess]
            lea     rax, [numCharsWritten]  ; take the address while RBP still points
                                            ; to the variables (see '.data' section)
            push    rbp
            mov     rbp, rsp
            push    rax
            sub     rsp, 32
            call    [ReadProcessMemory]
            mov     rsp, rbp
            pop     rbp
            test    al, al
            jnz     .save_dump
            push    qword error.read_process_memory
            jmp     .ret2
    .save_dump:
            ;
            ; save memory dump to file in the current directory.
            ; overwrite if such file exists
            ;
            lea     r8, [mdEntry.szModule]
            lea     rdx, [szDumpFileName]
            lea     rcx, [tmpbuf]
            call    [wsprintfW]
            xor     r9, r9
            mov     r8d, 1                  ; FILE_SHARE_READ
            mov     edx, 0xc0000000         ; GENERIC_READ | GENERIC_WRITE
            lea     rcx, [tmpbuf]
            push    rbp
            mov     rbp, rsp
            push    r9                      ; hTemplateFile = NULL
            push    qword 128               ; FILE_ATTRIBUTE_NORMAL
            push    qword 2                 ; CREATE_ALWAYS
            sub     rsp, 32
            call    [CreateFileW]
            mov     rsp, rbp
            pop     rbp
            cmp     rax, -1                 ; CreateFileW returns INVALID_HANDLE_VALUE
            jne     @f                      ; on failure, not NULL
            push    qword error.create_file
            jmp     .ret2
    @@:
            mov     [hFile], rax
            lea     r9, [numCharsWritten]
            mov     r8, r15
            mov     rdx, [lpHeap]
            mov     rcx, rax
            push    rbp
            mov     rbp, rsp
            push    qword 0
            sub     rsp, 32
            call    [WriteFile]
            mov     rsp, rbp
            pop     rbp
            test    al, al
            jnz     @f
            call    [GetLastError]
            push    qword error.create_file
            jmp     .ret3
    @@:
            push    qword quit
    .ret3:                                  ; based on progress before error occurred
            mov     rcx, [hFile]            ; this or that handles should or should not
            call    [CloseHandle]           ; be released
    .ret2:
            mov     rcx, [hHeap]
            call    [HeapDestroy]
    .ret:
            mov     rcx, [hProcess]
            call    [CloseHandle]
            ret
    ;
    ; Define PE32+ section containing (un)initialized data.
    ;
    section '.data' data readable writeable

    szDumpFileName du '%s.dump', 0

    ; Error messages
    ;
    ; FASM SYNTAX NOTE:
    ; `du` directive accepts the quoted string values of any length, which will
    ; be converted into chain of words with zeroed high byte. This way Unicode
    ; strings are defined.
    ;
    szErrInvalidArgs:
            du 'Intro to Win64 Assembly and Process Dumping Practice Application'
            du 13, 10, 'See https://www.cybrary.it/0p3n/intro-to-win64-assembly-'
            du 'and-process-dumping', 13, 10, 'Usage: %s process_name', 0
    szErrCreateSystemSnapshot:
            du 'Failed to create a snapshot of system processes', 0
            .size = ($ - szErrCreateSystemSnapshot) / 2     ; string size in characters
    szErrCreateModuleSnapshot:
            du 'Failed to create a snapshot of process modules', 0
            .size = ($ - szErrCreateModuleSnapshot) / 2
    szErrGetProcessInfo:
            du 'Failed to get process information from the snapshot', 0
            .size = ($ - szErrGetProcessInfo) / 2
    szErrGetModuleInfo:
            du 'Failed to get module information from process snapshot', 0
            .size = ($ - szErrGetModuleInfo) / 2
    szErrProcessNotFound:
            du 'No running processes with name "%s" found', 0
    szErrOpenProcess:
            du 'Failed to get access to the process', 0
            .size = ($ - szErrOpenProcess) / 2
    szErrHeapAlloc:
            du 'Failed to allocate memory for process dumping', 0
            .size = ($ - szErrHeapAlloc) / 2
    szErrReadProcessMemory:
            du 'Failed to read memory of the process', 0
            .size = ($ - szErrReadProcessMemory) / 2
    szErrorCreateFile:
            du 'Failed to create or write to dump file', 0
            .size = ($ - szErrorCreateFile) / 2

    ;
    ; Uninitialized global variables
    ;
    ; FASM SYNTAX NOTE:
    ; `virtual` directive defines virtual data at specified address. This data
    ; will not be included in the output file, but labels defined there can be
    ; used in other parts of source. `virtual at rbp` tells fasm, that all
    ; labels inside the virtual data space will be relative to the value of
    ; RBP register. For example, instruction `mov rcx, [hStdOut]` will be
    ; assembled as `mov rcx, [rbp+0]`, instruction `mov rcx, [hProcess]` will
    ; be assembled as `mov rcx, [rbp+8]`, and so on. Note `mov rbp, vars`
    ; instruction at the beginning of executable code. It initializes RBP
    ; register with base address of uninitialized data. Using register-based
    ; addressing produces smaller code: avg. 4 bytes per `mov reg, [rbp+num]`
    ; instruction instead of 7-8 bytes for direct addressing `mov reg, mem`.
    ; `sizeof.vars = $ - $$` defines the size of virtual data. This amount
    ; of data should then be reserved with `vars rb sizeof.vars` expression.
    ; This is required for fasm to properly compute PE section virtual size
    ; during executable image creation.
    ;
    virtual at rbp
            hStdOut         rq 1
            hProcess        rq 1
            hHeap           rq 1
            lpHeap          rq 1
            hFile           rq 1
            lpCmdLine       rq 1
            lpArgList       rq 1
            numArgs         rd 1
            numCharsWritten rd 2
            pcEntry         PROCESSENTRY32W
            sizeof.pcEntry  = $ - pcEntry
            mdEntry         MODULEENTRY32W
            sizeof.mdEntry  = $ - mdEntry
            tmpbuf          rb 512
            sizeof.vars     = $ - $$
    end virtual
    align 16
    vars rb sizeof.vars
    ;
    ; Define PE32+ import section
    ;
    section '.idata' import data readable writeable
    ;
    ; Import Directory Table
    ; (see Microsoft PE and COFF Specification, section 5.4.1)
    ;
            dd 0, 0, 0, rva dll_kernel32, rva imports_kernel32
            dd 0, 0, 0, rva dll_shell32, rva imports_shell32
            dd 0, 0, 0, rva dll_user32, rva imports_user32
            dd 0, 0, 0, 0, 0
    ;
    ; Kernel32 Import Lookup Table
    ; (see Microsoft PE and COFF Specification, section 5.4.2)
    ;
    imports_kernel32:
            AllocConsole                dq rva kernel32_fn_AllocConsole
            CloseHandle                 dq rva kernel32_fn_CloseHandle
            CreateFileW                 dq rva kernel32_fn_CreateFileW
            CreateToolhelp32Snapshot    dq rva kernel32_fn_CreateToolhelp32Snapshot
            ExitProcess                 dq rva kernel32_fn_ExitProcess
            FreeConsole                 dq rva kernel32_fn_FreeConsole
            GetCommandLineW             dq rva kernel32_fn_GetCommandLineW
            GetLastError                dq rva kernel32_fn_GetLastError
            GetStdHandle                dq rva kernel32_fn_GetStdHandle
            HeapAlloc                   dq rva kernel32_fn_HeapAlloc
            HeapCreate                  dq rva kernel32_fn_HeapCreate
            HeapDestroy                 dq rva kernel32_fn_HeapDestroy
            lstrcmpW                    dq rva kernel32_fn_lstrcmpW
            Module32FirstW              dq rva kernel32_fn_Module32FirstW
            Module32NextW               dq rva kernel32_fn_Module32NextW
            OpenProcess                 dq rva kernel32_fn_OpenProcess
            Process32FirstW             dq rva kernel32_fn_Process32FirstW
            Process32NextW              dq rva kernel32_fn_Process32NextW
            ReadProcessMemory           dq rva kernel32_fn_ReadProcessMemory
            WriteConsoleW               dq rva kernel32_fn_WriteConsoleW
            WriteFile                   dq rva kernel32_fn_WriteFile
                                        dq 0
    ;
    ; Shell32 Import Lookup Table
    ; (see Microsoft PE and COFF Specification, section 5.4.2)
    ;
    imports_shell32:
            CommandLineToArgvW          dq rva shell32_fn_CommandLineToArgvW
                                        dq 0
    ;
    ; User32 Import Lookup Table
    ; (see Microsoft PE and COFF Specification, section 5.4.2)
    ;
    imports_user32:
            wsprintfW                   dq rva user32_fn_wsprintfW
                                        dq 0
    ;
    ; List of linked DLLs
    ;
    dll_kernel32:
            db 'Kernel32.dll', 0
    dll_shell32:
            db 'Shell32.dll', 0
    dll_user32:
            db 'User32.dll', 0
    ;
    ; Kernel32 Hint/Name Table
    ; (see Microsoft PE and COFF Specification, section 5.4.3)
    ;
    kernel32_fn_AllocConsole:
            dw 0
            db 'AllocConsole', 0
    kernel32_fn_CloseHandle:
            dw 0
            db 'CloseHandle', 0
    kernel32_fn_CreateFileW:
            dw 0
            db 'CreateFileW', 0
    kernel32_fn_CreateToolhelp32Snapshot:
            dw 0
            db 'CreateToolhelp32Snapshot', 0
    kernel32_fn_ExitProcess:
            dw 0
            db 'ExitProcess', 0
    kernel32_fn_FreeConsole:
            dw 0
            db 'FreeConsole', 0
    kernel32_fn_GetCommandLineW:
            dw 0
            db 'GetCommandLineW', 0
    kernel32_fn_GetLastError:
            dw 0
            db 'GetLastError', 0
    kernel32_fn_GetStdHandle:
            dw 0
            db 'GetStdHandle', 0
    kernel32_fn_HeapAlloc:
            dw 0
            db 'HeapAlloc', 0
    kernel32_fn_HeapCreate:
            dw 0
            db 'HeapCreate', 0
    kernel32_fn_HeapDestroy:
            dw 0
            db 'HeapDestroy', 0
    kernel32_fn_lstrcmpW:
            dw 0
            db 'lstrcmpW', 0
    kernel32_fn_Module32FirstW:
            dw 0
            db 'Module32FirstW', 0
    kernel32_fn_Module32NextW:
            dw 0
            db 'Module32NextW', 0
    kernel32_fn_OpenProcess:
            dw 0
            db 'OpenProcess', 0
    kernel32_fn_Process32FirstW:
            dw 0
            db 'Process32FirstW', 0
    kernel32_fn_Process32NextW:
            dw 0
            db 'Process32NextW', 0
    kernel32_fn_ReadProcessMemory:
            dw 0
            db 'ReadProcessMemory', 0
    kernel32_fn_WriteConsoleW:
            dw 0
            db 'WriteConsoleW', 0
    kernel32_fn_WriteFile:
            dw 0
            db 'WriteFile', 0
    ;
    ; Shell32 Hint/Name Table
    ; (see Microsoft PE and COFF Specification, section 5.4.3)
    ;
    shell32_fn_CommandLineToArgvW:
            dw 0
            db 'CommandLineToArgvW', 0
    ;
    ; User32 Hint/Name Table
    ; (see Microsoft PE and COFF Specification, section 5.4.3)
    ;
    user32_fn_wsprintfW:
            dw 0
            db 'wsprintfW', 0
References
1. Intel 64 and IA-32 Architectures Software Developer Manuals http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
2. AMD64 Architecture Programmer's Manuals http://developer.amd.com/resources/documentation-articles/developer-guides-manuals/
3. flat assembler http://flatassembler.net
4. Microsoft PE and COFF Specification https://msdn.microsoft.com/en-us/windows/hardware/gg463119.aspx
5. The PE File Format by Bernd Luevelsmeyer http://www.pelib.com/resources/luevel.txt
6. Overview of x64 Calling Conventions https://msdn.microsoft.com/en-us/library/ms235286.aspx
7. Windows Internals https://technet.microsoft.com/en-us/sysinternals/bb963901.aspx
8. MSDN: Tool Help Library https://msdn.microsoft.com/en-us/library/windows/desktop/ms686837(v=vs.85).aspx
Win64 application source code template for flat assembler
The Win64 application source code template replicates the standard portable executable structure:

    format pe64
    entry start

    section '.text' code readable executable
    start:
        ; {here goes executable code}

    section '.data' data readable writeable
        ; {here goes data}

    section '.idata' import data readable writeable
        ; {here goes import table}

The first directive, `format pe64`, tells fasm to produce a PE32+ executable image. It can be followed by an additional `console` or `gui` keyword to explicitly specify the Windows subsystem: character-mode (console) or GUI, respectively. The `entry {label}` directive defines the address of the entry point. The `section` directive, followed by a name and flags, defines a new PE section. For example, `section '.text' code readable executable` from the above template adds a new section named ".text" with the IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_EXECUTE flags. Any number of sections may be defined as needed. If you need an export table, you can define an export section for it:

    section '.edata' export data readable

If you need a section for resources, you can add it, too:

    section '.rsrc' resource data readable

And so on, and so forth. You get the idea. Please refer to the flat assembler programmer's manual (it comes with fasm) for more details. Save the above template as "stub.asm", or whatever name you prefer, and compile:

    > fasm stub.asm

An empty 512-byte .exe file, containing only the PE header, will be produced. Feel free to investigate it with a PE info tool.

fastcall -- Windows x64 calling convention
-----------------------------------------------
The first four arguments with a size of 8, 16, 32, or 64 bits are passed (in order, left to right) in registers RCX, RDX, R8, and R9. Arguments five and higher are passed on the stack. All arguments are right-justified in registers.

The caller is responsible for allocating "shadow" space on the stack for the callee's parameters and must always allocate sufficient space for the four register parameters, even if the callee doesn't have that many parameters. This means that you always need to reserve 32 bytes on the stack before calling an API function, even if it has fewer than four parameters. Keep in mind that for functions with more than four parameters, the shadow space must be reserved *after* parameters five and above have been pushed to the stack. The stack pointer and malloc or alloca memory must be aligned to 16 bytes. Results are returned in the RAX register. Those are the basic rules. A more detailed description is available on MSDN[6].
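To make the rules above a bit more tangible, here is a minimal sketch of a call with five arguments (WriteFile), so that one argument has to travel on the stack. It is only an illustration under a few assumptions: the label names (msg, msg_len, written) are made up for this example, and the `library`/`import` macros come from fasm's standard 'win64a.inc' header, so the INCLUDE environment variable is assumed to point to fasm's INCLUDE directory. Instead of pushing the fifth argument, the sketch reserves the whole frame once and stores the argument directly above the shadow space, which gives the same stack layout at the moment of the call.

```asm
format PE64 console
entry start

include 'win64a.inc'            ; assumption: fasm's INCLUDE directory is reachable

section '.text' code readable executable
start:
        ; On entry RSP is 8 bytes off a 16-byte boundary (return address).
        ; 8 (realign) + 32 (shadow space) + 16 (5th argument + padding) = 56
        sub     rsp, 56

        mov     ecx, -11                ; STD_OUTPUT_HANDLE
        call    [GetStdHandle]

        mov     rcx, rax                ; 1st: hFile
        lea     rdx, [msg]              ; 2nd: lpBuffer
        mov     r8d, msg_len            ; 3rd: nNumberOfBytesToWrite
        lea     r9, [written]           ; 4th: lpNumberOfBytesWritten
        mov     qword [rsp+32], 0       ; 5th: lpOverlapped = NULL,
                                        ; first slot above the shadow space
        call    [WriteFile]

        xor     ecx, ecx                ; exit code 0
        call    [ExitProcess]

section '.data' data readable writeable
msg     db 'Hello from a five-argument call', 13, 10
msg_len = $ - msg                       ; hypothetical helper constant
written dd ?

section '.idata' import data readable writeable
        library kernel32, 'KERNEL32.DLL'
        import  kernel32,\
                GetStdHandle, 'GetStdHandle',\
                WriteFile, 'WriteFile',\
                ExitProcess, 'ExitProcess'
```

If you prefer `push` for stack arguments, remember the order: push arguments five and above first (right to left), then reserve the 32 bytes of shadow space.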
My first Win64 app in assembly
Enough theory, let's code! The first program will simply show a message box (WinAPI function MessageBox) and exit (WinAPI function ExitProcess). The MSDN WinAPI Reference gives the following syntax for MessageBox:

    int WINAPI MessageBox(
        _In_opt_ HWND    hWnd,
        _In_opt_ LPCTSTR lpText,
        _In_opt_ LPCTSTR lpCaption,
        _In_     UINT    uType
    );

Pay attention to the requirements:

    DLL: User32.dll
    Unicode and ANSI Names: MessageBoxW (Unicode) and MessageBoxA (ANSI)

"Unicode and ANSI Names" gives the actual names of the function as defined in the DLL export table. It means that the MessageBox function doesn't actually exist in nature. It's an alias to either MessageBoxA or MessageBoxW. You may have already guessed that the "A" suffix in the name of the function stands for "ANSI" and "W" stands for "Wide". You should always consider using the Unicode variants of Windows API functions when coding in year 2016+ unless you absolutely must revert to the ANSI functions for some strange reason. Throughout the text I'll always refer to the Unicode variants, even when not explicitly specifying the W suffix.

As you can see from the syntax, the function accepts four parameters. According to the fastcall calling convention described above, the first four parameters are passed in the RCX, RDX, R8, and R9 registers:

- hWnd is optional and can be NULL (0), goes to RCX;
- lpText is a pointer to a zero-terminated message text string, goes to RDX;
- lpCaption is a pointer to a zero-terminated message box caption string, goes to R8;
- uType defines the type of the message box and the buttons it contains, goes to R9.

First, let's add the message and caption to display on the message box. This is initialized data and should be stored in the corresponding PE section:

    section '.data' data readable writeable
    szText    du 'Hello, Cybrary.it!', 0
    szCaption du 'My 1st Win64 App', 0

To call the function, the following code should be added to a section which contains executable code:

    section '.text' code readable executable
    start:
        sub  rsp, 8           ; align stack to 16-byte boundary.
                              ; App will crash if stack is not aligned.
        sub  rsp, 32          ; reserve 32 bytes shadow space for parameters
        mov  r9, 0            ; uType = MB_OK
        mov  r8, szCaption    ; save pointer to caption text to R8
        lea  rdx, [szText]    ; other method of saving a pointer
        xor  rcx, rcx         ; the same as 'mov rcx, 0', but smaller code
        call [MessageBoxW]    ; call the function

The flat assembler package has a set of macros to make life easier. By including those macros in the source code it is possible to simplify, among a variety of other things, the calling of WinAPI functions, so that the code to call MessageBox will look as follows:

    invoke MessageBox, NULL, szText, szCaption, MB_OK

Pretty much high-level, right? The next step is to include references to external functions in the Import Table.

    section '.idata' import data readable writeable
    ;
    ; Import Directory Table
    ; (see Microsoft PE and COFF Specification, section 5.4.1)
    ;
        dd 0, 0, 0, rva dll_user32, rva imports_user32
        dd 0, 0, 0, 0, 0
    ;
    ; User32 Import Lookup Table
    ; (see Microsoft PE and COFF Specification, section 5.4.2)
    ;
    imports_user32:
        MessageBoxW dq rva user32_fn_MessageBoxW
        dq 0
    ;
    ; List of linked DLLs
    ;
    dll_user32:
        db 'User32.dll', 0
    ;
    ; User32 Hint/Name Table
    ; (see Microsoft PE and COFF Specification, section 5.4.3)
    ;
    user32_fn_MessageBoxW:
        dw 0
        db 'MessageBoxW', 0

The flat assembler package has macros to simplify Import Table construction, too, so that importing functions from User32.dll and Kernel32.dll will look like this:

    library kernel32, 'KERNEL32.dll', user32, 'USER32.dll'
    include 'api\kernel32.inc'
    include 'api\user32.inc'

What is left is to call the ExitProcess function from Kernel32.dll and include a reference to it in the Import Table. Below is the full source code of the application.
    format pe64 gui
    entry start

    section '.text' code readable executable
    start:
        sub  rsp, 40
        xor  r9, r9
        mov  r8, szCaption
        mov  rdx, szText
        xor  rcx, rcx
        call [MessageBoxW]
        xor  rcx, rcx
        call [ExitProcess]

    section '.data' data readable writeable
    szText    du 'Hello, Cybrary.it!', 0
    szCaption du 'My 1st Win64 App', 0

    section '.idata' import data readable writeable
        dd 0, 0, 0, rva dll_kernel32, rva imports_kernel32
        dd 0, 0, 0, rva dll_user32, rva imports_user32
        dd 0, 0, 0, 0, 0
    imports_kernel32:
        ExitProcess dq rva kernel32_fn_ExitProcess
        dq 0
    imports_user32:
        MessageBoxW dq rva user32_fn_MessageBoxW
        dq 0
    dll_kernel32:
        db 'Kernel32.dll', 0
    dll_user32:
        db 'User32.dll', 0
    kernel32_fn_ExitProcess:
        dw 0
        db 'ExitProcess', 0
    user32_fn_MessageBoxW:
        dw 0
        db 'MessageBoxW', 0
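As an aside, here is roughly how the same application might look when written with the macros mentioned earlier. Treat it as a sketch under assumptions rather than the canonical form: it relies on the standard headers shipped with fasm ('win64w.inc' and the 'api\*.inc' import lists) being reachable through the INCLUDE environment variable. The "w" header maps the generic name MessageBox to MessageBoxW, which matches the `du` strings.

```asm
format PE64 GUI
entry start

include 'win64w.inc'            ; assumption: fasm's INCLUDE directory is reachable

section '.text' code readable executable
start:
        sub     rsp, 8          ; realign the stack to a 16-byte boundary;
                                ; invoke takes care of shadow space itself
        invoke  MessageBox, NULL, szText, szCaption, MB_OK
        invoke  ExitProcess, 0

section '.data' data readable writeable
szText    du 'Hello, Cybrary.it!', 0
szCaption du 'My 1st Win64 App', 0

section '.idata' import data readable writeable
        library kernel32, 'KERNEL32.DLL',\
                user32, 'USER32.DLL'
        include 'api\kernel32.inc'
        include 'api\user32.inc'
```

The macro version is shorter, but the hand-written import table is worth building at least once: it is exactly the structure you will be staring at when you analyze somebody else's executable.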
You can type the above code in a text editor (don't copy-paste code from examples or you won't remember it in 5 minutes), save it, and compile with the following command:

    > fasm {filename}

Great! Now you know how to code in assembly! But do you know what happens when you run a program in Windows?
Process initialization
I assume you already have an understanding of what a process is. Put simply, it is a running program. It consists of a virtual address space, which is allocated to it by the operating system; an executable image, mapped to that address space; one or more execution threads, the units to which the operating system allocates processor time; and a bunch of data structures and handles to various system resources. Each process has its own unique identifier: the process ID, or PID. A new process can be created by calling one of the process-creation functions: CreateProcess, CreateProcessAsUser, CreateProcessWithTokenW, or CreateProcessWithLogonW. Creating a Windows process consists of several stages carried out in three parts of the operating system: the Windows client-side library, the Windows subsystem process, and the Windows executive. The basic process creation flow is described in Figure 1.

  Windows client-side library
  (Kernel32.dll or Advapi32.dll)
  +------------------------+
  | Convert and validate   |
  | creation flags and     |
  | parameters             |
  +-----------+------------+
              |
              v
  +-----------+------------+
  | Open .exe and create   |
  | section object         |
  +-----------+------------+
              |
              v
  +-----------+------------+
  | Create process object  |
  +-----------+------------+
              |
              v
  +-----------+------------+
  | Create thread object   |
  +-----------+------------+
              |                   Windows Subsystem
              v                   (Csrss.exe)
  +-----------+------------+      +------------------------+
  | Perform specific       |      | Set up new process     |
  | Windows Subsystem      +----->+ and thread             |
  | process initialization |      |                        |
  +------------------------+      +-----------+------------+
                                              |
              +-------------------------------+
              |
              |                   Windows Executive
              v                   (Ntdll.dll)
  +-----------+------------+      +------------------------+
  | Start initial          |      | Finalize process       |
  | thread execution       +----->| initialization         |
  +-----------+------------+      +-----------+------------+
              |                               |
              v                               v
  +-----------+------------+      +-----------+------------+
  | Return to caller       |      | Jump to Entry Point    |
  |                        |      | to start execution     |
  +------------------------+      +------------------------+

  Figure 1 - Process creation flow.
Note that when a process-creating function returns to the caller (the left part of the diagram), the new process may not be fully initialized yet (the right part of the diagram). Fortunately, there's an API function to help you wait until the process is fully initialized: WaitForInputIdle. I highly recommend you get a copy of the Windows Internals[7] book and study it for details on how Windows works.
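Here is a rough sketch of how creating a process and waiting for it to initialize might look in fasm. Several things in it are my assumptions for the sake of illustration: the child is 'notepad.exe', the timeout is 10 seconds, the label names (szCmdLine, startInfo, procInfo) are made up, and the structures are reserved as raw bytes using their x64 sizes (104 bytes for STARTUPINFOW, 24 bytes for PROCESS_INFORMATION) instead of proper structure definitions. The `library`/`import` macros again come from fasm's standard headers.

```asm
format PE64 GUI
entry start

include 'win64w.inc'            ; assumption: fasm's INCLUDE directory is reachable

section '.text' code readable executable
start:
        ; 8 (realign) + 32 (shadow space) + 6*8 (arguments 5..10) = 88
        sub     rsp, 88

        mov     dword [startInfo], 104  ; STARTUPINFOW.cb = structure size on x64
        xor     ecx, ecx                ; lpApplicationName = NULL
        lea     rdx, [szCmdLine]        ; lpCommandLine (must be writable memory)
        xor     r8d, r8d                ; lpProcessAttributes = NULL
        xor     r9d, r9d                ; lpThreadAttributes = NULL
        xor     eax, eax
        mov     [rsp+32], rax           ; bInheritHandles = FALSE
        mov     [rsp+40], rax           ; dwCreationFlags = 0
        mov     [rsp+48], rax           ; lpEnvironment = NULL
        mov     [rsp+56], rax           ; lpCurrentDirectory = NULL
        lea     rax, [startInfo]
        mov     [rsp+64], rax           ; lpStartupInfo
        lea     rax, [procInfo]
        mov     [rsp+72], rax           ; lpProcessInformation
        call    [CreateProcessW]
        test    eax, eax
        jz      .exit                   ; process creation failed

        mov     rcx, [procInfo]         ; PROCESS_INFORMATION.hProcess
        mov     edx, 10000              ; wait up to 10 seconds (arbitrary)
        call    [WaitForInputIdle]
        ; EAX = 0 means the child finished its initialization and is waiting
        ; for input; its PID is the dword at [procInfo+16].

        mov     rcx, [procInfo+8]       ; PROCESS_INFORMATION.hThread
        call    [CloseHandle]
        mov     rcx, [procInfo]         ; PROCESS_INFORMATION.hProcess
        call    [CloseHandle]
.exit:
        xor     ecx, ecx
        call    [ExitProcess]

section '.data' data readable writeable
szCmdLine du 'notepad.exe', 0           ; hypothetical child process
align 8
startInfo rb 104                        ; STARTUPINFOW, zero-filled by the loader
procInfo  rb 24                         ; PROCESS_INFORMATION

section '.idata' import data readable writeable
        library kernel32, 'KERNEL32.DLL',\
                user32, 'USER32.DLL'
        import  kernel32,\
                CreateProcessW, 'CreateProcessW',\
                CloseHandle, 'CloseHandle',\
                ExitProcess, 'ExitProcess'
        import  user32,\
                WaitForInputIdle, 'WaitForInputIdle'
```

Note that WaitForInputIdle lives in User32.dll, not Kernel32.dll, and that it is meant for GUI applications: for a console child it returns right away.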
Dumping process memory
To dump a process means to take a snapshot of its address space at a given time. Dumps can be used later for offline code analysis and debugging. There are numerous tools created for this purpose. Some of them are rather sophisticated, allowing you to take snapshots at time intervals or on triggered events, or to rebuild PE images. Actually, creating a memory dump can be considered a basic (if not trivial) task. But it's always good to know how things work, especially the basic ones. With this knowledge you will be able to move on to something more advanced.

To dump the memory of a process you'll need to get its PID, then gain access to its address space, and, finally, copy that address space and write it to a file. To get the process ID you could create the process from your application. In that case, the function which created the process will return its PID. Please note that before dumping you'll need to wait until the newly created process is fully initialized. The WaitForInputIdle API function can help you with that. However, most probably you'll want to dump an already running process. A couple of techniques for obtaining the ID of an active process exist. Below I'll describe one of them.

There's a tool help library[8] in Windows. As MSDN states, the functions provided by the tool help library make it easier for you to obtain information about currently executing applications. These functions are designed to streamline the creation of tools, specifically debuggers. Given that you know the name of the executable file from which the process originates (the process name), the algorithm for retrieving its PID could be something like the one described in Figure 2 below:
  1. Start: call CreateToolhelp32Snapshot.
     - INVALID_HANDLE_VALUE returned? YES: "Error getting system snapshot", go to Finish.
  2. Call Process32First.
     - FALSE: go to step 5.
  3. Does PROCESSENTRY32.szExeFile match the required process name?
     - YES: get PROCESSENTRY32.th32ProcessID, go to step 6.
  4. Call Process32Next.
     - TRUE: go to step 3.
     - FALSE: go to step 5.
  5. Did GetLastError return ERROR_NO_MORE_FILES?
     - YES: "Process not found".
     - NO: "Error getting process info".
  6. Close the snapshot handle with CloseHandle.
  7. Finish.
Figure 2 - Find Process ID Algorithm.
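The flow from Figure 2 can be sketched as a plain loop, without any tricks. This is a simplified illustration with a few assumptions of mine: the process name 'explorer.exe' is hard-coded, the comparison is case-insensitive (lstrcmpiW), PROCESSENTRY32W is reserved as 568 raw bytes (its size on x64) and accessed by offsets (th32ProcessID at +8, szExeFile at +44), the PID is returned as the exit code of the program, and the GetLastError check is left out for brevity. The label names are made up for the example.

```asm
format PE64 console
entry start

include 'win64w.inc'            ; assumption: fasm's INCLUDE directory is reachable

section '.text' code readable executable
start:
        sub     rsp, 40                 ; realign stack + 32 bytes of shadow space

        xor     edx, edx                ; th32ProcessID = 0
        mov     ecx, 2                  ; TH32CS_SNAPPROCESS
        call    [CreateToolhelp32Snapshot]
        cmp     rax, -1                 ; INVALID_HANDLE_VALUE?
        je      .fail
        mov     r12, rax                ; R12 survives API calls, keep handle there

        mov     dword [procEntry], 568  ; PROCESSENTRY32W.dwSize
        lea     rdx, [procEntry]
        mov     rcx, r12
        call    [Process32FirstW]
.check:
        test    eax, eax
        jz      .not_found              ; FALSE: error or no more entries
        lea     rdx, [procEntry+44]     ; PROCESSENTRY32W.szExeFile
        lea     rcx, [szTarget]
        call    [lstrcmpiW]
        test    eax, eax                ; names equal?
        jz      .found
        lea     rdx, [procEntry]
        mov     rcx, r12
        call    [Process32NextW]
        jmp     .check
.found:
        mov     ebx, [procEntry+8]      ; PROCESSENTRY32W.th32ProcessID
        jmp     .close
.not_found:
        xor     ebx, ebx                ; 0 = not found
.close:
        mov     rcx, r12
        call    [CloseHandle]
        mov     ecx, ebx                ; exit code = PID (or 0)
        call    [ExitProcess]
.fail:
        xor     ecx, ecx
        call    [ExitProcess]

section '.data' data readable writeable
szTarget  du 'explorer.exe', 0          ; hypothetical process name
align 8
procEntry rb 568                        ; PROCESSENTRY32W buffer

section '.idata' import data readable writeable
        library kernel32, 'KERNEL32.DLL'
        import  kernel32,\
                CreateToolhelp32Snapshot, 'CreateToolhelp32Snapshot',\
                Process32FirstW, 'Process32FirstW',\
                Process32NextW, 'Process32NextW',\
                lstrcmpiW, 'lstrcmpiW',\
                CloseHandle, 'CloseHandle',\
                ExitProcess, 'ExitProcess'
```

Run it from cmd.exe and check the result with `echo %ERRORLEVEL%`; compare the number with the PID column in Task Manager.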
First, you need to create a system snapshot of all running processes using the API function CreateToolhelp32Snapshot with dwFlags = TH32CS_SNAPPROCESS and th32ProcessID = 0. After that, you iterate through the snapshot with the Process32First/Process32Next functions until one of them returns FALSE. In that case, GetLastError returning ERROR_NO_MORE_FILES indicates that there are no more processes left in the snapshot. Upon success, the Process32First/Process32Next functions fill the PROCESSENTRY32 structure with information about a process from the snapshot, which includes the process ID you're looking for.

Once you have the ID of the process, you can get access to its address space by calling the OpenProcess API function with the PROCESS_VM_READ access flag, and then read portions of its memory by calling ReadProcessMemory. However, before you can read memory, you have to know the location (starting address) to read from. The tool help library comes to your aid here again. All the required information is stored in the module information of a process. You will need to create a snapshot of the process using the good old CreateToolhelp32Snapshot function, but passing dwFlags = TH32CS_SNAPMODULE | TH32CS_SNAPMODULE32 and th32ProcessID = {PID} this time. After that, you iterate through the snapshot with the Module32First/Module32Next functions until one of them returns FALSE. In that case, GetLastError returning ERROR_NO_MORE_FILES indicates that there are no more modules left in the snapshot. Upon success, the Module32First/Module32Next functions fill the MODULEENTRY32 structure with information about a module belonging to the process (the executable image and all linked DLLs), which includes the base address and size of the module you're looking for.

Having the module base address (MODULEENTRY32.modBaseAddr) and size (MODULEENTRY32.modBaseSize), you can allocate a block of memory large enough to hold the full module dump with some memory allocation function (for example, HeapAlloc) and then copy the module address space to that block with the help of the ReadProcessMemory function. Once that is done, you simply write the memory block to a file.
My first memory dumper for Win64
Now that you know how to code in asm for Windows and how to dump process memory, you are able to create your own cool process dumper. However, some practice won't hurt, so below I'm sharing with you the source code of dumpp.exe, a simple quick-and-dirty process memory dumper I created as an addition to this tutorial for you to investigate and get more familiar with Win64 assembly. It's a console application which receives one parameter: the name of a process to dump. The dump is saved as a {process_name}.dump file in the current directory. I tried my best to comment the source code for people who are not very familiar with assembly. If you're one of them, then you should go through it first, because everybody knows: the best way to learn a programming language is to read sources. And to keep it from being too boring, I diluted it with some assembly tricks like self-modifying code, an exit procedure without an entry, multiple exit locations from a procedure, and tips on flat assembler syntax. Save the source as 'dumpp.asm' or whatever name you like, then compile and execute it as "dumpp.exe process_name". To debug the program you may want to try x64dbg (https://x64dbg.com/).
    format pe64 console     ; Create Win64 Console application
    entry _entry            ; Original Entry Point
    ;
    ; Define PROCESSENTRY32W structure used by Windows Toolhelp functions
    ;
    struc PROCESSENTRY32W
    {
        .dwSize              rd 1
        .cntUsage            rd 1
        .th32ProcessID       rd 1
                             rd 1       ; alignment padding
        .th32DefaultHeapID   rq 1
        .th32ModuleID        rd 1
        .cntThreads          rd 1
        .th32ParentProcessID rd 1
        .pcPriClassBase      rd 1
        .dwFlags             rd 1
        .szExeFile           rw 260
                             rw 2       ; alignment padding
    }
    ;
    ; Define MODULEENTRY32W structure used by Windows Toolhelp functions
    ;
    struc MODULEENTRY32W
    {
        .dwSize         rd 1
        .th32ModuleID   rd 1
        .th32ProcessID  rd 1
        .GlblcntUsage   rd 1
        .ProccntUsage   rd 1
                        rd 1            ; alignment padding
        .modBaseAddr    rq 1
        .modBaseSize    rq 1            ; DWORD + 4 bytes of alignment padding
        .hModule        rq 1
        .szModule       rw 256
        .szExePath      rw 260
    }
    ;
    ; Define PE32+ section containing executable code.
    ;
    section '.text' code readable writeable executable
    quit:
        mov  rcx, r12           ; never forget to close handles and
        call [CloseHandle]      ; free system resources which are not
        call [FreeConsole]      ; needed any more
        xor  rcx, rcx
        call [ExitProcess]
    ;
    ; Start program execution from here
    ;
    _entry:
    ;
    ; Align stack to DOUBLE QUADWORD (16 bytes) and reserve 32 bytes
    ; of shadow space as required by fastcall.
    ; Note: app will crash if stack is not dqword aligned.
    ;
        sub  rsp, 40
        mov  rbp, vars          ; see notes in '.data' section for its purpose
        call [AllocConsole]
    ;
    ; Get StdOut handle
    ;
        mov  ecx, -11           ; STD_OUTPUT_HANDLE
        call [GetStdHandle]
        mov  [hStdOut], rax
    ;
    ; get command line string and convert it to array of argument strings
    ;
        call [GetCommandLineW]
        mov  [lpCmdLine], rax
        lea  rdx, [numArgs]
        mov  rcx, rax
        call [CommandLineToArgvW]
        mov  [lpArgList], rax
    ;
    ; Check if number of command line arguments passed is correct.
    ; Exit with error if not.
    ;
        cmp  [numArgs], dword 2
        jnz  error.invalid_args
    ;
    ; Take a snapshot of all system processes.
    ; If the function fails with INVALID_HANDLE_VALUE error code,
    ; then display the appropriate error message and quit.
    ;
        xor  edx, edx
        mov  ecx, 2             ; TH32CS_SNAPPROCESS
        call [CreateToolhelp32Snapshot]
        cmp  rax, -1            ; INVALID_HANDLE_VALUE
        jz   error.create_system_snapshot
        mov  r12, rax           ; R12 will keep snapshot handle
                                ; registers R12-R15, RBP are not destroyed
                                ; by WinAPI calls, so it's convenient to use
                                ; them to store frequently used variables.
                                ; gives less code size and faster access to
                                ; values compared to when stored in memory
    ;
    ; The following routine iterates through the snapshot of system processes
    ; to retrieve information about each process with the help of Process32First
    ; and Process32Next Windows API functions, then checks process name to see
    ; if the required process is found
    ;
    ; BONUS: This routine contains a very simple example of self-modifying code.
    ; Check how it works using a debugger.
    ;
    get_process_info_loop:
        mov  rbx, Process32FirstW
        lea  r13, [pcEntry]     ; R13 will point to PROCESSENTRY32W
        mov  [r13], dword sizeof.pcEntry
    ;
    ; FASM SYNTAX NOTE:
    ; In flat assembler syntax, the label whose name begins with dot is treated
    ; as local label, and its name is attached to the name of last global label
    ; (with name beginning with anything but dot) to make the full name of this
    ; label. So you can use the short name (beginning with dot) of this label
    ; anywhere before the next global label is defined, and in the other places
    ; you have to use the full name.
    ; .cont and all labels starting with dot below are local for a code block
    ; between two global labels: get_process_info_loop and error. Within this
    ; block they can be addressed with their short names, i.e. jmp .cont
    ; To access these labels from other parts of code use full name.
    ; See, for example, `jz error.process_not_found` instruction below.
    ;
    .cont:
        mov  rdx, r13
        mov  rcx, r12
        call qword [rbx]
        test al, al
        db   0x74               ;<- SMC part 1. The two bytes are `jz error.get_process_info`
    .smc1:                      ; originally, but changed to `jz .finish` at runtime
        db   error.get_process_info - $ - 1
    ;
    ; check if process name matches the name passed on command line
    ;
        lea  rdx, [r13 + 44]    ; rdx = pointer to pcEntry.szExeFile
        mov  rcx, [lpArgList]
        mov  rcx, [rcx + 8]     ; rcx = pointer to the first cmd line argument
        call [lstrcmpW]
        test eax, eax           ; strings equal?
        jz   stage2
    .smc2:                      ; <- SMC part 2.
        or   al, al             ; junk command, just to reserve 2 bytes.
                                ; changed to `jmp .cont` at runtime
        add  rbx, 8             ; <- what is this for? you tell
        mov  byte [.smc1], .finish - .smc1 - 1                  ; modify SMC part 1
        mov  word [.smc2], (.cont - .smc2 - 2) shl 8 + 0xeb     ; modify SMC part 2
        jmp  .cont
    .finish:
        call [GetLastError]
        cmp  eax, 18            ; ERROR_NO_MORE_FILES
        jz   error.process_not_found
    ;
    ; Error handling routine.
    ; Performs preparations for displaying appropriate error messages.
    ;
    error:
    .get_process_info:
        mov  rdx, szErrGetProcessInfo
        mov  r8d, szErrGetProcessInfo.size
        jmp  .show
    .invalid_args:                  ; if you don't understand what's going
        mov  r8, [lpArgList]        ; on here, use debugger to find it out.
        mov  r8, [r8]               ; HINT: run the app without command line
        mov  rdx, szErrInvalidArgs  ; arguments to get here.
    @@:                             ;<----------------- anonymous label
        lea  rcx, [tmpbuf]
        push rcx
        call [wsprintfW]
        pop  rdx
        mov  r8d, eax
        jmp  .show
    .process_not_found:
        mov  r8, [lpArgList]
        mov  r8, [r8 + 8]
        mov  rdx, szErrProcessNotFound
        jmp  @b                     ;<- jump to the nearest preceding anonymous label (above)
                                    ; use `jmp @f` to jump to the nearest following (below)
    .create_system_snapshot:
        mov  rdx, szErrCreateSystemSnapshot
        mov  r8d, szErrCreateSystemSnapshot.size
        jmp  .show
    .create_module_snapshot:
        mov  rdx, szErrCreateModuleSnapshot
        mov  r8d, szErrCreateModuleSnapshot.size
        jmp  .show
    .get_module_info:
        mov  rdx, szErrGetModuleInfo
        mov  r8d, szErrGetModuleInfo.size
        jmp  .show
    .open_process:
        mov  rdx, szErrOpenProcess
        mov  r8d, szErrOpenProcess.size
        jmp  .show
    .allocate_heap:
        mov  rdx, szErrHeapAlloc
        mov  r8d, szErrHeapAlloc.size
        jmp  .show
    .read_process_memory:
        mov  rdx, szErrReadProcessMemory
        mov  r8d, szErrReadProcessMemory.size
        jmp  .show
    .create_file:
        mov  rdx, szErrorCreateFile
        mov  r8d, szErrorCreateFile.size
    .show:
    ;
    ; Call showMessage procedure to display error message, then quit.
    ;
        push qword quit
    ;   ^^^^^^^
    ; Normally the above should have been `call showMessage` instruction,
    ; followed by `jmp quit`, but due to the code structure and design,
    ; showMessage procedure starts directly after the above part of code, so
    ; there's no need in calling it. The return address can simply be pushed
    ; directly on the stack, which reduces code size and speeds up execution.
    ; Ain't asm cool? :)
    ;
    ; Uses WriteConsoleW to display a message.
    ; IN: RDX - pointer to Unicode string to display
    ;     R8D - size of the string
    ;
    showMessage:
        lea  r9, [numCharsWritten]
        mov  rcx, [hStdOut]
        push rbp
        mov  rbp, rsp
        push qword 0
        sub  rsp, 32
        call [WriteConsoleW]
        mov  rsp, rbp
        pop  rbp
        ret
    stage2:
    ;
    ; open process object with PROCESS_VM_READ access
    ;
        mov  r8d, [r13 + 8]     ; [pcEntry.th32ProcessID]
        xor  rdx, rdx
        mov  ecx, 16            ; PROCESS_VM_READ
        call [OpenProcess]
        test rax, rax
        jz   error.open_process
        mov  [hProcess], rax
    ;
    ; close previous (system processes) snapshot, not needed any more
    ;
        mov  rcx, r12
        call [CloseHandle]
    ;
    ; open snapshot of desired process and include all its modules in it
    ;
        mov  edx, [r13 + 8]     ; [pcEntry.th32ProcessID]
        mov  ecx, 0x00000018    ; TH32CS_SNAPMODULE | TH32CS_SNAPMODULE32
        call [CreateToolhelp32Snapshot]
        cmp  rax, -1            ; INVALID_HANDLE_VALUE?
        jz   error.create_module_snapshot
        mov  r12, rax           ; R12 will keep snapshot handle
    ;
    ; loop through the list of modules in the snapshot to find executable.
    ; quit with error message if not found or any other error occurred
    ;
        mov  rbx, Module32FirstW
    @@:
        lea  rdx, [mdEntry]
        mov  [rdx], dword sizeof.mdEntry
        mov  rcx, r12
        call qword [rbx]
        test al, al
        jz   .error
        lea  rdx, [mdEntry.szModule]
        mov  rcx, [lpArgList]
        mov  rcx, [rcx + 8]
        call [lstrcmpW]
        test eax, eax
        jz   stage3
        mov  rbx, Module32NextW
        jmp  @b
    .error:
        mov  rcx, [hProcess]
        call [CloseHandle]
        jmp  error.get_module_info
    ;
    ; The following routine can exit to different locations based on which error
    ; occurred. Investigate how it works.
    ;
    stage3:
    ;
    ; allocate a heap of MODULEENTRY32.modBaseSize bytes
    ;
        mov  r15, [mdEntry.modBaseSize]
        mov  rdx, r15
        xor  r8, r8
        xor  ecx, ecx
        call [HeapCreate]
        test rax, rax
        jnz  @f
        push qword error.allocate_heap
        jmp  .ret
    @@:
        mov  [hHeap], rax
        mov  r8, r15
        mov  edx, 8             ; HEAP_ZERO_MEMORY
        mov  rcx, rax
        call [HeapAlloc]
        test rax, rax
        jnz  .read_process_memory
        push qword error.allocate_heap
        jmp  .ret2
    .read_process_memory:
        mov  [lpHeap], rax
    ;
    ; read (dump) process memory to allocated heap
    ;
        mov  r8, rax
        mov  r9, r15
        mov  rdx, [mdEntry.modBaseAddr]
        mov  rcx, [hProcess]
        lea  rax, [numCharsWritten] ; take the address while RBP still points
                                    ; to vars (all variables are RBP-relative)
        push rbp
        mov  rbp, rsp
        push rax                ; lpNumberOfBytesRead
        sub  rsp, 32
        call [ReadProcessMemory]
        mov  rsp, rbp
        pop  rbp
        test al, al
        jnz  .save_dump
        push qword error.read_process_memory
        jmp  .ret2
    .save_dump:
    ;
    ; save memory dump to file in the current directory.
    ; overwrite if such file exists
    ;
        lea  r8, [mdEntry.szModule]
        lea  rdx, [szDumpFileName]
        lea  rcx, [tmpbuf]
        call [wsprintfW]
        xor  r9, r9
        mov  r8d, 1             ; FILE_SHARE_READ
        mov  edx, 0xc0000000    ; GENERIC_READ | GENERIC_WRITE
        lea  rcx, [tmpbuf]
        push rbp
        mov  rbp, rsp
        push r9                 ; hTemplateFile = NULL
        push qword 128          ; FILE_ATTRIBUTE_NORMAL
        push qword 2            ; CREATE_ALWAYS
        sub  rsp, 32
        call [CreateFileW]
        mov  rsp, rbp
        pop  rbp
        cmp  rax, -1            ; CreateFileW returns INVALID_HANDLE_VALUE on failure
        jnz  @f
        push qword error.create_file
        jmp  .ret2
    @@:
        mov  [hFile], rax
        lea  r9, [numCharsWritten]
        mov  r8, r15
        mov  rdx, [lpHeap]
        mov  rcx, rax
        push rbp
        mov  rbp, rsp
        push qword 0
        sub  rsp, 32
        call [WriteFile]
        mov  rsp, rbp
        pop  rbp
        test al, al
        jnz  @f
        call [GetLastError]
        push qword error.create_file
        jmp  .ret3
    @@:
        push qword quit
    .ret3:                      ; based on progress before error occurred
        mov  rcx, [hFile]       ; this or that handles should or should not
        call [CloseHandle]      ; be released
    .ret2:
        mov  rcx, [hHeap]
        call [HeapDestroy]
    .ret:
        mov  rcx, [hProcess]
        call [CloseHandle]
        ret
    ;
    ; Define PE32+ section containing (un)initialized data.
    ;
    section '.data' data readable writeable
    szDumpFileName du '%s.dump', 0
    ; Error messages
    ;
    ; FASM SYNTAX NOTE:
    ; `du` directive accepts the quoted string values of any length, which will
    ; be converted into chain of words with zeroed high byte. This way Unicode
    ; strings are defined.
    ;
    szErrInvalidArgs:
        du 'Intro to Win64 Assembly and Process Dumping Practice Application'
        du 13, 10, 'See https://www.cybrary.it/0p3n/intro-to-win64-assembly-'
        du 'and-process-dumping', 13, 10, 'Usage: %s process_name', 0
    szErrCreateSystemSnapshot:
        du 'Failed to create a snapshot of system processes', 0
        .size = ($ - szErrCreateSystemSnapshot) / 2     ; string size in characters
    szErrCreateModuleSnapshot:
        du 'Failed to create a snapshot of process modules', 0
        .size = ($ - szErrCreateModuleSnapshot) / 2
    szErrGetProcessInfo:
        du 'Failed to get process information from the snapshot', 0
        .size = ($ - szErrGetProcessInfo) / 2
    szErrGetModuleInfo:
        du 'Failed to get module information from process snapshot', 0
        .size = ($ - szErrGetModuleInfo) / 2
    szErrProcessNotFound:
        du 'No running processes with name "%s" found', 0
    szErrOpenProcess:
        du 'Failed to get access to the process', 0
        .size = ($ - szErrOpenProcess) / 2
    szErrHeapAlloc:
        du 'Failed to allocate memory for process dumping', 0
        .size = ($ - szErrHeapAlloc) / 2
    szErrReadProcessMemory:
        du 'Failed to read memory of the process', 0
        .size = ($ - szErrReadProcessMemory) / 2
    szErrorCreateFile:
        du 'Failed to create or write to dump file', 0
        .size = ($ - szErrorCreateFile) / 2
    ;
Uninitialized global variables;; FASM SYNTAX NOTE:; `virtual` directive defines virtual data at specified address. This data; will not be included in the output file, but labels defined there can be; used in other parts of source. `virtual at rbp` tells fasm, that all; labels inside the virtual data space will be relative to the value of; RBP register. For example, instruction `mov rcx, [hStdOut]` will be; assembled as `mov rcx, [rbp+0]`, instruction `lea rdx, [lpCmdLine]` will; be assembled as `lea rdx, [rbp+8]`, and so on. Note `mov rbp, vars`; instruction at the beginning of executable code. It initializes RBP; register with base address of uninitialized data. Using register-based; addressing produces smaller code: avg. 4 bytes per `mov reg, [rbp+num]`; instruction instead of 7-8 bytes for direct addressing `mov reg, mem`. ; `sizeof. vars = $ - $$` defines the size of virtual data. This amount; of data should then be reserved with `vars rb sizeof. vars` expression. ; This is required for fasm to properly compute PE section virtual size; during executable image creation. ;virtual at rbphStdOut rq 1hProcess rq 1hHeap rq 1lpHeap rq 1hFile rq 1lpCmdLine rq 1lpArgList rq 1numArgs rd 1numCharsWritten rd 2pcEntry PROCESSENTRY32Wsizeof. pcEntry = $ - pcEntrymdEntry MODULEENTRY32Wsizeof. mdEntry = $ - mdEntrytmpbuf rb 512sizeof. vars = $ - $$end virtualalign 16vars rb sizeof. vars;
Define PE32+ import section
section '. idata' import data readable writeable;
Import Directory Table; (see Microsoft PE and COFF Specification, section 5. 4. 1);dd 0, 0, 0, rva dll_kernel32, rva imports_kernel32dd 0, 0, 0, rva dll_shell32, rva imports_shell32dd 0, 0, 0, rva dll_user32, rva imports_user32dd 0, 0, 0, 0, 0; Kernel32 Import Lookup Table; (see Microsoft PE and COFF Specification, section 5. 4. 2);imports_kernel32:AllocConsole dq rva kernel32_fn_AllocConsoleCloseHandle dq rva kernel32_fn_CloseHandleCreateFileW dq rva kernel32_fn_CreateFileWCreateToolhelp32Snapshot dq rva kernel32_fn_CreateToolhelp32SnapshotExitProcess dq rva kernel32_fn_ExitProcessFreeConsole dq rva kernel32_fn_FreeConsoleGetCommandLineW dq rva kernel32_fn_GetCommandLineWGetLastError dq rva kernel32_fn_GetLastErrorGetStdHandle dq rva kernel32_fn_GetStdHandleHeapAlloc dq rva kernel32_fn_HeapAllocHeapCreate dq rva kernel32_fn_HeapCreateHeapDestroy dq rva kernel32_fn_HeapDestroylstrcmpW dq rva kernel32_fn_lstrcmpWModule32FirstW dq rva kernel32_fn_Module32FirstWModule32NextW dq rva kernel32_fn_Module32NextWOpenProcess dq rva kernel32_fn_OpenProcessProcess32FirstW dq rva kernel32_fn_Process32FirstWProcess32NextW dq rva kernel32_fn_Process32NextWReadProcessMemory dq rva kernel32_fn_ReadProcessMemoryWriteConsoleW dq rva kernel32_fn_WriteConsoleWWriteFile dq rva kernel32_fn_WriteFiledq 0;
;
; Shell32 Import Lookup Table
; (see Microsoft PE and COFF Specification, section 5.4.2)
;
imports_shell32:
  CommandLineToArgvW  dq rva shell32_fn_CommandLineToArgvW
  dq 0

;
; User32 Import Lookup Table
; (see Microsoft PE and COFF Specification, section 5.4.2)
;
imports_user32:
  wsprintfW  dq rva user32_fn_wsprintfW
  dq 0

;
; List of linked DLLs
;
dll_kernel32:
  db 'Kernel32.dll', 0
dll_shell32:
  db 'Shell32.dll', 0
dll_user32:
  db 'User32.dll', 0

;
; Kernel32 Hint/Name Table
; (see Microsoft PE and COFF Specification, section 5.4.3)
;
kernel32_fn_AllocConsole:
  dw 0
  db 'AllocConsole', 0
kernel32_fn_CloseHandle:
  dw 0
  db 'CloseHandle', 0
kernel32_fn_CreateFileW:
  dw 0
  db 'CreateFileW', 0
kernel32_fn_CreateToolhelp32Snapshot:
  dw 0
  db 'CreateToolhelp32Snapshot', 0
kernel32_fn_ExitProcess:
  dw 0
  db 'ExitProcess', 0
kernel32_fn_FreeConsole:
  dw 0
  db 'FreeConsole', 0
kernel32_fn_GetCommandLineW:
  dw 0
  db 'GetCommandLineW', 0
kernel32_fn_GetLastError:
  dw 0
  db 'GetLastError', 0
kernel32_fn_GetStdHandle:
  dw 0
  db 'GetStdHandle', 0
kernel32_fn_HeapAlloc:
  dw 0
  db 'HeapAlloc', 0
kernel32_fn_HeapCreate:
  dw 0
  db 'HeapCreate', 0
kernel32_fn_HeapDestroy:
  dw 0
  db 'HeapDestroy', 0
kernel32_fn_lstrcmpW:
  dw 0
  db 'lstrcmpW', 0
kernel32_fn_Module32FirstW:
  dw 0
  db 'Module32FirstW', 0
kernel32_fn_Module32NextW:
  dw 0
  db 'Module32NextW', 0
kernel32_fn_OpenProcess:
  dw 0
  db 'OpenProcess', 0
kernel32_fn_Process32FirstW:
  dw 0
  db 'Process32FirstW', 0
kernel32_fn_Process32NextW:
  dw 0
  db 'Process32NextW', 0
kernel32_fn_ReadProcessMemory:
  dw 0
  db 'ReadProcessMemory', 0
kernel32_fn_WriteConsoleW:
  dw 0
  db 'WriteConsoleW', 0
kernel32_fn_WriteFile:
  dw 0
  db 'WriteFile', 0

;
; Shell32 Hint/Name Table
; (see Microsoft PE and COFF Specification, section 5.4.3)
;
shell32_fn_CommandLineToArgvW:
  dw 0
  db 'CommandLineToArgvW', 0

;
; User32 Hint/Name Table
; (see Microsoft PE and COFF Specification, section 5.4.3)
;
user32_fn_wsprintfW:
  dw 0
  db 'wsprintfW', 0

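For comparison, the import directory, lookup tables and hint/name tables written by hand in this listing can also be produced by the `library` and `import` macros that ship with fasm's Windows headers. The following is only a minimal sketch and not part of the original program: it assumes that fasm's INCLUDE directory is reachable (for example through the INCLUDE environment variable) so that `win64a.inc` is found, and it imports nothing but ExitProcess to stay short.

```asm
format PE64 console
entry start

include 'win64a.inc'            ; assumption: fasm's INCLUDE directory is on the search path

section '.text' code readable executable

start:
        sub     rsp, 8*5        ; 32 bytes of shadow space + 8 to keep RSP 16-byte aligned
        xor     ecx, ecx        ; uExitCode = 0
        call    [ExitProcess]   ; indirect call through the slot the loader fills in

section '.idata' import data readable writeable

  ; one import directory entry per listed DLL
  library kernel32, 'KERNEL32.DLL'

  ; one table slot plus one hint/name entry per listed function
  import kernel32,\
         ExitProcess, 'ExitProcess'
```

Building the tables manually, as done throughout this article, remains the better way to see exactly what the Windows loader reads. The macros are a convenience once the layout is understood.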
References
1. Intel 64 and IA-32 Architectures Software Developer Manuals http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
2. AMD64 Architecture Programmer's Manuals http://developer.amd.com/resources/documentation-articles/developer-guides-manuals/
3. flat assembler http://flatassembler.net
4. Microsoft PE and COFF Specification https://msdn.microsoft.com/en-us/windows/hardware/gg463119.aspx
5. The PE File Format by Bernd Luevelsmeyer http://www.pelib.com/resources/luevel.txt
6. Overview of x64 Calling Conventions https://msdn.microsoft.com/en-us/library/ms235286.aspx
7. Windows Internals https://technet.microsoft.com/en-us/sysinternals/bb963901.aspx
8. MSDN: Tool Help Library https://msdn.microsoft.com/en-us/library/windows/desktop/ms686837(v=vs.85).aspx