Video Transcription

Hello. This is Dr Miller and this is Episode 13.2 of Assembly.
Today we're gonna talk about the reverse engineering process and then give a simple example
of the reverse engineering process.
So a disassembler is a program that translates machine language into assembly language. We take the raw binary and we create assembly out of that.
Disassembly is the process of taking a binary program and generating a view of the program that could be executed.
There are a lot of different caveats to that, since people try to mess up disassemblers and make things that
don't disassemble well.
So, the disassembly process. What it does is read in the binary data,
and this really depends on the format that the data is stored in, based on the OS,
and then determine the layout and references. So once you've determined which architecture it's for, you can determine what the entry point is, where it's gonna start executing; the dynamic libraries, so what libraries are going to be loaded into this binary that it's allowed to call; and then system calls, how the binary will interact with the system
if it's not running as root,
and then it'll start disassembling instructions, and it will start at the entry point.
So step one, reading binary data.
A disassembler is going to use magic numbers in order to guess at a format. Oftentimes this is just for convenience, so that when you double-click or open a binary, the disassembler grabs the right one. That way it's easier, and you don't have to select that as an option.
And then that is gonna be based on the operating system and the method of loading the binary. So it will load in the data and make sure that it reads the data into the correct structures, so that it can then start analyzing it.
And when it analyzes, it will determine offsets. It'll look at the architecture
and the operating system, and so then it will understand what memory locations it needs to start looking at.
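The magic-number check described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular disassembler does it: the function name `guess_format` is made up for this example, and it only looks at the first few bytes. The magic values themselves (`\x7fELF` for ELF, `MZ` for the DOS header that starts a PE file, and the 0xFEEDFACE family for Mach-O) are the standard ones.

```python
import struct

# Mach-O magic values: 32-bit, 64-bit, and their byte-swapped forms.
MACHO_MAGICS = {0xFEEDFACE, 0xFEEDFACF, 0xCEFAEDFE, 0xCFFAEDFE}


def guess_format(data: bytes) -> str:
    """Guess the executable format from the leading magic bytes (hypothetical helper)."""
    if data[:4] == b"\x7fELF":
        return "ELF"
    if data[:2] == b"MZ":
        # Simplification: a fuller check would follow the offset stored at
        # 0x3C and confirm the "PE\0\0" signature there.
        return "PE"
    if len(data) >= 4:
        (magic,) = struct.unpack(">I", data[:4])
        if magic in MACHO_MAGICS:
            return "Mach-O"
    return "unknown"


if __name__ == "__main__":
    print(guess_format(b"\x7fELF\x02\x01\x01"))
```

The guess is only a starting point; the loader still has to parse the format's real headers before anything can be disassembled.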
So as it's determining the layout and the references, it figures out what the entry point is, and this is going to depend on our different formats:
ELF for Linux, Mach-O for OS X, and PE for Windows.
And then dynamic libraries are specified in that binary format and loaded by the operating system as the binary is loaded. And so the disassembler will go ahead and understand what those are. That way it can
read in the data and understand what functions we have and what functions can be called.
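As one concrete case of finding the entry point, here is a small Python sketch that reads the `e_entry` field out of an ELF header. The function name `elf_entry_point` is hypothetical, and the sketch handles only ELF (Mach-O and PE store their entry points differently). It relies on the ELF header layout: byte 4 gives the class (32-bit or 64-bit), byte 5 gives the byte order, and `e_entry` sits at offset 24 in both classes.

```python
import struct


def elf_entry_point(data: bytes) -> int:
    """Return the e_entry address from an ELF header (hypothetical helper)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class = data[4]  # 1 = 32-bit, 2 = 64-bit
    ei_data = data[5]   # 1 = little-endian, 2 = big-endian
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported ELF class or byte order")
    endian = "<" if ei_data == 1 else ">"
    # e_entry is 4 bytes wide in ELF32 and 8 bytes wide in ELF64.
    width = "I" if ei_class == 1 else "Q"
    (entry,) = struct.unpack_from(endian + width, data, 24)
    return entry


if __name__ == "__main__":
    # Build a fake 32-bit little-endian header just for demonstration.
    ident = b"\x7fELF" + bytes([1, 1, 1, 0]) + bytes(8)
    header = ident + struct.pack("<HHII", 2, 3, 1, 0x08048000)
    print(hex(elf_entry_point(header)))
```

A recursive-descent disassembler would use this address as the first item in its work queue.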
And then system calls are going to depend on which OS, and so an interrupt on Linux is going to be different than an interrupt on Mac OS. System calls are going to be done in a different way on each OS, and the parameters might vary depending on
which OS it's gonna be loaded on.
So it will then go ahead and start disassembling instructions. The typical way that it works is it will start at the entry point and look up the instruction length. So if you're gonna be in Thumb or ARM, or if you're gonna be in x86, where the instructions can have varying length, you've got to figure out how long each instruction is.
You then disassemble that instruction, and then if there is a branch that occurs, so if you see a jump or a branch,
then you basically have to say, I can go to the next instruction, and I might end up coming back later, depending on whether the branch is taken or not taken.
And so the disassembler will add these, in order, into a queue of instructions that need to be disassembled, and then continue on until all the instructions in the queue are processed.
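The queue-driven loop just described can be sketched in Python. Everything about the instruction set here is an assumption made for illustration: the opcodes, mnemonics, and one-byte absolute branch targets belong to a made-up toy machine, not to x86 or ARM. What the sketch does show is the technique itself: decode at an address, then queue every address control flow could reach next.

```python
from collections import deque

# Hypothetical toy instruction set: opcode -> (mnemonic, length in bytes).
OPCODES = {
    0x01: ("NOP", 1),
    0x02: ("LOAD", 2),  # LOAD imm8
    0x03: ("JMP", 2),   # unconditional jump to an absolute address
    0x04: ("JZ", 2),    # conditional jump to an absolute address
    0x05: ("RET", 1),
}


def disassemble(code: bytes, entry: int = 0):
    """Follow control flow from the entry point using a work queue."""
    queue = deque([entry])
    seen = {}
    while queue:
        addr = queue.popleft()
        if addr in seen or not 0 <= addr < len(code):
            continue
        opcode = code[addr]
        if opcode not in OPCODES:
            continue  # not a valid instruction; stop following this path
        mnemonic, length = OPCODES[opcode]
        if addr + length > len(code):
            continue  # truncated instruction
        operand = code[addr + 1] if length == 2 else None
        seen[addr] = (mnemonic, operand)
        if mnemonic == "JMP":
            queue.append(operand)        # only the target is reachable
        elif mnemonic == "JZ":
            queue.append(operand)        # branch taken
            queue.append(addr + length)  # branch not taken
        elif mnemonic != "RET":
            queue.append(addr + length)  # fall through to the next one
    return sorted(seen.items())


if __name__ == "__main__":
    program = bytes([0x02, 0x07, 0x04, 0x06, 0x03, 0x07, 0x01, 0x05])
    for addr, (mnemonic, operand) in disassemble(program):
        print(addr, mnemonic, "" if operand is None else operand)
```

The `seen` dictionary keeps an address from being decoded twice, which is what lets the loop terminate when code contains loops or several branches that meet at the same place.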
And we do have two different types of disassemblers. So, linear sweep: a linear sweep starts at the first instruction,
and it just basically goes to the next instruction and doesn't look at the branching, and just goes from the start until it gets to the end. And so it might disassemble bytes that are not actually instructions.
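To make that failure mode concrete, here is a small Python sketch of a linear sweep over a made-up toy instruction set (the opcodes and mnemonics are assumptions for illustration, not a real architecture). The sample program jumps over a single data byte that happens to look like a `LOAD` opcode, and the sweep decodes it anyway.

```python
# Hypothetical toy instruction set: opcode -> (mnemonic, length in bytes).
OPCODES = {
    0x01: ("NOP", 1),
    0x02: ("LOAD", 2),  # LOAD imm8
    0x03: ("JMP", 2),   # unconditional jump to an absolute address
    0x05: ("RET", 1),
}


def linear_sweep(code: bytes):
    """Decode from the first byte to the last, ignoring control flow."""
    addr = 0
    listing = []
    while addr < len(code):
        opcode = code[addr]
        if opcode not in OPCODES or addr + OPCODES[opcode][1] > len(code):
            listing.append((addr, "DB", opcode))  # emit unknown byte as data
            addr += 1
            continue
        mnemonic, length = OPCODES[opcode]
        operand = code[addr + 1] if length == 2 else None
        listing.append((addr, mnemonic, operand))
        addr += length
    return listing


if __name__ == "__main__":
    # Address 0: JMP 3, address 2: one data byte (0x02),
    # address 3: NOP, address 4: RET.
    program = bytes([0x03, 0x03, 0x02, 0x01, 0x05])
    for row in linear_sweep(program):
        print(row)
```

On this input the sweep reports `JMP 3`, then `LOAD 1` at address 2, then `RET`. The `LOAD` is really the data byte plus the real `NOP` at address 3, so the one instruction the jump actually lands on never appears in the listing.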
So then we typically like to see something that is called recursive descent.
That's going to again start at the first instruction, and then it's going to follow the different possible paths that we have. So if we have a jump or a branch or a call or a return, those are gonna basically add different places that we could go.
And then some functions may not return, right, so those will not get shown, or it'll stop there. And then some offsets might have some code that's hidden, and you have to understand the instructions and the architecture in order to find those.
So typically recursive descent is the much better one. A linear sweep is kind of a lazy way of doing it, the sort of simple way.
So here's a simple example.
So we have here a C program, and all it does is call puts, which is going to print "Hello world" to the console, and then when it's all done, it returns zero. So it's written in a high-level language.
That language is used to write low-level code, so it's somewhat low level, not a super high-level one.
And so if we have that code and we compile it and then go ahead and disassemble it, so we can use objdump in order to disassemble that code, we can see that it adds a lot of information in here.
We can see here these are sort of the headers for the function. So it's instructions that are added by the compiler that, if we were compiling it by hand, we may not add or may not need to add.
And then we can see here we have our push EBP, mov EBP, ESP, and then it saves ECX. It pushes an offset on here for the call to puts. And you can see that the disassembler
knows that puts is in the PLT (the procedure linkage table), which is part of dynamic linking on Linux, and that when it calls that function, it's
gonna print something, and so it's gonna call the puts function,
and then we end up cleaning up the stack and then doing our return.
So if we take the same example, we can see it in a sort of modern editor. So this is in Ghidra,
and we can see that it sort of took the offset here that we saw in binary in the previous one, and it actually shows us, oh, by the way,
this is a string that's "Hello world",
and then we can see it directly says "call puts". And we can see the function signature that is associated with that. And also it's going to do additional highlighting, showing that when it does this LEA, this is parameter one, and then we see we have a local variable that it has determined for that.
And so all of this is going to be done automatically by that type of disassembler.
And once again, here's another
view. So this is gonna be in Binary Ninja, and again we can see that they all have different ways of sort of showing the exact same information. So here again we can see the string inside of there, and it's going to
use ESI. So it's saving ESI,
and then we have a local variable, and then we're saving EBP and, again, cleaning up the stack and exiting.
So today we talked about the reverse engineering process and then gave a simple example and then viewed it in a couple of different disassemblers.
So looking forward, we'll
have an example to show you how to see different programming constructs and build a reverse engineering lab so that we can see those different constructs. And then we'll also talk about C constructs.
If you have questions, you can email me at millermj at unk.edu, and you can find me on Twitter at milhous30.

This course will provide background and information related to programming in assembly. Assembly is the lowest level programming language which is useful in reverse engineering and malware analysis.

Instructed By

Matthew Miller
Assistant Professor at the University of Nebraska at Kearney