Assembly

Course
Time
13 hours 15 minutes
Difficulty
Beginner
CEU/CPE
14

Video Transcription

00:00
Hello. This is Dr Miller and this is Episode 13.1 of Assembly.
00:05
Today we're gonna talk about the compilation process and then some tools like binary format readers and disassemblers.
00:13
Compilation process.
00:16
So we have a few examples of some different compilers like GCC, Visual Studio, or MinGW, which is the GCC compiler for Windows,
00:25
and a compiler is just a program that translates some computer code that somebody has written from one programming language into another.
00:34
And so we have our source, which is generally written in a high level language, and then we have some sort of target,
00:40
and generally what we're gonna do is take high-level code and turn it into binary, machine-readable code that we can then execute on the system.
00:51
So the process is that it takes in a high-level program and then will generally produce an intermediate representation, so that you could have
00:59
multiple different inputs and be able to apply the same optimizations to those different inputs. So, for example, the .NET platform behind C# allows you to write something in a variety of different languages, and then it compiles them all down into a common intermediate representation
01:15
and then is able to apply optimizations to any code written,
01:19
regardless of the language it's written in,
01:22
and then it generates the output program or the binary file.
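To make those three stages concrete, here is a minimal sketch in Python. It is only a toy: it borrows Python's own `ast` module as the intermediate representation, does one optimization (constant folding), and emits instructions for a made-up stack machine. The function names and the `PUSH`/`LOAD`/`ADD`/`MUL` mnemonics are assumptions for illustration, not how GCC or Visual Studio work internally.

```python
import ast

def front_end(source):
    """Parse a high-level arithmetic expression into an intermediate representation."""
    return ast.parse(source, mode="eval").body

def optimize(node):
    """Constant folding: replace constant subexpressions with their computed value."""
    if isinstance(node, ast.BinOp):
        left = optimize(node.left)
        right = optimize(node.right)
        if isinstance(left, ast.Constant) and isinstance(right, ast.Constant):
            if isinstance(node.op, ast.Add):
                return ast.Constant(left.value + right.value)
            if isinstance(node.op, ast.Mult):
                return ast.Constant(left.value * right.value)
        return ast.BinOp(left, node.op, right)
    return node

def back_end(node):
    """Emit instructions for a hypothetical stack machine (mnemonics are made up)."""
    if isinstance(node, ast.Constant):
        return [f"PUSH {node.value}"]
    if isinstance(node, ast.Name):
        return [f"LOAD {node.id}"]
    if isinstance(node, ast.BinOp):
        mnemonic = {ast.Add: "ADD", ast.Mult: "MUL"}[type(node.op)]
        return back_end(node.left) + back_end(node.right) + [mnemonic]
    raise ValueError("unsupported construct")

def compile_expr(source):
    """Run the whole pipeline: source -> IR -> optimized IR -> target code."""
    return back_end(optimize(front_end(source)))

print(compile_expr("x + 2 * 21"))  # ['LOAD x', 'PUSH 42', 'ADD']
```

Because the optimizer only ever sees the intermediate representation, a second front end for a different source language could reuse it unchanged, which is the point of having a common representation.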
01:26
So the output is going to depend on the target architecture.
01:30
So, a few examples here:
01:33
you might be running Windows on x86, or OS X on x86, or Android on ARM
01:38
or iOS on ARM, and you could also have Linux on ARM or x86.
01:44
And so your output is gonna highly depend on what architecture you're wanting to save for. So we've seen in this course x86 code and ARM code, and those are going to be different pieces of code that are generated
01:59
and what's generated is a binary file format. And so
02:04
what we have is we have a format defined by the operating system, something that is readable by each OS.
02:10
So some examples: we have the ELF format, which is used for Linux. We have COFF, the Common Object File Format, or PE files, for Windows, and then OS X has Mach-O.
02:24
So here's some examples. If we look on the left-hand side here, we can see that we have the actual
02:31
um, ASCII characters or hex characters. And so these are ASCII, and these are going to be hex
02:38
that show exactly what is in the file at exactly what offset. So we can see at offset zero we have an M and a Z.
02:46
Additionally, inside of here, we can see there's a P and an E, and those are going to relate to these different headers. And so the headers basically tell the operating system exactly how the file's laid out. So in a Portable Executable, or PE, we have a DOS header, and then we have a PE header,
03:05
right? And as it says here, the DOS header says, this is a binary file. And then the PE header says, oh, by the way, it's not DOS, it's this new version called PE.
03:14
And then inside of that we have additional headers, or optional headers, and those will give it information about what system it can run on, um, and what the image base is that was used to compile and link it.
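The header walk just described can be sketched with Python's standard `struct` module. This is a minimal sketch, not a full PE parser: the function name `read_pe_headers` is made up for illustration, and it relies on the documented PE layout (the offset of the PE header stored at 0x3C in the DOS header, then a 20-byte COFF header, then the optional header that holds the image base).

```python
import struct

def read_pe_headers(data):
    """Walk DOS header -> PE header -> optional header (hypothetical helper)."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ magic at offset 0")
    # The DOS header stores the file offset of the PE header at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # COFF header follows the 4-byte signature: machine type, section count, ...
    machine, section_count = struct.unpack_from("<HH", data, pe_offset + 4)
    # The optional header follows the 20-byte COFF header.
    optional = pe_offset + 24
    (optional_magic,) = struct.unpack_from("<H", data, optional)
    if optional_magic == 0x10B:    # PE32: 4-byte image base at offset 28
        (image_base,) = struct.unpack_from("<I", data, optional + 28)
    elif optional_magic == 0x20B:  # PE32+: 8-byte image base at offset 24
        (image_base,) = struct.unpack_from("<Q", data, optional + 24)
    else:
        raise ValueError("unknown optional header magic")
    return {"machine": machine, "sections": section_count, "image_base": image_base}
```

Calling it on the bytes of a real executable, for example `read_pe_headers(open(path, "rb").read())` with `path` pointing at some PE file, returns the machine type (0x14C for 32-bit x86) and the image base.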
03:25
Um, and then we have the final part, which is the code. And so inside the code we would have the binary for mov eax, 42, and we can see that's in here. And then the return is going to be C3.
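Those code bytes can be reproduced by hand. In 32-bit x86, `mov eax, imm32` is the opcode byte B8 followed by the 32-bit value in little-endian order, and `ret` is the single byte C3. The helper name below is an assumption for illustration.

```python
import struct

def mov_eax_imm32(value):
    """Encode 'mov eax, <value>': opcode B8, then the value as 4 little-endian bytes."""
    return b"\xB8" + struct.pack("<I", value)

RET = b"\xC3"  # near return

code = mov_eax_imm32(42) + RET
print(code.hex(" "))  # b8 2a 00 00 00 c3
```

The 42 shows up as 2a because the dump is hexadecimal, and the three zero bytes after it are the rest of the 32-bit immediate.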
03:38
And so this is an example of a PE file. And the designer of this made basically the smallest PE file he could get, which is why it's called mini.exe.
03:50
But we can see that the format
03:52
really specifies exactly what we need in binary.
03:57
Now here, from the same designer, we have the Mach-O file format,
04:00
and again, we have what we call magic numbers. So we can see inside of this one, it says FEEDFACE. So that's the magic number for Mach-O.
04:11
I don't know why they did it, but that's what they did. Back here, a little bit of history: MZ stands for
04:16
Mark, and then I think Zbikowski is the name of the guy. And so he designed the format, so he put his initials in it.
04:24
And here we have FEEDFACE. And then we have different segments, um, threads. And then at the end, we have our code,
04:30
and so again, you can see that when it's going to do an exit right here, it's doing an int 0x80, versus our previous one, where we just did C3, or return.
04:42
And so the output format is highly dependent on the architecture that you are saving to.
04:49
And then finally, we have the ELF format, or Executable and Linkable Format,
04:55
and we can see inside of it, it has the magic number 7F and then the ASCII letters E, L, F.
05:01
And again we have some headers, right, and a table, and then we have the code. And so each one of these is the smallest sort of representation that you can have, or minimal representation, of that binary format.
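Since each of the three formats starts with its own magic number, a file can be identified from its first few bytes. The sketch below is a minimal illustration with a made-up function name. It also accepts the byte-swapped and 64-bit Mach-O magics (FEEDFACF), because the four bytes appear reversed on disk when the file is little-endian.

```python
# Mach-O magics: 32-bit and 64-bit, in both byte orders.
MACHO_MAGICS = {
    bytes.fromhex("feedface"), bytes.fromhex("cefaedfe"),
    bytes.fromhex("feedfacf"), bytes.fromhex("cffaedfe"),
}

def identify_format(data):
    """Guess the binary format from the leading magic bytes (hypothetical helper)."""
    if data[:4] == b"\x7fELF":
        return "ELF"
    if data[:4] in MACHO_MAGICS:
        return "Mach-O"
    if data[:2] == b"MZ":
        return "MZ (DOS/PE)"
    return "unknown"

print(identify_format(b"\x7fELF\x02\x01\x01\x00"))  # ELF
```

A magic number only says what the file claims to be; a real loader still goes on to validate the headers that follow.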
05:15
So some tools.
05:17
So if you want to look into exactly what code is generated from a high-level language, you can use several different ones. We can use PEview on Windows,
05:27
and so that allows us to look at PE files and see the internal structure of them.
05:31
Additionally, we have objdump, and so this is gonna display information about object files,
05:38
um, and also some binary files, and then we'll have some disassemblers.
05:44
So again, objdump is a disassembler; it has the ability to do disassembly.
05:48
It's free, it's on all versions of Linux, and so it's pretty easy to use, but it's very basic. It just outputs text,
05:56
whereas these three, so IDA Pro, Binary Ninja, and Ghidra, are all dedicated disassemblers, and so they have a lot more features, like being able to rename things and color coding.
06:08
Um, usually IDA Pro is sort of the de facto standard, but the price tag on it is a little bit prohibitive.
06:15
You know, 928 100 is a little out of a lot of people's range.
06:19
Um, Binary Ninja is a little cheaper. It doesn't have quite as many features as IDA Pro. It's about $300,
06:27
and then Ghidra is free, and it has been developed by the NSA and then released.
06:31
And so it's sort of a new tool that just came out recently, and so a lot of people are looking into it as an alternative to IDA Pro. But any of these are going to allow you to disassemble or look at the code that gets generated.
06:47
And then hex editors allow you to view, and possibly edit and save, files in a hexadecimal representation. So hex is
06:57
the numbers zero through nine and A through F, and each one of those represents four bits.
07:02
So some examples: HxD or Cheat Engine. So those just do hex editing. But also, all of the disassemblers will actually show you hexadecimal dumps. And so this one's from Binary Ninja, showing a hex dump of a binary that I loaded.
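A hex dump like that is easy to approximate in a few lines of Python. This is a minimal sketch, with the function name and column layout chosen for illustration: each row shows the offset, then each byte as two hex digits (one digit per four bits), then the printable ASCII characters.

```python
def hexdump(data, width=16):
    """Format bytes as rows of: offset, hex bytes, printable ASCII."""
    pad = width * 3 - 1  # width of the hex column, including separating spaces
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        # Non-printable bytes are shown as '.' in the ASCII column.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  {ascii_part}")
    return "\n".join(lines)

print(hexdump(b"MZ\x90\x00\x7fELF"))
```

In the output, 4d and 5a are the ASCII codes for M and Z, which is how the magic numbers show up in both columns of a hex editor.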
07:24
And then, if you want to, you can also read these binary formats using libraries. So I just picked some example Python ones here,
07:30
so you can read a PE file or an ELF file or a Mach-O file, all from within Python.
07:38
And so these have the ability to parse those files and then to understand
07:43
what the different parts are and give you a representation, so that if you want to do some programming on it, you have that ability.
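The transcript does not name the libraries on the slide, so rather than guess at their APIs, here is a minimal standard-library sketch of the kind of parsing such a library does, using the start of an ELF header. The function name and the small machine table are assumptions for illustration; a real library handles far more fields.

```python
import struct

# A few e_machine values from the ELF specification.
MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}

def parse_elf_header(data):
    """Decode the identification bytes plus e_type and e_machine (hypothetical helper)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]  # 1/2 = 32/64-bit, 1/2 = little/big endian
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("invalid ELF identification bytes")
    endian = "<" if ei_data == 1 else ">"
    # e_type and e_machine are two 16-bit fields right after the 16-byte e_ident.
    e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)
    return {
        "bits": 32 if ei_class == 1 else 64,
        "endianness": "little" if ei_data == 1 else "big",
        "type": e_type,
        "machine": MACHINES.get(e_machine, hex(e_machine)),
    }
```

The result is a plain dictionary, so a script can branch on the architecture, for example to pick a different disassembler mode for x86 and ARM binaries.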
07:48
Additionally, there are libraries for a lot of other languages. I just picked some examples here.
07:55
So today we talked about compilers and what they do in their process, and then binary formats and what they are, and then some tools, like binary format readers and some disassembly tools that you can use.
08:09
So looking forward, we're gonna look at the reverse engineering process so that we can understand
08:13
how to break apart binaries. If we write something new and we don't understand how it works,
08:18
We'll set up a miniature reverse engineering lab so that we can look at some of these C constructs that are high-level constructs within the C language.
08:28
So what does the compiler do?
08:31
It turns high-level code into machine-readable code that is executable on a system.
08:37
If you have questions, you can email me at millermj@unk.edu, and you can find me on Twitter at No House 30.

Assembly

This course will provide background and information related to programming in assembly. Assembly is the lowest level programming language which is useful in reverse engineering and malware analysis.

Instructed By

Matthew Miller
Assistant Professor at the University of Nebraska at Kearney
Instructor