Hello. This is Dr. Miller, and this is episode 11.6 of Assembly.
Today we're going to talk about memory offsets and then debugging and disassembly.
So when we have registers, as we've noted before, load and store are how we get stuff from memory into a register.
The assembler can only load a byte as an immediate and then do some shifting on that byte.
It can't do math operations, and there are only certain values that can actually be loaded. And so here's this from the specification.
So if we have a move operation, it can only have values from zero to
0xFFFF. So we can only move up to two bytes into a register, given an immediate.
And so if we have a 32-bit value that we want to load into a register, we actually have to do it
through multiple operations. So we basically have to load from an offset that will then allow it to read more data than
a 16-bit number.
And this is because each ARM instruction is 32 bits long, and so there is no way for you to load a 32-bit number in one instruction when part of the instruction has to be the opcode that says what operation am I going to do.
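To make the size limit concrete, here is a minimal Python sketch of the arithmetic. The helper names are made up for illustration, and handling the value as two 16-bit halves is only one possible multi-step approach, not necessarily the one used on the slide.

```python
MAX_MOV_IMMEDIATE = 0xFFFF  # largest immediate the move described above can carry


def fits_mov_immediate(value):
    """Return True if value fits in a single 16-bit move immediate."""
    return 0 <= value <= MAX_MOV_IMMEDIATE


def split_halves(value):
    """Split a 32-bit value into its (low 16 bits, high 16 bits)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 32 bits")
    return value & 0xFFFF, (value >> 16) & 0xFFFF


low, high = split_halves(0x11223344)
print(fits_mov_immediate(0x11223344))  # False: too big for one move
print(hex(low), hex(high))             # 0x3344 0x1122
```

Each half fits in a 16-bit immediate on its own, which is why a full 32-bit value needs either more than one instruction or a load from memory.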
And so when we do that, when we want to load a register from memory, we basically have to
sort of jump off of a known location to reach where we want to get that data from.
And so, for example, if we do a load register R0 and then we put an equal sign and then this number here, it will actually generate some code saying there's going to be some data here, and that data is going to be 0x11223344.
And then you notice that it has actually converted this from load register with the equal sign
to a load that has PC and then an offset. It has a negative there, but it's not really negative, right?
It's zero additional bytes.
And this is because in ARM, the instruction pointer, the program counter, actually points a couple of instructions ahead.
And so it is actually pointing eight bytes ahead, and that is this location down here. So it's at offset zero from the program counter, and the program counter actually points to here when it's executing this instruction.
So it's a little bit odd, but that's the way the assembler works. Now, it'll be a little bit easier to see in our next example.
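The rule just described can be written as a small calculation. This is a sketch only: the instruction address 0x10000 is a made-up value, and the function name is hypothetical.

```python
PC_AHEAD = 8  # in ARM (non-Thumb) mode the PC reads as the instruction's address + 8


def literal_address(instr_addr, offset):
    """Address a PC-relative load reads from, given where the load itself sits."""
    return instr_addr + PC_AHEAD + offset


# A load at a hypothetical address 0x10000 with offset zero from the PC
# reads the word that sits two instructions further down.
print(hex(literal_address(0x10000, 0)))  # 0x10008
```

So an offset of zero does not mean "right here", it means "eight bytes past this instruction".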
So here we have several bytes that we defined. So we've got our half words and our words. A half word is 16 bits
and a full word is 32 bits,
and we can see here inside of the code that when we're loading, R0 gets the value of a. Well, it's going to jump down here at the end to an offset that points to a, or an offset that points to b or c or d.
And then the half words, these are going to take up
two bytes, and these words will take up four bytes, so we'll see those differences in the addresses that get generated.
So what I've done here is I've taken the original code and then compared it to the disassembly. The disassembly I got from
using objdump. So objdump, and then the -d is disassemble, and then this is the binary that we want.
And then it generates a lot of output. And so I used grep to filter it and only look for the text main with a bracket.
And so it will give me all of the code that is in here, which is why this is highlighted. And then -A 30 means print 30 lines after you find that match.
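The command described above presumably has the shape objdump -d BINARY | grep -A 30 "<main>", with the binary name not given in the lecture. As a rough sketch of what the -A option does, here is a small Python stand-in; the function name and the sample listing are made up for illustration.

```python
def grep_after(lines, needle, after):
    """Return every line containing needle plus the `after` lines following it."""
    selected = []
    remaining = 0
    for line in lines:
        if needle in line:
            remaining = after + 1  # the match itself plus `after` more lines
        if remaining > 0:
            selected.append(line)
            remaining -= 1
    return selected


# Hypothetical, shortened listing standing in for real disassembler output.
listing = [
    "Disassembly of section .text:",
    "00010470 <main>:",
    "   first instruction",
    "   second instruction",
    "   third instruction",
]
for line in grep_after(listing, "<main>", 2):
    print(line)
```

Searching for main with the brackets matters because the plain word main also shows up in other places in the output, such as references from other functions.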
And so we can see here that
the disassembler figures out what this offset is. So it says main+0x2c for both of these.
But if you look at the program counter, again, it's relative to the program counter. So this is 28 bytes away, this offset down here.
I mean, if we take 0x10470 and add 0x2c, we end up with 0x1049c,
and so that is this location down here.
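The addition can be checked directly, and it can also be run backward. This sketch assumes that 0x10470 is the address of main and that the 28-byte offset belongs to the load whose target is main+0x2c; the second assumption is an inference from the numbers, not something stated on the slide.

```python
MAIN = 0x10470   # address of main, taken from the numbers in the lecture
PC_AHEAD = 8     # the PC reads eight bytes past the executing instruction

target = MAIN + 0x2C
print(hex(target))  # 0x1049c

# Working backward: which instruction reaches that target with a PC offset of 28?
instr_addr = target - 28 - PC_AHEAD
print(hex(instr_addr), instr_addr - MAIN)  # 0x10478 8
```

Under those assumptions the load sits at main+8, and 8 + 8 + 28 gives the 0x2c that the disassembler prints.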
And then these are going to be pointers to the actual memory addresses.
And so when it does a load register, it is bouncing down here and getting this value and copying it in.
We can see that the first one was a half word, and so we see that it takes up addresses c and d,
and then the next one takes up e and f,
and then the next one, which I think is c, takes up 0, 1, 2 and 3 because it's four bytes, and this one will take up 4, 5, 6 and 7.
And so we can see how these addresses keep going down. And these two are the same, right, main+0x2c.
But the offset is relative to the program counter, and the program counter is pointing eight bytes forward.
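The way the four variables pack into memory can be reproduced with a short sketch. The base address 0x2100c is made up (only its last digit, c, comes from the lecture), and the layout function is hypothetical.

```python
def layout(base, variables):
    """Assign consecutive byte addresses to (name, size) pairs starting at base."""
    result = {}
    addr = base
    for name, size in variables:
        result[name] = [addr + i for i in range(size)]
        addr += size
    return result


# Two half words (2 bytes each) followed by two words (4 bytes each).
VARS = [("a", 2), ("b", 2), ("c", 4), ("d", 4)]

for name, addrs in layout(0x2100C, VARS).items():
    # Show only the last hex digit of each byte address, as read off the slide.
    print(name, " ".join(format(a & 0xF, "x") for a in addrs))
```

This prints c d for a, e f for b, 0 1 2 3 for c, and 4 5 6 7 for d, which matches the address pattern read off above.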
And then, another way to visualize this. So in the previous example, I used objdump.
But additionally, there are professional tools that allow you to basically look at the code.
And so when I compiled my code, I had debugging symbols in. And so the disassembler actually knows the names of the variables that I used.
If we were to remove those debugging symbols, it wouldn't actually have a name here.
It would just have this data offset.
And again we can see that it automatically goes and calculates that for us and says, OK, this is data_1049c. This one is also the same. But again, those are relative to the program counter.
And then we can see the same thing in IDA Pro.
So Binary Ninja is a fairly inexpensive one. IDA Pro is a much more expensive disassembler that you can purchase for commercial purposes.
But again, it shows us what we write in code versus what we actually get from the binary.
And so it's going to again have these relative offsets in there.
All right, debugging.
So a lot of the time we'll basically want to use the same debugging tools on
the Raspberry Pi that would be used on regular Linux. So, for example, when I go to debug something, I'll make sure that I install GEF,
and GDB should be installed.
One issue that I've run into is that GDB does not pick up on the transition from regular mode to Thumb mode, and we'll talk about Thumb mode later,
and so our debugger has a bug in it.
So as long as you're not doing that, the debugger works in the same way.
And additionally, we have programs like strace.
This will work on x86 too, but it'll trace system calls and signals, and we'll see some examples here.
So, for example, here are all of the system calls related to just printing off the number 42.
We can see that it's calling all of these system calls that allow us to have our user name. We're running the exec command, which is what the shell will do. And then it gets down to the system call of write, and we'll see that that prints off the number 42.
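To try this without the lecture's binary, a tiny stand-in program is enough. The sketch below is an assumption, not the program from the slide: it uses Python's os.write, which hands the bytes to the write system call directly.

```python
import os


def print_answer(fd=1):
    """Write "42" and a newline to file descriptor fd (1 is standard output)."""
    return os.write(fd, b"42\n")  # returns the number of bytes written


if __name__ == "__main__":
    print_answer()
```

Saved under a name of your choosing and run under strace, this should show a long run of startup system calls first and the write near the end, the same overall shape as the trace described above.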
But there's a lot of information that can be gained by that. If you have a binary that you don't understand and you're trying to look at the assembly for it
And then with GDB, we're again going to have the same commands. We'll notice again that GEF is installed here.
We can see that it lists the registers, and so you can see the regular names for these. So R0 through R12, then R13 is our stack pointer, so it points to the stack. R14 is the link register. R15 is the program counter.
And so we can see all of those and the typical names that they're given. You can see the value that each has, and then you can see,
well, if that's a pointer, what does it point to? So, for example, when you start a program,
R0 is how many arguments you have, R1 is a listing of those, and then R2 is the pointer to the environment. And so R1 has gotten changed here in the last instruction that I executed.
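The register naming just listed can be kept as a small lookup table. This is only an illustrative sketch; the dictionary and function names are made up.

```python
# Special-purpose names for the last three general registers, as listed above.
ALIASES = {"r13": "sp", "r14": "lr", "r15": "pc"}


def display_name(register):
    """Return the usual name for a register, e.g. "R13" becomes "sp"."""
    name = register.lower()
    return ALIASES.get(name, name)


for reg in ("R0", "R12", "R13", "R14", "R15"):
    print(reg, "->", display_name(reg))
```

R0 through R12 keep their plain names, while the last three are shown as sp, lr and pc.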
You can also see exactly where you are as far as the stack. So we have a pointer to the stack
and then our disassembly showing which line is the next line that's going to get executed. And again it'll show things like program counter and then #20. So it means plus 20
from that location is where it's loading that data from.
And so you can see all of that in the disassembly, just as we had before.
Some additional commands. So if you are interested in virtual memory, you can use vmmap to see what memory addresses are mapped to our binary.
For what libraries are loaded, you can use xfiles, and then there's also a process memory map, and that's info proc map.
No spaces in between. And so those are some commands you can run. Once you've started a process that's running,
you can get information about the virtual memory map and the loaded files in the process memory.
Additionally, we can do the disas of a function, and it will show us all of the assembly for a particular
function that we might have defined or that you might want to look at.
And then if you're starting off and you have a C program and you want to figure out exactly how it works, well, you can generate a listing for that C program.
And so basically, this will show you
the high-level code and then what low-level code gets created.
And you can see, for example, we've got some locations here that are defined, and those are probably not as useful. But
you can see commands like, you know, push these registers and then move this register into this register,
and eventually a branch with link for printf, right? It ends up getting called because we're printing off something, and then these registers get set:
R0, R1 and R2.
And so the benefit of this is that it allows you to, instead of trying to reinvent the wheel of how do I call printf or how do I call this, if you know how to do it in a high-level language, reverse engineer it and figure out how you would do it in raw assembly.
And so it kind of gives you a little link between what would the high-level code be and what would some low-level code be.
All right, so today in our lecture, we talked about memory offsets and then some debugging and disassembly commands.
In the future, we're going to talk about saving registers and ARM array indexing.
So why can't ARM load a 32-bit address?
Because there's no room in a four-byte opcode to load a four-byte memory address.
If you have questions, you can email me, millermj at unk.edu, and you can find me on Twitter at milhouse30.