Hello. This is Dr Miller, and this is Episode 15.3 of Assembly.
Today we're going to talk about cross-platform AES, and then the AES intrinsics and their implementation.
And then we'll disassemble these intrinsics and look at them.
So, cross-platform AES.
So if you go out there and Google a tiny AES library, or one that is very small, you can find a cross-platform C version that somebody has written.
It's called Tiny AES in C, and I put a GitHub link here to this person's code. One of the things that they do is implement a lot of the different AES encryption routines, and it's pretty easy to go through them and read
how they work. And we've got a couple of different ones here. So, for example, there are functions to initialize the context of the AES library.
And then there are a couple of different ways of encrypting. So we have electronic codebook, or ECB, and then we have cipher block chaining, or CBC.
And if you go and read up on those, you'll learn why electronic codebook is useful and when you would want to use it, and then when cipher block chaining is useful and when you would want to use that.
And so we have both the encrypt and the decrypt functions that we can use inside of there.
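To make the difference between the two modes concrete, here is a minimal sketch of ECB and CBC encryption written against a pluggable block function. The names (block_encrypt_fn, ecb_encrypt, cbc_encrypt, toy_encrypt_block) are made up for illustration and are not the library's API, and the toy block function is only a stand-in so the sketch runs on its own. It is not AES and it is not secure.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK 16

/* Hypothetical interface: any 16-byte block cipher can be plugged in here. */
typedef void (*block_encrypt_fn)(uint8_t block[BLOCK], const void *key);

/* ECB: every block is encrypted on its own, so equal plaintext blocks
 * give equal ciphertext blocks. len is assumed to be a multiple of 16. */
static void ecb_encrypt(uint8_t *buf, size_t len,
                        block_encrypt_fn enc, const void *key) {
    for (size_t off = 0; off + BLOCK <= len; off += BLOCK)
        enc(buf + off, key);
}

/* CBC: XOR each plaintext block with the previous ciphertext block
 * (the IV for the first block) before encrypting it. */
static void cbc_encrypt(uint8_t *buf, size_t len, const uint8_t iv[BLOCK],
                        block_encrypt_fn enc, const void *key) {
    const uint8_t *prev = iv;
    for (size_t off = 0; off + BLOCK <= len; off += BLOCK) {
        for (int i = 0; i < BLOCK; i++)
            buf[off + i] ^= prev[i];
        enc(buf + off, key);
        prev = buf + off;
    }
}

/* Toy stand-in for a real block cipher. NOT AES, NOT secure. */
static void toy_encrypt_block(uint8_t block[BLOCK], const void *key) {
    const uint8_t *k = (const uint8_t *)key;
    for (int i = 0; i < BLOCK; i++)
        block[i] = (uint8_t)((block[i] ^ k[i]) + 7 * i + 1);
}
```

If you encrypt two identical plaintext blocks, ECB produces two identical ciphertext blocks, while CBC chains them so the second one comes out different.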
And so one of the things that you'll see, if you go and Google AES, is that AES uses the Rijndael cipher, which defines this S-box. And that S-box is a bunch of bytes
that are used for substitutions, and we have a forward one and we have an inverse one.
So when you go through and look at the AES code, you'll end up seeing all of these bytes, starting with 0x63. This is defining what all those S-box entries are. And so, in order to use it,
the programmer has to define all of these bytes so that the encryption, when they implement it, is going to work properly.
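Those table bytes are not arbitrary. As a rough sketch, assuming the standard AES definition (multiplicative inverse in GF(2^8) followed by an affine transform), the forward and inverse S-boxes can be computed instead of typed in. The function names here are made up for illustration, and a real library would normally just ship the precomputed table.

```c
#include <stdint.h>

/* Multiply two bytes in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t product = 0;
    while (b) {
        if (b & 1)
            product ^= a;
        uint8_t carry = a & 0x80;
        a = (uint8_t)(a << 1);
        if (carry)
            a ^= 0x1b;
        b >>= 1;
    }
    return product;
}

/* Multiplicative inverse by brute force; 0 maps to 0 by convention. */
static uint8_t gf_inv(uint8_t a) {
    if (a == 0)
        return 0;
    for (int b = 1; b < 256; b++)
        if (gf_mul(a, (uint8_t)b) == 1)
            return (uint8_t)b;
    return 0; /* not reached: every nonzero element has an inverse */
}

/* Rotate a byte left by n bits (1 <= n <= 7). */
static uint8_t rotl8(uint8_t x, int n) {
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* Fill the forward S-box and the inverse S-box. */
static void build_sboxes(uint8_t sbox[256], uint8_t inv_sbox[256]) {
    for (int i = 0; i < 256; i++) {
        uint8_t x = gf_inv((uint8_t)i);
        uint8_t s = (uint8_t)(x ^ rotl8(x, 1) ^ rotl8(x, 2) ^
                              rotl8(x, 3) ^ rotl8(x, 4) ^ 0x63);
        sbox[i] = s;
        inv_sbox[s] = (uint8_t)i;
    }
}
```

The first entry comes out as 0x63, which is the first byte you see in the table in the source code.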
And so, for example, we'll see here another function, MixColumns, which we've talked about a little bit before. What it's going to do is mix these columns based on how AES says that they're supposed to
be mixed. And so we've got a bunch of different indexes, and you've got to make sure that you get those right for the different columns inside of the state matrix.
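As a sketch of what that mixing looks like in portable C, here is one way to write MixColumns. It assumes a 16-byte state stored column by column (state[4*c + r]), and the names xtime and mix_columns are chosen for illustration rather than copied from the library.

```c
#include <stdint.h>

/* Multiply by 2 in GF(2^8) with the AES reduction polynomial. */
static uint8_t xtime(uint8_t x) {
    return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1b : 0x00));
}

/* Mix each column of the state. Each output byte is a fixed combination
 * of the four input bytes with coefficients 2, 3, 1, 1 (rotated per row).
 * Multiplying by 3 is written as xtime(a) ^ a. */
static void mix_columns(uint8_t state[16]) {
    for (int c = 0; c < 4; c++) {
        uint8_t *col = state + 4 * c;
        uint8_t a0 = col[0], a1 = col[1], a2 = col[2], a3 = col[3];
        col[0] = (uint8_t)(xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3);
        col[1] = (uint8_t)(a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3);
        col[2] = (uint8_t)(a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3));
        col[3] = (uint8_t)((xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3));
    }
}
```

Getting one index wrong in any of those four lines silently produces the wrong ciphertext, which is why the indexing matters so much.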
Another example is ShiftRows, again in aes.c, and you can go and look at the source code. But this is shifting those rows into different locations,
and we can see that,
again, it involves quite a bit of code in comparison to what we'll look at in the next one.
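A compact sketch of ShiftRows, assuming the same column-by-column 16-byte state layout (state[4*c + r]), could look like this. The function name is illustrative, and the library's own version may be written out with explicit temporaries instead of a loop.

```c
#include <stdint.h>
#include <string.h>

/* Rotate row r of the state left by r positions.
 * Row 0 stays put, row 1 moves by one column, and so on. */
static void shift_rows(uint8_t state[16]) {
    uint8_t old[16];
    memcpy(old, state, sizeof old);
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            state[4 * c + r] = old[4 * ((c + r) % 4) + r];
}
```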
The one benefit of the previous one is that it's cross-platform. So if your architecture doesn't support these AES intrinsics, then you can use it.
But if you do have hardware support, then you can use the AES intrinsic implementation, and there are a couple of different versions of it. So this is just one of the implementations that I found.
Again, there's some code on GitHub, and it supports the initialization, or key expansion, and it supports the encrypt and decrypt. And this is just a 128-bit AES implementation.
Again, there are additional variants that can be implemented, and those are implemented in that cross-platform one.
So a couple of things. We see the wmmintrin.h include. That's going to give us a lot of different memory access operations
that we might want to use, as well as some other raw assembly instructions.
And so when you're looking in here and you see a type that is not your normal type, that's because they're using the wmmintrin.h
extension in order to implement those AES intrinsics.
So here again, we're going to see that these are basically raw assembly commands. It's going to take the C code and generate as close to assembly as it can. It takes arguments, and then it implements that very directly in our assembly code, and we'll see how that one-to-one correspondence works.
But here we have some macros. So this is the key round, and this is for encryption.
And so we can see the key round is actually this #define here, and what #define means is that it's basically going to replace this code with all of this code. And so, for every key round, we end up taking the key at i and we have our round constant. And then we do that
key-gen assist. And then there are some shuffles, some XORs,
and then we do one more XOR at the end. And so, each one of these rounds is going to expand to one of these.
And then this down here is for the decryption, so it actually sets up both of them at the same time.
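Each expansion of the key-round macro is handed its own round constant. As a small sketch, assuming the standard AES-128 schedule of ten round constants, they can be generated by repeated doubling in GF(2^8). The function name is made up for illustration.

```c
#include <stdint.h>

/* Fill rcon with the ten AES-128 round constants.
 * Each one is the previous one doubled in GF(2^8), reducing with 0x1b
 * whenever the top bit falls off. */
static void aes_round_constants(uint8_t rcon[10]) {
    uint8_t value = 0x01;
    for (int round = 0; round < 10; round++) {
        rcon[round] = value;
        value = (uint8_t)((value << 1) ^ ((value & 0x80) ? 0x1b : 0x00));
    }
}
```

That gives 0x01, 0x02, 0x04, 0x08 and so on, with the last two wrapping around to 0x1b and 0x36.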
So when we get to the encrypt function, again, this is in C. But this is the _mm_aesenc_si128
encrypt. We see a couple of loads and an XOR, and then we have our encrypts for all of the rounds. And then we have the AES encrypt last at the very end. And so this is the very raw implementation of that, using those AES intrinsics. If you notice, we're not going to see
that S-box. So that is sort of built into the processor.
So it knows what that substitution box is when it's going to go and do encryption.
And it's just implemented as one instruction instead of the whole bunch of instructions that we saw inside of the C version.
Decryption is going to be similar. So, again, this is where it sets up the inverse key for decryption. In this person's implementation, we have it all at once. So we generate the
key expansion, and then we do the inverse. So that's that inverse MixColumns, the IMC, that's in there. That's going to get created for us directly in assembly. And then it's able to use the decrypt with the key at these different positions. Again, we can see where those get loaded,
and then the decrypt last uses K0, so that
initial one that gets set is used on the last one. And so, again, with the intrinsics, we're able to use those to very directly access the hardware in order to do these operations, and you'll see that a lot with
very specific extensions that people want to use inside of C in order to speed up their code.
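To tie the pieces together, here is a minimal sketch of AES-128 single-block encryption with the intrinsics: a key-round macro built on key-gen assist, shuffle and XORs, then a load, an XOR, nine encrypt rounds and an encrypt last. This is not the code from the GitHub repository shown in the lecture. The names expand_step, KEY_ROUND, encrypt_block_aesni and aes128_encrypt_block are made up, and the target attribute and __builtin_cpu_supports check assume GCC or Clang on x86.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <wmmintrin.h> /* AES-NI intrinsics; also pulls in the SSE2 ones */

/* Assumption: GCC/Clang. Enables AES-NI for these functions without -maes. */
#define AESNI_FN __attribute__((target("aes,sse2")))

/* Fold one key-gen-assist result into the previous round key. */
static AESNI_FN __m128i expand_step(__m128i key, __m128i assist) {
    assist = _mm_shuffle_epi32(assist, _MM_SHUFFLE(3, 3, 3, 3));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    return _mm_xor_si128(key, assist);
}

/* The round constant must be a compile-time immediate, hence a macro. */
#define KEY_ROUND(k, rcon) \
    expand_step((k), _mm_aeskeygenassist_si128((k), (rcon)))

static AESNI_FN void encrypt_block_aesni(const uint8_t key[16],
                                         const uint8_t in[16],
                                         uint8_t out[16]) {
    __m128i k[11];
    k[0] = _mm_loadu_si128((const __m128i *)key);
    k[1] = KEY_ROUND(k[0], 0x01);
    k[2] = KEY_ROUND(k[1], 0x02);
    k[3] = KEY_ROUND(k[2], 0x04);
    k[4] = KEY_ROUND(k[3], 0x08);
    k[5] = KEY_ROUND(k[4], 0x10);
    k[6] = KEY_ROUND(k[5], 0x20);
    k[7] = KEY_ROUND(k[6], 0x40);
    k[8] = KEY_ROUND(k[7], 0x80);
    k[9] = KEY_ROUND(k[8], 0x1b);
    k[10] = KEY_ROUND(k[9], 0x36);

    __m128i m = _mm_loadu_si128((const __m128i *)in);
    m = _mm_xor_si128(m, k[0]);            /* initial AddRoundKey */
    for (int i = 1; i < 10; i++)
        m = _mm_aesenc_si128(m, k[i]);     /* one full round per instruction */
    m = _mm_aesenclast_si128(m, k[10]);    /* final round, no MixColumns */
    _mm_storeu_si128((__m128i *)out, m);
}
#endif

/* Returns 1 and fills out if AES-NI was usable, 0 otherwise
 * (in which case a portable C version would be the fallback). */
static int aes128_encrypt_block(const uint8_t key[16], const uint8_t in[16],
                                uint8_t out[16]) {
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("aes")) {
        encrypt_block_aesni(key, in, out);
        return 1;
    }
#endif
    (void)key; (void)in; (void)out;
    return 0;
}
```

Notice there is no S-box table and no MixColumns or ShiftRows code anywhere in this sketch. Each round is a single intrinsic call.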
And so now we can take that AES library and we can basically disassemble it, so we can see what the assembly looks like.
So, for example, we can see that each one of these operations ends up with basically a one-to-one correspondence.
For example, the AES key-gen assist with the number one, that is this substitution right here. And it's got our round constant, which is 1, 2, 4, 8.
We can see all of those on there, and then we can see that we have the shuffle operation, and then we can see that we have the XOR operations, and there are a couple of loads inside of there. And so again, it's a very direct correspondence. If you were to take
the C version and disassemble it, clearly, it's not going to be able to use the AES instructions, and additionally, it's going to do all of that math for encryption in software instead of inside of the hardware. And so,
at the end of the day, it's going to take up more memory and it's going to be slower.
And here we can see the encryption. So again, we've got some loads and an XOR, and then we've got our encrypt. We've got the inverse MixColumns in there, so we've got a couple of extra operations, but it's directly corresponding. Our AES encrypt is going to directly correspond to that,
and we can see our XMM registers in there, and the compiler is managing all of those for us.
But if you're disassembling some code and you see vaesenc or vaesdec, you know that they're going to use AES for their encryption mechanism.
So today we talked about a cross-platform AES, and then compared that with the implementation that can leverage the AES intrinsics. And then we disassembled it to see that there was a direct correspondence between the intrinsics defined in that wmmintrin.h library and
the assembly code that gets generated for that.
So what are the benefits of using AES-NI?
We have reduced code size, and we'll have faster execution.
If you have questions, you can email me at millermj@unk.edu, and you can find me on Twitter at @Milhouse30.