Hello. This is Dr Miller, and this is Episode 15.2 of Assembly.
Today, we're gonna talk about the SSE and MMX extensions and then the AES-NI extension.
So SSE and MMX.
So MMX generally is thought to stand for multimedia extension.
There's a little controversy over what it actually stands for, but that's the easiest way to think about it. So in 1997 Intel added these multimedia instructions. And so MMX defined eight registers that could be used, MM0 through MM7.
And these are 64 bits wide, and that's on a 32-bit processor.
And so then the data can be packed into different sizes, so you could have a bunch of bytes. You could have a bunch of words.
You could have a bunch of double words that all fit inside one register, and then you can perform multiple operations on that piece of data such that you can do multimedia things, things with pixels and
that type of manipulation.
But what they did is they reused the x87 registers, so the floating-point registers, and so that meant that you couldn't easily mix MMX and floating-point operations.
So then what they did is they created the SSE instructions. So this is Streaming SIMD Extensions, or SSE.
And so what that added was SIMD, which is single instruction, multiple data.
So again we can have one instruction that operates on multiple pieces of data and then hopefully that can be more efficient than having a lot of instructions go over that.
And so they extended the Intel x86 architecture in order to do this, and then they again use this in multimedia. So what this did is it supplanted those MMX instructions. And we'll see why in a second, and that's because they did address those floating-point operation issues.
So SSE1 came out in '99, so not too much later. And again, it added some additional registers. So now we have 128-bit registers,
XMM0 through XMM7. And again these could be packed. So, for example, we could have four integers
in each one of these XMM registers, and so that allows us to very efficiently load data from RAM into that register and then do operations that are SIMD-applicable to that.
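As a rough illustration of the packed idea just described, here is a minimal C sketch that adds four 32-bit integers at once. It assumes the SSE2 intrinsics from `<emmintrin.h>` when the compiler enables them and falls back to a plain loop otherwise. The function name `add4_i32` is made up for this example and is not from the lecture.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics (integer operations on XMM registers) */
#endif

/* Add four pairs of 32-bit integers lane by lane: out[i] = a[i] + b[i].
 * Hypothetical helper for illustration only. */
void add4_i32(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
#if defined(__SSE2__)
    /* Load 128 bits (four packed integers) from memory into each register. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* One instruction (PADDD) performs all four additions. */
    __m128i vsum = _mm_add_epi32(va, vb);
    _mm_storeu_si128((__m128i *)out, vsum);
#else
    /* Scalar fallback with the same wrap-around behavior. */
    for (int i = 0; i < 4; i++)
        out[i] = (int32_t)((uint32_t)a[i] + (uint32_t)b[i]);
#endif
}
```

Calling `add4_i32` with `{1, 2, 3, 4}` and `{10, 20, 30, 40}` fills `out` with `{11, 22, 33, 44}`.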
And so additionally, Intel and AMD sort of were in a battle at this point, and they have been for quite a long time.
And so AMD added XMM8 through XMM15. And so then when Intel came out with their x64 architecture, then they, of course, added those as well, so we have 16 registers that can be used for these SSE instructions.
In 2001, SSE2 came out, and this allowed them to use those MMX instructions with the XMM registers. And so this meant that you could basically convert all of the MMX code into SSE code,
and then use the XMM registers instead of the floating-point registers. And so this sort of
made it so that we didn't need MMX anymore. We had the SSE2 that anybody could use for libraries.
They also added SSE3 and 4. So 3 was in 2004.
And with that we can operate on the data within the same register. And so, for example, we could add all of the numbers stored in one register together at the same time. And so again, it provided sort of better flow for our assembly code. And then SSE4 came out in 2006
and it added a bunch of string and text functions. It added some math operations, like sums of absolute differences and dot product. But again, they keep adding to it to make it better so that you could do more with the instructions instead of trying to use a lot of instructions in order to do the same thing. And so
that's why they added a bunch of these different extensions.
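To make the SSE3 point about adding numbers within one register concrete, here is a small C sketch that sums the four floats held in a single XMM register. It assumes the SSE3 intrinsic `_mm_hadd_ps` from `<pmmintrin.h>` when SSE3 is enabled at compile time and uses an equivalent scalar sum otherwise. The name `hsum4` is an assumption for this example.

```c
#if defined(__SSE3__)
#include <pmmintrin.h>  /* SSE3 intrinsics, including horizontal add */
#endif

/* Sum the four floats in v. Hypothetical helper for illustration only. */
float hsum4(const float v[4])
{
#if defined(__SSE3__)
    __m128 x = _mm_loadu_ps(v);
    x = _mm_hadd_ps(x, x);  /* lanes: v0+v1, v2+v3, v0+v1, v2+v3 */
    x = _mm_hadd_ps(x, x);  /* every lane now holds the full sum */
    return _mm_cvtss_f32(x);
#else
    /* Scalar fallback that adds in the same pairwise order. */
    return (v[0] + v[1]) + (v[2] + v[3]);
#endif
}
```

For the input `{1, 2, 3, 4}` both paths return 10.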
And then, if you want to be able to use these intrinsic functions within your code, or the registers, you can basically include different library headers. And so we have again MMX, SSE, SSE2, SSE3,
and then different versions of SSE4 as they came out, and then we have AES.
So AES-NI. So this is
the Advanced Encryption Standard, or AES, and then NI, very cleverly named, new instructions.
And so this provides specific instructions that can be used for the AES encryption standard. So when we created a standard, the Advanced Encryption Standard,
then anybody could use that. And so this was the standard used by everybody. And so it made sense for Intel to implement these as instructions instead of people trying to rewrite the code from scratch. And so Intel has really nice documentation
that describes all of these instructions and gives examples of how they can be used.
So I'll just explain a little bit about the AES algorithm. So there are a couple different parts that we have for this. So the first is going to be key expansion. So we take the key and we end up expanding it, or running it through a bunch of operations, in order to mix all the bits for when we use the encryption.
The different key sizes determine the number of rounds that we end up using. So we keep doing
this process for a certain number of rounds. And so for doing 128-bit encryption, it's 10 rounds; for doing 192, it's 12; and if we're doing 256, it's 14.
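The key-size-to-rounds mapping above is small enough to write down directly. This is a minimal C sketch; the function name `aes_rounds` and the use of -1 for unsupported sizes are choices made for this example.

```c
/* Number of AES rounds for a given key size in bits.
 * Returns -1 for a size AES does not define. Hypothetical helper. */
int aes_rounds(int key_bits)
{
    switch (key_bits) {
    case 128: return 10;
    case 192: return 12;
    case 256: return 14;
    default:  return -1;  /* not an AES key size */
    }
}
```

The same numbers also follow from taking the key length in 32-bit words and adding 6 (4 + 6, 6 + 6, 8 + 6).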
And then when we do encryption, we basically add the round key. And then we do these things of substitute bytes, shift rows, and mix columns. And then we end up repeating that multiple times. And then at the end, we do one final round
that doesn't do the mix columns. So you see it substitutes bytes, it shifts rows, and then it adds a round key. And so it doesn't do that mix columns. So the last step is just a little bit different than all the rest.
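Two of the steps named above are simple enough to model in a few lines of portable C. This sketch is a software model for illustration, not the AES-NI instructions themselves. It assumes the usual AES layout, where the 16-byte state is stored column by column, so the byte at row r, column c sits at index r + 4*c. The function names are made up for this example, and substitute bytes and mix columns are left out to keep it short.

```c
#include <stdint.h>
#include <string.h>

/* Shift rows: row r of the 4x4 state is rotated left by r positions.
 * State layout assumption: byte (row r, column c) is s[r + 4*c]. */
void shift_rows(uint8_t s[16])
{
    uint8_t t[16];
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            t[r + 4 * c] = s[r + 4 * ((c + r) % 4)];
    memcpy(s, t, 16);
}

/* Add round key: XOR the 16-byte round key into the state. */
void add_round_key(uint8_t s[16], const uint8_t rk[16])
{
    for (int i = 0; i < 16; i++)
        s[i] ^= rk[i];
}
```

With a state holding the bytes 0 through 15 in order, `shift_rows` leaves the first column as 0, 5, 10, 15.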
So again, I pulled these out of the white paper that Intel produced, so they have different operations. So one is AESKEYGENASSIST. And so this is helping us to generate an encryption key. And we can see here that this takes basically two registers and then the round constant, and we can see the round constants in here.
And so AES very specifically defines what those round constants are. So we've got 1, 2, 4, 8, 10, 20, 40, 80, 1B, and 36, all in hex.
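Those ten round constants are not arbitrary: each one is the previous one doubled in the AES finite field GF(2^8), which is why the sequence jumps from 80 to 1B. Here is a minimal C sketch that regenerates the list. The function name `aes_round_constants` is made up for this example.

```c
#include <stdint.h>

/* Fill rcon[0..9] with the AES round constants used for key expansion.
 * Hypothetical helper for illustration only. */
void aes_round_constants(uint8_t rcon[10])
{
    uint8_t x = 0x01;
    for (int i = 0; i < 10; i++) {
        rcon[i] = x;
        /* Double x in GF(2^8): shift left, and if the top bit was set,
         * reduce by the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B). */
        x = (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1B : 0x00));
    }
}
```

The array comes out as 01, 02, 04, 08, 10, 20, 40, 80, 1B, 36 in hex, matching the list read out above.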
And so we can see that they end up calling these on the registers. We've got a couple of different registers. We've got our
result register, and then our 128-bit input. And so we use those in order to generate
the key, or do the key expansion from that.
We also have AESENC, or AES encrypt.
And so it's going to perform one round of AES encryption, so XMM1 is our input data, and XMM2 is a round key. And what this does is it shifts rows, substitutes bytes, mixes columns, and then takes XMM1 and XORs it with
XMM2, and XMM2 is our round key. So it's moving the data around and then doing an XOR at the very end.
We also have AES encrypt last, and so again, in that very last step, we don't mix columns, right? And so it basically does all those things that the encrypt does,
but it doesn't do a mix columns right here. It just does the XOR at the very end.
We also have the similar operations for decryption. So we've got AESDEC, or decrypt,
and again we have our input data and we have a round key. And so this is gonna provide that inverse operation. So we're going to decrypt the data.
And so here it's invert shift rows, invert substitute bytes, invert mix columns,
and then we do an XOR of the round key with the data, and again we should end up getting the result that we had before.
But just as the encrypt does, decrypt has a last operation. So again, that last operation doesn't mix columns, just like we did before.
So again, it does an invert shift rows, invert substitute bytes, and then does an XOR.
And then, because of the keys, there is an invert mix columns that we'd end up doing,
but that ends up being operated on the key in the initial step and not inside of the decryption. And so they have this AESIMC, or invert mix columns,
that will be used after we expand our key. We use this in order to properly prepare the inverted key for it.
So today we talked about the SSE and MMX extensions and then the AES-NI extension.
So, looking forward, we're going to look at some AES libraries that you can use, and either you can use them inside of C code, or you can use the intrinsics and generate raw assembly code for that.
So what are some of the AES-specific instructions?
Well, here's the whole list of them. So we've got key gen assist, encrypt, encrypt last, decrypt, decrypt last, and invert mix columns.
If you have questions, you can email me Miller. I'm Jay. You and Kate. I e d u and you can find me on Twitter at Milhouse 30.