Hello. This is Dr Miller, and this is Episode 12.1 of Assembly.
Today we're going to talk about vector registers, and then NEON
and its vector registers.
So the vector registers and the vector floating-point coprocessor allow us to do floating-point operations. Traditionally it was a separate coprocessor, so it performed the floating-point operations that the regular processor didn't have the ability to do,
and then going forward, this has been deprecated in favor of the NEON architecture, which uses larger registers.
And then there's a difference between scalar and vector. So when you're talking about the VFP, we have different registers. For example, we have the floating-point status and control register, FPSCR, which has information about the different flags.
We also have additional registers, so we have ones that start with S and then some register number. Those are 32 bits.
The D ones start with D and have 64 bits, and the Q ones have 128 bits.
And then we have a set of what are called scalar registers that can do scalar operations. So that's s0 through s7
and then d0 through d3, and these end up representing the same sets of bits, and we'll see that in a diagram here in a little bit.
We also have vector registers.
So those are s8 through s31, d4 through d31, and then q0 through q15.
And then when you're doing vector operations using the VFP, you have this wrap-around if the stride and length bits are set, and so that allows it to do wrapping operations. This is in older versions of ARM, and it's deprecated.
So, for example, you'll see some of these, but they won't work on devices like a Raspberry Pi, which I've been using in this class.
So here's a typical layout of the scalar and vector registers, and this will be the same for NEON as we go forward.
So we have registers s0 and s1, and those composed together would be register d0.
So it's similar to x86, where we have smaller registers which then look like they're composed as a larger register.
And then, for example, q0 is all of these registers put together. So it's all of d0 and d1, which in turn is s0, s1, s2 and s3.
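The overlap between the S, D and Q names can be sketched in Python. This is only a model, not ARM code: the `regfile` buffer and the `write_s`, `read_s`, `read_d` and `read_q` helpers are made-up names for illustration, and it assumes little-endian layout with s0 in the low half of d0.

```python
import struct

# Hypothetical model: one flat 64-byte buffer backs s0-s15, d0-d7 and q0-q3.
regfile = bytearray(64)

def write_s(n, value):
    """Store a 32-bit unsigned value in Sn (little-endian)."""
    struct.pack_into("<I", regfile, 4 * n, value)

def read_s(n):
    """Read Sn back as a 32-bit unsigned value."""
    return struct.unpack_from("<I", regfile, 4 * n)[0]

def read_d(n):
    """Dn is the 8 bytes that cover S(2n) and S(2n+1)."""
    return struct.unpack_from("<Q", regfile, 8 * n)[0]

def read_q(n):
    """Qn is the 16 bytes that cover D(2n) and D(2n+1)."""
    return int.from_bytes(regfile[16 * n:16 * n + 16], "little")

write_s(0, 0x11111111)
write_s(1, 0x22222222)
write_s(2, 0x33333333)
write_s(3, 0x44444444)

print(hex(read_d(0)))  # 0x2222222211111111 (s1 is the high half, s0 the low half)
print(hex(read_d(1)))  # 0x4444444433333333
print(hex(read_q(0)))  # 0x44444444333333332222222211111111
```

Writing only the S registers changes what d0, d1 and q0 read back, which is the point of the diagram: they are different names for the same bits.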
Same thing for each one of these Q registers. And then you have the notion that we have these different scalar operations we can do, and we can take a scalar and multiply it by a vector. And so these, in that parlance, are the vector registers and can be used for vector operations.
When we get to NEON, we will not have as many of these restrictions, and so
you'll just be able to use any of the registers for that. And so that was a change in the architecture.
But here, this just shows d0 through d15, q0 through q7, or registers s0 through s31. So we have 32 S registers,
we have 16 D registers, and then we have eight Q registers in this diagram.
But we have the ability to do debugging,
and so you can do info float, and that will show the status and control register. It also will show registers s0 through s31.
And so, if you're looking at it and you want to see the floating-point or integer value in s0 to s31, this is a good way to do it. You can also do info vector, which will print all of the registers, which can be a little bit overwhelming.
But then also in GDB, you can just print a single register, so you can say print $s0, $d0, $q0, or whatever number you want to print.
So that's probably the easiest way to look at those registers and make sure that they have the values that you think they should have.
So NEON is ARM's Advanced SIMD, or single instruction, multiple data, for ARM. And so they took the VFP, and they basically scaled it up and allowed it to do more things. So in floating point,
with the VFP, we could just do floating-point operations. But with NEON, we can also
do operations on integers.
It also has some ability to do some parallel processing. So when we looked at the VFP,
it only had the ability to basically go off and run in a separate mode and then come back with the result, whereas with NEON, we can just put all the instructions together.
And then NEON started with the Cortex-A8 and some of the newer ones,
but you can add it to sort of any of the newer types of architectures that ARM is producing.
And then NEON will allow us to do either 64- or 128-bit SIMD
operations. And you can use those for things like GPS or MP3 decoding.
And so when we talk about SIMD, we will look at the types of operations we can do and sort of get an appreciation for the power that it has.
So here are some example NEON instructions.
So the V stands for vector, and that's what
almost all the NEON instructions will have in front of them.
So we have things like absolute value. We can add some registers, we can do a comparison. We can do a duplication, so you can copy a piece of data to all lanes of the vector.
We can load a value. We can figure out what the mins and the maxes of different numbers are. We can do movement and multiplication.
And so these are some examples. You can go to the developer pages and see the full specifications of all of these, and more that I didn't include.
So, the vector load. So here's an example of an instruction. So we've got the vector load register, so VLDR.
We'll talk about conditional programming later,
and then we can basically give it a data type. So the data type is
one of these. So we've got I and then a number, or S, or U, or F. So I stands for integer, S stands for signed, U stands for unsigned, F stands for floating point.
And then these can be either 8, 16, 32 or 64. So I can load an integer that's 8 bits, or 16, or 32, or 64,
and it could be signed or it could be unsigned. And then we've got the two floating points. So we've got either 32-bit floating point or 64-bit floating point.
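To make the data types concrete, here is a small Python sketch of how the same 64 bits read differently depending on the type suffix. The byte values in `raw` and the `lanes` helper are made up for illustration, and little-endian layout is assumed.

```python
import struct

# Eight bytes as they might sit in a 64-bit D register (made-up values).
raw = bytes([0x00, 0x00, 0x00, 0xC0, 0x00, 0x00, 0x80, 0x3F])

def lanes(fmt_char, data):
    """Split data into little-endian lanes of the given struct type."""
    size = struct.calcsize("<" + fmt_char)
    count = len(data) // size
    return list(struct.unpack("<" + fmt_char * count, data))

u8 = lanes("B", raw)   # eight unsigned 8-bit lanes, like .u8
s8 = lanes("b", raw)   # eight signed 8-bit lanes, like .s8
u16 = lanes("H", raw)  # four unsigned 16-bit lanes, like .u16
f32 = lanes("f", raw)  # two 32-bit floating-point lanes, like .f32

print(u8)   # [0, 0, 0, 192, 0, 0, 128, 63]
print(s8)   # [0, 0, 0, -64, 0, 0, -128, 63]
print(u16)  # [0, 49152, 0, 16256]
print(f32)  # [-2.0, 1.0]
```

The bits never change. The suffix only tells the instruction how many lanes there are and how to interpret each one.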
And then here we can just load a direct constant in for that,
and then we can either load it into a D register or an S register. So remember, D is going to be
64 bits and S is going to be 32 bits.
We can also do a vector move, so we can do
either move, or move and invert. So this is a move, VMOV, and this is move and invert, VMVN, and then again, we can have conditional execution. We can add any of these data types, so typically it's going to load them in a similar way, so you can load integers or floating point,
and then you can either load it into a quad register,
which is 128 bits, or a D register, which is 64.
So here are some examples. We can do a regular move into register r9 of the value 0xff,
and then we can copy from r9 into s3. So we have to use the VMOV operation in order to copy data into that register.
You could load an immediate into a Q register, right? So it's going to put the number zero in that Q register.
Remember again that for these, we can only do so many bits, because each of these instructions takes up
some number of bits, and
all the instructions in ARM are four bytes. And so
there are some limitations to what you can do with some of these VMOV operations.
But you can do things like using the vector load. So, for example, I can load the address of a
variable into r0, and then I can use this
LDRD, so that is double wide.
And so what it's going to do is, if I give it the register r2, it's going to load the data at the address in r0 into r2 and r3, because I added this D. So I basically fill two registers with 64 bits: 32 in this register, 32 in this register.
And then if I do a VMOV, I can copy basically r2 and r3 into s4, so it'll copy both of them, because it's going to read
all of the bits into there.
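The double-wide load followed by the two-register move can be modeled in Python. This is a sketch under assumptions: the 64-bit value in `memory` is made up, the instruction forms in the comments are one plausible way to write what the slide describes, and little-endian layout is assumed.

```python
import struct

# Stand-in for the 64-bit variable in memory (made-up value).
memory = struct.pack("<Q", 0x1122334455667788)

# Model of a double-wide load such as: ldrd r2, r3, [r0]
# Two consecutive 32-bit words land in two consecutive core registers.
r2, r3 = struct.unpack_from("<II", memory, 0)

# Model of a two-register move such as: vmov d2, r2, r3
# The first core register becomes the low word, the second the high word.
d2 = (r3 << 32) | r2

print(hex(r2))  # 0x55667788
print(hex(r3))  # 0x11223344
print(hex(d2))  # 0x1122334455667788
```

After both steps the 64-bit register holds the same bits the variable had in memory, split across the two 32-bit registers along the way.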
So we have additional examples, so the vector move long, VMOVL. So one of the things they have here is the ability to basically do sign extension,
so either you can do sign extension, so if it's a negative number, you keep that sign bit and you fill the upper bits with it, or zero extension, which is for unsigned integers, and they just get put into a bigger register. So they'll get twice their length and then get put into a larger register. We have move and narrow, VMOVN, so we're going the opposite way.
So you're copying the lower half of each element, and then basically some bits are going to get thrown away, so you have to understand that this N is going to stand for the narrow part.
And we also have an additional move and narrow with saturation, VQMOVN, or the very complicated VQMOVUN, which is a saturating narrow
into an unsigned result.
So you won't need these most of the time. But the ARM documentation will tell you exactly what they do, and then you can always create examples and see
if you understand how they work.
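One way to check your understanding of extension, narrowing and saturation is to model a single element in Python. The function names here (`sign_extend`, `widen`, `narrow`, `saturating_narrow`) are made up for illustration, and this is a sketch of the per-element arithmetic only, not of the real instructions.

```python
def sign_extend(value, from_bits):
    """Interpret the low from_bits of value as a two's complement number."""
    value &= (1 << from_bits) - 1
    sign = 1 << (from_bits - 1)
    return (value ^ sign) - sign

def widen(value, from_bits, signed):
    """Return the bit pattern of one element after doubling its width."""
    if signed:
        wide_mask = (1 << (2 * from_bits)) - 1
        return sign_extend(value, from_bits) & wide_mask  # copies of the sign bit on top
    return value & ((1 << from_bits) - 1)                 # zeros on top

def narrow(value, to_bits):
    """Plain narrow: keep the low to_bits and throw the rest away."""
    return value & ((1 << to_bits) - 1)

def saturating_narrow(value, to_bits, signed_result):
    """Clamp an integer to the range of the narrower result type."""
    if signed_result:
        lo, hi = -(1 << (to_bits - 1)), (1 << (to_bits - 1)) - 1
    else:
        lo, hi = 0, (1 << to_bits) - 1
    return max(lo, min(hi, value))

print(hex(widen(0xF0, 8, signed=True)))    # 0xfff0
print(hex(widen(0xF0, 8, signed=False)))   # 0xf0
print(hex(narrow(0x1234, 8)))              # 0x34
print(saturating_narrow(300, 8, True))     # 127
print(saturating_narrow(-5, 8, False))     # 0
```

The last line is the signed-input, unsigned-result case: a negative value clamps to zero instead of wrapping around.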
So here is the syntax from the ARM documentation.
So, for example, we've got VMOVL. So we're going to move and extend from a D, or 64-bit, register into a 128-bit register.
And again we've got the other ones, which I won't go over in detail here,
right? If it's got a Q in the name, that means the result is going to be saturated. We additionally have conditions.
And then we've got the destination register, and
in some of these, right, so Qd and Dm are here, these have a different destination register for the narrowing operation.
And then here we can give the types. We've got signed and unsigned integers for all of these different operations, and there are different combinations of them.
So let's look at an example here. So,
for example, we can load s0 with this number, so I've got 0x12345678. Remember, each pair of hex digits is one byte, so 78 is one byte, and we can see that down here.
I also move 0xa1b1c1d1 into s1,
and then we do a move and extend into q9 from d0. So remember, s0 and s1 are the parts of d0. And so when I move that as
8-bit integers, right, so unsigned 8 bits, we can see that when it goes here, we add some extra zeros in here, right?
So those extra zeros are making it so that we go from an 8-bit integer to a 16-bit integer. So when I look at the 16-bit integer, there's actually
a byte here of zero, right, 00, so you can see it interspersed zeros in between all of those.
And then here we have another example. So into q10 we move 16-bit
integers, so we can see here that the 78 and 56 remain the same, and then we add two pairs of zeros, and the 34 and 12 remain the same, and we add two pairs of zeros,
so that when you go from the 16 bits here, right, which is what we started with,
we have 16 bits here, and then we add another 16 bits of zeros on the top for each one.
So your 16-bit integer here became a 32-bit integer,
so the value got bigger so that we can use that. And so in these we have basically four integers that just all of a sudden went from
16 bits to 32 bits each.
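The two move-and-extend results can be reproduced with a short Python model. This is a sketch, not ARM code: `vmovl_unsigned` and `as_hex` are made-up helper names, little-endian layout is assumed, and the values are the ones from the example (0x12345678 in s0, 0xa1b1c1d1 in s1).

```python
import struct

# d0 is s0 followed by s1 in memory order (little-endian).
d0 = struct.pack("<II", 0x12345678, 0xA1B1C1D1)

def vmovl_unsigned(src, lane_bytes):
    """Zero-extend every lane of src to twice its width."""
    out = b""
    for i in range(0, len(src), lane_bytes):
        lane = int.from_bytes(src[i:i + lane_bytes], "little")
        out += lane.to_bytes(2 * lane_bytes, "little")
    return out

def as_hex(reg):
    """Show a register as one hex number, most significant byte first."""
    return format(int.from_bytes(reg, "little"), "0{}x".format(2 * len(reg)))

q9 = vmovl_unsigned(d0, 1)   # 8-bit lanes become 16-bit lanes
q10 = vmovl_unsigned(d0, 2)  # 16-bit lanes become 32-bit lanes

print(as_hex(d0))   # a1b1c1d112345678
print(as_hex(q9))   # 00a100b100c100d10012003400560078
print(as_hex(q10))  # 0000a1b10000c1d10000123400005678
```

In the first result a zero byte sits above every original byte, and in the second result two zero bytes sit above every original 16-bit value, which matches the interspersed zeros described above.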
We can also do operations like the vector multiply, so we can say VMUL, which takes Q registers or D registers in there, and so
we can have any of these data types. So we've got some integers and we've got some floating points, and so then you can do multiplication on these. So basically the elements of the vectors are multiplied together. So we have an example here.
So, for example, if I move 2 and 4 into s0 and s1,
and then 3 and 7 into s2 and s3,
when I multiply d0 and d1 together, the result is going to get stored in d1,
so we'll take the first element of d0 and multiply it by the first element of d1, because we're doing 32-bit multiplication, right? And these are 64-bit registers.
So I have 2 times 3, and that gives me 6, and then I have 7 times 4, and so that gives me 28, which in hex is 1c.
So we can kind of get an idea of the multiple processing, because I was able to just multiply
two sets of numbers together and get the result in one instruction, and we'll see more examples of that in the future.
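The lane-by-lane multiply can be modeled in Python as follows. This is a sketch only: `vmul_i32` is a made-up name, the registers are modeled as 8-byte strings in little-endian order, and each product is truncated to 32 bits, which is an assumption about how a 32-bit integer multiply treats overflow.

```python
import struct

d0 = struct.pack("<II", 2, 4)  # s0 = 2, s1 = 4
d1 = struct.pack("<II", 3, 7)  # s2 = 3, s3 = 7

def vmul_i32(a, b):
    """Multiply matching 32-bit lanes of two 64-bit registers."""
    lanes_a = struct.unpack("<II", a)
    lanes_b = struct.unpack("<II", b)
    products = [(x * y) & 0xFFFFFFFF for x, y in zip(lanes_a, lanes_b)]
    return struct.pack("<II", *products)

# The result goes back into d1, as in the example.
d1 = vmul_i32(d0, d1)
result = struct.unpack("<II", d1)

print(result)                    # (6, 28)
print([hex(x) for x in result])  # ['0x6', '0x1c']
```

Both products come out of a single call, which mirrors the one-instruction, two-multiplications idea.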
So the benefit of the vector operations is that they allow us to do wider math operations, so we have the normal 32-bit on the S registers, but it can overflow into those 64 bits,
and then we can do 64-bit operations that go into 128 bits.
And as we saw in that last example, we can do multiple operations in one instruction, which will help with the efficiency of the processor.
So today we talked about some of the vector registers, the history, and kind of how they're laid out. And then we talked about the NEON architecture and some examples of that.
So in the future, we're going to talk about some more examples of vector instructions. So we'll give an example of how to do some integer multiplication
and also then look at floating-point operations. So we didn't go into many floating-point operations today, so we'll do some examples with that,
and then we'll also look at the NEON SIMD operations.
So here's our quiz. How big are each of the vector registers?
So the S registers are 32 bits,
the D registers are 64 bits, and the Q registers are 128 bits.
If you have questions, you can email me at millermj at unk.edu, and you can find me on Twitter at milhouse30.