9 hours 41 minutes
I welcome the module. Six. This is less than 6.2 aggregation functions Now. Aggregation functions play a big role in writing advanced queries and sequel.
They perform calculations on a set of values. They come. They are commonly used with group by statements or having clauses, and we'll take a look at those and the upcoming lesson.
Most common use involves various types of reports, financial reports, user statistics. Any time you want to show data aggregated together in some way, you'll typically turn to the aggregation functions.
Something to consider when working with aggregate functions is that they ignore no values. And this is except the count function, which will not ignore the no value now. With that said, Let's go to our development environment and take a look at using some of these aggregate functions.
I welcome back to the development, Vayrynen said. I decided to take a quick look at the my sequel Documentation. Look at their aggregate group by function descriptions. We could see we have quite a few available. We'll be focusing on the average and the count for this lesson
Now, with that said, let's go ahead to and head over to our database and take a look at a nd beaver. And let's answer the question of how many students per school do we have
now. There is a special consideration that we need to take care of because it wasn't accounted for in our database Cedar and we're just going to make do with what we have. So on the user types table, we should have an entry for each school that we have.
So we have three schools, I believe so. There should be a total of six entries on this table
to represent teachers and student types at every school. Now, toe just saw this very quickly. We're gonna pretend that this school I d. Isn't even there that we only have these 12345 columns so that the teacher and student type is shared
between all schools.
And you do see that concept in databases at times as well. You will see some tables that are shared between all the companies.
And I said, let's start programming.
So we want to get the number of students per school. So how would we do that? Well, our user type is indicated on the pivot table So on the pivot table, which is user courses, were indicating
whether or were indicating what role a user plays someone just go ahead and start this query. So select star
You wanna alias that with a You see,
I would drop down to the next line,
and I'm gonna join user type
the user types table because that's where the types indicated. I'm gonna alias that with the u t gonna say on you see dot
use a type I d
equals ut that i d
u t dot title
And let's say we count this like this.
Count this by putting the column. Actually, we don't need to call me just put star in there
And we wanted to where
u t dot title is equal to student
and we want to group by because we need to use that with our arrogant functions.
If we don't use group by, this is what kind of results will get back.
It just counted everything.
So screw bidi titled
and run this
Oh, you know, Let's take this out.
There we go.
So we got 200 for the teacher, 200 for the student
and we're joined on the user type I D was Type I D.
Let's go in and change this around a little bit and see if we can change it to something else. So let's say
you see that
user, I d
on that one.
And of course, we need to put that in the group by Klaus
Okay, so you can see the user. Id's eight is enrolled in 10 courses,
so he's playing some kind of role in 10 courses.
So we go back to what we had.
This is most definitely wrong, and the reason it's wrong is that because we put the indicator of what role the user is playing on the pivot table, that user is going to show up on as a counted result more than once if their student
in more than one course. For example, if we have user I d. Three
and three different courses there, count is going to return three,
and we don't want that because we want to know how many students we have. We don't want to know how many students we have multiplied by how many courses they're taking,
so to get that
we actually have to change this query a little bit.
First we need to get the students that we have. So say you see
that user I d
on. Let's say where
is equal to student.
So we run this
on. We need to change the group by statement. Of course.
This is all the I ds off people that are playing a student role in a course.
So it is possible we could have someone who is a student and a teacher,
happens at a lot of schools as well.
So we have this query which exposes who is a student. But we want to count that. How would we do that?
Well, an easy way to do that. Which is to make this a derive table put in some brackets
alias. This was students and then select
But actually, we need to ask something in here. We need to add this school I d
you see that school idea?
And then, of course, we need a group by that too. Now we're spreading the user type across the whole database.
So that is why we're still using the school. I d,
um, against the user I d
we select this,
we see the school I d. That each user idea belongs to
now we're still gonna use this as a derive table. We're gonna alias that as students,
we're gonna select the school I d.
And we're gonna count Star, which is just miscount everything.
And we're gonna group by school. I d.
So this is two layers of aggregation.
We have the first drive table from that sub query that's using aggregate function. And then we're also aggregating outside that as well.
What? We do this
when you say from
Don't forget those keywords
I think I need to make sure to indicate
the table name I'm using.
Oh, silly mistake. Forgot the key word by
sure you get there, you gotta get your syntax right. Otherwise, it just doesn't work.
Okay, so and school i d two or school i d to whichever school that is. We know they have 56 students at school one. We know they have 42 students and at school three, we know what they have 47 students. So we have answered the question of how many students we have at each school. We use that by grouping by
the school idea in the user i d
and creating a drive table
and then counting those results. And we used to levels of aggregation on the derive table,
which I'm highlighting right now with the group by the school idea in the user I d to remove the duplicates from the pivot table.
We then use that as a derived query sub query. Alias did its students. Then we grouped by the school i D. And we counted the results, and that gave us the number of students per school.
So let's try another thing or another idea that will be a little bit easier. Let's get thestreet Dent's with their average, uh, average score.
So what do we need to do that so Well say selects star. So I think we're gonna need the users table
when alias that I got from the key word from users
gonna alias that as a you
I'm gonna join
the user courses.
Actually, we're just getting the students with their grades so we don't even need this pavement this pivot table. Just join grades
now a list that with a G
g dot user i d
equals you that i d
and g dot school I d
equals you that school i d
And then we want to slug. Let's get their name.
It's a first name.
And then let's just average
They're great. Score grey dot
And then we need to remember to a group by
We're only using their first name.
We had other columns that we were grouping by. We, of course, would want to include that in the group by statement as well.
you know what else we need to do? We need to include the school idea. Because
this, as written by itself, we would have accidentally aggregated the same person if they happen to have the same first name, which would definitely happen at a large school. We would have averaged their scores together. We don't want to do that. So
let's just do you that
do you that school idea to
Well, you will make sure we get this right.
Let's run that.
I've been feeling some numbers changed,
and here we have We have all the students at the respective school with their average score,
and that is done.
So the 2nd 1 was a little bit easier than the 1st 1 we did, because we didn't have to do two layers of aggregation like we did when we were counting it.
So let's go ahead and head over to this summary
brings us to the 6.2 summary and this lesson we discuss Abby, it function scope in purpose. On we looked at some simple usages, such as student count and students with average G p A.
And that completes this lesson. I hope to see you in the next thank you.
The Microsoft Azure Test is a premium Cybrary assessment created by Interview Mocha. The exam ...
AWS Data Engineering
The AWS Data Engineering test is a premium Cybrary assessment test created by Interview Mocha. ...