Azure Data Factory Demo


Time: 14 hours 28 minutes
Difficulty: Intermediate
CEU/CPE: 15
Video Transcription
00:00
>> Hello Cybrarians, welcome to Lesson 3.7 of Module 3 of this course, titled Azure 301: Microsoft Azure Architect Design. Here is an overview of the demo that I'll be walking you through today. In Azure, I currently have these two services running: I have an Azure Blob Storage account, with a CSV file containing certain customer information that I already uploaded to it, and I also have an Azure SQL database already created in Azure. Let me go ahead and show you these two resources, and then I'll show you the rest of what I'll be doing.
I'm currently on the Azure portal. You can see my storage account here, and that's my SQL server, which also has my SQL database. If I go ahead and refresh that, you can see the SQL database over there. If I click Connect on my SQL database and I click on Query Editor, I probably do not have permission at the network level, so let's go ahead and give myself permission at the network level. To do that, I'll need to go to my server, go to Firewalls and virtual networks, select the option to add my client IP address, and save that. That should give me network connectivity access. The other reason why I brought you to this page is to point out this configuration to you: Allow Azure services and resources to access this server. I need to have this configuration enabled on my SQL server in order for Data Factory to be able to connect to it. Let's go back to my SQL database, and I'm going to connect to that using Query Editor. Now I'm logged in to my SQL database.
If I expand that, you see that I currently have no tables, so let's go ahead and create a table. The query is going to create a simple table with these different columns. Go ahead and run the query, and that succeeded. If I refresh this, I should see my new table over here.
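The exact T-SQL isn't shown on screen, but the table needs columns matching the CSV that will be copied in later. Here is a minimal sketch, assuming a four-column customer table and using SQLite locally so the statement can be run anywhere; the table and column names are my assumptions, not from the video.

```python
import sqlite3

# Illustrative only: the demo runs T-SQL in the Azure portal's Query Editor.
# This sketch assumes a four-column Customers table matching the CSV
# (first name, last name, age, email).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Customers (
        FirstName TEXT NOT NULL,
        LastName  TEXT NOT NULL,
        Age       INTEGER,
        Email     TEXT
    )
    """
)

# Confirm the table now exists, mirroring the refresh in the Query Editor.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # ['Customers']
```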
That's good. The other thing that I'll show you is my storage account and the CSV file that I already have in it. Let's go back to my resource group and select the storage account. I have a Blob container called customer data, and within that blob container is the CSV file called customer data.csv. If I click on that and then click on Edit, you can see the information in there. It essentially just contains four comma-separated columns: first name, last name, age, and email. That's what I currently have.
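A minimal sketch of the CSV layout just described: four comma-separated columns with the first row as the header. The sample rows below are invented for illustration; only the column layout comes from the demo.

```python
import csv
import io

# Made-up rows in the same shape as the demo's customer data CSV.
sample = """first name,last name,age,email
Jane,Doe,34,jane.doe@example.com
John,Smith,29,john.smith@example.com
"""

# DictReader treats the first row as the header, like the dataset option
# shown later in the demo.
rows = list(csv.DictReader(io.StringIO(sample)))
print(len(rows))              # 2
print(rows[0]["first name"])  # Jane
```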
Let's see what I'm going to be showing you in the rest of this demo. The first thing I'll be showing you is creating an Azure Data Factory. I will then go ahead and create a pipeline within that Data Factory. Then, in my pipeline, I'll specify linked services both to my Azure Blob Storage and to my Azure SQL Database. I'll specify the dataset for the Azure Blob that points to the CSV file, and the dataset for Azure SQL that points to the Azure SQL database table that I just created. I'll then create a copy activity to link those together, to copy the data from the CSV file into my SQL database table, and of course all of that will execute within the integration runtime in Azure Data Factory. We verified the prerequisites already, so I'll go ahead and show you the creation of an Azure Data Factory service. This is a visual representation of what I'll be doing. Let's go create the service.
I'm back in the Azure portal, and what I'll do is go back over here and click on Create a resource, and I'll type in "data factory". I click on Data Factory, and I click Create. I'll specify a name for my Data Factory; I will just call it adftest1120. That's the suffix that I'm using for every resource that I'm creating for this particular lesson. I'll put this in the resource group I created earlier, and I'll specify the location as UK South. I'm not going to be using CI/CD for now, so I'm going to disable the option to use Git. I'll go ahead and click on Create. It's as simple as that to create a Data Factory. After we have created a Data Factory, we can go in and build our data pipelines. The Data Factory is now successfully created. That didn't take long to create at all; it only took a few seconds, actually. If I click on Go to resource, you can see the Azure Data Factory resource over here.
The next thing that I'll be showing you is how to create an Azure Data Factory pipeline. Here's the visual representation of what we'll be doing. Before I go ahead and create anything like the linked services, I'll first of all need to create a pipeline in Azure Data Factory. Let's go ahead and do that now. I'm back in the Azure Data Factory service in the Azure portal. What I'll click on is Author & Monitor here. To create a new pipeline, I could use the wizard option here, but what I'm going to be doing is using the proper authoring option over here, so let's click on Author on the left-hand side. I have the option to specify what I want to author. I'll go ahead and click on add new resource, and I'll click on New pipeline. Let's give the pipeline a name. I'm giving it the name customer data Azure Blob to Azure SQL, and that's all I need to do for now. I've created a pipeline and given it a name. There's no need to save anything for now, because we're still going to be working within this pipeline.
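Under the hood, Azure Data Factory stores each pipeline as a JSON document. The sketch below, written as a Python dict, shows roughly what this pipeline will look like once the copy activity is added later in the demo. The names are taken from the demo, but the exact JSON is never shown on screen, so treat the structure as an approximation.

```python
# Approximate shape of an ADF pipeline definition with one copy activity.
# The dataset reference names below match ones created later in the demo.
pipeline = {
    "name": "customer data Azure Blob to Azure SQL",
    "properties": {
        "activities": [
            {
                "name": "customer data-CSV-to-SQL",
                "type": "Copy",
                "inputs": [
                    {"referenceName": "customer data CSV",
                     "type": "DatasetReference"}
                ],
                "outputs": [
                    {"referenceName": "Azure SQL Table output",
                     "type": "DatasetReference"}
                ],
            }
        ]
    },
}

activity = pipeline["properties"]["activities"][0]
print(activity["type"])  # Copy
```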
The next thing I'll be showing you is how to create linked services for Azure Blob Storage and Azure SQL in the Data Factory. Back in my pipeline, for me to create the linked services, I go to the lower-left corner over here and I click on Connections. When I click on Connections, it opens up this Connections tab, and I have Linked services here. I'll go ahead and click on New for a new linked service, and I can look through different categories and then pick the option that I want. If I go under the Azure category, I should be able to see Azure Blob Storage. I'll go ahead and click on that, and I'll click on Continue. I'll have to give this linked service a name. I'll keep it simple; I will call it Azure Blob Storage. When it comes to the runtime that this is going to execute in, rather than creating a new one or building a custom one, I'm just going to use the default Azure Integration Runtime. It's going to ask me for the authentication method to connect to this Azure Blob Storage account. I'm going to leave that set to account key. A better way would be to use something like managed identity or even [inaudible], but for now, let's just use the account key to keep it simple. I specified that I'm not going to be integrating with Key Vault; I'm just going to be providing the connection string. For the option to use to select the account, I'll say "From Azure subscription". That's fine. I'll go ahead and select my Azure subscription, and I'll select my storage account directly from here, and that's my storage account over there. Once I've selected the storage account, I have the option to click on Test connection over here. Let's go ahead and click on Test connection. That's successful. That's great, so I'll click on Create, and that's created the linked service for Azure Blob Storage.
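When you provide the connection string directly instead of picking the account from a subscription, it takes the standard Azure Storage format: semicolon-separated key=value pairs. The account name and key below are placeholders, not real credentials.

```python
# Sketch of the account-key connection string format an Azure Blob Storage
# linked service accepts. AccountName and AccountKey are placeholders.
conn_str = (
    "DefaultEndpointsProtocol=https;"
    "AccountName=mystorageacct;"
    "AccountKey=PLACEHOLDER_BASE64_KEY;"
    "EndpointSuffix=core.windows.net"
)

# maxsplit=1 matters: real account keys are base64 and may end in '='.
parts = dict(p.split("=", 1) for p in conn_str.rstrip(";").split(";"))
print(parts["AccountName"])  # mystorageacct
```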
Let's do the same thing for Azure SQL Database. Under Linked services over here, I'll go ahead and click on New. I'll go under the Azure category and look for Azure SQL Database. Here we go: Azure SQL Database. I click on Continue, and to keep it simple again, I'll call it Azure SQL Database, leave it to run in the default Integration Runtime, and I'll be using the connection string. I'll go ahead and select my subscription, I'll select my server name, which is this, and I'll select my database name, which is that database over there. Let's go ahead and select that. For the authentication type, I'll be using SQL authentication. I'll specify my username for connecting to my SQL database, and I'll specify my password. I can then test the connection again. That's successful, so I'll apply it and click on Create. Now Azure Data Factory knows how to connect to these two data sources, Azure Blob and Azure SQL.
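Like pipelines, linked services are stored as JSON behind the portal UI. A minimal sketch of the two definitions created so far; the type names match Azure Data Factory's linked-service types, and the connection strings are placeholders.

```python
# Approximate shape of the two linked-service definitions from the demo.
linked_services = [
    {"name": "Azure Blob Storage",
     "properties": {"type": "AzureBlobStorage",
                    "typeProperties": {"connectionString": "<placeholder>"}}},
    {"name": "Azure SQL Database",
     "properties": {"type": "AzureSqlDatabase",
                    "typeProperties": {"connectionString": "<placeholder>"}}},
]

types = [ls["properties"]["type"] for ls in linked_services]
print(types)  # ['AzureBlobStorage', 'AzureSqlDatabase']
```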
The next thing that I'll show you is how to create the dataset for the CSV file in Blob storage, and for the Azure SQL database table, which will be our output. Let's create a dataset for the input and a dataset for the output. Let's do that now.
Back in the Azure portal, for me to be able to create datasets, I go back to my pipeline over here. Within my pipeline, I can click on New and then click on New dataset here. Let's click on New dataset. I'm going to be selecting a dataset for Azure Blob Storage and a dataset for Azure SQL Database as well. First, the dataset for Azure Blob Storage. Let's click on Continue, and you can see the supported format types. In our case, we'll be using CSV, so I'll go ahead and select CSV and click on Continue. I will need to specify a name for that; let's specify customer data CSV. For the linked service, I will specify the Azure Blob Storage linked service that I created earlier. This is where I have to specify the input: I can specify the container, directory, and file. The easiest way is to just click on Browse. I go into the customer data container and I can see my file there. Let's select that, so that's automatically populated. I'll leave the import schema option set to "From connection/store". That's fine. I'll go ahead and click OK. What this has done is given me the option to modify the name over here. I'll just leave this the way it is, but I'll show you a few things, actually. If I click under the Connection section, here I can test the connection for the linked service again. But here's an option to preview my data. If I click on that option, it will actually reach out and preview my data, and I can see what my data looks like here. That's great. The other thing that we can also do here, which I'll point out to you, is that if my column delimiter is not a comma but something else, I can select what I'm using as the delimiter, and I can specify other options. The other thing that I want to select is this option that says "First row as header", because the first row in my CSV file is indeed my header. Let's go ahead and select that option. That's selected. That's all good.
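To illustrate the delimiter option just mentioned: the same data could arrive semicolon-separated instead of comma-separated, and the dataset's delimiter setting is what tells the reader how to split it. The sample row is invented.

```python
import csv
import io

# Same layout as the demo's CSV, but with a semicolon delimiter and the
# first row still acting as the header.
sample = "first name;last name;age;email\nJane;Doe;34;jane@example.com\n"

rows = list(csv.DictReader(io.StringIO(sample), delimiter=";"))
print(rows[0]["age"])  # 34
```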
Let's create the dataset for the Azure SQL database table. What I'll do is go back to my pipeline and go to new datasets. This time around, I will look for a dataset for Azure SQL Database. I'll click on Continue and give it a name; I'll give it a name of Azure SQL Table output. My linked service should be Azure SQL Database, which I created earlier, and I can specify my table name here. That's the table that I created earlier, so I'll select that table. The import schema option would be "From connection/store". That's fine. I'll go ahead and click OK on this also. I can view more information there, and I can click on the Connection tab. If I preview the data, there should be nothing there, so it's currently empty.
The next thing that I'm going to show you is how to create the copy activity, which ties everything together, so to speak. What I'm going to be doing is creating this copy activity that's going to be copying information from my input dataset to my output dataset. Let's go ahead and do that. If I go back to my pipeline tab over here, on the pipeline tab I can see the different activities that are available in there, and there's a lot of them. The one I'll be using will be under the Move & transform category. I can drag this Copy data activity; that's one of the most popular activities. If I drag that to this working pane, or to this workspace, the Copy data option is going to give me the information down here. First of all, I have to give it a name. I'll give it a name of customer data-CSV-to-SQL. Let's go ahead and put that in there. I can give it a description if I want to, but the main part is that I need to specify the source. That's where I specify my source dataset, which is the CSV file. I can, again, preview the data, and I can make certain modifications there. Then I have the Sink, which is the destination. I can specify my destination dataset, which would be my Azure SQL table, so I specify that. It's going to give me some options here; it can actually create the table automatically, but I'm not going to be using that. The other important one is the mapping, because in my case, some of the headers don't map directly. If I go to the mapping and click on Import schemas, it's going to import schemas from the source and the destination so I can map them to each other. Here we go: that's the source, and that's the destination. Let's have a look at that very quickly. You can see that under the destination, I have the first name and last name columns, which are not mapped to anything. In this case, I have just "first" as the header column, so I'll select "first" to be mapped to that, and I'll select "last" to be mapped to last name. All the other headers are matched already.
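The schema mapping step above can be sketched as a simple column rename applied to each row. The exact header and column names are my assumptions based on the demo's description ("first" mapping to the first-name column, and so on).

```python
# Illustrative column mapping, like the copy activity's mapping tab:
# source CSV headers on the left, SQL table columns on the right.
mapping = {"first": "FirstName", "last": "LastName",
           "age": "Age", "email": "Email"}

source_row = {"first": "Jane", "last": "Doe",
              "age": "34", "email": "jane@example.com"}

# Rename each source column to its mapped sink column.
sink_row = {mapping[col]: value for col, value in source_row.items()}
print(sink_row["LastName"])  # Doe
```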
Once I have these, I can go ahead and test the pipeline. I always recommend testing the pipeline before we publish it. That's always a good practice, so that we're not publishing something that doesn't work. I'll go ahead and click on Debug here, and it's going to start running. We're going to be able to view the status and see what that looks like over here. By the way, while it's going through this testing, what I'll do is go back to Azure. Let's go back to my resource group, go to Azure SQL, and connect back to my database. I put in my database information, and I can see my table over here, so that's good. Let's see how the task is doing. It's succeeded now, so that looks good. Let's go ahead and verify within my SQL database here. What I'm going to be doing is selecting all from my table. I'll go ahead and run this query, and there we go: you see all this information now populated, because Data Factory has moved it across. I can actually check the count; I think I have 300 users. Let's go ahead and execute the query that counts them. If I execute that, that's 300 users already inserted into this table. That's great.
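The debug run and row-count check above can be sketched end to end locally: read CSV rows, insert them into a SQL table, then verify the count. This uses SQLite and generated rows in place of Azure Blob and Azure SQL, so it only mirrors the shape of what the copy activity does.

```python
import csv
import io
import sqlite3

# 300 made-up customer rows in the demo's CSV layout.
sample = ("first,last,age,email\n" +
          "\n".join(f"User{i},Test{i},{20 + i % 50},u{i}@example.com"
                    for i in range(300)) + "\n")
rows = list(csv.DictReader(io.StringIO(sample)))

# Stand-in for the Azure SQL table created earlier in the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers "
             "(FirstName TEXT, LastName TEXT, Age INTEGER, Email TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?, ?)",
    [(r["first"], r["last"], int(r["age"]), r["email"]) for r in rows])

# Verify the count, like the SELECT COUNT(*) query in the Query Editor.
count = conn.execute("SELECT COUNT(*) FROM Customers").fetchone()[0]
print(count)  # 300
```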
The final thing that we can do is publish this pipeline so that it's something that we can reuse. At the top here, I can click on Publish All. When I click on Publish All, it's going to deploy this pipeline to the Data Factory. What that means is that I can now schedule it. It was successfully published. I can add a trigger: I can trigger this entire workflow manually, I can trigger it on a schedule, so it's something that takes the data from time to time and inserts it from one source into another destination, or I can trigger it based on an event. That's one of the advantages of Azure Data Factory.
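A schedule trigger, for example, is itself defined as JSON that references the published pipeline. The sketch below follows the shape of Azure Data Factory's trigger definitions; the trigger name and start time are illustrative, not from the demo.

```python
# Approximate shape of an ADF schedule trigger that runs the demo's
# pipeline once a day. Names and start time are hypothetical.
trigger = {
    "name": "DailyCustomerDataCopy",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {"frequency": "Day", "interval": 1,
                           "startTime": "2020-01-01T00:00:00Z"}
        },
        "pipelines": [
            {"pipelineReference": {
                "referenceName": "customer data Azure Blob to Azure SQL",
                "type": "PipelineReference"}}
        ],
    },
}

print(trigger["properties"]["type"])  # ScheduleTrigger
```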
That brings me to the end of this particular demonstration. I hope you found it useful, and I'll see you in the next lesson.