Azure Data Factory


Time: 14 hours 28 minutes
Difficulty: Intermediate
CEU/CPE: 15
Video Transcription
Hello Cybrarians. Welcome to lesson 3.6 of Module 3 of this course, AZ-301 Microsoft Azure Architect Design. Here are the learning objectives for this video. We'll start out by giving an overview of the Azure Data Factory service. This will help you get a clear understanding of what the service is and what it does. Then we'll build on that understanding by showing a sample scenario of where Azure Data Factory can be of use. Finally, I'll cover some built-in connectors that we can use with the Azure Data Factory service.
What is Azure Data Factory? According to Microsoft, it's a cloud-based data integration service that allows us to orchestrate and automate data movement and data transformation. What that means is that this service is about automating the movement of data between different external data stores.
Here's a quick example. Let's consider a case where we have certain user information stored in Azure Blob Storage, and we want to move this information from its current location into an Azure SQL Database table. This is a use case that Azure Data Factory can help us achieve. But how are we going to achieve this with Azure Data Factory? Here's how.
The first thing that we need to do is let Azure Data Factory know the addresses of both external data stores and the keys, or the authorization, needed to access them. This is called a linked service in Azure Data Factory. In the scenario that we have on the screen, we need to create two linked services: one for Azure Storage and the other for Azure SQL. What the linked services will contain are connection strings that Data Factory can use at runtime to connect to Azure Storage and to the Azure SQL database.
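To make that concrete, here's a minimal sketch of what those two linked services could look like using the azure-mgmt-datafactory Python SDK. The subscription, resource group, factory name, and connection strings below are placeholders, not values from the course.

```python
# A minimal sketch of the two linked services, using the azure-mgmt-datafactory
# Python SDK. All names and connection strings are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureSqlDatabaseLinkedService,
    AzureStorageLinkedService,
    LinkedServiceResource,
    SecureString,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, factory = "<resource-group>", "<data-factory-name>"

# Linked service holding the connection string for the storage account.
storage_ls = LinkedServiceResource(
    properties=AzureStorageLinkedService(
        connection_string=SecureString(value="<storage-connection-string>")
    )
)
adf_client.linked_services.create_or_update(rg, factory, "StorageLinkedService", storage_ls)

# Linked service holding the connection string for the Azure SQL database.
sql_ls = LinkedServiceResource(
    properties=AzureSqlDatabaseLinkedService(
        connection_string=SecureString(value="<azure-sql-connection-string>")
    )
)
adf_client.linked_services.create_or_update(rg, factory, "SqlLinkedService", sql_ls)
```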
Then we need two datasets. A dataset is defined as a named view of data that references the data we want to use as inputs or outputs. What does that mean? Let's go back to our scenario. The Azure Blob dataset that we'll be creating is going to specify the file we want to work with: the blob folder and the object that contains our input data, which is our CSV file. The Azure SQL Database table dataset will specify the SQL table, which is the output. In other words, datasets are where we pause to say: here's the data that we want to work with as the input, and here's where the output goes, which is a SQL table.
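Continuing the same sketch, the two datasets might look like this. The folder, file, and table names are illustrative assumptions.

```python
# A sketch of the input and output datasets, continuing from the
# linked-service example above (adf_client, rg, factory defined there).
from azure.mgmt.datafactory.models import (
    AzureBlobDataset,
    AzureSqlTableDataset,
    DatasetResource,
    LinkedServiceReference,
)

# Input dataset: the blob folder and CSV file that hold the user data.
blob_ds = DatasetResource(
    properties=AzureBlobDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="StorageLinkedService"
        ),
        folder_path="input-container/users",
        file_name="users.csv",
    )
)
adf_client.datasets.create_or_update(rg, factory, "UsersBlobDataset", blob_ds)

# Output dataset: the SQL table the data will land in.
sql_ds = DatasetResource(
    properties=AzureSqlTableDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="SqlLinkedService"
        ),
        table_name="dbo.Users",
    )
)
adf_client.datasets.create_or_update(rg, factory, "UsersSqlDataset", sql_ds)
```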
After we've defined these, we need to define the job that will move the data. In Azure Data Factory, this job is referred to as an activity. The most popular activity in Azure Data Factory is the copy activity, which simply moves data from one source to a destination. An activity usually goes within a pipeline, so what that means is that we typically have multiple chained activities, and we group them together within something called a pipeline in Azure Data Factory.
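Here's the same idea as a sketch: a pipeline wrapping a single copy activity whose source is the blob dataset and whose sink is the SQL dataset. The names continue from the sketches above.

```python
# A sketch of a pipeline containing one copy activity, continuing from
# the dataset example above.
from azure.mgmt.datafactory.models import (
    BlobSource,
    CopyActivity,
    DatasetReference,
    PipelineResource,
    SqlSink,
)

copy_users = CopyActivity(
    name="CopyUsersBlobToSql",
    inputs=[DatasetReference(type="DatasetReference", reference_name="UsersBlobDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="UsersSqlDataset")],
    source=BlobSource(),  # read side: the blob store
    sink=SqlSink(),       # write side: the SQL table (the "sink")
)

pipeline = PipelineResource(activities=[copy_users])
adf_client.pipelines.create_or_update(rg, factory, "CopyUsersPipeline", pipeline)
```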
This whole process of connecting to external data stores and copying data from its source to its destination (by the way, the destination is also referred to as a sink in Azure Data Factory) is going to have to execute somewhere. It's going to have to execute in an environment. It's going to need compute to be able to perform the work of connecting to the data stores and collecting the information the datasets describe. That's where integration runtimes come in. An integration runtime in Azure Data Factory is the compute infrastructure used by Azure Data Factory to provide its data integration capabilities. In other words, an integration runtime is what provides the bridge between our activities and the linked services. Of course, all of these would be defined under the Azure Data Factory service.
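For a cloud-to-cloud copy like our scenario, the built-in Azure integration runtime is used by default. When the compute needs to reach data stores on a private network, a self-hosted integration runtime can be registered; here's a sketch of that, continuing from the examples above.

```python
# A sketch of registering a self-hosted integration runtime, continuing from
# the examples above. Only needed when activities must reach data stores that
# the default Azure integration runtime cannot, such as an on-premises network.
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeResource,
    SelfHostedIntegrationRuntime,
)

ir = IntegrationRuntimeResource(
    properties=SelfHostedIntegrationRuntime(
        description="Bridge between activities and on-premises linked services"
    )
)
adf_client.integration_runtimes.create_or_update(rg, factory, "MySelfHostedIR", ir)
```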
This slide is just to give you a little bit of an overview of some of the built-in connectors that exist in Azure Data Factory. When we're talking about being able to move data between different data sources and being able to do data transformation, this is the scope of what we're referring to. Whenever you see the light blue, that means Azure Data Factory supports read and write; in other words, it can use that data store as an input or as an output. Wherever you see the light purple, it supports that store as an input only, read-only. For example, we could move data from Azure Blob into Azure Cosmos DB, or from Azure Data Lake Gen2 into Salesforce, or move data between any of these different external data stores.
Here's some other useful information about Azure Data Factory. Azure Data Factory has two versions: version 1 and version 2. Version 2 is an improvement on version 1, with support for capabilities like pipeline runs, activity runs, and trigger runs. In other words, it's much more mature, with many more capabilities and integrations than version 1 has, so we always want to use version 2 going forward.
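As a sketch of that version 2 run model, here's how an on-demand pipeline run could be started and monitored, continuing from the pipeline example above.

```python
# A sketch of starting and polling a pipeline run, continuing from the
# pipeline example above.
import time

run = adf_client.pipelines.create_run(rg, factory, "CopyUsersPipeline", parameters={})

# Poll the run until it leaves the in-progress states.
while True:
    status = adf_client.pipeline_runs.get(rg, factory, run.run_id).status
    if status not in ("Queued", "InProgress"):
        break
    time.sleep(15)
print("Pipeline run finished with status:", status)
```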
Here are some of the other use cases of Azure Data Factory beyond what I described earlier. For example, it allows us to use dynamic data pipelines for big data workflows. That covers some of the things we talked about: data movement and data transformation as we move data between different data sources.
Here's another good use case for Azure Data Factory: we can migrate SSIS to Azure Data Factory. What that means is that the SQL Server Integration Services packages we may be running on-premises can be lifted and shifted into Azure Data Factory. That gives us a whole Platform as a Service solution, with SQL Server Integration Services running in Azure Data Factory and our databases in Azure SQL or Azure SQL Managed Instance.
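That lift-and-shift runs on an Azure-SSIS integration runtime, the managed compute that hosts the SSIS packages. Here's a rough sketch of provisioning one; the node size, node count, location, and edition are illustrative assumptions, so check the SDK documentation for your version before relying on it.

```python
# A rough sketch of provisioning an Azure-SSIS integration runtime,
# continuing from the examples above. All sizing values are assumptions.
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeComputeProperties,
    IntegrationRuntimeResource,
    IntegrationRuntimeSsisProperties,
    ManagedIntegrationRuntime,
)

ssis_ir = IntegrationRuntimeResource(
    properties=ManagedIntegrationRuntime(
        compute_properties=IntegrationRuntimeComputeProperties(
            location="eastus", node_size="Standard_D2_v3", number_of_nodes=1
        ),
        ssis_properties=IntegrationRuntimeSsisProperties(edition="Standard"),
    )
)
adf_client.integration_runtimes.create_or_update(rg, factory, "MySsisIR", ssis_ir)
```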
This brings me to the end of this lesson. Thanks very much for watching, and I'll see you in the next lesson.