Azure Cosmos DB Data Migration

Time: 14 hours 28 minutes
Difficulty: Intermediate
CEU/CPE: 15
Video Transcription
>> Hello, Cyberians. Welcome to Lesson 3.10 of Module 3 of this course, titled AZ-301: Microsoft Azure Architect Design. Here are the learning objectives in this video. We'll start out by covering the concept of partitioning as it relates to Azure Cosmos DB. We'll then proceed to cover the data migration scenarios that Cosmos DB supports for the different APIs. Let's get into this.

Let's talk about partitioning as it relates to Azure Cosmos DB. Azure Cosmos DB uses a concept called dynamic partitioning. Whenever we set up Azure Cosmos DB and create our containers, one of the important things we need to select is the partition key. To put it very simply, items with the same partition key value are kept together on the same physical node. Azure Cosmos DB does not store all of our data on a single node; to scale out and avoid hot spots, the data within a collection is spread across different physical nodes within the Azure datacenters. Which items are kept together is determined by the partition key we selected.
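
To make the "same key, same node" idea concrete, here is a minimal sketch that hashes each item's partition key value to one of a fixed number of physical partitions. It is an illustration under stated assumptions, not how the service is implemented: the SHA-256-modulo mapping, the partition count, and the function names (`physical_partition_for`, `place_items`) are all hypothetical stand-ins for Cosmos DB's internal hash-range layout.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def physical_partition_for(pk_value: str, num_physical_partitions: int) -> int:
    """Map a partition key value to a physical partition index.

    Assumption: SHA-256 modulo N stands in for the service's internal
    hash-range layout, which is not exposed to us.
    """
    digest = hashlib.sha256(pk_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_physical_partitions


def place_items(items: List[dict], key_path: str,
                num_physical_partitions: int) -> Dict[int, List[str]]:
    """Group item ids by the physical partition their partition key hashes to."""
    layout: Dict[int, List[str]] = defaultdict(list)
    for item in items:
        pk_value = str(item[key_path])
        layout[physical_partition_for(pk_value, num_physical_partitions)].append(item["id"])
    return dict(layout)


orders = [
    {"id": "o1", "customerId": "alice"},
    {"id": "o2", "customerId": "bob"},
    {"id": "o3", "customerId": "alice"},
]
# o1 and o3 share the partition key value "alice", so they always land together.
print(place_items(orders, "customerId", num_physical_partitions=4))
```

Because placement depends only on the key value, every item for one customer stays together no matter how many items are added.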

It's always a good idea to choose a partition key that has a wide range of values and access patterns. A good candidate for a partition key, for example, is a property that appears frequently in the filters of your queries. Grouping related items together on the same partition lets queries be routed efficiently to just the partition that holds the data, which makes them much faster. That's a key decision we have to make.
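
The routing benefit can be sketched in a few lines: when a query filters on the partition key, only one partition has to be searched; otherwise the query fans out to every partition. This is a toy model, assuming a hypothetical four-partition container, an illustrative SHA-256 placement, and hypothetical helper names (`build_partitions`, `run_query`); it is not the Cosmos DB query engine.

```python
import hashlib
from typing import Dict, List, Tuple

NUM_PARTITIONS = 4  # hypothetical number of physical partitions


def partition_of(pk_value: str) -> int:
    # Illustrative hash placement; not Cosmos DB's real hashing scheme.
    digest = hashlib.sha256(pk_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def build_partitions(items: List[dict], key_path: str) -> Dict[int, List[dict]]:
    """Distribute items across partitions by their partition key value."""
    parts: Dict[int, List[dict]] = {p: [] for p in range(NUM_PARTITIONS)}
    for item in items:
        parts[partition_of(str(item[key_path]))].append(item)
    return parts


def run_query(parts: Dict[int, List[dict]], key_path: str,
              filters: Dict[str, str]) -> Tuple[List[dict], int]:
    """Return (matching items, number of partitions the query touched)."""
    if key_path in filters:
        # The filter pins the partition key: route to exactly one partition.
        targets = [partition_of(str(filters[key_path]))]
    else:
        # No partition key in the filter: fan out to every partition.
        targets = list(parts)
    matches = [
        item
        for p in targets
        for item in parts[p]
        if all(item.get(k) == v for k, v in filters.items())
    ]
    return matches, len(targets)


items = [
    {"id": "1", "customerId": "alice", "status": "open"},
    {"id": "2", "customerId": "bob", "status": "open"},
    {"id": "3", "customerId": "alice", "status": "closed"},
]
parts = build_partitions(items, "customerId")
print(run_query(parts, "customerId", {"customerId": "alice"}))  # touches 1 partition
print(run_query(parts, "customerId", {"status": "open"}))       # touches all 4
```

The second query returns correct results but pays to scan every partition, which is why a frequently filtered property makes a good partition key.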

Let's talk about migrating data into Azure Cosmos DB. Here are some of the tools available to us when we're thinking about migrating data from existing NoSQL databases into Cosmos DB: the Azure Database Migration Service, the Azure Cosmos DB Data Migration Tool, AzCopy, the CQL Shell COPY command (CQL stands for Cassandra Query Language), and Spark. Which of these tools we use depends on the API that was selected when we created our Cosmos DB account and on which database we're moving from.

For example, if we're working with the MongoDB API of Cosmos DB, we can use the Azure Database Migration Service. In fact, this is the only case in which we can use the Azure Database Migration Service with Cosmos DB. You can see the supported sources on the right-hand side. For example, we can move a database from a MongoDB server, whether on-premises or on a virtual machine, straight into the MongoDB API of Azure Cosmos DB. I'll show you a demo of this.

If we're working with the SQL API, the native API of Azure Cosmos DB, formerly called DocumentDB, we can use the Azure Cosmos DB Data Migration Tool, and the supported sources are what you can see on the right-hand side. For example, we can use the Data Migration Tool to move data from JSON files, CSV files, MongoDB, Azure Table storage, Amazon DynamoDB, and the other sources you can see into the SQL API of Cosmos DB.

When it comes to the Table API, we can use the Azure Cosmos DB Data Migration Tool to migrate data from Azure Table storage into the Table API. We can also use AzCopy to copy the data, that is, to export it from Azure Table storage and import it into the Table API.

If we're referring to the Cassandra API, we can use the Cassandra Query Language (CQL) COPY command to move what exists in Cassandra workloads into the Cassandra API. But one thing I want you to be clear on is which tools are supported with which API. We can also use Spark to move existing Cassandra workloads into the Cassandra API.

Here's a visual representation of what we just covered. When it comes to the Azure Database Migration Service and Azure Cosmos DB, only the MongoDB API is supported. When it comes to the Azure Cosmos DB Data Migration Tool, only the SQL API and the Table API are supported, and you can see the different sources supported for each on the right-hand side.
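
The tool-to-API pairings above amount to a small lookup table. Here is a minimal sketch that encodes only what this lesson lists (AZ-301-era tooling, which may have changed since), with hypothetical helper names `tools_for` and `apis_supported_by`. It answers both directions of the question: which tools fit a given API, and which APIs a given tool supports.

```python
from typing import Dict, List

# Pairings exactly as summarized in this lesson; verify against current Azure
# documentation before planning a real migration.
MIGRATION_TOOLS: Dict[str, List[str]] = {
    "MongoDB API": ["Azure Database Migration Service"],
    "SQL API": ["Azure Cosmos DB Data Migration Tool"],
    "Table API": ["Azure Cosmos DB Data Migration Tool", "AzCopy"],
    "Cassandra API": ["CQL Shell COPY command", "Spark"],
}


def tools_for(api: str) -> List[str]:
    """Tools the lesson lists for migrating into the given Cosmos DB API."""
    try:
        return MIGRATION_TOOLS[api]
    except KeyError:
        raise ValueError(f"No migration tooling listed for {api!r}") from None


def apis_supported_by(tool: str) -> List[str]:
    """Reverse lookup: which Cosmos DB APIs a given tool can target."""
    return [api for api, tools in MIGRATION_TOOLS.items() if tool in tools]


print(apis_supported_by("Azure Database Migration Service"))
print(apis_supported_by("Azure Cosmos DB Data Migration Tool"))
print(tools_for("Cassandra API"))
```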

Let's talk about Azure Cosmos DB design decisions. When it comes to availability, the first thing I want to mention is the durability of this service. Data is durably committed by a quorum of replicas before a write operation is acknowledged. What that means is that when you make a write request to Azure Cosmos DB, the data is replicated across four replicas in the Azure region you're writing to, and a majority (a quorum) of those replicas must commit it before the write operation is acknowledged, which makes the data highly durable.
Of course, this changes if you're using two or more write regions: the data is then also replicated to your other write regions, which makes it even more durable.
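
The quorum rule is simple arithmetic. Here is a minimal sketch, assuming the four-replica set and majority-quorum behavior described above; the function names are hypothetical. With four replicas, a majority is three, so a write can still be acknowledged while one replica is unavailable.

```python
def majority_quorum(replica_count: int) -> int:
    """Smallest number of replicas that constitutes a majority."""
    return replica_count // 2 + 1


def is_write_acknowledged(committed_replicas: int, replica_count: int = 4) -> bool:
    """A write is acknowledged once a majority quorum of replicas has committed it."""
    return committed_replicas >= majority_quorum(replica_count)


def tolerated_replica_failures(replica_count: int = 4) -> int:
    """Replicas that can be down while writes can still reach a quorum."""
    return replica_count - majority_quorum(replica_count)


print(majority_quorum(4))             # 3
print(tolerated_replica_failures(4))  # 1
```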

When it comes to automatic online backup, which is enabled automatically for Azure Cosmos DB, a backup is taken every four hours and kept for 30 days in redundant storage. Here's one caveat: if you want to restore from a backup, you need to raise a support ticket with Microsoft, and Microsoft performs the restore for you, in most cases into another Cosmos DB account. That means time is of the essence whenever you need a restore, because you need to make sure you raise the request with Microsoft before the 30-day retention period expires for the data you want to restore.
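
The urgency follows from the schedule arithmetic. Here is a minimal sketch using the figures stated in this lesson (a backup every four hours, 30-day retention) to find the latest backup taken at or before an incident and the last moment a restore of that backup could be requested. The `first_backup` anchor time and the `restore_window` name are hypothetical; the real backup timing is controlled by the service.

```python
from datetime import datetime, timedelta
from typing import Tuple

BACKUP_INTERVAL = timedelta(hours=4)  # figure stated in this lesson
RETENTION = timedelta(days=30)        # figure stated in this lesson


def restore_window(incident: datetime, first_backup: datetime) -> Tuple[datetime, datetime]:
    """Return (latest backup at or before the incident, deadline before that backup expires)."""
    if incident < first_backup:
        raise ValueError("incident happened before the first backup")
    # Whole backup intervals elapsed since the (hypothetical) first backup.
    intervals = (incident - first_backup) // BACKUP_INTERVAL
    latest_backup = first_backup + intervals * BACKUP_INTERVAL
    return latest_backup, latest_backup + RETENTION


latest, deadline = restore_window(
    incident=datetime(2020, 1, 1, 10, 30),
    first_backup=datetime(2020, 1, 1, 0, 0),
)
print(latest, deadline)
```

In this example the usable backup is the 08:00 one, and the restore request has to be in before that backup ages out 30 days later.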

Good practice is to set up at least two regions, and preferably at least two write regions, because that also helps you in terms of automatic failover. I'll talk about that in a minute, but that's best practice right there.

If you're going to use a single write region with multiple read regions (in other words, only one write master, with the others being just read replicas), there's an option to enable automatic failover. Then, if there were a failure in the region where the write master is located, Microsoft would automatically fail over write operations to one of your read replicas, which becomes the new master, so to speak. But you have to enable that option.

If you're using multiple write regions, you do not need to enable automatic failover: if there were a failure in one of your write regions, Microsoft automatically fails over the requests to the other available write regions.
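
The two failover cases can be captured in one decision function. Here is a minimal sketch under stated assumptions: the region list is in failover-priority order with the first entry as the single write region, the region names are examples, and `write_regions_after_outage` is a hypothetical name. It models only the behavior described in this lesson, not the service's internals.

```python
from typing import List, Set


def write_regions_after_outage(
    regions: List[str],         # failover-priority order; regions[0] is the write region
    failed: Set[str],           # regions currently down
    multi_region_writes: bool,  # True if every region accepts writes
    automatic_failover: bool,   # only meaningful for a single write region
) -> List[str]:
    """Return the regions that can accept writes after an outage."""
    healthy = [r for r in regions if r not in failed]
    if multi_region_writes:
        # Every healthy region is already a write region; no failover setting needed.
        return healthy
    write_region = regions[0]
    if write_region not in failed:
        return [write_region]
    if automatic_failover and healthy:
        # Promote the highest-priority healthy read region to be the new write region.
        return [healthy[0]]
    # Without automatic failover, writes stay unavailable until the region
    # recovers or someone fails over manually.
    return []


regions = ["East US", "West Europe", "Southeast Asia"]
print(write_regions_after_outage(regions, {"East US"}, False, True))
print(write_regions_after_outage(regions, {"East US"}, False, False))
print(write_regions_after_outage(regions, {"East US"}, True, False))
```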

When it comes to scalability, Azure Cosmos DB uses something called request units (RUs), and that's what determines performance. For example, reading a single 1 KB document by its ID and partition key (a point read) costs one request unit, and other operations have their own costs. You can go up to a million request units per second per container, and up to a million request units per second per database. You can increase that if you need to, via a Microsoft support ticket.
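
Request units turn capacity planning into arithmetic. Here is a minimal sketch using the two figures from this lesson (about 1 RU per 1 KB point read, and a 1,000,000 RU/s per-container ceiling before a support ticket). The per-write cost is deliberately a caller-supplied, hypothetical input, since it depends on item size and indexing; measure it from the request charge your own writes report.

```python
import math

# Figures from this lesson.
RU_PER_1KB_POINT_READ = 1.0
MAX_RU_PER_SECOND_PER_CONTAINER = 1_000_000


def max_point_reads_per_second(provisioned_ru_per_second: float) -> int:
    """How many 1 KB point reads per second the provisioned throughput allows."""
    return math.floor(provisioned_ru_per_second / RU_PER_1KB_POINT_READ)


def required_ru_per_second(reads_per_second: float, writes_per_second: float,
                           ru_per_write: float) -> float:
    """Throughput needed for a mix of 1 KB point reads and writes.

    ru_per_write is an assumed input, not a documented constant.
    """
    return reads_per_second * RU_PER_1KB_POINT_READ + writes_per_second * ru_per_write


def needs_support_ticket(ru_per_second: float) -> bool:
    """True if the requirement exceeds the per-container ceiling cited in the lesson."""
    return ru_per_second > MAX_RU_PER_SECOND_PER_CONTAINER


need = required_ru_per_second(reads_per_second=1000, writes_per_second=100, ru_per_write=5.0)
print(need, needs_support_ticket(need))
```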

The other thing I want to point out on the screen is the maximum storage per container and the maximum storage per database: as you can see, both are unlimited, and that's also very important to note.

When it comes to monitoring Azure Cosmos DB, it's very similar to what we talked about for Azure SQL. There are three main things we're referring to. First, metrics, which is information coming from the platform about the state of the service at a point in time. Second, activity logs, which is information coming from the management layer at the subscription level for the service. Third, diagnostic logs, which are not enabled by default and which we have to enable within the service itself. Metrics and activity logs are collected automatically; there's nothing to do to enable them, and they don't cost us anything extra.

Diagnostic logs we have to enable ourselves, and when we do, we choose from three destination options. For example, we can decide to store the data in an Azure storage account, which is good for archiving use cases, and we can also define the retention period for the logs within the storage account. Then we have the Azure Event Hubs option, where we can send the data to an event hub; this is good if we're looking to build something like database telemetry or hot-path pipelines for a monitoring solution. The final option for diagnostic logs is a Log Analytics workspace, where we can send this information for analysis.
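
Enabling diagnostic logs means creating a diagnostic setting that names one or more of those three destinations. Here is a minimal sketch that builds such a request body as a Python dict, assuming the Azure Resource Manager diagnostic-settings property names (`storageAccountId`, `eventHubAuthorizationRuleId`, `eventHubName`, `workspaceId`, `logs`, `metrics`) and the Cosmos DB categories `DataPlaneRequests` and `Requests`; verify all of these against current documentation. The per-category `retentionPolicy` reflects the older schema and may be deprecated in newer API versions. The `diagnostic_settings_body` helper and the placeholder resource ID are hypothetical, and sending the request is left out.

```python
import json
from typing import Optional


def diagnostic_settings_body(
    storage_account_id: Optional[str] = None,  # archive destination
    event_hub_rule_id: Optional[str] = None,   # streaming destination
    event_hub_name: Optional[str] = None,
    workspace_id: Optional[str] = None,        # Log Analytics destination
    retention_days: int = 0,                   # used only for the storage account
) -> dict:
    """Build a diagnostic-settings body targeting any of the three destinations."""
    if not (storage_account_id or event_hub_rule_id or workspace_id):
        raise ValueError("choose at least one destination")
    use_retention = storage_account_id is not None and retention_days > 0
    retention = {"enabled": use_retention, "days": retention_days}
    props: dict = {
        "logs": [{"category": "DataPlaneRequests", "enabled": True,
                  "retentionPolicy": dict(retention)}],
        "metrics": [{"category": "Requests", "enabled": True,
                     "retentionPolicy": dict(retention)}],
    }
    if storage_account_id:
        props["storageAccountId"] = storage_account_id
    if event_hub_rule_id:
        props["eventHubAuthorizationRuleId"] = event_hub_rule_id
        if event_hub_name:
            props["eventHubName"] = event_hub_name
    if workspace_id:
        props["workspaceId"] = workspace_id
    return {"properties": props}


# Placeholder ID; substitute your own Log Analytics workspace resource ID.
body = diagnostic_settings_body(
    workspace_id="/subscriptions/<subscription-id>/resourceGroups/<rg>"
                 "/providers/Microsoft.OperationalInsights/workspaces/<workspace>",
)
print(json.dumps(body, indent=2))
```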

We can then run queries, build reports, and much more. For example, we could easily put together a query that identifies cases of database performance issues within Azure Cosmos DB using the Kusto Query Language that Log Analytics supports.
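
In Log Analytics that query would be written in Kusto. As a language-neutral illustration of the same logic, here is a minimal Python sketch over log records that have already been exported and parsed (for example from the storage-account destination). It counts throttled requests (HTTP 429, which Cosmos DB returns when a request exceeds the provisioned request units) and slow requests per operation. The field names `operation`, `status_code`, and `duration_ms`, the 100 ms threshold, and the sample records are hypothetical simplifications; real diagnostic log columns are named differently.

```python
from collections import Counter
from typing import Dict, Iterable

THROTTLED_STATUS = 429  # "request rate too large": the request exceeded provisioned RUs


def performance_issues(records: Iterable[dict], slow_ms: float = 100.0) -> Dict[str, Counter]:
    """Count throttled and slow requests per operation type.

    Field names are hypothetical; map them to the real log schema first.
    """
    throttled: Counter = Counter()
    slow: Counter = Counter()
    for record in records:
        if record["status_code"] == THROTTLED_STATUS:
            throttled[record["operation"]] += 1
        elif record["duration_ms"] > slow_ms:
            slow[record["operation"]] += 1
    return {"throttled": throttled, "slow": slow}


sample = [
    {"operation": "Query", "status_code": 429, "duration_ms": 5},
    {"operation": "Query", "status_code": 200, "duration_ms": 250},
    {"operation": "Read", "status_code": 200, "duration_ms": 8},
    {"operation": "Create", "status_code": 429, "duration_ms": 3},
]
print(performance_issues(sample))
```

A spike in the throttled counts usually points at insufficient request units or a hot partition key, which ties back to the partitioning and scalability decisions earlier in this lesson.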