Related Technologies
Time
9 hours 59 minutes
Difficulty
Intermediate
CEU/CPE
10
Video Transcription
00:00
>> This is the last domain of the CSA guidance,
00:00
and you've done a great job.
00:00
Now there are a few more modules in the course,
00:00
but if you're a technology-oriented person,
00:00
you'll enjoy the exposure to
00:00
different technology stacks that
00:00
we discuss in this domain.
00:00
Although Cloud is an enabler for
00:00
many different applications and
00:00
larger-level technologies,
00:00
it's not the be-all, end-all of software.
00:00
The CSA recognizes that
00:00
certain technologies are tightly bound to the Cloud,
00:00
but bring their own unique
00:00
security concerns at the same time.
00:00
In this domain, Domain 14, we'll
00:00
cover the technologies that are very closely
00:00
related to Cloud but don't fit neatly into
00:00
the other 13 CSA domains that we've discussed so far.
00:00
This module focuses on four technologies, in fact,
00:00
all of which fit into one of two broad categories.
00:00
They either extensively rely on Cloud computing,
00:00
such as big data and serverless, that we'll discuss,
00:00
or they often integrate
00:00
with cloud computing on the back-end,
00:00
and we'll talk about mobile and
00:00
IoT which fall into that category.
00:00
The remainder of this video is going to be
00:00
focused on big data and serverless.
00:00
We'll jump into the 3V's of big data,
00:00
review the distributed components,
00:00
talk about specific security considerations,
00:00
and then we'll also talk about security considerations
00:00
for serverless.
00:00
Big data involves working with extremely large data sets.
00:00
We're talking at least a terabyte, if not petabytes,
00:00
which are thousands of terabytes,
00:00
or even exabytes which are millions of terabytes.
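To put those sizes side by side, here is a quick arithmetic sketch. It assumes decimal (SI) units, where each step up is a factor of 1,000; the constant and function names are made up for illustration.

```python
# Decimal (SI) storage units, expressed in bytes.
TERABYTE = 10**12
PETABYTE = 10**15
EXABYTE = 10**18


def in_terabytes(size_bytes: int) -> float:
    """Convert a size in bytes to terabytes."""
    return size_bytes / TERABYTE


print(in_terabytes(PETABYTE))  # 1000.0
print(in_terabytes(EXABYTE))   # 1000000.0
```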
00:00
The high-volume characteristic of big data
00:00
is defined by these large data sets.
00:00
High velocity describes the frequency
00:00
at which the data is generated and captured.
00:00
Big data can be used to analyze
00:00
static and historical data sets,
00:00
but it often involves not just historical data,
00:00
but incoming streams of
00:00
data providing real-time feedback.
00:00
The data itself is often coming from
00:00
several different sources and
00:00
may even be of different types.
00:00
This high variety of data can be structured,
00:00
say in a defined relational database format,
00:00
XML, JSON, or even delimited files,
00:00
or it could be semi-structured,
00:00
such as loosely typed JSON or maybe even emails,
00:00
or it can be completely unstructured,
00:00
free text, videos, images, etc.
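A small sketch may help make the three kinds of variety concrete. The sample records below are invented for illustration; the point is only how much a program can assume about each one.

```python
import csv
import io
import json

# Structured: delimited text with a fixed, known set of columns.
csv_text = "user_id,page,ms\n42,/home,120\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: JSON where fields may be missing or vary by record,
# so every lookup needs a fallback.
raw_events = [
    '{"user_id": 42, "page": "/home"}',
    '{"user_id": "43", "tags": ["new"]}',
]
events = [json.loads(raw) for raw in raw_events]
pages = [event.get("page", "unknown") for event in events]

# Unstructured: free text with no schema at all; without further
# processing, only crude analysis such as a word count is possible.
note = "Customer called about a billing issue"
word_count = len(note.split())

print(rows[0]["page"], pages, word_count)
```

Notice that the second JSON record stores `user_id` as a string and has no `page` field at all, which is the kind of looseness a big data pipeline has to tolerate.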
00:00
The volume, velocity, and variety
00:00
characteristics are referred to as the 3V's.
00:00
The characteristics of the Cloud
00:00
provide a good platform to support these.
00:00
The large pooling of resources provides
00:00
powerful storage and compute capabilities,
00:00
while the elastic nature can
00:00
scale up and down when needed.
00:00
Big data systems can be broken
00:00
into three main components.
00:00
Data gets collected, it gets stored,
00:00
and it gets processed.
00:00
Let's talk about each of those three simple phases
00:00
in a bit more detail.
00:00
First thing, you need to collect the data.
00:00
In fact, large amounts of data need to be ingested.
00:00
As discussed in the last slide,
00:00
this can be batch imports of
00:00
historical data sets or streams of live data.
00:00
This ingestion from different data sources
00:00
is handled through distributed data collection.
00:00
The incoming data can be very lightweight in nature,
00:00
say a simple click-stream
00:00
that's just a few kilobytes in size,
00:00
or it can be much heavier in size, like video streaming.
00:00
By having your data collection
00:00
distributed, you can adjust,
00:00
tune and reallocate resources as needed
00:00
based on the velocity and the volume of incoming data.
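As a rough sketch of that tuning decision, the function below estimates how many collector workers an incoming stream needs. The function name, the per-worker throughput figure, and the example rates are all assumptions chosen for illustration, not numbers from the CSA guidance.

```python
import math


def collectors_needed(events_per_sec: int,
                      avg_event_bytes: int,
                      per_worker_bytes_per_sec: int,
                      min_workers: int = 1) -> int:
    """Estimate collector workers from velocity (events/sec) and volume (bytes/event)."""
    load = events_per_sec * avg_event_bytes  # total bytes arriving per second
    return max(min_workers, math.ceil(load / per_worker_bytes_per_sec))


# Assumed capacity of a single collector: 25 MB/s.
CAPACITY = 25_000_000

# Lightweight click-stream: many events, a couple of kilobytes each.
print(collectors_needed(10_000, 2_000, CAPACITY))   # 1

# Heavy video stream: fewer events, but far larger chunks.
print(collectors_needed(300, 500_000, CAPACITY))    # 6
```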
00:00
Once you have the data coming in, you want to store it.
00:00
But the total amount of space you'd need to store
00:00
the data is just way too
00:00
big to put on a single hard drive.
00:00
In Domain 8 we went over
00:00
storage virtualization in the Cloud.
00:00
Distributed storage is similar in that it spreads data
00:00
across physical or even virtual storage sources.
00:00
This includes replication and data
00:00
striping for redundancy in case a storage node fails,
00:00
and it also optimizes to handle
00:00
massive rates of writing and reading the data.
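Here is a toy sketch of that idea: data is split into blocks, and every block is copied to more than one node, so a read still succeeds when one node is down. The node names, block size, and placement rule are assumptions for illustration; real distributed file systems are considerably more sophisticated.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical storage nodes
REPLICAS = 2  # copies kept of every block


def split_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place(block_id: str) -> list[str]:
    """Pick REPLICAS distinct nodes, starting from a hash of the block id."""
    start = int(hashlib.sha256(block_id.encode()).hexdigest(), 16) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICAS)]


def store(name: str, data: bytes, block_size: int = 4) -> dict[str, dict[str, bytes]]:
    """Return node -> {block_id: block}, simulating each node's local disk."""
    cluster: dict[str, dict[str, bytes]] = {node: {} for node in NODES}
    for index, block in enumerate(split_blocks(data, block_size)):
        block_id = f"{name}:{index}"
        for node in place(block_id):
            cluster[node][block_id] = block
    return cluster


def read(cluster: dict[str, dict[str, bytes]], name: str,
         failed: frozenset = frozenset()) -> bytes:
    """Reassemble the data from whichever nodes are still available."""
    blocks: dict[int, bytes] = {}
    for node, disk in cluster.items():
        if node in failed:
            continue
        for block_id, block in disk.items():
            prefix, index = block_id.rsplit(":", 1)
            if prefix == name:
                blocks[int(index)] = block
    return b"".join(blocks[i] for i in sorted(blocks))


payload = b"click-stream events for today"
cluster = store("events", payload)
print(read(cluster, "events", failed=frozenset({"node-b"})) == payload)  # True
```

With two copies on distinct nodes, any single node failure is survivable; losing two adjacent nodes in this simple scheme could still lose a block.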
00:00
There are also non-relational database systems
00:00
called NoSQL that can scale and fit the need.
00:00
In fact, the CSA guidance doesn't talk about this much,
00:00
but many Cloud providers themselves have PaaS services
00:00
to simplify operating
00:00
and managing this type of storage.
00:00
They're often referred to as
00:00
Data Lake and Data Lake Storage.
00:00
Finally, once you have all this data ingested and persisted,
00:00
you want to get some value out of it.
00:00
This is where the distributed
00:00
processing comes into play.
00:00
Keep in mind that the amount of data is huge.
00:00
This isn't something you can have
00:00
a single server load into memory and analyze.
00:00
The data-set analysis needs to be distributed across
00:00
numerous machines to handle
00:00
both the size and rate of change.
00:00
Algorithms and technologies that
00:00
enable this are also rapidly improving.
00:00
The CSA guidance mentions
00:00
a MapReduce algorithm and Spark technology,
00:00
but even those are getting a bit
00:00
dated at the time of this recording.
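For anyone who hasn't seen the MapReduce idea before, here is a minimal single-machine sketch of its three steps (map, shuffle, reduce) applied to a word count. In a real cluster each chunk would be mapped on a different machine; the function names and sample text are just for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: collapse each key's values into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


# Each string stands in for a chunk of data held on a different node.
chunks = ["big data big value", "data lake data"]
mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
counts = reduce_phase(shuffle(mapped))
print(counts)  # {'big': 2, 'data': 3, 'value': 1, 'lake': 1}
```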
00:00
Keep in mind the CCSK exam isn't about
00:00
specific technologies that implement these components.
00:00
It's more about understanding
00:00
the pipeline of ingesting data,
00:00
persisting data, and analyzing data.
00:00
Security and privacy are high priorities when you have
00:00
large amounts of potentially sensitive information,
00:00
and this is exactly the case with big data systems.
00:00
Let's cover some of the specific
00:00
security considerations for big data.
00:00
As data moves through the pipeline of collect, store,
00:00
then process, it will be
00:00
persisted in many different forms.
00:00
There's the massive data lake
00:00
in the middle of the pipeline.
00:00
You need to make sure that data is encrypted at rest.
00:00
At the same time, data will be temporarily
00:00
persisted during the ingestion and analysis phases.
00:00
Keep in mind the intermediary storage
00:00
used in these activities.
00:00
This includes container local storage and
00:00
data volumes attached to virtual machines.
00:00
When processing more confidential data,
00:00
you may even want to ensure that data in memory and
00:00
swap space used by virtual machines are secure.
00:00
You may recall providers give you an option to run on
00:00
isolated and even secured hardware.
00:00
This rules out the risk of
00:00
another tenant somehow getting access to
00:00
your virtual machines because
00:00
you are the only one using this hardware.
00:00
It's worth consideration when you're
00:00
analyzing data that is considered truly confidential.
00:00
Moving on to the next point,
00:00
asymmetrical encryption is used to ensure data
00:00
at rest cannot be accessed without the private key.
00:00
It can be difficult to manage all these keys.
00:00
You have the different nodes involved in
00:00
constructing and managing the data lake of information,
00:00
and you also have the intermediary nodes that
00:00
are involved in collecting and processing.
00:00
PaaS services can simplify this management,
00:00
but if you are managing a big data platform
00:00
that runs in an IaaS paradigm,
00:00
you will need to understand this at
00:00
an extra level of detail and
00:00
make sure the keys are managed
00:00
and distributed appropriately.
00:00
Which bleeds into the next point of
00:00
securing the big data platform you are using.
00:00
The defaults of
00:00
big data platforms like Hadoop, Hortonworks,
00:00
and so forth haven't historically put
00:00
security as a first-class concern.
00:00
The default settings should be reviewed for security.
00:00
With SaaS and PaaS services,
00:00
the provider will take care of more,
00:00
but you'll still want to understand
00:00
the specific knobs and dials you
00:00
have in the realm of
00:00
network access and securing the data,
00:00
not just encrypting the data,
00:00
but also having access management policies
00:00
set up at the platform layer to control
00:00
who and what can access
00:00
the different data-sets using
00:00
the big data application plane.
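A platform-layer access policy can be pictured as a list of explicit grants with everything else denied by default. The sketch below is a hypothetical model, not the policy format of any particular big data platform; the principal and data-set names are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """One rule: which principal may perform which actions on which data set."""
    principal: str
    dataset: str
    actions: frozenset


# Hypothetical policy: the ingest service may only write, analytics may only read.
POLICY = [
    Grant("ingest-service", "clickstream", frozenset({"write"})),
    Grant("analytics-job", "clickstream", frozenset({"read"})),
]


def is_allowed(principal: str, dataset: str, action: str, policy=POLICY) -> bool:
    """Deny by default; allow only when an explicit grant matches."""
    return any(
        grant.principal == principal
        and grant.dataset == dataset
        and action in grant.actions
        for grant in policy
    )


print(is_allowed("analytics-job", "clickstream", "read"))   # True
print(is_allowed("analytics-job", "clickstream", "write"))  # False
print(is_allowed("unknown-user", "clickstream", "read"))    # False
```

Covering both "who" (people) and "what" (services and jobs) in the same policy is the point: every caller of the data plane is a principal.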
00:00
Ultimately, you'll want to know
00:00
the platform you're working with.
00:00
Capabilities vary between applications you deploy over
00:00
IaaS or the PaaS and SaaS offerings you may select.
00:00
Artificial intelligence and machine learning tie into
00:00
big data for testing
00:00
and training decision-making algorithms.
00:00
These technologies can provide
00:00
huge value for your business,
00:00
so don't shy away from them
00:00
because you don't understand them.
00:00
But at the same time, be ready to
00:00
dive deep and make sure you understand what
00:00
needs to be done to minimize data exposure and
00:00
adhere to the applicable compliance
00:00
and privacy requirements.
00:00
In Domain 7, we looked at
00:00
the different compute categories and
00:00
examined serverless in detail.
00:00
If you need a refresher on that,
00:00
feel free to go back and review those videos.
00:00
But in short, serverless really isn't serverless,
00:00
rather the provider is abstracting
00:00
the underlying servers that are used to host the compute.
00:00
Development teams love this because it shifts
00:00
the server management burden over to the provider.
00:00
Providers like this because it gives them greater control over
00:00
the workload distribution across
00:00
the underlying physical infrastructure that they own.
00:00
As a result, there are more and more technology options
00:00
on how to realize serverless.
00:00
Functions as a service may be
00:00
proprietary to a provider such
00:00
as AWS Lambda or Azure Functions and Logic Apps,
00:00
but there are more and more agnostic
00:00
frameworks being developed.
00:00
For example, Fission, which is a serverless solution
00:00
you can run on top of a Kubernetes cluster,
00:00
giving you more control and
00:00
reducing the vendor lock-in concerns.
00:00
Now, the CSA considers serverless
00:00
more than just functions as a service.
00:00
It includes other PaaS services
00:00
such as object storage,
00:00
Cloud Load Balancers, database as a service,
00:00
machine learning, message queues, and so on.
00:00
Well, personally, I think this expansion
00:00
of the definition creates a lot of
00:00
confusion between what is
00:00
considered PaaS and how you define serverless.
00:00
Be aware of this point when you get questions
00:00
about serverless on the CCSK exam.
00:00
Whether we're talking about functions
00:00
as a service or other paradigms,
00:00
the concept of logging is going to change.
00:00
Your application, or more specifically,
00:00
the serverless functions need
00:00
to handle the logging themselves.
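To make that concrete, here is a sketch of a function that does its own structured logging, since there is no server for you to install a log agent on. The `handler(event, context)` shape is loosely modeled on common functions-as-a-service handlers; the logger name, event fields, and `log_event` helper are assumptions for illustration.

```python
import json
import logging
import sys
import time

logger = logging.getLogger("orders-fn")  # hypothetical function name
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(sys.stdout))  # platform collects stdout


def log_event(level: int, message: str, **fields) -> dict:
    """Emit one JSON log line so downstream tooling can parse and search it."""
    record = {"ts": time.time(), "message": message, **fields}
    logger.log(level, json.dumps(record))
    return record


def handler(event: dict, context=None) -> dict:
    """Hypothetical function: total an order and log each step itself."""
    request_id = event.get("request_id", "unknown")
    log_event(logging.INFO, "invocation started", request_id=request_id)
    try:
        total = sum(item["price"] * item["qty"] for item in event["items"])
    except (KeyError, TypeError) as exc:
        log_event(logging.ERROR, "bad input", request_id=request_id, error=repr(exc))
        return {"status": 400}
    log_event(logging.INFO, "invocation finished", request_id=request_id, total=total)
    return {"status": 200, "total": total}


print(handler({"request_id": "r1", "items": [{"price": 2, "qty": 3}]}))
```

Including a request identifier in every log line is what lets you stitch together one invocation's story later, during incident response.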
00:00
The access management rules need to
00:00
be defined with the platform.
00:00
When you're using PaaS offerings,
00:00
this can be done at the management plane,
00:00
but if you're using the provider agnostic platforms,
00:00
this will require coordination between
00:00
the management plane and the plane you
00:00
use to administer the serverless platform.
00:00
Once you have good logging you can identify incidents.
00:00
The subsequent incident response procedures
00:00
will also be different.
00:00
Those used in a traditional on-prem model,
00:00
and even those response methods used in an IaaS model,
00:00
will need to be changed.
00:00
As a closing point on serverless,
00:00
being that so much responsibility moves to the provider,
00:00
you'll need to pay even closer attention
00:00
to compliance levels the provider
00:00
achieves with the different serverless
00:00
offerings that they're giving you.
00:00
To close out this video,
00:00
we discussed big data,
00:00
the 3V's, distributed
00:00
components, and security considerations.
00:00
We then gave a summary of serverless,
00:00
and reviewed some of the specific
00:00
security considerations in that paradigm.