This is the last domain of the CSA guidance, and you've done a great job. Now, there are a few more modules in the course, but if you're a technology-oriented person, you'll enjoy the exposure to different technology stacks that we discuss in this domain.
Although cloud is an enabler for many different applications and higher-level technologies, it's not the be-all and end-all of software.
The CSA recognizes that certain technologies are tightly bound to the cloud but bring their own unique security concerns at the same time.
In this domain, Domain 14, we'll cover the technologies that are very closely related to cloud but don't fit neatly into the other 13 CSA domains that we've discussed so far.
This module focuses on four technologies, in fact, all of which fit into one of two broad categories. They either extensively rely on cloud computing, such as big data and serverless, which we'll discuss, or they often integrate with cloud computing on the back end, and we'll talk about mobile and IoT, which fall into that category.
The remainder of this video is going to be focused on big data and serverless. We'll jump into the three V's of big data, review the distributed components, talk about specific security considerations, and then we'll also talk about security considerations for serverless.
Big data involves working with extremely large data sets. We're talking at least a terabyte, if not petabytes, which are thousands of terabytes, or even exabytes, which are millions of terabytes.
The high-volume characteristic of big data is defined by these large data sets. High velocity describes the frequency at which the data is generated and captured.
Big data can be used to analyze static and historical data sets, but it often involves not just historical data but incoming streams of data providing real time feedback.
The data itself is often coming from several different sources and may even be of different types. This high variety of data can be structured, say in a defined relational database format, XML, JSON, or even delimited files. Or it could be semi-structured, such as loosely typed JSON or maybe even emails, or it can be completely unstructured: free text, videos, images, et cetera.
The volume, velocity, and variety characteristics are referred to as the three V's, and the characteristics of the cloud provide a good platform to support these. The large pooling of resources provides powerful storage and compute capabilities, while the elastic nature can scale up and down when needed.
Big data systems can be broken into three main components. Data gets collected, it gets stored, and it gets processed. Let's talk about each of those three simple phases in a bit more detail.
First, you need to collect the data. In fact, large amounts of data need to be ingested, as discussed in the last slide. This could be batch imports of historical data sets or streams of live data.
This ingestion from different data sources is handled through distributed data collection. Incoming data can be very lightweight in nature, say a simple clickstream that's just a few kilobytes in size. Or it could be much heavier in size, like video streaming.
And by having your data collection distributed, you can adjust, tune, and reallocate resources as needed, based on the velocity and the volume of incoming data. Once you have the data coming in, you want to store it. But the total amount of space you'd need to store the data is just way too big to put on a single hard drive.
In Domain 8, we went over storage virtualization in the cloud. Distributed storage is similar in that it spreads data across physical or even virtual storage resources. This includes replication and data striping for redundancy in case a storage node fails. And it is also optimized to handle massive rates of writing and reading the data.
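To make the striping and replication idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under simple assumptions, not how any particular storage product works: the function names, the node names, the round-robin placement, and the tiny chunk size are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Placement = Dict[str, List[Tuple[int, bytes]]]


def stripe_with_replication(
    data: bytes, nodes: List[str], chunk_size: int = 4, replicas: int = 2
) -> Tuple[Placement, int]:
    """Split data into chunks and store each chunk on `replicas` distinct nodes.

    Returns the per-node placement and the total number of chunks.
    Placement is plain round-robin, which is an assumption made for brevity.
    """
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement: Placement = {node: [] for node in nodes}
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for index, chunk in enumerate(chunks):
        for r in range(replicas):
            node = nodes[(index + r) % len(nodes)]
            placement[node].append((index, chunk))
    return placement, len(chunks)


def read_back(
    placement: Placement, total_chunks: int, failed: FrozenSet[str] = frozenset()
) -> bytes:
    """Reassemble the data using only the nodes that have not failed."""
    recovered: Dict[int, bytes] = {}
    for node, stored in placement.items():
        if node in failed:
            continue
        for index, chunk in stored:
            recovered.setdefault(index, chunk)
    missing = [i for i in range(total_chunks) if i not in recovered]
    if missing:
        raise IOError(f"chunks lost with no surviving replica: {missing}")
    return b"".join(recovered[i] for i in range(total_chunks))


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]
    placement, total = stripe_with_replication(b"big data in the cloud", nodes)
    # One node can fail and every chunk still has a surviving copy.
    print(read_back(placement, total, failed=frozenset({"node-b"})))
```

With two replicas per chunk, losing any single node still leaves one copy of every chunk, while losing two nodes at once can lose data. That trade-off between redundancy and storage cost is the part worth remembering.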
There are also non-relational database systems, called NoSQL, that can scale and fit the need. In fact, the CSA guidance doesn't talk about this much, but many cloud providers themselves have PaaS services to simplify your operations in managing this type of storage. They're often referred to as data lakes and data lake storage.
Finally, you have all this data ingested and persisted. You want to get some value out of it. This is where distributed processing comes into play. Keep in mind that the amount of data is huge. This isn't something you can have a single server load into memory and analyze. The data set analysis needs to be distributed across numerous machines to handle both the size and the rate of change.
The algorithms and technologies that enable this are also rapidly improving.
The CSA guidance mentions the MapReduce algorithm and Spark technology, but even those are getting a bit dated at the time of this recording. Keep in mind the CCSK exam isn't about specific technologies that implement these components. It's more about understanding the pipeline of ingesting data, persisting data, and analyzing data.
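If you've never seen MapReduce, a small word-count sketch in Python shows the shape of it. This runs in a single process and only mimics the map, shuffle, and reduce steps; a real framework would run the map and reduce work on many machines. The function names and the sample partitions are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(record: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key so each key is reduced once."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def map_reduce(partitions: List[List[str]]) -> Dict[str, int]:
    """Run map over every partition, then shuffle and reduce the results.

    Each inner list stands in for the slice of data one worker would hold.
    """
    mapped = [
        pair
        for partition in partitions
        for record in partition
        for pair in map_phase(record)
    ]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    partitions = [["cloud big data", "big data"], ["Cloud security"]]
    print(map_reduce(partitions))
```

The point of the pattern is that each map call only needs its own record and each reduce call only needs the values for its own key, which is what lets the work spread across many machines.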
Security and privacy are high priorities when you have large amounts of potentially sensitive information, and this is exactly the case with big data systems. Let's cover some of the specific security considerations for big data.
As data moves through the pipeline of collect, store, then process, it will be persisted in many different forms. There is the massive data lake in the middle of the pipeline, and you need to make sure that data is encrypted at rest. At the same time, data will be temporarily persisted during the ingestion and analysis phases.
Keep in mind the intermediary stores used in these activities. This includes container-local storage and data volumes attached to virtual machines. When processing more confidential data, you may even want to ensure that data in memory and swap space used by virtual machines is secure.
You may recall providers give you the option to run on isolated and even secured hardware. This rules out the risk of another tenant somehow getting access to your virtual machines, because you are the only one using this hardware, and it's worth consideration when you're analyzing data that is considered truly confidential.
Moving on to the next point: asymmetric encryption is used to ensure data at rest cannot be accessed without the private key. It can be difficult to manage all these keys.
You have the different nodes involved in constructing and managing the data lake of information, and you also have the intermediary nodes that are involved in collecting and processing.
PaaS services can simplify this management. But if you are managing a big data platform that runs in an IaaS paradigm, you'll need to understand this at an extra level of detail and make sure the keys are managed and distributed appropriately,
which bleeds into the next point of securing the big data platform you are using. The defaults of big data platforms like Hadoop, Hortonworks, and so forth haven't historically put security as a first-class concern. The default settings should be reviewed for security. With SaaS and PaaS services, the provider will take care of more, but you'll still want to understand the specific knobs and dials you have in the realm of network access and securing the data: not just encrypting the data, but also having access management policies set up at the platform layer to control who and what can access the different data sets using the big data application plane. Ultimately, you'll want to know the platform you're working with. Capabilities vary between applications you deploy over IaaS and the PaaS and SaaS offerings you may select.
Artificial intelligence and machine learning tie into big data for testing and training decision-making algorithms.
These technologies can provide huge value for your business, so don't shy away from them because you don't understand them. At the same time, be ready to dive deep and make sure you understand what needs to be done to minimize data exposure and adhere to the applicable compliance and privacy requirements.
In Domain 7, we looked at the different compute categories and examined serverless in detail. If you need a refresher on that, feel free to go back and review those videos. But in short, serverless really isn't server-less. Rather, the provider is abstracting the underlying servers that are used to host the compute.
Development teams love this because it shifts the server management burden over to the provider.
Providers like this because they have greater control over the workload distribution across the underlying physical infrastructure that they own. As a result, there are more and more technology options for how to realize serverless. Functions as a service may be proprietary to a provider, such as AWS Lambda or Azure Functions and Logic Apps.
But there are more and more agnostic frameworks being developed, for example Fission, which is a serverless solution you can run on top of a Kubernetes cluster, giving you more control and reducing the vendor lock-in concerns.
Now, the CSA considers serverless more than just functions as a service. It includes other PaaS services such as object storage, cloud load balancers, database as a service, machine learning, message queues, and so on. Personally, I think this expansion of the definition creates a lot of confusion between what is considered PaaS and how you define serverless, but be aware of this point when you get questions about serverless on the CCSK exam.
Whether we're talking about functions as a service or other paradigms, the concept of logging is going to change. Your application, or more specifically the serverless functions, need to handle the logging themselves. The access management rules need to be defined with the platform.
When you're using PaaS offerings, this can be done at the management plane. But if you're using the provider-agnostic platforms, this will require coordination between the management plane and the plane you use to administer the serverless platform.
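On the point about serverless functions handling their own logging, here is a rough sketch in Python of what that can look like. The handler takes an event and a context argument, which loosely mirrors the common function-as-a-service style, but the event fields, the logger name, and the JSON log layout are all assumptions for illustration and not any provider's required format.

```python
import json
import logging
import sys
import time


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON line so a log service can parse it."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Extra structured fields are passed via `extra={"fields": {...}}`.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


# There is no server to log into, so the function writes to stdout and
# relies on the platform to collect whatever it prints.
logger = logging.getLogger("orders_function")  # hypothetical function name
logger.setLevel(logging.INFO)
_stdout_handler = logging.StreamHandler(sys.stdout)
_stdout_handler.setFormatter(JsonFormatter())
logger.addHandler(_stdout_handler)
logger.propagate = False


def handler(event: dict, context: object = None) -> dict:
    """Hypothetical function entry point that logs every invocation."""
    request_id = event.get("request_id", "unknown")
    logger.info("invocation started", extra={"fields": {"request_id": request_id}})
    try:
        total = sum(item["price"] * item["quantity"] for item in event["items"])
    except (KeyError, TypeError) as exc:
        logger.error(
            "invalid event",
            extra={"fields": {"request_id": request_id, "error": repr(exc)}},
        )
        return {"status": 400}
    logger.info(
        "invocation finished",
        extra={"fields": {"request_id": request_id, "total": total}},
    )
    return {"status": 200, "total": total}


if __name__ == "__main__":
    handler({"request_id": "abc-123", "items": [{"price": 2, "quantity": 3}]})
```

Carrying a request identifier on every line is a design choice that helps later, because it lets you stitch together the activity of short-lived function instances when you investigate an incident.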
Once you have good logging, you can identify incidents. The subsequent incident response procedures will also be different. Those used in a traditional on-prem model, and even those response measures used in an IaaS model, will need to be changed. As a closing point on serverless: being that so much responsibility moves to the provider, you need to pay even closer attention to the compliance levels the provider achieves with the different serverless offerings that they're giving you.
To close out this video: we discussed big data, the three V's, the distributed components, and security considerations. We then gave a summary of serverless and reviewed some of the specific security considerations in that paradigm.