Extracting and Cleaning Data Using Python Lab

Overview

Extracting and cleaning data are two components of the data wrangling process, which consists of gathering, extracting, cleaning, and storing data. Extracting data means drawing out only the data that is relevant to the fundamental question or questions an analysis is trying to answer. Cleaning data means removing data that could distort the behavior of the underlying data. Problems to look for include missing or deleted data, unexpected characters (commas, semicolons, digits, and so on), outliers, unexpected values, and inconsistent formats (for example, US versus European conventions). In this lab, we will prepare the kddcup.data.corrected dataset for analysis. First, we will use Python to separate the data by classification (normal or abnormal). Then we will use Python to clean the data by removing the flow labels and punctuation marks that may cause problems with our model. Last, we will import the data into pandas and explore how to structure, shape, and clean it using statistical Python libraries.
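The first two steps can be sketched in a few lines of standard-library Python. This is only an illustration, not the lab's own solution. It assumes that each record is a single comma-separated line whose last field is the class label followed by a period (for example "normal."). The sample rows are hypothetical and shortened to six features, and the function name split_and_clean is made up for this example.

```python
import csv
import io

# Hypothetical, shortened sample rows in the assumed layout:
# comma-separated features, class label last, label ends with a period.
SAMPLE = """\
0,tcp,http,SF,215,45076,normal.
0,icmp,ecr_i,SF,1032,0,smurf.
0,tcp,private,S0,0,0,neptune.

0,udp,domain_u,SF,44,133,normal.
"""


def split_and_clean(lines):
    """Split records into normal and abnormal lists and drop the label field."""
    normal, abnormal = [], []
    for row in csv.reader(lines):
        if not row:  # csv.reader yields an empty list for a blank line
            continue
        *features, label = row
        # Remove surrounding whitespace and the trailing period from the label.
        label = label.strip().rstrip(".")
        features = [field.strip() for field in features]
        # Anything not labeled "normal" is treated as abnormal (an assumption).
        if label == "normal":
            normal.append(features)
        else:
            abnormal.append(features)
    return normal, abnormal


normal_rows, abnormal_rows = split_and_clean(io.StringIO(SAMPLE))
print(len(normal_rows), "normal records,", len(abnormal_rows), "abnormal records")
```

For the real file, the same function could be given an open file handle in place of the io.StringIO object. The resulting lists of features could then be written back out with csv.writer or loaded into a pandas DataFrame for the final step.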

Learning Partner
Infosec Learning
Infosec Learning provides businesses, colleges, governments, and K-12 school districts with a feature-rich information technology training and skill assessment service. The service is delivered through an advanced, cloud-based, virtual-machine-powered platform that is capable of significant customization, with unlimited scale and growth potential.