Overview

Introduction

The Design the Data Warehouse module provides you with the instructions and devices to develop hands on skills in the following topics:

  • Building an Enterprise Data Warehouse Model Using Snowflake Schema with All Required Columns
  • Adding a Data Lineage for a Dimension Table
  • Working with Fact Tables
  • Adding Data Lineage for a Fact Table
  • Working with Column Store Index
  • Partitioning the fact table
  • Implementing an incremental data load

Exercise 1 - Building an Enterprise Data Warehouse Model Using Snowflake Schema

In this exercise, you will analyse the snowflake schema available in the sample AdventureWorksDW2012 database by creating a multidimensional cube. A schema is called a snowflake schema if one or more dimension tables do not join directly to the fact table but join through other dimension tables. They are normalised and hence support ease of dimension maintenance.

A Data Warehouse:

  • Collects, consolidates and organises historical data for reporting and analysis purpose.
  • Supports business decisions and serve users.
  • Uses tools such as online analytical processing (OLAP) and data mining.
  • Is built on relational database technology.
  • Is based on a dimensional model.

Exercise 2 - Adding a Data Lineage for a Dimension Table

The data warehousing model follows the dimensional model which is very asymmetric. The dimensional model contains one large table in the centre of the schema which is connected to other tables. The central table is called the fact table and the other tables are the dimension tables.

A dimension table is a logical grouping of data. A dimension table has a primary key column known as the surrogate key to uniquely identify each record it contains. The surrogate key is a system-generated key. You can associate the dimension table to the fact table using the surrogate key. Surrogate keys provide the means to maintain data warehouse information when dimensions change.

In data warehousing, it is very important to view the origin of data right from the start of the design phase till the end of the implementation phase. Viewing of data helps in analysis of sources and target mapping. Data lineage reporting helps in various data warehouse maintenance tasks. The lineage information includes key pieces of table metadata. It includes the source systems of the data and how the data was transformed and manipulated.

Exercise 3 - Working with Fact Tables

A fact table:

  • Is the central table in a star schema of a data warehouse
  • Stores quantitative information for analysis
  • Is often demoralized
  • Joins with dimension tables
  • Holds the data to be analyzed
  • Consists of two types of columns

You can physically partition very huge fact tables for implementation and maintenance design considerations. In this exercise, you will learn to work with fact tables.

Exercise 4 - Adding Data Lineage for a Fact Table

You learnt how to add data lineage for dimension tables. Similarly, you can add data for fact tables. In this exercise, you will learn to add data lineage for the created fact table.

Exercise 5 - Working with Column Store Index

A columnstore index is a technology to store, retrieve and manage data by using a columnar data format, called a columnstore. A column store is data that is logically organized as a table with rows and columns and physically stored in a column-wise data format. A row store is data that is logically organized as a table with rows and columns and then physically stored in a row-wise data format. A row group is a group of rows that are compressed into column store format at the same time. A column segment is a column of data from within the rowgroup.

In this exercise, you will learn to work with column store index.

Exercise 6 - Partitioning the Fact Table

Partitioning makes the large tables and indexes more manageable and efficient to use. In this exercise, you will partition a fact table in AdventureWorksDW2012 database using file groups.

Exercise 7 - Implement an Incremental Data Load

Incremental data loads are the regular data loads into the data warehouse. Only new or updated data are loaded in this load. In this exercise, you will implement an incremental data load using SSIS.

Comprehensive Learning

See the full benefits of our immersive learning experience with interactive courses and guided career paths.