Data Warehousing

Overview

Teaching: 45 min
Exercises: 15 min
Questions
  • What is a datawarehouse?

  • How can I leverage relational databases for reporting and BI?

  • How can I use python to populate a datawarehouse for Tableau?

Objectives
  • Model real world situations with a schema

  • Identify facts and dimensions within a datawarehouse.

We saw in the previous lesson that a database can be used to store structured, tabular data. The collection of tables, their structures and constraints is sometimes referred to as a database schema.

What is a Data Warehouse?

Design Overview: Facts and Dimensions

A datawarehouse is made up of two types of tables. Facts represent a single measurement. A measurement is simply a value without a context. In datawarehousing, these are referred to as “Facts”.

*Facts* refer to individual measurements without a context.
*Dimensions* refer to pieces of context that give meaning to a *Fact*.

ETL : Extract Transform Load

Extract Transform Load refers to the process of taking many sources of data, and loading them to a common database. A single ETL Process will take a source and load it into one or more targets.

Data Integration and Aggregation

After loading data, it may be important to derive or aggregate additional results. This is known as data integration.

Reporting and BI

Tools such as BI are meant to read data from a datawarehouse and create visualization.

Creating an ETL Process

Break out gapminder dataset into two tables: GDP by year, and a country/continent table.

Key Points

  • Data warehouses are designed to facilitate reporting.

  • Business intelligence tools are built to use data warehouse.

  • A data warehouse is built by an ETL process, and a data integration process.