Data Warehouse Fundamentals and Architecture

Difference Between Data Warehouse and Data Mart

BasisData WarehouseData Mart
ScopeEntire organizationSpecific department
SizeLargeSmall
DataAll business dataSubject-specific data
UsersWhole organizationDepartment users
CostHighLow
TimeMore implementation timeLess implementation time
ComplexityComplexSimple

Key Takeaways

  • Data Warehouse: Stores organization-wide data.
  • Data Mart: Stores department-specific data.

πŸ‘‰ Warehouse = Whole company
πŸ‘‰ Mart = One department

What is a Data Warehouse?

A Data Warehouse is a centralized repository used to store integrated, historical data from multiple sources for analysis and decision-making.

Core Characteristics

  • Subject-Oriented: Organized by subjects like sales or customers.
  • Integrated: Combines data from different sources.
  • Time-Variant: Stores historical data.
  • Non-Volatile: Data remains stable once stored.

Basic Elements of a Data Warehouse

  1. Data Sources: Databases, files, and applications.
  2. ETL: Extract, Transform, and Load process.
  3. Data Warehouse Storage: Central storage of processed data.
  4. Data Marts: Department-specific subsets.
  5. Metadata: Data about data.
  6. OLAP: Tools for analysis.
  7. Front-End Tools: Reports and dashboards.

Data Warehouse Architecture

Data Warehouse Architecture is the structure that defines how data is collected, processed, stored, and accessed for analysis.

Layers of Architecture

  1. Data Sources: Data collected from databases, ERP, CRM, and files.
  2. ETL Layer: Extracts, transforms, and loads data.
  3. Data Warehouse Storage: Central repository for integrated data.
  4. Data Marts: Department-specific subsets.
  5. OLAP Layer: Used for data analysis.
  6. Front-End Tools: Reports and dashboards for users.

Architecture Types

  • Single-Tier: Simple architecture.
  • Two-Tier: Client connected to the warehouse.
  • Three-Tier: Most commonly used (Bottom, Middle, and Top layers).

Top-Down vs. Bottom-Up Approaches

Top-Down: Builds the complete Data Warehouse first, then creates data marts.
Bottom-Up: Builds Data Marts first, then integrates them into a Data Warehouse.

BasisTop-DownBottom-Up
Starts WithData WarehouseData Marts
ApproachCentralizedIncremental
TimeMoreLess
CostHighLow
ComplexityComplexSimple
FlexibilityLessMore

Difference Between OLAP and OLTP

BasisOLAPOLTP
Full FormOnline Analytical ProcessingOnline Transaction Processing
PurposeData analysisDaily transactions
DataHistorical dataCurrent/real-time data
OperationsComplex queriesInsert, Update, Delete
UsersManagers, AnalystsCustomers, Employees
SpeedSlower (analysis-heavy)Faster (transaction-heavy)
ExampleData WarehouseBanking, ATM, E-commerce

Key Components

a) Data Mart

A Data Mart is a smaller subset of a Data Warehouse designed for a specific department like Sales, HR, or Finance.

b) Data Staging

Data Staging is the temporary area where data is collected, cleaned, and transformed before loading into the Data Warehouse.

c) Metadata

Metadata is data about data. It provides information about the structure, source, and meaning of data.

Business Systems: KM, CRM, and ERP

a) KM (Knowledge Management)

Knowledge Management (KM) is the process of collecting, organizing, and sharing knowledge to improve productivity.

b) CRM (Customer Relationship Management)

CRM is a system used to manage interactions with customers and track sales.

c) ERP (Enterprise Resource Planning)

ERP is software used to integrate core business processes like finance, HR, and inventory.

Data Warehouse Lifecycle

  1. Requirement Analysis: Identify business needs.
  2. Design: Create data models and architecture.
  3. Data Extraction & ETL: Collect and clean data.
  4. Development: Build and store data.
  5. Testing: Check accuracy and performance.
  6. Deployment: Make data available to users.
  7. Maintenance: Monitor and improve the system.

Star Schema

Star Schema is a schema where one Fact Table is connected to multiple Dimension Tables.

  • Fact Table: Contains measures (sales, quantity, revenue).
  • Dimension Tables: Contain descriptive details (customer, product, time).

Data Warehousing as an Environment

Data warehousing is an environment, not just a product. It is a complete framework involving data sources, ETL processes, storage, analysis tools, and management processes working together.

Metadata Types

  1. Technical Metadata: Describes technical details like table names and data types.
  2. Business Metadata: Describes data in business terms and rules.
  3. Operational Metadata: Describes processing information like load times and error logs.