Data Warehouse Fundamentals and Architecture
Difference Between Data Warehouse and Data Mart
| Basis | Data Warehouse | Data Mart |
|---|---|---|
| Scope | Entire organization | Specific department |
| Size | Large | Small |
| Data | All business data | Subject-specific data |
| Users | Whole organization | Department users |
| Cost | High | Low |
| Time | More implementation time | Less implementation time |
| Complexity | Complex | Simple |
Key Takeaways
- Data Warehouse: Stores organization-wide data.
- Data Mart: Stores department-specific data.
π Warehouse = Whole company
π Mart = One department
What is a Data Warehouse?
A Data Warehouse is a centralized repository used to store integrated, historical data from multiple sources for analysis and decision-making.
Core Characteristics
- Subject-Oriented: Organized by subjects like sales or customers.
- Integrated: Combines data from different sources.
- Time-Variant: Stores historical data.
- Non-Volatile: Data remains stable once stored.
Basic Elements of a Data Warehouse
- Data Sources: Databases, files, and applications.
- ETL: Extract, Transform, and Load process.
- Data Warehouse Storage: Central storage of processed data.
- Data Marts: Department-specific subsets.
- Metadata: Data about data.
- OLAP: Tools for analysis.
- Front-End Tools: Reports and dashboards.
Data Warehouse Architecture
Data Warehouse Architecture is the structure that defines how data is collected, processed, stored, and accessed for analysis.
Layers of Architecture
- Data Sources: Data collected from databases, ERP, CRM, and files.
- ETL Layer: Extracts, transforms, and loads data.
- Data Warehouse Storage: Central repository for integrated data.
- Data Marts: Department-specific subsets.
- OLAP Layer: Used for data analysis.
- Front-End Tools: Reports and dashboards for users.
Architecture Types
- Single-Tier: Simple architecture.
- Two-Tier: Client connected to the warehouse.
- Three-Tier: Most commonly used (Bottom, Middle, and Top layers).
Top-Down vs. Bottom-Up Approaches
Top-Down: Builds the complete Data Warehouse first, then creates data marts.
Bottom-Up: Builds Data Marts first, then integrates them into a Data Warehouse.
| Basis | Top-Down | Bottom-Up |
|---|---|---|
| Starts With | Data Warehouse | Data Marts |
| Approach | Centralized | Incremental |
| Time | More | Less |
| Cost | High | Low |
| Complexity | Complex | Simple |
| Flexibility | Less | More |
Difference Between OLAP and OLTP
| Basis | OLAP | OLTP |
|---|---|---|
| Full Form | Online Analytical Processing | Online Transaction Processing |
| Purpose | Data analysis | Daily transactions |
| Data | Historical data | Current/real-time data |
| Operations | Complex queries | Insert, Update, Delete |
| Users | Managers, Analysts | Customers, Employees |
| Speed | Slower (analysis-heavy) | Faster (transaction-heavy) |
| Example | Data Warehouse | Banking, ATM, E-commerce |
Key Components
a) Data Mart
A Data Mart is a smaller subset of a Data Warehouse designed for a specific department like Sales, HR, or Finance.
b) Data Staging
Data Staging is the temporary area where data is collected, cleaned, and transformed before loading into the Data Warehouse.
c) Metadata
Metadata is data about data. It provides information about the structure, source, and meaning of data.
Business Systems: KM, CRM, and ERP
a) KM (Knowledge Management)
Knowledge Management (KM) is the process of collecting, organizing, and sharing knowledge to improve productivity.
b) CRM (Customer Relationship Management)
CRM is a system used to manage interactions with customers and track sales.
c) ERP (Enterprise Resource Planning)
ERP is software used to integrate core business processes like finance, HR, and inventory.
Data Warehouse Lifecycle
- Requirement Analysis: Identify business needs.
- Design: Create data models and architecture.
- Data Extraction & ETL: Collect and clean data.
- Development: Build and store data.
- Testing: Check accuracy and performance.
- Deployment: Make data available to users.
- Maintenance: Monitor and improve the system.
Star Schema
Star Schema is a schema where one Fact Table is connected to multiple Dimension Tables.
- Fact Table: Contains measures (sales, quantity, revenue).
- Dimension Tables: Contain descriptive details (customer, product, time).
Data Warehousing as an Environment
Data warehousing is an environment, not just a product. It is a complete framework involving data sources, ETL processes, storage, analysis tools, and management processes working together.
Metadata Types
- Technical Metadata: Describes technical details like table names and data types.
- Business Metadata: Describes data in business terms and rules.
- Operational Metadata: Describes processing information like load times and error logs.
