Azure Storage and Data Analytics Essentials
Azure Storage Solutions
1. Azure Blob vs. File Storage
| Feature | Azure Blob Storage | Azure File Storage |
|---|---|---|
| Data Type | Unstructured objects (blobs) | Files in shared directories (SMB or NFS protocol) |
| Use Case | Images, videos, backups, logs | Shared folders, lift-and-shift migrations |
| Access | REST APIs and SDKs over HTTP/HTTPS | Mounted as a network drive; also REST APIs |
| Scalability | High; suited to cloud-native applications | High; suited to shared file systems |
2. Azure Blob Storage Access Tiers
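- Hot Tier: Frequently accessed data; highest storage cost, lowest access cost.
- Cool Tier: Infrequently accessed data kept for at least 30 days (earlier deletion incurs a charge); lower storage cost, higher access cost.
- Archive Tier: Rarely accessed data such as long-term backups, kept for at least 180 days; lowest storage cost, highest access cost. Data is offline and must be rehydrated before it can be read, which can take hours.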
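The trade-off between storage cost and access cost can be made concrete with a small calculation. The sketch below is illustrative only: the `Tier` class, the `cheapest_tier` helper, and every price are hypothetical placeholders, not Azure's actual rates, and the model ignores early-deletion charges and archive rehydration time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    storage_per_gb_month: float  # hypothetical price per GB stored per month
    read_per_gb: float           # hypothetical price per GB read


# Hypothetical prices chosen only to show the shape of the trade-off.
TIERS = [
    Tier("hot", 0.020, 0.000),
    Tier("cool", 0.010, 0.010),
    Tier("archive", 0.002, 0.050),
]


def monthly_cost(tier, stored_gb, read_gb):
    """Storage cost plus access cost for one month."""
    return tier.storage_per_gb_month * stored_gb + tier.read_per_gb * read_gb


def cheapest_tier(stored_gb, read_gb):
    """Return the tier with the lowest monthly cost for this access pattern."""
    return min(TIERS, key=lambda t: monthly_cost(t, stored_gb, read_gb))


if __name__ == "__main__":
    # Same 1000 GB stored, three different monthly read volumes.
    for read_gb in (0, 500, 5000):
        best = cheapest_tier(1000, read_gb)
        print(f"read {read_gb} GB/month -> {best.name}")
```

With these placeholder prices, data that is never read is cheapest in archive, moderately read data in cool, and heavily read data in hot.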
3. Key Capabilities of Azure Storage
- Massive scalability
- High availability and durability
- Secure data storage with encryption
- Multiple services (Blob, File, Queue, Table)
- Backup and disaster recovery support
- REST API and SDK accessibility
4. Azure Blob Storage Features
Features: Stores large amounts of unstructured data, supports media files and documents of any type, provides multiple access tiers, and encrypts data at rest.
Use Cases: Media streaming, data backup, big data analytics, and application logging.
5. Data Durability and Availability
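Azure protects data against hardware failures and outages by keeping multiple replicas:
- LRS: Locally Redundant Storage (three copies within one datacenter).
- ZRS: Zone-Redundant Storage (copies across three availability zones in one region).
- GRS: Geo-Redundant Storage (local copies in the primary region plus an asynchronous copy in a paired secondary region).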
6. Azure Table Storage Partitioning
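Partitioning uses a Partition Key to group related entities; together with the Row Key it uniquely identifies each entity. Queries that specify the Partition Key touch only one partition, which makes them faster, and spreading entities across many partitions distributes the workload.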
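Why the Partition Key speeds up queries can be seen in a toy in-memory model. This is a sketch, not the Azure SDK: `ToyTable` and its methods are hypothetical names, and only the `PartitionKey` and `RowKey` property names mirror Table Storage.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table keyed by (PartitionKey, RowKey)."""

    def __init__(self):
        # partition key -> {row key -> entity}
        self._partitions = defaultdict(dict)

    def upsert(self, entity):
        pk, rk = entity["PartitionKey"], entity["RowKey"]
        self._partitions[pk][rk] = dict(entity)

    def point_lookup(self, pk, rk):
        """Cheapest query: both keys known, one dictionary hop each."""
        return self._partitions.get(pk, {}).get(rk)

    def partition_scan(self, pk):
        """Reads a single partition only."""
        return list(self._partitions.get(pk, {}).values())

    def full_scan(self, predicate):
        """Most expensive query: every partition must be visited."""
        return [
            e
            for part in self._partitions.values()
            for e in part.values()
            if predicate(e)
        ]


if __name__ == "__main__":
    table = ToyTable()
    # Entities in the same table may carry different properties (no fixed schema).
    table.upsert({"PartitionKey": "sales", "RowKey": "001", "amount": 120})
    table.upsert({"PartitionKey": "sales", "RowKey": "002", "amount": 80, "region": "EU"})
    table.upsert({"PartitionKey": "hr", "RowKey": "001", "name": "Asha"})
    print(table.point_lookup("sales", "002"))
    print(len(table.partition_scan("sales")))
```

A lookup by both keys or by Partition Key stays inside one partition, while a filter on any other property has to scan all of them.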
7. NoSQL Database Characteristics
- Schema-less design
- High scalability
- Handles structured and unstructured data
- Distributed architecture
- Flexible models (key-value, document, graph, column-family)
8. Azure Cosmos DB Consistency Levels
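- Strong: Reads always return the most recent committed write.
- Bounded Staleness: Reads may lag behind writes, but only by a configured number of versions or time interval.
- Session: Within a single client session, reads always reflect that session's own writes.
- Consistent Prefix: Reads may be stale but never see writes out of order.
- Eventual: No ordering guarantee; replicas converge over time. Lowest latency, weakest consistency.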
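The difference between strong, session, and eventual reads can be illustrated with a toy model of one primary and one lagging replica. This is a teaching sketch under simplifying assumptions, not how Cosmos DB is implemented: `ToyStore`, its methods, and the integer session token are all hypothetical.

```python
class ToyStore:
    """One primary log plus one replica that applies writes in order, with lag."""

    def __init__(self):
        self.log = []             # committed writes on the primary, in order
        self.replica_applied = 0  # how many log entries the replica has applied

    def write(self, value):
        self.log.append(value)
        return len(self.log)      # session token: version of this write

    def replicate(self, n=1):
        """Let the replica catch up by up to n writes."""
        self.replica_applied = min(len(self.log), self.replica_applied + n)

    def _value_at(self, version):
        return self.log[version - 1] if version > 0 else None

    def read_strong(self):
        """Always the latest committed write."""
        return self._value_at(len(self.log))

    def read_eventual(self):
        """Whatever the replica currently has; may be stale."""
        return self._value_at(self.replica_applied)

    def read_session(self, token):
        """Never older than the caller's own last write (read-your-writes)."""
        return self._value_at(max(self.replica_applied, token))


if __name__ == "__main__":
    store = ToyStore()
    store.write("v1")
    store.replicate()             # replica now has v1
    token = store.write("v2")     # replica has not seen v2 yet
    print(store.read_strong())        # v2
    print(store.read_eventual())      # v1
    print(store.read_session(token))  # v2
```

Because the replica applies writes in log order, a stale read in this model is still a consistent prefix of the write history.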
9. Global Distribution
Azure Cosmos DB replicates data across multiple regions, so applications can read from the nearest region to reduce latency. Writes can also go to the nearest region when multi-region writes are enabled.
10. Throughput Modes
| Feature | Provisioned Throughput | Serverless Mode |
|---|---|---|
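| Billing | Provisioned RU/s, billed hourly whether used or not | Request units actually consumed |
| Workload | Predictable or sustained traffic | Infrequent or intermittent traffic |
| Scale | Large-scale production apps | Dev/test and small apps |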
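The billing difference in the table can be sketched as a break-even calculation. Both functions and their default rates are hypothetical placeholders for illustration; check the current Azure pricing page for real numbers.

```python
HOURS_PER_MONTH = 730


def provisioned_cost(provisioned_ru_per_s, price_per_100_ru_s_hour=0.008):
    """Pay for reserved capacity every hour, regardless of actual usage.

    The default rate is a hypothetical placeholder.
    """
    return provisioned_ru_per_s / 100 * price_per_100_ru_s_hour * HOURS_PER_MONTH


def serverless_cost(total_ru_consumed, price_per_million_ru=0.25):
    """Pay only for the request units actually consumed.

    The default rate is a hypothetical placeholder.
    """
    return total_ru_consumed / 1_000_000 * price_per_million_ru


if __name__ == "__main__":
    reserved = provisioned_cost(400)
    light = serverless_cost(10_000_000)                      # occasional traffic
    heavy = serverless_cost(400 * 3600 * HOURS_PER_MONTH)    # 400 RU/s nonstop
    print(f"provisioned 400 RU/s: {reserved:.2f}")
    print(f"serverless, light use: {light:.2f}")
    print(f"serverless, sustained use: {heavy:.2f}")
```

Under these assumed rates, serverless is cheaper for light or intermittent traffic, while sustained traffic favors provisioned throughput.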
11. Azure File Storage
Provides fully managed cloud file shares accessible via SMB or NFS. Used for shared application data, legacy migrations, and VM file sharing.
12. Azure Table Storage vs. Relational
Unlike relational databases, Table Storage is a NoSQL key-value store: it has no fixed schema, no joins or foreign keys, and identifies each entity by its Partition Key and Row Key.
13. Azure Cosmos DB Use Cases
Use Cases: Real-time web/mobile apps, IoT, gaming, and e-commerce. Benefits: Global distribution, low latency, and automatic scaling.
14. Supported APIs
Cosmos DB supports the NoSQL API (formerly the SQL or Core API) and APIs for MongoDB, Cassandra, Gremlin (graph), and Table.
Unit IV: Data Analytics in Azure
1. Data Analytics Pipeline Stages
- Data collection
- Data ingestion
- Data storage
- Data processing
- Data analysis
- Data visualization and reporting
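The stages above can be sketched end to end with the standard library. Everything here is a hypothetical stand-in: `RAW` plays the role of collected data, an in-memory list plays the role of storage, and a formatted string plays the role of a report.

```python
import csv
import io
from statistics import mean

# Collection: hypothetical sensor readings, one with a missing value.
RAW = "sensor,temp_c\na,21.5\nb,\na,22.5\nb,19.0\n"


def ingest(raw_text):
    """Ingestion: parse raw text into records."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def process(rows):
    """Processing: drop incomplete rows and convert types."""
    return [
        {"sensor": r["sensor"], "temp_c": float(r["temp_c"])}
        for r in rows
        if r["temp_c"]
    ]


def analyze(rows):
    """Analysis: average temperature per sensor."""
    by_sensor = {}
    for r in rows:
        by_sensor.setdefault(r["sensor"], []).append(r["temp_c"])
    return {sensor: mean(values) for sensor, values in by_sensor.items()}


def report(summary):
    """Reporting: render the result for people to read."""
    return "\n".join(f"{s}: {avg:.1f} C" for s, avg in sorted(summary.items()))


if __name__ == "__main__":
    stored = ingest(RAW)  # Storage: kept in memory for this sketch
    print(report(analyze(process(stored))))
```

In a real pipeline each function would be replaced by a service, but the order of the stages and the hand-off between them stay the same.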
2. Azure Data Factory
A cloud-based data integration service that collects data from multiple sources, performs ETL/ELT, automates and schedules workflows, and transfers data between systems.
3. Lakehouse Architecture
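Combines the low-cost, flexible storage of a data lake with the management and query features of a data warehouse. It supports structured and unstructured data in one place and serves both reporting and advanced machine learning workloads.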
4. Large-Scale Analytics Elements
Includes ingestion, distributed storage, parallel processing, transformation, and visualization.
5. Ingestion Considerations
Factors include volume/velocity, processing type (batch vs. real-time), quality, security, and latency.
6. Apache Spark
A distributed processing engine used for big data analytics, machine learning, and real-time stream processing. It keeps intermediate data in memory, which speeds up multi-step computations.
7. Analytical Data Store Selection
Consider data type, query performance, scalability, cost, security, and real-time requirements.
8. Real-Time Analytics
Processes data as it arrives so that results are available with minimal delay. Examples: Fraud detection, live traffic monitoring, and IoT sensor analysis.
9. Batch vs. Stream Processing
| Feature | Batch Processing | Stream Processing |
|---|---|---|
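| Method | Data processed in groups at intervals | Data processed continuously as it arrives |
| Latency | High (minutes to hours) | Low (milliseconds to seconds) |
| Example | Payroll | Stock market feeds |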
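The contrast in the table can be shown with a few lines of plain Python. This is a minimal sketch with hypothetical function names: the batch version waits for the whole group before producing one result, while the stream version emits an updated result after every record.

```python
def batch_total(records):
    """Batch: collect the full group first, then compute once."""
    return sum(records)


def stream_totals(records):
    """Stream: yield an updated running total as each record arrives."""
    total = 0
    for value in records:
        total += value
        yield total


if __name__ == "__main__":
    payments = [5, 3, 7]
    print(batch_total(payments))          # one result at the end
    print(list(stream_totals(payments)))  # one result per incoming record
```

The final values agree; what differs is how soon the first result is available.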
10. Azure Analytical Stores
Options include Azure Data Lake Storage, Azure Synapse Analytics (the successor to Azure SQL Data Warehouse), Microsoft Fabric, and Azure Cosmos DB.
11. Microsoft Fabric
An integrated platform combining data engineering, warehousing, and analytics. It supports Lakehouse architecture and unified reporting.
12. Batch vs. Streaming Data
Batch data is collected over time and processed together for periodic reports, while streaming data is a continuous flow processed as it arrives for real-time monitoring.
13. Real-Time Technologies
- Fabric Real-Time Intelligence: For dashboards and monitoring.
- Lakehouse: For structured/unstructured streaming data.
- Spark Structured Streaming: For high-scalability, low-latency processing.
