Azure Storage and Data Analytics Essentials

Azure Storage Solutions

1. Azure Blob vs. File Storage

Feature     | Azure Blob Storage            | Azure File Storage
Data Type   | Unstructured objects (blobs)  | Shared files (SMB protocol)
Use Case    | Images, videos, backups, logs | Shared folders, lift-and-shift
Access      | REST APIs                     | Network drive mounting
Scalability | High (Cloud applications)     | High (System file sharing)

2. Azure Blob Storage Access Tiers

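  • Hot Tier: Frequently accessed data; highest storage cost, lowest access cost.
  • Cool Tier: Infrequently accessed data kept for at least 30 days; lower storage cost, higher access cost.
  • Archive Tier: Rarely accessed long-term data kept for at least 180 days; lowest storage cost, highest access cost. Data is offline and must be rehydrated before it can be read, which can take hours.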
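
The tier choice can be sketched as a simple rule on how long a blob has been idle. This is a minimal illustration, not an Azure API: the function name `suggest_tier` and the 30-day and 180-day thresholds are assumptions chosen for the example, and a real policy would also weigh access and retrieval costs.

```python
from datetime import date


def suggest_tier(last_accessed: date, today: date,
                 cool_after_days: int = 30,
                 archive_after_days: int = 180) -> str:
    """Suggest an access tier from idle time (thresholds are assumptions)."""
    idle_days = (today - last_accessed).days
    if idle_days >= archive_after_days:
        return "Archive"   # rarely read, retrieval delay is acceptable
    if idle_days >= cool_after_days:
        return "Cool"      # infrequently read, cheaper to store
    return "Hot"           # read often, cheapest to access


if __name__ == "__main__":
    print(suggest_tier(date(2024, 1, 1), date(2024, 1, 10)))
    print(suggest_tier(date(2024, 1, 1), date(2024, 3, 1)))
    print(suggest_tier(date(2023, 1, 1), date(2024, 1, 1)))
```

In practice this kind of rule is usually expressed as a lifecycle management policy on the storage account rather than in application code.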

3. Key Capabilities of Azure Storage

  • Massive scalability
  • High availability and durability
  • Secure data storage with encryption
  • Multiple services (Blob, File, Queue, Table)
  • Backup and disaster recovery support
  • REST API and SDK accessibility

4. Azure Blob Storage Features

Features: Stores large amounts of unstructured data, supports various media/documents, provides multiple access tiers, and ensures high security.

Use Cases: Media streaming, data backup, big data analytics, and application logging.

5. Data Durability and Availability

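Azure protects data durability and availability through replication:

  • LRS: Locally Redundant Storage (three copies within one datacenter).
  • ZRS: Zone-Redundant Storage (copies across availability zones in one region).
  • GRS: Geo-Redundant Storage (copies in the primary region plus an asynchronous copy in a secondary geographic region).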
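
One way to read the options above is by the largest failure each is designed to outlive. The sketch below encodes that reading; the names `FailureScope`, `TOLERATES`, `survives`, and `options_for` are invented for this example, and the model covers durability of the data only, not failover time or read availability.

```python
from enum import IntEnum


class FailureScope(IntEnum):
    NODE = 1        # a drive, server, or rack
    DATACENTER = 2  # a whole datacenter or availability zone
    REGION = 3      # an entire geographic region


# Largest failure scope each redundancy option is designed to tolerate.
TOLERATES = {
    "LRS": FailureScope.NODE,
    "ZRS": FailureScope.DATACENTER,
    "GRS": FailureScope.REGION,
}


def survives(option: str, failure: FailureScope) -> bool:
    """True if data stored with `option` outlives a failure of this scope."""
    return failure <= TOLERATES[option]


def options_for(failure: FailureScope) -> list:
    """List the options that protect against the given failure scope."""
    return [name for name in ("LRS", "ZRS", "GRS") if survives(name, failure)]


if __name__ == "__main__":
    print(options_for(FailureScope.DATACENTER))
```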

6. Azure Table Storage Partitioning

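Partitioning uses a Partition Key to group related entities; a Row Key then identifies each entity within its partition. A query that names the Partition Key touches only that partition, so it runs faster, and partitions can be spread across servers to distribute the workload.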
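
The effect of the Partition Key on query cost can be shown with a small in-memory model. This is a sketch, not the Azure Table Storage SDK: `TinyTable` and its methods are hypothetical names, and the nested dictionary only imitates how entities are grouped by partition.

```python
from collections import defaultdict


class TinyTable:
    """In-memory stand-in for a table keyed by PartitionKey and RowKey."""

    def __init__(self):
        # PartitionKey -> {RowKey -> entity}
        self._partitions = defaultdict(dict)

    def upsert(self, entity: dict) -> None:
        self._partitions[entity["PartitionKey"]][entity["RowKey"]] = entity

    def point_query(self, partition_key: str, row_key: str):
        """Cheapest lookup: both keys known, one partition, one entity."""
        return self._partitions.get(partition_key, {}).get(row_key)

    def partition_query(self, partition_key: str) -> list:
        """Reads a single partition, returned in RowKey order."""
        rows = self._partitions.get(partition_key, {})
        return [rows[key] for key in sorted(rows)]

    def table_scan(self, predicate) -> list:
        """Most expensive: no key given, so every partition is visited."""
        return [entity
                for rows in self._partitions.values()
                for entity in rows.values()
                if predicate(entity)]


if __name__ == "__main__":
    table = TinyTable()
    # Entities need not share a schema beyond the two keys.
    table.upsert({"PartitionKey": "sales", "RowKey": "002", "name": "Ben"})
    table.upsert({"PartitionKey": "sales", "RowKey": "001", "name": "Ana"})
    table.upsert({"PartitionKey": "hr", "RowKey": "001", "name": "Cy", "floor": 3})
    print(table.point_query("sales", "001"))
    print(table.partition_query("sales"))
```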

7. NoSQL Database Characteristics

  • Schema-less design
  • High scalability
  • Handles structured and unstructured data
  • Distributed architecture
  • Flexible models (key-value, document, graph, column-family)

8. Azure Cosmos DB Consistency Levels

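  • Strong: Reads always return the latest committed write.
  • Bounded Staleness: Reads may lag behind writes, but only by a bounded number of versions or a bounded time interval.
  • Session: Within a single client session, reads always reflect that session's own writes.
  • Consistent Prefix: Reads may lag, but never see writes out of order.
  • Eventual: No ordering guarantee; replicas converge over time. Lowest latency, weakest consistency.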
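
The difference between three of these levels can be seen in a toy model with one primary and one lagging replica. This is a simplified illustration and not how Cosmos DB is implemented: the class `ReplicatedValue`, its methods, and the integer "session token" are all assumptions made for the example.

```python
class ReplicatedValue:
    """Toy model: one primary write log and one lagging read replica."""

    def __init__(self):
        self.log = []             # writes committed on the primary, in order
        self.replica_applied = 0  # how many log entries the replica has applied

    def write(self, value):
        """Commit a write; return a session token (the write's position)."""
        self.log.append(value)
        return len(self.log)

    def replicate(self, count=1):
        """Apply up to `count` pending writes to the replica, in log order."""
        self.replica_applied = min(len(self.log), self.replica_applied + count)

    def _value_at(self, position):
        return self.log[position - 1] if position > 0 else None

    def read_strong(self):
        """Strong: always the latest committed write."""
        return self._value_at(len(self.log))

    def read_eventual(self):
        """Eventual: whatever the replica has applied so far."""
        return self._value_at(self.replica_applied)

    def read_session(self, token):
        """Session: never older than the caller's own last write."""
        return self._value_at(max(self.replica_applied, token))


if __name__ == "__main__":
    store = ReplicatedValue()
    store.write("v1")
    store.replicate()
    token = store.write("v2")  # the replica has not applied this write yet
    print(store.read_strong(), store.read_eventual(), store.read_session(token))
```

Because the replica applies writes strictly in log order, the eventual read in this model also happens to satisfy Consistent Prefix; a true eventual read carries no such ordering guarantee.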

9. Global Distribution

Azure Cosmos DB replicates data across multiple regions, so applications can read from the nearest region to reduce latency. With multi-region writes enabled, they can also write to the nearest region.

10. Throughput Modes

Feature  | Provisioned Throughput   | Serverless Mode
Billing  | Fixed RU/s (provisioned) | Actual usage (RUs consumed)
Workload | Predictable              | Infrequent
Scale    | Large-scale apps         | Dev/Test
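
The billing difference can be made concrete with a small cost comparison. The sketch assumes provisioned throughput is charged per 100 RU/s per hour and serverless per million RUs consumed; the function names are invented, and the prices in the usage example are placeholders, not real Azure rates.

```python
def provisioned_cost(ru_per_second, hours, price_per_100_ru_per_hour):
    """Cost of reserving `ru_per_second` for `hours`, used or not."""
    return (ru_per_second / 100) * hours * price_per_100_ru_per_hour


def serverless_cost(total_ru_consumed, price_per_million_ru):
    """Cost of the request units actually consumed."""
    return (total_ru_consumed / 1_000_000) * price_per_million_ru


def cheaper_mode(ru_per_second, hours, total_ru_consumed,
                 price_per_100_ru_per_hour, price_per_million_ru):
    """Return which mode costs less for the given usage and prices."""
    fixed = provisioned_cost(ru_per_second, hours, price_per_100_ru_per_hour)
    usage = serverless_cost(total_ru_consumed, price_per_million_ru)
    return "serverless" if usage < fixed else "provisioned"


if __name__ == "__main__":
    # Placeholder prices for illustration only; check current Azure pricing.
    PRICE_PER_100_RU_HOUR = 0.01
    PRICE_PER_MILLION_RU = 0.30
    # A dev/test database: 400 RU/s reserved for a month, but lightly used.
    print(cheaper_mode(400, 730, 5_000_000,
                       PRICE_PER_100_RU_HOUR, PRICE_PER_MILLION_RU))
```

With steady, heavy traffic the comparison tends to flip, which is why provisioned throughput suits predictable workloads.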

11. Azure File Storage

Provides fully managed cloud file shares via SMB. Used for shared application data, legacy migrations, and VM file sharing.

12. Azure Table Storage vs. Relational

Unlike a relational database, Table Storage is a NoSQL key-value store: it has no fixed schema and no joins, and it identifies each entity by its Partition Key and Row Key.

13. Azure Cosmos DB Use Cases

Use Cases: Real-time web/mobile apps, IoT, gaming, and e-commerce.

Benefits: Global distribution, low latency, and automatic scaling.

14. Supported APIs

Cosmos DB supports APIs for NoSQL (formerly the SQL API), MongoDB, Apache Cassandra, Apache Gremlin (graph), and Table.

Unit IV: Data Analytics in Azure

1. Data Analytics Pipeline Stages

  1. Data collection
  2. Data ingestion
  3. Data storage
  4. Data processing
  5. Data analysis
  6. Data visualization and reporting
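
The six stages can be sketched end to end as plain functions, one per stage. This is a toy illustration under stated assumptions: the sensor readings are made-up sample data, a Python list stands in for the storage layer, and all function names are invented for the example.

```python
import json
from collections import defaultdict
from statistics import mean


def collect():
    """1. Collection: raw readings as they might arrive from devices."""
    return [
        '{"sensor": "a", "temp": 21.5}',
        '{"sensor": "b", "temp": 19.0}',
        'not valid json',
        '{"sensor": "a", "temp": 22.5}',
    ]


def ingest(raw_lines):
    """2. Ingestion: parse each line and drop malformed input."""
    records = []
    for line in raw_lines:
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue
    return records


def store(records, storage):
    """3. Storage: append to a list that stands in for a data lake."""
    storage.extend(records)
    return storage


def process(storage):
    """4. Processing: group temperatures by sensor."""
    grouped = defaultdict(list)
    for record in storage:
        grouped[record["sensor"]].append(record["temp"])
    return grouped


def analyze(grouped):
    """5. Analysis: average temperature per sensor."""
    return {sensor: mean(temps) for sensor, temps in grouped.items()}


def report(stats):
    """6. Visualization and reporting: one text line per sensor."""
    return [f"{sensor}: {avg:.1f}" for sensor, avg in sorted(stats.items())]


def run_pipeline():
    storage = []
    return report(analyze(process(store(ingest(collect()), storage))))


if __name__ == "__main__":
    for line in run_pipeline():
        print(line)
```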

2. Azure Data Factory

A cloud-based data integration service that collects data from multiple sources, performs ETL/ELT, schedules and automates workflows (pipelines), and transfers data between systems.

3. Lakehouse Architecture

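Combines the low-cost, flexible storage of a data lake with the structure and query performance of a data warehouse. It supports structured and unstructured data in one place, lowers costs by avoiding duplicate stores, and lets BI and machine learning workloads run on the same data.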

4. Large-Scale Analytics Elements

A large-scale analytics solution includes data ingestion, distributed storage, parallel processing, transformation, and visualization.

5. Ingestion Considerations

Factors include volume/velocity, processing type (batch vs. real-time), quality, security, and latency.

6. Apache Spark

A distributed processing engine used for big data analytics, machine learning, and real-time stream processing. It gains its speed from in-memory computation.

7. Analytical Data Store Selection

Consider data type, query performance, scalability, cost, security, and real-time requirements.

8. Real-Time Analytics

Processes data as it arrives rather than waiting for a scheduled run. Examples: Fraud detection, live traffic monitoring, and IoT sensor analysis.

9. Batch vs. Stream Processing

Feature | Batch Processing         | Stream Processing
Method  | Groups (batches) of data | Continuous
Latency | High                     | Low
Example | Payroll                  | Stock market
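
The latency difference in the table comes from when results become available. In the minimal sketch below (function names are illustrative), the batch function produces one answer after all records are collected, while the stream function yields an updated answer as each record arrives.

```python
from typing import Iterable, Iterator


def batch_total(records: Iterable[float]) -> float:
    """Batch: wait for the whole group, then compute one result."""
    return sum(records)


def stream_totals(records: Iterable[float]) -> Iterator[float]:
    """Stream: emit an updated running total after every record."""
    total = 0
    for amount in records:
        total += amount
        yield total


if __name__ == "__main__":
    payments = [10, 20, 30]
    print(batch_total(payments))          # one result at the end
    for running in stream_totals(payments):
        print(running)                    # a result per incoming record
```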

10. Azure Analytical Stores

Options include Azure Data Lake Storage, Azure Synapse Analytics (the successor to Azure SQL Data Warehouse), Microsoft Fabric, and Cosmos DB.

11. Microsoft Fabric

An integrated platform combining data engineering, warehousing, and analytics. It supports Lakehouse architecture and unified reporting.

12. Batch vs. Streaming Data

Batch data is collected over time for periodic reports, while streaming data is a continuous flow processed instantly for real-time monitoring.

13. Real-Time Technologies

  • Fabric Real-Time Intelligence: For dashboards and monitoring.
  • Lakehouse: For structured/unstructured streaming data.
  • Spark Structured Streaming: For high-scalability, low-latency processing.