End-to-end batch pipeline: Azure SQL to Delta Lake Gold with SCD Type 2
This project is a production-grade batch data engineering pipeline that simulates a music streaming platform's analytics backend. The pipeline extracts data incrementally from Azure SQL Server using watermark-based change detection, lands raw Parquet files in ADLS Gen2 (Bronze), applies PySpark transformations with SHA-256 hash-based CDC in Databricks (Silver), and maintains dimension history via SCD Type 2 in Delta Lake (Gold). It is orchestrated by Azure Data Factory with metadata-driven parameterization, secured through Azure Key Vault and Managed Identities, and fully provisioned via Terraform IaC. The star schema covers users, artists, tracks, dates, and streaming events, enabling temporal analytics across subscription changes, artist growth, and listening behavior over time.
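To make the hash-based change detection and SCD Type 2 steps concrete, here is a minimal sketch in plain Python rather than PySpark and Delta Lake, so it runs without a cluster. It is not the project's actual code: the tracked columns (`subscription_tier`, `country`), the `DimRow` structure, and the `row_hash` and `apply_scd2` helpers are all hypothetical names chosen for illustration. The idea is that a SHA-256 hash over the tracked attributes decides whether a record changed, and a change closes the current dimension row and opens a new one.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical tracked attributes for a user dimension (an assumption,
# not taken from the project).
TRACKED_COLUMNS = ("subscription_tier", "country")


def row_hash(record: dict) -> str:
    """Return a SHA-256 hex digest over the tracked attributes."""
    payload = "||".join(str(record.get(col, "")) for col in TRACKED_COLUMNS)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class DimRow:
    """One version of a dimension member (hypothetical layout)."""
    user_id: int
    attributes: dict
    hash: str
    valid_from: date
    valid_to: Optional[date] = None
    is_current: bool = True


def apply_scd2(dimension: list, incoming: list, load_date: date) -> list:
    """Apply incoming records to the dimension using SCD Type 2 rules."""
    # Index the currently active version of each business key.
    current = {row.user_id: row for row in dimension if row.is_current}
    for record in incoming:
        new_hash = row_hash(record)
        existing = current.get(record["user_id"])
        if existing is not None and existing.hash == new_hash:
            continue  # Same hash: no tracked attribute changed.
        if existing is not None:
            # Changed: close the old version instead of overwriting it.
            existing.valid_to = load_date
            existing.is_current = False
        attrs = {col: record.get(col) for col in TRACKED_COLUMNS}
        new_row = DimRow(record["user_id"], attrs, new_hash, load_date)
        dimension.append(new_row)
        current[record["user_id"]] = new_row
    return dimension
```

In a Delta Lake setting the same close-then-insert logic would typically be expressed as a merge against the Gold table. The sketch only shows the decision rule: compare hashes, expire the current row, append the new version.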