Discover data engineering projects built with Apache Spark. Browse workflows, pipelines, and integrations from the community.
8 projects found
Content monitoring analytics service using AWS S3 Tables with Amazon MSK and EMR (20-minute SLA)
Modern Lakehouse Architecture with Kafka + Spark Structured Streaming + Delta Lake
A real-time data streaming pipeline that captures live posts about the NBA from Bluesky, perform…
A batch ETL pipeline that processes raw Yelp business data to generate analytics and insights
Reddit Data Engineering ETL Pipeline: Spark, Airflow, and MinIO in Docker with a Medallion Architecture
What if your dashboards were as real-time as Max Verstappen?
An end-to-end automated pipeline for collecting, processing, and analyzing news articles with machin…
SCALABLE_YAHOO_API_ETL_PIPELINE_USING_AIRFLOW