Scheduled data processing jobs
Discover open-source data engineering projects in Batch Processing from the community.
15 projects found
Predict, simulate, and debug Airflow schedules before they fail.
Visualize how your Airflow DAG schedules are distributed across the day with an interactive heatmap
Modern Lakehouse Architecture with Kafka + Spark Structured Streaming + Delta Lake
A real-time data streaming pipeline that captures live posts from Bluesky regarding the NBA, perform
A batch ETL pipeline that processes raw Yelp business data to generate analytics and insights
Building a medallion architecture for crowd-sourced reviews using Snowflake native features
LLM-Based Smart Clothing Suggestion
Reddit Data Engineering ETL Pipeline: Spark, Airflow, MinIO in Docker Medallion Architecture
Fully AWS-native data pipelines for processing basketball (NBA) data.
A friendly (and sometimes strict!) animated DAG auditor for Apache Airflow 3.1+
An end-to-end automated pipeline for collecting, processing, and analyzing news articles with machin
A powerful CLI tool that generates LLM-powered documentation for dbt models and columns
SCALABLE_YAHOO_API_ETL_PIPELINE_USING_AIRFLOW
Bulk manage Airflow DAG states effortlessly: pause or unpause in one action.