A data pipeline for exploring and analyzing nearby beverage places using dbt, Dagster, and Spark.
Before getting started, you'll need:

- **Local Spark Thrift Server**
  - This project requires a local Spark Thrift Server to run the dbt models
  - The Spark SQL endpoint should be available at `jdbc:hive2://localhost:10000`
  - Make sure your Spark Thrift Server is properly configured with Hudi support
- **Environment Variables**
  - `GOOGLE_PLACES_API_KEY`: Your Google Places API key
  - Other environment variables as specified in the project
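A minimal `.env` might look like the fragment below. Only `GOOGLE_PLACES_API_KEY` is named in this README; any additional variables depend on your setup, so none are shown here.

```
# .env — loaded before running the pipeline
GOOGLE_PLACES_API_KEY=your-api-key-here
```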
```
.
├── dbt_main/              # dbt project files
│   └── models/            # dbt models
├── dag/                   # Dagster pipeline definitions
└── docker-compose.yml     # Docker configuration
```
- Start your local Spark Thrift Server
- Set up environment variables in a `.env` file
- Run the Dagster pipeline: `dagster dev`
- Access the Dagster UI at `http://localhost:3000`
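Before launching Dagster, it can save time to confirm that something is actually listening on the Thrift Server port. This small stdlib sketch (the function name `thrift_server_reachable` is ours, not part of the project) just checks TCP reachability of `localhost:10000`:

```python
import socket

def thrift_server_reachable(host: str = "localhost", port: int = 10000,
                            timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections at host:port.

    This only checks reachability; it does not verify that the endpoint
    speaks the HiveServer2/Thrift protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If this returns `False`, start the Thrift Server before running the pipeline.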
To run dbt models directly:

```
cd dbt_main
dbt build
```

The pipeline is scheduled to run every Friday at 6 AM by default. You can modify this in `dag/definitions.py`.
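In Dagster, a "every Friday at 6 AM" schedule is typically expressed as the cron string `0 6 * * 5` (check `dag/definitions.py` for the actual value used here). The stdlib sketch below, with a helper name of our choosing, shows what that cron expression resolves to for a given timestamp:

```python
from datetime import datetime, timedelta

def next_friday_6am(now: datetime) -> datetime:
    """Next occurrence of Friday 06:00 strictly after `now`.

    Mirrors the cron expression '0 6 * * 5'
    (minute 0, hour 6, any day of month, any month, weekday 5 = Friday).
    """
    candidate = now.replace(hour=6, minute=0, second=0, microsecond=0)
    # datetime.weekday(): Monday=0 ... Friday=4
    days_ahead = (4 - candidate.weekday()) % 7
    candidate += timedelta(days=days_ahead)
    if candidate <= now:
        # Already past this week's Friday 06:00 — roll to next week
        candidate += timedelta(days=7)
    return candidate
```

To change the cadence, edit the cron string in `dag/definitions.py` rather than any helper like this.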
- This project uses Apache Hudi for incremental data processing
- Check the Dagster UI for pipeline execution details and logs
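For the Hudi-based incremental processing, dbt-spark models typically declare `file_format='hudi'` with a merge strategy. A minimal sketch of such a model — the file name, `stg_places` ref, and column names are all hypothetical, not taken from this project:

```sql
-- models/places_incremental.sql (hypothetical model and columns)
{{ config(
    materialized='incremental',
    file_format='hudi',
    incremental_strategy='merge',
    unique_key='place_id'
) }}

select place_id, name, rating, updated_at
from {{ ref('stg_places') }}
{% if is_incremental() %}
  -- on incremental runs, only pick up rows newer than the current table
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

With `incremental_strategy='merge'` and a `unique_key`, Hudi upserts changed rows instead of appending duplicates.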
