Parkrunner Data Pipeline Project

Streamlit Preview

Extraction Script Instructions

Environment Setup
An .env file must be included in the project root with the following variables:
- DB_HOST
- DB_PORT
- DB_USER
- DB_PASSWORD
- DB_NAME
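
As a minimal sketch, assuming the script reads these values from environment variables (for example after calling `load_dotenv()` from the `python-dotenv` package), the configuration could be loaded and checked as shown below. The function name `load_db_config` is hypothetical and not taken from the repository. The returned keys match the keyword arguments of `psycopg2.connect`, but that driver is an assumption; the README does not name one.

```python
import os
from typing import Mapping

# Variables the README lists as required in the .env file
REQUIRED_VARS = ("DB_HOST", "DB_PORT", "DB_USER", "DB_PASSWORD", "DB_NAME")


def load_db_config(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: collect DB settings, failing fast if any are missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required .env variables: {', '.join(missing)}")
    return {
        "host": env["DB_HOST"],
        "port": int(env["DB_PORT"]),  # environment values are strings; convert the port
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
        "dbname": env["DB_NAME"],
    }
```

Failing early with a list of every missing variable avoids discovering a bad .env file hours into a long extraction run.
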
Dependency Installation
Install the required packages by running:

pip install -r requirements.txt
Data Extraction and Insertion
Run the extraction script using:

python runETL.py

The script will:
- Get a list of all UK parkrun events.
- Request the URL of each event's most recent results page.
- Extract and store the result data from each event.
- Transform the results into a DataFrame.
- Insert the cleaned data into the Pagilla database.
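
The steps above can be sketched as the outline below. This is a hypothetical structure, not the code in runETL.py: fetching/parsing and database insertion are passed in as functions (`fetch_results`, `insert_rows`), and the default delay value is an assumption. Rows are kept as a list of dicts, which could then be handed to `pandas.DataFrame` for the transform step.

```python
import time
from typing import Callable, Iterable


def run_etl(
    events: Iterable[str],
    fetch_results: Callable[[str], list],
    insert_rows: Callable[[list], None],
    delay_seconds: float = 5.0,  # assumed value; the README does not state the delay
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Fetch each event's latest results with a pause between requests, then insert."""
    rows: list = []
    for i, event in enumerate(events):
        if i > 0:
            # Wait between requests to keep load on the server low
            sleep(delay_seconds)
        for result in fetch_results(event):
            # Tag every result row with the event it came from
            rows.append({"event": event, **result})
    # Insert everything in one batch once all events have been processed
    insert_rows(rows)
    return len(rows)
```

Injecting `sleep` lets tests skip the pause, while the default `time.sleep` spreads requests out in production, which is why a full run takes hours.
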

Cron Details

The extraction script should be scheduled to run once a week on a weekday (e.g. every Monday at 10 PM GMT). A full run takes roughly 2 to 4 hours because requests are deliberately delayed to reduce server load. Please do not run the script on weekends.
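
On a Linux host, a crontab entry such as `0 22 * * 1 cd /path/to/Parkrunner && python runETL.py` would run the script every Monday at 22:00 in the server's local time zone (the path is a placeholder). As an extra safeguard against weekend runs, a hypothetical guard like the one below could be called at the start of the script. It uses UTC as a stand-in for GMT, which differs from UK local time during British Summer Time; the function names are illustrative only.

```python
import sys
from datetime import datetime, timezone
from typing import Optional


def is_weekend(moment: datetime) -> bool:
    """Saturday and Sunday have weekday() values 5 and 6."""
    return moment.weekday() >= 5


def refuse_weekend_runs(now: Optional[datetime] = None) -> None:
    """Hypothetical guard: exit early if the extraction is started on a weekend."""
    now = now or datetime.now(timezone.utc)
    if is_weekend(now):
        sys.exit("Refusing to run on a weekend; please schedule a weekday run.")
```
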

RupertWatson/Parkrunner

Folders and files

Latest commit

History

Repository files navigation

Parkrunner Data Pipeline Project

Streamlit Preview

Extraction Script Instructions

Cron Details

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages