This repository contains three main projects showcasing my data engineering skills: ETL, Web Scraping and Data Extraction, and SQLite Database Operations.
## ETL

This project demonstrates the ETL process, which involves extracting data from various file formats, transforming the data, and loading it into a target file.
Features:
- Extract data from CSV, JSON, and XML files.
- Transform data by converting units.
- Load data into a CSV file.
- Log each phase of the ETL process.
Usage:
- Run `data_extraction.py` to execute the ETL process.
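For orientation, here is a minimal sketch of what such a pipeline can look like. The file discovery pattern, the `height_in` column, the 0.0254 conversion factor, and the `etl_log.txt` / `transformed_data.csv` names are illustrative assumptions, not necessarily what `data_extraction.py` does.

```python
import glob
import logging
import xml.etree.ElementTree as ET

import pandas as pd

logging.basicConfig(filename="etl_log.txt", level=logging.INFO,
                    format="%(asctime)s %(message)s")

def extract():
    """Gather rows from every CSV, JSON, and XML file in the working directory."""
    logging.info("Extract phase started")
    frames = []
    for path in glob.glob("*.csv"):
        frames.append(pd.read_csv(path))
    for path in glob.glob("*.json"):
        # Assumes line-delimited JSON; drop lines=True for a plain JSON array.
        frames.append(pd.read_json(path, lines=True))
    for path in glob.glob("*.xml"):
        root = ET.parse(path).getroot()
        frames.append(pd.DataFrame(
            [{child.tag: child.text for child in record} for record in root]
        ))
    logging.info("Extract phase finished")
    return pd.concat(frames, ignore_index=True)

def transform(df):
    """Example unit conversion; the 'height_in' column name is hypothetical."""
    logging.info("Transform phase started")
    if "height_in" in df.columns:
        df["height_m"] = (df["height_in"].astype(float) * 0.0254).round(3)
    logging.info("Transform phase finished")
    return df

def load(df, target="transformed_data.csv"):
    """Write the transformed data to the target CSV file."""
    logging.info("Load phase started")
    df.to_csv(target, index=False)
    logging.info("Load phase finished")

if __name__ == "__main__":
    load(transform(extract()))
```

Each phase logs its start and end, which matches the "log each phase" feature above; the actual script may log to a different file or format.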
## Web Scraping and Data Extraction

This project involves scraping data from a web page, parsing the HTML to extract specific data points, and storing the extracted data in both CSV and SQLite database formats.
Features:
- Scrape data from a specified URL.
- Parse HTML content to extract table data.
- Save extracted data to a CSV file and an SQLite database.
Usage:
- Run `webscraping.py` to perform web scraping and data extraction.
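As a rough outline, a scrape-and-store flow often looks like the sketch below. The URL, table layout, column names, and output file names are placeholders, not the ones used in `webscraping.py`.

```python
import sqlite3

import pandas as pd
import requests
from bs4 import BeautifulSoup

URL = "https://example.com/data-table"  # placeholder, not the project's real URL

def scrape(url):
    """Download the page and pull rows out of its first HTML table."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find("table").find_all("tr")[1:]:  # skip the header row
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:
            rows.append(cells)
    # Column names are hypothetical; match them to the real table's header.
    return pd.DataFrame(rows, columns=["name", "value"])

def save(df, csv_path="scraped_data.csv", db_path="scraped_data.db"):
    """Persist the extracted table to both CSV and SQLite."""
    df.to_csv(csv_path, index=False)
    with sqlite3.connect(db_path) as conn:
        df.to_sql("scraped_table", conn, if_exists="replace", index=False)

if __name__ == "__main__":
    save(scrape(URL))
```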
## SQLite Database Operations

This project demonstrates various database operations using SQLite, including creating tables, inserting data, and querying the database.
Features:
- Read data from CSV files.
- Create and populate SQLite tables.
- Perform SQL queries and operations on the database.
- Append new data to tables.
Usage:
- Run `sqlite_operations.py` to perform database operations.
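A condensed sketch of that workflow is shown below; the database, CSV, table, and column names are made up for illustration and stand in for whatever `sqlite_operations.py` actually uses.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect("project.db")  # hypothetical database file

# Create and populate a table from a CSV file.
# The hypothetical CSV is assumed to have 'name' and 'city' columns.
df = pd.read_csv("instructors.csv")
df.to_sql("INSTRUCTOR", conn, if_exists="replace", index=False)

# Query the table.
print(pd.read_sql("SELECT * FROM INSTRUCTOR", conn))
print(pd.read_sql("SELECT COUNT(*) AS total FROM INSTRUCTOR", conn))

# Append new data to the table (columns must match the existing schema).
new_rows = pd.DataFrame({"name": ["Ada Lovelace"], "city": ["London"]})
new_rows.to_sql("INSTRUCTOR", conn, if_exists="append", index=False)

conn.close()
```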
## Installation

To install the required Python libraries, navigate to the directory containing `requirements.txt` and run `pip install -r requirements.txt`.
## Documentation

Please go through the docs: this README includes a section for each project with a brief description, usage instructions, and links to the HLD and LLD documents located in the `doc` directory.