
# Refactoring-Regression-Pipeline-with-GenAI

Refactoring my "First Pipeline" project using GenAI fundamentals.

## Installation

Install NumPy system-wide (see https://numpy.org/install/):

- **Windows:** `choco install numpy`
- **Linux:** `sudo apt install python3-numpy`, then verify in a script with `import numpy as np; print(np.__version__)`

Or install into the current project:

- `pip install numpy`
- `pip install pandas`

N.B. Install every module the .py file imports (`pip install module_name`) before running it, and install the Kaggle CLI (`pip install kaggle`) to pull the source notebook: `kaggle kernels pull yashsahu02/drw-crypto-market-prediction`

## What's a .parquet file?

Parquet is a data file format designed for efficient storage and fast processing of large datasets, especially in big data and analytics workflows.

**What is Parquet?** Apache Parquet is an open-source, columnar storage file format optimized for big data processing frameworks like Apache Spark and Hadoop, and for Python libraries like pandas and pyarrow.

- It stores data by columns rather than by rows, which makes it highly efficient for analytical queries that often need only a subset of columns.
- It supports efficient compression and encoding schemes that reduce file size and speed up data reading.
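
As a quick sketch of writing Parquet from pandas (a toy DataFrame here, not this project's real data, and the compression codec is just an assumption):

```python
import pandas as pd

# Toy DataFrame standing in for the crypto market data used in this project
df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=5, freq="D"),
    "price": [42000.1, 42010.5, 41995.2, 42050.0, 42100.3],
    "volume": [12.5, 8.1, 15.2, 9.9, 11.0],
})

# Columnar write with snappy compression (needs pyarrow: pip install pyarrow)
df.to_parquet("sample.parquet", engine="pyarrow", compression="snappy")
```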

**Why use Parquet?**

- Efficient storage: because it's columnar, it compresses better than row-based formats like CSV.
- Faster queries: reads only the columns needed, improving speed (shown in the sketch below).
- Schema-aware: the file carries metadata describing the data types, so readers can interpret the data correctly.
- Compatibility: widely used in data pipelines, cloud data warehouses, and machine learning workflows.
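
For instance, a minimal sketch of a column-subset read and schema inspection, continuing the toy file from above (pyarrow assumed installed):

```python
import pandas as pd
import pyarrow.parquet as pq

# Read back only the columns needed -- the rest are never loaded from disk
prices = pd.read_parquet("sample.parquet", columns=["timestamp", "price"])
print(prices.head())

# The file embeds its schema, so column names and types are recovered exactly
print(pq.read_schema("sample.parquet"))
```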

## Project Structure

```
crypto-regression/
│
├── data/
│   └── kaggle/              # Original data path
├── models/
│   └── saved_predictions/
├── src/
│   ├── data_loader.py
│   ├── preprocessing.py
│   ├── feature_selection.py
│   ├── train_models.py
│   └── utils.py
├── main.py
└── README.md
```
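
main.py ties these modules together. A minimal sketch of that wiring (every function name below is hypothetical; the repo's actual entry points may differ):

```python
# main.py -- hypothetical orchestration of the src/ modules;
# the real function names in this repo may differ.
from src.data_loader import load_data
from src.preprocessing import preprocess
from src.feature_selection import select_features
from src.train_models import train_and_evaluate


def main():
    # 1. Load the raw Kaggle data from data/kaggle/
    raw = load_data("data/kaggle/")

    # 2. Clean and transform it
    clean = preprocess(raw)

    # 3. Keep only the most informative features
    features, target = select_features(clean)

    # 4. Train models and write predictions to models/saved_predictions/
    train_and_evaluate(features, target, out_dir="models/saved_predictions/")


if __name__ == "__main__":
    main()
```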

## Acknowledgements

This project is inspired by and based on Yash Sahu's notebook on Kaggle.
Credit to the original author for data exploration, modeling flow, and base structure.

This refactor emphasizes modularity, scalability, and maintainability through GenAI principles.


Built by Kagiso (@blaQPablo88) using GenAI assistance.

## About

Creating my first ETL pipeline using Python, as an exercise in Data Engineering fundamentals.
