Machine-Learning-Fundamentals

A collection of six machine learning exercises implemented in Python, progressing from binary classification to clustering. The project demonstrates practical applications of supervised and unsupervised learning techniques on real-world commit log data.

What This Project Does

This repository contains six machine learning exercises that progress from basic classification to advanced clustering techniques. Each exercise is implemented as a Jupyter notebook with complete code, visualizations, and analysis.

The project uses commit log data to explore various machine learning concepts:

• Predicting working days vs weekends from commit patterns
• Visualizing decision boundaries for different algorithms
• Handling multiclass classification with categorical features
• Detecting overfitting through train-test splits and cross-validation
• Performing regression analysis on user behavior
• Discovering user groups through clustering

Files Overview

`src/ex00/00_binary_classifier_logreg.ipynb`

Implements binary classification using logistic regression to predict whether commits were made on weekdays or weekends. Feature engineering extracts the number of commits before and after midday for each day as predictive features.
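The idea can be sketched roughly as follows. This is a minimal illustration on synthetic timestamps, not the notebook's code: the column names `timestamp`, `am`, `pm`, and `is_weekend` are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic commit timestamps over 28 days (hypothetical data, not the project's dataset).
rng = np.random.default_rng(0)
offsets = pd.to_timedelta(rng.integers(0, 28 * 24 * 60, size=500), unit="min")
commits = pd.DataFrame({"timestamp": pd.to_datetime("2020-04-01") + offsets})

# Feature engineering: flag each commit as before or after midday.
commits["date"] = commits["timestamp"].dt.date
commits["am"] = (commits["timestamp"].dt.hour < 12).astype(int)
commits["pm"] = 1 - commits["am"]

# One row per day with the number of morning and afternoon commits.
daily = commits.groupby("date")[["am", "pm"]].sum().reset_index()

# Target: 1 for Saturday/Sunday, 0 for Monday to Friday.
daily["is_weekend"] = (pd.to_datetime(daily["date"]).dt.dayofweek >= 5).astype(int)

X = daily[["am", "pm"]]
y = daily["is_weekend"]

model = LogisticRegression().fit(X, y)
accuracy = model.score(X, y)  # accuracy on the training data only
print(f"training accuracy: {accuracy:.2f}")
```

Because the timestamps here are random, the accuracy should stay close to the share of the majority class; real commit logs are expected to carry more signal.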

`src/ex01/01_binary_classifier_svm_tree.ipynb`

Compares three classification algorithms (logistic regression, SVM, decision trees) with decision boundary visualization. Demonstrates how different algorithms create different separation strategies for the same dataset.
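One common way to draw such boundaries is to predict the class at every point of a regular grid and colour the result. The sketch below assumes two numeric features and synthetic data; the helper names `decision_grid` and `plot_boundaries` and the hyperparameters are illustrative choices, not taken from the notebook.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-feature data (hypothetical stand-in for the am/pm commit counts).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(20, 5, size=(40, 2)), rng.normal(8, 4, size=(20, 2))])
y = np.array([0] * 40 + [1] * 20)

models = {
    "logreg": LogisticRegression(),
    "svm": SVC(kernel="rbf"),
    "tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}


def decision_grid(model, X, steps=100):
    """Predict the class at every point of a regular grid covering X."""
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(
        np.linspace(x_min, x_max, steps), np.linspace(y_min, y_max, steps)
    )
    zz = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    return xx, yy, zz


# Fit each model on the same data and store its predicted grid.
grids = {}
for name, model in models.items():
    model.fit(X, y)
    grids[name] = decision_grid(model, X)


def plot_boundaries(grids, X, y):
    """Draw one filled-contour panel per model with the training points on top."""
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, len(grids), figsize=(5 * len(grids), 4))
    for ax, (name, (xx, yy, zz)) in zip(np.atleast_1d(axes), grids.items()):
        ax.contourf(xx, yy, zz, alpha=0.3)
        ax.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
        ax.set_title(name)
    return fig
```

Calling `plot_boundaries(grids, X, y)` in a notebook cell draws the three panels side by side. Logistic regression typically yields a straight boundary, the RBF SVM a curved one, and the tree axis-aligned rectangles.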

`src/ex02/02_multiclassi_one-hot.ipynb`

Extends classification to predict specific weekdays using multiclass algorithms. Implements one-hot encoding for categorical features and includes random forest classification with feature importance analysis.
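As a rough illustration of these two steps, the sketch below one-hot encodes two categorical columns with `pandas.get_dummies` and ranks features by a random forest's `feature_importances_`. The columns `uid`, `labname`, `hour`, and `dayofweek`, the synthetic values, and the hyperparameters are assumptions for the example, not the notebook's actual schema.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic commit records (hypothetical columns and values).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "uid": rng.choice(["user_1", "user_2", "user_3"], size=n),
    "labname": rng.choice(["lab_a", "lab_b"], size=n),
    "hour": rng.integers(0, 24, size=n),
    "dayofweek": rng.integers(0, 7, size=n),  # target: 0 = Monday ... 6 = Sunday
})

# One-hot encode the categorical columns; numeric columns pass through unchanged.
X = pd.get_dummies(df.drop(columns="dayofweek"), columns=["uid", "labname"], dtype=int)
y = df["dayofweek"]

forest = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=21)
forest.fit(X, y)

# Rank features by their impurity-based importance.
importances = pd.Series(forest.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Note that each one-hot column gets its own importance, so the contribution of a categorical feature is spread across its dummy columns.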

`src/ex03/03_split_crossval.ipynb`

Addresses overfitting through train-test splits and cross-validation. Compares models on held-out data to estimate how well they generalize to unseen examples.
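A minimal sketch of both validation approaches, assuming a generic classification dataset generated with `make_classification` rather than the project's commit data. The split ratio, tree depth, and fold count are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic dataset (hypothetical stand-in for the commit features).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=21
)

# Hold out 20% of the rows; stratify keeps the class ratio equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=21
)

# An unrestricted tree can memorise the training set, so compare both scores.
tree = DecisionTreeClassifier(random_state=21).fit(X_train, y_train)
train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")

# 10-fold cross-validation on the training part gives a more stable estimate.
cv_scores = cross_val_score(
    DecisionTreeClassifier(max_depth=5, random_state=21), X_train, y_train, cv=10
)
print(f"cv accuracy: {cv_scores.mean():.2f} +/- {cv_scores.std():.2f}")
```

A large gap between training and test accuracy is the usual sign of overfitting; the spread of the cross-validation scores indicates how much the estimate depends on the particular split.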

`src/ex04/04_regression.ipynb`

Implements regression analysis to predict average time deltas between deadlines and first commits. Uses user behavior data including newsfeed views and commit frequency as predictive features.
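A regression setup of this kind might look like the sketch below. The data is synthetic, the column names `pageviews`, `num_commits`, and `avg_diff` are hypothetical, and the linear relationship is invented purely so that the example has something to fit. Linear regression is used here as an assumed baseline, not necessarily the notebook's model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic per-user features (hypothetical columns and values).
rng = np.random.default_rng(0)
n = 200
users = pd.DataFrame({
    "pageviews": rng.integers(0, 50, size=n),
    "num_commits": rng.integers(1, 100, size=n),
})

# Invented target: average hours between first commit and deadline (negative = before).
users["avg_diff"] = (
    -100
    + 0.5 * users["pageviews"]
    - 0.8 * users["num_commits"]
    + rng.normal(0, 10, size=n)
)

X = users[["pageviews", "num_commits"]]
y = users["avg_diff"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=21
)

model = LinearRegression().fit(X_train, y_train)

# Root mean squared error on the held-out users, in the same unit as the target.
rmse = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))
print(f"test RMSE: {rmse:.1f} hours")
```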

`src/ex05/05_clustering.ipynb`

Applies unsupervised learning through clustering algorithms to identify user behavior patterns. Groups users based on their activity patterns for potential targeted interventions.
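The source does not name a specific algorithm, so the following sketch assumes k-means and uses the silhouette score to compare candidate cluster counts. The two synthetic features and the three underlying groups are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic per-user activity features, e.g. pageviews and commits (hypothetical).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([5, 10], 2, size=(30, 2)),
    rng.normal([30, 60], 5, size=(30, 2)),
    rng.normal([15, 90], 4, size=(30, 2)),
])

# Scale features so that neither dominates the Euclidean distances.
X_scaled = StandardScaler().fit_transform(X)

# Try several cluster counts and keep the silhouette score of each.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=21).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

best_k = max(scores, key=scores.get)
print(f"best number of clusters by silhouette score: {best_k}")
```

The silhouette score ranges from -1 to 1, with higher values indicating tighter, better separated clusters. It is one heuristic among several for choosing the number of clusters.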

Key Techniques Used

• Feature Engineering: Custom extraction of temporal features from timestamp data
• One-Hot Encoding: Converting categorical variables into numerical format for machine learning algorithms
• Cross-Validation: K-fold validation for robust model evaluation
• Decision Boundary Visualization: 2D plotting of algorithm decision surfaces
• Train-Test Splitting: Holding out test data so that overfitting can be detected
• Feature Importance Analysis: Identifying which variables contribute most to predictions

Technologies and Libraries

Technology	Purpose
scikit-learn (0.23.1)	Primary machine learning library providing classification, regression, and clustering algorithms
Jupyter Notebook	Interactive development environment for data analysis and visualization
NumPy	Numerical computing library for array operations and mathematical functions
Pandas	Data manipulation and analysis library for structured data handling
Matplotlib	Plotting library for creating static visualizations and decision boundary plots

Project Structure

├── src/
│   ├── ex00/
│   ├── ex01/
│   ├── ex02/
│   ├── ex03/
│   ├── ex04/
│   └── ex05/
└── data/

src/: Contains all exercise notebooks organized by topic, each focusing on specific machine learning concepts
data/: Storage directory for datasets used across exercises, including processed commit logs and user behavior data

Each exercise directory (src/ex00/, src/ex01/, etc.) contains a single Jupyter notebook that implements the complete machine learning pipeline for that specific technique, from data preprocessing through model evaluation and visualization.
