A modern, data-driven web application that predicts developer salaries using machine learning trained on real Stack Overflow 2023 survey data.
Note: Add your PNG screenshot files to docs/images/ to display them here.
Real-time salary prediction with probability distributions and data insights
Clean, educational interface explaining the ML approach
To add screenshots:
- Save your app screenshots as PNG files
- Name them prediction-results.png and welcome-interface.png
- Place them in the docs/images/ folder
- Remove the .placeholder files
- Commit and push the changes
This application helps developers understand salary expectations based on their profile using a Naive Bayes machine learning model trained on 35,082 real developer responses from the Stack Overflow 2023 Developer Survey.
- Real ML Predictions: Naive Bayes model with variable elimination inference
- Data Transparency: Shows exactly how many similar developers exist in the dataset
- Modern UI: Clean, responsive React frontend with educational ML explanations
- Fast API: Flask backend with real-time predictions
- Probability Distributions: Confidence scores across all salary ranges
- Global Data: Includes developers from 10+ countries with various experience levels
- Python 3.8+
- Node.js 16+
- npm or yarn
- Clone the repository

      git clone https://github.com/yousef20920/SalaryPredict.git
      cd SalaryPredict

- Set up the backend

      # Install Python dependencies
      pip install flask flask-cors pandas numpy
      # Start the Flask API server
      python3 app.py

  The API will be available at http://localhost:5001

- Set up the frontend

      # Navigate to frontend directory
      cd frontend
      # Install dependencies
      npm install
      # Start the React development server
      npm start

  The web app will be available at http://localhost:3000
┌─────────────────┐    HTTP/JSON    ┌─────────────────┐
│ React Frontend  │ ◄─────────────► │  Flask Backend  │
│   (Port 3000)   │                 │   (Port 5001)   │
└─────────────────┘                 └─────────────────┘
                                             │
                                             ▼
                                    ┌─────────────────┐
                                    │ Naive Bayes ML  │
                                    │     Model       │
                                    └─────────────────┘
                                             │
                                             ▼
                                    ┌─────────────────┐
                                    │ Stack Overflow  │
                                    │ 2023 Survey Data│
                                    │  (35k records)  │
                                    └─────────────────┘
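The HTTP/JSON hop in the diagram can also be exercised without the React frontend. The following is a minimal sketch using only the Python standard library; it assumes the backend is running on port 5001 and exposes the /api/predict endpoint as described in this README. The names `API_BASE`, `build_predict_request`, and `predict` are hypothetical helpers introduced for illustration and are not part of the repository.

```python
import json
import urllib.request

# Backend address from the setup steps in this README.
API_BASE = "http://localhost:5001"

# Example developer profile, copied from the request body shown in this README.
EXAMPLE_PROFILE = {
    "Age": "25-34",
    "Education": "Professional/PhD",
    "Employment": "Full-time",
    "RemoteWork": "Hybrid",
    "Experience": "3-5 years",
    "DevType": "Full-stack",
    "CompanySize": "Large (100-499)",
    "Country": "United States",
}


def build_predict_request(profile, base_url=API_BASE):
    """Build (but do not send) a JSON POST request for /api/predict."""
    body = json.dumps(profile).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(profile, base_url=API_BASE, timeout=10):
    """Send the request and decode the JSON reply (requires a running server)."""
    request = build_predict_request(profile, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the Flask server started, `predict(EXAMPLE_PROFILE)` should return a dictionary shaped like the response documented in the API section. Building the request separately from sending it keeps the payload easy to inspect without a live server.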
- Stack Overflow 2023 Developer Survey: 35,082 real developer responses
- Variables: Age, Education, Employment Type, Remote Work, Experience, Dev Type, Company Size, Country
- Target: Salary ranges (<$50K, $50K-$75K, $75K-$100K, $100K-$150K, $150K+)
- Algorithm: Naive Bayes with Variable Elimination
- Training Data: Preprocessed Stack Overflow survey responses
- Features: 8 categorical variables representing developer profile
- Output: Probability distribution across 5 salary ranges
- Training Samples: 35,082 developer profiles
- Global Coverage: 10+ countries (US, Germany, UK, India, Canada, etc.)
- Experience Range: <1 year to 15+ years
- Company Sizes: Startups (1-9) to Enterprise (5K+)
- Flask: Web framework for API
- Python 3.8+: Core backend language
- Pandas: Data processing and analysis
- NumPy: Numerical computations
- Custom ML: Naive Bayes implementation with variable elimination
- React 18: Modern UI framework
- Axios: HTTP client for API calls
- CSS3: Custom styling with gradients and animations
- Responsive Design: Mobile-friendly interface
- preprocess_stackoverflow.py: Cleans and processes raw survey data
- naive_bayes_solution.py: Implements the ML model
- bnetbase.py: Bayesian network foundation classes
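As an illustration of the kind of cleaning step a preprocessing script performs, the sketch below maps a raw annual salary to one of the five target ranges listed above. It is not taken from preprocess_stackoverflow.py: the function name `salary_to_range` is hypothetical, the labels follow the keys in the API response, and treating each lower bound as inclusive is an assumption.

```python
# Upper bounds (exclusive) and labels for the first four salary ranges.
# Labels match the keys used in the API response; the boundary handling
# (lower bound inclusive, upper bound exclusive) is an assumption.
SALARY_BINS = [
    (50_000, "<50K"),
    (75_000, "50K-75K"),
    (100_000, "75K-100K"),
    (150_000, "100K-150K"),
]
TOP_BIN = "150K+"


def salary_to_range(annual_usd):
    """Return the salary-range label for an annual salary in US dollars."""
    if annual_usd < 0:
        raise ValueError("salary must be non-negative")
    for upper_bound, label in SALARY_BINS:
        if annual_usd < upper_bound:
            return label
    return TOP_BIN


print(salary_to_range(120_000))
```

Bucketing the continuous salary into a small categorical domain is what lets a discrete Naive Bayes model treat the target like any other categorical variable.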
SalaryPredict/
├── app.py                       # Flask API server
├── naive_bayes_solution.py      # ML model implementation
├── bnetbase.py                  # Bayesian network base classes
├── preprocess_stackoverflow.py  # Data preprocessing pipeline
├── simple_script_example.py     # Comparison with rule-based approach
├── docs/                        # Documentation and assets
│   └── images/                  # Screenshots and visual assets
│       ├── prediction-results.png
│       ├── welcome-interface.png
│       └── README.md
├── data/                        # Dataset files
│   ├── stackoverflow-train.csv  # Processed training data
│   ├── stackoverflow-test.csv   # Test data
│   ├── survey_results_schema.csv  # Data schema documentation
│   └── README_2023.txt          # Dataset documentation
└── frontend/                    # React web application
    ├── public/
    ├── src/
    │   ├── App.js               # Main React component
    │   ├── App.css              # Styling
    │   └── index.js             # Entry point
    └── package.json             # Dependencies
Health check endpoint

    {
      "message": "Salary Prediction API is running!",
      "status": "healthy"
    }

Returns available options for each input field

    {
      "Age": ["Under 18", "18-24", "25-34", "35-44", "45-54", "55-64", "65+"],
      "Education": ["High School or Less", "Some College", "Associate", "Professional/PhD", "Other"],
      // ... other fields
    }

Predicts salary based on developer profile

Request Body:

    {
      "Age": "25-34",
      "Education": "Professional/PhD",
      "Employment": "Full-time",
      "RemoteWork": "Hybrid",
      "Experience": "3-5 years",
      "DevType": "Full-stack",
      "CompanySize": "Large (100-499)",
      "Country": "United States"
    }

Response:

    {
      "prediction": "100K-150K",
      "prediction_display": "$100,000 - $150,000",
      "confidence": 0.346,
      "probabilities": {
        "<50K": 0.072,
        "50K-75K": 0.198,
        "75K-100K": 0.215,
        "100K-150K": 0.346,
        "150K+": 0.169
      },
      "data_insights": {
        "total_training_samples": 35082,
        "exact_profile_matches": 0,
        "feature_matches": {
          "Age": 16157,
          "DevType": 12517,
          // ... more statistics
        }
      }
    }

- Uses real Stack Overflow 2023 Developer Survey data
- Processes 35,082 developer responses into clean training data
- Maps categorical variables to consistent domains
- Naive Bayes: Assumes feature independence given the target class
- Variable Elimination: Efficient inference algorithm for probabilistic queries
- Conditional Probability Tables: Learned from training data frequencies
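To make the three points above concrete, here is a small self-contained sketch of a categorical Naive Bayes classifier whose conditional probability tables are plain frequency counts. It is an illustration under stated assumptions, not the code in naive_bayes_solution.py: the function names `train` and `posterior` are hypothetical, the four training rows are made-up toy data, and the add-one (Laplace) smoothing is an added assumption that keeps unseen values from producing zero probabilities. When every feature is observed, inference in a Naive Bayes network reduces to multiplying the class prior by one table entry per feature and normalizing, which is what `posterior` does.

```python
from collections import Counter, defaultdict


def train(rows, target):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(row[target] for row in rows)
    # value_counts[feature][cls][value] -> number of matching training rows
    value_counts = defaultdict(lambda: defaultdict(Counter))
    domains = defaultdict(set)  # every value seen for each feature
    for row in rows:
        for feature, value in row.items():
            if feature == target:
                continue
            value_counts[feature][row[target]][value] += 1
            domains[feature].add(value)
    return class_counts, value_counts, domains


def posterior(evidence, class_counts, value_counts, domains, alpha=1.0):
    """Return P(class | evidence) under the feature-independence assumption."""
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total  # prior P(class)
        for feature, value in evidence.items():
            count = value_counts[feature][cls][value]
            # Smoothed CPT entry P(feature = value | class)
            score *= (count + alpha) / (n_cls + alpha * len(domains[feature]))
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: score / norm for cls, score in scores.items()}


# Made-up toy rows, only to show the data shape (not survey data).
TOY_ROWS = [
    {"Experience": "3-5 years", "Country": "United States", "Salary": "100K-150K"},
    {"Experience": "3-5 years", "Country": "Germany", "Salary": "50K-75K"},
    {"Experience": "<1 year", "Country": "Germany", "Salary": "<50K"},
    {"Experience": "3-5 years", "Country": "United States", "Salary": "100K-150K"},
]

model = train(TOY_ROWS, target="Salary")
probs = posterior({"Experience": "3-5 years", "Country": "United States"}, *model)
print(max(probs, key=probs.get), probs)
```

The returned dictionary sums to 1 and plays the same role as the probabilities field in the API response.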
    # 1. Set evidence for all input variables
    for field, value in user_input.items():
        variable.set_evidence(value)  # variable: the model variable for this field

    # 2. Query salary variable using variable elimination
    salary_probabilities = ve(model, salary_var, evidence_vars)

    # 3. Return probability distribution across salary ranges

The app includes simple_script_example.py to demonstrate why ML is superior:
Simple Rules Approach:
- Hardcoded if-else statements
- No learning from data
- No confidence scores
- Can't handle complex interactions
Machine Learning Approach:
- Learns patterns from 35,082 real developers
- Provides probability distributions
- Combines evidence from all eight features (treated as conditionally independent given the salary range)
- Data-driven and transparent
- Side-by-side Layout: Form on left, results on right
- Real-time Validation: Form validation with helpful error messages
- Educational Content: Explains how ML works and why it's better than simple rules
- Data Transparency: Shows exactly how many similar developers exist
- Responsive Design: Works on desktop, tablet, and mobile
- Clean introduction to the application
- Educational content about ML vs simple rules
- Clear explanation of what the app predicts
- Professional gradient design
- Salary Range Prediction: Clear display of predicted range (e.g., "$100K - $150K")
- Confidence Score: Shows model confidence (e.g., "33.0%")
- Probability Bars: Visual representation of all salary range probabilities
- Data Insights: Transparent breakdown of training data statistics
- Profile Matching: Shows exact and similar developer counts
- Real-time Form Validation: Immediate feedback on form completion
- Progressive Disclosure: Information revealed as user interacts
- Visual Feedback: Color-coded probability bars and confidence indicators
- Educational Tooltips: ML insights explained in simple terms
See the Screenshots section above for visual examples of these features.
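The confidence score and probability bars described above can be derived directly from the probabilities map in the API response. The sketch below shows one way to do that in plain Python. The function name `summarize` and the text-based bars are hypothetical stand-ins for whatever the React frontend actually renders; the example values are copied from the response shown in this README.

```python
def summarize(probabilities, bar_width=20):
    """Pick the most likely salary range and build simple text bars."""
    prediction = max(probabilities, key=probabilities.get)
    confidence = probabilities[prediction]
    # One '#' per 1/bar_width of probability mass, rounded to the nearest step.
    bars = {
        label: "#" * round(p * bar_width)
        for label, p in probabilities.items()
    }
    return prediction, confidence, bars


# Probabilities copied from the example API response in this README.
EXAMPLE_PROBABILITIES = {
    "<50K": 0.072,
    "50K-75K": 0.198,
    "75K-100K": 0.215,
    "100K-150K": 0.346,
    "150K+": 0.169,
}

prediction, confidence, bars = summarize(EXAMPLE_PROBABILITIES)
print(f"{prediction}: {confidence:.1%}")
for label, bar in bars.items():
    print(f"{label:>10} {bar}")
```

Here the confidence is simply the probability of the most likely range, which matches the relationship between the confidence and probabilities fields in the example response.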
Follow the Quick Start guide above for local development.
    # Using gunicorn for production
    pip install gunicorn
    gunicorn -w 4 -b 0.0.0.0:5001 app:app

    # Build for production
    cd frontend
    npm run build
    # Serve with any static file server
    npm install -g serve
    serve -s build -l 3000

    # Backend
    export FLASK_ENV=production
    export FLASK_DEBUG=false
    # Frontend
    export REACT_APP_API_URL=https://your-api-domain.com

    # Test the API endpoints (requires the requests package: pip install requests)
    python3 -c "
    import requests
    response = requests.get('http://localhost:5001/')
    print(response.json())
    "

    curl -X POST http://localhost:5001/api/predict \
      -H "Content-Type: application/json" \
      -d '{
        "Age": "25-34",
        "Education": "Professional/PhD",
        "Employment": "Full-time",
        "RemoteWork": "Hybrid",
        "Experience": "3-5 years",
        "DevType": "Full-stack",
        "CompanySize": "Large (100-499)",
        "Country": "United States"
      }'

- Training Data: 35,082 developer profiles
- Global Reach: 10+ countries represented
- Experience Range: Complete career spectrum from junior to senior
- Company Diversity: Startups to enterprise companies
The app shows users:
- How many developers have their exact profile
- Similar developer counts for each characteristic
- Training data size and composition
- Salary distribution in the dataset
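The counts listed above correspond to the data_insights block of the API response. A minimal sketch of how such counts could be computed from the training rows follows. The function name `data_insights` and the three toy rows are assumptions for illustration; the dictionary keys mirror those in the documented response, but the repository's own implementation may differ.

```python
def data_insights(profile, rows):
    """Count training rows that match the profile exactly or per feature."""
    # Rows agreeing with the profile on one feature at a time.
    feature_matches = {
        feature: sum(1 for row in rows if row.get(feature) == value)
        for feature, value in profile.items()
    }
    # Rows agreeing with the profile on every feature at once.
    exact_matches = sum(
        1
        for row in rows
        if all(row.get(feature) == value for feature, value in profile.items())
    )
    return {
        "total_training_samples": len(rows),
        "exact_profile_matches": exact_matches,
        "feature_matches": feature_matches,
    }


# Made-up toy rows, only to show the data shape (not survey data).
TOY_ROWS = [
    {"Age": "25-34", "Country": "United States"},
    {"Age": "25-34", "Country": "Germany"},
    {"Age": "35-44", "Country": "United States"},
]

insights = data_insights({"Age": "25-34", "Country": "United States"}, TOY_ROWS)
print(insights)
```

With eight features, exact matches become rare quickly, which is consistent with the example response reporting zero exact matches alongside thousands of per-feature matches.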
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Follow PEP 8 for Python code
- Use meaningful commit messages
- Add comments for complex ML logic
- Test API endpoints before committing
To add or update screenshots:
- Take high-quality screenshots (at least 1200px wide)
- Save them in docs/images/ with descriptive names
- Update the README to reference the new images
- Remove the corresponding .placeholder files
Current screenshot requirements:
- prediction-results.png: Show the prediction interface with results
- welcome-interface.png: Show the welcome page with educational content
This project is licensed under the MIT License - see the LICENSE file for details.
- Stack Overflow: For providing the 2023 Developer Survey dataset
- Open Database License: Stack Overflow survey data is available under ODbL
- Community: Thanks to all developers who participated in the survey
- GitHub Issues: Report bugs or request features
- Documentation: Check this README for comprehensive guidance
- Email: For additional support or questions
Built with ❤️ by yousef20920
Empowering developers with data-driven salary insights since 2025