This project presents a framework to forecast temperature at known locations using machine learning and deep learning models, and extend those predictions to unknown spatial points using interpolation methods. The pipeline integrates temporal feature engineering, model training, spatial interpolation, and visualization to provide an end-to-end temperature forecasting solution.
Developed as part of a research-oriented academic project at San Jose State University.
- Predicts hourly temperature using:
- Baseline models: Linear Regression, Ridge, Lasso, Random Forest, Gradient Boosting, XGBoost
- Deep learning model: LSTM (Long Short-Term Memory)
- Uses trained LSTM models to forecast future temperatures at 8 known locations
- Applies spatial interpolation (Linear, Cubic, Inverse Distance Weighting) to estimate temperatures at unknown locations
- Visualizes the predictions using 2D heatmaps and 3D surface plots
Temperature data is often sparse and temporally dependent. This project aims to:
- Improve temperature prediction accuracy using advanced ML and DL techniques
- Enable spatial generalization using interpolation
- Support applications in climate modeling, environmental monitoring, and agriculture
Data Preparation:
- Hourly temperature data from 8 locations is cleaned and structured
- Cyclic time features (sine/cosine for hour and day), lag features, and rolling window statistics are engineered
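A minimal sketch of this feature-engineering step in pandas (the column names `timestamp` and `temperature` are illustrative assumptions, not the project's actual schema):

```python
import numpy as np
import pandas as pd

def add_time_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add cyclic, lag, and rolling-window features to an hourly series."""
    out = df.copy()
    ts = pd.to_datetime(out["timestamp"])

    # Cyclic encodings so hour 23 and hour 0 end up close in feature space
    out["hour_sin"] = np.sin(2 * np.pi * ts.dt.hour / 24)
    out["hour_cos"] = np.cos(2 * np.pi * ts.dt.hour / 24)
    out["doy_sin"] = np.sin(2 * np.pi * ts.dt.dayofyear / 365)
    out["doy_cos"] = np.cos(2 * np.pi * ts.dt.dayofyear / 365)

    # Lag features: temperature 1, 2, and 24 hours earlier
    for lag in (1, 2, 24):
        out[f"temp_lag_{lag}"] = out["temperature"].shift(lag)

    # Rolling statistics over the previous 24 hours (shifted to avoid leakage)
    out["temp_roll_mean_24"] = out["temperature"].shift(1).rolling(24).mean()
    out["temp_roll_std_24"] = out["temperature"].shift(1).rolling(24).std()

    return out.dropna()
```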
Model Training:
- Baseline models are trained with and without preprocessing
- LSTM models are trained per location and saved as `.keras` files for reuse (see the sketch below)
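A minimal per-location training sketch, assuming pre-windowed arrays `X` (samples × timesteps × features) and next-hour targets `y`; the layer sizes, epoch count, and `models/` path are illustrative assumptions, not the project's exact configuration:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def train_lstm(X: np.ndarray, y: np.ndarray, location: str) -> keras.Model:
    """Train a small LSTM on windowed data and save it for reuse."""
    model = keras.Sequential([
        layers.Input(shape=X.shape[1:]),   # (timesteps, features)
        layers.LSTM(64),
        layers.Dense(1),                   # next-hour temperature
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    model.fit(X, y, epochs=20, batch_size=32, validation_split=0.1, verbose=0)
    model.save(f"models/lstm_{location}.keras")  # native Keras format
    return model
```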
Prediction:
- LSTM models forecast temperature one hour into the future
- MAE and RMSE are used for evaluation
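A sketch of loading a saved model and scoring its one-hour-ahead forecasts; the model path and the test arrays passed in are assumptions for illustration:

```python
import numpy as np
from tensorflow import keras

def evaluate_forecast(model_path: str, X_test: np.ndarray, y_test: np.ndarray):
    """Load a saved per-location LSTM and compute MAE and RMSE."""
    model = keras.models.load_model(model_path)
    y_pred = model.predict(X_test, verbose=0).ravel()
    mae = float(np.mean(np.abs(y_test - y_pred)))
    rmse = float(np.sqrt(np.mean((y_test - y_pred) ** 2)))
    return mae, rmse

# e.g. evaluate_forecast("models/lstm_Dongsi.keras", X_test, y_test)
```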
Spatial Interpolation:
- Predicted temperatures from the known locations are extended to unknown points using the following methods (a code sketch follows this list):
- Linear interpolation
- Cubic interpolation
- Inverse Distance Weighting (IDW)
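A sketch of all three methods, assuming station coordinates `known_xy` (an 8 × 2 array), their predicted temperatures `known_temp`, and a target mesh built with `np.meshgrid`; the IDW power of 2 is a common default, not necessarily the project's setting:

```python
import numpy as np
from scipy.interpolate import griddata

def interpolate_grid(known_xy, known_temp, grid_x, grid_y, power=2.0):
    """Estimate temperatures on a grid from a handful of known stations."""
    # Linear and cubic interpolation via scipy (NaN outside the convex hull)
    linear = griddata(known_xy, known_temp, (grid_x, grid_y), method="linear")
    cubic = griddata(known_xy, known_temp, (grid_x, grid_y), method="cubic")

    # Inverse Distance Weighting: weight each station by 1 / distance^power
    pts = np.column_stack([grid_x.ravel(), grid_y.ravel()])
    d = np.linalg.norm(pts[:, None, :] - known_xy[None, :, :], axis=2)
    w = 1.0 / np.maximum(d, 1e-12) ** power   # guard against zero distance
    idw = (w @ known_temp) / w.sum(axis=1)

    return linear, cubic, idw.reshape(grid_x.shape)
```

Unlike linear and cubic interpolation, IDW also extrapolates beyond the convex hull of the stations, which is why all three are worth comparing side by side.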
Visualization:
- 2D heatmaps and 3D plots are generated to display spatial temperature gradients
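A minimal matplotlib sketch of both views, taking the interpolated grid from the previous step as input:

```python
import matplotlib.pyplot as plt

def plot_temperature_field(grid_x, grid_y, temp_grid):
    """2D heatmap and 3D surface of an interpolated temperature field."""
    fig = plt.figure(figsize=(11, 4))

    # 2D heatmap of the spatial temperature gradient
    ax1 = fig.add_subplot(1, 2, 1)
    im = ax1.pcolormesh(grid_x, grid_y, temp_grid, cmap="coolwarm", shading="auto")
    fig.colorbar(im, ax=ax1, label="Temperature")
    ax1.set_title("Interpolated temperature (2D)")

    # 3D surface of the same field
    ax2 = fig.add_subplot(1, 2, 2, projection="3d")
    ax2.plot_surface(grid_x, grid_y, temp_grid, cmap="coolwarm")
    ax2.set_title("Interpolated temperature (3D)")

    plt.tight_layout()
    plt.show()
```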
Baseline model results:

| Model | MAE | MSE | R² |
|---|---|---|---|
| Linear | 0.68 | 0.85 | 0.99 |
| Ridge | 0.68 | 0.85 | 0.99 |
| Lasso | 0.67 | 0.85 | 0.99 |
| Random Forest | 0.59 | 0.65 | 1.00 |
| Gradient Boosting | 0.57 | 0.64 | 1.00 |
| XGBoost | 0.61 | 0.68 | 1.00 |
Per-location LSTM results (one-hour-ahead forecasts):

| Location | MAE | RMSE |
|---|---|---|
| Aotizhongxin | 0.37 | 0.47 |
| Dongsi | 0.34 | 0.50 |
| Guanyuan | 0.34 | 0.39 |
| Gucheng | 1.53 | 1.55 |
| Nongzhanguan | 0.63 | 0.66 |
| Tiantan | 0.59 | 0.77 |
| Wanliu | 1.47 | 1.67 |
| Wanshouxigong | 0.64 | 0.78 |
- LSTM networks for time-series prediction
- Feature engineering: cyclic time, lag, rolling windows
- Spatial interpolation: IDW, Linear, Cubic
- Evaluation metrics: MAE, RMSE, R²
- Visualization with matplotlib and seaborn
- Géron, A. *Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow*. O'Reilly.
- Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. *Neural Computation*, 9(8).
- Brockwell, P. J., & Davis, R. A. *Introduction to Time Series and Forecasting*. Springer.
- Shepard, D. (1968). A Two-Dimensional Interpolation Function for Irregularly-Spaced Data. *Proceedings of the 23rd ACM National Conference*.
Special thanks to Prof. Jun Liu and San Jose State University for the guidance and computational resources provided during the project.
Sai Preeth Aduwala
San Jose State University
Email: saipreeth.aduwala@sjsu.edu