NLP pipeline for classifying unstructured "Near-Miss" (Hiyari-Hatto) reports. Uses TF-IDF & statistical classification to structure ambiguous safety data for risk analysis. ⚠️ 📊

gauravdev148/AI_Hazard_Risk_Analyzer


⚠️ Hiyari-Hatto (Near-Miss) AI Analyzer

An NLP-driven framework for classifying unstructured industrial safety reports.

This project automates the risk assessment of "Hiyari-Hatto" (Near-Miss) incidents in construction environments. By combining Natural Language Processing (NLP) with rule-based logic, it converts unstructured worker reports into structured safety data, so that site managers can identify hazards and act before accidents occur.


📸 Dashboard Preview

Figure 1: Dashboard interface. Real-time analysis view showing risk classification and keyword extraction.


🚀 Key Features

  • Natural Language Processing (NLP):
    • Analyzes raw text using TextBlob and NLTK.
    • Extracts safety-critical keywords (e.g., "leak," "spark," "voltage").
  • Automated Risk Classification:
    • Categorizes incidents into High (Critical), Medium (Caution), and Low (Monitor) based on industry safety standards.
  • Real-Time Visualization:
    • Interactive dashboard built with Flask and Bootstrap 5.
    • Visualizes risk trends using Chart.js.
  • Industrial Applicability:
    • Designed to handle noisy, non-technical language often found in on-site worker logs.
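The keyword extraction and three-level classification described above can be sketched with the standard library alone. This is only an illustration: the keyword sets, the `extract_keywords` and `classify` helpers, and the output field names are assumptions made for this example, not the rules used in `src/app.py`.

```python
import re

# Hypothetical keyword lists; the repository's actual rules may differ.
HIGH_RISK = {"leak", "spark", "voltage", "fire", "collapse"}
MEDIUM_RISK = {"slip", "loose", "crack", "unstable"}


def extract_keywords(report: str) -> list[str]:
    """Return safety keywords found in the report, in order of first appearance."""
    tokens = re.findall(r"[a-z]+", report.lower())
    found = []
    for token in tokens:
        if token in HIGH_RISK | MEDIUM_RISK and token not in found:
            found.append(token)
    return found


def classify(report: str) -> dict:
    """Turn a free-text report into a structured record with a risk level."""
    keywords = extract_keywords(report)
    if any(k in HIGH_RISK for k in keywords):
        level = "High"
    elif any(k in MEDIUM_RISK for k in keywords):
        level = "Medium"
    else:
        level = "Low"
    return {"text": report, "keywords": keywords, "risk": level}


if __name__ == "__main__":
    for text in (
        "Saw a spark near the panel, voltage seemed high",
        "Loose plank on the scaffold",
        "Helmet left on the bench",
    ):
        print(classify(text))
```

The sketch matches exact tokens only, so "sparks" or "slipped" would be missed. Lemmatizing tokens first, for example with the TextBlob or NLTK tools listed in the feature list, would be one way to handle the noisy wording of real worker logs.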

🛠️ Tech Stack

| Component | Technology |
| --- | --- |
| Backend | Python, Flask (Web Server) |
| NLP Engine | NLTK, TextBlob, Pandas |
| Frontend | HTML5, CSS3, Bootstrap 5 |
| Data Analysis | Scikit-learn, NumPy |
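The repository description mentions TF-IDF, and scikit-learn is part of the stack. To show what that weighting does without requiring any installed packages, here is a hand-rolled sketch in plain Python. It is an assumption-laden illustration, not the project's code: the `tokenize` and `tfidf` helpers and the sample reports are made up, and the IDF term uses the smoothed form `ln((1 + n) / (1 + df)) + 1` (the same form as scikit-learn's default), without the L2 normalization scikit-learn applies afterwards.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(reports: list[str]) -> list[dict[str, float]]:
    """Weight each term by relative term frequency times smoothed IDF."""
    docs = [Counter(tokenize(report)) for report in reports]
    n_docs = len(docs)
    # Document frequency: in how many reports each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in doc)
    weights = []
    for doc in docs:
        total = sum(doc.values())
        weights.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in doc.items()
        })
    return weights


if __name__ == "__main__":
    # Made-up sample reports for illustration only.
    reports = [
        "oil leak near the pump",
        "spark near the panel",
        "loose cable near the pump",
    ]
    for report, scores in zip(reports, tfidf(reports)):
        ranked = sorted(scores, key=scores.get, reverse=True)
        print(report, "->", ranked[:2])
```

Words that appear in every report, such as "near" and "the", receive the lowest weights, while terms unique to a single report, such as "spark" or "leak", rank higher. Those weights could then serve as input features for a statistical classifier.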

📂 Project Structure

AI_Hazard_Risk_Analyzer/
├── data/                  # Sample Hiyari-Hatto datasets (CSV)
├── src/
│   ├── app.py             # Main Flask application logic
│   └── templates/         # UI Dashboard (HTML/Jinja2)
├── requirements.txt       # Research dependencies
└── README.md              # Project documentation

⚑ How to Run

1. Setup Environment

# Create Virtual Environment (Sandbox)
python -m venv venv

# Activate (Windows)
.\venv\Scripts\activate

# Activate (macOS/Linux)
source venv/bin/activate

# Install Dependencies
pip install -r requirements.txt

2. Initialize NLP Resources

# Download the NLTK corpora that TextBlob depends on (run once)
python -m textblob.download_corpora

3. Launch Application

# Navigate to the Source Folder
cd src

# Start the Analysis Engine
python app.py

Access the dashboard at: http://127.0.0.1:5001


🔧 Troubleshooting (Windows Users)

If you see a "running scripts is disabled" error when trying to activate the environment, run this command in PowerShell to allow script execution:

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope Process

Then try running .\venv\Scripts\activate again.


🔬 Research Context

This tool was developed to address the "Data Silo" problem in industrial safety. While large construction firms collect thousands of Near-Miss reports, they are often stored as text logs that are difficult to analyze at scale.

Future Scope:

  1. Integration with Formal Verification methods to model safety state transitions.
  2. Expansion of the dataset to cover multilingual reports (Hindi/Japanese).

👀 Author

Gaurav Dev

  • Research Interests: Industrial Safety Systems, Formal Methods, NLP.
