Malware Static Analysis Engine (PE Feature Engineering + ML Classifier)

This project implements a full static-analysis workflow for Windows PE malware detection.
It combines PE header parsing, entropy-based section profiling, API import analysis, and a machine learning classifier trained on engineered features.

Core Components

1. PE Header Parser (`PE_header_parser.py`)

- Extracts raw PE metadata.
- Dumps section names, virtual/raw sizes, and entropy.
- Enumerates imported DLLs and the API functions imported from each.
- Designed for analyst-side sample inspection.
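
The per-section entropy in the dump is most likely Shannon entropy over byte values, measured in bits per byte (0 to 8). The sketch below shows that calculation under that assumption; the function name `shannon_entropy` is hypothetical and not taken from `PE_header_parser.py`.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    # Sum p * log2(1/p) over every byte value that actually appears.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# A constant buffer carries no information; a uniform spread over all
# 256 byte values reaches the 8-bit maximum.
low = shannon_entropy(b"\x00" * 1024)
high = shannon_entropy(bytes(range(256)))
```

Values close to 8 are a common hint that a section is compressed or encrypted. If the parser is built on the third-party `pefile` library (an assumption), each section object also exposes a comparable value through `get_entropy()`.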

2. Static Malware Classifier (`static_malware_classifier.py`)

Feature engineering pipeline that extracts:

- File size
- Section entropy statistics (mean/max/std)
- Suspicious API usage (VirtualAlloc, WriteProcessMemory, CreateRemoteThread, etc.)
- DLL diversity
- Compile timestamp (year)
- Import volume and behavioral indicators

The script then exports the dataset to `pe_features.csv`, trains a Random Forest classifier, and prints a classification report.
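
As a rough illustration of the feature engineering step, the sketch below turns already-parsed PE information into one feature row using the column names from `pe_features.csv`. The function `build_features`, its input shapes, the three-entry watch list, and the choice of population standard deviation are all assumptions made for this example, not details confirmed by `static_malware_classifier.py`.

```python
from statistics import mean, pstdev

# Illustrative watch list only; the repo's actual list may be longer.
SUSPICIOUS_APIS = {"VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread"}


def build_features(file_size, section_entropies, imports, compile_year):
    """Build one feature row (without the label).

    `section_entropies` is a list of per-section entropy values.
    `imports` maps a DLL name to the list of API names imported from it.
    """
    api_names = [api for apis in imports.values() for api in apis]
    has_sections = bool(section_entropies)
    return {
        "file_size": file_size,
        "mean_entropy": mean(section_entropies) if has_sections else 0.0,
        "max_entropy": max(section_entropies, default=0.0),
        "entropy_std": pstdev(section_entropies) if has_sections else 0.0,
        "import_count": len(api_names),
        "suspicious_api_count": sum(api in SUSPICIOUS_APIS for api in api_names),
        # DLL names are case-insensitive on Windows, so normalize before counting.
        "unique_dll_count": len({dll.lower() for dll in imports}),
        "compile_year": compile_year,
    }


row = build_features(
    file_size=4096,
    section_entropies=[2.0, 6.0],
    imports={
        "KERNEL32.dll": ["VirtualAlloc", "CreateFileW"],
        "kernel32.dll": ["WriteProcessMemory"],
        "USER32.dll": ["MessageBoxW"],
    },
    compile_year=2021,
)
```

The label column is left out here because it would normally come from which directory the sample was read from (`malware_samples/` or `benign_samples/`).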

3. Dataset (`pe_features.csv`)

Structure:

file_size, mean_entropy, max_entropy, entropy_std,
import_count, suspicious_api_count, unique_dll_count,
compile_year, label

Sample count in this demo dataset:

- Malware: 1
- Benign: 1

The dataset is expandable: the script supports full directories of samples.
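
A minimal sketch of reading a file with this structure back into feature vectors and labels, using only the standard library. It assumes the CSV has a header row with the column names listed above and that `label` is stored as an integer (1 for malware, 0 for benign); neither is confirmed by the repo. The two rows are made-up placeholders, not values from `pe_features.csv`.

```python
import csv
import io

FEATURE_COLUMNS = [
    "file_size", "mean_entropy", "max_entropy", "entropy_std",
    "import_count", "suspicious_api_count", "unique_dll_count",
    "compile_year",
]


def load_dataset(handle):
    """Read feature rows and labels from an open CSV file object."""
    features, labels = [], []
    for record in csv.DictReader(handle):
        features.append([float(record[name]) for name in FEATURE_COLUMNS])
        labels.append(int(record["label"]))
    return features, labels


# Placeholder rows for illustration only.
sample_csv = io.StringIO(
    "file_size,mean_entropy,max_entropy,entropy_std,"
    "import_count,suspicious_api_count,unique_dll_count,compile_year,label\n"
    "4096,4.0,6.0,2.0,4,2,2,2021,1\n"
    "8192,3.5,5.0,1.0,10,0,3,2019,0\n"
)
X, y = load_dataset(sample_csv)
```

To read the real file, pass `open("pe_features.csv", newline="")` instead of the in-memory buffer. If the classifier uses scikit-learn (an assumption), `X` and `y` in this shape can be handed to `RandomForestClassifier().fit(X, y)`.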

How It Works

Parse PE
```
python PE_header_parser.py
```
Extract Features + Train Model
```
python static_malware_classifier.py
```
Outputs
- pe_features.csv: engineered dataset
- Classification metrics
- Full import/section dump for each sample

Why This Matters

Static analysis is often the first triage step in security operations center (SOC) and incident response (IR) workflows.
This project automates the extraction of structural and behavioral signals directly from the binary, with no execution required.

SOC/IR Applications:

- Quick risk scoring of suspicious executables
- Detecting anomalous entropy patterns (packing/obfuscation)
- Identifying malware-like import behavior
- Building ML-assisted pre-sandbox triage engines

This repo is structured so it can be expanded into:

- CNN/RNN models using byte sequences
- Hybrid static+dynamic classifiers
- YARA-SVM hybrid detection pipeline

Project Structure

PE_header_parser.py
static_malware_classifier.py
pe_features.csv
malware_samples/
benign_samples/
README.md
