This project implements a full static-analysis workflow for Windows PE malware detection.
It combines PE header parsing, entropy-based section profiling, API import analysis, and a machine learning classifier trained on engineered features.
PE_header_parser.py:
- Extracts raw PE metadata.
- Dumps section names, virtual/raw sizes, and entropy.
- Enumerates DLL imports and API calls.
- Designed for analyst-side sample inspection.
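To illustrate what extracting raw PE metadata and section sizes involves, here is a minimal standard-library sketch. It is not the code in PE_header_parser.py: the function name `parse_pe_header` and the layout of the returned dict are assumptions made for this example. It reads only the COFF file header and the section table, and does not parse imports.

```python
import struct
from datetime import datetime, timezone


def parse_pe_header(data: bytes) -> dict:
    """Read the COFF header and section table from raw PE bytes (hypothetical helper)."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (offset 0x3C in the DOS header) points at the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF file header (20 bytes): Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics.
    machine, n_sections, timestamp, _, _, opt_size, _ = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4
    )
    # The section table starts after the signature, COFF header and optional header.
    table = pe_offset + 24 + opt_size
    sections = []
    for i in range(n_sections):
        # Each section header is 40 bytes; only the first five fields are read here.
        name, vsize, _vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", data, table + 40 * i
        )
        sections.append({
            "name": name.rstrip(b"\x00").decode("ascii", "replace"),
            "virtual_size": vsize,
            "raw_size": raw_size,
            "raw_offset": raw_ptr,
        })
    return {
        "machine": machine,
        "compile_year": datetime.fromtimestamp(timestamp, tz=timezone.utc).year,
        "sections": sections,
    }
```

Calling `parse_pe_header(open(path, "rb").read())` on a sample returns the machine type, the compile year, and one entry per section. The `raw_offset` and `raw_size` fields locate each section's bytes in the file, which is what a per-section entropy calculation needs.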
static_malware_classifier.py is a feature engineering pipeline that extracts:
- File size
- Section entropy statistics (mean/max/std)
- Suspicious API usage (VirtualAlloc, WriteProcessMemory, CreateRemoteThread, etc.)
- DLL diversity
- Compile timestamp (year)
- Import volume and behavioral indicators
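The features listed above could be derived from already-parsed values roughly as follows. This is a sketch under assumptions, not the project's implementation: `build_feature_row` is a hypothetical helper, the watchlist holds only the three API names given above (the full list is not shown), and population standard deviation is an arbitrary choice for the entropy spread.

```python
import statistics

# Assumed watchlist: only the three names mentioned in the text; the real list may be longer.
SUSPICIOUS_APIS = {"VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread"}


def build_feature_row(file_size, section_entropies, imports, compile_year, label):
    """Build one feature dict; `imports` maps a DLL name to its imported API names."""
    api_names = [api for apis in imports.values() for api in apis]
    has_sections = bool(section_entropies)
    return {
        "file_size": file_size,
        "mean_entropy": statistics.fmean(section_entropies) if has_sections else 0.0,
        "max_entropy": max(section_entropies, default=0.0),
        # Population standard deviation (an assumption); it is 0.0 for a single section.
        "entropy_std": statistics.pstdev(section_entropies) if has_sections else 0.0,
        "import_count": len(api_names),
        "suspicious_api_count": sum(1 for api in api_names if api in SUSPICIOUS_APIS),
        # DLL names are case-insensitive on Windows, so normalize before counting.
        "unique_dll_count": len({dll.lower() for dll in imports}),
        "compile_year": compile_year,
        "label": label,
    }
```

Keeping the function free of file I/O means the same logic can be applied whether the inputs come from a custom parser or from a library such as pefile.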
Exports dataset → trains Random Forest → outputs classification report.
Dataset structure (columns of pe_features.csv):
file_size, mean_entropy, max_entropy, entropy_std,
import_count, suspicious_api_count, unique_dll_count,
compile_year, label
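A small sketch of how rows with this column layout might be serialized, using the standard `csv` module. The helper name `rows_to_csv_text` is hypothetical, and the label encoding (1 for malware, 0 for benign) is an assumption; the actual script may write the file differently.

```python
import csv
import io

# Column order taken from the structure listed above.
COLUMNS = [
    "file_size", "mean_entropy", "max_entropy", "entropy_std",
    "import_count", "suspicious_api_count", "unique_dll_count",
    "compile_year", "label",
]


def rows_to_csv_text(rows):
    """Serialize feature dicts to CSV text with a header row (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Writing the returned text to pe_features.csv (opened with `newline=""`) gives a file that a tool such as pandas can load directly for model training.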
Sample count in this demo dataset:
- Malware: 1
- Benign: 1
(Expandable - script supports full directories.)
- Parse PE:
  python PE_header_parser.py
- Extract Features + Train Model:
  python static_malware_classifier.py
- Outputs:
  - pe_features.csv: engineered dataset
  - Classification metrics
  - Full import/section dump for each sample
Static analysis is the first line of triage in SOC and IR workflows.
This project automates the extraction of structural and behavioral signals directly from the binary, with no execution required.
SOC/IR Applications:
- Quick risk scoring of suspicious executables
- Detecting anomalous entropy patterns (packing/obfuscation)
- Identifying malware-like import behavior
- Building ML-assisted pre-sandbox triage engines
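The entropy signal behind the packing/obfuscation check can be sketched as Shannon entropy over byte frequencies, measured in bits per byte (0 to 8). The 7.0 cutoff in `looks_packed` is a commonly used rule of thumb and an assumption here, not a value taken from this project; both function names are illustrative.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Sum p * log2(1/p) over the observed byte values.
    return sum((c / total) * math.log2(total / c) for c in Counter(data).values())


def looks_packed(section_bytes: bytes, threshold: float = 7.0) -> bool:
    """Flag a section whose entropy exceeds an assumed heuristic threshold."""
    return shannon_entropy(section_bytes) > threshold
```

A run of identical bytes scores 0.0, while data in which all 256 byte values are equally frequent scores 8.0. Compressed or encrypted sections tend to sit near the upper end, which is why high entropy is treated as a hint of packing and not as proof.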
This repo is structured so it can be expanded into:
- CNN/RNN models using byte sequences
- Hybrid static+dynamic classifiers
- YARA-SVM hybrid detection pipeline
PE_header_parser.py
static_malware_classifier.py
pe_features.csv
malware_samples/
benign_samples/
README.md