dfirvault/NGINX_Parser

🧰 Log Combiner Tool

A fast, multithreaded Python script that recursively scans folders for web server logs (access, error, ssl), including plain .log files and .xz compressed files, and combines them by type into a clean, organized output directory.

The .xz extension is commonly used for pre-compressed log files in NGINX environments. During DFIR investigations, these files are often all you have. This tool automatically detects and decompresses .xz files in-memory, letting you immediately parse and combine logs into searchable, readable text files.

You can then index the combined logs into your favorite analysis tool; I recommend Splunk.

📸 Examples

Input, raw logs (including .xz):

[image]

Output, clean combined file:

[image]

The result is a simple, single file to work with.

✨ FEATURES

🔍 Automatically detects access, error, and ssl logs by filename

📦 Supports both plain .log and compressed .xz files

⚡ Blazing fast with multithreaded processing

🗂️ Preserves original subfolder structure in output

🧑‍💻 Simple interactive CLI: no arguments needed

🧱 No external dependencies: pure Python
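As a rough illustration of the multithreaded processing listed above, the sketch below reads several log files in parallel with a thread pool from the standard library. It is not taken from NGINX_Parser.py: the function names `read_text` and `read_all` and the `max_workers` default are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Iterable, List


def read_text(path: Path) -> str:
    """Read one log file as text, replacing bytes that are not valid UTF-8."""
    return path.read_text(encoding="utf-8", errors="replace")


def read_all(paths: Iterable[Path], max_workers: int = 8) -> List[str]:
    """Read many files concurrently (hypothetical helper).

    ThreadPoolExecutor.map returns results in the same order as the input,
    so the combined output stays deterministic even though reads overlap.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(read_text, paths))
```

Threads suit this kind of work because most of the time is spent waiting on disk reads rather than on Python-level computation.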

📌 Important Behavior: Log Detection Logic

The script matches files based on the presence of keywords in the filename, not strict naming conventions. For example:

Files like access.log, access.log-20250623, and broadway_access_20250623.xz will all be treated as access logs.

Similarly, any file with error or ssl in its name is matched as an error or SSL log, respectively.

This flexible matching ensures compatibility with most rotated or archived log naming schemes.
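The keyword matching described above can be sketched as a small function. This is a minimal illustration, not the script's actual code: the name `classify_log`, the case-insensitive comparison, and the order in which keywords are checked are all assumptions.

```python
from pathlib import Path
from typing import Optional

# Keywords come from the README; the checking order is an assumption.
LOG_KEYWORDS = ("access", "error", "ssl")


def classify_log(filename: str) -> Optional[str]:
    """Return 'access', 'error', or 'ssl' if that keyword occurs in the file name."""
    name = Path(filename).name.lower()  # lowercasing is an assumption
    for keyword in LOG_KEYWORDS:
        if keyword in name:
            return keyword
    return None  # not recognised as a web server log


if __name__ == "__main__":
    for example in ("access.log", "access.log-20250623",
                    "broadway_access_20250623.xz", "notes.txt"):
        print(example, "->", classify_log(example))
```

Note that a name containing two keywords, such as ssl_error.log, is ambiguous under substring matching. In this sketch the first keyword in `LOG_KEYWORDS` wins; how the real script resolves such names is not stated here.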

📂 EXAMPLE STRUCTURE

Input Directory (/logs):

    /logs
    ├── site1
    │   ├── access.log
    │   ├── error.log
    │   ├── access.log-20250623.xz
    │   └── access.log-20250624.xz
    └── site2
        ├── ssl.log
        └── error.log

Output Directory (/combined_logs):

    /combined_logs
    ├── site1
    │   ├── combined-access.log
    │   └── combined-error.log
    └── site2
        ├── combined-ssl.log
        └── combined-error.log

🚀 HOW TO USE

Run the script: python NGINX_Parser.py

When prompted:

Enter the input directory containing your logs

Enter the output directory for the combined logs

Done! The tool will process and combine your logs by type into the output directory.

🛠️ HOW IT WORKS

📛 File names are scanned for keywords:

"access" → Access logs

"error" → Error logs

"ssl" → SSL logs

🧩 .xz files are extracted in-memory using Python's built-in lzma module

All matched files are grouped by type and written into:

combined-access.log

combined-error.log

combined-ssl.log
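The three steps above can be sketched end to end as follows. This is an illustrative outline under stated assumptions, not the code in NGINX_Parser.py: the helper names (`log_type`, `read_log`, `combine_logs`), the sorted file order, and the UTF-8 decoding with replacement are all choices made for the example. It also assumes the output directory lies outside the input directory.

```python
import lzma
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Optional, Tuple

LOG_TYPES = ("access", "error", "ssl")  # keywords from the README


def log_type(path: Path) -> Optional[str]:
    """Classify a file by keyword in its name (hypothetical helper)."""
    name = path.name.lower()
    for keyword in LOG_TYPES:
        if keyword in name:
            return keyword
    return None


def read_log(path: Path) -> str:
    """Return a log's text; .xz files are decompressed in memory by lzma."""
    opener = lzma.open if path.suffix == ".xz" else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        return handle.read()


def combine_logs(input_dir: Path, output_dir: Path) -> None:
    """Group logs by (subfolder, type) and write one combined file per group."""
    groups: Dict[Tuple[Path, str], List[Path]] = defaultdict(list)
    for path in sorted(input_dir.rglob("*")):  # sorted order is an assumption
        kind = log_type(path)
        if path.is_file() and kind is not None:
            groups[(path.parent.relative_to(input_dir), kind)].append(path)

    for (subfolder, kind), paths in groups.items():
        # Mirror the input subfolder under the output directory.
        target = output_dir / subfolder / f"combined-{kind}.log"
        target.parent.mkdir(parents=True, exist_ok=True)
        with open(target, "w", encoding="utf-8") as out:
            for path in paths:
                text = read_log(path)
                out.write(text)
                if text and not text.endswith("\n"):
                    out.write("\n")  # keep entries from different files apart
```

Calling `combine_logs(Path("/logs"), Path("/combined_logs"))` on the example structure shown earlier would produce one combined file per log type in each site folder. Lexicographic sorting does not guarantee chronological order across rotated files, so the ordering inside a combined file should be treated as approximate.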

🧪 REQUIREMENTS

Python 3.7 or newer

No additional dependencies (100% standard library)

📄 LICENSE

This project is licensed under the MIT License.

👨‍💻 Developed by Jacob Wilson (https://dfirvault.com)

💬 Feedback, forks, and pull requests are always welcome!