AgentSlurm is a modular, adaptive, agentic AI analyzer for SLURM job scripts in High-Performance Computing (HPC) environments. It identifies potential pitfalls, misconfigurations, and optimization opportunities, with a particular focus on Lustre filesystem performance, and gives users feedback tailored to their experience level.
- For Users: Faster job execution, better resource allocation, improved HPC understanding.
- For Sysadmins: Reduced support load, proactive issue detection.
- For HPC Systems: Higher throughput, efficient resource utilization.
- Python 3.8+
- Conda (recommended) or pip
- Clone the repository:
    git clone https://github.com/basillicus/agentSlurm.git
    cd agentSlurm
- Create and activate a conda environment (recommended):
    conda create -n agentslurm-env python=3.9  # Or your preferred Python version
    conda activate agentslurm-env
- Install dependencies:
    pip install -e .
Analyze a SLURM script:
    agentslurm your_script.slurm
Example:
    agentslurm agentSlurm/llm_test_script.slurm
For more options, including LLM integration and user profiles:
    agentslurm --help
- Lustre I/O Analysis: Detects missing or suboptimal Lustre striping configurations
  - LUSTRE-001: Warns when large-file tools lack an `lfs setstripe` configuration
  - LUSTRE-002: Warns when small-file tools use wide striping
- LLM Integration: Optional deep analysis using OpenAI, Anthropic, Ollama, or Hugging Face models
- User Profiles: Tailored feedback for Basic, Medium, or Advanced HPC users
- Learning System: Converts LLM insights into deterministic rules for future analysis
- Customizable Focus: Option to focus analysis on specific categories (e.g., LUSTRE, PERFORMANCE)
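
To make the two Lustre rules above concrete, here is a minimal Python sketch of how checks of this kind could work. It is not AgentSlurm's actual implementation: the function names, the tool lists, and the stripe-count threshold are all hypothetical placeholders chosen for illustration. Only the `lfs setstripe` command and its `-c` / `--stripe-count` option (where `-1` means "all OSTs") come from Lustre itself.

```python
import re
from typing import List

# Hypothetical tool lists and cutoff, chosen only for illustration.
LARGE_FILE_TOOLS = {"samtools", "bwa"}
SMALL_FILE_TOOLS = {"git", "pip"}
WIDE_STRIPE_THRESHOLD = 4  # stripe counts above this are treated as "wide"

SETSTRIPE_RE = re.compile(r"\blfs\s+setstripe\b")
COUNT_RE = re.compile(r"(?:-c|--stripe-count)[ =]+(-?\d+)")


def code_lines(script: str) -> List[str]:
    """Return stripped lines, skipping blanks, comments, and #SBATCH directives."""
    stripped = (line.strip() for line in script.splitlines())
    return [line for line in stripped if line and not line.startswith("#")]


def check_lustre(script: str) -> List[str]:
    """Return the rule IDs (LUSTRE-001, LUSTRE-002) that the script triggers."""
    lines = code_lines(script)
    tools = {line.split()[0] for line in lines}  # first word of each command

    has_setstripe = False
    stripe_counts = []
    for line in lines:
        if SETSTRIPE_RE.search(line):
            has_setstripe = True
            match = COUNT_RE.search(line)
            if match:
                stripe_counts.append(int(match.group(1)))

    findings = []
    # LUSTRE-001: a large-file tool is used but striping is never configured.
    if tools & LARGE_FILE_TOOLS and not has_setstripe:
        findings.append("LUSTRE-001")
    # LUSTRE-002: a small-file tool is used together with wide striping.
    wide = any(c == -1 or c > WIDE_STRIPE_THRESHOLD for c in stripe_counts)
    if tools & SMALL_FILE_TOOLS and wide:
        findings.append("LUSTRE-002")
    return findings


MISSING_STRIPE = """#!/bin/bash
#SBATCH --nodes=1
samtools sort input.bam -o results/sorted.bam
"""

WITH_STRIPE = """#!/bin/bash
#SBATCH --nodes=1
lfs setstripe -c 4 -S 4M results
samtools sort input.bam -o results/sorted.bam
"""

WIDE_SMALL_FILES = """#!/bin/bash
lfs setstripe -c -1 checkout
git clone https://example.org/repo.git checkout
"""

if __name__ == "__main__":
    for name, script in [("missing", MISSING_STRIPE),
                         ("with stripe", WITH_STRIPE),
                         ("wide + small files", WIDE_SMALL_FILES)]:
        print(name, check_lustre(script))
```

The second sample script shows the kind of fix LUSTRE-001 points toward: setting a stripe layout on the output directory before the large-file tool writes to it. Suitable stripe counts and sizes depend on the site's Lustre configuration, so the values here are placeholders only.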
- User Guide (docs/user_guide.md): Learn how to use AgentSlurm and interpret its reports.
- Developer Guide (docs/developer_guide.md): Understand the architecture, contribute, and extend the system.
This README provides a high-level overview. For detailed information, please refer to the comprehensive User and Developer Guides.