Description
Hello PyRIT team,
I would like to propose integrating my comprehensive collection of jailbreak templates as an extension to the existing attack templates in pyrit/datasets/jailbreak/templates/.
Background & Motivation
Through my research as an AI red teamer, I've observed that the current jailbreak templates in PyRIT, while foundational, are somewhat outdated and may not adequately challenge modern LLM safety measures. This leaves a gap in red teaming coverage, since attack templates are at the heart of PyRIT.
Proposed Contribution
I have developed 80+ jailbreak templates and continue to develop attacks based on the latest techniques, available at: https://github.com/Arth-Singh/Arth-Jailbreak-Templates. I have also taken a few of the existing attacks in PyRIT and enhanced them.
Technical Details
- Format: All templates follow PyRIT-compatible YAML structure with standardized metadata
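As a rough illustration of what a metadata consistency check for such templates could look like, here is a minimal Python sketch. It operates on an already-parsed template (a plain dict, as a YAML loader would produce). The field names (`name`, `description`, `parameters`, `data_type`, `value`) and the `{{ param }}` placeholder style are assumptions for illustration and should be checked against the files in pyrit/datasets/jailbreak/templates/; `validate_template` is a hypothetical helper, not part of PyRIT.

```python
# Assumed metadata fields; verify against the actual PyRIT template files.
REQUIRED_FIELDS = ("name", "description", "parameters", "data_type", "value")


def validate_template(template: dict) -> list[str]:
    """Return a list of problems found in a parsed template (empty if none)."""
    problems = []

    # Every assumed metadata field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if field not in template or template[field] in (None, "", []):
            problems.append(f"missing or empty field: {field}")

    # Each declared parameter should appear as a placeholder in the body.
    value = template.get("value") or ""
    for param in template.get("parameters") or []:
        spaced = "{{ " + param + " }}"
        compact = "{{" + param + "}}"
        if spaced not in value and compact not in value:
            problems.append(f"parameter not used in value: {param}")

    return problems


# Placeholder entry used only to exercise the check.
example = {
    "name": "Example template",
    "description": "Placeholder entry for the metadata check",
    "parameters": ["prompt"],
    "data_type": "text",
    "value": "{{ prompt }}",
}

print(validate_template(example))
```

A check like this could run over every file in the collection before a pull request is opened, so that formatting problems are caught separately from any evaluation of the templates themselves.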
Preliminary Validation
Initial testing on GPT-4o has shown promising results: several templates elicited responses to sensitive queries (e.g., illicit substance synthesis) that standard approaches failed to elicit.
Quantitative Evaluation Offer
If empirical validation data would support this integration, I am prepared to conduct a quantitative analysis over the weekend, including success-rate measurements across multiple models.
I believe this contribution would significantly enhance PyRIT's red team capabilities and support more robust AI safety testing. I'm happy to discuss implementation details, provide additional documentation, or conduct any required validation studies.
Thank you for considering this contribution to the PyRIT project.
Best regards,
Arth Singh
LinkedIn: https://www.linkedin.com/in/arthsingh7in/