Enhance README with features and usage details #18
base: main
Expanded the README to include detailed project features, setup instructions, usage examples, and notes on customization and limitations.
Pull Request Ethics, Security & Compliance Checklist
By submitting this pull request, I confirm I have considered the security, privacy, and ethical implications of my contribution in accordance with university policy and applicable legal and professional standards. Thank you for helping maintain a secure and responsible codebase.
Pull request overview
This PR enhances the Asset Scanner README documentation to provide comprehensive information about features, setup, usage, and customization. The documentation update transforms a basic README into a well-structured guide for users and contributors.
Changes:
- Expanded project description with detailed feature list including hybrid detection, OCR capabilities, and risk assessment
- Restructured documentation with improved formatting using tables, sections, and examples
- Added comprehensive CLI documentation, customization guides, and example outputs
- Included contributing guidelines and references to project policies
| `--root` | Path | Recursively scan all files in a directory. | `python scanner.py --root "/path/to/"` |
| `--patterns` | Path | Custom path to `patterns.json`. | `--patterns ./configs/patterns.json` |
| `--out` | Path | Path to save structured scan results (e.g., `.json`, `.txt`). | `--out results.json` |
| `--ext` | List | Filter by file extensions (_default: .txt, .docx, .pdf_). | `--ext .txt .md` |
Copilot AI · Jan 15, 2026
The documented default extensions for `--ext` are incorrect. According to `scanner.py` line 185, the actual default is `['.txt', '.json']`, not `.txt, .docx, .pdf`. `scanner.py` line 51 defines `DEFAULT_TARGET_EXTS` with a broader list, but that list is not the default for the `--ext` parameter.
Suggested change:
- | `--ext` | List | Filter by file extensions (_default: .txt, .docx, .pdf_). | `--ext .txt .md` |
+ | `--ext` | List | Filter by file extensions (_default: .txt, .json_). | `--ext .txt .md` |
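To illustrate the distinction this comment draws, here is a minimal sketch of how an `argparse` default for `--ext` can differ from a broader module-level list. The parser layout, the `nargs="+"` choice, and the contents of `DEFAULT_TARGET_EXTS` are assumptions for illustration; only the `['.txt', '.json']` default is taken from the review, and the actual `scanner.py` may be structured differently.

```python
import argparse

# Hypothetical stand-in for the broader module-level list the review mentions;
# the real contents of DEFAULT_TARGET_EXTS are not shown in this thread.
DEFAULT_TARGET_EXTS = [".txt", ".json", ".docx", ".pdf"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose --ext default matches the value cited in the review."""
    parser = argparse.ArgumentParser(prog="scanner.py")
    parser.add_argument(
        "--ext",
        nargs="+",  # assumed, based on the README example `--ext .txt .md`
        default=[".txt", ".json"],  # the default the review attributes to line 185
        help="Filter by file extensions.",
    )
    return parser


def documented_default(parser: argparse.ArgumentParser, flag: str):
    """Return the default argparse applies when `flag` is omitted."""
    return parser.get_default(flag.lstrip("-").replace("-", "_"))
```

Calling `documented_default(build_parser(), "--ext")` returns the value the README should quote, regardless of what any other module-level constant contains.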
"tip": "Rotate immediately; revoke if exposed.",
"compliance": ["GDPR Art. 33 — Data Breach Notification"]
Copilot AI · Jan 15, 2026
The example compliance reference is incomplete. According to risk_rules.json lines 14-19, the aws_access_key actually includes multiple compliance references: 'Privacy Act 1988 (Cth) — APP 11', 'Privacy Act 1988 (Cth) — Notifiable Data Breaches (NDB) scheme, Part IIIC', and 'GDPR Art. 32 — Security of processing'. The tip should also be 'Rotate immediately; revoke if exposed; move to a secrets manager; purge from history.'
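A mismatch like this one could be detected mechanically by comparing the README example with the rule entry. The sketch below assumes a rule is a flat mapping with `tip` and `compliance` keys; that shape is a guess, and only the string values are taken from this comment and the README excerpt. The risk level is left out because the thread does not state it.

```python
# Hypothetical shape of one risk_rules.json entry; only the tip and compliance
# strings come from the review comment. The key names are assumed.
AWS_ACCESS_KEY_RULE = {
    "tip": "Rotate immediately; revoke if exposed; move to a secrets manager; purge from history.",
    "compliance": [
        "Privacy Act 1988 (Cth) — APP 11",
        "Privacy Act 1988 (Cth) — Notifiable Data Breaches (NDB) scheme, Part IIIC",
        "GDPR Art. 32 — Security of processing",
    ],
}

# The values currently shown in the README example.
README_EXAMPLE = {
    "tip": "Rotate immediately; revoke if exposed.",
    "compliance": ["GDPR Art. 33 — Data Breach Notification"],
}


def stale_fields(example: dict, rule: dict) -> list[str]:
    """List the keys whose documented value differs from the rule's value."""
    return sorted(key for key in rule if example.get(key) != rule[key])
```

Here `stale_fields(README_EXAMPLE, AWS_ACCESS_KEY_RULE)` reports both `compliance` and `tip` as out of date.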
"file": "example.docx",
"line": 1,
"match": "john.doe@example.com",
"risk": "Medium",
Copilot AI · Jan 15, 2026
The example shows email risk as 'Medium', but according to risk_rules.json line 3, emails are classified as 'Low' risk, not 'Medium'.
Suggested change:
- "risk": "Medium",
+ "risk": "Low",
## 🌟 Contributing

We welcome contributions! Fork the repository, create a feature branch, and submit a pull request.
Please adhere to our [Code of Conduct](CODE_OF_CONDUCT.md).
Copilot AI · Jan 15, 2026
The README references a CODE_OF_CONDUCT.md file, but this file does not exist in the asset-scanner directory. This will result in a broken link for users viewing the asset-scanner README.
Suggested change:
- Please adhere to our [Code of Conduct](CODE_OF_CONDUCT.md).
+ Please adhere to our [Code of Conduct](../CODE_OF_CONDUCT.md).
This project is licensed under the [MIT License](LICENSE).
See the `LICENSE` file for full details.
Copilot AI · Jan 15, 2026
The README references a LICENSE file with a relative link, but no LICENSE file exists in the asset-scanner directory. This will result in a broken link for users viewing the asset-scanner README.
Suggested change:
- This project is licensed under the [MIT License](LICENSE).
- See the `LICENSE` file for full details.
+ This project is licensed under the MIT License.
+ See the `LICENSE` file in the repository root for full details.
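Broken relative links such as the `CODE_OF_CONDUCT.md` and `LICENSE` references flagged in this review could be caught before merging with a small check. The following is a rough sketch, not part of the project: the function name and regular expression are illustrative, and it only handles inline `[text](target)` links.

```python
import re
from pathlib import Path

# Matches inline Markdown links such as [text](target); images and
# reference-style links are ignored in this sketch.
LINK_RE = re.compile(r"(?<!!)\[[^\]]*\]\(([^)\s]+)\)")

# Matches targets that start with a URL scheme, e.g. https: or mailto:.
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_relative_links(readme: Path) -> list[str]:
    """Return relative link targets in `readme` that do not exist on disk."""
    missing = []
    for target in LINK_RE.findall(readme.read_text(encoding="utf-8")):
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue  # skip absolute URLs and in-page anchors
        path = target.split("#", 1)[0]  # drop any trailing fragment
        if not (readme.parent / path).exists():
            missing.append(target)
    return missing
```

Targets are resolved relative to the README's own directory, which is how a link like `../CODE_OF_CONDUCT.md` would be followed when the README lives in a subdirectory.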
| `--patterns` | Path | Custom path to `patterns.json`. | `--patterns ./configs/patterns.json` |
| `--out` | Path | Path to save structured scan results (e.g., `.json`, `.txt`). | `--out results.json` |
| `--ext` | List | Filter by file extensions (_default: .txt, .docx, .pdf_). | `--ext .txt .md` |
| `--no-console` | Flag | Suppress console output. Only write to the output file. | `--no-console` |
Copilot AI · Jan 15, 2026
The --no-console flag is documented in the CLI table, but this argument does not exist in scanner.py's argument parser (lines 173-190). This feature is not implemented.
Suggested change:
- | `--no-console` | Flag | Suppress console output. Only write to the output file. | `--no-console` |
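One way to keep a CLI table and the argument parser in sync is to compare the flags named in the table against the parser's generated help text. This is a sketch under assumptions: the function name is hypothetical, and it expects each documented flag to sit alone in a backticked table cell, as in the rows quoted in this review.

```python
import argparse
import re


def flags_missing_from_parser(readme_table: str, parser: argparse.ArgumentParser) -> set[str]:
    """Return flags named in the README table that the parser's help does not list."""
    # A documented flag is a table cell containing only a backticked --flag.
    documented = set(re.findall(r"\|\s*`(--[a-z][a-z0-9-]*)`\s*\|", readme_table))
    # format_help() is public API and lists every registered option string.
    implemented = set(re.findall(r"--[a-z][a-z0-9-]*", parser.format_help()))
    return documented - implemented
```

Run against a parser that defines only `--root`, `--patterns`, `--out`, and `--ext`, a table that also documents `--no-console` would yield `{"--no-console"}`.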
## 📝 Example
Scanning a document containing:
## 🥼 Example
Copilot AI · Jan 15, 2026
The emoji '🥼' (lab coat) is unusual for an 'Example' section. Consider using a more standard emoji like '📝' (memo) or '💡' (light bulb) for consistency with other section headers.
Suggested change:
- ## 🥼 Example
+ ## 📝 Example