A Python script that uses AI to automatically generate and add relevant tags to your Obsidian notes. The script intelligently processes unprocessed or recently modified notes, analyzes their content, and adds contextually appropriate tags to the frontmatter.
- 🔍 Intelligently processes notes based on modification timestamps with built-in cooldown mechanisms
- 🤖 Uses AI (Ollama locally or OpenAI) to generate contextually relevant tags
- 📊 Analyzes existing tags in your vault for consistency
- 🏷️ Updates note frontmatter with new tags (without duplicating existing ones)
- ⚡ Batch processing support for large vaults
- ⏱️ Built-in rate limit handling and advanced retry mechanisms
- 🛡️ Smart cooldown system prevents unnecessary reprocessing
- 📝 Comprehensive logging for troubleshooting
Maintaining a consistent tagging system in Obsidian can be challenging but is crucial for effective knowledge management. This script helps solve common tagging problems:
- Consistency: By analyzing your entire vault's existing tags, it maintains a cohesive tagging system
- Reduced Cognitive Load: No need to stop and think about appropriate tags while writing
- Improved Discoverability: Better tagging means your notes are easier to find later
- Time-Saving: Automatically processes new and modified notes intelligently
- Knowledge Connections: Good tags help reveal connections between seemingly unrelated notes
- Scalable: Batch processing handles large vaults efficiently
The script is especially useful as part of a regular note-taking workflow. It can run frequently without wasting resources, as it only processes files that need attention.
The script uses an intelligent processing system:
- Never Processed: Files without a `processed` timestamp in their frontmatter are automatically processed
- Modified Since Processing: Files modified after their last processing timestamp are reprocessed (with a 15-minute cooldown to prevent rapid reprocessing)
- Ignore Buffer: Files processed within the last 15 minutes are always included to handle edge cases
- Batch Support: Large numbers of files can be processed in configurable batches to avoid overwhelming the AI service
This approach ensures notes get tagged when needed without unnecessary reprocessing.
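For illustration, here is one reading of these rules as Python. The `processed` frontmatter field is the script's own; the function and its exact comparisons are assumptions, not the script's actual code:

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional

COOLDOWN = timedelta(minutes=15)  # cooldown and ignore buffer from the rules above

def should_process(note: Path, processed_at: Optional[datetime]) -> bool:
    """One illustrative reading of the rules; hypothetical, not the script's code."""
    now = datetime.now()
    if processed_at is None:
        return True  # never processed
    if now - processed_at < COOLDOWN:
        return True  # ignore buffer: recently processed files stay included
    # Past the cooldown, reprocess only if the note changed after last processing
    modified_at = datetime.fromtimestamp(note.stat().st_mtime)
    return modified_at > processed_at
```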
- Python 3.7+
- Obsidian vault with markdown files
- One of the following:
- Ollama (free, runs locally)
- OpenAI API key (paid, cloud-based)
- Required Python packages:
- requests
- python-dateutil
- pyyaml
- openai (optional, for OpenAI API)
- Clone this repository:

  ```bash
  git clone https://github.com/undergroundpost/obsidian-auto-tagger.git
  cd obsidian-auto-tagger
  ```

- Install the required packages:

  ```bash
  pip install requests python-dateutil pyyaml
  # If using OpenAI
  pip install openai
  ```

- Copy and modify the example config file:

  ```bash
  cp config.yaml.example config.yaml
  # Edit config.yaml with your preferred settings
  ```

- Make the script executable (on Unix-like systems):

  ```bash
  chmod +x generate_tags.py
  ```

- Place the prompt file in the same directory:

  ```bash
  # Make sure generate_tags.md is in the same directory as the script
  ```

You can configure the script in three ways:
- Config file: Create a `config.yaml` file in the same directory as the script
- Environment-specific locations: The script will search in these locations (in order; see the sketch after this list):
  - Script directory: `./config.yaml`
  - User config: `~/.config/generate_tags/config.yaml`
  - System-wide: `/etc/generate_tags/config.yaml`
- Command-line options: Override settings for a single run
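A minimal sketch of that search order (the loader function is hypothetical; only the three paths come from the list above):

```python
from pathlib import Path
import yaml  # pyyaml, already a required dependency

CONFIG_LOCATIONS = [
    Path(__file__).parent / "config.yaml",                      # script directory
    Path.home() / ".config" / "generate_tags" / "config.yaml",  # user config
    Path("/etc/generate_tags/config.yaml"),                     # system-wide
]

def load_config() -> dict:
    """Return the contents of the first config file found, or an empty dict."""
    for path in CONFIG_LOCATIONS:
        if path.is_file():
            with open(path) as f:
                return yaml.safe_load(f) or {}
    return {}  # fall back to defaults and command-line options
```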
```yaml
# Folder settings
INPUT_FOLDER: "/Users/username/Obsidian/Vault"

# Exclude folders (list of folders to ignore when scanning)
EXCLUDE_FOLDERS:
  - "/Users/username/Obsidian/Vault/AI"        # AI-related folder
  - "/Users/username/Obsidian/Vault/Private"   # Private notes
  - "/Users/username/Obsidian/Vault/Templates" # Templates folder

# LLM Provider settings
LLM_PROVIDER: "ollama" # Options: "ollama" or "openai"

# Ollama settings (used when LLM_PROVIDER is "ollama")
OLLAMA_MODEL: "gemma3:12b"                      # Model to use
OLLAMA_SERVER_ADDRESS: "http://localhost:11434" # Ollama server address
OLLAMA_CONTEXT_WINDOW: 32000                    # Context window size

# OpenAI settings (used when LLM_PROVIDER is "openai")
OPENAI_API_KEY: ""            # Your OpenAI API key (required for OpenAI)
OPENAI_MODEL: "gpt-3.5-turbo" # OpenAI model to use
OPENAI_MAX_TOKENS: 4000       # Maximum tokens for responses
```

Note: The script also supports the legacy `EXCLUDE_FOLDER` (singular) for backward compatibility, but `EXCLUDE_FOLDERS` (plural) is recommended.
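A small sketch of how that backward compatibility might be handled (the merge behavior is an assumption; only the two key names come from the note above):

```python
def normalized_excludes(config: dict) -> list:
    """Prefer EXCLUDE_FOLDERS; fall back to the legacy singular key."""
    excludes = list(config.get("EXCLUDE_FOLDERS") or [])
    legacy = config.get("EXCLUDE_FOLDER")  # legacy key, kept for compatibility
    if legacy and legacy not in excludes:
        excludes.append(legacy)
    return excludes
```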
The script requires a prompt file named `generate_tags.md` in the same directory as the script. This file contains instructions for the LLM on how to generate tags. A template is provided in the repository.
```bash
./generate_tags.py
./generate_tags.py --input "/path/to/vault" --model "llama3:8b" --server "http://192.168.1.100:11434"
./generate_tags.py --limit 10
./generate_tags.py --batch-mode --batch-size 25
./generate_tags.py --debug
./generate_tags.py --provider openai --api-key "your-key" --delay 5
```

| Option | Description |
|---|---|
| `--debug` | Enable detailed debug logging |
| `--input` | Override input folder |
| `--exclude` | Override exclude folders (can be used multiple times) |
| `--provider` | Set LLM provider: "ollama" or "openai" |
| `--model` | Override model name (for either provider) |
| `--server` | Override Ollama server address |
| `--api-key` | Override OpenAI API key |
| `--delay` | Add delay between files in seconds (helps with API rate limits) |
| `--log-level` | Set logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL) |
| `--limit` | Limit number of files to process (0 for no limit) |
| `--batch-mode` | Enable batch processing for large numbers of files |
| `--batch-size` | Number of files per batch (default: 20) |
```bash
./generate_tags.py --exclude "/path/to/exclude1" --exclude "/path/to/exclude2"
./generate_tags.py --batch-mode --batch-size 30 --delay 2 --limit 100
```

The second command processes up to 100 files in batches of 30, with a 2-second delay between files.
For large vaults with many files to process, the script offers batch processing:
- Automatic Batching: Use `--batch-mode` to enable batching when processing many files
- Configurable Batch Size: Set `--batch-size` to control how many files are processed together (default: 20)
- Batch Pauses: 30-second pause between batches to avoid overwhelming the AI service (see the sketch after this list)
- Progress Tracking: Clear progress indicators show batch and overall progress
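A rough sketch of that batching loop (the `tag_note` callable is a hypothetical placeholder for the per-file tagging step; only the batch-size default and the 30-second pause come from the list above):

```python
import time

BATCH_PAUSE_SECONDS = 30  # pause between batches, per the list above

def process_in_batches(files, tag_note, batch_size=20):
    """Hypothetical batching loop with progress output and pauses."""
    total = len(files)
    for start in range(0, total, batch_size):
        batch = files[start:start + batch_size]
        batch_num = start // batch_size + 1
        print(f"Batch {batch_num}: files {start + 1}-{start + len(batch)} of {total}")
        for path in batch:
            tag_note(path)  # per-file tagging step (placeholder)
        if start + batch_size < total:
            time.sleep(BATCH_PAUSE_SECONDS)  # avoid overwhelming the AI service
```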
Example for processing a large vault:
```bash
./generate_tags.py --batch-mode --batch-size 25
```

The script supports two LLM providers for generating tags:
- Ollama (default): Use a local Ollama server running on your machine or network
- OpenAI: Use OpenAI's API (requires an API key)
To use OpenAI instead of Ollama, you need to:
- Install the OpenAI Python package:

  ```bash
  pip install openai
  ```

- Set your OpenAI API key in the config.yaml file:

  ```yaml
  LLM_PROVIDER: "openai"
  OPENAI_API_KEY: "your-api-key-here"
  OPENAI_MODEL: "gpt-3.5-turbo" # default, or use "gpt-4" for better results
  ```

- Alternatively, you can set these via command line:

  ```bash
  ./generate_tags.py --provider openai --api-key "your-api-key-here" --model "gpt-4"
  ```
The script uses "gpt-3.5-turbo" as the default OpenAI model, which provides a good balance between availability, cost, and quality. For even better tagging quality, you can use "gpt-4" or "o1-mini", though these may have different quota limits or costs.
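As a rough illustration of the provider switch, a minimal Python sketch (the function name and prompt handling are assumptions; the calls themselves are Ollama's standard `/api/generate` endpoint and the OpenAI Python client's chat completions API):

```python
import requests

def generate_tags_text(prompt, provider="ollama"):
    """Hypothetical provider switch; the script's internals may differ."""
    if provider == "ollama":
        resp = requests.post(
            "http://localhost:11434/api/generate",  # OLLAMA_SERVER_ADDRESS
            json={"model": "gemma3:12b", "prompt": prompt, "stream": False},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["response"]
    else:
        from openai import OpenAI  # optional dependency, as noted above
        client = OpenAI(api_key="your-api-key-here")  # OPENAI_API_KEY
        completion = client.chat.completions.create(
            model="gpt-3.5-turbo",  # OPENAI_MODEL
            max_tokens=4000,        # OPENAI_MAX_TOKENS
            messages=[{"role": "user", "content": prompt}],
        )
        return completion.choices[0].message.content
```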
The script uses intelligent processing logic to avoid unnecessary work:
- Unprocessed Files: Any file without a `processed` timestamp in its frontmatter
- Modified Files: Files modified after their `processed` timestamp (with 15-minute cooldown)
- Recently Processed: Files processed within the last 15 minutes (ignore buffer for edge cases)
- 15-minute cooldown: Prevents rapid reprocessing of the same file
- Ignore buffer: Recently processed files are always included to handle timing edge cases
- Modification tracking: Only reprocesses files that have actually changed
This system ensures efficient processing while avoiding missed files due to timing issues.
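For illustration, the frontmatter update described above might look like this (the function is hypothetical; the `processed` field and the no-duplicates tag merge come from the docs):

```python
from datetime import datetime

def merge_tags(frontmatter, new_tags):
    """Add new tags without duplicating existing ones, then stamp `processed`."""
    existing = frontmatter.get("tags") or []
    frontmatter["tags"] = existing + [t for t in new_tags if t not in existing]
    frontmatter["processed"] = datetime.now().isoformat()
    return frontmatter

# Example: an existing "python" tag is kept once, "automation" is appended
# merge_tags({"tags": ["python"]}, ["python", "automation"])
```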
The script automatically creates log files in a `logs` directory next to the script. Each log file is named with the current date (`generate_tags_YYYY-MM-DD.log`) and contains detailed information about the script's execution, including any errors or warnings.
You can control the logging verbosity with these command-line options:
- `--debug`: Enable detailed debug logging
- `--log-level`: Set a specific logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
Example:
```bash
./generate_tags.py --log-level WARNING # Only show warnings and errors
```

Log files are helpful for troubleshooting and reviewing what happened during previous runs.
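A sketch of the dated-log setup this implies (handler details are assumptions; the `logs` directory and filename pattern come from above):

```python
import logging
from datetime import date
from pathlib import Path

def setup_logging(level="INFO"):
    """Write to logs/generate_tags_YYYY-MM-DD.log next to the script."""
    log_dir = Path(__file__).parent / "logs"
    log_dir.mkdir(exist_ok=True)
    logging.basicConfig(
        filename=str(log_dir / f"generate_tags_{date.today().isoformat()}.log"),
        level=getattr(logging, level.upper()),
        format="%(asctime)s %(levelname)s %(message)s",
    )
```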
Since the script only processes files that need attention, you can run it more frequently than traditional "daily" scripts:
Frequent runs (recommended):
```bash
# Every 2 hours
0 */2 * * * /path/to/generate_tags.py

# Every hour during work hours
0 9-17 * * * /path/to/generate_tags.py
```

Traditional daily run:

```bash
# Once daily at 1 AM
0 1 * * * /path/to/generate_tags.py
```

On macOS/Linux, add a crontab entry. To run the script every 2 hours:
```bash
0 */2 * * * /path/to/generate_tags.py
```

On Windows, use Task Scheduler:
- Open Task Scheduler
- Create a new Basic Task
- Set the trigger to your preferred frequency (hourly, daily, etc.)
- Set the action to "Start a program"
- Program: `python`
- Arguments: `C:\path\to\generate_tags.py`
No files found for processing:
- Check that your `INPUT_FOLDER` path is correct
- Verify that files aren't being excluded by `EXCLUDE_FOLDERS`
- Run with `--debug` to see detailed file scanning information
Processing timestamp issues:
- Files track processing time in their frontmatter with a `processed` field
- If you need to force reprocessing, remove the `processed` field from the frontmatter (see the sketch below)
- The 15-minute cooldown prevents rapid reprocessing
Rate limiting with OpenAI:
- Use `--delay 5` to add delays between API calls
- Consider using `--batch-mode` with smaller batch sizes
- Monitor your OpenAI usage dashboard
Large vault performance:
- Use `--batch-mode` for processing many files
- Adjust `--batch-size` based on your system and AI service capacity
- Use `--limit` to process files incrementally
MIT License