Obsidian Auto Tagger

A Python script that uses AI to automatically generate and add relevant tags to your Obsidian notes. It picks out notes that have never been processed or have changed since they were last processed, analyzes their content, and adds contextually appropriate tags to the frontmatter.

License: MIT

Key Features

  • 🔍 Intelligently processes notes based on modification timestamps with built-in cooldown mechanisms
  • 🤖 Uses AI (Ollama locally or OpenAI) to generate contextually relevant tags
  • 📊 Analyzes existing tags in your vault for consistency
  • 🏷️ Updates note frontmatter with new tags (without duplicating existing ones)
  • ⚡ Batch processing support for large vaults
  • ⏱️ Built-in rate limit handling and advanced retry mechanisms
  • 🛡️ Smart cooldown system prevents unnecessary reprocessing
  • 📝 Comprehensive logging for troubleshooting

Why Use This Script?

Maintaining a consistent tagging system in Obsidian can be challenging but is crucial for effective knowledge management. This script helps solve common tagging problems:

  • Consistency: By analyzing your entire vault's existing tags, it maintains a cohesive tagging system
  • Reduced Cognitive Load: No need to stop and think about appropriate tags while writing
  • Improved Discoverability: Better tagging means your notes are easier to find later
  • Time-Saving: Automatically processes new and modified notes intelligently
  • Knowledge Connections: Good tags help reveal connections between seemingly unrelated notes
  • Scalable: Batch processing handles large vaults efficiently

The script is especially useful as part of a regular note-taking workflow. It can run frequently without wasting resources, as it only processes files that need attention.

How It Works

The script uses an intelligent processing system:

  1. Never Processed: Files without a processed timestamp in their frontmatter are automatically processed
  2. Modified Since Processing: Files modified after their last processing timestamp are reprocessed (with a 15-minute cooldown to prevent rapid reprocessing)
  3. Ignore Buffer: Files processed within the last 15 minutes are always included to handle edge cases
  4. Batch Support: Large numbers of files can be processed in configurable batches to avoid overwhelming the AI service

This approach ensures notes get tagged when needed without unnecessary reprocessing.
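
As a rough illustration, the first two rules can be expressed as a small check like the one below. The names are purely illustrative (this is not the script's actual code), and exactly how the cooldown and ignore buffer interact is left to the implementation:

from datetime import datetime, timedelta

COOLDOWN = timedelta(minutes=15)   # mirrors the 15-minute window described above

def needs_processing(processed_at, modified_at, now=None):
    """Rough sketch of rules 1 and 2; illustrative only."""
    now = now or datetime.now()

    # Rule 1: never processed -- no 'processed' timestamp in the frontmatter.
    if processed_at is None:
        return True

    # Rule 2: modified since processing, once the 15-minute cooldown has
    # passed, so a note is not re-tagged right after the script wrote to it.
    return modified_at > processed_at and (now - processed_at) >= COOLDOWN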

Requirements

  • Python 3.7+
  • Obsidian vault with markdown files
  • One of the following:
    • Ollama (free, runs locally)
    • OpenAI API key (paid, cloud-based)
  • Required Python packages:
    • requests
    • python-dateutil
    • pyyaml
    • openai (optional, for OpenAI API)

Installation

  1. Clone this repository:
git clone https://github.com/undergroundpost/obsidian-auto-tagger.git
cd obsidian-auto-tagger
  2. Install the required packages:
pip install requests python-dateutil pyyaml
# If using OpenAI
pip install openai
  3. Copy and modify the example config file:
cp config.yaml.example config.yaml
# Edit config.yaml with your preferred settings
  4. Make the script executable (on Unix-like systems):
chmod +x generate_tags.py
  5. Place the prompt file in the same directory:
# Make sure generate_tags.md is in the same directory as the script

Configuration

You can configure the script in three ways:

  1. Config file: Create a config.yaml file in the same directory as the script
  2. Environment-specific locations: The script searches these locations in order (a sketch of this lookup follows the list):
    • Script directory: ./config.yaml
    • User config: ~/.config/generate_tags/config.yaml
    • System-wide: /etc/generate_tags/config.yaml
  3. Command-line options: Override settings for a single run
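
For illustration, the lookup described in option 2 amounts to trying each path in order and loading the first file that exists. A minimal sketch, assuming pyyaml and an illustrative function name:

import os
import yaml

CONFIG_LOCATIONS = [
    os.path.join(os.path.dirname(os.path.abspath(__file__)), "config.yaml"),  # script directory
    os.path.expanduser("~/.config/generate_tags/config.yaml"),                # user config
    "/etc/generate_tags/config.yaml",                                         # system-wide
]

def load_config():
    """Return settings from the first config file found, or an empty dict."""
    for path in CONFIG_LOCATIONS:
        if os.path.isfile(path):
            with open(path, "r", encoding="utf-8") as f:
                return yaml.safe_load(f) or {}
    return {}  # fall back to defaults and command-line options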

Config File Example

# Folder settings
INPUT_FOLDER: "/Users/username/Obsidian/Vault" 

# Exclude folders (list of folders to ignore when scanning)
EXCLUDE_FOLDERS:
  - "/Users/username/Obsidian/Vault/AI"            # AI-related folder
  - "/Users/username/Obsidian/Vault/Private"       # Private notes
  - "/Users/username/Obsidian/Vault/Templates"     # Templates folder

# LLM Provider settings
LLM_PROVIDER: "ollama"                             # Options: "ollama" or "openai"

# Ollama settings (used when LLM_PROVIDER is "ollama")
OLLAMA_MODEL: "gemma3:12b"                         # Model to use
OLLAMA_SERVER_ADDRESS: "http://localhost:11434"    # Ollama server address
OLLAMA_CONTEXT_WINDOW: 32000                       # Context window size

# OpenAI settings (used when LLM_PROVIDER is "openai")
OPENAI_API_KEY: ""                                 # Your OpenAI API key (required for OpenAI)
OPENAI_MODEL: "gpt-3.5-turbo"                      # OpenAI model to use
OPENAI_MAX_TOKENS: 4000                            # Maximum tokens for responses

Note: The script also supports the legacy EXCLUDE_FOLDER (singular) for backward compatibility, but EXCLUDE_FOLDERS (plural) is recommended.

Prompt File

The script requires a prompt file named generate_tags.md in the same directory as the script. This file contains instructions for the LLM on how to generate tags. A template is provided in the repository.

Usage

Basic usage

./generate_tags.py

Override configuration

./generate_tags.py --input "/path/to/vault" --model "llama3:8b" --server "http://192.168.1.100:11434"

Process limited number of files

./generate_tags.py --limit 10

Use batch processing for large vaults

./generate_tags.py --batch-mode --batch-size 25

Enable debug logging

./generate_tags.py --debug

OpenAI with rate limit protection

./generate_tags.py --provider openai --api-key "your-key" --delay 5

Command-line Options

Option         Description
--debug        Enable detailed debug logging
--input        Override input folder
--exclude      Override exclude folders (can be used multiple times)
--provider     Set LLM provider: "ollama" or "openai"
--model        Override model name (for either provider)
--server       Override Ollama server address
--api-key      Override OpenAI API key
--delay        Add delay between files in seconds (helps with API rate limits)
--log-level    Set logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
--limit        Limit number of files to process (0 for no limit)
--batch-mode   Enable batch processing for large numbers of files
--batch-size   Number of files per batch (default: 20)

Example with Multiple Exclude Folders

./generate_tags.py --exclude "/path/to/exclude1" --exclude "/path/to/exclude2"

Example for Large Vaults

./generate_tags.py --batch-mode --batch-size 30 --delay 2 --limit 100

This processes up to 100 files in batches of 30, with a 2-second delay between files.

Batch Processing

For large vaults with many files to process, the script offers batch processing:

  • Automatic Batching: Use --batch-mode to enable batching when processing many files
  • Configurable Batch Size: Set --batch-size to control how many files are processed together (default: 20)
  • Batch Pauses: 30-second pause between batches to avoid overwhelming the AI service
  • Progress Tracking: Clear progress indicators show batch and overall progress

Example for processing a large vault:

./generate_tags.py --batch-mode --batch-size 25
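
Conceptually, batching just means chunking the file list and sleeping between chunks. A minimal sketch of that pattern (the 30-second pause mirrors the behaviour described above; the function and its tag_note callback are illustrative, not the script's internals):

import time

BATCH_PAUSE = 30  # seconds between batches, as described above

def process_in_batches(files, tag_note, batch_size=20, delay=0):
    """Run tag_note() over files in chunks, pausing between chunks and files."""
    batches = [files[i:i + batch_size] for i in range(0, len(files), batch_size)]
    for n, batch in enumerate(batches, start=1):
        print(f"Batch {n}/{len(batches)}: {len(batch)} files")
        for path in batch:
            tag_note(path)
            if delay:
                time.sleep(delay)      # --delay: throttle between files
        if n < len(batches):
            time.sleep(BATCH_PAUSE)    # pause so the AI service isn't overwhelmed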

LLM Provider Support

The script supports two LLM providers for generating tags:

  1. Ollama (default): Use a local Ollama server running on your machine or network
  2. OpenAI: Use OpenAI's API (requires an API key)

Configuring OpenAI

To use OpenAI instead of Ollama, you need to:

  1. Install the OpenAI Python package:

    pip install openai
  2. Set your OpenAI API key in the config.yaml file:

    LLM_PROVIDER: "openai"
    OPENAI_API_KEY: "your-api-key-here"
    OPENAI_MODEL: "gpt-3.5-turbo"  # default, or use "gpt-4" for better results
  3. Alternatively, you can set these via command line:

    ./generate_tags.py --provider openai --api-key "your-api-key-here" --model "gpt-4"

The script uses "gpt-3.5-turbo" as the default OpenAI model, which provides a good balance between availability, cost, and quality. For even better tagging quality, you can use "gpt-4" or "o1-mini", though these may have different quota limits or costs.
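
For reference, the two providers are called quite differently under the hood: Ollama exposes a plain HTTP endpoint, while OpenAI is used through its official Python client (openai >= 1.0 here). The sketch below shows what each call can look like; it reuses the defaults from the config example above and is not necessarily how the script structures its requests:

import requests

def generate_with_ollama(prompt, model="gemma3:12b",
                         server="http://localhost:11434"):
    """Single non-streaming generation request against a local Ollama server."""
    resp = requests.post(
        f"{server}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

def generate_with_openai(prompt, api_key, model="gpt-3.5-turbo"):
    """Chat-completion request via the openai package (optional dependency)."""
    from openai import OpenAI
    client = OpenAI(api_key=api_key)
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content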

Processing Logic

The script uses intelligent processing logic to avoid unnecessary work:

File Selection Criteria

  1. Unprocessed Files: Any file without a processed timestamp in its frontmatter
  2. Modified Files: Files modified after their processed timestamp (with 15-minute cooldown)
  3. Recently Processed: Files processed within the last 15 minutes (ignore buffer for edge cases)

Cooldown System

  • 15-minute cooldown: Prevents rapid reprocessing of the same file
  • Ignore buffer: Recently processed files are always included to handle timing edge cases
  • Modification tracking: Only reprocesses files that have actually changed

This system ensures efficient processing while avoiding missed files due to timing issues.
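
To make the frontmatter side of this concrete, the sketch below merges freshly generated tags into a note and stamps the processed field. It assumes the standard Obsidian tags key and uses pyyaml; it is an illustration, not the script's actual code:

from datetime import datetime
import yaml

def update_frontmatter(text, new_tags):
    """Merge new_tags into 'tags' without duplicates and set 'processed'."""
    front, body = {}, "\n" + text
    if text.startswith("---"):
        _, raw, body = text.split("---", 2)
        front = yaml.safe_load(raw) or {}

    existing = front.get("tags") or []
    if isinstance(existing, str):          # a single tag written as a string
        existing = [existing]
    front["tags"] = existing + [t for t in new_tags if t not in existing]
    front["processed"] = datetime.now().isoformat()

    return "---\n" + yaml.safe_dump(front, sort_keys=False) + "---" + body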

Logging

The script automatically creates log files in a logs directory next to the script. Each log file is named with the current date (generate_tags_YYYY-MM-DD.log) and contains detailed information about the script's execution, including any errors or warnings.
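
Dated log files like these are typically set up with Python's standard logging module; a minimal sketch (the script's actual handler configuration may differ):

import logging
import os
from datetime import date

def setup_logging(script_dir, level=logging.INFO):
    """Log to logs/generate_tags_YYYY-MM-DD.log next to the script, and to the console."""
    log_dir = os.path.join(script_dir, "logs")
    os.makedirs(log_dir, exist_ok=True)
    log_file = os.path.join(log_dir, f"generate_tags_{date.today().isoformat()}.log")
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(message)s",
        handlers=[logging.FileHandler(log_file), logging.StreamHandler()],
    )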

You can control the logging verbosity with these command-line options:

  • --debug: Enable detailed debug logging
  • --log-level: Set a specific logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)

Example:

./generate_tags.py --log-level WARNING  # Only show warnings and errors

Log files are helpful for troubleshooting and reviewing what happened during previous runs.

Scheduling

Flexible Scheduling

Since the script only processes files that need attention, you can run it more frequently than traditional "daily" scripts:

Frequent runs (recommended):

# Every 2 hours
0 */2 * * * /path/to/generate_tags.py

# Every hour during work hours
0 9-17 * * * /path/to/generate_tags.py

Traditional daily run:

# Once daily at 1 AM
0 1 * * * /path/to/generate_tags.py

On Unix/Linux/macOS (cron)

To run the script every 2 hours:

0 */2 * * * /path/to/generate_tags.py

On Windows (Task Scheduler)

  1. Open Task Scheduler
  2. Create a new Basic Task
  3. Set the trigger to your preferred frequency (hourly, daily, etc.)
  4. Set the action to "Start a program"
  5. Program: python
  6. Arguments: C:\path\to\generate_tags.py

Troubleshooting

Common Issues

No files found for processing:

  • Check that your INPUT_FOLDER path is correct
  • Verify that files aren't being excluded by EXCLUDE_FOLDERS
  • Run with --debug to see detailed file scanning information

Processing timestamp issues:

  • Files track processing time in their frontmatter with a processed field
  • If you need to force reprocessing, remove the processed field from the frontmatter
  • The 15-minute cooldown prevents rapid reprocessing

Rate limiting with OpenAI:

  • Use --delay 5 to add delays between API calls
  • Consider using --batch-mode with smaller batch sizes
  • Monitor your OpenAI usage dashboard
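
The general remedy for rate limits is retry with exponential backoff, which is what the script's built-in retry handling amounts to conceptually. As a rough, generic illustration (not the script's actual implementation):

import time

def with_retries(call, max_attempts=5, base_delay=2.0):
    """Call call() and retry with exponential backoff on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:   # in practice, catch the provider's rate-limit error
            if attempt == max_attempts:
                raise
            wait = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {wait:.0f}s")
            time.sleep(wait)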

Large vault performance:

  • Use --batch-mode for processing many files
  • Adjust --batch-size based on your system and AI service capacity
  • Use --limit to process files incrementally

License

MIT License
