DataBooth/try-flockmtl

try-flockmtl

FlockMTL/Ollama DuckDB Integration

Experiments with the FlockMTL DuckDB extension.

Status

Status: Experimental

The FlockMTL extension currently appears to fail under DuckDB 1.2.2 with the error "GetAlterInfo not implemented for this type". This may be due to a version mismatch between the extension and DuckDB, although the documentation suggests that the extension is compatible with DuckDB 1.2.2.

Diagram

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph Local Environment
        PII_Data["PII Data<br/>(Sensitive)"] --> Schema_Inference[Schema Inference]
        PII_Data --> Deidentify[De-identification]
        Schema_Inference --> Synthetic[Generate Synthetic Data]
        
        Synthetic --> DuckDB["DuckDB"]
        Deidentify --> Deidentified["De-identified Data<br/>(Pseudonymised)"]
        Deidentified --> DuckDB
        
        DuckDB --> FlockMTL["FlockMTL Extension"]
        FlockMTL -->|Local Processing| Local_LLM["Ollama<br/>(Local LLM)"]
    end

    subgraph Cloud Environment
        MotherDuck["MotherDuck"] --> Cloud_LLM["Commercial LLM"]
        Synthetic --> MotherDuck
        Deidentified --> MotherDuck
    end

    %% Restricted flows
    PII_Data -.->|"❌ Not Allowed"| MotherDuck
    PII_Data -.->|"❌ Not Allowed"| Cloud_LLM
    FlockMTL -.->|"❌ PII"| Cloud_LLM

    style PII_Data fill:#ffcccc,stroke:#ff0000
    style Deidentified fill:#ccffcc,stroke:#00ff00
    style Synthetic fill:#ccccff,stroke:#0000ff
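
The restricted flows in the diagram can also be expressed as a small guard in code. The sketch below is illustrative only: the DataClass and Target enums and the flow_allowed function are hypothetical names that do not come from this repository, and the rule encodes nothing more than the "Not Allowed" edges drawn above (PII must not reach MotherDuck or a commercial LLM).

```python
from enum import Enum


class DataClass(Enum):
    PII = "pii"
    DEIDENTIFIED = "de-identified"
    SYNTHETIC = "synthetic"


class Target(Enum):
    LOCAL_LLM = "ollama"
    MOTHERDUCK = "motherduck"
    CLOUD_LLM = "commercial-llm"


# Targets that sit inside the diagram's Cloud Environment.
CLOUD_TARGETS = frozenset({Target.MOTHERDUCK, Target.CLOUD_LLM})


def flow_allowed(data: DataClass, target: Target) -> bool:
    """Return False only when PII is headed for a cloud target."""
    return not (data is DataClass.PII and target in CLOUD_TARGETS)
```

A check like this could run before any upload or remote prompt, so that a misrouted table fails fast instead of leaving the local environment.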

Why?

Modern data workflows increasingly require local, private, and flexible access to large language models (LLMs) for analytics, summarisation, and automation. This project enables you to:

  • Integrate local LLMs (via Ollama) directly with DuckDB using FlockMTL, so you can run LLM-powered SQL queries on your data.
  • Automate and manage LLM model registration and configuration using Python and TOML, making your setup reproducible and easy to maintain.
  • Debug and audit all SQL interactions for transparency and troubleshooting.

This pattern, in which LLMs augment SQL queries and data processing, is emerging across the data ecosystem. For example:

  • The commercial MotherDuck offering has built-in support for various LLMs via the prompt() function.
  • Similarly, Google Sheets now has built-in support for the Gemini suite of LLMs via the =AI() function.

What?

This repository provides:

  • Configuration-driven, object-oriented Python script (flockmtl_manager.py) to:
    • Check Ollama endpoint health
    • Register Ollama models with FlockMTL in DuckDB
    • Run test completions via SQL
    • Log all SQL commands (with Loguru) and export them to flockmtl.sql
  • Sample TOML config (flockmtl.toml) for easy model and endpoint management
  • Justfile for convenient CLI automation of common tasks (running scripts, managing Ollama models, etc.)

How?

1. Prerequisites

  • Python 3.13+
  • DuckDB installed (CLI or Python package)
  • Ollama installed and running (see Ollama docs)
  • FlockMTL DuckDB extension installed
  • Python dependencies:
uv add duckdb httpx loguru
  • (Optional) Just for task automation

2. Setup

a. Configure Models and Endpoint

Edit flockmtl.toml to specify your Ollama endpoint and models. For example:

[ollama]
api_url = "http://127.0.0.1:11434"

[[models]]
name = "mixtral_local"
ollama_name = "mixtral:latest"
context_window = 128000
max_output_tokens = 2048

[[models]]
name = "llama2_local"
ollama_name = "llama2:latest"
context_window = 4096
max_output_tokens = 1024

b. Start Ollama

ollama serve

or use the Justfile recipe:

just ollama-serve
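
Once the server is started, a quick reachability check can confirm it is listening. The sketch below is an assumption about how such a check could look, not the repository's implementation: ollama_is_up is a hypothetical helper, it calls Ollama's /api/tags endpoint (which lists local models), and it uses urllib from the standard library rather than httpx so it runs without extra packages.

```python
import urllib.error
import urllib.request


def ollama_is_up(api_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <api_url>/api/tags answers with HTTP 200."""
    url = api_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Covers refused connections, timeouts, HTTP errors and malformed URLs.
        return False
```

For the configuration shown earlier, the call would be ollama_is_up("http://127.0.0.1:11434"); a False result usually means ollama serve is not running or is bound to a different address.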

c. Register Models & Test

Run the Python manager script:

python flockmtl_manager.py

This will:

  • Check if the Ollama endpoint is available
  • Register the Ollama API secret with FlockMTL
  • Register each model from the TOML file
  • Run a test LLM completion via SQL
  • Log all SQL statements to flockmtl.log and flockmtl.sql
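
The registration steps above boil down to a handful of SQL statements. The sketch below builds them as strings so they can be inspected before execution. The statement shapes (CREATE SECRET with TYPE OLLAMA, CREATE MODEL with a JSON argument, and llm_complete) are assumptions modelled on FlockMTL's documented syntax and may differ between extension versions; the helper names are hypothetical and not taken from flockmtl_manager.py.

```python
import json


def sql_quote(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def secret_sql(api_url: str) -> str:
    # Assumed FlockMTL syntax for registering the Ollama endpoint.
    return f"CREATE SECRET (TYPE OLLAMA, API_URL {sql_quote(api_url)});"


def model_sql(model: dict) -> str:
    # Assumed FlockMTL syntax: alias, provider model name, provider, limits.
    limits = json.dumps(
        {
            "context_window": model["context_window"],
            "max_output_tokens": model["max_output_tokens"],
        }
    )
    return (
        f"CREATE MODEL({sql_quote(model['name'])}, "
        f"{sql_quote(model['ollama_name'])}, 'ollama', {limits});"
    )


def completion_sql(model_name: str, prompt: str) -> str:
    # Assumed shape of a FlockMTL scalar completion call.
    return (
        "SELECT llm_complete("
        f"{{'model_name': {sql_quote(model_name)}}}, "
        f"{{'prompt': {sql_quote(prompt)}}});"
    )


llama2 = {
    "name": "llama2_local",
    "ollama_name": "llama2:latest",
    "context_window": 4096,
    "max_output_tokens": 1024,
}
statements = [
    secret_sql("http://127.0.0.1:11434"),
    model_sql(llama2),
    completion_sql("llama2_local", "Say hello"),
]
```

With a DuckDB connection that has FlockMTL loaded, each string in statements would be passed to the connection's execute method in order; building them separately makes it easy to write the same strings to flockmtl.sql.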

d. Manage with Justfile

Use the included Justfile for tasks like:

just ollama-list         # List local Ollama models
just ollama-pull llama2  # Pull a model
just run                 # Run the manager script

3. Troubleshooting

  • Extension errors: Ensure the FlockMTL extension matches your DuckDB version and platform. FlockMTL is distributed as a community extension, so use INSTALL flockmtl FROM community; LOAD flockmtl; in DuckDB, or see DuckDB extension docs.
  • Hanging on test completion:
    • Make sure Ollama is running and the specified model is available.
    • Test the model directly: ollama run llama2 "Hello"
  • SQL errors or debugging:
    • Check flockmtl.sql for all executed SQL statements.
    • Review flockmtl.log for detailed logs.
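
When a test completion hangs, one useful check is whether every configured model has actually been pulled. The sketch below compares configured names against the body of an Ollama /api/tags response. The function names are hypothetical, and SAMPLE_TAGS is a hand-written example of the response shape (a "models" list whose entries carry a "name"), not real server output.

```python
import json


def with_default_tag(name: str) -> str:
    """Ollama treats a bare model name as the 'latest' tag."""
    return name if ":" in name else f"{name}:latest"


def missing_models(configured: list[str], tags_json: str) -> list[str]:
    """Return configured models that are absent from an /api/tags body."""
    payload = json.loads(tags_json)
    installed = {entry["name"] for entry in payload.get("models", [])}
    return [name for name in configured if with_default_tag(name) not in installed]


# Hand-written sample shaped like an /api/tags response (not real output).
SAMPLE_TAGS = '{"models": [{"name": "llama2:latest"}]}'

not_pulled = missing_models(["llama2", "mixtral:latest"], SAMPLE_TAGS)
```

Any name left in not_pulled would need an ollama pull (or just ollama-pull) before FlockMTL can use it.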

4. Project Structure

.
├── flockmtl_manager.py   # Main Python script (OO, config-driven)
├── flockmtl.toml         # Model & endpoint configuration
├── flockmtl.sql          # All executed SQL commands (for debugging)
├── flockmtl.log          # Loguru logs (SQL and actions)
└── Justfile              # Task automation commands

5. Extending

  • Add more models to flockmtl.toml as needed.
  • Customise or extend the Python script for advanced workflows (e.g., model removal, batch prompts).
  • Use the SQL log for bug reports or reproducibility.

Example Usage

# List available tasks
just

# Start Ollama server
just ollama-serve

# Pull a new model
just ollama-pull llama2

# Run the integration script
just run

Compatibility Notes

  • DuckDB and FlockMTL extension versions must be compatible. If you see errors about GetAlterInfo not implemented for this type, update your FlockMTL extension to match your DuckDB version.
  • Ollama models must be pulled before use.
  • Python 3.13+ is required by this project; built-in TOML parsing (tomllib) is available from Python 3.11.

License

Apache-2.0 License

This project is licensed under the Apache-2.0 License. See the LICENSE file for details.


Credits


Questions or issues?

Open an issue or contact github@databooth.com.au.
