crowaltz24/DepoIndex
DepoIndex

DepoIndex is an automated, AI-powered workflow that reads deposition transcripts, detects every distinct subject discussed, and produces a Table of Contents (TOC). For each topic, the system lists the starting page number and line number, then outputs the results as a chronologically ordered Topic Index.
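Reporting a topic's starting page and line first requires indexing the transcript by page and line. The following is a minimal sketch, not DepoIndex's actual parser. It assumes a plain-text transcript in which pages are separated by form feeds (as `pdftotext` emits) and each line begins with its margin line number. `TranscriptLine` and `index_transcript` are hypothetical names:

```python
import re
from dataclasses import dataclass

# ASSUMED layout: pages separated by form feeds ("\f", as pdftotext emits)
# and every transcript line starting with its margin line number.
LINE_RE = re.compile(r"^\s*(\d{1,2})\s+(\S.*)$")


@dataclass(frozen=True)
class TranscriptLine:  # hypothetical record type, not part of DepoIndex
    page: int  # 1-based position of the page in the file
    line: int  # line number printed in the margin
    text: str


def index_transcript(raw: str) -> list[TranscriptLine]:
    """Split raw transcript text into (page, line, text) records."""
    records = []
    for page_no, page in enumerate(raw.split("\f"), start=1):
        for row in page.splitlines():
            match = LINE_RE.match(row)
            if match:  # headers, footers and bare page numbers do not match
                number, text = match.groups()
                records.append(TranscriptLine(page_no, int(number), text.rstrip()))
    return records
```

Page numbers here are positions in the file. If a transcript has unnumbered cover pages, an offset would be needed to match the printed page numbers.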

DepoIndex lets clerks generate an accurate, paginated topic index from a deposition in minutes. This eliminates manual scanning and lets judges jump directly to any point of interest.

To-Do

  • Topic Extraction
  • Table of Contents Generation
  • Validation Notebook
  • CLI wrapper

Requirements

  • Python 3.12
  • Ollama with the following models pulled:
    • Gemma3 (4B) for topic extraction.
    • DeepSeek-R1 (8B) for reasoning-based validation.
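Topic extraction runs against the local Gemma3 model served by Ollama. The sketch below shows one way to make that call through Ollama's REST API (`POST /api/generate` on the default port 11434), using only the standard library. It is an illustration, not DepoIndex's code. The `gemma3:4b` tag, the prompt wording, the `[page:line]` line prefixes, and the `topic`/`page`/`line` answer shape are all assumptions:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
EXTRACTION_MODEL = "gemma3:4b"  # ASSUMED tag for Gemma3 (4B)
REQUIRED_KEYS = ("topic", "page", "line")  # ASSUMED answer schema


def build_extraction_request(chunk: str) -> dict:
    """Build a non-streaming /api/generate payload that asks for JSON output."""
    prompt = (
        "List every distinct subject discussed in this deposition excerpt. "
        "Each line starts with [page:line]. Answer in JSON as "
        '{"topics": [{"topic": str, "page": int, "line": int}]}, '
        "giving the page and line where each subject starts.\n\n" + chunk
    )
    return {"model": EXTRACTION_MODEL, "prompt": prompt, "stream": False, "format": "json"}


def parse_topics(response_body: str) -> list[dict]:
    """Extract well-formed topic objects from an Ollama response body."""
    answer = json.loads(json.loads(response_body)["response"])
    if isinstance(answer, dict):  # format="json" yields a JSON value, often an object
        answer = next((v for v in answer.values() if isinstance(v, list)), [])
    if not isinstance(answer, list):
        return []
    return [t for t in answer if isinstance(t, dict) and all(k in t for k in REQUIRED_KEYS)]


def extract_topics(chunk: str) -> list[dict]:
    """Send one transcript chunk to a locally running Ollama server."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_extraction_request(chunk)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        return parse_topics(resp.read().decode("utf-8"))
```

Filtering out objects that lack a page or line keeps malformed model output from reaching the table of contents.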

Setup

  1. Clone the repo:

    git clone https://github.com/crowaltz24/DepoIndex
    cd DepoIndex
  2. Ensure your directory structure is as follows:

    DepoIndex
    |_ /inputs
    |_ /outputs
    |_ /utils
    |_ /validation
    
  3. Create and activate a virtual environment:

    python -m venv venv
    venv\Scripts\activate          # Windows
    source venv/bin/activate       # macOS / Linux
  4. Install requirements:

    pip install -r requirements.txt
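Git does not track empty folders, so /inputs and /outputs may be missing from a fresh clone. Below is a small hypothetical helper for checking the layout. `check_layout` is not part of the repo. It creates the data folders if absent and reports any missing code folders:

```python
from pathlib import Path

DATA_DIRS = ("inputs", "outputs")    # may be absent: Git does not track empty folders
CODE_DIRS = ("utils", "validation")  # should arrive with the clone itself


def check_layout(root: Path) -> tuple[list[str], list[str]]:
    """Create missing data folders under root; return (created, missing_code_dirs)."""
    created = []
    for name in DATA_DIRS:
        folder = root / name
        if not folder.is_dir():
            folder.mkdir()
            created.append(name)
    missing = [name for name in CODE_DIRS if not (root / name).is_dir()]
    return created, missing
```

Call `check_layout(Path.cwd())` from the repository root. Code folders are only reported, never created, because an empty `utils` would hide a broken clone.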

How to Use

CLI

  1. Place your deposition transcript in /inputs.

  2. Run build_toc.py, passing the input and output file names:

    python build_toc.py --file deposition.pdf --out toc.docx
  • --file accepts input path (relative to /inputs)
    • This may be in .pdf or .txt format.
  • --out accepts output path (relative to /outputs)
    • This may be in .md or .docx format.
  3. Access your outputs from the /outputs directory.

Manual Usage

Run depoindex.py.

OR

Run the scripts one by one:

  • topic_extraction.py extracts topics into extracted_topics.json.
  • toc_generator.py uses the extracted topics to generate a table of contents, saving it in Markdown and .docx formats.
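The two scripts hand off through the JSON file. Assuming extracted_topics.json holds a list of objects with `topic`, `page` and `line` keys (an assumed schema), the Markdown side of the table of contents could be produced as follows. `build_toc_markdown` and `write_toc` are hypothetical names:

```python
import json
from pathlib import Path


def build_toc_markdown(topics: list[dict]) -> str:
    """Render topics as a Markdown table, ordered by where each first appears."""
    ordered = sorted(topics, key=lambda t: (int(t["page"]), int(t["line"])))
    rows = ["| Topic | Page | Line |", "| --- | ---: | ---: |"]
    seen = set()
    for t in ordered:
        name = t["topic"].strip()
        if name.lower() in seen:  # keep only the first mention of a topic
            continue
        seen.add(name.lower())
        cell = name.replace("|", r"\|")  # escape pipes so the table stays intact
        rows.append(f"| {cell} | {int(t['page'])} | {int(t['line'])} |")
    return "\n".join(rows) + "\n"


def write_toc(json_path: Path, md_path: Path) -> None:
    """Read the extracted topics and write the Markdown table of contents."""
    topics = json.loads(json_path.read_text(encoding="utf-8"))
    md_path.write_text(build_toc_markdown(topics), encoding="utf-8")
```

Sorting on `(page, line)` gives the chronological order described in the introduction. Case-insensitive de-duplication keeps each topic's first mention only. The .docx output is not covered by this sketch.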

Validation

validation.ipynb is a notebook for spot-checking the accuracy of extracted topics.

It draws a random sample from your extracted topics and asks a reasoning LLM (DeepSeek-R1) to compare each topic with an excerpt of the transcript it was generated from, judging whether the topic is accurate.

NOTE: Change number_of_topics to validate as many topics as you want.

License

MIT License.
