DepoIndex

DepoIndex is an automated, AI-powered workflow that reads deposition transcripts, detects every distinct subject discussed, and produces a Table of Contents (TOC). For each topic, the system lists the starting page number and line number, then outputs the results as a chronologically ordered Topic Index.

DepoIndex enables clerks to generate an accurate, paginated topic index from a deposition in minutes, eliminating manual scanning and ensuring that judges can jump directly to any point of interest.

To-Do

- ~~Topic Extraction~~
- ~~Table of Contents Generation~~
- ~~Validation Notebook~~
- ~~CLI wrapper~~

Requirements

Python 3.12
Ollama with the following models pulled:
- Gemma3 (4B) for topic extraction.
- DeepSeek-R1 (8B) for reasoning-based validation.

Setup

Clone repo:

git clone https://github.com/crowaltz24/DepoIndex
cd DepoIndex

Ensure your directory structure is as follows:

DepoIndex
|_ /inputs
|_ /outputs
|_ /utils
|_ /validation
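
To confirm the layout above before running anything, a small check along these lines may help. This is only a sketch: the `missing_dirs` helper and the `REQUIRED_DIRS` constant are hypothetical names and are not part of the repository.

```python
from pathlib import Path

# Folder names taken from the directory structure listed above.
REQUIRED_DIRS = ("inputs", "outputs", "utils", "validation")


def missing_dirs(root):
    """Return the required folder names that do not exist under `root`."""
    root = Path(root)
    return [name for name in REQUIRED_DIRS if not (root / name).is_dir()]


# Check the current working directory (run this from the repo root).
missing = missing_dirs(".")
if missing:
    print("Missing folders:", ", ".join(missing))
else:
    print("Directory structure looks complete.")
```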

Create venv:

python -m venv venv
venv\Scripts\activate        # Windows
source venv/bin/activate     # macOS/Linux

Install requirements:
```
pip install -r requirements.txt
```

How to Use

CLI

Place your Deposition Transcript in /inputs.

Run the script, providing input and output file names.

python build_toc.py --file deposition.pdf --out toc.docx

--file accepts the input path (relative to /inputs).
- This may be in .pdf or .txt format.
--out accepts the output path (relative to /outputs).
- This may be in .md or .docx format.

Access your outputs from the /outputs directory.
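
The way the two flags map onto the /inputs and /outputs folders can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual contents of build_toc.py: the `resolve_paths` function and the constants are hypothetical, and only the flag names, folders, and accepted extensions come from the description above.

```python
import argparse
from pathlib import Path

# Folders and accepted extensions as described in this README.
INPUT_DIR = Path("inputs")
OUTPUT_DIR = Path("outputs")
INPUT_EXTS = {".pdf", ".txt"}
OUTPUT_EXTS = {".md", ".docx"}


def resolve_paths(argv=None):
    """Parse --file/--out and return (input_path, output_path)."""
    parser = argparse.ArgumentParser(description="Build a TOC from a deposition.")
    parser.add_argument("--file", required=True, help="input name, relative to inputs/")
    parser.add_argument("--out", required=True, help="output name, relative to outputs/")
    args = parser.parse_args(argv)

    src = INPUT_DIR / args.file
    dst = OUTPUT_DIR / args.out

    # Reject unsupported formats early; parser.error exits with status 2.
    if src.suffix.lower() not in INPUT_EXTS:
        parser.error(f"--file must end in one of {sorted(INPUT_EXTS)}")
    if dst.suffix.lower() not in OUTPUT_EXTS:
        parser.error(f"--out must end in one of {sorted(OUTPUT_EXTS)}")
    return src, dst


# Example with an explicit argument list instead of the real command line.
src, dst = resolve_paths(["--file", "deposition.pdf", "--out", "toc.docx"])
print(src, dst)
```

Checking the extension before any processing starts means a mistyped file name fails immediately rather than after the models have run.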

Manual Usage

Run depoindex.py.

OR

Run the scripts one by one:

topic_extraction.py extracts topics into extracted_topics.json.
topc_generator.py uses the extracted topics to generate a table of contents, saving it in Markdown and docx formats.
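
As a rough picture of the second step, the sketch below turns a list of extracted topics into a chronologically ordered Markdown index. The schema is an assumption: the real extracted_topics.json may use different field names than the `topic`, `page`, and `line` keys shown here, and `build_toc_markdown` is a hypothetical helper, not the repository's function.

```python
def build_toc_markdown(topics):
    """Render topics as a Markdown list ordered by page, then line."""
    # Sorting on (page, line) gives the chronological order of the transcript.
    ordered = sorted(topics, key=lambda t: (t["page"], t["line"]))
    lines = ["# Table of Contents", ""]
    for t in ordered:
        lines.append(f"- {t['topic']} (page {t['page']}, line {t['line']})")
    return "\n".join(lines)


# Made-up entries in the assumed schema, deliberately out of order.
sample_topics_data = [
    {"topic": "Witness background", "page": 12, "line": 3},
    {"topic": "Opening stipulations", "page": 4, "line": 17},
]
print(build_toc_markdown(sample_topics_data))
```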

Validation

validation.ipynb is... a validation notebook.

It takes a random sample of your extracted topics and passes each one to a reasoning LLM, which compares the topic against an excerpt of the text it was generated from to judge its accuracy.

NOTE: Change number_of_topics to control how many topics are validated.
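
The sampling step can be pictured with the sketch below. It is not the notebook's code: `sample_topics` and `build_validation_prompt` are hypothetical helpers, the prompt wording is invented for illustration, and the call to the reasoning model is left out entirely. Only the `number_of_topics` name comes from the notebook.

```python
import random


def sample_topics(topics, number_of_topics, seed=None):
    """Pick a random subset of topics, capped at the number available."""
    rng = random.Random(seed)  # pass a seed for a reproducible sample
    k = min(number_of_topics, len(topics))
    return rng.sample(topics, k)


def build_validation_prompt(topic, excerpt):
    """Assemble a prompt asking a model whether the topic fits the excerpt."""
    return (
        "Does the following topic accurately describe the excerpt?\n"
        f"Topic: {topic}\n"
        f"Excerpt: {excerpt}\n"
        "Answer YES or NO with a short reason."
    )


# Made-up topic names, for illustration only.
all_topics = ["Opening stipulations", "Witness background", "Chain of custody"]
for topic in sample_topics(all_topics, number_of_topics=2, seed=0):
    print(build_validation_prompt(topic, "(transcript excerpt goes here)"))
```

Capping the sample size at the number of available topics avoids the `ValueError` that `random.sample` raises when asked for more items than the list holds.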

License

MIT License.
