DepoIndex is an automated, AI-powered workflow that reads deposition transcripts, detects every distinct subject discussed, and produces a Table of Contents (TOC). For each topic, the system lists the starting page number and line number, then outputs the results as a chronologically ordered Topic Index.
DepoIndex enables clerks to generate an accurate, paginated topic index from a deposition in minutes - eliminating manual scanning and ensuring that judges can jump directly to any point of interest.
- Topic Extraction
- Table of Contents Generation
- Validation Notebook
- CLI wrapper
- Python 3.12
- Ollama with the following models pulled:
  - Gemma3 (4B) for topic extraction.
  - DeepSeek-R1 (8B) for reasoning-based validation.
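One way to confirm that both models are pulled before running anything is to ask the local Ollama server which models it has. The sketch below assumes Ollama's default address (`http://localhost:11434`) and its model-listing endpoint (`/api/tags`). The tags `gemma3:4b` and `deepseek-r1:8b` are assumptions inferred from the sizes listed above, so adjust them to whatever you actually pulled. `missing_models` and `fetch_tags` are hypothetical helper names, not part of DepoIndex.

```python
import json
from urllib.request import urlopen

# Assumed model tags; change these to match the models you pulled.
REQUIRED_MODELS = ("gemma3:4b", "deepseek-r1:8b")


def missing_models(tags_payload: dict, required=REQUIRED_MODELS) -> list[str]:
    """Return the required model tags that are absent from the server's list."""
    installed = {m.get("name", "") for m in tags_payload.get("models", [])}
    return [name for name in required if name not in installed]


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the list of locally available models from a running Ollama server."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return json.load(response)
```

With Ollama running, `missing_models(fetch_tags())` returns an empty list when both models are available.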
- Clone repo:
    git clone https://crowaltz24/DepoIndex
    cd DepoIndex
- Ensure your directory structure is as follows:
    DepoIndex
    |_ /inputs
    |_ /outputs
    |_ /utils
    |_ /validation
- Create venv:
    python -m venv venv
    venv/scripts/activate
- Install requirements:
    pip install -r requirements.txt
- Place your Deposition Transcript in /inputs.
- Run the script, providing input and output file names.
    python build_toc.py --file deposition.pdf --out toc.docx
  - --file accepts the input path (relative to /inputs). This may be in .pdf or .txt format.
  - --out accepts the output path (relative to /outputs). This may be in .md or .docx format.
- Access your outputs from the /outputs directory.
Run depoindex.py.
OR
Run the scripts one by one:
- topic_extraction.py extracts topics into extracted_topics.json.
- topc_generator.py uses the extracted topics to generate a table of contents, saving it in Markdown and docx formats.
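The hand-off between the two scripts can be pictured as one step writing a JSON file and the next reading it back to render the table of contents. The sketch below covers the Markdown output only. The schema (a list of objects with `topic`, `page`, and `line` keys) and the function names `save_topics` and `render_markdown_toc` are assumptions for illustration, since the real layout of `extracted_topics.json` is not documented here.

```python
import json
from pathlib import Path


def save_topics(topics: list[dict], path: Path) -> None:
    """Write extracted topics to disk (assumed schema: topic, page, line)."""
    path.write_text(json.dumps(topics, indent=2), encoding="utf-8")


def render_markdown_toc(path: Path) -> str:
    """Read the topics file and return a Markdown table ordered by page and line."""
    topics = json.loads(path.read_text(encoding="utf-8"))
    topics.sort(key=lambda t: (t["page"], t["line"]))
    rows = ["| Topic | Page | Line |", "| --- | --- | --- |"]
    rows += [f"| {t['topic']} | {t['page']} | {t['line']} |" for t in topics]
    return "\n".join(rows)
```

Keeping the intermediate file on disk means the table of contents can be regenerated without repeating the slower extraction step.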
validation.ipynb is... a validation notebook.
It takes a random sample of your extracted topics and passes each one to a reasoning LLM, which compares the topic against an excerpt of the text it was generated from to judge accuracy.
NOTE: You can change the number_of_topics to validate on as many topics as you want.
MIT License.