@haitmo haitmo commented Oct 17, 2025

No description provided.

haitmo and others added 3 commits October 10, 2025 09:14
* refactor: simplify connection setup by removing retry logic and pooling configuration

* fix: add endpoint to Azure translation client configuration

* feat: split embedding model demand between questions and documents (#507)

* feat: split embedding model demand between questions and documents

* Clean up comments in llm.py

* fix: delete Q&A documents when Chat message is deleted (#515)

* fix: prevent error when pure keyword Q&A returns no source (#516)

* fix: update broken user guide links (#513)

updated user guide links

* fix: caught exceptions for users with no email in entra_sync

* fix: NLTK cache in Dockerfile, quieter logs, Entra user skip on blank email (#517)

* feat: split embedding model demand between questions and documents

* fix: enhance logging during markdown extraction and node creation

* feat: add log filtering for specific request keywords

* feat: add logging filter to raise log level for specific endpoints

* feat: enhance logging during markdown extraction process

* feat: add logging for markdown splitting process and page tag fixing

* fix: handle missing email and integrity errors during user update or creation

* feat: preload NLTK stopwords to avoid runtime errors in llama_index SentenceSplitter

* feat: preload NLTK "punkt" and "stopwords" corpora to prevent runtime failures in llama_index SentenceSplitter

* feat: preload additional NLTK corpora for enhanced NLP functionality in llama_index

* feat: integrate WTPSplit for improved sentence splitting in MarkdownSplitter and preload model data

* fix: update Python dependency installation process to ensure latest setuptools is used

* fix: correct import path for WTPSplit in Dockerfile to ensure proper functionality

* fix: update Dockerfile to set NLTK_DATA environment variable and remove WTPSplit model data preload

* fix: refactor MarkdownSplitter to use SentenceSplitter for sentence splitting

* fix: add download of 'punkt_tab' NLTK resource for enhanced NLP functionality

* fix: refactor MarkdownSplitter to remove WTPSplit dependency and improve logging

* fix: remove unnecessary logging in process_document_helper and extract_markdown functions

* fix: remove unnecessary comments in OttoLLM class to improve code clarity

---------

Co-authored-by: otto-jumpbox <otto@justice.gc.ca>

---------

Co-authored-by: otto-jumpbox <otto@justice.gc.ca>
Co-authored-by: jannable <Jason.Annable@justice.gc.ca>
Co-authored-by: jannable <jason@magnara.ca>
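
For reviewers unfamiliar with the "logging filter" commits above, here is a minimal sketch of one way such a filter could work with Python's standard `logging` module. It is not the code from this PR: the class name `EndpointLevelFilter`, the keyword list, and the reading of "raise log level" as "require a higher severity for matching records" are all assumptions made for illustration.

```python
import logging


class EndpointLevelFilter(logging.Filter):
    """Hypothetical filter: require a higher level for records naming noisy endpoints."""

    def __init__(self, keywords, min_level=logging.WARNING):
        super().__init__()
        self.keywords = tuple(keywords)  # substrings to look for (assumed values)
        self.min_level = min_level       # threshold applied only to matching records

    def filter(self, record):
        message = record.getMessage()
        if any(keyword in message for keyword in self.keywords):
            # Matching records pass only at or above the raised threshold.
            return record.levelno >= self.min_level
        # All other records pass through unchanged.
        return True
```

A filter like this would typically be attached with `handler.addFilter(EndpointLevelFilter(["/healthz"]))`, where `/healthz` is a placeholder path, so routine requests to that endpoint stay out of the logs while warnings and errors still appear.
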
Add cost warning buttons div to chat message template
* perf: single pgvector db connection per app instance

* Limit embedding sleep time to a maximum of 64 seconds

---------

Co-authored-by: Jules Kuehn <jk@jules.lol>
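
The "maximum of 64 seconds" commit above suggests a capped retry delay. The sketch below shows one common pattern, exponential backoff with a ceiling. The doubling schedule, the function names `backoff_delay` and `call_with_backoff`, and the default attempt count are assumptions; only the 64-second cap comes from the commit message.

```python
import time

MAX_SLEEP_SECONDS = 64  # cap taken from the commit message


def backoff_delay(attempt, base=1.0, cap=MAX_SLEEP_SECONDS):
    """Return the delay for a zero-based attempt: base * 2**attempt, never above cap."""
    return min(cap, base * (2 ** attempt))


def call_with_backoff(func, max_attempts=8, sleep=time.sleep):
    """Call func, retrying on any exception with a capped, doubling delay (hypothetical helper)."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(attempt))
```

With `base=1.0` the delays run 1, 2, 4, 8, 16, 32, 64 seconds and then stay at 64, so a long outage of the embedding service cannot push a single wait beyond the cap.
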
@github-actions

Coverage report

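| File | Statements | Missing | Coverage | Coverage (new stmts) | Lines missing |
| --- | --- | --- | --- | --- | --- |
| **django/chat** | | | | | |
| llm.py | | | | | 208-209, 235-248, 581, 587-590, 593, 596-603, 610-642 |
| models.py | | | | | 767-768 |
| responses.py | | | | | |
| utils.py | | | | | 515-528 |
| **django/laws** | | | | | |
| tasks.py | | | | | |
| views.py | | | | | |
| **django/librarian** | | | | | |
| tasks.py | | | | | 50-59, 184-187 |
| **django/librarian/utils** | | | | | |
| markdown_splitter.py | | | | | |
| **django/otto** | | | | | |
| views.py | | | | | |
| **django/otto/utils** | | | | | |
| logging.py | | | | | 38 |
| **Project Total** | | | | | |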

This report was generated by python-coverage-comment-action
