Skip to content

Conversation

@jun-create
Copy link
Collaborator

@jun-create jun-create commented Sep 17, 2025

User description

Summary

Fixes #166


Changes Made

  • Modified to do parallel loading of files from sequential loading of files on images_UI, tables_UI
  • Fixed embed results on images_UI
  • Fixed timeout on chat_UI
  • Chat now works

Context / Rationale

Feature refining


Related Docs or References

image

General Checklist

  • I have tested these changes locally
  • I have updated relevant documentation or added comments where needed
  • I have linked relevant issues and tagged reviewers
  • I have followed coding conventions and naming standards

PR Type

Enhancement, Bug fix


Description

  • Implement parallel loading for images and tables UI

  • Fix embedding results display and timeout handling

  • Refactor document components for better reusability

  • Update chat prompts and add timeout configuration


Diagram Walkthrough

flowchart LR
  A["Sequential Loading"] --> B["Parallel Loading"]
  B --> C["Images UI"]
  B --> D["Tables UI"]
  E["Chat Timeout"] --> F["300s Timeout"]
  G["Embedding Display"] --> H["Small Table Format"]
  I["Document Components"] --> J["Reusable Components"]
Loading

File Walkthrough

Relevant files
Enhancement
rag_config.py
Update RAG validation examples and chunk formatting           

chat_service/models/rag_config.py

  • Updated validation examples with more generic document-focused queries
  • Removed similarity score from chunk headers for cleaner output
+8/-17   
2_images_UI.py
Implement parallel loading and enhance image extraction UI

frontend/my_pages/2_images_UI.py

  • Implemented parallel loading using asyncio.gather() for multiple
    documents
  • Added reusable document components import and usage
  • Enhanced error handling with proper status messages and timeout
    handling
  • Fixed embedding display to show results in small table format
  • Added refresh button and improved UI layout
+179/-168
3_tables_UI.py
Implement parallel loading for table extraction                   

frontend/my_pages/3_tables_UI.py

  • Refactored to use parallel loading with asyncio.gather()
  • Integrated reusable document components
  • Improved error handling and status reporting
  • Added refresh button and cleaner UI layout
+63/-148
Bug fix
5_chat_UI.py
Add timeout and update chat prompts                                           

frontend/my_pages/5_chat_UI.py

  • Added 300-second timeout to HTTP client configuration
  • Updated preset query button prompts to be more document-specific
+4/-4     
7_metadata_UI.py
Improve metadata display error handling                                   

frontend/my_pages/7_metadata_UI.py

  • Enhanced error handling for metadata display
  • Added proper fallback messages for failed processing or no metadata
+7/-1     

@jun-create
Copy link
Collaborator Author

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces parallel loading for images and tables in the frontend, which is a great performance improvement. The code has been refactored to use asyncio.gather effectively. Additionally, several fixes are included, such as increasing the timeout for chat requests and improving error handling and status reporting across different UI pages. My review focuses on a few areas for improvement: correcting a blocking I/O call in an async function, fixing incorrect or missing type hints to improve code quality and maintainability, and cleaning up some duplicated code. Overall, these are solid changes that enhance the application's performance and robustness.

jun-create and others added 4 commits September 17, 2025 15:00
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@jun-create jun-create marked this pull request as ready for review September 17, 2025 08:54
@jun-create jun-create requested a review from MingJ7 September 17, 2025 08:54
@codiumai-pr-agent-free
Copy link

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis ✅

166 - Fully compliant

Compliant requirements:

  • Chat page
    • User input
    • Single file rag
    • Multiple file rag
    • Assistant reply
  • Pending check chat backend
⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Unused Variable

The image_bytes variable is initialized but may not be assigned a value before being referenced in the conditional check and download button logic.

image_bytes = None  # Initialize to handle scope issues

with col1:
    # Display actual image from URL
    if "detail" in image_data:
        st.error("An error occurred while loading the image.")
    else:
        # Fetch image with authenticated client
        with httpx.Client(
            cookies=st.session_state.httpx_cookies
        ) as client:
            img_response = client.get(image_url)
Potential Error

The display_tables function attempts to convert a DataFrame to CSV even when df might be None, which could cause runtime errors.

if df is not None:
    csv_string = df.to_csv(index=False)
    st.code(csv_string, language="csv")
Commented Code

There's a commented line of code that appears to be a duplicate of the line that was replaced, which could cause confusion during maintenance.

# chunk_header = f"--- Document Section {i + 1} (Relevance: {chunk.get('similarity_score', 0):.2f}) ---"             chunk_header = f"--- Document Section {i + 1} (Relevance: {chunk.get('similarity_score', 0):.2f}) ---"
chunk_header = f"--- Document Section {i + 1} ---"

@codiumai-pr-agent-free
Copy link

codiumai-pr-agent-free bot commented Sep 17, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Improve embedding process efficiency

Refactor the display_all function to initiate and poll for each document's
embedding status in a single concurrent task, improving UI responsiveness by
showing results as they become available.

frontend/my_pages/2_images_UI.py [176-186]

 async def display_all(expanders: list[DocumentExpander]):
     displays = [display_images(expander) for expander in expanders]
     await asyncio.gather(*displays, return_exceptions=True)
     with st.spinner("Embedding document for RAG..."):
-        # Post - starts embedding, response from embed_pdf is 202
-        embed_start = [embed_pdf(expander.doc_id, "semantic") for expander in expanders]
-        await asyncio.gather(*embed_start, return_exceptions=True)
+        # Start embedding and poll for each document individually
+        embedding_tasks = []
+        for expander in expanders:
+            async def embed_and_display(expander):
+                await embed_pdf(expander.doc_id, "semantic")
+                await display_embedding(expander)
+            embedding_tasks.append(embed_and_display(expander))
+        
+        await asyncio.gather(*embedding_tasks, return_exceptions=True)
 
-        # Get - checks if embedding is done
-        embed_response = [display_embedding(expander) for expander in expanders] 
-        await asyncio.gather(*embed_response, return_exceptions=True)
-
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies an inefficient async pattern and proposes a better one that improves UI responsiveness by processing and displaying results per document, rather than in separate bulk stages.

Medium
  • Update

Copy link
Owner

@NotYuSheng NotYuSheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved! Will verify status before merging

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Sub-Issue] Chat page

4 participants