Created report generator #1
base: main
Conversation
📝 Walkthrough
This PR introduces a complete clinical report generator application that reads DAIC analysis data, generates clinical reports via Google's Gemini API, and exports them as PDF documents. It includes project configuration files, comprehensive documentation, repository management, application logic with end-to-end orchestration, and sample analysis data.
Sequence Diagram
sequenceDiagram
actor User
participant App as ClinicalReportGenerator
participant FileSystem as File System
participant API as Gemini API
participant Converter as PDF Converter
User->>App: run()
activate App
App->>FileSystem: read_daic_analysis_report()
alt Input file exists
FileSystem-->>App: Analysis data
else Input file missing
App->>FileSystem: Generate sample data
FileSystem-->>App: Sample data
end
App->>API: generate_clinical_analysis_report(input_data)
activate API
API-->>App: Markdown report
deactivate API
App->>Converter: create_pdf_report(markdown_content)
activate Converter
Converter->>Converter: Convert Markdown to HTML
Converter->>Converter: Apply CSS styling
Converter->>FileSystem: Render and save PDF
Converter-->>App: PDF created
deactivate Converter
App-->>User: Workflow complete
deactivate App
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Actionable comments posted: 6
🧹 Nitpick comments (6)
clinical-report-generator/README.md (2)
7-25: Add language identifier to fenced code blocks.
Per static analysis (MD040), fenced code blocks should specify a language for proper syntax highlighting and accessibility.
Suggested fix
-```
+```text
 clinical-report-generator/
 ├── src/
136-147: Add language identifier to input data format code block.
Suggested fix
-```
+```text
 Global State-Determining Indicators:
 Tremor: X words (examples)
clinical-report-generator/src/clinical_report_generator.py (4)
30-37: `LOG_LEVEL` validation could prevent startup failures.
If an invalid `LOG_LEVEL` (e.g., "INVALID") is set in the environment, `getattr(logging, LOG_LEVEL)` will raise an `AttributeError`, crashing the application with an unhelpful error.
Suggested fix
 LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
+VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}
+if LOG_LEVEL.upper() not in VALID_LOG_LEVELS:
+    LOG_LEVEL = "INFO"
 LOG_DIR = Path("logs")
 LOG_DIR.mkdir(exist_ok=True)
 logging.basicConfig(
-    level=getattr(logging, LOG_LEVEL),
+    level=getattr(logging, LOG_LEVEL.upper()),
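The validation idea in this comment can be sketched as a small standalone helper. This is only an illustration under assumptions: `resolve_log_level` is a hypothetical name that does not appear in the PR, and falling back to `INFO` mirrors the suggested fix rather than any confirmed project policy.

```python
import logging

# Level names accepted by the standard logging module.
VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def resolve_log_level(raw: str, default: str = "INFO") -> int:
    """Map a level name to its numeric value, using `default` for unknown names.

    Hypothetical helper for illustration; not part of the PR.
    """
    name = raw.strip().upper()
    if name not in VALID_LOG_LEVELS:
        name = default
    # Safe: `name` is now guaranteed to be an attribute of the logging module.
    return getattr(logging, name)
```

With a helper like this, `logging.basicConfig(level=resolve_log_level(os.getenv("LOG_LEVEL", "INFO")))` would no longer raise `AttributeError` on a misspelled level.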
399-401: Use `logger.exception` for stack trace and catch specific exceptions.
Per static analysis (TRY400, BLE001), use `logging.exception` to include the stack trace and avoid catching bare `Exception`.
Suggested fix
-except Exception as e:
-    logger.error(f"Error during API call: {e}")
+except genai.APIError as e:
+    logger.exception(f"Error during API call: {e}")
     return f"Error during API call: {e}"
Note: Check the actual exception types from the `google-genai` SDK to catch appropriately.
456-458: Use specific exception types and `logger.exception`.
Per static analysis hints, avoid catching bare `Exception` and use `logger.exception` to include stack traces.
Suggested fix
-except Exception as e:
-    logger.error(f"Critical error writing PDF: {e}")
+except (OSError, IOError) as e:
+    logger.exception(f"Critical error writing PDF: {e}")
     return False
512-518: Use `logger.exception` for better debugging.
Per static analysis (TRY400), `logger.exception` automatically includes the stack trace. The `ValueError` handler at lines 513-514 could benefit from this as well.
Suggested fix
 except ValueError as e:
-    logger.error(f"Configuration error: {e}")
-    logger.error("Please check your .env file and ensure GEMINI_API_KEY is set")
+    logger.exception(f"Configuration error: {e}")
+    logger.info("Please check your .env file and ensure GEMINI_API_KEY is set")
     exit(1)
 except Exception as e:
     logger.error(f"Unexpected error: {e}", exc_info=True)
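To make the TRY400 point concrete, here is a minimal sketch of what `logger.exception` adds over `logger.error`. The logger name and the `demo_exception_logging` function are hypothetical and exist only for this illustration; the output is captured in memory instead of a log file.

```python
import io
import logging


def demo_exception_logging() -> str:
    """Log a caught exception with logger.exception and return the captured text."""
    stream = io.StringIO()
    logger = logging.getLogger("demo_exception_logging")  # hypothetical logger name
    logger.setLevel(logging.ERROR)
    logger.propagate = False  # keep the demo output out of the root logger
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    try:
        try:
            int("not a number")
        except ValueError:
            # Logs at ERROR level and appends the active traceback.
            logger.exception("Configuration error")
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()
```

The captured text contains both the message and the traceback, which is the information a plain `logger.error(f"...: {e}")` call drops.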
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
clinical-report-generator/.env.example
clinical-report-generator/.gitignore
clinical-report-generator/README.md
clinical-report-generator/output/.gitkeep
clinical-report-generator/requirements.txt
clinical-report-generator/src/clinical_report_generator.py
daic_analysis_report.txt
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
clinical-report-generator/README.md
7-7: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
136-136: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
225-225: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🪛 Ruff (0.14.10)
clinical-report-generator/src/clinical_report_generator.py
55-55: Avoid specifying long messages outside the exception class
(TRY003)
108-108: Consider moving this statement to an else block
(TRY300)
109-109: Do not catch blind exception: Exception
(BLE001)
110-110: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
118-118: Do not catch blind exception: Exception
(BLE001)
119-119: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
137-137: Local variable prompt1 is assigned to but never used
Remove assignment to unused variable prompt1
(F841)
214-214: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?
(RUF001)
297-297: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?
(RUF001)
331-331: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?
(RUF001)
354-354: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?
(RUF001)
355-355: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?
(RUF001)
370-370: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?
(RUF001)
397-397: Consider moving this statement to an else block
(TRY300)
399-399: Do not catch blind exception: Exception
(BLE001)
400-400: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
456-456: Do not catch blind exception: Exception
(BLE001)
457-457: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
513-513: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
514-514: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🔇 Additional comments (5)
daic_analysis_report.txt (1)
1-13: Sample data file looks appropriate for testing.
The format aligns with the expected input data structure documented in the README and matches the parsing expectations in the generator code.
clinical-report-generator/output/.gitkeep (1)
1-1: LGTM!
Standard practice to preserve the output directory structure in version control.
clinical-report-generator/.gitignore (1)
1-54: LGTM!
Comprehensive `.gitignore` with appropriate rules for Python projects. Good security practice ignoring `.env`, API keys, and secrets while preserving directory structure via `.gitkeep` exceptions.
clinical-report-generator/src/clinical_report_generator.py (1)
45-57: Good API key validation.
The check for both missing key and placeholder value ("your_api_key_here") is a practical safeguard against misconfiguration.
clinical-report-generator/requirements.txt (1)
1-4: The package name `google-genai` is correct.
The package exists on PyPI (version 1.56.0) and matches the import statement `import google.genai as genai` used in the codebase. No changes needed.
Likely an incorrect or invalid review comment.
MODEL_NAME=gemini-2.5-flash

# File Paths
INPUT_FILE_PATH=data/daic_analysis_report.txt
OUTPUT_PDF_PATH=output/Clinical_Analysis_Report.pdf
Inconsistency between .env.example and code defaults.
- Model name mismatch: The example uses `gemini-2.5-flash`, but the code default in `clinical_report_generator.py` (line 48) is `gemini-2.0-flash-exp`.
- Input file path: The example references `data/daic_analysis_report.txt`, but the sample file `daic_analysis_report.txt` is placed at the repository root, not in a `data/` subdirectory.
Consider aligning these values to avoid user confusion.
Suggested fixes
# Model Configuration
-MODEL_NAME=gemini-2.5-flash
+MODEL_NAME=gemini-2.0-flash-exp
# File Paths
-INPUT_FILE_PATH=data/daic_analysis_report.txt
+INPUT_FILE_PATH=../daic_analysis_report.txt
Or move daic_analysis_report.txt to clinical-report-generator/data/daic_analysis_report.txt.
🤖 Prompt for AI Agents
In clinical-report-generator/.env.example around lines 9 to 13, align the
example values with the code defaults: change MODEL_NAME to match the default
used in clinical_report_generator.py (gemini-2.0-flash-exp) and correct
INPUT_FILE_PATH to point to the actual sample file location
(../daic_analysis_report.txt or move the sample into
clinical-report-generator/data/ and keep data/daic_analysis_report.txt); update
OUTPUT_PDF_PATH only if the code expects a different path.
MODEL_NAME=gemini-2.5-flash
INPUT_FILE_PATH=data/daic_analysis_report.txt
OUTPUT_PDF_PATH=output/Clinical_Analysis_Report.pdf
LOG_LEVEL=INFO
Model name inconsistency with code default.
The documented MODEL_NAME=gemini-2.5-flash differs from the code's default value gemini-2.0-flash-exp in clinical_report_generator.py line 48. Align these to reduce confusion.
🤖 Prompt for AI Agents
In clinical-report-generator/README.md around lines 76 to 79, the documented
MODEL_NAME=gemini-2.5-flash conflicts with the code default gemini-2.0-flash-exp
in clinical_report_generator.py line 48; update one to match the other
(preferably make the README model name identical to the code default or change
the code default to the documented value), and ensure any related documentation
or environment variable examples and tests are updated accordingly so the repo
uses a single canonical MODEL_NAME.
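One way to keep a single canonical `MODEL_NAME` is a small consistency check between the example environment file and the code defaults. The sketch below is an assumption-laden illustration: `CODE_DEFAULTS`, `parse_env_text`, and `find_mismatches` are hypothetical names, and the default value is copied from the review text rather than read from the actual module.

```python
# Hypothetical mirror of the fallback values used in the code (taken from the review text).
CODE_DEFAULTS = {"MODEL_NAME": "gemini-2.0-flash-exp"}


def parse_env_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def find_mismatches(env_text: str, defaults: dict) -> dict:
    """Return {key: (example_value, code_default)} for keys whose values differ."""
    env = parse_env_text(env_text)
    return {
        key: (env[key], default)
        for key, default in defaults.items()
        if key in env and env[key] != default
    }
```

Run against the contents of `.env.example` in a test or pre-commit hook, a non-empty result would flag exactly the drift described in this comment.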
except Exception as e:
    logger.error(f"Error creating sample file: {e}")
    return f"Error creating sample file: {e}"

try:
    with open(self.input_file_path, "r", encoding="utf-8") as f:
        content = f.read()
    logger.info(f"Successfully read {len(content)} characters from input file")
    return content
except Exception as e:
    logger.error(f"Error reading file: {e}")
    return f"Error reading file: {e}"
Error handling pattern is fragile.
Returning error messages as strings and checking "Error" in input_data (line 474) is unreliable—legitimate input data containing the word "Error" would trigger false failures. Also, catching bare Exception hides specific error types.
Suggested refactor using exceptions
-except Exception as e:
-    logger.error(f"Error creating sample file: {e}")
-    return f"Error creating sample file: {e}"
+except OSError as e:
+    logger.exception(f"Error creating sample file: {e}")
+    raise
 try:
     with open(self.input_file_path, "r", encoding="utf-8") as f:
         content = f.read()
     logger.info(f"Successfully read {len(content)} characters from input file")
     return content
-except Exception as e:
-    logger.error(f"Error reading file: {e}")
-    return f"Error reading file: {e}"
+except OSError as e:
+    logger.exception(f"Error reading file: {e}")
+    raise
Then update run() to catch these exceptions explicitly rather than string matching.
🧰 Tools
🪛 Ruff (0.14.10)
109-109: Do not catch blind exception: Exception
(BLE001)
110-110: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
118-118: Do not catch blind exception: Exception
(BLE001)
119-119: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🤖 Prompt for AI Agents
In clinical-report-generator/src/clinical_report_generator.py around lines 109
to 120, the current pattern returns error messages as strings and catches bare
Exception which is fragile and causes downstream code to rely on string matching
("Error" in input_data). Replace the string-return error handling with specific
exceptions: raise a FileNotFoundError when the input file is missing, raise a
PermissionError for permission issues, and raise an IOError/ValueError for read
failures as appropriate; avoid catching bare Exception—catch and re-raise
specific exceptions if you need to add context. Then update run() to call this
method inside a try/except that catches those specific exceptions
(FileNotFoundError, PermissionError, IOError/ValueError) and handles them
appropriately (logging and returning or propagating an error code), removing any
string-based "Error" checks.
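The refactor described in this prompt could look roughly like the following. It is a simplified sketch, not the PR's code: `read_input` and this free-standing `run` are hypothetical stand-ins for the class methods, and the later workflow steps are omitted.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def read_input(path: Path) -> str:
    """Return the file contents; OSError subclasses propagate to the caller."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


def run(path: Path) -> bool:
    """Read the input and report success, handling specific exceptions explicitly."""
    try:
        input_data = read_input(path)
    except FileNotFoundError:
        logger.exception("Input file not found: %s", path)
        return False
    except PermissionError:
        logger.exception("No permission to read input file: %s", path)
        return False
    except OSError:
        logger.exception("Error reading input file: %s", path)
        return False
    # No string matching: content that contains the word "Error" is still valid input.
    logger.info("Read %d characters from input file", len(input_data))
    # Report generation and PDF export would follow here in the real workflow.
    return True
```

Because failure is signaled by the exception type, the content of the file never influences whether the read is treated as successful.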
prompt1 = f"""### SYSTEM ROLE
You are an Expert Clinical Data Scientist and Behavioral Analyst specializing in the Distress Analysis Interview Corpus (DAIC). Your task is to interpret raw automated analysis logs and synthesize them into a professional, human-readable clinical insight report.

### CONTEXT & TASK
I will provide you with a raw text file titled "DAIC Analysis Report". This file contains:
1. **Global State-Determining Indicators**: Aggregated words and sentences associated with specific states.
2. **Session Reports**: Data for individual sessions including frame counts, hidden norm means, and anomalies.

### INSTRUCTIONS (Step-by-Step)
Please follow this Chain-of-Thought process:

1. **Executive Summary**: Summarize the data's purpose. Explicitly mention how many distinct sessions are analyzed.
2. **Global Indicator Analysis**:
* Analyze the "Global State-Determining Indicators".
* **CRITICAL SANITY CHECK**: Compare word counts for "Tremor" vs. "Speech Disfluency". If identical, flag as "System Artifact" or "High Feature Overlap".
3. **Session-by-Session Deep Dive**:
* Process EVERY session found in the input.
* **Isolation Rule**: Treat each session as a data silo. Never mix metrics between sessions.
* **Psycholinguistic Analysis**: Categorize words (e.g., "Narrative Fillers" like 'um/uh' vs. "Cognitive Hedges" like 'think/know').
4. **Comparative Insights**:
* Compare "Hidden Norm Means" and discuss magnitude of difference.
* Contrast vocabulary patterns.

### OUTPUT FORMAT (Markdown)
# Clinical Behavioral Analysis Report

## 1. Executive Summary
[Summary including count of sessions analyzed]

## 2. Global Behavioral Markers
* **Tremor Indicators**: [Analysis with specific word counts]
* **Disfluency Indicators**: [Analysis - include note on overlap if found]

## 3. Session Analysis
### Session [ID]
* **Metrics**: Frames: [X] | Hidden Norm Mean: [Y]
* **Psycholinguistic Profile**: [Analysis of specific words]

[Repeat for ALL sessions]

## 4. Comparative Insights & Critical Conclusion
* **Intensity Check:** Compare "Hidden Norm Means"
* **Vocabulary Contrast:** Contrast patterns
* **Medical Consequence:** Flag any system artifacts

---
### INPUT DATA
{input_data}
"""

prompt=prompt = f"""### SYSTEM ROLE
Remove unused prompt1 variable.
prompt1 (lines 137-185) is defined but never used. This appears to be a previous iteration of the prompt. Per static analysis (F841), remove this dead code.
Suggested fix
Remove lines 137-185 entirely.
🧰 Tools
🪛 Ruff (0.14.10)
137-137: Local variable prompt1 is assigned to but never used
Remove assignment to unused variable prompt1
(F841)
🤖 Prompt for AI Agents
In clinical-report-generator/src/clinical_report_generator.py around lines 137
to 187, there is an unused prompt1 multi-line string defined (lines ~137–185)
that causes a dead-code warning (F841); remove the entire prompt1 declaration
block, leaving the active prompt assignment (prompt = ...) intact, and run a
quick grep to confirm no other references to prompt1 remain before committing.
{input_data}
"""

prompt=prompt = f"""### SYSTEM ROLE
Fix duplicate assignment typo.
The line contains `prompt=prompt = f"""...`, which is a typo. It still runs because Python parses it as a chained assignment that binds the same name twice, but it is confusing.
Suggested fix
-prompt=prompt = f"""### SYSTEM ROLE
+prompt = f"""### SYSTEM ROLE
🤖 Prompt for AI Agents
In clinical-report-generator/src/clinical_report_generator.py around line 187,
there is a duplicate assignment typo `prompt=prompt = f"""...`; replace it with
a single assignment like `prompt = f"""...` (remove the extra `prompt=`) so only
one variable is assigned; ensure there are no other stray duplicate assignments
on the surrounding lines.
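For readers wondering why the duplicated assignment is legal, the short sketch below inspects how Python parses it. `assignment_targets` is a hypothetical helper written for this illustration only.

```python
import ast


def assignment_targets(source: str) -> list:
    """Return the target names of the first assignment statement in `source`."""
    node = ast.parse(source).body[0]
    if not isinstance(node, ast.Assign):
        raise ValueError("expected an assignment statement")
    # A chained assignment such as `a = b = 1` has one target per `=`.
    return [target.id for target in node.targets]
```

For `prompt=prompt = "x"` the helper reports two targets with the same name, while `prompt = "x"` has one, which is why removing the extra `prompt=` changes nothing at runtime.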
if "Error" in input_data:
    logger.error(f"Failed to read input data: {input_data}")
    return False

# Step 2: Generate report
report_markdown = self.generate_clinical_analysis_report(input_data)

if not report_markdown or "Error" in report_markdown:
    logger.error(f"Failed to generate report: {report_markdown}")
    return False
Fragile error detection via string matching.
Checking "Error" in input_data and "Error" in report_markdown is fragile. If the input data or API response legitimately contains "Error", the check reports a failure that did not happen (a false positive). This is a symptom of the error handling pattern flagged earlier.
Consider refactoring to use exceptions for error signaling instead of magic strings.
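A minimal sketch of exception-based signaling for the report step follows. Everything here is an assumption for illustration: `ReportGenerationError`, `generate_report`, and the `call_model` parameter are hypothetical, and `ConnectionError`/`TimeoutError` stand in for whatever exception types the `google-genai` SDK actually raises.

```python
from typing import Callable


class ReportGenerationError(RuntimeError):
    """Hypothetical error type raised when no usable report could be produced."""


def generate_report(input_data: str, call_model: Callable[[str], str]) -> str:
    """Return the model's Markdown report or raise ReportGenerationError."""
    try:
        text = call_model(input_data)
    except (ConnectionError, TimeoutError) as exc:  # stand-ins for real SDK errors
        raise ReportGenerationError("Model call failed") from exc
    if not text:
        raise ReportGenerationError("Model returned an empty report")
    return text


def run(input_data: str, call_model: Callable[[str], str]) -> bool:
    """Report success or failure based on exceptions, not on the report's wording."""
    try:
        generate_report(input_data, call_model)
    except ReportGenerationError:
        return False
    return True
```

A report whose text happens to discuss an "Error rate" passes through untouched, while a failed or empty model call is still reported as a failure.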
Overview
Added a professional Clinical Analysis Report Generator module for processing DAIC analysis data and generating PDF reports using Google's Gemini API.
What's Added
Features
Testing
Tested successfully with sample DAIC data from Model-Speech module.
Dependencies
Summary by CodeRabbit
Release Notes
New Features
Documentation