Conversation

@haoruizhou (Collaborator)

No description provided.

Optimize file uploader for large file streaming

Improves memory efficiency and scalability of the CANInfluxStreamer by batching CSV row processing, introducing queue size limits, and adding configuration for large file uploads. Adds an option to skip or estimate row counting for very large files, preventing memory buildup. Updates environment variable usage and webhook URL, and adds python-dotenv to requirements.

new dbc

add async write tracking and adaptive rate limiting
Implements disk-based processing for large zip uploads in the file-uploader to avoid memory issues
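The disk-based, batched CSV processing described in these commits might look like the following sketch (function name, batch size, and CSV layout are illustrative assumptions, not the PR's actual code): rows are streamed from a CSV member inside a zip archive and yielded in fixed-size batches, so neither the archive nor the full row set is ever held in memory.

```python
import csv
import io
import zipfile


def iter_csv_batches(zip_path, member, batch_size=5000):
    # Hypothetical sketch of disk-based batching: stream rows from a CSV
    # inside a zip archive without loading the whole file into memory.
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as raw:
            reader = csv.reader(io.TextIOWrapper(raw, encoding="utf-8"))
            batch = []
            for row in reader:
                batch.append(row)
                if len(batch) >= batch_size:
                    yield batch  # hand one bounded batch to the writer
                    batch = []
            if batch:  # flush the final partial batch
                yield batch
```

A bounded queue between this generator and the InfluxDB writer is what keeps memory flat regardless of archive size.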
@haoruizhou haoruizhou requested a review from Copilot October 1, 2025 18:01
@haoruizhou haoruizhou merged commit 987b09c into main Oct 1, 2025
3 of 4 checks passed
Contributor

Copilot AI left a comment

Pull Request Overview

This PR removes token extraction complexity and migrates to a simplified preset token approach for the DAQ system. The change eliminates shell script dependencies and simplifies deployment by using a predefined InfluxDB admin token instead of dynamically extracting tokens during startup.

  • Removed token extraction scripts and logic from the startup process
  • Added preset token configuration via INFLUXDB_ADMIN_TOKEN environment variable
  • Updated all services to use the new preset token approach with fallback defaults
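The "preset token with fallback defaults" pattern typically looks like the following docker-compose fragment (service name and placeholder default are illustrative, not the PR's actual values): each service reads `INFLUXDB_ADMIN_TOKEN` from the environment and falls back to a default when it is unset.

```yaml
# Hypothetical docker-compose fragment: the token comes from the host
# environment (or a .env file), with a placeholder fallback default.
services:
  file-uploader:
    environment:
      INFLUXDB_ADMIN_TOKEN: ${INFLUXDB_ADMIN_TOKEN:-changeme-admin-token}
```

The `.env.example` security warning mentioned below presumably exists precisely because such a fallback default must be replaced before any real deployment.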

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| installer/scripts/start-daq-system.sh | Removed token extraction steps; simplified to direct docker-compose startup |
| installer/scripts/migrate-to-preset-token.sh | Added migration script to help users transition from the old token extraction method |
| installer/scripts/extract-influx-token.sh | Removed token extraction script entirely |
| installer/file-uploader/helper.py | Updated to use the INFLUXDB_TOKEN env var; added disk-based processing for large files |
| installer/file-uploader/app.py | Changed token variable name and improved file processing logic |
| installer/docker-compose.yml | Updated all services to use INFLUXDB_ADMIN_TOKEN with fallback defaults |
| installer/.env.example | Added INFLUXDB_ADMIN_TOKEN configuration with a security warning |
| installer/SIMPLIFIED_SETUP.md | Added comprehensive documentation for the new preset token approach |
Comments suppressed due to low confidence (2)

installer/file-uploader/helper.py:1

  • Logic error when `total` is 0. If `total` is 0 (which can happen when progress counting is disabled), the condition `sent >= total` becomes `sent >= 0`, which is always true for non-negative values. This could cause premature completion notifications. The check should be `if total > 0 and sent >= total and not PROGRESS[task_id].get('done'):`.
import zipfile, csv, io, time, asyncio, tempfile, subprocess, shutil
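A minimal sketch of the guarded completion check described in that comment (the `PROGRESS` shape and function name are assumptions inferred from the comment, not the PR's actual code):

```python
# task_id -> {"sent": int, "total": int, "done": bool}
PROGRESS = {}


def mark_sent(task_id, n):
    # Hypothetical sketch: only fire the completion notification when a real
    # total is known (total > 0), so a disabled row count (total == 0) can
    # never trigger a premature "done" state.
    p = PROGRESS[task_id]
    p["sent"] += n
    sent, total = p["sent"], p["total"]
    if total > 0 and sent >= total and not p.get("done"):
        p["done"] = True
        return True  # completion notification would fire here
    return False
```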

installer/docker-compose.yml:1

  • Logging sensitive information like tokens and passwords to console output. These values could appear in logs, terminal history, or CI/CD output. Consider masking these values or providing instructions to view them securely.
networks:



self.client = InfluxDBClient(
-    url=self.url, token=os.getenv("TOKEN") or "", org=self.org, enable_gzip=True
+    url=self.url, token=os.getenv("INFLUXDB_TOKEN") or "", org=self.org, enable_gzip=True
Copilot AI Oct 1, 2025

Inconsistent environment variable usage. The code uses `INFLUXDB_TOKEN`, but other parts of the system use `INFLUXDB_ADMIN_TOKEN`. This should be `os.getenv('INFLUXDB_TOKEN') or os.getenv('INFLUXDB_ADMIN_TOKEN') or ''` to maintain compatibility, or one variable name should be used consistently.

Suggested change
-    url=self.url, token=os.getenv("INFLUXDB_TOKEN") or "", org=self.org, enable_gzip=True
+    url=self.url, token=os.getenv("INFLUXDB_TOKEN") or os.getenv("INFLUXDB_ADMIN_TOKEN") or "", org=self.org, enable_gzip=True
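One way to apply that fallback chain (helper name is hypothetical) is to resolve the token once in a small function and pass the result to the client, so every call site stays consistent:

```python
import os


def resolve_influx_token():
    # Hypothetical helper: prefer INFLUXDB_TOKEN, fall back to
    # INFLUXDB_ADMIN_TOKEN, and default to an empty string.
    return os.getenv("INFLUXDB_TOKEN") or os.getenv("INFLUXDB_ADMIN_TOKEN") or ""
```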

Comment on lines +740 to +741
record=batch_points,
_callback_id=callback_id # Custom attribute for tracking
Copilot AI Oct 1, 2025

The `write` method likely doesn't accept a `_callback_id` parameter. This appears to be a custom attribute that won't be passed through the InfluxDB client's `write` method, so the callback tracking mechanism may not work as intended.

Suggested change
-    record=batch_points,
-    _callback_id=callback_id  # Custom attribute for tracking
+    record=batch_points
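Since a private `_callback_id` keyword won't survive the client's `write` call, one alternative (all names hypothetical) is an explicit registry of in-flight batches keyed by id: the id is generated before the write, and the success/error callback resolves it against the registry instead of reading attributes off the client's config object.

```python
import itertools
import threading


class BatchTracker:
    """Hypothetical sketch: track in-flight batches by explicit id rather
    than attaching private attributes to the client's objects."""

    def __init__(self):
        self._pending = {}
        self._lock = threading.Lock()
        self._ids = itertools.count(1)

    def register(self, batch):
        # Called before write(); returns the id the callback will resolve.
        with self._lock:
            cid = next(self._ids)
            self._pending[cid] = batch
            return cid

    def complete(self, cid):
        # Called from the write callback; returns the batch, or None if
        # the id is unknown or already completed.
        with self._lock:
            return self._pending.pop(cid, None)
```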

self._consecutive_failures = 0

# Remove from pending callbacks and update progress
callback_id = getattr(conf, '_callback_id', None)
Copilot AI Oct 1, 2025

Attempting to get `_callback_id` from the `conf` object, but this attribute was never set on the configuration object in the `write` method. This will always return `None`, making the callback tracking mechanism non-functional.

Comment on lines +791 to +792

print(f"📊 Configured for {estimated_size_mb:.1f}MB file: batch_size={self.batch_size}, "
Copilot AI Oct 1, 2025

[nitpick] This print statement is inside the `configure_for_file_size` method but only executes for very large files (1GB+). The debug information would be helpful for all file size configurations. Consider moving this print statement outside the conditional or adding it to all branches.

Suggested change
print(f"📊 Configured for {estimated_size_mb:.1f}MB file: batch_size={self.batch_size}, "
print(f"📊 Configured for {estimated_size_mb:.1f}MB file: batch_size={self.batch_size}, "
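A sketch of how every size tier could report its configuration, per the nitpick above (the tier thresholds and parameter values here are illustrative assumptions, not the PR's actual numbers):

```python
def configure_for_file_size(estimated_size_mb):
    # Hypothetical sketch: pick batch parameters per size tier, then log
    # from a single point so every branch reports its configuration.
    if estimated_size_mb >= 1024:  # 1GB+
        batch_size, queue_limit = 20000, 50
    elif estimated_size_mb >= 100:
        batch_size, queue_limit = 10000, 100
    else:
        batch_size, queue_limit = 5000, 200
    print(
        f"Configured for {estimated_size_mb:.1f}MB file: "
        f"batch_size={batch_size}, queue_limit={queue_limit}"
    )
    return batch_size, queue_limit
```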

@haoruizhou haoruizhou deleted the linux-functional branch October 10, 2025 03:38
haoruizhou added a commit that referenced this pull request Nov 13, 2025
Removed token extraction, now using pre-defined token