Skip to content

Conversation

@jfzunigac
Copy link
Contributor

Fixed Buffer File Management

  • Removed file move operations - Upload buffer files directly instead of renaming them first, this would cause missed uploads due to buffer file naming discrepancy after restart
  • Added timestamp to filenames in order to support fixed interval scheduled uploads

Enhanced Error Handling

  • S3 upload failures now throw LogStreamWriterException for Singer to retry
  • Added backpressure protection by blocking writes when buffer is full and uploads consistently fail
  • Preserve buffer on failures - Keep data intact for retry instead of losing it

Simplified Architecture

  • Remove unnecessary flushing after each write, instead flushing in endCommit() only
  • Consolidated time-based uploads - Use a more reliable approach by scheduling fixed interval tasks to check if file should be uploaded based on the attached timestamp

Also simplified S3Writer.close() to only:

Cancel scheduled upload tasks

  1. Flush and close buffered output stream
  2. Removed S3 upload logic from shutdown path to match other writers (KafkaWriter).
  3. Removed related unit tests (testClose, testBufferFileCleanupAfterClose)
  4. Buffer file recovery on restart unchanged

This fixes a shutdown race condition with DefautlLogMonitor in which it would incorrectly report the log streams as "stuck" due to the blocking nature of S3Writer.close(). Now shutdown should be faster and consistent with other writers. Data should be safety maintained since buffer files are preserved on disk and Singer can resume uploads after restart. In addition, we can prevent spikes in uploads to S3 during daily Singer restarts.

The only drawback to this is that the chance for data loss when the instance shuts down will be higher depending on the upload interval configured by the user and the grace period for shutdown. However, this is okay as we can't 100% guarantee no data loss during shutdowns or host failures in the first place. In addition, this follows the same principles as other OSS agents like FluentBit

@jfzunigac jfzunigac requested a review from a team as a code owner October 2, 2025 22:18
@jfzunigac jfzunigac merged commit 1bc5ec6 into pinterest:master Oct 2, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants