fix: Optimize orchestrator logging with non-blocking async I/O #26
Hi @sr-857
This PR addresses a critical performance bottleneck in the FlowOrchestrator, where synchronous file I/O blocked the event loop on every step of a transform scan. By keeping the execution log state in memory and offloading disk writes to a background thread, it significantly reduces logging overhead and prevents UI/API freezes during complex workflows.
[The Problem] :->
1.) Blocking I/O: The previous implementation read and wrote the entire JSON log file synchronously for every single step update.
2.) O(N²) Complexity: Every step re-read and rewrote the whole file, so the per-step cost grew linearly with the log size and the total I/O over a scan of N steps grew quadratically, causing massive slowdowns for long-running scans.
3.) Event Loop Starvation: These blocking operations paused all other concurrent tasks in the application.
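To make the quadratic growth concrete, here is a minimal sketch of the old read-modify-write pattern. It is hypothetical, not the previous FlowOrchestrator code: the function names, the `{"steps": [...]}` schema and the fixed-size entries are assumptions. It counts the characters read and written per step; doubling the number of steps roughly quadruples the total.

```python
import json
import os
import tempfile


def blocking_update(path: str, step: dict) -> int:
    """Hypothetical old-style update: read the whole log, append one step, rewrite it.

    Returns the number of characters read plus written for this single step.
    """
    with open(path, "r", encoding="utf-8") as f:
        raw = f.read()  # full-file read on every step
    log = json.loads(raw)
    log["steps"].append(step)
    out = json.dumps(log)
    with open(path, "w", encoding="utf-8") as f:
        f.write(out)  # full-file rewrite on every step
    return len(raw) + len(out)


def total_io(num_steps: int) -> int:
    """Total characters moved to and from disk over a scan of num_steps steps."""
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"steps": []}, f)
        # Entries are fixed-size (single-digit step ids) so growth is easy to see.
        return sum(
            blocking_update(path, {"step": i % 10, "status": "ok"})
            for i in range(num_steps)
        )
    finally:
        os.remove(path)


if __name__ == "__main__":
    for n in (100, 200, 400):
        print(n, total_io(n))
```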
[The Solution] :->
1.) In-Memory State: The execution log is now maintained in self.execution_log_data, eliminating the need to read from disk before every update.
2.) Async Writes: File writes are now dispatched via asyncio.to_thread(self._write_log_to_disk), so the disk I/O runs in a worker thread and no longer blocks the main event loop.
3.) Async Helpers: _update_execution_log and _finalize_execution_log are now async functions that update the in-memory state and hand disk writes off to the background thread.
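The solution steps above can be sketched as follows. This is a minimal, hypothetical illustration, not the PR's actual code: the class name `ExecutionLogSketch`, the log schema, the method signatures, the `asyncio.Lock`, and the temp-file-plus-`os.replace` write are all assumptions. Only the names `execution_log_data`, `_write_log_to_disk`, `_update_execution_log` and `_finalize_execution_log` come from the description. The in-memory dict is the source of truth, and only the disk write is handed to `asyncio.to_thread`.

```python
import asyncio
import json
import os
import tempfile
from typing import Any


class ExecutionLogSketch:
    """Hypothetical stand-in for the orchestrator's logging state (not the PR's code)."""

    def __init__(self, path: str) -> None:
        self.path = path
        # In-memory source of truth: the file on disk is only written, never re-read.
        self.execution_log_data: dict[str, Any] = {"status": "running", "steps": []}
        # Assumption: serialize writes so two worker threads never write the file at once.
        self._write_lock = asyncio.Lock()

    def _write_log_to_disk(self, payload: str) -> None:
        # Runs in a worker thread: write a temp file, then atomically replace the log.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            f.write(payload)
        os.replace(tmp_path, self.path)

    async def _flush(self) -> None:
        async with self._write_lock:
            # Snapshot on the event loop, so the thread never sees the dict mid-mutation.
            payload = json.dumps(self.execution_log_data)
            await asyncio.to_thread(self._write_log_to_disk, payload)

    async def _update_execution_log(self, step_name: str, status: str) -> None:
        self.execution_log_data["steps"].append({"step": step_name, "status": status})
        await self._flush()

    async def _finalize_execution_log(self, status: str) -> None:
        self.execution_log_data["status"] = status
        await self._flush()


async def main() -> None:
    path = os.path.join(tempfile.mkdtemp(), "execution_log.json")
    log = ExecutionLogSketch(path)  # created inside the running loop
    for i in range(3):
        await log._update_execution_log(f"transform_{i}", "completed")
    await log._finalize_execution_log("completed")
    with open(path, encoding="utf-8") as f:
        print(json.load(f))


if __name__ == "__main__":
    asyncio.run(main())
```

The lock is one design choice worth weighing: without it, overlapping `to_thread` calls could write the same file from two threads at once. Serializing the snapshot with `json.dumps` on the event loop keeps the worker thread from reading the dict while a coroutine mutates it, at the cost of some CPU time on the loop for very large logs.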
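[The Impact] :->
1.) Performance: Multi-step transforms no longer pay for a full log read and a blocking write on every step, which drastically reduces their execution time.
2.) Scalability: Because per-step logging no longer stalls the event loop, the orchestrator can handle much larger workflows without degradation.
3.) Responsiveness: The application remains responsive even during heavy logging activity.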
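The responsiveness point can be checked with a small, self-contained experiment. It is hypothetical: `time.sleep` stands in for slow disk I/O and the timings are arbitrary. A heartbeat coroutine tries to tick every 50 ms; when the write runs inline on the event loop it cannot tick until the write finishes, while under `asyncio.to_thread` it keeps ticking.

```python
import asyncio
import time

SIMULATED_WRITE_SECONDS = 0.3  # hypothetical stand-in for a slow log write


def slow_write() -> None:
    # Blocking call standing in for synchronous file I/O.
    time.sleep(SIMULATED_WRITE_SECONDS)


async def heartbeat(ticks: list[float], stop: asyncio.Event) -> None:
    # Records a tick roughly every 50 ms, but only while the event loop is free.
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)


async def measure(offload: bool) -> int:
    ticks: list[float] = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    if offload:
        await asyncio.to_thread(slow_write)  # loop keeps running the heartbeat
    else:
        slow_write()  # loop is frozen for the whole write
    stop.set()
    await hb
    return len(ticks)


async def main() -> None:
    print("inline write, heartbeat ticks:", await measure(offload=False))
    print("offloaded write, heartbeat ticks:", await measure(offload=True))


if __name__ == "__main__":
    asyncio.run(main())
```

Exact tick counts depend on the machine and scheduler, so treat the numbers as illustrative; the point is the gap between the inline and offloaded runs.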