@jvsena42
Member

@jvsena42 jvsena42 commented Jan 11, 2026

Description

Problem
Memory corruption crash during wallet restore:

malloc: Incorrect checksum for freed object 0x1013da8d8: probably modified after being freed.
The crash occurs when multiple concurrent tasks call VssBackupClient.awaitSetup() simultaneously.

Root Cause
The awaitSetup() method has a race condition. When multiple tasks call it concurrently:

  1. Task A checks if let existingSetup = isSetup → nil → proceeds to create setup task
  2. Task B checks if let existingSetup = isSetup → nil (Task A hasn't assigned yet) → also proceeds
  3. Both tasks create separate setupTask instances
  4. Both call setup() → double initialization of Rust FFI → memory corruption
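
The interleaving above can be replayed deterministically in a single-threaded model. This is only an illustrative sketch: `NaiveSetupState`, `needsSetup()`, `createAndRun(taskID:)` and `replayInterleaving()` are hypothetical names, not code from `VssBackupClient`, and an integer stands in for the stored setup task.

```swift
/// Single-threaded model of an unsynchronized check-then-act.
/// All names here are hypothetical; they are not taken from VssBackupClient.
struct NaiveSetupState {
    var setupTaskID: Int? = nil   // stands in for the stored setup task
    var setupCalls = 0            // how many times setup() would have run

    /// Step 1: a caller checks whether a setup task already exists.
    func needsSetup() -> Bool {
        return setupTaskID == nil
    }

    /// Step 2: a caller stores a new task and runs setup().
    mutating func createAndRun(taskID: Int) {
        setupTaskID = taskID
        setupCalls += 1
    }
}

/// Replays the problematic ordering: both callers check before either assigns.
func replayInterleaving() -> Int {
    var state = NaiveSetupState()
    let taskASawNil = state.needsSetup()   // Task A: nothing stored yet
    let taskBSawNil = state.needsSetup()   // Task B: Task A has not assigned yet
    if taskASawNil { state.createAndRun(taskID: 1) }
    if taskBSawNil { state.createAndRun(taskID: 2) }
    return state.setupCalls
}
```

In this model `replayInterleaving()` returns 2: each caller acts on a stale check, so setup runs twice. With real threads the same ordering is timing-dependent, which would be consistent with a crash that is hard to reproduce.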

The logs show VSS setup being called 7 times concurrently:

DEBUG: VSS client setting up… - VssBackupClient [VssBackupClient.swift: setup(walletIndex:) line: 18]

Solution
Use a Swift actor to ensure thread-safe setup coordination. The actor isolates the setupTask state, ensuring only one setup runs at a time.
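
A minimal sketch of the actor approach, under stated assumptions: `SetupCoordinator`, `CallCounter` and `demo()` are hypothetical names chosen for illustration, and the body is not the `VssSetupCoordinator` from this PR. The point it shows is that inside an actor the nil check and the assignment of `setupTask` run with no suspension point between them, so a second caller always sees the stored task.

```swift
import Foundation

/// Hypothetical coordinator (an assumption, not the PR's implementation).
actor SetupCoordinator {
    private var setupTask: Task<Void, Error>?

    /// Runs `setup` at most once; concurrent callers await the same task.
    /// A failed task stays cached in this sketch; retry handling is omitted.
    func awaitSetup(_ setup: @escaping @Sendable () async throws -> Void) async throws {
        if let existing = setupTask {
            try await existing.value
            return
        }
        // No `await` between the check above and this assignment,
        // so no other caller can interleave here.
        let task = Task { try await setup() }
        setupTask = task
        try await task.value
    }
}

/// Counts how often the setup closure actually runs.
actor CallCounter {
    private(set) var count = 0
    func increment() { count += 1 }
}

/// Launches seven concurrent callers and reports how many setups ran.
func demo() async throws -> Int {
    let coordinator = SetupCoordinator()
    let counter = CallCounter()
    try await withThrowingTaskGroup(of: Void.self) { group in
        for _ in 0..<7 {
            group.addTask {
                try await coordinator.awaitSetup {
                    await counter.increment()
                    // Placeholder for the real setup work.
                    try await Task.sleep(nanoseconds: 10_000_000)
                }
            }
        }
        try await group.waitForAll()
    }
    return await counter.count
}
```

Here `demo()` returns 1 even with seven concurrent callers, because only the first caller to enter the actor creates the task and the rest await it.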

Linked Issues/Tasks

N/A

Screenshot / Video

To test:

  • Restore a wallet
  • Should complete without crash
  • Check that the logs show "VSS client setting up" twice:
    • Initial setup when the app starts the restore flow
    • Setup after reset() is called before performing the full restore (to ensure a clean state)
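
The expected count of two can be illustrated with a small sketch, assuming a coordinator that caches its setup task and clears it on reset. `ResettableSetupCoordinator`, `setupRuns` and `restoreFlow()` are hypothetical names, and the counter stands in for the "VSS client setting up" log line.

```swift
import Foundation

/// Hypothetical coordinator with reset support (an assumption, not the PR's code).
actor ResettableSetupCoordinator {
    private var setupTask: Task<Void, Error>?
    private(set) var setupRuns = 0   // stands in for the setup log line

    func awaitSetup() async throws {
        if let existing = setupTask {
            try await existing.value
            return
        }
        setupRuns += 1
        // Placeholder for the real setup work.
        let task = Task { try await Task.sleep(nanoseconds: 1_000_000) }
        setupTask = task
        try await task.value
    }

    /// Drops the cached task so the next awaitSetup() starts from a clean state.
    func reset() {
        setupTask?.cancel()
        setupTask = nil
    }
}

/// Mirrors the test steps: initial setup, a repeated call, reset, setup again.
func restoreFlow() async throws -> Int {
    let coordinator = ResettableSetupCoordinator()
    try await coordinator.awaitSetup()   // initial setup
    try await coordinator.awaitSetup()   // reuses the finished task
    await coordinator.reset()            // clean state before the full restore
    try await coordinator.awaitSetup()   // second setup
    return await coordinator.setupRuns
}
```

In this sketch `restoreFlow()` returns 2: repeated calls between resets reuse the cached task, and only the call after `reset()` triggers a new setup.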

…etup.
Add VssSetupCoordinator actor to ensure thread-safe VSS client setup.

The race condition occurred when multiple concurrent tasks called
awaitSetup() simultaneously, causing double initialization of the
Rust FFI and memory corruption.
@jvsena42 jvsena42 self-assigned this Jan 11, 2026
@jvsena42 jvsena42 requested a review from Copilot January 11, 2026 18:55
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes a critical race condition in the VSS backup client that caused memory corruption during concurrent wallet restore operations. The issue occurred when multiple tasks simultaneously called awaitSetup(), leading to double initialization of the Rust FFI layer.

Changes:

  • Introduced VssSetupCoordinator actor to ensure thread-safe, single-instance setup coordination
  • Refactored awaitSetup() to delegate setup orchestration to the coordinator
  • Updated reset() method to work asynchronously with the new actor-based coordination

@ovitrif
Collaborator

ovitrif commented Jan 11, 2026

Makes perfect sense to use an actor here. The reason I didn't on Android is that Kotlin doesn't have actors at all, and CompletableDeferred works quite nicely there to make up for the lack of an actor model in its concurrency.

But the proper way to mirror that code in Swift is likely with an actor.

@jvsena42
Member Author

Note: The crash only happened to me once; I couldn't reproduce it again.

@jvsena42 jvsena42 requested review from ovitrif and pwltr January 11, 2026 21:05
@piotr-iohk
Collaborator

Note: I observed a crash today myself (not on this branch of course, and I also couldn't reproduce it), and the problem report analysis showed:

Root Cause: Concurrent Backup Operations During Migration
Looking at other threads at crash time:
  Thread 1: Encoding ActivityBackupV1 → Activity → OnchainActivity to JSON
  Thread 4: Writing to VSS backup (vssStore)
  Thread 5: Encoding BlocktankBackupV1 → IBtInfo → ILspNode to JSON
All these are triggered by BackupService.triggerBackup(category:).
The crash happens because multiple backup operations run concurrently, and they're calling:
BackupService.getBackupDataBytes(category:)

Hopefully this will fix the problem 🤞

Collaborator

@ovitrif ovitrif left a comment


utAck

Testing (sanity migration check)

Contributor

@pwltr pwltr left a comment


Tested, looks good

@ovitrif
Collaborator

ovitrif commented Jan 12, 2026

@jvsena42 what are "integration-tests" and why do they keep failing? 🤔

Collaborator

@ovitrif ovitrif left a comment


tAck


5 participants