Skip to content

Conversation

@gonzalezzfelipe
Copy link
Contributor

@gonzalezzfelipe gonzalezzfelipe commented Dec 29, 2025

The idea consists on separating the index construction from the utxoset generation, as to avoid creating the indexes on each step of the bootstrapping process, similar to defer-indexes on Kupo.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added build_indexes command to rebuild UTXO indexes in configurable chunks
    • Added support for querying and iterating UTXOs from the state store
  • Improvements

    • Enhanced UTXO indexing with deferred index application for better control
    • Added performance timing instrumentation across core operations
    • Optimized state preparation workflow in export functionality

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Dec 29, 2025

📝 Walkthrough

Walkthrough

The PR adds timing instrumentation across multiple Cardano work unit modules, refactors the StateWriter API to support deferred UTXO indexing via a boolean parameter, introduces UTXO iteration capabilities in the state store, and implements a new CLI command for building UTXO indexes in chunks with conditional state preparation in the export workflow.

Changes

Cohort / File(s) Summary
Timing Instrumentation
crates/cardano/src/estart/mod.rs, ewrap/mod.rs, rewards/mod.rs, rupd/mod.rs
Added std::time::Instant imports and elapsed time logging for work unit commits and execution phases; replaced debug logs with info-level logs that capture timing metrics.
StateWriter API Refactoring
crates/core/src/state.rs
Updated StateWriter::apply_utxoset signature to accept additional defer_indexes: bool parameter; added new index_utxoset method for separate index application.
StateStore Enhancement
crates/core/src/state.rs
Introduced UtxoIter associated type, iter_utxos(), and amount_of_utxos() methods to expose UTXO iteration and counting capabilities with EraCbor item type.
Call Site Updates
crates/core/src/batch.rs, crates/cardano/src/genesis/mod.rs, src/facade.rs
Updated apply_utxoset invocations to pass false as the new defer_indexes parameter; adjusted related method signatures.
UTXO Iteration & Indexing
crates/redb3/src/state/mod.rs, state/utxoset.rs
Implemented deferred indexing logic in apply_utxoset (conditionally applies FilterIndexes::apply); exposed UtxosIterator via public imports; changed iterator error type from StorageError to StateError; added UtxosTable::len() method; changed db_mut visibility to public.
New Build Indexes CLI Command
src/bin/dolos/doctor/build_indexes.rs, doctor/mod.rs
Added new async CLI subcommand that iterates UTXOs in configurable chunks, builds and writes/indexes a UtxoSetDelta, with progress tracking and database compaction; added BuildIndexes(build_indexes::Args) variant to command dispatcher.
Export Workflow Refactoring
src/bin/dolos/data/export.rs
Introduced prepare_state helper for conditional state compaction/integrity checks; deferred chain preparation to only execute when include_chain is true; added state preparation call within include_state branch with updated progress messaging.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI
    participant Export as Export Handler
    participant StateStore as StateStore
    participant DB as Database

    CLI->>Export: run(config, args, feedback)
    
    Export->>StateStore: prepare_wal(&mut stores.wal, &pb)
    StateStore->>DB: compact & verify WAL
    DB-->>StateStore: ✓
    StateStore-->>Export: ✓
    
    alt include_chain == true
        Export->>StateStore: prepare_chain(&mut stores.chain, &pb)
        StateStore->>DB: compact & verify chain
        DB-->>StateStore: ✓
        StateStore-->>Export: ✓
    end
    
    alt include_state == true
        Export->>StateStore: prepare_state(&mut stores.state, &pb)
        StateStore->>DB: compact & verify state
        DB-->>StateStore: ✓
        StateStore-->>Export: ✓
    end
    
    Export->>CLI: "creating archive"
    CLI-->>Export: ✓
Loading
sequenceDiagram
    participant CLI as CLI Command
    participant Domain as Domain
    participant UtxoIterator as UTXO Iterator
    participant StateWriter as StateWriter
    participant DB as Database

    CLI->>Domain: initialize()
    Domain-->>CLI: domain instance
    
    CLI->>UtxoIterator: iter_utxos() chunks
    loop process chunk
        UtxoIterator->>DB: fetch UTXO batch
        DB-->>UtxoIterator: TxoRef, EraCbor
        UtxoIterator->>UtxoIterator: decode & collect
    end
    UtxoIterator-->>CLI: UtxoSetDelta
    
    CLI->>StateWriter: apply_utxoset(delta, true)
    rect rgb(200, 220, 240)
        Note over StateWriter: defer_indexes=true<br/>(skip immediate indexing)
    end
    StateWriter->>DB: write UTXO updates
    DB-->>StateWriter: ✓
    StateWriter-->>CLI: ✓
    
    CLI->>StateWriter: index_utxoset(delta)
    StateWriter->>DB: build & apply indexes
    DB-->>StateWriter: ✓
    StateWriter-->>CLI: ✓
    
    CLI->>DB: compact()
    DB-->>CLI: ✓
    
    CLI->>DB: integrity_check()
    DB-->>CLI: ✓ / error with context
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related PRs

Suggested reviewers

  • scarmuega

Poem

🐰 Hops through code with indexed delight,
Timing stamps now glow so bright!
UTXO chunks in batches dance,
Deferred indexes at their chance,
State preparation's graceful flow,
A rabbit's work! Watch databases grow. 🌿✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly summarizes the main change: introducing a separate command for building UTXO indexes rather than building them during utxoset generation.
✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gonzalezzfelipe gonzalezzfelipe marked this pull request as draft December 29, 2025 20:15
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (4)
src/bin/dolos/data/export.rs (1)

54-66: Consider handling the None case from db_mut() more gracefully.

The unwrap() on line 58 could panic if there are multiple references to the Arc<Database>. While prepare_wal uses the same pattern, this could cause confusing runtime errors. Consider returning an error instead.

🔎 Proposed fix
 fn prepare_state(
     state: &mut dolos_redb3::state::StateStore,
     pb: &crate::feedback::ProgressBar,
 ) -> miette::Result<()> {
-    let db = state.db_mut().unwrap();
+    let db = state.db_mut().ok_or_else(|| miette::miette!("cannot get exclusive access to state database"))?;
     pb.set_message("compacting state");
     db.compact().into_diagnostic()?;
src/bin/dolos/doctor/build_indexes.rs (2)

19-19: Remove commented-out code.

Line 19 contains commented-out code (//crate::common::setup_tracing(&config.logging)?;). If tracing setup is not needed here, remove the line entirely. If it's needed later, track it with a TODO or issue.


50-56: Handle the None case from db_mut() gracefully.

Similar to the issue in export.rs, the unwrap() on line 50 could panic if there are multiple references to the database Arc. Consider returning an error with context.

🔎 Proposed fix
-    let db = domain.state.db_mut().unwrap();
+    let db = domain.state.db_mut()
+        .ok_or_else(|| miette::miette!("cannot get exclusive access to state database"))?;
crates/redb3/src/state/mod.rs (1)

216-224: Fix inconsistent indentation.

Lines 219-221 have incorrect indentation. The if !defer_indexes block content should be indented consistently.

🔎 Proposed fix
     fn apply_utxoset(&self, delta: &dolos_core::UtxoSetDelta, defer_indexes: bool) -> Result<(), StateError> {
         utxoset::UtxosTable::apply(&self.wx, delta)?;

         if !defer_indexes {
-        utxoset::FilterIndexes::apply(&self.wx, delta)?;
+            utxoset::FilterIndexes::apply(&self.wx, delta)?;
         }

         Ok(())
     }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eb64e93 and c7fde18.

📒 Files selected for processing (13)
  • crates/cardano/src/estart/mod.rs
  • crates/cardano/src/ewrap/mod.rs
  • crates/cardano/src/genesis/mod.rs
  • crates/cardano/src/rewards/mod.rs
  • crates/cardano/src/rupd/mod.rs
  • crates/core/src/batch.rs
  • crates/core/src/state.rs
  • crates/redb3/src/state/mod.rs
  • crates/redb3/src/state/utxoset.rs
  • src/bin/dolos/data/export.rs
  • src/bin/dolos/doctor/build_indexes.rs
  • src/bin/dolos/doctor/mod.rs
  • src/facade.rs
🧰 Additional context used
🧬 Code graph analysis (7)
crates/cardano/src/ewrap/mod.rs (1)
crates/cardano/src/ewrap/loading.rs (12)
  • boundary (272-272)
  • boundary (274-274)
  • boundary (276-276)
  • boundary (278-278)
  • load (244-281)
  • state (42-42)
  • state (89-89)
  • state (121-121)
  • state (151-151)
  • state (173-173)
  • state (185-185)
  • state (201-201)
src/bin/dolos/doctor/build_indexes.rs (1)
src/bin/dolos/common.rs (1)
  • setup_domain (133-166)
crates/cardano/src/genesis/mod.rs (1)
crates/cardano/src/utxoset.rs (1)
  • build_custom_utxos_delta (163-179)
src/bin/dolos/doctor/mod.rs (1)
src/bin/dolos/doctor/build_indexes.rs (1)
  • run (18-59)
crates/redb3/src/state/utxoset.rs (3)
crates/redb3/src/state/mod.rs (2)
  • next (28-38)
  • next (46-55)
crates/core/src/state.rs (1)
  • next (224-234)
crates/core/src/archive.rs (1)
  • next (133-143)
crates/core/src/state.rs (1)
crates/redb3/src/state/mod.rs (4)
  • apply_utxoset (216-224)
  • index_utxoset (226-230)
  • iter_utxos (318-322)
  • amount_of_utxos (324-328)
crates/redb3/src/state/mod.rs (1)
crates/redb3/src/state/utxoset.rs (4)
  • apply (81-107)
  • apply (278-362)
  • iter (53-57)
  • len (59-62)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Check Build
🔇 Additional comments (20)
crates/cardano/src/rupd/mod.rs (1)

1-1: LGTM! Timing instrumentation added correctly.

The timing instrumentation follows a consistent pattern across the codebase and is appropriately placed to measure both log_work and execute durations.

Also applies to: 114-114, 164-165, 176-176, 188-189

crates/cardano/src/ewrap/mod.rs (1)

3-3: LGTM! Timing instrumentation added correctly.

The timing instrumentation is consistent with the pattern used in other modules (rupd, estart, rewards) and appropriately measures the EWRAP work unit execution time.

Also applies to: 10-10, 159-159, 166-166

crates/cardano/src/rewards/mod.rs (1)

1-1: LGTM! Timing instrumentation added correctly.

The timing instrumentation follows the same pattern as other modules and appropriately measures the rewards calculation duration.

Also applies to: 334-334, 423-424

crates/cardano/src/estart/mod.rs (1)

1-1: LGTM! Timing instrumentation added correctly.

The timing instrumentation is consistent with the pattern used across other work unit modules and appropriately measures the ESTART work unit execution time.

Also applies to: 6-6, 107-107, 114-114

crates/core/src/batch.rs (1)

301-301: LGTM! API usage updated correctly.

The call to apply_utxoset has been updated to pass the new defer_indexes boolean parameter (set to false), which maintains the existing behavior of building indexes immediately during UTXO application.

crates/cardano/src/genesis/mod.rs (1)

121-121: LGTM! API usage updated correctly.

Both calls to apply_utxoset have been updated to pass the new defer_indexes boolean parameter (set to false), maintaining the existing behavior of building indexes immediately during genesis/bootstrap.

Also applies to: 124-124

src/facade.rs (1)

64-64: LGTM! API usage updated correctly.

The call to apply_utxoset has been updated to pass the new defer_indexes boolean parameter (set to false), maintaining the existing behavior of building indexes immediately during rollback operations.

src/bin/dolos/doctor/mod.rs (1)

6-6: LGTM! New command integrated correctly.

The new BuildIndexes command is properly integrated into the doctor module structure, following the same pattern as existing commands.

Also applies to: 49-49

crates/redb3/src/state/utxoset.rs (3)

1-1: LGTM!

The import addition for StateError aligns with the error handling changes in the iterator.


21-39: LGTM!

The UtxosIterator changes correctly map redb errors to StateError::InternalStoreError, maintaining consistency with the error handling patterns used in other iterators in the codebase (e.g., EntityIter in mod.rs).


53-62: LGTM!

The removal of #[allow(unused)] on iter and the addition of len are straightforward and enable the new UTXO iteration and counting capabilities needed by the build-indexes command.

crates/core/src/state.rs (3)

6-6: LGTM!

Adding EraCbor to the public imports is necessary for the new UtxoIter associated type.


319-323: LGTM!

The new UtxoIter associated type and related methods (iter_utxos, amount_of_utxos) are well-designed and follow the established patterns in the trait.


403-406: LGTM!

The new methods iter_utxos and amount_of_utxos complete the trait interface needed for the build-indexes command.

crates/redb3/src/state/mod.rs (6)

4-4: LGTM!

The import addition for UtxoSet is consistent with the module's usage.


15-15: LGTM!

Importing UtxosIterator enables the new UtxoIter associated type for the StateStore implementation.


115-117: Visibility increased from pub(crate) to pub.

The db_mut method is now public to allow external callers (like the build-indexes command) to perform compaction and integrity checks. This expands the public API surface.

Ensure this visibility change is intentional and documented, as it exposes internal database access to external consumers.


226-230: LGTM!

The index_utxoset implementation correctly delegates to FilterIndexes::apply, enabling separate index construction as designed.


240-240: LGTM!

Setting UtxoIter = UtxosIterator correctly implements the new associated type from the trait.


318-328: LGTM!

The implementations of iter_utxos and amount_of_utxos are straightforward and correctly delegate to the underlying table operations.

Comment on lines +278 to +281
fn apply_utxoset(&self, delta: &UtxoSetDelta, defer_indexes: bool) -> Result<(), StateError>;

/// Apply utxoset delta to indexes only.
fn index_utxoset(&self, delta: &UtxoSetDelta) -> Result<(), StateError>;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

rg -n "apply_utxoset\(" --type rust -A 2

Repository: txpipe/dolos

Length of output: 1950


Breaking API change incomplete: apply_utxoset requires defer_indexes parameter, but 3 call sites were not updated.

The signature change from apply_utxoset(&self, delta: &UtxoSetDelta) to apply_utxoset(&self, delta: &UtxoSetDelta, defer_indexes: bool) is a breaking change. However, the following call sites were not updated:

  • crates/testing/src/toy_domain.rs:20
  • crates/testing/src/toy_domain.rs:130
  • crates/redb3/src/state/utxoset.rs:456

All these calls still use the old signature without the defer_indexes parameter. Update them to pass the parameter (typically false based on other call sites).

🤖 Prompt for AI Agents
In crates/core/src/state.rs around lines 278-281 the method signature for
apply_utxoset was changed to add a defer_indexes: bool parameter, but three call
sites still use the old signature; update each call to pass the new boolean
argument (use false where other call sites use false): modify
crates/testing/src/toy_domain.rs at lines ~20 and ~130 to call
apply_utxoset(delta, false) and update crates/redb3/src/state/utxoset.rs at ~456
to call apply_utxoset(delta, false) so the calls match the new signature and
compile.

Comment on lines 102 to 111
if args.include_state {
prepare_state(&mut stores.state, &pb)?;
let path = root.join("state");

archive
.append_path_with_name(&path, "state")
.into_diagnostic()?;

pb.set_message("creating archive");
pb.set_message("creating state");
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Progress message is set after the operation completes.

The message "creating state" is set on line 110 after the archive append operation has already completed. This provides misleading feedback to the user. Consider setting the message before the operation begins, similar to how prepare_state sets its messages before the operations.

🔎 Proposed fix
     if args.include_state {
+        pb.set_message("creating state");
         prepare_state(&mut stores.state, &pb)?;
         let path = root.join("state");

         archive
             .append_path_with_name(&path, "state")
             .into_diagnostic()?;
-
-        pb.set_message("creating state");
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if args.include_state {
prepare_state(&mut stores.state, &pb)?;
let path = root.join("state");
archive
.append_path_with_name(&path, "state")
.into_diagnostic()?;
pb.set_message("creating archive");
pb.set_message("creating state");
}
if args.include_state {
pb.set_message("creating state");
prepare_state(&mut stores.state, &pb)?;
let path = root.join("state");
archive
.append_path_with_name(&path, "state")
.into_diagnostic()?;
}
🤖 Prompt for AI Agents
In src/bin/dolos/data/export.rs around lines 102 to 111, the progress message
"creating state" is being set after the archive.append_path_with_name call,
which gives misleading feedback; move the pb.set_message("creating state") to
occur before calling archive.append_path_with_name(&path, "state") (i.e., set
the progress message immediately after preparing the path and before starting
the append operation) so the user sees the message while the archive operation
runs.

Comment on lines +34 to +48
for chunk in remaining.chunks(args.chunk).into_iter() {
let produced_utxo = chunk.into_iter().map(|x| {
let (k, v) = x.into_diagnostic().context("decoding utxoset")?;
Ok((k, Arc::new(v)))
}).collect::<miette::Result<_>>()?;
let utxoset = UtxoSetDelta {produced_utxo, ..Default::default()};

{
let writer = domain.state.start_writer().into_diagnostic().context("starting writer")?;
writer.index_utxoset(&utxoset).into_diagnostic().context("indexing")?;
writer.commit().into_diagnostic().context("committing")?;
}

progress.inc(args.chunk as u64);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Progress increment may over-count on the last chunk.

progress.inc(args.chunk as u64) on line 47 always increments by the configured chunk size, but the last chunk may contain fewer items. This causes the progress bar to potentially exceed 100% or show inaccurate progress.

🔎 Proposed fix
     for chunk in remaining.chunks(args.chunk).into_iter() {
-        let produced_utxo = chunk.into_iter().map(|x| {
+        let chunk_items: Vec<_> = chunk.collect();
+        let chunk_len = chunk_items.len();
+        let produced_utxo = chunk_items.into_iter().map(|x| {
             let (k, v) = x.into_diagnostic().context("decoding utxoset")?;
             Ok((k, Arc::new(v)))
         }).collect::<miette::Result<_>>()?;
         let utxoset = UtxoSetDelta {produced_utxo, ..Default::default()};

         { 
             let writer = domain.state.start_writer().into_diagnostic().context("starting writer")?;
             writer.index_utxoset(&utxoset).into_diagnostic().context("indexing")?;
             writer.commit().into_diagnostic().context("committing")?; 
         }

-        progress.inc(args.chunk as u64);
+        progress.inc(chunk_len as u64);
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for chunk in remaining.chunks(args.chunk).into_iter() {
let produced_utxo = chunk.into_iter().map(|x| {
let (k, v) = x.into_diagnostic().context("decoding utxoset")?;
Ok((k, Arc::new(v)))
}).collect::<miette::Result<_>>()?;
let utxoset = UtxoSetDelta {produced_utxo, ..Default::default()};
{
let writer = domain.state.start_writer().into_diagnostic().context("starting writer")?;
writer.index_utxoset(&utxoset).into_diagnostic().context("indexing")?;
writer.commit().into_diagnostic().context("committing")?;
}
progress.inc(args.chunk as u64);
}
for chunk in remaining.chunks(args.chunk).into_iter() {
let produced_utxo = chunk.into_iter().map(|x| {
let (k, v) = x.into_diagnostic().context("decoding utxoset")?;
Ok((k, Arc::new(v)))
}).collect::<miette::Result<_>>()?;
let utxoset = UtxoSetDelta {produced_utxo, ..Default::default()};
{
let writer = domain.state.start_writer().into_diagnostic().context("starting writer")?;
writer.index_utxoset(&utxoset).into_diagnostic().context("indexing")?;
writer.commit().into_diagnostic().context("committing")?;
}
progress.inc(chunk.len() as u64);
}
🤖 Prompt for AI Agents
In src/bin/dolos/doctor/build_indexes.rs around lines 34 to 48, the progress bar
is always incremented by args.chunk which overcounts when the final chunk is
smaller; compute the actual number of items in the current chunk (e.g., take
chunk.len() before consuming it) and call progress.inc with that actual count
(as u64) instead of args.chunk so the progress reflects the true number of
processed items.

Comment on lines +37 to +38
/// catch up store data from WAL records
BuildIndexes(build_indexes::Args),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix the command description.

The comment on line 37 says "catch up store data from WAL records" which appears to be copied from the CatchupStores command above. This should be updated to accurately describe the BuildIndexes command's purpose.

🔎 Proposed fix for the description
-    /// catch up store data from WAL records
+    /// build indexes for the UTXO set
     BuildIndexes(build_indexes::Args),
🤖 Prompt for AI Agents
In src/bin/dolos/doctor/mod.rs around lines 37-38, the doc comment for the
BuildIndexes variant incorrectly repeats "catch up store data from WAL records";
update that comment to accurately describe BuildIndexes (for example: "Build or
rebuild persistent indexes from store data" or "Build indexes for the
store/repositories"), replacing the copied CatchupStores description so it
clearly reflects the command's purpose.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants