-
Notifications
You must be signed in to change notification settings - Fork 47
experiment: build indexes for utxoset with a different command #825
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughThe PR adds timing instrumentation across multiple Cardano work unit modules, refactors the StateWriter API to support deferred UTXO indexing via a boolean parameter, introduces UTXO iteration capabilities in the state store, and implements a new CLI command for building UTXO indexes in chunks with conditional state preparation in the export workflow. Changes
Sequence Diagram(s)sequenceDiagram
participant CLI as CLI
participant Export as Export Handler
participant StateStore as StateStore
participant DB as Database
CLI->>Export: run(config, args, feedback)
Export->>StateStore: prepare_wal(&mut stores.wal, &pb)
StateStore->>DB: compact & verify WAL
DB-->>StateStore: ✓
StateStore-->>Export: ✓
alt include_chain == true
Export->>StateStore: prepare_chain(&mut stores.chain, &pb)
StateStore->>DB: compact & verify chain
DB-->>StateStore: ✓
StateStore-->>Export: ✓
end
alt include_state == true
Export->>StateStore: prepare_state(&mut stores.state, &pb)
StateStore->>DB: compact & verify state
DB-->>StateStore: ✓
StateStore-->>Export: ✓
end
Export->>CLI: "creating archive"
CLI-->>Export: ✓
sequenceDiagram
participant CLI as CLI Command
participant Domain as Domain
participant UtxoIterator as UTXO Iterator
participant StateWriter as StateWriter
participant DB as Database
CLI->>Domain: initialize()
Domain-->>CLI: domain instance
CLI->>UtxoIterator: iter_utxos() chunks
loop process chunk
UtxoIterator->>DB: fetch UTXO batch
DB-->>UtxoIterator: TxoRef, EraCbor
UtxoIterator->>UtxoIterator: decode & collect
end
UtxoIterator-->>CLI: UtxoSetDelta
CLI->>StateWriter: apply_utxoset(delta, true)
rect rgb(200, 220, 240)
Note over StateWriter: defer_indexes=true<br/>(skip immediate indexing)
end
StateWriter->>DB: write UTXO updates
DB-->>StateWriter: ✓
StateWriter-->>CLI: ✓
CLI->>StateWriter: index_utxoset(delta)
StateWriter->>DB: build & apply indexes
DB-->>StateWriter: ✓
StateWriter-->>CLI: ✓
CLI->>DB: compact()
DB-->>CLI: ✓
CLI->>DB: integrity_check()
DB-->>CLI: ✓ / error with context
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (4)
src/bin/dolos/data/export.rs (1)
54-66: Consider handling theNonecase fromdb_mut()more gracefully.The
unwrap()on line 58 could panic if there are multiple references to theArc<Database>. Whileprepare_waluses the same pattern, this could cause confusing runtime errors. Consider returning an error instead.🔎 Proposed fix
fn prepare_state( state: &mut dolos_redb3::state::StateStore, pb: &crate::feedback::ProgressBar, ) -> miette::Result<()> { - let db = state.db_mut().unwrap(); + let db = state.db_mut().ok_or_else(|| miette::miette!("cannot get exclusive access to state database"))?; pb.set_message("compacting state"); db.compact().into_diagnostic()?;src/bin/dolos/doctor/build_indexes.rs (2)
19-19: Remove commented-out code.Line 19 contains commented-out code (
//crate::common::setup_tracing(&config.logging)?;). If tracing setup is not needed here, remove the line entirely. If it's needed later, track it with a TODO or issue.
50-56: Handle theNonecase fromdb_mut()gracefully.Similar to the issue in
export.rs, theunwrap()on line 50 could panic if there are multiple references to the database Arc. Consider returning an error with context.🔎 Proposed fix
- let db = domain.state.db_mut().unwrap(); + let db = domain.state.db_mut() + .ok_or_else(|| miette::miette!("cannot get exclusive access to state database"))?;crates/redb3/src/state/mod.rs (1)
216-224: Fix inconsistent indentation.Lines 219-221 have incorrect indentation. The
if !defer_indexesblock content should be indented consistently.🔎 Proposed fix
fn apply_utxoset(&self, delta: &dolos_core::UtxoSetDelta, defer_indexes: bool) -> Result<(), StateError> { utxoset::UtxosTable::apply(&self.wx, delta)?; if !defer_indexes { - utxoset::FilterIndexes::apply(&self.wx, delta)?; + utxoset::FilterIndexes::apply(&self.wx, delta)?; } Ok(()) }
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
crates/cardano/src/estart/mod.rscrates/cardano/src/ewrap/mod.rscrates/cardano/src/genesis/mod.rscrates/cardano/src/rewards/mod.rscrates/cardano/src/rupd/mod.rscrates/core/src/batch.rscrates/core/src/state.rscrates/redb3/src/state/mod.rscrates/redb3/src/state/utxoset.rssrc/bin/dolos/data/export.rssrc/bin/dolos/doctor/build_indexes.rssrc/bin/dolos/doctor/mod.rssrc/facade.rs
🧰 Additional context used
🧬 Code graph analysis (7)
crates/cardano/src/ewrap/mod.rs (1)
crates/cardano/src/ewrap/loading.rs (12)
boundary(272-272)boundary(274-274)boundary(276-276)boundary(278-278)load(244-281)state(42-42)state(89-89)state(121-121)state(151-151)state(173-173)state(185-185)state(201-201)
src/bin/dolos/doctor/build_indexes.rs (1)
src/bin/dolos/common.rs (1)
setup_domain(133-166)
crates/cardano/src/genesis/mod.rs (1)
crates/cardano/src/utxoset.rs (1)
build_custom_utxos_delta(163-179)
src/bin/dolos/doctor/mod.rs (1)
src/bin/dolos/doctor/build_indexes.rs (1)
run(18-59)
crates/redb3/src/state/utxoset.rs (3)
crates/redb3/src/state/mod.rs (2)
next(28-38)next(46-55)crates/core/src/state.rs (1)
next(224-234)crates/core/src/archive.rs (1)
next(133-143)
crates/core/src/state.rs (1)
crates/redb3/src/state/mod.rs (4)
apply_utxoset(216-224)index_utxoset(226-230)iter_utxos(318-322)amount_of_utxos(324-328)
crates/redb3/src/state/mod.rs (1)
crates/redb3/src/state/utxoset.rs (4)
apply(81-107)apply(278-362)iter(53-57)len(59-62)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Check Build
🔇 Additional comments (20)
crates/cardano/src/rupd/mod.rs (1)
1-1: LGTM! Timing instrumentation added correctly.The timing instrumentation follows a consistent pattern across the codebase and is appropriately placed to measure both
log_workandexecutedurations.Also applies to: 114-114, 164-165, 176-176, 188-189
crates/cardano/src/ewrap/mod.rs (1)
3-3: LGTM! Timing instrumentation added correctly.The timing instrumentation is consistent with the pattern used in other modules (rupd, estart, rewards) and appropriately measures the EWRAP work unit execution time.
Also applies to: 10-10, 159-159, 166-166
crates/cardano/src/rewards/mod.rs (1)
1-1: LGTM! Timing instrumentation added correctly.The timing instrumentation follows the same pattern as other modules and appropriately measures the rewards calculation duration.
Also applies to: 334-334, 423-424
crates/cardano/src/estart/mod.rs (1)
1-1: LGTM! Timing instrumentation added correctly.The timing instrumentation is consistent with the pattern used across other work unit modules and appropriately measures the ESTART work unit execution time.
Also applies to: 6-6, 107-107, 114-114
crates/core/src/batch.rs (1)
301-301: LGTM! API usage updated correctly.The call to
apply_utxosethas been updated to pass the newdefer_indexesboolean parameter (set tofalse), which maintains the existing behavior of building indexes immediately during UTXO application.crates/cardano/src/genesis/mod.rs (1)
121-121: LGTM! API usage updated correctly.Both calls to
apply_utxosethave been updated to pass the newdefer_indexesboolean parameter (set tofalse), maintaining the existing behavior of building indexes immediately during genesis/bootstrap.Also applies to: 124-124
src/facade.rs (1)
64-64: LGTM! API usage updated correctly.The call to
apply_utxosethas been updated to pass the newdefer_indexesboolean parameter (set tofalse), maintaining the existing behavior of building indexes immediately during rollback operations.src/bin/dolos/doctor/mod.rs (1)
6-6: LGTM! New command integrated correctly.The new
BuildIndexescommand is properly integrated into the doctor module structure, following the same pattern as existing commands.Also applies to: 49-49
crates/redb3/src/state/utxoset.rs (3)
1-1: LGTM!The import addition for
StateErroraligns with the error handling changes in the iterator.
21-39: LGTM!The
UtxosIteratorchanges correctly map redb errors toStateError::InternalStoreError, maintaining consistency with the error handling patterns used in other iterators in the codebase (e.g.,EntityIterinmod.rs).
53-62: LGTM!The removal of
#[allow(unused)]oniterand the addition oflenare straightforward and enable the new UTXO iteration and counting capabilities needed by thebuild-indexescommand.crates/core/src/state.rs (3)
6-6: LGTM!Adding
EraCborto the public imports is necessary for the newUtxoIterassociated type.
319-323: LGTM!The new
UtxoIterassociated type and related methods (iter_utxos,amount_of_utxos) are well-designed and follow the established patterns in the trait.
403-406: LGTM!The new methods
iter_utxosandamount_of_utxoscomplete the trait interface needed for thebuild-indexescommand.crates/redb3/src/state/mod.rs (6)
4-4: LGTM!The import addition for
UtxoSetis consistent with the module's usage.
15-15: LGTM!Importing
UtxosIteratorenables the newUtxoIterassociated type for theStateStoreimplementation.
115-117: Visibility increased frompub(crate)topub.The
db_mutmethod is now public to allow external callers (like thebuild-indexescommand) to perform compaction and integrity checks. This expands the public API surface.Ensure this visibility change is intentional and documented, as it exposes internal database access to external consumers.
226-230: LGTM!The
index_utxosetimplementation correctly delegates toFilterIndexes::apply, enabling separate index construction as designed.
240-240: LGTM!Setting
UtxoIter = UtxosIteratorcorrectly implements the new associated type from the trait.
318-328: LGTM!The implementations of
iter_utxosandamount_of_utxosare straightforward and correctly delegate to the underlying table operations.
| fn apply_utxoset(&self, delta: &UtxoSetDelta, defer_indexes: bool) -> Result<(), StateError>; | ||
|
|
||
| /// Apply utxoset delta to indexes only. | ||
| fn index_utxoset(&self, delta: &UtxoSetDelta) -> Result<(), StateError>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
rg -n "apply_utxoset\(" --type rust -A 2Repository: txpipe/dolos
Length of output: 1950
Breaking API change incomplete: apply_utxoset requires defer_indexes parameter, but 3 call sites were not updated.
The signature change from apply_utxoset(&self, delta: &UtxoSetDelta) to apply_utxoset(&self, delta: &UtxoSetDelta, defer_indexes: bool) is a breaking change. However, the following call sites were not updated:
crates/testing/src/toy_domain.rs:20crates/testing/src/toy_domain.rs:130crates/redb3/src/state/utxoset.rs:456
All these calls still use the old signature without the defer_indexes parameter. Update them to pass the parameter (typically false based on other call sites).
🤖 Prompt for AI Agents
In crates/core/src/state.rs around lines 278-281 the method signature for
apply_utxoset was changed to add a defer_indexes: bool parameter, but three call
sites still use the old signature; update each call to pass the new boolean
argument (use false where other call sites use false): modify
crates/testing/src/toy_domain.rs at lines ~20 and ~130 to call
apply_utxoset(delta, false) and update crates/redb3/src/state/utxoset.rs at ~456
to call apply_utxoset(delta, false) so the calls match the new signature and
compile.
| if args.include_state { | ||
| prepare_state(&mut stores.state, &pb)?; | ||
| let path = root.join("state"); | ||
|
|
||
| archive | ||
| .append_path_with_name(&path, "state") | ||
| .into_diagnostic()?; | ||
|
|
||
| pb.set_message("creating archive"); | ||
| pb.set_message("creating state"); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Progress message is set after the operation completes.
The message "creating state" is set on line 110 after the archive append operation has already completed. This provides misleading feedback to the user. Consider setting the message before the operation begins, similar to how prepare_state sets its messages before the operations.
🔎 Proposed fix
if args.include_state {
+ pb.set_message("creating state");
prepare_state(&mut stores.state, &pb)?;
let path = root.join("state");
archive
.append_path_with_name(&path, "state")
.into_diagnostic()?;
-
- pb.set_message("creating state");
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if args.include_state { | |
| prepare_state(&mut stores.state, &pb)?; | |
| let path = root.join("state"); | |
| archive | |
| .append_path_with_name(&path, "state") | |
| .into_diagnostic()?; | |
| pb.set_message("creating archive"); | |
| pb.set_message("creating state"); | |
| } | |
| if args.include_state { | |
| pb.set_message("creating state"); | |
| prepare_state(&mut stores.state, &pb)?; | |
| let path = root.join("state"); | |
| archive | |
| .append_path_with_name(&path, "state") | |
| .into_diagnostic()?; | |
| } |
🤖 Prompt for AI Agents
In src/bin/dolos/data/export.rs around lines 102 to 111, the progress message
"creating state" is being set after the archive.append_path_with_name call,
which gives misleading feedback; move the pb.set_message("creating state") to
occur before calling archive.append_path_with_name(&path, "state") (i.e., set
the progress message immediately after preparing the path and before starting
the append operation) so the user sees the message while the archive operation
runs.
| for chunk in remaining.chunks(args.chunk).into_iter() { | ||
| let produced_utxo = chunk.into_iter().map(|x| { | ||
| let (k, v) = x.into_diagnostic().context("decoding utxoset")?; | ||
| Ok((k, Arc::new(v))) | ||
| }).collect::<miette::Result<_>>()?; | ||
| let utxoset = UtxoSetDelta {produced_utxo, ..Default::default()}; | ||
|
|
||
| { | ||
| let writer = domain.state.start_writer().into_diagnostic().context("starting writer")?; | ||
| writer.index_utxoset(&utxoset).into_diagnostic().context("indexing")?; | ||
| writer.commit().into_diagnostic().context("committing")?; | ||
| } | ||
|
|
||
| progress.inc(args.chunk as u64); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Progress increment may over-count on the last chunk.
progress.inc(args.chunk as u64) on line 47 always increments by the configured chunk size, but the last chunk may contain fewer items. This causes the progress bar to potentially exceed 100% or show inaccurate progress.
🔎 Proposed fix
for chunk in remaining.chunks(args.chunk).into_iter() {
- let produced_utxo = chunk.into_iter().map(|x| {
+ let chunk_items: Vec<_> = chunk.collect();
+ let chunk_len = chunk_items.len();
+ let produced_utxo = chunk_items.into_iter().map(|x| {
let (k, v) = x.into_diagnostic().context("decoding utxoset")?;
Ok((k, Arc::new(v)))
}).collect::<miette::Result<_>>()?;
let utxoset = UtxoSetDelta {produced_utxo, ..Default::default()};
{
let writer = domain.state.start_writer().into_diagnostic().context("starting writer")?;
writer.index_utxoset(&utxoset).into_diagnostic().context("indexing")?;
writer.commit().into_diagnostic().context("committing")?;
}
- progress.inc(args.chunk as u64);
+ progress.inc(chunk_len as u64);
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for chunk in remaining.chunks(args.chunk).into_iter() { | |
| let produced_utxo = chunk.into_iter().map(|x| { | |
| let (k, v) = x.into_diagnostic().context("decoding utxoset")?; | |
| Ok((k, Arc::new(v))) | |
| }).collect::<miette::Result<_>>()?; | |
| let utxoset = UtxoSetDelta {produced_utxo, ..Default::default()}; | |
| { | |
| let writer = domain.state.start_writer().into_diagnostic().context("starting writer")?; | |
| writer.index_utxoset(&utxoset).into_diagnostic().context("indexing")?; | |
| writer.commit().into_diagnostic().context("committing")?; | |
| } | |
| progress.inc(args.chunk as u64); | |
| } | |
| for chunk in remaining.chunks(args.chunk).into_iter() { | |
| let produced_utxo = chunk.into_iter().map(|x| { | |
| let (k, v) = x.into_diagnostic().context("decoding utxoset")?; | |
| Ok((k, Arc::new(v))) | |
| }).collect::<miette::Result<_>>()?; | |
| let utxoset = UtxoSetDelta {produced_utxo, ..Default::default()}; | |
| { | |
| let writer = domain.state.start_writer().into_diagnostic().context("starting writer")?; | |
| writer.index_utxoset(&utxoset).into_diagnostic().context("indexing")?; | |
| writer.commit().into_diagnostic().context("committing")?; | |
| } | |
| progress.inc(chunk.len() as u64); | |
| } |
🤖 Prompt for AI Agents
In src/bin/dolos/doctor/build_indexes.rs around lines 34 to 48, the progress bar
is always incremented by args.chunk which overcounts when the final chunk is
smaller; compute the actual number of items in the current chunk (e.g., take
chunk.len() before consuming it) and call progress.inc with that actual count
(as u64) instead of args.chunk so the progress reflects the true number of
processed items.
| /// catch up store data from WAL records | ||
| BuildIndexes(build_indexes::Args), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the command description.
The comment on line 37 says "catch up store data from WAL records" which appears to be copied from the CatchupStores command above. This should be updated to accurately describe the BuildIndexes command's purpose.
🔎 Proposed fix for the description
- /// catch up store data from WAL records
+ /// build indexes for the UTXO set
BuildIndexes(build_indexes::Args),🤖 Prompt for AI Agents
In src/bin/dolos/doctor/mod.rs around lines 37-38, the doc comment for the
BuildIndexes variant incorrectly repeats "catch up store data from WAL records";
update that comment to accurately describe BuildIndexes (for example: "Build or
rebuild persistent indexes from store data" or "Build indexes for the
store/repositories"), replacing the copied CatchupStores description so it
clearly reflects the command's purpose.
The idea consists on separating the index construction from the utxoset generation, as to avoid creating the indexes on each step of the bootstrapping process, similar to
defer-indexeson Kupo.Summary by CodeRabbit
Release Notes
New Features
build_indexescommand to rebuild UTXO indexes in configurable chunksImprovements
✏️ Tip: You can customize this high-level summary in your review settings.