@bplatz bplatz commented Aug 21, 2025

Summary

This PR sits on top of #1095, which in turn sits on top of several others. The final part of the full branching support, #1102, sits on top of this branch and adds cuckoo filters for index garbage collection.

This PR implements core branch management operations for Fluree, providing Git-like capabilities for database version control:

  • Merge with fast-forward and squash modes - Integrate changes from source into target branch
  • Rebase with squash mode - Replay source branch onto target (true git-style rebase)
  • Safe reset - Non-destructively revert branches to previous states
  • Branch divergence analysis - Determine relationships between branches
  • Branch graph visualization - ASCII and JSON representations of branch relationships

Features Implemented

1. Merge Operations (Updates Target Branch)

Fast-forward

Automatically moves target branch pointer when no divergence exists:

@(fluree/merge! conn "mydb:feature" "mydb:main")
;; Returns: {:status :success :strategy "fast-forward" ...}

Squash Merge

Combines multiple commits into a single commit on target, cancelling assert/retract pairs for the same value:
@(fluree/merge! conn "mydb:feature" "mydb:main"
  {:squash? true
   :message "Feature complete: Add user authentication"})
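
The cancellation step can be pictured with a small sketch. This is not the code in fluree.db.merge.flake: it assumes a flake is a plain map with the hypothetical keys :s, :p, :o and :op (true for an assert, false for a retract), and that an assert and a retract cancel when they name the same subject, predicate and value.

```clojure
(defn cancel-pairs
  "Drops assert/retract pairs that target the same [s p o] triple.
  Hypothetical model: each flake is {:s .. :p .. :o .. :op true|false}."
  [flakes]
  (->> flakes
       (group-by (juxt :s :p :o))
       (mapcat (fn [[_ fs]]
                 (let [asserts  (filter :op fs)
                       retracts (remove :op fs)
                       n        (min (count asserts) (count retracts))]
                   ;; n matched pairs cancel; whatever is left over survives
                   (concat (drop n asserts) (drop n retracts)))))
       vec))

;; "Al" is asserted and then retracted, so only the "Bob" assert remains.
(cancel-pairs [{:s 1 :p :name :o "Al"  :op true}
               {:s 1 :p :name :o "Al"  :op false}
               {:s 1 :p :name :o "Bob" :op true}])
```

The sketch ignores ordering and the meta field; the real implementation also has to respect set versus list semantics, as noted under Key Implementation Details.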

2. Rebase Operations (Updates Source Branch)

True git-style rebase that updates the source branch by replaying it onto target:
@(fluree/rebase! conn "mydb:feature" "mydb:main"
  {:squash? true
   :message "Rebase feature onto latest main"})
;; Note: Updates feature branch, main remains unchanged

3. Safe Reset

Creates a new commit that reverts the branch to a previous state (non-destructive):
;; Reset to transaction number
@(fluree/reset-branch! conn "mydb:main" {:t 90}
  {:message "Reverting to stable state"})

;; Reset to commit SHA
@(fluree/reset-branch! conn "mydb:main" {:sha "abc123"}
  {:message "Reverting to release v1.0"})

4. Branch Analysis & Visualization

Divergence Analysis

Check if branches can fast-forward before performing operations:
(def divergence @(fluree/branch-divergence conn "mydb:feature" "mydb:main"))
;; Returns: {:can-fast-forward true 
;;           :fast-forward-direction :branch1-to-branch2
;;           :common-ancestor "sha..."}

Branch Graph

Visualize branch relationships:
;; ASCII visualization
(println (<!! (merge/branch-graph conn "mydb" {:format :ascii})))

;; JSON data structure
(def graph (<!! (merge/branch-graph conn "mydb" {:format :json})))

Key Implementation Details

- Correct git semantics: merge! updates target branch, rebase! updates source branch
- Smart cancellation: When squashing, assert/retract pairs for the same value cancel out
- Multi-cardinality support: Correctly handles both set and list semantics with meta field preservation
- Namespace handling: Properly manages RDF namespace mappings across branch operations
- Branch recreation: Rebase operations delete and recreate branches to maintain correct history
- Comprehensive testing: All 246 tests passing, including cancellation, divergence, and file-based storage

Architecture

The implementation is modularized across focused namespaces:
- fluree.db.merge - Public API (refactored for clarity)
- fluree.db.merge.operations - Core operations (squash!, fast-forward!, safe-reset!)
- fluree.db.merge.flake - Flake manipulation and cancellation logic
- fluree.db.merge.branch - Branch analysis and LCA detection
- fluree.db.merge.graph - Branch visualization
- Supporting namespaces for commits, database prep, and responses

Documentation

See docs/branch-operations.md for:
- Complete API reference with correct semantics
- Usage examples and workflows
- Clear distinction between merge and rebase operations
- Implementation details for developers
- Troubleshooting guide

Testing

Run tests with:
make cljtest

Test coverage includes:
- Fast-forward merge scenarios
- Squash merge with divergent branches
- Rebase operations updating source branch
- Assert/retract cancellation verification
- Safe reset operations
- Branch graph visualization
- File and memory storage compatibility

Next Steps

- Implement commit-by-commit replay for rebase (without squash)
- Add detailed anomaly reports to merge operations
- Implement three-way merge with conflict resolution (when SHACL/policy validation is added)
- Implement hard reset mode with branch pointer manipulation
- Add cherry-pick functionality for selective commit application

@bplatz bplatz changed the base branch from main to feature/branching August 21, 2025 20:03
@bplatz bplatz force-pushed the feature/branching branch from 970f0e4 to ce3b475 Compare August 21, 2025 20:12
@bplatz bplatz force-pushed the feature/branching branch from ce3b475 to b2a5f9f Compare August 27, 2025 02:57
@bplatz bplatz changed the title from "Add rebase and reset operations for branch management" to "Add merge, rebase and reset operations for branch management" Sep 2, 2025
@bplatz bplatz marked this pull request as ready for review September 2, 2025 15:58
@bplatz bplatz requested a review from a team September 2, 2025 15:58
- Resolved conflict in src/fluree/db/commit/storage.cljc
  - Preferred feature/branching version using storage/content-read-json
  - Kept cleaner hash calculation logic from feature/branching
- Fixed branch-graph test by correcting read-commit-jsonld function call
  - Removed extra hash parameter that was causing test failures
- Merged all branch improvements including:
  - Branch metadata flattening for index optimization
  - Helper functions consolidation in util.branch namespace
  - Secondary publisher support
  - Validation improvements and extracted validate-publish function
  - Main branch helper functions
  - Drop API error message enhancements
  - Deterministic test timestamps
- All tests now pass
can-ff? (<? (merge-branch/can-fast-forward? conn source-db target-db
source-branch-info target-branch-info))]

(when preview?

This doesn't short circuit because of the when. The subsequent merging code here will always run whether or not the preview? option is set, which is the opposite of what we want when people set that option.
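
A minimal sketch of the difference, assuming the code under review has the shape "a `when` form followed by further forms in the same body". The function names and the `apply-merge!` callback are hypothetical stand-ins, not the PR's code.

```clojure
(defn merge-with-when
  "Mirrors the shape under review: the `when` result is computed and discarded."
  [preview? apply-merge!]
  (when preview?
    {:status :preview})
  (apply-merge!))          ; always reached, preview or not

(defn merge-with-if
  "Branches explicitly, so the preview path never reaches the write."
  [preview? apply-merge!]
  (if preview?
    {:status :preview}
    (apply-merge!)))
```

With `preview?` set to true, `merge-with-when` still invokes the callback, while `merge-with-if` returns the preview map without calling it.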

source-branch-info target-branch-info))]

(when preview?
(reduced

This doesn't look like it's taking place inside of a reduction. What is this reduced doing?
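
For reference, a short illustration of what `reduced` does in each setting. It uses only clojure.core; `wrapped` and `sum-until-negative` are illustrative names.

```clojure
;; Outside a reduction, `reduced` only wraps its argument.
(def wrapped (reduced {:status :preview}))

(reduced? wrapped) ; true: the caller receives the wrapper, not the map
(deref wrapped)    ; the wrapped map, available only after an explicit deref

;; Inside `reduce`, the same wrapper stops iteration early.
(defn sum-until-negative
  [xs]
  (reduce (fn [acc x]
            (if (neg? x)
              (reduced acc)   ; stop here; reduce returns acc unwrapped
              (+ acc x)))
          0
          xs))
```

So a `reduced` call that is not inside a reducing function neither exits early nor returns the plain value.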

;; Delete the existing branch
(<? (api.branch/delete-branch! conn branch-spec))
;; Recreate it pointing to the new commit
(<? (api.branch/create-branch!

Is it possible to do this in a non-destructive way? I'm worried about an error upon recreation after the original branch data has been deleted.
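
One way to picture the non-destructive alternative is a single update that moves the head instead of a delete followed by a create. The sketch below uses an in-memory atom as a hypothetical stand-in for wherever branch metadata lives; `move-branch` and `move-branch!` are illustrative names, not Fluree APIs.

```clojure
(defn move-branch
  "Pure update: returns a registry where `branch` points at `new-commit`.
  Throws, leaving the input untouched, when the branch is unknown."
  [registry branch new-commit]
  (if (contains? registry branch)
    (assoc-in registry [branch :head] new-commit)
    (throw (ex-info "Unknown branch" {:branch branch}))))

(defn move-branch!
  "Applies move-branch atomically; a failure leaves the old entry in place."
  [registry-atom branch new-commit]
  (swap! registry-atom move-branch branch new-commit))
```

Because the old entry is never removed, a failure partway through cannot leave the branch missing.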

:squash? - Combine all commits into one (default false)
:preview? - Dry run without changes (default false)
Returns promise resolving to merge result with anomalies report."

This function does not return a promise; it returns a core.async channel.

:squash? - Combine all commits into one (default false)
:preview? - Dry run without changes (default false)
Returns promise resolving to rebase result."

This function does not return a promise.

:from from
:to to
:lca lca
:strategy (if squash? "squash" "replay")}))

I think the squash? case should be handled by a separate function. squashed? is referenced many times in subsequent code, which is a source of future bugs as these functions get modified.

@@ -0,0 +1,103 @@
(ns fluree.db.merge.db

There needs to be a protocol for this functionality. A lot of the code here assumes we'll be operating on a FlakeDB explicitly, and there's a function that assumes that an AsyncDB will always wrap a FlakeDB. We should be able to use any dbs we receive interchangeably without ever having to explicitly check what kind of dbs they are, so new functionality on dbs should be defined in terms of new protocols.
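
A rough sketch of the idea, with entirely hypothetical names: `Mergeable`, `staged-flakes`, `EagerDb` and `DeferredDb` are stand-ins and do not correspond to Fluree's actual protocols or db types.

```clojure
(defprotocol Mergeable
  "Hypothetical protocol; the name and method are illustrative only."
  (staged-flakes [db] "Returns the flakes this db contributes to a merge."))

(defrecord EagerDb [flakes]
  Mergeable
  (staged-flakes [_] flakes))

(defrecord DeferredDb [load-fn]
  Mergeable
  ;; A wrapper resolves its own contents; callers never unwrap it.
  (staged-flakes [_] (staged-flakes (load-fn))))

(defn merge-flakes
  "Works on any Mergeable without checking concrete types."
  [source target]
  (into (vec (staged-flakes target)) (staged-flakes source)))
```

`merge-flakes` never asks which kind of db it was given, and the wrapper type makes no assumption about what it wraps beyond the protocol.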

(merge-branch/validate-same-ledger! from to)
(let [{:keys [ff squash? preview?]
:or {ff :auto
squash? false}} opts

If we are going to control the behavior of these functions through options maps, then we need to have an explicit options validation step. It's really easy for a user to use {:preview true} instead of {:preview? true} and perform some destructive process that they didn't intend.

This is part of the reason why I don't like options maps. I would rather have separate, explicit functions for all the legal options combinations we accept (like preview-merge ...). Those functions could share internal helper functions for shared functionality. This makes the api explicit, reduces the complexity of both the code and documentation, and makes it obvious what is and is not supported from what functions are available.
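
As a sketch of the validation step, assuming the accepted keys are the ones visible in the destructuring above plus `:message` from the PR summary. `allowed-merge-opts` and `validate-opts!` are hypothetical names.

```clojure
(require '[clojure.set :as set])

(def allowed-merge-opts
  "Assumed option keys: those in the quoted destructuring, plus :message."
  #{:ff :squash? :preview? :message})

(defn validate-opts!
  "Returns opts unchanged, or throws naming every unrecognized key."
  [opts]
  (let [unknown (set/difference (set (keys opts)) allowed-merge-opts)]
    (if (seq unknown)
      (throw (ex-info "Unknown merge options" {:unknown unknown}))
      opts)))
```

With this check in front of the merge, `{:preview true}` fails fast with `:preview` reported in the exception data instead of silently running a real merge.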

(defn- branch-origin
"Get the commit ID a branch was created from."
[branch-info]
(or (get-in branch-info [:created-from "f:commit" "@id"]) ; nameservice expanded

I think we should operate on a single format internally, and convert between external representations as necessary. Checking all these cases whenever we want to find information about a branch seems like a source of future bugs.

(let [commit-catalog (:commit-catalog conn)
latest-expanded (<? (merge-commit/expand-latest-commit conn db))
error-ch (async/chan)
tuples (commit-storage/trace-commits commit-catalog latest-expanded 0 error-ch)]

This doesn't handle any errors as nothing ever takes anything off of the error channel. It looks like this code will just hang, possibly locking the whole jvm process, if an error occurs.

#### Response

```clojure
{:status :success ; or :error

I feel like the return should just be the same as a regular commit - return the db with the new HEAD.


| Mode | Status | Description |
|------|--------|-------------|
| Safe mode (`:mode :safe`) | ✅ Implemented | Creates a revert commit to target state |

I think revert is a distinct operation from reset, I would vote we move this to a different api and leave reset unimplemented for now.

|------|--------|-------------|
| Safe mode (`:mode :safe`) | ✅ Implemented | Creates a revert commit to target state |
| Hard mode (`:mode :hard`) | ❌ Not Implemented | Will move branch pointer (rewrite history) |
| Preview (`:preview? true`) | ✅ Implemented | Dry-run to see what would happen |

Just a thought, but if we're copying git's api we might want to call this dry-run? instead : )

;; Returns:
{:common-ancestor "fluree:commit:sha256:..."
:can-fast-forward true ; or false
:fast-forward-direction :branch1-to-branch2} ; or :branch2-to-branch1, nil

I feel like this datastructure can be made more useful:

{:common-ancestor <lca>
 :target [<commita> <commitb>] ;; commits since lca
 :source [<commitc> <commitd> <commite>]} ;; commits since lca

can-fast-forward is implicit - only possible when :target is empty. This way we don't have to try to decipher what the direction is by parsing the :fast-forward-direction and we have the information we need to check for conflicts.
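
To illustrate how the derived checks would fall out of that shape, here is a small sketch. The helper names are hypothetical, and the commit values are placeholder strings.

```clojure
(defn fast-forward?
  "True when the target has no commits since the common ancestor."
  [{:keys [target]}]
  (empty? target))

(defn diverged?
  "True when both sides have commits since the common ancestor."
  [{:keys [source target]}]
  (boolean (and (seq source) (seq target))))

(fast-forward? {:common-ancestor "lca" :target []    :source ["c" "d"]})
(diverged?     {:common-ancestor "lca" :target ["a"] :source ["c"]})
```

Both predicates read only the commit lists, so no direction keyword has to be interpreted.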

@bplatz bplatz marked this pull request as draft January 9, 2026 03:05