Skip to content

Conversation

@newren
Copy link

@newren newren commented Dec 24, 2025

This bug dates back to ort's introduction, in particular with 7bee6c1 (merge-ort: avoid recursing into directories when we don't need to, 2021-07-16).

…dling

At GitHub, a few repositories were triggering errors of the form:

    git: merge-ort.c:3037: process_renames: Assertion `newinfo && !newinfo->merged.clean' failed.
    Aborted (core dumped)

While these may look similar to both
    a562d90 (merge-ort: fix failing merges in special corner case,
                  2025-11-03)
and
    f6ecb60 (merge-ort: fix directory rename on top of source of other
                  rename/delete, 2025-08-06)
the cause is different and in this case the problem is not an
over-conservative assertion, but a bug before the assertion where we did
not update all relevant state appropriately.

It sadly took me a really long time to figure out how to get a simple
reproducer for this one.  It doesn't really have that many moving parts,
but there are multiple pieces of background information needed to
understand it.

First of all, when we have two files added at the same path, merge-ort
does a two-way merge of those files.  If we have two directories added
at the same path, we basically do the same thing (taking the union of
files, and two-way merging files with the same name).  But two-way
merging requires components of the same type.  We can't merge the
contents of a regular file with a directory, or with a symlink, or with
a submodule.  Nor can any of those other types be merged with each
other, e.g. merging a submodule with a directory is a bad idea.  When
two paths have the same name but their types do not match, merge-ort is
forced to move one of them to an alternate filename (using the
unique_path() function).

Second, if two commits being merged have more than one merge-base,
merge-ort will merge the merge-bases to create a virtual merge-base, and
use that as the base commit.

Third, one of the really important optimizations in merge-ort is trivial
tree-level resolution (roughly meaning merging trees without recursing
into them).  This optimization has some nuance to it that is important
to the current bug, and to understand it, it helps to first look at the
high-level overview of how merge-ort runs; there are basically three
high-level functions that the work is divided between:
    collect_merge_info() - walks the top-level trees getting individual
                           paths of interest
    detect_renames() - detect renames between paths in order to match up
                       paths for three-way merging
    process_entries() - does a few things of interest:
      * three-way merging of files,
      * other special handling (e.g. adjusting paths with conflicting
        types to avoid path collisions)
      * as it finishes handling all the files within a subdirectory,
        writes out a new tree object for that directory

If it were not for renames, we could just always do tree-level merging
whenever the tree on at least one side was unmodified.  Unfortunately,
we need to recurse into trees to determine whether there are renames.
However, we can also do tree-level merging so long as there aren't any
*relevant* renames (another merge-ort optimization), which we can
determine without recursing into trees.

We would also be able to do tree-level merging if we somehow apriori
knew what renames existed, by only recursing into the trees which we
could otherwise trivially merge if they contained files involved in
renames.  That might not seem useful, because we need to find out the
renames and we have to recurse into trees to do so, but when you find
out that the process_entries() step is more computationally expensive
than the collect_merge_info() step, it yields an interesting strategy:
   * run collect_merge_info()
   * run detect_renames()
   * cache the renames()
   * restart -- rerun collect_merge_info(), using the cached renames to
     only recurse into the needed trees
   * we already have the renames cached so no need to re-detect
   * run process_entries() on the reduced list of paths
which was implemented back in 7bee6c1 (merge-ort: avoid recursing
into directories when we don't need to, 2021-07-16)  Crucially, this
restarting only occurs if the number of paths we could skip recursing
into exceeds the number we still need to recurse into by some safety
factor (wanted_factor in handle_deferred_entries()); forgetting this
fact is a great way to repeatedly fail to create a minimal testcase for
several days and go down alternate wrong paths).

Now, I earlier summarized this optimization as "merging trees without
recursing into them", but this optimization does not require that all
three sides of history has a directory at a given path.  So long as the
tree on one side matches the tree in the base version, we can decide to
resolve in favor of whatever the other side of history has at that path
-- be it a directory, a file, a submodule, or a symlink.  Unfortunately,
the code in question didn't fully realize this, and was written assuming
the base version and both sides would have a directory at the given
path, as can be seen by the "ci->filemask == 0" comment in
resolve_trivial_directory_merge() that was added as part of 7bee6c1
(merge-ort: avoid recursing into directories when we don't need to,
2021-07-16).  A few additional lines of code are needed to handle cases
where we have something other than a directory on the other side of
history.

But, knowing that resolve_trivial_directory_merge() doesn't have
sufficient state updating logic doesn't show us how to trigger a bug
without combining with the other bits of information we provided above.
Here's a relevant testcase:
   * branches A & B
   * commit A1: adds "folder" as a directory with files tracked under it
   * commit B1: adds "folder" as a submodule
   * commit A2: merges B1 into A1, keeping "folder" as a directory
     (and in fact, with no changes to "folder" since A1), discarding the
     submodule
   * commit B2: merges A1 into B1, keeping "folder" as a submodule
     (and in fact, with no changes to "folder" since B1), discarding the
     directory
Here, if we try to merge A2 & B2, the logic proceeds as follows:
   * we have multiple merge-bases: A1 & B1.  So we have to merge those
     to get a virtual merge base.
   * due to "folder" as a directory and "folder" as a submodule, the
     path collision logic triggers and renames "folder" as a submodule
     to "folder~Temporary merge branch 2" so we can keep it alongside
     "folder" as a directory.
   * we now have a virtual merge base (containing both "folder"
     directory and a "folder~Temporary merge branch 2" submodule) and
     can now do the outer merge
   * in the first step of the outer merge, we attempt to defer recursing
     into folder/ as a directory, but find we need to for rename
     detection.
   * in rename detection, we note that "folder~Temporary merge branch 2"
     has the same hash as "folder" as a submodule in B2, which means we
     have an exact rename.
   * after rename detection, we discover no path in folder/ is needed
     for renames, and so we can cache renames and restart.
   * after restarting, we avoid recursing into "folder/" and realize we
     can resolve it trivially since it hasn't been modified.  The
     resolution removes "folder/", leaving us only "folder" as a
     submodule from commit B2.
   * After this point, we should have a rename/delete conflict on
     "folder~Temporary merge branch 2" -> "folder", but our marking of
     the merge of "folder" as clean broke our ability to handle that and
     in fact triggers an assertion in process_renames().

When there was a df_conflict (directory/"file" conflict, where "file"
could be submodule or regular file or symlink), ensure
resolve_trivial_directory_merge() handles it properly.  In particular:
  * do not pre-emptively mark the path as cleanly merged if the
    remaining path is a file; allow it to be processed in
    process_entries() later to determine if it was clean
  * clear the parts of dirmask or filemask corresponding to the matching
    sides of history, since we are resolving those away
  * clear the df_conflict bit afterwards; since we cleared away the two
    matching sides and only have one side left, that one side can't
    have a directory/file conflict with itself.

Also add the above minimal testcase showcasing this bug to t6422, **with
a sufficient number of paths under the folder/ directory to actually
trigger it**.  (I wish I could have all those days back from all the
wrong paths I went down due to not having enough files under that
directory...)

I know this commit has a very high ratio of lines in the commit message
to lines of comments, and a relatively high ratio of comments to actual
code, but given how long it took me to track down, on the off chance
that we ever need to further modify this logic, I wanted it thoroughly
documented for future me and for whatever other poor soul might end up
needing to read this commit message.

Signed-off-by: Elijah Newren <newren@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant