merge: more advanced version #106

@m-mohr

Originally posted by @andyjenkinson in #90 (comment)

What you're describing is extremely complex because overlaps are not 1:1; there are chains of consequences. For example:

  • A overlaps B
  • A overlaps C
  • A is newer than B
  • C is newer than A
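To make the chain above concrete, here is a minimal sketch of what goes wrong with naive pairwise "newer wins" resolution. The function name `resolve_pairwise` and the dates are hypothetical, chosen only to match the A/B/C example; this is not how any existing tool does it.

```python
from datetime import date


def resolve_pairwise(dates, overlaps):
    """Naive approach: visit overlapping pairs in order, drop the older one."""
    kept = set(dates)
    for a, b in overlaps:
        # Only compare pairs where both boundaries are still present.
        if a in kept and b in kept:
            older = a if dates[a] < dates[b] else b
            kept.discard(older)
    return kept


# Hypothetical dates: A is newer than B, C is newer than A.
dates = {
    "A": date(2021, 1, 1),
    "B": date(2020, 1, 1),
    "C": date(2022, 1, 1),
}

# Same overlaps, visited in two different orders.
first = resolve_pairwise(dates, [("A", "B"), ("A", "C")])
second = resolve_pairwise(dates, [("A", "C"), ("A", "B")])

print(sorted(first))   # ['C']
print(sorted(second))  # ['B', 'C']
```

In the first order B is dropped in favour of A, and A is then dropped in favour of C, so B is lost even though it never overlapped C. The result depends on processing order, which is why the comparison has to happen at the set level.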

Also, even in the simple case where an entire dataset is 'newer', you often find that the newer boundaries don't overlap with each other, yet they cannot be introduced one by one because the intermediate state introduces overlaps. For both of these reasons, it requires set-level comparison logic. We approach this in stages by computing non-overlapping 'islands', then processing each in chunks. This also still requires that the user is happy with a simple 'is newer' logic; in fact, it is often undesirable to replace an older, higher-quality boundary with a newer predicted one.
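As a rough illustration of the 'islands' stage, one possible reading is that an island is a connected component of the overlap graph, so that nothing in one island overlaps anything in another. That interpretation, the function name `find_islands`, and the input format (a precomputed list of overlapping ID pairs) are assumptions for this sketch, not a description of the GFID implementation.

```python
def find_islands(ids, overlaps):
    """Group boundary IDs into islands using union-find over overlap pairs."""
    parent = {i: i for i in ids}

    def find(x):
        # Walk up to the root, compressing the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in overlaps:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    # Sort for a deterministic result.
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical input: A overlaps B and C, D overlaps E, F overlaps nothing.
islands = find_islands(
    ["A", "B", "C", "D", "E", "F"],
    [("A", "B"), ("A", "C"), ("D", "E")],
)
print(islands)
```

Each island can then be resolved independently (and in chunks), since a decision inside one island cannot create an overlap in another.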

The other complication, by the way, is that even if boundaries don't overlap, identifiers are only unique within a single dataset. Again, that's why we create a separate Feature object with a (different) guaranteed-unique Boundary ID and then link each 'input' object's metadata as a separate object in order to retain the original metadata. (In GFID this has to be a 1:many relationship because we deduplicate the geometries, and even when two objects have the same geometry they typically have different (conflicting) metadata, e.g. two different dates or two different claimed delineation methods.) We do this because it's actually quite common to see the same geometry appear in multiple datasets, e.g. FotW contains many national LPIS geometries, each of which is itself a separately published dataset.
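The 1:many model described above could be sketched roughly as follows. All class and field names (`Feature`, `SourceRecord`, `boundary_id`, `ingest`) and the sample records are hypothetical, and matching geometries by identical WKT strings is a simplifying assumption; a real system would presumably need a more robust geometry comparison.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SourceRecord:
    """Original metadata from one input dataset, kept unchanged."""
    dataset: str
    local_id: str  # only unique within its own dataset
    properties: dict


@dataclass
class Feature:
    """One deduplicated geometry with a globally unique ID."""
    boundary_id: str
    geometry_wkt: str
    sources: list = field(default_factory=list)


def ingest(records):
    """Deduplicate by geometry and attach every source record to its feature."""
    by_geometry = {}
    for dataset, local_id, wkt, props in records:
        feature = by_geometry.get(wkt)
        if feature is None:
            feature = Feature(str(uuid.uuid4()), wkt)
            by_geometry[wkt] = feature
        feature.sources.append(SourceRecord(dataset, local_id, props))
    return list(by_geometry.values())


square = "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"
other = "POLYGON ((5 5, 6 5, 6 6, 5 6, 5 5))"

# Hypothetical records: local ID "1" is reused across datasets, and the
# same geometry appears twice with conflicting dates.
features = ingest([
    ("dataset_a", "1", square, {"date": "2020-05-01"}),
    ("dataset_b", "1", square, {"date": "2022-03-15"}),
    ("dataset_b", "2", other, {"date": "2022-03-15"}),
])

for f in features:
    print(f.boundary_id, [(s.dataset, s.local_id) for s in f.sources])
```

The shared geometry yields a single feature with two linked source records, so both conflicting dates survive, and the reused local ID "1" never has to serve as a global key.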

This is why solving the deduplication problem required creating an entire product (i.e. Global FieldID) focused on that goal. It is also why we cannot make GFID fiboa-compliant, because the fiboa specification is not compatible
