
Adler32RollingChecksumV2 seems to give bad results #32

@Spavid04

Description

When the V2 rolling checksum algorithm is used, files that are identical or differ only very slightly produce huge deltas: the entire new file is written into the delta.

Environment

  • repo freshly cloned from the current master branch (commit d87ee31)
  • VS2022 17.2.6 on Windows 10 x64

I had to make a small code change so that the command-line app uses the V2 algorithm by default:

diff --git a/source/Octodiff/Core/SupportedAlgorithms.cs b/source/Octodiff/Core/SupportedAlgorithms.cs
index 2cc2aa5..5552f13 100644
--- a/source/Octodiff/Core/SupportedAlgorithms.cs
+++ b/source/Octodiff/Core/SupportedAlgorithms.cs
@@ -52,7 +52,7 @@ namespace Octodiff.Core
 
         public virtual IRollingChecksum Default()
         {
-            return Adler32Rolling();
+            return Adler32Rolling(true);
         }
 
         public virtual IRollingChecksum Create(string algorithm)

Steps to reproduce

  • take any binary file; my test used kernel32.dll from Windows\System32
  • create two copies of it: copy1.dll and copy2.dll
  • modify copy2.dll very slightly; I simply changed its first byte from 'M' to 'A'
  • run octodiff to create the deltas:
    • Octodiff.exe signature kernel32.dll signature.bin
    • Octodiff.exe delta signature.bin copy1.dll delta1.bin
    • Octodiff.exe delta signature.bin copy2.dll delta2.bin
  • observe that the delta files are not "delta-y" at all: each one contains essentially the whole new file
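The file-preparation part of these steps can be scripted. Below is a minimal C# sketch using top-level statements (.NET 6 or later assumed); `MakeCopies` is a hypothetical helper, not part of Octodiff, and the three `Octodiff.exe` commands above still have to be run afterwards.

```csharp
using System;
using System.IO;

// Hypothetical helper mirroring the reproduction steps: copy1 is an exact
// copy of the source file, copy2 differs from it only in its first byte.
void MakeCopies(string source, string copy1, string copy2)
{
    File.Copy(source, copy1, overwrite: true);

    byte[] bytes = File.ReadAllBytes(source);
    if (bytes.Length == 0)
        throw new InvalidOperationException("source file is empty");

    bytes[0] = (byte)'A'; // for a PE file such as kernel32.dll this was 'M'
    File.WriteAllBytes(copy2, bytes);
}

// Usage (assumption): pass the path of the test binary, e.g. kernel32.dll.
if (args.Length == 1)
    MakeCopies(args[0], "copy1.dll", "copy2.dll");
```

With the copies in place, the signature and delta commands can be run exactly as listed.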

Other notes

The V1 version of the algorithm does produce the expected small delta files.
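For anyone digging into this: a signature-based delta tool can only reuse a block from the original file when the checksum it computes over the new file equals the one stored in the signature for that block. That requires the signature side and the delta side to compute checksums identically, and requires that rolling the window forward by one byte gives exactly the value a from-scratch computation over the new window would give. The sketch below checks the rolling invariant for textbook Adler-32 (modulus 65521). It is a standalone C# illustration with hypothetical names (`Adler32`, `Roll`), not Octodiff's `Adler32RollingChecksumV2` code, so it shows what a consistent rotation looks like rather than where V2 goes wrong.

```csharp
using System;

const uint Mod = 65521; // largest prime below 2^16, as in standard Adler-32

// Textbook Adler-32 over data[offset .. offset + count), computed from scratch.
uint Adler32(byte[] data, int offset, int count)
{
    uint a = 1, b = 0;
    for (int i = offset; i < offset + count; i++)
    {
        a = (a + data[i]) % Mod;
        b = (b + a) % Mod;
    }
    return (b << 16) | a;
}

// Slide a window of `windowSize` bytes by one: drop `removed`, append `added`.
// Uses long arithmetic and re-normalises so intermediate negatives stay in [0, Mod).
uint Roll(uint checksum, byte removed, byte added, int windowSize)
{
    long a = checksum & 0xFFFF;
    long b = checksum >> 16;
    a = ((a - removed + added) % Mod + Mod) % Mod;
    b = ((b - (long)windowSize * removed + a - 1) % Mod + Mod) % Mod;
    return (uint)((b << 16) | a);
}

// Invariant check on pseudo-random data: every rolled value must equal
// the value recomputed from scratch over the same window.
var rng = new Random(32);
var data = new byte[4096];
rng.NextBytes(data);
const int window = 512;

uint rolling = Adler32(data, 0, window);
int mismatches = 0;
for (int start = 1; start + window <= data.Length; start++)
{
    rolling = Roll(rolling, data[start - 1], data[start + window - 1], window);
    if (rolling != Adler32(data, start, window)) mismatches++;
}
Console.WriteLine($"mismatches: {mismatches}");
```

Running it prints `mismatches: 0`. Pointing the same loop at another implementation's calculate and rotate functions is one way to check whether it satisfies the same invariant.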

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests