Conversation

alex commented Nov 27, 2025

These are from #194 -- all I did was convert them to a JSON schema.

alex force-pushed the mlkem-seed-json branch 2 times, most recently from 6213424 to 7e1b6be on November 27, 2025 at 15:41

alex commented Nov 27, 2025

(If it's useful, I can post the conversion script)

botovq commented Nov 28, 2025

Thanks! I haven't yet had the time to test and review this in detail. It looks like this adds more cases of multiple distinct test group types in the same file (see #191 and #192 (comment)), so the three testvectors_v1/mlkem_*_seed_test.json files may need to be split into three files each.

Also, have you considered if the mlkem_seed_test_schema.json can be merged into the mlkem_test_schema.json?
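For concreteness, the split along group types could look roughly like this (a sketch only, not what was actually run; the output and schema filenames follow the conversion script posted later in the thread):

import json

for size in ("512", "768", "1024"):
    with open(f"testvectors_v1/mlkem_{size}_seed_test.json") as f:
        data = json.load(f)
    # One output file per test group type, preserving top-level key order.
    for group in data["testGroups"]:
        op = {"MLKEMKeyGen": "keygen", "MLKEMEncaps": "encaps",
              "MLKEMDecaps": "decaps"}[group["type"]]
        out = dict(data, testGroups=[group],
                   numberOfTests=len(group["tests"]))
        out["schema"] = f"mlkem_{op}_seed_test_schema.json"
        with open(f"testvectors_v1/mlkem_{size}_{op}_seed_test.json", "w") as f:
            json.dump(out, f, indent=2)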

alex commented Nov 28, 2025

🙃 I went to make it consistent with the existing files and forgot that we wanted to split the existing ones apart.

alex commented Nov 29, 2025

Ok, split apart now.

cpu left a comment

Thanks for taking this on. Sorry about the delay in reviewing. I wanted to look at this one more carefully.

> (If it's useful, I can post the conversion script)

I think it'd be helpful to put that in the PR description in a folded details block, or linked as a gist, in case we need to revisit this in the future.
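For reference, a folded block in a PR description is plain HTML in the markdown; a minimal sketch (the summary text and placeholder are illustrative):

<details>
<summary>Conversion script</summary>

(paste the script here)

</details>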

alex commented Dec 16, 2025

Original script:

Details
#!/usr/bin/env python3
"""
Converts line-based ML-KEM test vectors to JSON format.
"""

import json
import os
import re


def parse_line_file(filepath):
    """Parse a line-based test vector file into a list of test dictionaries."""
    tests = []
    current_test = {}

    with open(filepath, 'r') as f:
        for line in f:
            line = line.strip()
            if not line:
                if current_test:
                    tests.append(current_test)
                    current_test = {}
                continue

            if ' = ' in line:
                key, value = line.split(' = ', 1)
                current_test[key] = value

    if current_test:
        tests.append(current_test)

    return tests


def extract_comment_text(comment):
    """Extract a cleaner comment from the raw comment field."""
    match = re.search(r'seeds (\d+)', comment)
    if match:
        return f"Test vector from seed {match.group(1)}"
    return comment


def convert_keygen(input_path, param_set, start_tc_id=1):
    """Convert keygen test vectors."""
    tests = parse_line_file(input_path)

    json_tests = []
    for i, test in enumerate(tests):
        json_test = {
            "tcId": start_tc_id + i,
            "comment": test.get("comment", ""),
            "seed": test["entropy"],
            "ek": test["expected_public_key"],
            "dk": test["expected_expanded_private_key"],
            "result": "valid"
        }
        json_tests.append(json_test)

    return {
        "type": "MLKEMKeyGen",
        "source": {
            "name": "FIPS 203",
            "version": "1.0"
        },
        "parameterSet": param_set,
        "tests": json_tests
    }


def convert_encaps(input_path, param_set, start_tc_id=1):
    """Convert encaps test vectors."""
    tests = parse_line_file(input_path)

    json_tests = []
    for i, test in enumerate(tests):
        result = test.get("expected_result", "pass")
        result = "valid" if result == "pass" else "invalid"

        json_test = {
            "tcId": start_tc_id + i,
            "comment": test.get("comment", ""),
            "ek": test["public_key"],
            "m": test["entropy"],
            "c": test.get("expected_ciphertext", ""),
            "K": test.get("expected_shared_secret", ""),
            "result": result
        }
        json_tests.append(json_test)

    return {
        "type": "MLKEMEncaps",
        "source": {
            "name": "FIPS 203",
            "version": "1.0"
        },
        "parameterSet": param_set,
        "tests": json_tests
    }


def convert_decaps(input_path, param_set, start_tc_id=1):
    """Convert decaps test vectors."""
    tests = parse_line_file(input_path)

    json_tests = []
    for i, test in enumerate(tests):
        result = test.get("expected_result", "pass")
        result = "valid" if result == "pass" else "invalid"

        json_test = {
            "tcId": start_tc_id + i,
            "comment": test.get("comment", ""),
            "seed": test["private_key"],
            "c": test["ciphertext"],
            "K": test.get("expected_shared_secret", ""),
            "result": result
        }
        json_tests.append(json_test)

    return {
        "type": "MLKEMDecaps",
        "source": {
            "name": "FIPS 203",
            "version": "1.0"
        },
        "parameterSet": param_set,
        "tests": json_tests
    }


def write_output(base_dir, filename, test_group, schema):
    """Write a single test group to a JSON file."""
    num_tests = len(test_group["tests"])
    output = {
        "algorithm": "ML-KEM",
        "schema": schema,
        "numberOfTests": num_tests,
        "testGroups": [test_group]
    }

    output_path = os.path.join(base_dir, filename)
    with open(output_path, 'w') as f:
        json.dump(output, f, indent=2)

    print(f"Wrote {output_path} with {num_tests} tests")


def main():
    base_dir = "testvectors_v1"

    param_sets = [
        ("512", "ML-KEM-512"),
        ("768", "ML-KEM-768"),
        ("1024", "ML-KEM-1024"),
    ]

    for size, param_set in param_sets:
        # Convert keygen (tcIds start at 1 for each file)
        keygen_path = os.path.join(base_dir, f"keygen{size}ml-kem")
        if os.path.exists(keygen_path):
            keygen_group = convert_keygen(keygen_path, param_set)
            write_output(base_dir, f"mlkem_{size}_keygen_seed_test.json", keygen_group,
                         "mlkem_keygen_seed_test_schema.json")

        # Convert encaps
        encaps_path = os.path.join(base_dir, f"encaps{size}ml-kem")
        if os.path.exists(encaps_path):
            encaps_group = convert_encaps(encaps_path, param_set)
            write_output(base_dir, f"mlkem_{size}_encaps_seed_test.json", encaps_group,
                         "mlkem_encaps_seed_test_schema.json")

        # Convert decaps
        decaps_path = os.path.join(base_dir, f"decaps{size}ml-kem")
        if os.path.exists(decaps_path):
            decaps_group = convert_decaps(decaps_path, param_set)
            write_output(base_dir, f"mlkem_{size}_decaps_seed_test.json", decaps_group,
                         "mlkem_decaps_seed_test_schema.json")

        print()


if __name__ == "__main__":
    main()
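
For anyone re-running the conversion, a quick cross-check is to validate each emitted file against the schema it declares. A minimal sketch, assuming the schema files live in a top-level schemas/ directory (the directory name is an assumption) and the third-party jsonschema package is installed:

import glob
import json

import jsonschema  # third-party: pip install jsonschema

for path in sorted(glob.glob("testvectors_v1/mlkem_*_seed_test.json")):
    with open(path) as f:
        vectors = json.load(f)
    # write_output() records the schema filename in the "schema" field.
    with open(f"schemas/{vectors['schema']}") as f:
        schema = json.load(f)
    jsonschema.validate(instance=vectors, schema=schema)
    print(f"{path}: OK ({vectors['numberOfTests']} tests)")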

alex commented Dec 16, 2025

(I'm not going to have time to run with this for another week, but if anyone wants to grab this and run with it, that's great. If not, I'll follow up in a week.)

alex and others added 4 commits on December 23, 2025 at 17:53
These are from C2SP#194 -- all I did was convert them to a JSON schema.
Remove duplicate mlkem_encaps_seed_test_schema.json and update the
seed-based encaps test vectors to use the existing schema. This aligns
the encaps test format with the existing schema conventions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Make ek optional in mlkem_test_schema.json so it can be used for both
the existing decaps tests (which include ek) and the new seed-based
decaps tests (which omit ek). Remove the duplicate
mlkem_decaps_seed_test_schema.json.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove the ", seed: ..." portion from test vector comments to avoid
confusion with the decapsulation key seed field.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

alex commented Dec 23, 2025

Ok, addressed all three review comments (each in their own commit for simplicity).

I did not roll these vectors into the existing files or attempt to compute ek -- this branch has been a purely mechanical translation of the original vectors, and I'm trying to avoid anything that might introduce errors.

cpu added 7 commits on December 30, 2025 at 12:03
Add a throw-away Go tool for generating the ek values for each of the
NIST mlkem decaps seed tests. Produced ek values are written to a text
file alongside the JSON data since the JSON munging is better handled by
Python. Test cases that can't produce an ek (e.g. due to an invalid seed
input length) have an empty line produced instead of the hex-encoded ek.

Notably, the Go stdlib doesn't include ML-KEM-512, so we use the Cloudflare
circl impl for that case. Since we have that dep anyway, we can
cross-check the 768/1024 ek values across the stdlib and circl impls as
an extra validation step.
Use a throw-away Python script to carefully insert the hex-encoded ek
values for each mlkem_*_decaps_seed_test.json test case (when the ek
value is available).

Also renumber the tcIDs by 1 to accommodate the goal of merging these
into the mlkem_*_test.json files that presently have 1 pre-existing test
case each.
This merges the test groups from the new mlkem_*_decaps_seed_test.json
vectors into the pre-existing mlkem_*_test.json files now that the new
test groups have their ek values populated as appropriate.

We use another throw-away Python script for this to maintain the overall
formatting/field order of the test data easily.
Now that the test groups have been merged, update the count of test
cases to properly reflect the total. This allows the vector linting to
pass.
Use a throw-away script to renumber the mlkem_*_encaps_seed_test.json
test cases based on the highest tcID in the pre-existing
mlkem_*_encaps_test.json test files, then merge the test group after the
initial test group.
Finally, we can remove the intermediate ek values and the separate
mlkem_*_decaps_seed_test.json and mlkem_*_encaps_seed_test.json files
since they're now incorporated into mlkem_*_test.json and
mlkem_*_encaps_test.json.
The tooling we added to update/merge the vectors is no longer needed.
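
Roughly, the JSON munging described in those commits amounts to the following (a sketch reconstructed from the commit messages, not the actual throw-away scripts; the ek text-file name is illustrative, and the real scripts were more careful about preserving field order):

import json

for size in ("512", "768", "1024"):
    seed_path = f"testvectors_v1/mlkem_{size}_decaps_seed_test.json"
    main_path = f"testvectors_v1/mlkem_{size}_test.json"

    with open(seed_path) as f:
        seed = json.load(f)
    with open(main_path) as f:
        main = json.load(f)

    # Splice in the Go-generated ek values; an empty line marks a test
    # case whose seed couldn't produce an ek.
    with open(f"mlkem_{size}_ek.txt") as f:  # illustrative filename
        eks = [line.rstrip("\n") for line in f]
    new_tests = seed["testGroups"][0]["tests"]
    for test, ek in zip(new_tests, eks):
        if ek:
            test["ek"] = ek  # lands at the end of the dict; the real
                             # scripts kept the schema's field order

    # Renumber the new tcIds to follow the highest pre-existing one.
    offset = max(t["tcId"] for g in main["testGroups"] for t in g["tests"])
    for test in new_tests:
        test["tcId"] += offset

    # Append the new test group and recompute the total so the vector
    # linting passes.
    main["testGroups"].extend(seed["testGroups"])
    main["numberOfTests"] = sum(len(g["tests"]) for g in main["testGroups"])

    with open(main_path, "w") as f:
        json.dump(main, f, indent=2)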

cpu commented Dec 30, 2025

> Ok, addressed all three review comments (each in their own commit for simplicity).

Thanks!

> I did not roll these vectors into the existing files or attempt to compute ek -- this branch has been a purely mechanical translation of the original vectors, and I'm trying to avoid anything that might introduce errors.

I understand the hesitance to introduce new errors, but I think we need more than a 1:1 mechanical translation here. I'd rather not land new vector files whose schema exactly matches the pre-existing vectors instead of merging them into those files.

I've tacked on some extra commits that do the work to generate ek where appropriate, renumber the new test cases, and merge the new test groups into the existing test files. I've tried to do this a bit more step-wise in distinct commits and included the throw-away scripts I used so that it's hopefully easier to verify there are no errors introduced. Once everyone is happy I think we should squash-merge this so that all the intermediate bits fall away.

cpu requested a review from sgmenda on December 30, 2025 at 18:10

alex commented Dec 30, 2025

Thank you!

botovq commented Jan 4, 2026

It would be nice if someone could replace unusally -> unusually (ten times) in testvectors_v1/mlkem_768_encaps_test.json (GitHub suggestions seem not to work here, perhaps due to the excessively long lines).
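
A one-off script is enough for this; a minimal sketch, run from the repo root:

# One-off fix for the typo noted above.
path = "testvectors_v1/mlkem_768_encaps_test.json"
with open(path) as f:
    data = f.read()
assert data.count("unusally") == 10  # the ten occurrences reported above
with open(path, "w") as f:
    f.write(data.replace("unusally", "unusually"))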

alex commented Jan 4, 2026

@cpu do you want to do that or would you like me to?
