Conversation

alex commented Nov 27, 2025

These are from #194 -- all I did was convert them to a JSON schema.

alex force-pushed the mlkem-seed-json branch 2 times, most recently from 6213424 to 7e1b6be on November 27, 2025 at 15:41

alex commented Nov 27, 2025

(If it's useful, I can post the conversion script)

botovq commented Nov 28, 2025

Thanks! I haven't yet had the time to test and review this in detail. It looks like this adds more cases of multiple distinct test group types in the same file (see #191 and #192 (comment)), so the three testvectors_v1/mlkem_*_seed_test.json files may need to be split into three files each.

Also, have you considered if the mlkem_seed_test_schema.json can be merged into the mlkem_test_schema.json?
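For concreteness, the split along group types could look roughly like this (a sketch only, not what was actually run; the output and schema filenames follow the conversion script posted later in the thread):

import json

for size in ("512", "768", "1024"):
    with open(f"testvectors_v1/mlkem_{size}_seed_test.json") as f:
        data = json.load(f)
    # One output file per test group type, preserving top-level key order.
    for group in data["testGroups"]:
        op = {"MLKEMKeyGen": "keygen", "MLKEMEncaps": "encaps",
              "MLKEMDecaps": "decaps"}[group["type"]]
        out = dict(data, testGroups=[group],
                   numberOfTests=len(group["tests"]))
        out["schema"] = f"mlkem_{op}_seed_test_schema.json"
        with open(f"testvectors_v1/mlkem_{size}_{op}_seed_test.json", "w") as f:
            json.dump(out, f, indent=2)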

alex commented Nov 28, 2025

🙃 I went to make it consistent with the existing files and forgot that we wanted to split the existing ones apart.

alex commented Nov 29, 2025

Ok, split apart now.

cpu left a comment

Thanks for taking this on. Sorry about the delay in reviewing. I wanted to look at this one more carefully.

> (If it's useful, I can post the conversion script)

I think it'd be helpful to put that in the PR description in a folded details block, or linked as a gist, in case we need to revisit this in the future.
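For reference, a folded block in a PR description is plain HTML in the markdown; a minimal sketch (the summary text and placeholder are illustrative):

<details>
<summary>Conversion script</summary>

(paste the script here)

</details>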

alex commented Dec 16, 2025

Original script:

Details
#!/usr/bin/env python3
"""
Converts line-based ML-KEM test vectors to JSON format.
"""

import json
import os
import re


def parse_line_file(filepath):
    """Parse a line-based test vector file into a list of test dictionaries."""
    tests = []
    current_test = {}

    with open(filepath, 'r') as f:
        for line in f:
            line = line.strip()
            if not line:
                if current_test:
                    tests.append(current_test)
                    current_test = {}
                continue

            if ' = ' in line:
                key, value = line.split(' = ', 1)
                current_test[key] = value

    if current_test:
        tests.append(current_test)

    return tests


def extract_comment_text(comment):
    """Extract a cleaner comment from the raw comment field."""
    match = re.search(r'seeds (\d+)', comment)
    if match:
        return f"Test vector from seed {match.group(1)}"
    return comment


def convert_keygen(input_path, param_set, start_tc_id=1):
    """Convert keygen test vectors."""
    tests = parse_line_file(input_path)

    json_tests = []
    for i, test in enumerate(tests):
        json_test = {
            "tcId": start_tc_id + i,
            "comment": test.get("comment", ""),
            "seed": test["entropy"],
            "ek": test["expected_public_key"],
            "dk": test["expected_expanded_private_key"],
            "result": "valid"
        }
        json_tests.append(json_test)

    return {
        "type": "MLKEMKeyGen",
        "source": {
            "name": "FIPS 203",
            "version": "1.0"
        },
        "parameterSet": param_set,
        "tests": json_tests
    }


def convert_encaps(input_path, param_set, start_tc_id=1):
    """Convert encaps test vectors."""
    tests = parse_line_file(input_path)

    json_tests = []
    for i, test in enumerate(tests):
        result = test.get("expected_result", "pass")
        result = "valid" if result == "pass" else "invalid"

        json_test = {
            "tcId": start_tc_id + i,
            "comment": test.get("comment", ""),
            "ek": test["public_key"],
            "m": test["entropy"],
            "c": test.get("expected_ciphertext", ""),
            "K": test.get("expected_shared_secret", ""),
            "result": result
        }
        json_tests.append(json_test)

    return {
        "type": "MLKEMEncaps",
        "source": {
            "name": "FIPS 203",
            "version": "1.0"
        },
        "parameterSet": param_set,
        "tests": json_tests
    }


def convert_decaps(input_path, param_set, start_tc_id=1):
    """Convert decaps test vectors."""
    tests = parse_line_file(input_path)

    json_tests = []
    for i, test in enumerate(tests):
        result = test.get("expected_result", "pass")
        result = "valid" if result == "pass" else "invalid"

        json_test = {
            "tcId": start_tc_id + i,
            "comment": test.get("comment", ""),
            "seed": test["private_key"],
            "c": test["ciphertext"],
            "K": test.get("expected_shared_secret", ""),
            "result": result
        }
        json_tests.append(json_test)

    return {
        "type": "MLKEMDecaps",
        "source": {
            "name": "FIPS 203",
            "version": "1.0"
        },
        "parameterSet": param_set,
        "tests": json_tests
    }


def write_output(base_dir, filename, test_group, schema):
    """Write a single test group to a JSON file."""
    num_tests = len(test_group["tests"])
    output = {
        "algorithm": "ML-KEM",
        "schema": schema,
        "numberOfTests": num_tests,
        "testGroups": [test_group]
    }

    output_path = os.path.join(base_dir, filename)
    with open(output_path, 'w') as f:
        json.dump(output, f, indent=2)

    print(f"Wrote {output_path} with {num_tests} tests")


def main():
    base_dir = "testvectors_v1"

    param_sets = [
        ("512", "ML-KEM-512"),
        ("768", "ML-KEM-768"),
        ("1024", "ML-KEM-1024"),
    ]

    for size, param_set in param_sets:
        # Convert keygen (tcIds start at 1 for each file)
        keygen_path = os.path.join(base_dir, f"keygen{size}ml-kem")
        if os.path.exists(keygen_path):
            keygen_group = convert_keygen(keygen_path, param_set)
            write_output(base_dir, f"mlkem_{size}_keygen_seed_test.json", keygen_group,
                         "mlkem_keygen_seed_test_schema.json")

        # Convert encaps
        encaps_path = os.path.join(base_dir, f"encaps{size}ml-kem")
        if os.path.exists(encaps_path):
            encaps_group = convert_encaps(encaps_path, param_set)
            write_output(base_dir, f"mlkem_{size}_encaps_seed_test.json", encaps_group,
                         "mlkem_encaps_seed_test_schema.json")

        # Convert decaps
        decaps_path = os.path.join(base_dir, f"decaps{size}ml-kem")
        if os.path.exists(decaps_path):
            decaps_group = convert_decaps(decaps_path, param_set)
            write_output(base_dir, f"mlkem_{size}_decaps_seed_test.json", decaps_group,
                         "mlkem_decaps_seed_test_schema.json")

        print()


if __name__ == "__main__":
    main()
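
For anyone re-running the conversion, a quick cross-check is to validate each emitted file against the schema it declares. A minimal sketch, assuming the schema files live in a top-level schemas/ directory (the directory name is an assumption) and the third-party jsonschema package is installed:

import glob
import json

import jsonschema  # third-party: pip install jsonschema

for path in sorted(glob.glob("testvectors_v1/mlkem_*_seed_test.json")):
    with open(path) as f:
        vectors = json.load(f)
    # write_output() records the schema filename in the "schema" field.
    with open(f"schemas/{vectors['schema']}") as f:
        schema = json.load(f)
    jsonschema.validate(instance=vectors, schema=schema)
    print(f"{path}: OK ({vectors['numberOfTests']} tests)")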

alex commented Dec 16, 2025

(I'm not going to have time to run with this for another week, but if anyone wants to grab this and run with it, that's great. If not, I'll follow up in a week.)

alex and others added 4 commits on December 23, 2025 at 17:53
These are from C2SP#194 -- all I did was convert them to a JSON schema.
Remove duplicate mlkem_encaps_seed_test_schema.json and update the
seed-based encaps test vectors to use the existing schema. This aligns
the encaps test format with the existing schema conventions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Make ek optional in mlkem_test_schema.json so it can be used for both
the existing decaps tests (which include ek) and the new seed-based
decaps tests (which omit ek). Remove the duplicate
mlkem_decaps_seed_test_schema.json.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove the ", seed: ..." portion from test vector comments to avoid
confusion with the decapsulation key seed field.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

alex commented Dec 23, 2025

Ok, addressed all three review comments (each in their own commit for simplicity).

I did not roll these vectors into the existing files or attempt to compute ek -- this branch has been a purely mechanical translation of the original vectors, and I'm trying to avoid anything that might introduce errors.

cpu added 7 commits on December 30, 2025 at 12:03
Add a throw-away Go tool for generating the ek values for each of the
NIST mlkem decaps seed tests. Produced ek values are written to a text
file alongside the JSON data since the JSON munging is better handled by
Python. Test cases that can't produce an ek (e.g. due to an invalid seed
input length) have an empty line produced instead of the hex-encoded ek.

Notably, the Go stdlib doesn't include ML-KEM-512, so we use the Cloudflare
circl impl for that case. Since we have that dep anyway, we can
cross-check the 768/1024 ek values across the stdlib and circl impls as
an extra validation step.
Use a throw-away Python script to carefully insert the hex-encoded ek
values for each mlkem_*_decaps_seed_test.json test case (when the ek
value is available).

Also renumber the tcIDs by 1 to accommodate the goal of merging these
into the mlkem_*_test.json files that presently have 1 pre-existing test
case each.
This merges the test groups from the new mlkem_*_decaps_seed_test.json
vectors into the pre-existing mlkem_*_test.json files now that the new
test groups have their ek values populated as appropriate.

We use another throw-away Python script for this to maintain the overall
formatting/field order of the test data easily.
Now that the test groups have been merged, update the count of test
cases to properly reflect the total. This allows the vector linting to
pass.
Use a throw-away script to renumber the mlkem_*_encaps_seed_test.json
test cases based on the highest tcID in the pre-existing
mlkem_*_encaps_test.json test files, then merge the test group after the
initial test group.
Finally, we can remove the intermediate ek values and the separate
mlkem_*_decaps_seed_test.json and mlkem_*_encaps_seed_test.json files
since they're now incorporated into mlkem_*_test.json and
mlkem_*_encaps_test.json.
The tooling we added to update/merge the vectors is no longer needed.
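
Roughly, the JSON munging described in those commits amounts to the following (a sketch reconstructed from the commit messages, not the actual throw-away scripts; the ek text-file name is illustrative, and the real scripts were more careful about preserving field order):

import json

for size in ("512", "768", "1024"):
    seed_path = f"testvectors_v1/mlkem_{size}_decaps_seed_test.json"
    main_path = f"testvectors_v1/mlkem_{size}_test.json"

    with open(seed_path) as f:
        seed = json.load(f)
    with open(main_path) as f:
        main = json.load(f)

    # Splice in the Go-generated ek values; an empty line marks a test
    # case whose seed couldn't produce an ek.
    with open(f"mlkem_{size}_ek.txt") as f:  # illustrative filename
        eks = [line.rstrip("\n") for line in f]
    new_tests = seed["testGroups"][0]["tests"]
    for test, ek in zip(new_tests, eks):
        if ek:
            test["ek"] = ek  # lands at the end of the dict; the real
                             # scripts kept the schema's field order

    # Renumber the new tcIds to follow the highest pre-existing one.
    offset = max(t["tcId"] for g in main["testGroups"] for t in g["tests"])
    for test in new_tests:
        test["tcId"] += offset

    # Append the new test group and recompute the total so the vector
    # linting passes.
    main["testGroups"].extend(seed["testGroups"])
    main["numberOfTests"] = sum(len(g["tests"]) for g in main["testGroups"])

    with open(main_path, "w") as f:
        json.dump(main, f, indent=2)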

cpu commented Dec 30, 2025

> Ok, addressed all three review comments (each in their own commit for simplicity).

Thanks!

> I did not roll these vectors into the existing files or attempt to compute ek -- this branch has been a purely mechanical translation of the original vectors, and I'm trying to avoid anything that might introduce errors.

I understand the hesitance to introduce new errors, but I think we need more than a 1:1 mechanical translation here. I'd rather not land new vector files whose schema exactly matches the pre-existing vectors instead of merging them into those files.

I've tacked on some extra commits that do the work to generate ek where appropriate, renumber the new test cases, and merge the new test groups into the existing test files. I've tried to do this a bit more step-wise in distinct commits and included the throw-away scripts I used so that it's hopefully easier to verify there are no errors introduced. Once everyone is happy I think we should squash-merge this so that all the intermediate bits fall away.

cpu requested a review from sgmenda on December 30, 2025 at 18:10

alex commented Dec 30, 2025

Thank you!

botovq commented Jan 4, 2026

It would be nice if someone could replace unusally -> unusually (ten times) in testvectors_v1/mlkem_768_encaps_test.json (GitHub suggestions seem not to work here, perhaps due to the excessively long lines).
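
A one-off script is enough for this; a minimal sketch, run from the repo root:

# One-off fix for the typo noted above.
path = "testvectors_v1/mlkem_768_encaps_test.json"
with open(path) as f:
    data = f.read()
assert data.count("unusally") == 10  # the ten occurrences reported above
with open(path, "w") as f:
    f.write(data.replace("unusally", "unusually"))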

alex commented Jan 4, 2026

@cpu do you want to do that or would you like me to?
