Added seed based MLKEM test vectors #197
Conversation
(force-pushed from 6213424 to 7e1b6be)
(If it's useful, I can post the conversion script)
Thanks! I haven't yet had the time to test and review this in detail. It looks like this adds more cases of multiple distinct test group types in the same file (see #191 and #192 (comment)), so the three … Also, have you considered if the …
🙃 I went to make it consistent with the existing files and forgot that we wanted to split the existing ones apart.
(force-pushed from 7e1b6be to c593b4a)
Ok, split apart now.
(force-pushed from c593b4a to ec90c7e)
cpu left a comment
Thanks for taking this on. Sorry about the delay in reviewing. I wanted to look at this one more carefully.
(If it's useful, I can post the conversion script)
I think it'd be helpful to put that in the PR description in a folded details block, or linked as a gist, in case we need to revisit this in the future.
Original script:

<details>
<summary>Details</summary>

```python
#!/usr/bin/env python3
"""
Converts line-based ML-KEM test vectors to JSON format.
"""
import json
import os
import re


def parse_line_file(filepath):
    """Parse a line-based test vector file into a list of test dictionaries."""
    tests = []
    current_test = {}
    with open(filepath, 'r') as f:
        for line in f:
            line = line.strip()
            if not line:
                if current_test:
                    tests.append(current_test)
                    current_test = {}
                continue
            if ' = ' in line:
                key, value = line.split(' = ', 1)
                current_test[key] = value
    if current_test:
        tests.append(current_test)
    return tests


def extract_comment_text(comment):
    """Extract a cleaner comment from the raw comment field."""
    match = re.search(r'seeds (\d+)', comment)
    if match:
        return f"Test vector from seed {match.group(1)}"
    return comment


def convert_keygen(input_path, param_set, start_tc_id=1):
    """Convert keygen test vectors."""
    tests = parse_line_file(input_path)
    json_tests = []
    for i, test in enumerate(tests):
        json_test = {
            "tcId": start_tc_id + i,
            "comment": test.get("comment", ""),
            "seed": test["entropy"],
            "ek": test["expected_public_key"],
            "dk": test["expected_expanded_private_key"],
            "result": "valid"
        }
        json_tests.append(json_test)
    return {
        "type": "MLKEMKeyGen",
        "source": {
            "name": "FIPS 203",
            "version": "1.0"
        },
        "parameterSet": param_set,
        "tests": json_tests
    }


def convert_encaps(input_path, param_set, start_tc_id=1):
    """Convert encaps test vectors."""
    tests = parse_line_file(input_path)
    json_tests = []
    for i, test in enumerate(tests):
        result = test.get("expected_result", "pass")
        result = "valid" if result == "pass" else "invalid"
        json_test = {
            "tcId": start_tc_id + i,
            "comment": test.get("comment", ""),
            "ek": test["public_key"],
            "m": test["entropy"],
            "c": test.get("expected_ciphertext", ""),
            "K": test.get("expected_shared_secret", ""),
            "result": result
        }
        json_tests.append(json_test)
    return {
        "type": "MLKEMEncaps",
        "source": {
            "name": "FIPS 203",
            "version": "1.0"
        },
        "parameterSet": param_set,
        "tests": json_tests
    }


def convert_decaps(input_path, param_set, start_tc_id=1):
    """Convert decaps test vectors."""
    tests = parse_line_file(input_path)
    json_tests = []
    for i, test in enumerate(tests):
        result = test.get("expected_result", "pass")
        result = "valid" if result == "pass" else "invalid"
        json_test = {
            "tcId": start_tc_id + i,
            "comment": test.get("comment", ""),
            "seed": test["private_key"],
            "c": test["ciphertext"],
            "K": test.get("expected_shared_secret", ""),
            "result": result
        }
        json_tests.append(json_test)
    return {
        "type": "MLKEMDecaps",
        "source": {
            "name": "FIPS 203",
            "version": "1.0"
        },
        "parameterSet": param_set,
        "tests": json_tests
    }


def write_output(base_dir, filename, test_group, schema):
    """Write a single test group to a JSON file."""
    num_tests = len(test_group["tests"])
    output = {
        "algorithm": "ML-KEM",
        "schema": schema,
        "numberOfTests": num_tests,
        "testGroups": [test_group]
    }
    output_path = os.path.join(base_dir, filename)
    with open(output_path, 'w') as f:
        json.dump(output, f, indent=2)
    print(f"Wrote {output_path} with {num_tests} tests")


def main():
    base_dir = "testvectors_v1"
    param_sets = [
        ("512", "ML-KEM-512"),
        ("768", "ML-KEM-768"),
        ("1024", "ML-KEM-1024"),
    ]
    for size, param_set in param_sets:
        # Convert keygen (tcIds start at 1 for each file)
        keygen_path = os.path.join(base_dir, f"keygen{size}ml-kem")
        if os.path.exists(keygen_path):
            keygen_group = convert_keygen(keygen_path, param_set)
            write_output(base_dir, f"mlkem_{size}_keygen_seed_test.json", keygen_group,
                         "mlkem_keygen_seed_test_schema.json")
        # Convert encaps
        encaps_path = os.path.join(base_dir, f"encaps{size}ml-kem")
        if os.path.exists(encaps_path):
            encaps_group = convert_encaps(encaps_path, param_set)
            write_output(base_dir, f"mlkem_{size}_encaps_seed_test.json", encaps_group,
                         "mlkem_encaps_seed_test_schema.json")
        # Convert decaps
        decaps_path = os.path.join(base_dir, f"decaps{size}ml-kem")
        if os.path.exists(decaps_path):
            decaps_group = convert_decaps(decaps_path, param_set)
            write_output(base_dir, f"mlkem_{size}_decaps_seed_test.json", decaps_group,
                         "mlkem_decaps_seed_test_schema.json")
        print()


if __name__ == "__main__":
    main()
```

</details>
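For context, the line-based input format the script consumes is blank-line-separated `key = value` records. A minimal sketch of the parsing step on in-memory data (hex values here are shortened placeholders, not real vectors):

```python
import io

def parse_lines(f):
    """Parse 'key = value' records separated by blank lines (mirrors parse_line_file)."""
    tests, current = [], {}
    for line in f:
        line = line.strip()
        if not line:
            if current:
                tests.append(current)
                current = {}
            continue
        if ' = ' in line:
            key, value = line.split(' = ', 1)
            current[key] = value
    if current:
        tests.append(current)
    return tests

# Two records, blank-line separated; values are shortened placeholders.
sample = """\
comment = seeds 0
entropy = 00ff

comment = seeds 1
entropy = a1b2
"""
tests = parse_lines(io.StringIO(sample))
assert len(tests) == 2
assert tests[1]["entropy"] == "a1b2"
```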
(I'm not going to have time to run with this for another week, but if anyone wants to grab this and run with it, that's great. If not, I'll follow up in a week.)
(force-pushed from ec90c7e to 565332a)
These are from C2SP#194 -- all I did was convert them to a JSON schema.
Remove duplicate mlkem_encaps_seed_test_schema.json and update the seed-based encaps test vectors to use the existing schema. This aligns the encaps test format with the existing schema conventions. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Make ek optional in mlkem_test_schema.json so it can be used for both the existing decaps tests (which include ek) and the new seed-based decaps tests (which omit ek). Remove the duplicate mlkem_decaps_seed_test_schema.json. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove the ", seed: ..." portion from test vector comments to avoid confusion with the decapsulation key seed field. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
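That comment cleanup could look roughly like this (a sketch; the exact `, seed: ...` suffix format and the function name are assumptions):

```python
import re

def strip_seed_suffix(comment):
    # Drop a trailing ", seed: <hex>" clause from a test comment so the hex
    # isn't confused with the decapsulation key "seed" field. (Hypothetical format.)
    return re.sub(r',\s*seed:\s*[0-9a-fA-F]+', '', comment)

assert strip_seed_suffix("modified ciphertext, seed: 4a5b") == "modified ciphertext"
assert strip_seed_suffix("no suffix here") == "no suffix here"
```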
Ok, addressed all three review comments (each in their own commit for simplicity). I did not roll these vectors into the existing files, or attempt to compute …
(force-pushed from 565332a to d28ae0b)
Add a throw-away Go tool for generating the ek values for each of the NIST ML-KEM decaps seed tests. Produced ek values are written to a text file alongside the JSON data, since the JSON munging is better handled by Python. Test cases that can't produce an ek (e.g. due to an invalid seed input length) have an empty line produced instead of the hex-encoded ek. Notably the Go stdlib doesn't include ML-KEM-512, so we use the Cloudflare circl impl for that case. Since we have that dep anyway, we can cross-check the 768/1024 ek values across the stdlib and circl impls as an extra validation step.
Use a throw-away Python script to carefully insert the hex-encoded ek values for each mlkem_*_decaps_seed_test.json test case (when the ek value is available). Also renumber the tcIds by 1 to accommodate the goal of merging these into the mlkem_*_test.json files that presently have 1 pre-existing test case each.
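A minimal sketch of that insertion/renumbering pass, assuming one ek line per test case in file order (a blank line meaning no ek, as described above; the function name is hypothetical):

```python
def insert_ek_values(doc, ek_lines, offset=1):
    """Insert hex ek values (blank line = no ek) and shift each tcId by `offset`."""
    it = iter(ek_lines)
    for group in doc["testGroups"]:
        for tc in group["tests"]:
            ek = next(it).strip()
            if ek:
                tc["ek"] = ek
            tc["tcId"] += offset
    return doc

# Tiny illustrative document; real vectors carry many more fields.
doc = {"testGroups": [{"tests": [{"tcId": 1}, {"tcId": 2}]}]}
insert_ek_values(doc, ["aabb", ""])
assert doc["testGroups"][0]["tests"][0] == {"tcId": 2, "ek": "aabb"}
assert doc["testGroups"][0]["tests"][1] == {"tcId": 3}
```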
This merges the test groups from the new mlkem_*_decaps_seed_test.json vectors into the pre-existing mlkem_*_test.json files now that the new test groups have their ek values populated as appropriate. We use another throw-away Python script for this to easily maintain the overall formatting/field order of the test data.
Now that the test groups have been merged, update the count of test cases to properly reflect the total. This allows the vector linting to pass.
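The recount itself is a one-liner over the merged groups; a sketch:

```python
def update_number_of_tests(doc):
    # numberOfTests must equal the total across all test groups for the linting to pass.
    doc["numberOfTests"] = sum(len(g["tests"]) for g in doc["testGroups"])
    return doc

doc = {"numberOfTests": 1,
       "testGroups": [{"tests": [{"tcId": 1}]},
                      {"tests": [{"tcId": 2}, {"tcId": 3}]}]}
update_number_of_tests(doc)
assert doc["numberOfTests"] == 3
```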
Use a throw-away script to renumber the mlkem_*_encaps_seed_test.json test cases based on the highest tcID in the pre-existing mlkem_*_encaps_test.json test files, then merge the test group after the initial test group.
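That renumber-then-merge step might be sketched as follows (field names are from the vector files; the function name is hypothetical):

```python
def renumber_and_append(existing, seed_doc):
    # Continue tcIds from the highest id in the existing file, then append the new group(s).
    next_id = 1 + max(tc["tcId"] for g in existing["testGroups"] for tc in g["tests"])
    for group in seed_doc["testGroups"]:
        for tc in group["tests"]:
            tc["tcId"] = next_id
            next_id += 1
        existing["testGroups"].append(group)
    return existing

existing = {"testGroups": [{"tests": [{"tcId": 1}, {"tcId": 2}]}]}
seed = {"testGroups": [{"tests": [{"tcId": 1}]}]}
renumber_and_append(existing, seed)
assert existing["testGroups"][1]["tests"][0]["tcId"] == 3
```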
Finally, we can remove the intermediate ek values and the separate mlkem_*_decaps_seed_test.json and mlkem_encaps_*_seed_test.json files since they're now incorporated into mlkem_*_test.json and mlkem_*_encaps_test.json.
The tooling we added to update/merge the vectors is no longer needed.
Thanks!
I understand the hesitance to introduce new errors, but I think we need more than a 1:1 mechanical translation here. I'd prefer not to land new vector files that are an exact schema match to pre-existing vectors rather than merging them into those files. I've tacked on some extra commits that do the work to generate …
Thank you!
It would be nice if someone could replace …
@cpu do you want to do that or would you like me to? |