Skip to content

Conversation

@MikaelMayer
Copy link
Contributor

@MikaelMayer MikaelMayer commented Nov 20, 2025

Add B3 Language Support as Strata Backend Dialect

This PR adds complete support for the B3 verification language as a new Strata backend dialect.

The implementation successfully parses the entire B3 motivating example as-is.

DDM Definition

Three components:

  1. Concrete Syntax Tree (ParseCST.lean) - Defines B3's concrete syntax using DDM

    • Matches B3 language syntax including dots in identifiers (e.g., Map.select)
    • Pipe-delimited identifiers for special cases (e.g., |@0|)
  2. Abstract Syntax Tree (DefinitionAST.lean) - Defines B3's abstract syntax for Strata Core

    • De Bruijn indices for variable binding
    • Metadata unwrapping with @[unwrap] to simplify terminal nodes
    • Customizable metadata mapping
  3. Bidirectional Conversion (Conversion.lean) - AST ↔ CST conversion

    • Error tracking for unresolved identifiers, out-of-bounds references, and unsupported variable patterns

Technical Highlights

  • Inout parameters: Modeled as two values in the context (old and current), enabling old x syntax in procedure bodies and postconditions to reference pre-state values
  • Shadowed variables: Proper handling with disambiguation and detection of unsupported middle occurrences

Testing

Tests split across four categories: expressions, statements, declarations, and programs.

Round-trip tests display a formatted and minimal version of the AST in the conversion for better understanding.

Future Work

  • Verify B3 text files by invoking the B3 tool from Strata
  • Convert Strata Core to B3 AST for backend verification

@MikaelMayer MikaelMayer requested a review from a team as a code owner November 20, 2025 19:52
@MikaelMayer MikaelMayer marked this pull request as draft November 20, 2025 19:54
@atomb
Copy link
Contributor

atomb commented Nov 20, 2025

Why have Expression and Stmt types in here instead of using Lambda.LExpr and Imperative.Stmt? The DDM definition for B3 will allow capturing the exact syntax of B3, but I don't think we need to define it again beyond that.

@MikaelMayer
Copy link
Contributor Author

Why have Expression and Stmt types in here instead of using Lambda.LExpr and Imperative.Stmt? The DDM definition for B3 will allow capturing the exact syntax of B3, but I don't think we need to define it again beyond that.

If DDM is sufficient for my needs, then yes I'll use it. This is still a work in progress. You commented on this PR just before I enabled the draft mode.

@atomb
Copy link
Contributor

atomb commented Nov 20, 2025

If DDM is sufficient for my needs, then yes I'll use it. This is still a work in progress. You commented on this PR just before I enabled the draft mode.

If the DDM isn't sufficient, we should improve it until it is.

@MikaelMayer MikaelMayer marked this pull request as ready for review December 2, 2025 21:39
@atomb
Copy link
Contributor

atomb commented Dec 2, 2025

My main reaction to it is the same one as my original comment on the PR. It totally makes sense to have the B3 CST, but I think the B3 AST just duplicates what's in LExpr, Stmt, and Cmd. On the expression side, labeled expressions and let expressions aren't currently in LExpr, but labels to me seem like they should be metadata and let expressions could be either added to LExpr or encoded. I'm undecided which makes more sense.

On the command side, B3 includes check and reach, but we've already decided we'd like to add those to the Cmd type one way or another. On the statement side, B3 includes exit, but we've also already decided to add that to Stmt. The current goto is a stopgap that'll be unnecessary once we have an unstructured representation alongside the structured one. There's a draft PR for that already: #202. Finally, it includes ifcase and choose. We can encode those using other control flow, and I'd argue that that's the right way to do it. Having a non-deterministic choose is useful, but I think it makes sense to represent programs either fully deterministically, like in Stmt, or fully non-deterministically, like in NondetStmt, and to be able to go back and forth. Any given analysis works better if either everything's in one form or everything's in the other form.

Finally, procedure declarations include inout parameters. The procedure notions in Boogie, which should move to Imperative, don't yet include that, but should.

@MikaelMayer MikaelMayer force-pushed the import-b3-language branch 2 times, most recently from 4bbff2b to 8c7a4b6 Compare December 17, 2025 16:22
This commit adds the B3 language implementation with full DDM integration.

Changes:
- B3 language AST definitions and conversions
- DDM-based parsing and formatting for B3
- Applied @[unwrap] to literal operations (intLit, boolLit, stringLit) and id operation
- Fixed mapMetadata functions to handle unwrapped parameters
- Comprehensive test suite for B3 DDM formatting
- Removed Strata/Languages/B3/B3.lean (empty documentation file with no code)
- Investigated old() support for inout parameters:
  - B3 has old_id operation with syntax: old name (not old(name))
  - Test expectations show unwrap is working correctly (indices are plain Nat)
  - Further investigation needed for proper old() syntax in specs and body
DDM is used for both CST and AST representations, so renamed:
- FromDDMContext -> FromCSTContext
- expressionFromDDM -> expressionFromCST
- statementFromDDM -> statementFromCST
- declFromDDM -> declFromCST
- programFromDDM -> programFromCST

This makes it clear these functions convert from CST to AST.
- Inout parameters now create two context entries (old and current values)
- ToCSTContext.lookup now returns (name, isOld) tuple
- isOld is computed dynamically by checking for shadowing
- When isOld is true, emit old_id CST node instead of id
- Added test for inout parameter with old() in spec and body
- This enables proper roundtrip for procedures using old values of inout parameters
- Inout parameters now create two context entries (old and current values)
- ToCSTContext.lookup returns (name, isOld) computed dynamically
- isOld determined by checking if variable is first occurrence with shadowing
- When isOld is true, emit old_id CST node which prints as 'old x'
- Added test demonstrating old() in both spec and body for inout parameters
- Enables proper roundtrip for procedures using old values
The isOld flag is now computed directly in lookup, so isShadowed is not needed.
The inProcedure field was never checked, so removed it along with enterProcedure function.
- Moved Strata/Languages/B3/DDMConversion.lean to DDMTransform/Conversion.lean
- Removed entire AnnotationPreserving section with SR-suffixed functions
- The non-SR functions with B3AnnFromCST M instance already preserve annotations
- Eliminated ~500 lines of duplicated code
- Updated all imports in test files
- Removed all SR-suffixed functions (~500 lines)
- Added missing ToCST helper functions (fParameterToCST, pParameterToCST, specToCST)
- Updated tests to call B3.expressionFromCST/expressionToCST directly
- Fixed old_id syntax to use name:0 to avoid parentheses
- All conversions now preserve annotations using B3AnnFromCST M instance
- Add @[unwrap] to terminal parameters in ParseCST.lean (Num, Str, Ident)
  to avoid unnecessary metadata wrapping, matching DefinitionAST pattern
- Update Conversion.lean ToCST/FromCST to handle unwrapped terminals
- Fix autoinv context: now uses ctx' (includes declared variable) instead of ctx
- Update test expectations to reflect unwrapped terminal signatures
- Delete unused files: Identifiers.lean and Examples/DDMTransform.lean
- Remove @[unwrap] from operators with multiple RHS elements or prefixes
  to preserve position information for sub-components
- Keep @[unwrap] only for: natLit, strLit, id (single terminal on RHS)
- Revert Conversion.lean changes for operators that should remain wrapped
- Delete unused README.md from DDMTransform directory
- Update test expectations to match corrected signatures
- Remove annForId and annForIdValue from B3AnnFromCST typeclass
- Use ann directly for .id case instead of calling annForId
- B3AnnFromCST should only be used to extract multiple metadata from one,
  or combine multiple into one, not for 1:1 passthrough cases
- Update typeclass documentation to clarify purpose
…ions

- Document exact CST → AST metadata count for each method
- Show which metadata are being extracted (1→2, 1→3, 1→5)
- Make it clear why each method exists (AST needs more metadata than CST has)
- Example: 'CST: .natLit ann n (1 metadata) → AST: .literal m (.intLit m2 n) (2 metadata)'
- Replace verbose CST→AST examples with short explanations
- Focus on why: 'AST needs one extra metadata for X'
- Group related methods with section comments
- Much easier to read and understand at a glance
github-merge-queue bot pushed a commit that referenced this pull request Dec 18, 2025
Adds a document describing the semantics of Strata Core (consisting of
`Lambda` and `Imperative` components). The document is written in Verso
and imports the Strata library to allow docstrings to appear directly in
the text.

Note that Strata Core is not a new dialect, but rather a new name for
the combination of `Lambda` and `Imperative`. It does not yet have a
concrete syntax. Concrete syntax will likely be provided through an
evolution of #224, to assist in the goal of keeping Strata Core as close
to B3 as possible.

By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.

---------

Co-authored-by: Josh Cohen <cohenjo@amazon.com>
Co-authored-by: Mikaël Mayer <MikaelMayer@users.noreply.github.com>
Co-authored-by: Shilpi Goel <shigoel@gmail.com>
Co-authored-by: Andrew Wells <130512013+andrewmwells-amazon@users.noreply.github.com>
MikaelMayer and others added 8 commits December 23, 2025 08:38
- Add custom identifier character classification to allow dots
- Include ? and ! suffixes for Lean naming conventions
- Maintain compatibility with precedence annotations (:)
- Add comprehensive tests for single and consecutive dots
- Verify AST preservation and round-trip formatting
- Updated all conversion functions to return (result, errors) tuples directly
- Removed duplicate WithErrors wrapper functions (expressionFromCSTWithErrors, expressionToCSTWithErrors, programFromCSTWithErrors, programToCSTWithErrors)
- Integrated error tracking into FromCSTContext.lookup and ToCSTContext.lookup
- Updated all helper functions (callArgFromCST, stmtFromCST, declFromCST, etc.) to thread errors
- Fixed bug: ToCSTContext.push now preserves inProcedure flag
- Fixed bug: Old values in procedure contexts now correctly supported without errors
- Fixed bug: Variable autoinv expressions now evaluated with variable in scope
- Added ToString and ToFormat instances for error types
- Updated format tests to display conversion errors in output
- Fixed String.mk deprecation warnings (replaced with String.ofList)
- Added DDMConversionErrorTests.lean for error tracking validation
@MikaelMayer MikaelMayer changed the title Import B3 language definitions and pretty printing rules Add B3 Language Support as Strata Backend Dialect Dec 24, 2025
Josh Cohen and others added 5 commits December 29, 2025 17:38
- Simplify lookup function by removing redundant bounds check
- Refactor lookupLast to use accumulator pattern for finding last occurrence
- Use record update syntax in FromCSTContext.push
- Improve code readability and maintainability
Address review feedback by documenting how the B3 AST differs from the CST:
- AST uses de Bruijn indices instead of identifier names
- AST has unified constructs where CST has multiple syntactic forms
- Note that conversions can return error lists

Also fix unused variable warning in pattern match.
joscoh
joscoh previously approved these changes Dec 30, 2025
Adapt B3 test files to work with the new DDM formatting API introduced in PR #287:
- Replace private 'cformat' with 'mformat(...).format'
- Replace private 'formatContext' with 'FormatContext.ofDialects'
- Replace private 'formatState' with manual FormatState construction

The DDM module refactoring made these methods private, requiring tests
to use the public API directly.
joscoh
joscoh previously approved these changes Dec 30, 2025
@shigoel shigoel enabled auto-merge December 31, 2025 16:28
@shigoel shigoel requested a review from joscoh December 31, 2025 16:28
@shigoel shigoel added this pull request to the merge queue Dec 31, 2025
Merged via the queue into main with commit 0810e7e Dec 31, 2025
11 checks passed
@shigoel shigoel deleted the import-b3-language branch December 31, 2025 16:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants