Skip to content

Conversation

@cristianoc
Copy link
Collaborator

@cristianoc cristianoc commented Dec 13, 2025

Summary

Add support for parallel processing of CMT files during dead code and exception analysis using OCaml 5 Domains.

Features

  • -parallel N: Process files using N parallel domains (0 = sequential, default)
  • -parallel -1: Auto-detect number of CPU cores using Domain.recommended_domain_count()
  • -timing: Report internal timing breakdown by analysis phase
  • Benchmark infrastructure: New test project for performance measurement

How It Works

The parallelization targets the CMT file processing phase, which is the main bottleneck (typically 75-80% of total analysis time).

Parallel execution model:

  1. Files are divided into chunks, one per domain
  2. Each domain processes its chunk independently, collecting results in local data structures
  3. When a domain finishes its chunk, it briefly acquires a mutex and merges its local results into shared state
  4. After all domains complete (Domain.join), the merged results are passed to the analysis phase

The mutex is only held during the merge operation (list concatenation), not during file processing, which minimizes contention.

The analysis and reporting phases remain sequential as they require merged cross-file data.

Testing

Correctness Tests

# Run regular reanalyze tests (sequential)
make test-reanalyze

# Run parallel mode tests (verifies parallel produces same results)
make test-reanalyze-parallel

Benchmark Tests

# Full benchmark: generates 50 copies of test files (~5000 files), compiles, and times analysis
make benchmark-reanalyze

# Or with custom number of copies
make benchmark-reanalyze COPIES=100

Observed Results (50 copies, ~4300 files, 12 cores)

Timing Breakdown

Phase Sequential Parallel (-1 = auto) Speedup
CMT processing 0.90s (78%) 0.45s (65%) 2.0x
Analysis 0.24s (21%) 0.24s (35%) 1.0x
Reporting 0.001s 0.001s -
Total 1.15s 0.69s 1.7x

Key Observations

  1. CMT processing is the bottleneck - accounts for ~78% of total time
  2. Parallelization provides ~2x speedup for CMT processing phase
  3. Overall speedup of ~1.7x with 12 cores
  4. Same results: Both modes report identical issue counts (19,000 issues)

Usage Examples

# Sequential (default)
reanalyze -config

# Parallel with 4 domains
reanalyze -config -parallel 4

# Auto-detect cores
reanalyze -config -parallel -1

# With timing output
reanalyze -config -parallel -1 -timing

Sample Timing Output

Using 12 parallel domains for 4900 files
  Analysis reported 19000 issues (...)

=== Timing ===
  CMT processing: 0.450s (65.5%)
  Analysis:       0.240s (34.3%)
  Reporting:      0.001s (0.1%)
  Total:          0.691s

Remove global ModulePath mutable state used during per-file AST traversal by
threading ModulePath.t explicitly through traversal code.

This is preparation for parallel per-file processing and reduces shared global
state during MAP.

Signed-Off-By: Cristiano Calcagno <cristiano.calcagno@gmail.com>
Record type dependency edges immediately instead of buffering them in a global
ref and flushing at end-of-file.

Note: this reorders debug output in reanalyze logs: addTypeReference lines now
appear next to extendTypeDependencies, rather than being emitted in a later
end-of-file flush.

Signed-Off-By: Cristiano Calcagno <cristiano.calcagno@gmail.com>
Stop using a global TypeLabels hashtable during AST processing.
Instead, compute type-label dependencies in a post-merge pass from merged
Declarations and add the corresponding type-reference edges before solving.

Fix: use raw decl positions (not declGetLoc/posAdjustment) when building
cross-file label indices, since reference graph keys are raw positions.

Note: debug output in deadcode expected logs is reordered due to moving label
linking from per-file traversal to the post-merge pass.

Signed-Off-By: Cristiano Calcagno <cristiano.calcagno@gmail.com>
Stop using global mutable exception declaration state during AST processing.
Instead, derive a pure exception lookup from merged Declarations and use it
when processing CrossFileItems exception refs.

Signed-Off-By: Cristiano Calcagno <cristiano.calcagno@gmail.com>
- Replace global Values.valueBindingsTable with per-file values_builder
- Replace global Checks.checks with per-file checks_builder
- processCmt now returns file_result with per-file data
- Add runChecks to process all checks after merging values tables
- Update Reanalyze.ml to collect exception results and run checks at end
- This enables future parallelization of AST processing
Add support for parallel processing of CMT files during dead code and
exception analysis using OCaml 5 Domains.

Features:
- New -parallel N flag: process files using N domains (0 = sequential)
- -parallel -1: auto-detect number of CPU cores
- New -timing flag: report internal timing breakdown by phase
- New benchmark infrastructure for performance testing

The parallelization targets the CMT file processing phase, which is the
main bottleneck (typically 75-80% of analysis time). The analysis and
reporting phases remain sequential as they require merged cross-file data.

Signed-Off-By: Cursor AI <noreply@cursor.com>
@cristianoc cristianoc requested a review from zth December 13, 2025 05:51
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Break down timing into sub-phases:
- CMT processing:
  - File loading: time spent reading and traversing CMT files
  - Result collection: time spent merging results from domains
- Analysis:
  - Merging: combining per-file data into global structures
  - Solving: running the liveness solver

Signed-Off-By: Cursor AI <noreply@cursor.com>
@cristianoc
Copy link
Collaborator Author

Updated Timing Breakdown (50 copies, ~4900 files, 12 cores)

Added granular timing that separates sub-phases:

Sequential Mode

=== Timing ===
  CMT processing:     0.779s (77.3%)
    - File loading:     0.779s
    - Result collection: 0.000s
  Analysis:           0.227s (22.6%)
    - Merging:          0.081s
    - Solving:          0.146s
  Reporting:          0.001s (0.1%)
  Total:              1.007s

Parallel Mode (auto-detect = 12 cores)

=== Timing ===
  CMT processing:     0.422s (63.5%)
    - File loading:     0.422s
    - Result collection: 0.000s
  Analysis:           0.242s (36.4%)
    - Merging:          0.094s
    - Solving:          0.148s
  Reporting:          0.001s (0.1%)
  Total:              0.664s

Key Observations

Sub-phase Sequential Parallel Speedup
File loading 0.779s 0.422s 1.85x
Result collection 0.000s 0.000s -
Merging 0.081s 0.094s 0.86x
Solving 0.146s 0.148s 0.99x
Total 1.007s 0.664s 1.52x
  • Result collection is negligible - mutex contention is minimal
  • File loading is the parallelization target and shows 1.85x speedup
  • Analysis phases (merging, solving) are sequential and take similar time in both modes
  • The slight slowdown in merging for parallel mode is likely due to different list ordering

@cristianoc
Copy link
Collaborator Author

cristianoc commented Dec 13, 2025

@cknitt to test, and eventually enable, parallelism, we need to build with ocaml 5.
CI currently also builds with older.
Thoughts?

@cknitt
Copy link
Member

cknitt commented Dec 13, 2025

@cknitt to test, and eventually enable, parallelism, we need to build with ocaml 5. CI currently also builds with older. Thoughts?

I kept the older versions in CI to make sure that we don't break compatibility inadvertently, so that we can support a broad version range when we eventually publish the compiler libs on OPAM.

But enabling parallelism is a good reason to move to OCaml 5 only, so this is fine with me.

@cristianoc
Copy link
Collaborator Author

@cknitt to test, and eventually enable, parallelism, we need to build with ocaml 5. CI currently also builds with older. Thoughts?

I kept the older versions in CI to make sure that we don't break compatibility inadvertently, so that we can support a broad version range when we eventually publish the compiler libs on OPAM.

But enabling parallelism is a good reason to move to OCaml 5 only, so this is fine with me.

What's the earliest version we should keep?

@cknitt
Copy link
Member

cknitt commented Dec 13, 2025

What's the earliest version we should keep?

I would say the earliest we can based on what features/lib you require for parallelism.

- Update CI compatibility test to use OCaml 5.0.1
- Update all package dependencies in dune-project to require OCaml >= 5.0.1
- Update rescript.opam.template to require OCaml >= 5.0.1
- Remove redundant OCaml version check from syntax_benchmarks/dune
- Add 'set -e' to exit immediately on errors
- Use 'opam exec --' to match checkformat.sh behavior
- Now make format will fail visibly instead of silently continuing when formatting fails
@cristianoc cristianoc force-pushed the reanalyze-reduce-global-state-ast branch from a55cdab to 452a4b1 Compare December 15, 2025 04:16
@pkg-pr-new
Copy link

pkg-pr-new bot commented Dec 15, 2025

Open in StackBlitz

rescript

npm i https://pkg.pr.new/rescript-lang/rescript@8089

@rescript/darwin-arm64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/darwin-arm64@8089

@rescript/darwin-x64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/darwin-x64@8089

@rescript/linux-arm64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/linux-arm64@8089

@rescript/linux-x64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/linux-x64@8089

@rescript/runtime

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/runtime@8089

@rescript/win32-x64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/win32-x64@8089

commit: c3cbe6e

@cristianoc cristianoc requested a review from cknitt December 15, 2025 04:52
@cristianoc
Copy link
Collaborator Author

What's the earliest version we should keep?

I would say the earliest we can based on what features/lib you require for parallelism.

OK it's 5.0.0.
PR ready to do go now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants