**philip-paul-mueller** commented on Nov 21, 2025

This PR contains the workflow file for automatically updating GT4Py's custom Python index.
It must be included in any release PR; see the instructions in the integration branch.

NOTE:
Currently the workflow looks for tags of the form `__phimuell_deployment_test_*` and uses the demo index.

philip-paul-mueller changed the title from **DO NOT MERGE** to **DO NOT MERGE: Automatic deployment** on Nov 21, 2025
philip-paul-mueller changed the title from **DO NOT MERGE: Automatic deployment** to **DO NOT MERGE: Automatic Deployment** on Dec 1, 2025
affifboudaoud and others added 14 commits December 2, 2025 09:38
…cl#2164)

# Pull Request: Machine Learning Integration for DaCe

## Overview

This PR adds comprehensive machine learning capabilities to DaCe through
three tightly integrated components:

1. **Automatic Differentiation (AD)** - Reverse-mode gradient
computation for SDFGs
2. **ONNX Integration** - Import and execute neural network models
3. **PyTorch Integration** - Bidirectional interoperability with
PyTorch's autograd system

Together, these components enable DaCe to optimize and accelerate
machine learning workloads, particularly neural network training and
inference.

## High-Level Architecture

```
PyTorch Model
     ↓
  ONNX Export
     ↓
DaCe SDFG (Forward)
     ↓
Automatic Differentiation
     ↓
DaCe SDFG (Backward)
     ↓
Compiled Code Generation
     ↓
PyTorch Operator (with Autograd)
```

## Component 1: Automatic Differentiation (`dace/autodiff/`)

### Purpose

Provides **reverse-mode automatic differentiation** for SDFGs, enabling
gradient computation for any DaCe program. This is the foundation for
neural network training and gradient-based optimization.

### Key Capabilities

- **Full SDFG Support**: Differentiates maps, tasklets, nested SDFGs,
loops, and library nodes
- **Control Flow**: Handles loops (LoopRegion) and conditionals
- **ONNX Operations**: 50+ backward implementations for ONNX operators
- **Data Forwarding**: Flexible strategies (store vs. recompute) for
memory/compute tradeoffs
- **Extensible Registry**: Plugin-based system for adding backward rules

### Core Algorithm

1. **Forward Pass Execution**: Run original computation and identify
required intermediates
2. **Backward Pass Generation**: Traverse computation graph in reverse,
accumulating gradients
3. **Node Reversal**: Each forward node (Map, Tasklet, ONNXOp) has a
registered backward implementation
4. **Gradient Accumulation**: Use write-conflict resolution (WCR) for
multi-path gradients
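
To make the reversal concrete, here is a minimal, self-contained sketch of steps 2-4 on a toy tape. It is illustrative only, not the DaCe implementation, and all names in it are hypothetical; the summation step plays the role that WCR plays in the generated SDFGs.

```python
import math

# Hypothetical tape recorded by the forward pass:  c = a * b;  d = sin(c)
tape = [("c", "mul", ("a", "b")), ("d", "sin", ("c",))]
values = {"a": 2.0, "b": 3.0, "c": 6.0, "d": math.sin(6.0)}

# One registered backward rule per forward op (cf. the node-reversal
# registry): each maps the output gradient to per-input gradients.
backward_rules = {
    "mul": lambda ins, g: {ins[0]: g * values[ins[1]], ins[1]: g * values[ins[0]]},
    "sin": lambda ins, g: {ins[0]: g * math.cos(values[ins[0]])},
}

# Seed the output gradient, then traverse the tape in reverse.
grads = {"d": 1.0}
for out, op, ins in reversed(tape):
    for name, g in backward_rules[op](ins, grads[out]).items():
        # Summing here mirrors write-conflict resolution (WCR):
        # gradients arriving over multiple paths are accumulated.
        grads[name] = grads.get(name, 0.0) + g

print(grads["a"], grads["b"])  # cos(6)*3 and cos(6)*2
```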

### Key Files

| File | Lines | Purpose |
|------|-------|---------|
| `backward_pass_generator.py` | ~800 | Core AD engine that orchestrates backward pass generation |
| `implementations/onnx_ops.py` | ~2000 | Backward implementations for 50+ ONNX operations |
| `implementations/dace_nodes.py` | ~600 | Backward rules for core SDFG elements (Tasklet, Map, etc.) |
| `data_forwarding/manager.py` | ~300 | Store vs. recompute strategy coordination |


---

## Component 2: ONNX Integration (`dace/libraries/onnx/`)

### Purpose

Enables **importing and executing ONNX neural network models** within
DaCe. Converts ONNX graphs to optimized DaCe SDFGs for efficient
execution on CPU/GPU.

### Key Capabilities

- **Model Import**: Load ONNX models from files or protobuf objects
- **100+ Operations**: Dynamically generated node classes for all ONNX
ops
- **Shape Inference**: Automatic symbolic and concrete shape computation
- **Multi-Strategy Implementations**: Pure (correctness), optimized
(performance), hardware-specific
- **Type Safety**: Schema-based validation and type checking

### Core Architecture

**Dynamic Node Generation**:
- Registry system generates Python classes for all ONNX operations at
import time
- Each operation has schema, properties, connectors, and implementations
- Example: `ONNXConv`, `ONNXMatMul`, `ONNXSoftmax` (100+ generated
classes)
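
As a rough illustration of the registry idea (a sketch only; the real logic lives in `nodes/onnx_op_registry.py`, and the schema and class details below are hypothetical), op schemas can be turned into Python classes at import time with `type()`:

```python
# Hypothetical mini-registry; not the actual onnx_op_registry.py code.
SCHEMAS = {
    "Conv": {"inputs": ["X", "W"], "outputs": ["Y"]},
    "MatMul": {"inputs": ["A", "B"], "outputs": ["Y"]},
}

class ONNXOp:
    """Stand-in base class for generated ONNX library nodes."""
    schema = None

# Generate one Python class per schema at import time.
registry = {}
for op_name, schema in SCHEMAS.items():
    cls = type(f"ONNX{op_name}", (ONNXOp,), {"schema": schema})
    registry[cls.__name__] = cls

print(registry["ONNXConv"].schema["inputs"])  # ['X', 'W']
```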

**Implementation Strategies**:
1. **Pure Implementations** (`pure_implementations.py`): Reference
implementations in Python/NumPy
2. **Optimized Implementations** (`img_op_implementations.py`):
Hand-crafted SDFGs for performance
3. **Hardware-Specific**: Future GPU/FPGA specialized implementations

**Import Pipeline**:
```
ONNX Model → Validation → Shape Inference → Simplification → SDFG Construction → Compilation
```
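
A usage sketch of this pipeline; the wrapper class name and import path below are assumptions and may not match the final API:

```python
import numpy as np
import onnx

# Hypothetical import path and class name -- see onnx_importer.py
# for the actual entry point.
from dace.libraries.onnx import ONNXModel

model_proto = onnx.load("model.onnx")            # load the protobuf
dace_model = ONNXModel("my_model", model_proto)  # validate, infer shapes, build SDFG

x = np.random.rand(1, 3, 224, 224).astype(np.float32)
y = dace_model(x)  # compiles on first call, then executes the SDFG
```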

### Key Files

| File | Lines | Purpose |
|------|-------|---------|
| `onnx_importer.py` | 711 | Main entry point, orchestrates the import pipeline |
| `op_implementations/pure_implementations.py` | 3052 | Reference implementations for 40+ operations |
| `nodes/onnx_op_registry.py` | 325 | Dynamic node class generation |
| `schema.py` | 390 | Type system and validation |
| `shape_inference/symbolic_shape_infer.py` | 1976 | Symbolic shape inference (Microsoft-sourced) |

---

## Component 3: PyTorch Integration (`dace/libraries/torch/`)

### Purpose

Provides **bidirectional integration** between PyTorch and DaCe. Enables
optimizing PyTorch models with DaCe while maintaining PyTorch's autograd
compatibility.

### Key Capabilities

- **Model Optimization**: Convert `torch.nn.Module` to optimized DaCe
SDFGs
- **Autograd Integration**: Backward pass generation integrates with
PyTorch's autograd
- **Dual Dispatch**: C++ extension (performance) or CTypes (flexibility)
- **Zero-Copy Tensors**: DLPack protocol for efficient memory sharing
- **Training Support**: Full forward + backward pass compilation

### Core Architecture

**Integration Flow**:
```
PyTorch Model → ONNX Export → DaCe SDFG → Backward Generation → Compilation → PyTorch Operator
```

**Dispatcher Strategies**:
1. **C++ Extension** (`cpp_torch_extension.py`): Native PyTorch operator
with autograd
   - High performance
   - 64 parameter limit
   - Slower compilation
2. **CTypes Module** (`ctypes_module.py`): Pure Python dispatcher
   - Unlimited parameters
   - Faster compilation
   - Slight overhead
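
The CTypes strategy follows the standard pattern of loading a shared library and calling a typed entry point. The snippet below demonstrates that pattern with `libm` as a stand-in for a compiled SDFG library; it is not DaCe's dispatcher code:

```python
import ctypes
import ctypes.util

# libm stands in for the compiled SDFG library.
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.cos.argtypes = [ctypes.c_double]  # declare the signature
libm.cos.restype = ctypes.c_double
print(libm.cos(0.0))  # 1.0 -- same pattern as invoking an SDFG entry point
```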

**Zero-Copy Memory Sharing**:
- DLPack protocol enables PyTorch tensors to view DaCe memory without
copying
- Bidirectional: DaCe → PyTorch (outputs) and PyTorch → DaCe (inputs)
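
The mechanism can be demonstrated with PyTorch alone; DaCe arrays participate in the same protocol:

```python
import torch
from torch.utils import dlpack

src = torch.arange(4, dtype=torch.float32)
capsule = dlpack.to_dlpack(src)      # export: no copy, just a DLPack capsule
view = dlpack.from_dlpack(capsule)   # import: a tensor viewing src's memory

view[0] = 42.0
print(src[0].item())  # 42.0 -- the write is visible through both handles
```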

### Key Files

| File | Lines | Purpose |
|------|-------|---------|
| `dispatchers/cpp_torch_extension.py` | 717 | C++ code generation for PyTorch operators |
| `dispatchers/ctypes_module.py` | 230 | CTypes-based dispatcher |
| `dlpack.py` | 199 | Zero-copy tensor sharing via DLPack |
| `environments/pytorch_env.py` | 94 | CMake build configuration |


---

## How Components Work Together

### Example: Training a PyTorch Model with DaCe

```python
import torch
from dace.frontend.python import DaceModule

# 1. Define PyTorch model
model = MyNeuralNetwork()
optimizer = torch.optim.Adam(model.parameters())

# 2. Wrap with DaCe (compiles on first call)
dace_model = DaceModule(model, dummy_inputs, backward=True)

# 3. Training loop (standard PyTorch code)
for inputs, labels in dataloader:
    optimizer.zero_grad()
    outputs = dace_model(inputs)  # DaCe-optimized forward pass
    loss = criterion(outputs, labels)
    loss.backward()  # DaCe-optimized backward pass
    optimizer.step()
```

**What Happens Internally**:
1. **First Call**: PyTorch model → ONNX export → DaCe SDFG (via ONNX
integration)
2. **Backward Generation**: Forward SDFG → Backward SDFG (via autodiff)
3. **Compilation**: Both SDFGs compiled to optimized code
4. **Dispatcher**: C++ extension or CTypes wrapper created
5. **Forward Pass**: DaCe executes optimized forward computation
6. **Backward Pass**: DaCe executes generated backward computation
7. **Gradient Return**: Gradients flow back to PyTorch optimizer

### Data Flow

```
PyTorch Tensor (input)
    ↓ Zero-copy (DLPack)
DaCe Array
    ↓ Optimized SDFG Execution
DaCe Array (output)
    ↓ Zero-copy (DLPack)
PyTorch Tensor (output)
    ↓ loss.backward()
PyTorch Tensor (grad_output)
    ↓ Zero-copy (DLPack)
DaCe Array (backward pass input)
    ↓ Backward SDFG Execution
DaCe Array (grad_input)
    ↓ Zero-copy (DLPack)
PyTorch Tensor (grad_input)
```

---

## Testing Strategy

### Test Organization

```
tests/
├── autodiff/                       # AD correctness tests
│   ├── test_single_state.py        # Basic AD operations
│   └── torch/                      # PyTorch integration tests
│       ├── test_training.py        # End-to-end training
│       ├── test_bert_encoder_backward.py    # BERT model
│       └── test_llama_decoder_backward.py   # LLaMA model
│
├── onnx/                          # ONNX import tests
│   ├── test_python_frontend.py    # Basic operations
│   ├── test_bert_subgraphs.py     # Real model subgraphs
│   └── test_input_outputs.py      # I/O handling
│
├── torch/                          # PyTorch integration tests
│   ├── test_lenet.py               # Simple CNN
│   ├── test_bert_encoder.py        # Transformer encoder
│   └── test_llama_decoder.py       # Decoder architecture
│
└── npbench/                        # AD tests on NPBench kernels

```

### Test Coverage

| Component | Test Files | Coverage |
|-----------|-----------|----------|
| Autodiff Core | 15+ files | Tasklets, maps, loops, nested SDFGs |
| ONNX Integration | 20+ files | Import, execution, type handling |
| PyTorch Integration | 15+ files | Forward, backward, training loops |

### Running Tests

```bash
# All basic tests (excluding those marked `long`)
pytest -m "(autodiff or torch or onnx) and not long" tests/

# AD tests only
pytest tests/autodiff/

# ONNX tests only
pytest tests/onnx/

# PyTorch tests only
pytest tests/torch/
```

---

## Known Limitations and Future Work

### Current Limitations

1. **Recompute Strategy**: Experimental, not production-ready
2. **Control Flow**: Conditionals are inlined into the state machine (not reversed as `ControlFlowRegion`s)
3. **Second-Order Gradients**: Not yet tested


---

## Documentation

Each component has detailed design documentation:

- [`dace/autodiff/autodiff.md`](dace/autodiff/autodiff.md) - Complete AD
system design
- [`dace/libraries/onnx/onnx.md`](dace/libraries/onnx/onnx.md) - ONNX
integration architecture
- [`dace/libraries/torch/torch.md`](dace/libraries/torch/torch.md) -
PyTorch integration details

These documents provide:
- Detailed component descriptions
- Algorithm explanations
- Code walkthrough
- Extension points
- Implementation notes

---

## Impact on DaCe

### Code Additions

| Component | Lines of Code | Files |
|-----------|--------------|-------|
| Autodiff | ~8,000 | 15+ files |
| ONNX | ~7,000 | 20+ files |
| PyTorch | ~1,500 | 10+ files |
| **Total** | **~16,500** | **45+ files** |

### Dependencies

New dependencies (already in `setup.py`):
- `onnx` - ONNX model format
- `onnxsim` - ONNX graph simplification
- `torch` - PyTorch framework (optional)
- `protobuf` - Protocol buffers (for ONNX)
- `jax` - For gradient numerical validation tests
- `transformers` - For testing the PyTorch/ONNX frontends
- `efficientnet_pytorch` - For testing EfficientNet

---

---------

Co-authored-by: Oliver Rausch <oliverrausch99@gmail.com>
Modified the reloading scheme used by `ReloadableDLL`.
If the library (of the compiled SDFG) is already loaded through another
instance of `CompiledSDFG`, then `ReloadableDLL` copies the SDFG library
and tries to load the copy, repeating until it finds a free name.
In ICON4Py we noticed that this sometimes leads to a segmentation fault
on Linux, but not on macOS.
We traced the main issue down to the fact that `ReloadableDLL` created a
copy of the SDFG library without checking whether the new name was
already in use; the existing file was simply overwritten.

The new scheme changes this in the following ways:
- If the new name is already taken, no copy is performed and the class
tries to use the existing file.
- Instead of copying library `n - 1` to `n`, it always copies from the
initial library.
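
In pseudocode, the two changes amount to the following (a minimal sketch, not the actual `ReloadableDLL` code; file naming is illustrative):

```python
import ctypes
import shutil
from pathlib import Path

def load_copy(initial_lib: Path, n: int) -> ctypes.CDLL:
    candidate = initial_lib.with_name(f"{initial_lib.stem}_copy{n}{initial_lib.suffix}")
    if not candidate.exists():
        # Always copy from the initial library, never from copy n - 1.
        shutil.copyfile(initial_lib, candidate)
    # If the name is already taken, the existing file is used as-is
    # (previously it was silently overwritten, causing the segfaults).
    return ctypes.CDLL(str(candidate))
```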

---------

Co-authored-by: Philipp Schaad <schaad.phil@gmail.com>
Updated ignored paths and build notification settings.
Increased pytest timeout from 300 to 600 seconds.
## Refactor `dace/data.py` into `dace/data/` package

### Summary

This PR refactors the monolithic `dace/data.py` file into a modular
`dace/data/` package with separate files for different functionality,
improving code organization and maintainability.

### Changes

- [x] **`dace/data/core.py`**: Core data descriptor classes (`Data`,
`Scalar`, `Array`, `ContainerArray`, `Stream`, `Structure`, `View`,
`Reference` and their subclasses)
- [x] **`dace/data/tensor.py`**: Tensor/sparse tensor support (`Tensor`,
`TensorIndex*` classes)
- [x] **`dace/data/creation.py`**: Data descriptor creation functions
(`create_datadescriptor`, `make_array_from_descriptor`,
`make_reference_from_descriptor`)
- [x] **`dace/data/ctypes_interop.py`**: Ctypes interoperability
(`make_ctypes_argument`)
- [x] **`dace/data/ml.py`**: ML-related descriptors (`ParameterArray`)
- [x] **`dace/data/__init__.py`**: Re-exports all public API for
backward compatibility
- [x] **`dace/utils.py`**: Utility functions (`find_new_name`,
`deduplicate`, `prod`)
- [x] **`dace/properties.py`**: Updated to handle circular import
gracefully
- [x] **`dace/autodiff/library/library.py`**: Updated to import
`ParameterArray` from the new location
- [x] **Deleted** old `dace/data.py` file
- [x] **Removed** `Number` and `ArrayLike` from `dace/data/__init__.py`
(other places import directly)
- [x] **Moved** `_prod` to `dace/utils.py` as `prod` (kept `_prod`
export for backward compat)
- [x] **Fixed** broken imports in `data_report.py`,
`data_layout_tuner.py`, and `cutout.py`

### Backward Compatibility

All public APIs are re-exported from `dace.data`, ensuring backward
compatibility with existing code.
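
For example, the package `__init__.py` re-exports the relocated classes along these lines (a sketch; the actual file re-exports the full public API):

```python
# dace/data/__init__.py (sketch)
from dace.data.core import (Array, ContainerArray, Data, Reference, Scalar,
                            Stream, Structure, View)
from dace.data.creation import (create_datadescriptor,
                                make_array_from_descriptor,
                                make_reference_from_descriptor)
from dace.data.ctypes_interop import make_ctypes_argument
from dace.data.ml import ParameterArray
from dace.data.tensor import Tensor
```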

<details>

<summary>Original prompt</summary>

> 
> ----
> 
> *This section details the original issue you should resolve.*
> 
> <issue_title>Refactor `dace/data.py`</issue_title>
> <issue_description>`data.py` is a monolithic file containing classes
> for core data containers (Data, Scalar, Array, Stream, View, Reference,
> and their subclasses `*{View, Reference}`); functionality to get data
> descriptors from arbitrary objects; derived objects for Tensors and
> sparse tensors; and other functions.
> 
> This issue will be resolved once `data.py` is refactored to a
> `dace/data/*` folder, which will contain separate files for:
> 1. core descriptor classes
> 2. structures (the Structure class and similar functionality)
> 3. tensors/sparse tensors
> 4. descriptor creation
> 5. ML-related data descriptors, such as parameter arrays (see
> `dace/autodiff/library/library.py`)
> 6...N. Other functions and classes categorized by their semantic
> meaning.
> 
> The code for `dace/data/*` will be refactored out of `data.py` (which
> should not exist at the end of this issue), `dtypes.py` (which may exist
> but be shorter), and other files that contain data descriptors
> (subclasses of Data/Array/Stream/Structure/View/Reference, such as
> `ParameterArray`; try to find all such subclasses in the codebase barring
> `tests/*` and `samples/*`).
> 
> Lastly, utility functions in `data.py` and `dtypes.py` (only those two
> files for this issue), such as `find_new_name` from `data.py` and
> `deduplicate` from `dtypes.py`, should find themselves in a new
> `dace/utils.py` file.</issue_description>
> 
> ## Comments on the Issue (you are @copilot in this section)
> 
> <comments>
> </comments>
> 


</details>

- Fixes spcl#2244


---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tbennun <8348955+tbennun@users.noreply.github.com>
…to seq. maps inside GPU kernels or gpu dev. maps (spcl#2088)

Fixes a GPU codegen issue: with dynamic inputs to sequential maps inside
GPU kernels or GPU device maps, code generation crashed or produced
incorrect code.

---------

Co-authored-by: alexnick83 <31545860+alexnick83@users.noreply.github.com>
Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>
…pcl#2246)

Updated the documentation for the proposed pass decomposition, including
changes to pass names and descriptions for clarity.