Do Not Merge: Integration Branch for GT4Py Next #1
Draft: philip-paul-mueller wants to merge 17 commits into main from gt4py-next-integration
This was referenced Apr 30, 2025
philip-paul-mueller added a commit to GridTools/gt4py that referenced this pull request on Apr 30, 2025:
Instead of pulling directly from the official DaCe repo, we now (for the time being) pull from [this PR](GridTools/dace#1). This became necessary as we have a lot of open PRs in DaCe and need some custom fixes (that cannot be merged into DaCe in their current form). In the long term, however, we should switch back to the main DaCe repo.
…cl#2164)

# Pull Request: Machine Learning Integration for DaCe

## Overview

This PR adds comprehensive machine learning capabilities to DaCe through three tightly integrated components:

1. **Automatic Differentiation (AD)** - Reverse-mode gradient computation for SDFGs
2. **ONNX Integration** - Import and execute neural network models
3. **PyTorch Integration** - Bidirectional interoperability with PyTorch's autograd system

Together, these components enable DaCe to optimize and accelerate machine learning workloads, particularly neural network training and inference.

## High-Level Architecture

```
PyTorch Model
    ↓
ONNX Export
    ↓
DaCe SDFG (Forward)
    ↓
Automatic Differentiation
    ↓
DaCe SDFG (Backward)
    ↓
Compiled Code Generation
    ↓
PyTorch Operator (with Autograd)
```

## Component 1: Automatic Differentiation (`dace/autodiff/`)

### Purpose

Provides **reverse-mode automatic differentiation** for SDFGs, enabling gradient computation for any DaCe program. This is the foundation for neural network training and gradient-based optimization.

### Key Capabilities

- **Full SDFG Support**: Differentiates maps, tasklets, nested SDFGs, loops, and library nodes
- **Control Flow**: Handles loops (LoopRegion) and conditionals
- **ONNX Operations**: 50+ backward implementations for ONNX operators
- **Data Forwarding**: Flexible strategies (store vs. recompute) for memory/compute tradeoffs
- **Extensible Registry**: Plugin-based system for adding backward rules

### Core Algorithm

1. **Forward Pass Execution**: Run original computation and identify required intermediates
2. **Backward Pass Generation**: Traverse computation graph in reverse, accumulating gradients
3. **Node Reversal**: Each forward node (Map, Tasklet, ONNXOp) has a registered backward implementation
4. **Gradient Accumulation**: Use write-conflict resolution (WCR) for multi-path gradients

### Key Files

| File | Lines | Purpose |
|------|-------|---------|
| `backward_pass_generator.py` | ~800 | Core AD engine that orchestrates backward pass generation |
| `implementations/onnx_ops.py` | ~2000 | Backward implementations for 50+ ONNX operations |
| `implementations/dace_nodes.py` | ~600 | Backward rules for core SDFG elements (Tasklet, Map, etc.) |
| `data_forwarding/manager.py` | ~300 | Store vs. recompute strategy coordination |

---

## Component 2: ONNX Integration (`dace/libraries/onnx/`)

### Purpose

Enables **importing and executing ONNX neural network models** within DaCe. Converts ONNX graphs to optimized DaCe SDFGs for efficient execution on CPU/GPU.

### Key Capabilities

- **Model Import**: Load ONNX models from files or protobuf objects
- **100+ Operations**: Dynamically generated node classes for all ONNX ops
- **Shape Inference**: Automatic symbolic and concrete shape computation
- **Multi-Strategy Implementations**: Pure (correctness), optimized (performance), hardware-specific
- **Type Safety**: Schema-based validation and type checking

### Core Architecture

**Dynamic Node Generation**:
- Registry system generates Python classes for all ONNX operations at import time
- Each operation has schema, properties, connectors, and implementations
- Example: `ONNXConv`, `ONNXMatMul`, `ONNXSoftmax` (100+ generated classes)

**Implementation Strategies**:
1. **Pure Implementations** (`pure_implementations.py`): Reference implementations in Python/NumPy
2. **Optimized Implementations** (`img_op_implementations.py`): Hand-crafted SDFGs for performance
3. **Hardware-Specific**: Future GPU/FPGA specialized implementations

**Import Pipeline**:

```
ONNX Model → Validation → Shape Inference → Simplification → SDFG Construction → Compilation
```

### Key Files

| File | Lines | Purpose |
|------|-------|---------|
| `onnx_importer.py` | 711 | Main entry point, orchestrates import pipeline |
| `op_implementations/pure_implementations.py` | 3052 | Reference implementations for 40+ operations |
| `nodes/onnx_op_registry.py` | 325 | Dynamic node class generation |
| `schema.py` | 390 | Type system and validation |
| `shape_inference/symbolic_shape_infer.py` | 1976 | Symbolic shape inference (Microsoft-sourced) |

---

## Component 3: PyTorch Integration (`dace/libraries/torch/`)

### Purpose

Provides **bidirectional integration** between PyTorch and DaCe. Enables optimizing PyTorch models with DaCe while maintaining PyTorch's autograd compatibility.

### Key Capabilities

- **Model Optimization**: Convert `torch.nn.Module` to optimized DaCe SDFGs
- **Autograd Integration**: Backward pass generation integrates with PyTorch's autograd
- **Dual Dispatch**: C++ extension (performance) or CTypes (flexibility)
- **Zero-Copy Tensors**: DLPack protocol for efficient memory sharing
- **Training Support**: Full forward + backward pass compilation

### Core Architecture

**Integration Flow**:

```
PyTorch Model → ONNX Export → DaCe SDFG → Backward Generation → Compilation → PyTorch Operator
```

**Dispatcher Strategies**:
1. **C++ Extension** (`cpp_torch_extension.py`): Native PyTorch operator with autograd
   - High performance
   - 64 parameter limit
   - Slower compilation
2. **CTypes Module** (`ctypes_module.py`): Pure Python dispatcher
   - Unlimited parameters
   - Faster compilation
   - Slight overhead

**Zero-Copy Memory Sharing**:
- DLPack protocol enables PyTorch tensors to view DaCe memory without copying
- Bidirectional: DaCe → PyTorch (outputs) and PyTorch → DaCe (inputs)

### Key Files

| File | Lines | Purpose |
|------|-------|---------|
| `dispatchers/cpp_torch_extension.py` | 717 | C++ code generation for PyTorch operators |
| `dispatchers/ctypes_module.py` | 230 | CTypes-based dispatcher |
| `dlpack.py` | 199 | Zero-copy tensor sharing via DLPack |
| `environments/pytorch_env.py` | 94 | CMake build configuration |

---

## How Components Work Together

### Example: Training a PyTorch Model with DaCe

```python
import torch
from dace.frontend.python import DaceModule

# 1. Define PyTorch model
model = MyNeuralNetwork()
optimizer = torch.optim.Adam(model.parameters())

# 2. Wrap with DaCe (compiles on first call)
dace_model = DaceModule(model, dummy_inputs, backward=True)

# 3. Training loop (standard PyTorch code)
for inputs, labels in dataloader:
    optimizer.zero_grad()
    outputs = dace_model(inputs)  # DaCe-optimized forward pass
    loss = criterion(outputs, labels)
    loss.backward()  # DaCe-optimized backward pass
    optimizer.step()
```

**What Happens Internally**:
1. **First Call**: PyTorch model → ONNX export → DaCe SDFG (via ONNX integration)
2. **Backward Generation**: Forward SDFG → Backward SDFG (via autodiff)
3. **Compilation**: Both SDFGs compiled to optimized code
4. **Dispatcher**: C++ extension or CTypes wrapper created
5. **Forward Pass**: DaCe executes optimized forward computation
6. **Backward Pass**: DaCe executes generated backward computation
7. **Gradient Return**: Gradients flow back to PyTorch optimizer

### Data Flow

```
PyTorch Tensor (input)
    ↓ Zero-copy (DLPack)
DaCe Array
    ↓ Optimized SDFG Execution
DaCe Array (output)
    ↓ Zero-copy (DLPack)
PyTorch Tensor (output)
    ↓ loss.backward()
PyTorch Tensor (grad_output)
    ↓ Zero-copy (DLPack)
DaCe Array (backward pass input)
    ↓ Backward SDFG Execution
DaCe Array (grad_input)
    ↓ Zero-copy (DLPack)
PyTorch Tensor (grad_input)
```

---

## Testing Strategy

### Test Organization

```
tests/
├── autodiff/                              # AD correctness tests
│   ├── test_single_state.py               # Basic AD operations
│   └── torch/                             # PyTorch integration tests
│       ├── test_training.py               # End-to-end training
│       ├── test_bert_encoder_backward.py  # BERT model
│       └── test_llama_decoder_backward.py # LLaMA model
│
├── onnx/                                  # ONNX import tests
│   ├── test_python_frontend.py            # Basic operations
│   ├── test_bert_subgraphs.py             # Real model subgraphs
│   └── test_input_outputs.py              # I/O handling
│
├── torch/                                 # PyTorch integration tests
│   ├── test_lenet.py                      # Simple CNN
│   ├── test_bert_encoder.py               # Transformer encoder
│   └── test_llama_decoder.py              # Decoder architecture
│
└── npbench/                               # AD tests on NPBench kernels
```

### Test Coverage

| Component | Test Files | Coverage |
|-----------|-----------|----------|
| Autodiff Core | 15+ files | Tasklets, maps, loops, nested SDFGs |
| ONNX Integration | 20+ files | Import, execution, type handling |
| PyTorch Integration | 15+ files | Forward, backward, training loops |

### Running Tests

```bash
# All basic tests (excluding hardware-specific)
pytest -m "(autodiff or torch or onnx) and not long" tests/

# AD tests only
pytest tests/autodiff/

# ONNX tests only
pytest tests/onnx/

# PyTorch tests only
pytest tests/torch/
```

---

## Known Limitations and Future Work

### Current Limitations

1. **Recompute Strategy**: Experimental, not production-ready
2. **Control Flow**: Conditionals are inlined into state machine (not reversed as ControlFlowRegions)
3. **Second-Order Gradients**: Not yet tested

---

## Documentation

Each component has detailed design documentation:

- [`dace/autodiff/autodiff.md`](dace/autodiff/autodiff.md) - Complete AD system design
- [`dace/libraries/onnx/onnx.md`](dace/libraries/onnx/onnx.md) - ONNX integration architecture
- [`dace/libraries/torch/torch.md`](dace/libraries/torch/torch.md) - PyTorch integration details

These documents provide:
- Detailed component descriptions
- Algorithm explanations
- Code walkthrough
- Extension points
- Implementation notes

---

## Impact on DaCe

### Code Additions

| Component | Lines of Code | Files |
|-----------|--------------|-------|
| Autodiff | ~8,000 | 15+ files |
| ONNX | ~7,000 | 20+ files |
| PyTorch | ~1,500 | 10+ files |
| **Total** | **~16,500** | **45+ files** |

### Dependencies

New dependencies (already in `setup.py`):
- `onnx` - ONNX model format
- `onnxsim` - ONNX graph simplification
- `torch` - PyTorch framework (optional)
- `protobuf` - Protocol buffers (for ONNX)
- `jax` - For gradient numerical validation tests
- `transformers` - For testing the PyTorch/ONNX frontends
- `efficientnet_pytorch` - For testing EfficientNet

---

---------

Co-authored-by: Oliver Rausch <oliverrausch99@gmail.com>
Modified the reloading scheme used by `ReloadableDLL`. If the library (of the compiled SDFG) is already loaded through another instance of `CompiledSDFG`, then `ReloadableDLL` copies the SDFG library and tries to load the copy, repeating this until it finds a name that is free. In ICON4Py we noticed that this sometimes leads to a segmentation fault on Linux, but not on MacOS X. We traced the main issue down to the fact that `ReloadableDLL` created a copy of the SDFG library without checking if the new name is already used; instead the file is simply overwritten.

The new scheme changes this slightly, in the following ways:
- If the new name is already taken, then no copy is performed and the class tries to use the file that already exists.
- Instead of copying library `n - 1` to `n`, it always makes a copy from the initial library.

---------

Co-authored-by: Philipp Schaad <schaad.phil@gmail.com>
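The two rules of the new scheme can be sketched in a few lines. This is a hedged illustration, not the code of `ReloadableDLL`: the function `pick_library_copy`, the `_<n>` naming pattern, and the `is_loaded` callback are all assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


def pick_library_copy(initial, is_loaded, max_tries=32):
    """Return a library path that is not loaded yet (hypothetical helper).

    - An existing file under a candidate name is reused, never overwritten.
    - New copies are always made from the initial library, not from copy n - 1.
    """
    if not is_loaded(initial):
        return initial
    for n in range(max_tries):
        candidate = initial.with_name(f"{initial.stem}_{n}{initial.suffix}")
        if is_loaded(candidate):
            continue  # name is in use by another instance, try the next one
        if not candidate.exists():
            shutil.copy(initial, candidate)
        return candidate
    raise RuntimeError("no free library name found")


with tempfile.TemporaryDirectory() as tmp:
    initial = Path(tmp) / "libsdfg.so"
    initial.write_bytes(b"compiled sdfg")
    # A copy left behind earlier; it must not be overwritten.
    (Path(tmp) / "libsdfg_0.so").write_bytes(b"already there")

    loaded = {initial}
    reused = pick_library_copy(initial, loaded.__contains__)
    reused_name, reused_bytes = reused.name, reused.read_bytes()

    loaded.add(reused)
    fresh = pick_library_copy(initial, loaded.__contains__)
    fresh_name, fresh_bytes = fresh.name, fresh.read_bytes()
```

In the usage part, the first call picks up the file that already exists and the second call creates a new copy from the initial library.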
Updated ignored paths and build notification settings.
Increased pytest timeout from 300 to 600 seconds.
## Refactor `dace/data.py` into `dace/data/` package
### Summary
This PR refactors the monolithic `dace/data.py` file into a modular
`dace/data/` package with separate files for different functionality,
improving code organization and maintainability.
### Changes
- [x] **`dace/data/core.py`**: Core data descriptor classes (`Data`,
`Scalar`, `Array`, `ContainerArray`, `Stream`, `Structure`, `View`,
`Reference` and their subclasses)
- [x] **`dace/data/tensor.py`**: Tensor/sparse tensor support (`Tensor`,
`TensorIndex*` classes)
- [x] **`dace/data/creation.py`**: Data descriptor creation functions
(`create_datadescriptor`, `make_array_from_descriptor`,
`make_reference_from_descriptor`)
- [x] **`dace/data/ctypes_interop.py`**: Ctypes interoperability
(`make_ctypes_argument`)
- [x] **`dace/data/ml.py`**: ML-related descriptors (`ParameterArray`)
- [x] **`dace/data/__init__.py`**: Re-exports all public API for
backward compatibility
- [x] **`dace/utils.py`**: Utility functions (`find_new_name`,
`deduplicate`, `prod`)
- [x] **`dace/properties.py`**: Updated to handle circular import
gracefully
- [x] **`dace/autodiff/library/library.py`**: Updated to import
`ParameterArray` from the new location
- [x] **Deleted** old `dace/data.py` file
- [x] **Removed** `Number` and `ArrayLike` from `dace/data/__init__.py`
(other places import directly)
- [x] **Moved** `_prod` to `dace/utils.py` as `prod` (kept `_prod`
export for backward compat)
- [x] **Fixed** broken imports in `data_report.py`,
`data_layout_tuner.py`, and `cutout.py`
### Backward Compatibility
All public APIs are re-exported from `dace.data`, ensuring backward
compatibility with existing code.
<!-- START COPILOT CODING AGENT SUFFIX -->
<details>
<summary>Original prompt</summary>
>
> ----
>
> *This section details on the original issue you should resolve*
>
> <issue_title>Refactor `dace/data.py`</issue_title>
> <issue_description>`data.py` is a monolithic file containing classes
for core data containers (Data, Scalar, Array, Stream, View, Reference,
and their subclasses `*{View, Reference}`); functionality to get data
descriptors from arbitrary objects; derived objects for Tensors and
sparse tensors; and other functions.
>
> This issue will be resolved once `data.py` is refactored to a
`dace/data/*` folder, which will contain separate files for:
> 1. core descriptor classes
> 2. structures (the Structure class and similar functionality)
> 3. tensors/sparse tensors
> 4. descriptor creation
> 5. ML-related data descriptors, such as parameter arrays (see
`dace/autodiff/library/library.py`)
> 6...N. Other functions and classes categorized by their semantic
meaning.
>
> The code for `dace/data/*` will be refactored out of `data.py` (which
should not exist at the end of this issue), `dtypes.py` (which may exist
but be shorter), and other files that contain data descriptors
(subclasses of Data/Array/Stream/Structure/View/Reference, such as
ParameterArray. Try to find all such subclasses in the codebase barring
tests/* and samples/*).
>
> Lastly, utility functions in `data.py` and `dtypes.py` (only those two
files for this issue), such as `find_new_name` from data.py and
`deduplicate` from dtypes.py, should find themselves in a new
`dace/utils.py` file.</issue_description>
>
> ## Comments on the Issue (you are @copilot in this section)
>
> <comments>
> </comments>
>
</details>
- Fixes spcl#2244
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tbennun <8348955+tbennun@users.noreply.github.com>
…to seq. maps inside GPU kernels or gpu dev. maps (spcl#2088)

GPU codegen crashes and generates incorrect code with dynamic inputs to seq. maps inside GPU kernels or gpu dev. maps

---------

Co-authored-by: alexnick83 <31545860+alexnick83@users.noreply.github.com>
Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>
…pcl#2246) Updated the documentation for proposed pass decomposition, including changes to pass names and descriptions for clarity.
commit ecb2785
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Wed Dec 17 08:19:40 2025 +0100
Updated the dace updater workflow file.

commit f3198ef
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Wed Dec 17 07:41:26 2025 +0100
Made the update point to the correct repo.

commit 96f963a
Merge: 8b7cce5 387f1e8
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Wed Dec 17 07:37:48 2025 +0100
Merge remote-tracking branch 'spcl/main' into automatic_gt4py_deployment

commit 8b7cce5
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Mon Dec 1 09:18:22 2025 +0100
Restored the original workflow files.

commit 362ab70
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Mon Dec 1 07:41:40 2025 +0100
Now it has run once, so let's make it less runnable.

commit 81b8cfa
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Mon Dec 1 07:39:09 2025 +0100
Made it run always.

commit 6d71466
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Mon Dec 1 07:38:11 2025 +0100
Small update.

commit eb31e6c
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Nov 21 15:23:33 2025 +0100
Empty commit in the branch containing the workflow file.

commit 2970a75
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Nov 21 15:21:09 2025 +0100
Next step.

commit f5d3d9d
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Nov 21 15:17:56 2025 +0100
Let's disable everything.

commit 211e415
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Nov 21 15:10:43 2025 +0100
Disabled the kickstarter.

commit d012c26
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Nov 21 15:05:38 2025 +0100
Updated everything.
commit 2a89832
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Mon Oct 27 09:56:16 2025 +0100
Fixed GPU_TX_MARKER test

commit c240128
Merge: 10160bc e38d006
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Fri Oct 24 18:29:55 2025 +0200
Merge remote-tracking branch 'upstream/main' into nvtx_ranges

commit 10160bc
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Fri Oct 24 18:27:58 2025 +0200
Fix instrumentation for copies

commit d14093c
Author: Ioannis Magkanaris <ioannis.magkanaris@cscs.ch>
Date: Mon Oct 6 16:16:43 2025 +0200
Make pre-commit happy

commit 68942a3
Merge: a3063e5 b415f62
Author: Ioannis Magkanaris <ioannis.magkanaris@cscs.ch>
Date: Tue Sep 30 17:11:35 2025 +0200
Merge remote-tracking branch 'upstream/main' into nvtx_ranges

commit a3063e5
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Tue Sep 23 18:03:26 2025 +0200
Working version of nvtx markers with allocations

commit 6788b97
Author: Ioannis Magkanaris <ioannis.magkanaris@cscs.ch>
Date: Tue Sep 23 13:42:41 2025 +0200
Updated functions

commit 455ad38
Author: Ioannis Magkanaris <ioannis.magkanaris@cscs.ch>
Date: Tue Sep 23 13:20:51 2025 +0200
Added marker on allocations as well

commit 80ce99c
Author: Ioannis Magkanaris <ioannis.magkanaris@cscs.ch>
Date: Wed Aug 20 19:02:36 2025 +0300
Avoid profiling tasklets

commit 0314386
Author: Ioannis Magkanaris <ioannis.magkanaris@cscs.ch>
Date: Wed Aug 20 19:02:28 2025 +0300
Fix get_latest_report_path in case there's no report

commit aad5e87
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Wed Aug 20 10:05:16 2025 +0200
Remove import of deleted file

commit a3ff00e
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Tue Aug 19 18:16:11 2025 +0200
Revert "Improved GPU Copy (spcl#1976)"
This reverts commit bc83c47.

commit ea5f6ff
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Tue Aug 19 18:14:35 2025 +0200
Make format happy

commit b1ea9af
Merge: bbc1faf aabbe48
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Tue Aug 19 19:12:43 2025 +0300
Merge branch 'main' into nvtx_ranges

commit bbc1faf
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Tue Aug 19 18:07:12 2025 +0200
Format a bit better with dace.instrument

commit eea658f
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Tue Aug 19 18:04:54 2025 +0200
Fixes in gpu_tx_markers.py

commit 2f43f7a
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Tue Aug 19 18:04:43 2025 +0200
Remove instrument_sdfg

commit 0fdb4df
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Tue Aug 19 17:28:57 2025 +0200
Small refactoring of if statements in gpu_tx_markers.py

commit 73c52bf
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Tue Aug 19 17:26:02 2025 +0200
Added on_sdfg_init/exit_begin/end functions

commit ff70f2f
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Tue Aug 19 16:32:34 2025 +0200
Replaced is with ==

commit 3d626e0
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Tue Aug 19 16:31:04 2025 +0200
Fix local and global streams

commit 209860d
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Tue Aug 19 16:27:04 2025 +0200
Improve _is_sdfg_in_device_code

commit bc83c47
Author: Philip Müller <147368808+philip-paul-mueller@users.noreply.github.com>
Date: Mon Jun 2 15:58:08 2025 +0200
Improved GPU Copy (spcl#1976)
Before some 2D copies (especially if they had FORTRAN order) were turned into Maps, see [issue#1953](spcl#1953). This PR modifies the code generator in such a way that such copies are now handled. There is some legacy stuff that should also be looked at.
---------
Co-authored-by: Philip Mueller <philip.paul.mueller@bluemain.ch>
Co-authored-by: Tal Ben-Nun <tbennun@gmail.com>

commit df99571
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Tue Aug 19 17:34:12 2025 +0300
Apply suggestion from @tbennun
Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>

commit da00f21
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Mon May 19 17:49:47 2025 +0200
Avoid pushing rocTX markers before initializing HIP since it doesn't work

commit a39308b
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Fri May 16 15:13:31 2025 +0200
Fix on_copy and on_scope for GPU_TX_MARKERS

commit 2d554fa
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Thu May 15 15:20:05 2025 +0200
Removed preprocessor checks by properly placing ranges in NestedSDFGs and small fixes for CPU wrapper includes

commit 5937a15
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Wed May 14 11:33:02 2025 +0200
Refactored a bit GPUTXMarkerProvider

commit 9e8ec9e
Author: Ioannis Magkanaris <ioannis.magkanaris@cscs.ch>
Date: Wed May 14 10:52:26 2025 +0200
Addressed PR comments for checking if the instrumentation is enabled

commit c3f1932
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Mon May 12 17:29:30 2025 +0200
Small fixes and cleanups

commit 366721f
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Mon May 12 17:23:21 2025 +0200
Fix order of imports in gpu_events.py

commit 8ea4327
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Mon May 12 17:04:56 2025 +0200
Add markers for different SDFGs and states

commit 22b372e
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Mon May 12 09:45:20 2025 +0200
Revert changes in GPU_Event provider

commit e5adaef
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Mon May 12 09:34:34 2025 +0200
Allow building with HIP even if rocTX is not found

commit b30f4a2
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Fri May 9 17:20:34 2025 +0200
Fix formatting

commit 747f357
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Fri May 9 17:14:17 2025 +0200
Made test NVTX agnostic and updated documentation

commit 646ca90
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Fri May 9 17:05:10 2025 +0200
Use same checks for enabling roctx as CMake

commit c28036b
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Fri May 9 17:00:19 2025 +0200
Fix compilation for AMD gpu

commit 855304d
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Thu May 8 11:58:00 2025 +0200
Fix library names

commit 9df4f73
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Thu May 8 11:36:29 2025 +0200
Trying to use roctx

commit a55aeb7
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Wed May 7 17:58:37 2025 +0200
Make formatting happy

commit a8bcadf
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Wed May 7 17:50:10 2025 +0200
Renamed NVTX to GPU_TX_MARKERS and added note for AMD GPUs

commit 7337233
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Mon May 5 17:30:35 2025 +0200
Changed nvtxRangePushA to nvtxRangePush

commit 74c9117
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Mon May 5 17:23:42 2025 +0200
Fix copyright and GPU test

commit 989bc32
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Mon May 5 17:12:59 2025 +0200
Make formatter happy

commit 4f57297
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Mon May 5 17:09:58 2025 +0200
Remove NVTX markers from LIKWID since LIKWID has its own markers

commit a4d2ff8
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Mon May 5 17:08:08 2025 +0200
Improved NVTX markers in likwid

commit 1e71171
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Mon May 5 15:42:13 2025 +0200
Update NVTX Provider imports

commit 438090f
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Mon May 5 15:41:56 2025 +0200
Update documentation

commit 89b7864
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Mon May 5 15:41:48 2025 +0200
Small fix of whiteline in framecode

commit ef5355b
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Mon May 5 15:38:02 2025 +0200
Refactored NVTX Instrumentation provider constructor and test for expected code

commit bbf1d32
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Mon May 5 15:37:16 2025 +0200
Inherit LIKWID_GPU Instrumentation provider from NVTX as well

commit 90b50ac
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Fri May 2 18:29:07 2025 +0200
Make GPUEventProvider inherit from NVTXProvider to enable the NVTX markers by default with it

commit c584255
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Fri May 2 18:01:31 2025 +0200
Updated documentation

commit 04836fb
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Fri May 2 18:01:21 2025 +0200
Moved the printing of NVTX range push and pop inside the NVTXProvider

commit f5240b2
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date: Fri May 2 17:25:04 2025 +0200
Added NVTX range in CPU wrapper for GPU kernel
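Several of the commits above deal with wrapping generated code in NVTX ranges (a push before a region, a pop after it). The idea can be sketched as a small code-emitting helper. This is an illustration under stated assumptions: `emit_with_range` is a hypothetical function and not part of DaCe's instrumentation providers, and the kernel launch line is made up; only `nvtxRangePush` and `nvtxRangePop` are names taken from the NVTX C API.

```python
def emit_with_range(label, body_lines):
    """Wrap generated C++ lines in an NVTX push/pop pair (hypothetical helper)."""
    # Escape characters that would break the C string literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return [f'nvtxRangePush("{escaped}");', *body_lines, "nvtxRangePop();"]


# Nested ranges: one for the SDFG, one for a state inside it.
state_code = emit_with_range("state_0", ["my_kernel<<<grid, block>>>(args);"])
generated = emit_with_range("sdfg_main", state_code)
print("\n".join(generated))
```

Every push is matched by exactly one pop, so nested regions (an SDFG containing states) produce properly nested ranges in the profiler timeline.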
commit 68ffa3b
Merge: c069546 d99ad29
Author: Philipp Schaad <schaad.phil@gmail.com>
Date: Sun Nov 30 06:17:32 2025 -0600
Merge branch 'main' into make_construct_args_public

commit c069546
Merge: 41902c3 408a481
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Tue Nov 4 11:22:14 2025 +0100
Merge remote-tracking branch 'spcl/main' into make_construct_args_public

commit 41902c3
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Oct 31 16:01:26 2025 +0100
Fixed a bug.

commit 65725f9
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Oct 31 15:15:03 2025 +0100
This should be enough for bug compatibility.

commit daf90e9
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Oct 31 12:58:25 2025 +0100
Updated the thing a bit more.

commit 2ddabbd
Merge: 4da0c4e b44aeb0
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Oct 31 12:54:19 2025 +0100
Merge remote-tracking branch 'spcl/main' into make_construct_args_public

commit 4da0c4e
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Oct 31 12:53:48 2025 +0100
Made some additional check.

commit 69960ce
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Oct 31 12:00:30 2025 +0100
Forgot to do this.

commit 6e1a9ff
Merge: c1214fa 1bf2173
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Oct 31 11:25:46 2025 +0100
Merge remote-tracking branch 'spcl/main' into make_construct_args_public

commit c1214fa
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Oct 31 09:50:41 2025 +0100
Updated the tests and made it clear that you can not return a scalar from an SDFG.

commit 9397a23
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Oct 31 09:40:29 2025 +0100
Implemented the proper handling of tuples of size one.

commit e8d909e
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Oct 31 09:30:48 2025 +0100
Removed that stupid scalar return value feature that CAN NOT WORK. However, I saw that it also, under the hood sometimes tests if the argument is a pyobject. Since that thing is a pointer it is possible and I implemented it for that. But it was again not implemented properly, since for the case when the return value is passed as a regular argument, it was not checking that, only for managed return values.

commit ab110d2
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Oct 31 09:24:45 2025 +0100
Updated the description.

commit 899b2a0
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Oct 31 09:24:32 2025 +0100
Fixed some old stuff.

commit 7f17e13
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Oct 31 09:08:49 2025 +0100
Fixed a bug, but in a way I do not like.

commit c2c1116
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Oct 31 08:40:47 2025 +0100
Removed a misleading comment.

commit ded5df8
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Oct 31 08:04:38 2025 +0100
Made some refactoring to remove some strange DaCe behaviour.

commit b029828
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Oct 31 08:02:28 2025 +0100
Fixed an issue in safe_call

commit b09c9fc
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri Oct 31 07:17:36 2025 +0100
Included the first bunch of Tal's changes.

commit e138b06
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Thu Oct 30 15:12:23 2025 +0100
Made the 'passed as positional and named argument'-error more explicit.

commit f901a3d
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Thu Oct 30 15:05:00 2025 +0100
Fixed a bug in a unit test. Due to the refactoring the case that a variable is passed once as positional and as named argument is not detected and asserted. This test however, passed `a` always as positional argument and if `symbolic` is `True` also as named argument.

commit 767260d
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Thu Oct 30 14:19:44 2025 +0100
Clarified a comment.

commit 2b8123a
Author: Philip Mueller <philip.mueller@cscs.ch>
Date: Thu Oct 30 13:56:20 2025 +0100
Made the construct argument vector function public and also refactored some things.
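The 'passed as positional and named argument' error mentioned in the commits above guards against the same parameter being supplied twice. A minimal sketch of such a check is shown below; `bind_arguments` is a hypothetical helper written for this illustration and does not reproduce the argument construction code of `CompiledSDFG`.

```python
def bind_arguments(argnames, args, kwargs):
    """Map call arguments to parameter names, rejecting duplicates.

    Hypothetical helper; raises if a name is given both ways.
    """
    if len(args) > len(argnames):
        raise TypeError(
            f"expected at most {len(argnames)} positional arguments, got {len(args)}")
    bound = dict(zip(argnames, args))
    duplicated = sorted(bound.keys() & kwargs.keys())
    if duplicated:
        raise TypeError(
            "argument(s) passed both as positional and named argument: "
            + ", ".join(duplicated))
    bound.update(kwargs)
    return bound


ok = bind_arguments(("a", "b"), (1,), {"b": 2})

error_message = ""
try:
    # `a` is given positionally and again by name.
    bind_arguments(("a", "b"), (1,), {"a": 3})
except TypeError as exc:
    error_message = str(exc)
```

Naming the offending parameters in the message is what makes the error explicit for the caller.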
commit 5b068e7
Author: Affifboudaoud <hk_boudaoud@esi.dz>
Date: Sun Nov 23 22:50:46 2025 +0100
Add visited set to avoid visiting same node multiple times
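The technique named in this commit message, a visited set during graph traversal, can be shown on a small diamond-shaped graph where one node is reachable along two paths. This is a generic sketch and not the patched DaCe code; the function `reachable` and the dictionary-based graph are assumptions for the example.

```python
def reachable(graph, start):
    """Iterative DFS that processes every node at most once."""
    visited = set()
    order = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue  # already handled via another path
        visited.add(node)
        order.append(node)
        stack.extend(graph.get(node, ()))
    return order


# "d" is reachable through both "b" and "c".
diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
visit_order = reachable(diamond, "a")
```

Without the `visited` check, `d` would be processed twice here, and in graphs with many converging paths the repeated work can grow exponentially.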
commit c9f93fd
Author: Edoardo Paone <edoardo.paone@cscs.ch>
Date: Thu Dec 18 00:00:20 2025 +0100
fix state fusion for write-write hazard
This is the PR/branch that GT4Py.Next uses to pull DaCe.
It is essentially DaCe main together with our fixes that, for various reasons, have not yet made it into DaCe main.

The process for updating this branch is as follows; there are no exceptions:
- … `version.py` file. For `next` we are using the epoch 43, `cartesian` would use 42. The date is used as the version number. Thus the version (for `next`) would look something like `'43!YYYY.MM.DD'`.
- … `gt4py-next-integration`).
- … `__gt4py-next-integration_YYYY_MM_DD` and push it as well.

Afterwards you have to update GT4Py's `pyproject.toml` file:
1. Update the version requirement of DaCe in the `dace-next` group at the beginning of the file to the version you just created, i.e. change it to `dace==43!YYYY.MM.DD`.
2. Update the source in the `uv` specific parts of the file; there you have to change the source to the new tag you have just created.
3. Update the uv lock by running `uv sync --extra next --group dace-next`; if you have installed the pre-commit hooks then this will be done automatically.

NOTE: Once PR#2423 has been merged the second step, i.e. adapting the tag in the `uv` specific parts, is no longer needed.

On top of
`DaCe/main` we are using the following PRs:
- … `CompiledSDFG` refactoring
- … `MapFusionVertical`

No Longer Needed
- … DaCe.Config
- … `PruneSymbols`
- … `scope_tree_recursive()`
- … `MapFusion`
- … `other_subset` validation
- … `state_fission()`
- … `SubgraphView`
- … `try_initialize()`
- … edges in Map fusion
- … `MapFusionVertical`
- … `RedundantSecondArray`
- … `import` in `fast_call()`
- … `compiled_sdfg_call_hooks_manager`
- … `self._lastargs` Mutable. No longer needed since we now use GT4Py PR#2353 and DaCe PR#2206.
- … `self._lastargs` Mutable (Should be replaced by a more permanent solution).
- … `MapFusion*`
- … `AddThreadBlockMap`
- … `apply_transformation_once_everywhere()`
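The version and tag naming scheme described in the update process above (epoch 43 for `next`, the date as version number, and a `__gt4py-next-integration_YYYY_MM_DD` tag) can be derived from a date as follows. This is a small sketch: the helper names `integration_version` and `integration_tag` are hypothetical, and the example date is arbitrary.

```python
import datetime

NEXT_EPOCH = 43  # epoch used for `next`; `cartesian` would use 42


def integration_version(day, epoch=NEXT_EPOCH):
    """Version string of the form '43!YYYY.MM.DD' (hypothetical helper)."""
    return f"{epoch}!{day:%Y.%m.%d}"


def integration_tag(day):
    """Tag name of the form '__gt4py-next-integration_YYYY_MM_DD'."""
    return f"__gt4py-next-integration_{day:%Y_%m_%d}"


day = datetime.date(2025, 12, 18)  # arbitrary example date
version = integration_version(day)
tag = integration_tag(day)
requirement = f"dace=={version}"  # entry for the `dace-next` group
print(version, tag, requirement)
```

The part before `!` is the epoch segment of a PEP 440 version, so any version with epoch 43 sorts above every version with a lower epoch, regardless of the date part.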