@philip-paul-mueller philip-paul-mueller commented Apr 30, 2025

This is the PR/branch that GT4Py.Next uses to pull DaCe.
It is essentially DaCe main together with our fixes that, for various reasons, have not yet made it into DaCe main.

The process for updating this branch is as follows; there are no exceptions:

  • You start with current DaCe main.
  • Then you include the PR that enables the automatic Python index update, by squash-merging it.
  • Then squash-merge the PRs that are listed below. For each one, first check whether it has been merged into DaCe proper and, if so, remove it from the list.
  • Then update the version.py file. For next we use the epoch 43 (cartesian would use 42), and the date serves as the version number. Thus the version (for next) looks something like: '43!YYYY.MM.DD'.
  • Force push your changes to this branch (gt4py-next-integration).
  • Create a tag with the pattern __gt4py-next-integration_YYYY_MM_DD and push it as well.
  • Make sure that the workflow has been triggered.
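
The version string and the tag name from the steps above can be derived from a single date. The following is a minimal sketch; the helper names `integration_version` and `integration_tag` are made up for illustration and are not part of DaCe or GT4Py, and zero-padded month and day are assumed from the `YYYY.MM.DD` pattern.

```python
from datetime import date

NEXT_EPOCH = 43  # epoch used for GT4Py.Next (cartesian would use 42)


def integration_version(day: date, epoch: int = NEXT_EPOCH) -> str:
    """Build a version string of the form '43!YYYY.MM.DD' (hypothetical helper)."""
    return f"{epoch}!{day:%Y.%m.%d}"


def integration_tag(day: date) -> str:
    """Build a tag of the form '__gt4py-next-integration_YYYY_MM_DD' (hypothetical helper)."""
    return f"__gt4py-next-integration_{day:%Y_%m_%d}"


day = date(2025, 12, 17)
print(integration_version(day))  # 43!2025.12.17
print(integration_tag(day))      # __gt4py-next-integration_2025_12_17
```

Deriving both strings from the same date keeps the version in version.py and the pushed tag in sync.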

Afterwards you have to update GT4Py's pyproject.toml file.
For this you have to update the version requirement of DaCe in the dace-next group at the beginning of the file to the version you just created, i.e. change it to dace==43!YYYY.MM.DD.
Then you have to update the source in the uv-specific parts of the file; there you have to change the source to the new tag you have just created.
Then you have to update the uv lock file by running uv sync --extra next --group dace-next. If you have installed the pre-commit hooks, this will be done automatically.
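
The pin update described above amounts to a text substitution. The sketch below shows one way to do it, assuming the pin appears literally as `dace==43!YYYY.MM.DD`; the function name `bump_dace_pin` and the sample line are hypothetical and do not reflect the actual layout of GT4Py's pyproject.toml. It does not touch the uv source entry or the lock file.

```python
import re

# Matches a pin such as dace==43!2025.11.26 (epoch 43, date-based version).
_DACE_PIN = re.compile(r"dace==43!\d{4}\.\d{2}\.\d{2}")


def bump_dace_pin(pyproject_text: str, new_version: str) -> str:
    """Replace every 'dace==43!YYYY.MM.DD' pin with the given version (hypothetical helper)."""
    return _DACE_PIN.sub(f"dace=={new_version}", pyproject_text)


# Hypothetical one-line excerpt, only for demonstration.
sample = 'dace-next = ["dace==43!2025.11.26"]'
print(bump_dace_pin(sample, "43!2025.12.17"))
```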

NOTE: Once PR#2423 has been merged, the second step, i.e. adapting the tag in the uv-specific parts, is no longer needed.

On top of DaCe/main we are using the following PRs:

No Longer Needed

@philip-paul-mueller philip-paul-mueller marked this pull request as draft April 30, 2025 10:04
philip-paul-mueller added a commit to GridTools/gt4py that referenced this pull request Apr 30, 2025
Instead of pulling directly from the official DaCe repo, we now (for the
time being) pull from [this
PR](GridTools/dace#1).
This became necessary as we have a lot of open PRs in DaCe and need some
custom fixes (that cannot be merged into DaCe in their current form).
In the long term, however, we should switch back to the main DaCe repo.
@philip-paul-mueller philip-paul-mueller force-pushed the gt4py-next-integration branch 3 times, most recently from 964e84b to 2d85437 Compare May 26, 2025 05:22
@philip-paul-mueller philip-paul-mueller force-pushed the gt4py-next-integration branch 2 times, most recently from 88c99f4 to d779cd1 Compare June 10, 2025 11:50
@edopao edopao force-pushed the gt4py-next-integration branch from d779cd1 to 4f40029 Compare June 12, 2025 12:46
@edopao edopao force-pushed the gt4py-next-integration branch from 87c77ef to c2a4e42 Compare June 27, 2025 14:08
@edopao edopao force-pushed the gt4py-next-integration branch from 178037a to 9114985 Compare July 14, 2025 08:42
@philip-paul-mueller philip-paul-mueller changed the title Do Not Merge: Integration Branch for GT4Py Do Not Merge: Integration Branch for GT4Py Next Jul 15, 2025
@philip-paul-mueller philip-paul-mueller force-pushed the gt4py-next-integration branch 4 times, most recently from 33b63a1 to 2417e09 Compare July 21, 2025 07:42
@philip-paul-mueller philip-paul-mueller force-pushed the gt4py-next-integration branch 4 times, most recently from 3472895 to bed3b0e Compare July 24, 2025 07:24
@philip-paul-mueller philip-paul-mueller force-pushed the gt4py-next-integration branch 4 times, most recently from a313f6a to 6ff7bf1 Compare October 30, 2025 10:19
@philip-paul-mueller philip-paul-mueller force-pushed the gt4py-next-integration branch 2 times, most recently from 4f9c76f to 7a0d751 Compare November 5, 2025 06:56
@philip-paul-mueller philip-paul-mueller force-pushed the gt4py-next-integration branch 2 times, most recently from 85080da to 3eec6f6 Compare November 26, 2025 12:09
affifboudaoud and others added 9 commits December 2, 2025 09:38
…cl#2164)

# Pull Request: Machine Learning Integration for DaCe

## Overview

This PR adds comprehensive machine learning capabilities to DaCe through
three tightly integrated components:

1. **Automatic Differentiation (AD)** - Reverse-mode gradient
computation for SDFGs
2. **ONNX Integration** - Import and execute neural network models
3. **PyTorch Integration** - Bidirectional interoperability with
PyTorch's autograd system

Together, these components enable DaCe to optimize and accelerate
machine learning workloads, particularly neural network training and
inference.

## High-Level Architecture

```
PyTorch Model
     ↓
  ONNX Export
     ↓
DaCe SDFG (Forward)
     ↓
Automatic Differentiation
     ↓
DaCe SDFG (Backward)
     ↓
Compiled Code Generation
     ↓
PyTorch Operator (with Autograd)
```

## Component 1: Automatic Differentiation (`dace/autodiff/`)

### Purpose

Provides **reverse-mode automatic differentiation** for SDFGs, enabling
gradient computation for any DaCe program. This is the foundation for
neural network training and gradient-based optimization.

### Key Capabilities

- **Full SDFG Support**: Differentiates maps, tasklets, nested SDFGs,
loops, and library nodes
- **Control Flow**: Handles loops (LoopRegion) and conditionals
- **ONNX Operations**: 50+ backward implementations for ONNX operators
- **Data Forwarding**: Flexible strategies (store vs. recompute) for
memory/compute tradeoffs
- **Extensible Registry**: Plugin-based system for adding backward rules

### Core Algorithm

1. **Forward Pass Execution**: Run original computation and identify
required intermediates
2. **Backward Pass Generation**: Traverse computation graph in reverse,
accumulating gradients
3. **Node Reversal**: Each forward node (Map, Tasklet, ONNXOp) has a
registered backward implementation
4. **Gradient Accumulation**: Use write-conflict resolution (WCR) for
multi-path gradients
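
The four steps above follow the usual reverse-mode pattern. As a rough illustration only, here is a tiny scalar version in plain Python; it is not the SDFG-based implementation in `dace/autodiff/`, and the `Var` class and `backward` function are invented for this sketch. The `+=` on `grad` plays the role that a sum write-conflict resolution plays when several paths contribute to the same gradient.

```python
class Var:
    """Scalar value that records how it was computed (illustrative only)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuples of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))


def backward(output):
    """Traverse the recorded graph in reverse and accumulate gradients."""
    order, seen = [], set()

    def visit(v):
        if id(v) in seen:
            return
        seen.add(id(v))
        for parent, _ in v.parents:
            visit(parent)
        order.append(v)  # parents are appended before their consumers

    visit(output)
    output.grad = 1.0
    for v in reversed(order):
        for parent, local in v.parents:
            parent.grad += local * v.grad  # accumulation over multiple paths


x = Var(3.0)
y = Var(2.0)
z = x * y + x  # x reaches z along two paths
backward(z)
print(x.grad, y.grad)  # 3.0 3.0
```

Here `x` contributes to `z` both through the product and directly, so its gradient is the sum of the two path contributions, `y + 1`.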

### Key Files

| File | Lines | Purpose |
|------|-------|---------|
| `backward_pass_generator.py` | ~800 | Core AD engine that orchestrates backward pass generation |
| `implementations/onnx_ops.py` | ~2000 | Backward implementations for 50+ ONNX operations |
| `implementations/dace_nodes.py` | ~600 | Backward rules for core SDFG elements (Tasklet, Map, etc.) |
| `data_forwarding/manager.py` | ~300 | Store vs. recompute strategy coordination |


---

## Component 2: ONNX Integration (`dace/libraries/onnx/`)

### Purpose

Enables **importing and executing ONNX neural network models** within
DaCe. Converts ONNX graphs to optimized DaCe SDFGs for efficient
execution on CPU/GPU.

### Key Capabilities

- **Model Import**: Load ONNX models from files or protobuf objects
- **100+ Operations**: Dynamically generated node classes for all ONNX
ops
- **Shape Inference**: Automatic symbolic and concrete shape computation
- **Multi-Strategy Implementations**: Pure (correctness), optimized
(performance), hardware-specific
- **Type Safety**: Schema-based validation and type checking

### Core Architecture

**Dynamic Node Generation**:
- Registry system generates Python classes for all ONNX operations at
import time
- Each operation has schema, properties, connectors, and implementations
- Example: `ONNXConv`, `ONNXMatMul`, `ONNXSoftmax` (100+ generated
classes)
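
Generating one class per operation from a schema table can be sketched with Python's built-in `type()`. This is only an illustration of the idea: the `SCHEMAS` dictionary, `ONNXOpBase`, and `REGISTRY` are hypothetical stand-ins, and the real registry in `nodes/onnx_op_registry.py` also handles properties, connectors, and implementations.

```python
# Hypothetical, heavily simplified schemas: op name -> (input names, output names).
SCHEMAS = {
    "Conv": (("X", "W", "B"), ("Y",)),
    "MatMul": (("A", "B"), ("Y",)),
    "Softmax": (("input",), ("output",)),
}


class ONNXOpBase:
    """Common base class for the generated node classes (illustrative only)."""

    inputs = ()
    outputs = ()

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"{type(self).__name__}({self.name!r})"


# Create one class per schema entry and register it under its generated name.
REGISTRY = {}
for op_name, (ins, outs) in SCHEMAS.items():
    cls = type(f"ONNX{op_name}", (ONNXOpBase,), {"inputs": ins, "outputs": outs, "op_type": op_name})
    REGISTRY[cls.__name__] = cls

node = REGISTRY["ONNXConv"]("conv1")
print(node, node.inputs, node.outputs)
```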

**Implementation Strategies**:
1. **Pure Implementations** (`pure_implementations.py`): Reference
implementations in Python/NumPy
2. **Optimized Implementations** (`img_op_implementations.py`):
Hand-crafted SDFGs for performance
3. **Hardware-Specific**: Future GPU/FPGA specialized implementations

**Import Pipeline**:
```
ONNX Model → Validation → Shape Inference → Simplification → SDFG Construction → Compilation
```

### Key Files

| File | Lines | Purpose |
|------|-------|---------|
| `onnx_importer.py` | 711 | Main entry point, orchestrates import pipeline |
| `op_implementations/pure_implementations.py` | 3052 | Reference implementations for 40+ operations |
| `nodes/onnx_op_registry.py` | 325 | Dynamic node class generation |
| `schema.py` | 390 | Type system and validation |
| `shape_inference/symbolic_shape_infer.py` | 1976 | Symbolic shape inference (Microsoft-sourced) |

---

## Component 3: PyTorch Integration (`dace/libraries/torch/`)

### Purpose

Provides **bidirectional integration** between PyTorch and DaCe. Enables
optimizing PyTorch models with DaCe while maintaining PyTorch's autograd
compatibility.

### Key Capabilities

- **Model Optimization**: Convert `torch.nn.Module` to optimized DaCe
SDFGs
- **Autograd Integration**: Backward pass generation integrates with
PyTorch's autograd
- **Dual Dispatch**: C++ extension (performance) or CTypes (flexibility)
- **Zero-Copy Tensors**: DLPack protocol for efficient memory sharing
- **Training Support**: Full forward + backward pass compilation

### Core Architecture

**Integration Flow**:
```
PyTorch Model → ONNX Export → DaCe SDFG → Backward Generation → Compilation → PyTorch Operator
```

**Dispatcher Strategies**:
1. **C++ Extension** (`cpp_torch_extension.py`): Native PyTorch operator
with autograd
   - High performance
   - 64 parameter limit
   - Slower compilation
2. **CTypes Module** (`ctypes_module.py`): Pure Python dispatcher
   - Unlimited parameters
   - Faster compilation
   - Slight overhead
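
The trade-off between the two dispatchers could be expressed as a simple selection rule. The function below is a hypothetical sketch, not the selection logic used by the PR; it assumes the 64-parameter limit is inclusive (64 parameters still fit the C++ extension), which the description does not state explicitly.

```python
CPP_EXTENSION_MAX_PARAMS = 64  # limit stated above; inclusiveness is an assumption


def choose_dispatcher(num_params: int, prefer_fast_compile: bool = False) -> str:
    """Pick a dispatcher name from the parameter count (hypothetical helper)."""
    if num_params > CPP_EXTENSION_MAX_PARAMS or prefer_fast_compile:
        return "ctypes"  # unlimited parameters, faster compilation, slight overhead
    return "cpp_extension"  # high performance, slower compilation


print(choose_dispatcher(12))   # cpp_extension
print(choose_dispatcher(200))  # ctypes
```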

**Zero-Copy Memory Sharing**:
- DLPack protocol enables PyTorch tensors to view DaCe memory without
copying
- Bidirectional: DaCe → PyTorch (outputs) and PyTorch → DaCe (inputs)

### Key Files

| File | Lines | Purpose |
|------|-------|---------|
| `dispatchers/cpp_torch_extension.py` | 717 | C++ code generation for PyTorch operators |
| `dispatchers/ctypes_module.py` | 230 | CTypes-based dispatcher |
| `dlpack.py` | 199 | Zero-copy tensor sharing via DLPack |
| `environments/pytorch_env.py` | 94 | CMake build configuration |


---

## How Components Work Together

### Example: Training a PyTorch Model with DaCe

```python
import torch
from dace.frontend.python import DaceModule

# 1. Define PyTorch model
model = MyNeuralNetwork()
optimizer = torch.optim.Adam(model.parameters())

# 2. Wrap with DaCe (compiles on first call)
dace_model = DaceModule(model, dummy_inputs, backward=True)

# 3. Training loop (standard PyTorch code)
for inputs, labels in dataloader:
    optimizer.zero_grad()
    outputs = dace_model(inputs)  # DaCe-optimized forward pass
    loss = criterion(outputs, labels)
    loss.backward()  # DaCe-optimized backward pass
    optimizer.step()
```

**What Happens Internally**:
1. **First Call**: PyTorch model → ONNX export → DaCe SDFG (via ONNX
integration)
2. **Backward Generation**: Forward SDFG → Backward SDFG (via autodiff)
3. **Compilation**: Both SDFGs compiled to optimized code
4. **Dispatcher**: C++ extension or CTypes wrapper created
5. **Forward Pass**: DaCe executes optimized forward computation
6. **Backward Pass**: DaCe executes generated backward computation
7. **Gradient Return**: Gradients flow back to PyTorch optimizer

### Data Flow

```
PyTorch Tensor (input)
    ↓ Zero-copy (DLPack)
DaCe Array
    ↓ Optimized SDFG Execution
DaCe Array (output)
    ↓ Zero-copy (DLPack)
PyTorch Tensor (output)
    ↓ loss.backward()
PyTorch Tensor (grad_output)
    ↓ Zero-copy (DLPack)
DaCe Array (backward pass input)
    ↓ Backward SDFG Execution
DaCe Array (grad_input)
    ↓ Zero-copy (DLPack)
PyTorch Tensor (grad_input)
```

---

## Testing Strategy

### Test Organization

```
tests/
├── autodiff/                       # AD correctness tests
│   ├── test_single_state.py        # Basic AD operations
│   └── torch/                      # PyTorch integration tests
│       ├── test_training.py        # End-to-end training
│       ├── test_bert_encoder_backward.py    # BERT model
│       └── test_llama_decoder_backward.py   # LLaMA model
│
├── onnx/                          # ONNX import tests
│   ├── test_python_frontend.py    # Basic operations
│   ├── test_bert_subgraphs.py     # Real model subgraphs
│   └── test_input_outputs.py      # I/O handling
│
├── torch/                          # PyTorch integration tests
│   ├── test_lenet.py               # Simple CNN
│   ├── test_bert_encoder.py        # Transformer encoder
│   └── test_llama_decoder.py       # Decoder architecture
│
└── npbench/                        # AD tests on NPBench kernels

```

### Test Coverage

| Component | Test Files | Coverage |
|-----------|-----------|----------|
| Autodiff Core | 15+ files | Tasklets, maps, loops, nested SDFGs |
| ONNX Integration | 20+ files | Import, execution, type handling |
| PyTorch Integration | 15+ files | Forward, backward, training loops |

### Running Tests

```bash
# All basic tests (excluding hardware-specific)
pytest -m "(autodiff or torch or onnx) and not long" tests/

# AD tests only
pytest tests/autodiff/

# ONNX tests only
pytest tests/onnx/

# PyTorch tests only
pytest tests/torch/
```

---

## Known Limitations and Future Work

### Current Limitations

1. **Recompute Strategy**: Experimental, not production-ready
2. **Control Flow**: Conditionals are inlined into state machine (not
reversed as ControlFlowRegions)
3. **Second-Order Gradients**: Not yet tested


---

## Documentation

Each component has detailed design documentation:

- [`dace/autodiff/autodiff.md`](dace/autodiff/autodiff.md) - Complete AD
system design
- [`dace/libraries/onnx/onnx.md`](dace/libraries/onnx/onnx.md) - ONNX
integration architecture
- [`dace/libraries/torch/torch.md`](dace/libraries/torch/torch.md) -
PyTorch integration details

These documents provide:
- Detailed component descriptions
- Algorithm explanations
- Code walkthrough
- Extension points
- Implementation notes

---

## Impact on DaCe

### Code Additions

| Component | Lines of Code | Files |
|-----------|--------------|-------|
| Autodiff | ~8,000 | 15+ files |
| ONNX | ~7,000 | 20+ files |
| PyTorch | ~1,500 | 10+ files |
| **Total** | **~16,500** | **45+ files** |

### Dependencies

New dependencies (already in `setup.py`):
- `onnx` - ONNX model format
- `onnxsim` - ONNX graph simplification
- `torch` - PyTorch framework (optional)
- `protobuf` - Protocol buffers (for ONNX)
- `jax` - For gradient numerical validation tests
- `transformers` - For testing the PyTorch/ONNX frontends
- `efficientnet_pytorch` - For testing EfficientNet

---

---------

Co-authored-by: Oliver Rausch <oliverrausch99@gmail.com>
Modified the reloading scheme used by `ReloadableDLL`.
If the library (of the compiled SDFG) is already loaded through another
instance of `CompiledSDFG`, then `ReloadableDLL` copies the SDFG
library and tries to load the copy, repeating until it finds a name that
is free.
In ICON4Py we noticed that this sometimes leads to a segmentation fault
on Linux, but not on macOS.
We traced the main issue down to the fact that `ReloadableDLL` created a
copy of the SDFG library without checking if the new name is already
used, instead the file is simply overwritten.

The new scheme changes this slightly, in the following ways:
- If the new name is already taken, then no copy is performed and the
class tries to use the file that already exists.
- Instead of copying library `n - 1` to `n` it always makes a copy from
the initial library.
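
The two rules above can be sketched as follows. This is an illustration under assumptions, not the actual `ReloadableDLL` code: the function `pick_library_copy`, the `is_loaded` callback, the `_<n>` naming scheme, and the retry limit are all invented for the example.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def pick_library_copy(original: Path, is_loaded: Callable[[Path], bool], max_tries: int = 64) -> Path:
    """Return a library path that is not loaded yet (hypothetical helper)."""
    if not is_loaded(original):
        return original
    for n in range(max_tries):
        candidate = original.with_name(f"{original.stem}_{n}{original.suffix}")
        if is_loaded(candidate):
            continue  # name is in use by another instance, try the next one
        if not candidate.exists():
            # Always copy from the initial library, never from copy n - 1.
            shutil.copy(original, candidate)
        # If the file already exists it is reused as-is, not overwritten.
        return candidate
    raise RuntimeError(f"no free library name found after {max_tries} tries")


# Small demonstration with a placeholder file instead of a real shared library.
workdir = Path(tempfile.mkdtemp())
lib = workdir / "libsdfg.so"
lib.write_bytes(b"compiled sdfg")
loaded = {lib}
first_copy = pick_library_copy(lib, loaded.__contains__)
print(first_copy.name)  # libsdfg_0.so
```

Skipping the copy when the target file exists avoids overwriting a file that another process may already have mapped, which is the situation the description identifies as the main issue.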

---------

Co-authored-by: Philipp Schaad <schaad.phil@gmail.com>
Updated ignored paths and build notification settings.
Increased pytest timeout from 300 to 600 seconds.
## Refactor `dace/data.py` into `dace/data/` package

### Summary

This PR refactors the monolithic `dace/data.py` file into a modular
`dace/data/` package with separate files for different functionality,
improving code organization and maintainability.

### Changes

- [x] **`dace/data/core.py`**: Core data descriptor classes (`Data`,
`Scalar`, `Array`, `ContainerArray`, `Stream`, `Structure`, `View`,
`Reference` and their subclasses)
- [x] **`dace/data/tensor.py`**: Tensor/sparse tensor support (`Tensor`,
`TensorIndex*` classes)
- [x] **`dace/data/creation.py`**: Data descriptor creation functions
(`create_datadescriptor`, `make_array_from_descriptor`,
`make_reference_from_descriptor`)
- [x] **`dace/data/ctypes_interop.py`**: Ctypes interoperability
(`make_ctypes_argument`)
- [x] **`dace/data/ml.py`**: ML-related descriptors (`ParameterArray`)
- [x] **`dace/data/__init__.py`**: Re-exports all public API for
backward compatibility
- [x] **`dace/utils.py`**: Utility functions (`find_new_name`,
`deduplicate`, `prod`)
- [x] **`dace/properties.py`**: Updated to handle circular import
gracefully
- [x] **`dace/autodiff/library/library.py`**: Updated to import
`ParameterArray` from the new location
- [x] **Deleted** old `dace/data.py` file
- [x] **Removed** `Number` and `ArrayLike` from `dace/data/__init__.py`
(other places import directly)
- [x] **Moved** `_prod` to `dace/utils.py` as `prod` (kept `_prod`
export for backward compat)
- [x] **Fixed** broken imports in `data_report.py`,
`data_layout_tuner.py`, and `cutout.py`

### Backward Compatibility

All public APIs are re-exported from `dace.data`, ensuring backward
compatibility with existing code.
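
The re-export pattern behind this guarantee can be demonstrated without DaCe installed. The sketch below builds a throwaway package in memory; the names `demo_pkg` and `demo_pkg.core` are hypothetical stand-ins for `dace.data` and `dace.data.core`, and the assignment to `pkg.Array` mimics what a `from .core import Array` line in `__init__.py` would achieve.

```python
import sys
import types

# Stand-in for a submodule such as a package's core.py (hypothetical names).
core = types.ModuleType("demo_pkg.core")


class Array:
    """Placeholder for a data descriptor class."""


core.Array = Array

# Stand-in for the package's __init__.py, which re-exports the public names.
pkg = types.ModuleType("demo_pkg")
pkg.core = core
pkg.Array = core.Array

sys.modules["demo_pkg"] = pkg
sys.modules["demo_pkg.core"] = core

# Old-style and new-style imports resolve to the same class object.
from demo_pkg import Array as OldStyleArray
from demo_pkg.core import Array as NewStyleArray

print(OldStyleArray is NewStyleArray)  # True
```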

<!-- START COPILOT CODING AGENT SUFFIX -->



<details>

<summary>Original prompt</summary>

> 
> ----
> 
> *This section details on the original issue you should resolve*
> 
> <issue_title>Refactor `dace/data.py`</issue_title>
> <issue_description>`data.py` is a monolithic file containing classes
for core data containers (Data, Scalar, Array, Stream, View, Reference,
and their subclasses `*{View, Reference}`); functionality to get data
descriptors from arbitrary objects; derived objects for Tensors and
sparse tensors; and other functions.
> 
> This issue will be resolved once `data.py` is refactored to a
`dace/data/*` folder, which will contain separate files for:
> 1. core descriptor classes
> 2. structures (the Structure class and similar functionality)
> 3. tensors/sparse tensors
> 4. descriptor creation
> 5. ML-related data descriptors, such as parameter arrays (see
`dace/autodiff/library/library.py`)
> 6...N. Other functions and classes categorized by their semantic
meaning.
> 
> The code for `dace/data/*` will be refactored out of `data.py` (which
should not exist at the end of this issue), `dtypes.py` (which may exist
but be shorter), and other files that contain data descriptors
(subclasses of Data/Array/Stream/Structure/View/Reference, such as
ParameterArray. Try to find all such subclasses in the codebase barring
tests/* and samples/*).
> 
> Lastly, utility functions in `data.py` and `dtypes.py` (only those two
files for this issue), such as `find_new_name` from data.py and
`deduplicate` from dtypes.py, should find themselves in a new
`dace/utils.py` file.</issue_description>
> 
> ## Comments on the Issue (you are @copilot in this section)
> 
> <comments>
> </comments>
> 


</details>

- Fixes spcl#2244

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tbennun <8348955+tbennun@users.noreply.github.com>
…to seq. maps inside GPU kernels or gpu dev. maps (spcl#2088)

GPU codegen crashes and generates incorrect code with dynamic inputs to
seq. maps inside GPU kernels or gpu dev. maps

---------

Co-authored-by: alexnick83 <31545860+alexnick83@users.noreply.github.com>
Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>
…pcl#2246)

Updated the documentation for proposed pass decomposition, including
changes to pass names and descriptions for clarity.
@edopao edopao force-pushed the gt4py-next-integration branch from 3eec6f6 to ab9eaef Compare December 11, 2025 13:27
@philip-paul-mueller philip-paul-mueller force-pushed the gt4py-next-integration branch 2 times, most recently from e552808 to d4db8e7 Compare December 17, 2025 07:26
commit ecb2785
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Wed Dec 17 08:19:40 2025 +0100

    Updated the dace updater workflow file.

commit f3198ef
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Wed Dec 17 07:41:26 2025 +0100

    Made the update point to the correct repo.

commit 96f963a
Merge: 8b7cce5 387f1e8
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Wed Dec 17 07:37:48 2025 +0100

    Merge remote-tracking branch 'spcl/main' into automatic_gt4py_deployment

commit 8b7cce5
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Mon Dec 1 09:18:22 2025 +0100

    Restored the original workflow files.

commit 362ab70
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Mon Dec 1 07:41:40 2025 +0100

    Now it has run once, so let's make it less runnable.

commit 81b8cfa
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Mon Dec 1 07:39:09 2025 +0100

    Made it run always.

commit 6d71466
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Mon Dec 1 07:38:11 2025 +0100

    Small update.

commit eb31e6c
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Nov 21 15:23:33 2025 +0100

    Empty commit in the branch containing the workflow file.

commit 2970a75
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Nov 21 15:21:09 2025 +0100

    Next step.

commit f5d3d9d
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Nov 21 15:17:56 2025 +0100

    Let's disable everything.

commit 211e415
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Nov 21 15:10:43 2025 +0100

    Disabled the kickstarter.

commit d012c26
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Nov 21 15:05:38 2025 +0100

    Updated everything.
commit 2a89832
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Mon Oct 27 09:56:16 2025 +0100

    Fixed GPU_TX_MARKER test

commit c240128
Merge: 10160bc e38d006
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Fri Oct 24 18:29:55 2025 +0200

    Merge remote-tracking branch 'upstream/main' into nvtx_ranges

commit 10160bc
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Fri Oct 24 18:27:58 2025 +0200

    Fix instrumentation for copies

commit d14093c
Author: Ioannis Magkanaris <ioannis.magkanaris@cscs.ch>
Date:   Mon Oct 6 16:16:43 2025 +0200

    Make pre-commit happy

commit 68942a3
Merge: a3063e5 b415f62
Author: Ioannis Magkanaris <ioannis.magkanaris@cscs.ch>
Date:   Tue Sep 30 17:11:35 2025 +0200

    Merge remote-tracking branch 'upstream/main' into nvtx_ranges

commit a3063e5
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Tue Sep 23 18:03:26 2025 +0200

    Working version of nvtx markers with allocations

commit 6788b97
Author: Ioannis Magkanaris <ioannis.magkanaris@cscs.ch>
Date:   Tue Sep 23 13:42:41 2025 +0200

    Updated functions

commit 455ad38
Author: Ioannis Magkanaris <ioannis.magkanaris@cscs.ch>
Date:   Tue Sep 23 13:20:51 2025 +0200

    Added marker on allocations as well

commit 80ce99c
Author: Ioannis Magkanaris <ioannis.magkanaris@cscs.ch>
Date:   Wed Aug 20 19:02:36 2025 +0300

    Avoid profiling tasklets

commit 0314386
Author: Ioannis Magkanaris <ioannis.magkanaris@cscs.ch>
Date:   Wed Aug 20 19:02:28 2025 +0300

    Fix get_latest_report_path in case there's no report

commit aad5e87
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Wed Aug 20 10:05:16 2025 +0200

    Remove import of deleted file

commit a3ff00e
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Tue Aug 19 18:16:11 2025 +0200

    Revert "Improved GPU Copy (spcl#1976)"

    This reverts commit bc83c47.

commit ea5f6ff
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Tue Aug 19 18:14:35 2025 +0200

    Make format happy

commit b1ea9af
Merge: bbc1faf aabbe48
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Tue Aug 19 19:12:43 2025 +0300

    Merge branch 'main' into nvtx_ranges

commit bbc1faf
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Tue Aug 19 18:07:12 2025 +0200

    Format a bit better with dace.instrument

commit eea658f
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Tue Aug 19 18:04:54 2025 +0200

    Fixes in gpu_tx_markers.py

commit 2f43f7a
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Tue Aug 19 18:04:43 2025 +0200

    Remove instrument_sdfg

commit 0fdb4df
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Tue Aug 19 17:28:57 2025 +0200

    Small refactoring of if statements in gpu_tx_markers.py

commit 73c52bf
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Tue Aug 19 17:26:02 2025 +0200

    Added on_sdfg_init/exit_begin/end functions

commit ff70f2f
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Tue Aug 19 16:32:34 2025 +0200

    Replaced is with ==

commit 3d626e0
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Tue Aug 19 16:31:04 2025 +0200

    Fix local and global streams

commit 209860d
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Tue Aug 19 16:27:04 2025 +0200

    Improve _is_sdfg_in_device_code

commit bc83c47
Author: Philip Müller <147368808+philip-paul-mueller@users.noreply.github.com>
Date:   Mon Jun 2 15:58:08 2025 +0200

    Improved GPU Copy (spcl#1976)

    Before some 2D copies (especially if they had FORTRAN order) were turned
    into Maps, see [issue#1953](spcl#1953).
    This PR modifies the code generator in such a way that such copies are
    now handled.

    There is some legacy stuff that should also be looked at.

    ---------

    Co-authored-by: Philip Mueller <philip.paul.mueller@bluemain.ch>
    Co-authored-by: Tal Ben-Nun <tbennun@gmail.com>

commit df99571
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Tue Aug 19 17:34:12 2025 +0300

    Apply suggestion from @tbennun

    Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>

commit da00f21
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Mon May 19 17:49:47 2025 +0200

    Avoid pushing rocTX markers before initializing HIP since it doesn't work

commit a39308b
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Fri May 16 15:13:31 2025 +0200

    Fix on_copy and on_scope for GPU_TX_MARKERS

commit 2d554fa
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Thu May 15 15:20:05 2025 +0200

    Removed preprocessor checks by properly placing ranges in NestedSDFGs and small fixes for CPU wrapper includes

commit 5937a15
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Wed May 14 11:33:02 2025 +0200

    Refactored a bit GPUTXMarkerProvider

commit 9e8ec9e
Author: Ioannis Magkanaris <ioannis.magkanaris@cscs.ch>
Date:   Wed May 14 10:52:26 2025 +0200

    Addressed PR comments for checking if the instrumentation is enabled

commit c3f1932
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Mon May 12 17:29:30 2025 +0200

    Small fixes and cleanups

commit 366721f
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Mon May 12 17:23:21 2025 +0200

    Fix order of imports in gpu_events.py

commit 8ea4327
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Mon May 12 17:04:56 2025 +0200

    Add markers for different SDFGs and states

commit 22b372e
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Mon May 12 09:45:20 2025 +0200

    Revert changes in GPU_Event provider

commit e5adaef
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Mon May 12 09:34:34 2025 +0200

    Allow building with HIP even if rocTX is not found

commit b30f4a2
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Fri May 9 17:20:34 2025 +0200

    Fix formatting

commit 747f357
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Fri May 9 17:14:17 2025 +0200

    Made test NVTX agnostic and updated documentation

commit 646ca90
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Fri May 9 17:05:10 2025 +0200

    Use same checks for enabling roctx as CMake

commit c28036b
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Fri May 9 17:00:19 2025 +0200

    Fix compilation for AMD gpu

commit 855304d
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Thu May 8 11:58:00 2025 +0200

    Fix library names

commit 9df4f73
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Thu May 8 11:36:29 2025 +0200

    Trying to use roctx

commit a55aeb7
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Wed May 7 17:58:37 2025 +0200

    Make formatting happy

commit a8bcadf
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Wed May 7 17:50:10 2025 +0200

    Renamed NVTX to GPU_TX_MARKERS and added note for AMD GPUs

commit 7337233
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Mon May 5 17:30:35 2025 +0200

    Changed nvtxRangePushA to nvtxRangePush

commit 74c9117
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Mon May 5 17:23:42 2025 +0200

    Fix copyright and GPU test

commit 989bc32
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Mon May 5 17:12:59 2025 +0200

    Make formatter happy

commit 4f57297
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Mon May 5 17:09:58 2025 +0200

    Remove NVTX markers from LIKWID since LIKWID has its own markers

commit a4d2ff8
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Mon May 5 17:08:08 2025 +0200

    Improved NVTX markers in likwid

commit 1e71171
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Mon May 5 15:42:13 2025 +0200

    Update NVTX Provider imports

commit 438090f
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Mon May 5 15:41:56 2025 +0200

    Update documentation

commit 89b7864
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Mon May 5 15:41:48 2025 +0200

    Small fix of whiteline in framecode

commit ef5355b
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Mon May 5 15:38:02 2025 +0200

    Refactored NVTX Instrumentation provider constructor and test for expected code

commit bbf1d32
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Mon May 5 15:37:16 2025 +0200

    Inherit LIKWID_GPU Instrumentation provider from NVTX as well

commit 90b50ac
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Fri May 2 18:29:07 2025 +0200

    Make GPUEventProvider inherit from NVTXProvider to enable the NVTX markers by default with it

commit c584255
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Fri May 2 18:01:31 2025 +0200

    Updated documentation

commit 04836fb
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Fri May 2 18:01:21 2025 +0200

    Moved the printing of NVTX range push and pop inside the NVTXProvider

commit f5240b2
Author: Ioannis Magkanaris <iomagkanaris@gmail.com>
Date:   Fri May 2 17:25:04 2025 +0200

    Added NVTX range in CPU wrapper for GPU kernel
commit 68ffa3b
Merge: c069546 d99ad29
Author: Philipp Schaad <schaad.phil@gmail.com>
Date:   Sun Nov 30 06:17:32 2025 -0600

    Merge branch 'main' into make_construct_args_public

commit c069546
Merge: 41902c3 408a481
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Tue Nov 4 11:22:14 2025 +0100

    Merge remote-tracking branch 'spcl/main' into make_construct_args_public

commit 41902c3
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Oct 31 16:01:26 2025 +0100

    Fixed a bug.

commit 65725f9
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Oct 31 15:15:03 2025 +0100

    This should be enough for bug compatibility.

commit daf90e9
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Oct 31 12:58:25 2025 +0100

    Updated the thing a bit more.

commit 2ddabbd
Merge: 4da0c4e b44aeb0
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Oct 31 12:54:19 2025 +0100

    Merge remote-tracking branch 'spcl/main' into make_construct_args_public

commit 4da0c4e
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Oct 31 12:53:48 2025 +0100

    Made some additional check.

commit 69960ce
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Oct 31 12:00:30 2025 +0100

    Forgot to do this.

commit 6e1a9ff
Merge: c1214fa 1bf2173
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Oct 31 11:25:46 2025 +0100

    Merge remote-tracking branch 'spcl/main' into make_construct_args_public

commit c1214fa
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Oct 31 09:50:41 2025 +0100

    Updated the tests and made it clear that you can not return a scalar from an SDFG.

commit 9397a23
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Oct 31 09:40:29 2025 +0100

    Implemented the proper handling of tuples of size one.

commit e8d909e
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Oct 31 09:30:48 2025 +0100

    Removed that stupid scalar return value feature that CAN NOT WORK.

    However, I saw that, under the hood, it also sometimes tests whether the argument is a pyobject.
    Since a pyobject is a pointer, returning it is possible, so I implemented it for that case.
    But the original check was again not implemented properly: it was only performed for managed return values, not for the case when the return value is passed as a regular argument.

commit ab110d2
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Oct 31 09:24:45 2025 +0100

    Updated the description.

commit 899b2a0
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Oct 31 09:24:32 2025 +0100

    Fixed some old stuff.

commit 7f17e13
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Oct 31 09:08:49 2025 +0100

    Fixed a bug, but in a way I do not like.

commit c2c1116
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Oct 31 08:40:47 2025 +0100

    Removed a misleading comment.

commit ded5df8
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Oct 31 08:04:38 2025 +0100

    Made some refactoring to remove some strange DaCe behaviour.

commit b029828
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Oct 31 08:02:28 2025 +0100

    Fixed an issue in safe_call

commit b09c9fc
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Oct 31 07:17:36 2025 +0100

    Included the first bunch of Tal's changes.

commit e138b06
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Thu Oct 30 15:12:23 2025 +0100

    Made the 'passed as positional and named argument'-error more explicit.

commit f901a3d
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Thu Oct 30 15:05:00 2025 +0100

    Fixed a bug in a unit test.

    Due to the refactoring, the case where a variable is passed both as a positional and as a named argument is now detected and asserted.
    This test, however, always passed `a` as a positional argument and, if `symbolic` is `True`, also as a named argument.

commit 767260d
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Thu Oct 30 14:19:44 2025 +0100

    Clarified a comment.

commit 2b8123a
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Thu Oct 30 13:56:20 2025 +0100

    Made the construct argument vector function public and also refactored some things.

commit 5b068e7
Author: Affifboudaoud <hk_boudaoud@esi.dz>
Date:   Sun Nov 23 22:50:46 2025 +0100

    Add visited set to avoid visiting same node multiple times

commit 7994132
Merge: b8a0fd7 387f1e8
Author: Philipp Schaad <schaad.phil@gmail.com>
Date:   Fri Dec 12 09:25:10 2025 +0100

    Merge branch 'main' into fix_block_size_config

commit b8a0fd7
Author: Edoardo Paone <edoardo.paone@cscs.ch>
Date:   Thu Dec 11 14:14:34 2025 +0100

    edit

commit c9f93fd
Author: Edoardo Paone <edoardo.paone@cscs.ch>
Date:   Thu Dec 18 00:00:20 2025 +0100

    fix state fusion for write-write hazard

@edopao edopao force-pushed the gt4py-next-integration branch from d4db8e7 to cd52c4b Compare December 18, 2025 08:33