diff --git a/README.md b/README.md index 3fbd2ccd..0f2f22bd 100644 --- a/README.md +++ b/README.md @@ -24,7 +24,7 @@ yet flexible open convention with the following systems in mind: - **Kernel libraries** - ship one wheel to support multiple frameworks, Python versions, and different languages. [[FlashInfer](https://docs.flashinfer.ai/)] - **Kernel DSLs** - reusable open ABI for JIT and AOT kernel exposure frameworks and runtimes. [[TileLang](https://tilelang.com/)][[cuteDSL](https://docs.nvidia.com/cutlass/latest/media/docs/pythonDSL/cute_dsl_general/compile_with_tvm_ffi.html)] -- **Frameworks and runtimes** - a uniform extension point for ABI-compliant libraries and DSLs. [[PyTorch](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-pytorch)][[JAX](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-jax)][[NumPy/CuPy](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-numpy)] +- **Frameworks and runtimes** - a uniform extension point for ABI-compliant libraries and DSLs. [[PyTorch](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-pytorch)][[JAX](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-jax)][[PaddlePaddle](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-paddle)][[NumPy/CuPy](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-numpy)] - **ML infrastructure** - out-of-box bindings and interop across languages. [[Python](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-python)][[C++](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-cpp)][[Rust](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-rust)] - **Coding agents** - a unified mechanism for shipping generated code in production. 
diff --git a/docs/concepts/tensor.rst b/docs/concepts/tensor.rst index d7f93430..2669f719 100644 --- a/docs/concepts/tensor.rst +++ b/docs/concepts/tensor.rst @@ -20,7 +20,7 @@ Tensor and DLPack At runtime, TVM-FFI often needs to accept tensors from many sources: -* Frameworks (e.g. PyTorch, JAX) via :py:meth:`array_api.array.__dlpack__`; +* Frameworks (e.g. PyTorch, JAX, PaddlePaddle) via :py:meth:`array_api.array.__dlpack__`; * C/C++ callers passing :c:struct:`DLTensor* `; * Tensors allocated by a library but managed by TVM-FFI itself. @@ -115,7 +115,7 @@ PyTorch Interop On the Python side, :py:class:`tvm_ffi.Tensor` is a managed n-dimensional array that: -* can be created via :py:func:`tvm_ffi.from_dlpack(ext_tensor, ...) ` to import tensors from external frameworks, e.g., :ref:`PyTorch `, :ref:`JAX `, :ref:`NumPy/CuPy `; +* can be created via :py:func:`tvm_ffi.from_dlpack(ext_tensor, ...) ` to import tensors from external frameworks, e.g., :ref:`PyTorch `, :ref:`JAX `, :ref:`PaddlePaddle `, :ref:`NumPy/CuPy `; * implements the DLPack protocol so it can be passed back to frameworks without copying, e.g., :py:func:`torch.from_dlpack`. The following example demonstrates a typical round-trip pattern: diff --git a/docs/get_started/quickstart.rst b/docs/get_started/quickstart.rst index 6d608e7e..f4ded6a6 100644 --- a/docs/get_started/quickstart.rst +++ b/docs/get_started/quickstart.rst @@ -27,7 +27,7 @@ This guide walks through shipping a minimal ``add_one`` function that computes TVM-FFI's Open ABI and FFI make it possible to **ship one library** for multiple frameworks and languages. We can build a single shared library that works across: -- **ML frameworks**, e.g. PyTorch, JAX, NumPy, CuPy, and others; +- **ML frameworks**, e.g. PyTorch, JAX, PaddlePaddle, NumPy, CuPy, and others; - **Languages**, e.g. C++, Python, Rust, and others; - **Python ABI versions**, e.g. one wheel that supports all Python versions, including free-threaded ones. 
@@ -37,7 +37,7 @@ We can build a single shared library that works across: - Python: 3.9 or newer - Compiler: C++17-capable toolchain (GCC/Clang/MSVC) - - Optional ML frameworks for testing: NumPy, PyTorch, JAX, CuPy + - Optional ML frameworks for testing: NumPy, PyTorch, JAX, CuPy, PaddlePaddle - CUDA: Any modern version (if you want to try the CUDA part) - TVM-FFI installed via: @@ -90,7 +90,7 @@ it also exports the function's metadata as a symbol ``__tvm_ffi__metadata_add_on The class :cpp:class:`tvm::ffi::TensorView` enables zero-copy interop with tensors from different ML frameworks: - NumPy, CuPy, -- PyTorch, JAX, or +- PyTorch, JAX, PaddlePaddle, or - any array type that supports the standard :external+data-api:doc:`DLPack protocol `. Finally, :cpp:func:`TVMFFIEnvGetStream` can be used in the CUDA code to launch kernels on the caller's stream. @@ -162,7 +162,7 @@ TVM-FFI integrates with CMake via ``find_package`` as demonstrated below: - Python version/ABI. They are not compiled or linked with Python and depend only on TVM-FFI's stable C ABI; - Languages, including C++, Python, Rust, or any other language that can interop with the C ABI; -- ML frameworks, such as PyTorch, JAX, NumPy, CuPy, or any array library that implements the standard :external+data-api:doc:`DLPack protocol `. +- ML frameworks, such as PyTorch, JAX, PaddlePaddle, NumPy, CuPy, or any array library that implements the standard :external+data-api:doc:`DLPack protocol `. .. _sec-use-across-framework: @@ -228,6 +228,18 @@ After installation, ``add_one_cuda`` can be registered as a target for JAX's ``f )(x) print(y) +.. _ship-to-paddle: + +PaddlePaddle +~~~~~~~~~~~~ + +PaddlePaddle 3.3.0 and later provide full TVM-FFI support. + +.. literalinclude:: ../../examples/quickstart/load/load_paddle.py + :language: python + :start-after: [example.begin] + :end-before: [example.end] + ..
_ship-to-numpy: NumPy/CuPy diff --git a/docs/get_started/stable_c_abi.rst b/docs/get_started/stable_c_abi.rst index b8d8195b..0f6dbd55 100644 --- a/docs/get_started/stable_c_abi.rst +++ b/docs/get_started/stable_c_abi.rst @@ -125,7 +125,7 @@ Stability and Interoperability **Cross-language.** TVM-FFI implements this calling convention in multiple languages (C, C++, Python, Rust, ...), enabling code written in one language - or generated by a DSL targeting the ABI - to be called from another language. -**Cross-framework.** TVM-FFI uses standard data structures such as :external+data-api:doc:`DLPack tensors ` to represent arrays, so compiled functions can be used from any array framework that implements the DLPack protocol (NumPy, PyTorch, TensorFlow, CuPy, JAX, and others). +**Cross-framework.** TVM-FFI uses standard data structures such as :external+data-api:doc:`DLPack tensors ` to represent arrays, so compiled functions can be used from any array framework that implements the DLPack protocol (NumPy, PyTorch, TensorFlow, CuPy, JAX, PaddlePaddle, and others). Stable ABI in C Code diff --git a/examples/quickstart/README.md b/examples/quickstart/README.md index 1c23ecaf..093ebb64 100644 --- a/examples/quickstart/README.md +++ b/examples/quickstart/README.md @@ -57,6 +57,7 @@ To run library loading examples across ML frameworks (requires CUDA for the CUDA ```bash python load/load_pytorch.py +python load/load_paddle.py python load/load_numpy.py python load/load_cupy.py ``` diff --git a/examples/quickstart/load/load_paddle.py b/examples/quickstart/load/load_paddle.py new file mode 100644 index 00000000..1162e15f --- /dev/null +++ b/examples/quickstart/load/load_paddle.py @@ -0,0 +1,30 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. 
The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# fmt: off +# ruff: noqa +# mypy: ignore-errors +# [example.begin] +# File: load/load_paddle.py +import tvm_ffi +mod = tvm_ffi.load_module("build/add_one_cuda.so") + +import paddle +x = paddle.to_tensor([1, 2, 3, 4, 5], dtype=paddle.float32, place=paddle.CUDAPlace(0)) +y = paddle.empty_like(x) +mod.add_one_cuda(x, y) +print(y) +# [example.end] diff --git a/examples/quickstart/run_all_cuda.sh b/examples/quickstart/run_all_cuda.sh index d27bf0b3..d8807a50 100755 --- a/examples/quickstart/run_all_cuda.sh +++ b/examples/quickstart/run_all_cuda.sh @@ -26,3 +26,6 @@ python load/load_pytorch.py # To load and run `add_one_cuda.so` in CuPy python load/load_cupy.py + +# To load and run `add_one_cuda.so` in PaddlePaddle +python load/load_paddle.py
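Note (not part of the patch): the Paddle interop added above works because every party speaks the standard DLPack protocol, so tensors cross the boundary without copying. A minimal sketch of that zero-copy exchange, using only NumPy's standard DLPack API (no Paddle, tvm_ffi, or GPU required):

```python
# Zero-copy DLPack round-trip: the consumer aliases the producer's buffer.
import numpy as np

x = np.arange(5, dtype=np.float32)

# np.from_dlpack accepts any object implementing __dlpack__ /
# __dlpack_device__ (NumPy arrays, torch/paddle tensors, ...).
y = np.from_dlpack(x)

x[0] = 42.0   # write through the shared buffer
print(y[0])   # 42.0 -- y sees the write because no copy was made
```

This is the same mechanism `tvm_ffi.from_dlpack` and `paddle.Tensor` use in the example file: the kernel receives a view of the framework's memory rather than a converted copy.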