Merged
2 changes: 1 addition & 1 deletion README.md
@@ -24,7 +24,7 @@ yet flexible open convention with the following systems in mind:

- **Kernel libraries** - ship one wheel to support multiple frameworks, Python versions, and different languages. [[FlashInfer](https://docs.flashinfer.ai/)]
- **Kernel DSLs** - reusable open ABI for JIT and AOT kernel exposure frameworks and runtimes. [[TileLang](https://tilelang.com/)][[cuteDSL](https://docs.nvidia.com/cutlass/latest/media/docs/pythonDSL/cute_dsl_general/compile_with_tvm_ffi.html)]
- **Frameworks and runtimes** - a uniform extension point for ABI-compliant libraries and DSLs. [[PyTorch](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-pytorch)][[JAX](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-jax)][[NumPy/CuPy](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-numpy)]
- **Frameworks and runtimes** - a uniform extension point for ABI-compliant libraries and DSLs. [[PyTorch](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-pytorch)][[JAX](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-jax)][[PaddlePaddle](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-paddle)][[NumPy/CuPy](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-numpy)]
Contributor comment (severity: medium):

For better readability and maintainability, please consider sorting the list of frameworks alphabetically.

Suggested change
- **Frameworks and runtimes** - a uniform extension point for ABI-compliant libraries and DSLs. [[PyTorch](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-pytorch)][[JAX](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-jax)][[PaddlePaddle](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-paddle)][[NumPy/CuPy](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-numpy)]
- **Frameworks and runtimes** - a uniform extension point for ABI-compliant libraries and DSLs. [[JAX](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-jax)][[NumPy/CuPy](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-numpy)][[PaddlePaddle](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-paddle)][[PyTorch](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-pytorch)]

- **ML infrastructure** - out-of-box bindings and interop across languages. [[Python](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-python)][[C++](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-cpp)][[Rust](https://tvm.apache.org/ffi/get_started/quickstart.html#ship-to-rust)]
- **Coding agents** - a unified mechanism for shipping generated code in production.

4 changes: 2 additions & 2 deletions docs/concepts/tensor.rst
@@ -20,7 +20,7 @@ Tensor and DLPack

At runtime, TVM-FFI often needs to accept tensors from many sources:

* Frameworks (e.g. PyTorch, JAX) via :py:meth:`array_api.array.__dlpack__`;
* Frameworks (e.g. PyTorch, JAX, PaddlePaddle) via :py:meth:`array_api.array.__dlpack__`;
Contributor comment (severity: medium):

For consistency and better readability, it would be great to sort the list of frameworks alphabetically.

Suggested change
* Frameworks (e.g. PyTorch, JAX, PaddlePaddle) via :py:meth:`array_api.array.__dlpack__`;
* Frameworks (e.g. JAX, PaddlePaddle, PyTorch) via :py:meth:`array_api.array.__dlpack__`;

* C/C++ callers passing :c:struct:`DLTensor* <DLTensor>`;
* Tensors allocated by a library but managed by TVM-FFI itself.

@@ -115,7 +115,7 @@ PyTorch Interop

On the Python side, :py:class:`tvm_ffi.Tensor` is a managed n-dimensional array that:

* can be created via :py:func:`tvm_ffi.from_dlpack(ext_tensor, ...) <tvm_ffi.from_dlpack>` to import tensors from external frameworks, e.g., :ref:`PyTorch <ship-to-pytorch>`, :ref:`JAX <ship-to-jax>`, :ref:`NumPy/CuPy <ship-to-numpy>`;
* can be created via :py:func:`tvm_ffi.from_dlpack(ext_tensor, ...) <tvm_ffi.from_dlpack>` to import tensors from external frameworks, e.g., :ref:`PyTorch <ship-to-pytorch>`, :ref:`JAX <ship-to-jax>`, :ref:`PaddlePaddle <ship-to-paddle>`, :ref:`NumPy/CuPy <ship-to-numpy>`;
Contributor comment (severity: medium):

For consistency and better readability, please sort the list of frameworks alphabetically.

Suggested change
* can be created via :py:func:`tvm_ffi.from_dlpack(ext_tensor, ...) <tvm_ffi.from_dlpack>` to import tensors from external frameworks, e.g., :ref:`PyTorch <ship-to-pytorch>`, :ref:`JAX <ship-to-jax>`, :ref:`PaddlePaddle <ship-to-paddle>`, :ref:`NumPy/CuPy <ship-to-numpy>`;
* can be created via :py:func:`tvm_ffi.from_dlpack(ext_tensor, ...) <tvm_ffi.from_dlpack>` to import tensors from external frameworks, e.g., :ref:`JAX <ship-to-jax>`, :ref:`NumPy/CuPy <ship-to-numpy>`, :ref:`PaddlePaddle <ship-to-paddle>`, :ref:`PyTorch <ship-to-pytorch>`;

* implements the DLPack protocol so it can be passed back to frameworks without copying, e.g., :py:func:`torch.from_dlpack`.

The following example demonstrates a typical round-trip pattern:
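The round-trip example referenced above is collapsed in this diff view. As a stand-in, here is a minimal NumPy-only sketch of the same DLPack round-trip idea (hypothetical, not the docs' actual example, which goes through `tvm_ffi.from_dlpack`):

```python
import numpy as np

# Producer side: an array that will be exported through DLPack.
x = np.arange(5, dtype=np.float32)

# Consumer side: re-import through the standard protocol.
# np.from_dlpack performs a zero-copy import, so both names
# refer to the same underlying buffer.
y = np.from_dlpack(x)

assert np.shares_memory(x, y)
```

The same pattern applies with `tvm_ffi.Tensor` in place of the second NumPy array: the tensor crosses the boundary as a capsule, not as a copy.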
20 changes: 16 additions & 4 deletions docs/get_started/quickstart.rst
@@ -27,7 +27,7 @@ This guide walks through shipping a minimal ``add_one`` function that computes
TVM-FFI's Open ABI and FFI make it possible to **ship one library** for multiple frameworks and languages.
We can build a single shared library that works across:

- **ML frameworks**, e.g. PyTorch, JAX, NumPy, CuPy, and others;
- **ML frameworks**, e.g. PyTorch, JAX, PaddlePaddle, NumPy, CuPy, and others;
Contributor comment (severity: medium):

For consistency and better readability, please sort the list of frameworks alphabetically.

Suggested change
- **ML frameworks**, e.g. PyTorch, JAX, PaddlePaddle, NumPy, CuPy, and others;
- **ML frameworks**, e.g. CuPy, JAX, NumPy, PaddlePaddle, PyTorch, and others;

- **Languages**, e.g. C++, Python, Rust, and others;
- **Python ABI versions**, e.g. one wheel that supports all Python versions, including free-threaded ones.

@@ -37,7 +37,7 @@ We can build a single shared library that works across:

- Python: 3.9 or newer
- Compiler: C++17-capable toolchain (GCC/Clang/MSVC)
- Optional ML frameworks for testing: NumPy, PyTorch, JAX, CuPy
- Optional ML frameworks for testing: NumPy, PyTorch, JAX, CuPy, PaddlePaddle
Contributor comment (severity: medium):

For consistency and better readability, please sort the list of frameworks alphabetically.

Suggested change
- Optional ML frameworks for testing: NumPy, PyTorch, JAX, CuPy, PaddlePaddle
- Optional ML frameworks for testing: CuPy, JAX, NumPy, PaddlePaddle, PyTorch

- CUDA: Any modern version (if you want to try the CUDA part)
- TVM-FFI installed via:

@@ -90,7 +90,7 @@ it also exports the function's metadata as a symbol ``__tvm_ffi__metadata_add_on
The class :cpp:class:`tvm::ffi::TensorView` enables zero-copy interop with tensors from different ML frameworks:

- NumPy, CuPy,
- PyTorch, JAX, or
- PyTorch, JAX, PaddlePaddle, or
Contributor comment (severity: medium):

For consistency and better readability, please sort the list of frameworks alphabetically.

Suggested change
- PyTorch, JAX, PaddlePaddle, or
- JAX, PaddlePaddle, PyTorch, or

- any array type that supports the standard :external+data-api:doc:`DLPack protocol <design_topics/data_interchange>`.

Finally, :cpp:func:`TVMFFIEnvGetStream` can be used in the CUDA code to launch kernels on the caller's stream.
@@ -162,7 +162,7 @@ TVM-FFI integrates with CMake via ``find_package`` as demonstrated below:

- Python version/ABI. They are not compiled or linked with Python and depend only on TVM-FFI's stable C ABI;
- Languages, including C++, Python, Rust, or any other language that can interop with the C ABI;
- ML frameworks, such as PyTorch, JAX, NumPy, CuPy, or any array library that implements the standard :external+data-api:doc:`DLPack protocol <design_topics/data_interchange>`.
- ML frameworks, such as PyTorch, JAX, PaddlePaddle, NumPy, CuPy, or any array library that implements the standard :external+data-api:doc:`DLPack protocol <design_topics/data_interchange>`.
Contributor comment (severity: medium):

For consistency and better readability, please sort the list of frameworks alphabetically.

Suggested change
- ML frameworks, such as PyTorch, JAX, PaddlePaddle, NumPy, CuPy, or any array library that implements the standard :external+data-api:doc:`DLPack protocol <design_topics/data_interchange>`.
- ML frameworks, such as CuPy, JAX, NumPy, PaddlePaddle, PyTorch, or any array library that implements the standard :external+data-api:doc:`DLPack protocol <design_topics/data_interchange>`.


.. _sec-use-across-framework:

@@ -228,6 +228,18 @@ After installation, ``add_one_cuda`` can be registered as a target for JAX's ``f
)(x)
print(y)

.. _ship-to-paddle:

PaddlePaddle
~~~~~~~~~~~~

PaddlePaddle 3.3.0 and later provide full TVM-FFI support.

.. literalinclude:: ../../examples/quickstart/load/load_paddle.py
:language: python
:start-after: [example.begin]
:end-before: [example.end]

.. _ship-to-numpy:

NumPy/CuPy
2 changes: 1 addition & 1 deletion docs/get_started/stable_c_abi.rst
@@ -125,7 +125,7 @@ Stability and Interoperability

**Cross-language.** TVM-FFI implements this calling convention in multiple languages (C, C++, Python, Rust, ...), enabling code written in one language - or generated by a DSL targeting the ABI - to be called from another language.

**Cross-framework.** TVM-FFI uses standard data structures such as :external+data-api:doc:`DLPack tensors <design_topics/data_interchange>` to represent arrays, so compiled functions can be used from any array framework that implements the DLPack protocol (NumPy, PyTorch, TensorFlow, CuPy, JAX, and others).
**Cross-framework.** TVM-FFI uses standard data structures such as :external+data-api:doc:`DLPack tensors <design_topics/data_interchange>` to represent arrays, so compiled functions can be used from any array framework that implements the DLPack protocol (NumPy, PyTorch, TensorFlow, CuPy, JAX, PaddlePaddle, and others).
Contributor comment (severity: medium):

For consistency and better readability, please sort the list of frameworks alphabetically.

Suggested change
**Cross-framework.** TVM-FFI uses standard data structures such as :external+data-api:doc:`DLPack tensors <design_topics/data_interchange>` to represent arrays, so compiled functions can be used from any array framework that implements the DLPack protocol (NumPy, PyTorch, TensorFlow, CuPy, JAX, PaddlePaddle, and others).
**Cross-framework.** TVM-FFI uses standard data structures such as :external+data-api:doc:`DLPack tensors <design_topics/data_interchange>` to represent arrays, so compiled functions can be used from any array framework that implements the DLPack protocol (CuPy, JAX, NumPy, PaddlePaddle, PyTorch, TensorFlow, and others).
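The cross-framework claim above can be made concrete with a toy exporter (hypothetical `ToyArray` class; NumPy stands in as both the backing store and the consumer) that participates purely through the DLPack protocol:

```python
import numpy as np

class ToyArray:
    """Minimal DLPack exporter that delegates to a NumPy buffer."""

    def __init__(self, data):
        self._data = np.asarray(data, dtype=np.float32)

    def __dlpack__(self, *args, **kwargs):
        # Forward the capsule request (including any stream/max_version
        # keywords a newer consumer may pass) to the NumPy buffer.
        return self._data.__dlpack__(*args, **kwargs)

    def __dlpack_device__(self):
        return self._data.__dlpack_device__()

t = ToyArray([1.0, 2.0, 3.0])
arr = np.from_dlpack(t)  # consumed purely via the protocol, zero-copy
assert np.shares_memory(arr, t._data)
```

Any consumer that speaks DLPack, including TVM-FFI's `from_dlpack`, could ingest `ToyArray` the same way without knowing anything about its implementation.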



Stable ABI in C Code
1 change: 1 addition & 0 deletions examples/quickstart/README.md
@@ -57,6 +57,7 @@ To run library loading examples across ML frameworks (requires CUDA for the CUDA

```bash
python load/load_pytorch.py
python load/load_paddle.py
python load/load_numpy.py
python load/load_cupy.py
```
30 changes: 30 additions & 0 deletions examples/quickstart/load/load_paddle.py
@@ -0,0 +1,30 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
# fmt: off
# ruff: noqa
# mypy: ignore-errors
# [example.begin]
# File: load/load_paddle.py
import tvm_ffi
mod = tvm_ffi.load_module("build/add_one_cuda.so")

import paddle
Comment on lines +22 to +25
Contributor comment (severity: medium):

According to PEP 8, imports should be grouped at the top of the file. While this is a small example, it's a good practice to follow this convention. Please move import paddle to the top and group it with other imports.

Suggested change
- import tvm_ffi
- mod = tvm_ffi.load_module("build/add_one_cuda.so")
- import paddle
+ import paddle
+ import tvm_ffi
+ mod = tvm_ffi.load_module("build/add_one_cuda.so")

x = paddle.tensor([1, 2, 3, 4, 5], dtype=paddle.float32, device="cuda")
Copilot AI comment (Jan 17, 2026):

The PaddlePaddle API usage is incorrect. The function should be paddle.to_tensor() instead of paddle.tensor(), and the parameter for specifying the device should be place instead of device. The correct usage is: x = paddle.to_tensor([1, 2, 3, 4, 5], dtype=paddle.float32, place="cuda")

Suggested change
x = paddle.tensor([1, 2, 3, 4, 5], dtype=paddle.float32, device="cuda")
x = paddle.to_tensor([1, 2, 3, 4, 5], dtype=paddle.float32, place="cuda")

y = paddle.empty_like(x)
mod.add_one_cuda(x, y)
print(y)
# [example.end]
3 changes: 3 additions & 0 deletions examples/quickstart/run_all_cuda.sh
@@ -26,3 +26,6 @@ python load/load_pytorch.py

# To load and run `add_one_cuda.so` in CuPy
python load/load_cupy.py

# To load and run `add_one_cuda.so` in PaddlePaddle
python load/load_paddle.py