Description
Hi, my company bought a Ryzen AI 7 350 laptop, and one of our AI engineers is trying to figure out how Ryzen AI is even supposed to be used and how to bring our own fine-tuned model to run on the NPU. We are startled by the state of the software stack, and the fact that it is closed shuts down any ability to contribute a fix for it ourselves. The official gte-large-en-v1.5-bf16 example revealed severe defects in the Ryzen AI software.
We discovered that bringing custom models to run with Vitis is not trivial, and there is a giant number of corner cases that the documentation carefully avoids mentioning in order to remain presentable, which is misleading. While bringing up our own model, we attempted numerous surgeries on it and ensured that all shapes are fixed and the batch size is 1 (#309). But the tooling severely lacks any debugging provisions that would tell you why a particular operation fails (I assume this is relevant to #318). At first, nodes were not assigned to the NPU at all, which turned out to be an issue of its own (#324), and for two days we hoped this had to be some mistake.
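For context, this is roughly what we mean by "ensuring all shapes are fixed": a minimal sketch using the standard onnx Python package (the model path is a placeholder, and this is only the verification step, not the surgeries themselves):

```python
import onnx

# Load the exported model and verify that every graph input has a fully
# static shape with batch size 1 ("model.onnx" is a placeholder path).
model = onnx.load("model.onnx")
for inp in model.graph.input:
    dims = inp.type.tensor_type.shape.dim
    for i, d in enumerate(dims):
        # A dim_param (symbolic name) or a missing dim_value means the
        # dimension is dynamic and must be fixed before compilation.
        if d.HasField("dim_param") or not d.HasField("dim_value"):
            print(f"{inp.name}: dimension {i} is dynamic ({d.dim_param or '?'})")
        elif i == 0 and d.dim_value != 1:
            print(f"{inp.name}: batch dimension is {d.dim_value}, expected 1")
```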
We managed to set up a working environment for the gte-large-en-v1.5-bf16 example (which requires an ancient version of torch), and that allowed us to reveal severe defects in the initial compilation process. Using the same code from the example, we managed to export our own model (based on the qwen2 architecture), but it only reinforced our doubts about the correctness of the table of supported ops and confirmed the presence of severe defects in the Ryzen AI software; I assume this is also why a reference to the ops support page is not easily provided. Substituting our own model for gte in the example results in this trace:
```
I20260110 18:40:33.984828 13148 vitisai_compile_model.cpp:1263] Vitis AI EP Load ONNX Model Success
I20260110 18:40:33.984828 13148 vitisai_compile_model.cpp:1264] Graph Input Node Name/Shape (2)
I20260110 18:40:33.984828 13148 vitisai_compile_model.cpp:1268] input_ids : [1x4096]
I20260110 18:40:33.984828 13148 vitisai_compile_model.cpp:1268] attention_mask : [1x4096]
I20260110 18:40:33.984828 13148 vitisai_compile_model.cpp:1274] Graph Output Node Name/Shape (1)
I20260110 18:40:33.984828 13148 vitisai_compile_model.cpp:1278] embeddings : [1x1024]
[Vitis AI EP] No. of Operators : CPU 7 VAIML 2039
[Vitis AI EP] No. of Subgraphs : NPU 1 Actually running on NPU 1
Traceback (most recent call last):
  File "C:\Users\[REDACTED]\RyzenAI-SW-main\example\gte-large-en-v1.5-bf16\run.py", line 90, in <module>
    main(args)
  File "C:\Users\[REDACTED]\RyzenAI-SW-main\example\gte-large-en-v1.5-bf16\run.py", line 13, in main
    npu_session = ort.InferenceSession(
                  ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\[REDACTED]\.conda\envs\onnxtools\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 485, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "C:\Users\[REDACTED]\.conda\envs\onnxtools\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 584, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for Sub(14) node with name '/base_model/Sub'
```
(The above is NOT a report about the gte model itself; we just ran our own model through the literal example code.)
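For reference, the session creation at run.py line 13 boils down to something like the following. This is a minimal sketch, not the exact example code; the config file name and provider option key follow the Ryzen AI documentation as we understand it and may differ in the actual example:

```python
import onnxruntime as ort

# Minimal sketch of how the example constructs the NPU session
# (model path and config file name are illustrative placeholders).
npu_session = ort.InferenceSession(
    "model.onnx",
    providers=["VitisAIExecutionProvider"],
    provider_options=[{"config_file": "vaip_config.json"}],
)
# The NotImplemented error above is raised from initialize_session,
# i.e. during graph partitioning/compilation, before any input is fed.
```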
We tried inspecting the reports generated for both gte and our model and found that the exception above is literally false: the gte model also has a Sub node, and in its report that node is in fact offloaded to the CPU.
gte: report.json
In our report, Sub is assigned to the VAIML device, which causes the failure: report-2.json
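A sketch of how the two reports can be compared. We are guessing at the report structure here: the key names ("op", "device") are assumptions for illustration, since as far as we can tell the actual schema of report.json is undocumented:

```python
import json

# Hypothetical helper: recursively scan a report.json for entries that
# mention the Sub op and print their device assignment. The "op" and
# "device" key names are assumed; adjust to the real report schema.
def find_op(node, op_name, path=""):
    if isinstance(node, dict):
        if node.get("op") == op_name:
            print(f"{path}: device={node.get('device', '?')}")
        for k, v in node.items():
            find_op(v, op_name, f"{path}/{k}")
    elif isinstance(node, list):
        for i, v in enumerate(node):
            find_op(v, op_name, f"{path}[{i}]")

for report in ("report.json", "report-2.json"):
    print(f"== {report} ==")
    with open(report) as f:
        find_op(json.load(f), "Sub")
```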
So what exactly is going on here? Apart from the ops support table being obviously incorrect with respect to the NPU, the Vitis compilation process is assigning operators incorrectly.