
vLLM-Gaudi for vLLM-v0.11.2


Released by @PatrykWo on 03 Dec 10:52 (commit f9b6446)

Highlights

This version is based on vLLM 0.11.2 and supports Intel® Gaudi® software v1.22.2.

This release introduces the production-ready vLLM Hardware Plugin for Intel® Gaudi®, a community-driven integration layer built on the vLLM v1 architecture. It enables efficient, high-performance large language model (LLM) inference on Intel® Gaudi® AI accelerators. The plugin replaces the vLLM fork, which reaches end of life with this release and will be deprecated in v1.24.0, remaining functional only for legacy use cases. We strongly encourage all fork users to begin planning their migration to the plugin.

The plugin provides feature parity with the fork, including mature, production-ready implementations of Automatic Prefix Caching (APC) and the async scheduler. Two legacy features, multi-step scheduling and delayed sampling, have been discontinued, as their functionality is now covered by the async scheduler.
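As a rough illustration, assuming the upstream vLLM engine flags apply unchanged on Gaudi (the flag names below come from upstream vLLM and are not confirmed by this release note), both features can be enabled when launching the server:

```shell
# Sketch only: --enable-prefix-caching and --async-scheduling are upstream
# vLLM engine flags; their availability on Gaudi is assumed here, and the
# model name is a placeholder.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --enable-prefix-caching \
    --async-scheduling
```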

For more details on the plugin's implementation, see Plugin System.

To start using the plugin, follow the Basic Quick Start Guide and explore the rest of this documentation.

What's Changed

New Contributors

Full Changelog: v0.10.1...v0.11.2