Highlights
This version is based on vLLM 0.11.2 and supports Intel® Gaudi® v1.22.2.
This release introduces the production-ready vLLM Hardware Plugin for Intel® Gaudi®, a community-driven integration layer based on the vLLM v1 architecture. It enables efficient, high-performance large language model (LLM) inference on Intel® Gaudi® AI accelerators. The plugin is an alternative to the vLLM fork, which reaches end of life with this release; the fork will be deprecated in v1.24.0 and remains functional only for legacy use cases. We strongly encourage all fork users to begin planning their migration to the plugin.
The plugin provides feature parity with the fork, including mature, production-ready implementations of Automatic Prefix Caching (APC) and the async scheduler. Two legacy features, multi-step scheduling and delayed sampling, have been discontinued, as their functionality is now covered by the async scheduler.
For more details on the plugin's implementation, see Plugin System.
To start using the plugin, follow the Basic Quick Start Guide and explore the rest of this documentation.
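For orientation, the sketch below shows what a minimal offline-inference run with the plugin can look like. It is an illustration only, assuming vLLM and the Gaudi plugin are already installed as described in the Basic Quick Start Guide; the model name is a placeholder, and the `enable_prefix_caching` and `async_scheduling` engine arguments come from upstream vLLM and may differ slightly between versions.

```python
# Minimal offline-inference sketch (illustrative, not a definitive recipe).
# Assumes vLLM and the Intel Gaudi plugin are installed per the Quick Start Guide.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder; use any supported model
    enable_prefix_caching=True,                # Automatic Prefix Caching (APC)
    async_scheduling=True,                     # async scheduler (supersedes multi-step scheduling / delayed sampling)
)

outputs = llm.generate(
    ["Summarize what an AI accelerator does in one sentence."],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

Gaudi-specific tuning knobs such as bucketing and warmup settings are described in the documentation referenced above.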
What's Changed
- add commit-id to distinguish image and container for each PR by @xuechendi in #85
- [Upstream fix] Fix after #23041 from upstream by @adobrzyn in #87
- Change warmup scenario for execute dummy scenario by @adobrzyn in #54
- remove enable_prompt_adapter in test to fix by @xuechendi in #91
- Fix jenkins - remove failed test and fix later / update API by @xuechendi in #79
- [Upstream fix] Fix after #23262 from upstream - Make new_block_ids None if empty by @adobrzyn in #93
- Enable multimodal support + qwen2.5-vl by @attafosu in #92
- Fix upstream PR 22668 that added additional arg to is_kv_cache_dtype_supported by @mswiniarsk in #96
- Port defragmentation support from vllm-fork PR #1568 by @madamczyk-intel in #94
- [Upstream fix] Fix after #22711 by @adobrzyn in #102
- Reduce number of compilations when dynamic shapes is used by @anko-intel in #90
- Warmup fix - for non contiguous PA runs, don't take more context blocks than possible by @adobrzyn in #97
- [UT] Fix test args for bucketing tests by @adobrzyn in #105
- [SW-236088] Add sampler unit tests by @kamil-kaczor in #99
- Avoid copying dynamic slice of sampling_metadata tensors by @mswiniarsk in #88
- Fix mm encoder inputs for mix-modalities in input batch by @attafosu in #103
- Fix decode profiling by @kamil-kaczor in #106
- fix upstream PR 23749 by @xuechendi in #108
- Fix the failure introduced by upstream 22685 by @xuechendi in #110
- fix an argument issue introduced by recent vllm upstream and add CI by @xuechendi in #111
- Port G2 scaling convert from vllm-fork #1505 by @xuechendi in #112
- Enable Spec Decode for HPU v1 - Part1(basic workflow + eagle) by @xuechendi in #81
- fix qwen3-30B-A3B-FP8 - The number of dims cannot be packed into CompleteArgumentSpec:65535 by @xuechendi in #113
- [FIX HOURLY Failure] transformer 4.56.0 is not compatible with INC by @xuechendi in #117
- Remove test_load_model_weights_inplace by @kzawora-intel in #48
- [BUG fix]Fix spec_decode introduced long graph compilation issue by @xuechendi in #127
- [Bugfix] Warmup with continuous PA by @adobrzyn in #126
- Disable warmup for defragmentator by @mswiniarsk in #132
- Merging vllm docker implementation to vllm-gaudi (v1) by @PatrykWo in #125
- Enable embedding feature by @slokesha in #120
- Revert "Enable embedding feature" by @adobrzyn in #140
- [Bugfix] Remove reqs without logits - merge prefill case by @adobrzyn in #137
- Update CODEOWNERS by @mgawarkiewicz-intel in #144
- Fix warmup break when max decode bucket bs > max num seq by @taran2210 in #107
- Add tests for custom op registration by @Kacper-Pietkun in #109
- Enable embedding feature by @slokesha in #141
- Update CODEOWNERS file by @vivekgoe in #143
- [Merged Prefill] Warmup for merged prefill by @adobrzyn in #104
- Experimental support for Unified Attention by @madamczyk-intel in #133
- Introducing sampler warmup as separate warmup step by @ksmusz in #131
- Add support for LoRA by @vivekgoe in #51
- Add data parallel support by @wuxun-zhang in #80
- Increase allowed line length to 120 + reformat accordingly by @kzawora-intel in #130
- [FIX HOURLY]Remove DP test from Hourly by @xuechendi in #147
- Update CODEOWNERS by @afierka-intel in #135
- Enable sampler compilation by @Kacper-Pietkun in #95
- Add DP into CI by @wuxun-zhang in #146
- Add TESTOWNERS by @kzawora-intel in #153
- Patch FusedMoE forward to avoid dynamo recompilations by @kdamaszk in #158
- [CI] Jenkins false positive bugfix by @kzawora-intel in #159
- Fix dummy decode input for DP by @wuxun-zhang in #151
- [Quick fix for CI]fix CI break on Qwen2.5-vl and update docker image by @xuechendi in #161
- initial port for nixl by @hsubramony in #100
- update nixl version in requirements by @hsubramony in #163
- Re-quantize FP8 model with INC by @yiliu30 in #114
- [Feature][SpecDecode][Part2] Eagle3,MTP enabling, accept_rate improvement by @xuechendi in #142
- [BUGFIX] qwen2.5-vl failed after PR24444, provide a temp solution by @xuechendi in #162
- Reenabling llama4 models by @afierka-intel in #128
- Allow building vllm-plugin docker with upstream torch by @mmuszynskihabana in #155
- [HOURLY FIX] For upstream PR-24548 changes by @xuechendi in #166
- [BUGFIX] warmup failed after PR104, propose fix in this PR by @xuechendi in #148
- TESTOWNERS update by @adobrzyn in #165
- [TEMP-WA] Skip Qwen3-30B-A3B in tests - Bug introduced in upstream #24772 by @attafosu in #168
- [CI FIX]Fix issue introduced by upstream PR #23974 by @xuechendi in #172
- [CI FIX] Fix issue introduced by upstream #24745 by @xuechendi in #174
- [BUG][Disable CI] Disable DP test due to recent upstream change that caused HPU DP to fail by @xuechendi in #177
- Fully overlap model execution by @tianmu-li in #134
- Added fix for VLLM_WEIGHT_LOAD_FORCE_SYNC by @tianmu-li in #173
- Introduce VLLM_SCALE_ADJUSTMENT by @xinyu-intel in #164
- Support Ray distributed executor by @xinyu-intel in #169
- Bug fix: hpu mrope by @attafosu in #167
- Fix in docker compose functionality for v1-plugin by @PatrykWo in #185
- CI fix by @adobrzyn in #186
- Fix dp sync after upstream change #24105 by @wuxun-zhang in #179
- Cache token ids on device for async_scheduling by @tianmu-li in #184
- [BUGFIX] Fix hourly after PR#22772 by @adobrzyn in #197
- [SW-240630] Qwen3-30B-MoE: Flatten post-attn seqs and restore model output shape by @attafosu in #176
- fix block bucket size for DP+contiguous PA by @wuxun-zhang in #171
- Fix swap in defragmentator by @kamil-kaczor in #182
- Unified mixed batches by @madamczyk-intel in #196
- [SW-236002] Support compressed int4 w4a16 format by @skavulya in #193
- update HOURLY docker image and move DP to separate test run by @xuechendi in #209
- Move hourly to aicf-gaudi2-07 by @xuechendi in #211
- Create .readthedocs.yaml by @kzawora-intel in #219
- [BUGFIX] Fix after PR25332 & 25321 & 25366 by @adobrzyn in #215
- Fix DP dummy run crash for P/D by @wuxun-zhang in #194
- Enable interleaved sliding window for gemma3 by @jiminha in #150
- Update the script fix for gemma-3-4b test by @jiminha in #225
- V0.10.2 docker updates / benchmark serving section (#191) - cherry-pick by @PatrykWo in #200
- use vllm intree API to enable synced_model_load, #25126 by @xuechendi in #208
- [FIX][Upstream caused crash] Fix crash caused by upstream PR 25184 by @xuechendi in #238
- Enable p2d2 for nixl by @hsubramony in #237
- [FIX][upstream crash] Fix due to upstream change 25510 by @xuechendi in #241
- Remove sync point from _prepare_sampling by @kdamaszk in #204
- Align to lora_manager changes in upstream by @vivekgoe in #244
- Update test owners: iboiko-habana, jkaniecki by @iboiko-habana in #247
- Enable device_to_device nixl_connector support by @xuechendi in #240
- Add fused_experts to HPUFp8MoEMethod to fix Deepseek by @kdamaszk in #228
- add hf_token for CI by @xuechendi in #248
- another PR for HF_TOKEN by @xuechendi in #251
- update CI file to use my PR code by @xuechendi in #254
- Fix crash due to PR 25541 by @xuechendi in #252
- Add HPUMultiHeadAttention with FusedSDPA by @jiminha in #249
- skip dp padding sync in set_forward_context by @wuxun-zhang in #226
- [SW-236002] Enable group indexing for compressed w4a16 format by @skavulya in #243
- Enable group indexing gptq by @jmamzax in #154
- Adding dynamic swap number and defragmenter warmup by @ksmusz in #183
- fix crash introduced by upstream PR 25613 and PR23991 by @xuechendi in #259
- Fix crash introduced by 25489 - cause PD fail by @xuechendi in #260
- [HOURLY RUN] update the scripts and action to run in a separate job by @xuechendi in #261
- [SW-239237] Add last good commit based on PR257 by @xuechendi in #262
- [GITHUB ACTION] update pre-merge to block CI for not ready PR by @xuechendi in #266
- [GITHUB ACTION] quick fix for last update to pre-merge by @xuechendi in #267
- remove DCO check by @xuechendi in #269
- [GITHUB ACTION] [PRE_COMMIT] pre-check before start actual CI by @xuechendi in #270
- [upstream crash] fix spec decode due to upstream 24986 by @xuechendi in #265
- [GITHUB ACTION][HOURLY] add force push otherwise it failed to update by @xuechendi in #268
- [GITHUB ACTION][PRE_MERGE] last refinement to enable DCO check by @xuechendi in #271
- [GITHUB ACTION][PRE_MERGE] post comments if a PR fails to meet DCO or mergeability requirements by @xuechendi in #273
- [Unified Attention] Bucketing and Warmup for Unified Attention by @adobrzyn in #157
- Adding prompt context flags for linear warmup by @iboiko-habana in #217
- [FIX_FOR_VLLM_LATEST]{GITHUB ACTION}[PRE-MERGE] switch to last good commit or main based on label by @xuechendi in #279
- [FIX_FOR_VLLM_LATEST] Fix hourly by skipping embedding due to upstream 25738 by @xuechendi in #280
- [GITHUB ACTION] Add update stable commit action by @xuechendi in #282
- Update LoRA tests by @vivekgoe in #255
- [test] Add yaml files for fp8 tests by @ulivne in #53
- Fix for negative logits by @pawel-olejniczak in #160
- Enable modification of prompt BS by @ksmusz in #258
- Fix DP dummy run cfg by @wuxun-zhang in #284
- [Fix Hourly] install UCX from source instead using builtin wheel from nixl by @xuechendi in #289
- [GITHUB ACTION] remove DCO block by @xuechendi in #290
- Fix deepseek FP8 weight creation due to upstream vllm change by @skavulya in #281
- Support sequence parallel MOE after upstream #24982 by @wuxun-zhang in #285
- Enable H2d(runtime scale patching) for Torch compile by default by @jczaja in #235
- [FIX_FOR_VLLM_LATEST] fix issue introduced by PR25896 and comment out still failing tests by @xuechendi in #292
- [NIXL] Fix crash introduced by upstream PR #25902 by @xuechendi in #293
- [MLA][Deepseek] Bring back deepseek after change from PR25896 by @xuechendi in #294
- [FIX_FOR_VLLM_LATEST] Fix for crash introduced by upstream PR 19330 by @xuechendi in #295
- Fix Embedding hang by @slokesha in #291
- Fix after #16229, mm by @adobrzyn in #286
- Add assert for empty buckets by @iboiko-habana in #236
- Update CODEOWNERS by @michalkuligowski in #297
- Use type strings to be compatible with python 3.10 by @madamczyk-intel in #214
- Fixing padded iterators in _align_and_pad by @ksmusz in #300
- [CI][NIXL]cache/reuse pre-build wheel to skip always re-build for nixl by @xuechendi in #304
- [NIXL][Dockerfile] add docker file for latest vllm_gaudi + nixl for llmd by @xuechendi in #307
- [GLM-4.5] [BugFix] make GLM-4.5 working by adding model to flatten_input list by @xuechendi in #306
- [BugFix][Deepseek][INC] fix duplicate submodules for deepseek INC quantization by @skavulya in #305
- Update CODEOWNERS by @iboiko-habana in #303
- Add restriction on usage of VLLM_DECODE_BLOCK_BUCKET_MAX > max_blocks by @iboiko-habana in #302
- [README]Add NIXL installation guide in README by @xuechendi in #308
- [FIX_FOR_VLLM_LATEST] fix issue brought by upstream PR #25893 by @xuechendi in #310
- [FIX_FOR_VLLM_LATEST] update hpu_model_runner according to #25676 by @xuechendi in #311
- [GITHUB ACTION][BO_ACTION] New action for release branch out by @xuechendi in #312
- [GITHUB ACTION][BO] update create_branch_action by @xuechendi in #315
- [GITHUB ACTION]only trigger tests for certain folder and add skip-gaudi-tests by @xuechendi in #325
- [GITHUB ACTION] Quick fix on pre-merge enabling files change compare on fork repo by @xuechendi in #328
- Fix for missing graphed_buckets attr while bucketing is off by @ksmusz in #321
- RUNTIME SCALE PATCHING info by @jczaja in #317
- Fix calculating used blocks by @mswiniarsk in #318
- Fix defragmenter compilation by @kzawora-intel in #334
- Add Plugin V1 specific recipe changes by @nngokhale in #187
- [SKIP CI][DP] disable DP test due to hourly failure by @xuechendi in #339
- Update long context README by @iboiko-habana in #256
- Fix long-context scenarios - torch.cat error by @afierka-intel in #346
- Remove changed-files CI step by @kzawora-intel in #351
- [Bugfix] Fix bucketing of query + num_blocks neighbor expansion by @kzawora-intel in #350
- [Docs] README update - bucketing, warmup, defragmenter and sampler warmup by @ksmusz in #353
- [Bugfix] Fix decode bucket validity condition by @kzawora-intel in #355
- [Bugfix] Fix bucketing UT by @kzawora-intel in #367
- [GITHUB ACTION] Remove commits comparison so we can rerun by @xuechendi in #373
- [CI] Set seeds for e2e tests by @kzawora-intel in #368
- Fix dp padding after upstream change #25768 by @wuxun-zhang in #362
- Create LICENSE by @kzawora-intel in #379
- Change to starting page and installation by @PatrykWo in #371
- [FIX_FOR_VLLM_LATEST] Fix upstream crash introduced by #24486 + #24926 + #25103 + #25807 by @iboiko-habana in #366
- Enable Parallel Compilation feature for compile mode by default by @jwieczorekhabana in #370
- [SW-239226] Adjust junit xml filenames for retry mechanism by @tlipinski1337 in #382
- docs installation build formating fix by @PatrykWo in #384
- Correct htexp._data_ptr utility by @xinyu-intel in #387
- ray: pin ray to <2.49.0 by @xinyu-intel in #386
- [FIX_FOR_VLLM_LATEST] Fix #24172, [Refactor]: Use M-RoPE interface directly while defining model class instead of maintaining model specific M-RoPE implementation in mrope.py by @iboiko-habana in #388
- [Bugfix] Fix min linear decode value by @adobrzyn in #391
- [SW-241908] Omit all prompt buckets that exceed max_num_batched_tokens by @skavulya in #331
- Experimental - fatal error from 0.12 release by @adobrzyn in #398
- Port: [Docs] CI failures chapter (#276) by @adobrzyn in #389
- Fix issue with async_scheduling when dealing with chunked input by @tianmu-li in #360
- nixl: support mla kvcache transfer by @xinyu-intel in #403
- Unified Attention Accuracy Bugfixes by @kzawora-intel in #393
- Minor optimization for bucketing calc by @michalkuligowski in #395
- Fix linear assert by @kamil-kaczor in #401
- Environment logs - disable prefix caching with conti pa + add vllm branch+commit value to logs by @adobrzyn in #402
- [FIX_FOR_VLLM_LATEST] Upstream vllm fixes for #26355 and #26737 by @iboiko-habana in #407
- Cherrypick cd docker fixes/commits from v0.10.2 to main v0.11.0 by @nngokhale in #341
- Unit test for prefix caching in Gaudi plugin by @iirzynska in #349
- Add missing prompt bucket to warmup, when max_ctx is 0 by @iboiko-habana in #352
- Unified attention improvements by @adobrzyn in #363
- [NIXL][BUGFIX][Gaudi2Gaudi accuracy] use 4d kv_cache for nixl_connector KV register and update host_buffer accordingly by @xuechendi in #411
- Multi-image generation CI tests by @MohitIntel in #377
- [FIX_FOR_VLLM_LATEST] Fix for Separate out vllm.utils.collections #26990 by @iboiko-habana in #413
- Add fp8 calibration procedure by @afierka-intel in #309
- [FIX_FOR_VLLM_LATEST] Fix for #27022 by @adobrzyn in #418
- [CI] unified attn fails too easily; add small RTOL by @xuechendi in #422
- Update supported_features.md by @mgawarkiewicz-intel in #180
- [FIX_FOR_VLLM_LATEST] Fixes for upstream #26908 and #27143 and #27169 by @iboiko-habana in #427
- [NIXL]Enable prefill TP < Decode TP with host_buffer by @xuechendi in #421
- Fix typo in installation.md: correct script name to install_nixl.py by @yafshar in #385
- [SW-242466] Update not_over_max_model_len filter to fix warmup perf regression by @skavulya in #424
- Docs update post v0.11 by @PatrykWo in #428
- [FIX_FOR_VLLM_LATEST] Fix for #26440 by @iboiko-habana in #442
- [main] Defragmenter warmup accuracy workaround by @kzawora-intel in #436
- Update docs: Quickstart - Executing inference by @pawel-olejniczak in #410
- [Security] Update requirements.txt (#443) by @afierka-intel in #445
- [GITHUB ACTION] Always run same job to same node by @xuechendi in #450
- reuse DP allgather tensor across layers by @wuxun-zhang in #415
- Support DP for unified attention by @wuxun-zhang in #242
- [Linear warmup] Default values optimization by @adobrzyn in #426
- Buckets from file - alpha version by @adobrzyn in #375
- Fix math log2 exponential bucket error if max_model_len <= block_size by @skavulya in #451
- Fix requirements filtering in HPU Dockerfiles by @jakub-sochacki in #419
- Fix defragmentation for MLA-based models by @kzawora-intel in #470
- [FIX_FOR_VLLM_LATEST] Fix for is_pin_memory_available import and skip of run_spec_decode_ngram_test due to #26060 by @iboiko-habana in #471
- Update KVConnectorOutput for P/D when async scheduling is turned on by @wuxun-zhang in #468
- Applying of [V1][spec decode] return logprobs for spec decoding #26060 by @iboiko-habana in #476
- Gemma3 Multimodal optimization by @jiminha in #404
- Fix prompt/decode profiler by @kamil-kaczor in #472
- New docs part3 updates by @PatrykWo in #456
- fix dummy run config for P/D prefiller instance by @wuxun-zhang in #467
- Add granite calibration test to all tests function by @ulivne in #453
- [FIX_FOR_VLLM_LATEST] Fix for Clean up utils #27552 by @iboiko-habana in #481
- Added info if H2d (runtime scale patching) is set by @jczaja in #480
- Update requirements.txt by @afierka-intel in #487
- Update the duplicate module list for deepseek r1 by @yiliu30 in #478
- [Security] Remove structurally dead code (#444) by @afierka-intel in #490
- [Security] Fix/remove logically dead code (#448) by @afierka-intel in #491
- [Security] Remove unused triton script with null-like value issue (#447) by @afierka-intel in #492
- [FIX_FOR_VLLM_LATEST] Fix for Make LayerBlockType a Literal instead of Enum #27658 by @iboiko-habana in #499
- rhel docker fix to main by @PatrykWo in #489
- Fix profiler using wrong bucket by @kamil-kaczor in #497
- Add docs: Plugin System by @pawel-olejniczak in #446
- HPU Dockerfile for PyTorch CI HUD by @jakub-sochacki in #501
- Add unified attention Granite-8b test by @kzawora-intel in #277
- Unified Attention - High Level Profiler Integration by @kzawora-intel in #399
- Use query in linear flags - seq as fallback option by @adobrzyn in #396
- [SW-243111] Add correctors for decode buckets by @jbyczkow in #504
- Add HABANA_VISIBLE_DEVICES env to Dockerfile.hpu used for PyTorch CI HUD by @jakub-sochacki in #506
- Update troubleshooting.md by @michalkuligowski in #416
- [FIX_FOR_VLLM_LATEST] Hourly fix after: [BugFix] Handle unscheduled requests properly when async scheduling #27756 by @adobrzyn in #507
- Update TESTOWNERS by @jbyczkow in #494
- MLA: reshape non-contiguous tensor by @xinyu-intel in #505
- DP: allreduce on the host by @xinyu-intel in #498
- Simplify requirements by @pawel-olejniczak in #458
- Remove VLLM_DELAYED_SAMPLING by @xwu-intel in #433
- Removing data from a deleted column by @PatrykWo in #514
- Add Unified Attention docs by @madamczyk-intel in #275
- Unified Attention - batch preparation rewrite by @kzawora-intel in #400
- vllm matrix table by @PatrykWo in #517
- Documentation updates - part 1 by @mhelf-intel in #493
- Fix preemption handling by @kzawora-intel in #524
- Removing leftovers fork from plugin by @PatrykWo in #525
- [Bucketing] Prompt with 0 min and max context blocks by @adobrzyn in #534
- Port: add VLLM_DISABLE_MARK_SCALES_AS_CONST by @zhejiangxiaomai in #522
- Add graph compilation tracking to high level profiler by @kzawora-intel in #50
- Update finished KV transfer state after every step by @wuxun-zhang in #532
- [GITHUB ACTION][NIXL]update install_nixl.py script by @xuechendi in #543
- Doc updates: introduction and developer guides by @mhelf-intel in #529
- FP8 documentation review by @mhelf-intel in #518
- Documentation: Troubleshooting and FAQ updates and the updated documentation structure by @mhelf-intel in #548
- [New Feature] Add cpu core pinning to vllm-server to improve performance. by @louie-tsai in #502
- Fix missing non-causal buckets by @kamil-kaczor in #540
- [Docs] Unified attn style update by @adobrzyn in #533
- Enable FP8 with unified attention by @afierka-intel in #516
- Fix unified preemption no attr found by @kamil-kaczor in #528
- Add tests for custom operator implementation correctness by @Kacper-Pietkun in #457
- [SW-242523] Support per-tensor FP8 scaling by @skavulya in #483
- Fix typo in bucketing_file.txt by @mgonchar in #553
- [Docs] Readme for bucketing from file + env var added by @adobrzyn in #545
- [FIX_FOR_VLLM_LATEST] Fix upstream execute_model crash by @iboiko-habana in #546
- Fix for compiled_methods by @Kacper-Pietkun in #559
- Skip HPUGraph capture when exceeding max_cudagraph_capture_size by @zhejiangxiaomai in #551
- [FIX_FOR_VLLM_LATEST] Rename get_input_embeddings and get_multimodal_embeddings by @pawel-olejniczak in #561
- Replace the deprecated logo by @mhelf-intel in #564
- Automatically adjust VLLM_DECODE_BLOCK_BUCKET_MIN if it exceeds max_blocks by @dsocek in #432
- v0 cleanup by @michalkuligowski in #563
- [FIX_FOR_VLLM_LATEST] fix pr28534 by @iboiko-habana in #568
- Fix for PR546, adding float32 and float16 by @iboiko-habana in #569
- UX fix: hide warmup logs by @adobrzyn in #539
- Final documentation improvements and broken link fixes by @mhelf-intel in #558
- Readme updates and release notes for 0.10.2 by @mhelf-intel in #565
- [FIX_FOR_VLLM_LATEST] Fix crash after the sampled_token_ids type change by @pawel-olejniczak in #575
- Nixl deployment fixes by @PatrykWo in #573
- Specify output tensor in matmul_qk - with version difference by @adobrzyn in #571
- Update hpu_model_runner.py by @afierka-intel in #582
- Fix for PR24248 by @iboiko-habana in #578
- Edit docker file to resolve conflicts (issue 243959) by @PatrykWo in #587
- Fix async scheduling + request preemption by @tianmu-li in #589
- [PD][NIXL]Fix bug after upstream adding virtual block_size support by @xuechendi in #590
- Port: Fix prefix caching automatic off with conti pa (#583) by @adobrzyn in #586
- Edit CODEOWNERS for 0.11.2 BO by @PatrykWo in #604
- Add support for chunked attention (#597) by @jkaniecki in #612
- Fix reverse inull security issue (#588) by @afierka-intel in #611
- cherry pick fixes for llama4 by @Luca-Calabria in #637
- Cherry-pick release docker cmdline fixes, WA and long context support… by @nngokhale in #625
- Add missing quantization files (#639) by @afierka-intel in #651
- Doc changes from main to 0.11.2 by @mhelf-intel in #655
- 0.11.2 matrix update by @PatrykWo in #657
- Documentation updates for 0.11.2 by @mhelf-intel in #666
- Docs: broken links fixes to 0.11.2 by @mhelf-intel in #669
- 0.11.2 plugin release updates by @PatrykWo in #667
New Contributors
- @mswiniarsk made their first contribution in #96
- @anko-intel made their first contribution in #90
- @mgawarkiewicz-intel made their first contribution in #144
- @taran2210 made their first contribution in #107
- @wuxun-zhang made their first contribution in #80
- @hsubramony made their first contribution in #100
- @jiminha made their first contribution in #150
- @jwieczorekhabana made their first contribution in #370
- @tlipinski1337 made their first contribution in #382
- @iirzynska made their first contribution in #349
- @MohitIntel made their first contribution in #377
- @yafshar made their first contribution in #385
- @jakub-sochacki made their first contribution in #419
- @xwu-intel made their first contribution in #433
- @zhejiangxiaomai made their first contribution in #522
- @louie-tsai made their first contribution in #502
- @mgonchar made their first contribution in #553
- @dsocek made their first contribution in #432
Full Changelog: v0.10.1...v0.11.2