vllm.attention.backends

Modules:

Name                         Description
abstract
blocksparse_attn
differential_flash_attn      An implementation of https://arxiv.org/pdf/2410.05258
dual_chunk_flash_attn        Attention layer with dual chunk flash attention and sparse attention.
flash_attn                   Attention layer with FlashAttention.
flashinfer
flashmla
hpu_attn
mla
placeholder_attn
rocm_aiter_mla
rocm_flash_attn              Attention layer for ROCm GPUs.
triton_mla
utils                        Attention backend utils.
xformers                     Attention layer with xFormers and PagedAttention.
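
vLLM normally picks one of these backends automatically based on the hardware, model, and installed libraries. As a minimal sketch of overriding that choice, assuming the VLLM_ATTENTION_BACKEND environment variable is honored in your vLLM version (the value names shown should be checked against the backend registry of that version):

    import os

    # Assumption: setting VLLM_ATTENTION_BACKEND before constructing the engine
    # overrides automatic backend selection; common values include "FLASH_ATTN",
    # "XFORMERS", and "FLASHINFER" (verify against your installed vLLM).
    os.environ["VLLM_ATTENTION_BACKEND"] = "FLASH_ATTN"

    from vllm import LLM, SamplingParams

    # Build the engine after the environment variable is set so the chosen
    # attention backend is used when the model is loaded.
    llm = LLM(model="facebook/opt-125m")
    outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
    print(outputs[0].outputs[0].text)

If the requested backend is unavailable for the current device or model configuration, vLLM falls back to (or errors out of) its automatic selection, so this override is best treated as a debugging or benchmarking aid rather than a required setting.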