vllm.utils.deep_gemm
Compatibility wrapper for DeepGEMM API changes.
vLLM users should import DeepGEMM functionality only through these wrappers.
__all__
module-attribute
__all__ = [
    "calc_diff",
    "fp8_gemm_nt",
    "m_grouped_fp8_gemm_nt_contiguous",
    "fp8_m_grouped_gemm_nt_masked",
    "per_block_cast_to_fp8",
    "is_blackwell_deep_gemm_used",
]
_per_block_cast_impl
module-attribute
_per_block_cast_impl: Callable[..., Any] | None = getattr(
    _math_mod, "per_block_cast_to_fp8", None
)
_missing
Placeholder for unavailable DeepGEMM backend.
_resolve_symbol
Return the new symbol if it exists, otherwise the old one.
Source code in vllm/utils/deep_gemm.py
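The fallback described above can be illustrated with a minimal sketch. It assumes the helper takes a module plus a new and an old attribute name, which this page does not confirm. The name resolve_symbol_sketch and the attribute names gemm and gemm_legacy are hypothetical and are not part of vLLM or DeepGEMM.

```python
from types import SimpleNamespace
from typing import Any, Callable, Optional


def resolve_symbol_sketch(
    module: Any, new: str, old: str
) -> Optional[Callable[..., Any]]:
    """Return module.<new> if present, else module.<old>, else None."""
    if hasattr(module, new):
        return getattr(module, new)
    if hasattr(module, old):
        return getattr(module, old)
    return None


# Stand-ins for two releases of a backend; the attribute names are made up.
old_release = SimpleNamespace(gemm_legacy=lambda a, b: a * b)
new_release = SimpleNamespace(
    gemm=lambda a, b: a * b,
    gemm_legacy=lambda a, b: -1,
)

# The old release only exposes the legacy name, so the fallback is used.
gemm_old = resolve_symbol_sketch(old_release, "gemm", "gemm_legacy")
# The new release exposes both names, so the new one wins.
gemm_new = resolve_symbol_sketch(new_release, "gemm", "gemm_legacy")
# Neither name exists, so the caller receives None and must handle it.
gemm_none = resolve_symbol_sketch(SimpleNamespace(), "gemm", "gemm_legacy")
```

Resolving names once at import time keeps call sites independent of which backend release is installed.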
calc_diff
Return a global difference metric for unit tests.
DeepGEMM kernels on Blackwell/B200 currently exhibit noticeable per-element
error, causing torch.testing.assert_close to fail. Instead of checking
every element, we compute a cosine-style similarity over the whole tensor
and report 1 - sim. Once kernel accuracy improves, this helper can be
removed.
Source code in vllm/utils/deep_gemm.py
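A global metric of this kind can be sketched in plain Python. The docstring only says "cosine-style", so the formula used here, sim = 2 * sum(x*y) / sum(x*x + y*y), is an assumption, as is the name calc_diff_sketch. The real helper presumably operates on torch tensors rather than lists.

```python
from typing import Sequence


def calc_diff_sketch(x: Sequence[float], y: Sequence[float]) -> float:
    """Return 1 - sim, where sim = 2 * sum(x*y) / sum(x*x + y*y).

    The assumed similarity is 1 when x == y, so the result is 0 for
    identical inputs and grows as the two inputs diverge.
    """
    if len(x) != len(y):
        raise ValueError("inputs must have the same number of elements")
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a + b * b for a, b in zip(x, y))
    if denom == 0.0:
        return 0.0  # both inputs are all zeros, so they are identical
    return 1.0 - 2.0 * dot / denom


reference = [1.0, 2.0, 3.0]
noisy = [1.01, 1.98, 3.02]  # small per-element error

# A test would compare the global metric against a tolerance instead of
# checking every element individually.
diff = calc_diff_sketch(reference, noisy)
assert diff < 1e-3
```

Because the metric aggregates over the whole tensor, a few elements with larger error do not fail the check as long as the overall agreement stays high.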
fp8_gemm_nt
fp8_m_grouped_gemm_nt_masked
is_blackwell_deep_gemm_used
cached
is_blackwell_deep_gemm_used() -> bool
Return True if vLLM is configured to use DeepGEMM on a Blackwell-class GPU.
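The cached label suggests that the result is memoized, so the underlying check runs only once per process. The sketch below shows that pattern, assuming functools.cache is the mechanism, which this page does not confirm. The function names and the placeholder probe are hypothetical and do not reflect vLLM's actual detection logic.

```python
import functools

probe_calls = 0


def _probe_hardware_sketch() -> bool:
    """Hypothetical stand-in for an expensive configuration/GPU check."""
    global probe_calls
    probe_calls += 1
    return False  # placeholder result; a real check would query the device


@functools.cache
def is_feature_used_sketch() -> bool:
    """Zero-argument predicate whose result is computed once, then reused."""
    return _probe_hardware_sketch()


first = is_feature_used_sketch()
second = is_feature_used_sketch()  # served from the cache, no second probe
```

One consequence of this pattern is that a configuration change made after the first call is not picked up until the cache is cleared, for example with is_feature_used_sketch.cache_clear().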