kernels.quantize

Dequantization utilities for bitsandbytes integration.

Functions

| Name | Description |
| --- | --- |
| dequantize | Fast NF4 dequantization using bitsandbytes CUDA kernels. |

dequantize

kernels.quantize.dequantize(W, quant_state=None, out=None)

Fast NF4 dequantization using bitsandbytes CUDA kernels.

Performs efficient dequantization of NF4-format weights using bitsandbytes’ optimized CUDA implementations. Supports both the legacy list format and the newer QuantState format for the quantization state.
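
A minimal usage sketch, assuming a CUDA build of bitsandbytes is installed; quantize_4bit is the standard bitsandbytes API for producing NF4 weights plus their QuantState, and the shapes and dtypes below are illustrative:

```python
import torch
import bitsandbytes.functional as F

from kernels.quantize import dequantize

# Quantize an fp16 weight matrix to NF4; returns packed bytes + a QuantState.
W = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")
W_nf4, quant_state = F.quantize_4bit(W, quant_type="nf4")

# Recover the fp16 weights via the optimized CUDA kernels.
W_dq = dequantize(W_nf4, quant_state=quant_state)
assert W_dq.shape == W.shape and W_dq.dtype == torch.float16

# With quant_state=None the input is returned unchanged.
passthrough = dequantize(W)
```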

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| W | torch.Tensor | Quantized weight tensor to dequantize. | required |
| quant_state | QuantState \| list \| None | Quantization state containing the metadata needed for dequantization. Either a QuantState object or the legacy list format. If None, W is returned unchanged. | None |
| out | torch.Tensor \| None | Optional output tensor for storing the dequantized result. Must match the expected shape and dtype if provided. | None |
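
Where the same weights are dequantized repeatedly (e.g. once per decoding step), the out argument lets callers reuse a preallocated buffer; a sketch continuing the example above, with a hypothetical iteration count:

```python
# Preallocate once; shape and dtype must match the dequantized result,
# otherwise the AssertionError documented below is raised.
out = torch.empty(4096, 4096, dtype=torch.float16, device="cuda")

for _ in range(8):  # hypothetical number of reuse iterations
    # Writes the result into `out` instead of allocating a new tensor.
    W_dq = dequantize(W_nf4, quant_state=quant_state, out=out)
```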

Returns

| Name | Type | Description |
| --- | --- | --- |
| | torch.Tensor | Dequantized tensor in the specified dtype (fp16 or bf16). Will be transposed if the input W was transposed. |

Raises

| Name | Type | Description |
| --- | --- | --- |
| | AssertionError | If the provided output tensor doesn’t match the expected shape or dtype. |

Note

Uses CUDA streams for better performance on newer bitsandbytes versions (>0.43.3), where stream support is available.
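
A hedged sketch of the kind of version gate this note implies; the name HAS_CUDA_STREAMS is hypothetical, and how the stream is actually threaded into the kernels is internal to this module, so only the detection side is shown:

```python
import torch
import bitsandbytes
from packaging.version import Version

# Hypothetical flag: releases newer than 0.43.3 can run the kernels on an
# explicit CUDA stream; older ones fall back to the default behavior.
HAS_CUDA_STREAMS = Version(bitsandbytes.__version__) > Version("0.43.3")

if HAS_CUDA_STREAMS and torch.cuda.is_available():
    # The stream PyTorch is currently issuing work on, which the kernels
    # would need in order to avoid an implicit synchronization point.
    stream = torch.cuda.current_stream()
```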