kernels.quantize
Dequantization utilities for bitsandbytes integration.
Functions
Name | Description |
---|---|
dequantize | Fast NF4 dequantization using bitsandbytes CUDA kernels. |
dequantize
kernels.quantize.dequantize(W, quant_state=None, out=None)
Fast NF4 dequantization using bitsandbytes CUDA kernels.
Performs efficient dequantization of weights from NF4 format using bitsandbytes' optimized CUDA implementations. Supports both the legacy list format and the newer QuantState format.
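A minimal usage sketch, not taken from the module itself: it assumes a CUDA device and that `dequantize` is importable from `kernels.quantize`. A weight is quantized to NF4 with bitsandbytes, then recovered as fp16.

```python
import torch
import bitsandbytes.functional as bnb_F

from kernels.quantize import dequantize  # import path assumed

# Quantize an fp16 weight to NF4. quantize_4bit returns the packed
# tensor plus a QuantState holding absmax, blocksize, dtype, etc.
W = torch.randn(1024, 1024, dtype=torch.float16, device="cuda")
W_nf4, quant_state = bnb_F.quantize_4bit(W, quant_type="nf4")

# Dequantize back to fp16 using the fast CUDA kernels.
W_fp16 = dequantize(W_nf4, quant_state)
assert W_fp16.shape == W.shape and W_fp16.dtype == torch.float16

# With quant_state=None the input is returned unchanged.
assert dequantize(W_fp16) is W_fp16
```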
Parameters
Name | Type | Description | Default |
---|---|---|---|
W | torch.Tensor | Quantized weight tensor to dequantize. | required |
quant_state | QuantState \| list \| None | Quantization state containing the metadata needed for dequantization. Can be either a QuantState object or the legacy list format. If None, returns W unchanged. | None |
out | torch.Tensor \| None | Optional output tensor for storing the dequantized result. Must match the expected shape and dtype if provided. | None |
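As a hedged sketch of the `out` parameter (buffer names are hypothetical, same assumptions as the example above), a preallocated buffer can be reused across calls to avoid repeated allocations; its shape and dtype must match what `dequantize` would otherwise produce, or the assertion documented under Raises fires.

```python
import torch
import bitsandbytes.functional as bnb_F

from kernels.quantize import dequantize  # import path assumed

W = torch.randn(512, 512, dtype=torch.float16, device="cuda")
W_nf4, quant_state = bnb_F.quantize_4bit(W, quant_type="nf4")

# Preallocate once, then write each dequantization result into it.
buf = torch.empty(512, 512, dtype=torch.float16, device="cuda")
dequantize(W_nf4, quant_state, out=buf)
```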
Returns
Name | Type | Description |
---|---|---|
 | torch.Tensor | Dequantized tensor in the specified dtype (fp16 or bf16). Will be transposed if the input W was transposed. |
Raises
Name | Type | Description |
---|---|---|
 | AssertionError | If the provided output tensor doesn't match the expected shape or dtype. |
Note
Uses CUDA streams for better performance when available in newer bitsandbytes
versions (>0.43.3).
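A minimal sketch of how such a version gate might look; this is illustrative, not the module's actual code.

```python
import bitsandbytes
import torch
from packaging.version import Version

# Newer bitsandbytes releases (> 0.43.3) accept an explicit CUDA stream.
HAS_CUDA_STREAM = Version(bitsandbytes.__version__) > Version("0.43.3")

if HAS_CUDA_STREAM:
    stream = torch.cuda.current_stream("cuda:0")  # pass to the kernel call
```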