utils.model_shard_quant
Module to handle loading a model on the CPU or meta device for FSDP.
Functions
| Name | Description |
|---|---|
| load_and_quantize | Loads the `value` tensor into a submodule of `module`, optionally skipping `skip_names` and converting to `dtype`. |
load_and_quantize
utils.model_shard_quant.load_and_quantize(
    module,
    name,
    value,
    device=None,
    dtype=None,
    skip_names=None,
    to_cpu=False,
    to_meta=False,
    verbose=False,
    quant_method='bnb',
)
Loads the `value` tensor into a submodule of `module`, optionally skipping `skip_names` and converting to `dtype`.

Quantizes `Params4bit` parameters on `device`, then places them on `"cpu"` if `to_cpu=True` or on `"meta"` if `to_meta=True`.
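The loading flow described above (resolve the target submodule, skip unwanted names, convert the dtype, then place the tensor on the CPU or meta device) can be illustrated with a minimal sketch of the non-quantized path only, assuming PyTorch. `load_param_sketch` is a hypothetical name, not part of this module. It omits the `Params4bit` quantization step, which needs bitsandbytes and a CUDA device. It also assumes that `name` is a dotted parameter path such as `"0.weight"` and that `skip_names` matches by substring. Both are assumptions made for illustration.

```python
import torch
from torch import nn


def load_param_sketch(module, name, value, device=None, dtype=None,
                      skip_names=None, to_cpu=False, to_meta=False):
    """Hypothetical, simplified stand-in for load_and_quantize.

    Handles ordinary (non-quantized) parameters only. Returns True if the
    value was loaded and False if it was skipped.
    """
    # Assumption: a parameter is skipped if any entry of skip_names
    # occurs as a substring of its dotted name.
    if skip_names and any(skip in name for skip in skip_names):
        return False

    # Split "0.weight" into submodule path "0" and attribute "weight".
    # get_submodule("") returns the module itself for top-level params.
    module_key, _, value_key = name.rpartition(".")
    submodule = module.get_submodule(module_key)

    # Convert to the requested dtype (and working device, if given).
    value = value.to(device=device, dtype=dtype)

    # Final placement: "meta" keeps only shape/dtype, "cpu" keeps the data.
    if to_meta:
        value = value.to("meta")
    elif to_cpu:
        value = value.to("cpu")

    # Replace the registered parameter, keeping its requires_grad flag.
    old = getattr(submodule, value_key)
    setattr(submodule, value_key,
            nn.Parameter(value, requires_grad=old.requires_grad))
    return True


# Usage: load one weight in bfloat16 on CPU, skip another, put a bias on meta.
model = nn.Sequential(nn.Linear(4, 2), nn.Linear(2, 2))
loaded = load_param_sketch(model, "0.weight", torch.ones(2, 4),
                           dtype=torch.bfloat16, to_cpu=True)
skipped = load_param_sketch(model, "1.weight", torch.zeros(2, 2),
                            skip_names=["1."])
load_param_sketch(model, "1.bias", torch.zeros(2), to_meta=True)
```

Placing a tensor on `"meta"` keeps only its shape and dtype, with no storage. That is why a meta placement is useful when FSDP will later materialize or receive the real weights on each rank.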