utils.model_shard_quant

Module for loading a model onto the CPU or meta device for FSDP.

Functions

Name Description
load_and_quantize Loads value tensor into submodule of module, optionally skipping skip_names and converting to dtype.

load_and_quantize

utils.model_shard_quant.load_and_quantize(
    module,
    name,
    value,
    device=None,
    dtype=None,
    skip_names=None,
    to_cpu=False,
    to_meta=False,
    verbose=False,
    quant_method='bnb',
)

Loads value tensor into submodule of module, optionally skipping skip_names and converting to dtype.
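The name-resolution and skipping behavior described above can be illustrated with a minimal, library-free sketch. It is not the library's implementation: `split_param_name`, `resolve_submodule`, and `load_value` are hypothetical helpers, plain `SimpleNamespace` objects stand in for modules, and treating each `skip_names` entry as a substring match against `name` is an assumption.

```python
from types import SimpleNamespace


def split_param_name(name):
    """Split 'encoder.proj.weight' into ('encoder.proj', 'weight')."""
    module_key, _, value_key = name.rpartition(".")
    return module_key, value_key


def resolve_submodule(root, module_key):
    """Walk a dotted path with getattr; an empty path means the root itself."""
    target = root
    for part in filter(None, module_key.split(".")):
        target = getattr(target, part)
    return target


def load_value(root, name, value, skip_names=None):
    """Assign value at the dotted name; return False if the name was skipped."""
    # Assumption: a name is skipped when any skip_names entry is a substring of it.
    if any(skip in name for skip in (skip_names or [])):
        return False
    module_key, value_key = split_param_name(name)
    setattr(resolve_submodule(root, module_key), value_key, value)
    return True


# Stand-in "model" with two nested attributes to load into.
model = SimpleNamespace(
    encoder=SimpleNamespace(proj=SimpleNamespace(weight=None)),
    lm_head=SimpleNamespace(weight=None),
)

loaded = load_value(model, "encoder.proj.weight", [1.0, 2.0])
skipped = load_value(model, "lm_head.weight", [3.0], skip_names=["lm_head"])
```

Here `encoder.proj.weight` is assigned, while `lm_head.weight` is left untouched because its name matches an entry in `skip_names`.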

Quantizes Params4bit parameters on device, then moves them to “cpu” if to_cpu=True or to “meta” if to_meta=True.
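The quantize-then-place order can be sketched without any GPU or quantization library. This is only an illustration of the control flow under stated assumptions: `FakeParam` and `quantize_then_place` are hypothetical stand-ins, no real quantization takes place, and giving `to_meta` precedence when both flags are set is an assumption that the description above does not confirm.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FakeParam:
    """Hypothetical stand-in for a parameter; tracks only device and state."""
    device: str
    quantized: bool = False


def quantize_then_place(param, device, to_cpu=False, to_meta=False):
    """Mark the parameter as quantized on `device`, then optionally relocate it."""
    # Step 1: quantization happens on the compute device.
    on_device = replace(param, device=device, quantized=True)
    # Step 2: optional relocation (assumption: to_meta wins if both flags are set).
    if to_meta:
        return replace(on_device, device="meta")
    if to_cpu:
        return replace(on_device, device="cpu")
    return on_device


raw = FakeParam(device="cpu")
kept = quantize_then_place(raw, "cuda:0")
on_cpu = quantize_then_place(raw, "cuda:0", to_cpu=True)
on_meta = quantize_then_place(raw, "cuda:0", to_meta=True)
```

In every case the stand-in ends up marked as quantized; only its final location differs according to the flags.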