Usage Reference
Config file
The most important part of the library is a user-defined YAML config file. It has five sections: training, pruning, quantization, fine-tuning, and FITCompress parameters (the FITCompress section is currently maintained for TensorFlow only). By default, the config contains the following parameters:
Training parameters
The following table outlines the primary parameters used to configure the training process:
| Field | Type | Default | Description |
|---|---|---|---|
|  | int |  | Total number of training epochs. |
|  | int |  | Additional epochs for fine-tuning. |
|  | int |  | Pretraining / warm-up epochs. |
|  | str |  | Weight rewinding policy. |
|  | int |  | Number of prune–fine-tune cycles. |
|  | int |  | Save checkpoint at this epoch ( |
Note
If you require additional parameters for the training or optimization loops, please define them directly in the config.yaml file.
Quantization parameters
| Field | Type | Default | Description |
|---|---|---|---|
|  | bool |  | Default k value for data quantization (0 = clamp negatives, 1 = keep). |
|  | int |  | Default integer bitwidth i for data quantization. |
|  | int |  | Default fractional bitwidth f for data quantization. |
|  | bool |  | Default k value for weight quantization (0 or 1). |
|  | int |  | Default integer bitwidth i for weight quantization. |
|  | int |  | Default fractional bitwidth f for weight quantization. |
|  | bool |  | Whether inputs to layers are quantized by default. |
|  | bool |  | Whether outputs of layers are quantized by default. |
|  | bool |  | Global switch to enable or disable quantization. |
|  | float |  | HGQ regularization coefficient for bitwidth stability. |
|  | float |  | HGQ loss coefficient scaling EBOPs. |
|  | dict |  | Dictionary for per-layer quantization overrides. |
|  | bool |  | Enable or disable High Granularity Quantization (HGQ). |
|  | bool |  | Use a real |
|  | str |  | Overflow handling mode ( |
|  | str |  | Rounding mode ( |
|  | bool |  | Enable a learned bit-shift multiplier inside ReLU layers. |
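To make the k, i, and f parameters concrete, here is a minimal Python sketch of fixed-point quantization. It is not PQuantML's implementation: it assumes that k is a sign bit, that the grid step is 2^-f, that the representable range is [-k·2^i, 2^i - 2^-f], that overflow saturates, and that rounding goes to the nearest grid point. The function name `quantize_fixed_point` is hypothetical.

```python
import math


def quantize_fixed_point(x: float, k: int, i: int, f: int) -> float:
    """Quantize x onto a fixed-point grid (assumed convention, see lead-in).

    k: 1 keeps negative values (sign bit), 0 clamps them to zero.
    i: number of integer bits.
    f: number of fractional bits, so the grid step is 2**-f.
    """
    step = 2.0 ** -f
    lo = -(2.0 ** i) if k else 0.0      # smallest representable value
    hi = 2.0 ** i - step                # largest representable value
    q = math.floor(x / step + 0.5) * step   # round to the nearest grid point
    return min(max(q, lo), hi)          # saturate on overflow


print(quantize_fixed_point(0.3, k=1, i=0, f=3))   # 0.25
print(quantize_fixed_point(5.0, k=1, i=0, f=3))   # 0.875
print(quantize_fixed_point(-0.3, k=0, i=0, f=3))  # 0.0
```

With k = 0 the lower bound becomes zero, which matches the "clamp negatives" wording in the table; the saturation and rounding behaviour shown here is only one of the possible overflow and rounding modes.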
Fine-tuning parameters
| Field | Type | Default | Description |
|---|---|---|---|
|  | str |  | Name of the study. |
|  | str |  | Model architecture name. |
|  | str |  | Sampler selection for the search space. |
|  | int |  | Number of trials. |
|  | HyperparameterSearch |  | Ranges for non-grid samplers. |
Samplers
| Field | Type | Default | Description |
|---|---|---|---|
|  | str |  | Sampler class name (e.g., |
|  | Dict[str, Any] |  | Sampler-specific kwargs (e.g., |

More about samplers can be found in the Optuna documentation.
HyperparameterSearch
| Field | Type | Default | Description |
|---|---|---|---|
|  | Dict[str, List[Union[int, float]]] |  | Numeric ranges |
|  | Optional[Dict[str, List[str]]] |  | Categorical choices. |
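As an illustration of how such a search space could be consumed, the sketch below maps numeric ranges and categorical choices onto the `suggest_int`, `suggest_float`, and `suggest_categorical` methods that Optuna's `Trial` objects provide. It assumes each numeric range is a two-element `[low, high]` list, which the table above does not confirm. The helper `suggest_params`, the stand-in `RandomTrial` class, and the parameter names are hypothetical; the stand-in only exists so the example runs without Optuna installed.

```python
import random
from typing import Any, Dict, List, Optional, Union

Number = Union[int, float]


def suggest_params(
    trial: Any,
    numeric: Dict[str, List[Number]],
    categorical: Optional[Dict[str, List[str]]] = None,
) -> Dict[str, Any]:
    """Draw one value per hyperparameter from a trial-like object."""
    params: Dict[str, Any] = {}
    for name, bounds in numeric.items():
        low, high = bounds[0], bounds[1]   # assumed [low, high] layout
        if isinstance(low, int) and isinstance(high, int):
            params[name] = trial.suggest_int(name, low, high)
        else:
            params[name] = trial.suggest_float(name, low, high)
    for name, choices in (categorical or {}).items():
        params[name] = trial.suggest_categorical(name, choices)
    return params


class RandomTrial:
    """Stand-in with the same suggest_* method names as an Optuna trial."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)

    def suggest_int(self, name: str, low: int, high: int) -> int:
        return self._rng.randint(low, high)

    def suggest_float(self, name: str, low: float, high: float) -> float:
        return self._rng.uniform(low, high)

    def suggest_categorical(self, name: str, choices: List[str]) -> str:
        return self._rng.choice(choices)


# Hypothetical search space, for illustration only.
params = suggest_params(
    RandomTrial(seed=0),
    numeric={"lr": [1e-4, 1e-2], "batch": [16, 128]},
    categorical={"opt": ["adam", "sgd"]},
)
print(params)
```

Inside a real Optuna objective function, the `trial` argument passed by `study.optimize` could be handed to `suggest_params` in place of `RandomTrial`.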
Pruning methods
PQuantML supports seven different pruning methods.
Method Overview
| Method | Model |
|---|---|
|  |  |
|  |  |
|  |  |
|  |  |
|  |  |
|  |  |
|  |  |
The following parameters are shared by all methods:

| Field | Type | Default | Description |
|---|---|---|---|
|  | List[str] |  | Layer names to exclude from pruning. |
|  | bool |  | Master pruning on/off switch. |
|  | float |  | Optional pruning threshold decay term. |
Note
Layer names in the disable_pruning_for_layers field must match your framework’s naming (e.g., Keras layer.name).
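Because a misspelled entry would not match any layer, it can help to check the list before training. The following is a small sketch of such a check, not part of PQuantML; the helper `unknown_layer_names` and the layer names are hypothetical.

```python
from typing import List


def unknown_layer_names(excluded: List[str], model_layer_names: List[str]) -> List[str]:
    """Return the entries of `excluded` that match no layer name in the model."""
    known = set(model_layer_names)
    return [name for name in excluded if name not in known]


# Hypothetical names. With Keras, the model's names could be collected as
# [layer.name for layer in model.layers].
model_layer_names = ["conv1", "bn1", "dense_out"]
excluded = ["dense_out", "Dense_Out"]

print(unknown_layer_names(excluded, model_layer_names))  # ['Dense_Out']
```

The comparison is case-sensitive on purpose, since framework layer names are case-sensitive as well.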
Details for each pruning method follow:
CS Pruning
| Field | Type | Default | Description |
|---|---|---|---|
|  | str |  | Selects this pruning schema. |
|  | int |  | Target temperature at the end of the schedule. |
|  | int |  | Initial sparsification threshold. |
DST Pruning
| Field | Type | Default | Description |
|---|---|---|---|
|  | str |  | Selects this pruning schema. |
|  | float |  | Mask dynamics update coefficient. |
|  | float |  | Upper bound on total pruning ratio. |
|  | float |  | Initial threshold value. |
|  | str |  | Thresholding granularity. |
PDP Pruning
| Field | Type | Default | Description |
|---|---|---|---|
|  | str |  | Selects this pruning schema. |
|  | float |  | Smoothing/regularization factor for gating. |
|  | float |  | Target sparsity level (0–1). |
|  | float |  | Annealing temperature. |
|  | bool |  | Enable structured pruning. |
Wanda Pruning
| Field | Type | Default | Description |
|---|---|---|---|
|  | str |  | Selects this pruning schema. |
|  | Optional[int] |  | Optional grouping constant. |
|  | Optional[int] |  | Optional grouping constant. |
|  | float |  | Target sparsity level (0–1). |
|  | int |  | Window size / steps for stats collection. |
|  | int |  | Warm-up steps before collecting statistics. |
|  | bool |  | Auto-compute pruning budget from data. |
Autosparse Pruning
| Field | Type | Default | Description |
|---|---|---|---|
|  | str |  | Selects this pruning schema. |
|  | float |  | Weight/penalty coefficient. |
|  | int |  | Epoch at which |
|  | int |  | Number of epochs in the tuning window. |
|  | bool |  | Apply sparsity in backward pass (if supported). |
|  | float |  | Initial threshold (often in logit space). |
|  | str |  | Thresholding granularity. |
Activation Pruning
| Field | Type | Default | Description |
|---|---|---|---|
|  | str |  | Selects this pruning schema. |
|  | float |  | Activation magnitude cutoff. |
|  | int |  | Steps used to aggregate statistics. |
|  | int |  | Steps to skip before collecting statistics. |
MDMM Pruning
| Field | Type | Default | Description |
|---|---|---|---|
|  | str |  | Selects this pruning schema. |
|  | ConstraintType |  | Constraint form: equality / ≤ / ≥. |
|  | float |  | Target value for the chosen metric. |
|  | MetricType |  | Specifies which metric is constrained. |
|  | float |  | Target sparsity when constraining sparsity. |
|  | int |  | Regularization / frequency parameter. |
|  | float |  | Feasibility tolerance. |
|  | float |  | Penalty scaling for constraint violation. |
|  | float |  | Damping term for numerical stability. |
|  | bool |  | Use gradient information during updates. |
|  |  |  | L0 approximation mode. |
|  |  |  | Aggregation mode for penalties. |
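To show how a constraint form, a target value, and a damping term typically interact, here is a generic sketch of the penalty used in the modified differential method of multipliers: a multiplier term plus a quadratic damping term on the constraint violation. It is a textbook-style illustration, not PQuantML's implementation. The function names, the string labels for the constraint forms, and the argument names are all hypothetical.

```python
def constraint_violation(value: float, target: float, kind: str) -> float:
    """Signed or one-sided violation of a constraint on `value`.

    kind (hypothetical labels):
      "equality"      -> value - target
      "less_equal"    -> positive only when value > target
      "greater_equal" -> positive only when value < target
    """
    if kind == "equality":
        return value - target
    if kind == "less_equal":
        return max(0.0, value - target)
    if kind == "greater_equal":
        return max(0.0, target - value)
    raise ValueError(f"unknown constraint kind: {kind}")


def mdmm_penalty(violation: float, multiplier: float, damping: float) -> float:
    """Generic MDMM term: multiplier * g + (damping / 2) * g**2."""
    return multiplier * violation + 0.5 * damping * violation ** 2


# Example: require sparsity >= 0.8 while the model currently sits at 0.7.
g = constraint_violation(0.7, 0.8, "greater_equal")
print(mdmm_penalty(g, multiplier=1.0, damping=10.0))
```

In the full method the multiplier is itself updated by gradient ascent on the violation, while the model weights follow gradient descent on the task loss plus this penalty; the quadratic term damps the oscillations this would otherwise cause.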
Optionally, the FITCompress method is also available, implemented for PyTorch:
FitCompress method
| Field | Type | Default | Description |
|---|---|---|---|
|  | bool |  | Master switch that enables or disables FITCompress. |
|  | bool |  | Whether FITCompress searches over quantization bit-width candidates. |
|  | List[float] |  | Candidate bit-widths evaluated during quantization search. |
|  | dict |  | Logarithmic pruning curve (base 10) with defined start, end, and step count. |
|  | float |  | Target compression ratio for the search procedure. |
|  | bool |  | Whether FITCompress searches over pruning ratios. |
|  | bool |  | Disable fallback in A* search: once a candidate is selected, all others discarded. |
|  | bool |  | Use Fisher Trace approximations to speed up FIT score estimation. |
|  | float |  | Multiplicative factor λ in the distance function (g + λf). |
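The logarithmic pruning curve can be pictured as a list of values spaced evenly in their base-10 exponent. The sketch below assumes that start and end are exponents (so the curve runs from 10^start to 10^end) and that both endpoints are included; the table above does not confirm either point, and the function name `log_pruning_curve` is hypothetical.

```python
from typing import List


def log_pruning_curve(start: float, end: float, steps: int) -> List[float]:
    """Return `steps` values from 10**start to 10**end, evenly spaced in exponent."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if steps == 1:
        return [10.0 ** start]
    delta = (end - start) / (steps - 1)   # exponent increment per step
    return [10.0 ** (start + j * delta) for j in range(steps)]


# Three points between 10**-2 and 10**0, i.e. roughly 0.01, 0.1, and 1.0.
print(log_pruning_curve(-2, 0, 3))
```

Spacing candidates this way puts more of them at the small end of the range than a linear grid would.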
Quantization layers in PQuantML
- PQConv*D: Convolutional layers.
- PQAvgPool*D: Average pooling layers.
- PQBatchNorm*D: BatchNorm layers.
- PQDense: Linear layer.
- PQActivation: Activation layers (ReLU, Tanh).
Note
Currently, PQuantML supports two quantization modes: layer-wise fixed-point quantization, where each tensor uses a single bit-width configuration, and High-Granularity Quantization (HGQ).