Usage Reference

Config file

The most important part of the library is a user-defined config yaml file. It has five separate sections: training, pruning, quantization, finetuning, and fitcompress section, currently maintained by TensorFlow only, parameters. By default, the parameters in the config are the following:

Training parameters

The following table outlines the primary parameters used to configure the training process:

Field	Type	Default	Description
`epochs`	int	`200`	Total number of training epochs.
`fine_tuning_epochs`	int	`0`	Additional epochs for fine-tuning.
`pretraining_epochs`	int	`50`	Pretraining / warm-up epochs.
`rewind`	str	`"never"`	Weight rewinding policy.
`rounds`	int	`1`	Number of prune–fine-tune cycles.
`save_weights_epoch`	int	`-1`	Save checkpoint at this epoch (`-1` disables).

Note

If you require additional parameters for the training or optimization loops, please define them directly in the config.yaml file.

Quantization parameters

Field	Type	Default	Description
`default_data_keep_negatives`	bool	`0`	Default k value for data quantization (0 = clamp negatives, 1 = keep).
`default_data_integer_bits`	int	`0`	Default integer bitwidth i for data quantization.
`default_data_fractional_bits`	int	`0`	Default fractional bitwidth f for data quantization.
`default_weight_keep_negatives`	bool	`0`	Default k value for weight quantization (0 or 1).
`default_weight_integer_bits`	int	`0`	Default integer bitwidth i for weight quantization.
`default_weight_fractional_bits`	int	`0`	Default fractional bitwidth f for weight quantization.
`quantize_input`	bool	`true`	Whether inputs to layers are quantized by default.
`quantize_output`	bool	`true`	Whether outputs of layers are quantized by default.
`enable_quantization`	bool	`true`	Global switch to enable or disable quantization.
`hgq_gamma`	float	`0.0`	HGQ regularization coefficient for bitwidth stability.
`hgq_beta`	float	`0.0`	HGQ loss coefficient scaling EBOPs.
`layer_specific`	dict	`{}`	Dictionary for per-layer quantization overrides.
`use_hgq`	bool	`false`	Enable or disable High Granularity Quantization (HGQ).
`use_real_tanh`	bool	`false`	Use a real `tanh` instead of hard/approximate `tanh`.
`overflow`	str	`"SAT"`	Overflow handling mode (`SAT`, `SAT_SYM`, `WRAP`, `WRAP_SM`).
`round_mode`	str	`"RND"`	Rounding mode (`TRN`, `RND`, `RND_CONV`, `RND_ZERO`, etc.).
`use_relu_multiplier`	bool	`true`	Enable a learned bit-shift multiplier inside ReLU layers.

Fine-tuning parameters

Field	Type	Default	Description
`experiment_name`	str	`"experiment_1"`	Name of the study.
`model_name`	str	`"resnet18"`	Model architecture name.
`sampler`	str	`GridSampler`	Sampler selection for the search space.
`num_trials`	int	`0`	Number of trials.
`hyperparameter_search`	HyperparameterSearch	`{}`	Ranges for non-grid samplers.

Samplers

Field	Type	Default	Description
`type`	str	`"TPESampler"`	Sampler class name (e.g., `TPESampler`, `GridSampler`).
`params`	Dict[str, Any]	`{}`	Sampler-specific kwargs (e.g., `seed`, `search_space`).

More about samplers can be found in {optuna documentation}

HyperparameterSearch

Field	Type	Default	Description
`numerical`	Dict[str, List[Union[int, float]]]	`{}`	Numeric ranges `[low, high, step]`.
`categorical`	Optional[Dict[str, List[str]]]	`{}`	Categorical choices.

Pruning methods

PQuantML supports seven different pruning methods.

Method Overview

Method	Model
`cs`	`CSPruningModel`
`dst`	`DSTPruningModel`
`pdp`	`PDPPruningModel`
`wanda`	`WandaPruningModel`
`autosparse`	`AutoSparsePruningModel`
`activation_pruning`	`ActivationPruningModel`
`mdmm`	`MDMMPruningModel`

There are the parameters shared by all methods:

Field	Type	Default	Description
`disable_pruning_for_layers`	List[str]	`[]`	Layer names to exclude from pruning.
`enable_pruning`	bool	`true`	Master pruning on/off switch.
`threshold_decay`	float	`0.0`	Optional pruning threshold decay term.

Note

Layer names in disable_pruning_for_layers field must match your framework’s naming (e.g., Keras layer.name).

There are more details about every pruning method:

CS Pruning

Field	Type	Default	Description
`pruning_method`	str	`cs`	Selects this pruning schema.
`final_temp`	int	`200`	Target temperature at the end of the schedule.
`threshold_init`	int	`0`	Initial sparsification threshold.

DST Pruning

Field	Type	Default	Description
`pruning_method`	str	`dst`	Selects this pruning schema.
`alpha`	float	`5.0e-06`	Mask dynamics update coefficient.
`max_pruning_pct`	float	`0.99`	Upper bound on total pruning ratio.
`threshold_init`	float	`0.0`	Initial threshold value.
`threshold_type`	str	`"channelwise"`	Thresholding granularity.

PDP Pruning

Field	Type	Default	Description
`pruning_method`	str	`pdp`	Selects this pruning schema.
`epsilon`	float	`0.015`	Smoothing/regularization factor for gating.
`sparsity`	float	`0.8`	Target sparsity level (0–1).
`temperature`	float	`1.0e-05`	Annealing temperature.
`structured_pruning`	bool	`false`	Enable structured pruning.

Wanda Pruning

Field	Type	Default	Description
`pruning_method`	str	`wanda`	Selects this pruning schema.
`M`	Optional[int]	`null`	Optional grouping constant.
`N`	Optional[int]	`null`	Optional grouping constant.
`sparsity`	float	`0.9`	Target sparsity level (0–1).
`t_delta`	int	`100`	Window size / steps for stats collection.
`t_start_collecting_batch`	int	`100`	Warm-up steps before collecting statistics.
`calculate_pruning_budget`	bool	`true`	Auto-compute pruning budget from data.

Autosparse Pruning

Field	Type	Default	Description
`pruning_method`	str	`autosparse`	Selects this pruning schema.
`alpha`	float	`0.5`	Weight/penalty coefficient.
`alpha_reset_epoch`	int	`90`	Epoch at which `alpha` is reset/tuned.
`autotune_epochs`	int	`10`	Number of epochs in the tuning window.
`backward_sparsity`	bool	`false`	Apply sparsity in backward pass (if supported).
`threshold_init`	float	`-5.0`	Initial threshold (often in logit space).
`threshold_type`	str	`"channelwise"`	Thresholding granularity.

Activation Pruning

Field	Type	Default	Description
`pruning_method`	str	`activation_pruning`	Selects this pruning schema.
`threshold`	float	`0.3`	Activation magnitude cutoff.
`t_delta`	int	`50`	Steps used to aggregate statistics.
`t_start_collecting_batch`	int	`50`	Steps to skip before collecting statistics.

MDMM Pruning

Field	Type	Default	Description
`pruning_method`	str	`mdmm`	Selects this pruning schema.
`constraint_type`	ConstraintType	`"Equality"`	Constraint form: equality / ≤ / ≥.
`target_value`	float	`0.0`	Target value for the chosen metric.
`metric_type`	MetricType	`"UnstructuredSparsity"`	Specifies which metric is constrained.
`target_sparsity`	float	`0.9`	Target sparsity when constraining sparsity.
`rf`	int	`1`	Regularization / frequency parameter.
`epsilon`	float	`1.0e-03`	Feasibility tolerance.
`scale`	float	`10.0`	Penalty scaling for constraint violation.
`damping`	float	`1.0`	Damping term for numerical stability.
`use_grad`	bool	`false`	Use gradient information during updates.
`l0_mode`	`"coarse"` \| `"smooth"`	`"coarse"`	L0 approximation mode.
`scale_mode`	`"mean"` \| `"sum"`	`"mean"`	Aggregation mode for penalties.

Optionally, there is also FITCompress method implemented for PyTorch:

FitCompress method

Field	Type	Default	Description
`enable_fitcompress`	bool	`false`	Master switch that enables or disables FITCompress.
`optimize_quantization`	bool	`true`	Whether FITCompress searches over quantization bit-width candidates.
`quantization_schedule`	List[float]	`[7., 4., 3., 2.]`	Candidate bit-widths evaluated during quantization search.
`pruning_schedule`	dict	`{start: 0, end: -3, steps: 40}`	Logarithmic pruning curve (base 10) with defined start, end, and step count.
`compression_goal`	float	`0.10`	Target compression ratio for the search procedure.
`optimize_pruning`	bool	`false`	Whether FITCompress searches over pruning ratios.
`greedy_astar`	bool	`true`	Disable fallback in A* search: once a candidate is selected, all others discarded.
`approximate`	bool	`true`	Use Fisher Trace approximations to speed up FIT score estimation.
`f_lambda`	float	`1`	Multiplicative factor λ in the distance function (g + λf).

Quantization layers in PQuantML

PQConv*D: Convolutional layers.
PQAvgPool*D: Average pooling layers.
PQBatchNorm*D: BatchNorm layers.
PQDense: Linear layer.
PQActivation: Activation layers (ReLU, Tanh)

Note

Currently, PQuantML supports two quantization modes: layer-wise fixed-point quantization, where each tensor uses a single bit-width configuration, and High-Granularity Quantization (HGQ).