diff --git a/docs-guides/source/mlmodel-utilities.md b/docs-guides/source/mlmodel-utilities.md
index 4c22b9e95..0b0f54ad3 100644
--- a/docs-guides/source/mlmodel-utilities.md
+++ b/docs-guides/source/mlmodel-utilities.md
@@ -120,7 +120,7 @@ optimization of the model via the `ct.optimize.coreml` API.
 
 ### Using the Metadata
 
-The [`get_weights_metadata()`](https://apple.github.io/coremltools/source/coremltools.optimize.coreml.post_training_quantization.html#coremltools.optimize.coreml.get_weights_metadata) utility returns the weights metadata as an ordered dictionary that maps to strings in [CoreMLWeightMetaData](https://apple.github.io/coremltools/source/coremltools.optimize.coreml.post_training_quantization.html#coremltools.optimize.coreml.CoreMLWeightMetaData) and preserves the sequential order of the weights. The results are useful when constructing [`cto.OptimizationConfig`](https://apple.github.io/coremltools/docs-guides/source/optimizecoreml-api-overview.html#customizing-ops-to-compress).
+The [`get_weights_metadata()`](https://apple.github.io/coremltools/source/coremltools.optimize.coreml.post_training_quantization.html#coremltools.optimize.coreml.get_weights_metadata) utility returns the weights metadata as an ordered dictionary that maps weight names (strings) to [CoreMLWeightMetaData](https://apple.github.io/coremltools/source/coremltools.optimize.coreml.post_training_quantization.html#coremltools.optimize.coreml.CoreMLWeightMetaData) values and preserves the sequential order of the weights. The results are useful when constructing [`cto.coreml.OptimizationConfig`](https://apple.github.io/coremltools/docs-guides/source/optimizecoreml-api-overview.html#customizing-ops-to-compress).
 
 For example, with the [OptimizationConfig](https://apple.github.io/coremltools/source/coremltools.optimize.coreml.config.html#coremltools.optimize.coreml.OptimizationConfig) class you have fine-grain control over applying different optimization configurations to different weights by directly setting `op_type_configs` and `op_name_configs` or using [`set_op_name`](https://apple.github.io/coremltools/source/coremltools.optimize.coreml.config.html#coremltools.optimize.coreml.OptimizationConfig.set_op_name) and [`set_op_type`](https://apple.github.io/coremltools/source/coremltools.optimize.coreml.config.html#coremltools.optimize.coreml.OptimizationConfig.set_op_type). When using [`set_op_name`](https://apple.github.io/coremltools/source/coremltools.optimize.coreml.config.html#coremltools.optimize.coreml.OptimizationConfig.set_op_name), you need to know the name for the `const` op that produces the weight. The `get_weights_metadata()` utility provides the weight name and the corresponding weight numpy data, along with metadata information.
 
@@ -132,7 +132,7 @@ The following code loads the `SegmentationModel_with_metadata.mlpackage` saved i
 The example also shows how to get the name of the last weight in the model.
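The selection logic that this diff's docs rely on is simply Python's insertion-order-preserving dictionaries. A minimal stdlib-only sketch (the dictionary below is a hypothetical stand-in; the real `get_weights_metadata()` returns `CoreMLWeightMetaData` values, not plain dicts):

```python
# Hypothetical stand-in for the ordered dict returned by get_weights_metadata().
# Keys are the names of the const ops that produce each weight, in model order.
weight_metadata_dict = {
    "conv1_weight": {"val_size": 9408},
    "conv2_weight": {"val_size": 36864},
    "classifier_weight": {"val_size": 21000},
}

# Insertion order is preserved, so the last key is the last weight in the model.
last_weight_name = list(weight_metadata_dict)[-1]

# Mapping an op name to None in op_name_configs tells the optimizer to skip it.
op_name_configs = {last_weight_name: None}
```

This `op_name_configs` dict is what gets passed into the `OptimizationConfig` constructor shown in the example below.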
 The code palettizes all ops except the last weight, which is a common practical scenario when the last layer is more sensitive and should be skipped from quantization:
 
 ```python
-import coremltools.optimize.coreml as cto
+import coremltools.optimize as cto
 from coremltools.models import MLModel
 from coremltools.optimize.coreml import get_weights_metadata
@@ -164,11 +164,11 @@ for weight_name, weight_metadata in weight_metadata_dict.items():
 
 # Palettize all weights except for the last weight
 last_weight_name = list(weight_metadata_dict.keys())[-1]
-global_config = cto.OpPalettizerConfig(nbits=6, mode="kmeans")
-config = cto.OptimizationConfig(
+global_config = cto.coreml.OpPalettizerConfig(nbits=6, mode="kmeans")
+config = cto.coreml.OptimizationConfig(
     global_config=global_config, op_name_configs={last_weight_name: None},
 )
-compressed_mlmodel = cto.palettize_weights(mlmodel, config)
+compressed_mlmodel = cto.coreml.palettize_weights(mlmodel, config)
 ```
 
diff --git a/docs-guides/source/opt-palettization-api.md b/docs-guides/source/opt-palettization-api.md
index 1d7f97a8d..aca5b9122 100644
--- a/docs-guides/source/opt-palettization-api.md
+++ b/docs-guides/source/opt-palettization-api.md
@@ -22,19 +22,19 @@ The following example shows `6-bit` palettization applied to all the ops which h
 This is controlled by setting the `weight_threshold` parameter to 512.
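For intuition about what the `nbits=6` palettization in these docs actually does to a weight tensor: each weight is replaced by an index into a lookup table (LUT) of at most `2**nbits` values. A stdlib-only sketch, not the coremltools implementation (coremltools uses k-means to place the LUT entries; a uniform grid stands in for it here):

```python
# Illustrative n-bit palettization: weights -> (LUT, per-weight indices).
# A uniform grid stands in for the k-means clustering coremltools uses.
def palettize(weights, nbits):
    lo, hi = min(weights), max(weights)
    n = 2 ** nbits
    step = (hi - lo) / (n - 1)
    lut = [lo + i * step for i in range(n)]          # at most 2**nbits entries
    indices = [round((w - lo) / step) for w in weights]
    return lut, indices

weights = [0.03 * i - 0.7 for i in range(500)]       # toy weight values
lut, indices = palettize(weights, nbits=6)

# Decoding maps each index back through the LUT; the reconstruction error
# is bounded by half the spacing between LUT entries.
decoded = [lut[i] for i in indices]
max_err = max(abs(a - b) for a, b in zip(weights, decoded))
```

With `nbits=6` the 500 distinct weights collapse to at most 64 representable values, which is why the compressed tensor can be stored as 6-bit indices plus a small table.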
 ```python
 import coremltools as ct
-import coremltools.optimize.coreml as cto
+import coremltools.optimize as cto
 
 # load model
 mlmodel = ct.models.MLModel(uncompressed_model_path)
 
 # define op config
-op_config = cto.OpPalettizerConfig(nbits=6, weight_threshold=512)
+op_config = cto.coreml.OpPalettizerConfig(nbits=6, weight_threshold=512)
 
 # define optimization config by applying the op config globally to all ops
-config = cto.OptimizationConfig(global_config=op_config)
+config = cto.coreml.OptimizationConfig(global_config=op_config)
 
 # palettize weights
-compressed_mlmodel = cto.palettize_weights(mlmodel, config)
+compressed_mlmodel = cto.coreml.palettize_weights(mlmodel, config)
 ```
 
 Some key parameters that the config accepts are:
 
 - `n_bits` : This controls the number of clusters, which are `2^n_bits` .
@@ -54,18 +54,18 @@ to `8-bits`, and two of the conv ops (named `conv1` and `conv3`) are omitted fro
 
 ```python
 import coremltools as ct
-import coremltools.optimize.coreml as cto
+import coremltools.optimize as cto
 
 mlmodel = ct.models.MLModel(uncompressed_model_path)
 
-global_config = cto.OpPalettizerConfig(nbits=6)
-linear_config = cto.OpPalettizerConfig(nbits=8)
-config = cto.OptimizationConfig(
+global_config = cto.coreml.OpPalettizerConfig(nbits=6)
+linear_config = cto.coreml.OpPalettizerConfig(nbits=8)
+config = cto.coreml.OptimizationConfig(
     global_config=global_config,
     op_type_configs={"linear": linear_config},
     op_name_configs={"conv1": None, "conv3": None},
 )
-compressed_mlmodel = cto.palettize_weights(mlmodel, config)
+compressed_mlmodel = cto.coreml.palettize_weights(mlmodel, config)
 ```
 
 For more details, please follow the detailed API page for [coremltools.optimize.coreml.palettize_weights](https://apple.github.io/coremltools/source/coremltools.optimize.coreml.post_training_quantization.html#coremltools.optimize.coreml.palettize_weights)
 
diff --git a/docs-guides/source/opt-quantization-api.md b/docs-guides/source/opt-quantization-api.md
index 2eb013cc1..52b18f798 100644
--- a/docs-guides/source/opt-quantization-api.md
+++ b/docs-guides/source/opt-quantization-api.md
@@ -9,12 +9,14 @@ You can linearly quantize the weights of your Core ML model by using the
 [``linear_quantize_weights``](https://apple.github.io/coremltools/source/coremltools.optimize.coreml.post_training_quantization.html#coremltools.optimize.coreml.linear_quantize_weights) method as follows:
 
 ```python
-import coremltools.optimize.coreml as cto
+import coremltools.optimize as cto
 
-op_config = cto.OpLinearQuantizerConfig(mode="linear_symmetric", weight_threshold=512)
-config = cto.OptimizationConfig(global_config=op_config)
+op_config = cto.coreml.OpLinearQuantizerConfig(
+    mode="linear_symmetric", weight_threshold=512
+)
+config = cto.coreml.OptimizationConfig(global_config=op_config)
 
-compressed_8_bit_model = cto.linear_quantize_weights(model, config=config)
+compressed_8_bit_model = cto.coreml.linear_quantize_weights(model, config=config)
 ```
 
 The method defaults to ``linear_symmetric``, which uses only per-channel scales and no zero-points.
 
diff --git a/docs-guides/source/opt-workflow.md b/docs-guides/source/opt-workflow.md
index 665f5fe30..f9791c6db 100644
--- a/docs-guides/source/opt-workflow.md
+++ b/docs-guides/source/opt-workflow.md
@@ -134,15 +134,15 @@ followed by data free palettization etc.
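The "per-channel scales and no zero-points" behavior of `linear_symmetric` mode mentioned in the quantization doc above can be sketched in a few lines of stdlib Python (a simplified illustration of the math, not the coremltools implementation):

```python
# Per-channel symmetric 8-bit quantization: one scale per output channel,
# scale = max|w| / 127, no zero-point (symmetric around zero).
def quantize_symmetric(channel):
    scale = max(abs(w) for w in channel) / 127.0
    q = [round(w / scale) for w in channel]   # integer codes in [-127, 127]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weight = [
    [0.5, -1.0, 0.25, 0.75],    # channel 0: large dynamic range
    [0.02, -0.04, 0.01, 0.03],  # channel 1: small range gets its own scale
]
quantized = [quantize_symmetric(ch) for ch in weight]
restored = [dequantize(q, s) for q, s in quantized]
```

Keeping a separate scale per channel is what lets the small-magnitude channel keep its precision instead of being swamped by the large one.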
 Sample pseudocode of applying palettization to an `mlpackage` model:
 
 ```python
 import coremltools as ct
-import coremltools.optimize.coreml as cto
+import coremltools.optimize as cto
 
 mlmodel = ct.models.MLModel(uncompressed_model_path)
 
-op_config = cto.OpPalettizerConfig(mode="kmeans",
+op_config = cto.coreml.OpPalettizerConfig(mode="kmeans",
                                    nbits=4,
                                    granularity="per_grouped_channel",
                                    group_size=16)
-model_config = cto.OptimizationConfig(global_config=op_config)
-compressed_mlmodel = cto.palettize_weights(mlmodel, model_config)
+model_config = cto.coreml.OptimizationConfig(global_config=op_config)
+compressed_mlmodel = cto.coreml.palettize_weights(mlmodel, model_config)
 ```
 
 Sample pseudocode of applying palettization to a torch model:
 
@@ -191,7 +191,7 @@ Quantizing activations can be applied either to the torch model, or directly to
 an `mlpackage` model as well. Sample pseudocode snippet to do so:
 
 ```python
 import coremltools as ct
-import coremltools.optimize.coreml as cto
+import coremltools.optimize as cto
 # The following API is for coremltools==8.0b1
 # It will be moved out of "experimental" in later versions of coremltools
 from coremltools.optimize.coreml.experimental import OpActivationLinearQuantizerConfig, \
@@ -201,16 +201,16 @@ mlmodel = ct.models.MLModel(uncompressed_model_path)
 
 # quantize activations to 8 bits (this will give an A8W16 model)
 act_quant_op_config = OpActivationLinearQuantizerConfig(mode="linear_symmetric")
-act_quant_model_config = cto.OptimizationConfig(global_config=act_quant_op_config)
+act_quant_model_config = cto.coreml.OptimizationConfig(global_config=act_quant_op_config)
 mlmodel_compressed_activations = linear_quantize_activations(mlmodel,
                                                              act_quant_model_config,
                                                              sample_data=...)
 
 # quantize weights to 8 bits (this will give an A8W8 model)
-weight_quant_op_config = cto.OpLinearQuantizerConfig(mode="linear_symmetric",
+weight_quant_op_config = cto.coreml.OpLinearQuantizerConfig(mode="linear_symmetric",
                                                      dtype="int8")
-weight_quant_model_config = cto.OptimizationConfig(weight_quant_op_config)
-mlmodel_compressed = cto.linear_quantize_weights(mlmodel_compressed_activations,
+weight_quant_model_config = cto.coreml.OptimizationConfig(weight_quant_op_config)
+mlmodel_compressed = cto.coreml.linear_quantize_weights(mlmodel_compressed_activations,
                                                  weight_quant_model_config)
 ```
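The `sample_data` argument in the activation-quantization snippet above exists because activation ranges, unlike weights, are only known by running representative inputs through the model. A simplified stdlib-only sketch of that calibration idea (not the coremltools implementation; the real API operates on model intermediates, not raw lists):

```python
# Calibration: observe representative activations, then derive a symmetric
# 8-bit scale from the largest magnitude seen (scale only, no zero-point).
def calibrate_scale(sample_batches):
    observed_max = max(abs(x) for batch in sample_batches for x in batch)
    return observed_max / 127.0

def quantize_activation(x, scale):
    # Clamp so out-of-calibration-range values saturate instead of overflowing.
    return max(-127, min(127, round(x / scale)))

samples = [[0.1, -2.5, 0.9], [3.0, -0.4, 1.2]]   # toy calibration batches
scale = calibrate_scale(samples)
q = quantize_activation(2.9, scale)
```

Combining this activation scheme (A8) with the 8-bit weight quantization in the last hunk is what the docs refer to as an A8W8 model.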