fix torch.linspace #2416

Conversation
Thanks for looking into this. Please add your unit test to the existing test suite. To get things running locally, see our Building from Source document.
This existing test fails. Here is a minimal example of where the infs start coming in:

```python
arange = mb.range_1d(start=0.0, end=2_000_000.0, step=1.0)  # no infs
res = mb.add(x=arange, y=1.0)  # now we have infs! (same also if we do mul instead of add)
```

Do you have an idea how to avoid that?

Based on the test name "fp16", this sounds somewhat expected: 2_000_000 > 65_504?
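For context, a minimal numpy sketch of the overflow being described; numpy stands in here for the fp16 arithmetic the converted model performs:

```python
import numpy as np

# float16 can represent finite values only up to 65504
print(np.finfo(np.float16).max)   # 65504.0

# so casting 2_000_000 to float16 overflows to inf
print(np.float16(2_000_000.0))    # inf

# an fp32 range up to 2e6 is fine, but forcing it through float16 yields infs
arange = np.arange(0.0, 2_000_000.0, dtype=np.float32)
print(np.isinf(arange).any())                      # False
print(np.isinf(arange.astype(np.float16)).any())   # True
```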
Does this test pass on main?
Yes, it works on main and fails on this PR.
This PR is closer to the pytorch and numpy behavior:

```python
import numpy as np
import torch

torch.linspace(0, 2_000_000, 2_000_000).to(dtype=torch.float16)
# tensor([0., 1., 2., ..., inf, inf, inf], dtype=torch.float16)

torch.linspace(0, 2_000_000, 2_000_000, dtype=torch.float16)
# RuntimeError: value cannot be converted to type at::Half without overflow

np.linspace(0, 2_000_000, 2_000_000, dtype=np.float16)
# array([ 0., 1., 2., ..., inf, inf, inf], dtype=float16)

np.linspace(0, 2_000_000, 2_000_000).astype(np.float16)
# array([ 0., 1., 2., ..., inf, inf, inf], dtype=float16)
```

It might be good to explicitly cast the actual and expected output in …

Edit: this static test ends up in the dynamic code path because …
I extended an existing test to trigger the reported bug on main (these tests pass on this PR).
I xfailed the test on fp16 for now. Converting the pytorch expected results to float16 in …
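A hedged sketch of what the xfail could look like (the test name and the overflow check are assumptions, not the PR's actual diff):

```python
import numpy as np
import pytest

END = 2_000_000.0  # the end value that overflows float16

def test_linspace_fp16():
    if END > np.finfo(np.float16).max:
        pytest.xfail("end value overflows float16 (max finite value is 65504)")
    # ... conversion and comparison would go here ...
```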
Since 2 million cannot be represented using fp16, rather than xfailing those tests, it would be better to change the model being converted so it does not use values that high. Any idea why we were not running into this issue before your change?
Unfortunately not. I don't understand the casting magic in coremltools: maybe you do some operations in fp32 and cast them later in future operations. The previous code might have created the results in fp32 and had no operations afterwards. The new code creates the array and then scales and shifts it (which might involve the actual casts to fp16)? Just speculating :)
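The order of casting can indeed change the result; a small numpy illustration of the effect being speculated about (numpy again standing in for the MIL ops):

```python
import numpy as np

x, y = np.float32(65500.0), np.float32(16.0)

# compute in fp32 and cast only the final result: stays finite
print(np.float16(x + y))              # 65504.0

# cast the operands to fp16 first, then add: overflows to inf
print(np.float16(x) + np.float16(y))  # inf (with an overflow warning)
```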
Changed: using the largest integer representable in float16; the test passes now :)
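For reference, a quick numpy check of the two float16 limits that could be meant by "largest integer" here:

```python
import numpy as np

# the largest finite float16 value (an integer)
print(np.finfo(np.float16).max)  # 65504.0

# above 2048 the spacing between consecutive float16 values exceeds 1,
# so 2048 is the largest bound below which every integer is exact
print(np.float16(2048.0))        # 2048.0
print(np.float16(2049.0))        # 2048.0 -- rounded, no longer exact
```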
fixes #2412

@TobyRoseman where should I add test cases for this? I grep'ed through the tests; there are currently none for `torch.linspace`. And a quick tutorial on how to get the tests running locally would be great, too :) I'm getting `RuntimeError: BlobWriter not loaded` when calling `ct.convert`.
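For anyone reproducing this, a minimal conversion sketch, assuming coremltools is built from source per the Building from Source document (the model and input names are illustrative, not from the PR):

```python
import torch
import coremltools as ct

class LinspaceModel(torch.nn.Module):
    def forward(self, x):
        # linspace with static bounds; added to x so the graph has an input
        return torch.linspace(0.0, 1.0, steps=5) + x

example = torch.zeros(5)
traced = torch.jit.trace(LinspaceModel().eval(), example)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="x", shape=example.shape)],
)
```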