
add Ops.convolution #428

Draft · wants to merge 3 commits into main
Conversation

glou-nes
Contributor

@glou-nes glou-nes commented Dec 27, 2024

Add convolution, using the StableHLO function that computes the output tensor type (the one used by the verifier/interpreter). I add a structure because I hit a limit with ccall:

extern "C" void f(long a, long b, long c, long d, long e, MlirAttribute dn) {
    auto pp = unwrap(dn);
    pp.dump();
}

extern "C" void f_bad(long a, long b, long c, long d, long e, long f, MlirAttribute dn) {
    auto pp = unwrap(dn);
    pp.dump();
}

The second one segfaults.
@wsmoses do you think this approach is viable?
It will probably need a new file when #421 lands.


Seems reasonable to me but yeah I am confused about the arg thing

@mofeing
Collaborator

mofeing commented Dec 27, 2024

Why do we need to add a function to the C-API?

@glou-nes
Contributor Author

It's using the StableHLO C++ interpreter function to compute the output tensor size, which isn't directly exposed anywhere else. So ReactantExtra seems to be the best place for this.

@mofeing
Collaborator

mofeing commented Dec 28, 2024

i know computing the size of the result of stablehlo.convolution is hard, but can't it be computed from the notes in the spec? the specs are not super clear about how to compute the result shape, but i managed to do it for dot_general and i think i now understand how to do the "shape inference" for fft. it takes time, but i believe it's better to avoid putting stuff in the C-API if we can do it in Julia. if it's not doable, totally in favor of adding stuff to the C-API ofc.

in the spec of stablehlo.convolution, i've found the following constraints that are useful for computing the resulting shape:

  • (C20) is telling you that output_dimensions = [output_batch_dimension] + output_spatial_dimensions + [output_feature_dimension], so you can focus on the different dimensions.
  • (C25) says that for dim(result, result_dim)
    • dim(lhs, input_batch_dimension) / batch_group_count if result_dim = output_batch_dimension
    • dim(rhs, kernel_output_feature_dimension) if result_dim = output_feature_dimension
    • num_windows otherwise (i.e. for output_spatial_dimensions), and there are some equations on how to compute num_windows

so, if you know all the input args and attributes of stablehlo.convolution, you should be able to compute the result shape in Julia, right? correct me if i'm wrong or if there is some extra complexity i'm missing.

@wsmoses i guess we could open an issue in the StableHLO repo to ask for better docs on how to compute result shapes? it would be very useful for other ops too. at least some coherent naming would improve the current situation.

@glou-nes
Contributor Author

Yes, one can completely infer the result size from Julia. I started from Paul's implementation in ReactantNNlibExt.jl, but it was missing several attributes of the generic operation! It's probably not the hardest operation, but it's complex enough: ~100 LoC in C++. I read the spec and it's quite useful, and complex. I don't see why we should redo the computation on the Julia side: this code, inferConvolutionOp, is already executed during the verification pass. The advantages of this approach are getting the verifier error directly from Julia without rewriting the 34 constraints, and getting the full set of features.

@wsmoses
Member

wsmoses commented Dec 28, 2024

... advantages of this approach is to directly get verifier error from Julia without rewriting the 34 constraints and get the fullset of features.

I could go either way on this; to be clear, I'm fine with either approach. The advantage of having a Julia re-implementation of the rules is that it'll be easier to debug shape-mismatch issues on the Julia side. The con is that it requires re-implementation, and of course if things change that's annoying. That said, StableHLO shouldn't really change much, especially here.

Another thing you could do is export this function in the stablehlo C bindings upstream.

@mofeing
Collaborator

mofeing commented Dec 28, 2024

... without rewriting the 34 constraints and get the fullset of features.

but this is the thing: you don't need to write the 34 constraints in Julia. actually, some of these constraints are not coded as assertions in the already available Ops, but you already use them to compute the input arguments/attributes.

in the Ops i wrote myself, i didn't write all the constraints. what i did is write easy checks so that we can quickly catch superficial problems in Julia (e.g. mismatched eltypes or shapes), and i left the deep checks to XLA, which will raise an exception. now that we have propagation of XLA errors to Julia, we can also manage these XLA exceptions.

Con is it requires re implementation, and of course if things change that’s annoying. That said stablehlo shouldn’t really change much especially here.

if there are changes, i'm confident they will most likely be in these deep checks left to XLA, so we won't need to make modifications (probably).

but not computing the result shape is gonna make debugging more difficult when it's sth we can do ourselves.

having said this, i'm not against this PR and i think this is really needed, so thanks @glou-nes for pushing this forward. we can merge it and refactor it in the future if we see that it's a problem.
just wanna make it clear that there are always tradeoffs, and anything we put on the C side will make it harder to debug on the Julia side (not only for us, but also for users)


@wsmoses one thing to keep in mind is that the XLA exceptions will be thrown during the verification step, not during MLIR emission, so unless you've used with_debug, you won't be able to link the XLA exception to a region of user code. also, i don't like the current implementation of mlir_backtrace and with_debug too much: there's too much overhead in calling backtrace, and maybe it's easier if we do something like Profiler.jl does: take a snapshot of the callstack and process it later or in parallel. also, AFAIK the XLA exception will probably be seen as thrown from run_pass_pipeline!, and not from the faulting user code.

@glou-nes
Contributor Author

Thanks @mofeing for the explanation! I just want to make one point: with this, one gets the verification error directly at MLIR emission, for instance:

function bad_conv(x, ker)
    pp = dimension_number(4, 3, 2, 3, 4)
    Reactant.Ops.convolution(x, ker, pp; padding=[1 2; 3 4; 5 6]) # wrong padding here
end
x = Reactant.to_rarray(randn(Float64, 224, 224, 1, 1))
ker = Reactant.to_rarray(randn(Float64, 10, 10, 1, 1))
@code_hlo optimize=false bad_conv(x, ker)

We have:

loc("convolution"("/depot/dev/Reactant/src/Ops.jl":622:0)): error: expects padding-entries to have same dimension-size as size of window dimensions (2), but got: 3.
ERROR: AssertionError: cannot infer result type #assertion in Ops
[stacktrace]

@mofeing
Collaborator

mofeing commented Dec 29, 2024

... with this, one get the verification error, directly in MLIR emission, for instance:

loc("convolution"("/depot/dev/Reactant/src/Ops.jl":622:0)): error: expects padding-entries to have same dimension-size as size of window dimensions (2), but got: 3.
ERROR: AssertionError: cannot infer result type #assertion in Ops
[stacktrace]

you're getting the location of the Ops.jl method, but not the location of the user code that calls Ops.convolution. that's what i'm referring to.

for that, you need to use Ops.with_debug, which currently adds overhead, and the print is quite ugly.

@glou-nes
Contributor Author

glou-nes commented Dec 29, 2024

Sure, adding the stacktrace to every location would make the verifier output hard to read, for instance. Here we get the stacktrace at [stacktrace]; I didn't include it, to keep the example simple. Isn't that what you called "the location of the user code that calls Ops.convolution"?

  [1] convolution(input::TracedRArray{…}, kernel::TracedRArray{…}, dimension_numbers::Attribute; feature_group_count::Int64, batch_group_count::Int64, window_strides::Vector{…}, padding::Matrix{…}, lhs_dilation::Vector{…}, rhs_dilation::Vector{…}, window_reversal::Vector{…}, location::Location)
    @ Reactant.Ops /depot/dev/Reactant/src/Ops.jl:643
  [2] bad_conv
    @ ./REPL[19]:3 [inlined]
  [3] bad_conv(none::TracedRArray{Float64, 4}, none::TracedRArray{Float64, 4})
    @ Reactant ./<missing>:0
  [4] bad_conv
    @ ./REPL[19]:2 [inlined]
  [5] call_with_reactant(::typeof(v), ::TracedRArray{Float64, 4}, ::TracedRArray{Float64, 4})
[...]

3 participants