Graph Pass: scaled_dot_product_attention_sliced_q #2418
base: main
Conversation
Great! Could you also add some concrete numbers in the PR description? For example, the memory and execution time of a model before/after using the sliced-Q algorithm?
The code LGTM. For the CI, could you pin `peft==0.13.2` to see if you could get a green CI?
coremltools/converters/mil/mil/passes/defs/scaled_dot_product_attention_sliced_q.py
Force-pushed from 62daa78 to 05c5fc3
Great work! The perf improvement (especially the memory save) looks really promising!
Nice graph pass!
(Previously I thought the parallelization was done in the kernel 😂 looks like we need to give it a hint haha)
@@ -44,5 +44,6 @@
     optimize_state,
     optimize_tensor_operation,
     preprocess,
+    scaled_dot_product_attention_sliced_q,
Shall we name the source file `transformer.py`? The other names here are "category" names, so a broader categorical name might sound more suitable than a concrete graph-pass name. Wdyt?
For longer Q sequence lengths (typically > 1024), it is beneficial to compute the attention with an algorithm (inspired by Lazy Softmax) that processes Q in chunks. The overall memory usage and execution time should improve (given the chunks are executed concurrently, e.g. on the ANE), and in certain cases where models hit OOM for longer sequence lengths, models using this algorithm still work.
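To make the chunked computation concrete, here is a minimal NumPy sketch of the sliced-Q idea. It only illustrates the math (the function name and signature are made up for illustration; the graph pass itself rewrites MIL operations rather than running Python). Because the softmax is taken along the K axis, each chunk of Q can be processed independently and the per-chunk results concatenated:

```python
import numpy as np

def sdpa_sliced_q(q, k, v, seq_length_divider=16):
    """Sliced-Q scaled dot-product attention (illustrative NumPy version).

    q: (seq_q, d), k: (seq_k, d), v: (seq_k, d_v)
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    chunk_size = max(1, q.shape[0] // seq_length_divider)
    outputs = []
    for start in range(0, q.shape[0], chunk_size):
        q_chunk = q[start:start + chunk_size]            # one slice of Q
        scores = (q_chunk @ k.T) * scale                  # (chunk, seq_k)
        scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the K axis
        outputs.append(weights @ v)                       # (chunk, d_v)
    return np.concatenate(outputs, axis=0)                # (seq_q, d_v)
```

The intermediate score/weight tensor shrinks from (seq_q, seq_k) to (chunk, seq_k) per step, which is where the memory saving comes from; executing the independent chunks concurrently is what recovers (or improves) the execution time.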
This PR implements a new graph pass that can optionally transform the MIL operation `ios18.scaled_dot_product_attention` into a set of operations that compute the attention over chunks of Q.

Parameters of the new graph pass:
- `min_seq_length` (default: 1280) - the original MIL operation is only transformed if the sequence length of Q is greater than or equal to this value.
- `seq_length_divider` (default: 16) - defines the size of the chunks, based on `chunk_size = sequence_length / seq_length_divider`.
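If the pass ships as an optional (non-default) pass, enabling it during conversion might look roughly like the sketch below. The registered pass name (`common::scaled_dot_product_attention_sliced_q`) and the option plumbing through `PassPipeline` are assumptions, not confirmed by this PR; only the `min_seq_length` / `seq_length_divider` names and defaults come from the description above:

```python
import coremltools as ct

# Hypothetical wiring: append the optional pass to the default pipeline and
# configure it. The registered pass name below is an assumption; the option
# names match the parameters described in this PR.
pass_name = "common::scaled_dot_product_attention_sliced_q"

pipeline = ct.PassPipeline.DEFAULT
pipeline.append_pass(pass_name)
pipeline.set_options(pass_name, {"min_seq_length": 1280, "seq_length_divider": 16})

mlmodel = ct.convert(
    traced_torch_model,                         # a model already traced for conversion
    pass_pipeline=pipeline,
    minimum_deployment_target=ct.target.iOS18,  # ios18.scaled_dot_product_attention
)
```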
Example of the performance of the Depth-Anything model running on the ANE:
| | Execution time | Memory usage |
| --- | --- | --- |
| Original `scaled_dot_product_attention` | 131.55 ms | 169.67 MB |
| With sliced Q | 86.84 ms | 93.34 MB |
CI pipeline run: https://gitlab.com/coremltools1/coremltools/-/pipelines/1600785656