Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel Grid loop creation for gemms #885

Merged
merged 12 commits into from
Feb 20, 2024

Conversation

KavithaTipturMadhu
Copy link
Contributor

No description provided.

@rengolin
Copy link
Contributor

Awesome! I'm seeing 38% improvement on mlir-gen BF16 16-threads (smaller improvements with less threads), and 22% for our PyTorch models BF16 16-threads, and no reductions in single-threaded benchmarks. I'll look at the code soon, but at least now we can run some benchmarks in parallel.

include/TPP/Dialect/Transform/SCF/Dialect/Passes.h Outdated Show resolved Hide resolved
lib/TPP/DefaultPipeline.cpp Outdated Show resolved Hide resolved
lib/TPP/DefaultPipeline.cpp Outdated Show resolved Hide resolved
lib/TPP/Dialect/Transform/CMakeLists.txt Outdated Show resolved Hide resolved
lib/TPP/Dialect/Xsmm/XsmmUtils.cpp Outdated Show resolved Hide resolved
lib/TPP/Transforms/CombineXsmmPass.cpp Show resolved Hide resolved
test/Passes/pass-parallel-tile.mlir Show resolved Hide resolved
@alheinecke
Copy link
Contributor

BTW: I cannot build this PR:

/opt/rh/gcc-toolset-13/root/usr/lib/gcc/x86_64-redhat-linux/13/../../../../bin/ld: lib/libTPPPipeline.a(DefaultPipeline.cpp.o): in function (anonymous namespace)::DefaultPipeline::constructPipeline()': /../lib/TPP/DefaultPipeline.cpp:159: undefined reference to mlir::tpp::createSCFParallelLoopTiling(mlir::tpp::SCFParallelLoopTilingOptions const&)'
clang-16: error: linker command failed with exit code 1 (use -v to see invocation)

@alheinecke
Copy link
Contributor

I could make it compile, after making it a default TPP pass
alheinecke@f97604c

@rengolin
Copy link
Contributor

I could make it compile, after making it a default TPP pass alheinecke@f97604c

This looks like the right fix. Adding it to the transform dialect is the wrong direction.

I'm not sure why the CI can build it and you can't. Perhaps a clean build would works? Regardless, your patch is the right direction anyway.

@alheinecke
Copy link
Contributor

I did a fully clean build... and following the instructions from our ReadME (have them in a script, where I just set the branch name and origin). It seems something went sideways that pass was in the LinAlgXTransform library?!

@rengolin
Copy link
Contributor

I did a fully clean build... and following the instructions from our ReadME (have them in a script, where I just set the branch name and origin). It seems something went sideways that pass was in the LinAlgXTransform library?!

It was, wrong place. Not sure how the CI build it either. I get this error with the PR:

lib/TPP/Dialect/Transform/SCFParallelLoopTiling.cpp:266:9: error: ignoring return value of function declared with 'nodiscard' attribute [-Werror,-Wunused-result]
        tileParallelLoop(ploop, tileSizes, noMinMaxBounds);
        ^~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.

But that's unrelated to your error / fix, I think.

@KavithaTipturMadhu
Copy link
Contributor Author

I did a fully clean build... and following the instructions from our ReadME (have them in a script, where I just set the branch name and origin). It seems something went sideways that pass was in the LinAlgXTransform library?!

It was, wrong place. Not sure how the CI build it either. I get this error with the PR:

lib/TPP/Dialect/Transform/SCFParallelLoopTiling.cpp:266:9: error: ignoring return value of function declared with 'nodiscard' attribute [-Werror,-Wunused-result]
        tileParallelLoop(ploop, tileSizes, noMinMaxBounds);
        ^~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.

But that's unrelated to your error / fix, I think.

Alex has fixed the return value of the pass to fix this error.

Copy link
Contributor

@rengolin rengolin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code itself is very confusing, with manual copying the basic block, #if 0 and dead variables, but as discussed, it should be fine until we gather more examples and understand all the behaviour we want form it, including the dynamic case.

See if you can add some comment to the confusing parts trying to explain what you were trying to do or didn't do because of another problem (for example, the basic block copy), so that we can more easily read it later and fix it.

lib/TPP/Transforms/SCFParallelLoopTiling.cpp Outdated Show resolved Hide resolved
@KavithaTipturMadhu
Copy link
Contributor Author

The code itself is very confusing, with manual copying the basic block, #if 0 and dead variables, but as discussed, it should be fine until we gather more examples and understand all the behaviour we want form it, including the dynamic case.

See if you can add some comment to the confusing parts trying to explain what you were trying to do or didn't do because of another problem (for example, the basic block copy), so that we can more easily read it later and fix it.

I have added a couple of comments for now and gotten rid of #if 0 block and the dead code.

@KavithaTipturMadhu KavithaTipturMadhu merged commit 6f4b13b into plaidml:main Feb 20, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants