-
Notifications
You must be signed in to change notification settings - Fork 784
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NEDeconvolutionLayer
performance degradation in v24.11
#1150
Comments
Hi @alvoron For |
@morgolock can you install OpenMP by calling |
Hi @alvoron Will do but this is not something we normally test or support on |
@morgolock I've got almost the same results with |
Hi @alvoron Thanks for the additional information. I reproduced the issue. We are looking into it. |
Hi @morgolock |
Hi @alvoron No, the regression has not been fixed yet. Hope this helps |
@morgolock |
@morgolock Could you please double check the fix? Or some additional patches are required? |
Hi @alvoron I ran the test and I can confirm that it fixes the regression on I built the library with the following options: See the results below
|
Hi @alvoron Make sure you explicitly set the memory manager when you create the DeconvLayer, as shown below.
If you apply the patch and initialize NEDeconvolutionLayer with the MemoryManager you will solve the performance issue. Hope this helps |
@morgolock We're using ACL Convolution kernel via oneDNN, so the fix needs to be done on oneDNN side to recover Convolution performance. Could you please check GEMM as well? |
Hi @alvoron
In v24.11 we introduced this patch to improve memory management in ACL. This patch reduces considerably memory usage in some models. A side effect of this change is that it requires the user of the library to explicitly setup and configure the memory manager as shown in the reproducer above to get the best performance, otherwise you will see a performance regression. If you setup the memory manager as in the reproducer you will see no performance regression in v24.12. Hope this helps. |
@morgolock how to configure the memory manager if I'm using oneDNN to call ACL Convolution kernel? Does oneDNN provide an API to setup ACL memory manager? |
@morgolock @theComputeKid |
@alvoron Thanks, we will get back to you on the oneDNN side after our people get back from holidays. @morgolock it might be likely that this is a bug in the way we implemented stateless conv (we have a workaround in oneDNN for winograd, which is probably slowing things down). We might need to apply some fixes to make it threadsafe, so that the oneDNN workaround is not required. Would you like to track this as part of this issue? In which case, we should rename the issue to "stateless conv performance worse than NEConv". |
How ACL was built:
Platform:
Apple M2 Pro
Operating System:
macOS 13.4
Problem description:
NEDeconvolutionLayer
performance in 24.09 is better than in 24.11.Reproducer
How reproducer was built
The reproducer gives
7038
on 24.09 and10669
on 24.11.Could you please review potential performance issues in
NEDeconvolutionLayer
?I also observe degradation in Convolution, probably Deconvolution and Convolution issues have the same cause.
Also, it's worth to mention, I haven't observed these degradations on Ampere.
The text was updated successfully, but these errors were encountered: