You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Reported by Email:
"TL;DR: MIPS32-specific mbed-crypto/include/mbedtls/bn_mul.h:MULADDC_* code path is slower than the pure C fallback and should be disabled.
All the tables below compare "RSA-2048 : NNN public/s" values as displayed by "programs/test/benchmark rsa".
(clang mips32r5/p5600 results are merged as the generated code is identical)
Note gcc has some problem at -O2 and -mtune=74kf/p5600 on this CPU, however pure C is still faster than the asm macro.
Results should be valid for 74K*/1074K* (a bit simpler but similar pipeline).
In-order single-issue core with short load delay and relatively low-latency multiply, no surprises.
Results should be valid for 4K*/34K*/1004K*/M51xx etc as they all have identical or quite similar pipeline.
In-order MIPS32R1 core not from MTI/IMG/WaveComp, old compilers yet pure C is better than asm.
The C code advantage holds (with roughly the same ratio) for the rest of MPI-employing tests too.
It also makes the produced binaries consistently smaller (kinda ~400 bytes).
This patch limits the original asm code to MIPS I and II ISAs (instead of disabling it completely) as its history is unclear from github logs and there might be some very old MIPS CPUs which run it faster than C fallback.
Another option is some obscure non-gcc/non-clang compiler for MIPS MCUs which generates slow code from C; this should be handled explicitly IMHO without compromising the common case.
MIPS III, IV are explicitly omitted as all modern compilers still support these ISAs well and O32 ABI is not the natural choice there in any case.
gcc-9.2 and clang-8 are from alpine linux edge (musl-based), benchmark binaries are always dynamically linked;
gcc-7.3/6.3/4.9 are from codescape-2018.11-01/2018.09-03/2016.05-08, benchmark binaries are statically linked against glibc;
XBurst compilers are gcc-4.9 and clang-5 from Android NDK r16b (the last that supports MIPS).
include/mbedtls/bn_mul.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/include/mbedtls/bn_mul.h b/include/mbedtls/bn_mul.h
index 163869ae7..a15cd8058 100644
--- a/include/mbedtls/bn_mul.h
+++ b/include/mbedtls/bn_mul.h
@@ -750,7 +750,11 @@
);
#endif /* Alpha */
-#if defined(__mips__) && !defined(__mips64)
+/*
+ * This code is slower and bigger than C on P5600, 4K*, 24K* family and XBurst
+ * with both clang and gcc (even older versions -- down to llvm-5 and gcc-4.9).
+ */
+#if defined(__mips__) && __mips < 3
#define MULADDC_INIT \
asm( \
2.23.0.windows.1
"
Issue request type
[ ] Question
[ ] Enhancement
[x ] Bug
The text was updated successfully, but these errors were encountered:
Description
Reported by Email:
"TL;DR: MIPS32-specific mbed-crypto/include/mbedtls/bn_mul.h:MULADDC_* code path is slower than the pure C fallback and should be disabled.
All the tables below compare "RSA-2048 : NNN public/s" values as displayed by "programs/test/benchmark rsa".
1200MHz P5600 (gcc: -mips32r2 -mtune=24kf/74kf/p5600, clang: -mcpu=mips32r5/p5600):
clang-8 -O2: 887 (747)
clang-8 -Os: 898 (748)
clang-8 -Oz: 876 (735)
gcc-9.2 -O2: 898,803,800 (750,752,752)
gcc-9.2 -Os: 856,859,865 (730,729,744)
gcc-7.3 -O2: 904,806,801 (772,770,781)
gcc-7.3 -Os: 855,875,884 (775,774,774)
gcc-6.3 -O2: 896,807,820 (773,767,776)
gcc-6.3 -Os: 847,868,883 (752,756,755)
gcc-4.9 -O2: 897,805,797 (776,773,775)
gcc-4.9 -Os: 860,859,865 (768,774,773)
(clang mips32r5/p5600 results are merged as the generated code is identical)
Note gcc has some problem at -O2 and -mtune=74kf/p5600 on this CPU, however pure C is still faster than the asm macro.
Results should be valid for 74K*/1074K* (a bit simpler but similar pipeline).
580MHz 24KEc: (gcc: -mips32r2 -mtune=24kc/4kc/74kc):
gcc-9.2 -O2: 199,201,200 (168,167,167)
gcc-9.2 -Os: 207,192,194 (167,166,166)
gcc-7.3 -O2: 200,194,202 (171,171,171)
gcc-7.3 -Os: 207,194,194 (170,169,169)
gcc-6.3 -O2: 197,194,203 (171,170,170)
gcc-6.3 -Os: 204,192,193 (169,169,169)
gcc-4.9 -O2: 197,203,202 (171,171,170)
gcc-4.9 -Os: 190,190,190 (169,169,169)
In-order single-issue core with short load delay and relatively low-latency multiply, no surprises.
Results should be valid for 4K*/34K*/1004K*/M51xx etc as they all have identical or quite similar pipeline.
Finally, 1004MHz XBurst (gcc/clang: built-in NDK defaults):
clang-5 -O2: 264 (242)
clang-5 -Os: 264 (242)
clang-5 -Oz: 258 (238)
gcc-4.9 -O2: 329 (244)
gcc-4.9 -Os: 309 (241)
In-order MIPS32R1 core not from MTI/IMG/WaveComp, old compilers yet pure C is better than asm.
The C code advantage holds (with roughly the same ratio) for the rest of MPI-employing tests too.
It also makes the produced binaries consistently smaller (kinda ~400 bytes).
This patch limits the original asm code to MIPS I and II ISAs (instead of disabling it completely) as its history is unclear from github logs and there might be some very old MIPS CPUs which run it faster than C fallback.
Another option is some obscure non-gcc/non-clang compiler for MIPS MCUs which generates slow code from C; this should be handled explicitly IMHO without compromising the common case.
MIPS III, IV are explicitly omitted as all modern compilers still support these ISAs well and O32 ABI is not the natural choice there in any case.
gcc-9.2 and clang-8 are from alpine linux edge (musl-based), benchmark binaries are always dynamically linked;
gcc-7.3/6.3/4.9 are from codescape-2018.11-01/2018.09-03/2016.05-08, benchmark binaries are statically linked against glibc;
XBurst compilers are gcc-4.9 and clang-5 from Android NDK r16b (the last that supports MIPS).
2.23.0.windows.1
"
Issue request type
The text was updated successfully, but these errors were encountered: