When allowed, use 32-bit indices rather than 64-bit indices in the
SIMD computation of masks. This runs up to 2x faster on a number of
AVX2 microbenchmarks and up to 4x faster on a number of AVX512
microbenchmarks.
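For context, a rough sketch of the mask computation in question. The operand names are invented and the syntax uses today's arith/vector dialect spelling rather than whatever the patch emits; treat it as an illustration of the idea, not the generated IR:

```mlir
// Mask for an 8-lane vector with runtime bound %b:
// lane i is active iff i < %b, i.e. [0, 1, ..., 7] < splat(%b).

// 64-bit index variant: 8 x i64 needs two AVX2 registers (4 x i64 each),
// so the compare is split in half.
%steps64 = arith.constant dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi64>
%bound64 = vector.splat %b64 : vector<8xi64>
%mask64  = arith.cmpi slt, %steps64, %bound64 : vector<8xi64>

// 32-bit index variant: 8 x i32 fits a single AVX2 register,
// halving the number of vector ops for the same mask.
%steps32 = arith.constant dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>
%bound32 = vector.splat %b32 : vector<8xi32>
%mask32  = arith.cmpi slt, %steps32, %bound32 : vector<8xi32>
```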
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Comment Actions
Looks good. Would it make sense to automatically enable this if the incoming memref is known to have fewer than 2^32 elements?
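A minimal sketch of what that automatic check could look like, assuming the decision is made per incoming MemRefType; the helper name and placement are mine, not from the diff:

```cpp
#include <cstdint>

#include "mlir/IR/BuiltinTypes.h"

using namespace mlir;

// Hypothetical helper: a memref with a fully static shape bounds every
// index by its element count, so 32-bit indices are provably safe when
// that count stays below 2^32.
static bool indexFitsIn32Bits(MemRefType type) {
  if (!type.hasStaticShape())
    return false; // dynamic sizes: the bound cannot be proven statically
  return type.getNumElements() < (int64_t(1) << 32);
}
```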
Inline comment on mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp, line 127:
clang-tidy rightfully complains about the name not being camelBack :)
Comment Actions
I was pondering this. Would a 32-bit index space suffice in general?
For now I made it an option we can experiment with, keeping this automatic optimization in mind for later.
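For reference, a pass option like this would typically be exercised through mlir-opt; the flag spelling below is my assumption and may not match what the patch finally lands:

```
mlir-opt -convert-vector-to-llvm="enable-index-optimizations" input.mlir
```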
Inline comment on mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp, line 127:
you are right, I am mixing my styleguides
Comment Actions
Thanks for tracking down and fixing the perf bug, Aart!
Maybe this also influences how aggressively we need to split in the codegen strategy to reach peak performance.
It will also be interesting to see the effects on mobile (cc @asaadaldien ).