This is an archive of the discontinued LLVM Phabricator instance.

[mlir] [VectorOps] Improve SIMD compares with narrower indices
ClosedPublic

Authored by aartbik on Sep 3 2020, 4:01 PM.

Details

Summary

When allowed, use 32-bit rather than 64-bit indices in the SIMD computation
of masks. This runs up to 2x and 4x faster on a number of AVX2 and AVX512
microbenchmarks.
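
To illustrate the mechanics (a hand-written sketch, not code from this revision; the helper names and AVX2 intrinsics below are only for illustration): an 8-lane "lane < bound" mask computed on 64-bit indices spans two 256-bit registers, while the same mask computed on 32-bit indices fits in one.

  #include <immintrin.h>

  // 64-bit index variant: 8 lanes of i64 span two __m256i registers,
  // so the bound compare needs two AVX2 instructions.
  static inline void mask8_i64(long long bound, __m256i &lo, __m256i &hi) {
    __m256i b = _mm256_set1_epi64x(bound);
    lo = _mm256_cmpgt_epi64(b, _mm256_setr_epi64x(0, 1, 2, 3)); // lane i set iff i < bound
    hi = _mm256_cmpgt_epi64(b, _mm256_setr_epi64x(4, 5, 6, 7));
  }

  // 32-bit index variant: all 8 lanes of i32 fit in one __m256i register,
  // so a single compare produces the whole mask.
  static inline __m256i mask8_i32(int bound) {
    __m256i b = _mm256_set1_epi32(bound);
    return _mm256_cmpgt_epi32(b, _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7));
  }

The pass itself stays at the IR level; the intrinsics above are only meant to show the width and register-pressure difference the narrower index type buys on AVX2.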

Diff Detail

Event Timeline

aartbik created this revision. Sep 3 2020, 4:01 PM
Herald added a project: Restricted Project.
aartbik requested review of this revision. Sep 3 2020, 4:01 PM
bkramer accepted this revision. Sep 3 2020, 4:30 PM

Looks good. Would it make sense to automatically enable this if the incoming memref is known to have fewer than 2^32 elements?

mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
127

clang-tidy rightfully complains about the name not being camelBack :)

This revision is now accepted and ready to land. Sep 3 2020, 4:30 PM
aartbik marked an inline comment as done. Sep 3 2020, 5:14 PM

> Looks good. Would it make sense to automatically enable this if the incoming memref is known to have fewer than 2^32 elements?

I was pondering this: would a 32-bit index space suffice in general?
For now I made it an option we can experiment with, keeping this automatic optimization in mind for later.
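
A minimal sketch of what such an automatic check could look like, assuming MLIR's MemRefType/ShapedType API; the helper name is hypothetical and not part of this revision:

  #include <cstdint>
  #include "mlir/IR/BuiltinTypes.h" // header providing MemRefType; may differ by MLIR version

  // Hypothetical helper: true iff the memref is statically known to hold
  // fewer than 2^32 elements, so 32-bit indices cannot overflow.
  static bool fitsIn32BitIndexSpace(mlir::MemRefType type) {
    if (!type.hasStaticShape())
      return false; // dynamic sizes: the bound cannot be proven statically
    return type.getNumElements() < (int64_t(1) << 32);
  }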

mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
127

You are right, I am mixing my style guides.

aartbik updated this revision to Diff 289836. Sep 3 2020, 5:17 PM
aartbik marked an inline comment as done.

Fixed the method name casing, thanks Ben!

Thanks for tracking down and fixing the perf bug, Aart!
Maybe this also influences how aggressively we need to split in the codegen strategy to reach peak performance.
It will also be interesting to see the effects on mobile (cc @asaadaldien).