This is an archive of the discontinued LLVM Phabricator instance.

[x86] replace div/rem with shift/mask for shuffle combining
ClosedPublic

Authored by spatel on Jun 13 2017, 2:49 PM.

Details

Summary

We know that shuffle masks are power-of-2 sizes, but there's no way (?) for LLVM to know that, so hack combineX86ShufflesRecursively() to be much faster by replacing div/rem with shift/mask.

This makes the motivating compile-time bug in PR32037 ( https://bugs.llvm.org/show_bug.cgi?id=32037 ) about 9% faster overall.

I didn't bother with turning the multiplies into shifts, but we could check the perf from that transform and do that too if this isn't going too far already. :)

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.Jun 13 2017, 2:49 PM
RKSimon accepted this revision.Jun 13 2017, 3:02 PM

LGTM

This revision is now accepted and ready to land.Jun 13 2017, 3:02 PM
chandlerc edited edge metadata.Jun 14 2017, 12:35 AM

This makes lots of sense to me. Honestly, fixing the multiplies might be worth it -- if you can benchmark >1% improvement, I might do it. I'm thinking about the architectures where integer multiplies are *very* slow (atom at least, also some ARM chips)

This makes lots of sense to me. Honestly, fixing the multiplies might be worth it -- if you can benchmark >1% improvement, I might do it. I'm thinking about the architectures where integer multiplies are *very* slow (atom at least, also some ARM chips)

Thanks for the review! Let me mark the multiplies with a TODO in this commit, and then I'll check that as a follow-up. I suspect that we can probably do better still by avoiding this entirely in some cases, but I need to run some more experiments.

This revision was automatically updated to reflect the committed changes.