This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] allow load/store merging if pairs can be rotated into place
ClosedPublic

Authored by spatel on Jul 10 2020, 9:20 AM.

Details

Summary

This carves out an exception for a pair of consecutive loads that are reversed from the consecutive order of a pair of stores. All of the existing profitability/legality checks for the memops remain between the 2 altered hunks of code.

This should give us the same x86 base-case asm that gcc gets in PR41098 and PR44895:
https://bugs.llvm.org/show_bug.cgi?id=41098
https://bugs.llvm.org/show_bug.cgi?id=44895

I think we are missing a potential subsequent conversion to use "movbe" if the target supports that. That might be similar to what AArch64 would use to get "rev16".

Diff Detail

Event Timeline

spatel created this revision.Jul 10 2020, 9:20 AM

Can you create new PR for aarch64’s missed opportunity to use rev16?

Can you create new PR for aarch64’s missed opportunity to use rev16?

https://bugs.llvm.org/show_bug.cgi?id=46694

RKSimon accepted this revision.Jul 13 2020, 1:04 AM

LGTM - I'm curious whether rotates across more load/stores would be useful or not: https://gcc.godbolt.org/z/196ar9

This revision is now accepted and ready to land.Jul 13 2020, 1:04 AM

LGTM - I'm curious whether rotates across more load/stores would be useful or not: https://gcc.godbolt.org/z/196ar9

It's hard to say without a real app to show the usefulness. We're reducing instructions, but creating a longer critical path here, so the profitability of even the basic case will depend on uarch/benchmark.

"reverse_edge4_2()" in the godbolt example is effectively the same as the "rotate64_iterate()" regression test , so we already get that one. But the "rotate32_consecutive()" regression test shows a gap in our merging ability.

Compile-time may be another consideration on how far we should take this. IIRC, the existing merging code could cause noticeable slowdowns for large blocks with lots of memops.

This revision was automatically updated to reflect the committed changes.