This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][GlobalISel] Introduce a new post-isel optimization pass
ClosedPublic

Authored by aemerson on Oct 14 2020, 11:41 AM.

Details

Summary

This pass contains two optimizations.

a) Consider the following code:

FCMPSrr %0, %1, implicit-def $nzcv
%sel1:gpr32 = CSELWr %_, %_, 12, implicit $nzcv
%sub:gpr32 = SUBSWrr %_, %_, implicit-def $nzcv
FCMPSrr %0, %1, implicit-def $nzcv
%sel2:gpr32 = CSELWr %_, %_, 12, implicit $nzcv

This kind of code where we have 2 FCMPs each feeding a CSEL can happen when we have a single IR fcmp being used by two selects. During selection, to ensure that there can be no clobbering of nzcv between the fcmp and the csel, we have to generate an fcmp immediately before each csel is selected.

However, often we can essentially CSE these together later in MachineCSE. This doesn't work though if there are unrelated flag-setting instructions in between the two FCMPs. In this case, the SUBS defines NZCV but it doesn't have any users, being overwritten by the second FCMP.

Our solution here is to try to convert flag setting operations between a interval of identical FCMPs, so that CSE will be able to eliminate one.

b) SelectionDAG imported patterns for arithmetic ops currently select the flag-setting ops for CSE reasons, and add the implicit-def nzcv operand to those instructions. However if those impdef operands are not marked as dead, the peephole optimizations are not able to optimize them into non-flag setting variants. The optimization here is to find these dead imp-defs and mark them as such.

This pass is only enabled when optimizations are enabled.

Diff Detail

Event Timeline

aemerson created this revision.Oct 14 2020, 11:41 AM
aemerson requested review of this revision.Oct 14 2020, 11:41 AM

I also have a follow up optimization for deduplicating compares when used by multiple selects which will live in this new pass.

aemerson planned changes to this revision.Oct 14 2020, 12:02 PM

Actually, this may interfere with the generic Peephole optimizations from doing their job around flag setting optimizations. I think the main issue here is that we're generating SUBS with a WZR physreg, which prevents the peepholes from working. I'll make a separate patch for that and re-purpose this review for the flag compare->select dedupe optimization.

paquette added inline comments.Oct 14 2020, 12:16 PM
llvm/lib/Target/AArch64/GISel/AArch64PostSelectOptimize.cpp
90

Maybe a little nicer?

for (auto &MI : instructionsWithoutDebug(MBB.rbegin(), MBB.rend())
92

Maybe a sledgehammer, but would it make sense to use LiveRegUnits here?

e.g. something like this would probably work?

LiveRegUnits LRU(...);
LRU.addLiveOuts(MBB);
bool NZCVDead = LRU.available(AArch64::NZCV);
for (...) {
   LRU.stepBackward(*II);

   // Did this instruction define NZCV?
   bool NZCVDeadAtCurrInstr = LRU.available(AArch64::NZCV);
   if (NZCVDead && !NZCVDeadAtCurrInstr) {
      // If we have a def and NZCV is dead...
   }

   NZCVDead = NZCVDeadAtCurrInstr;
}

This issue is remedies in D89419.

aemerson added inline comments.Oct 14 2020, 12:39 PM
llvm/lib/Target/AArch64/GISel/AArch64PostSelectOptimize.cpp
92

This particular optimization isn't needed anymore, but yes I couldn't quite remember what that LiveRegUnits utility was called.

aemerson updated this revision to Diff 298410.Oct 15 2020, 10:11 AM
aemerson edited the summary of this revision. (Show Details)

Use the LiveRegUnits utility to implement a similar optimization to before, except now we try to only do it between ranges of FCMP instructions.

aemerson updated this revision to Diff 298659.Oct 16 2020, 10:14 AM
aemerson edited the summary of this revision. (Show Details)

Add a second optimization that to be done using the same walk through the block as the fcmp range one. This optimization finds dead imp-def nzcv defs and marks them as dead, so peephole optimizations can work. As a result, the addsub tests now show the better codegen. We're missing 2 test failure fixes on addsub_ext.ll before we can use the same check lines, but otherwise all other checks are now the same as SDAG.

These changes seem to result in around a 0.7% geomean perf improvement on SPECINT2006 w/ -O3.

paquette added inline comments.Oct 22 2020, 3:03 PM
llvm/lib/Target/AArch64/GISel/AArch64PostSelectOptimize.cpp
1

some weird whitespace there

161

Should this assert that the function has been selected?

167

Remove braces?

aemerson updated this revision to Diff 300175.Oct 22 2020, 11:36 PM

Address comments.

This revision is now accepted and ready to land.Oct 23 2020, 9:59 AM
This revision was landed with ongoing or failed builds.Oct 23 2020, 10:18 AM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Oct 24 2020, 12:06 PM
thakis added inline comments.
llvm/utils/gn/secondary/llvm/lib/Target/AArch64/BUILD.gn
133

(fyi, no need to update files under llvm/utils/gn. they're updated automatically based on the cmake changes by a bot.)