This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Redundant copy elimination - remove more zero copies.
ClosedPublic

Authored by mcrosier on Jul 6 2017, 11:03 AM.

Details

Summary

This patch removes unnecessary zero copies in BBs that are targets of b.eq/b.ne and we know the result of the compare instruction is zero. For example,

BB#0:
  subs w0, w1, w2
  str w0, [x1]
  b.ne .LBB0_2
BB#1:
  mov w0, wzr  ; <-- redundant
  str w0, [x2]
.LBB0_2

See test cases for additional examples.

Chad

Diff Detail

Event Timeline

mcrosier created this revision.Jul 6 2017, 11:03 AM
mcrosier edited the summary of this revision. (Show Details)Jul 6 2017, 11:24 AM

Why did it not produce:

str wzr, [x2]

Instead? this saves the one instruction like before and also seems better for some better in general as it is not dependent on the subs.

Why did it not produce:

str wzr, [x2]

Instead? this saves the one instruction like before and also seems better for some better in general as it is not dependent on the subs.

Yes, with a bit more work that seem very possible. However, I'd prefer to address that in a subsequent patch. I can add a FIXME, if you prefer.

As a concrete example of how this might improve codegen, having the str wzr, [x2] could expose additional opportunities for the AArch64 load/store optimization pass to pair/merge more zero stores.

gberry edited edge metadata.Jul 18 2017, 12:31 PM

Can you re-upload the diff with full context?

mcrosier updated this revision to Diff 107155.Jul 18 2017, 12:35 PM

-Upload with full diff context, per Geoff's request.

mcrosier added inline comments.Jul 18 2017, 12:55 PM
lib/Target/AArch64/AArch64RedundantCopyElimination.cpp
250

I just caught this little hiccup here.. This should be:

if (DstReg == AArch64::WZR || DstReg == AArch64::XZR)
  return false;

The current code can't possibly evaluate to true. The result is that we push {AArch64::WZR, 0} and {AArch64::XZR, 0} pairs onto the KnownRegs vector. While this is actually very true, this has no effect other then to increase compile-time.

I'll fix this now and rerun correctness just to be safe.

First pass comments below:

lib/Target/AArch64/AArch64RedundantCopyElimination.cpp
135

This comment needs updating.

248

I think a few opcodes are missing here (it also might be a good idea to sort these so you can visually search through them more easily):
ADCSWr
ADCSXr
SBCSWr
SBCSXr
ADDSWrx
ADDSXrx
ADDSXrx64
SUBSWrx
SUBSXrx
SUBSXrx64

mcrosier added inline comments.Jul 18 2017, 1:21 PM
lib/Target/AArch64/AArch64RedundantCopyElimination.cpp
135

Will do.

248

Sure, I can add those (along with test cases).

A few more comments (still looking)...

lib/Target/AArch64/AArch64RedundantCopyElimination.cpp
7–8

This comment needs some updating as well: not just zero copies are removed and not just cbz/cbnz branches are handled.

19

Change this to 'if the flag setting opcode defined a register'

48–51

Not really related, but can you add a FIXME to look into handling CCMP opcodes as well?

206–207

Can you move this check down into the if(!ClobberedRegs[SrcReg] ...) check so that we don't miss cases where the destination is not WZR/XZR but the immediate is symbolic?

test/CodeGen/AArch64/machine-zero-copy-remove.mir
2

It seems like overkill to test every opcode here. I think more negative test coverage would be better.

Final comments below:

lib/Target/AArch64/AArch64RedundantCopyElimination.cpp
155

I find reviewing this part of the change a bit confusing since we're using ClobberedRegs in both knownRegValInBlock() and optimizeCopy(), but the uses are distinct as far as I can tell. I wouldn't object to a refactoring change that split these uses into two variables.

mcrosier updated this revision to Diff 107516.Jul 20 2017, 8:02 AM

Address most of Geoff's feedback.

The one remaining ask is to add additional negative tests, which I'm working on this very moment.

mcrosier marked 10 inline comments as done.Jul 20 2017, 8:02 AM
mcrosier updated this revision to Diff 107523.Jul 20 2017, 8:22 AM

-Add one additional FIXME from an earlier feedback.

mcrosier updated this revision to Diff 107565.Jul 20 2017, 12:18 PM

-Add a few more negative tests, per Geoff's request.

mcrosier marked an inline comment as done.Jul 20 2017, 12:19 PM
mcrosier updated this revision to Diff 107567.Jul 20 2017, 12:32 PM

Just for fun add one more test where we can remove a redundant copy and a redundant move.

gberry accepted this revision.Jul 20 2017, 2:20 PM
This revision is now accepted and ready to land.Jul 20 2017, 2:20 PM
This revision was automatically updated to reflect the committed changes.