This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Add pass to remove redundant copy after RA
ClosedPublic

Authored by junbuml on Jan 14 2016, 12:33 PM.

Details

Summary

This change will add a pass to remove unnecessary zero copies in target blocks
of cbz/cbnz instructions. E.g., the copy instruction in the code below can be
removed because the cbz jumps to BB1 when x0 is zero :

BB0:
  cbz x0, .BB1
BB1:
  mov x0, xzr

Jun

Diff Detail

Repository
rL LLVM

Event Timeline

junbuml updated this revision to Diff 44915.Jan 14 2016, 12:33 PM
junbuml retitled this revision from to [AArch64]Remove Copy zero after cbz.
junbuml updated this object.
mcrosier retitled this revision from [AArch64]Remove Copy zero after cbz to [AArch64] Remove copy zero after cbz.Jan 14 2016, 1:57 PM
mcrosier updated this object.
mcrosier added reviewers: mcrosier, gberry.
junbuml updated this revision to Diff 44925.Jan 14 2016, 2:33 PM
junbuml retitled this revision from [AArch64] Remove copy zero after cbz to [AArch64] Remove copy zero after cbz/cbnz.
junbuml updated this object.
junbuml added reviewers: jmolloy, HaoLiu.
mcrosier added inline comments.Jan 15 2016, 11:39 AM
lib/CodeGen/MachineCopyPropagation.cpp
342 ↗(On Diff #44925)

Range-based loop?

lib/Target/AArch64/AArch64InstrInfo.cpp
985 ↗(On Diff #44925)

What about something like this:
This function removes unnecessary zero copies from BBs that are targets of
cbz/cbnz instructions. For instance, the copy instruction in below code can
// be removed because the cbz jumps to BB#2 when w0 is zero.

1031 ↗(On Diff #44925)

Range-based loop?

lib/Target/AArch64/AArch64InstrInfo.h
167 ↗(On Diff #44925)

Please use doxygen (i.e., ///) style comments. Also, you should remove the function name from the comment as Doxygen knows what to do (see r253029 as an example).

test/CodeGen/AArch64/machine-copy-remove.ll
57 ↗(On Diff #44925)

I'd suggest adding a comment to this case mention that the transform isn't safe because we don't know if the upper bits of the X register are zero.

junbuml added inline comments.Jan 15 2016, 12:43 PM
lib/Target/AArch64/AArch64InstrInfo.cpp
985 ↗(On Diff #44925)

Thanks! Looks better !

1031 ↗(On Diff #44925)

I don't think it's possible because we remove MI in the loop? Please correct me if I'm wrong.

lib/Target/AArch64/AArch64InstrInfo.h
167 ↗(On Diff #44925)

Oh.. Thanks. I will fix it.

test/CodeGen/AArch64/machine-copy-remove.ll
57 ↗(On Diff #44925)

I will do it !

junbuml updated this revision to Diff 45206.Jan 18 2016, 1:03 PM
junbuml updated this object.

Addressed Chad's comments.

MatzeB requested changes to this revision.Feb 8 2016, 2:17 PM
MatzeB edited edge metadata.

I don't like the integration into MachineCopyPropagation here, the algorithm is not using any other information or methods of that pass (except for the fact that the pass iterates over all basic blocks). This is not enough reason to introduce a new TII callback. You can just as well make this a standalone aarch64 specific pass.

The scope of the algorithm could be extended, I don't see why it should not work for cases like "cmp reg, #CC; b.(n)e XX ... mov reg, #CC" (i.e. any reg getting compared equal/unequal to a constant). In principle this could also be extended to check the whole dominance subtree below the comparison though I guess it is not worth the compiletime to compute a dominance tree here just for that.

lib/CodeGen/MachineCopyPropagation.cpp
342 ↗(On Diff #45206)

Using "MachineInstr &MI" instead of "auto &I" would be friendlier to the reader.

lib/Target/AArch64/AArch64InstrInfo.cpp
995 ↗(On Diff #45206)

I would move this downwards closer to the point where it is first assigned.

1052–1054 ↗(On Diff #45206)

These things cannot simply change the value of registers if they do they need additional MachineOperands with defs, uses, regmasks which already be handled below.

1056–1057 ↗(On Diff #45206)

This could be for (const MachineOperand &MO : MI->operands())

1066–1067 ↗(On Diff #45206)

This check can be moved into the assert(), as TargetRegs.count(0) will always be false.

This revision now requires changes to proceed.Feb 8 2016, 2:17 PM

Thanks Matthias for the review.

the algorithm is not using any other information or methods of that pass (except for the fact that the pass iterates over all basic blocks). This is not enough reason to introduce a new TII callback. You can just as well make this a standalone aarch64 specific pass.

I also agree with your comments , but I doubt introducing a new pass for this is reasonable. I may want to try to find any other good place for this. Please also let me know if you have any suggestion.

I doubt introducing a new pass for this is reasonable.

I don't think an AArch64-specific pass is all that unreasonable, especially if you have plans to extend the scope like Matthias suggests. AArch64 already has a few small passes (the AAarch64DeadRegisterDefinitions pass comes to mind as a similar example).

AArch64 already has a few small passes (the AAarch64DeadRegisterDefinitions pass comes to mind as a similar example).

Thanks Matthew. I'm looking at this pass now.

junbuml updated this revision to Diff 47816.Feb 12 2016, 10:07 AM
junbuml retitled this revision from [AArch64] Remove copy zero after cbz/cbnz to [AArch64] Add pass to remove redundant copy after RA.
junbuml updated this object.
junbuml edited edge metadata.

Based on the comments from Matthias and Matthew, added a new pass to remove redundant zero copies.

mcrosier added inline comments.Feb 12 2016, 10:33 AM
lib/Target/AArch64/AArch64RedundantCopyElimination.cpp
8 ↗(On Diff #47816)

Were we planning on adding support for handling arbitrary const copies, per Matthias' suggestion?

junbuml added inline comments.Feb 12 2016, 10:47 AM
lib/Target/AArch64/AArch64RedundantCopyElimination.cpp
8 ↗(On Diff #47816)

In this patch, I want to handle zero first which applied in most spec2000/2006 benchmakrs. And then I will remove below FIXME as I handle the arbitrary mov #imm cases as a separate patch. I believe this could lead review process easier.

mcrosier edited edge metadata.Feb 12 2016, 11:16 AM

Sounds reasonable to me, Jun.

By placing this pass before the control flow opt, we could remove basic blocks which contain only MOV zero.
E.g., we convert :

  cbz     x8, BB1
  ldr     w8, [x8,#4]
  b       BB2 
BB1:  
  mov     w8, wzr
BB2:  
  cmp     w22, w8

into

  cbz     x8, BB1 
  ldr     w8, [x8,#4]
BB1:
  cmp     w22, w8

This transformation happens in almost all spec2006 benchmarks.

MatzeB accepted this revision.Feb 12 2016, 11:39 AM
MatzeB edited edge metadata.

LGTM with nitpicks addressed.

lib/Target/AArch64/AArch64RedundantCopyElimination.cpp
39 ↗(On Diff #47816)

I'd call this NumCopiesRemoved.

117–118 ↗(On Diff #47816)

Could be expressed as for (MachineInstr &MMI : make_range(MBB->begin(), MI->getIterator()))

162 ↗(On Diff #47816)

It's friendlier to people reading the sourcecode to write MachineBasicBlock &MBB : MF

This revision is now accepted and ready to land.Feb 12 2016, 11:40 AM

Thanks Matthias for your super fast review. I will land it after fixing your comment.

junbuml updated this revision to Diff 47846.Feb 12 2016, 1:18 PM
junbuml edited edge metadata.

Addressed Matthias' comment.

This revision was automatically updated to reflect the committed changes.