Page MenuHomePhabricator

[AArch64] Extend redundant copy elimination pass to handle non-zero stores.

Authored by mcrosier on Jan 31 2017, 12:53 PM.



This patch extends the current functionality of the AArch64 redundant copy elimination pass to handle non-zero cases such as:

  cmp x0, #1
  b.eq .LBB0_1
  orr x0, xzr, #0x1  ; <-- redundant copy; x0 known to hold #1.

This removes redundant copies from most of the SPEC200X tests. When running on Kryo, I saw a 2.7% and 1% improvement in spec2000/gap and spec2006/povray, respectively. All other testing was within noise.


Diff Detail


Event Timeline

mcrosier created this revision.Jan 31 2017, 12:53 PM
junbuml added inline comments.Feb 1 2017, 10:23 AM
121 ↗(On Diff #86479)

Is it always safe to do this directly without any check for isConditionalBranch(), Opcode, or MI.getOperand(0).isImm()?

137 ↗(On Diff #86479)

If we didn't see any instruction that sets NZCV between MI and PredMBB->begin() and NZCV is live-in from PredMBB's single predecessor, then we could continue scan the single predecessor of PredMBB. Maybe different patch if it make sense.

160 ↗(On Diff #86479)

At this point we know that I->getOperand(0) is zero register, so this if statement will be taken only when I->getOperand(1) is zero register. I don't see any problem when both getOperand(0) and getOperand(1) are zero register.

222 ↗(On Diff #86479)

Please update the comment as we continue checking non-zero case.

233 ↗(On Diff #86479)

Is there any case where both ZeroRegVal and KnownRegVal is true? If not, adding assert() will be good.

264 ↗(On Diff #86479)

We could move check for ZeroRegVal above if statement in line 259.

283 ↗(On Diff #86479)

The comment in MachineInstr.h mention that isMoveImmediate() return true for a move immediate (including conditional moves). I think the conditional move is not the case for AArch64, but just want to double confirm.

283 ↗(On Diff #86479)

We can move KnownRegVal in line 283. It seems that both ZeroRegVal and KnownRegVal cannot be true at the same time. Then we can use if-else, instead of if-if.

evandro added a subscriber: evandro.Feb 6 2017, 3:13 PM
mcrosier added inline comments.Feb 22 2017, 7:34 AM
121 ↗(On Diff #86479)

Yes, I believe so. We directly check that the opcode is Bcc above, so we know MI must be a conditional branch at this point. AFAIK, the 0 operand is always the CC for a Bcc.

137 ↗(On Diff #86479)

True, but I'd prefer to defer this for another day as you suggest.

160 ↗(On Diff #86479)

You're correct. I'll remove accordingly.

222 ↗(On Diff #86479)

Will do.

233 ↗(On Diff #86479)

No. I'll add an assertion.

283 ↗(On Diff #86479)

For AArch64, only the MOVi32imm and MOVi64imm pseudos will return true for isMoveImmediate() and these instructions are unconditionally executed.

mcrosier updated this revision to Diff 89373.Feb 22 2017, 9:26 AM

Address Jun's comments.

mcrosier marked 5 inline comments as done.Feb 22 2017, 9:35 AM
mcrosier added inline comments.
283 ↗(On Diff #86479)

I hoisted the ZeroRegVal and KnownRegVal to the outer checks, but left the code as an if-if, rather thank if-else to reduce indention. Let me know if you strongly prefer the the if-else and I'll happily

mcrosier updated this revision to Diff 89374.Feb 22 2017, 9:36 AM
mcrosier marked an inline comment as done.

Include test case that was accidentally dropped from previous diff.

MatzeB accepted this revision.Feb 22 2017, 10:22 AM

This looks good to me when the comments below are addressed.

84–86 ↗(On Diff #89374)

Looks like you synced that with, good.

105–108 ↗(On Diff #89374)

Use doxygen /// comment. It would help to talk about block \p MBB instead of current block, similarily to explain some of the other parameters.

120 ↗(On Diff #89374)

Is that cast to (int) here necessary?

129–130 ↗(On Diff #89374)

Add a comment that this is a shortcut for a block that is empty except for the branch.

136 ↗(On Diff #89374)

This slightly confused me as most loops in LLVM first initialize/declare the iteration variable then the end variable.

This could also be written as a range based for: for (MachineInstr &PredMI : make_range(std::next(MI.getReverseIterator()), PredMBB->rend()).

146–148 ↗(On Diff #89374)

does the DestReg matter here? I would assume this transformation to work as well when we write to an actual register...

155 ↗(On Diff #89374)

Somewhat off-topic from this specific patch and I don't expect this to be fixed here: It feels bad to scatter all those magic values like the 3 here around the targets, this function is another example where we have a lot of magic input numbers. At some point we should add some getter/accessor functions somewhere to give them a name or extend tablegen to generate some enum constants for us...

170 ↗(On Diff #89374)

typo but->that

182–183 ↗(On Diff #89374)

You can use DEBUG(dbgs() << "Remove redundant copy: " << *MI);

218 ↗(On Diff #89374)

Can't you just have a single variable KnownVal or similar here? You can then use KnownVal == 0 instead of ZeroRegVal.

271–280 ↗(On Diff #89374)

Do we really need to split this into Case 1/Case 2? There seems to be more equal code than different one so I would expect this to be better merged (with a slightly more complicated condition to check).

328 ↗(On Diff #89374)

You may even delay the resizing/allocation of the bitvector to the point where it is actually used (resizing it multiple times to the same size shouldn't hurt performance).

1 ↗(On Diff #89374)

The 2>&1 is no longer necessary for .mir tests and is discouraged now as it hides abort()/assert() messages if they happen.

5–8 ↗(On Diff #89374)

The IR block can be left out completely in this case the MIRParser will create the dummy functions.

14 ↗(On Diff #89374)

This can be slightly simplified by leaving out the block frequencies (i.e. just successors: %bb.1, %bb.2 Similar for the other blocks.

38 ↗(On Diff #89374)

I also tend to use CHECK-LABEL: name: test1 to better match the mir output with the typo this probably matched the dummy IR function instead of the beginning of the actual test1 mir function.

This revision is now accepted and ready to land.Feb 22 2017, 10:22 AM
mcrosier marked 12 inline comments as done.Feb 28 2017, 9:37 AM
mcrosier added inline comments.
120 ↗(On Diff #89374)

It is not. Will remove.

146–148 ↗(On Diff #89374)

No, this is a simplification not a requirement. I'll remove this check. I think the only necessary check is to make sure the compare doesn't clobber itself (e.g., SUBS x1, x1, 5).

155 ↗(On Diff #89374)

I agree.

mcrosier updated this revision to Diff 90051.EditedFeb 28 2017, 9:38 AM
mcrosier marked 2 inline comments as done.

Address Matthias' comments as well as rebase/rework patch after r295863.

junbuml edited edge metadata.Feb 28 2017, 10:54 AM


115 ↗(On Diff #90051)

This return the compare instruction the branch is predicated upon.

214 ↗(On Diff #90051)

Please remove it.

215–226 ↗(On Diff #90051)

Isn't it possible to merge the "if" and "else if" block as these two block do almost the same thing ? Maybe move guaranteesZeroRegInBlock() in knownRegValInBlock() ?

junbuml added inline comments.Feb 28 2017, 10:56 AM
115 ↗(On Diff #90051)

Please ignore this.

mcrosier updated this revision to Diff 90069.Feb 28 2017, 11:50 AM

Address Jun's feedback. Thanks, Jun.

mcrosier marked 4 inline comments as done.Feb 28 2017, 11:51 AM
gberry edited edge metadata.Feb 28 2017, 2:24 PM

A few minor comments below. Two questions:

  • can we avoid doing two backward scans of the pred block? perhaps just tracking the clobbers once
  • can you add some negative tests that check that clobbers inhibit this optimization when expected?
114 ↗(On Diff #90051)

Typo 'whos' -> 'whose'

57 ↗(On Diff #90069)

This typedef seems unnecessary. Why not just struct RegImm?

59 ↗(On Diff #90069)

This could be int32_t to save some space since SUBS can only reference 12-bit immediates (24 with lsl 12 which isn't currently handled by this change), but it probably isn't worth it since there aren't many RegImm objects.

171 ↗(On Diff #90069)

Typo 'include' -> 'includes'

226 ↗(On Diff #90069)

Perhaps invert the condition above and put the continue inside it to get rid of the 'else' here

mcrosier updated this revision to Diff 90194.Mar 1 2017, 9:22 AM
mcrosier marked 5 inline comments as done.

-Address Geoff's inline comments.
-Add a few negative tests.
-Use a single ClobberReg bit vector to track clobbered registers.

While I think it's possible to only scan the predecessor instructions once, I believe it would add a fair amount of complexity. More importantly we would need to maintain a list of copies every time we scan the predecessors (even in those cases when we can't optimize the move immediate/copy, which is the common case). Regardless, I'm happy to discuss this offline and update everyone with the result of that discussion.

I'm convinced the two passes over PredBB makes sense.

Could you add a couple of test cases where the known register and the eliminated MOV are different register sizes? I feel like we may get into trouble in the case where we're depending on the MOVi32imm zeroing out the high 32-bits of a 64-bit value. For example:

CMP w0, 1
BCC EQ, b2


w0 = MOVi32imm 1
STORE x0 # depends on high 32-bits of w0 being zero'd by MOVi32imm

A few more questions/suggestions below.

62 ↗(On Diff #90194)

Perhaps this could return an Optional<RegImm>, then you could get rid of the two out ref parameters?

173 ↗(On Diff #90194)

I think you could get rid of lines 172 and 173 if you do the trackRegDefs before the ClobberedRegs check.

215 ↗(On Diff #90194)

I believe this is no longer needed here.

355 ↗(On Diff #90194)

I don't think FirstUse is being calculated correctly in the case of the non-zero immediate. E.g. if there are no COPYs to worry about, FirstUse should be the SUBS compare instruction since we want to remove the kill of the known value register in that case if it is present. Your test case test10 should show this problem, though I'm not sure why -verfiy-machineinstrs isn't catching it.

mcrosier updated this revision to Diff 90352.Mar 2 2017, 10:00 AM
mcrosier marked 3 inline comments as done.

Address Geoff's latest comments.

mcrosier marked an inline comment as done.Mar 2 2017, 10:03 AM

I added the test cases you suggested (and cleaned up the others). In particular, I added the test (i.e., test 13) that has a 32 bit comparison and a move immediate that implicitly defines the upper bits. We can definitely get into trouble here. I added the necessary checks to make sure we don't remove the mov immediate in this case because that would leave the upper bits in some unknown state.

355 ↗(On Diff #90194)

You are correct. I've updated the logic and test cases to make sure the kill flags are removed.

junbuml added inline comments.Mar 2 2017, 10:18 AM
314–315 ↗(On Diff #90352)

If this is needed, it will be good to add comment why this is needed only for mov, not in Copy.

gberry accepted this revision.Mar 2 2017, 10:48 AM

LGTM with a few nits

212 ↗(On Diff #90352)

Typo 'inbetween' -> 'in between'

408 ↗(On Diff #90352)

Typo 'eliminated' -> 'eliminate'. And another before test15.

This revision was automatically updated to reflect the committed changes.
mcrosier marked an inline comment as done.