This is an archive of the discontinued LLVM Phabricator instance.

[Peephole] Advanced rewriting of copies to avoid cross register banks copies.
AbandonedPublic

Authored by qcolombet on Jun 10 2014, 9:56 AM.

Details

Reviewers
None
Summary

Hi,

The proposed patch extends the peephole optimization introduced in r190713 to allow even more cross register banks copies rewriting.
As it is, the extension may not be that useful, but I thought it may be easier to reviewer than the complete solution (see Motivating Examples and What Is Next?).

Thanks for your feedback.

  • Context **

In r190713 we introduced a peephole optimization that produces register-coalescer friendly copies when possible.
This optimization basically looks through a chain of copies to find a more suitable source for a cross register banks copy.
E.g.,
b = copy A <-- cross-bank copy

C = copy b <-- cross-bank copy

Is rewritten into:
b = copy A <-- cross-bank copy

C = copy A <-- same-bank copy

However, there are several instructions that are lowered via cross-bank copies that this optimization fails to optimize.
E.g.
b = insert_subreg e, A, sub0 <-- cross-bank copy

C = copy b.sub0 <-- cross-bank copy

Ideally, we would like to produce the following code:
b = insert_subreg e, A, sub0 <-- cross-bank copy

C = copy A <-- same-bank copy

  • Proposed Patch **

The proposed patch taught the existing cross-bank copy optimization how to deal with the instructions that generate cross-bank copies, i.e., insert_subreg, extract_subreg, reg_sequence, and subreg_to_reg.
We introduce a new helper class for that: ValueTracker.
This class implements the logic to look through the copy related instructions and get the related source.

For now, the advanced copy rewriting is disabled by default as it is not sufficient to solve the motivating examples and I had a hard time to come up with a test case because of that (see motivating example section). However, you can give it a try on your favorite platform with -disable-adv-copy-opt=false and if it helps, I would be happy to add a test case!

I have also checked that the introduced refactoring does not change the current code gen through the entire llvm-testsuite + SPECs, when the extension is disable, for both x86_64 and arm64 with both O3 and Os.

  • Motivating Examples **

Let us consider a couple of examples.

  • armv7s *

define <2 x i32> @testuvec(<2 x i32> %A, <2 x i32> %B) nounwind {
entry:

%div = udiv <2 x i32> %A, %B
ret <2 x i32> %div

}

We would like the following code to be generated on swift (which has a udiv instruction):
%A is in r0, r1
%B is in r2, r3
udiv r0, r2, r0
udiv r1, r3, r1
bx lr

However, we generate a far more complicated sequence of instructions because we do not recognize that we are moving r0, r1, etc, through d registers:
vmov d1, r0, r1
vmov d0, r2, r3
vmov r1, s2
vmov r0, s0
vmov r2, s3
udiv r0, r1, r0
vmov r1, s1
udiv r1, r2, r1
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
bx lr

  • AArch64 *

define i64 @test2(i128 %arg) {

%vec = bitcast i128 %arg to <2 x i64>
%scalar = extractelement <2 x i64> %vec, i32 0
ret i64 %scalar

}

One would expect that this code :
%arg is in x0, x1
we simply return x0
ret

However, we generate a less straight forward sequence:
fmov d0, x0
ins.d v0[1], x1
fmov x0, d0
ret

The proposed patch is not sufficient to catch those cases yet, as they use target specific instructions to implement the insert_subreg, extract_subreg logic. However, if the lowering was using the generic instructions, this optimization would have helped. See "What Is Next?” for how I plan to tackle that.

  • Testcase ?! **

Since the current patch does not yet support the motivating examples, I do not have something reasonably small that exercises the new path. Thus, I have disabled it by default until we have the full support.
Again, if you think that this optimization can help some of the cases you are seeing, give it a try, and propose your test case!

  • What Is Next? **
  • Teach the optimization about target specific nodes, so that we can handle the motivating examples.

The idea would be to add new tablegen properties so that we would be able to specify that an instruction is similar to a insert_subreg instruction, etc., the same way we did with bitcast (though a little bit more complicated).

  • Enable the optimization by default or provide a target hook to control it.

Thanks,
-Quentin

Diff Detail

Event Timeline

qcolombet updated this revision to Diff 10285.Jun 10 2014, 9:56 AM
qcolombet retitled this revision from to [Peephole] Advanced rewriting of copies to avoid cross register banks copies..
qcolombet updated this object.
qcolombet edited the test plan for this revision. (Show Details)
qcolombet added a subscriber: Unknown Object (MLST).
qcolombet updated this revision to Diff 10290.Jun 10 2014, 11:52 AM

Upload the latest patch (the previous version had a typo in the handling of SubregToReg).

qcolombet updated this revision to Diff 10516.Jun 17 2014, 3:09 PM

REG_SEQUENCEs are variadic instructions. Thus, we should use MachineInstr::getNumOperands instead of MCInstrDesc::getNumOperands to access the actual number of operands.

Ping^2.

-Quentin

hfinkel added inline comments.
lib/CodeGen/PeepholeOptimizer.cpp
148

follow -> follows

566

What does not work for physical registers?

789

Should this be an assert?

803

Should we also bail is hasUnmodeledSideEffects?

986

I think you might as well add isExtractSubreg().

qcolombet added inline comments.Jul 1 2014, 12:55 AM
lib/CodeGen/PeepholeOptimizer.cpp
566

You do not want to extend the live-ranges of physical registers as they add constraints to the register allocator.
Moreover, if you allow to extend the live-range of a physical register, unlike SSA virtual register, you will have to check that you do not redefine that register.

789

Good point.
If someone adds operand to a generic copy, I think we can yell at them :).

803

Good point.
This was not part of the previous implementation, but that would be safer.

986

Agree.
Though, I did not want to do that in that patch.
I can either do it before or after this patch lands.
Any preference?

With the agreed-to changes, LGTM.

lib/CodeGen/PeepholeOptimizer.cpp
566

Okay; please add this explanation as a comment here.

986

I don't have a strong preference. Personally, I'd do it first.

qcolombet added inline comments.Jul 1 2014, 6:03 AM
lib/CodeGen/PeepholeOptimizer.cpp
986

I thought there were plenty of users for this accessor but turns out all the users check the opcode of a SDNode whereas I am looking at MachineInstr. Thus, this patch is the first user of isExtractSubreg.
So I have updated the patch to include that change.

qcolombet updated this revision to Diff 10997.Jul 1 2014, 6:05 AM

Update according Hal's advices.

Thanks Hal for the feedback BTW.

This is r212100.

Thanks,
-Quentin

qcolombet abandoned this revision.Jul 1 2014, 8:04 AM

Didn't find another way to close this instance!