This is an archive of the discontinued LLVM Phabricator instance.

[RegisterCoalescing] Remove partial redundent copy
ClosedPublic

Authored by wmi on Jan 11 2017, 3:43 PM.

Details

Summary

The patch is to solve the performance problem described in PR27827.

Register coalescing sometimes cannot remove a copy because of interference. But if we can move the copy a little bit, like to its predecessors, we may find out that some of the copies moved to the predecessors are redundent. The patch handle a typical case like this:

BB0: BB1:

A = B    ...
/     \    /
      BB2:
         B = A

If B = A in BB2 has a reversed copy in BB0, B = A will be redundent if the execution goes through the path from BB0 to BB2. We may remove the so called partial redundency by moving B = A to BB1.

The correctness requirement for such transformation is:

  • A in B = A in BB2 is defined by the reversed copy A = B in BB0.
  • No B is referenced from the start of BB2 to B = A.
  • No B is defined from A = B to the end of BB0.

It is ok for BB0 to have multiple successors, but we require that BB1 has only one successor so we always move a copy to a colder place.

I got 1~2% improvement on an internal benchmark. I tested spec2000 and the performance was flat.

Diff Detail

Repository
rL LLVM

Event Timeline

wmi updated this revision to Diff 84037.Jan 11 2017, 3:43 PM
wmi retitled this revision from to [RegisterCoalescing] Remove partial redundent copy.
wmi updated this object.
wmi added reviewers: qcolombet, MatzeB.
wmi set the repository for this revision to rL LLVM.
wmi added subscribers: llvm-commits, davidxl.
grandinj added inline comments.
lib/CodeGen/RegisterCoalescer.cpp
194

removePartialRedundency->removePartialRedundancy

870

redundent ->redundant

test/CodeGen/X86/pre-coalesce.ll
3

redundent ->redundant

qcolombet edited edge metadata.Jan 12 2017, 2:50 PM

Hi Wei,

Could you write a .mir test instead of a .ll one?
You should be able to get your input with llc -stop-before=regalloc or something like that.

In the assumption I would add that because of:

we require that BB1 has only one successor so we always move a copy to a colder place.

and

No B is referenced from the start of BB2 to B = A.

We know that B is not defined at the end of BB1, which is otherwise a requirement.

I'll have a look at the patch later, unless Matthias beats me at it.

Cheers,
-Quentin

MatzeB edited edge metadata.Jan 12 2017, 5:39 PM

Looks sensible to me. Some points:

  • I'd also like to see a .mir test instead of the .ll one.
  • What happens in this case: Will we loop endlessly because B=A is removed and re-added to BB1?

BB0:

A = B;
CondJmp BBX, BB1

BB1:

B = A;
CondJmp BBY, BB1
lib/CodeGen/RegisterCoalescer.cpp
902

Redundancy

903

Use MachineInstr &CopyMI to indicate it cannot be nullptr.

908

Use MachineBasicBlock &MBB.

935

I'd prefer to spell out the types instead of auto as that is friendlier to people reading the code.
Same with the inner VNI loop.

wmi marked 5 inline comments as done.Jan 13 2017, 12:48 PM

Thanks for the review.

I'd also like to see a .mir test instead of the .ll one.

A mir test is Added.

What happens in this case: Will we loop endlessly because B=A is removed and re-added to BB1?

It is guaranteed by the requirement that BB1 has only one successor. In this way, we can only move the copy from a hotter to a colder place, so no endless loop will happen. I add this correctness precondition in the comment.

Thanks,
Wei.

lib/CodeGen/RegisterCoalescer.cpp
903

Fixed.

908

Fixed.

wmi updated this revision to Diff 84356.Jan 13 2017, 12:49 PM
wmi edited edge metadata.

Address Noel, Quentin and Matthias's comments.

qcolombet accepted this revision.Jan 13 2017, 4:10 PM
qcolombet edited edge metadata.

Hi Wei,

LGTM.
Couple of nitpicks below before you push (no need for additional review).

Cheers,
-Quentin

lib/CodeGen/RegisterCoalescer.cpp
991

typo: hotter

1001

We could move the debug statements in the next if and add a else condition.
Would be more natural IMHO.

This revision is now accepted and ready to land.Jan 13 2017, 4:10 PM
This revision was automatically updated to reflect the committed changes.

Hi,

in this code where you erase the instruction:

/ Remove CopyMI.

SlotIndex EndPoint = IntB.Query(CopyIdx.getRegSlot()).endPoint();
LIS->removeVRegDefAt(IntB, CopyIdx.getRegSlot());
LIS->RemoveMachineInstrFromMaps(CopyMI);
CopyMI.eraseFromParent();

shouldn't we also add it to the list of erased instructions (ErasedInstr) to avoid visiting it while looping over the work list?