This is an archive of the discontinued LLVM Phabricator instance.

Remove redundent register mov by improving TwoAddressInstructionPass
ClosedPublic

Authored by wmi on Feb 20 2015, 10:18 PM.

Details

Summary

Hi,

We found a problem in TwoAddressInstructionPass which could generate redundent register mov insns in loop, and proposed a patch to fix it.

Here is the small testcase:

1.c:

int M, total;

void foo() {

int i;
for (i = 0; i < M; i++) {
  total = total + i / 2;
}

}

~/workarea/llvm-r230041/build/bin/clang -O2 -fno-vectorize -fno-unroll-loops -S 1.c

This is the kernel loop in 1.s:

.LBB0_2: # %for.body

  1. =>This Inner Loop Header: Depth=1 movl %edx, %esi movl %ecx, %edx shrl $31, %edx addl %ecx, %edx sarl %edx addl %esi, %edx incl %ecx cmpl %eax, %ecx jl .LBB0_2 --------------------------

The first mov insn "movl %edx, %esi" could be removed if we change "addl %esi, %edx" to "addl %edx, %esi".

The IR before TwoAddressInstructionPass is:
BB#2: derived from LLVM BB %for.body

Predecessors according to CFG: BB#1 BB#2
    %vreg3<def> = COPY %vreg12<kill>; GR32:%vreg3,%vreg12
    %vreg2<def> = COPY %vreg11<kill>; GR32:%vreg2,%vreg11
    %vreg7<def,tied1> = SHR32ri %vreg3<tied0>, 31, %EFLAGS<imp-def,dead>; GR32:%vreg7,%vreg3
    %vreg8<def,tied1> = ADD32rr %vreg3<tied0>, %vreg7<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg8,%vreg3,%vreg7
    %vreg9<def,tied1> = SAR32r1 %vreg8<kill,tied0>, %EFLAGS<imp-def,dead>; GR32:%vreg9,%vreg8
    %vreg4<def,tied1> = ADD32rr %vreg9<kill,tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg4,%vreg9,%vreg2
    %vreg5<def,tied1> = INC64_32r %vreg3<kill,tied0>, %EFLAGS<imp-def,dead>; GR32:%vreg5,%vreg3
    CMP32rr %vreg5, %vreg0, %EFLAGS<imp-def>; GR32:%vreg5,%vreg0
    %vreg11<def> = COPY %vreg4; GR32:%vreg11,%vreg4
    %vreg12<def> = COPY %vreg5<kill>; GR32:%vreg12,%vreg5
    JL_4 <BB#2>, %EFLAGS<imp-use,kill>

Now TwoAddressInstructionPass will choose vreg9 to be tied with vreg4. However, it doesn't see that there is copy from vreg4 to vreg11 and another copy from vreg11 to vreg2 inside the loop body. To remove those copies, it is necessary to choose vreg2 to be tied with vreg4 instead of vreg9. This code pattern commonly appears when there is reduction operation in a loop.

The patch fixed the problem and improved O2 performance of google internal benchmarks by 0.74% on average (The biggest improvement for a benchmark is 5%)

Wei.

Diff Detail

Repository
rL LLVM

Event Timeline

wmi updated this revision to Diff 20453.Feb 20 2015, 10:18 PM
wmi retitled this revision from to Remove redundent register mov by improving TwoAddressInstructionPass .
wmi updated this object.
wmi edited the test plan for this revision. (Show Details)
wmi added reviewers: chandlerc, bob.wilson.
wmi set the repository for this revision to rL LLVM.
wmi added a subscriber: Unknown Object (MLST).

Hi Wei,

Inlined a couple of comments.
The main problem is that I believe my miss the general case here.

Thanks,
-Quentin

lib/CodeGen/TwoAddressInstructionPass.cpp
318

Use the API of MachineRegisterInfo to achieve this.
E.g, MRI has getUniqueVRegDef and use_nodgb_instructions.

629

Have you experiment with several users?
I.e., this code wouldn’t catch this:

// %reg101 = MOV %reg100
// %reg102 = ...
// %reg103 = ADD %reg102, %reg101
// %reg104 = <something> %reg103
// %reg100 = MOV %reg103

Whereas this is fundamentally the same problem.

The bottom line is that this looks like a narrow case of a more general problem.
I’m fine moving incrementally, as long as you commit that you will working on this.

638

I would refactor this code a bit differently.

Something like.
if (useA && useA->isCopy) {

 unsigned DefCopyA = useA->getOperand(0).getReg();
// Where match is the check for isCopy and getOperand(1).getReg() == DefCopyA
 if (match(DefCopyA, defB)
  return false;
 if (match(DefCopyA, defC)
  return true;

}

test/CodeGen/X86/twoaddr-coalesce-3.ll
2

Use FileCheck and CHECK lines pleas.

8

Remove metadata and explain what this function is testing, so that future updates would be easier.

chandlerc edited edge metadata.Feb 23 2015, 6:46 PM

No real comments on the code, Quentin knows it much better than I do. =]

test/CodeGen/X86/twoaddr-coalesce-3.ll
8

In addition, if there is a way to fold this into an existing test, that would be much better. This is usually possible due to using FileCheck, and CHECK-LABEL bracketed assertions.

wmi added a comment.Feb 23 2015, 7:07 PM

Thanks for the review. Will fix according to your comments, and will expand the code to handle the case you mentioned.

To solve the problem generally, I think it is best to implement it in register allocation phase, because it is possible that the choice made in Two Addr Instruction pass may not be the best in register allocation context. gcc removed the regmove pass (it is the correspondence to TwoAddrInst pass) and implemented the heuristic in register allocation, and got better performance. I am collecting testcases for which a better heuristic is necessary. This is a part I like to work on, however, I want to work on it incrementally as you suggest -- solve the common problem in TwoAddrInst pass for the first step.

wmi updated this revision to Diff 20641.Feb 24 2015, 5:02 PM
wmi edited edge metadata.

I rewrite the code to expand the cases that can be handled.

Now for %reg103 = ADD %reg102, %reg101; I check whether there is a reversed copy chain from %reg102 to %reg103 or from %reg101 to %reg103. If there is a reversed copy chain from a src operand to a dest oeprand, that src operand will be choosen to be merged with the dest operand. In this way, the case Quentin suggested like the following will be handled.

%reg101 = MOV %reg100
%reg102 = ...
%reg103 = ADD %reg102, %reg101
%reg104 = <something> %reg103
// %reg100 = MOV %reg103

I still don't use getUniqueVRegDef to replace getSingleDef because getSingleDef can find out the def inside current BB when a use has multiple defs (one in current BB and one in predecessor BB). This is common because the IR in TwoAddrInstruction pass is already out-of-ssa form. getUniqueVRegDef will return NULL for that case.

Thanks,
Wei.

Minor style comments only.

lib/CodeGen/TwoAddressInstructionPass.cpp
314–315

Please use our modern doxygen comment style for new code. (I know a bunch of old code doesn't)

http://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments

340–341

It would be good to document your idea for how to handle this more generally in a FIXME here and maybe file a PR to track it?

wmi updated this revision to Diff 20654.Feb 24 2015, 10:25 PM

Address Chandler's comments:

  1. change the comments to doxygen format.
  2. Add a FIXME in the code and file a PR to track the problems that havn't been addressed: http://llvm.org/bugs/show_bug.cgi?id=22689
majnemer added inline comments.
lib/CodeGen/TwoAddressInstructionPass.cpp
316

Please use MachineInstr * instead of MachineInstr*.

318

Variable names are capitalized.

327

nullptr

qcolombet added inline comments.Feb 25 2015, 9:53 AM
lib/CodeGen/TwoAddressInstructionPass.cpp
319

Use def_instructions or def_operands instead.
That way, you wouldn’t have to check for isDef in the loop.

342

I would rather have a set of visited copies or reg instead of a magic constant.
If it turns out to be too expensive, then we can introduce parametrizable magic constant!

345

LLVM idiom would be the opposite:
if (!Def || !…)

return false;
628

nitpick:
"to *hopefully* eliminate an otherwise unavoidable copy.”

We do not have guarantee that copy will indeed be eliminated.

635

For the record, if you are interested into this, look into the register coalescer pass.

test/CodeGen/X86/twoaddr-coalesce-3.ll
44

No need for #0.

As a general comment you'll want to get into the habit of using clang-format on your patches. It'll solve the formatting side of the coding convention changes you'll need to get used to. :)

-eric

wmi updated this revision to Diff 20892.Feb 27 2015, 1:50 PM

Addressed Quentin and Majnemer's comments.

As Eric suggested, I tried clang-format. It is great.

Thanks,
Wei.

Hi Wei,

Looks almost good :).

I should have pay more attention to the tests, sorry about that. Please make them more verbose and do not rely on basic block labels.

Thanks,
-Quentin

test/CodeGen/X86/twoaddr-coalesce-3.ll
23

I do not think we usually rely on basic block labels. Instead, check for branch instructions to know when you cross a basic block boundary.

24

Check the actual chain of computation, i.e., surrounding instruction + operands, to be sure we generate correct code.

wmi updated this revision to Diff 20904.Feb 27 2015, 3:09 PM

Thanks. I rewrited the CHECKs for the test.

qcolombet added inline comments.Feb 27 2015, 3:18 PM
test/CodeGen/X86/twoaddr-coalesce-3.ll
23

Like I said, you shouldn’t use basic block labels, but rely on branch instruction on other block.
E.g, on my machine the label looks like this: LBB0_…, i.e., no leading ‘.’.

So, what I was saying was to use something like:

; End of the first block.
CHECK: jp
; We enter the loop
CHECK loop boby
; The loop body is done
CHECK jp

wmi added inline comments.Feb 27 2015, 3:35 PM
lib/CodeGen/TwoAddressInstructionPass.cpp
314–315

Fixed.

340–341

FIXME added. File a PR here: http://llvm.org/bugs/show_bug.cgi?id=22689

test/CodeGen/X86/twoaddr-coalesce-3.ll
23

Yes, I understand your concern. However using branch instruction in previous block as boundary will introduce extra code in loop preheader, including mov. Like the following movl total(%rip), ..., I can make the test right, but it introduces extra complexity.

jle .LBB0_4

BB#1: # %for.body.lr.ph

xorl %edx, %edx
movl total(%rip), %ecx
.align 16, 0x90
.LBB0_2: # %for.body

  1. =>This Inner Loop Header: Depth=1

movl %edx, %esi
shrl $31, %esi
addl %edx, %esi
sarl %esi
addl %esi, %ecx
incl %edx
cmpl %eax, %edx
jl .LBB0_2

Do you like CHECK: [[LOOP:[.]?LBB0_[0-9]+]]; ? If not, I will follow your way to use last branch as the boundary.

qcolombet added inline comments.Feb 27 2015, 3:48 PM
test/CodeGen/X86/twoaddr-coalesce-3.ll
23

In that case, I would use 'movl total' as an anchor.

Another way to avoid this label problem is to specify a -mtriple instead of just a -march. Anyhow, I am not sure labels are stable between debug and assert builds. Same thing for the assembly comments, i.e., I am not sure your output will contain: # %for.body.

The bottom line is that I really think you shouldn't rely on labels, but I may be wrong of course!

chandlerc added inline comments.Feb 27 2015, 4:01 PM
test/CodeGen/X86/twoaddr-coalesce-3.ll
23

FWIW, I agree that relying on specific label naming isn't a great strategy. However, I think for assembly tests, the instruction comments are very valuable and we should always keep them stable. For example, all of the shuffle testing using the comments rather than checking magic numbers. So I would suggest

[[LOOP1:^[.\w]+:]]: # %for.body

Essentially, match the specific x86 pattern for labels, any label, and the comment to identify which one.

wmi updated this revision to Diff 20912.Feb 27 2015, 4:29 PM

Follow Chandler's suggestion for the testcase since it is helpful for testcase readability. If it somehow doesn't work for certain platforms, I am ready to fix it using mov total(%rip), ... as an anchor.

Thanks,
Wei.

qcolombet added inline comments.Feb 27 2015, 4:30 PM
test/CodeGen/X86/twoaddr-coalesce-3.ll
23

Agreed.

I was not sure the comment were printed in release mode, but now that I think about this it happens only if the would pipeline is in release mode.

qcolombet accepted this revision.Feb 27 2015, 4:33 PM
qcolombet added a reviewer: qcolombet.

Thanks Wei.

LGTM.

Q.

This revision is now accepted and ready to land.Feb 27 2015, 4:33 PM
wmi added a comment.Feb 27 2015, 4:39 PM

Thanks for your patience! The comment is helpful!

Wei.

echristo closed this revision.Mar 3 2015, 2:05 PM

Applied thusly:

dzur:~/sources/llvm> git svn dcommit
Committing to https://llvm.org/svn/llvm-project/llvm/trunk ...
A test/CodeGen/X86/twoaddr-coalesce-3.ll
M lib/CodeGen/TwoAddressInstructionPass.cpp
Committed r231148

-eric