This is an archive of the discontinued LLVM Phabricator instance.

Differential D7806

Remove redundant register mov by improving TwoAddressInstructionPass
ClosedPublic

Authored by wmi on Feb 20 2015, 10:18 PM.

Details

Reviewers

qcolombet
bob.wilson
chandlerc

Summary

Hi,

We found a problem in TwoAddressInstructionPass that can generate redundant register mov insns in loops, and we propose a patch to fix it.

Here is the small testcase:

1.c:

int M, total;

void foo() {
  int i;
  for (i = 0; i < M; i++) {
    total = total + i / 2;
  }
}

~/workarea/llvm-r230041/build/bin/clang -O2 -fno-vectorize -fno-unroll-loops -S 1.c

This is the kernel loop in 1.s:

.LBB0_2:                # %for.body
                        # =>This Inner Loop Header: Depth=1
        movl %edx, %esi
        movl %ecx, %edx
        shrl $31, %edx
        addl %ecx, %edx
        sarl %edx
        addl %esi, %edx
        incl %ecx
        cmpl %eax, %ecx
        jl .LBB0_2

The first mov insn "movl %edx, %esi" could be removed if we change "addl %esi, %edx" to "addl %edx, %esi".

The IR before TwoAddressInstructionPass is:
BB#2: derived from LLVM BB %for.body

Predecessors according to CFG: BB#1 BB#2
    %vreg3<def> = COPY %vreg12<kill>; GR32:%vreg3,%vreg12
    %vreg2<def> = COPY %vreg11<kill>; GR32:%vreg2,%vreg11
    %vreg7<def,tied1> = SHR32ri %vreg3<tied0>, 31, %EFLAGS<imp-def,dead>; GR32:%vreg7,%vreg3
    %vreg8<def,tied1> = ADD32rr %vreg3<tied0>, %vreg7<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg8,%vreg3,%vreg7
    %vreg9<def,tied1> = SAR32r1 %vreg8<kill,tied0>, %EFLAGS<imp-def,dead>; GR32:%vreg9,%vreg8
    %vreg4<def,tied1> = ADD32rr %vreg9<kill,tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg4,%vreg9,%vreg2
    %vreg5<def,tied1> = INC64_32r %vreg3<kill,tied0>, %EFLAGS<imp-def,dead>; GR32:%vreg5,%vreg3
    CMP32rr %vreg5, %vreg0, %EFLAGS<imp-def>; GR32:%vreg5,%vreg0
    %vreg11<def> = COPY %vreg4; GR32:%vreg11,%vreg4
    %vreg12<def> = COPY %vreg5<kill>; GR32:%vreg12,%vreg5
    JL_4 <BB#2>, %EFLAGS<imp-use,kill>

Currently TwoAddressInstructionPass chooses vreg9 to be tied with vreg4. However, it doesn't see that there is a copy from vreg4 to vreg11 and another copy from vreg11 to vreg2 inside the loop body. To remove those copies, it is necessary to tie vreg2 with vreg4 instead of vreg9. This code pattern commonly appears when there is a reduction operation in a loop.

The patch fixes the problem and improves O2 performance of Google internal benchmarks by 0.74% on average (the biggest improvement for a single benchmark is 5%).

Wei.

Diff Detail

Repository: rL LLVM

Event Timeline

wmi updated this revision to Diff 20453. Feb 20 2015, 10:18 PM

wmi retitled this revision from to Remove redundant register mov by improving TwoAddressInstructionPass.

wmi updated this object.

wmi edited the test plan for this revision.

wmi added reviewers: chandlerc, bob.wilson.

wmi set the repository for this revision to rL LLVM.

wmi added a subscriber: Unknown Object (MLST).

Hi Wei,

Inlined a couple of comments.
The main problem is that I believe we may miss the general case here.

Thanks,
-Quentin

lib/CodeGen/TwoAddressInstructionPass.cpp
318	Use the API of MachineRegisterInfo to achieve this. E.g., MRI has getUniqueVRegDef and use_nodbg_instructions.
626	Have you experimented with several users? I.e., this code wouldn't catch this:
	// %reg101 = MOV %reg100
	// %reg102 = ...
	// %reg103 = ADD %reg102, %reg101
	// %reg104 = <something> %reg103
	// %reg100 = MOV %reg103
	whereas this is fundamentally the same problem. The bottom line is that this looks like a narrow case of a more general problem. I'm fine moving incrementally, as long as you commit to working on this.
635	I would refactor this code a bit differently. Something like:
	if (useA && useA->isCopy()) {
	  unsigned DefCopyA = useA->getOperand(0).getReg();
	  // Where match is the check for isCopy and getOperand(1).getReg() == DefCopyA
	  if (match(DefCopyA, defB)) return false;
	  if (match(DefCopyA, defC)) return true;
	}
test/CodeGen/X86/twoaddr-coalesce-3.ll
2	Use FileCheck and CHECK lines please.
8	Remove metadata and explain what this function is testing, so that future updates would be easier.

No real comments on the code, Quentin knows it much better than I do. =]

test/CodeGen/X86/twoaddr-coalesce-3.ll
8	In addition, if there is a way to fold this into an existing test, that would be much better. This is usually possible due to using FileCheck, and CHECK-LABEL bracketed assertions.

Thanks for the review. Will fix according to your comments, and will expand the code to handle the case you mentioned.

To solve the problem generally, I think it is best to implement it in the register allocation phase, because the choice made in the Two Addr Instruction pass may not be the best one in the register allocation context. gcc removed its regmove pass (the counterpart of the TwoAddrInst pass), implemented the heuristic in register allocation, and got better performance. I am collecting testcases for which a better heuristic is necessary. This is a part I would like to work on; however, I want to work on it incrementally as you suggest: solve the common problem in the TwoAddrInst pass as the first step.

I rewrote the code to expand the cases that can be handled.

Now for %reg103 = ADD %reg102, %reg101, I check whether there is a reversed copy chain from %reg102 to %reg103 or from %reg101 to %reg103. If there is a reversed copy chain from a src operand to the dest operand, that src operand will be chosen to be merged with the dest operand. In this way, the case Quentin suggested, shown below, will be handled.

%reg101 = MOV %reg100
%reg102 = ...
%reg103 = ADD %reg102, %reg101
%reg104 = <something> %reg103
%reg100 = MOV %reg103
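
The reversed copy chain check described above can be sketched outside of LLVM on a toy model. This is only an illustration under stated assumptions: `CopyDefs` and `reviewExample` are hypothetical names, and a plain map from a register to the source of its defining copy stands in for the real MachineInstr and MachineRegisterInfo queries used by the patch.

```cpp
#include <cassert>
#include <map>

// Toy model (hypothetical, not LLVM's data structures): each key is a
// register defined by a single COPY in the block, and the value is the
// source register of that copy. Registers defined by anything else are
// simply absent from the map.
using CopyDefs = std::map<unsigned, unsigned>;

// Walk copy definitions backwards starting at FromReg and report whether
// ToReg is reached within MaxLen steps, i.e. whether ToReg flows back into
// FromReg through copies only.
bool isRevCopyChain(const CopyDefs &Defs, unsigned FromReg, unsigned ToReg,
                    int MaxLen) {
  assert(MaxLen >= 0 && "chain length limit must be non-negative");
  unsigned TmpReg = FromReg;
  for (int I = 0; I < MaxLen; ++I) {
    auto It = Defs.find(TmpReg);
    if (It == Defs.end())
      return false; // TmpReg is not defined by a copy: the chain ends here.
    TmpReg = It->second;
    if (TmpReg == ToReg)
      return true;
  }
  return false;
}

// The example from the review:
//   %reg101 = MOV %reg100
//   %reg103 = ADD %reg102, %reg101
//   %reg100 = MOV %reg103
CopyDefs reviewExample() { return {{101, 100}, {100, 103}}; }
```

In this model, `isRevCopyChain(reviewExample(), 101, 103, 3)` finds the chain 101 <- 100 <- 103, so %reg101 is the operand worth tying to %reg103, while the same query starting from 102 fails because %reg102 is not defined by a copy.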

I still don't use getUniqueVRegDef to replace getSingleDef, because getSingleDef can find the def inside the current BB when a use has multiple defs (one in the current BB and one in a predecessor BB). This is common because the IR in the TwoAddrInstruction pass is already out of SSA form. getUniqueVRegDef will return NULL for that case.
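
The difference between the two lookups can be illustrated with a small stand-alone sketch. Everything here is an assumption made for illustration: `ToyDef`, `uniqueDef`, `singleDefInBlock` and `loopCarriedDefs` are hypothetical names, and the functions only mimic the behavior described above rather than call the LLVM API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a defining instruction: the register it defines
// and the id of the basic block that contains it.
struct ToyDef {
  unsigned Reg;
  int Block;
};

// Function-wide query in the spirit of getUniqueVRegDef: succeeds only when
// Reg has exactly one definition anywhere.
const ToyDef *uniqueDef(const std::vector<ToyDef> &Defs, unsigned Reg) {
  const ToyDef *Ret = nullptr;
  for (const ToyDef &D : Defs) {
    if (D.Reg != Reg)
      continue;
    if (Ret)
      return nullptr; // Second definition found: not unique.
    Ret = &D;
  }
  return Ret;
}

// Block-local query in the spirit of the patch's getSingleDef: definitions
// in other blocks are ignored, so one def per block still counts as single.
const ToyDef *singleDefInBlock(const std::vector<ToyDef> &Defs, unsigned Reg,
                               int Block) {
  assert(Block >= 0 && "block ids are non-negative in this toy model");
  const ToyDef *Ret = nullptr;
  for (const ToyDef &D : Defs) {
    if (D.Reg != Reg || D.Block != Block)
      continue;
    if (Ret)
      return nullptr; // Two definitions inside the same block.
    Ret = &D;
  }
  return Ret;
}

// Illustrative data: register 11 is defined once in a preheader (block 1)
// and once in the loop body (block 2), as can happen after leaving SSA form.
const std::vector<ToyDef> &loopCarriedDefs() {
  static const std::vector<ToyDef> Defs = {{11, 1}, {11, 2}, {7, 2}};
  return Defs;
}
```

With this data the function-wide query gives up on register 11, while the block-local query still returns its definition in block 2, which is the property the copy chain walk relies on.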

Thanks,
Wei.

Minor style comments only.

lib/CodeGen/TwoAddressInstructionPass.cpp
314–315	Please use our modern doxygen comment style for new code. (I know a bunch of old code doesn't) http://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments
340–341	It would be good to document your idea for how to handle this more generally in a FIXME here and maybe file a PR to track it?

Addressed Chandler's comments:

Changed the comments to doxygen format.
Added a FIXME in the code and filed a PR to track the problems that haven't been addressed: http://llvm.org/bugs/show_bug.cgi?id=22689

majnemer added a subscriber: majnemer. Feb 24 2015, 11:03 PM

majnemer added inline comments.

lib/CodeGen/TwoAddressInstructionPass.cpp
316	Please use `MachineInstr *` instead of `MachineInstr*`.
318	Variable names are capitalized.
327	nullptr

qcolombet added inline comments. Feb 25 2015, 9:53 AM

lib/CodeGen/TwoAddressInstructionPass.cpp
319	Use def_instructions or def_operands instead. That way, you wouldn’t have to check for isDef in the loop.
342	I would rather have a set of visited copies or reg instead of a magic constant. If it turns out to be too expensive, then we can introduce parametrizable magic constant!
345	LLVM idiom would be the opposite: if (!Def || !…) return false;
625	nitpick: "to hopefully eliminate an otherwise unavoidable copy." We have no guarantee that the copy will indeed be eliminated.
632	For the record, if you are interested in this, look into the register coalescer pass.
test/CodeGen/X86/twoaddr-coalesce-3.ll
45	No need for #0.
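
The suggestion on line 342 (track visited registers instead of a fixed length limit) could look roughly like the sketch below. It is a toy model, not the code that was committed: `CopyDefs`, `isRevCopyChainVisited`, `longChain` and `copyCycle` are hypothetical names, and a map from a register to the source of its defining copy replaces the real MachineInstr lookups.

```cpp
#include <cassert>
#include <map>
#include <set>

// Toy model (hypothetical): register -> source register of its defining copy.
using CopyDefs = std::map<unsigned, unsigned>;

// Same backward walk as a length-limited chain check, but termination is
// guaranteed by remembering visited registers rather than by a constant.
bool isRevCopyChainVisited(const CopyDefs &Defs, unsigned FromReg,
                           unsigned ToReg) {
  std::set<unsigned> Visited;
  unsigned TmpReg = FromReg;
  // insert() reports whether TmpReg is new; a repeat means the copies form
  // a cycle that never reaches ToReg, so the walk stops.
  while (Visited.insert(TmpReg).second) {
    auto It = Defs.find(TmpReg);
    if (It == Defs.end())
      return false; // Not defined by a copy: the chain ends here.
    TmpReg = It->second;
    if (TmpReg == ToReg)
      return true;
  }
  return false;
}

// Illustrative inputs: a chain of five copies, and two copies forming a cycle.
CopyDefs longChain() { return {{1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 6}}; }
CopyDefs copyCycle() { return {{1, 2}, {2, 1}}; }
```

The trade-off named in the comment is visible here: the set removes the arbitrary limit, so the five-copy chain is found, at the price of extra bookkeeping per query.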

As a general comment you'll want to get into the habit of using clang-format on your patches. It'll solve the formatting side of the coding convention changes you'll need to get used to. :)

-eric

Addressed Quentin and Majnemer's comments.

As Eric suggested, I tried clang-format. It is great.

Thanks,
Wei.

Hi Wei,

Looks almost good :).

I should have paid more attention to the tests, sorry about that. Please make them more verbose and do not rely on basic block labels.

Thanks,
-Quentin

test/CodeGen/X86/twoaddr-coalesce-3.ll
22	I do not think we usually rely on basic block labels. Instead, check for branch instructions to know when you cross a basic block boundary.
23	Check the actual chain of computation, i.e., surrounding instruction + operands, to be sure we generate correct code.

Thanks. I rewrote the CHECKs for the test.

qcolombet added inline comments. Feb 27 2015, 3:18 PM

test/CodeGen/X86/twoaddr-coalesce-3.ll
23	Like I said, you shouldn't use basic block labels, but rely on the branch instruction of the other block. E.g., on my machine the label looks like this: LBB0_…, i.e., no leading '.'. So, what I was saying was to use something like:
	; End of the first block.
	CHECK: jp
	; We enter the loop
	CHECK loop body
	; The loop body is done
	CHECK jp

wmi added inline comments. Feb 27 2015, 3:35 PM

lib/CodeGen/TwoAddressInstructionPass.cpp
314–315	Fixed.
340–341	FIXME added. File a PR here: http://llvm.org/bugs/show_bug.cgi?id=22689
test/CodeGen/X86/twoaddr-coalesce-3.ll
23	Yes, I understand your concern. However, using the branch instruction in the previous block as the boundary pulls the loop preheader code into the checked region, including a mov (the movl total(%rip), ... below). I can make the test right, but it introduces extra complexity.
	        jle .LBB0_4
	BB#1:           # %for.body.lr.ph
	        xorl %edx, %edx
	        movl total(%rip), %ecx
	        .align 16, 0x90
	.LBB0_2:        # %for.body
	                # =>This Inner Loop Header: Depth=1
	        movl %edx, %esi
	        shrl $31, %esi
	        addl %edx, %esi
	        sarl %esi
	        addl %esi, %ecx
	        incl %edx
	        cmpl %eax, %edx
	        jl .LBB0_2
	Do you like CHECK: [[LOOP:[.]?LBB0_[0-9]+]]; ? If not, I will follow your way and use the last branch as the boundary.

qcolombet added inline comments. Feb 27 2015, 3:48 PM

test/CodeGen/X86/twoaddr-coalesce-3.ll
23	In that case, I would use 'movl total' as an anchor. Another way to avoid this label problem is to specify a -mtriple instead of just a -march. Anyhow, I am not sure labels are stable between debug and assert builds. Same thing for the assembly comments, i.e., I am not sure your output will contain: # %for.body. The bottom line is that I really think you shouldn't rely on labels, but I may be wrong of course!

chandlerc added inline comments. Feb 27 2015, 4:01 PM

test/CodeGen/X86/twoaddr-coalesce-3.ll
23	FWIW, I agree that relying on specific label naming isn't a great strategy. However, I think for assembly tests, the instruction comments are very valuable and we should always keep them stable. For example, all of the shuffle testing uses the comments rather than checking magic numbers. So I would suggest [[LOOP1:^[.\w]+:]]: # %for.body Essentially, match the specific x86 pattern for labels, any label, and the comment to identify which one.
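
The idea of matching any label and using the trailing block comment to identify it can be tried out with a small stand-alone sketch. This uses std::regex rather than FileCheck, spells the character class out instead of using \w, and the function name `matchBlockLabel` is hypothetical; it only illustrates why such a pattern is insensitive to the leading '.'.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the label name if Line looks like "<label>: ... # ... %<BlockName>",
// and an empty string otherwise. The label may or may not start with '.',
// so both ".LBB0_2" and "LBB0_2" are accepted.
std::string matchBlockLabel(const std::string &Line,
                            const std::string &BlockName) {
  static const std::regex LabelRe(
      R"(^([.A-Za-z0-9_]+):.*#.*%([A-Za-z0-9_.]+)\s*$)");
  std::smatch M;
  // Group 1 is the label, group 2 is the IR block name from the comment.
  if (std::regex_search(Line, M, LabelRe) && M[2].str() == BlockName)
    return M[1].str();
  return "";
}
```

For example, `matchBlockLabel(".LBB0_2: # %for.body", "for.body")` and `matchBlockLabel("LBB0_2: # %for.body", "for.body")` both succeed, while a line ending in %for.body.lr.ph or an ordinary instruction line is rejected.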

Followed Chandler's suggestion for the testcase since it helps testcase readability. If it somehow doesn't work for certain platforms, I am ready to fix it using mov total(%rip), ... as an anchor.

Thanks,
Wei.

qcolombet added inline comments. Feb 27 2015, 4:30 PM

test/CodeGen/X86/twoaddr-coalesce-3.ll
23	Agreed. I was not sure the comments were printed in release mode, but now that I think about it, that happens only if the whole pipeline is in release mode.

Thanks Wei.

LGTM.

This revision is now accepted and ready to land. Feb 27 2015, 4:33 PM

Thanks for your patience! The comment is helpful!

Wei.

Applied thusly:

dzur:~/sources/llvm> git svn dcommit
Committing to https://llvm.org/svn/llvm-project/llvm/trunk ...
A test/CodeGen/X86/twoaddr-coalesce-3.ll
M lib/CodeGen/TwoAddressInstructionPass.cpp
Committed r231148

-eric

Revision Contents

Path                                          Size
lib/CodeGen/TwoAddressInstructionPass.cpp     62 lines
test/CodeGen/X86/twoaddr-coalesce-3.ll        78 lines

Diff 20892

lib/CodeGen/TwoAddressInstructionPass.cpp

(96 unchanged lines omitted; context: class TwoAddressInstructionPass : public MachineFunctionPass {)
    // DstRegMap - A map from virtual registers to physical registers which are
    // likely targets to be coalesced to due to copies to physical registers from
    // virtual registers. e.g. r1 = move v1024.
    DenseMap<unsigned, unsigned> DstRegMap;

    bool sink3AddrInstruction(MachineInstr *MI, unsigned Reg,
                              MachineBasicBlock::iterator OldPos);

+   bool isRevCopyChain(unsigned FromReg, unsigned ToReg, int Maxlen);
+
    bool noUseAfterLastDef(unsigned Reg, unsigned Dist, unsigned &LastDef);

    bool isProfitableToCommute(unsigned regA, unsigned regB, unsigned regC,
                               MachineInstr *MI, unsigned Dist);

    bool commuteInstruction(MachineBasicBlock::iterator &mi,
                            unsigned RegB, unsigned RegC, unsigned Dist);
(191 unchanged lines omitted; context: sink3AddrInstruction(MachineInstr *MI, unsigned SavedReg,)

    if (LIS)
      LIS->handleMove(MI);

    ++Num3AddrSunk;
    return true;
  }

+ /// getSingleDef -- return the MachineInstr* if it is the single def of the Reg
+ /// in current BB.
+ static MachineInstr *getSingleDef(unsigned Reg, MachineBasicBlock *BB,
+                                   const MachineRegisterInfo *MRI) {
+   MachineInstr *Ret = nullptr;
+   for (MachineInstr &DefMI : MRI->def_instructions(Reg)) {
+     if (DefMI.getParent() != BB || DefMI.isDebugValue())
+       continue;
+     if (!Ret)
+       Ret = &DefMI;
+     else if (Ret != &DefMI)
+       return nullptr;
+   }
+   return Ret;
+ }
+
+ /// Check if there is a reversed copy chain from FromReg to ToReg:
+ /// %Tmp1 = copy %Tmp2;
+ /// %FromReg = copy %Tmp1;
+ /// %ToReg = add %FromReg ...
+ /// %Tmp2 = copy %ToReg;
+ /// MaxLen specifies the maximum length of the copy chain the func
+ /// can walk through.
+ bool TwoAddressInstructionPass::isRevCopyChain(unsigned FromReg, unsigned ToReg,
+                                                int Maxlen) {
+   unsigned TmpReg = FromReg;
+   for (int i = 0; i < Maxlen; i++) {
+     MachineInstr *Def = getSingleDef(TmpReg, MBB, MRI);
+     if (!Def || !Def->isCopy())
+       return false;
+
+     TmpReg = Def->getOperand(1).getReg();
+
+     if (TmpReg == ToReg)
+       return true;
+   }
+   return false;
+ }
+
  /// noUseAfterLastDef - Return true if there are no intervening uses between the
  /// last instruction in the MBB that defines the specified register and the
  /// two-address instruction which is being processed. It also returns the last
  /// def location by reference
  bool TwoAddressInstructionPass::noUseAfterLastDef(unsigned Reg, unsigned Dist,
                                                    unsigned &LastDef) {
    LastDef = 0;
    unsigned LastUse = Dist;
(249 unchanged lines omitted; context: if (!noUseAfterLastDef(regC, Dist, LastDefC)))
      return false;

    // If there is a use of regB between its last def (could be livein) and this
    // instruction, then go ahead and make this transformation.
    unsigned LastDefB = 0;
    if (!noUseAfterLastDef(regB, Dist, LastDefB))
      return true;

+   // Look for situation like this:
+   // %reg101 = MOV %reg100
+   // %reg102 = ...
+   // %reg103 = ADD %reg102, %reg101
+   // ... = %reg103 ...
+   // %reg100 = MOV %reg103
+   // If there is a reversed copy chain from reg101 to reg103, commute the ADD
+   // to eliminate an otherwise unavoidable copy.
+   // FIXME:
+   // We can extend the logic further: If an pair of operands in an insn has
+   // been merged, the insn could be regarded as a virtual copy, and the virtual
+   // copy could also be used to construct a copy chain.
+   // To more generally minimize register copies, ideally the logic of two addr
+   // instruction pass should be integrated with register allocation pass where
+   // interference graph is available.
+   if (isRevCopyChain(regC, regA, 3))
+     return true;
+
+   if (isRevCopyChain(regB, regA, 3))
+     return false;
+
    // Since there are no intervening uses for both registers, then commute
    // if the def of regC is closer. Its live interval is shorter.
    return LastDefB && LastDefC && LastDefC > LastDefB;
  }

  /// commuteInstruction - Commute a two-address instruction and update the basic
  /// block, distance map, and live variables if needed. Return true if it is
  /// successful.
(1,132 unchanged lines omitted)

test/CodeGen/X86/twoaddr-coalesce-3.ll

(new file; every line below is added)

; RUN: llc < %s -march=x86-64 | FileCheck %s
; This test is to ensure TwoAddrInstruction pass choose the proper operands to merge and
; generate less mov insns.

@M = common global i32 0, align 4
@total = common global i32 0, align 4
@g = common global i32 0, align 4

; Function Attrs: nounwind uwtable
define void @foo() {
entry:
  %0 = load i32* @M, align 4
  %cmp3 = icmp sgt i32 %0, 0
  br i1 %cmp3, label %for.body.lr.ph, label %for.end

for.body.lr.ph: ; preds = %entry
  %total.promoted = load i32* @total, align 4
  br label %for.body

; Check that only one mov will be generated in the kernel loop.
; CHECK-LABEL: foo:
; CHECK: .LBB0_2:
; CHECK: mov
; CHECK-NOT: mov
; CHECK: .LBB0_2
for.body: ; preds = %for.body.lr.ph, %for.body
  %add5 = phi i32 [ %total.promoted, %for.body.lr.ph ], [ %add, %for.body ]
  %i.04 = phi i32 [ 0, %for.body.lr.ph ], [ %inc, %for.body ]
  %div = sdiv i32 %i.04, 2
  %add = add nsw i32 %div, %add5
  %inc = add nuw nsw i32 %i.04, 1
  %cmp = icmp slt i32 %inc, %0
  br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge

for.cond.for.end_crit_edge: ; preds = %for.body
  store i32 %add, i32* @total, align 4
  br label %for.end

for.end: ; preds = %for.cond.for.end_crit_edge, %entry
  ret void
}

; Function Attrs: nounwind uwtable
define void @goo() {
entry:
  %0 = load i32* @M, align 4
  %cmp3 = icmp sgt i32 %0, 0
  br i1 %cmp3, label %for.body.lr.ph, label %for.end

for.body.lr.ph: ; preds = %entry
  %total.promoted = load i32* @total, align 4
  br label %for.body

; Check that only two mov will be generated in the kernel loop.
; CHECK-LABEL: goo:
; CHECK: .LBB1_2:
; CHECK: mov
; CHECK: mov
; CHECK-NOT: mov
; CHECK: .LBB1_2
for.body: ; preds = %for.body.lr.ph, %for.body
  %add5 = phi i32 [ %total.promoted, %for.body.lr.ph ], [ %add, %for.body ]
  %i.04 = phi i32 [ 0, %for.body.lr.ph ], [ %inc, %for.body ]
  %div = sdiv i32 %i.04, 2
  %add = add nsw i32 %div, %add5
  store volatile i32 %add, i32* @g, align 4
  %inc = add nuw nsw i32 %i.04, 1
  %cmp = icmp slt i32 %inc, %0
  br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge

for.cond.for.end_crit_edge: ; preds = %for.body
  store i32 %add, i32* @total, align 4
  br label %for.end

for.end: ; preds = %for.cond.for.end_crit_edge, %entry
  ret void
}
