This is an archive of the discontinued LLVM Phabricator instance.


Differential D90776

[FastISel] Sink more materializations to first use
Abandoned · Public

Authored by probinson on Nov 4 2020, 10:38 AM.


Details

Reviewers

rnk
echristo
MatzeB
qcolombet

Summary

Local values are constants or addresses that can't be folded into
the instruction that uses them. FastISel materializes these in a
"local value" area that always dominates the current insertion
point, to try to avoid materializing these values more than once
(per block).
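To make the "local value" idea concrete, here is a minimal C++ sketch of a per-block memo table. It is a toy model, not FastISel code: constants are plain integers, virtual registers are integers, and `ToyLocalValueMap`, `getOrMaterialize` and `flush` are hypothetical names. The flush stands in for the points where FastISel empties its map (it does so at block boundaries and at every call).

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical toy model of a per-block local value map. Not LLVM code.
struct ToyLocalValueMap {
  std::unordered_map<long, unsigned> ConstToReg; // constant -> vreg holding it
  std::vector<long> Materialized;                // constants emitted, in order
  unsigned NextReg = 1;

  // Return a vreg holding C, emitting a materialization only on a miss.
  unsigned getOrMaterialize(long C) {
    auto It = ConstToReg.find(C);
    if (It != ConstToReg.end())
      return It->second; // reuse: no new instruction is emitted
    unsigned Reg = NextReg++;
    Materialized.push_back(C); // stands in for emitting e.g. a mov or lea
    ConstToReg.emplace(C, Reg);
    return Reg;
  }

  // Forget all memoized values (e.g. at a call), so later uses
  // rematerialize instead of keeping a register live across the flush.
  void flush() { ConstToReg.clear(); }
};
```

Reuse saves an instruction, but the shared register stays live until its last use, which is where the extra spills and reloads discussed in this review come from.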

https://reviews.llvm.org/D43093 added code to sink these local
value instructions to their first use, which has two beneficial
effects. One, it is likely to avoid some unnecessary spills and
reloads; two, it allows us to attach the debug location of the
user to the local value instruction. The latter effect can
improve the debugging experience for debuggers with a "set next
statement" feature, such as the Visual Studio debugger and PS4
debugger, because instructions to set up constants for a given
statement will be associated with the appropriate source line.
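A minimal sketch of what "sink to first use" means, in a toy model where a block is a vector of instructions, each with one optional def, a list of uses, and a source line (0 meaning "no location"). `ToyInst` and `sinkToFirstUse` are hypothetical; the real code works on `MachineInstr`s through `InstOrderMap` and also handles DBG_VALUEs, uses by PHIs in successor blocks, and terminators, all of which this sketch ignores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction: one optional def, some uses, a source line.
struct ToyInst {
  std::string Name;
  int Def;               // -1 if the instruction defines no register
  std::vector<int> Uses;
  unsigned Line;         // 0 means "no source location"
};

// Move Block[Idx] down to just before its first user and give it that
// user's line. Returns false (leaving Block unchanged) if nothing after
// it in this block uses its def.
bool sinkToFirstUse(std::vector<ToyInst> &Block, std::size_t Idx) {
  int Reg = Block[Idx].Def;
  for (std::size_t U = Idx + 1; U < Block.size(); ++U) {
    const std::vector<int> &Uses = Block[U].Uses;
    if (std::find(Uses.begin(), Uses.end(), Reg) == Uses.end())
      continue;
    ToyInst Moved = Block[Idx];
    Moved.Line = Block[U].Line; // inherit the user's source location
    // Shift the instructions in between up by one slot; the sunk
    // instruction ends up at U - 1, immediately before its user.
    std::rotate(Block.begin() + Idx, Block.begin() + Idx + 1,
                Block.begin() + U);
    Block[U - 1] = Moved;
    return true;
  }
  return false;
}
```

For a block `lea` (line 0), `add` (line 6), `call` (line 7) where only the call reads the `lea` result, the `lea` moves to just before the call and picks up line 7.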

There are also some constants (primarily addresses) that could be
produced by no-op casts or GEP instructions; the main difference
from "local value" instructions is that these are values from
separate IR instructions, and therefore could have multiple users
across multiple basic blocks. D43093 avoided sinking these, even
though they were emitted to the same "local value" area as the
other instructions. The patch comment for D43093 states:

> Local values may also be used by no-op casts, which adds the
> register to the RegFixups table. Without reversing the RegFixups
> map direction, we don't have enough information to sink these
> instructions.

This patch implements sinking for these value materialization
instructions by iterating over the RegFixups map. Usually the map
holds few entries, and building clang in Debug mode using a patched
clang showed a barely-above-the-noise 0.5% increase in build time.
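The fixup chase is a small transitive closure: start from the local value's register and repeatedly collect every register whose fixup target is already in the set. A sketch, assuming a `std::map` from the replaced register to its replacement as a stand-in for `FuncInfo.RegFixups`; `FixupMap` and `collectFixupAliases` are hypothetical names, and the sketch assumes fixup chains contain no cycles.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Toy stand-in for RegFixups: each entry says "uses of From will be
// rewritten to To after selection". std::map gives a deterministic
// iteration order here; LLVM itself uses a DenseMap.
using FixupMap = std::map<unsigned, unsigned>;

// Collect DefReg plus every register that will (transitively) be
// rewritten to DefReg, scanning the whole map once per collected register.
std::vector<unsigned> collectFixupAliases(unsigned DefReg,
                                          const FixupMap &Fixups) {
  std::vector<unsigned> Regs{DefReg};
  // Index-based loop: Regs grows while we walk it, so re-check size()
  // on every iteration and never hold an iterator across push_back.
  for (std::size_t I = 0; I < Regs.size(); ++I) {
    unsigned Target = Regs[I]; // copy: push_back may reallocate Regs
    for (const auto &Entry : Fixups)
      if (Entry.second == Target)
        Regs.push_back(Entry.first);
  }
  return Regs;
}
```

As in the patch, each collected register costs one full scan of the map; an inverse map (target to list of sources) would avoid the rescans at the price of keeping a second structure in sync.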

The expected reduction in spills and restores also shows up: I saw a
0.3% reduction in code size for that Debug build of clang.

The original source locations were lost as part of emitting these
instructions to the local value area, but the source location of
the first use seems like a very good second best choice. I saw
another 0.3% reduction in the number of instruction bytes with
line-0 attributions after this patch; the benefit is greater
than that because only some of these instructions had been
attributed to line 0 before the patch.

One interesting effect is that some of these value instructions
previously ended up intermixed with prologue instructions (e.g.
stack homing for parameters); now that they have proper source
locations, they reliably come after the prologue.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests Failed

| Time | Test |
| --- | --- |
| 410 ms | linux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp |
| 200 ms | windows > lld.ELF/invalid::symtab-sh-info.s |

Event Timeline

probinson created this revision. (Nov 4 2020, 10:38 AM)

Herald added a project: Restricted Project. (Nov 4 2020, 10:38 AM)

Herald added subscribers: pengfei, ormris, hiraditya.

probinson requested review of this revision. (Nov 4 2020, 10:38 AM)

Harbormaster completed remote builds in B77578: Diff 302903. (Nov 4 2020, 11:12 AM)

+ people who reviewed D43093.

Thanks! How much of the benefit do you think comes from looking across flush points, and how much do you think comes from handling no-op casts?

I think we should also consider scrapping the local value map or maybe flushing it after selecting each instruction. I don't think we're getting a whole lot of value from sharing local values between multiple non-call instructions. Most non-call instructions are pointer, integer, and FP arithmetic. Most integer arithmetic supports folding immediates. Pointer arithmetic may benefit from reusing a value. Consider a series of GEPs (array/field accesses) built off the same global variable loaded from the GOT.

I think we can use the same measurement Paul used to show this change was useful: flush the map after every instruction, rebuild clang with a new compiler, measure the binary size and prevalence of line 0 source locations. If size goes down and line zero locations go down, the local value map is more trouble than it's worth, or, at least, it should be limited to memoizing values per instruction. If we flush the map after every instruction, we can confidently give local values the location of the current instruction, and we don't need to do any complex sinking. We should be able to delete this code.

llvm/lib/CodeGen/SelectionDAG/FastISel.cpp, line 256:
This looks like it's reverting rGa28e767f06d. This code was added to avoid O(n^2) complexity. Do you have an alternative solution to that issue?

> I think we should also consider scrapping the local value map or maybe flushing it after selecting each instruction. I don't think we're getting a whole lot of value from sharing local values between multiple non-call instructions.

Toward the end of my work to come up with this patch, I was also thinking the value map is quite likely more trouble than it's worth. It increases register pressure, and once you start spilling the values, that ends up costing more than rematerializing would. Flushing the map on every instruction is an alternative that I thought of fairly late in this process (it has taken a couple of weeks to work out what was going on). I will probably try that experiment next week (I'm off tomorrow).

llvm/lib/CodeGen/SelectionDAG/FastISel.cpp, line 256:
I was not aware of that, thanks for pointing it out. The need to number the entire block comes up when the value instruction is not used until after the next call. That typically happens in a case like `foo(&a, bar(&b));` where we compute `&a` before we start doing the call to `bar`. The PR37010 example has many thousands of those in a row, and we flush every time we see a call, so my patch will certainly run into this problem. On the other hand... if we could manage to preserve the original source info for the `&a` then we wouldn't need to do this sinking hack at all. Sinking the instruction doesn't recover the original source location, it gives us something sorta usable but not really correct. So I'd rather not sink these things at all, actually.
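A rough cost model of why rGa28e767f06d bounded the numbering at the last flush point. The assumptions are hypothetical, not measured: each call adds a fixed number of instructions to the block and triggers one flush, and each flush numbers either the whole block emitted so far or only the instructions emitted since the previous flush. `NumberingCost` and `numberingCost` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cost model, not a measurement of FastISel.
struct NumberingCost {
  std::uint64_t WholeBlock;     // renumber everything emitted so far
  std::uint64_t SinceLastFlush; // number only what was emitted since the last flush
};

NumberingCost numberingCost(std::uint64_t NumCalls,
                            std::uint64_t InstsPerCall) {
  NumberingCost Cost{0, 0};
  std::uint64_t Emitted = 0;
  for (std::uint64_t Call = 0; Call < NumCalls; ++Call) {
    Emitted += InstsPerCall;             // the block grows with every call
    Cost.WholeBlock += Emitted;          // total grows like NumCalls^2 / 2
    Cost.SinceLastFlush += InstsPerCall; // total grows like NumCalls
  }
  return Cost;
}
```

With thousands of calls in a row, as in the PR37010 example, the whole-block variant does work proportional to the square of the call count; for 1000 calls of 3 instructions each, the model gives 1,501,500 numbered instructions versus 3,000.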

I prototyped a patch that would:

- call flushLocalValueMap() at the top of selectInstruction()
- turn off the instruction-sinking stuff
- not erase the DbgLoc applied to local value instructions

I also added statistics for the number of local values added to the map, and the number of lookups that got a hit (assumed to mean a reuse).
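The two statistics can be modeled with a toy simulation that counts map misses ("recorded") and hits ("reused") for the same instruction stream, once with a per-block map and once flushing before every instruction, as the prototype does. Everything here (`ReuseStats`, `simulate`, the stream of constants) is a hypothetical model, not the actual statistics code.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical model: each IR instruction needs a list of constants.
struct ReuseStats {
  unsigned Recorded = 0; // map misses: a new local value was materialized
  unsigned Reused = 0;   // map hits: an existing local value was reused
};

ReuseStats simulate(const std::vector<std::vector<int>> &Insts,
                    bool FlushPerInstruction) {
  ReuseStats S;
  std::set<int> Map; // constants currently available in registers
  for (const std::vector<int> &Needed : Insts) {
    if (FlushPerInstruction)
      Map.clear(); // models flushing the map before selecting each instruction
    for (int C : Needed) {
      if (Map.insert(C).second)
        ++S.Recorded;
      else
        ++S.Reused;
    }
  }
  return S;
}
```

With per-instruction flushing, only reuses within a single instruction survive, which is what the "reused" row in the table below measures.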

| stat | no patch | with patch | delta |
| --- | --- | --- | --- |
| local values recorded | 4,686,197 | 4,711,277 | +25,080 (+0.5%) |
| local values reused | 1,524,558 | 1,283,825 | -240,733 (-16%) |
| reuse rate | 32.5% | 27.3% | -5.2 points |
| .text size | 127,537,392 | 127,507,888 | -29,505 (-0.02%) |
| line-0 bytes | 9,071,744 | 8,935,299 | -136,445 (-1.5%) |

This tells me that comparatively few values (~5%) were reused across instruction boundaries. Despite those reuses going away, overall .text size still went down, which we could assume is due to fewer spills/reloads; I don't have stats for that, and we'd have to see what regalloc is willing to report.
There's also a pleasant decrease in the number of bytes with line-0 attributions. I noticed in small examples that reloads at the top of a block still get line 0, which seems not unfair.
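The derived columns of the statistics table can be rechecked from the raw counts. A small sketch; the helper names are illustrative:

```cpp
#include <cassert>
#include <cmath>

// Reuse rate: lookups that hit the map divided by local values recorded,
// as a percentage.
double reuseRatePercent(double Reused, double Recorded) {
  return 100.0 * Reused / Recorded;
}

// Relative change from Before to After, as a percentage of Before.
double percentChange(double Before, double After) {
  return 100.0 * (After - Before) / Before;
}

// Round to one decimal place, matching the precision used in the table.
double round1(double X) { return std::round(X * 10.0) / 10.0; }
```

The reuse-rate delta in the table is the difference of the two rounded rates (32.5 - 27.3), so it is in percentage points rather than a relative change.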

I don't have build-time stats yet, but as the patch removes the old sinking code, I'd anticipate that build times would go down.

Once I have that data, I'll clean up the prototype patch and post that.

When building Clang with Clang, the prototype patch improved build time by a barely perceptible 0.5%.

I'll work on making it presentable, probably post it tomorrow.

Thanks for gathering the measurements! They seem promising.

I apologize for passing the work of measuring on to you. I think the last time I looked at this, I felt it would be easier (faster?) to code up the "local value sinking" feature than to run measurements and gather consensus to disable local value reuse. But, that was probably not the best tradeoff in the long run. Less code is better than more code.

See D91734 for an alternate approach, as suggested by @rnk.

Abandoned in favor of D91734.

Revision Contents

| Path | Size |
| --- | --- |
| llvm/include/llvm/CodeGen/FastISel.h | 3 lines |
| llvm/lib/CodeGen/SelectionDAG/FastISel.cpp | 84 lines |
| llvm/test/CodeGen/X86/sink-instr-value.ll | 224 lines |

Diff 302903

llvm/include/llvm/CodeGen/FastISel.h

@@ ... @@ private:
   /// Removes dead local value instructions after SavedLastLocalvalue.
   void removeDeadLocalValueCode(MachineInstr *SavedLastLocalValue);
 
   struct InstOrderMap {
     DenseMap<MachineInstr *, unsigned> Orders;
     MachineInstr *FirstTerminator = nullptr;
     unsigned FirstTerminatorOrder = std::numeric_limits<unsigned>::max();
 
-    void initialize(MachineBasicBlock *MBB,
-                    MachineBasicBlock::iterator LastFlushPoint);
+    void initialize(MachineBasicBlock *MBB);
   };
 
   /// Sinks the local value materialization instruction LocalMI to its first use
   /// in the basic block, or deletes it if it is not used.
   void sinkLocalValueMaterialization(MachineInstr &LocalMI, Register DefReg,
                                      InstOrderMap &OrderMap);
 
   /// Insertion point before trying to select the current instruction.

llvm/lib/CodeGen/SelectionDAG/FastISel.cpp

@@ ... @@ static bool isTerminatingEHLabel(MachineBasicBlock *MBB, MachineInstr &MI) {
   // If this is a landingpad, the first non-phi instruction will be an EH_LABEL.
   // Don't consider that label to be a terminator.
   return MI.getIterator() != MBB->getFirstNonPHI();
 }
 
 /// Build a map of instruction orders. Return the first terminator and its
 /// order. Consider EH_LABEL instructions to be terminators as well, since local
 /// values for phis after invokes must be materialized before the call.
-void FastISel::InstOrderMap::initialize(
-    MachineBasicBlock *MBB, MachineBasicBlock::iterator LastFlushPoint) {
+void FastISel::InstOrderMap::initialize(MachineBasicBlock *MBB) {
   unsigned Order = 0;
   for (MachineInstr &I : *MBB) {
     if (!FirstTerminator &&
         (I.isTerminator() || isTerminatingEHLabel(MBB, I))) {
       FirstTerminator = &I;
       FirstTerminatorOrder = Order;
     }
     Orders[&I] = Order++;
-
-    // We don't need to order instructions past the last flush point.
-    if (I.getIterator() == LastFlushPoint)
-      break;
   }
 }
 
 void FastISel::sinkLocalValueMaterialization(MachineInstr &LocalMI,
                                              Register DefReg,
                                              InstOrderMap &OrderMap) {
-  // If this register is used by a register fixup, MRI will not contain all
-  // the uses until after register fixups, so don't attempt to sink or DCE
-  // this instruction. Register fixups typically come from no-op cast
-  // instructions, which replace the cast instruction vreg with the local
+  // If DefReg is used by a register fixup, MRI will not directly describe
+  // all the uses until after register fixups. Chase down what those fixups
+  // would be and look for all of them when determining whether we can sink
+  // this instruction. Register fixups typically come from no-op cast or
+  // GEP instructions, which replace the instruction vreg with the local
   // value vreg.
-  if (FuncInfo.RegsWithFixups.count(DefReg))
-    return;
+  SmallVector<Register, 4> Regs;
+  Regs.push_back(DefReg);
 
-  // We can DCE this instruction if there are no uses and it wasn't a
-  // materialized for a successor PHI node.
+  // It's sad we have to iterate the entire map for each fixup.
+  // But it's probably no cheaper to maintain an inverse map, especially
+  // as we can have DefReg show up more than once as a fixup target.
+  // Note: Checking size() each iteration as we're adding entries.
+  for (size_t RegsIndex = 0; RegsIndex < Regs.size(); ++RegsIndex) {
+    if (FuncInfo.RegsWithFixups.count(Regs[RegsIndex]))
+      for (auto &Fixup : FuncInfo.RegFixups)
+        if (Fixup.getSecond() == Regs[RegsIndex])
+          Regs.push_back(Fixup.getFirst());
+  }
+
+  // We can DCE this instruction if there are no uses, within or outside of
+  // this block. If it has fixups, it has uses; without fixups, it might be
+  // used locally, or by a PHI in a successor.
   bool UsedByPHI = isRegUsedByPhiNodes(DefReg, FuncInfo);
-  if (!UsedByPHI && MRI.use_nodbg_empty(DefReg)) {
+  if (Regs.size() == 1 && !UsedByPHI && MRI.use_nodbg_empty(DefReg)) {
     if (EmitStartPt == &LocalMI)
       EmitStartPt = EmitStartPt->getPrevNode();
     LLVM_DEBUG(dbgs() << "removing dead local value materialization "
                       << LocalMI);
     OrderMap.Orders.erase(&LocalMI);
     LocalMI.eraseFromParent();
     return;
   }
 
   // Number the instructions if we haven't yet so we can efficiently find the
   // earliest use.
   if (OrderMap.Orders.empty())
-    OrderMap.initialize(FuncInfo.MBB, LastFlushPoint);
+    OrderMap.initialize(FuncInfo.MBB);
 
   // Find the first user in the BB.
   MachineInstr *FirstUser = nullptr;
   unsigned FirstOrder = std::numeric_limits<unsigned>::max();
-  for (MachineInstr &UseInst : MRI.use_nodbg_instructions(DefReg)) {
-    auto I = OrderMap.Orders.find(&UseInst);
-    assert(I != OrderMap.Orders.end() &&
-           "local value used by instruction outside local region");
-    unsigned UseOrder = I->second;
-    if (UseOrder < FirstOrder) {
-      FirstOrder = UseOrder;
-      FirstUser = &UseInst;
+  for (Register R : Regs) {
+    for (MachineInstr &UseInst : MRI.use_nodbg_instructions(R)) {
+      auto I = OrderMap.Orders.find(&UseInst);
+      // Ignore uses outside this BB; they will be other uses of the
+      // value produced by the IR instruction, and this BB necessarily
+      // dominates all those uses.
+      if (I == OrderMap.Orders.end()) {
+        assert(Regs.size() > 1 &&
+               "local value used by instruction outside local region");
+        continue;
+      }
+      unsigned UseOrder = I->second;
+      if (UseOrder < FirstOrder) {
+        FirstOrder = UseOrder;
+        FirstUser = &UseInst;
+      }
     }
   }
 
   // The insertion point will be the first terminator or the first user,
   // whichever came first. If there was no terminator, this must be a
   // fallthrough block and the insertion point is the end of the block.
   MachineBasicBlock::instr_iterator SinkPos;
-  if (UsedByPHI && OrderMap.FirstTerminatorOrder < FirstOrder) {
+  if ((UsedByPHI || Regs.size() > 1) &&
+      OrderMap.FirstTerminatorOrder < FirstOrder) {
     FirstOrder = OrderMap.FirstTerminatorOrder;
     SinkPos = OrderMap.FirstTerminator->getIterator();
   } else if (FirstUser) {
     SinkPos = FirstUser->getIterator();
   } else {
-    assert(UsedByPHI && "must be users if not used by a phi");
+    assert((UsedByPHI || Regs.size() > 1) &&
+           "must be users if not used outside the block");
     SinkPos = FuncInfo.MBB->instr_end();
   }
 
   // Collect all DBG_VALUEs before the new insertion position so that we can
   // sink them.
   SmallVector<MachineInstr *, 1> DbgValues;
-  for (MachineInstr &DbgVal : MRI.use_instructions(DefReg)) {
-    if (!DbgVal.isDebugValue())
-      continue;
-    unsigned UseOrder = OrderMap.Orders[&DbgVal];
-    if (UseOrder < FirstOrder)
-      DbgValues.push_back(&DbgVal);
+  for (Register R : Regs) {
+    for (MachineInstr &DbgVal : MRI.use_instructions(R)) {
+      if (!DbgVal.isDebugValue())
+        continue;
+      unsigned UseOrder = OrderMap.Orders[&DbgVal];
+      if (UseOrder < FirstOrder)
+        DbgValues.push_back(&DbgVal);
+    }
   }
 
   // Sink LocalMI before SinkPos and assign it the same DebugLoc.
   LLVM_DEBUG(dbgs() << "sinking local value to first use " << LocalMI);
   FuncInfo.MBB->remove(&LocalMI);
   FuncInfo.MBB->insert(SinkPos, &LocalMI);
   if (SinkPos != FuncInfo.MBB->end())
     LocalMI.setDebugLoc(SinkPos->getDebugLoc());
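The sink-position rule in the FastISel.cpp hunk can be modeled in isolation, with instructions identified by their order numbers as `InstOrderMap` provides. This is a sketch with hypothetical names (`BlockSummary`, `chooseSinkPos`); `UsedOutsideBlock` plays the role of the diff's `UsedByPHI || Regs.size() > 1`, and returning the block size stands for "insert at the end of the block".

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical summary of one block; instructions are named by order number.
struct BlockSummary {
  unsigned NumInsts;
  unsigned FirstTerminatorOrder; // UINT_MAX if the block has no terminator
};

// Returns the order number to insert the local value before
// (NumInsts means "end of block").
unsigned chooseSinkPos(const BlockSummary &B,
                       const std::vector<unsigned> &UseOrders,
                       bool UsedOutsideBlock) {
  unsigned FirstUse = UINT_MAX;
  for (unsigned O : UseOrders)
    if (O < FirstUse)
      FirstUse = O;
  // A value needed outside the block must be materialized before control
  // leaves it: no later than the first terminator (or EH_LABEL).
  if (UsedOutsideBlock && B.FirstTerminatorOrder < FirstUse)
    return B.FirstTerminatorOrder;
  if (FirstUse != UINT_MAX)
    return FirstUse;
  // No in-block users and no terminator: a fallthrough block, so sink to
  // the end. This is only legal if the value is used elsewhere.
  assert(UsedOutsideBlock && "must be users if not used outside the block");
  return B.NumInsts;
}
```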

llvm/test/CodeGen/X86/sink-instr-value.ll

This file was added.

				; RUN: %llc_dwarf -O0 < %s | FileCheck %s

				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				;; Show that sinking an instruction value to its first use can improve
				;; line tables. This kind of instruction is handled as a "local value"
				;; and so we lose the original source location; moving it to the first
				;; use can give it back a reasonable location.

				;; Test functions derived from the following, using clang -g1.
				;;
				;; int other(char *arr, int i);
				;; int another(char *arr);
				;;
				;; int test1(int a, int b) {
				;; char nums[10];
				;; int c = a + b;
				;; return other(nums, c);
				;; }
				;;
				;; int test2(int a, int b) {
				;; char nums[10];
				;; int c = 10;
				;; if (a)
				;; c = a + b;
				;; return other(nums, c);
				;; }
				;;
				;; int test3() {
				;; char a[10];
				;; char b[10];
				;; return other(a, another(b));
				;; }
				;;
				;; struct C { char d; char e; };
				;; C test4(char b, char p) {
				;; return { another(b) ? p : b,
				;; p};
				;; }

				;; Show leaq being moved out of the prologue.
				;
				; CHECK-LABEL: _Z5test1ii:
				; CHECK-NOT: leaq
				; CHECK: .loc 1 7 10
				; CHECK-NEXT: leaq
				; CHECK-NEXT: callq
				;
				define dso_local i32 @_Z5test1ii(i32 %a, i32 %b) !dbg !7 {
				entry:
				%a.addr = alloca i32, align 4
				%b.addr = alloca i32, align 4
				%nums = alloca [10 x i8], align 1
				%c = alloca i32, align 4
				store i32 %a, i32* %a.addr, align 4
				store i32 %b, i32* %b.addr, align 4
				%0 = load i32, i32* %a.addr, align 4, !dbg !9
				%1 = load i32, i32* %b.addr, align 4, !dbg !10
				%add = add nsw i32 %0, %1, !dbg !11
				store i32 %add, i32* %c, align 4, !dbg !12
				%arraydecay = getelementptr inbounds [10 x i8], [10 x i8]* %nums, i64 0, i64 0, !dbg !13
				%2 = load i32, i32* %c, align 4, !dbg !14
				%call = call i32 @_Z5otherPci(i8* %arraydecay, i32 %2), !dbg !15
				ret i32 %call, !dbg !16
				}


				;; Show an leaq being moved across a call (past the previous flush point).
				;; This means we need to consider uses in the entire block.
				;
				; CHECK-LABEL: _Z5test2ii:
				; CHECK: .loc 1 15 22
				; CHECK-NEXT: movl
				; CHECK-NEXT: .loc 1 15 10
				; CHECK-NEXT: leaq
				; CHECK-NEXT: callq
				;
				define dso_local i32 @_Z5test2ii(i32 %a, i32 %b) #0 !dbg !17 {
				entry:
				%a.addr = alloca i32, align 4
				%b.addr = alloca i32, align 4
				%nums = alloca [10 x i8], align 1
				%c = alloca i32, align 4
				store i32 %a, i32* %a.addr, align 4
				store i32 %b, i32* %b.addr, align 4
				store i32 10, i32* %c, align 4, !dbg !18
				%0 = load i32, i32* %a.addr, align 4, !dbg !19
				%tobool = icmp ne i32 %0, 0, !dbg !19
				br i1 %tobool, label %if.then, label %if.end, !dbg !19

				if.then: ; preds = %entry
				%1 = load i32, i32* %a.addr, align 4, !dbg !20
				%2 = load i32, i32* %b.addr, align 4, !dbg !21
				%add = add nsw i32 %1, %2, !dbg !22
				store i32 %add, i32* %c, align 4, !dbg !23
				br label %if.end, !dbg !24

				if.end: ; preds = %if.then, %entry
				%arraydecay = getelementptr inbounds [10 x i8], [10 x i8]* %nums, i64 0, i64 0, !dbg !25
				%3 = load i32, i32* %c, align 4, !dbg !26
				%call = call i32 @_Z5otherPci(i8* %arraydecay, i32 %3), !dbg !27
				ret i32 %call, !dbg !28
				}

				;; Show an leaq being moved across a call (past the previous flush point).
				;; This means we need to consider uses in the entire block.
				;
				; CHECK-LABEL: _Z5test3v:
				; CHECK-NOT: leaq
				; CHECK: .loc 1 21 19
				; CHECK-NEXT: leaq
				; CHECK-NEXT: callq _Z7anotherPc
				; CHECK: .loc 1 21 10
				; CHECK-NEXT: leaq
				; CHECK-NEXT: callq _Z5otherPci
				;
				define dso_local i32 @_Z5test3v() #0 !dbg !29 {
				entry:
				%a = alloca [10 x i8], align 1
				%b = alloca [10 x i8], align 1
				%arraydecay = getelementptr inbounds [10 x i8], [10 x i8]* %a, i64 0, i64 0, !dbg !30
				%arraydecay1 = getelementptr inbounds [10 x i8], [10 x i8]* %b, i64 0, i64 0, !dbg !31
				%call = call i32 @_Z7anotherPc(i8* %arraydecay1), !dbg !32
				%call2 = call i32 @_Z5otherPci(i8* %arraydecay, i32 %call), !dbg !33
				ret i32 %call2, !dbg !34
				}


				;; Show a case where the original instruction has no uses at all in its
				;; block, so we move it to the end. This can save spill/restore cost.
				;
				; CHECK-LABEL: _Z5test4PcS_:
				; CHECK-NOT: leaq
				; CHECK: .loc 1 26 13
				; CHECK-NEXT: callq _Z7anotherPc
				; CHECK-NEXT: cmpl
				; CHECK-NEXT: leaq
				;
				%struct.C = type { i8, i8 }

				define dso_local { i8, i8 } @_Z5test4PcS_(i8* %b, i8* %p) #0 !dbg !35 {
				entry:
				%retval = alloca %struct.C, align 8
				%b.addr = alloca i8*, align 8
				%p.addr = alloca i8*, align 8
				store i8* %b, i8** %b.addr, align 8
				store i8* %p, i8** %p.addr, align 8
				%d = getelementptr inbounds %struct.C, %struct.C* %retval, i32 0, i32 0, !dbg !36
				%0 = load i8, i8* %b.addr, align 8, !dbg !37
				%call = call i32 @_Z7anotherPc(i8* %0), !dbg !38
				%tobool = icmp ne i32 %call, 0, !dbg !38
				br i1 %tobool, label %cond.true, label %cond.false, !dbg !38

				cond.true: ; preds = %entry
				%1 = load i8, i8* %p.addr, align 8, !dbg !39
				br label %cond.end, !dbg !38

				cond.false: ; preds = %entry
				%2 = load i8, i8* %b.addr, align 8, !dbg !40
				br label %cond.end, !dbg !38

				cond.end: ; preds = %cond.false, %cond.true
				%cond = phi i8* [ %1, %cond.true ], [ %2, %cond.false ], !dbg !38
				store i8* %cond, i8** %d, align 8, !dbg !36
				%e = getelementptr inbounds %struct.C, %struct.C* %retval, i32 0, i32 1, !dbg !36
				%3 = load i8, i8* %p.addr, align 8, !dbg !41
				store i8* %3, i8** %e, align 8, !dbg !36
				%4 = bitcast %struct.C* %retval to { i8, i8 }*, !dbg !42
				%5 = load { i8, i8 }, { i8, i8 }* %4, align 8, !dbg !42
				ret { i8, i8 } %5, !dbg !42
				}

				declare dso_local i32 @_Z5otherPci(i8*, i32) #1
				declare dso_local i32 @_Z7anotherPc(i8*) #1


				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!3, !4, !5}
				!llvm.ident = !{!6}

				!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !1, producer: "clang version 12.0.0 (ALL clang version 99.99.0.57224 b9d61c39 debug)", isOptimized: false, runtimeVersion: 0, emissionKind: LineTablesOnly, enums: !2, splitDebugInlining: false, nameTableKind: None)
				!1 = !DIFile(filename: "test4.cpp", directory: "/home/probinson/projects/scratch")
				!2 = !{}
				!3 = !{i32 7, !"Dwarf Version", i32 4}
				!4 = !{i32 2, !"Debug Info Version", i32 3}
				!5 = !{i32 1, !"wchar_size", i32 4}
				!6 = !{!"clang version 12.0.0 (ALL clang version 99.99.0.57224 b9d61c39 debug)"}
				!7 = distinct !DISubprogram(name: "test1", scope: !1, file: !1, line: 4, type: !8, scopeLine: 4, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !2)
				!8 = !DISubroutineType(types: !2)
				!9 = !DILocation(line: 6, column: 11, scope: !7)
				!10 = !DILocation(line: 6, column: 15, scope: !7)
				!11 = !DILocation(line: 6, column: 13, scope: !7)
				!12 = !DILocation(line: 6, column: 7, scope: !7)
				!13 = !DILocation(line: 7, column: 16, scope: !7)
				!14 = !DILocation(line: 7, column: 22, scope: !7)
				!15 = !DILocation(line: 7, column: 10, scope: !7)
				!16 = !DILocation(line: 7, column: 3, scope: !7)
				!17 = distinct !DISubprogram(name: "test2", scope: !1, file: !1, line: 10, type: !8, scopeLine: 10, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !2)
				!18 = !DILocation(line: 12, column: 7, scope: !17)
				!19 = !DILocation(line: 13, column: 7, scope: !17)
				!20 = !DILocation(line: 14, column: 9, scope: !17)
				!21 = !DILocation(line: 14, column: 13, scope: !17)
				!22 = !DILocation(line: 14, column: 11, scope: !17)
				!23 = !DILocation(line: 14, column: 7, scope: !17)
				!24 = !DILocation(line: 14, column: 5, scope: !17)
				!25 = !DILocation(line: 15, column: 16, scope: !17)
				!26 = !DILocation(line: 15, column: 22, scope: !17)
				!27 = !DILocation(line: 15, column: 10, scope: !17)
				!28 = !DILocation(line: 15, column: 3, scope: !17)
				!29 = distinct !DISubprogram(name: "test3", scope: !1, file: !1, line: 18, type: !8, scopeLine: 18, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !2)
				!30 = !DILocation(line: 21, column: 16, scope: !29)
				!31 = !DILocation(line: 21, column: 27, scope: !29)
				!32 = !DILocation(line: 21, column: 19, scope: !29)
				!33 = !DILocation(line: 21, column: 10, scope: !29)
				!34 = !DILocation(line: 21, column: 3, scope: !29)
				!35 = distinct !DISubprogram(name: "test4", scope: !1, file: !1, line: 25, type: !8, scopeLine: 25, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !2)
				!36 = !DILocation(line: 26, column: 11, scope: !35)
				!37 = !DILocation(line: 26, column: 21, scope: !35)
				!38 = !DILocation(line: 26, column: 13, scope: !35)
				!39 = !DILocation(line: 26, column: 26, scope: !35)
				!40 = !DILocation(line: 26, column: 30, scope: !35)
				!41 = !DILocation(line: 27, column: 13, scope: !35)
				!42 = !DILocation(line: 26, column: 3, scope: !35)
