This is an archive of the discontinued LLVM Phabricator instance.

Differential D45673

[x86] Fix PR37100 by teaching the EFLAGS copy lowering to rewrite uses across basic blocks in the limited cases where it is very straight forward to do so.
ClosedPublic

Authored by chandlerc on Apr 15 2018, 9:29 AM.

Details

Reviewers

bkramer
craig.topper
rnk

Commits

rG1f87618f8f45: [x86] Fix PR37100 by teaching the EFLAGS copy lowering to rewrite uses across…
rL330264: [x86] Fix PR37100 by teaching the EFLAGS copy lowering to rewrite uses

Summary

This will also be useful for other places where we do some limited
EFLAGS propagation across CFG edges and need to handle copy rewrites
afterward. I think this is rapidly approaching the maximum we can and
should be doing here. Everything else begins to require either heroic
analysis to prove how to do PHI insertion manually, or somehow managing
arbitrary PHI-ing of EFLAGS with general PHI insertion. Neither of these
seems at all promising, so if those cases come up, we'll almost certainly
need to rewrite the parts of LLVM that produce those patterns.

We do now require dominator trees in order to reliably diagnose patterns
that would require PHI nodes. This is a bit unfortunate but it seems
better than the completely mysterious crash we would get otherwise.
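
To make the cross-block scan and the dominance requirement concrete, here is a minimal, self-contained C++ sketch. It is not the pass's code: `ToyBlock`, `collectUseBlocks`, `dominates`, and `canRewriteWithoutPHIs` are hypothetical names invented for illustration, the CFG is a plain vector rather than a `MachineFunction`, and dominance is computed with a naive reachability walk instead of `MachineDominatorTree`. It assumes the idea is to follow the flags into successors where they are live-in, and to give up if any block reached is not dominated by the block that produced the flags.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for a machine basic block; not an LLVM type.
struct ToyBlock {
  std::vector<std::size_t> Succs; // indices of successor blocks
  bool FlagsLiveIn = false;       // assumed analogue of "EFLAGS is live-in"
  bool KillsFlags = false;        // block redefines or kills the flags
};

// Walk from the block holding the copy into successors where the flags are
// live-in, visiting each block at most once. Returns the visited blocks in
// the order they were popped from the worklist.
std::vector<std::size_t> collectUseBlocks(const std::vector<ToyBlock> &CFG,
                                          std::size_t CopyBlock) {
  std::vector<std::size_t> Worklist{CopyBlock};
  std::set<std::size_t> Visited{CopyBlock};
  std::vector<std::size_t> Scanned;
  while (!Worklist.empty()) {
    std::size_t B = Worklist.back();
    Worklist.pop_back();
    Scanned.push_back(B);
    if (CFG[B].KillsFlags)
      continue; // simplification: the flags do not flow past this block
    for (std::size_t S : CFG[B].Succs)
      if (CFG[S].FlagsLiveIn && Visited.insert(S).second)
        Worklist.push_back(S);
  }
  return Scanned;
}

// Naive dominance test: A dominates B if B cannot be reached from Entry
// once A is removed from the graph. Unreachable blocks count as dominated.
bool dominates(const std::vector<ToyBlock> &CFG, std::size_t Entry,
               std::size_t A, std::size_t B) {
  if (A == B || A == Entry)
    return true;
  std::vector<std::size_t> Stack{Entry};
  std::set<std::size_t> Seen{Entry};
  while (!Stack.empty()) {
    std::size_t Cur = Stack.back();
    Stack.pop_back();
    if (Cur == B)
      return false; // reached B on a path that avoids A
    for (std::size_t S : CFG[Cur].Succs)
      if (S != A && Seen.insert(S).second)
        Stack.push_back(S);
  }
  return true;
}

// True if every block that would be scanned is dominated by the test block,
// so a value computed there could be reused without inserting PHI nodes.
bool canRewriteWithoutPHIs(const std::vector<ToyBlock> &CFG, std::size_t Entry,
                           std::size_t TestBlock, std::size_t CopyBlock) {
  for (std::size_t B : collectUseBlocks(CFG, CopyBlock))
    if (!dominates(CFG, Entry, TestBlock, B))
      return false;
  return true;
}
```

In a straight-line chain the check passes, while in a diamond whose join block sees the flags live-in from only one arm it fails, which is the kind of pattern that would need PHI insertion.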

Diff Detail

Repository

rL LLVM

Build Status

Buildable 17095
Build 17095: arc lint + arc unit

Event Timeline

chandlerc created this revision. Apr 15 2018, 9:29 AM

Herald added subscribers: hiraditya, mcrosier, sanjoy. Apr 15 2018, 9:29 AM

emaste added a subscriber: emaste. Apr 15 2018, 8:03 PM

rnk added a subscriber: rnk. Apr 16 2018, 2:26 PM

rnk added inline comments.

llvm/test/CodeGen/X86/copy-eflags.ll
293	This infinite loop with selects of undef feels over-reduced. It seems fragile, since a pass could come along and optimize the select to the non-undef operand. I liked the test case from PR37133, but its reliance on debug info (any, not just standalone) scares me: int a, b; unsigned long long c; void e() { long long d; b = 33; d = a; if (c < a) { b = 32; d = c; } if (d) b = 2; } We should reopen it or file a separate bug, because we generate two more BBs on that code when debug info is enabled. :( In any case, is there a better way to write this test case using legalized arithmetic? An i128 x64 test case with live-out eflags might also be interesting.
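
For readability, the reproducer quoted inline in the comment above can be laid out as a standalone snippet. Only the whitespace and the comments are new; the comments are editorial notes and not part of PR37133. The original is C, and it is shown here as C++ on the assumption that it is compiled unchanged.

```cpp
// Reproducer quoted from the review comment (originally C; also valid C++).
int a, b;
unsigned long long c;

void e() {
  long long d;
  b = 33;
  d = a;
  if (c < a) { // 'a' is converted to unsigned long long for this comparison
    b = 32;
    d = c;
  }
  if (d)
    b = 2;
}
```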

chandlerc added inline comments. Apr 16 2018, 3:37 PM

llvm/test/CodeGen/X86/copy-eflags.ll
293	I actually tried some to remove the undef stuff. I couldn't find a pattern that triggered the bug. =[ I guess I can keep trying...

Arrange this test case to be slightly less brittle.

Harbormaster completed remote builds in B17122: Diff 142721. Apr 16 2018, 5:03 PM

chandlerc added inline comments. Apr 16 2018, 5:06 PM

llvm/test/CodeGen/X86/copy-eflags.ll
293	Got the undefs out. How much more should I do here?

lgtm

llvm/test/CodeGen/X86/copy-eflags.ll
293	I guess the live-out eflags are getting created by a combination of SBB and select-to-jump x86 lowering logic. I guess this is as reduced as we can get.

This revision is now accepted and ready to land. Apr 17 2018, 9:44 AM

Closed by commit rL330264: [x86] Fix PR37100 by teaching the EFLAGS copy lowering to rewrite uses (authored by chandlerc). Apr 18 2018, 8:16 AM

This revision was automatically updated to reflect the committed changes.

mzolotukhin added a subscriber: mzolotukhin. Apr 18 2018, 11:11 PM

mzolotukhin added inline comments.

llvm/trunk/lib/Target/X86/X86FlagsCopyLowering.cpp
445–457 ↗	(On Diff #142943)	Could all of this be under `DEBUG`? The fatal error we're reporting here looks like an internal compiler assert anyway. If we put it entirely under `DEBUG`, we can avoid building dominators in Release compilers.

chandlerc added inline comments. Apr 19 2018, 12:23 AM

llvm/trunk/lib/Target/X86/X86FlagsCopyLowering.cpp
445–457 ↗	(On Diff #142943)	The reason we've kept a number of these as fatal errors is due to our concern about latent problems in LLVM. We're essentially introducing a pretty novel set of constraints on what kinds of EFLAGS copies can be lowered, and trying to get somewhat more reliable failure modes if this bites us... And at least one of these already helped uncover a bug (that this patch is fixing)... Is the domtree analysis cost showing up significantly in compile times for you? Are there other concerns?

mzolotukhin added inline comments. Apr 19 2018, 12:43 AM

llvm/trunk/lib/Target/X86/X86FlagsCopyLowering.cpp
445–457 ↗	(On Diff #142943)	I see the value of having it on, at least for some time, but I think in the long run we should turn it into an assert. Would that be possible? I just noticed a new pass in the pipeline and wanted to check whether it can be avoided. At O0, dom-tree analysis costs ~0.25% geomean (on CTMark), so each separate invocation doesn't seem to cost much, but they add up eventually. By the way, why didn't we catch the bug you are talking about with RA compiler bots? Maybe that's an argument for adding more RA bots?

chandlerc added inline comments. Apr 19 2018, 1:26 AM

llvm/trunk/lib/Target/X86/X86FlagsCopyLowering.cpp
445–457 ↗	(On Diff #142943)	Sorry, I misunderstood. Long term, we can almost certainly move all of this to an assert-only kind of check. We just are benefiting (for now) from more aggressive checking. But you ask a great question: why don't the release+asserts deployments catch these? The frustrating thing is that some of the patterns here are really rare, or really tricky to reproduce. One of the test cases we got only reproduces when you build the input with debug info enabled! We still don't know why. The other test case only reproduces on 32-bit x86, where we (sadly) have much less testing. It's also somewhat rare for benchmarks to hit this because the old lowering was really slow. If it were happening frequently in benchmarks, it might have been fixed sooner or the benchmarks would have been changed to not do that.... =/

Revision Contents

Path                                            Size
llvm/lib/Target/X86/X86FlagsCopyLowering.cpp    207 lines
llvm/test/CodeGen/X86/O0-pipeline.ll            1 line
llvm/test/CodeGen/X86/O3-pipeline.ll            1 line
llvm/test/CodeGen/X86/copy-eflags.ll            96 lines

Diff 142573

llvm/lib/Target/X86/X86FlagsCopyLowering.cpp

[... 30 lines not shown ...]
 #include "llvm/ADT/ScopeExit.h"
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/SmallSet.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/SparseBitVector.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/CodeGen/MachineBasicBlock.h"
 #include "llvm/CodeGen/MachineConstantPool.h"
+#include "llvm/CodeGen/MachineDominators.h"
 #include "llvm/CodeGen/MachineFunction.h"
 #include "llvm/CodeGen/MachineFunctionPass.h"
 #include "llvm/CodeGen/MachineInstr.h"
 #include "llvm/CodeGen/MachineInstrBuilder.h"
 #include "llvm/CodeGen/MachineModuleInfo.h"
 #include "llvm/CodeGen/MachineOperand.h"
 #include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/MachineSSAUpdater.h"
[... 46 lines not shown ...]
 public:
   /// Pass identification, replacement for typeid.
   static char ID;

 private:
   MachineRegisterInfo *MRI;
   const X86InstrInfo *TII;
   const TargetRegisterInfo *TRI;
   const TargetRegisterClass *PromoteRC;
+  MachineDominatorTree *MDT;

   CondRegArray collectCondsInRegs(MachineBasicBlock &MBB,
                                   MachineInstr &CopyDefI);

   unsigned promoteCondToReg(MachineBasicBlock &MBB,
                             MachineBasicBlock::iterator TestPos,
                             DebugLoc TestLoc, X86::CondCode Cond);
   std::pair<unsigned, bool>
[... 31 lines not shown ...]

 FunctionPass *llvm::createX86FlagsCopyLoweringPass() {
   return new X86FlagsCopyLoweringPass();
 }

 char X86FlagsCopyLoweringPass::ID = 0;

 void X86FlagsCopyLoweringPass::getAnalysisUsage(AnalysisUsage &AU) const {
+  AU.addRequired<MachineDominatorTree>();
   MachineFunctionPass::getAnalysisUsage(AU);
 }

 namespace {
 /// An enumeration of the arithmetic instruction mnemonics which have
 /// interesting flag semantics.
 ///
 /// We can map instruction opcodes into these mnemonics to make it easy to
[... 181 lines not shown ...]
 bool X86FlagsCopyLoweringPass::runOnMachineFunction(MachineFunction &MF) {
   DEBUG(dbgs() << "********** " << getPassName() << " : " << MF.getName()
                << " **********\n");

   auto &Subtarget = MF.getSubtarget<X86Subtarget>();
   MRI = &MF.getRegInfo();
   TII = Subtarget.getInstrInfo();
   TRI = Subtarget.getRegisterInfo();
+  MDT = &getAnalysis<MachineDominatorTree>();
   PromoteRC = &X86::GR8RegClass;

   if (MF.begin() == MF.end())
     // Nothing to do for a degenerate empty function...
     return false;

   SmallVector<MachineInstr *, 4> Copies;
   for (MachineBasicBlock &MBB : MF)
[... 58 lines not shown ...]
   for (MachineInstr *CopyI : Copies) {

     // Gather the condition flags that have already been preserved in
     // registers. We do this from scratch each time as we expect there to be
     // very few of them and we expect to not revisit the same copy definition
     // many times. If either of those change sufficiently we could build a map
     // of these up front instead.
     CondRegArray CondRegs = collectCondsInRegs(TestMBB, CopyDefI);

-    for (auto MII = std::next(CopyI->getIterator()), MIE = MBB.instr_end();
+    // Collect the basic blocks we need to scan. Typically this will just be
+    // a single basic block but we may have to scan multiple blocks if the
+    // EFLAGS copy lives into successors.
+    SmallVector<MachineBasicBlock *, 2> Blocks;
+    SmallPtrSet<MachineBasicBlock *, 2> VisitedBlocks;
+    Blocks.push_back(&MBB);
+    VisitedBlocks.insert(&MBB);
+
+    do {
+      MachineBasicBlock &UseMBB = *Blocks.pop_back_val();
+
+      // We currently don't do any PHI insertion and so we require that the
+      // test basic block dominates all of the use basic blocks.
+      //
+      // We could in theory do PHI insertion here if it becomes useful by just
+      // taking undef values in along every edge that we don't trace this
+      // EFLAGS copy along. This isn't as bad as fully general PHI insertion,
+      // but still seems like a great deal of complexity.
+      //
+      // Because it is theoretically possible that some earlier MI pass or
+      // other lowering transformation could induce this to happen, we do
+      // a hard check even in non-debug builds here.
+      if (&TestMBB != &UseMBB && !MDT->dominates(&TestMBB, &UseMBB)) {
+        DEBUG({
+          dbgs() << "ERROR: Encountered use that is not dominated by our test "
+                    "basic block! Rewriting this would require inserting PHI "
+                    "nodes to track the flag state across the CFG.\n\nTest "
+                    "block:\n";
+          TestMBB.dump();
+          dbgs() << "Use block:\n";
+          UseMBB.dump();
+        });
+        report_fatal_error("Cannot lower EFLAGS copy when original copy def "
+                           "does not dominate all uses.");
+      }
+
+      for (auto MII = &UseMBB == &MBB ? std::next(CopyI->getIterator())
+                                      : UseMBB.instr_begin(),
+                MIE = UseMBB.instr_end();
            MII != MIE;) {
         MachineInstr &MI = *MII++;
         MachineOperand *FlagUse = MI.findRegisterUseOperand(X86::EFLAGS);
         if (!FlagUse) {
           if (MI.findRegisterDefOperand(X86::EFLAGS)) {
             // If EFLAGS are defined, it's as-if they were killed. We can stop
             // scanning here.
             //
             // NB!!! Many instructions only modify some flags. LLVM currently
-          // models this as clobbering all flags, but if that ever changes this
-          // will need to be carefully updated to handle that more complex
-          // logic.
+            // models this as clobbering all flags, but if that ever changes
+            // this will need to be carefully updated to handle that more
+            // complex logic.
             FlagsKilled = true;
             break;
           }
           continue;
         }

         DEBUG(dbgs() << " Rewriting use: "; MI.dump());

         // Check the kill flag before we rewrite as that may change it.
         if (FlagUse->isKill())
           FlagsKilled = true;

         // Once we encounter a branch, the rest of the instructions must also be
         // branches. We can't rewrite in place here, so we handle them below.
         //
         // Note that we don't have to handle tail calls here, even conditional
         // tail calls, as those are not introduced into the X86 MI until post-RA
         // branch folding or black placement. As a consequence, we get to deal
         // with the simpler formulation of conditional branches followed by tail
         // calls.
         if (X86::getCondFromBranchOpc(MI.getOpcode()) != X86::COND_INVALID) {
           auto JmpIt = MI.getIterator();
           do {
             JmpIs.push_back(&*JmpIt);
             ++JmpIt;
-        } while (JmpIt != MBB.instr_end() &&
+          } while (JmpIt != UseMBB.instr_end() &&
                    X86::getCondFromBranchOpc(JmpIt->getOpcode()) !=
                        X86::COND_INVALID);
           break;
         }

         // Otherwise we can just rewrite in-place.
         if (X86::getCondFromCMovOpc(MI.getOpcode()) != X86::COND_INVALID) {
           rewriteCMov(TestMBB, TestPos, TestLoc, MI, *FlagUse, CondRegs);
-      } else if (X86::getCondFromSETOpc(MI.getOpcode()) != X86::COND_INVALID) {
+        } else if (X86::getCondFromSETOpc(MI.getOpcode()) !=
+                   X86::COND_INVALID) {
           rewriteSetCC(TestMBB, TestPos, TestLoc, MI, *FlagUse, CondRegs);
         } else if (MI.getOpcode() == TargetOpcode::COPY) {
           rewriteCopy(MI, *FlagUse, CopyDefI);
         } else {
-        // We assume that arithmetic instructions that use flags also def them.
+          // We assume that arithmetic instructions that use flags also def
+          // them.
           assert(MI.findRegisterDefOperand(X86::EFLAGS) &&
                  "Expected a def of EFLAGS for this instruction!");

           // NB!!! Several arithmetic instructions only partially update
           // flags. Theoretically, we could generate MI code sequences that
           // would rely on this fact and observe different flags independently.
           // But currently LLVM models all of these instructions as clobbering
           // all the flags in an undef way. We rely on that to simplify the
           // logic.
           FlagsKilled = true;

           rewriteArithmetic(TestMBB, TestPos, TestLoc, MI, *FlagUse, CondRegs);
           break;
         }

         // If this was the last use of the flags, we're done.
         if (FlagsKilled)
           break;
       }

-    // If we didn't find a kill (or equivalent) check that the flags don't
-    // live-out of the basic block. Currently we don't support lowering copies
-    // of flags that live out in this fashion.
-    if (!FlagsKilled &&
-        llvm::any_of(MBB.successors(), [](MachineBasicBlock *SuccMBB) {
-          return SuccMBB->isLiveIn(X86::EFLAGS);
-        })) {
-      DEBUG({
-        dbgs() << "ERROR: Found a copied EFLAGS live-out from basic block:\n"
-               << "----\n";
-        MBB.dump();
-        dbgs() << "----\n"
-               << "ERROR: Cannot lower this EFLAGS copy!\n";
-      });
-      report_fatal_error(
-          "Cannot lower EFLAGS copy that lives out of a basic block!");
-    }
+      // If the flags were killed, we're done with this block.
+      if (FlagsKilled)
+        break;
+
+      // Otherwise we need to scan successors for ones where the flags live-in
+      // and queue those up for processing.
+      for (MachineBasicBlock *SuccMBB : UseMBB.successors())
+        if (SuccMBB->isLiveIn(X86::EFLAGS) &&
+            VisitedBlocks.insert(SuccMBB).second)
+          Blocks.push_back(SuccMBB);
+    } while (!Blocks.empty());

     // Now rewrite the jumps that use the flags. These we handle specially
-    // because if there are multiple jumps we'll have to do surgery on the CFG.
+    // because if there are multiple jumps in a single basic block we'll have
+    // to do surgery on the CFG.
+    MachineBasicBlock *LastJmpMBB = nullptr;
     for (MachineInstr *JmpI : JmpIs) {
-      // Past the first jump we need to split the blocks apart.
-      if (JmpI != JmpIs.front())
+      // Past the first jump within a basic block we need to split the blocks
+      // apart.
+      if (JmpI->getParent() == LastJmpMBB)
         splitBlock(JmpI->getParent(), JmpI, *TII);
+      else
+        LastJmpMBB = JmpI->getParent();

       rewriteCondJmp(TestMBB, TestPos, TestLoc, *JmpI, CondRegs);
     }

     // FIXME: Mark the last use of EFLAGS before the copy's def as a kill if
     // the copy's def operand is itself a kill.
   }

[... 230 lines not shown ...]

llvm/test/CodeGen/X86/O0-pipeline.ll

[... 31 lines not shown ...]
 ; CHECK-NEXT: Exception handling preparation
 ; CHECK-NEXT: Safe Stack instrumentation pass
 ; CHECK-NEXT: Insert stack protectors
 ; CHECK-NEXT: Module Verifier
 ; CHECK-NEXT: X86 DAG->DAG Instruction Selection
 ; CHECK-NEXT: X86 PIC Global Base Reg Initialization
 ; CHECK-NEXT: Expand ISel Pseudo-instructions
 ; CHECK-NEXT: Local Stack Slot Allocation
+; CHECK-NEXT: MachineDominator Tree Construction
 ; CHECK-NEXT: X86 EFLAGS copy lowering
 ; CHECK-NEXT: X86 WinAlloca Expander
 ; CHECK-NEXT: Eliminate PHI nodes for register allocation
 ; CHECK-NEXT: Two-Address instruction pass
 ; CHECK-NEXT: Fast Register Allocator
 ; CHECK-NEXT: Bundle Machine CFG Edges
 ; CHECK-NEXT: X86 FP Stackifier
 ; CHECK-NEXT: Lazy Machine Block Frequency Analysis
[... 23 lines not shown ...]

llvm/test/CodeGen/X86/O3-pipeline.ll

[... 84 lines not shown ...]
 ; CHECK-NEXT: Machine code sinking
 ; CHECK-NEXT: Peephole Optimizations
 ; CHECK-NEXT: Remove dead machine instructions
 ; CHECK-NEXT: Live Range Shrink
 ; CHECK-NEXT: X86 Fixup SetCC
 ; CHECK-NEXT: X86 LEA Optimize
 ; CHECK-NEXT: X86 Optimize Call Frame
 ; CHECK-NEXT: X86 Avoid Store Forwarding Block
+; CHECK-NEXT: MachineDominator Tree Construction
 ; CHECK-NEXT: X86 EFLAGS copy lowering
 ; CHECK-NEXT: X86 WinAlloca Expander
 ; CHECK-NEXT: Detect Dead Lanes
 ; CHECK-NEXT: Process Implicit Definitions
 ; CHECK-NEXT: Remove unreachable machine basic blocks
 ; CHECK-NEXT: Live Variable Analysis
 ; CHECK-NEXT: MachineDominator Tree Construction
 ; CHECK-NEXT: Machine Natural Loop Construction
[... 69 lines not shown ...]

llvm/test/CodeGen/X86/copy-eflags.ll

[... 190 lines not shown ...]
 then:
   tail call void @external_a()
   ret void

 else:
   tail call void @external_b()
   ret void
 }

+; Test a function that gets special select lowering into CFG with copied EFLAGS
+; threaded across the CFG. This requires our EFLAGS copy rewriting to handle
+; cross-block rewrites in at least some narrow cases.
+;
+; Note that the 'undef' values here are significant! Other values will in many
+; cases not effectively trigger the lowering we're interested in exercising.
+define void @PR37100(i8 %arg1, i16 %arg2, i64 %arg3) {
+; X32-LABEL: PR37100:
+; X32: # %bb.0: # %bb
+; X32-NEXT: pushl %ebx
+; X32-NEXT: .cfi_def_cfa_offset 8
+; X32-NEXT: pushl %edi
+; X32-NEXT: .cfi_def_cfa_offset 12
+; X32-NEXT: pushl %esi
+; X32-NEXT: .cfi_def_cfa_offset 16
+; X32-NEXT: .cfi_offset %esi, -16
+; X32-NEXT: .cfi_offset %edi, -12
+; X32-NEXT: .cfi_offset %ebx, -8
+; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx
+; X32-NEXT: movl {{[0-9]+}}(%esp), %esi
+; X32-NEXT: movb {{[0-9]+}}(%esp), %bl
+; X32-NEXT: jmp .LBB3_1
+; X32-NEXT: .p2align 4, 0x90
+; X32-NEXT: .LBB3_5: # %bb1
+; X32-NEXT: # in Loop: Header=BB3_1 Depth=1
+; X32-NEXT: xorl %eax, %eax
+; X32-NEXT: xorl %edx, %edx
+; X32-NEXT: idivl %edi
+; X32-NEXT: .LBB3_1: # %bb1
+; X32-NEXT: # =>This Inner Loop Header: Depth=1
+; X32-NEXT: movsbl %bl, %eax
+; X32-NEXT: movl %eax, %edx
+; X32-NEXT: sarl $31, %edx
+; X32-NEXT: cmpl %eax, %esi
+; X32-NEXT: movl %ecx, %eax
+; X32-NEXT: sbbl %edx, %eax
+; X32-NEXT: setl %al
+; X32-NEXT: setl %dl
+; X32-NEXT: movzbl %dl, %edi
+; X32-NEXT: negl %edi
+; X32-NEXT: testb $-1, %al
+; X32-NEXT: jne .LBB3_3
+; X32-NEXT: # %bb.2: # %bb1
+; X32-NEXT: # in Loop: Header=BB3_1 Depth=1
+; X32-NEXT: # implicit-def: $bl
+; X32-NEXT: .LBB3_3: # %bb1
+; X32-NEXT: # in Loop: Header=BB3_1 Depth=1
+; X32-NEXT: testb $-1, %al
+; X32-NEXT: jne .LBB3_5
+; X32-NEXT: # %bb.4: # %bb1
+; X32-NEXT: # in Loop: Header=BB3_1 Depth=1
+; X32-NEXT: # implicit-def: $edi
+; X32-NEXT: jmp .LBB3_5
+;
+; X64-LABEL: PR37100:
+; X64: # %bb.0: # %bb
+; X64-NEXT: movq %rdx, %rcx
+; X64-NEXT: jmp .LBB3_1
+; X64-NEXT: .p2align 4, 0x90
+; X64-NEXT: .LBB3_3: # %bb1
+; X64-NEXT: # in Loop: Header=BB3_1 Depth=1
+; X64-NEXT: cmovll %eax, %esi
+; X64-NEXT: xorl %eax, %eax
+; X64-NEXT: xorl %edx, %edx
+; X64-NEXT: idivl %esi
+; X64-NEXT: .LBB3_1: # %bb1
+; X64-NEXT: # =>This Inner Loop Header: Depth=1
+; X64-NEXT: movsbq %dil, %rdx
+; X64-NEXT: xorl %eax, %eax
+; X64-NEXT: cmpq %rdx, %rcx
+; X64-NEXT: setl %al
+; X64-NEXT: negl %eax
+; X64-NEXT: cmpq %rdx, %rcx
+; X64-NEXT: jl .LBB3_3
+; X64-NEXT: # %bb.2: # %bb1
+; X64-NEXT: # in Loop: Header=BB3_1 Depth=1
+; X64-NEXT: # implicit-def: $dil
+; X64-NEXT: jmp .LBB3_3
+bb:
+  br label %bb1
+
+bb1:
+  %tmp = phi i8 [ %tmp8, %bb1 ], [ %arg1, %bb ]
+  %tmp2 = phi i16 [ %tmp11, %bb1 ], [ %arg2, %bb ]
+  %tmp3 = icmp sgt i16 %tmp2, 7
+  %tmp4 = select i1 %tmp3, i16 %tmp2, i16 7
+  %tmp5 = sext i8 %tmp to i64
+  %tmp6 = icmp slt i64 %arg3, %tmp5
+  %tmp7 = sext i1 %tmp6 to i32
+  %tmp8 = select i1 %tmp6, i8 %tmp, i8 undef
+  %tmp9 = select i1 %tmp6, i32 %tmp7, i32 undef
+  %tmp10 = srem i32 0, %tmp9
+  %tmp11 = trunc i32 %tmp10 to i16
+  br label %bb1
+}