Details

Reviewers

mkuper
rnk
maksfb

Commits

rG75e25f6812da: X86: Fold tail calls into conditional branches where possible (PR26302)
rL280832: X86: Fold tail calls into conditional branches where possible (PR26302)

Summary

When branching to a block that immediately tail calls, it is possible to fold the call directly into the branch if the call is direct and there is no stack adjustment, saving one byte.

Example:

define void @f(i32 %x, i32 %y) {
entry:
  %p = icmp eq i32 %x, %y
  br i1 %p, label %bb1, label %bb2
bb1:
  tail call void @foo()
  ret void
bb2:
  tail call void @bar()
  ret void
}

before:

f:
        movl    4(%esp), %eax
        cmpl    8(%esp), %eax
        jne     .LBB0_2
        jmp     foo
.LBB0_2:
        jmp     bar

after:

f:
        movl    4(%esp), %eax
        cmpl    8(%esp), %eax
        jne     bar
.LBB0_1:
        jmp     foo

I don't expect any significant size savings from this (on a Clang bootstrap I saw 288 bytes), but it does make the code a little tighter.

This patch only does 32-bit, but 64-bit would work similarly.
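
The one-byte saving in the example follows from the usual 32-bit x86 branch encodings: a short conditional jump (`jcc rel8`) takes 2 bytes, a near conditional jump (`jcc rel32`) takes 6 bytes, and a near unconditional jump (`jmp rel32`) takes 5 bytes. Below is a minimal sketch of that arithmetic. It assumes the branch to the local label `.LBB0_2` fits in a rel8 and that the jumps to `foo` and `bar` need a rel32. The constant and function names are illustrative and do not come from the patch.

```cpp
#include <cassert>

// Encoded sizes in bytes for 32-bit x86 branches.
constexpr unsigned JccRel8Size = 2;  // 0x70+cc, rel8
constexpr unsigned JccRel32Size = 6; // 0x0F, 0x80+cc, rel32
constexpr unsigned JmpRel32Size = 5; // 0xE9, rel32

// "jne .LBB0_2; jmp foo; .LBB0_2: jmp bar"
constexpr unsigned sizeBeforeFold() {
  return JccRel8Size + JmpRel32Size + JmpRel32Size;
}

// "jne bar; jmp foo"
constexpr unsigned sizeAfterFold() {
  return JccRel32Size + JmpRel32Size;
}

static_assert(sizeBeforeFold() - sizeAfterFold() == 1,
              "folding the tail call saves one byte in this example");
```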

Diff Detail

Event Timeline

+Maksim who I think mentioned at Euro-LLVM that he'd looked into this.

forgot about the debug info

If there are multiple users of the tail call (e.g. via different incoming edges), this can actually create a size regression by replacing a byte-immediate jump with a 32bit jump?

In D24108#531040, @joerg wrote:

If there are multiple users of the tail call (e.g. via different incoming edges), this can actually create a size regression by replacing a byte-immediate jump with a 32bit jump?

The transformation only occurs when there's a single edge to the tail call.
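
To make the size concern concrete, here is a rough model of what would happen if the fold were applied with several incoming edges. This is a sketch under stated assumptions, not code from the patch: each predecessor is assumed to reach the tail-call block with a 2-byte `jcc rel8`, the block itself is a single 5-byte `jmp rel32`, and folding turns every predecessor branch into a 6-byte `jcc rel32` and deletes the block. `foldDelta` is a hypothetical helper name.

```cpp
#include <cassert>

constexpr int ShortJccBytes = 2; // jcc rel8
constexpr int NearJccBytes = 6;  // jcc rel32
constexpr int NearJmpBytes = 5;  // jmp rel32

// Net change in code size (bytes) if a tail-call block with NumPreds
// conditional predecessors were folded into all of them.
// Negative means smaller code; positive means a size regression.
constexpr int foldDelta(int NumPreds) {
  return NumPreds * (NearJccBytes - ShortJccBytes) - NearJmpBytes;
}
```

Under these assumptions one predecessor gives -1 byte, matching the example in the summary, while two predecessors give +3 bytes, which is the regression the single-edge restriction avoids.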

@hans: I think it is safe to do this optimization while optimizing for size. If you do it by default it could be a performance regression if you reverse direction of the conditional branch. That's what we found out while testing on a micro benchmark. I believe it's related to the way BTB works on Intel's CPUs. The problem is that unless you are doing LTO or work at the binary level, you don't always know the new direction of the branch.

Don't know a lot about tail call handling, so take the review with a grain of salt.
(The large amount of new instructions seems unfortunate, but, as above, I don't know enough about this to say whether this is avoidable...)

In D24108#531998, @maksfb wrote:

@hans: I think it is safe to do this optimization while optimizing for size. If you do it by default it could be a performance regression if you reverse direction of the conditional branch. That's what we found out while testing on a micro benchmark. I believe it's related to the way BTB works on Intel's CPUs. The problem is that unless you are doing LTO or work at the binary level, you don't always know the new direction of the branch.

Is there a reason to believe the change in direction will cause worse performance, systematically?
Or is this just a random effect on that particular microbenchmark, and we may gain performance in other cases?

include/llvm/Target/TargetInstrInfo.h
1103	Perhaps add an assert that we never get here?
lib/Target/X86/X86InstrInfo.cpp
4028	What do you think about: if (BranchCond[0].getImm() > X86::LAST_VALID_COND) return false; Are there any valid condition codes you don't support? Admittedly, the current version is safer, but I'm not sure there's a point to future-proofing this against X86 adding condition codes.
4050	Maybe check this before checking the condition? (This would seem to be the most common reason for failure)
4079	Shouldn't I be contained in BranchCond?
4080	Can you explain what guarantees this? I didn't see a check in canMakeTailCallConditional().
test/CodeGen/X86/conditional-tailcall.ll
21	Why do we need an encoding check? (Probably better document this in the test itself, too.)

In D24108#532166, @mkuper wrote:

Don't know a lot about tail call handling, so take the review with a grain of salt.
(The large amount of new instructions seems unfortunate, but, as above, I don't know enough about this to say whether this is avoidable...)

In D24108#531998, @maksfb wrote:

@hans: I think it is safe to do this optimization while optimizing for size. If you do it by default it could be a performance regression if you reverse direction of the conditional branch. That's what we found out while testing on a micro benchmark. I believe it's related to the way BTB works on Intel's CPUs. The problem is that unless you are doing LTO or work at the binary level, you don't always know the new direction of the branch.

Is there a reason to believe the change in direction will cause worse performance, systematically?
Or is this just a random effect on that particular microbenchmark, and we may gain performance in other cases?

It's hard to say; it will depend on the application. If we assume that the original branch direction was selected intentionally (based on compiler hints, profile, or otherwise), then we probably shouldn't expect a gain from reversing the direction. In the micro benchmark, the negative effect of the reversal severely outweighed the benefit of fewer instructions executed when the direction stayed the same. I can't speculate beyond the micro benchmark, as we weren't able to measure the effect on real-world applications beyond applying it to our JITed code in HHVM. But there we could guarantee the branch direction stayed the same.

Addressing comments, doing optsize only and avoiding adding so many instructions.

I'm a little bit confused by the multiple levels of pseudo-instructions here, actually. We have TCRETURNdi, which is expanded in X86ExpandPseudo to a TAILJMPd, but that is actually replaced by a JMP_1 when it comes to MC lowering. IIUC, this is because we want to put different flags and stuff on TAILJMPd and JMP_1, but there can't be two real instructions with the same encoding, so the first one is marked isCodeGenOnly.

Anyway, I think this version of the patch is a little less messy than the first.

Thanks very much for the review. New version uploaded.
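
The layering described in this comment can be pictured as a two-stage mapping. According to the X86ExpandPseudo.cpp and X86MCInstLower.cpp hunks in this revision, TCRETURNdi becomes TAILJMPd and then JMP_1, and the new TCRETURNdicc becomes TAILJMPd_CC and then a conditional jump. The sketch below is a toy model only: `Op`, `expandPseudo`, `lowerToMC`, and `CondJump` are hypothetical stand-ins, not LLVM types or functions.

```cpp
#include <cassert>

// Toy opcodes standing in for the LLVM ones named in the review.
// CondJump stands in for whichever conditional jump MC lowering selects.
enum class Op { TCRETURNdi, TCRETURNdicc, TAILJMPd, TAILJMPd_CC, JMP_1, CondJump };

// Stage 1 (done in X86ExpandPseudo in the patch):
// tail-call pseudo -> codegen-only jump.
inline Op expandPseudo(Op O) {
  switch (O) {
  case Op::TCRETURNdi:   return Op::TAILJMPd;
  case Op::TCRETURNdicc: return Op::TAILJMPd_CC;
  default:               return O;
  }
}

// Stage 2 (done in X86MCInstLower in the patch):
// codegen-only jump -> real, encodable jump.
inline Op lowerToMC(Op O) {
  switch (O) {
  case Op::TAILJMPd:    return Op::JMP_1;
  case Op::TAILJMPd_CC: return Op::CondJump;
  default:              return O;
  }
}
```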

lib/Target/X86/X86InstrInfo.cpp
4028	Thanks, that works great. I was worried about special condition codes like COND_NE_OR_P, but LAST_VALID_COND handles that.
4079	Hmm, I'm not sure I'm following. BranchCond is what we got from analyzeBranch() before. We don't really have a handle to the actual branch instruction, which is why we're searching for it here.
4080	BranchCond originally comes from X86InstrInfo::analyzeBranch(), and that one only puts one element in it. I'll add the same assert to canMakeTailCallConditional().
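
For readers unfamiliar with the `LAST_VALID_COND` idiom discussed at line 4028, here is a small sketch of how such a sentinel can separate plain condition codes from compound ones. The enum is an illustrative subset whose members and ordering are assumptions for this example, not a copy of the X86 backend's definition. `isFoldableCond` is a hypothetical name.

```cpp
#include <cassert>

// Illustrative subset of a condition-code enum. Values and ordering are
// assumptions for this sketch, not copied from the X86 backend.
enum CondCode {
  COND_E,
  COND_NE,
  COND_L,
  COND_GE,
  LAST_VALID_COND = COND_GE, // last code that maps to one branch instruction
  COND_NE_OR_P,              // compound condition, placed after the sentinel
  COND_INVALID
};

// Mirrors the shape of the check suggested in the review: anything past the
// sentinel cannot be expressed as a single conditional tail call.
inline bool isFoldableCond(int Imm) { return Imm <= LAST_VALID_COND; }
```

The benefit of the sentinel is that the check stays a single comparison rather than an explicit list of every supported condition.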

Thanks Hans, this looks much nicer!

lib/Target/X86/X86InstrInfo.cpp
4079	Sorry, I got confused. X86InstrInfo::AnalyzeBranchImpl() also returns a vector of MachineInstructions, but the analyzeBranch() interface doesn't expose that, only the MachineOperands. How inconvenient. Anything we can do about this, or do you think it would be better not to touch this?
lib/Target/X86/X86MCInstLower.cpp
508	Can you use X86::GetCondBranchFromCond()?

Addressing comments.

lib/Target/X86/X86InstrInfo.cpp
4079	We could change analyzeBranch() I suppose, but it would probably break some out-of-tree backends, and I'm not sure it's worth it.
lib/Target/X86/X86MCInstLower.cpp
508	Ah yes, much nicer.
test/CodeGen/X86/conditional-tailcall.ll
22	(Forgot to reply to this the first time.) When I worked on the patch, I initially forgot to change X86MCInstLower::Lower, which meant the printed assembly looked correct, but the binary instruction wasn't correct, so I wanted to test that. I'll add a comment to the test.
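
The branch search discussed at line 4079 exists because analyzeBranch() hands back only the condition operands, not the branch instruction itself. A minimal sketch of such a backwards scan follows. It uses a toy instruction record rather than LLVM's MachineInstr, and `ToyInstr` and `findBranchToReplace` are hypothetical names. Unlike the patch, which asserts when it meets a non-branch, the sketch returns -1.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a machine instruction; not the LLVM MachineInstr API.
struct ToyInstr {
  bool IsDebugValue;
  bool IsBranch;
  int CondCode; // only meaningful when IsBranch is true
};

// Scan the block from the end, skipping debug values, until a branch with the
// wanted condition is found. Returns its index, or -1 if the scan reaches a
// non-branch instruction or the start of the block first.
inline int findBranchToReplace(const std::vector<ToyInstr> &Block,
                               int WantedCC) {
  for (int I = static_cast<int>(Block.size()) - 1; I >= 0; --I) {
    if (Block[I].IsDebugValue)
      continue;
    if (!Block[I].IsBranch)
      return -1; // the patch asserts at this point instead
    if (Block[I].CondCode == WantedCC)
      return I;
  }
  return -1;
}
```

Because a block ends in at most a few terminators, the scan normally inspects only one or two instructions.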

LGTM

lib/Target/X86/X86InstrInfo.cpp
4079	Yeah, you're right, I can't think of a really nice way to handle this, since the analyzeBranch() call is in a generic part of this patch, not x86-specific. And the search here should normally be really short anyway. But could you please leave a comment documenting this, in case someone decides to refactor this later?
test/CodeGen/X86/conditional-tailcall.ll
22	I'm a bit confused about this. Without the change to X86MCInstLower, I'd expect you to get complete nonsense, not a poorly encoded jmp. Anyway, that's a problem with my understanding of this, not your patch. :-)

This revision is now accepted and ready to land. Sep 7 2016, 10:29 AM

hans added inline comments. Sep 7 2016, 11:00 AM

lib/Target/X86/X86InstrInfo.cpp
4079	Adding a comment.
test/CodeGen/X86/conditional-tailcall.ll
22	Yes, the bits were garbage, but with the original version of my patch it still got printed as "jne" in the assembly. That wouldn't happen with the current version, but it still seems like a good idea to do a quick check of the encoding.
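
As background for the encoding check, the standard x86 encodings of a conditional jump are sketched below: the short form is `0x70 + cc` followed by a rel8, and the near form is `0x0F, 0x80 + cc` followed by a little-endian rel32, so `jne` is `0x75` or `0x0F 0x85`. The helper names are hypothetical, and the sketch makes no claim about which form or which bytes the test expects from `llc -show-mc-encoding`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// x86 condition encodings (the low nibble of the Jcc opcode).
enum : uint8_t { CC_E = 0x4, CC_NE = 0x5 };

// Near form: 0x0F, 0x80 + cc, then a little-endian rel32.
inline std::vector<uint8_t> encodeJccRel32(uint8_t CC, int32_t Rel) {
  uint32_t U = static_cast<uint32_t>(Rel);
  return {0x0F,
          static_cast<uint8_t>(0x80 + CC),
          static_cast<uint8_t>(U),
          static_cast<uint8_t>(U >> 8),
          static_cast<uint8_t>(U >> 16),
          static_cast<uint8_t>(U >> 24)};
}

// Short form: 0x70 + cc, then a rel8.
inline std::vector<uint8_t> encodeJccRel8(uint8_t CC, int8_t Rel) {
  return {static_cast<uint8_t>(0x70 + CC), static_cast<uint8_t>(Rel)};
}
```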

Closed by commit rL280832: X86: Fold tail calls into conditional branches where possible (PR26302) (authored by hans). Sep 7 2016, 11:00 AM

This revision was automatically updated to reflect the committed changes.

Diff 70544

include/llvm/Target/TargetInstrInfo.h

public:
virtual bool isPredicated(const MachineInstr &MI) const {		virtual bool isPredicated(const MachineInstr &MI) const {
return false;		return false;
}		}

/// Returns true if the instruction is a		/// Returns true if the instruction is a
/// terminator instruction that has not been predicated.		/// terminator instruction that has not been predicated.
virtual bool isUnpredicatedTerminator(const MachineInstr &MI) const;		virtual bool isUnpredicatedTerminator(const MachineInstr &MI) const;

		/// Returns true if MI is an unconditional tail call.
		virtual bool isUnconditionalTailCall(const MachineInstr &MI) const {
		return false;
		}

		/// Returns true if the tail call can be made conditional on BranchCond.
		virtual bool
		canMakeTailCallConditional(SmallVectorImpl<MachineOperand> &Cond,
		const MachineInstr &TailCall) const {
		return false;
		}

		/// Replace the conditional branch in MBB with a conditional tail call.
		virtual void replaceBranchWithTailCall(MachineBasicBlock &MBB,
		SmallVectorImpl<MachineOperand> &Cond,
		const MachineInstr &TailCall) const {
		llvm_unreachable("Target didn't implement replaceBranchWithTailCall!");
		}

/// Convert the instruction into a predicated instruction.		/// Convert the instruction into a predicated instruction.
/// It returns true if the operation was successful.		/// It returns true if the operation was successful.
virtual bool PredicateInstruction(MachineInstr &MI,		virtual bool PredicateInstruction(MachineInstr &MI,
ArrayRef<MachineOperand> Pred) const;		ArrayRef<MachineOperand> Pred) const;

/// Returns true if the first specified predicate		/// Returns true if the first specified predicate
/// subsumes the second, e.g. GE subsumes GT.		/// subsumes the second, e.g. GE subsumes GT.
virtual		virtual

lib/CodeGen/BranchFolding.cpp

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "branchfolding"		#define DEBUG_TYPE "branchfolding"

STATISTIC(NumDeadBlocks, "Number of dead blocks removed");		STATISTIC(NumDeadBlocks, "Number of dead blocks removed");
STATISTIC(NumBranchOpts, "Number of branches optimized");		STATISTIC(NumBranchOpts, "Number of branches optimized");
STATISTIC(NumTailMerge , "Number of block tails merged");		STATISTIC(NumTailMerge , "Number of block tails merged");
STATISTIC(NumHoist , "Number of times common instructions are hoisted");		STATISTIC(NumHoist , "Number of times common instructions are hoisted");
		STATISTIC(NumTailCalls, "Number of tail calls optimized");

static cl::opt<cl::boolOrDefault> FlagEnableTailMerge("enable-tail-merge",		static cl::opt<cl::boolOrDefault> FlagEnableTailMerge("enable-tail-merge",
cl::init(cl::BOU_UNSET), cl::Hidden);		cl::init(cl::BOU_UNSET), cl::Hidden);

// Throttle for huge numbers of predecessors (compile speed problems)		// Throttle for huge numbers of predecessors (compile speed problems)
static cl::opt<unsigned>		static cl::opt<unsigned>
TailMergeThreshold("tail-merge-threshold",		TailMergeThreshold("tail-merge-threshold",
cl::desc("Max number of predecessors to consider tail merging"),		cl::desc("Max number of predecessors to consider tail merging"),

if (MBB->succ_empty() && !PriorCond.empty() && !PriorFBB &&
MadeChange = true;		MadeChange = true;
++NumBranchOpts;		++NumBranchOpts;
return MadeChange;		return MadeChange;
}		}
}		}
}		}
}		}

		if (!IsEmptyBlock(MBB) && MBB->pred_size() == 1 &&
		MF.getFunction()->optForSize()) {
		// Changing "Jcc foo; foo: jmp bar;" into "Jcc bar;" might change the branch
		// direction, thereby defeating careful block placement and regressing
		// performance. Therefore, only consider this for optsize functions.
		MachineInstr &TailCall = *MBB->getFirstNonDebugInstr();
		if (TII->isUnconditionalTailCall(TailCall)) {
		MachineBasicBlock *Pred = *MBB->pred_begin();
		MachineBasicBlock *PredTBB = nullptr, *PredFBB = nullptr;
		SmallVector<MachineOperand, 4> PredCond;
		bool PredAnalyzable =
		!TII->analyzeBranch(*Pred, PredTBB, PredFBB, PredCond, true);

		if (PredAnalyzable && !PredCond.empty() && PredTBB == MBB) {
		// The predecessor has a conditional branch to this block which consists
		// of only a tail call. Try to fold the tail call into the conditional
		// branch.
		if (TII->canMakeTailCallConditional(PredCond, TailCall)) {
		TII->replaceBranchWithTailCall(*Pred, PredCond, TailCall);
		++NumTailCalls;
		Pred->removeSuccessor(MBB);
		MadeChange = true;
		return MadeChange;
		}
		}
		// If the predecessor is falling through to this block, we could reverse
		// the branch condition and fold the tail call into that. However, after
		// that we might have to re-arrange the CFG to fall through to the other
		// block and there is a high risk of regressing code size rather than
		// improving it.
		}
		}

// Analyze the branch in the current block.		// Analyze the branch in the current block.
MachineBasicBlock *CurTBB = nullptr, *CurFBB = nullptr;		MachineBasicBlock *CurTBB = nullptr, *CurFBB = nullptr;
SmallVector<MachineOperand, 4> CurCond;		SmallVector<MachineOperand, 4> CurCond;
bool CurUnAnalyzable =		bool CurUnAnalyzable =
TII->analyzeBranch(*MBB, CurTBB, CurFBB, CurCond, true);		TII->analyzeBranch(*MBB, CurTBB, CurFBB, CurCond, true);
if (!CurUnAnalyzable) {		if (!CurUnAnalyzable) {
// If the CFG for the prior block has extra edges, remove them.		// If the CFG for the prior block has extra edges, remove them.
MadeChange |= MBB->CorrectExtraCFGEdges(CurTBB, CurFBB, !CurCond.empty());		MadeChange |= MBB->CorrectExtraCFGEdges(CurTBB, CurFBB, !CurCond.empty());

lib/Target/X86/X86ExpandPseudo.cpp

bool X86ExpandPseudo::ExpandMI(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBI) {		MachineBasicBlock::iterator MBBI) {
MachineInstr &MI = *MBBI;		MachineInstr &MI = *MBBI;
unsigned Opcode = MI.getOpcode();		unsigned Opcode = MI.getOpcode();
DebugLoc DL = MBBI->getDebugLoc();		DebugLoc DL = MBBI->getDebugLoc();
switch (Opcode) {		switch (Opcode) {
default:		default:
return false;		return false;
case X86::TCRETURNdi:		case X86::TCRETURNdi:
		case X86::TCRETURNdicc:
case X86::TCRETURNri:		case X86::TCRETURNri:
case X86::TCRETURNmi:		case X86::TCRETURNmi:
case X86::TCRETURNdi64:		case X86::TCRETURNdi64:
case X86::TCRETURNri64:		case X86::TCRETURNri64:
case X86::TCRETURNmi64: {		case X86::TCRETURNmi64: {
bool isMem = Opcode == X86::TCRETURNmi || Opcode == X86::TCRETURNmi64;		bool isMem = Opcode == X86::TCRETURNmi || Opcode == X86::TCRETURNmi64;
MachineOperand &JumpTarget = MBBI->getOperand(0);		MachineOperand &JumpTarget = MBBI->getOperand(0);
MachineOperand &StackAdjust = MBBI->getOperand(isMem ? 5 : 1);		MachineOperand &StackAdjust = MBBI->getOperand(isMem ? 5 : 1);
assert(StackAdjust.isImm() && "Expecting immediate value.");		assert(StackAdjust.isImm() && "Expecting immediate value.");

// Adjust stack pointer.		// Adjust stack pointer.
int StackAdj = StackAdjust.getImm();		int StackAdj = StackAdjust.getImm();
int MaxTCDelta = X86FI->getTCReturnAddrDelta();		int MaxTCDelta = X86FI->getTCReturnAddrDelta();
int Offset = 0;		int Offset = 0;
assert(MaxTCDelta <= 0 && "MaxTCDelta should never be positive");		assert(MaxTCDelta <= 0 && "MaxTCDelta should never be positive");

// Incoporate the retaddr area.		// Incoporate the retaddr area.
Offset = StackAdj-MaxTCDelta;		Offset = StackAdj - MaxTCDelta;
assert(Offset >= 0 && "Offset should never be negative");		assert(Offset >= 0 && "Offset should never be negative");

		if (Opcode == X86::TCRETURNdicc) {
		assert(Offset == 0 && "Conditional tail call cannot adjust the stack.");
		}

if (Offset) {		if (Offset) {
// Check for possible merge with preceding ADD instruction.		// Check for possible merge with preceding ADD instruction.
Offset += X86FL->mergeSPUpdates(MBB, MBBI, true);		Offset += X86FL->mergeSPUpdates(MBB, MBBI, true);
X86FL->emitSPUpdate(MBB, MBBI, Offset, /*InEpilogue=*/true);		X86FL->emitSPUpdate(MBB, MBBI, Offset, /*InEpilogue=*/true);
}		}

// Jump to label or value in register.		// Jump to label or value in register.
bool IsWin64 = STI->isTargetWin64();		bool IsWin64 = STI->isTargetWin64();
if (Opcode == X86::TCRETURNdi || Opcode == X86::TCRETURNdi64) {		if (Opcode == X86::TCRETURNdi || Opcode == X86::TCRETURNdicc ||
unsigned Op = (Opcode == X86::TCRETURNdi)		Opcode == X86::TCRETURNdi64) {
? X86::TAILJMPd		unsigned Op;
: (IsWin64 ? X86::TAILJMPd64_REX : X86::TAILJMPd64);		switch (Opcode) {
		case X86::TCRETURNdi:
		Op = X86::TAILJMPd;
		break;
		case X86::TCRETURNdicc:
		Op = X86::TAILJMPd_CC;
		break;
		default:
		Op = IsWin64 ? X86::TAILJMPd64_REX : X86::TAILJMPd64;
		break;
		}
MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII->get(Op));		MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII->get(Op));
if (JumpTarget.isGlobal())		if (JumpTarget.isGlobal()) {
MIB.addGlobalAddress(JumpTarget.getGlobal(), JumpTarget.getOffset(),		MIB.addGlobalAddress(JumpTarget.getGlobal(), JumpTarget.getOffset(),
JumpTarget.getTargetFlags());		JumpTarget.getTargetFlags());
else {		} else {
assert(JumpTarget.isSymbol());		assert(JumpTarget.isSymbol());
MIB.addExternalSymbol(JumpTarget.getSymbolName(),		MIB.addExternalSymbol(JumpTarget.getSymbolName(),
JumpTarget.getTargetFlags());		JumpTarget.getTargetFlags());
}		}
		if (Op == X86::TAILJMPd_CC) {
		MIB.addImm(MBBI->getOperand(2).getImm());
		}

} else if (Opcode == X86::TCRETURNmi || Opcode == X86::TCRETURNmi64) {		} else if (Opcode == X86::TCRETURNmi || Opcode == X86::TCRETURNmi64) {
unsigned Op = (Opcode == X86::TCRETURNmi)		unsigned Op = (Opcode == X86::TCRETURNmi)
? X86::TAILJMPm		? X86::TAILJMPm
: (IsWin64 ? X86::TAILJMPm64_REX : X86::TAILJMPm64);		: (IsWin64 ? X86::TAILJMPm64_REX : X86::TAILJMPm64);
MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII->get(Op));		MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII->get(Op));
for (unsigned i = 0; i != 5; ++i)		for (unsigned i = 0; i != 5; ++i)
MIB.addOperand(MBBI->getOperand(i));		MIB.addOperand(MBBI->getOperand(i));
} else if (Opcode == X86::TCRETURNri64) {		} else if (Opcode == X86::TCRETURNri64) {

lib/Target/X86/X86InstrControl.td


	// Tail call stuff.			// Tail call stuff.

	let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1,			let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1,
	isCodeGenOnly = 1, SchedRW = [WriteJumpLd] in			isCodeGenOnly = 1, SchedRW = [WriteJumpLd] in
	let Uses = [ESP] in {			let Uses = [ESP] in {
	def TCRETURNdi : PseudoI<(outs),			def TCRETURNdi : PseudoI<(outs),
	(ins i32imm_pcrel:$dst, i32imm:$offset), []>;			(ins i32imm_pcrel:$dst, i32imm:$offset), []>;
				def TCRETURNdicc : PseudoI<(outs),
				(ins i32imm_pcrel:$dst, i32imm:$offset, i32imm:$cond), []>;
	def TCRETURNri : PseudoI<(outs),			def TCRETURNri : PseudoI<(outs),
	(ins ptr_rc_tailcall:$dst, i32imm:$offset), []>;			(ins ptr_rc_tailcall:$dst, i32imm:$offset), []>;
	let mayLoad = 1 in			let mayLoad = 1 in
	def TCRETURNmi : PseudoI<(outs),			def TCRETURNmi : PseudoI<(outs),
	(ins i32mem_TC:$dst, i32imm:$offset), []>;			(ins i32mem_TC:$dst, i32imm:$offset), []>;

	// FIXME: The should be pseudo instructions that are lowered when going to			// FIXME: The should be pseudo instructions that are lowered when going to
	// mcinst.			// mcinst.
	def TAILJMPd : Ii32PCRel<0xE9, RawFrm, (outs),			def TAILJMPd : Ii32PCRel<0xE9, RawFrm, (outs),
	(ins i32imm_pcrel:$dst),			(ins i32imm_pcrel:$dst),
	"jmp\t$dst",			"jmp\t$dst",
	[], IIC_JMP_REL>;			[], IIC_JMP_REL>;

				// This gets substituted to a conditional jump instruction in MC lowering.
				def TAILJMPd_CC : Ii32PCRel<0x80, RawFrm, (outs),
				(ins i32imm_pcrel:$dst, i32imm:$cond),
				"",
				[], IIC_JMP_REL>;

	def TAILJMPr : I<0xFF, MRM4r, (outs), (ins ptr_rc_tailcall:$dst),			def TAILJMPr : I<0xFF, MRM4r, (outs), (ins ptr_rc_tailcall:$dst),
	"", [], IIC_JMP_REG>; // FIXME: Remove encoding when JIT is dead.			"", [], IIC_JMP_REG>; // FIXME: Remove encoding when JIT is dead.
	let mayLoad = 1 in			let mayLoad = 1 in
	def TAILJMPm : I<0xFF, MRM4m, (outs), (ins i32mem_TC:$dst),			def TAILJMPm : I<0xFF, MRM4m, (outs), (ins i32mem_TC:$dst),
	"jmp{l}\t{*}$dst", [], IIC_JMP_MEM>;			"jmp{l}\t{*}$dst", [], IIC_JMP_MEM>;
	}			}



lib/Target/X86/X86InstrInfo.h

public:
/// FMA231 #3, #2, #1		/// FMA231 #3, #2, #1
unsigned getFMA3OpcodeToCommuteOperands(const MachineInstr &MI,		unsigned getFMA3OpcodeToCommuteOperands(const MachineInstr &MI,
unsigned SrcOpIdx1,		unsigned SrcOpIdx1,
unsigned SrcOpIdx2,		unsigned SrcOpIdx2,
const X86InstrFMA3Group &FMA3Group) const;		const X86InstrFMA3Group &FMA3Group) const;

// Branch analysis.		// Branch analysis.
bool isUnpredicatedTerminator(const MachineInstr &MI) const override;		bool isUnpredicatedTerminator(const MachineInstr &MI) const override;
		bool isUnconditionalTailCall(const MachineInstr &MI) const override;
		bool canMakeTailCallConditional(SmallVectorImpl<MachineOperand> &Cond,
		const MachineInstr &TailCall) const override;
		void replaceBranchWithTailCall(MachineBasicBlock &MBB,
		SmallVectorImpl<MachineOperand> &Cond,
		const MachineInstr &TailCall) const override;

bool analyzeBranch(MachineBasicBlock &MBB, MachineBasicBlock *&TBB,		bool analyzeBranch(MachineBasicBlock &MBB, MachineBasicBlock *&TBB,
MachineBasicBlock *&FBB,		MachineBasicBlock *&FBB,
SmallVectorImpl<MachineOperand> &Cond,		SmallVectorImpl<MachineOperand> &Cond,
bool AllowModify) const override;		bool AllowModify) const override;

bool getMemOpBaseRegImmOfs(MachineInstr &LdSt, unsigned &BaseReg,		bool getMemOpBaseRegImmOfs(MachineInstr &LdSt, unsigned &BaseReg,
int64_t &Offset,		int64_t &Offset,
const TargetRegisterInfo *TRI) const override;		const TargetRegisterInfo *TRI) const override;

lib/Target/X86/X86InstrInfo.cpp

bool X86InstrInfo::isUnpredicatedTerminator(const MachineInstr &MI) const {
// Conditional branch is a special case.		// Conditional branch is a special case.
if (MI.isBranch() && !MI.isBarrier())		if (MI.isBranch() && !MI.isBarrier())
return true;		return true;
if (!MI.isPredicable())		if (!MI.isPredicable())
return true;		return true;
return !isPredicated(MI);		return !isPredicated(MI);
}		}

		bool X86InstrInfo::isUnconditionalTailCall(const MachineInstr &MI) const {
		switch (MI.getOpcode()) {
		case X86::TCRETURNdi:
		case X86::TCRETURNri:
		case X86::TCRETURNmi:
		case X86::TCRETURNdi64:
		case X86::TCRETURNri64:
		case X86::TCRETURNmi64:
		return true;
		default:
		return false;
		}
		}

		bool X86InstrInfo::canMakeTailCallConditional(
		SmallVectorImpl<MachineOperand> &BranchCond,
		const MachineInstr &TailCall) const {
		if (TailCall.getOpcode() != X86::TCRETURNdi) {
		// Only direct calls can be done with a conditional branch.
		return false;
		}

		assert(BranchCond.size() == 1);
		if (BranchCond[0].getImm() > X86::LAST_VALID_COND) {
		// Can't make a conditional tail call with this condition.
		return false;
		}

		const X86MachineFunctionInfo *X86FI =
		TailCall.getParent()->getParent()->getInfo<X86MachineFunctionInfo>();
		if (X86FI->getTCReturnAddrDelta() != 0 ||
		TailCall.getOperand(1).getImm() != 0) {
		// A conditional tail call cannot do any stack adjustment.
		return false;
		}

		return true;
		}

		void X86InstrInfo::replaceBranchWithTailCall(
		MachineBasicBlock &MBB, SmallVectorImpl<MachineOperand> &BranchCond,
		const MachineInstr &TailCall) const {
		assert(canMakeTailCallConditional(BranchCond, TailCall));

		MachineBasicBlock::iterator I = MBB.end();
		while (I != MBB.begin()) {
		--I;
		if (I->isDebugValue())
		continue;
		if (!I->isBranch())
		assert(0 && "Can't find the branch to replace!");

		X86::CondCode CC = getCondFromBranchOpc(I->getOpcode());
		assert(BranchCond.size() == 1);
		if (CC != BranchCond[0].getImm())
		continue;

		break;
		}

		auto MIB = BuildMI(MBB, I, MBB.findDebugLoc(I), get(X86::TCRETURNdicc));
		MIB->addOperand(TailCall.getOperand(0)); // Destination.
		MIB.addImm(0); // Stack offset (not used).
		MIB->addOperand(BranchCond[0]); // Condition.
		MIB->addOperand(TailCall.getOperand(2)); // Regmask.

		I->eraseFromParent();
		}

// Given a MBB and its TBB, find the FBB which was a fallthrough MBB (it may		// Given a MBB and its TBB, find the FBB which was a fallthrough MBB (it may
// not be a fallthrough MBB now due to layout changes). Return nullptr if the		// not be a fallthrough MBB now due to layout changes). Return nullptr if the
// fallthrough MBB cannot be identified.		// fallthrough MBB cannot be identified.
static MachineBasicBlock *getFallThroughMBB(MachineBasicBlock *MBB,		static MachineBasicBlock *getFallThroughMBB(MachineBasicBlock *MBB,
MachineBasicBlock *TBB) {		MachineBasicBlock *TBB) {
// Look for non-EHPad successors other than TBB. If we find exactly one, it		// Look for non-EHPad successors other than TBB. If we find exactly one, it
// is the fallthrough MBB. If we find zero, then TBB is both the target MBB		// is the fallthrough MBB. If we find zero, then TBB is both the target MBB
// and fallthrough MBB. If we find more than one, we cannot identify the		// and fallthrough MBB. If we find more than one, we cannot identify the
// fallthrough MBB and should return nullptr.		// fallthrough MBB and should return nullptr.
MachineBasicBlock *FallthroughBB = nullptr;		MachineBasicBlock *FallthroughBB = nullptr;

lib/Target/X86/X86MCInstLower.cpp

case X86::CATCHRET: {
const X86Subtarget &Subtarget = AsmPrinter.getSubtarget();		const X86Subtarget &Subtarget = AsmPrinter.getSubtarget();
unsigned ReturnReg = Subtarget.is64Bit() ? X86::RAX : X86::EAX;		unsigned ReturnReg = Subtarget.is64Bit() ? X86::RAX : X86::EAX;
OutMI = MCInst();		OutMI = MCInst();
OutMI.setOpcode(getRetOpcode(Subtarget));		OutMI.setOpcode(getRetOpcode(Subtarget));
OutMI.addOperand(MCOperand::createReg(ReturnReg));		OutMI.addOperand(MCOperand::createReg(ReturnReg));
break;		break;
}		}

// TAILJMPd, TAILJMPd64 - Lower to the correct jump instructions.		// TAILJMPd, TAILJMPd64, TailJMPd_cc - Lower to the correct jump instruction.
case X86::TAILJMPr:		{ unsigned Opcode;
case X86::TAILJMPd:		case X86::TAILJMPr: Opcode = X86::JMP32r; goto SetTailJmpOpcode;
case X86::TAILJMPd64: {
unsigned Opcode;
switch (OutMI.getOpcode()) {
default: llvm_unreachable("Invalid opcode");
case X86::TAILJMPr: Opcode = X86::JMP32r; break;
case X86::TAILJMPd:		case X86::TAILJMPd:
case X86::TAILJMPd64: Opcode = X86::JMP_1; break;		case X86::TAILJMPd64: Opcode = X86::JMP_1; goto SetTailJmpOpcode;
}		case X86::TAILJMPd_CC:
		Opcode = X86::GetCondBranchFromCond(
		static_cast<X86::CondCode>(MI->getOperand(1).getImm()));
		goto SetTailJmpOpcode;

		SetTailJmpOpcode:
MCOperand Saved = OutMI.getOperand(0);		MCOperand Saved = OutMI.getOperand(0);
OutMI = MCInst();		OutMI = MCInst();
OutMI.setOpcode(Opcode);		OutMI.setOpcode(Opcode);
OutMI.addOperand(Saved);		OutMI.addOperand(Saved);
break;		break;
}		}
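
The lowering in this hunk boils down to two steps: pick the real jump opcode for the pseudo-instruction, then rebuild the instruction with only operand 0 (the jump target), so the condition-code operand of the conditional pseudo is consumed by the opcode choice and not emitted. A minimal, self-contained sketch of that shape follows. Everything in it (`PseudoOp`, `ToyInst`, `selectJumpOpcode`, `lowerTailJump`, and the string opcode names) is a hypothetical stand-in for illustration and is not LLVM's API; it also uses a helper function where the patch uses `goto` into a shared label.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the tail-jump pseudo-opcodes in the hunk above.
enum class PseudoOp { TailJmpReg, TailJmpDirect, TailJmpDirect64, TailJmpDirectCC };

// Toy instruction: an opcode name plus a list of operands.
struct ToyInst {
  std::string Opcode;
  std::vector<std::string> Operands;
};

// Step 1: choose the real jump opcode. For the conditional pseudo the
// condition suffix (for example "ne") becomes part of the opcode name.
inline std::string selectJumpOpcode(PseudoOp Op, const std::string &CondSuffix) {
  switch (Op) {
  case PseudoOp::TailJmpReg:
    return "jmp32r";
  case PseudoOp::TailJmpDirect:
  case PseudoOp::TailJmpDirect64:
    return "jmp_1";
  case PseudoOp::TailJmpDirectCC:
    return "j" + CondSuffix + "_1";
  }
  return "invalid"; // Not reached for valid enumerators.
}

// Step 2: rebuild the instruction, keeping only operand 0 (the target).
inline ToyInst lowerTailJump(PseudoOp Op, const ToyInst &In) {
  assert(!In.Operands.empty() && "expected a jump target operand");
  // Operand 1 carries the condition, and only for the conditional pseudo.
  std::string Cond;
  if (Op == PseudoOp::TailJmpDirectCC && In.Operands.size() > 1)
    Cond = In.Operands[1];
  ToyInst Out;
  Out.Opcode = selectJumpOpcode(Op, Cond);
  Out.Operands.push_back(In.Operands[0]);
  return Out;
}
```

In this toy model, lowering a conditional pseudo with operands `{"bar", "ne"}` yields a single-operand instruction whose opcode encodes the condition.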

case X86::DEC16r:
[...]
case X86::CATCHRET: {
// Lower these as normal, but add some comments.
OutStreamer->AddComment("CATCHRET");
break;
}

case X86::TAILJMPr:
case X86::TAILJMPm:
case X86::TAILJMPd:
+case X86::TAILJMPd_CC:
case X86::TAILJMPr64:
case X86::TAILJMPm64:
case X86::TAILJMPd64:
case X86::TAILJMPr64_REX:
case X86::TAILJMPm64_REX:
case X86::TAILJMPd64_REX:
// Lower these as normal, but add some comments.
OutStreamer->AddComment("TAILCALL");

test/CodeGen/X86/conditional-tailcall.ll

This file was added.

				; RUN: llc < %s -march=x86 -show-mc-encoding \| FileCheck %s

				declare void @foo()
				declare void @bar()

				define void @f(i32 %x, i32 %y) optsize {
				entry:
				%p = icmp eq i32 %x, %y
				br i1 %p, label %bb1, label %bb2
				bb1:
				tail call void @foo()
				ret void
				bb2:
				tail call void @bar()
				ret void
				}

				; CHECK-LABEL: f:
				; CHECK: cmp
				; CHECK: jne bar
				; Check that the asm doesn't just look good, but uses the correct encoding.
mkuper: Why do we need an encoding check? (Probably better document this in the test itself, too.)
				; CHECK: encoding: [0x75,A]
hans: (Forgot to reply to this the first time.) When I worked on the patch, I initially forgot to change X86MCInstLower::Lower, which meant the printed assembly looked correct, but the binary instruction wasn't correct, so I wanted to test that. I'll add a comment to the test.
mkuper: I'm a bit confused about this. Without the change to X86MCInstLower, I'd expect you to get complete nonsense, not a poorly encoded jmp. Anyway, that's a problem with my understanding of this, not your patch. :-)
hans: Yes, the bits were garbage, but with the original version of my patch it still got printed as "jne" in the assembly. That wouldn't happen with the current version, but it still seems like a good idea to do a quick check of the encoding.

				; CHECK: jmp foo
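
The `0x75` in the encoding check is the opcode byte of the short (8-bit displacement) form of `jne`: short conditional jumps on x86 are encoded as `0x70` plus a four-bit condition code, and the condition code for "not equal" is 5. The `A` that follows is the placeholder for the displacement byte, which is filled in later. The sketch below illustrates that arithmetic, and also why the IR's `icmp eq` shows up as `jne` once the branch is reversed: inverting an x86 condition flips the lowest bit of its code. The `Cond` enum and both helper functions are hypothetical names for this illustration, not LLVM's `X86::CondCode` or any LLVM function.

```cpp
#include <cassert>
#include <cstdint>

// x86 condition codes as they appear in the low four bits of a Jcc opcode.
// Hypothetical stand-in enum; not LLVM's X86::CondCode.
enum class Cond : std::uint8_t {
  O = 0x0, NO = 0x1, B = 0x2, AE = 0x3, E = 0x4, NE = 0x5, BE = 0x6, A = 0x7,
  S = 0x8, NS = 0x9, P = 0xA, NP = 0xB, L = 0xC, GE = 0xD, LE = 0xE, G = 0xF
};

// Opcode byte of the short (rel8) conditional jump: 0x70 | cc.
inline std::uint8_t shortJccOpcode(Cond CC) {
  const auto Bits = static_cast<std::uint8_t>(CC);
  assert(Bits < 16 && "condition code must fit in four bits");
  return static_cast<std::uint8_t>(0x70 | Bits);
}

// Inverting a condition flips the lowest bit (E <-> NE, L <-> GE, ...).
inline Cond invert(Cond CC) {
  return static_cast<Cond>(static_cast<std::uint8_t>(CC) ^ 1);
}
```

With these definitions, `shortJccOpcode(invert(Cond::E))` evaluates to `0x75`, the byte the test expects for `jne bar`.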

This is an archive of the discontinued LLVM Phabricator instance.

X86: Fold tail calls into conditional branches where possible (PR26302)
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 70544

include/llvm/Target/TargetInstrInfo.h

lib/CodeGen/BranchFolding.cpp

lib/Target/X86/X86ExpandPseudo.cpp

lib/Target/X86/X86InstrControl.td

lib/Target/X86/X86InstrInfo.h

lib/Target/X86/X86InstrInfo.cpp

lib/Target/X86/X86MCInstLower.cpp

test/CodeGen/X86/conditional-tailcall.ll
