This is an archive of the discontinued LLVM Phabricator instance.

Differential D55494

[x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEA
Closed · Public

Authored by spatel on Dec 9 2018, 7:14 AM.

Details

Reviewers

craig.topper
RKSimon
andreadb

Commits

rG44eaa492b872: [x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEA
rL348946: [x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEA

Summary

This extends the code that handles 16-bit add promotion to form LEA to also allow 8-bit adds. That allows us to combine add ops with register moves and save some instructions. This is another step towards allowing add truncation in generic DAGCombiner (see D54640).
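
The promotion works because the low 8 bits of a wider addition do not depend on the upper bits of the inputs. The following is a minimal sketch of that property in C++; the function names `add8` and `add8ViaWideAdd` are hypothetical and not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics of an 8-bit add (what `addb` computes, ignoring EFLAGS).
constexpr std::uint8_t add8(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(a + b);
}

// Model of the promoted form: the 8-bit inputs sit in the low byte of wider
// registers whose upper bits are arbitrary, the add is done at 32 bits (as
// `leal` does), and only the low byte of the result is kept.
constexpr std::uint8_t add8ViaWideAdd(std::uint32_t wideA, std::uint32_t wideB) {
  return static_cast<std::uint8_t>(wideA + wideB);
}

// Garbage in the upper bits of the inputs never reaches the low byte.
static_assert(add8(200, 100) == 44, "8-bit add wraps modulo 256");
static_assert(add8ViaWideAdd(0xDEADBE00u | 200u, 0x12345600u | 100u) == 44,
              "low byte is independent of the upper bits");
```

This is why the transform can insert the source into an undefined wider register and extract only the sub-register afterwards.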

Diff Detail

Repository: rL LLVM

Event Timeline

spatel created this revision. Dec 9 2018, 7:14 AM

Herald added subscribers: javed.absar, mcrosier. Dec 9 2018, 7:14 AM

RKSimon added inline comments. Dec 9 2018, 8:48 AM

lib/Target/X86/X86InstrInfo.cpp
864 ↗	(On Diff #177424)	ADD8rr/ADD16rr
lib/Target/X86/X86InstrInfo.h
592 ↗	(On Diff #177424)	Would it cause too much code bloat to remove this default bool value? Or remove the argument and drive it off the MIOpc? bool Is16BitOp = !(MIOpc == X86::ADD8rr || MIOpc == X86::ADD8ri);

Patch updated:
Determine op size based on the opcode parameter (no need to change the function signature).

spatel added inline comments. Dec 10 2018, 11:31 AM

lib/Target/X86/X86InstrInfo.h
592 ↗	(On Diff #177424)	Removing the argument looks better. I'll add an assert to make sure we actually have a 16-bit op here if it's not one of the 8-bit add opcodes.
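
Deriving the operand width from the opcode, as agreed above, can be sketched as follows. The `Opc` enum is a hypothetical stand-in for the `X86::` opcode enumerators, and `subRegBits` is an illustrative helper that does not exist in the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for the X86:: opcodes mentioned in this review.
enum class Opc { ADD8rr, ADD8ri, ADD16rr, ADD16ri, SHL16ri, INC16r, DEC16r };

// Mirrors the reviewer's suggestion: anything that is not one of the two
// 8-bit add opcodes is treated as a 16-bit operation.
constexpr bool is16BitOp(Opc opc) {
  return !(opc == Opc::ADD8rr || opc == Opc::ADD8ri);
}

// Width in bits of the sub-register that is inserted into, and later
// extracted from, the wider LEA register.
constexpr unsigned subRegBits(Opc opc) { return is16BitOp(opc) ? 16u : 8u; }

static_assert(!is16BitOp(Opc::ADD8rr), "8-bit add is not a 16-bit op");
static_assert(is16BitOp(Opc::INC16r), "everything else is assumed 16-bit");
static_assert(subRegBits(Opc::ADD8ri) == 8u, "8-bit sub-register expected");
```

Because the negated test classifies every other opcode as 16-bit, an assert on the actual register size is a reasonable guard against callers passing an unexpected opcode.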

Cheers - @andreadb @craig.topper do you have any comments?

craig.topper added inline comments. Dec 10 2018, 12:06 PM

lib/Target/X86/X86InstrInfo.cpp
810 ↗	(On Diff #177573)	Is this ever called with a 32-bit subtarget? It looks like your new 8-bit calls are all only in 64 bit mode which is good since leaOutReg's regclass would be wrong otherwise.

spatel marked an inline comment as done. Dec 10 2018, 12:14 PM

spatel added inline comments.

lib/Target/X86/X86InstrInfo.cpp
810 ↗	(On Diff #177573)	No - it's always guarded with the Subtarget.is64Bit() check in the calls below here. So I just copied that existing code for the 8-bit enhancement. That seemed weird to me, but I wasn't sure how this code would break on 32-bit.

craig.topper added inline comments. Dec 10 2018, 3:50 PM

lib/Target/X86/X86InstrInfo.cpp
810 ↗	(On Diff #177573)	For the 8-bit case in 32-bit mode, you would need to use GR32_ABCD as the LEA output register class, but GR32 would be fine for 16-bit in either mode. The comment added along with the 64-bit mode qualification for 16-bit ops mentioned partial register stalls. That was in 2009, so I'm not sure which CPUs it was considering. That wouldn't have been specific to 32-bit mode, other than that 64-bit mode doesn't use AH/BH/CH/DH. Most of the 16-bit code has no code coverage today due to 16-bit op promotion in lowering. And the DisableLEA16 flag is always true; we should nuke it and all the unreachable code.
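
The register-class constraint comes from which 32-bit registers have an encodable low byte. A minimal sketch of that rule follows; the `Reg32` enum and `hasLowByteSubReg` are hypothetical names used only for illustration and are not LLVM APIs.

```cpp
#include <cassert>

// Hypothetical model of the eight legacy 32-bit general-purpose registers.
enum class Reg32 { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

// In 32-bit mode only EAX, EBX, ECX and EDX have an encodable low byte
// (AL, BL, CL, DL). In 64-bit mode a REX prefix also makes SPL, BPL, SIL and
// DIL available, so every register in this list has a low-byte sub-register.
constexpr bool hasLowByteSubReg(Reg32 r, bool is64Bit) {
  return is64Bit || r == Reg32::EAX || r == Reg32::EBX ||
         r == Reg32::ECX || r == Reg32::EDX;
}

static_assert(hasLowByteSubReg(Reg32::EAX, false), "AL exists in 32-bit mode");
static_assert(!hasLowByteSubReg(Reg32::ESI, false), "no SIL in 32-bit mode");
static_assert(hasLowByteSubReg(Reg32::ESI, true), "SIL exists in 64-bit mode");
```

An 8-bit result extracted from the LEA output therefore needs the output restricted to the A/B/C/D registers on a 32-bit target, while a 16-bit extract works from any of them.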

This patch looks good to me.

However, I will let Craig give the final approval.

test/CodeGen/X86/popcnt.ll
37–41 ↗	(On Diff #177573)	As I wrote before, I think this patch looks good to me. I just wanted to point out that the new codegen might lead to an increase in the number of merge opcodes on Sandybridge (see explanation below). That being said, I don't think it is something that we should be worrying about. Sorry in advance for the pedantic comment below...

On Sandybridge (according to Agner and Intel docs), a partial write to AL triggers a merge opcode on a later read of AX/EAX/RAX. Basically, it is `as-if` AL is renamed separately from RAX. We gain a small increase in ILP at the cost of introducing a merge opcode on a dirty read of the underlying 2/4/8 byte register.

AMD CPUs and modern Intel CPUs (from IvyBridge onwards) don't rename AL separately from RAX. That means a write to AL always merges into RAX with a (false) dependency on RAX. Quoting Agner: `IvyBridge inserts an extra μop only in the case where a high 8-bit register (AH, BH, CH, DH) has been modified`.

Intel CPUs (not AMD) rename high 8-bit registers. A write to AH allocates a distinct physical register. The advantage is that a write to AL can now go in parallel with a write to AH. The downside is that a read of AX/EAX/RAX triggers a merge uop if AH is dirty. The latency of that merge uop tends to be very small (however, it varies from processor to processor).

With this patch, we sometimes trade a partial byte read (for example, `addb`) for a full register read (through LEA), which has the potential of triggering an extra merge uop on Sandybridge.
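
The high-byte renaming behaviour described in this comment can be illustrated with a toy model. This is only a sketch of the rule as stated above (a dirty AH costs one merge uop on the next full-register read); `RenamedReg` and its members are hypothetical, and the model is not a simulator of any real CPU.

```cpp
#include <cassert>

// Toy model of a register whose high byte (AH) is renamed separately.
struct RenamedReg {
  bool highByteDirty = false; // AH written since the last full read or write
  unsigned mergeUops = 0;     // merge uops charged so far

  void writeLow8() {}                          // AL write merges into the full register
  void writeHigh8() { highByteDirty = true; }  // AH gets its own physical register
  void writeFull() { highByteDirty = false; }  // full write replaces both halves
  void readFull() {                            // e.g. LEA reading the whole register
    if (highByteDirty) {
      ++mergeUops;
      highByteDirty = false;
    }
  }
};

// Number of merge uops charged for: write AH, then read the full register.
inline unsigned mergesAfterHighWriteThenFullRead() {
  RenamedReg r;
  r.writeHigh8();
  r.readFull();
  return r.mergeUops;
}
```

In this model a byte-sized consumer such as `addb` would not call `readFull()`, whereas the LEA form does, which is the extra cost the comment is pointing at.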

spatel mentioned this in rL348845: [x86] remove dead code for 16-bit LEA formation; NFC. Dec 11 2018, 6:10 AM

LGTM

I don't think Sandy Bridge and Ivy Bridge are different here. Sandy Bridge should only insert a merge when AH is dirty. I can't find anywhere in Intel's optimization manual that says they are different. The description in 3.5.2.4 describes the behavior from Sandy Bridge on.

This revision is now accepted and ready to land. Dec 11 2018, 7:21 AM

spatel mentioned this in rL348851: [x86] clean up code for converting 16-bit ops to LEA; NFC. Dec 11 2018, 7:32 AM

Patch updated:
Rebased after cleanup in rL348845 and rL348851. This should be functionally equivalent, but with less cruft. There's a 'TODO' comment for 32-bit target based on the comments here.

Closed by commit rL348946: [x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEA (authored by spatel). Dec 12 2018, 10:03 AM

This revision was automatically updated to reflect the committed changes.

spatel mentioned this in D55866: [DAGCombiner] allow narrowing of add followed by truncate. Dec 18 2018, 4:38 PM

spatel mentioned this in rL350006: [DAGCombiner] allow narrowing of add followed by truncate. Dec 22 2018, 9:14 AM

Revision Contents

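Path	Size
llvm/trunk/lib/Target/X86/X86InstrArithmetic.td	6 lines
llvm/trunk/lib/Target/X86/X86InstrInfo.h	4 lines
llvm/trunk/lib/Target/X86/X86InstrInfo.cpp	25 lines
llvm/trunk/test/CodeGen/X86/GlobalISel/add-scalar.ll	5 lines
llvm/trunk/test/CodeGen/X86/GlobalISel/shl-scalar-widening.ll	5 lines
llvm/trunk/test/CodeGen/X86/GlobalISel/shl-scalar.ll	4 lines
llvm/trunk/test/CodeGen/X86/fixup-bw-copy.ll	3 lines
llvm/trunk/test/CodeGen/X86/fshr.ll	4 lines
llvm/trunk/test/CodeGen/X86/iabs.ll	6 lines
llvm/trunk/test/CodeGen/X86/mul-constant-i8.ll	4 lines
llvm/trunk/test/CodeGen/X86/popcnt.ll	4 lines
llvm/trunk/test/CodeGen/X86/pr23664.ll	4 lines
llvm/trunk/test/CodeGen/X86/rotate4.ll	4 lines
llvm/trunk/test/CodeGen/X86/scheduler-backtracking.ll	49 lines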

Diff 177876

llvm/trunk/lib/Target/X86/X86InstrArithmetic.td

[...]
 /// tblgen can't handle dependent type references aggressively enough: PR8330
 multiclass ArithBinOp_RF<bits<8> BaseOpc, bits<8> BaseOpc2, bits<8> BaseOpc4,
                          string mnemonic, Format RegMRM, Format MemMRM,
                          SDNode opnodeflag, SDNode opnode,
                          bit CommutableRR, bit ConvertibleToThreeAddress> {
   let Defs = [EFLAGS] in {
     let Constraints = "$src1 = $dst" in {
       let isCommutable = CommutableRR in {
-        def NAME#8rr : BinOpRR_RF<BaseOpc, mnemonic, Xi8 , opnodeflag>;
         let isConvertibleToThreeAddress = ConvertibleToThreeAddress in {
+          def NAME#8rr : BinOpRR_RF<BaseOpc, mnemonic, Xi8 , opnodeflag>;
           def NAME#16rr : BinOpRR_RF<BaseOpc, mnemonic, Xi16, opnodeflag>;
           def NAME#32rr : BinOpRR_RF<BaseOpc, mnemonic, Xi32, opnodeflag>;
           def NAME#64rr : BinOpRR_RF<BaseOpc, mnemonic, Xi64, opnodeflag>;
         } // isConvertibleToThreeAddress
       } // isCommutable

       def NAME#8rr_REV : BinOpRR_Rev<BaseOpc2, mnemonic, Xi8>, FoldGenData<NAME#8rr>;
       def NAME#16rr_REV : BinOpRR_Rev<BaseOpc2, mnemonic, Xi16>, FoldGenData<NAME#16rr>;
       def NAME#32rr_REV : BinOpRR_Rev<BaseOpc2, mnemonic, Xi32>, FoldGenData<NAME#32rr>;
       def NAME#64rr_REV : BinOpRR_Rev<BaseOpc2, mnemonic, Xi64>, FoldGenData<NAME#64rr>;

       def NAME#8rm : BinOpRM_RF<BaseOpc2, mnemonic, Xi8 , opnodeflag>;
       def NAME#16rm : BinOpRM_RF<BaseOpc2, mnemonic, Xi16, opnodeflag>;
       def NAME#32rm : BinOpRM_RF<BaseOpc2, mnemonic, Xi32, opnodeflag>;
       def NAME#64rm : BinOpRM_RF<BaseOpc2, mnemonic, Xi64, opnodeflag>;

+      let isConvertibleToThreeAddress = ConvertibleToThreeAddress in {
       def NAME#8ri : BinOpRI_RF<0x80, mnemonic, Xi8 , opnodeflag, RegMRM>;

-      let isConvertibleToThreeAddress = ConvertibleToThreeAddress in {
         // NOTE: These are order specific, we want the ri8 forms to be listed
         // first so that they are slightly preferred to the ri forms.
         def NAME#16ri8 : BinOpRI8_RF<0x82, mnemonic, Xi16, opnodeflag, RegMRM>;
         def NAME#32ri8 : BinOpRI8_RF<0x82, mnemonic, Xi32, opnodeflag, RegMRM>;
         def NAME#64ri8 : BinOpRI8_RF<0x82, mnemonic, Xi64, opnodeflag, RegMRM>;

         def NAME#16ri : BinOpRI_RF<0x80, mnemonic, Xi16, opnodeflag, RegMRM>;
         def NAME#32ri : BinOpRI_RF<0x80, mnemonic, Xi32, opnodeflag, RegMRM>;
[...]

llvm/trunk/lib/Target/X86/X86InstrInfo.h

[...] in: protected:
   /// If the specific machine instruction is a instruction that moves/copies
   /// value from one register to another register return true along with
   /// @Source machine operand and @Destination machine operand.
   bool isCopyInstrImpl(const MachineInstr &MI, const MachineOperand *&Source,
                        const MachineOperand *&Destination) const override;

 private:
-  /// This is a helper for convertToThreeAddress for 16-bit instructions.
+  /// This is a helper for convertToThreeAddress for 8 and 16-bit instructions.
   /// We use 32-bit LEA to form 3-address code by promoting to a 32-bit
-  /// super-register and then truncating back down to a 16-bit sub-register.
+  /// super-register and then truncating back down to a 8/16-bit sub-register.
   MachineInstr *convertToThreeAddressWithLEA(unsigned MIOpc,
                                              MachineFunction::iterator &MFI,
                                              MachineInstr &MI,
                                              LiveVariables *LV) const;

   /// Handles memory folding for special case instructions, for instance those
   /// requiring custom manipulation of the address.
   MachineInstr *foldMemoryOperandCustom(MachineFunction &MF, MachineInstr &MI,
[...]

llvm/trunk/lib/Target/X86/X86InstrInfo.cpp

[...] in: bool X86InstrInfo::classifyLEAReg(MachineInstr &MI, const MachineOperand &Src,
   // We've set all the parameters without issue.
   return true;
 }

 MachineInstr *X86InstrInfo::convertToThreeAddressWithLEA(
     unsigned MIOpc, MachineFunction::iterator &MFI, MachineInstr &MI,
     LiveVariables *LV) const {
+  // We handle 8-bit adds and various 16-bit opcodes in the switch below.
+  bool Is16BitOp = !(MIOpc == X86::ADD8rr || MIOpc == X86::ADD8ri);
+  MachineRegisterInfo &RegInfo = MFI->getParent()->getRegInfo();
+  assert((!Is16BitOp || RegInfo.getTargetRegisterInfo()->getRegSizeInBits(
+              *RegInfo.getRegClass(MI.getOperand(0).getReg())) == 16) &&
+         "Unexpected type for LEA transform");
+
   // TODO: For a 32-bit target, we need to adjust the LEA variables with
   // something like this:
   //   Opcode = X86::LEA32r;
   //   InRegLEA = RegInfo.createVirtualRegister(&X86::GR32_NOSPRegClass);
   //   OutRegLEA =
   //       Is8BitOp ? RegInfo.createVirtualRegister(&X86::GR32ABCD_RegClass)
   //                : RegInfo.createVirtualRegister(&X86::GR32RegClass);
   if (!Subtarget.is64Bit())
     return nullptr;

-  MachineRegisterInfo &RegInfo = MFI->getParent()->getRegInfo();
   unsigned Opcode = X86::LEA64_32r;
   unsigned InRegLEA = RegInfo.createVirtualRegister(&X86::GR64_NOSPRegClass);
   unsigned OutRegLEA = RegInfo.createVirtualRegister(&X86::GR32RegClass);

   // Build and insert into an implicit UNDEF value. This is OK because
-  // we will be shifting and then extracting the lower 16-bits.
+  // we will be shifting and then extracting the lower 8/16-bits.
   // This has the potential to cause partial register stall. e.g.
   //   movw (%rbp,%rcx,2), %dx
   //   leal -65(%rdx), %esi
   // But testing has shown this does help performance in 64-bit mode (at
   // least on modern x86 machines).
   MachineBasicBlock::iterator MBBI = MI.getIterator();
   unsigned Dest = MI.getOperand(0).getReg();
   unsigned Src = MI.getOperand(1).getReg();
   bool IsDead = MI.getOperand(0).isDead();
   bool IsKill = MI.getOperand(1).isKill();
+  unsigned SubReg = Is16BitOp ? X86::sub_16bit : X86::sub_8bit;
   assert(!MI.getOperand(1).isUndef() && "Undef op doesn't need optimization");
   BuildMI(*MFI, MBBI, MI.getDebugLoc(), get(X86::IMPLICIT_DEF), InRegLEA);
   MachineInstr *InsMI =
       BuildMI(*MFI, MBBI, MI.getDebugLoc(), get(TargetOpcode::COPY))
-          .addReg(InRegLEA, RegState::Define, X86::sub_16bit)
+          .addReg(InRegLEA, RegState::Define, SubReg)
           .addReg(Src, getKillRegState(IsKill));

   MachineInstrBuilder MIB =
       BuildMI(*MFI, MBBI, MI.getDebugLoc(), get(Opcode), OutRegLEA);
   switch (MIOpc) {
   default: llvm_unreachable("Unreachable!");
   case X86::SHL16ri: {
     unsigned ShAmt = MI.getOperand(2).getImm();
     MIB.addReg(0).addImm(1ULL << ShAmt)
        .addReg(InRegLEA, RegState::Kill).addImm(0).addReg(0);
     break;
   }
   case X86::INC16r:
     addRegOffset(MIB, InRegLEA, true, 1);
     break;
   case X86::DEC16r:
     addRegOffset(MIB, InRegLEA, true, -1);
     break;
+  case X86::ADD8ri:
   case X86::ADD16ri:
   case X86::ADD16ri8:
   case X86::ADD16ri_DB:
   case X86::ADD16ri8_DB:
     addRegOffset(MIB, InRegLEA, true, MI.getOperand(2).getImm());
     break;
+  case X86::ADD8rr:
   case X86::ADD16rr:
   case X86::ADD16rr_DB: {
     unsigned Src2 = MI.getOperand(2).getReg();
     bool IsKill2 = MI.getOperand(2).isKill();
     assert(!MI.getOperand(2).isUndef() && "Undef op doesn't need optimization");
     unsigned InRegLEA2 = 0;
     MachineInstr *InsMI2 = nullptr;
     if (Src == Src2) {
-      // ADD16rr killed %reg1028, %reg1028
+      // ADD8rr/ADD16rr killed %reg1028, %reg1028
       // just a single insert_subreg.
       addRegReg(MIB, InRegLEA, true, InRegLEA, false);
     } else {
       if (Subtarget.is64Bit())
         InRegLEA2 = RegInfo.createVirtualRegister(&X86::GR64_NOSPRegClass);
       else
         InRegLEA2 = RegInfo.createVirtualRegister(&X86::GR32_NOSPRegClass);
       // Build and insert into an implicit UNDEF value. This is OK because
-      // we will be shifting and then extracting the lower 16-bits.
+      // we will be shifting and then extracting the lower 8/16-bits.
       BuildMI(*MFI, &*MIB, MI.getDebugLoc(), get(X86::IMPLICIT_DEF), InRegLEA2);
       InsMI2 = BuildMI(*MFI, &*MIB, MI.getDebugLoc(), get(TargetOpcode::COPY))
-                   .addReg(InRegLEA2, RegState::Define, X86::sub_16bit)
+                   .addReg(InRegLEA2, RegState::Define, SubReg)
                    .addReg(Src2, getKillRegState(IsKill2));
       addRegReg(MIB, InRegLEA, true, InRegLEA2, true);
     }
     if (LV && IsKill2 && InsMI2)
       LV->replaceKillInstruction(Src2, MI, *InsMI2);
     break;
   }
   }

   MachineInstr *NewMI = MIB;
   MachineInstr *ExtMI =
       BuildMI(*MFI, MBBI, MI.getDebugLoc(), get(TargetOpcode::COPY))
           .addReg(Dest, RegState::Define | getDeadRegState(IsDead))
-          .addReg(OutRegLEA, RegState::Kill, X86::sub_16bit);
+          .addReg(OutRegLEA, RegState::Kill, SubReg);

   if (LV) {
     // Update live variables.
     LV->getVarInfo(InRegLEA).Kills.push_back(NewMI);
     LV->getVarInfo(OutRegLEA).Kills.push_back(ExtMI);
     if (IsKill)
       LV->replaceKillInstruction(Src, MI, *InsMI);
     if (IsDead)
[...] in: case X86::ADD32rr_DB: {
     if (ImplicitOp2.getReg() != 0)
       MIB.add(ImplicitOp2);

     NewMI = addRegReg(MIB, SrcReg, isKill, SrcReg2, isKill2);
     if (LV && Src2.isKill())
       LV->replaceKillInstruction(SrcReg2, MI, *NewMI);
     break;
   }
+  case X86::ADD8rr:
   case X86::ADD16rr:
   case X86::ADD16rr_DB:
     return convertToThreeAddressWithLEA(MIOpc, MFI, MI, LV);
   case X86::ADD64ri32:
   case X86::ADD64ri8:
   case X86::ADD64ri32_DB:
   case X86::ADD64ri8_DB:
     assert(MI.getNumOperands() >= 3 && "Unknown add instruction!");
[...] in: MachineInstrBuilder MIB = BuildMI(MF, MI.getDebugLoc(), get(Opc))
                                   .add(Dest)
                                   .addReg(SrcReg, getKillRegState(isKill));
     if (ImplicitOp.getReg() != 0)
       MIB.add(ImplicitOp);

     NewMI = addOffset(MIB, MI.getOperand(2));
     break;
   }
+  case X86::ADD8ri:
   case X86::ADD16ri:
   case X86::ADD16ri8:
   case X86::ADD16ri_DB:
   case X86::ADD16ri8_DB:
     return convertToThreeAddressWithLEA(MIOpc, MFI, MI, LV);
   case X86::VMOVDQU8Z128rmk:
   case X86::VMOVDQU8Z256rmk:
   case X86::VMOVDQU8Zrmk:
[...]
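
The switch in convertToThreeAddressWithLEA() maps each opcode onto LEA's base + index * scale + displacement form and then keeps only the low sub-register. A rough C++ model of that mapping is sketched below; `lea32`, `low8`, `low16` and the `*ViaLea` helpers are hypothetical names for illustration, and the shift case assumes a shift amount of 1 to 3 because LEA scales are limited to 1, 2, 4 and 8.

```cpp
#include <cassert>
#include <cstdint>

// What a 32-bit LEA computes: base + index * scale + disp, modulo 2^32.
constexpr std::uint32_t lea32(std::uint32_t base, std::uint32_t index,
                              std::uint32_t scale, std::int32_t disp) {
  return base + index * scale + static_cast<std::uint32_t>(disp);
}

// Keep only the low 16 or low 8 bits, like the final sub-register COPY.
constexpr std::uint16_t low16(std::uint32_t v) {
  return static_cast<std::uint16_t>(v);
}
constexpr std::uint8_t low8(std::uint32_t v) {
  return static_cast<std::uint8_t>(v);
}

// SHL16ri: no base register, index = source, scale = 1 << ShAmt.
constexpr std::uint16_t shl16ViaLea(std::uint32_t src, unsigned shAmt) {
  return low16(lea32(0, src, 1u << shAmt, 0));
}

// INC/DEC/ADD-immediate: base = source, displacement = +1, -1 or the immediate.
constexpr std::uint8_t add8riViaLea(std::uint32_t src, std::int32_t imm) {
  return low8(lea32(src, 0, 1, imm));
}

// ADD8rr/ADD16rr: base = first source, index = second source, scale = 1.
constexpr std::uint8_t add8rrViaLea(std::uint32_t a, std::uint32_t b) {
  return low8(lea32(a, b, 1, 0));
}

// Upper bits of the inputs are arbitrary and do not affect the results.
static_assert(shl16ViaLea(0xABCD8001u, 3) == 8, "0x8001 << 3 wraps to 0x0008");
static_assert(add8riViaLea(0xFFFFFF00u | 250u, 10) == 4, "250 + 10 wraps to 4");
static_assert(add8rrViaLea(0x180u, 0x80u) == 0, "0x80 + 0x80 wraps to 0");
```

The updated tests show the same shapes in the generated code, for example `leal (%rdi,%rdi), %eax` for an 8-bit add of a register to itself.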

llvm/trunk/test/CodeGen/X86/GlobalISel/add-scalar.ll

[...]
 ; X32-NEXT: retl
   %ret = add i16 %arg1, %arg2
   ret i16 %ret
 }

 define i8 @test_add_i8(i8 %arg1, i8 %arg2) {
 ; X64-LABEL: test_add_i8:
 ; X64: # %bb.0:
-; X64-NEXT: movl %esi, %eax
-; X64-NEXT: addb %dil, %al
+; X64-NEXT: # kill: def $edi killed $edi def $rdi
+; X64-NEXT: # kill: def $esi killed $esi def $rsi
+; X64-NEXT: leal (%rsi,%rdi), %eax
 ; X64-NEXT: # kill: def $al killed $al killed $eax
 ; X64-NEXT: retq
 ;
 ; X32-LABEL: test_add_i8:
 ; X32: # %bb.0:
 ; X32-NEXT: movb {{[0-9]+}}(%esp), %al
 ; X32-NEXT: addb {{[0-9]+}}(%esp), %al
 ; X32-NEXT: retl
[...]

llvm/trunk/test/CodeGen/X86/GlobalISel/shl-scalar-widening.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=x86_64-linux-gnu -global-isel -verify-machineinstrs < %s -o - | FileCheck %s --check-prefix=X64

 define i16 @test_shl_i4(i16 %v, i16 %a, i16 %b) {
 ; Let's say the arguments are the following unsigned
 ; integers in two’s complement representation:
 ;
 ; %v: 77 (0000 0000 0100 1101)
 ; %a: 74 (0000 0000 0100 1010)
 ; %b: 72 (0000 0000 0100 1000)
 ; X64-LABEL: test_shl_i4:
 ; X64: # %bb.0:
 ; X64-NEXT: movl %edi, %eax
-; X64-NEXT: movl %edx, %ecx
-; X64-NEXT: addb %sil, %cl
+; X64-NEXT: # kill: def $esi killed $esi def $rsi
+; X64-NEXT: # kill: def $edx killed $edx def $rdx
+; X64-NEXT: leal (%rdx,%rsi), %ecx
 ; X64-NEXT: andb $15, %cl
 ; X64-NEXT: # kill: def $cl killed $cl killed $ecx
 ; X64-NEXT: shlb %cl, %al
 ; X64-NEXT: andw $15, %ax
 ; X64-NEXT: # kill: def $ax killed $ax killed $eax
 ; X64-NEXT: retq
   %v.t = trunc i16 %v to i4 ; %v.t: 13 (1101)
   %a.t = trunc i16 %a to i4 ; %a.t: 10 (1010)
[...]

llvm/trunk/test/CodeGen/X86/GlobalISel/shl-scalar.ll

[...] in: ; X64-NEXT: retq
   %a = trunc i32 %arg1 to i8
   %res = shl i8 %a, 5
   ret i8 %res
 }

 define i8 @test_shl_i8_imm1(i32 %arg1) {
 ; X64-LABEL: test_shl_i8_imm1:
 ; X64: # %bb.0:
-; X64-NEXT: movl %edi, %eax
-; X64-NEXT: addb %al, %al
+; X64-NEXT: # kill: def $edi killed $edi def $rdi
+; X64-NEXT: leal (%rdi,%rdi), %eax
 ; X64-NEXT: # kill: def $al killed $al killed $eax
 ; X64-NEXT: retq
   %a = trunc i32 %arg1 to i8
   %res = shl i8 %a, 1
   ret i8 %res
 }

 define i1 @test_shl_i1(i32 %arg1, i32 %arg2) {
[...]

llvm/trunk/test/CodeGen/X86/fixup-bw-copy.ll

[...]
 ; BWOFF32-NEXT: retl
   ret i16 %a0
 }

 ; Verify we don't mess with H-reg copies (only generated in 32-bit mode).
 define i8 @test_movb_hreg(i16 %a0) {
 ; X64-LABEL: test_movb_hreg:
 ; X64: # %bb.0:
+; X64-NEXT: # kill: def $edi killed $edi def $rdi
 ; X64-NEXT: movl %edi, %eax
 ; X64-NEXT: shrl $8, %eax
-; X64-NEXT: addb %dil, %al
+; X64-NEXT: leal (%rax,%rdi), %eax
 ; X64-NEXT: # kill: def $al killed $al killed $eax
 ; X64-NEXT: retq
 ;
 ; X32-LABEL: test_movb_hreg:
 ; X32: # %bb.0:
 ; X32-NEXT: movl {{[0-9]+}}(%esp), %eax
 ; X32-NEXT: addb %al, %ah
 ; X32-NEXT: movb %ah, %al
 ; X32-NEXT: retl
   %tmp0 = trunc i16 %a0 to i8
   %tmp1 = lshr i16 %a0, 8
   %tmp2 = trunc i16 %tmp1 to i8
   %tmp3 = add i8 %tmp0, %tmp2
   ret i8 %tmp3
 }

llvm/trunk/test/CodeGen/X86/fshr.ll

[...]
 ; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
 ; X86-NEXT: shrb $7, %cl
 ; X86-NEXT: addb %al, %al
 ; X86-NEXT: orb %cl, %al
 ; X86-NEXT: retl
 ;
 ; X64-LABEL: const_shift_i8:
 ; X64: # %bb.0:
-; X64-NEXT: movl %edi, %eax
+; X64-NEXT: # kill: def $edi killed $edi def $rdi
 ; X64-NEXT: shrb $7, %sil
-; X64-NEXT: addb %al, %al
+; X64-NEXT: leal (%rdi,%rdi), %eax
 ; X64-NEXT: orb %sil, %al
 ; X64-NEXT: # kill: def $al killed $al killed $eax
 ; X64-NEXT: retq
   %tmp = tail call i8 @llvm.fshr.i8(i8 %x, i8 %y, i8 7)
   ret i8 %tmp
 }

 define i16 @const_shift_i16(i16 %x, i16 %y) nounwind {
[...]

llvm/trunk/test/CodeGen/X86/iabs.ll

[...]
 ; X86-NEXT: movl %eax, %ecx
 ; X86-NEXT: sarb $7, %cl
 ; X86-NEXT: addb %cl, %al
 ; X86-NEXT: xorb %cl, %al
 ; X86-NEXT: retl
 ;
 ; X64-LABEL: test_i8:
 ; X64: # %bb.0:
-; X64-NEXT: movl %edi, %eax
-; X64-NEXT: movl %eax, %ecx
+; X64-NEXT: # kill: def $edi killed $edi def $rdi
+; X64-NEXT: movl %edi, %ecx
 ; X64-NEXT: sarb $7, %cl
-; X64-NEXT: addb %cl, %al
+; X64-NEXT: leal (%rdi,%rcx), %eax
 ; X64-NEXT: xorb %cl, %al
 ; X64-NEXT: # kill: def $al killed $al killed $eax
 ; X64-NEXT: retq
   %tmp1neg = sub i8 0, %a
   %b = icmp sgt i8 %a, -1
   %abs = select i1 %b, i8 %a, i8 %tmp1neg
   ret i8 %abs
 }
[...]

llvm/trunk/test/CodeGen/X86/mul-constant-i8.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-- \| FileCheck %s --check-prefix=X64			; RUN: llc < %s -mtriple=x86_64-- \| FileCheck %s --check-prefix=X64

	define i8 @test_mul_by_1(i8 %x) {			define i8 @test_mul_by_1(i8 %x) {
	; X64-LABEL: test_mul_by_1:			; X64-LABEL: test_mul_by_1:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movl %edi, %eax			; X64-NEXT: movl %edi, %eax
	; X64-NEXT: # kill: def $al killed $al killed $eax			; X64-NEXT: # kill: def $al killed $al killed $eax
	; X64-NEXT: retq			; X64-NEXT: retq
	%m = mul i8 %x, 1			%m = mul i8 %x, 1
	ret i8 %m			ret i8 %m
	}			}

	define i8 @test_mul_by_2(i8 %x) {			define i8 @test_mul_by_2(i8 %x) {
	; X64-LABEL: test_mul_by_2:			; X64-LABEL: test_mul_by_2:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movl %edi, %eax			; X64-NEXT: # kill: def $edi killed $edi def $rdi
	; X64-NEXT: addb %al, %al			; X64-NEXT: leal (%rdi,%rdi), %eax
	; X64-NEXT: # kill: def $al killed $al killed $eax			; X64-NEXT: # kill: def $al killed $al killed $eax
	; X64-NEXT: retq			; X64-NEXT: retq
	%m = mul i8 %x, 2			%m = mul i8 %x, 2
	ret i8 %m			ret i8 %m
	}			}

	define i8 @test_mul_by_3(i8 %x) {			define i8 @test_mul_by_3(i8 %x) {
	; X64-LABEL: test_mul_by_3:			; X64-LABEL: test_mul_by_3:
	(459 lines not shown)

llvm/trunk/test/CodeGen/X86/popcnt.ll

	(19 lines not shown)
	; X32-NEXT: movl %ecx, %eax			; X32-NEXT: movl %ecx, %eax
	; X32-NEXT: shrb $4, %al			; X32-NEXT: shrb $4, %al
	; X32-NEXT: addb %cl, %al			; X32-NEXT: addb %cl, %al
	; X32-NEXT: andb $15, %al			; X32-NEXT: andb $15, %al
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: cnt8:			; X64-LABEL: cnt8:
	; X64: # %bb.0:			; X64: # %bb.0:
				; X64-NEXT: # kill: def $edi killed $edi def $rdi
	; X64-NEXT: movl %edi, %eax			; X64-NEXT: movl %edi, %eax
	; X64-NEXT: shrb %al			; X64-NEXT: shrb %al
	; X64-NEXT: andb $85, %al			; X64-NEXT: andb $85, %al
	; X64-NEXT: subb %al, %dil			; X64-NEXT: subb %al, %dil
	; X64-NEXT: movl %edi, %eax			; X64-NEXT: movl %edi, %eax
	; X64-NEXT: andb $51, %al			; X64-NEXT: andb $51, %al
	; X64-NEXT: shrb $2, %dil			; X64-NEXT: shrb $2, %dil
	; X64-NEXT: andb $51, %dil			; X64-NEXT: andb $51, %dil
	; X64-NEXT: addb %al, %dil			; X64-NEXT: addb %al, %dil
	; X64-NEXT: movl %edi, %eax			; X64-NEXT: movl %edi, %eax
	; X64-NEXT: shrb $4, %al			; X64-NEXT: shrb $4, %al
	; X64-NEXT: addb %dil, %al			; X64-NEXT: leal (%rax,%rdi), %eax
	; X64-NEXT: andb $15, %al			; X64-NEXT: andb $15, %al
				; X64-NEXT: # kill: def $al killed $al killed $eax
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X32-POPCNT-LABEL: cnt8:			; X32-POPCNT-LABEL: cnt8:
	; X32-POPCNT: # %bb.0:			; X32-POPCNT: # %bb.0:
	; X32-POPCNT-NEXT: movzbl {{[0-9]+}}(%esp), %eax			; X32-POPCNT-NEXT: movzbl {{[0-9]+}}(%esp), %eax
	; X32-POPCNT-NEXT: popcntl %eax, %eax			; X32-POPCNT-NEXT: popcntl %eax, %eax
	; X32-POPCNT-NEXT: # kill: def $al killed $al killed $eax			; X32-POPCNT-NEXT: # kill: def $al killed $al killed $eax
	; X32-POPCNT-NEXT: retl			; X32-POPCNT-NEXT: retl
	(205 lines not shown)

llvm/trunk/test/CodeGen/X86/pr23664.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=x86_64-- < %s \| FileCheck %s			; RUN: llc -mtriple=x86_64-- < %s \| FileCheck %s

	define i2 @f(i32 %arg) {			define i2 @f(i32 %arg) {
	; CHECK-LABEL: f:			; CHECK-LABEL: f:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: movl %edi, %eax			; CHECK-NEXT: # kill: def $edi killed $edi def $rdi
	; CHECK-NEXT: addb %al, %al			; CHECK-NEXT: leal (%rdi,%rdi), %eax
	; CHECK-NEXT: orb $1, %al			; CHECK-NEXT: orb $1, %al
	; CHECK-NEXT: # kill: def $al killed $al killed $eax			; CHECK-NEXT: # kill: def $al killed $al killed $eax
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%trunc = trunc i32 %arg to i1			%trunc = trunc i32 %arg to i1
	%sext = sext i1 %trunc to i2			%sext = sext i1 %trunc to i2
	%or = or i2 %sext, 1			%or = or i2 %sext, 1
	ret i2 %or			ret i2 %or
	}			}
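
The check-line changes above replace an 8-bit `addb %al, %al` with a 32-bit `leal (%rdi,%rdi), %eax` whose result is then read only through `%al`. The following is a minimal sketch of why that substitution preserves the 8-bit result; it is not code from the patch. The helper names (`add8`, `lea32_low8`, `low_byte_matches_for_all_inputs`) are hypothetical and exist only for this illustration. It assumes the upper 24 bits of the wide registers may hold arbitrary values, and it does not model EFLAGS, which `add` writes and `lea` leaves untouched.

```cpp
#include <cstdint>

// Models `addb`: an 8-bit add that wraps modulo 256.
std::uint8_t add8(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(a + b);
}

// Models `leal (%ra,%rb), %eax` followed by a read of %al:
// add the full 32-bit registers, then keep only the low byte.
std::uint8_t lea32_low8(std::uint32_t wideA, std::uint32_t wideB) {
  std::uint32_t sum = wideA + wideB;  // unsigned arithmetic wraps modulo 2^32
  return static_cast<std::uint8_t>(sum);
}

// Exhaustively compares both forms for every pair of 8-bit inputs,
// filling the upper 24 bits of each wide operand with the given garbage.
bool low_byte_matches_for_all_inputs(std::uint32_t upperGarbage) {
  for (std::uint32_t a = 0; a < 256; ++a) {
    for (std::uint32_t b = 0; b < 256; ++b) {
      std::uint32_t wideA = (upperGarbage & 0xFFFFFF00u) | a;
      std::uint32_t wideB = (~upperGarbage & 0xFFFFFF00u) | b;
      if (add8(static_cast<std::uint8_t>(a), static_cast<std::uint8_t>(b)) !=
          lea32_low8(wideA, wideB))
        return false;
    }
  }
  return true;
}
```

Carries in an addition only propagate upward, so whatever sits in the upper bits of the wide operands cannot change the low byte of the sum. This matches the test output, where the `# kill:` comments mark the wider register as only partially defined.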

llvm/trunk/test/CodeGen/X86/rotate4.ll

	(636 lines not shown)
	; X86-NEXT: movb {{[0-9]+}}(%esp), %cl			; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
	; X86-NEXT: addb %cl, %cl			; X86-NEXT: addb %cl, %cl
	; X86-NEXT: andb $30, %cl			; X86-NEXT: andb $30, %cl
	; X86-NEXT: roll %cl, %eax			; X86-NEXT: roll %cl, %eax
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: rotate_demanded_bits_3:			; X64-LABEL: rotate_demanded_bits_3:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movl %esi, %ecx			; X64-NEXT: # kill: def $esi killed $esi def $rsi
	; X64-NEXT: movl %edi, %eax			; X64-NEXT: movl %edi, %eax
	; X64-NEXT: addb %cl, %cl			; X64-NEXT: leal (%rsi,%rsi), %ecx
	; X64-NEXT: andb $30, %cl			; X64-NEXT: andb $30, %cl
	; X64-NEXT: # kill: def $cl killed $cl killed $ecx			; X64-NEXT: # kill: def $cl killed $cl killed $ecx
	; X64-NEXT: roll %cl, %eax			; X64-NEXT: roll %cl, %eax
	; X64-NEXT: retq			; X64-NEXT: retq
	%3 = shl i32 %1, 1			%3 = shl i32 %1, 1
	%4 = and i32 %3, 30			%4 = and i32 %3, 30
	%5 = shl i32 %0, %4			%5 = shl i32 %0, %4
	%6 = sub i32 0, %3			%6 = sub i32 0, %3
	%7 = and i32 %6, 30			%7 = and i32 %6, 30
	%8 = lshr i32 %0, %7			%8 = lshr i32 %0, %7
	%9 = or i32 %5, %8			%9 = or i32 %5, %8
	ret i32 %9			ret i32 %9
	}			}

llvm/trunk/test/CodeGen/X86/scheduler-backtracking.ll

	(13 lines not shown)
	; ILP: # %bb.0:			; ILP: # %bb.0:
	; ILP-NEXT: pushq %r14			; ILP-NEXT: pushq %r14
	; ILP-NEXT: pushq %rbx			; ILP-NEXT: pushq %rbx
	; ILP-NEXT: movq %rdi, %rax			; ILP-NEXT: movq %rdi, %rax
	; ILP-NEXT: xorl %r8d, %r8d			; ILP-NEXT: xorl %r8d, %r8d
	; ILP-NEXT: incl %esi			; ILP-NEXT: incl %esi
	; ILP-NEXT: addb %sil, %sil			; ILP-NEXT: addb %sil, %sil
	; ILP-NEXT: orb $1, %sil			; ILP-NEXT: orb $1, %sil
	; ILP-NEXT: movl $1, %r9d			; ILP-NEXT: movl $1, %r10d
	; ILP-NEXT: xorl %r14d, %r14d			; ILP-NEXT: xorl %r14d, %r14d
	; ILP-NEXT: movl %esi, %ecx			; ILP-NEXT: movl %esi, %ecx
	; ILP-NEXT: shldq %cl, %r9, %r14			; ILP-NEXT: shldq %cl, %r10, %r14
	; ILP-NEXT: movl $1, %edx			; ILP-NEXT: movl $1, %edx
	; ILP-NEXT: shlq %cl, %rdx			; ILP-NEXT: shlq %cl, %rdx
	; ILP-NEXT: movl %esi, %r11d			; ILP-NEXT: leal -128(%rsi), %r9d
	; ILP-NEXT: addb $-128, %r11b			; ILP-NEXT: movb $-128, %r11b
	; ILP-NEXT: movb $-128, %r10b
	; ILP-NEXT: xorl %ebx, %ebx			; ILP-NEXT: xorl %ebx, %ebx
	; ILP-NEXT: movl %r11d, %ecx			; ILP-NEXT: movl %r9d, %ecx
	; ILP-NEXT: shldq %cl, %r9, %rbx			; ILP-NEXT: shldq %cl, %r10, %rbx
	; ILP-NEXT: testb $64, %sil			; ILP-NEXT: testb $64, %sil
	; ILP-NEXT: cmovneq %rdx, %r14			; ILP-NEXT: cmovneq %rdx, %r14
	; ILP-NEXT: cmovneq %r8, %rdx			; ILP-NEXT: cmovneq %r8, %rdx
	; ILP-NEXT: movl $1, %edi			; ILP-NEXT: movl $1, %edi
	; ILP-NEXT: shlq %cl, %rdi			; ILP-NEXT: shlq %cl, %rdi
	; ILP-NEXT: subb %sil, %r10b			; ILP-NEXT: subb %sil, %r11b
	; ILP-NEXT: movl %r10d, %ecx			; ILP-NEXT: movl %r11d, %ecx
	; ILP-NEXT: shrdq %cl, %r8, %r9			; ILP-NEXT: shrdq %cl, %r8, %r10
	; ILP-NEXT: testb $64, %r10b
	; ILP-NEXT: cmovneq %r8, %r9
	; ILP-NEXT: testb $64, %r11b			; ILP-NEXT: testb $64, %r11b
				; ILP-NEXT: cmovneq %r8, %r10
				; ILP-NEXT: testb $64, %r9b
	; ILP-NEXT: cmovneq %rdi, %rbx			; ILP-NEXT: cmovneq %rdi, %rbx
	; ILP-NEXT: cmovneq %r8, %rdi			; ILP-NEXT: cmovneq %r8, %rdi
	; ILP-NEXT: testb %sil, %sil			; ILP-NEXT: testb %sil, %sil
	; ILP-NEXT: cmovsq %r8, %r14			; ILP-NEXT: cmovsq %r8, %r14
	; ILP-NEXT: cmovsq %r8, %rdx			; ILP-NEXT: cmovsq %r8, %rdx
	; ILP-NEXT: movq %r14, 8(%rax)			; ILP-NEXT: movq %r14, 8(%rax)
	; ILP-NEXT: movq %rdx, (%rax)			; ILP-NEXT: movq %rdx, (%rax)
	; ILP-NEXT: cmovnsq %r8, %rbx			; ILP-NEXT: cmovnsq %r8, %rbx
	; ILP-NEXT: cmoveq %r8, %rbx			; ILP-NEXT: cmoveq %r8, %rbx
	; ILP-NEXT: movq %rbx, 24(%rax)			; ILP-NEXT: movq %rbx, 24(%rax)
	; ILP-NEXT: cmovnsq %r9, %rdi			; ILP-NEXT: cmovnsq %r10, %rdi
	; ILP-NEXT: cmoveq %r8, %rdi			; ILP-NEXT: cmoveq %r8, %rdi
	; ILP-NEXT: movq %rdi, 16(%rax)			; ILP-NEXT: movq %rdi, 16(%rax)
	; ILP-NEXT: popq %rbx			; ILP-NEXT: popq %rbx
	; ILP-NEXT: popq %r14			; ILP-NEXT: popq %r14
	; ILP-NEXT: retq			; ILP-NEXT: retq
	;			;
	; HYBRID-LABEL: test1:			; HYBRID-LABEL: test1:
	; HYBRID: # %bb.0:			; HYBRID: # %bb.0:
	; HYBRID-NEXT: movq %rdi, %rax			; HYBRID-NEXT: movq %rdi, %rax
	; HYBRID-NEXT: incl %esi			; HYBRID-NEXT: incl %esi
	; HYBRID-NEXT: addb %sil, %sil			; HYBRID-NEXT: addb %sil, %sil
	; HYBRID-NEXT: orb $1, %sil			; HYBRID-NEXT: orb $1, %sil
	; HYBRID-NEXT: movb $-128, %cl			; HYBRID-NEXT: movb $-128, %cl
	; HYBRID-NEXT: subb %sil, %cl			; HYBRID-NEXT: subb %sil, %cl
	; HYBRID-NEXT: xorl %r8d, %r8d			; HYBRID-NEXT: xorl %r8d, %r8d
	; HYBRID-NEXT: movl $1, %r11d			; HYBRID-NEXT: movl $1, %r11d
	; HYBRID-NEXT: movl $1, %r9d			; HYBRID-NEXT: movl $1, %r9d
	; HYBRID-NEXT: shrdq %cl, %r8, %r9			; HYBRID-NEXT: shrdq %cl, %r8, %r9
	; HYBRID-NEXT: testb $64, %cl			; HYBRID-NEXT: testb $64, %cl
	; HYBRID-NEXT: cmovneq %r8, %r9			; HYBRID-NEXT: cmovneq %r8, %r9
	; HYBRID-NEXT: xorl %r10d, %r10d			; HYBRID-NEXT: xorl %r10d, %r10d
	; HYBRID-NEXT: movl %esi, %ecx			; HYBRID-NEXT: movl %esi, %ecx
	; HYBRID-NEXT: shldq %cl, %r11, %r10			; HYBRID-NEXT: shldq %cl, %r11, %r10
	; HYBRID-NEXT: addb $-128, %cl			; HYBRID-NEXT: leal -128(%rsi), %ecx
	; HYBRID-NEXT: xorl %edi, %edi			; HYBRID-NEXT: xorl %edi, %edi
	; HYBRID-NEXT: shldq %cl, %r11, %rdi			; HYBRID-NEXT: shldq %cl, %r11, %rdi
	; HYBRID-NEXT: movl $1, %edx			; HYBRID-NEXT: movl $1, %edx
	; HYBRID-NEXT: shlq %cl, %rdx			; HYBRID-NEXT: shlq %cl, %rdx
	; HYBRID-NEXT: testb $64, %cl			; HYBRID-NEXT: testb $64, %cl
	; HYBRID-NEXT: cmovneq %rdx, %rdi			; HYBRID-NEXT: cmovneq %rdx, %rdi
	; HYBRID-NEXT: cmovneq %r8, %rdx			; HYBRID-NEXT: cmovneq %r8, %rdx
	; HYBRID-NEXT: movl %esi, %ecx			; HYBRID-NEXT: movl %esi, %ecx
	(26 lines not shown)
	; BURR-NEXT: movl $1, %r11d			; BURR-NEXT: movl $1, %r11d
	; BURR-NEXT: movl $1, %r9d			; BURR-NEXT: movl $1, %r9d
	; BURR-NEXT: shrdq %cl, %r8, %r9			; BURR-NEXT: shrdq %cl, %r8, %r9
	; BURR-NEXT: testb $64, %cl			; BURR-NEXT: testb $64, %cl
	; BURR-NEXT: cmovneq %r8, %r9			; BURR-NEXT: cmovneq %r8, %r9
	; BURR-NEXT: xorl %r10d, %r10d			; BURR-NEXT: xorl %r10d, %r10d
	; BURR-NEXT: movl %esi, %ecx			; BURR-NEXT: movl %esi, %ecx
	; BURR-NEXT: shldq %cl, %r11, %r10			; BURR-NEXT: shldq %cl, %r11, %r10
	; BURR-NEXT: addb $-128, %cl			; BURR-NEXT: leal -128(%rsi), %ecx
	; BURR-NEXT: xorl %edi, %edi			; BURR-NEXT: xorl %edi, %edi
	; BURR-NEXT: shldq %cl, %r11, %rdi			; BURR-NEXT: shldq %cl, %r11, %rdi
	; BURR-NEXT: movl $1, %edx			; BURR-NEXT: movl $1, %edx
	; BURR-NEXT: shlq %cl, %rdx			; BURR-NEXT: shlq %cl, %rdx
	; BURR-NEXT: testb $64, %cl			; BURR-NEXT: testb $64, %cl
	; BURR-NEXT: cmovneq %rdx, %rdi			; BURR-NEXT: cmovneq %rdx, %rdi
	; BURR-NEXT: cmovneq %r8, %rdx			; BURR-NEXT: cmovneq %r8, %rdx
	; BURR-NEXT: movl %esi, %ecx			; BURR-NEXT: movl %esi, %ecx
	(24 lines not shown)
	; SRC-NEXT: movb $-128, %cl			; SRC-NEXT: movb $-128, %cl
	; SRC-NEXT: subb %sil, %cl			; SRC-NEXT: subb %sil, %cl
	; SRC-NEXT: xorl %r8d, %r8d			; SRC-NEXT: xorl %r8d, %r8d
	; SRC-NEXT: movl $1, %edi			; SRC-NEXT: movl $1, %edi
	; SRC-NEXT: movl $1, %r10d			; SRC-NEXT: movl $1, %r10d
	; SRC-NEXT: shrdq %cl, %r8, %r10			; SRC-NEXT: shrdq %cl, %r8, %r10
	; SRC-NEXT: testb $64, %cl			; SRC-NEXT: testb $64, %cl
	; SRC-NEXT: cmovneq %r8, %r10			; SRC-NEXT: cmovneq %r8, %r10
	; SRC-NEXT: movl %esi, %r9d			; SRC-NEXT: leal -128(%rsi), %r9d
	; SRC-NEXT: addb $-128, %r9b
	; SRC-NEXT: xorl %edx, %edx			; SRC-NEXT: xorl %edx, %edx
	; SRC-NEXT: movl %r9d, %ecx			; SRC-NEXT: movl %r9d, %ecx
	; SRC-NEXT: shldq %cl, %rdi, %rdx			; SRC-NEXT: shldq %cl, %rdi, %rdx
	; SRC-NEXT: xorl %r11d, %r11d			; SRC-NEXT: xorl %r11d, %r11d
	; SRC-NEXT: movl %esi, %ecx			; SRC-NEXT: movl %esi, %ecx
	; SRC-NEXT: shldq %cl, %rdi, %r11			; SRC-NEXT: shldq %cl, %rdi, %r11
	; SRC-NEXT: movl $1, %ebx			; SRC-NEXT: movl $1, %ebx
	; SRC-NEXT: shlq %cl, %rbx			; SRC-NEXT: shlq %cl, %rbx
	(37 lines not shown)
	; LIN-NEXT: cmovsq %r9, %rcx			; LIN-NEXT: cmovsq %r9, %rcx
	; LIN-NEXT: movq %rcx, (%rdi)			; LIN-NEXT: movq %rcx, (%rdi)
	; LIN-NEXT: xorl %edi, %edi			; LIN-NEXT: xorl %edi, %edi
	; LIN-NEXT: movl %esi, %ecx			; LIN-NEXT: movl %esi, %ecx
	; LIN-NEXT: shldq %cl, %r8, %rdi			; LIN-NEXT: shldq %cl, %r8, %rdi
	; LIN-NEXT: cmovneq %rdx, %rdi			; LIN-NEXT: cmovneq %rdx, %rdi
	; LIN-NEXT: cmovsq %r9, %rdi			; LIN-NEXT: cmovsq %r9, %rdi
	; LIN-NEXT: movq %rdi, 8(%rax)			; LIN-NEXT: movq %rdi, 8(%rax)
	; LIN-NEXT: movl %esi, %edx			; LIN-NEXT: leal -128(%rsi), %r10d
	; LIN-NEXT: addb $-128, %dl			; LIN-NEXT: movl $1, %edx
	; LIN-NEXT: movl $1, %r10d			; LIN-NEXT: movl %r10d, %ecx
	; LIN-NEXT: movl %edx, %ecx			; LIN-NEXT: shlq %cl, %rdx
	; LIN-NEXT: shlq %cl, %r10			; LIN-NEXT: testb $64, %r10b
	; LIN-NEXT: testb $64, %dl			; LIN-NEXT: movq %rdx, %rdi
	; LIN-NEXT: movq %r10, %rdi
	; LIN-NEXT: cmovneq %r9, %rdi			; LIN-NEXT: cmovneq %r9, %rdi
	; LIN-NEXT: movb $-128, %cl			; LIN-NEXT: movb $-128, %cl
	; LIN-NEXT: subb %sil, %cl			; LIN-NEXT: subb %sil, %cl
	; LIN-NEXT: movl $1, %esi			; LIN-NEXT: movl $1, %esi
	; LIN-NEXT: shrdq %cl, %r9, %rsi			; LIN-NEXT: shrdq %cl, %r9, %rsi
	; LIN-NEXT: testb $64, %cl			; LIN-NEXT: testb $64, %cl
	; LIN-NEXT: cmovneq %r9, %rsi			; LIN-NEXT: cmovneq %r9, %rsi
	; LIN-NEXT: cmovsq %rdi, %rsi			; LIN-NEXT: cmovsq %rdi, %rsi
	; LIN-NEXT: cmoveq %r9, %rsi			; LIN-NEXT: cmoveq %r9, %rsi
	; LIN-NEXT: movq %rsi, 16(%rax)			; LIN-NEXT: movq %rsi, 16(%rax)
	; LIN-NEXT: xorl %esi, %esi			; LIN-NEXT: xorl %esi, %esi
	; LIN-NEXT: movl %edx, %ecx			; LIN-NEXT: movl %r10d, %ecx
	; LIN-NEXT: shldq %cl, %r8, %rsi			; LIN-NEXT: shldq %cl, %r8, %rsi
	; LIN-NEXT: cmovneq %r10, %rsi			; LIN-NEXT: cmovneq %rdx, %rsi
	; LIN-NEXT: cmovnsq %r9, %rsi			; LIN-NEXT: cmovnsq %r9, %rsi
	; LIN-NEXT: cmoveq %r9, %rsi			; LIN-NEXT: cmoveq %r9, %rsi
	; LIN-NEXT: movq %rsi, 24(%rax)			; LIN-NEXT: movq %rsi, 24(%rax)
	; LIN-NEXT: retq			; LIN-NEXT: retq
	%b = add i256 %a, 1			%b = add i256 %a, 1
	%m = shl i256 %b, 1			%m = shl i256 %b, 1
	%p = add i256 %m, 1			%p = add i256 %m, 1
	%v = lshr i256 %b, %p			%v = lshr i256 %b, %p
	(788 lines not shown)

Revision Contents

Diff 177876
