This is an archive of the discontinued LLVM Phabricator instance.

Differential D37177

[X86] Don't disable slow INC/DEC if optimizing for size
ClosedPublic

Authored by craig.topper on Aug 26 2017, 12:12 AM.

Details

Reviewers

chandlerc
zvi
RKSimon
spatel

Commits

rG3be1db82b6cc: [X86] Don't disable slow INC/DEC if optimizing for size
rL312866: [X86] Don't disable slow INC/DEC if optimizing for size

Summary

Just because INC/DEC is a little slow on some processors doesn't mean we shouldn't prefer it when optimizing for size.

This appears to match gcc behavior.
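
As a rough illustration of the rule described above, the decision can be modeled as a pure boolean function. This is only a sketch: `useIncDec` and `useIncDecBefore` are hypothetical names, and the two `bool` parameters stand in for `Subtarget.slowIncDec()` and the function's `optForSize()` query. It is not the LLVM code itself.

```cpp
// Minimal model of the predicate this patch introduces.
// SlowIncDec and OptForSize are hypothetical plain booleans standing in for
// Subtarget.slowIncDec() and the function-level optForSize() query.
constexpr bool useIncDec(bool SlowIncDec, bool OptForSize) {
  return !SlowIncDec || OptForSize;
}

// The rule before the patch: the size preference was not consulted at all.
constexpr bool useIncDecBefore(bool SlowIncDec) { return !SlowIncDec; }

static_assert(useIncDec(false, false), "fast INC/DEC: always allowed");
static_assert(useIncDec(true, true), "slow INC/DEC, optsize: allowed");
static_assert(!useIncDec(true, false), "slow INC/DEC, speed: use ADD/SUB");
static_assert(!useIncDecBefore(true), "old rule rejected INC/DEC on slow CPUs");
```

The only input combination whose result changes is a slow-INC/DEC subtarget combined with an `optsize` function.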

Event Timeline

craig.topper created this revision. Aug 26 2017, 12:12 AM

This patch intersects with a topic that @RKSimon was hoping to discuss at this year's dev meeting (it was proposed as a BOF for all targets, but I think we were too late to get an official BOF; also, I think Simon is out of the office for a week, so I don't expect any immediate comments from him):
Do we need fake "features" like FeatureSlowIncDec in isel? Can we just uniformly isel inc/dec and then transform it to add/sub in MI based on the CPU's scheduler model? Or maybe the opposite: we isel add/sub as the default and then convert to inc/dec?

This could be a MachineCombiner transform, or it might be easier to have a single-purpose inc/dec conversion pass that uses properties of the scheduler model to drive the decision.
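
To make the second option concrete, here is a toy model of a post-isel rewrite driven by a per-CPU property. Everything in it is hypothetical: `Op`, `Inst`, `CpuModel`, `IncDecIsSlow` and `rewriteIncDec` are illustration-only names, not LLVM types or APIs, and nothing in the review says a pass would be structured this way.

```cpp
// Toy model of an MI-level INC/DEC rewrite; none of these types are LLVM's.
enum class Op { Inc, Dec, AddImm, Other };

struct Inst {
  Op Opcode;
  int Imm; // only meaningful when Opcode == Op::AddImm
};

// Hypothetical per-CPU property that a scheduler model could expose.
struct CpuModel {
  bool IncDecIsSlow;
};

// Rewrite INC/DEC to ADD +1/-1 when the model says they are slow and the
// function is not optimized for size; otherwise leave the instruction alone.
// A real pass would also have to prove that the carry flag is dead, because
// INC/DEC leave CF unchanged while ADD overwrites it.
constexpr Inst rewriteIncDec(Inst I, CpuModel M, bool OptForSize) {
  return (!M.IncDecIsSlow || OptForSize) ? I
         : I.Opcode == Op::Inc           ? Inst{Op::AddImm, 1}
         : I.Opcode == Op::Dec           ? Inst{Op::AddImm, -1}
                                         : I;
}

static_assert(rewriteIncDec(Inst{Op::Inc, 0}, CpuModel{true}, false).Opcode ==
                  Op::AddImm,
              "slow CPU, optimizing for speed: INC becomes ADD");
static_assert(rewriteIncDec(Inst{Op::Inc, 0}, CpuModel{true}, true).Opcode ==
                  Op::Inc,
              "slow CPU, optimizing for size: INC is kept");
```

The sketch handles one instruction at a time so the policy stays separate from any traversal; a single-purpose pass would apply it to every instruction in a function.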

test/CodeGen/X86/slow-incdec.ll
2–5	I don't know why these tests are this complicated. We can check for inc/dec output with tests as simple as:

$ cat incdec.ll
define i32 @inc(i32 %x) {
  %r = add i32 %x, 1
  ret i32 %r
}
define i32 @dec(i32 %x) {
  %r = add i32 %x, -1
  ret i32 %r
}
$ ./llc -o - incdec.ll -mtriple=i386-unknown-unkown -mattr=-slow-incdec | grep eax | grep -v mov
	incl %eax
	decl %eax
$ ./llc -o - incdec.ll -mtriple=i386-unknown-unkown -mattr=+slow-incdec | grep eax | grep -v mov
	addl $1, %eax
	addl $-1, %eax

Use a simpler test case.

I'd also like to see fewer feature flags. Particularly the ones that encode the CPU name into a flag.

Is there anything in the CPU scheduler model that captures INC/DEC being slow? I think it's only "slow" because it doesn't update the carry flag and creates false dependencies later.

I also think it should be disabled on earlier CPUs than Haswell. I don't think anything changed in the microarchitecture at Haswell that made it different. gcc disables INC/DEC at least back to "core2".

In D37177#853325, @craig.topper wrote:

I'd also like to see fewer feature flags. Particularly the ones that encode the CPU name into a flag.

Yes, those are extra wrong. :)

Is there anything in the CPU scheduler model that captures INC/DEC being slow? I think it's only "slow" because it doesn't update the carry flag and creates false dependencies later.

There's nothing in there currently that I can see. We could transfer fake feature bits into the SchedMachineModel (a rename of that object might be due at that point) or something off to the side of it?

I also think it should be disabled on earlier CPUs than Haswell. I don't think anything changed in the microarchitecture at Haswell that made it different. gcc disables INC/DEC at least back to "core2".

Yes, I doubt we'll ever get a complete 1-to-1 mapping from uarch characteristic to the output code we want to see, so we'll end up fudging it. But at least if it's at the MI layer, we'll have a more principled approach about when it gets activated?

Side note specifically about this one: the goal of -Os (optsize) is fuzzy. Where to draw the line between size and perf is never clear to me. I thought -Oz was defined more concretely (reduce size no matter what it does to perf), but even that is not clear based on the current text ("Like -Os but reduces code size further"). So I don't object to this patch if it makes llvm behavior less surprising vs. gcc (or there's some measurable win somewhere?), but if we can move this case to MI just as easily and/or create some infrastructure to get all of the fake features moved over, I think that's preferable.

The surprising thing to me, and the reason I even looked at this, is that we gave up the size optimization of MOV32r1/MOV32r_1 at -Os when the SlowIncDec flag is set.
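
For reference, the size difference behind MOV32r1 can be checked with a small byte-count sketch. The arrays below hold standard 32-bit-mode x86 encodings with %eax as the destination; the array names are made up for this example, and the choice of %eax is an assumption made only to keep the encodings concrete.

```cpp
#include <cstddef>

// Two ways to put the constant 1 into %eax in 32-bit mode.
// The array names are illustrative only.
constexpr unsigned char XorEaxEax[] = {0x31, 0xC0};                 // xorl %eax, %eax
constexpr unsigned char IncEax[] = {0x40};                          // incl %eax (one-byte form, not available in 64-bit mode)
constexpr unsigned char MovEax1[] = {0xB8, 0x01, 0x00, 0x00, 0x00}; // movl $1, %eax

constexpr std::size_t XorIncSize = sizeof(XorEaxEax) + sizeof(IncEax);

static_assert(XorIncSize == 3, "XOR+INC takes 3 bytes");
static_assert(sizeof(MovEax1) == 5, "MOV with a 32-bit immediate takes 5 bytes");
```

The one-byte `0x40` form is the same encoding the diff lists as `INC32r_alt` under `Not64BitMode`, which is why the MOV32r1/MOV32r_1 pseudos carry that predicate as well.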

Submit your changes to the slow-incdec.ll tests so this patch shows the diff?

test/CodeGen/X86/slow-incdec.ll
3	Add a common -check-prefix=CHECK check

Test has been pre-committed, so now the test diff is relative to just this patch.

I generally like the direction, peanut gallery comment. No need to wait for me to land or anything...

lib/Target/X86/X86InstrInfo.td
911	Maybe name this `AvoidIncDec` or `NoIncDec` or `DisableIncDec` or ... rather than explaining the rationale in the name?

LGTM - Chandler's suggestion of renaming NotSlowIncDec_Or_OptForSize makes sense as well

This revision is now accepted and ready to land. Sep 9 2017, 3:54 AM

I'll rename, but the polarity of those suggestions is backwards. It needs to be something like UseIncDec or DontAvoidIncDec.

Closed by commit rL312866: [X86] Don't disable slow INC/DEC if optimizing for size (authored by ctopper). Sep 9 2017, 10:13 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                      Size
lib/Target/X86/X86ISelLowering.cpp        8 lines
lib/Target/X86/X86InstrArithmetic.td      6 lines
lib/Target/X86/X86InstrCompiler.td        14 lines
lib/Target/X86/X86InstrInfo.td            3 lines
test/CodeGen/X86/slow-incdec.ll           129 lines

Diff 112791

lib/Target/X86/X86ISelLowering.cpp

@@ 16,556 unchanged lines not shown @@ for (SDNode::use_iterator UI = Op.getNode()->use_begin(),
         if (UI->getOpcode() != ISD::CopyToReg &&
             UI->getOpcode() != ISD::SETCC &&
             UI->getOpcode() != ISD::STORE)
           goto default_case;

       if (ConstantSDNode *C =
               dyn_cast<ConstantSDNode>(ArithOp.getOperand(1))) {
         // An add of one will be selected as an INC.
-        if (C->isOne() && !Subtarget.slowIncDec()) {
+        if (C->isOne() &&
+            (!Subtarget.slowIncDec() ||
+             DAG.getMachineFunction().getFunction()->optForSize())) {
           Opcode = X86ISD::INC;
           NumOperands = 1;
           break;
         }

         // An add of negative one (subtract of one) will be selected as a DEC.
-        if (C->isAllOnesValue() && !Subtarget.slowIncDec()) {
+        if (C->isAllOnesValue() &&
+            (!Subtarget.slowIncDec() ||
+             DAG.getMachineFunction().getFunction()->optForSize())) {
           Opcode = X86ISD::DEC;
           NumOperands = 1;
           break;
         }
       }

       // Otherwise use a regular EFLAGS-setting add.
       Opcode = X86ISD::ADD;
@@ 20,277 unchanged lines not shown @@

lib/Target/X86/X86InstrArithmetic.td

@@ 475 unchanged lines not shown @@ def INC16r_alt : I<0x40, AddRegFrm, (outs GR16:$dst), (ins GR16:$src1),
                    "inc{w}\t$dst", [], IIC_UNARY_REG>,
                  OpSize16, Requires<[Not64BitMode]>;
 def INC32r_alt : I<0x40, AddRegFrm, (outs GR32:$dst), (ins GR32:$src1),
                    "inc{l}\t$dst", [], IIC_UNARY_REG>,
                  OpSize32, Requires<[Not64BitMode]>;
 } // CodeSize = 1, hasSideEffects = 0
 } // Constraints = "$src1 = $dst", SchedRW

-let CodeSize = 2, SchedRW = [WriteALULd, WriteRMW], Predicates = [NotSlowIncDec] in {
+let CodeSize = 2, SchedRW = [WriteALULd, WriteRMW],
+    Predicates = [NotSlowIncDec_Or_OptForSize] in {
   def INC8m : I<0xFE, MRM0m, (outs), (ins i8mem :$dst), "inc{b}\t$dst",
                 [(store (add (loadi8 addr:$dst), 1), addr:$dst),
                  (implicit EFLAGS)], IIC_UNARY_MEM>;
   def INC16m : I<0xFF, MRM0m, (outs), (ins i16mem:$dst), "inc{w}\t$dst",
                 [(store (add (loadi16 addr:$dst), 1), addr:$dst),
                  (implicit EFLAGS)], IIC_UNARY_MEM>, OpSize16;
   def INC32m : I<0xFF, MRM0m, (outs), (ins i32mem:$dst), "inc{l}\t$dst",
                 [(store (add (loadi32 addr:$dst), 1), addr:$dst),
@@ 30 unchanged lines not shown @@ def DEC16r_alt : I<0x48, AddRegFrm, (outs GR16:$dst), (ins GR16:$src1),
                  OpSize16, Requires<[Not64BitMode]>;
 def DEC32r_alt : I<0x48, AddRegFrm, (outs GR32:$dst), (ins GR32:$src1),
                    "dec{l}\t$dst", [], IIC_UNARY_REG>,
                  OpSize32, Requires<[Not64BitMode]>;
 } // CodeSize = 1, hasSideEffects = 0
 } // Constraints = "$src1 = $dst", SchedRW


-let CodeSize = 2, SchedRW = [WriteALULd, WriteRMW], Predicates = [NotSlowIncDec] in {
+let CodeSize = 2, SchedRW = [WriteALULd, WriteRMW],
+    Predicates = [NotSlowIncDec_Or_OptForSize] in {
   def DEC8m : I<0xFE, MRM1m, (outs), (ins i8mem :$dst), "dec{b}\t$dst",
                 [(store (add (loadi8 addr:$dst), -1), addr:$dst),
                  (implicit EFLAGS)], IIC_UNARY_MEM>;
   def DEC16m : I<0xFF, MRM1m, (outs), (ins i16mem:$dst), "dec{w}\t$dst",
                 [(store (add (loadi16 addr:$dst), -1), addr:$dst),
                  (implicit EFLAGS)], IIC_UNARY_MEM>, OpSize16;
   def DEC32m : I<0xFF, MRM1m, (outs), (ins i32mem:$dst), "dec{l}\t$dst",
                 [(store (add (loadi32 addr:$dst), -1), addr:$dst),
@@ 838 unchanged lines not shown @@

lib/Target/X86/X86InstrCompiler.td

@@ 267 unchanged lines not shown @@
 // Other widths can also make use of the 32-bit xor, which may have a smaller
 // encoding and avoid partial register updates.
 let AddedComplexity = 10 in {
 def : Pat<(i8 0), (EXTRACT_SUBREG (MOV32r0), sub_8bit)>;
 def : Pat<(i16 0), (EXTRACT_SUBREG (MOV32r0), sub_16bit)>;
 def : Pat<(i64 0), (SUBREG_TO_REG (i64 0), (MOV32r0), sub_32bit)>;
 }

-let Predicates = [OptForSize, NotSlowIncDec, Not64BitMode],
+let Predicates = [OptForSize, Not64BitMode],
     AddedComplexity = 10 in {
   // Pseudo instructions for materializing 1 and -1 using XOR+INC/DEC,
   // which only require 3 bytes compared to MOV32ri which requires 5.
   let Defs = [EFLAGS], isReMaterializable = 1, isPseudo = 1 in {
     def MOV32r1 : I<0, Pseudo, (outs GR32:$dst), (ins), "",
                     [(set GR32:$dst, 1)]>;
     def MOV32r_1 : I<0, Pseudo, (outs GR32:$dst), (ins), "",
                     [(set GR32:$dst, -1)]>;
@@ 408 unchanged lines not shown @@
 defm LOCK_SUB : LOCK_ArithBinOp<0x28, 0x80, 0x83, MRM5m, X86lock_sub, "sub">;
 defm LOCK_OR : LOCK_ArithBinOp<0x08, 0x80, 0x83, MRM1m, X86lock_or , "or">;
 defm LOCK_AND : LOCK_ArithBinOp<0x20, 0x80, 0x83, MRM4m, X86lock_and, "and">;
 defm LOCK_XOR : LOCK_ArithBinOp<0x30, 0x80, 0x83, MRM6m, X86lock_xor, "xor">;

 multiclass LOCK_ArithUnOp<bits<8> Opc8, bits<8> Opc, Format Form,
                           int Increment, string mnemonic> {
 let Defs = [EFLAGS], mayLoad = 1, mayStore = 1, isCodeGenOnly = 1,
-    SchedRW = [WriteALULd, WriteRMW], Predicates = [NotSlowIncDec] in {
+    SchedRW = [WriteALULd, WriteRMW] in {
 def NAME#8m : I<Opc8, Form, (outs), (ins i8mem :$dst),
                 !strconcat(mnemonic, "{b}\t$dst"),
                 [(set EFLAGS, (X86lock_add addr:$dst, (i8 Increment)))],
                 IIC_UNARY_MEM>, LOCK;
 def NAME#16m : I<Opc, Form, (outs), (ins i16mem:$dst),
                 !strconcat(mnemonic, "{w}\t$dst"),
                 [(set EFLAGS, (X86lock_add addr:$dst, (i16 Increment)))],
                 IIC_UNARY_MEM>, OpSize16, LOCK;
 def NAME#32m : I<Opc, Form, (outs), (ins i32mem:$dst),
                 !strconcat(mnemonic, "{l}\t$dst"),
                 [(set EFLAGS, (X86lock_add addr:$dst, (i32 Increment)))],
                 IIC_UNARY_MEM>, OpSize32, LOCK;
 def NAME#64m : RI<Opc, Form, (outs), (ins i64mem:$dst),
                 !strconcat(mnemonic, "{q}\t$dst"),
                 [(set EFLAGS, (X86lock_add addr:$dst, (i64 Increment)))],
                 IIC_UNARY_MEM>, LOCK;
 }
 }

+let Predicates = [NotSlowIncDec_Or_OptForSize] in {
 defm LOCK_INC : LOCK_ArithUnOp<0xFE, 0xFF, MRM0m, 1, "inc">;
 defm LOCK_DEC : LOCK_ArithUnOp<0xFE, 0xFF, MRM1m, -1, "dec">;
+}

 // Atomic compare and swap.
 multiclass LCMPXCHG_UnOp<bits<8> Opc, Format Form, string mnemonic,
                          SDPatternOperator frag, X86MemOperand x86memop,
                          InstrItinClass itin> {
 let isCodeGenOnly = 1, usesCustomInserter = 1 in {
   def NAME : I<Opc, Form, (outs), (ins x86memop:$ptr),
                !strconcat(mnemonic, "\t$ptr"),
@@ 206 unchanged lines not shown @@ multiclass RELEASE_UNOP<dag dag8, dag dag16, dag dag32, dag dag64> {
   def NAME#32m : I<0, Pseudo, (outs), (ins i32mem:$dst),
                    "#UNOP "#NAME#"32m PSEUDO!",
                    [(atomic_store_32 addr:$dst, dag32)]>;
   def NAME#64m : I<0, Pseudo, (outs), (ins i64mem:$dst),
                    "#UNOP "#NAME#"64m PSEUDO!",
                    [(atomic_store_64 addr:$dst, dag64)]>;
 }

-let Defs = [EFLAGS] in {
+let Defs = [EFLAGS], Predicates = [NotSlowIncDec_Or_OptForSize] in {
   defm RELEASE_INC : RELEASE_UNOP<
       (add (atomic_load_8 addr:$dst), (i8 1)),
       (add (atomic_load_16 addr:$dst), (i16 1)),
       (add (atomic_load_32 addr:$dst), (i32 1)),
-      (add (atomic_load_64 addr:$dst), (i64 1))>, Requires<[NotSlowIncDec]>;
+      (add (atomic_load_64 addr:$dst), (i64 1))>;
   defm RELEASE_DEC : RELEASE_UNOP<
       (add (atomic_load_8 addr:$dst), (i8 -1)),
       (add (atomic_load_16 addr:$dst), (i16 -1)),
       (add (atomic_load_32 addr:$dst), (i32 -1)),
-      (add (atomic_load_64 addr:$dst), (i64 -1))>, Requires<[NotSlowIncDec]>;
+      (add (atomic_load_64 addr:$dst), (i64 -1))>;
 }
 /*
 TODO: These don't work because the type inference of TableGen fails.
 TODO: find a way to fix it.
 let Defs = [EFLAGS] in {
   defm RELEASE_NEG : RELEASE_UNOP<
       (ineg (atomic_load_8 addr:$dst)),
       (ineg (atomic_load_16 addr:$dst)),
@@ 948 unchanged lines not shown @@ def : Pat<(mul GR64:$src1, i64immSExt32:$src2),
           (IMUL64rri32 GR64:$src1, i64immSExt32:$src2)>;
 def : Pat<(mul (loadi64 addr:$src1), i64immSExt8:$src2),
           (IMUL64rmi8 addr:$src1, i64immSExt8:$src2)>;
 def : Pat<(mul (loadi64 addr:$src1), i64immSExt32:$src2),
           (IMUL64rmi32 addr:$src1, i64immSExt32:$src2)>;

 // Increment/Decrement reg.
 // Do not make INC/DEC if it is slow
-let Predicates = [NotSlowIncDec] in {
+let Predicates = [NotSlowIncDec_Or_OptForSize] in {
   def : Pat<(add GR8:$src, 1), (INC8r GR8:$src)>;
   def : Pat<(add GR16:$src, 1), (INC16r GR16:$src)>;
   def : Pat<(add GR32:$src, 1), (INC32r GR32:$src)>;
   def : Pat<(add GR64:$src, 1), (INC64r GR64:$src)>;
   def : Pat<(add GR8:$src, -1), (DEC8r GR8:$src)>;
   def : Pat<(add GR16:$src, -1), (DEC16r GR16:$src)>;
   def : Pat<(add GR32:$src, -1), (DEC32r GR32:$src)>;
   def : Pat<(add GR64:$src, -1), (DEC64r GR64:$src)>;
@@ 108 unchanged lines not shown @@

lib/Target/X86/X86InstrInfo.td

@@ 902 unchanged lines not shown @@

 // We could compute these on a per-module basis but doing so requires accessing
 // the Function object through the <Target>Subtarget and objections were raised
 // to that (see post-commit review comments for r301750).
 let RecomputePerFunction = 1 in {
   def OptForSize : Predicate<"MF->getFunction()->optForSize()">;
   def OptForMinSize : Predicate<"MF->getFunction()->optForMinSize()">;
   def OptForSpeed : Predicate<"!MF->getFunction()->optForSize()">;
+  def NotSlowIncDec_Or_OptForSize : Predicate<"!Subtarget->slowIncDec() || "
+                                              "MF->getFunction()->optForSize()">;
 }

 def FastBTMem : Predicate<"!Subtarget->isBTMemSlow()">;
 def CallImmAddr : Predicate<"Subtarget->isLegalToCallImmediateAddr()">;
 def FavorMemIndirectCall : Predicate<"!Subtarget->callRegIndirect()">;
-def NotSlowIncDec : Predicate<"!Subtarget->slowIncDec()">;
 def HasFastMem32 : Predicate<"!Subtarget->isUnalignedMem32Slow()">;
 def HasFastLZCNT : Predicate<"Subtarget->hasFastLZCNT()">;
 def HasFastSHLDRotate : Predicate<"Subtarget->hasFastSHLDRotate()">;
 def HasERMSB : Predicate<"Subtarget->hasERMSB()">;
 def HasMFence : Predicate<"Subtarget->hasMFence()">;

 //===----------------------------------------------------------------------===//
 // X86 Instruction Format Definitions.
@@ 2,392 unchanged lines not shown @@

test/CodeGen/X86/slow-incdec.ll

+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=i386-unknown-linux-gnu -mattr=-slow-incdec < %s | FileCheck -check-prefix=INCDEC %s
 ; RUN: llc -mtriple=i386-unknown-linux-gnu -mattr=+slow-incdec < %s | FileCheck -check-prefix=ADD %s

-; check -mattr=-slow-incdec
-; INCDEC-NOT: addl $-1
-; INCDEC: dec
-; INCDEC-NOT: addl $1
-; INCDEC: inc
-
-; check -mattr=+slow-incdec
-; ADD: addl $-1
-; ADD-NOT: dec
-; ADD: addl $1
-; ADD-NOT: inc
-
-; Function Attrs: nounwind readonly
-define i32 @slow_1(i32* nocapture readonly %a, i32 %s) #0 {
-entry:
-  %cmp5 = icmp eq i32 %s, 0
-  br i1 %cmp5, label %for.end, label %for.body.preheader
-
-for.body.preheader: ; preds = %entry
-  br label %for.body
-
-for.cond: ; preds = %for.body
-  %cmp = icmp eq i32 %dec, 0
-  br i1 %cmp, label %for.end.loopexit, label %for.body
-
-for.body: ; preds = %for.body.preheader, %for.cond
-  %i.06 = phi i32 [ %dec, %for.cond ], [ %s, %for.body.preheader ]
-  %arrayidx = getelementptr inbounds i32, i32* %a, i32 %i.06
-  %0 = load i32, i32* %arrayidx, align 4, !tbaa !1
-  %cmp1 = icmp eq i32 %0, 0
+define i32 @inc(i32 %x) {
+; INCDEC-LABEL: inc:
+; INCDEC: # BB#0:
+; INCDEC-NEXT: movl {{[0-9]+}}(%esp), %eax
+; INCDEC-NEXT: incl %eax
+; INCDEC-NEXT: retl
 ;
-  %dec = add nsw i32 %i.06, -1
-  br i1 %cmp1, label %for.end.loopexit, label %for.cond
-
-for.end.loopexit: ; preds = %for.cond, %for.body
-  %i.0.lcssa.ph = phi i32 [ 0, %for.cond ], [ %i.06, %for.body ]
-  br label %for.end
-
-for.end: ; preds = %for.end.loopexit, %entry
-  %i.0.lcssa = phi i32 [ 0, %entry ], [ %i.0.lcssa.ph, %for.end.loopexit ]
-  ret i32 %i.0.lcssa
+; ADD-LABEL: inc:
+; ADD: # BB#0:
+; ADD-NEXT: movl {{[0-9]+}}(%esp), %eax
+; ADD-NEXT: addl $1, %eax
+; ADD-NEXT: retl
+  %r = add i32 %x, 1
+  ret i32 %r
 }

-; Function Attrs: nounwind readonly
-define i32 @slow_2(i32* nocapture readonly %a, i32 %s) #0 {
-entry:
-  %cmp5 = icmp eq i32 %s, 0
-  br i1 %cmp5, label %for.end, label %for.body.preheader
-
-for.body.preheader: ; preds = %entry
-  br label %for.body
-
-for.cond: ; preds = %for.body
-  %cmp = icmp eq i32 %inc, 0
-  br i1 %cmp, label %for.end.loopexit, label %for.body
-
-for.body: ; preds = %for.body.preheader, %for.cond
-  %i.06 = phi i32 [ %inc, %for.cond ], [ %s, %for.body.preheader ]
-  %arrayidx = getelementptr inbounds i32, i32* %a, i32 %i.06
-  %0 = load i32, i32* %arrayidx, align 4, !tbaa !1
-  %cmp1 = icmp eq i32 %0, 0
-  %inc = add nsw i32 %i.06, 1
-  br i1 %cmp1, label %for.end.loopexit, label %for.cond
-
-for.end.loopexit: ; preds = %for.cond, %for.body
-  %i.0.lcssa.ph = phi i32 [ 0, %for.cond ], [ %i.06, %for.body ]
-  br label %for.end
+define i32 @dec(i32 %x) {
+; INCDEC-LABEL: dec:
+; INCDEC: # BB#0:
+; INCDEC-NEXT: movl {{[0-9]+}}(%esp), %eax
+; INCDEC-NEXT: decl %eax
+; INCDEC-NEXT: retl
+;
+; ADD-LABEL: dec:
+; ADD: # BB#0:
+; ADD-NEXT: movl {{[0-9]+}}(%esp), %eax
+; ADD-NEXT: addl $-1, %eax
+; ADD-NEXT: retl
+  %r = add i32 %x, -1
+  ret i32 %r
+}

-for.end: ; preds = %for.end.loopexit, %entry
-  %i.0.lcssa = phi i32 [ 0, %entry ], [ %i.0.lcssa.ph, %for.end.loopexit ]
-  ret i32 %i.0.lcssa
+define i32 @inc_size(i32 %x) optsize {
+; INCDEC-LABEL: inc_size:
+; INCDEC: # BB#0:
+; INCDEC-NEXT: movl {{[0-9]+}}(%esp), %eax
+; INCDEC-NEXT: incl %eax
+; INCDEC-NEXT: retl
+;
+; ADD-LABEL: inc_size:
+; ADD: # BB#0:
+; ADD-NEXT: movl {{[0-9]+}}(%esp), %eax
+; ADD-NEXT: incl %eax
+; ADD-NEXT: retl
+  %r = add i32 %x, 1
+  ret i32 %r
 }

-!1 = !{!2, !2, i64 0}
-!2 = !{!"int", !3, i64 0}
-!3 = !{!"omnipotent char", !4, i64 0}
-!4 = !{!"Simple C/C++ TBAA"}
+define i32 @dec_size(i32 %x) optsize {
+; INCDEC-LABEL: dec_size:
+; INCDEC: # BB#0:
+; INCDEC-NEXT: movl {{[0-9]+}}(%esp), %eax
+; INCDEC-NEXT: decl %eax
+; INCDEC-NEXT: retl
+;
+; ADD-LABEL: dec_size:
+; ADD: # BB#0:
+; ADD-NEXT: movl {{[0-9]+}}(%esp), %eax
+; ADD-NEXT: decl %eax
+; ADD-NEXT: retl
+  %r = add i32 %x, -1
+  ret i32 %r
+}