
[x86] [DAGCombine] Prefer shifts of constant widths.
Needs ReviewPublic

Authored by jlebar on Feb 6 2020, 1:56 PM.

Details

Summary

Commute shift and select in the following pattern:

shift lhs, (select cond, constant1, constant2) -->
select cond, (shift lhs, constant1), (shift lhs, constant2)

This is beneficial on x86, where shifting by an immediate is faster than
shifting by a register.

Canonical example:

return x << (cond ? 3 : 6);

before this patch

mov     eax, edi
xor     ecx, ecx
test    esi, esi
sete    cl
lea     ecx, [rcx + 2*rcx]
add     ecx, 3
shl     eax, cl
ret

after this patch

lea     eax, [8*rdi]
shl     edi, 6
test    esi, esi
cmove   eax, edi
ret
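At the C level, the before/after asm above corresponds to the following pair of equivalent functions (a minimal sketch; the constants 3 and 6 and the function names are taken from or invented for this illustration):

```c
#include <stdint.h>

/* Before the fold: materialize the shift amount, then one shift-by-register. */
static uint32_t shl_select_amount(uint32_t x, int cond) {
    uint32_t amt = cond ? 3u : 6u; /* select cond, c1, c2 */
    return x << amt;               /* shl eax, cl */
}

/* After the fold: two shifts-by-immediate, then select the result (cmov). */
static uint32_t shl_select_result(uint32_t x, int cond) {
    uint32_t a = x << 3; /* lea eax, [8*rdi] */
    uint32_t b = x << 6; /* shl edi, 6 */
    return cond ? a : b; /* cmove eax, edi */
}
```

The fold trades one speculated shift for the chance to use immediate-operand shifts on both paths.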

I enabled this folding only on x86. By my reading of the ARM Cortex-A75
optimization guide, this is not beneficial there. (I didn't check other ARM
processors.) I was unable to find a PPC optimization guide that listed
instruction latencies, so I didn't enable it there.

Diff Detail

Event Timeline

jlebar created this revision. Feb 6 2020, 1:56 PM
RKSimon added a subscriber: RKSimon.

Is there any canonicalization happening in InstCombine for this?

Vector tests? SSE shifts by uniform constants are a lot better than the alternatives.

Would it be better to make this more generic from the start? Some mechanism that allows targets to push selects up/down the DAG.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
501

"but not FHL or FSHR"? Why shouldn't funnels shifts be included?

7395

isConstantIntBuildVectorOrConstantInt ?

llvm/test/CodeGen/X86/select.ll
1123 ↗(On Diff #242998)

Regression

jlebar marked an inline comment as done. Feb 7 2020, 9:03 AM

Hi, thank you for looking at this!

Is there any canonicalization happening in InstCombine for this?

Yes, it canonicalizes to shift x, (select ...), https://gcc.godbolt.org/z/AtSRrW.

Vector tests? SSE shifts by uniform constants are a lot better than the alternatives.

Vector shift by a uniform value is better than shift by a vector of values, but I don't think that's what this transformation allows. (I recall seeing something in the x86 lowering code for what I think you're describing.) Do you see a CPU on which vector shift by an *immediate* is so much faster than vector shift by a register that we'd want to do two shifts by immediates instead of one by a register? I'm not seeing that as I look through the Agner table for Skylake, but I may be missing something.

TODO: Once we resolve this, I need to handle and test vector shifts somehow, if only by excluding them from this transformation.

Would it be better to make this more generic from the start? Some mechanism that allows targets to push selects up/down the DAG.

Eh, YAGNI? I looked for other opportunities to apply this optimization (i.e. where it's worth "speculating" an instruction so that one of the operands gets to be an immediate) and didn't really find any. Do you?

"but not FHL or FSHR"? Why shouldn't funnels shifts be included?

Is there a target on which we would do this transformation? If not I'd rather not write code that is never exercised (and it's more code because funnel shift takes three rather than two args).
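For context on the three-argument point: a funnel shift conceptually concatenates two operands and shifts the pair. A scalar C model of a 32-bit funnel shift left (a sketch; the helper name is invented, and the wrap-around handling follows the usual modulo-width definition):

```c
#include <stdint.h>

/* Model of a 32-bit funnel shift left: conceptually form the 64-bit
   value a:b, shift left by amt mod 32, and keep the high 32 bits.
   The extra operand is what makes FSHL/FSHR more work to handle than
   a plain shift. */
static uint32_t fshl32(uint32_t a, uint32_t b, uint32_t amt) {
    amt &= 31;
    if (amt == 0)
        return a; /* avoid the out-of-range b >> 32 */
    return (a << amt) | (b >> (32 - amt));
}
```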

llvm/test/CodeGen/X86/select.ll
1123 ↗(On Diff #242998)

Wow. What...is...this...architecture.

I guess the answer is, we don't do this optimization if the target doesn't have cmov?

I mean, I'll do it, but are we sure the complexity is worth it for this target? I can't even find an Agner optimization table for Intel MCU (Quark?).

craig.topper added inline comments.
llvm/test/CodeGen/X86/select.ll
1123 ↗(On Diff #242998)

MCU is basically a 486 on a much more modern silicon process.

RKSimon added inline comments. Feb 7 2020, 9:58 AM
llvm/test/CodeGen/X86/select.ll
1123 ↗(On Diff #242998)

It's not just the MCU case - all of these targets' codegen looks worse tbh

jlebar marked an inline comment as done. Feb 7 2020, 10:39 AM
jlebar added inline comments.
llvm/test/CodeGen/X86/select.ll
1123 ↗(On Diff #242998)

It does look worse. I wanted to dig in more...

I somewhat arbitrarily tried Core2, Westmere, and Haswell with the x86-32 and x86-64 code.

  • *x86-32* llc < testcases/shift-regression.ll -mtriple=i386-apple-darwin10 --x86-asm-syntax=intel -mcpu=athlon | llvm-mca -mcpu=<arch>
  • *x86-64* llc < testcases/shift-regression.ll -mtriple=amd64-apple-darwin10 --x86-asm-syntax=intel | llvm-mca -mcpu=<arch>

I'm reporting "total cycles".

  • Core2 x86-32: HEAD 211 / patched 304
  • Core2 x86-64: HEAD 207 / patched 204
  • Westmere x86-32: HEAD 211 / patched 304
  • Westmere x86-64: HEAD 207 / patched 204
  • Haswell x86-32: HEAD 310 / patched 309
  • Haswell x86-64: HEAD 210 / patched 210

IOW, for these CPUs it's a regression only in the x86-32 code and only on the older CPUs.

One thing we could do is limit this transformation to x86-64? Or we could do more investigation, in which case I'm curious to know what kind of evidence you'd need to be comfortable with it.

jlebar updated this revision to Diff 243634. Feb 10 2020, 10:55 AM
jlebar marked an inline comment as done and an inline comment as not done.

Fix regressions and add vector test.

jlebar added inline comments. Feb 10 2020, 10:55 AM
llvm/test/CodeGen/X86/select.ll
1123 ↗(On Diff #242998)

After thinking about this more...

The reason this testcase is sort of special is that the two shift amounts are N and N+1, so it's very easy to convert from the boolean into the shift amount.

In a sense, what's happening here is not shift x, (select ...), but rather shift x, (add ...). The bug is that I'm transforming the latter, when I only want to touch the former.
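A sketch of why adjacent shift amounts are the odd case (the amounts 4 and 5 are hypothetical, chosen only to illustrate the N/N+1 shape): the select of two constants differing by 1 collapses into an add of the zero-extended condition bit, so by the time the combine runs, the DAG can see an add rather than a select:

```c
#include <stdint.h>

/* Select form: shift amount chosen between N+1 and N. */
static uint32_t shl_select(uint32_t x, int cond) {
    return x << (cond ? 5 : 4);
}

/* Equivalent add form: the condition bit is zero-extended and added to N.
   This is the shape the transformation should leave alone. */
static uint32_t shl_add(uint32_t x, int cond) {
    return x << (4u + (uint32_t)(cond != 0));
}
```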

I fixed the regression by restricting this optimization to run only later in the SelectionDAG pipeline, by which point the select has already been folded into an add. Surprisingly, my change now doesn't affect *any* existing x86 codegen tests other than the ones I added. That makes me a little suspicious that I've done something wrong, so I'd appreciate a careful review.

I've also now excluded vector types from this transformation, taking care of the TODO from earlier, and added tests.

Have you found any other targets that would benefit from this? Otherwise we might consider putting it inside X86ISelLowering initially?

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
501

OK, but maybe change this to a TODO for FSHL/FSHR ?

7382

(style) Don't use auto.

7515

Move these lower down to after SimplifyDemandedBits, we tend to have the constant folding / cleanup combines first.

llvm/lib/Target/X86/X86ISelLowering.cpp
46546

(style) Don't use auto.

llvm/test/CodeGen/X86/dagcombine-shifts.ll
225

Please commit these tests with current codegen, then rebase the patch to show the codegen diff.

arsenm added a subscriber: arsenm. Feb 18 2020, 12:49 PM
arsenm added inline comments.
llvm/include/llvm/CodeGen/TargetLowering.h
3464–3468

I think we generally have too many overly specific queries like this. Is there any real reason NOT to do this on any target?

Thank you for the comments!

Have you found any other targets that would benefit from this? Otherwise we might consider put it inside X86ISelLowering initially?

Good question. @arsenm says on Discord he'd want this on AMDGPU for 64-bit shifts:

It's indirectly better for AMDGPU since if it's a 64-bit shift we can reduce it to 32-bit shifts
i.e. https://reviews.llvm.org/rG85508595350e2de0218c15f1c0088cd9f6236894
I would probably want that on, maybe not at -Oz
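The 64-bit reduction arsenm mentions can be sketched in scalar C: once the shift amount is a known constant of at least 32, the wide shift depends only on the low word, and one 32-bit shift suffices (function name and the [32, 64) precondition are illustrative):

```c
#include <stdint.h>

/* For amt in [32, 64), x << amt depends only on the low 32 bits of x:
   the result is (lo32(x) << (amt - 32)) placed in the high word. */
static uint64_t shl64_amt_ge32(uint64_t x, unsigned amt) {
    uint32_t lo = (uint32_t)x;
    uint32_t hi = lo << (amt - 32); /* the single 32-bit shift */
    return (uint64_t)hi << 32;
}
```

Commuting the select past the shift exposes constant amounts like this, which is why the fold is indirectly profitable there.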

jlebar marked an inline comment as done. Feb 18 2020, 1:04 PM
jlebar added inline comments.
llvm/include/llvm/CodeGen/TargetLowering.h
3464–3468

I am also not thrilled about adding another query like this that pretends to be general but really means "should I do this particular peephole optimization?"

I gated it on the architecture because ISTM that this transformation could be a pessimization on platforms where shift-by-register is as fast as shift-by-immediate. We're replacing 1 x shift-by-register + 1 x select with 2 x shift-by-immediate + 1 x select. AFAICT e.g. on Cortex-A75 this would not be an improvement.

arsenm added inline comments. Feb 18 2020, 1:18 PM
llvm/include/llvm/CodeGen/TargetLowering.h
3464–3468

GlobalISel avoids this problem by having every combine be an explicit opt-in, if I can encourage you to also do this there.

jlebar marked an inline comment as done. Feb 18 2020, 1:26 PM
jlebar added inline comments.
llvm/include/llvm/CodeGen/TargetLowering.h
3464–3468

Happy to consider it, but I'm not sure what the suggestion is exactly. (Or at least, this seems like we're making the combine be an explicit opt-in, so I don't see how it's different from your suggestion.) Can you point me to the prior art as used in non-globalisel?

arsenm added inline comments. Feb 18 2020, 7:48 PM
llvm/include/llvm/CodeGen/TargetLowering.h
3464–3468

I mean, SelectionDAG is bad, and I'm just generally disappointed that there isn't more work going on for GlobalISel optimizations; everyone seems to still be working on SelectionDAG.

lebedev.ri added inline comments.
llvm/include/llvm/CodeGen/TargetLowering.h
3464–3468

From personal experience: I only look at SelectionDAG because that is what is on the current (codegen) path.
Doing stuff for GISel would currently result in no benefit to codegen on X86, which kinda misses the point.
Is there an actual plan for the X86 target to migrate to GISel?

spatel added inline comments. Feb 24 2020, 8:41 AM
llvm/include/llvm/CodeGen/TargetLowering.h
3464–3468

<crickets chirping...>

Speaking only for myself here: migrating to GISel is a daunting task for x86 because there's so much code that has to be adapted, and I have not seen a list or estimate of the potential benefits to incentivize the move.

There are over 100 tests in llvm/test/CodeGen/X86/GlobalISel, so some work has been done. But whether that is planned to continue or not, I have no idea.

jlebar updated this revision to Diff 247514. Sun, Mar 1, 12:38 PM
jlebar marked 9 inline comments as done.

Update per comments.

@RKSimon and others, thank you for the review and comments. Sorry for my delay here; this has changed from being my day job to my weekend hobby, and that made a bigger difference in my responsiveness than I'd like or than I expected.

I believe I've addressed all of the comments. I haven't pushed the tests sans this change just on the principle that I don't want to add tests that might end up being useless (if this patch doesn't land for some reason), but I've rebased this patch atop a commit which adds those tests. So when I commit this it will be as two patches, one to add the tests as-is, and then another to update the test with this patch.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7515

Done. This also allowed me to remove the Level >= AfterLegalizeTypes check, which I never liked.