
[DAGCombine][X86][AMDGPU][AArch64] (srl (shl x, c1), c2) with c1 != c2 handling
Changes Planned · Public

Authored by lebedev.ri on May 18 2019, 6:24 AM.

Details

Summary

https://rise4fun.com/Alive/6bVL

The AArch64 change is clearly a regression and will need a separate fix.
The AMDGPU change looks bad either way: bfe is not used.
The X86 changes look neutral-to-positive, except the vector cases, which are clearly regressions.
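For reference, a minimal sketch of the identity behind the combine (the Alive link above is the authoritative proof; the function names here are illustrative, not from the patch): a shift pair with unequal constants is equivalent to a single net shift by |c1 - c2| followed by a mask that clears the top c2 bits.

```cpp
#include <cassert>
#include <cstdint>

// (x << c1) >> c2 as the target sees it after the combine is *not* applied.
uint32_t shiftPair(uint32_t x, unsigned c1, unsigned c2) {
  return (x << c1) >> c2;
}

// Equivalent shift+mask form, assuming 0 <= c1, c2 < 32:
// shift x by the net amount |c1 - c2| in the appropriate direction,
// then clear the top c2 bits, which the srl would have zeroed.
uint32_t shiftPlusMask(uint32_t x, unsigned c1, unsigned c2) {
  uint32_t mask = UINT32_MAX >> c2; // keeps the low 32 - c2 bits
  uint32_t shifted = (c1 >= c2) ? (x << (c1 - c2))  // net left shift
                                : (x >> (c2 - c1)); // net right shift
  return shifted & mask;
}
```

Which form is cheaper is exactly the per-target question this review is debating: two shifts need no immediate materialization, while shift+mask can fold the mask into a bitfield-extract on some targets.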

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision.May 18 2019, 6:24 AM
lebedev.ri marked 2 inline comments as done.May 18 2019, 3:18 PM
lebedev.ri added subscribers: arsenm, compnerd.

Looked at changes:

  • I'll leave the x86 vector cases for later, since I actually wanted to look into the reverse transform and only looked at this for consistency.
  • I don't know what to do with the AArch64 regression. I can hide it with shouldFoldConstantShiftPairToMask(), but it is there regardless (tests added). Thoughts?
  • That leaves AMDGPU?
test/CodeGen/AArch64/arm64-bitfield-extract.ll
396

After actually looking at -debug output, this regression happens because SimplifyDemandedBits()
ignores the AArch64TargetLowering::isDesirableToCommuteWithShift() override.
So when we get to AArch64TargetLowering::isBitfieldExtractOp(), we have
So when we get to AArch64TargetLowering::isBitfieldExtractOp(), we have

t18: i32 = srl t15, Constant:i64<2>
  t15: i32 = or t26, t24
    t26: i32 = and t7, Constant:i32<1073741816>
      t7: i32,ch = load<(load 4 from %ir.y, align 8)> t0, t2, undef:i64
        t2: i64,ch = CopyFromReg t0, Register:i64 %0
          t1: i64 = Register %0
        t6: i64 = undef
      t25: i32 = Constant<1073741816>
    t24: i32 = and t12, Constant:i32<4>
      t12: i32 = srl t4, Constant:i64<16>
        t4: i32,ch = CopyFromReg t0, Register:i32 %1
          t3: i32 = Register %1
        t11: i64 = Constant<16>
      t23: i32 = Constant<4>
  t17: i64 = Constant<2>

I'm not sure how to turn this pattern on its head to produce ubfx again.
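For context, a hedged sketch of what the lost instruction computes: AArch64's UBFX (unsigned bitfield extract) yields the `width` bits starting at bit `lsb`, i.e. a right shift followed by a mask. The matcher looks for the `(and (srl x, lsb), mask)` shape, which the srl-of-or-of-ands DAG above no longer has. The helper name is illustrative, not from LLVM.

```cpp
#include <cassert>
#include <cstdint>

// What AArch64 UBFX Rd, Rn, #lsb, #width computes, assuming
// lsb + width <= 32 and width < 32.
uint32_t ubfx(uint32_t x, unsigned lsb, unsigned width) {
  return (x >> lsb) & ((1u << width) - 1u);
}
```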

test/CodeGen/AMDGPU/llvm.amdgcn.ubfe.ll
686–687

@arsenm will AMDGPU prefer 2 shifts or shift+mask here?

arsenm added inline comments.May 18 2019, 3:28 PM
test/CodeGen/AMDGPU/llvm.amdgcn.ubfe.ll
686–687

In this particular case, they're the same. In general, 2 shifts is probably better: the mask value is less likely to be an inline immediate. It seems like we have a BFE matching problem, though.

686–687

For 64-bit, shift and mask would be better

Looks like AMDGPU changes are neutral too.
And now that I think about it, the AArch64 regression should be solvable (hidable) by an inverse transform.
Should I look into that before or after this patch?

RKSimon added inline comments.Jun 26 2019, 11:34 AM
test/CodeGen/AArch64/arm64-bitfield-extract.ll
396

@lebedev.ri Which SimplifyDemandedBits transform is missing this?

xbolva00 added inline comments.
test/CodeGen/X86/rotate-extract-vector.ll
168

@RKSimon is this better?

lebedev.ri marked 2 inline comments as done.Jun 26 2019, 11:58 AM
lebedev.ri added inline comments.
test/CodeGen/AArch64/arm64-bitfield-extract.ll
396

The other way around: some SimplifyDemandedBits transform produces this, and the rest of the pipeline is unable to recover.

RKSimon added inline comments.Jun 27 2019, 3:40 AM
test/CodeGen/X86/rotate-extract-vector.ll
168

Unlikely - a pair of shifts by uniform constants is almost certainly better

lebedev.ri planned changes to this revision.Jun 27 2019, 3:53 AM
lebedev.ri marked an inline comment as done.
lebedev.ri marked an inline comment as done and an inline comment as not done.Jul 18 2019, 8:56 AM
lebedev.ri added inline comments.
test/CodeGen/AArch64/arm64-bitfield-extract.ll
396

Still don't have any ideas on how to approach this.

test/CodeGen/AMDGPU/llvm.amdgcn.ubfe.ll
686–687
bool AMDGPUTargetLowering::shouldFoldConstantShiftPairToMask(
    const SDNode *N, CombineLevel Level) const {
  EVT VT = N->getValueType(0);
  return VT.isScalarInteger() && VT.getScalarSizeInBits() == 64;
}

results in many AMDGPU test regressions. I can submit a patch, but I thought I'd double-check first: is that what you meant?

test/CodeGen/X86/rotate-extract-vector.ll
168

Hmm, could you be more specific please?
For vectors, do we want to keep two shifts even if shift amounts are equal, or only if shift amounts are unequal?
Also, what does "almost certainly better" mean given that we have -mattr=+fast-vector-shift-masks?