This is an archive of the discontinued LLVM Phabricator instance.

[DAG] Fold (shl (srl x, c), c) -> and(x, m) even if srl has other uses
Closed · Public

Authored by RKSimon on May 14 2022, 6:12 AM.


Details

Reviewers

foad
dmgreen
craig.topper
uweigand

Commits

rGd40b7f0d5aec: [DAG] Fold (shl (srl x, c), c) -> and(x, m) even if srl has other uses

Summary

If we're using a shift pair to mask, relax the one-use limit when the shift amounts are equal: we'll only be generating a single AND node.
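When the two shift amounts are equal, the pair only clears the low c bits, so it is equivalent to a single AND with ~0 << c. Here is a minimal sketch in plain C++ (not LLVM code; the function names are illustrative) that checks this identity for every 32-bit shift amount over a handful of sample inputs:

```cpp
#include <cassert>
#include <cstdint>

// (x >> c) << c: the shift pair the combine matches (logical shifts on unsigned).
uint32_t shiftPair(uint32_t x, unsigned c) { return (x >> c) << c; }

// and(x, m) with m = ~0 << c: the single AND the combine produces instead.
uint32_t maskForm(uint32_t x, unsigned c) { return x & (UINT32_MAX << c); }

// Check the identity for all in-range shift amounts over a sample of inputs.
bool shiftPairIsMask() {
  const uint32_t samples[] = {0u, 1u, 0x80000000u, 0xFFFFFFFFu, 0x12345678u, 0xDEADBEEFu};
  for (unsigned c = 0; c < 32; ++c)
    for (uint32_t x : samples)
      if (shiftPair(x, c) != maskForm(x, c))
        return false;
  return true;
}
```

Because the AND reads x directly, the original x >> c can keep serving its other users. The rewrite then swaps the shl for an AND one-for-one instead of adding a new shift, which is what would happen with unequal amounts.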

AArch64 has a couple of regressions due to this, so I've enforced the existing one-use limit inside an AArch64TargetLowering::shouldFoldConstantShiftPairToMask callback.
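The gating after this patch can be sketched with toy stand-ins for SDNode and TargetLowering. ShiftInfo, TargetModel, AArch64LikeModel and combineFires are hypothetical names, and the shift-amount comparison is modelled as plain integer equality rather than SDValue identity. The generic combine accepts a multi-use srl when the amounts match, while the AArch64-style override reinstates the one-use requirement:

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-ins; NOT LLVM's real classes.
struct ShiftInfo {
  unsigned amount;   // constant shift amount of the inner srl
  unsigned numUses;  // number of users of the inner srl
};

struct TargetModel {
  virtual ~TargetModel() = default;
  // Default hook: allow the shift pair -> mask fold.
  virtual bool shouldFoldConstantShiftPairToMask(const ShiftInfo &inner) const {
    (void)inner;
    return true;
  }
};

// AArch64-style override: keep the old one-use restriction on the inner shift.
struct AArch64LikeModel : TargetModel {
  bool shouldFoldConstantShiftPairToMask(const ShiftInfo &inner) const override {
    return inner.numUses == 1;
  }
};

// Gate for (shl (srl x, c1), c2): equal amounts OR a single-use srl,
// and the target hook must agree.
bool combineFires(const ShiftInfo &srl, unsigned shlAmount, const TargetModel &tli) {
  bool sameAmount = srl.amount == shlAmount;
  return (sameAmount || srl.numUses == 1) &&
         tli.shouldFoldConstantShiftPairToMask(srl);
}
```

In the real code the hook receives the outer node and level, and the AArch64 override inspects N->getOperand(0)->hasOneUse(). The model passes the inner shift directly only to keep the sketch short.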

Part of the work to fix the regressions in D77804

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

RKSimon created this revision. May 14 2022, 6:12 AM

Herald added a project: Restricted Project. May 14 2022, 6:12 AM

Herald added subscribers: kosarev, StephenFan, frasercrmck and 25 others.

RKSimon requested review of this revision. May 14 2022, 6:12 AM

Herald added a project: Restricted Project. May 14 2022, 6:12 AM

Herald added subscribers: pcwang-thead, MaskRay.

Harbormaster completed remote builds in B164449: Diff 429444. May 14 2022, 6:58 AM

RKSimon retitled this revision from "[DAG] Fold (shl (srl x, c), c) -> and(x, m) even if x has other uses" to "[DAG] Fold (shl (srl x, c), c) -> and(x, m) even if srl has other uses". May 14 2022, 9:02 AM

RISC-V changes LGTM.

SystemZ part LGTM.

@foad @dmgreen Any objections to the AMDGPU / ARM changes?

Do you know what makes the AArch64::shouldFoldConstantShiftPairToMask necessary? Is it just something about those tests, or something fundamental to the architecture?

In D125607#3518370, @RKSimon wrote:

@foad @dmgreen Any objections to the AMDGPU / ARM changes?

AMDGPU looks OK to me. Using a literal 0xffff0000 operand increases code size, but on the other hand some sequences have one fewer instruction, which is nice.
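The code-size point comes from how GCN encodes immediates. This is a simplified sketch. It assumes the integer inline-constant range of -16 to 64 and a 4-byte base encoding for SOP2/VOP2, and it ignores floating-point inline constants and the VOP3 and SDWA forms. Under those assumptions, the shift amount 16 fits inline, while 0xffff0000 (-65536 as a signed 32-bit value) needs a trailing literal dword. The function names are illustrative:

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of GCN integer inline constants: values in [-16, 64] are
// encoded in the instruction word itself; anything else needs an extra
// 32-bit literal dword. Float inline constants are deliberately ignored.
bool isInlineIntImm(int32_t v) { return v >= -16 && v <= 64; }

// Bytes for a 32-bit-encoded SOP2/VOP2 instruction with one immediate
// operand, under the model above.
unsigned encodedBytes(int32_t imm) { return isInlineIntImm(imm) ? 4u : 8u; }
```

Under this model, s_and_b32 with 0xffff0000 costs 8 bytes versus 4 for s_lshr_b32 by 16. That is the trade-off being weighed against the occasional saved instruction.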

In D125607#3518372, @dmgreen wrote:

Do you know what makes the AArch64::shouldFoldConstantShiftPairToMask necessary? Is it just something about those tests, or something fundamental to the architecture?

There were a number of test regressions: some of the UXTB matching is affected again (I think this was mainly in a followup patch that I'm working on for an equivalent (srl (shl x, c1), c2) fold) and some load multiple / calling convention tests were messed up.

Shall I update the patch showing the regressions for comparison?

I'd be happy to address these in a followup patch which will relax/remove the AArch64TargetLowering::shouldFoldConstantShiftPairToMask limit again; I'm currently thinking/hoping that using this callback in ARM/AArch64 will help with the UXTB regressions I keep hitting.

I think AArch64 should be able to do the And efficiently in most cases. There is an equivalent InstCombine fold that doesn't check one-use, so from that perspective this only alters patterns that first appear during DAG combining. I see two win vararg tests changing, with code that comes from AArch64TargetLowering::LowerWindowsDYNAMIC_STACKALLOC.

It only seems better because the use happens to be a sub instruction that can fold in the shift for "free". I'm not sure that is a great reason, from an architecture perspective, to block the transform (unlike Thumb1, where the And cannot easily be done, as in D55630). But perhaps the shifted-operand instruction is a good enough justification.

I'm not a huge fan of the override for AArch64, but as this is unlikely to come up from IR, it sounds like it should be fine. It is unlikely to be worth trying to implement something more precise for this particular case.

So LGTM.

This revision is now accepted and ready to land. May 17 2022, 5:00 AM

Closed by commit rGd40b7f0d5aec: [DAG] Fold (shl (srl x, c), c) -> and(x, m) even if srl has other uses (authored by RKSimon). May 17 2022, 5:40 AM

This revision was automatically updated to reflect the committed changes.

RKSimon added a commit: rGd40b7f0d5aec: [DAG] Fold (shl (srl x, c), c) -> and(x, m) even if srl has other uses.

Revision Contents

Path	Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	4 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.h	4 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	11 lines
llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll	10 lines
llvm/test/CodeGen/AMDGPU/load-lo16.ll	12 lines
llvm/test/CodeGen/AMDGPU/scalar_to_vector.ll	30 lines
llvm/test/CodeGen/ARM/combine-movc-sub.ll	14 lines
llvm/test/CodeGen/RISCV/rvv/extract-subvector.ll	5 lines
llvm/test/CodeGen/RISCV/rvv/insert-subvector.ll	18 lines
llvm/test/CodeGen/RISCV/rvv/legalize-load-sdnode.ll	5 lines
llvm/test/CodeGen/RISCV/rvv/legalize-store-sdnode.ll	5 lines
llvm/test/CodeGen/SystemZ/store_nonbytesized_vecs.ll	9 lines

Diff 430010

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp


if (N0->getFlags().hasExact()) {
return DAG.getNode(N0.getOpcode(), DL, VT, N0.getOperand(0), Diff);		return DAG.getNode(N0.getOpcode(), DL, VT, N0.getOperand(0), Diff);
}		}
}		}

// fold (shl (srl x, c1), c2) -> (and (shl x, (sub c2, c1), MASK) or		// fold (shl (srl x, c1), c2) -> (and (shl x, (sub c2, c1), MASK) or
// (and (srl x, (sub c1, c2), MASK)		// (and (srl x, (sub c1, c2), MASK)
// Only fold this if the inner shift has no other uses -- if it does,		// Only fold this if the inner shift has no other uses -- if it does,
// folding this will increase the total number of instructions.		// folding this will increase the total number of instructions.
// TODO - drop hasOneUse requirement if c1 == c2?		if (N0.getOpcode() == ISD::SRL &&
if (N0.getOpcode() == ISD::SRL && N0.hasOneUse() &&		(N0.getOperand(1) == N1 || N0.hasOneUse()) &&
TLI.shouldFoldConstantShiftPairToMask(N, Level)) {		TLI.shouldFoldConstantShiftPairToMask(N, Level)) {
if (ISD::matchBinaryPredicate(N1, N0.getOperand(1), MatchShiftAmount,		if (ISD::matchBinaryPredicate(N1, N0.getOperand(1), MatchShiftAmount,
/*AllowUndefs*/ false,		/*AllowUndefs*/ false,
/*AllowTypeMismatch*/ true)) {		/*AllowTypeMismatch*/ true)) {
SDValue N01 = DAG.getZExtOrTrunc(N0.getOperand(1), DL, ShiftVT);		SDValue N01 = DAG.getZExtOrTrunc(N0.getOperand(1), DL, ShiftVT);
SDValue Diff = DAG.getNode(ISD::SUB, DL, ShiftVT, N01, N1);		SDValue Diff = DAG.getNode(ISD::SUB, DL, ShiftVT, N01, N1);
SDValue Mask = DAG.getAllOnesConstant(DL, VT);		SDValue Mask = DAG.getAllOnesConstant(DL, VT);
Mask = DAG.getNode(ISD::SHL, DL, VT, Mask, N01);		Mask = DAG.getNode(ISD::SHL, DL, VT, Mask, N01);
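The comment in this hunk also covers unequal amounts. The hunk cuts off after Diff = c1 - c2 and Mask = ~0 << c1, so the remaining steps below are a reconstruction from the comment, not code copied from DAGCombiner.cpp. This sketch assumes 32-bit values and checks both mask forms against the original shift pair for every pair of amounts. The function names are illustrative:

```cpp
#include <cassert>
#include <cstdint>

// Reference: the shift pair as written, (shl (srl x, c1), c2).
uint32_t shiftPairGeneral(uint32_t x, unsigned c1, unsigned c2) {
  return (x >> c1) << c2;
}

// Mask forms following the comment in the hunk:
//   c1 >= c2: (and (srl x, c1 - c2), (~0 << c1) >> (c1 - c2))
//   c2 >  c1: (and (shl x, c2 - c1), ~0 << c2)
uint32_t maskFormGeneral(uint32_t x, unsigned c1, unsigned c2) {
  if (c1 >= c2) {
    unsigned diff = c1 - c2;
    return (x >> diff) & ((UINT32_MAX << c1) >> diff);
  }
  unsigned diff = c2 - c1;
  return (x << diff) & (UINT32_MAX << c2);
}

// Compare both forms for all in-range amounts over a sample of inputs.
bool generalFoldMatches() {
  const uint32_t samples[] = {0u, 1u, 0x80000001u, 0xFFFFFFFFu, 0x0F0F0F0Fu, 0xDEADBEEFu};
  for (unsigned c1 = 0; c1 < 32; ++c1)
    for (unsigned c2 = 0; c2 < 32; ++c2)
      for (uint32_t x : samples)
        if (shiftPairGeneral(x, c1, c2) != maskFormGeneral(x, c1, c2))
          return false;
  return true;
}
```

When c1 == c2, the difference is zero and the first branch collapses to a single AND of x. That is the case this patch now lets through even when the srl has other users.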

llvm/lib/Target/AArch64/AArch64ISelLowering.h

bool generateFMAsInMachineCombiner(EVT VT,
CodeGenOpt::Level OptLevel) const override;		CodeGenOpt::Level OptLevel) const override;

const MCPhysReg *getScratchRegisters(CallingConv::ID CC) const override;		const MCPhysReg *getScratchRegisters(CallingConv::ID CC) const override;

/// Returns false if N is a bit extraction pattern of (X >> C) & Mask.		/// Returns false if N is a bit extraction pattern of (X >> C) & Mask.
bool isDesirableToCommuteWithShift(const SDNode *N,		bool isDesirableToCommuteWithShift(const SDNode *N,
CombineLevel Level) const override;		CombineLevel Level) const override;

		/// Return true if it is profitable to fold a pair of shifts into a mask.
		bool shouldFoldConstantShiftPairToMask(const SDNode *N,
		CombineLevel Level) const override;

/// Returns true if it is beneficial to convert a load of a constant		/// Returns true if it is beneficial to convert a load of a constant
/// to just the constant itself.		/// to just the constant itself.
bool shouldConvertConstantLoadToIntImm(const APInt &Imm,		bool shouldConvertConstantLoadToIntImm(const APInt &Imm,
Type *Ty) const override;		Type *Ty) const override;

/// Return true if EXTRACT_SUBVECTOR is cheap for this result type		/// Return true if EXTRACT_SUBVECTOR is cheap for this result type
/// with this index.		/// with this index.
bool isExtractSubvectorCheap(EVT ResVT, EVT SrcVT,		bool isExtractSubvectorCheap(EVT ResVT, EVT SrcVT,

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp


if (N->getOpcode() == ISD::AND && (VT == MVT::i32 || VT == MVT::i64) &&
if (isMask_64(TruncMask) &&		if (isMask_64(TruncMask) &&
N->getOperand(0).getOpcode() == ISD::SRL &&		N->getOperand(0).getOpcode() == ISD::SRL &&
isa<ConstantSDNode>(N->getOperand(0)->getOperand(1)))		isa<ConstantSDNode>(N->getOperand(0)->getOperand(1)))
return false;		return false;
}		}
return true;		return true;
}		}

		bool AArch64TargetLowering::shouldFoldConstantShiftPairToMask(
		const SDNode *N, CombineLevel Level) const {
		assert(((N->getOpcode() == ISD::SHL &&
		N->getOperand(0).getOpcode() == ISD::SRL) ||
		(N->getOpcode() == ISD::SRL &&
		N->getOperand(0).getOpcode() == ISD::SHL)) &&
		"Expected shift-shift mask");
		// Don't allow multiuse shift folding with the same shift amount.
		return N->getOperand(0)->hasOneUse();
		}

bool AArch64TargetLowering::shouldConvertConstantLoadToIntImm(const APInt &Imm,		bool AArch64TargetLowering::shouldConvertConstantLoadToIntImm(const APInt &Imm,
Type *Ty) const {		Type *Ty) const {
assert(Ty->isIntegerTy());		assert(Ty->isIntegerTy());

unsigned BitSize = Ty->getPrimitiveSizeInBits();		unsigned BitSize = Ty->getPrimitiveSizeInBits();
if (BitSize == 0)		if (BitSize == 0)
return false;		return false;


llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll

	; VI-NEXT: s_endpgm			; VI-NEXT: s_endpgm
	;			;
	; CI-LABEL: s_insertelement_v2i16_0_multi_use_hi_reg:			; CI-LABEL: s_insertelement_v2i16_0_multi_use_hi_reg:
	; CI: ; %bb.0:			; CI: ; %bb.0:
	; CI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0			; CI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
	; CI-NEXT: s_load_dword s4, s[4:5], 0xc			; CI-NEXT: s_load_dword s4, s[4:5], 0xc
	; CI-NEXT: s_waitcnt lgkmcnt(0)			; CI-NEXT: s_waitcnt lgkmcnt(0)
	; CI-NEXT: s_load_dword s2, s[2:3], 0x0			; CI-NEXT: s_load_dword s2, s[2:3], 0x0
	; CI-NEXT: v_mov_b32_e32 v1, s1
	; CI-NEXT: v_mov_b32_e32 v0, s0			; CI-NEXT: v_mov_b32_e32 v0, s0
				; CI-NEXT: v_mov_b32_e32 v1, s1
	; CI-NEXT: s_and_b32 s0, s4, 0xffff			; CI-NEXT: s_and_b32 s0, s4, 0xffff
	; CI-NEXT: s_waitcnt lgkmcnt(0)			; CI-NEXT: s_waitcnt lgkmcnt(0)
	; CI-NEXT: s_lshr_b32 s1, s2, 16			; CI-NEXT: s_and_b32 s1, s2, 0xffff0000
	; CI-NEXT: s_lshl_b32 s2, s1, 16			; CI-NEXT: s_or_b32 s0, s0, s1
	; CI-NEXT: s_or_b32 s0, s0, s2
	; CI-NEXT: v_mov_b32_e32 v2, s0			; CI-NEXT: v_mov_b32_e32 v2, s0
				; CI-NEXT: s_lshr_b32 s2, s2, 16
	; CI-NEXT: flat_store_dword v[0:1], v2			; CI-NEXT: flat_store_dword v[0:1], v2
	; CI-NEXT: ;;#ASMSTART			; CI-NEXT: ;;#ASMSTART
	; CI-NEXT: ; use s1			; CI-NEXT: ; use s2
	; CI-NEXT: ;;#ASMEND			; CI-NEXT: ;;#ASMEND
	; CI-NEXT: s_endpgm			; CI-NEXT: s_endpgm
	%vec = load <2 x i16>, <2 x i16> addrspace(4)* %vec.ptr			%vec = load <2 x i16>, <2 x i16> addrspace(4)* %vec.ptr
	%elt1 = extractelement <2 x i16> %vec, i32 1			%elt1 = extractelement <2 x i16> %vec, i32 1
	%vecins = insertelement <2 x i16> %vec, i16 %elt, i32 0			%vecins = insertelement <2 x i16> %vec, i16 %elt, i32 0
	store <2 x i16> %vecins, <2 x i16> addrspace(1)* %out			store <2 x i16> %vecins, <2 x i16> addrspace(1)* %out
	%use1 = zext i16 %elt1 to i32			%use1 = zext i16 %elt1 to i32
	call void asm sideeffect "; use $0", "s"(i32 %use1) #0			call void asm sideeffect "; use $0", "s"(i32 %use1) #0
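The following sketch shows why this CI output is still correct. It models the old and new scalar sequences with one uint32_t per SGPR; Result, beforePatch and afterPatch are illustrative names, not from the test. The srl result is still needed by the inline asm, so the patch swaps the shl for an AND with a literal rather than removing an instruction:

```cpp
#include <cassert>
#include <cstdint>

struct Result {
  uint32_t stored;  // value written by flat_store_dword
  uint32_t asmUse;  // value passed to the inline asm ("; use sN")
};

// Before: s_lshr_b32 s1, s2, 16 ; s_lshl_b32 s2, s1, 16 ; s_or_b32.
Result beforePatch(uint32_t vec, uint32_t elt) {
  uint32_t hi = vec >> 16;        // also feeds the inline asm
  uint32_t hiInPlace = hi << 16;  // the shl-of-srl pair
  return {(elt & 0xffffu) | hiInPlace, hi};
}

// After: s_and_b32 s1, s2, 0xffff0000 ; s_or_b32 ; s_lshr_b32 s2, s2, 16.
Result afterPatch(uint32_t vec, uint32_t elt) {
  uint32_t hiInPlace = vec & 0xffff0000u;  // the folded mask
  uint32_t hi = vec >> 16;                 // still emitted for the asm use
  return {(elt & 0xffffu) | hiInPlace, hi};
}

// Both sequences must store the same vector and pass the same asm operand.
bool sequencesAgree() {
  const uint32_t vecs[] = {0u, 0xFFFFFFFFu, 0xABCD1234u, 0x0001FFFFu};
  const uint32_t elts[] = {0u, 0xFFFFu, 0x5678u, 0x12345678u};
  for (uint32_t v : vecs)
    for (uint32_t e : elts) {
      Result a = beforePatch(v, e);
      Result b = afterPatch(v, e);
      if (a.stored != b.stored || a.asmUse != b.asmUse)
        return false;
    }
  return true;
}
```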

llvm/test/CodeGen/AMDGPU/load-lo16.ll

	; GFX906-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX906-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX906-NEXT: s_setpc_b64 s[30:31]			; GFX906-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX803-LABEL: load_local_lo_v2i16_reghi_vreg_multi_use_hi:			; GFX803-LABEL: load_local_lo_v2i16_reghi_vreg_multi_use_hi:
	; GFX803: ; %bb.0: ; %entry			; GFX803: ; %bb.0: ; %entry
	; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX803-NEXT: s_mov_b32 m0, -1			; GFX803-NEXT: s_mov_b32 m0, -1
	; GFX803-NEXT: ds_read_u16 v0, v0			; GFX803-NEXT: ds_read_u16 v0, v0
				; GFX803-NEXT: v_and_b32_e32 v2, 0xffff0000, v1
	; GFX803-NEXT: v_lshrrev_b32_e32 v1, 16, v1			; GFX803-NEXT: v_lshrrev_b32_e32 v1, 16, v1
	; GFX803-NEXT: v_mov_b32_e32 v2, 0			; GFX803-NEXT: v_mov_b32_e32 v3, 0
	; GFX803-NEXT: ds_write_b16 v2, v1			; GFX803-NEXT: ds_write_b16 v3, v1
	; GFX803-NEXT: v_lshlrev_b32_e32 v1, 16, v1
	; GFX803-NEXT: s_waitcnt lgkmcnt(1)			; GFX803-NEXT: s_waitcnt lgkmcnt(1)
	; GFX803-NEXT: v_or_b32_e32 v0, v0, v1			; GFX803-NEXT: v_or_b32_e32 v0, v0, v2
	; GFX803-NEXT: flat_store_dword v[0:1], v0			; GFX803-NEXT: flat_store_dword v[0:1], v0
	; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX803-NEXT: s_setpc_b64 s[30:31]			; GFX803-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%load = load i16, i16 addrspace(3)* %in			%load = load i16, i16 addrspace(3)* %in
	%elt1 = extractelement <2 x i16> %reg, i32 1			%elt1 = extractelement <2 x i16> %reg, i32 1
	store i16 %elt1, i16 addrspace(3)* null			store i16 %elt1, i16 addrspace(3)* null
	%build1 = insertelement <2 x i16> %reg, i16 %load, i32 0			%build1 = insertelement <2 x i16> %reg, i16 %load, i32 0
	(30 unchanged lines not shown)
	; GFX906-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX906-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX906-NEXT: s_setpc_b64 s[30:31]			; GFX906-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX803-LABEL: load_local_lo_v2i16_reghi_vreg_multi_use_lohi:			; GFX803-LABEL: load_local_lo_v2i16_reghi_vreg_multi_use_lohi:
	; GFX803: ; %bb.0: ; %entry			; GFX803: ; %bb.0: ; %entry
	; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX803-NEXT: s_mov_b32 m0, -1			; GFX803-NEXT: s_mov_b32 m0, -1
	; GFX803-NEXT: ds_read_u16 v0, v0			; GFX803-NEXT: ds_read_u16 v0, v0
				; GFX803-NEXT: v_and_b32_e32 v4, 0xffff0000, v1
	; GFX803-NEXT: v_lshrrev_b32_e32 v1, 16, v1			; GFX803-NEXT: v_lshrrev_b32_e32 v1, 16, v1
	; GFX803-NEXT: s_waitcnt lgkmcnt(0)			; GFX803-NEXT: s_waitcnt lgkmcnt(0)
	; GFX803-NEXT: ds_write_b16 v2, v0			; GFX803-NEXT: ds_write_b16 v2, v0
	; GFX803-NEXT: ds_write_b16 v3, v1			; GFX803-NEXT: ds_write_b16 v3, v1
	; GFX803-NEXT: v_lshlrev_b32_e32 v1, 16, v1			; GFX803-NEXT: v_or_b32_sdwa v0, v0, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD
	; GFX803-NEXT: v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD
	; GFX803-NEXT: flat_store_dword v[0:1], v0			; GFX803-NEXT: flat_store_dword v[0:1], v0
	; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX803-NEXT: s_setpc_b64 s[30:31]			; GFX803-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%load = load i16, i16 addrspace(3)* %in			%load = load i16, i16 addrspace(3)* %in
	%elt1 = extractelement <2 x i16> %reg, i32 1			%elt1 = extractelement <2 x i16> %reg, i32 1
	store i16 %load, i16 addrspace(3)* %out0			store i16 %load, i16 addrspace(3)* %out0
	store i16 %elt1, i16 addrspace(3)* %out1			store i16 %elt1, i16 addrspace(3)* %out1

llvm/test/CodeGen/AMDGPU/scalar_to_vector.ll

	; VI-NEXT: s_mov_b32 s11, s7			; VI-NEXT: s_mov_b32 s11, s7
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: s_mov_b32 s8, s2			; VI-NEXT: s_mov_b32 s8, s2
	; VI-NEXT: s_mov_b32 s9, s3			; VI-NEXT: s_mov_b32 s9, s3
	; VI-NEXT: buffer_load_dword v0, off, s[8:11], 0			; VI-NEXT: buffer_load_dword v0, off, s[8:11], 0
	; VI-NEXT: s_mov_b32 s4, s0			; VI-NEXT: s_mov_b32 s4, s0
	; VI-NEXT: s_mov_b32 s5, s1			; VI-NEXT: s_mov_b32 s5, s1
	; VI-NEXT: s_waitcnt vmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v0			; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v0
	; VI-NEXT: v_alignbit_b32 v0, v1, v0, 16			; VI-NEXT: v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
	; VI-NEXT: v_mov_b32_e32 v1, v0			; VI-NEXT: v_mov_b32_e32 v1, v0
	; VI-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0			; VI-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0
	; VI-NEXT: s_endpgm			; VI-NEXT: s_endpgm
	%tmp1 = load i32, i32 addrspace(1)* %in, align 4			%tmp1 = load i32, i32 addrspace(1)* %in, align 4
	%bc = bitcast i32 %tmp1 to <2 x i16>			%bc = bitcast i32 %tmp1 to <2 x i16>
	%tmp2 = shufflevector <2 x i16> %bc, <2 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>			%tmp2 = shufflevector <2 x i16> %bc, <2 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
	store <4 x i16> %tmp2, <4 x i16> addrspace(1)* %out, align 8			store <4 x i16> %tmp2, <4 x i16> addrspace(1)* %out, align 8
	ret void			ret void
	(29 unchanged lines not shown)
	; VI-NEXT: s_mov_b32 s11, s7			; VI-NEXT: s_mov_b32 s11, s7
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: s_mov_b32 s8, s2			; VI-NEXT: s_mov_b32 s8, s2
	; VI-NEXT: s_mov_b32 s9, s3			; VI-NEXT: s_mov_b32 s9, s3
	; VI-NEXT: buffer_load_dword v0, off, s[8:11], 0			; VI-NEXT: buffer_load_dword v0, off, s[8:11], 0
	; VI-NEXT: s_mov_b32 s4, s0			; VI-NEXT: s_mov_b32 s4, s0
	; VI-NEXT: s_mov_b32 s5, s1			; VI-NEXT: s_mov_b32 s5, s1
	; VI-NEXT: s_waitcnt vmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v0			; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v0
	; VI-NEXT: v_alignbit_b32 v0, v1, v0, 16			; VI-NEXT: v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
	; VI-NEXT: v_mov_b32_e32 v1, v0			; VI-NEXT: v_mov_b32_e32 v1, v0
	; VI-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0			; VI-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0
	; VI-NEXT: s_endpgm			; VI-NEXT: s_endpgm
	%tmp1 = load float, float addrspace(1)* %in, align 4			%tmp1 = load float, float addrspace(1)* %in, align 4
	%bc = bitcast float %tmp1 to <2 x i16>			%bc = bitcast float %tmp1 to <2 x i16>
	%tmp2 = shufflevector <2 x i16> %bc, <2 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>			%tmp2 = shufflevector <2 x i16> %bc, <2 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
	store <4 x i16> %tmp2, <4 x i16> addrspace(1)* %out, align 8			store <4 x i16> %tmp2, <4 x i16> addrspace(1)* %out, align 8
	ret void			ret void
	}			}
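Both VI sequences above splat the high i16 into both halves of the result. The sketch below models the two instructions using their semantics as I understand them from the GCN ISA documentation, which is an assumption rather than something stated in this review. v_alignbit_b32 is taken to return the low 32 bits of {S0,S1} >> S2[4:0], and an SDWA WORD_1 source without sign extension is taken to select bits 31:16, zero-extended. The helper names are illustrative:

```cpp
#include <cassert>
#include <cstdint>

// Model of v_alignbit_b32 D, S0, S1, S2: low 32 bits of ({S0,S1} >> S2[4:0]).
uint32_t alignbit(uint32_t s0, uint32_t s1, uint32_t s2) {
  uint64_t concat = (static_cast<uint64_t>(s0) << 32) | s1;
  return static_cast<uint32_t>(concat >> (s2 & 31));
}

// Model of an SDWA WORD_1 source select: bits [31:16], zero-extended.
uint32_t selWord1(uint32_t v) { return v >> 16; }

// Old: v_lshrrev_b32 v1, 16, v0 ; v_alignbit_b32 v0, v1, v0, 16
uint32_t splatHiBefore(uint32_t v0) {
  uint32_t v1 = v0 >> 16;
  return alignbit(v1, v0, 16);
}

// New: v_and_b32 v1, 0xffff0000, v0 ; v_or_b32_sdwa v0, v0, v1 (src0 WORD_1)
uint32_t splatHiAfter(uint32_t v0) {
  uint32_t v1 = v0 & 0xffff0000u;
  return selWord1(v0) | v1;
}

bool splatSequencesAgree() {
  const uint32_t samples[] = {0u, 0xFFFFFFFFu, 0xABCD1234u, 0x0000FFFFu, 0xFFFF0000u};
  for (uint32_t v : samples)
    if (splatHiBefore(v) != splatHiAfter(v))
      return false;
  return true;
}
```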

	define amdgpu_kernel void @scalar_to_vector_v4i16() {			define amdgpu_kernel void @scalar_to_vector_v4i16() {
	; SI-LABEL: scalar_to_vector_v4i16:			; SI-LABEL: scalar_to_vector_v4i16:
	; SI: ; %bb.0: ; %bb			; SI: ; %bb.0: ; %bb
	; SI-NEXT: s_mov_b32 s3, 0xf000			; SI-NEXT: s_mov_b32 s3, 0xf000
	; SI-NEXT: s_mov_b32 s2, -1			; SI-NEXT: s_mov_b32 s2, -1
	; SI-NEXT: buffer_load_ubyte v0, off, s[0:3], 0			; SI-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
	; SI-NEXT: s_waitcnt vmcnt(0)			; SI-NEXT: s_waitcnt vmcnt(0)
	; SI-NEXT: v_lshlrev_b32_e32 v1, 8, v0			; SI-NEXT: v_lshlrev_b32_e32 v1, 8, v0
	; SI-NEXT: v_or_b32_e32 v0, v1, v0			; SI-NEXT: v_or_b32_e32 v0, v1, v0
	; SI-NEXT: v_lshrrev_b32_e32 v1, 8, v0			; SI-NEXT: v_and_b32_e32 v1, 0xff00, v0
	; SI-NEXT: v_lshlrev_b32_e32 v2, 8, v1			; SI-NEXT: v_lshrrev_b32_e32 v2, 8, v0
	; SI-NEXT: v_or_b32_e32 v1, v1, v2			; SI-NEXT: v_or_b32_e32 v1, v2, v1
	; SI-NEXT: v_lshlrev_b32_e32 v2, 16, v1			; SI-NEXT: v_lshlrev_b32_e32 v2, 16, v1
	; SI-NEXT: v_or_b32_e32 v1, v1, v2			; SI-NEXT: v_or_b32_e32 v1, v1, v2
	; SI-NEXT: v_or_b32_e32 v0, v0, v2			; SI-NEXT: v_or_b32_e32 v0, v0, v2
	; SI-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0			; SI-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
	; SI-NEXT: s_endpgm			; SI-NEXT: s_endpgm
	;			;
	; VI-LABEL: scalar_to_vector_v4i16:			; VI-LABEL: scalar_to_vector_v4i16:
	; VI: ; %bb.0: ; %bb			; VI: ; %bb.0: ; %bb
	; VI-NEXT: s_mov_b32 s3, 0xf000			; VI-NEXT: s_mov_b32 s3, 0xf000
	; VI-NEXT: s_mov_b32 s2, -1			; VI-NEXT: s_mov_b32 s2, -1
	; VI-NEXT: buffer_load_ubyte v0, off, s[0:3], 0			; VI-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
	; VI-NEXT: s_waitcnt vmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: v_lshlrev_b16_e32 v1, 8, v0			; VI-NEXT: v_lshlrev_b16_e32 v1, 8, v0
	; VI-NEXT: v_or_b32_e32 v0, v1, v0			; VI-NEXT: v_or_b32_e32 v0, v1, v0
	; VI-NEXT: v_lshrrev_b16_e32 v1, 8, v0			; VI-NEXT: v_and_b32_e32 v1, 0xffffff00, v0
	; VI-NEXT: v_lshlrev_b16_e32 v2, 8, v1			; VI-NEXT: v_or_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
	; VI-NEXT: v_or_b32_e32 v1, v1, v2
	; VI-NEXT: v_lshlrev_b32_e32 v2, 16, v1			; VI-NEXT: v_lshlrev_b32_e32 v2, 16, v1
	; VI-NEXT: v_or_b32_sdwa v1, v1, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD			; VI-NEXT: v_or_b32_sdwa v1, v1, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD
	; VI-NEXT: v_or_b32_sdwa v0, v0, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD			; VI-NEXT: v_or_b32_sdwa v0, v0, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD
	; VI-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0			; VI-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
	; VI-NEXT: s_endpgm			; VI-NEXT: s_endpgm
	bb:			bb:
	%tmp = load <2 x i8>, <2 x i8> addrspace(1)* undef, align 1			%tmp = load <2 x i8>, <2 x i8> addrspace(1)* undef, align 1
	%tmp1 = shufflevector <2 x i8> %tmp, <2 x i8> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>			%tmp1 = shufflevector <2 x i8> %tmp, <2 x i8> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
	%tmp2 = shufflevector <8 x i8> %tmp1, <8 x i8> undef, <8 x i32> <i32 0, i32 9, i32 9, i32 9, i32 9, i32 9, i32 9, i32 9>			%tmp2 = shufflevector <8 x i8> %tmp1, <8 x i8> undef, <8 x i32> <i32 0, i32 9, i32 9, i32 9, i32 9, i32 9, i32 9, i32 9>
	store <8 x i8> %tmp2, <8 x i8> addrspace(1)* undef, align 8			store <8 x i8> %tmp2, <8 x i8> addrspace(1)* undef, align 8
	ret void			ret void
	}			}
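In the SI output, the AND mask is 0xff00 rather than 0xffffff00. Since v0 is built from a zero-extended byte as b | (b << 8), its upper 16 bits are zero, so both masks select the same bits. The sketch below checks the old and new SI sequences, and the two masks, for every byte value. The helper names are illustrative:

```cpp
#include <cassert>
#include <cstdint>

// v0 as built by the SI check lines from a zero-extended byte b:
// v_lshlrev_b32 v1, 8, v0 ; v_or_b32 v0, v1, v0
uint32_t splatByte(uint8_t b) { return (uint32_t{b} << 8) | b; }

// Old: v_lshrrev_b32 v1, 8, v0 ; v_lshlrev_b32 v2, 8, v1 ; v_or_b32 v1, v1, v2
uint32_t siBefore(uint32_t v0) {
  uint32_t v1 = v0 >> 8;
  uint32_t v2 = v1 << 8;
  return v1 | v2;
}

// New: v_and_b32 v1, 0xff00, v0 ; v_lshrrev_b32 v2, 8, v0 ; v_or_b32 v1, v2, v1
uint32_t siAfter(uint32_t v0) {
  uint32_t v1 = v0 & 0xff00u;
  uint32_t v2 = v0 >> 8;
  return v2 | v1;
}

bool siSequencesAgree() {
  for (unsigned b = 0; b < 256; ++b) {
    uint32_t v0 = splatByte(static_cast<uint8_t>(b));
    if (siBefore(v0) != siAfter(v0))
      return false;
    // The narrower mask is equivalent because bits 16..31 of v0 are zero.
    if ((v0 & 0xffffff00u) != (v0 & 0xff00u))
      return false;
  }
  return true;
}
```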

	define amdgpu_kernel void @scalar_to_vector_v4f16() {			define amdgpu_kernel void @scalar_to_vector_v4f16() {
	; SI-LABEL: scalar_to_vector_v4f16:			; SI-LABEL: scalar_to_vector_v4f16:
	; SI: ; %bb.0: ; %bb			; SI: ; %bb.0: ; %bb
	; SI-NEXT: s_mov_b32 s3, 0xf000			; SI-NEXT: s_mov_b32 s3, 0xf000
	; SI-NEXT: s_mov_b32 s2, -1			; SI-NEXT: s_mov_b32 s2, -1
	; SI-NEXT: buffer_load_ubyte v0, off, s[0:3], 0			; SI-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
	; SI-NEXT: s_waitcnt vmcnt(0)			; SI-NEXT: s_waitcnt vmcnt(0)
	; SI-NEXT: v_lshlrev_b32_e32 v1, 8, v0			; SI-NEXT: v_lshlrev_b32_e32 v1, 8, v0
	; SI-NEXT: v_or_b32_e32 v0, v1, v0			; SI-NEXT: v_or_b32_e32 v0, v1, v0
	; SI-NEXT: v_lshrrev_b32_e32 v1, 8, v0			; SI-NEXT: v_and_b32_e32 v1, 0xff00, v0
	; SI-NEXT: v_lshlrev_b32_e32 v2, 8, v1			; SI-NEXT: v_lshrrev_b32_e32 v2, 8, v0
	; SI-NEXT: v_or_b32_e32 v1, v1, v2			; SI-NEXT: v_or_b32_e32 v1, v2, v1
	; SI-NEXT: v_lshlrev_b32_e32 v2, 16, v1			; SI-NEXT: v_lshlrev_b32_e32 v2, 16, v1
	; SI-NEXT: v_or_b32_e32 v1, v1, v2			; SI-NEXT: v_or_b32_e32 v1, v1, v2
	; SI-NEXT: v_or_b32_e32 v0, v0, v2			; SI-NEXT: v_or_b32_e32 v0, v0, v2
	; SI-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0			; SI-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
	; SI-NEXT: s_endpgm			; SI-NEXT: s_endpgm
	;			;
	; VI-LABEL: scalar_to_vector_v4f16:			; VI-LABEL: scalar_to_vector_v4f16:
	; VI: ; %bb.0: ; %bb			; VI: ; %bb.0: ; %bb
	; VI-NEXT: s_mov_b32 s3, 0xf000			; VI-NEXT: s_mov_b32 s3, 0xf000
	; VI-NEXT: s_mov_b32 s2, -1			; VI-NEXT: s_mov_b32 s2, -1
	; VI-NEXT: buffer_load_ubyte v0, off, s[0:3], 0			; VI-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
	; VI-NEXT: s_waitcnt vmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: v_lshlrev_b16_e32 v1, 8, v0			; VI-NEXT: v_lshlrev_b16_e32 v1, 8, v0
	; VI-NEXT: v_or_b32_e32 v0, v1, v0			; VI-NEXT: v_or_b32_e32 v0, v1, v0
	; VI-NEXT: v_lshrrev_b16_e32 v1, 8, v0			; VI-NEXT: v_and_b32_e32 v1, 0xffffff00, v0
	; VI-NEXT: v_lshlrev_b16_e32 v2, 8, v1			; VI-NEXT: v_or_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
	; VI-NEXT: v_or_b32_e32 v1, v1, v2
	; VI-NEXT: v_lshlrev_b32_e32 v2, 16, v1			; VI-NEXT: v_lshlrev_b32_e32 v2, 16, v1
	; VI-NEXT: v_or_b32_sdwa v1, v1, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD			; VI-NEXT: v_or_b32_sdwa v1, v1, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD
	; VI-NEXT: v_or_b32_sdwa v0, v0, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD			; VI-NEXT: v_or_b32_sdwa v0, v0, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD
	; VI-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0			; VI-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
	; VI-NEXT: s_endpgm			; VI-NEXT: s_endpgm
	bb:			bb:
	%load = load half, half addrspace(1)* undef, align 1			%load = load half, half addrspace(1)* undef, align 1
	%tmp = bitcast half %load to <2 x i8>			%tmp = bitcast half %load to <2 x i8>

llvm/test/CodeGen/ARM/combine-movc-sub.ll

	%struct.CLAUSE_HELP = type { i32, i32, i32, i32, i32, i32, %struct.LIST_HELP, %struct.LIST_HELP, i32, i32, %struct.LITERAL_HELP*, i32, i32, i32, i32 }			%struct.CLAUSE_HELP = type { i32, i32, i32, i32, i32, i32, %struct.LIST_HELP, %struct.LIST_HELP, i32, i32, %struct.LITERAL_HELP*, i32, i32, i32, i32 }
	%struct.LITERAL_HELP = type { i32, i32, i32, %struct.CLAUSE_HELP, %struct.term }			%struct.LITERAL_HELP = type { i32, i32, i32, %struct.CLAUSE_HELP, %struct.term }

	declare void @foo(%struct.PROOFSEARCH_HELP, %struct.CLAUSE_HELP)			declare void @foo(%struct.PROOFSEARCH_HELP, %struct.CLAUSE_HELP)

	define hidden fastcc %struct.LIST_HELP* @test(%struct.PROOFSEARCH_HELP* %Search, %struct.LIST_HELP* %ClauseList, i32 %Level, %struct.LIST_HELP** nocapture %New) {			define hidden fastcc %struct.LIST_HELP* @test(%struct.PROOFSEARCH_HELP* %Search, %struct.LIST_HELP* %ClauseList, i32 %Level, %struct.LIST_HELP** nocapture %New) {
	; CHECK-LABEL: test:			; CHECK-LABEL: test:
	; CHECK: @ %bb.0: @ %entry			; CHECK: @ %bb.0: @ %entry
	; CHECK-NEXT: push.w {r4, r5, r6, r7, r8, r9, r10, lr}			; CHECK-NEXT: push.w {r4, r5, r6, r7, r8, r9, lr}
	; CHECK-NEXT: sub.w r9, r2, #32			; CHECK-NEXT: sub sp, #4
				; CHECK-NEXT: sub.w r7, r2, #32
	; CHECK-NEXT: mov r8, r0			; CHECK-NEXT: mov r8, r0
	; CHECK-NEXT: movs r0, #1			; CHECK-NEXT: movs r0, #1
	; CHECK-NEXT: mov r4, r2			; CHECK-NEXT: mov r4, r2
	; CHECK-NEXT: add.w r6, r0, r9, lsr #5			; CHECK-NEXT: add.w r6, r0, r7, lsr #5
	; CHECK-NEXT: mov r5, r1			; CHECK-NEXT: mov r5, r1
	; CHECK-NEXT: lsr.w r7, r9, #5			; CHECK-NEXT: mov.w r9, #0
	; CHECK-NEXT: mov.w r10, #0
	; CHECK-NEXT: b .LBB0_2			; CHECK-NEXT: b .LBB0_2
	; CHECK-NEXT: .LBB0_1: @ %for.inc			; CHECK-NEXT: .LBB0_1: @ %for.inc
	; CHECK-NEXT: @ in Loop: Header=BB0_2 Depth=1			; CHECK-NEXT: @ in Loop: Header=BB0_2 Depth=1
	; CHECK-NEXT: ldr r5, [r5]			; CHECK-NEXT: ldr r5, [r5]
	; CHECK-NEXT: .LBB0_2: @ %for.body			; CHECK-NEXT: .LBB0_2: @ %for.body
	; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1			; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: ldr r1, [r5, #4]			; CHECK-NEXT: ldr r1, [r5, #4]
	; CHECK-NEXT: mov r2, r4			; CHECK-NEXT: mov r2, r4
	; CHECK-NEXT: cmp r4, #31			; CHECK-NEXT: cmp r4, #31
	; CHECK-NEXT: ldr r0, [r1, #16]			; CHECK-NEXT: ldr r0, [r1, #16]
	; CHECK-NEXT: add.w r0, r0, r6, lsl #2			; CHECK-NEXT: add.w r0, r0, r6, lsl #2
	; CHECK-NEXT: ldr r0, [r0, #40]			; CHECK-NEXT: ldr r0, [r0, #40]
	; CHECK-NEXT: it hi			; CHECK-NEXT: it hi
	; CHECK-NEXT: subhi.w r2, r9, r7, lsl #5			; CHECK-NEXT: andhi r2, r7, #31
	; CHECK-NEXT: lsrs r0, r2			; CHECK-NEXT: lsrs r0, r2
	; CHECK-NEXT: lsls r0, r0, #31			; CHECK-NEXT: lsls r0, r0, #31
	; CHECK-NEXT: beq .LBB0_1			; CHECK-NEXT: beq .LBB0_1
	; CHECK-NEXT: @ %bb.3: @ %if.then			; CHECK-NEXT: @ %bb.3: @ %if.then
	; CHECK-NEXT: @ in Loop: Header=BB0_2 Depth=1			; CHECK-NEXT: @ in Loop: Header=BB0_2 Depth=1
	; CHECK-NEXT: mov r0, r8			; CHECK-NEXT: mov r0, r8
	; CHECK-NEXT: bl foo			; CHECK-NEXT: bl foo
	; CHECK-NEXT: str.w r10, [r5, #4]			; CHECK-NEXT: str.w r9, [r5, #4]
	; CHECK-NEXT: b .LBB0_1			; CHECK-NEXT: b .LBB0_1
	entry:			entry:
	%cmp4.i.i = icmp ugt i32 %Level, 31			%cmp4.i.i = icmp ugt i32 %Level, 31
	%0 = add i32 %Level, -32			%0 = add i32 %Level, -32
	%1 = lshr i32 %0, 5			%1 = lshr i32 %0, 5
	%2 = shl nuw i32 %1, 5			%2 = shl nuw i32 %1, 5
	%3 = sub i32 %0, %2			%3 = sub i32 %0, %2
	%4 = add nuw nsw i32 %1, 1			%4 = add nuw nsw i32 %1, 1
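In this ARM test, %1 = lshr %0, 5 has a second user (%4 = add %1, 1, emitted as add.w r6, r0, rX, lsr #5). That is exactly the multi-use case this patch opens up, and %0 - ((%0 >> 5) << 5) is just %0 & 31. The sketch below models only the taken (Level > 31) path of the old subhi.w and new andhi sequences. The function names are illustrative:

```cpp
#include <cassert>
#include <cstdint>

// The IR: %0 = Level - 32 ; %1 = lshr %0, 5 ; %2 = shl %1, 5 ; %3 = sub %0, %2
uint32_t irValue(uint32_t level) {
  uint32_t v0 = level - 32u;
  uint32_t v1 = v0 >> 5;
  uint32_t v2 = v1 << 5;
  return v0 - v2;
}

// Old: lsr.w r7, r9, #5 ; subhi.w r2, r9, r7, lsl #5   (r9 = Level - 32)
uint32_t armBefore(uint32_t r9) {
  uint32_t r7 = r9 >> 5;
  return r9 - (r7 << 5);
}

// New: andhi r2, r7, #31   (r7 = Level - 32)
uint32_t armAfter(uint32_t r7) { return r7 & 31u; }

bool armSequencesAgree() {
  const uint32_t levels[] = {0u, 31u, 32u, 33u, 63u, 64u, 100u, 1000u, 0xFFFFFFFFu};
  for (uint32_t level : levels) {
    uint32_t v0 = level - 32u;  // unsigned wraparound matches i32 semantics
    if (irValue(level) != armBefore(v0) || irValue(level) != armAfter(v0))
      return false;
  }
  return true;
}
```

The new code no longer needs a separate lsr.w to keep %1 live, which frees r10. The added sub sp, #4 looks like stack-alignment padding for the shorter push list, though that is my reading of the output rather than something stated in the review.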

llvm/test/CodeGen/RISCV/rvv/extract-subvector.ll

; CHECK-NEXT: ret
%c = call <vscale x 2 x i8> @llvm.experimental.vector.extract.nxv2i8.nxv32i8(<vscale x 32 x i8> %vec, i64 22)		%c = call <vscale x 2 x i8> @llvm.experimental.vector.extract.nxv2i8.nxv32i8(<vscale x 32 x i8> %vec, i64 22)
ret <vscale x 2 x i8> %c		ret <vscale x 2 x i8> %c
}		}

define <vscale x 1 x i8> @extract_nxv8i8_nxv1i8_7(<vscale x 8 x i8> %vec) {		define <vscale x 1 x i8> @extract_nxv8i8_nxv1i8_7(<vscale x 8 x i8> %vec) {
; CHECK-LABEL: extract_nxv8i8_nxv1i8_7:		; CHECK-LABEL: extract_nxv8i8_nxv1i8_7:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: csrr a0, vlenb		; CHECK-NEXT: csrr a0, vlenb
; CHECK-NEXT: srli a0, a0, 3		; CHECK-NEXT: srli a1, a0, 3
; CHECK-NEXT: slli a1, a0, 3		; CHECK-NEXT: sub a0, a0, a1
; CHECK-NEXT: sub a0, a1, a0
; CHECK-NEXT: vsetvli a1, zero, e8, m1, ta, mu		; CHECK-NEXT: vsetvli a1, zero, e8, m1, ta, mu
; CHECK-NEXT: vslidedown.vx v8, v8, a0		; CHECK-NEXT: vslidedown.vx v8, v8, a0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%c = call <vscale x 1 x i8> @llvm.experimental.vector.extract.nxv1i8.nxv8i8(<vscale x 8 x i8> %vec, i64 7)		%c = call <vscale x 1 x i8> @llvm.experimental.vector.extract.nxv1i8.nxv8i8(<vscale x 8 x i8> %vec, i64 7)
ret <vscale x 1 x i8> %c		ret <vscale x 1 x i8> %c
}		}

define <vscale x 1 x i8> @extract_nxv4i8_nxv1i8_3(<vscale x 4 x i8> %vec) {		define <vscale x 1 x i8> @extract_nxv4i8_nxv1i8_3(<vscale x 4 x i8> %vec) {
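In the old RISC-V sequence, the pair (vlenb >> 3) << 3 builds 8 * vscale, and this patch turns it into and(vlenb, ~7). The new output drops that AND entirely, which is only correct if the low three bits of vlenb are known to be zero. I assume that is what the backend relies on here. The sketch checks the two sequences for power-of-two vlenb values of at least 8 and shows that they diverge otherwise. The function names are illustrative:

```cpp
#include <cassert>
#include <cstdint>

// Old: srli a0, a0, 3 ; slli a1, a0, 3 ; sub a0, a1, a0   (7 * vscale)
uint64_t rvBefore(uint64_t vlenb) {
  uint64_t vscale = vlenb >> 3;
  uint64_t eightVscale = vscale << 3;  // the shl-of-srl pair
  return eightVscale - vscale;
}

// New: srli a1, a0, 3 ; sub a0, a0, a1
uint64_t rvAfter(uint64_t vlenb) { return vlenb - (vlenb >> 3); }

// Agreement holds whenever vlenb is a multiple of 8, e.g. powers of two >= 8.
bool agreeOnPowersOfTwo() {
  for (uint64_t vlenb = 8; vlenb <= (uint64_t{1} << 16); vlenb <<= 1)
    if (rvBefore(vlenb) != rvAfter(vlenb) || rvAfter(vlenb) != 7 * (vlenb / 8))
      return false;
  return true;
}
```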

llvm/test/CodeGen/RISCV/rvv/insert-subvector.ll

; CHECK-NEXT: ret
%v = call <vscale x 16 x i8> @llvm.experimental.vector.insert.nxv1i8.nxv16i8(<vscale x 16 x i8> %vec, <vscale x 1 x i8> %subvec, i64 3)		%v = call <vscale x 16 x i8> @llvm.experimental.vector.insert.nxv1i8.nxv16i8(<vscale x 16 x i8> %vec, <vscale x 1 x i8> %subvec, i64 3)
ret <vscale x 16 x i8> %v		ret <vscale x 16 x i8> %v
}		}

define <vscale x 16 x i8> @insert_nxv16i8_nxv1i8_7(<vscale x 16 x i8> %vec, <vscale x 1 x i8> %subvec) {		define <vscale x 16 x i8> @insert_nxv16i8_nxv1i8_7(<vscale x 16 x i8> %vec, <vscale x 1 x i8> %subvec) {
; CHECK-LABEL: insert_nxv16i8_nxv1i8_7:		; CHECK-LABEL: insert_nxv16i8_nxv1i8_7:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: csrr a0, vlenb		; CHECK-NEXT: csrr a0, vlenb
; CHECK-NEXT: srli a0, a0, 3		; CHECK-NEXT: srli a1, a0, 3
; CHECK-NEXT: slli a1, a0, 3		; CHECK-NEXT: sub a1, a0, a1
; CHECK-NEXT: sub a0, a1, a0		; CHECK-NEXT: vsetvli zero, a0, e8, m1, tu, mu
; CHECK-NEXT: vsetvli zero, a1, e8, m1, tu, mu		; CHECK-NEXT: vslideup.vx v8, v10, a1
; CHECK-NEXT: vslideup.vx v8, v10, a0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%v = call <vscale x 16 x i8> @llvm.experimental.vector.insert.nxv1i8.nxv16i8(<vscale x 16 x i8> %vec, <vscale x 1 x i8> %subvec, i64 7)		%v = call <vscale x 16 x i8> @llvm.experimental.vector.insert.nxv1i8.nxv16i8(<vscale x 16 x i8> %vec, <vscale x 1 x i8> %subvec, i64 7)
ret <vscale x 16 x i8> %v		ret <vscale x 16 x i8> %v
}		}

define <vscale x 16 x i8> @insert_nxv16i8_nxv1i8_15(<vscale x 16 x i8> %vec, <vscale x 1 x i8> %subvec) {		define <vscale x 16 x i8> @insert_nxv16i8_nxv1i8_15(<vscale x 16 x i8> %vec, <vscale x 1 x i8> %subvec) {
; CHECK-LABEL: insert_nxv16i8_nxv1i8_15:		; CHECK-LABEL: insert_nxv16i8_nxv1i8_15:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: csrr a0, vlenb		; CHECK-NEXT: csrr a0, vlenb
; CHECK-NEXT: srli a0, a0, 3		; CHECK-NEXT: srli a1, a0, 3
; CHECK-NEXT: slli a1, a0, 3		; CHECK-NEXT: sub a1, a0, a1
; CHECK-NEXT: sub a0, a1, a0		; CHECK-NEXT: vsetvli zero, a0, e8, m1, tu, mu
; CHECK-NEXT: vsetvli zero, a1, e8, m1, tu, mu		; CHECK-NEXT: vslideup.vx v9, v10, a1
; CHECK-NEXT: vslideup.vx v9, v10, a0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%v = call <vscale x 16 x i8> @llvm.experimental.vector.insert.nxv1i8.nxv16i8(<vscale x 16 x i8> %vec, <vscale x 1 x i8> %subvec, i64 15)		%v = call <vscale x 16 x i8> @llvm.experimental.vector.insert.nxv1i8.nxv16i8(<vscale x 16 x i8> %vec, <vscale x 1 x i8> %subvec, i64 15)
ret <vscale x 16 x i8> %v		ret <vscale x 16 x i8> %v
}		}

define <vscale x 32 x half> @insert_nxv32f16_nxv2f16_0(<vscale x 32 x half> %vec, <vscale x 2 x half> %subvec) {		define <vscale x 32 x half> @insert_nxv32f16_nxv2f16_0(<vscale x 32 x half> %vec, <vscale x 2 x half> %subvec) {
; CHECK-LABEL: insert_nxv32f16_nxv2f16_0:		; CHECK-LABEL: insert_nxv32f16_nxv2f16_0:
; CHECK: # %bb.0:		; CHECK: # %bb.0:

llvm/test/CodeGen/RISCV/rvv/legalize-load-sdnode.ll

; CHECK-NEXT: ret		; CHECK-NEXT: ret
%v = load <vscale x 5 x half>, <vscale x 5 x half>* %ptr		%v = load <vscale x 5 x half>, <vscale x 5 x half>* %ptr
ret <vscale x 5 x half> %v		ret <vscale x 5 x half> %v
}		}

define <vscale x 7 x half> @load_nxv7f16(<vscale x 7 x half>* %ptr, <vscale x 7 x half>* %out) {		define <vscale x 7 x half> @load_nxv7f16(<vscale x 7 x half>* %ptr, <vscale x 7 x half>* %out) {
; CHECK-LABEL: load_nxv7f16:		; CHECK-LABEL: load_nxv7f16:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: csrr a2, vlenb		; CHECK-NEXT: csrr a2, vlenb
; CHECK-NEXT: srli a2, a2, 3		; CHECK-NEXT: srli a3, a2, 3
; CHECK-NEXT: slli a3, a2, 3		; CHECK-NEXT: sub a2, a2, a3
; CHECK-NEXT: sub a2, a3, a2
; CHECK-NEXT: vsetvli zero, a2, e16, m2, ta, mu		; CHECK-NEXT: vsetvli zero, a2, e16, m2, ta, mu
; CHECK-NEXT: vle16.v v8, (a0)		; CHECK-NEXT: vle16.v v8, (a0)
; CHECK-NEXT: vse16.v v8, (a1)		; CHECK-NEXT: vse16.v v8, (a1)
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%v = load <vscale x 7 x half>, <vscale x 7 x half>* %ptr		%v = load <vscale x 7 x half>, <vscale x 7 x half>* %ptr
store <vscale x 7 x half> %v, <vscale x 7 x half>* %out		store <vscale x 7 x half> %v, <vscale x 7 x half>* %out
ret <vscale x 7 x half> %v		ret <vscale x 7 x half> %v
}		}

llvm/test/CodeGen/RISCV/rvv/legalize-store-sdnode.ll

; CHECK-NEXT: ret		; CHECK-NEXT: ret
store <vscale x 3 x i8> %val, <vscale x 3 x i8>* %ptr		store <vscale x 3 x i8> %val, <vscale x 3 x i8>* %ptr
ret void		ret void
}		}

define void @store_nxv7f64(<vscale x 7 x double> %val, <vscale x 7 x double>* %ptr) {		define void @store_nxv7f64(<vscale x 7 x double> %val, <vscale x 7 x double>* %ptr) {
; CHECK-LABEL: store_nxv7f64:		; CHECK-LABEL: store_nxv7f64:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: csrr a1, vlenb		; CHECK-NEXT: csrr a1, vlenb
; CHECK-NEXT: srli a1, a1, 3		; CHECK-NEXT: srli a2, a1, 3
; CHECK-NEXT: slli a2, a1, 3		; CHECK-NEXT: sub a1, a1, a2
; CHECK-NEXT: sub a1, a2, a1
; CHECK-NEXT: vsetvli zero, a1, e64, m8, ta, mu		; CHECK-NEXT: vsetvli zero, a1, e64, m8, ta, mu
; CHECK-NEXT: vse64.v v8, (a0)		; CHECK-NEXT: vse64.v v8, (a0)
; CHECK-NEXT: ret		; CHECK-NEXT: ret
store <vscale x 7 x double> %val, <vscale x 7 x double>* %ptr		store <vscale x 7 x double> %val, <vscale x 7 x double>* %ptr
ret void		ret void
}		}

llvm/test/CodeGen/SystemZ/store_nonbytesized_vecs.ll

; CHECK-NEXT: br %r14		; CHECK-NEXT: br %r14
store <8 x i31> %tmp, <8 x i31>* %p		store <8 x i31> %tmp, <8 x i31>* %p
ret void		ret void
}		}

; Load and store a <3 x i31> vector (test widening).		; Load and store a <3 x i31> vector (test widening).
define void @fun3(<3 x i31>* %src, <3 x i31>* %p)		define void @fun3(<3 x i31>* %src, <3 x i31>* %p)
; CHECK-LABEL: fun3:		; CHECK-LABEL: fun3:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: l %r0, 8(%r2)		; CHECK-NEXT: llgf %r0, 8(%r2)
; CHECK-NEXT: lg %r1, 0(%r2)		; CHECK-NEXT: lg %r1, 0(%r2)
; CHECK-NEXT: sllg %r2, %r1, 32
; CHECK-NEXT: lr %r2, %r0
; CHECK-NEXT: st %r0, 8(%r3)
; CHECK-NEXT: srlg %r0, %r2, 32
; CHECK-NEXT: lr %r1, %r0
; CHECK-NEXT: nihh %r1, 8191
; CHECK-NEXT: stg %r1, 0(%r3)		; CHECK-NEXT: stg %r1, 0(%r3)
		; CHECK-NEXT: st %r0, 8(%r3)
; CHECK-NEXT: br %r14		; CHECK-NEXT: br %r14
{		{
%tmp = load <3 x i31>, <3 x i31>* %src		%tmp = load <3 x i31>, <3 x i31>* %src
store <3 x i31> %tmp, <3 x i31>* %p		store <3 x i31> %tmp, <3 x i31>* %p
ret void		ret void
}		}

Revision Contents

Diff 430010

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

llvm/lib/Target/AArch64/AArch64ISelLowering.h

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll

llvm/test/CodeGen/AMDGPU/load-lo16.ll

llvm/test/CodeGen/AMDGPU/scalar_to_vector.ll

llvm/test/CodeGen/ARM/combine-movc-sub.ll

llvm/test/CodeGen/RISCV/rvv/extract-subvector.ll

llvm/test/CodeGen/RISCV/rvv/insert-subvector.ll

llvm/test/CodeGen/RISCV/rvv/legalize-load-sdnode.ll

llvm/test/CodeGen/RISCV/rvv/legalize-store-sdnode.ll

llvm/test/CodeGen/SystemZ/store_nonbytesized_vecs.ll
