This is an archive of the discontinued LLVM Phabricator instance.

This sounds interesting, but there might be a more general way to handle it. From what I can tell, the base sshr demands only a certain number of top bits. That is usually communicated through TLI.SimplifyDemandedBits with an appropriate DemandedMask.

The simplification that applies to target nodes based on demanded bits could then be specified by overriding SimplifyDemandedBitsForTargetNode. It would need code similar to https://github.com/llvm/llvm-project/blob/4f05f4c8e66bc76b1d94f5283494404382e3bacd/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp#L1455, but using AArch64ISD::VSHL/AArch64ISD::VLSHR.

That might be more general, handling cases no matter where the demanded bits come from. And SimplifyDemandedBitsForTargetNode can be expanded with more cases if we find them.
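To make the demanded-bits argument concrete, here is a minimal sketch that models a single 16-bit lane with plain integers. It is not LLVM code, and the helper names (ashr16, clearLowBits16, lowBitsAreDead) are hypothetical. It checks that an arithmetic right shift by K never reads the low K bits of its operand, so a preceding (VSHL (VLSHR X, K), K) pair, which only clears those bits, does not change the result.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical lane model, not LLVM code: arithmetic shift right of one
// 16-bit lane, standing in for sshr / AArch64ISD::VASHR.
static uint16_t ashr16(uint16_t X, unsigned K) {
  assert(K < 16 && "shift immediate must be smaller than the lane");
  uint16_t Logical = static_cast<uint16_t>(X >> K);
  // Replicate the sign bit into the K vacated top bits.
  uint16_t Sign = (X & 0x8000u) ? static_cast<uint16_t>(~(0xFFFFu >> K)) : 0;
  return static_cast<uint16_t>(Logical | Sign);
}

// Models (VSHL (VLSHR X, K), K): clears the low K bits of the lane.
static uint16_t clearLowBits16(uint16_t X, unsigned K) {
  assert(K < 16 && "shift immediate must be smaller than the lane");
  return static_cast<uint16_t>((X >> K) << K);
}

// Returns true if, for every 16-bit input, the right shift gives the same
// result with and without the preceding shift pair.
static bool lowBitsAreDead(unsigned K) {
  for (uint32_t V = 0; V <= 0xFFFFu; ++V) {
    uint16_t X = static_cast<uint16_t>(V);
    if (ashr16(clearLowBits16(X, K), K) != ashr16(X, K))
      return false;
  }
  return true;
}
```

Calling lowBitsAreDead(K) for any K from 0 to 15 returns true in this model, which is the property the DemandedMask is meant to communicate to the operand.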

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14562	I think that the shift amounts of VASHR/VLSHR/VSHL are always constants, so Op0.getConstantOperandVal(1) can be used directly.

If you want to do it this way instead though, that sounds fine too. There will only be a limited number of cases where the AArch64ISD::VSHL etc. haven't already been simplified.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14575	SDValue Shift2 = Shift1->getOperand(0); Same for Shift3 below.
14581	This is only used in one place. Same for Shift2Opc above.
14602	I believe these could both just be hasOneUse()
14608	Shift1Opc

Thanks a lot, Dave! I'll follow your first suggestion, and if it does not work, we can get back to the original patch.

Used TLI.SimplifyDemandedBits for performShiftCombine.
Extended SimplifyDemandedBits to cover AArch64 VLSHR + VSHL.

Harbormaster completed remote builds in B105035: Diff 346164. May 18 2021, 10:45 AM

Thanks. I'm glad this way worked.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14561	Perhaps performVectorShiftCombine
14564	This probably isn't needed, or could be an assert.
14574	I'm not sure these need casts, or to be uint64_t. They should both be fairly small.
14577	Most other uses of this function that I see seem to use: if (TLI.SimplifyDemandedBits(..)) return SDValue(N, 0); It may not alter much, but will be closer to what DAGCombiner::combine expects the return value to be for something that changed.
llvm/test/CodeGen/AArch64/aarch64-bswap-ext.ll
3	This probably doesn't need -O2

Applied CR comments.

Thanks. LGTM

This revision is now accepted and ready to land. May 19 2021, 10:31 AM

Harbormaster completed remote builds in B105263: Diff 346497. May 19 2021, 11:35 AM

Closed by commit rGa647100b4320: [AArch64] Combine vector shift instructions in SelectionDAG (authored by asavonic). May 20 2021, 1:00 AM

This revision was automatically updated to reflect the committed changes.

asavonic added a commit: rGa647100b4320: [AArch64] Combine vector shift instructions in SelectionDAG.

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelLowering.h	7 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	70 lines
llvm/test/CodeGen/AArch64/aarch64-bswap-ext.ll	27 lines

Diff 346497

llvm/lib/Target/AArch64/AArch64ISelLowering.h

...
private:

bool shouldNormalizeToSelectSequence(LLVMContext &, EVT) const override;

void finalizeLowering(MachineFunction &MF) const override;

bool shouldLocalize(const MachineInstr &MI,
const TargetTransformInfo *TTI) const override;

		bool SimplifyDemandedBitsForTargetNode(SDValue Op,
		const APInt &OriginalDemandedBits,
		const APInt &OriginalDemandedElts,
		KnownBits &Known,
		TargetLoweringOpt &TLO,
		unsigned Depth) const override;

// Normally SVE is only used for byte size vectors that do not fit within a
// NEON vector. This changes when OverrideNEON is true, allowing SVE to be
// used for 64bit and 128bit vectors as well.
bool useSVEForFixedLengthVectorVT(EVT VT, bool OverrideNEON = false) const;

// With the exception of data-predicate transitions, no instructions are
// required to cast between legal scalable vector types. However:
// 1. Packed and unpacked types have different bit lengths, meaning BITCAST
...

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp


...
if (ExtPg == Pg && ExtFromEVT == MVT::i32) {
return DAG.getNode(NewOpc, DL, {ResVT, MVT::Other},
{Chain, Pg, Base, UnextendedOffset, Ty});
}
}

return SDValue();
}

		/// Optimize a vector shift instruction and its operand if shifted out
		/// bits are not used.
		static SDValue performVectorShiftCombine(SDNode *N,
		const AArch64TargetLowering &TLI,
		TargetLowering::DAGCombinerInfo &DCI) {
		assert(N->getOpcode() == AArch64ISD::VASHR ||
		N->getOpcode() == AArch64ISD::VLSHR);

		SDValue Op = N->getOperand(0);
		unsigned OpScalarSize = Op.getScalarValueSizeInBits();

		unsigned ShiftImm = N->getConstantOperandVal(1);
		assert(OpScalarSize > ShiftImm && "Invalid shift imm");

		APInt ShiftedOutBits = APInt::getLowBitsSet(OpScalarSize, ShiftImm);
		APInt DemandedMask = ~ShiftedOutBits;

		if (TLI.SimplifyDemandedBits(Op, DemandedMask, DCI))
		return SDValue(N, 0);

		return SDValue();
		}

/// Target-specific DAG combine function for post-increment LD1 (lane) and
/// post-increment LD1R.
static SDValue performPostLD1Combine(SDNode *N,
TargetLowering::DAGCombinerInfo &DCI,
bool IsLaneOp) {
if (DCI.isBeforeLegalizeOps())
return SDValue();

SelectionDAG &DAG = DCI.DAG;
EVT VT = N->getValueType(0);

if (VT.isScalableVector())
return SDValue();

unsigned LoadIdx = IsLaneOp ? 1 : 0;
SDNode *LD = N->getOperand(LoadIdx).getNode();
// If it is not LOAD, can not do such combine.
if (LD->getOpcode() != ISD::LOAD)
return SDValue();

// The vector lane must be a constant in the LD1LANE opcode.
SDValue Lane;
if (IsLaneOp) {
Lane = N->getOperand(2);
auto *LaneC = dyn_cast<ConstantSDNode>(Lane);
if (!LaneC || LaneC->getZExtValue() >= VT.getVectorNumElements())
return SDValue();
}

LoadSDNode *LoadSDN = cast<LoadSDNode>(LD);
EVT MemVT = LoadSDN->getMemoryVT();
// Check if memory operand is the same type as the vector element.
if (MemVT != VT.getVectorElementType())
...
SDValue AArch64TargetLowering::PerformDAGCombine(SDNode *N,
case AArch64ISD::GLD1S_MERGE_ZERO:
case AArch64ISD::GLD1S_SCALED_MERGE_ZERO:
case AArch64ISD::GLD1S_UXTW_MERGE_ZERO:
case AArch64ISD::GLD1S_SXTW_MERGE_ZERO:
case AArch64ISD::GLD1S_UXTW_SCALED_MERGE_ZERO:
case AArch64ISD::GLD1S_SXTW_SCALED_MERGE_ZERO:
case AArch64ISD::GLD1S_IMM_MERGE_ZERO:
return performGLD1Combine(N, DAG);
		case AArch64ISD::VASHR:
		case AArch64ISD::VLSHR:
		return performVectorShiftCombine(N, *this, DCI);
case ISD::INSERT_VECTOR_ELT:
return performInsertVectorEltCombine(N, DCI);
case ISD::EXTRACT_VECTOR_ELT:
return performExtractVectorEltCombine(N, DAG);
case ISD::VECREDUCE_ADD:
return performVecReduceAddCombine(N, DCI.DAG, Subtarget);
case ISD::INTRINSIC_VOID:
case ISD::INTRINSIC_W_CHAIN:
...
if (VT != PackedVT)
Op = DAG.getNode(AArch64ISD::REINTERPRET_CAST, DL, VT, Op);

return Op;
}

bool AArch64TargetLowering::isAllActivePredicate(SDValue N) const {
return ::isAllActivePredicate(N);
}

		bool AArch64TargetLowering::SimplifyDemandedBitsForTargetNode(
		SDValue Op, const APInt &OriginalDemandedBits,
		const APInt &OriginalDemandedElts, KnownBits &Known, TargetLoweringOpt &TLO,
		unsigned Depth) const {

		unsigned Opc = Op.getOpcode();
		switch (Opc) {
		case AArch64ISD::VSHL: {
		// Match (VSHL (VLSHR Val X) X)
		SDValue ShiftL = Op;
		SDValue ShiftR = Op->getOperand(0);
		if (ShiftR->getOpcode() != AArch64ISD::VLSHR)
		return false;

		if (!ShiftL.hasOneUse() || !ShiftR.hasOneUse())
		return false;

		unsigned ShiftLBits = ShiftL->getConstantOperandVal(1);
		unsigned ShiftRBits = ShiftR->getConstantOperandVal(1);

		// Other cases can be handled as well, but this is not
		// implemented.
		if (ShiftRBits != ShiftLBits)
		return false;

		unsigned ScalarSize = Op.getScalarValueSizeInBits();
		assert(ScalarSize > ShiftLBits && "Invalid shift imm");

		APInt ZeroBits = APInt::getLowBitsSet(ScalarSize, ShiftLBits);
		APInt UnusedBits = ~OriginalDemandedBits;

		if ((ZeroBits & UnusedBits) != ZeroBits)
		return false;

		// All bits that are zeroed by (VSHL (VLSHR Val X) X) are not
		// used - simplify to just Val.
		return TLO.CombineTo(Op, ShiftR->getOperand(0));
		}
		}

		return TargetLowering::SimplifyDemandedBitsForTargetNode(
		Op, OriginalDemandedBits, OriginalDemandedElts, Known, TLO, Depth);
		}
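The subset test in the VSHL case can be tried out in isolation. The following is a minimal sketch using uint32_t in place of APInt for a 32-bit lane. The helpers lowBitsSet and canDropShiftPair are hypothetical names introduced here for illustration, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for APInt::getLowBitsSet on a 32-bit lane.
static uint32_t lowBitsSet(unsigned NumBits) {
  assert(NumBits < 32 && "shift immediate must be smaller than the lane");
  return (uint32_t{1} << NumBits) - 1;
}

// Mirrors the check in the patch: the shift pair may be dropped only when
// every bit it zeroes is also a bit that no user demands.
static bool canDropShiftPair(uint32_t DemandedBits, unsigned ShiftBits) {
  uint32_t ZeroBits = lowBitsSet(ShiftBits);
  uint32_t UnusedBits = ~DemandedBits;
  return (ZeroBits & UnusedBits) == ZeroBits;
}
```

With a demanded mask of 0xFFFF0000 (the top half, as a right shift by 16 would request), canDropShiftPair returns true for a shift of 16 or of 8, and false once bit 15 is also demanded (0xFFFF8000 with a shift of 16). The condition is equivalent to (ZeroBits & DemandedBits) == 0.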

llvm/test/CodeGen/AArch64/aarch64-bswap-ext.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=aarch64-unknown-linux-gnu < %s | FileCheck %s

				define <2 x i32> @test1(<2 x i16> %v2i16) {
				; CHECK-LABEL: test1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: rev32 v0.8b, v0.8b
				; CHECK-NEXT: sshr v0.2s, v0.2s, #16
				; CHECK-NEXT: ret
				%v2i16_rev = call <2 x i16> @llvm.bswap.v2i16(<2 x i16> %v2i16)
				%v2i32 = sext <2 x i16> %v2i16_rev to <2 x i32>
				ret <2 x i32> %v2i32
				}

				define <2 x float> @test2(<2 x i16> %v2i16) {
				; CHECK-LABEL: test2:
				; CHECK: // %bb.0:
				; CHECK-NEXT: rev32 v0.8b, v0.8b
				; CHECK-NEXT: sshr v0.2s, v0.2s, #16
				; CHECK-NEXT: scvtf v0.2s, v0.2s
				; CHECK-NEXT: ret
				%v2i16_rev = call <2 x i16> @llvm.bswap.v2i16(<2 x i16> %v2i16)
				%v2f32 = sitofp <2 x i16> %v2i16_rev to <2 x float>
				ret <2 x float> %v2f32
				}

				declare <2 x i16> @llvm.bswap.v2i16(<2 x i16>) nounwind readnone
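As a sanity check on the expected output of test1, here is a minimal sketch of why rev32 followed by sshr #16 is enough. It assumes each i16 element sits in the low half of a 32-bit lane with unspecified upper bits, which is an assumption about how the promoted vector is laid out, not something stated in the review. All helper names are hypothetical.

```cpp
#include <cstdint>

// Models rev32 on one 32-bit lane: reverses the four bytes.
static uint32_t rev32(uint32_t Lane) {
  return (Lane >> 24) | ((Lane >> 8) & 0x0000FF00u) |
         ((Lane << 8) & 0x00FF0000u) | (Lane << 24);
}

// Models sshr #16 on one 32-bit lane: arithmetic shift right by 16.
static uint32_t sshr16(uint32_t Lane) {
  uint32_t Logical = Lane >> 16;
  uint32_t Sign = (Lane & 0x80000000u) ? 0xFFFF0000u : 0u;
  return Logical | Sign;
}

// Reference semantics of the IR: bswap on i16, then sext to i32.
static uint32_t sextBswap16(uint16_t V) {
  uint16_t Swapped = static_cast<uint16_t>((V << 8) | (V >> 8));
  return (Swapped & 0x8000u) ? (0xFFFF0000u | Swapped) : Swapped;
}

// Checks every i16 value, with HighGarbage placed in the upper lane half.
static bool loweringMatches(uint32_t HighGarbage) {
  for (uint32_t V = 0; V <= 0xFFFFu; ++V) {
    uint32_t Lane = (HighGarbage << 16) | V;
    if (sshr16(rev32(Lane)) != sextBswap16(static_cast<uint16_t>(V)))
      return false;
  }
  return true;
}
```

In this model loweringMatches returns true whatever the upper half of the lane contains, because rev32 moves the two payload bytes to the top of the lane and the shift discards everything below them.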


[AArch64] Combine shift instructions in SelectionDAG (Closed, Public)
