This is an archive of the discontinued LLVM Phabricator instance.

This sounds interesting, but there might be a more general way to handle it. From what I can tell, the base sshr only demands a certain number of top bits. That is usually communicated through TLI.SimplifyDemandedBits with an appropriate DemandedMask.

Then I think it could specify the simplification that happens to target nodes based on demanded bits with an overridden SimplifyDemandedBitsForTargetNode. It would need code similar to https://github.com/llvm/llvm-project/blob/4f05f4c8e66bc76b1d94f5283494404382e3bacd/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp#L1455, but using AArch64ISD::VSHL/AArch64ISD::VLSHR.

That might be more general, handling any cases where the demanded bits come from anywhere. And SimplifyDemandedBitsForTargetNode can be expanded with more cases if we find them.
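
To make the demanded-bits argument concrete, here is a minimal standalone sketch that models a single 32-bit lane with plain integers instead of SelectionDAG nodes. The helper names (lshr, shl, ashr, demandedByRightShift, clearLowBits) are hypothetical and are not LLVM APIs, and shift amounts are assumed to be in the range 0 to 31. It only illustrates that a right shift by C never observes the low C bits of its input, so a preceding shift pair that clears those bits changes nothing.

```cpp
#include <cassert>
#include <cstdint>

// One 32-bit lane; shift amounts are assumed to be in [0, 31].
inline uint32_t lshr(uint32_t v, unsigned c) { return v >> c; }
inline uint32_t shl(uint32_t v, unsigned c) { return v << c; }

// Arithmetic shift right, written with unsigned arithmetic only so the
// result does not depend on how the compiler shifts negative values.
inline uint32_t ashr(uint32_t v, unsigned c) {
  uint32_t signFill = (v & 0x80000000u) ? ~(0xFFFFFFFFu >> c) : 0u;
  return (v >> c) | signFill;
}

// Bits of the input that a right shift by c can observe: everything
// except the low c bits, which are shifted out.
inline uint32_t demandedByRightShift(unsigned c) { return 0xFFFFFFFFu << c; }

// The effect of (shl (lshr v, c), c) on the lane: clear the low c bits.
inline uint32_t clearLowBits(uint32_t v, unsigned c) {
  return shl(lshr(v, c), c);
}
```

For example, with v = 0x8001ABCD and c = 16, clearLowBits(v, c) is 0x80010000, which agrees with v on every bit in demandedByRightShift(16), and both ashr(clearLowBits(v, c), c) and ashr(v, c) give 0xFFFF8001.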

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14562	I think that the shift amounts of VASHR/VLSHR/VSHL are always constants, so Op0.getConstantOperandVal(1) can be used directly.

If you want to do it this way instead though, that sounds fine too. There will only be a limited number of cases where the AArch64ISD::VSHL etc haven't already been simplified.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14575	SDValue Shift2 = Shift1->getOperand(0); Same for Shift3 below.
14581	This is only used in one place. Same for Shift2Opc above.
14602	I believe these could both just be hasOneUse()
14608	Shift1Opc

Thanks a lot Dave! I'll follow your first suggestion, and if it does not work, we can get back to the original patch.

Used TLI.SimplifyDemandedBits for performShiftCombine.
Extended SimplifyDemandedBits to cover AArch64 VLSHR + VSHL.
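
As a rough illustration of what a demanded-bits simplification of a VLSHR + VSHL pair amounts to, the following toy model uses a hand-rolled expression node for one 32-bit lane. Everything here (Opc, Node, eval, simplifyDemandedBits) is hypothetical and is not the SelectionDAG API or the committed code. Use-count checks such as the hasOneUse() checks discussed in the review are left out, and shift amounts are assumed to be in the range 0 to 31.

```cpp
#include <cassert>
#include <cstdint>

// Toy expression node for one 32-bit lane. This is NOT the SelectionDAG API.
enum class Opc { Leaf, Shl, Lshr };

struct Node {
  Opc opc;
  const Node *operand; // null for Leaf
  unsigned amount;     // shift amount in [0, 31]; unused for Leaf
  uint32_t value;      // only meaningful for Leaf
};

// Evaluate the expression rooted at n.
inline uint32_t eval(const Node *n) {
  switch (n->opc) {
  case Opc::Leaf:
    return n->value;
  case Opc::Shl:
    return eval(n->operand) << n->amount;
  case Opc::Lshr:
    return eval(n->operand) >> n->amount;
  }
  return 0; // not reached for valid opcodes
}

// If n is (Shl (Lshr x, c), c) and none of the low c bits are demanded,
// return x; otherwise return n unchanged.
inline const Node *simplifyDemandedBits(const Node *n, uint32_t demanded) {
  if (n->opc != Opc::Shl)
    return n;
  const Node *inner = n->operand;
  if (inner->opc != Opc::Lshr || inner->amount != n->amount)
    return n;
  // Bits that the shift pair forces to zero.
  uint32_t zeroedBits = (uint32_t{1} << n->amount) - 1u;
  if ((zeroedBits & demanded) != 0)
    return n; // some zeroed bit is observed, so the pair must stay
  return inner->operand;
}
```

With a leaf x and the tree (Shl (Lshr x, 16), 16), a demanded mask of 0xFFFF0000 lets the simplifier return x directly, while a mask of 0xFFFFFFFF leaves the tree untouched because the cleared low bits would be visible to the user.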

Harbormaster completed remote builds in B105035: Diff 346164. May 18 2021, 10:45 AM

Thanks. I'm glad this way worked.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14561	Perhaps performVectorShiftCombine
14564	This probably isn't needed, or could be an assert.
14574	I'm not sure these need casts, or to be uint64_t. They should both be fairly small.
14577	Most other uses of this function that I see seem to use: if (TLI.SimplifyDemandedBits(..)) return SDValue(N, 0); It may not alter much, but will be closer to what DAGCombiner::combine expects the return value to be for something that changed.
llvm/test/CodeGen/AArch64/aarch64-bswap-ext.ll
3	This probably doesn't need -O2

Applied CR comments.

Thanks. LGTM

This revision is now accepted and ready to land. May 19 2021, 10:31 AM

Harbormaster completed remote builds in B105263: Diff 346497. May 19 2021, 11:35 AM

Closed by commit rGa647100b4320: [AArch64] Combine vector shift instructions in SelectionDAG (authored by asavonic). May 20 2021, 1:00 AM

This revision was automatically updated to reflect the committed changes.

asavonic added a commit: rGa647100b4320: [AArch64] Combine vector shift instructions in SelectionDAG.

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	54 lines
llvm/test/CodeGen/AArch64/aarch64-bswap-ext.ll	27 lines

Diff 344839

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[14,552 unchanged lines not shown]
     if (ExtPg == Pg && ExtFromEVT == MVT::i32) {
       return DAG.getNode(NewOpc, DL, {ResVT, MVT::Other},
                          {Chain, Pg, Base, UnextendedOffset, Ty});
     }
   }

   return SDValue();
 }

+static int64_t getShiftImm(SDNode *Shift) {
+  if (auto *Imm = dyn_cast<ConstantSDNode>(Shift->getOperand(1)))
+    return Imm->getSExtValue();
+  return 0;
+}
+
+static SDValue performShiftCombine(SDNode *N, SelectionDAG &DAG) {
+  // Match ({VASHR|VLSHR} (VSHL (VLSHR Op X) X) X)
+  // This can be folded to just ({VASHR|VLSHR} Op X)
+  SDNode *Shift1 = N;
+  unsigned Shift1Opc = Shift1->getOpcode();
+  if (Shift1Opc != AArch64ISD::VASHR && Shift1Opc != AArch64ISD::VLSHR)
+    return SDValue();
+
+  SDNode *Shift2 = Shift1->getOperand(0).getNode();
+  unsigned Shift2Opc = Shift2->getOpcode();
+  if (Shift2Opc != AArch64ISD::VSHL)
+    return SDValue();
+
+  SDNode *Shift3 = Shift2->getOperand(0).getNode();
+  unsigned Shift3Opc = Shift3->getOpcode();
+  if (Shift3Opc != AArch64ISD::VLSHR)
+    return SDValue();
+
+  // Check that all instructions shift by the same number of bits.
+  if (int64_t Shift1Imm = getShiftImm(Shift1)) {
+    // Negative shifts are not supported.
+    if (Shift1Imm < 0)
+      return SDValue();
+    if (Shift1Imm != getShiftImm(Shift2) || Shift1Imm != getShiftImm(Shift3))
+      return SDValue();
+  } else {
+    // Shift by a non-constant or zero
+    return SDValue();
+  }
+
+  // Check that there are no other uses of inner shift instructions
+  for (SDNode *User : Shift2->uses()) {
+    if (User != Shift1)
+      return SDValue();
+  }
+  for (SDNode *User : Shift3->uses()) {
+    if (User != Shift2)
+      return SDValue();
+  }
+
+  SDValue Ops[] = {Shift3->getOperand(0), Shift1->getOperand(1)};
+  return DAG.getNode(Shift1->getOpcode(), SDLoc(Shift1),
+                     Shift1->getValueType(0), Ops);
+}

 /// Target-specific DAG combine function for post-increment LD1 (lane) and
 /// post-increment LD1R.
 static SDValue performPostLD1Combine(SDNode *N,
                                      TargetLowering::DAGCombinerInfo &DCI,
                                      bool IsLaneOp) {
   if (DCI.isBeforeLegalizeOps())
     return SDValue();

[1,460 unchanged lines not shown] SDValue AArch64TargetLowering::PerformDAGCombine(SDNode *N,
   case AArch64ISD::GLD1S_MERGE_ZERO:
   case AArch64ISD::GLD1S_SCALED_MERGE_ZERO:
   case AArch64ISD::GLD1S_UXTW_MERGE_ZERO:
   case AArch64ISD::GLD1S_SXTW_MERGE_ZERO:
   case AArch64ISD::GLD1S_UXTW_SCALED_MERGE_ZERO:
   case AArch64ISD::GLD1S_SXTW_SCALED_MERGE_ZERO:
   case AArch64ISD::GLD1S_IMM_MERGE_ZERO:
     return performGLD1Combine(N, DAG);
+  case AArch64ISD::VASHR:
+  case AArch64ISD::VLSHR:
+    return performShiftCombine(N, DAG);
   case ISD::INSERT_VECTOR_ELT:
     return performInsertVectorEltCombine(N, DCI);
   case ISD::EXTRACT_VECTOR_ELT:
     return performExtractVectorEltCombine(N, DAG);
   case ISD::VECREDUCE_ADD:
     return performVecReduceAddCombine(N, DCI.DAG, Subtarget);
   case ISD::INTRINSIC_VOID:
   case ISD::INTRINSIC_W_CHAIN:
[1,682 unchanged lines not shown]

llvm/test/CodeGen/AArch64/aarch64-bswap-ext.ll

This file was added.

+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -O2 -mtriple=aarch64-unknown-linux-gnu < %s | FileCheck %s
+
+define <2 x i32> @test1(<2 x i16> %v2i16) {
+; CHECK-LABEL: test1:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    rev32 v0.8b, v0.8b
+; CHECK-NEXT:    sshr v0.2s, v0.2s, #16
+; CHECK-NEXT:    ret
+  %v2i16_rev = call <2 x i16> @llvm.bswap.v2i16(<2 x i16> %v2i16)
+  %v2i32 = sext <2 x i16> %v2i16_rev to <2 x i32>
+  ret <2 x i32> %v2i32
+}
+
+define <2 x float> @test2(<2 x i16> %v2i16) {
+; CHECK-LABEL: test2:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    rev32 v0.8b, v0.8b
+; CHECK-NEXT:    sshr v0.2s, v0.2s, #16
+; CHECK-NEXT:    scvtf v0.2s, v0.2s
+; CHECK-NEXT:    ret
+  %v2i16_rev = call <2 x i16> @llvm.bswap.v2i16(<2 x i16> %v2i16)
+  %v2f32 = sitofp <2 x i16> %v2i16_rev to <2 x float>
+  ret <2 x float> %v2f32
+}
+
+declare <2 x i16> @llvm.bswap.v2i16(<2 x i16>) nounwind readnone

[AArch64] Combine shift instructions in SelectionDAG (Closed, Public)
