This is an archive of the discontinued LLVM Phabricator instance.

Paths

- llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
- llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
- llvm/test/CodeGen/AArch64/sve-extract-subvector.ll

Differential D82910

[CodeGen][SVE] Don't drop scalable flag in DAGCombiner::visitEXTRACT_SUBVECTOR
Closed · Public

Authored by sdesmalen on Jun 30 2020, 1:19 PM.

Details

Reviewers

david-arm
efriedma
spatel

Commits

rG143e324e7501: [CodeGen][SVE] Don't drop scalable flag in DAGCombiner::visitEXTRACT_SUBVECTOR

Summary

There was a rogue 'assert' (in fact a report_fatal_error call) in AArch64ISelLowering
for the tuple.get intrinsics that shouldn't really have been there (I suspect it was
a remnant from when we expected the wider vector always to have come from a vector CONCAT).
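To illustrate why that check was fragile, here is a minimal C++ model. MockNode, removedCheckRejects and tupleGetStartLane are hypothetical names for illustration, not LLVM APIs. The removed condition compared the tuple index against Src1->getNumOperands() - 1, which bounds the index only when Src1 is a CONCAT_VECTORS node whose operands are the tuple parts. If the wide vector instead comes from a single-operand node such as an ISD::BITCAST (as it presumably does in the new sve-extract-subvector.ll tests), any index of 1 or more is rejected, even though the EXTRACT_SUBVECTOR at IdxConst * NumLanes is valid.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the one SDNode property the removed check read:
// the number of operands of the node that produces the wide tuple vector.
struct MockNode {
  unsigned NumOperands;
};

// Mirrors the removed condition `IdxConst > Src1->getNumOperands() - 1`.
// It only bounds the index when the node is a CONCAT_VECTORS whose operands
// are the individual tuple parts.
bool removedCheckRejects(uint64_t IdxConst, const MockNode &Src1) {
  assert(Src1.NumOperands > 0 && "avoid unsigned wrap-around in this model");
  return IdxConst > Src1.NumOperands - 1;
}

// What the lowering keeps: tuple part IdxConst starts at lane
// IdxConst * NumLanes, NumLanes being the minimum lane count of one part.
uint64_t tupleGetStartLane(uint64_t IdxConst, uint64_t NumLanes) {
  return IdxConst * NumLanes;
}
```

With a two-operand concat, index 1 passes; with a one-operand source the same index is rejected, although the extract it lowers to (starting at lane 16 for an nxv16i8 part) is fine.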

When I tried to create a more minimal reproducer, I found a bug in
DAGCombiner where it drops the scalable flag when trying to fold:

extract_subv (bitcast X), Index --> bitcast (extract_subv X, Index')

This patch fixes both issues.
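A minimal sketch of how the flag gets lost, assuming a simplified stand-in for llvm::ElementCount (EC, beforePatch and afterPatch are hypothetical names for illustration). It follows the first new test in sve-extract-subvector.ll: extracting nxv16i8 from a <vscale x 4 x i64> value bitcast to <vscale x 32 x i8>. The lane ratio is 32 / 4 = 8, so the new extract should produce 16 / 8 = 2 lanes of i64, i.e. nxv2i64; rebuilding the type from a plain unsigned count yields the fixed-length v2i64 instead.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for llvm::ElementCount: a minimum lane
// count and a flag saying whether the real count is Min * vscale.
struct EC {
  unsigned Min;
  bool Scalable;
};

// Scaling keeps the flag, which is what the patch relies on when it writes
// NVT.getVectorElementCount() * SrcDestRatio and / DestSrcRatio.
EC operator*(EC A, unsigned R) { return EC{A.Min * R, A.Scalable}; }
EC operator/(EC A, unsigned R) {
  assert(R != 0 && A.Min % R == 0 && "the combine checks divisibility first");
  return EC{A.Min / R, A.Scalable};
}

// Before the patch: only the lane count survived, and rebuilding the type
// from a plain unsigned always produced a fixed-length vector.
EC beforePatch(EC NVT, unsigned DestSrcRatio) {
  unsigned NewExtNumElts = NVT.Min / DestSrcRatio; // scalable flag lost here
  return EC{NewExtNumElts, /*Scalable=*/false};
}

// After the patch: the element count itself is divided, so the flag is kept.
EC afterPatch(EC NVT, unsigned DestSrcRatio) { return NVT / DestSrcRatio; }
```

The multiply branch is analogous: in the second test, NVT = nxv2i64 scaled by 32 / 4 = 8 gives nxv16i8 only if the flag travels with the count.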

Diff Detail

Unit Tests: Failed

	Time	Test
	7,630 ms	linux > libomp.env::Unknown Unit Message ("")
	1,740 ms	linux > libomp.worksharing/for::Unknown Unit Message ("")

Event Timeline

sdesmalen created this revision. Jun 30 2020, 1:19 PM

Herald added a project: Restricted Project. Jun 30 2020, 1:19 PM

Herald added subscribers: steven.zhang, psnobl, rkruppe and 3 others.

efriedma added inline comments. Jun 30 2020, 1:51 PM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
19221	While we're here, do we need to change these getVectorNumElements() calls to getVectorElementCount()?
19238	Does this math work correctly if we're extracting a fixed vector from a scalable vector?

Harbormaster failed remote builds in B62396: Diff 274601! Jun 30 2020, 3:14 PM

Removed uses of getVectorNumElements in favour of getVectorMinNumElements
Removed another warning from IRTranslator, so the test can have a CHECK line for no warnings (coming from TypeSize)

sdesmalen added inline comments. Jul 1 2020, 8:29 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

19238

Yes. To be sure, I tested this with some intrinsic that maps to EXTRACT_SUBVECTOR:

define <2 x i64> @extract_2i64_nxv16i8(<vscale x 16 x i8> %z0) {
  %z0_bc = bitcast <vscale x 16 x i8> %z0 to <vscale x 2 x i64>
  %ext = call <2 x i64> @llvm.experimental.extractsubvec.v2i64.nxv2i64(<vscale x 2 x i64> %z0_bc, i32 2)
  ret <2 x i64> %ext
}
=>
Optimized lowered selection DAG: %bb.0 'extract_2i64_nxv16i8:'
SelectionDAG has 9 nodes:
  t0: ch = EntryToken
        t2: nxv16i8,ch = CopyFromReg t0, Register:nxv16i8 %0
      t10: v16i8 = extract_subvector t2, Constant:i64<16>
    t11: v2i64 = bitcast t10
  t7: ch,glue = CopyToReg t0, Register:v2i64 $q0, t11
  t8: ch = AArch64ISD::RET_FLAG t7, Register:v2i64 $q0, t7:1

and for the other:

define <16 x i8> @extract_16i8_nxv2i64(<vscale x 2 x i64> %z0) {
  %z0_bc = bitcast <vscale x 2 x i64> %z0 to <vscale x 16 x i8>
  %ext = call <16 x i8> @llvm.experimental.extractsubvec.v16i8.nxv16i8(<vscale x 16 x i8> %z0_bc, i32 16)
  ret <16 x i8> %ext
}
=>
Optimized lowered selection DAG: %bb.0 'extract_16i8_nxv2i64:'
SelectionDAG has 9 nodes:
  t0: ch = EntryToken
        t2: nxv2i64,ch = CopyFromReg t0, Register:nxv2i64 %0
      t10: v2i64 = extract_subvector t2, Constant:i64<2>
    t11: v16i8 = bitcast t10
  t7: ch,glue = CopyToReg t0, Register:v16i8 $q0, t11
  t8: ch = AArch64ISD::RET_FLAG t7, Register:v16i8 $q0, t7:1

I wasn't planning to add that intrinsic as part of this patch just to test the behaviour.
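The two DAGs above can be checked by hand with a small model of the fold's arithmetic. This is a sketch under simplifying assumptions: VT, Rewrite and planFold are hypothetical names, not DAGCombiner code, and the legality checks and the EXTRACT_VECTOR_ELT fallback are left out. Like the patched code, it derives the new lane count from the extracted type's element count, so the new extract inherits that type's scalable flag (fixed here, because the extracted types are v2i64 and v16i8).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of a vector type (not llvm::EVT): lane
// count (the minimum when scalable), scalable flag, element width in bits.
struct VT {
  unsigned Lanes;
  bool Scalable;
  unsigned EltBits;
};

// Proposed rewrite: extract NewVT from X at lane NewIdx, then bitcast back.
struct Rewrite {
  bool Ok; // false if the fold does not apply
  VT NewVT;
  uint64_t NewIdx;
};

// Sketch of the type/index arithmetic of
//   extract_subv (bitcast X), Index --> bitcast (extract_subv X, Index')
// Src is the type of X, Dest the type of (bitcast X), NVT the extracted type.
Rewrite planFold(VT Src, VT Dest, VT NVT, uint64_t ExtIdx) {
  assert(Src.Lanes != 0 && Dest.Lanes != 0 && NVT.Lanes != 0);
  if (Src.Lanes % Dest.Lanes == 0) {
    // X has narrower elements: scale the lane count and the index up.
    unsigned Ratio = Src.Lanes / Dest.Lanes;
    return Rewrite{true, VT{NVT.Lanes * Ratio, NVT.Scalable, Src.EltBits},
                   ExtIdx * Ratio};
  }
  if (Dest.Lanes % Src.Lanes == 0) {
    // X has wider elements: scale down, which needs exact division.
    unsigned Ratio = Dest.Lanes / Src.Lanes;
    if (NVT.Lanes % Ratio == 0 && ExtIdx % Ratio == 0)
      return Rewrite{true, VT{NVT.Lanes / Ratio, NVT.Scalable, Src.EltBits},
                     ExtIdx / Ratio};
  }
  return Rewrite{false, VT{0, false, 0}, 0};
}
```

For extract_2i64_nxv16i8 it produces a v16i8 extract at index 16, and for extract_16i8_nxv2i64 a v2i64 extract at index 2, matching node t10 in each dump.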

efriedma added inline comments. Jul 1 2020, 12:11 PM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
19238	I'm specifically concerned about cases where the number of lanes in the output fixed vector is greater than the number of lanes in the input scalable vector.

LGTM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
19238	Actually, hmm, I think that's fine; the math doesn't care about the total number of elements in the output.
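Restated in notation introduced here (S, D, N, I and r are our labels, not names from the patch), the rewrite only needs the lane ratio between X and (bitcast X) plus divisibility of the extracted lane count and the index; no condition compares N with S. N' takes its scalable flag from the extracted type. The last row is extract_16i8_nxv2i64, the case raised above, where the output has more lanes (16) than the scalable input's minimum (2).

```latex
% S = lanes of X, D = lanes of (bitcast X), N = lanes of the extracted
% vector, I = extract index; N' and I' describe the new extract on X.
\begin{aligned}
S \bmod D = 0 &: \quad r = S/D, \quad N' = N r, \quad I' = I r \\
D \bmod S = 0 &: \quad r = D/S, \quad N' = N/r, \quad I' = I/r
  \quad (\text{requires } r \mid N \text{ and } r \mid I) \\
S = 2,\ D = 16,\ N = 16,\ I = 16 &: \quad r = 8, \quad N' = 2, \quad I' = 2
\end{aligned}
```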

This revision is now accepted and ready to land. Jul 1 2020, 12:16 PM

Closed by commit rG143e324e7501: [CodeGen][SVE] Don't drop scalable flag in DAGCombiner::visitEXTRACT_SUBVECTOR (authored by sdesmalen). Jul 2 2020, 2:38 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

	Path	Size
	llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	10 lines
	llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	3 lines
	llvm/test/CodeGen/AArch64/sve-extract-subvector.ll	29 lines

Diff 274601

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Context: SDValue DAGCombiner::visitEXTRACT_SUBVECTOR(SDNode *N) {

   // Try to move vector bitcast after extract_subv by scaling extraction index:
   // extract_subv (bitcast X), Index --> bitcast (extract_subv X, Index')
   if (V.getOpcode() == ISD::BITCAST &&
       V.getOperand(0).getValueType().isVector()) {
     SDValue SrcOp = V.getOperand(0);
     EVT SrcVT = SrcOp.getValueType();
     unsigned SrcNumElts = SrcVT.getVectorNumElements();
     unsigned DestNumElts = V.getValueType().getVectorNumElements();
     if ((SrcNumElts % DestNumElts) == 0) {
       unsigned SrcDestRatio = SrcNumElts / DestNumElts;
-      unsigned NewExtNumElts = NVT.getVectorNumElements() * SrcDestRatio;
+      ElementCount NewExtEC = NVT.getVectorElementCount() * SrcDestRatio;
       EVT NewExtVT = EVT::getVectorVT(*DAG.getContext(), SrcVT.getScalarType(),
-                                      NewExtNumElts);
+                                      NewExtEC);
	Lint: Pre-merge checks: clang-format: please reformat the code as:
	      EVT NewExtVT =
	          EVT::getVectorVT(*DAG.getContext(), SrcVT.getScalarType(), NewExtEC);
       if (TLI.isOperationLegalOrCustom(ISD::EXTRACT_SUBVECTOR, NewExtVT)) {
         SDLoc DL(N);
         SDValue NewIndex = DAG.getVectorIdxConstant(ExtIdx * SrcDestRatio, DL);
         SDValue NewExtract = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, NewExtVT,
                                          V.getOperand(0), NewIndex);
         return DAG.getBitcast(NVT, NewExtract);
       }
     }
     if ((DestNumElts % SrcNumElts) == 0) {
       unsigned DestSrcRatio = DestNumElts / SrcNumElts;
       if ((NVT.getVectorNumElements() % DestSrcRatio) == 0) {
-        unsigned NewExtNumElts = NVT.getVectorNumElements() / DestSrcRatio;
+        ElementCount NewExtEC = NVT.getVectorElementCount() / DestSrcRatio;
         EVT ScalarVT = SrcVT.getScalarType();
         if ((ExtIdx % DestSrcRatio) == 0) {
           SDLoc DL(N);
           unsigned IndexValScaled = ExtIdx / DestSrcRatio;
           EVT NewExtVT =
-              EVT::getVectorVT(*DAG.getContext(), ScalarVT, NewExtNumElts);
+              EVT::getVectorVT(*DAG.getContext(), ScalarVT, NewExtEC);
           if (TLI.isOperationLegalOrCustom(ISD::EXTRACT_SUBVECTOR, NewExtVT)) {
             SDValue NewIndex = DAG.getVectorIdxConstant(IndexValScaled, DL);
             SDValue NewExtract =
                 DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, NewExtVT,
                             V.getOperand(0), NewIndex);
             return DAG.getBitcast(NVT, NewExtract);
           }
-          if (NewExtNumElts == 1 &&
+          if (NewExtEC == 1 &&
               TLI.isOperationLegalOrCustom(ISD::EXTRACT_VECTOR_ELT, ScalarVT)) {
             SDValue NewIndex = DAG.getVectorIdxConstant(IndexValScaled, DL);
             SDValue NewExtract =
                 DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, ScalarVT,
                             V.getOperand(0), NewIndex);
             return DAG.getBitcast(NVT, NewExtract);
           }
         }

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

Context: case Intrinsic::aarch64_sve_st1_scatter_scalar_offset:

     return performScatterStoreCombine(N, DAG, AArch64ISD::SST1_IMM_PRED);
   case Intrinsic::aarch64_sve_tuple_get: {
     SDLoc DL(N);
     SDValue Chain = N->getOperand(0);
     SDValue Src1 = N->getOperand(2);
     SDValue Idx = N->getOperand(3);

     uint64_t IdxConst = cast<ConstantSDNode>(Idx)->getZExtValue();
-    if (IdxConst > Src1->getNumOperands() - 1)
-      report_fatal_error("index larger than expected");
-
     EVT ResVT = N->getValueType(0);
     uint64_t NumLanes = ResVT.getVectorElementCount().Min;
     SDValue Val =
         DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, ResVT, Src1,
                     DAG.getConstant(IdxConst * NumLanes, DL, MVT::i32));
     return DAG.getMergeValues({Val, Chain}, DL);
   }
   case Intrinsic::aarch64_sve_tuple_set: {

llvm/test/CodeGen/AArch64/sve-extract-subvector.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s

; Test that DAGCombiner doesn't drop the scalable flag when it tries to fold:
; extract_subv (bitcast X), Index --> bitcast (extract_subv X, Index')

define <vscale x 16 x i8> @extract_nxv16i8_nxv4i64(<vscale x 4 x i64> %z0_z1) {
; CHECK-LABEL: extract_nxv16i8_nxv4i64:
; CHECK: // %bb.0:
; CHECK-NEXT: mov z0.d, z1.d
; CHECK-NEXT: ret
  %z0_z1_bc = bitcast <vscale x 4 x i64> %z0_z1 to <vscale x 32 x i8>
  %ext = call <vscale x 16 x i8> @llvm.aarch64.sve.tuple.get.nxv32i8(<vscale x 32 x i8> %z0_z1_bc, i32 1)
  ret <vscale x 16 x i8> %ext
}


define <vscale x 2 x i64> @extract_nxv2i64_nxv32i8(<vscale x 32 x i8> %z0_z1) {
; CHECK-LABEL: extract_nxv2i64_nxv32i8:
; CHECK: // %bb.0:
; CHECK-NEXT: mov z0.d, z1.d
; CHECK-NEXT: ret
  %z0_z1_bc = bitcast <vscale x 32 x i8> %z0_z1 to <vscale x 4 x i64>
  %ext = call <vscale x 2 x i64> @llvm.aarch64.sve.tuple.get.nxv4i64(<vscale x 4 x i64> %z0_z1_bc, i32 1)
  ret <vscale x 2 x i64> %ext
}

declare <vscale x 2 x i64> @llvm.aarch64.sve.tuple.get.nxv4i64(<vscale x 4 x i64>, i32)
declare <vscale x 16 x i8> @llvm.aarch64.sve.tuple.get.nxv32i8(<vscale x 32 x i8>, i32)
