This is an archive of the discontinued LLVM Phabricator instance.

Differential D97459

[CodeGen] Fix issues with subvector intrinsic index types
ClosedPublic

Authored by frasercrmck on Feb 25 2021, 3:39 AM.

Details

Reviewers

craig.topper
david-arm
sdesmalen
efriedma
kmclaughlin
rengolin

Commits

rG6718fda6ada8: [CodeGen] Fix issues with subvector intrinsic index types

Summary

This patch addresses issues arising from the fact that the index type
used for subvector insertion/extraction is inconsistent between the
intrinsics and SDNodes. The intrinsic forms require i64 whereas the
SDNodes use the type returned by SelectionDAG::getVectorIdxTy.

Rather than update the intrinsic definitions to use an overloaded index
type, this patch fixes the issue by transforming the index to the
correct type as required. Any loss of index bits going from i64 to a
smaller type is unexpected, and will be caught by an assertion in
SelectionDAG::getVectorIdxConstant.
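As a rough illustration of the conversion described above (not code from the patch), the sketch below narrows a 64-bit index to a target index width and asserts that no set bits are dropped. The helper names `fitsIndexType` and `toTargetIndex` are hypothetical and exist only for this example; the real check lives in SelectionDAG::getVectorIdxConstant as stated above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; these are not LLVM APIs.

// Returns true if Index is representable in an unsigned index type that is
// IdxBits wide (1 <= IdxBits <= 64).
inline bool fitsIndexType(uint64_t Index, unsigned IdxBits) {
  assert(IdxBits >= 1 && IdxBits <= 64 && "unsupported index width");
  if (IdxBits == 64)
    return true;
  return (Index >> IdxBits) == 0;
}

// Converts an i64 intrinsic index to the (possibly narrower) target index
// width. Dropping set bits is treated as a bug and trips the assertion.
inline uint64_t toTargetIndex(uint64_t Index, unsigned IdxBits) {
  assert(fitsIndexType(Index, IdxBits) && "index bits lost in conversion");
  return Index;
}
```

With a 32-bit index type, an index of 8 passes through unchanged, while an index of 2^32 would trip the assertion in a build with assertions enabled.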

The patch also updates the documentation for INSERT_SUBVECTOR and adds
an assertion to its creation to bring it in line with EXTRACT_SUBVECTOR.
This necessitated changes to AArch64 which was using i64 for
EXTRACT_SUBVECTOR but i32 for INSERT_SUBVECTOR. Only one test changed
its codegen after updating the backend accordingly.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

frasercrmck created this revision. Feb 25 2021, 3:39 AM

Herald added subscribers: luismarques, apazos, sameer.abuasal and 20 others. Feb 25 2021, 3:39 AM

frasercrmck requested review of this revision. Feb 25 2021, 3:39 AM

Herald added a project: Restricted Project. Feb 25 2021, 3:39 AM

Herald added subscribers: llvm-commits, MaskRay.

Btw, there's a line in AArch64InstrInfo.td which might now be redundant: `defm : InsertSubvectorUndef<i32>;`, since I take it that with this patch there won't be any i32 indices being thrown around.

Harbormaster completed remote builds in B90789: Diff 326343. Feb 25 2021, 3:56 AM

Hi @frasercrmck thanks for fixing this, this seems like an oversight when adding the intrinsic.

llvm/include/llvm/CodeGen/ISDOpcodes.h
1049–1050	nit: unrelated change.
llvm/test/CodeGen/AArch64/vecreduce-and-legalization.ll
104	Probably not something that's caused by your patch, but I would have expected elements `v1.b[12]`, `v1.b[14]` and `v1.b[15]` to also be set to `-1`?

frasercrmck added inline comments. Feb 25 2021, 4:44 AM

llvm/include/llvm/CodeGen/ISDOpcodes.h

1049–1050

True. It's something arcanist prompted me about when creating the patch. Should probably just be pre-committed?

llvm/test/CodeGen/AArch64/vecreduce-and-legalization.ll

104

Perhaps. I'll admit I don't know much about AArch64 but that makes sense. My patch has taken us from:

t0: ch = EntryToken
t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %0
t13: v16i8 = insert_vector_elt t2, Constant:i32<-1>, Constant:i64<9>
t15: v16i8 = insert_vector_elt t13, Constant:i32<-1>, Constant:i64<10>
t17: v16i8 = insert_vector_elt t15, Constant:i32<-1>, Constant:i64<11>
t19: v16i8 = insert_vector_elt t17, Constant:i32<-1>, Constant:i64<12>
t21: v16i8 = insert_vector_elt t19, Constant:i32<-1>, Constant:i64<13>
t29: v8i8 = extract_subvector t21, Constant:i64<0>
t84: v16i8 = insert_subvector undef:v16i8, t29, Constant:i32<0>
t86: i32 = extract_vector_elt t84, Constant:i64<6>
// and other extracts from t84 of indices < 8
// use t21 elsewhere

to:

t0: ch = EntryToken
t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %0
t13: v16i8 = insert_vector_elt t2, Constant:i32<-1>, Constant:i64<9>
t15: v16i8 = insert_vector_elt t13, Constant:i32<-1>, Constant:i64<10>
t17: v16i8 = insert_vector_elt t15, Constant:i32<-1>, Constant:i64<11>
t19: v16i8 = insert_vector_elt t17, Constant:i32<-1>, Constant:i64<12>
t21: v16i8 = insert_vector_elt t19, Constant:i32<-1>, Constant:i64<13>
t86: i32 = extract_vector_elt t2, Constant:i64<6>
// and other extracts from t2 of indices < 8
// use t21 elsewhere

Which I think is a good thing, since there was previously a redundant extract/insert going on, as well as a dependency on a vector (t21) whose elements that differ from t2 weren't demanded.

Now as to *why* this is happening, there's this code in SelectionDAG.cpp when creating an ISD::INSERT_SUBVECTOR:

// If this is an insert of an extracted vector into an undef vector, we
// can just use the input to the extract.
if (N1.isUndef() && N2.getOpcode() == ISD::EXTRACT_SUBVECTOR &&
    N2.getOperand(1) == N3 && N2.getOperand(0).getValueType() == VT)
  return N2.getOperand(0);

And I think it was here that `N2.getOperand(1) == N3` failed, because the insertion and extraction were mixing i32 and i64 index types.
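To make the failure mode concrete, here is a toy model (an assumption for illustration, not the SelectionDAG data structures): a constant is identified by both its value and the width of its type, so an i32 zero and an i64 zero compare unequal and the fold is skipped. `TypedConstant` and `canFoldInsertOfExtract` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a constant index operand: a value plus the bit width of
// its type. Hypothetical; not an LLVM class.
struct TypedConstant {
  uint64_t Value;
  unsigned Bits;
};

// Two constants are the same operand only if value and type width both match.
inline bool operator==(const TypedConstant &A, const TypedConstant &B) {
  return A.Value == B.Value && A.Bits == B.Bits;
}

// Mirrors the shape of the quoted condition: an insert into undef of an
// extract can be replaced by the extract's source only when both indices are
// the same operand and the source type equals the result type.
inline bool canFoldInsertOfExtract(bool InsertBaseIsUndef,
                                   TypedConstant ExtractIdx,
                                   TypedConstant InsertIdx,
                                   bool SourceTypeMatchesResult) {
  return InsertBaseIsUndef && ExtractIdx == InsertIdx &&
         SourceTypeMatchesResult;
}
```

In this model, an i64 extract index of 0 paired with an i32 insert index of 0 does not fold, whereas two i64 zeros do, which matches the before/after DAGs shown earlier.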

frasercrmck added inline comments. Feb 25 2021, 4:46 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
8018	I've just noticed that you may now be able to use `SelectionDAG::WidenVector` here? That's definitely outside the scope of this patch, though.

frasercrmck added inline comments. Feb 25 2021, 5:06 AM

llvm/test/CodeGen/AArch64/vecreduce-and-legalization.ll
104	Why the insert to element `12` is being removed, I'm not sure.

Replacing.2 t19: v16i8 = insert_vector_elt t17, Constant:i32<-1>, Constant:i64<12>
With: t17: v16i8 = insert_vector_elt t15, Constant:i32<-1>, Constant:i64<11>

That's pretty odd, but then again there's nothing in elements 8, 14, or 15. It seems to be an extension of what is happening to the other 3 elements before my patch:

Replacing.2 t25: v16i8 = insert_vector_elt t23, Constant:i32<-1>, Constant:i64<15>
With: t23: v16i8 = insert_vector_elt t21, Constant:i32<-1>, Constant:i64<14>

Since the test is doing `v8i8 and` of the low half of a vector with this upper half filled with `-1`s, presumably it can tell at some point that it's redundant and it can just use the low half. I don't know why it doesn't remove all of the `-1`s though.

frasercrmck mentioned this in D97428: [RISCV] Add intrinsics for vlmul_ext and vlmul_trunc. Feb 25 2021, 6:54 AM

That one test is either already broken, or I'm missing some important detail. In any case, the changes in this patch look good to me.

llvm/test/CodeGen/AArch64/vecreduce-and-legalization.ll

104

If I visualise what SelectionDAG was/is trying to do here:

reduce.and(<e0, e1, e2, e3, e4, e5, e6, e7, e8, ??, ??, ??, ??, ??, ??, ??>)
                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^
                                           must each be -1 for the AND reduction
=>
           <e0, e1, e2, e3, e4, e5, e6, e7, e8, -1, -1, -1, -1, -1, -1, -1>

=> this is somehow optimized to:
           <e0, e1, e2, e3, e4, e5, e6, e7, e8, -1, -1, -1, ??, -1, ??, ??>

it then extracts the Lo/Hi parts of the vector:
           Lo = <e0, e1, e2, e3, e4, e5, e6, e7>
           Hi = <e8, -1, -1, -1, ??, -1, ??, ??>

Performs a vector-AND:
       AND(<e0, e1, e2, e3, e4, e5, e6, e7>,
           <e8, -1, -1, -1, ??, -1, ??, ??>)
       <=>
           <aa, bb, cc, dd, ??, ff, ??, ??>

And then does an element-wise reduction:
  ...
    AND(??,       
      AND(dd,
        AND(cc,
          AND(aa, bb)

The ?? elements could still be anything, so if any of these elements is 0, then the reduction result is also 0. So unless I'm missing something, there seems to be something broken here, already before your patch.
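The concern can be reproduced with a small standalone model (a sketch under the assumption that padding lanes are plain bytes; `padLanes` and `reduceAnd` are hypothetical helpers, not LLVM code). Padding with -1 (0xFF) is neutral for an AND reduction, while any other value in a padding lane can change the result.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Lanes = std::array<uint8_t, 16>;

// Overwrites lanes [NumValid, 16) with Pad, modelling the widening of a
// short vector (9 valid lanes, as in test_v9i8) to a legal 16-lane type.
inline Lanes padLanes(Lanes V, std::size_t NumValid, uint8_t Pad) {
  assert(NumValid <= V.size() && "more valid lanes than the vector holds");
  for (std::size_t I = NumValid; I < V.size(); ++I)
    V[I] = Pad;
  return V;
}

// AND-reduces every lane, including the padding lanes.
inline uint8_t reduceAnd(const Lanes &V) {
  uint8_t Acc = 0xFF;
  for (uint8_t Lane : V)
    Acc &= Lane;
  return Acc;
}
```

If the nine valid lanes reduce to some value, padding with 0xFF keeps that value, whereas a padding lane that happens to hold 0 forces the whole reduction to 0. That is the situation the `??` lanes above would allow.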

The inserts of -1 into element 14 and 15 were removed by https://github.com/llvm/llvm-project/commit/fc2b4a02b1a82c40ac1459cd15b9911ebfc78acc.

The output from SelectionDAG before this patch:

Vector/type-legalized selection DAG: %bb.0 'test_v9i8:'
SelectionDAG has 48 nodes:
  t0: ch = EntryToken
                t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %0
              t13: v16i8 = insert_vector_elt t2, Constant:i32<-1>, Constant:i64<9>
            t15: v16i8 = insert_vector_elt t13, Constant:i32<-1>, Constant:i64<10>
          t17: v16i8 = insert_vector_elt t15, Constant:i32<-1>, Constant:i64<11>
        t19: v16i8 = insert_vector_elt t17, Constant:i32<-1>, Constant:i64<12>
      t21: v16i8 = insert_vector_elt t19, Constant:i32<-1>, Constant:i64<13>
    t23: v16i8 = insert_vector_elt t21, Constant:i32<-1>, Constant:i64<14>
  t25: v16i8 = insert_vector_elt t23, Constant:i32<-1>, Constant:i64<15>
                  t56: i32 = extract_vector_elt t32, Constant:i64<0>
                  t57: i32 = extract_vector_elt t32, Constant:i64<1>
                t58: i32 = and t56, t57
                t59: i32 = extract_vector_elt t32, Constant:i64<2>
              t60: i32 = and t58, t59
              t61: i32 = extract_vector_elt t32, Constant:i64<3>
            t62: i32 = and t60, t61
            t63: i32 = extract_vector_elt t32, Constant:i64<4>
          t64: i32 = and t62, t63
          t65: i32 = extract_vector_elt t32, Constant:i64<5>
        t66: i32 = and t64, t65
        t67: i32 = extract_vector_elt t32, Constant:i64<6>
      t68: i32 = and t66, t67
      t69: i32 = extract_vector_elt t32, Constant:i64<7>
    t70: i32 = and t68, t69
  t8: ch,glue = CopyToReg t0, Register:i32 $w0, t70
    t29: v8i8 = extract_subvector t25, Constant:i64<0>
    t31: v8i8 = extract_subvector t25, Constant:i64<8>
  t32: v8i8 = and t29, t31
  t9: ch = AArch64ISD::RET_FLAG t8, Register:i32 $w0, t8:1


Optimized vector-legalized selection DAG: %bb.0 'test_v9i8:'
SelectionDAG has 44 nodes:
  t0: ch = EntryToken
            t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %0
          t13: v16i8 = insert_vector_elt t2, Constant:i32<-1>, Constant:i64<9>
        t15: v16i8 = insert_vector_elt t13, Constant:i32<-1>, Constant:i64<10>
      t17: v16i8 = insert_vector_elt t15, Constant:i32<-1>, Constant:i64<11>
    t19: v16i8 = insert_vector_elt t17, Constant:i32<-1>, Constant:i64<12>
  t21: v16i8 = insert_vector_elt t19, Constant:i32<-1>, Constant:i64<13>
                  t56: i32 = extract_vector_elt t32, Constant:i64<0>
                  t57: i32 = extract_vector_elt t32, Constant:i64<1>
                t58: i32 = and t56, t57
                t59: i32 = extract_vector_elt t32, Constant:i64<2>
              t60: i32 = and t58, t59
              t61: i32 = extract_vector_elt t32, Constant:i64<3>
            t62: i32 = and t60, t61
            t75: i32 = extract_vector_elt t29, Constant:i64<4>
          t64: i32 = and t62, t75
          t65: i32 = extract_vector_elt t32, Constant:i64<5>
        t66: i32 = and t64, t65
        t72: i32 = extract_vector_elt t29, Constant:i64<6>
      t68: i32 = and t66, t72
      t71: i32 = extract_vector_elt t29, Constant:i64<7>
    t70: i32 = and t68, t71
  t8: ch,glue = CopyToReg t0, Register:i32 $w0, t70
  t29: v8i8 = extract_subvector t21, Constant:i64<0>
    t74: v8i8 = extract_subvector t21, Constant:i64<8>
  t32: v8i8 = and t29, t74
  t9: ch = AArch64ISD::RET_FLAG t8, Register:i32 $w0, t8:1

This revision is now accepted and ready to land. Feb 25 2021, 7:27 AM

sdesmalen added inline comments. Feb 25 2021, 7:44 AM

llvm/include/llvm/CodeGen/ISDOpcodes.h
1049–1050	Either that, or you can just ignore it.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
8018	Yes, you're probably right, good catch.

rebase on pre-committed unrelated format change

frasercrmck marked 2 inline comments as done. Feb 25 2021, 8:06 AM

Harbormaster completed remote builds in B90824: Diff 326395. Feb 25 2021, 9:19 AM

frasercrmck edited the summary of this revision. Feb 26 2021, 8:43 AM

rebase

Harbormaster completed remote builds in B91298: Diff 327054. Mar 1 2021, 2:32 AM

This revision was landed with ongoing or failed builds. Mar 1 2021, 2:34 AM

Closed by commit rG6718fda6ada8: [CodeGen] Fix issues with subvector intrinsic index types (authored by frasercrmck).

This revision was automatically updated to reflect the committed changes.

frasercrmck added a commit: rG6718fda6ada8: [CodeGen] Fix issues with subvector intrinsic index types.

frasercrmck mentioned this in D99655: [RISCV] Test llvm.experimental.vector.insert intrinsics on RV32. Mar 31 2021, 5:08 AM

Revision Contents

Path	Size
llvm/include/llvm/CodeGen/ISDOpcodes.h	4 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp	7 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp	15 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	2 lines
llvm/lib/Target/AArch64/AArch64InstrFormats.td	2 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.td	18 lines
llvm/test/CodeGen/AArch64/vecreduce-and-legalization.ll	12 lines
llvm/test/CodeGen/RISCV/rvv/extract-subvector.ll	1 line

Diff 326395

llvm/include/llvm/CodeGen/ISDOpcodes.h

@@ ... @@ enum NodeType {
   /// INSERT_SUBVECTOR(VECTOR1, VECTOR2, IDX) - Returns a vector with VECTOR2
   /// inserted into VECTOR1. IDX represents the starting element number at which
   /// VECTOR2 will be inserted. IDX must be a constant multiple of T's known
   /// minimum vector length. Let the type of VECTOR2 be T, then if T is a
   /// scalable vector, IDX is first scaled by the runtime scaling factor of T.
   /// The elements of VECTOR1 starting at IDX are overwritten with VECTOR2.
   /// Elements IDX through (IDX + num_elements(T) - 1) must be valid VECTOR1
   /// indices. If this condition cannot be determined statically but is false at
-  /// runtime, then the result vector is undefined.
+  /// runtime, then the result vector is undefined. The IDX parameter must be a
+  /// vector index constant type, which for most targets will be an integer
+  /// pointer type.
   ///
   /// This operation supports inserting a fixed-width vector into a scalable
   /// vector, but not the other way around.
   INSERT_SUBVECTOR,

   /// EXTRACT_SUBVECTOR(VECTOR, IDX) - Returns a subvector from VECTOR.
   /// Let the result type be T, then IDX represents the starting element number
   /// from which a subvector of type T is extracted. IDX must be a constant
@@ ... @@
   ADJUST_TRAMPOLINE,

   /// TRAP - Trapping instruction
   TRAP,

   /// DEBUGTRAP - Trap intended to get the attention of a debugger.
   DEBUGTRAP,

   /// UBSANTRAP - Trap with an immediate describing the kind of sanitizer
   /// failure.
    sdesmalen: nit: unrelated change.
    frasercrmck: True. It's something `arcanist` prompted me about when creating the patch. Should probably just be pre-committed?
    sdesmalen: Either that, or you can just ignore it.
   UBSANTRAP,

   /// PREFETCH - This corresponds to a prefetch intrinsic. The first operand
   /// is the chain. The other operands are the address to prefetch,
   /// read / write specifier, locality specifier and instruction / data cache
   /// specifier.
   PREFETCH,


llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

@@ ... @@ assert((VT.isScalableVector() != N1VT.isScalableVector() ||
             VT.getVectorMinNumElements() <= N1VT.getVectorMinNumElements()) &&
            "Extract subvector must be from larger vector to smaller vector!");
     assert(N2C && "Extract subvector index must be a constant");
     assert((VT.isScalableVector() != N1VT.isScalableVector() ||
             (VT.getVectorMinNumElements() + N2C->getZExtValue()) <=
                 N1VT.getVectorMinNumElements()) &&
            "Extract subvector overflow!");
     assert(N2C->getAPIntValue().getBitWidth() ==
-               TLI->getVectorIdxTy(getDataLayout())
-                   .getSizeInBits()
-                   .getFixedSize() &&
+               TLI->getVectorIdxTy(getDataLayout()).getFixedSizeInBits() &&
            "Constant index for EXTRACT_SUBVECTOR has an invalid size");

     // Trivial extraction.
     if (VT == N1VT)
       return N1;

     // EXTRACT_SUBVECTOR of an UNDEF is an UNDEF.
     if (N1.isUndef())
@@ ... @@ assert((VT.isScalableVector() != N2VT.isScalableVector() ||
            "Insert subvector must be from smaller vector to larger vector!");
     assert(isa<ConstantSDNode>(N3) &&
            "Insert subvector index must be constant");
     assert((VT.isScalableVector() != N2VT.isScalableVector() ||
             (N2VT.getVectorMinNumElements() +
              cast<ConstantSDNode>(N3)->getZExtValue()) <=
                 VT.getVectorMinNumElements()) &&
            "Insert subvector overflow!");
+    assert(cast<ConstantSDNode>(N3)->getAPIntValue().getBitWidth() ==
+               TLI->getVectorIdxTy(getDataLayout()).getFixedSizeInBits() &&
+           "Constant index for INSERT_SUBVECTOR has an invalid size");

     // Trivial insertion.
     if (VT == N2VT)
       return N2;

     // If this is an insert of an extracted vector into an undef vector, we
     // can just use the input to the extract.
     if (N1.isUndef() && N2.getOpcode() == ISD::EXTRACT_SUBVECTOR &&

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

@@ ... @@ case Intrinsic::get_active_lane_mask: {
     return;
   }
   case Intrinsic::experimental_vector_insert: {
     auto DL = getCurSDLoc();

     SDValue Vec = getValue(I.getOperand(0));
     SDValue SubVec = getValue(I.getOperand(1));
     SDValue Index = getValue(I.getOperand(2));

+    // The intrinsic's index type is i64, but the SDNode requires an index type
+    // suitable for the target. Convert the index as required.
+    MVT VectorIdxTy = TLI.getVectorIdxTy(DAG.getDataLayout());
+    if (Index.getValueType() != VectorIdxTy)
+      Index = DAG.getVectorIdxConstant(
+          cast<ConstantSDNode>(Index)->getZExtValue(), DL);
+
     EVT ResultVT = TLI.getValueType(DAG.getDataLayout(), I.getType());
     setValue(&I, DAG.getNode(ISD::INSERT_SUBVECTOR, DL, ResultVT, Vec, SubVec,
                              Index));
     return;
   }
   case Intrinsic::experimental_vector_extract: {
     auto DL = getCurSDLoc();

     SDValue Vec = getValue(I.getOperand(0));
     SDValue Index = getValue(I.getOperand(1));
     EVT ResultVT = TLI.getValueType(DAG.getDataLayout(), I.getType());

+    // The intrinsic's index type is i64, but the SDNode requires an index type
+    // suitable for the target. Convert the index as required.
+    MVT VectorIdxTy = TLI.getVectorIdxTy(DAG.getDataLayout());
+    if (Index.getValueType() != VectorIdxTy)
+      Index = DAG.getVectorIdxConstant(
+          cast<ConstantSDNode>(Index)->getZExtValue(), DL);
+
     setValue(&I, DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, ResultVT, Vec, Index));
     return;
   }
   case Intrinsic::experimental_vector_reverse:
     visitVectorReverse(I);
     return;
   }
 }

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

@@ ... @@
 }

 //===----------------------------------------------------------------------===//
 // AArch64 Advanced SIMD Support
 //===----------------------------------------------------------------------===//

 /// WidenVector - Given a value in the V64 register class, produce the
 /// equivalent value in the V128 register class.
 static SDValue WidenVector(SDValue V64Reg, SelectionDAG &DAG) {
    frasercrmck: I've just noticed that you may now be able to use `SelectionDAG::WidenVector` here? That's definitely outside the scope of this patch, though.
    sdesmalen: Yes, you're probably right, good catch.
   EVT VT = V64Reg.getValueType();
   unsigned NarrowSize = VT.getVectorNumElements();
   MVT EltTy = VT.getVectorElementType().getSimpleVT();
   MVT WideTy = MVT::getVectorVT(EltTy, 2 * NarrowSize);
   SDLoc DL(V64Reg);

   return DAG.getNode(ISD::INSERT_SUBVECTOR, DL, WideTy, DAG.getUNDEF(WideTy),
-                     V64Reg, DAG.getConstant(0, DL, MVT::i32));
+                     V64Reg, DAG.getConstant(0, DL, MVT::i64));
 }

 /// getExtFactor - Determine the adjustment factor for the position when
 /// generating an "extract from vector registers" instruction.
 static unsigned getExtFactor(SDValue &V) {
   EVT EltType = V.getValueType().getVectorElementType();
   return EltType.getSizeInBits() / 8;
 }

llvm/lib/Target/AArch64/AArch64InstrFormats.td

@@ ... @@ def : Pat<(i32 (Accum (i32 FPR32Op:$Rd),
                 (i32 (vector_extract
                        (v4i32 (insert_subvector
                                 (undef),
                                 (v2i32 (int_aarch64_neon_sqrdmulh
                                          (v2i32 V64:$Rn),
                                          (v2i32 (AArch64duplane32
                                                   (v4i32 V128:$Rm),
                                                   VectorIndexS:$idx)))),
-                                (i32 0))),
+                                (i64 0))),
                        (i64 0))))),
           (EXTRACT_SUBREG
             (v2i32 (!cast<Instruction>(NAME # v2i32_indexed)
               (v2i32 (INSERT_SUBREG (v2i32 (IMPLICIT_DEF)),
                                     FPR32Op:$Rd,
                                     ssub)),
               V64:$Rn,
               V128:$Rm,

llvm/lib/Target/AArch64/AArch64InstrInfo.td

@@ ... @@ def : Pat<(v8i16 (opNode V128:$Rn)),
       (!cast<Instruction>(!strconcat(baseOpc, "v8i16v")) V128:$Rn), hsub)>;
   def : Pat<(v4i32 (opNode V128:$Rn)),
       (INSERT_SUBREG (v4i32 (IMPLICIT_DEF)),
       (!cast<Instruction>(!strconcat(baseOpc, "v4i32v")) V128:$Rn), ssub)>;


   // If none did, fallback to the explicit patterns, consuming the vector_extract.
   def : Pat<(i32 (vector_extract (insert_subvector undef, (v8i8 (opNode V64:$Rn)),
-      (i32 0)), (i64 0))),
+      (i64 0)), (i64 0))),
       (EXTRACT_SUBREG (INSERT_SUBREG (v8i8 (IMPLICIT_DEF)),
       (!cast<Instruction>(!strconcat(baseOpc, "v8i8v")) V64:$Rn),
       bsub), ssub)>;
   def : Pat<(i32 (vector_extract (v16i8 (opNode V128:$Rn)), (i64 0))),
       (EXTRACT_SUBREG (INSERT_SUBREG (v16i8 (IMPLICIT_DEF)),
       (!cast<Instruction>(!strconcat(baseOpc, "v16i8v")) V128:$Rn),
       bsub), ssub)>;
   def : Pat<(i32 (vector_extract (insert_subvector undef,
-      (v4i16 (opNode V64:$Rn)), (i32 0)), (i64 0))),
+      (v4i16 (opNode V64:$Rn)), (i64 0)), (i64 0))),
       (EXTRACT_SUBREG (INSERT_SUBREG (v4i16 (IMPLICIT_DEF)),
       (!cast<Instruction>(!strconcat(baseOpc, "v4i16v")) V64:$Rn),
       hsub), ssub)>;
   def : Pat<(i32 (vector_extract (v8i16 (opNode V128:$Rn)), (i64 0))),
       (EXTRACT_SUBREG (INSERT_SUBREG (v8i16 (IMPLICIT_DEF)),
       (!cast<Instruction>(!strconcat(baseOpc, "v8i16v")) V128:$Rn),
       hsub), ssub)>;
   def : Pat<(i32 (vector_extract (v4i32 (opNode V128:$Rn)), (i64 0))),
       (EXTRACT_SUBREG (INSERT_SUBREG (v4i32 (IMPLICIT_DEF)),
       (!cast<Instruction>(!strconcat(baseOpc, "v4i32v")) V128:$Rn),
       ssub), ssub)>;

 }

 multiclass SIMDAcrossLanesSignedIntrinsic<string baseOpc,
                                           SDPatternOperator opNode>
     : SIMDAcrossLanesIntrinsic<baseOpc, opNode> {
   // If there is a sign extension after this intrinsic, consume it as smov already
   // performed it
   def : Pat<(i32 (sext_inreg (i32 (vector_extract (insert_subvector undef,
-      (opNode (v8i8 V64:$Rn)), (i32 0)), (i64 0))), i8)),
+      (opNode (v8i8 V64:$Rn)), (i64 0)), (i64 0))), i8)),
       (i32 (SMOVvi8to32
       (INSERT_SUBREG (v16i8 (IMPLICIT_DEF)),
       (!cast<Instruction>(!strconcat(baseOpc, "v8i8v")) V64:$Rn), bsub),
       (i64 0)))>;
   def : Pat<(i32 (sext_inreg (i32 (vector_extract
       (opNode (v16i8 V128:$Rn)), (i64 0))), i8)),
       (i32 (SMOVvi8to32
       (INSERT_SUBREG (v16i8 (IMPLICIT_DEF)),
       (!cast<Instruction>(!strconcat(baseOpc, "v16i8v")) V128:$Rn), bsub),
       (i64 0)))>;
   def : Pat<(i32 (sext_inreg (i32 (vector_extract (insert_subvector undef,
-      (opNode (v4i16 V64:$Rn)), (i32 0)), (i64 0))), i16)),
+      (opNode (v4i16 V64:$Rn)), (i64 0)), (i64 0))), i16)),
       (i32 (SMOVvi16to32
       (INSERT_SUBREG (v16i8 (IMPLICIT_DEF)),
       (!cast<Instruction>(!strconcat(baseOpc, "v4i16v")) V64:$Rn), hsub),
       (i64 0)))>;
   def : Pat<(i32 (sext_inreg (i32 (vector_extract
       (opNode (v8i16 V128:$Rn)), (i64 0))), i16)),
       (i32 (SMOVvi16to32
       (INSERT_SUBREG (v16i8 (IMPLICIT_DEF)),
       (!cast<Instruction>(!strconcat(baseOpc, "v8i16v")) V128:$Rn), hsub),
       (i64 0)))>;
 }

 multiclass SIMDAcrossLanesUnsignedIntrinsic<string baseOpc,
                                             SDPatternOperator opNode>
     : SIMDAcrossLanesIntrinsic<baseOpc, opNode> {
   // If there is a masking operation keeping only what has been actually
   // generated, consume it.
   def : Pat<(i32 (and (i32 (vector_extract (insert_subvector undef,
-      (opNode (v8i8 V64:$Rn)), (i32 0)), (i64 0))), maski8_or_more)),
+      (opNode (v8i8 V64:$Rn)), (i64 0)), (i64 0))), maski8_or_more)),
       (i32 (EXTRACT_SUBREG
       (INSERT_SUBREG (v16i8 (IMPLICIT_DEF)),
       (!cast<Instruction>(!strconcat(baseOpc, "v8i8v")) V64:$Rn), bsub),
       ssub))>;
   def : Pat<(i32 (and (i32 (vector_extract (opNode (v16i8 V128:$Rn)), (i64 0))),
       maski8_or_more)),
       (i32 (EXTRACT_SUBREG
       (INSERT_SUBREG (v16i8 (IMPLICIT_DEF)),
       (!cast<Instruction>(!strconcat(baseOpc, "v16i8v")) V128:$Rn), bsub),
       ssub))>;
   def : Pat<(i32 (and (i32 (vector_extract (insert_subvector undef,
-      (opNode (v4i16 V64:$Rn)), (i32 0)), (i64 0))), maski16_or_more)),
+      (opNode (v4i16 V64:$Rn)), (i64 0)), (i64 0))), maski16_or_more)),
       (i32 (EXTRACT_SUBREG
       (INSERT_SUBREG (v16i8 (IMPLICIT_DEF)),
       (!cast<Instruction>(!strconcat(baseOpc, "v4i16v")) V64:$Rn), hsub),
       ssub))>;
   def : Pat<(i32 (and (i32 (vector_extract (opNode (v8i16 V128:$Rn)), (i64 0))),
       maski16_or_more)),
       (i32 (EXTRACT_SUBREG
       (INSERT_SUBREG (v16i8 (IMPLICIT_DEF)),
[306 lines not shown]	multiclass FMLSIndexedAfterNegPatterns<SDPatternOperator OpNode> {
def : Pat<(v2f32 (OpNode (v2f32 V64:$Rd), (v2f32 V64:$Rn),		def : Pat<(v2f32 (OpNode (v2f32 V64:$Rd), (v2f32 V64:$Rn),
(AArch64duplane32 (v4f32 (fneg V128:$Rm)),		(AArch64duplane32 (v4f32 (fneg V128:$Rm)),
VectorIndexS:$idx))),		VectorIndexS:$idx))),
(FMLSv2i32_indexed V64:$Rd, V64:$Rn, V128:$Rm, VectorIndexS:$idx)>;		(FMLSv2i32_indexed V64:$Rd, V64:$Rn, V128:$Rm, VectorIndexS:$idx)>;
def : Pat<(v2f32 (OpNode (v2f32 V64:$Rd), (v2f32 V64:$Rn),		def : Pat<(v2f32 (OpNode (v2f32 V64:$Rd), (v2f32 V64:$Rn),
(v2f32 (AArch64duplane32		(v2f32 (AArch64duplane32
(v4f32 (insert_subvector undef,		(v4f32 (insert_subvector undef,
(v2f32 (fneg V64:$Rm)),		(v2f32 (fneg V64:$Rm)),
(i32 0))),		(i64 0))),
VectorIndexS:$idx)))),		VectorIndexS:$idx)))),
(FMLSv2i32_indexed V64:$Rd, V64:$Rn,		(FMLSv2i32_indexed V64:$Rd, V64:$Rn,
(SUBREG_TO_REG (i32 0), V64:$Rm, dsub),		(SUBREG_TO_REG (i32 0), V64:$Rm, dsub),
VectorIndexS:$idx)>;		VectorIndexS:$idx)>;
def : Pat<(v2f32 (OpNode (v2f32 V64:$Rd), (v2f32 V64:$Rn),		def : Pat<(v2f32 (OpNode (v2f32 V64:$Rd), (v2f32 V64:$Rn),
(AArch64dup (f32 (fneg FPR32Op:$Rm))))),		(AArch64dup (f32 (fneg FPR32Op:$Rm))))),
(FMLSv2i32_indexed V64:$Rd, V64:$Rn,		(FMLSv2i32_indexed V64:$Rd, V64:$Rn,
(SUBREG_TO_REG (i32 0), FPR32Op:$Rm, ssub), (i64 0))>;		(SUBREG_TO_REG (i32 0), FPR32Op:$Rm, ssub), (i64 0))>;

// 3 variants for the .4s version: DUPLANE from 128-bit, DUPLANE from 64-bit		// 3 variants for the .4s version: DUPLANE from 128-bit, DUPLANE from 64-bit
// and DUP scalar.		// and DUP scalar.
def : Pat<(v4f32 (OpNode (v4f32 V128:$Rd), (v4f32 V128:$Rn),		def : Pat<(v4f32 (OpNode (v4f32 V128:$Rd), (v4f32 V128:$Rn),
(AArch64duplane32 (v4f32 (fneg V128:$Rm)),		(AArch64duplane32 (v4f32 (fneg V128:$Rm)),
VectorIndexS:$idx))),		VectorIndexS:$idx))),
(FMLSv4i32_indexed V128:$Rd, V128:$Rn, V128:$Rm,		(FMLSv4i32_indexed V128:$Rd, V128:$Rn, V128:$Rm,
VectorIndexS:$idx)>;		VectorIndexS:$idx)>;
def : Pat<(v4f32 (OpNode (v4f32 V128:$Rd), (v4f32 V128:$Rn),		def : Pat<(v4f32 (OpNode (v4f32 V128:$Rd), (v4f32 V128:$Rn),
(v4f32 (AArch64duplane32		(v4f32 (AArch64duplane32
(v4f32 (insert_subvector undef,		(v4f32 (insert_subvector undef,
(v2f32 (fneg V64:$Rm)),		(v2f32 (fneg V64:$Rm)),
(i32 0))),		(i64 0))),
VectorIndexS:$idx)))),		VectorIndexS:$idx)))),
(FMLSv4i32_indexed V128:$Rd, V128:$Rn,		(FMLSv4i32_indexed V128:$Rd, V128:$Rn,
(SUBREG_TO_REG (i32 0), V64:$Rm, dsub),		(SUBREG_TO_REG (i32 0), V64:$Rm, dsub),
VectorIndexS:$idx)>;		VectorIndexS:$idx)>;
def : Pat<(v4f32 (OpNode (v4f32 V128:$Rd), (v4f32 V128:$Rn),		def : Pat<(v4f32 (OpNode (v4f32 V128:$Rd), (v4f32 V128:$Rn),
(AArch64dup (f32 (fneg FPR32Op:$Rm))))),		(AArch64dup (f32 (fneg FPR32Op:$Rm))))),
(FMLSv4i32_indexed V128:$Rd, V128:$Rn,		(FMLSv4i32_indexed V128:$Rd, V128:$Rn,
(SUBREG_TO_REG (i32 0), FPR32Op:$Rm, ssub), (i64 0))>;		(SUBREG_TO_REG (i32 0), FPR32Op:$Rm, ssub), (i64 0))>;
[14 lines not shown]	multiclass FMLSIndexedAfterNegPatterns<SDPatternOperator OpNode> {
def : Pat<(f32 (OpNode (f32 FPR32:$Rd), (f32 FPR32:$Rn),		def : Pat<(f32 (OpNode (f32 FPR32:$Rd), (f32 FPR32:$Rn),
(vector_extract (v4f32 (fneg V128:$Rm)),		(vector_extract (v4f32 (fneg V128:$Rm)),
VectorIndexS:$idx))),		VectorIndexS:$idx))),
(FMLSv1i32_indexed FPR32:$Rd, FPR32:$Rn,		(FMLSv1i32_indexed FPR32:$Rd, FPR32:$Rn,
V128:$Rm, VectorIndexS:$idx)>;		V128:$Rm, VectorIndexS:$idx)>;
def : Pat<(f32 (OpNode (f32 FPR32:$Rd), (f32 FPR32:$Rn),		def : Pat<(f32 (OpNode (f32 FPR32:$Rd), (f32 FPR32:$Rn),
(vector_extract (v4f32 (insert_subvector undef,		(vector_extract (v4f32 (insert_subvector undef,
(v2f32 (fneg V64:$Rm)),		(v2f32 (fneg V64:$Rm)),
(i32 0))),		(i64 0))),
VectorIndexS:$idx))),		VectorIndexS:$idx))),
(FMLSv1i32_indexed FPR32:$Rd, FPR32:$Rn,		(FMLSv1i32_indexed FPR32:$Rd, FPR32:$Rn,
(SUBREG_TO_REG (i32 0), V64:$Rm, dsub), VectorIndexS:$idx)>;		(SUBREG_TO_REG (i32 0), V64:$Rm, dsub), VectorIndexS:$idx)>;

// 1 variant for 64-bit scalar version: extract from .1d or from .2d		// 1 variant for 64-bit scalar version: extract from .1d or from .2d
def : Pat<(f64 (OpNode (f64 FPR64:$Rd), (f64 FPR64:$Rn),		def : Pat<(f64 (OpNode (f64 FPR64:$Rd), (f64 FPR64:$Rn),
(vector_extract (v2f64 (fneg V128:$Rm)),		(vector_extract (v2f64 (fneg V128:$Rm)),
VectorIndexS:$idx))),		VectorIndexS:$idx))),
[1,883 lines not shown]

llvm/test/CodeGen/AArch64/vecreduce-and-legalization.ll

[91 lines not shown]	; CHECK-NEXT: ret
%b = call i8 @llvm.vector.reduce.and.v3i8(<3 x i8> %a)		%b = call i8 @llvm.vector.reduce.and.v3i8(<3 x i8> %a)
ret i8 %b		ret i8 %b
}		}

define i8 @test_v9i8(<9 x i8> %a) nounwind {		define i8 @test_v9i8(<9 x i8> %a) nounwind {
; CHECK-LABEL: test_v9i8:		; CHECK-LABEL: test_v9i8:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #-1		; CHECK-NEXT: mov w8, #-1
; CHECK-NEXT: mov v0.b[9], w8		; CHECK-NEXT: mov v1.16b, v0.16b
; CHECK-NEXT: mov v0.b[10], w8		; CHECK-NEXT: mov v1.b[9], w8
; CHECK-NEXT: mov v0.b[11], w8		; CHECK-NEXT: mov v1.b[10], w8
; CHECK-NEXT: mov v0.b[12], w8		; CHECK-NEXT: mov v1.b[11], w8
; CHECK-NEXT: mov v0.b[13], w8		; CHECK-NEXT: mov v1.b[13], w8
sdesmalen: Probably not something that's caused by your patch, but I would have expected elements `v1.b[12]`, `v1.b[14]` and `v1.b[15]` to also be set to `-1`?
frasercrmck (Author): Perhaps. I'll admit I don't know much about AArch64 but that makes sense. My patch has taken us from:

    t0: ch = EntryToken
    t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %0
    t13: v16i8 = insert_vector_elt t2, Constant:i32<-1>, Constant:i64<9>
    t15: v16i8 = insert_vector_elt t13, Constant:i32<-1>, Constant:i64<10>
    t17: v16i8 = insert_vector_elt t15, Constant:i32<-1>, Constant:i64<11>
    t19: v16i8 = insert_vector_elt t17, Constant:i32<-1>, Constant:i64<12>
    t21: v16i8 = insert_vector_elt t19, Constant:i32<-1>, Constant:i64<13>
    t29: v8i8 = extract_subvector t21, Constant:i64<0>
    t84: v16i8 = insert_subvector undef:v16i8, t29, Constant:i32<0>
    t86: i32 = extract_vector_elt t84, Constant:i64<6>
    // and other extracts from t84 of indices < 8
    // use t21 elsewhere

to

    t0: ch = EntryToken
    t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %0
    t13: v16i8 = insert_vector_elt t2, Constant:i32<-1>, Constant:i64<9>
    t15: v16i8 = insert_vector_elt t13, Constant:i32<-1>, Constant:i64<10>
    t17: v16i8 = insert_vector_elt t15, Constant:i32<-1>, Constant:i64<11>
    t19: v16i8 = insert_vector_elt t17, Constant:i32<-1>, Constant:i64<12>
    t21: v16i8 = insert_vector_elt t19, Constant:i32<-1>, Constant:i64<13>
    t86: i32 = extract_vector_elt t2, Constant:i64<6>
    // and other extracts from t2 of indices < 8
    // use t21 elsewhere

Which I think is a good thing, since there was previously a redundant extract/insert going on, and a dependency on a vector whose elements that differ from `t2` weren't demanded.

Now as to why this is happening, there's this code in `SelectionDAG.cpp` when creating an `ISD::INSERT_SUBVECTOR`:

    // If this is an insert of an extracted vector into an undef vector, we
    // can just use the input to the extract.
    if (N1.isUndef() && N2.getOpcode() == ISD::EXTRACT_SUBVECTOR &&
        N2.getOperand(1) == N3 && N2.getOperand(0).getValueType() == VT)
      return N2.getOperand(0);

And I think it was here that `N2.getOperand(1) == N3` failed, because you were mixing `i32` and `i64` types between insertion and extraction.
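The suspected cause described in the comment above can be illustrated with a small standalone C++ model. This is only a sketch under stated assumptions, not LLVM code: `IdxTy`, `IndexConst`, `ExtractSubvector`, `kNoFold` and `foldInsertOfExtract` are hypothetical names invented for the illustration. The one assumption carried over from the discussion is that two index constants count as the same operand only when both their type and their value agree.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-ins for SelectionDAG concepts. None of these are LLVM classes.
enum class IdxTy { I32, I64 };

struct IndexConst {
  IdxTy Ty;
  std::uint64_t Val;
  // Assumption: constants are the same operand only if type AND value match.
  bool operator==(const IndexConst &O) const {
    return Ty == O.Ty && Val == O.Val;
  }
};

struct ExtractSubvector {
  int SrcVec;        // id of the vector being extracted from
  unsigned SrcLanes; // number of lanes in that source vector
  IndexConst Idx;    // extraction index
};

constexpr int kNoFold = -1; // sentinel: the fold did not apply

// Models the fold
//   insert_subvector undef, (extract_subvector Src, Idx), Idx  -->  Src
// and returns the id of Src when it applies, kNoFold otherwise.
int foldInsertOfExtract(bool BaseIsUndef, const ExtractSubvector &Ext,
                        IndexConst InsIdx, unsigned ResultLanes) {
  assert(ResultLanes > 0 && "result vector must have lanes");
  if (BaseIsUndef && Ext.Idx == InsIdx && Ext.SrcLanes == ResultLanes)
    return Ext.SrcVec;
  return kNoFold;
}
```

In this model, extracting with an `i64` index of 0 and re-inserting with an `i32` index of 0 leaves the fold unapplied, while using `i64` for both lets the insert collapse to the original source vector. That mirrors the disappearance of `t29` and `t84` in the second DAG dump.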
frasercrmck (Author): Why the insert to element `12` is being removed, I'm not sure.

    Replacing.2 t19: v16i8 = insert_vector_elt t17, Constant:i32<-1>, Constant:i64<12>
    With: t17: v16i8 = insert_vector_elt t15, Constant:i32<-1>, Constant:i64<11>

That's pretty odd, but then again there's nothing in elements 8, 14, or 15. It seems to be an extension of what is happening to the other 3 elements before my patch:

    Replacing.2 t25: v16i8 = insert_vector_elt t23, Constant:i32<-1>, Constant:i64<15>
    With: t23: v16i8 = insert_vector_elt t21, Constant:i32<-1>, Constant:i64<14>

Since the test is doing `v8i8 and` of the low half of a vector with this upper half filled with `-1`s, presumably it can tell at some point that it's redundant and it can just use the low half. I don't know why it doesn't remove all of the `-1`s though.
sdesmalen: If I visualise what SelectionDAG was/is trying to do here:

    reduce.and(<e0, e1, e2, e3, e4, e5, e6, e7, e8, ??, ??, ??, ??, ??, ??, ??>)
                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
                                                    must each be -1 for the AND reduction

    => <e0, e1, e2, e3, e4, e5, e6, e7, e8, -1, -1, -1, -1, -1, -1, -1>

    => this is somehow optimized to:
       <e0, e1, e2, e3, e4, e5, e6, e7, e8, -1, -1, -1, ??, -1, ??, ??>

    it then extracts the Lo/Hi part of the vector:
       Lo = <e0, e1, e2, e3, e4, e5, e6, e7>
       Hi = <e8, -1, -1, -1, ??, -1, ??, ??>

    Performs a vector-AND:
       AND(<e0, e1, e2, e3, e4, e5, e6, e7>, <e8, -1, -1, -1, ??, -1, ??, ??>)
         <=> <aa, bb, cc, dd, ??, ff, ??, ??>

    And then does an element-wise reduction:
       ... AND(??, AND(dd, AND(cc, AND(aa, bb)

The `??` elements could still be anything, so if any of these elements is 0, then the reduction result is also 0. So unless I'm missing something, there seems to be something broken here, already before your patch. The inserts of `-1` into element 14 and 15 were removed by https://github.com/llvm/llvm-project/commit/fc2b4a02b1a82c40ac1459cd15b9911ebfc78acc.

The output from SelectionDAG before this patch:

    Vector/type-legalized selection DAG: %bb.0 'test_v9i8:'
    SelectionDAG has 48 nodes:
    t0: ch = EntryToken
    t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %0
    t13: v16i8 = insert_vector_elt t2, Constant:i32<-1>, Constant:i64<9>
    t15: v16i8 = insert_vector_elt t13, Constant:i32<-1>, Constant:i64<10>
    t17: v16i8 = insert_vector_elt t15, Constant:i32<-1>, Constant:i64<11>
    t19: v16i8 = insert_vector_elt t17, Constant:i32<-1>, Constant:i64<12>
    t21: v16i8 = insert_vector_elt t19, Constant:i32<-1>, Constant:i64<13>
    t23: v16i8 = insert_vector_elt t21, Constant:i32<-1>, Constant:i64<14>
    t25: v16i8 = insert_vector_elt t23, Constant:i32<-1>, Constant:i64<15>
    t56: i32 = extract_vector_elt t32, Constant:i64<0>
    t57: i32 = extract_vector_elt t32, Constant:i64<1>
    t58: i32 = and t56, t57
    t59: i32 = extract_vector_elt t32, Constant:i64<2>
    t60: i32 = and t58, t59
    t61: i32 = extract_vector_elt t32, Constant:i64<3>
    t62: i32 = and t60, t61
    t63: i32 = extract_vector_elt t32, Constant:i64<4>
    t64: i32 = and t62, t63
    t65: i32 = extract_vector_elt t32, Constant:i64<5>
    t66: i32 = and t64, t65
    t67: i32 = extract_vector_elt t32, Constant:i64<6>
    t68: i32 = and t66, t67
    t69: i32 = extract_vector_elt t32, Constant:i64<7>
    t70: i32 = and t68, t69
    t8: ch,glue = CopyToReg t0, Register:i32 $w0, t70
    t29: v8i8 = extract_subvector t25, Constant:i64<0>
    t31: v8i8 = extract_subvector t25, Constant:i64<8>
    t32: v8i8 = and t29, t31
    t9: ch = AArch64ISD::RET_FLAG t8, Register:i32 $w0, t8:1

    Optimized vector-legalized selection DAG: %bb.0 'test_v9i8:'
    SelectionDAG has 44 nodes:
    t0: ch = EntryToken
    t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %0
    t13: v16i8 = insert_vector_elt t2, Constant:i32<-1>, Constant:i64<9>
    t15: v16i8 = insert_vector_elt t13, Constant:i32<-1>, Constant:i64<10>
    t17: v16i8 = insert_vector_elt t15, Constant:i32<-1>, Constant:i64<11>
    t19: v16i8 = insert_vector_elt t17, Constant:i32<-1>, Constant:i64<12>
    t21: v16i8 = insert_vector_elt t19, Constant:i32<-1>, Constant:i64<13>
    t56: i32 = extract_vector_elt t32, Constant:i64<0>
    t57: i32 = extract_vector_elt t32, Constant:i64<1>
    t58: i32 = and t56, t57
    t59: i32 = extract_vector_elt t32, Constant:i64<2>
    t60: i32 = and t58, t59
    t61: i32 = extract_vector_elt t32, Constant:i64<3>
    t62: i32 = and t60, t61
    t75: i32 = extract_vector_elt t29, Constant:i64<4>
    t64: i32 = and t62, t75
    t65: i32 = extract_vector_elt t32, Constant:i64<5>
    t66: i32 = and t64, t65
    t72: i32 = extract_vector_elt t29, Constant:i64<6>
    t68: i32 = and t66, t72
    t71: i32 = extract_vector_elt t29, Constant:i64<7>
    t70: i32 = and t68, t71
    t8: ch,glue = CopyToReg t0, Register:i32 $w0, t70
    t29: v8i8 = extract_subvector t21, Constant:i64<0>
    t74: v8i8 = extract_subvector t21, Constant:i64<8>
    t32: v8i8 = and t29, t74
    t9: ch = AArch64ISD::RET_FLAG t8, Register:i32 $w0, t8:1
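The concern raised in the comment above can be checked with a short standalone C++ sketch. It is a plain scalar model and makes no claim about what the AArch64 backend emits: `Vec16`, `widen` and `reduceAnd` are hypothetical helpers, and the "could be anything" lanes are modelled by a caller-chosen `Garbage` byte. The shape of the computation follows the visualisation: pad to 16 lanes, AND the low and high halves lane-wise, then reduce the 8 resulting lanes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec16 = std::array<std::uint8_t, 16>;

// Widen an input of at most 16 bytes to 16 lanes. Padding lanes default to
// 0xFF (the neutral element for AND); lanes listed in UnfilledLanes are
// overwritten with Garbage to model padding that was never written.
Vec16 widen(const std::vector<std::uint8_t> &In,
            const std::vector<unsigned> &UnfilledLanes,
            std::uint8_t Garbage) {
  assert(In.size() <= 16);
  Vec16 V;
  V.fill(0xFF);
  for (std::size_t I = 0; I < In.size(); ++I)
    V[I] = In[I];
  for (unsigned L : UnfilledLanes) {
    assert(L >= In.size() && L < 16); // only padding lanes may be unfilled
    V[L] = Garbage;
  }
  return V;
}

// AND the Lo and Hi halves lane-wise, then reduce the 8 lanes to one byte.
std::uint8_t reduceAnd(const Vec16 &V) {
  std::uint8_t R = 0xFF;
  for (std::size_t I = 0; I < 8; ++I)
    R = static_cast<std::uint8_t>(R & V[I] & V[I + 8]);
  return R;
}
```

With nine input bytes that are all `0xFF`, the fully padded vector reduces to `0xFF`. If lanes 12, 14 and 15 are instead left holding zero, the same reduction yields `0x00`, which is the incorrect result the comment warns about.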
; CHECK-NEXT: ext v1.16b, v0.16b, v0.16b, #8		; CHECK-NEXT: ext v1.16b, v1.16b, v1.16b, #8
; CHECK-NEXT: and v1.8b, v0.8b, v1.8b		; CHECK-NEXT: and v1.8b, v0.8b, v1.8b
; CHECK-NEXT: umov w8, v1.b[1]		; CHECK-NEXT: umov w8, v1.b[1]
; CHECK-NEXT: umov w9, v1.b[0]		; CHECK-NEXT: umov w9, v1.b[0]
; CHECK-NEXT: and w8, w9, w8		; CHECK-NEXT: and w8, w9, w8
; CHECK-NEXT: umov w9, v1.b[2]		; CHECK-NEXT: umov w9, v1.b[2]
; CHECK-NEXT: and w8, w8, w9		; CHECK-NEXT: and w8, w8, w9
; CHECK-NEXT: umov w9, v1.b[3]		; CHECK-NEXT: umov w9, v1.b[3]
; CHECK-NEXT: and w8, w8, w9		; CHECK-NEXT: and w8, w8, w9
[81 lines not shown]

llvm/test/CodeGen/RISCV/rvv/extract-subvector.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple riscv32 -mattr=+m,+d,+experimental-zfh,+experimental-v -verify-machineinstrs < %s \| FileCheck %s
	; RUN: llc -mtriple riscv64 -mattr=+m,+d,+experimental-zfh,+experimental-v -verify-machineinstrs < %s \| FileCheck %s			; RUN: llc -mtriple riscv64 -mattr=+m,+d,+experimental-zfh,+experimental-v -verify-machineinstrs < %s \| FileCheck %s

	define <vscale x 4 x i32> @extract_nxv8i32_nxv4i32_0(<vscale x 8 x i32> %vec) {			define <vscale x 4 x i32> @extract_nxv8i32_nxv4i32_0(<vscale x 8 x i32> %vec) {
	; CHECK-LABEL: extract_nxv8i32_nxv4i32_0:			; CHECK-LABEL: extract_nxv8i32_nxv4i32_0:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: # kill: def $v8m2 killed $v8m2 killed $v8m4			; CHECK-NEXT: # kill: def $v8m2 killed $v8m2 killed $v8m4
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%c = call <vscale x 4 x i32> @llvm.experimental.vector.extract.nxv4i32.nxv8i32(<vscale x 8 x i32> %vec, i64 0)			%c = call <vscale x 4 x i32> @llvm.experimental.vector.extract.nxv4i32.nxv8i32(<vscale x 8 x i32> %vec, i64 0)
[365 lines not shown]

Revision Contents

Diff 326395
