This is an archive of the discontinued LLVM Phabricator instance.

Differential D115646

[DAG][TLI][X86][ARM][AArch64] Add `isExtractSubvectorFree` / use it in `foldExtractSubvectorFromShuffleVector()`
Abandoned · Public

Authored by lebedev.ri on Dec 13 2021, 9:37 AM.

Details

Reviewers

RKSimon
t.p.northover
greened

Summary

This generalizes the hardcoded profitability check that was just added in D104156.
To my surprise, this does not catch any new cases.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
1,230 ms	x64 debian > Clang.OpenMP::atomic_ast_print.cpp
1,190 ms	x64 debian > Clang.OpenMP::atomic_codegen.cpp
1,220 ms	x64 debian > Clang.OpenMP::atomic_messages.c
100 ms	x64 debian > LLVM.Bindings/Go::go.test

Event Timeline

lebedev.ri created this revision. Dec 13 2021, 9:37 AM

Herald added subscribers: ecnelises, pengfei, hiraditya, kristof.beyls. Dec 13 2021, 9:37 AM

lebedev.ri requested review of this revision. Dec 13 2021, 9:37 AM

Harbormaster completed remote builds in B139000: Diff 393933. Dec 13 2021, 10:41 AM

Err, actually make use of the hooks by dropping original ad-hoc check.

Harbormaster completed remote builds in B139068: Diff 394034. Dec 13 2021, 2:44 PM

ping

ping
@RKSimon this removes the hardcoded assumptions that the free extraction is for 0'th subvector specifically, as we've discussed in D104156 originally.

Any luck with finding a x86 test case?

We do need an X86 test for this.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
20795	The values here make it seem like replacing a non-cheap operation with two cheap ones yields the same performance (1+1 = 2), contrary to the comment. Is this intentional?
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12899	So here, it seems like even if the extract is "free" (cost 0) it'll be counted as "cheap" (cost 1). I wonder if it would be better to combine these functions into a single "getExtractSubvectorCost" target hook.

In D115646#3231202, @RKSimon wrote:

> Any luck with finding a x86 test case?

In D115646#3231647, @greened wrote:

> We do need an X86 test for this.

Sorry, I do not have an x86 test for this.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
20795	Not only does it make it seem so, that is exactly what the comment is saying.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12899	> So here, it seems like even if the extract is "free" (cost 0) it'll be counted as "cheap" (cost 1).
Right. Because even free extractions can be done for cheap, if that makes sense.
> I wonder if it would be better to combine these functions into a single "getExtractSubvectorCost" target hook.
We could do that, there's already `*TTIImpl::getShuffleCost()`, the problem being, what will be the "cheap budget", i.e. how should the uses of `isExtractSubvectorCheap()` be adjusted?

greened added inline comments. Jan 11 2022, 1:35 PM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
20795	It's not what the comment is saying. The comment says "a win" which to my mind means, "performance improvement." If the cost is the same, where's the improvement? Maybe the comment is confusing me. This is used to calculate `Budget` which I guess limits how many extracts we can do (so two at most?). Perhaps the comment could be expanded to explain a bit more about what is going on.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12899	> Right. Because even free extractions can be done for cheap, if that makes sense.
In some contexts I suppose. A "free" extract is "cheap" if all you care about is whether an extract is "cheap." A "free" extract is not "cheap" if you care about the relative cost of the two. I could see the above code being very confusing in some circumstances.
> how should the uses of isExtractSubvectorCheap() be adjusted?
Not knowing all of the users of that function, I can't say. It's probably context-dependent.
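
The single-hook idea discussed in these comments could look roughly like the sketch below. Everything in it is hypothetical: `ExtractCost`, `getExtractSubvectorCost`, `isCheapOrBetter` and the plain-integer parameters are stand-ins, not LLVM API, and the rules only loosely follow the X86 logic for non-mask vectors in this patch (index 0 is free, any other index that is a multiple of the result length is cheap).

```cpp
#include <cassert>

// Hypothetical three-level cost; not part of TargetLowering.
enum class ExtractCost { Free = 0, Cheap = 1, Expensive = 2 };

// Hypothetical merged hook. ResNumElts is the element count of the extracted
// subvector (must be non-zero); Index is the first extracted element.
ExtractCost getExtractSubvectorCost(unsigned ResNumElts, unsigned Index) {
  assert(ResNumElts != 0 && "result vector must have elements");
  if (Index == 0)
    return ExtractCost::Free;
  if (Index % ResNumElts == 0)
    return ExtractCost::Cheap;
  return ExtractCost::Expensive;
}

// A yes/no "is it cheap?" query expressed through the merged hook. A free
// extract still answers "yes" here, while callers that care about relative
// cost can compare the enum values directly.
bool isCheapOrBetter(unsigned ResNumElts, unsigned Index) {
  return getExtractSubvectorCost(ResNumElts, Index) != ExtractCost::Expensive;
}
```

With a single cost value, the "free counts as cheap" ambiguity raised above becomes a choice made by each caller rather than by the target.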

greened mentioned this in D116832: [UpdateLLCTestChecks] Allow replacing register names with variables. Jan 12 2022, 10:48 AM

lebedev.ri abandoned this revision. Jan 17 2022, 2:36 PM

lebedev.ri marked 2 inline comments as done.

Revision Contents

Path	Size
llvm/include/llvm/CodeGen/TargetLowering.h	9 lines
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	21 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.h	5 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	9 lines
llvm/lib/Target/ARM/ARMISelLowering.h	5 lines
llvm/lib/Target/ARM/ARMISelLowering.cpp	9 lines
llvm/lib/Target/X86/X86ISelLowering.h	5 lines
llvm/lib/Target/X86/X86ISelLowering.cpp	15 lines
llvm/test/CodeGen/AArch64/arm64-neon-copy.ll	2 lines
llvm/test/CodeGen/ARM/fp16-insert-extract.ll	16 lines

Diff 394034

llvm/include/llvm/CodeGen/TargetLowering.h

@@ ... @@ public:
   /// On some targets it might be more efficient to use a combination of
   /// arithmetic instructions to materialize the constant instead of loading it
   /// from a constant pool.
   virtual bool shouldConvertConstantLoadToIntImm(const APInt &Imm,
                                                  Type *Ty) const {
     return false;
   }
 
+  /// Return true if EXTRACT_SUBVECTOR is free for extracting this result type
+  /// from this source type with this index. This is needed because
+  /// EXTRACT_SUBVECTOR usually has custom lowering that depends on the index of
+  /// the first element, and only the target knows which lowering is free.
+  virtual bool isExtractSubvectorFree(EVT ResVT, EVT SrcVT,
+                                      unsigned Index) const {
+    return false;
+  }
+
   /// Return true if EXTRACT_SUBVECTOR is cheap for extracting this result type
   /// from this source type with this index. This is needed because
   /// EXTRACT_SUBVECTOR usually has custom lowering that depends on the index of
   /// the first element, and only the target knows which lowering is cheap.
   virtual bool isExtractSubvectorCheap(EVT ResVT, EVT SrcVT,
                                        unsigned Index) const {
     return false;
   }

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

@@ ... @@ if (LegalOperations &&
       !TLI.isOperationLegalOrCustom(ISD::VECTOR_SHUFFLE, NarrowVT))
     return SDValue();
 
   uint64_t FirstExtractedEltIdx = N->getConstantOperandVal(1);
   int NumEltsExtracted = NarrowVT.getVectorNumElements();
   assert((FirstExtractedEltIdx % NumEltsExtracted) == 0 &&
          "Extract index is not a multiple of the output vector length.");
 
+  auto GetExtractSubvectorCost = [&TLI, NarrowVT, WideVT](unsigned Index) {
+    if (TLI.isExtractSubvectorFree(NarrowVT, WideVT, Index))
+      return 0;
+    if (TLI.isExtractSubvectorCheap(NarrowVT, WideVT, Index))
+      return 1;
+    return 2; // Assume that all non-cheap subvector extracts are equal,
+              // and that replacing one non-cheap subvector extract
+              // with two cheap ones is a win.
+  };
+
+  int Budget = GetExtractSubvectorCost(FirstExtractedEltIdx);
+
   int WideNumElts = WideVT.getVectorNumElements();
 
   SmallVector<int, 16> NewMask;
   NewMask.reserve(NumEltsExtracted);
   SmallSetVector<std::pair<SDValue /*Op*/, int /*SubvectorIndex*/>, 2>
       DemandedSubvectors;
 
   // Try to decode the wide mask into narrow mask from at most two subvectors.
@@ ... @@ for (int M : WideShuffleVector->getMask().slice(FirstExtractedEltIdx,
     SDValue Op = WideShuffleVector->getOperand(WideShufOpIdx);
 
     if (Op.isUndef()) {
       // Picking from an undef operand. Let's adjust mask instead.
       NewMask.emplace_back(-1);
       continue;
     }
 
-    // Profitability check: only deal with extractions from the first subvector.
-    if (OpSubvecIdx != 0)
-      return SDValue();
-
     const std::pair<SDValue, int> DemandedSubvector =
         std::make_pair(Op, OpSubvecIdx);
 
     if (DemandedSubvectors.insert(DemandedSubvector)) {
       if (DemandedSubvectors.size() > 2)
         return SDValue(); // We can't handle more than two subvectors.
       // How many elements into the WideVT does this subvector start?
       int Index = NumEltsExtracted * OpSubvecIdx;
-      // Bail out if the extraction isn't going to be cheap.
-      if (!TLI.isExtractSubvectorCheap(NarrowVT, WideVT, Index))
+      // Bail out if the extraction exhausted our budget.
+      Budget -= GetExtractSubvectorCost(Index);
+      if (Budget < 0)
         return SDValue();
     }
 
     // Ok, but from which operand of the new shuffle will this element pick?
     int NewOpIdx =
         getFirstIndexOf(DemandedSubvectors.getArrayRef(), DemandedSubvector);
     assert((NewOpIdx == 0 || NewOpIdx == 1) && "Unexpected operand index.");
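
The budget arithmetic in this hunk can be modelled in isolation, which may help when reading the review discussion about whether "1+1 = 2" counts as a win. The sketch below is a simplified stand-in, not LLVM code: `ExtractKind`, `extractCost` and `withinBudget` are hypothetical names, and the target hooks are replaced by a plain enum. It keeps the costs from the patch (free = 0, cheap = 1, otherwise 2) and the same `Budget < 0` bail-out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical classification standing in for the two target hooks.
enum class ExtractKind { Free, Cheap, Expensive };

// Same cost values as the lambda in the patch.
int extractCost(ExtractKind K) {
  switch (K) {
  case ExtractKind::Free:
    return 0;
  case ExtractKind::Cheap:
    return 1;
  case ExtractKind::Expensive:
    return 2;
  }
  return 2; // Unreachable for valid enumerators; keeps compilers quiet.
}

// Returns true if the new extracts fit the budget set by the original one.
// The budget starts at the cost of the original extract, and each newly
// demanded subvector subtracts its own cost; a negative budget means bail out.
bool withinBudget(ExtractKind Original,
                  const std::vector<ExtractKind> &NewExtracts) {
  int Budget = extractCost(Original);
  for (ExtractKind K : NewExtracts) {
    Budget -= extractCost(K);
    if (Budget < 0)
      return false;
  }
  return true;
}
```

Under this model, replacing one expensive extract with two cheap ones leaves the budget at exactly zero, so the fold proceeds at equal modelled cost; that is the case the reviewer's question is about.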

llvm/lib/Target/AArch64/AArch64ISelLowering.h

@@ ... @@ public:
   bool isDesirableToCommuteWithShift(const SDNode *N,
                                      CombineLevel Level) const override;
 
   /// Returns true if it is beneficial to convert a load of a constant
   /// to just the constant itself.
   bool shouldConvertConstantLoadToIntImm(const APInt &Imm,
                                          Type *Ty) const override;
 
+  /// Return true if EXTRACT_SUBVECTOR is free for this result type
+  /// with this index.
+  bool isExtractSubvectorFree(EVT ResVT, EVT SrcVT,
+                              unsigned Index) const override;
+
   /// Return true if EXTRACT_SUBVECTOR is cheap for this result type
   /// with this index.
   bool isExtractSubvectorCheap(EVT ResVT, EVT SrcVT,
                                unsigned Index) const override;
 
   bool shouldFormOverflowOp(unsigned Opcode, EVT VT,
                             bool MathUsed) const override {
     // Using overflow ops for overflow checks only should beneficial on

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

@@ ... @@ if (BitSize == 32)
     Val &= (1LL << 32) - 1;
 
   unsigned LZ = countLeadingZeros((uint64_t)Val);
   unsigned Shift = (63 - LZ) / 16;
   // MOVZ is free so return true for one or fewer MOVK.
   return Shift < 3;
 }
 
-bool AArch64TargetLowering::isExtractSubvectorCheap(EVT ResVT, EVT SrcVT,
-                                                    unsigned Index) const {
+bool AArch64TargetLowering::isExtractSubvectorFree(EVT ResVT, EVT SrcVT,
+                                                   unsigned Index) const {
   if (!isOperationLegalOrCustom(ISD::EXTRACT_SUBVECTOR, ResVT))
     return false;
 
   return (Index == 0 || Index == ResVT.getVectorNumElements());
 }
+
+bool AArch64TargetLowering::isExtractSubvectorCheap(EVT ResVT, EVT SrcVT,
+                                                    unsigned Index) const {
+  return isExtractSubvectorFree(ResVT, SrcVT, Index);
+}
 
 /// Turn vector tests of the signbit in the form of:
 ///   xor (sra X, elt_size(X)-1), -1
 /// into:
 ///   cmge X, X, #0
 static SDValue foldVectorXorShiftIntoCmp(SDNode *N, SelectionDAG &DAG,
                                          const AArch64Subtarget *Subtarget) {
   EVT VT = N->getValueType(0);
   if (!Subtarget->hasNEON() || !VT.isVector())
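
To make the index check in `isExtractSubvectorFree` concrete, here is a small sketch with plain integers in place of `EVT`. The helper name `startsAtFirstOrSecondSubvector` is hypothetical and the legality check from the patch is left out. For a 2-element result taken from an 8-element source, the subvector-aligned indices are 0, 2, 4 and 6, and only the first two satisfy the predicate.

```cpp
#include <cassert>

// Hypothetical helper mirroring `Index == 0 || Index == NumElts` from the
// patch: true when the extract starts at the first or the second
// result-sized subvector of the source.
bool startsAtFirstOrSecondSubvector(unsigned ResNumElts, unsigned Index) {
  return Index == 0 || Index == ResNumElts;
}
```

Note that the predicate never looks at the source width, so extracts starting at the third or later subvector of a wider source are not reported as free.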

llvm/lib/Target/ARM/ARMISelLowering.h

@@ ... @@ bool getTgtMemIntrinsic(IntrinsicInfo &Info,
                           MachineFunction &MF,
                           unsigned Intrinsic) const override;
 
   /// Returns true if it is beneficial to convert a load of a constant
   /// to just the constant itself.
   bool shouldConvertConstantLoadToIntImm(const APInt &Imm,
                                          Type *Ty) const override;
 
+  /// Return true if EXTRACT_SUBVECTOR is free for this result type
+  /// with this index.
+  bool isExtractSubvectorFree(EVT ResVT, EVT SrcVT,
+                              unsigned Index) const override;
+
   /// Return true if EXTRACT_SUBVECTOR is cheap for this result type
   /// with this index.
   bool isExtractSubvectorCheap(EVT ResVT, EVT SrcVT,
                                unsigned Index) const override;
 
   bool shouldFormOverflowOp(unsigned Opcode, EVT VT,
                             bool MathUsed) const override {
     // Using overflow ops for overflow checks only should beneficial on ARM.

llvm/lib/Target/ARM/ARMISelLowering.cpp

@@ ... @@ bool ARMTargetLowering::shouldConvertConstantLoadToIntImm(const APInt &Imm,
   assert(Ty->isIntegerTy());
 
   unsigned Bits = Ty->getPrimitiveSizeInBits();
   if (Bits == 0 || Bits > 32)
     return false;
   return true;
 }
 
-bool ARMTargetLowering::isExtractSubvectorCheap(EVT ResVT, EVT SrcVT,
-                                                unsigned Index) const {
+bool ARMTargetLowering::isExtractSubvectorFree(EVT ResVT, EVT SrcVT,
+                                               unsigned Index) const {
   if (!isOperationLegalOrCustom(ISD::EXTRACT_SUBVECTOR, ResVT))
     return false;
 
   return (Index == 0 || Index == ResVT.getVectorNumElements());
 }
+
+bool ARMTargetLowering::isExtractSubvectorCheap(EVT ResVT, EVT SrcVT,
+                                                unsigned Index) const {
+  return isExtractSubvectorFree(ResVT, SrcVT, Index);
+}
 
 Instruction *ARMTargetLowering::makeDMB(IRBuilderBase &Builder,
                                         ARM_MB::MemBOpt Domain) const {
   Module *M = Builder.GetInsertBlock()->getParent()->getParent();
 
   // First, if the target has no DMB, see what fallback we can use.
   if (!Subtarget->hasDataBarrier()) {
     // Some ARMv6 cpus can support data barriers with an mcr instruction.
     // Thumb1 and pre-v6 ARM mode use a libcall instead and should never get

llvm/lib/Target/X86/X86ISelLowering.h

@@ ... @@ public:
 
   bool reduceSelectOfFPConstantLoads(EVT CmpOpVT) const override;
 
   bool convertSelectOfConstantsToMath(EVT VT) const override;
 
   bool decomposeMulByConstant(LLVMContext &Context, EVT VT,
                               SDValue C) const override;
 
+  /// Return true if EXTRACT_SUBVECTOR is free for this result type
+  /// with this index.
+  bool isExtractSubvectorFree(EVT ResVT, EVT SrcVT,
+                              unsigned Index) const override;
+
   /// Return true if EXTRACT_SUBVECTOR is cheap for this result type
   /// with this index.
   bool isExtractSubvectorCheap(EVT ResVT, EVT SrcVT,
                                unsigned Index) const override;
 
   /// Scalar ops always have equal or better analysis/performance/power than
   /// the vector equivalent, so this always makes sense if the scalar op is
   /// supported.

llvm/lib/Target/X86/X86ISelLowering.cpp

@@ ... @@ if (isOperationLegal(ISD::MUL, VT) && EltSizeInBits <= 32 &&
       (EltSizeInBits != 32 || !Subtarget.isPMULLDSlow()))
     return false;
 
   // shl+add, shl+sub, shl+add+neg
   return (MulC + 1).isPowerOf2() || (MulC - 1).isPowerOf2() ||
          (1 - MulC).isPowerOf2() || (-(MulC + 1)).isPowerOf2();
 }
 
+bool X86TargetLowering::isExtractSubvectorFree(EVT ResVT, EVT SrcVT,
+                                               unsigned Index) const {
+  if (!isOperationLegalOrCustom(ISD::EXTRACT_SUBVECTOR, ResVT))
+    return false;
+
+  return Index == 0;
+}
+
 bool X86TargetLowering::isExtractSubvectorCheap(EVT ResVT, EVT SrcVT,
                                                 unsigned Index) const {
   if (!isOperationLegalOrCustom(ISD::EXTRACT_SUBVECTOR, ResVT))
     return false;
 
+  if (isExtractSubvectorFree(ResVT, SrcVT, Index))
+    return true;
+
   // Mask vectors support all subregister combinations and operations that
   // extract half of vector.
   if (ResVT.getVectorElementType() == MVT::i1)
-    return Index == 0 || ((ResVT.getSizeInBits() == SrcVT.getSizeInBits()*2) &&
-                          (Index == ResVT.getVectorNumElements()));
+    return ((ResVT.getSizeInBits() == SrcVT.getSizeInBits() * 2) &&
+            (Index == ResVT.getVectorNumElements()));
 
   return (Index % ResVT.getVectorNumElements()) == 0;
 }
 
 bool X86TargetLowering::shouldScalarizeBinop(SDValue VecOp) const {
   unsigned Opc = VecOp.getOpcode();
 
   // Assume target opcodes can't be scalarized.

llvm/test/CodeGen/AArch64/arm64-neon-copy.ll

@@ ... @@ ; CHECK-NEXT: ret
   %tmp3 = extractelement <4 x float> %tmp1, i32 2
   %tmp4 = insertelement <2 x float> %tmp2, float %tmp3, i32 1
   ret <2 x float> %tmp4
 }
 
 define <1 x double> @ins2f1(<2 x double> %tmp1, <1 x double> %tmp2) {
 ; CHECK-LABEL: ins2f1:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: dup v0.2d, v0.d[1]
+; CHECK-NEXT: ext v0.16b, v0.16b, v0.16b, #8
 ; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
 ; CHECK-NEXT: ret
   %tmp3 = extractelement <2 x double> %tmp1, i32 1
   %tmp4 = insertelement <1 x double> %tmp2, double %tmp3, i32 0
   ret <1 x double> %tmp4
 }
 
 define <8 x i8> @ins8b8(<8 x i8> %tmp1, <8 x i8> %tmp2) {

llvm/test/CodeGen/ARM/fp16-insert-extract.ll

@@ ... @@
 define arm_aapcs_vfpcc <8 x half> @shuffle3step_f16(<32 x half> %src) {
 ; CHECKHARD-LABEL: shuffle3step_f16:
 ; CHECKHARD: @ %bb.0: @ %entry
 ; CHECKHARD-NEXT: vmov r1, s0
 ; CHECKHARD-NEXT: vmovx.f16 s12, s1
 ; CHECKHARD-NEXT: vmov r0, s12
 ; CHECKHARD-NEXT: vext.16 d16, d4, d5, #2
 ; CHECKHARD-NEXT: vmovx.f16 s12, s4
-; CHECKHARD-NEXT: vdup.16 q11, d3[1]
+; CHECKHARD-NEXT: vdup.32 d21, d3[1]
 ; CHECKHARD-NEXT: vrev32.16 d17, d16
 ; CHECKHARD-NEXT: vext.16 d16, d16, d17, #3
 ; CHECKHARD-NEXT: vrev32.16 d17, d3
 ; CHECKHARD-NEXT: vext.16 d17, d17, d3, #1
 ; CHECKHARD-NEXT: vext.16 d16, d16, d17, #2
 ; CHECKHARD-NEXT: vext.16 d17, d16, d16, #2
 ; CHECKHARD-NEXT: vmov.16 d16[0], r1
 ; CHECKHARD-NEXT: vmov.16 d16[1], r0
@@ ... @@
 ; CHECKHARD-NEXT: vmov.16 d18[2], r0
 ; CHECKHARD-NEXT: vmov r0, s5
 ; CHECKHARD-NEXT: vmov.16 d18[3], r0
 ; CHECKHARD-NEXT: vmov r0, s12
 ; CHECKHARD-NEXT: vmov.16 d20[2], r0
 ; CHECKHARD-NEXT: vmov r0, s11
 ; CHECKHARD-NEXT: vmov.16 d20[3], r0
 ; CHECKHARD-NEXT: vmov r0, s10
-; CHECKHARD-NEXT: vext.16 d20, d20, d22, #1
-; CHECKHARD-NEXT: vdup.16 q11, d3[2]
+; CHECKHARD-NEXT: vext.16 d20, d20, d21, #1
+; CHECKHARD-NEXT: vdup.32 d21, d3[2]
 ; CHECKHARD-NEXT: vext.16 d19, d20, d20, #3
 ; CHECKHARD-NEXT: vadd.f16 q8, q8, q9
 ; CHECKHARD-NEXT: vext.16 d18, d0, d1, #2
 ; CHECKHARD-NEXT: vmovx.f16 s0, s8
 ; CHECKHARD-NEXT: vmov r1, s0
 ; CHECKHARD-NEXT: vmovx.f16 s0, s11
 ; CHECKHARD-NEXT: vext.16 d19, d18, d2, #3
 ; CHECKHARD-NEXT: vext.16 d18, d2, d18, #1
 ; CHECKHARD-NEXT: vext.16 d18, d18, d19, #2
 ; CHECKHARD-NEXT: vext.16 d18, d18, d18, #1
 ; CHECKHARD-NEXT: vmov.16 d20[1], r1
 ; CHECKHARD-NEXT: vmov.16 d20[2], r0
 ; CHECKHARD-NEXT: vmov r0, s0
 ; CHECKHARD-NEXT: vmov.16 d20[3], r0
-; CHECKHARD-NEXT: vext.16 d20, d20, d22, #1
+; CHECKHARD-NEXT: vext.16 d20, d20, d21, #1
 ; CHECKHARD-NEXT: vext.16 d19, d20, d20, #3
 ; CHECKHARD-NEXT: vadd.f16 q0, q8, q9
 ; CHECKHARD-NEXT: bx lr
 ;
 ; CHECKSOFT-LABEL: shuffle3step_f16:
 ; CHECKSOFT: @ %bb.0: @ %entry
 ; CHECKSOFT-NEXT: vmov r1, s0
 ; CHECKSOFT-NEXT: vmovx.f16 s12, s1
 ; CHECKSOFT-NEXT: vmov r0, s12
 ; CHECKSOFT-NEXT: vext.16 d16, d4, d5, #2
 ; CHECKSOFT-NEXT: vmovx.f16 s12, s4
-; CHECKSOFT-NEXT: vdup.16 q11, d3[1]
+; CHECKSOFT-NEXT: vdup.32 d21, d3[1]
 ; CHECKSOFT-NEXT: vrev32.16 d17, d16
 ; CHECKSOFT-NEXT: vext.16 d16, d16, d17, #3
 ; CHECKSOFT-NEXT: vrev32.16 d17, d3
 ; CHECKSOFT-NEXT: vext.16 d17, d17, d3, #1
 ; CHECKSOFT-NEXT: vext.16 d16, d16, d17, #2
 ; CHECKSOFT-NEXT: vext.16 d17, d16, d16, #2
 ; CHECKSOFT-NEXT: vmov.16 d16[0], r1
 ; CHECKSOFT-NEXT: vmov.16 d16[1], r0
@@ ... @@
 ; CHECKSOFT-NEXT: vmov.16 d18[2], r0
 ; CHECKSOFT-NEXT: vmov r0, s5
 ; CHECKSOFT-NEXT: vmov.16 d18[3], r0
 ; CHECKSOFT-NEXT: vmov r0, s12
 ; CHECKSOFT-NEXT: vmov.16 d20[2], r0
 ; CHECKSOFT-NEXT: vmov r0, s11
 ; CHECKSOFT-NEXT: vmov.16 d20[3], r0
 ; CHECKSOFT-NEXT: vmov r0, s10
-; CHECKSOFT-NEXT: vext.16 d20, d20, d22, #1
-; CHECKSOFT-NEXT: vdup.16 q11, d3[2]
+; CHECKSOFT-NEXT: vext.16 d20, d20, d21, #1
+; CHECKSOFT-NEXT: vdup.32 d21, d3[2]
 ; CHECKSOFT-NEXT: vext.16 d19, d20, d20, #3
 ; CHECKSOFT-NEXT: vadd.f16 q8, q8, q9
 ; CHECKSOFT-NEXT: vext.16 d18, d0, d1, #2
 ; CHECKSOFT-NEXT: vmovx.f16 s0, s8
 ; CHECKSOFT-NEXT: vmov r1, s0
 ; CHECKSOFT-NEXT: vmovx.f16 s0, s11
 ; CHECKSOFT-NEXT: vext.16 d19, d18, d2, #3
 ; CHECKSOFT-NEXT: vext.16 d18, d2, d18, #1
 ; CHECKSOFT-NEXT: vext.16 d18, d18, d19, #2
 ; CHECKSOFT-NEXT: vext.16 d18, d18, d18, #1
 ; CHECKSOFT-NEXT: vmov.16 d20[1], r1
 ; CHECKSOFT-NEXT: vmov.16 d20[2], r0
 ; CHECKSOFT-NEXT: vmov r0, s0
 ; CHECKSOFT-NEXT: vmov.16 d20[3], r0
-; CHECKSOFT-NEXT: vext.16 d20, d20, d22, #1
+; CHECKSOFT-NEXT: vext.16 d20, d20, d21, #1
 ; CHECKSOFT-NEXT: vext.16 d19, d20, d20, #3
 ; CHECKSOFT-NEXT: vadd.f16 q0, q8, q9
 ; CHECKSOFT-NEXT: bx lr
 entry:
   %s1 = shufflevector <32 x half> %src, <32 x half> undef, <8 x i32> <i32 0, i32 3, i32 6, i32 9, i32 12, i32 15, i32 18, i32 21>
   %s2 = shufflevector <32 x half> %src, <32 x half> undef, <8 x i32> <i32 1, i32 4, i32 7, i32 10, i32 13, i32 16, i32 19, i32 22>
   %s3 = shufflevector <32 x half> %src, <32 x half> undef, <8 x i32> <i32 2, i32 5, i32 8, i32 11, i32 14, i32 17, i32 20, i32 23>
   %a = fadd <8 x half> %s1, %s2
