This is an archive of the discontinued LLVM Phabricator instance.


Differential D149654

[SLP][RISCV] Account for offset folding in getPointersChainCost
ClosedPublic

Authored by luke on May 2 2023, 7:51 AM.


Details

Reviewers

ABataev
RKSimon
reames

Commits

rGc27a0b21c578: [SLP][RISCV] Account for offset folding in getPointersChainCost

Summary

For a GEP in a pointer chain, if:

1) a pointer chain is unit-strided
2) the base pointer wasn't folded and is sitting in a register somewhere
3) the distance between the GEP and the base pointer is small enough and can be folded into the addressing mode of the using load/store

Then we can exclude that GEP from the total cost of the pointer chain,
as it will likely be folded away.
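
As a rough illustration of the folding condition, the sketch below models a unit-strided chain in standalone C++. It is not the patch's code: `fitsSImm12` and `countUnfoldedGEPs` are hypothetical names, and the model assumes the only constraint is the 12-bit signed immediate that RISC-V scalar loads and stores use for their offset.

```cpp
#include <cassert>
#include <cstdint>

// RISC-V scalar loads/stores encode their offset as a 12-bit signed
// immediate, so only offsets in [-2048, 2047] can be folded into the access.
constexpr bool fitsSImm12(int64_t Offset) {
  return Offset >= -2048 && Offset <= 2047;
}

// Hypothetical model of a unit-strided chain: pointer I sits at
// Base + I * AccessSizeBytes. Returns how many non-base pointers would still
// need a separate add because their offset does not fit in the immediate.
unsigned countUnfoldedGEPs(unsigned ChainLength, uint64_t AccessSizeBytes) {
  assert(AccessSizeBytes > 0 && "access size must be non-zero");
  unsigned Unfolded = 0;
  for (unsigned I = 1; I < ChainLength; ++I) // I == 0 is the base pointer
    if (!fitsSImm12(static_cast<int64_t>(AccessSizeBytes * I)))
      ++Unfolded;
  return Unfolded;
}
```

For the short chains SLP typically sees (for example four 8-byte accesses), every offset fits and the model reports no extra adds.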

In order to check if 3) holds, we need to know the type of memory access
being made by the users of the pointer chain. For that, we need to pass
along a new argument to getPointersChainCost. (Using the source pointer
type of the GEP isn't accurate, see https://reviews.llvm.org/D149889 for
more details).

Also note that 2) is currently an assumption, and could be modelled more
accurately.

This prevents some unprofitable cases from being SLP vectorized on
RISC-V by making the scalar costs cheaper and closer to the actual
codegen.

For now the getPointersChainCost hook is duplicated for RISC-V to prevent
disturbing other targets, but could be merged back in and shared with
other targets in a following patch.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

luke created this revision. May 2 2023, 7:51 AM

Herald added a project: Restricted Project. May 2 2023, 7:51 AM

Herald added subscribers: asb, vporpo, pmatos and 23 others.

luke requested review of this revision. May 2 2023, 7:51 AM

Herald added a project: Restricted Project. May 2 2023, 7:51 AM

Herald added subscribers: llvm-commits, pcwang-thead, MaskRay.

luke added a parent revision: D149653: [RISCV] Add test for unprofitable SLP vectorization. May 2 2023, 7:52 AM

Harbormaster completed remote builds in B229438: Diff 518734. May 2 2023, 7:52 AM

luke added inline comments. May 2 2023, 7:56 AM

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
4960	I removed this because it seems to be subsumed by the extra check in the base implementation. It's 100% equivalent as it will now cost for GEPs that don't fit into the addressing mode, but that should be more accurate right?

Remove fixme

ABataev added inline comments. May 2 2023, 8:02 AM

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
1057	`I`
1069	Add a comment with the param name for the `true` argument

Address comments

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
4960	Typo, it's not 100% equivalent

Harbormaster completed remote builds in B229443: Diff 518741. May 2 2023, 9:03 AM

I think this patch requires some cost analysis tests.

reames requested changes to this revision. May 2 2023, 1:17 PM

reames added inline comments.

llvm/include/llvm/Analysis/TargetTransformInfo.h
323	Rephrase: is the type of the memory access.
327	Rename: AccessTy (This matches the naming convention we use elsewhere for this concept.)
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
1066	I don't see anything here which requires the stride to be equal to the type size. What prevents the set of pointers from e.g. advancing by 64 bytes where the access type is 8 bytes? (i.e. what prevents this from being a strided access with non-unit stride in RISCV terms?)
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
4950	KnownStride, and KnownUniform are not the same condition. I don't think your code change in the general model actually matches what you removed here. I'd suggest by starting with a RISCV specific hook on your heuristic, and then we can merge in a post commit. I think the RISCV version is going to be more restrictive.

This revision now requires changes to proceed. May 2 2023, 1:17 PM

luke added inline comments. May 2 2023, 4:00 PM

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

1066

From what I understand, PointersChainInfo::isUniformStride() implies unit stride pointers. The only user of getPointersChainCost is in SLP's getGEPCostDiff, and the only time getKnownUniformStrided() is created is in case 2) here, which is a wide vector load:

// Here we differentiate two cases: (1) when Ptrs represent a regular
// vectorization tree node (as they are pointer arguments of scattered
// loads) or (2) when Ptrs are the arguments of loads or stores being
// vectorized as plane wide unit-stride load/store since all the
// loads/stores are known to be from/to adjacent locations.
assert(E->State == TreeEntry::Vectorize &&
       "Entry state expected to be Vectorize here.");
if (isa<LoadInst, StoreInst>(VL0)) {
  // Case 2: estimate costs for pointer related costs when vectorizing to
  // a wide load/store.
  // Scalar cost is estimated as a set of pointers with known relationship
  // between them.
  // For vector code we will use BasePtr as argument for the wide load/store
  // but we also need to account all the instructions which are going to
  // stay in vectorized code due to uses outside of these scalar
  // loads/stores.
  ScalarCost = TTI->getPointersChainCost(
      Ptrs, BasePtr, TTI::PointersChainInfo::getKnownUniformStrided(),
      ScalarTy, CostKind);

I agree this is non-obvious. I was considering renaming isUniformStride() to isUnitStride() or something more strict, but deferred it to keep the patch small. I can do this in a parent patch or alternatively leave a comment explaining this.
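
To make the distinction in this exchange concrete, here is a small standalone sketch that separates a unit stride (each pointer advances by exactly the access size) from a uniform but non-unit stride (for example advancing by 64 bytes with an 8-byte access). `StrideKind` and `classifyStride` are hypothetical names for illustration and do not exist in LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class StrideKind { Unit, UniformNonUnit, Irregular };

// Classify a chain from the byte offsets of its pointers relative to the
// first one. A chain with fewer than two pointers is treated as unit-strided
// by convention in this sketch.
StrideKind classifyStride(const std::vector<int64_t> &Offsets,
                          int64_t AccessSizeBytes) {
  assert(AccessSizeBytes > 0 && "access size must be positive");
  if (Offsets.size() < 2)
    return StrideKind::Unit;
  const int64_t Step = Offsets[1] - Offsets[0];
  for (std::size_t I = 2; I < Offsets.size(); ++I)
    if (Offsets[I] - Offsets[I - 1] != Step)
      return StrideKind::Irregular;
  return Step == AccessSizeBytes ? StrideKind::Unit
                                 : StrideKind::UniformNonUnit;
}
```

Only the `Unit` case corresponds to the adjacent loads/stores that SLP turns into one wide access.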

In D149654#4312732, @ABataev wrote:

I think this patch requires some cost analysis tests.

I agree, although getPointersChainInfo doesn't get called anywhere useful for opt -passes="print<cost-model>", so the only way I can think of verifying the cost is via checking the debug output from SLP. Is there a better way to go about this?

Restore x86 hook

Harbormaster completed remote builds in B232045: Diff 522241. May 15 2023, 11:09 AM

Rebase on top of D150662 and add cost test case

Harbormaster completed remote builds in B232274: Diff 522556. May 16 2023, 5:21 AM

luke added a parent revision: D150662: [SLP] Rename IsUniformStride to IsUnitStride. NFCI. May 16 2023, 5:34 AM

Fix rebase

Harbormaster completed remote builds in B232305: Diff 522599. May 16 2023, 9:20 AM

Doesn't seem to be any changes in x86 code on the llvm-test-suite:

Tests: 2432
Metric: size..text

Program                                        size..text                      
                                               results.head results.patch diff 
 Bitcode/Be...ral_grid/halide_bilateral_grid   59937.00     59937.00       0.0%
 SingleSour.../execute/GCC-C-execute-loop-2d     593.00       593.00       0.0%
 SingleSour.../execute/GCC-C-execute-loop-11    1185.00      1185.00       0.0%
 SingleSour.../execute/GCC-C-execute-loop-12     465.00       465.00       0.0%
 SingleSour.../execute/GCC-C-execute-loop-13     497.00       497.00       0.0%
 SingleSour.../execute/GCC-C-execute-loop-14     385.00       385.00       0.0%
 SingleSour.../execute/GCC-C-execute-loop-15    1633.00      1633.00       0.0%
 SingleSour...e/execute/GCC-C-execute-loop-2     561.00       561.00       0.0%
 SingleSour.../execute/GCC-C-execute-loop-2b     561.00       561.00       0.0%
 SingleSour.../execute/GCC-C-execute-loop-2e     769.00       769.00       0.0%
 Bitcode/Be...hmarks/Halide/blur/halide_blur   35809.00     35809.00       0.0%
 SingleSour.../execute/GCC-C-execute-loop-2f     417.00       417.00       0.0%
 SingleSour.../execute/GCC-C-execute-loop-2g     417.00       417.00       0.0%
 SingleSour...e/execute/GCC-C-execute-loop-3     417.00       417.00       0.0%
 SingleSour.../execute/GCC-C-execute-loop-3b     449.00       449.00       0.0%
                            Geomean difference      nan          nan       0.0%
         size..text                      
run    results.head results.patch    diff
count  2.432000e+03  2.432000e+03  2432.0
mean   1.692725e+04  1.692725e+04  0.0   
std    1.559107e+05  1.559107e+05  0.0   
min    3.530000e+02  3.530000e+02  0.0   
25%    3.850000e+02  3.850000e+02  0.0   
50%    5.130000e+02  5.130000e+02  0.0   
75%    4.765000e+03  4.765000e+03  0.0   
max    7.177889e+06  7.177889e+06  0.0

On sqlite3, -O2, there are 12 fewer VF=2 sequences vectorized on RISC-V:

$ grep -E "vsetivli\s+zero, 2, e" sqlite.head.s | wc -l
378
$ grep -E "vsetivli\s+zero, 2, e" sqlite.patch.s | wc -l
366

Most of them are in the form of something like:

	sb	a0, 5(s4)
	sb	s6, 6(s4)
	addi	s4, s4, 1
	vsetivli	zero, 2, e8, mf8, ta, ma
	vmv.v.i	v8, 0
	vse8.v	v8, (s4)

Which are now being emitted as scalar:

	sb	a0, 5(s3)
	sb	s6, 6(s3)
	sb	zero, 1(s3)
	sb	zero, 2(s3)

There are no differences in the code generated on x86. I've attached the RISC-V outputs if anyone wants to see the changes.

Attachment: sqlite.head.s (4 MB)

Attachment: sqlite.patch.s (4 MB)

Isolate changes to RISCVTargetTransformInfo

Harbormaster completed remote builds in B233150: Diff 523736. May 19 2023, 5:13 AM

luke marked 6 inline comments as done. May 19 2023, 5:15 AM

luke added inline comments.

llvm/test/Transforms/SLPVectorizer/RISCV/getpointerschaincost.ll
1	Apologies in advance for testing with the debug output, I can't think of another way to get access to `getPointersChainCost`.

luke retitled this revision from [SLP] Don't cost pointers that can be folded in getPointersChainCost to [SLP][RISCV] Account for offset folding in getPointersChainCost. May 19 2023, 5:16 AM

luke edited the summary of this revision.

Herald added subscribers: jobnoorman, eopXD, VincentWu and 5 others. May 19 2023, 5:16 AM

luke added inline comments. May 19 2023, 5:19 AM

llvm/test/Transforms/SLPVectorizer/RISCV/getpointerschaincost.ll
56	Also worth noting, I tried to come up with a test case where only some of the pointers were folded and some weren't, but couldn't find a sane way to do so. Namely for RISC-V, we need a chain of pointers that are unit-strided, but is also somehow long enough that the offset overflows 2^12.
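
A quick calculation shows why such a test case is hard to construct. The sketch below is an illustration under one assumption, namely that the only limit is the 12-bit signed load/store immediate with a maximum positive offset of 2047. It computes the first chain index whose offset no longer folds. `firstUnfoldableIndex` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Largest positive offset encodable in a 12-bit signed immediate.
constexpr uint64_t MaxSImm12 = 2047;

// Smallest index I (the base pointer is I == 0) for which the byte offset
// I * AccessSizeBytes exceeds MaxSImm12 in a unit-strided chain.
uint64_t firstUnfoldableIndex(uint64_t AccessSizeBytes) {
  assert(AccessSizeBytes > 0 && "access size must be non-zero");
  return MaxSImm12 / AccessSizeBytes + 1;
}
```

Even with 8-byte accesses the chain would need 257 pointers before one of them stops folding, far longer than a typical SLP bundle.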

Rebase

Harbormaster completed remote builds in B233152: Diff 523738. May 19 2023, 6:26 AM

ABataev added inline comments. May 19 2023, 6:29 AM

llvm/test/Transforms/SLPVectorizer/RISCV/getpointerschaincost.ll
1	Use -pass-remarks-output= instead; check some of the remarks_... tests in the SLPVectorizer tests directory as an example. Also, better to precommit new tests separately.

The RISCV and API extension bits now look good to me. Once Alexey's happy with the SLP bits, you're good to go.

Looking at the structure of the patch - thanks for the scope reduction - I realized this is a special case of pointer difference. The general form of this would be to compute the offset between each GEP and the base. If that difference fits in the scalar addressing range, that GEP has zero cost. Otherwise, it has cost.

This does raise the point that considering a constant offset GEP as zero cost is actually wrong. If the offsets are 0, and UINT_MAX, that's not a zero cost GEP on RISCV.

The only machinery I know for this in LLVM is BasicAAResult::DecomposedGEP . We could potentially reuse some of that.

There's also an interaction with the ptradd proposal here. With simpler GEPs, the subtraction becomes trivial.

This is definitely future work though. And probably not future work actually worth doing, at least for the general case. :)
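
The general form described in this comment can be sketched as follows. This is a standalone model and not a proposal for the actual hook: `pointerChainCost` is a hypothetical function, the offsets are assumed to be already known as constants relative to the base, and the foldable range is assumed to be the 12-bit signed immediate of RISC-V scalar loads and stores.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Each pointer is described by its constant byte offset from the base.
// An offset that fits the scalar addressing range is free; any other offset
// is charged one add of cost AddCost.
unsigned pointerChainCost(const std::vector<int64_t> &OffsetsFromBase,
                          unsigned AddCost = 1) {
  assert(AddCost > 0 && "an unfoldable offset must cost something");
  unsigned Cost = 0;
  for (int64_t Offset : OffsetsFromBase)
    if (Offset < -2048 || Offset > 2047)
      Cost += AddCost;
  return Cost;
}
```

With offsets 0 and UINT_MAX (4294967295) the second pointer is charged, which is the case the comment points out is currently treated as free.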

In D149654#4356406, @reames wrote:

This does raise the point that considering a constant offset GEP as zero cost is actually wrong. If the offsets are 0, and UINT_MAX, that's not a zero cost GEP on RISCV.

I guess with cost modelling there's not a strict definition of "wrong", but whether it could be more accurate. Whilst working on this locally I changed that assumption to check if it's foldable, but intentionally left it out to simplify things. It seems like an edge case where the approximation gets it right most of the time.

Convert test to remarks test

Harbormaster completed remote builds in B233501: Diff 524198. May 22 2023, 3:03 AM

This revision was not accepted when it landed; it landed in state Needs Review. May 22 2023, 5:55 AM

Closed by commit rGc27a0b21c578: [SLP][RISCV] Account for offset folding in getPointersChainCost (authored by luke).

This revision was automatically updated to reflect the committed changes.

luke added a commit: rGc27a0b21c578: [SLP][RISCV] Account for offset folding in getPointersChainCost.

Revision Contents

Path                                                              Size
llvm/include/llvm/Analysis/TargetTransformInfo.h                  9 lines
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h              1 line
llvm/lib/Analysis/TargetTransformInfo.cpp                         5 lines
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h                  6 lines
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp                49 lines
llvm/lib/Target/X86/X86TargetTransformInfo.h                      1 line
llvm/lib/Target/X86/X86TargetTransformInfo.cpp                    10 lines
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp                   8 lines
llvm/test/Transforms/SLPVectorizer/RISCV/getpointerschaincost.ll  4 lines
llvm/test/Transforms/SLPVectorizer/RISCV/struct-gep.ll            8 lines

Diff 524261

llvm/include/llvm/Analysis/TargetTransformInfo.h

@@ ... @@ static PointersChainInfo getUnknownStride() {
       return {/*IsSameBaseAddress=*/1, /*IsUnitStride=*/0,
               /*IsKnownStride=*/0, 0};
     }
   };
   static_assert(sizeof(PointersChainInfo) == 4, "Was size increase justified?");

   /// Estimate the cost of a chain of pointers (typically pointer operands of a
   /// chain of loads or stores within same block) operations set when lowered.
+  /// \p AccessTy is the type of the loads/stores that will ultimately use the
+  /// \p Ptrs.
   InstructionCost
   getPointersChainCost(ArrayRef<const Value *> Ptrs, const Value *Base,
-                       const PointersChainInfo &Info,
+                       const PointersChainInfo &Info, Type *AccessTy,
                        TargetCostKind CostKind = TTI::TCK_RecipThroughput

   ) const;

   /// \returns A value by which our inlining threshold should be multiplied.
   /// This is primarily used to bump up the inlining threshold wholesale on
   /// targets where calls are unusually expensive.
   ///
@@ ... @@
 public:
   virtual ~Concept() = 0;
   virtual const DataLayout &getDataLayout() const = 0;
   virtual InstructionCost getGEPCost(Type *PointeeType, const Value *Ptr,
                                      ArrayRef<const Value *> Operands,
                                      TTI::TargetCostKind CostKind) = 0;
   virtual InstructionCost
   getPointersChainCost(ArrayRef<const Value *> Ptrs, const Value *Base,
-                       const TTI::PointersChainInfo &Info,
+                       const TTI::PointersChainInfo &Info, Type *AccessTy,
                        TTI::TargetCostKind CostKind) = 0;
   virtual unsigned getInliningThresholdMultiplier() = 0;
   virtual unsigned adjustInliningThreshold(const CallBase *CB) = 0;
   virtual int getInlinerVectorBonusPercent() = 0;
   virtual InstructionCost getMemcpyCost(const Instruction *I) = 0;
   virtual unsigned
   getEstimatedNumberOfCaseClusters(const SwitchInst &SI, unsigned &JTSize,
                                    ProfileSummaryInfo *PSI,
@@ ... @@ public:
   getGEPCost(Type *PointeeType, const Value *Ptr,
              ArrayRef<const Value *> Operands,
              TargetTransformInfo::TargetCostKind CostKind) override {
     return Impl.getGEPCost(PointeeType, Ptr, Operands, CostKind);
   }
   InstructionCost getPointersChainCost(ArrayRef<const Value *> Ptrs,
                                        const Value *Base,
                                        const PointersChainInfo &Info,
+                                       Type *AccessTy,
                                        TargetCostKind CostKind) override {
-    return Impl.getPointersChainCost(Ptrs, Base, Info, CostKind);
+    return Impl.getPointersChainCost(Ptrs, Base, Info, AccessTy, CostKind);
   }
   unsigned getInliningThresholdMultiplier() override {
     return Impl.getInliningThresholdMultiplier();
   }
   unsigned adjustInliningThreshold(const CallBase *CB) override {
     return Impl.adjustInliningThreshold(CB);
   }
   int getInlinerVectorBonusPercent() override {

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

@@ ... @@ if (static_cast<T *>(this)->isLegalAddressingMode(
             Ptr->getType()->getPointerAddressSpace()))
       return TTI::TCC_Free;
     return TTI::TCC_Basic;
   }

   InstructionCost getPointersChainCost(ArrayRef<const Value *> Ptrs,
                                        const Value *Base,
                                        const TTI::PointersChainInfo &Info,
+                                       Type *AccessTy,
                                        TTI::TargetCostKind CostKind) {
     InstructionCost Cost = TTI::TCC_Free;
     // In the basic model we take into account GEP instructions only
     // (although here can come alloca instruction, a value, constants and/or
     // constant expressions, PHIs, bitcasts ... whatever allowed to be used as a
     // pointer). Typically, if Base is a not a GEP-instruction and all the
     // pointers are relative to the same base address, all the rest are
     // either GEP instructions, PHIs, bitcasts or constants. When we have same
     // base, we just calculate cost of each non-Base GEP as an ADD operation if
     // any their index is a non-const.
     // If no known dependecies between the pointers cost is calculated as a sum
     // of costs of GEP instructions.
     for (const Value *V : Ptrs) {
       const auto *GEP = dyn_cast<GetElementPtrInst>(V);
       if (!GEP)
         continue;
       if (Info.isSameBase() && V != Base) {
         if (GEP->hasAllConstantIndices())
           continue;
         Cost += static_cast<T *>(this)->getArithmeticInstrCost(
             Instruction::Add, GEP->getType(), CostKind,
             {TTI::OK_AnyValue, TTI::OP_None}, {TTI::OK_AnyValue, TTI::OP_None},
             std::nullopt);
       } else {
         SmallVector<const Value *> Indices(GEP->indices());
         Cost += static_cast<T *>(this)->getGEPCost(GEP->getSourceElementType(),
                                                    GEP->getPointerOperand(),
                                                    Indices, CostKind);
       }
     }
     return Cost;
   }

llvm/lib/Analysis/TargetTransformInfo.cpp

@@ ... @@
 TargetTransformInfo::getGEPCost(Type *PointeeType, const Value *Ptr,
                                 ArrayRef<const Value *> Operands,
                                 TTI::TargetCostKind CostKind) const {
   return TTIImpl->getGEPCost(PointeeType, Ptr, Operands, CostKind);
 }

 InstructionCost TargetTransformInfo::getPointersChainCost(
     ArrayRef<const Value *> Ptrs, const Value *Base,
-    const TTI::PointersChainInfo &Info, TTI::TargetCostKind CostKind) const {
+    const TTI::PointersChainInfo &Info, Type *AccessTy,
+    TTI::TargetCostKind CostKind) const {
   assert((Base || !Info.isSameBase()) &&
          "If pointers have same base address it has to be provided.");
-  return TTIImpl->getPointersChainCost(Ptrs, Base, Info, CostKind);
+  return TTIImpl->getPointersChainCost(Ptrs, Base, Info, AccessTy, CostKind);
 }

 unsigned TargetTransformInfo::getEstimatedNumberOfCaseClusters(
     const SwitchInst &SI, unsigned &JTSize, ProfileSummaryInfo *PSI,
     BlockFrequencyInfo *BFI) const {
   return TTIImpl->getEstimatedNumberOfCaseClusters(SI, JTSize, PSI, BFI);
 }

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h

@@ ... @@ bool preferEpilogueVectorization() const {
     // should re-examine this once vectorization is better tuned.
     return false;
   }

   InstructionCost getMaskedMemoryOpCost(unsigned Opcode, Type *Src,
                                         Align Alignment, unsigned AddressSpace,
                                         TTI::TargetCostKind CostKind);

+  InstructionCost getPointersChainCost(ArrayRef<const Value *> Ptrs,
+                                       const Value *Base,
+                                       const TTI::PointersChainInfo &Info,
+                                       Type *AccessTy,
+                                       TTI::TargetCostKind CostKind);
+
   void getUnrollingPreferences(Loop *L, ScalarEvolution &SE,
                                TTI::UnrollingPreferences &UP,
                                OptimizationRemarkEmitter *ORE);

   void getPeelingPreferences(Loop *L, ScalarEvolution &SE,
                              TTI::PeelingPreferences &PP);

   unsigned getMinVectorRegisterBitWidth() const {

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

@@ ... @@ InstructionCost RISCVTTIImpl::getArithmeticInstrCost(
   }
   default:
     return ConstantMatCost +
            BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Op1Info, Op2Info,
                                          Args, CxtI);
   }
 }

+// TODO: Deduplicate from TargetTransformInfoImplCRTPBase.
+InstructionCost RISCVTTIImpl::getPointersChainCost(
+    ArrayRef<const Value *> Ptrs, const Value *Base,
+    const TTI::PointersChainInfo &Info, Type *AccessTy,
+    TTI::TargetCostKind CostKind) {
+  InstructionCost Cost = TTI::TCC_Free;
+  // In the basic model we take into account GEP instructions only
+  // (although here can come alloca instruction, a value, constants and/or
+  // constant expressions, PHIs, bitcasts ... whatever allowed to be used as a
+  // pointer). Typically, if Base is a not a GEP-instruction and all the
+  // pointers are relative to the same base address, all the rest are
+  // either GEP instructions, PHIs, bitcasts or constants. When we have same
+  // base, we just calculate cost of each non-Base GEP as an ADD operation if
+  // any their index is a non-const.
+  // If no known dependecies between the pointers cost is calculated as a sum
+  // of costs of GEP instructions.
+  for (auto [I, V] : enumerate(Ptrs)) {
+    const auto *GEP = dyn_cast<GetElementPtrInst>(V);
+    if (!GEP)
+      continue;
+    if (Info.isSameBase() && V != Base) {
+      if (GEP->hasAllConstantIndices())
+        continue;
+      // If the chain is unit-stride and BaseReg + stride*i is a legal
+      // addressing mode, then presume the base GEP is sitting around in a
+      // register somewhere and check if we can fold the offset relative to
+      // it.
+      unsigned Stride = DL.getTypeStoreSize(AccessTy);
+      if (Info.isUnitStride() &&
+          isLegalAddressingMode(AccessTy,
+                                /* BaseGV */ nullptr,
+                                /* BaseOffset */ Stride * I,
+                                /* HasBaseReg */ true,
+                                /* Scale */ 0,
+                                GEP->getType()->getPointerAddressSpace()))
+        continue;
+      Cost += getArithmeticInstrCost(Instruction::Add, GEP->getType(), CostKind,
+                                     {TTI::OK_AnyValue, TTI::OP_None},
+                                     {TTI::OK_AnyValue, TTI::OP_None},
+                                     std::nullopt);
+    } else {
+      SmallVector<const Value *> Indices(GEP->indices());
+      Cost += getGEPCost(GEP->getSourceElementType(), GEP->getPointerOperand(),
+                         Indices, CostKind);
+    }
+  }
+  return Cost;
+}
+
 void RISCVTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE,
                                            TTI::UnrollingPreferences &UP,
                                            OptimizationRemarkEmitter *ORE) {
   // TODO: More tuning on benchmarks and metrics with changes as needed
   //       would apply to all settings below to enable performance.


   if (ST->enableDefaultUnroll())

llvm/lib/Target/X86/X86TargetTransformInfo.h

@@ ... @@ public:
   InstructionCost getGatherScatterOpCost(unsigned Opcode, Type *DataTy,
                                          const Value *Ptr, bool VariableMask,
                                          Align Alignment,
                                          TTI::TargetCostKind CostKind,
                                          const Instruction *I);
   InstructionCost getPointersChainCost(ArrayRef<const Value *> Ptrs,
                                        const Value *Base,
                                        const TTI::PointersChainInfo &Info,
+                                       Type *AccessTy,
                                        TTI::TargetCostKind CostKind);
   InstructionCost getAddressComputationCost(Type *PtrTy, ScalarEvolution *SE,
                                             const SCEV *Ptr);

   std::optional<Instruction *> instCombineIntrinsic(InstCombiner &IC,
                                                     IntrinsicInst &II) const;
   std::optional<Value *>
   simplifyDemandedUseBitsIntrinsic(InstCombiner &IC, IntrinsicInst &II,

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

X86TTIImpl::getMaskedMemoryOpCost(unsigned Opcode, Type *SrcTy, Align Alignment,
// Pre-AVX512 - each maskmov load costs 2 + store costs ~8.		// Pre-AVX512 - each maskmov load costs 2 + store costs ~8.
if (!ST->hasAVX512())		if (!ST->hasAVX512())
return Cost + LT.first * (IsLoad ? 2 : 8);		return Cost + LT.first * (IsLoad ? 2 : 8);

// AVX-512 masked load/store is cheaper		// AVX-512 masked load/store is cheaper
return Cost + LT.first;		return Cost + LT.first;
}		}

InstructionCost X86TTIImpl::getPointersChainCost(		InstructionCost
ArrayRef<const Value *> Ptrs, const Value *Base,		X86TTIImpl::getPointersChainCost(ArrayRef<const Value *> Ptrs,
const TTI::PointersChainInfo &Info, TTI::TargetCostKind CostKind) {		const Value *Base,
		const TTI::PointersChainInfo &Info,
		Type *AccessTy, TTI::TargetCostKind CostKind) {
if (Info.isSameBase() && Info.isKnownStride()) {		if (Info.isSameBase() && Info.isKnownStride()) {
// If all the pointers have known stride all the differences are translated		// If all the pointers have known stride all the differences are translated
reames: KnownStride and KnownUniform are not the same condition. I don't think your code change in the general model actually matches what you removed here. I'd suggest starting with a RISCV-specific hook for your heuristic, and then we can merge it in a post-commit change. I think the RISCV version is going to be more restrictive.
// into constants. X86 memory addressing allows encoding it into		// into constants. X86 memory addressing allows encoding it into
// displacement. So we just need to take the base GEP cost.		// displacement. So we just need to take the base GEP cost.
if (const auto *BaseGEP = dyn_cast<GetElementPtrInst>(Base)) {		if (const auto *BaseGEP = dyn_cast<GetElementPtrInst>(Base)) {
SmallVector<const Value *> Indices(BaseGEP->indices());		SmallVector<const Value *> Indices(BaseGEP->indices());
return getGEPCost(BaseGEP->getSourceElementType(),		return getGEPCost(BaseGEP->getSourceElementType(),
BaseGEP->getPointerOperand(), Indices, CostKind);		BaseGEP->getPointerOperand(), Indices, CostKind);
}		}
return TTI::TCC_Free;		return TTI::TCC_Free;
}		}
return BaseT::getPointersChainCost(Ptrs, Base, Info, CostKind);		return BaseT::getPointersChainCost(Ptrs, Base, Info, AccessTy, CostKind);
luke (author): I removed this because it seems to be subsumed by the extra check in the base implementation. It's 100% equivalent as it will now cost for GEPs that don't fit into the addressing mode, but that should be more accurate, right?
luke (author): Typo, it's not 100% equivalent.
}		}

InstructionCost X86TTIImpl::getAddressComputationCost(Type *Ty,		InstructionCost X86TTIImpl::getAddressComputationCost(Type *Ty,
ScalarEvolution *SE,		ScalarEvolution *SE,
const SCEV *Ptr) {		const SCEV *Ptr) {
// Address computations in vectorized code with non-consecutive addresses will		// Address computations in vectorized code with non-consecutive addresses will
// likely result in more instructions compared to scalar code where the		// likely result in more instructions compared to scalar code where the
// computation can more often be merged into the index mode. The resulting		// computation can more often be merged into the index mode. The resulting

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

if (isa<LoadInst, StoreInst>(VL0)) {
// a wide load/store.		// a wide load/store.
// Scalar cost is estimated as a set of pointers with known relationship		// Scalar cost is estimated as a set of pointers with known relationship
// between them.		// between them.
// For vector code we will use BasePtr as argument for the wide load/store		// For vector code we will use BasePtr as argument for the wide load/store
// but we also need to account all the instructions which are going to		// but we also need to account all the instructions which are going to
// stay in vectorized code due to uses outside of these scalar		// stay in vectorized code due to uses outside of these scalar
// loads/stores.		// loads/stores.
ScalarCost = TTI->getPointersChainCost(		ScalarCost = TTI->getPointersChainCost(
Ptrs, BasePtr, TTI::PointersChainInfo::getUnitStride(), CostKind);		Ptrs, BasePtr, TTI::PointersChainInfo::getUnitStride(), ScalarTy,
		CostKind);

SmallVector<const Value *> PtrsRetainedInVecCode;		SmallVector<const Value *> PtrsRetainedInVecCode;
for (Value *V : Ptrs) {		for (Value *V : Ptrs) {
if (V == BasePtr) {		if (V == BasePtr) {
PtrsRetainedInVecCode.push_back(V);		PtrsRetainedInVecCode.push_back(V);
continue;		continue;
}		}
auto *Ptr = dyn_cast<GetElementPtrInst>(V);		auto *Ptr = dyn_cast<GetElementPtrInst>(V);
[9 lines not shown]
// If all pointers stay in vectorized code then we don't have		// If all pointers stay in vectorized code then we don't have
// any savings on that.		// any savings on that.
LLVM_DEBUG(dumpTreeCosts(E, 0, ScalarCost, ScalarCost,		LLVM_DEBUG(dumpTreeCosts(E, 0, ScalarCost, ScalarCost,
"Calculated GEPs cost for Tree"));		"Calculated GEPs cost for Tree"));
return InstructionCost{TTI::TCC_Free};		return InstructionCost{TTI::TCC_Free};
}		}
VecCost = TTI->getPointersChainCost(		VecCost = TTI->getPointersChainCost(
PtrsRetainedInVecCode, BasePtr,		PtrsRetainedInVecCode, BasePtr,
TTI::PointersChainInfo::getKnownStride(), CostKind);		TTI::PointersChainInfo::getKnownStride(), VecTy, CostKind);
} else {		} else {
// Case 1: Ptrs are the arguments of loads that we are going to transform		// Case 1: Ptrs are the arguments of loads that we are going to transform
// into masked gather load intrinsic.		// into masked gather load intrinsic.
// All the scalar GEPs will be removed as a result of vectorization.		// All the scalar GEPs will be removed as a result of vectorization.
// For any external uses of some lanes extract element instructions will		// For any external uses of some lanes extract element instructions will
// be generated (which cost is estimated separately).		// be generated (which cost is estimated separately).
TTI::PointersChainInfo PtrsInfo =		TTI::PointersChainInfo PtrsInfo =
all_of(Ptrs,		all_of(Ptrs,
[](const Value *V) {		[](const Value *V) {
auto *Ptr = dyn_cast<GetElementPtrInst>(V);		auto *Ptr = dyn_cast<GetElementPtrInst>(V);
return Ptr && !Ptr->hasAllConstantIndices();		return Ptr && !Ptr->hasAllConstantIndices();
})		})
? TTI::PointersChainInfo::getUnknownStride()		? TTI::PointersChainInfo::getUnknownStride()
: TTI::PointersChainInfo::getKnownStride();		: TTI::PointersChainInfo::getKnownStride();

ScalarCost = TTI->getPointersChainCost(Ptrs, BasePtr, PtrsInfo, CostKind);		ScalarCost = TTI->getPointersChainCost(Ptrs, BasePtr, PtrsInfo, ScalarTy,
		CostKind);

// Remark: it not quite correct to use scalar GEP cost for a vector GEP,		// Remark: it not quite correct to use scalar GEP cost for a vector GEP,
// but it's not clear how to do that without having vector GEP arguments		// but it's not clear how to do that without having vector GEP arguments
// ready.		// ready.
// Perhaps using just TTI::TCC_Free/TTI::TCC_Basic would be better option.		// Perhaps using just TTI::TCC_Free/TTI::TCC_Basic would be better option.
if (const auto *Base = dyn_cast<GetElementPtrInst>(BasePtr)) {		if (const auto *Base = dyn_cast<GetElementPtrInst>(BasePtr)) {
SmallVector<const Value *> Indices(Base->indices());		SmallVector<const Value *> Indices(Base->indices());
VecCost = TTI->getGEPCost(Base->getSourceElementType(),		VecCost = TTI->getGEPCost(Base->getSourceElementType(),

llvm/test/Transforms/SLPVectorizer/RISCV/getpointerschaincost.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
luke (author): Apologies in advance for testing with the debug output; I can't think of another way to get access to `getPointersChainCost`.
ABataev: Use -pass-remarks-output= instead; check some of the remarks_... tests in the SLPVectorizer tests directory as an example. Also, better to precommit new tests separately.
	; RUN: opt -S -mtriple=riscv64 -mattr=+v -riscv-v-slp-max-vf=0 -passes=slp-vectorizer -pass-remarks-output=%t < %s | FileCheck %s			; RUN: opt -S -mtriple=riscv64 -mattr=+v -riscv-v-slp-max-vf=0 -passes=slp-vectorizer -pass-remarks-output=%t < %s | FileCheck %s
	; RUN: FileCheck --input-file=%t --check-prefix=YAML %s			; RUN: FileCheck --input-file=%t --check-prefix=YAML %s

	; Because all of these addresses are foldable, the scalar cost should be 0 when			; Because all of these addresses are foldable, the scalar cost should be 0 when
	; computing the pointers chain cost.			; computing the pointers chain cost.
	;			;
	; TODO: These are currently costed as free the indices are all constants, but we			; TODO: These are currently costed as free the indices are all constants, but we
	; should check if the constants are actually foldable			; should check if the constants are actually foldable
	[38 lines not shown]
	; CHECK-NEXT: [[P1:%.*]] = getelementptr i32, ptr [[DEST]], i32 2048			; CHECK-NEXT: [[P1:%.*]] = getelementptr i32, ptr [[DEST]], i32 2048
	; CHECK-NEXT: store <4 x i32> <i32 1, i32 1, i32 1, i32 1>, ptr [[P1]], align 4			; CHECK-NEXT: store <4 x i32> <i32 1, i32 1, i32 1, i32 1>, ptr [[P1]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	; YAML: Pass: slp-vectorizer			; YAML: Pass: slp-vectorizer
	; YAML-NEXT: Name: StoresVectorized			; YAML-NEXT: Name: StoresVectorized
	; YAML-NEXT: Function: g			; YAML-NEXT: Function: g
	; YAML-NEXT: Args:			; YAML-NEXT: Args:
luke (author): Also worth noting: I tried to come up with a test case where only some of the pointers were folded and some weren't, but couldn't find a sane way to do so. Namely, for RISC-V we need a chain of pointers that is unit-strided but also somehow long enough that the offset overflows 2^12.
	; YAML-NEXT: - String: 'Stores SLP vectorized with cost '			; YAML-NEXT: - String: 'Stores SLP vectorized with cost '
	; YAML-NEXT: - Cost: '-2'			; YAML-NEXT: - Cost: '-2'
	; YAML-NEXT: - String: ' and with tree size '			; YAML-NEXT: - String: ' and with tree size '
	; YAML-NEXT: - TreeSize: '2'			; YAML-NEXT: - TreeSize: '2'
	%p1 = getelementptr i32, ptr %dest, i32 2048			%p1 = getelementptr i32, ptr %dest, i32 2048
	store i32 1, ptr %p1			store i32 1, ptr %p1
	%p2 = getelementptr i32, ptr %dest, i32 2049			%p2 = getelementptr i32, ptr %dest, i32 2049
	store i32 1, ptr %p2			store i32 1, ptr %p2
	%p3 = getelementptr i32, ptr %dest, i32 2050			%p3 = getelementptr i32, ptr %dest, i32 2050
	store i32 1, ptr %p3			store i32 1, ptr %p3
	%p4 = getelementptr i32, ptr %dest, i32 2051			%p4 = getelementptr i32, ptr %dest, i32 2051
	store i32 1, ptr %p4			store i32 1, ptr %p4
	ret void			ret void
	}			}

	; FIXME: When computing the scalar pointers chain cost here, there is a cost of			; When computing the scalar pointers chain cost here, there is a cost of
	; 1 for the base pointer, and the rest can be folded in, so the scalar cost			; 1 for the base pointer, and the rest can be folded in, so the scalar cost
	; should be 1.			; should be 1.
	define void @h(ptr %dest, i32 %i) {			define void @h(ptr %dest, i32 %i) {
	; CHECK-LABEL: define void @h			; CHECK-LABEL: define void @h
	; CHECK-SAME: (ptr [[DEST:%.*]], i32 [[I:%.*]]) #[[ATTR0]] {			; CHECK-SAME: (ptr [[DEST:%.*]], i32 [[I:%.*]]) #[[ATTR0]] {
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[P1:%.*]] = getelementptr [4 x i32], ptr [[DEST]], i32 [[I]], i32 0			; CHECK-NEXT: [[P1:%.*]] = getelementptr [4 x i32], ptr [[DEST]], i32 [[I]], i32 0
	; CHECK-NEXT: store <4 x i32> <i32 1, i32 1, i32 1, i32 1>, ptr [[P1]], align 4			; CHECK-NEXT: store <4 x i32> <i32 1, i32 1, i32 1, i32 1>, ptr [[P1]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	; YAML: Pass: slp-vectorizer			; YAML: Pass: slp-vectorizer
	; YAML-NEXT: Name: StoresVectorized			; YAML-NEXT: Name: StoresVectorized
	; YAML-NEXT: Function: h			; YAML-NEXT: Function: h
	; YAML-NEXT: Args:			; YAML-NEXT: Args:
	; YAML-NEXT: - String: 'Stores SLP vectorized with cost '			; YAML-NEXT: - String: 'Stores SLP vectorized with cost '
	; YAML-NEXT: - Cost: '-5'			; YAML-NEXT: - Cost: '-2'
	; YAML-NEXT: - String: ' and with tree size '			; YAML-NEXT: - String: ' and with tree size '
	; YAML-NEXT: - TreeSize: '2'			; YAML-NEXT: - TreeSize: '2'
	%p1 = getelementptr [4 x i32], ptr %dest, i32 %i, i32 0			%p1 = getelementptr [4 x i32], ptr %dest, i32 %i, i32 0
	store i32 1, ptr %p1			store i32 1, ptr %p1
	%p2 = getelementptr [4 x i32], ptr %dest, i32 %i, i32 1			%p2 = getelementptr [4 x i32], ptr %dest, i32 %i, i32 1
	store i32 1, ptr %p2			store i32 1, ptr %p2
	%p3 = getelementptr [4 x i32], ptr %dest, i32 %i, i32 2			%p3 = getelementptr [4 x i32], ptr %dest, i32 %i, i32 2
	store i32 1, ptr %p3			store i32 1, ptr %p3
	%p4 = getelementptr [4 x i32], ptr %dest, i32 %i, i32 3			%p4 = getelementptr [4 x i32], ptr %dest, i32 %i, i32 3
	store i32 1, ptr %p4			store i32 1, ptr %p4
	ret void			ret void
	}			}
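
The inline note on this test says that, for RISC-V, offsets in a unit-strided chain only stop folding once they overflow 2^12. As a rough illustration (a sketch, not code from this patch): RISC-V scalar loads and stores take a 12-bit signed immediate, so a byte offset from the base folds only if it lies in [-2048, 2047]. The helper names below (`fitsInSImm12`, `unitStrideOffsets`, `countUnfoldableOffsets`) are hypothetical and are not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: true if a byte offset fits the 12-bit signed
// immediate of a RISC-V scalar load/store, i.e. lies in [-2048, 2047].
bool fitsInSImm12(int64_t ByteOffset) {
  return ByteOffset >= -2048 && ByteOffset <= 2047;
}

// Hypothetical helper: byte offsets, relative to the first pointer, of a
// unit-strided chain of NumPtrs accesses that are EltBytes wide each.
std::vector<int64_t> unitStrideOffsets(unsigned NumPtrs, unsigned EltBytes) {
  assert(EltBytes > 0 && "element size must be non-zero");
  std::vector<int64_t> Offsets;
  for (unsigned I = 0; I < NumPtrs; ++I)
    Offsets.push_back(static_cast<int64_t>(I) * EltBytes);
  return Offsets;
}

// Hypothetical helper: number of pointers whose offset from the base cannot
// be folded into the addressing mode and would still need a separate add.
unsigned countUnfoldableOffsets(const std::vector<int64_t> &ByteOffsets) {
  unsigned Count = 0;
  for (int64_t Off : ByteOffsets)
    if (!fitsInSImm12(Off))
      ++Count;
  return Count;
}
```

Under these assumptions, the four i32 stores in the tests above sit at offsets 0, 4, 8 and 12 from the base and all fold. A unit-strided i32 chain would need 513 pointers before the last offset (2048 bytes) stops fitting, which is consistent with the difficulty of hand-writing a partially folded test case.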

llvm/test/Transforms/SLPVectorizer/RISCV/struct-gep.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer -mtriple=riscv64 -mattr=+v \			; RUN: opt < %s -passes=slp-vectorizer -mtriple=riscv64 -mattr=+v \
	; RUN: -riscv-v-slp-max-vf=0 -S | FileCheck %s			; RUN: -riscv-v-slp-max-vf=0 -S | FileCheck %s

	; FIXME: This should not be vectorized			; This shouldn't be vectorized as the extra address computation required for the
				; vector store make it unprofitable (vle/vse don't have an offset in their
				; addressing modes)

	%struct.2i32 = type { i32, i32 }			%struct.2i32 = type { i32, i32 }

	define void @splat_store_v2i32(ptr %dest, i64 %i) {			define void @splat_store_v2i32(ptr %dest, i64 %i) {
	; CHECK-LABEL: @splat_store_v2i32(			; CHECK-LABEL: @splat_store_v2i32(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[P1:%.*]] = getelementptr [[STRUCT_2I32:%.*]], ptr [[DEST:%.*]], i64 [[I:%.*]], i32 0			; CHECK-NEXT: [[P1:%.*]] = getelementptr [[STRUCT_2I32:%.*]], ptr [[DEST:%.*]], i64 [[I:%.*]], i32 0
	; CHECK-NEXT: store <2 x i32> <i32 1, i32 1>, ptr [[P1]], align 4			; CHECK-NEXT: store i32 1, ptr [[P1]], align 4
				; CHECK-NEXT: [[P2:%.*]] = getelementptr [[STRUCT_2I32]], ptr [[DEST]], i64 [[I]], i32 1
				; CHECK-NEXT: store i32 1, ptr [[P2]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%p1 = getelementptr %struct.2i32, ptr %dest, i64 %i, i32 0			%p1 = getelementptr %struct.2i32, ptr %dest, i64 %i, i32 0
	store i32 1, ptr %p1			store i32 1, ptr %p1
	%p2 = getelementptr %struct.2i32, ptr %dest, i64 %i, i32 1			%p2 = getelementptr %struct.2i32, ptr %dest, i64 %i, i32 1
	store i32 1, ptr %p2			store i32 1, ptr %p2
	ret void			ret void
	}			}

Diff 524261
