This is an archive of the discontinued LLVM Phabricator instance.

Differential D59149

[LV] move useEmulatedMaskMemRefHack() functionality to TTI.
Needs Review · Public

Authored by hsaito on Mar 8 2019, 12:14 PM.


Details

Reviewers

rengolin
markus
craig.topper
hfinkel

Summary

This is a long overdue response to the review comments in https://reviews.llvm.org/D43208#inline-382175.

D43208 created a generic "hack" to essentially disable vectorization if the masked memref requires emulation, to "match" the preexisting behavior.
This patch moves it to TTI so that it can evolve toward better cost modeling, one target at a time.

Thanks, @markus, for reminding me.
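A minimal, self-contained C++ sketch of what "one target at a time" could look like. It does not use LLVM: `MemKind`, `DefaultTTIImpl`, and `HypotheticalTargetTTIImpl` are hypothetical stand-ins for `Instruction`, the default implementation in TargetTransformInfoImpl.h, and a target's TTI implementation. Virtual dispatch is used only for brevity. The default mirrors the logic added by this diff; the hypothetical target keeps the hack for loads but returns 0 ("no artificial cost") for stores.

```cpp
#include <cassert>

// Hypothetical stand-in for the kind of memory instruction being costed.
enum class MemKind { Load, Store, Other };

// The artificially high value used by the default implementation in this diff.
constexpr int EmulatedMaskMemRefHackCost = 3000000;

// Stand-in for the default TTI implementation.
struct DefaultTTIImpl {
  virtual ~DefaultTTIImpl() = default;

  // Returns a non-zero (prohibitive) cost, or 0 meaning "no artificial cost".
  virtual int getEmulatedMaskMemRefCost(MemKind Kind, int NumPredStores,
                                        int Threshold) const {
    assert(NumPredStores >= 0 && "store count cannot be negative");
    bool UseHackedCost = Kind == MemKind::Load ||
                         (Kind == MemKind::Store && NumPredStores > Threshold);
    return UseHackedCost ? EmulatedMaskMemRefHackCost : 0;
  }
};

// Hypothetical target that has tuned its cost model for emulated masked
// stores, so it opts out of the hack for them but keeps it for loads.
struct HypotheticalTargetTTIImpl : DefaultTTIImpl {
  int getEmulatedMaskMemRefCost(MemKind Kind, int NumPredStores,
                                int Threshold) const override {
    if (Kind == MemKind::Store)
      return 0; // Defer to the regular scalarization cost.
    return DefaultTTIImpl::getEmulatedMaskMemRefCost(Kind, NumPredStores,
                                                     Threshold);
  }
};
```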


Event Timeline

hsaito created this revision. Mar 8 2019, 12:14 PM

Herald added a project: Restricted Project. Mar 8 2019, 12:14 PM

Herald added subscribers: llvm-commits, jdoerfert, rkruppe, hiraditya.

hsaito marked an inline comment as done. Mar 8 2019, 12:16 PM

hsaito added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5230	Please comment on how much of this documentation you'd like to see moved to TTI.

hsaito marked an inline comment as not done. Mar 8 2019, 12:17 PM

Any improvement is an improvement, so I am happy with that, but the code still says that this solution is a hack, and I guess the "Cost model for emulated masked load/store is completely broken." comment is still valid. What would it take to address this properly?

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
270	This value seems a bit arbitrary. Could it not be e.g. INT_MAX instead if it is supposed to represent something infinitely expensive?

The concept of "hacked" is lost when you move up to TTI. I'd change the logic to reflect that this is making it "prohibitively expensive" instead of "hacked value".

It doesn't need to change much, just use the "override value" pattern:

int Cost = someComputation();
// Target can override mask cost (e.g. when it's prohibitively expensive).
int EmulatedCost = TTI.getEmulatedMaskMemRefCost(...);
if (EmulatedCost)
  Cost = EmulatedCost;

Also, that's perhaps not the right place to add actual costs. Moving up the boolean function as is would make more sense, but the "hacked cost" would still remain.

Can't you move this to the cost model?
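The "override value" pattern from the comment above, as a runnable C++ sketch. `OverrideTTI` and `computeBaseCost` are hypothetical stand-ins for the TTI hook and `someComputation()`; a return value of 0 means the target does not override.

```cpp
#include <cassert>

// Hypothetical stand-in for the TTI hook: 0 means "no override",
// any other value replaces the computed cost.
struct OverrideTTI {
  int EmulatedCost = 0;
  int getEmulatedMaskMemRefCost() const { return EmulatedCost; }
};

// Hypothetical stand-in for someComputation(): one scalar op per lane.
int computeBaseCost(int NumLanes, int PerLaneCost) {
  assert(NumLanes > 0 && PerLaneCost >= 0);
  return NumLanes * PerLaneCost;
}

int memRefCost(const OverrideTTI &TTI, int NumLanes, int PerLaneCost) {
  int Cost = computeBaseCost(NumLanes, PerLaneCost);
  // Target can override mask cost (e.g. when it's prohibitively expensive).
  if (int EmulatedCost = TTI.getEmulatedMaskMemRefCost())
    Cost = EmulatedCost;
  return Cost;
}
```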

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
270	Costs will be added to others, so INT_MAX would wrap.
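To illustrate why INT_MAX is a poor sentinel when costs are summed: signed overflow is undefined behavior in C++, so a sketch has to check before adding. `addWouldOverflow` and `sumCosts` are hypothetical helpers, not LLVM APIs.

```cpp
#include <cassert>
#include <climits>

// True if A + B would exceed INT_MAX. Only non-negative B is handled, which
// is enough for summing costs.
bool addWouldOverflow(int A, int B) {
  assert(B >= 0 && "sketch only handles non-negative addends");
  return A > INT_MAX - B;
}

// Sums N costs into Total; returns false (leaving Total at the last in-range
// partial sum) if the next addition would overflow.
bool sumCosts(const int *Costs, int N, int &Total) {
  Total = 0;
  for (int i = 0; i < N; ++i) {
    if (addWouldOverflow(Total, Costs[i]))
      return false;
    Total += Costs[i];
  }
  return true;
}
```

Two hacked costs of 3000000 plus a few ordinary ones still sum comfortably inside an `int`, whereas a single INT_MAX term leaves no room for anything else.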
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5488	This if doesn't need the Cost above, so you can avoid the division by moving it up to the beginning of the block. Nit: you can just return `HackedCost` instead of assigning, but that will depend on the new pattern.

In D59149#1426070, @markus wrote:

Any improvement is an improvement, so I am happy with that, but the code still says that this solution is a hack, and I guess the "Cost model for emulated masked load/store is completely broken." comment is still valid. What would it take to address this properly?

Each target needs to run many applications/benchmarks to come up with the "right" adjustment to the cost model. There is no way around that.

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
270	I can certainly add a comment saying that this is an arbitrarily chosen "big enough value": it just needs to be sufficiently bigger than a typical instruction cost (single digits or low double digits) to disable most cases of emulated masked memory references.
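A rough worked example of why a value like 3000000 is "big enough". It assumes, as a simplification, that the vectorizer compares the vector loop-body cost divided by VF against the scalar loop-body cost; `looksProfitable` and the cost numbers are illustrative, not taken from any target.

```cpp
#include <cassert>

// Simplified per-lane comparison: vectorize at VF if the vector body cost,
// spread over VF lanes, is cheaper than the scalar body cost.
bool looksProfitable(int ScalarBodyCost, int VectorBodyCost, int VF) {
  assert(VF > 0 && "VF must be positive");
  return static_cast<double>(VectorBodyCost) / VF < ScalarBodyCost;
}
```

Even at VF = 64, the hacked cost alone contributes 3000000 / 64 = 46875 per lane, far above a scalar body built from single- or low-double-digit instruction costs (1000 instructions at cost 20 is only 20000).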
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5488	This is actually a good question. Maybe we want to structure this TTI interface as an adjustment to the base cost computed above, in which case the code would look like `Cost += TTI.getEmulatedMaskMemRefAdjustment(...)`. Does this make more sense? Even then, I still agree that we can return early, like `if (!isPredicatedInst(I)) return Cost; Cost /= getReciprocalPredBlockProb(); return Cost + TTI.getEmulatedMaskMemRefAdjustment(...);`. Does this sound better?
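The restructured shape described in this comment (early return, scale, then add a target adjustment) as a self-contained C++ sketch. `getAdjustment` is a hypothetical stand-in for the proposed `TTI.getEmulatedMaskMemRefAdjustment(...)`, and the reciprocal probability of 2 (a predicated block assumed to run half the time) is an assumption of the sketch.

```cpp
#include <cassert>

// Assumed reciprocal of the probability that a predicated block executes.
constexpr unsigned ReciprocalPredBlockProb = 2;

// Hypothetical adjustment hook: extra cost a target charges for emulating a
// masked memref (0 = no adjustment).
using AdjustmentFn = unsigned (*)(bool IsLoad);

unsigned memInstScalarizationCost(unsigned Cost, bool IsPredicated,
                                  bool IsLoad, AdjustmentFn getAdjustment) {
  assert(getAdjustment && "an adjustment hook is required");
  if (!IsPredicated)
    return Cost;
  Cost /= ReciprocalPredBlockProb;
  return Cost + getAdjustment(IsLoad);
}
```

In this shape the adjustment is added after scaling, so it is not discounted by the block probability.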

hsaito marked 2 inline comments as not done. Mar 13 2019, 12:39 PM

In D59149#1427946, @hsaito wrote:

"Cost model for emulated masked load/store is completely broken." comment is still valid. What would it take to address this properly?

Each target needs to run many applications/benchmarks to come up with the "right" adjustment to the cost model. There is no way around that.

Indeed, not something that can be done overnight.

But if the intention of this patch is to prepare the ground for that work (at least on Intel), then I guess the current strategy needs to start moving from "hacked big-enough values overriding the cost" to "adding up an estimate of the number of cycles, or something similar".

As I said in my comment, this doesn't mean the patch needs to do everything; it's just the difference between (Cost = BigEnough) and (Cost += Adjustment) in the current case.

cheers,
--renato

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
270	This is not the only case, and I think it would be interesting to look at all the other "big enough values" and perhaps create a global const int PROHIBITIVE_COST, set to a "big enough value" that would suit all cases. This is not necessarily for this patch, though. Just an idea.
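One way the suggested shared constant might look, as a hedged C++ sketch. The name `ProhibitiveCost` (LLVM naming style rather than PROHIBITIVE_COST), its value (taken from this diff), and the `isProhibitive` helper are all hypothetical.

```cpp
#include <cassert>
#include <climits>

// Hypothetical shared "big enough" cost; the value matches this diff.
constexpr int ProhibitiveCost = 3000000;

// How many prohibitive costs can be summed before an int would overflow.
constexpr int MaxProhibitiveTerms = INT_MAX / ProhibitiveCost;

// A cost at or above the shared constant is treated as "do not vectorize".
bool isProhibitive(int Cost) {
  assert(Cost >= 0 && "costs are expected to be non-negative");
  return Cost >= ProhibitiveCost;
}
```

With a 32-bit `int`, 715 such costs fit before wrapping, which leaves headroom for adding them to ordinary costs.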
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5488	If the prohibitive cost is large enough, then adding it up will make no difference, and you can return early with either Cost or Adjusted. If the cost isn't meant to be just "big enough" but to (later) actually model the masking costs, then it makes sense to add it as you propose in the comment above. Right now, the case looks like the former, but if you're planning for it to become the latter, then the new proposal makes sense.
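The distinction can be sketched in C++ (function names are hypothetical). With a prohibitive value, both shapes lead to the same decision; with a realistic per-target cost, the override discards the base cost while the adjustment keeps it.

```cpp
#include <cassert>

// "Cost = BigEnough": a non-zero target value replaces the base cost.
int overrideStyle(int BaseCost, int TargetCost) {
  assert(TargetCost >= 0);
  return TargetCost != 0 ? TargetCost : BaseCost;
}

// "Cost += Adjustment": the target value is added on top of the base cost.
int adjustmentStyle(int BaseCost, int TargetCost) {
  assert(TargetCost >= 0);
  return BaseCost + TargetCost;
}

// Hypothetical decision: is the cost still under some budget?
bool underBudget(int Cost, int Budget) { return Cost < Budget; }
```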

I guess what we want to achieve here (and please correct me if I am wrong) is:

- For this first patch to have zero impact on generated code.
- To allow targets to start overriding getEmulatedMaskMemRefCost and return a sensible cost for the single operation being queried (not just 3000000 or 0).
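The first goal (zero impact on generated code) can be checked by comparing the predicate removed from LoopVectorize.cpp with the new default hook over a range of inputs. This is a self-contained C++ sketch; `MemKind` and the free functions are hypothetical stand-ins, with the loop state passed as parameters instead of read from the cost model.

```cpp
#include <cassert>

enum class MemKind { Load, Store };

// Old predicate (useEmulatedMaskMemRefHack, removed by this diff).
bool oldUseHack(MemKind Kind, int NumPredStores, int Threshold) {
  return Kind == MemKind::Load ||
         (Kind == MemKind::Store && NumPredStores > Threshold);
}

// New default hook (TargetTransformInfoImpl.h in this diff).
int newDefaultCost(MemKind Kind, int NumPredStores, int Threshold) {
  bool UseHackedCost = Kind == MemKind::Load ||
                       (Kind == MemKind::Store && NumPredStores > Threshold);
  return UseHackedCost ? 3000000 : 0;
}

// Cost after the predicated-block scaling: old vs. new call-site logic.
int oldFinalCost(int ScaledCost, MemKind Kind, int N, int T) {
  return oldUseHack(Kind, N, T) ? 3000000 : ScaledCost;
}
int newFinalCost(int ScaledCost, MemKind Kind, int N, int T) {
  if (int HackedCost = newDefaultCost(Kind, N, T))
    return HackedCost;
  return ScaledCost;
}

// Exhaustively compares old and new behavior on small inputs.
bool defaultMatchesOldHack(int MaxStores, int MaxThreshold) {
  assert(MaxStores >= 0 && MaxThreshold >= 0);
  const MemKind Kinds[] = {MemKind::Load, MemKind::Store};
  for (MemKind Kind : Kinds)
    for (int N = 0; N <= MaxStores; ++N)
      for (int T = 0; T <= MaxThreshold; ++T) {
        bool SameDecision =
            oldUseHack(Kind, N, T) == (newDefaultCost(Kind, N, T) != 0);
        bool SameCost =
            oldFinalCost(7, Kind, N, T) == newFinalCost(7, Kind, N, T);
        if (!SameDecision || !SameCost)
          return false;
      }
  return true;
}
```

The removed helper also asserted `isPredicatedInst(I)`; the sketch does not model that, and it only covers the default implementation, not targets that override the hook.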

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5251	Would this still work if we allow `getEmulatedMaskMemRefCost` to return an actual cost and not just `3000000` or `0`?
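A small C++ sketch of the concern: after this diff, `collectInstsToScalarize()` applies the discount logic only when the hook returns exactly 0, so any realistic non-zero cost would also switch it off. The threshold-based variant is a hypothetical alternative, not something proposed in the patch, and passing a precomputed `PredInstDiscount` simplifies away the fact that `computePredInstDiscount()` is only called when the first check passes.

```cpp
#include <cassert>

// Mirrors the condition in collectInstsToScalarize() after this diff.
bool discountApplies(int EmulatedCost, int PredInstDiscount) {
  return EmulatedCost == 0 && PredInstDiscount >= 0;
}

// Hypothetical alternative: skip the discount only for prohibitive costs.
bool discountAppliesBelow(int EmulatedCost, int PredInstDiscount,
                          int ProhibitiveCost) {
  assert(ProhibitiveCost > 0 && "threshold must be positive");
  return EmulatedCost < ProhibitiveCost && PredInstDiscount >= 0;
}
```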
5488	I agree that this should be changed to Cost += TTI.getEmulatedMaskMemRefAdjustment(...) (or equivalent with the early return).

rengolin added inline comments. Mar 19 2019, 2:30 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5488	IFF the adjustment has any meaning other than "too large a number", I agree, too.

hsaito added inline comments. Mar 19 2019, 10:06 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5488	The reason I changed this to the current style is that whoever starts working on the real adjustment won't have to change the structure here. If you think this suggested "stylistic" change requires at least one target to make a "reasonable" adjustment (or to commit to starting such a study), I could do that for the AVX2 target: come up with a smaller number that doesn't degrade some (undisclosed) set of benchmarks that I can run quickly enough. Does this sound reasonable?

markus added inline comments. Mar 20 2019, 1:54 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5488	That sounds excellent to me!

rengolin added inline comments. Mar 20 2019, 2:07 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5488	+1

hsaito added inline comments. Mar 22 2019, 2:25 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5488	Started working on it. I need to run a lot of perf experiments, so it will take a bit of time. Just FYI. In the meantime, I'll update the patch in this direction.

Revision Contents

Path | Size
llvm/include/llvm/Analysis/TargetTransformInfo.h | 11 lines
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h | 7 lines
llvm/lib/Analysis/TargetTransformInfo.cpp | 6 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | 27 lines

Diff 189902

llvm/include/llvm/Analysis/TargetTransformInfo.h

@@ … @@ public:
   bool isLegalMaskedLoad(Type *DataType) const;

   /// Return true if the target supports masked gather/scatter
   /// AVX-512 fully supports gather and scatter for vectors with 32 and 64
   /// bits scalar type.
   bool isLegalMaskedScatter(Type *DataType) const;
   bool isLegalMaskedGather(Type *DataType) const;

+  /// If non-zero, returns the (artificially high) cost for emulated masked
+  /// memrefs.
+  int getEmulatedMaskMemRefCost(Instruction *Inst, int NumPredStores,
+                                int Threshold) const;
+
   /// Return true if the target has a unified operation to calculate division
   /// and remainder. If so, the additional implicit multiplication and
   /// subtraction required to calculate a remainder from division are free. This
   /// can enable more aggressive transformations for division and remainder than
   /// would typically be allowed using throughput or size cost models.
   bool hasDivRemOp(Type *DataType, bool IsSigned) const;

   /// Return true if the given instruction (assumed to be a memory access
@@ … @@ virtual bool isLSRCostLess(TargetTransformInfo::LSRCost &C1,
                              TargetTransformInfo::LSRCost &C2) = 0;
   virtual bool canMacroFuseCmp() = 0;
   virtual bool shouldFavorPostInc() const = 0;
   virtual bool shouldFavorBackedgeIndex(const Loop *L) const = 0;
   virtual bool isLegalMaskedStore(Type *DataType) = 0;
   virtual bool isLegalMaskedLoad(Type *DataType) = 0;
   virtual bool isLegalMaskedScatter(Type *DataType) = 0;
   virtual bool isLegalMaskedGather(Type *DataType) = 0;
+  virtual int getEmulatedMaskMemRefCost(Instruction *Inst, int NumPredStores,
+                                        int Threshold) = 0;
   virtual bool hasDivRemOp(Type *DataType, bool IsSigned) = 0;
   virtual bool hasVolatileVariant(Instruction *I, unsigned AddrSpace) = 0;
   virtual bool prefersVectorizedAddressing() = 0;
   virtual int getScalingFactorCost(Type *Ty, GlobalValue *BaseGV,
                                    int64_t BaseOffset, bool HasBaseReg,
                                    int64_t Scale, unsigned AddrSpace) = 0;
   virtual bool LSRWithInstrQueries() = 0;
   virtual bool isTruncateFree(Type *Ty1, Type *Ty2) = 0;
@@ … @@ bool isLegalMaskedLoad(Type *DataType) override {
     return Impl.isLegalMaskedLoad(DataType);
   }
   bool isLegalMaskedScatter(Type *DataType) override {
     return Impl.isLegalMaskedScatter(DataType);
   }
   bool isLegalMaskedGather(Type *DataType) override {
     return Impl.isLegalMaskedGather(DataType);
   }
+  int getEmulatedMaskMemRefCost(Instruction *Inst, int NumPredStores,
+                                int Threshold) override {
+    return Impl.getEmulatedMaskMemRefCost(Inst, NumPredStores, Threshold);
+  }
   bool hasDivRemOp(Type *DataType, bool IsSigned) override {
     return Impl.hasDivRemOp(DataType, IsSigned);
   }
   bool hasVolatileVariant(Instruction *I, unsigned AddrSpace) override {
     return Impl.hasVolatileVariant(I, AddrSpace);
   }
   bool prefersVectorizedAddressing() override {
     return Impl.prefersVectorizedAddressing();

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

@@ … @@ public:
   bool isLegalMaskedStore(Type *DataType) { return false; }

   bool isLegalMaskedLoad(Type *DataType) { return false; }

   bool isLegalMaskedScatter(Type *DataType) { return false; }

   bool isLegalMaskedGather(Type *DataType) { return false; }

+  int getEmulatedMaskMemRefCost(Instruction *Inst, int NumPredStores,
+                                int Threshold) {
+    bool useHackedCost = isa<LoadInst>(Inst) ||
+                         (isa<StoreInst>(Inst) && NumPredStores > Threshold);
+    return useHackedCost ? 3000000 : 0;
+  }
+
   bool hasDivRemOp(Type *DataType, bool IsSigned) { return false; }

   bool hasVolatileVariant(Instruction *I, unsigned AddrSpace) { return false; }

   bool prefersVectorizedAddressing() { return true; }

   int getScalingFactorCost(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset,
                            bool HasBaseReg, int64_t Scale, unsigned AddrSpace) {

llvm/lib/Analysis/TargetTransformInfo.cpp

@@ … @@
 bool TargetTransformInfo::isLegalMaskedGather(Type *DataType) const {
   return TTIImpl->isLegalMaskedGather(DataType);
 }

 bool TargetTransformInfo::isLegalMaskedScatter(Type *DataType) const {
   return TTIImpl->isLegalMaskedScatter(DataType);
 }

+int TargetTransformInfo::getEmulatedMaskMemRefCost(Instruction *Inst,
+                                                   int NumPredStores,
+                                                   int Threshold) const {
+  return TTIImpl->getEmulatedMaskMemRefCost(Inst, NumPredStores, Threshold);
+}
+
 bool TargetTransformInfo::hasDivRemOp(Type *DataType, bool IsSigned) const {
   return TTIImpl->hasDivRemOp(DataType, IsSigned);
 }

 bool TargetTransformInfo::hasVolatileVariant(Instruction *I,
                                              unsigned AddrSpace) const {
   return TTIImpl->hasVolatileVariant(I, AddrSpace);
 }

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ … @@ private:
   /// Store: scalar store + (loop invariant value stored? 0 : extract of last
   /// element)
   unsigned getUniformMemOpCost(Instruction *I, unsigned VF);

   /// Returns whether the instruction is a load or store and will be a emitted
   /// as a vector operation.
   bool isConsecutiveLoadOrStore(Instruction *I);

-  /// Returns true if an artificially high cost for emulated masked memrefs
-  /// should be used.
-  bool useEmulatedMaskMemRefHack(Instruction *I);
-
   /// Create an analysis remark that explains why vectorization failed
   ///
   /// \p RemarkName is the identifier for the remark. \return the remark object
   /// that can be streamed to.
   OptimizationRemarkAnalysis createMissedAnalysis(StringRef RemarkName) {
     return createLVMissedAnalysis(Hints->vectorizeAnalysisPassName(),
                                   RemarkName, TheLoop);
   }
@@ … @@ for (unsigned i = 0, e = VFs.size(); i < e; ++i) {
     RU.LoopInvariantRegs = Invariant;
     RU.MaxLocalUsers = MaxUsages[i];
     RUs[i] = RU;
   }

   return RUs;
 }

-bool LoopVectorizationCostModel::useEmulatedMaskMemRefHack(Instruction *I){
-  // TODO: Cost model for emulated masked load/store is completely
-  // broken. This hack guides the cost model to use an artificially
-  // high enough value to practically disable vectorization with such
-  // operations, except where previously deployed legality hack allowed
-  // using very low cost values. This is to avoid regressions coming simply
-  // from moving "masked load/store" check from legality to cost model.
-  // Masked Load/Gather emulation was previously never allowed.
-  // Limited number of Masked Store/Scatter emulation was allowed.
-  assert(isPredicatedInst(I) && "Expecting a scalar emulated instruction");
-  return isa<LoadInst>(I) ||
-         (isa<StoreInst>(I) &&
-          NumPredStores > NumberOfStoresToPredicate);
-}
-
 void LoopVectorizationCostModel::collectInstsToScalarize(unsigned VF) {
   // If we aren't vectorizing the loop, or if we've already collected the
   // instructions to scalarize, there's nothing to do. Collection may already
   // have occurred if we have a user-selected VF and are now computing the
   // expected cost for interleaving.
   if (VF < 2 || InstsToScalarize.find(VF) != InstsToScalarize.end())
     return;

   // Initialize a mapping for VF in InstsToScalalarize. If we find that it's
   // not profitable to scalarize any instructions, the presence of VF in the
   // map will indicate that we've analyzed it already.
   ScalarCostsTy &ScalarCostsVF = InstsToScalarize[VF];

   // Find all the instructions that are scalar with predication in the loop and
   // determine if it would be better to not if-convert the blocks they are in.
   // If so, we also record the instructions to scalarize.
   for (BasicBlock *BB : TheLoop->blocks()) {
     if (!blockNeedsPredication(BB))
       continue;
     for (Instruction &I : *BB)
       if (isScalarWithPredication(&I)) {
         ScalarCostsTy ScalarCosts;
         // Do not apply discount logic if hacked cost is needed
         // for emulated masked memrefs.
-        if (!useEmulatedMaskMemRefHack(&I) &&
+        if (TTI.getEmulatedMaskMemRefCost(&I, NumPredStores,
+                                          NumberOfStoresToPredicate) == 0 &&
             computePredInstDiscount(&I, ScalarCosts, VF) >= 0)
           ScalarCostsVF.insert(ScalarCosts.begin(), ScalarCosts.end());
         // Remember that BB will remain after vectorization.
         PredicatedBBsAfterVectorization.insert(BB);
       }
   }
 }

@@ … @@ unsigned LoopVectorizationCostModel::getMemInstScalarizationCost(Instruction *I,
   Cost += getScalarizationOverhead(I, VF, TTI);

   // If we have a predicated store, it may not be executed for each vector
   // lane. Scale the cost by the probability of executing the predicated
   // block.
   if (isPredicatedInst(I)) {
     Cost /= getReciprocalPredBlockProb();

-    if (useEmulatedMaskMemRefHack(I))
+    if (int HackedCost = TTI.getEmulatedMaskMemRefCost(
+            I, NumPredStores, NumberOfStoresToPredicate))
       // Artificially setting to a high enough value to practically disable
       // vectorization with such operations.
-      Cost = 3000000;
+      Cost = HackedCost;
   }

   return Cost;
 }

 unsigned LoopVectorizationCostModel::getConsecutiveMemOpCost(Instruction *I,
                                                              unsigned VF) {
   Type *ValTy = getMemInstValueType(I);