This is an archive of the discontinued LLVM Phabricator instance.


Differential D30527

Replacing float with new class Fraction for LSR alternative way of resolving complex solution
Needs Review · Public

Authored by evstupac on Mar 2 2017, 1:19 AM.


Details

Reviewers

qcolombet
gottesmm
scanon

Summary

With floating point we can potentially get different results on different CPUs. This patch helps to avoid that.
This change is a follow-up to https://reviews.llvm.org/D29862.
It should be NFC (at least on all CPUs I have in my pool).

Diff Detail

Repository: rL LLVM

Event Timeline

evstupac created this revision. Mar 2 2017, 1:19 AM

Herald added a subscriber: mzolotukhin. Mar 2 2017, 1:19 AM

qcolombet added inline comments. Mar 7 2017, 4:00 PM

lib/Transforms/Scalar/LoopStrengthReduce.cpp
1113	Could you add a comment explaining how the optimization works? The reduction does not look right to me, but my math classes are way behind me :P
1126	Add an assert D != 0
1131	I am confused about the meaning of this operator. For divide, I would have expected something like: return operator*(Fraction(Divider.Denominator, Divider.Numerator));

evstupac added inline comments. Mar 7 2017, 4:48 PM

lib/Transforms/Scalar/LoopStrengthReduce.cpp
1113	I suppose Numerator and Denominator will fit in 31 bits in most cases. However, if one of them exceeds 31 bits, operator+ could overflow. To avoid this we need to optimize (reduce) the fraction. Converting N/D to (N/2)/(D/2) will potentially lose precision, but it is much faster than searching for common divisors. The precision we lose is not higher than ABS(N/D - (N/2)/(D/2)), which is not higher than max(D/2,N/2)/(D*D/2). Since we are working with probabilities, N <= D, so the loss should be less than 1/D at each step. That means the precision loss is less than 1/(2^31), as D > 2^31, which is acceptable. I'll add the comment to the sources.
1126	ok.
1131	Well, this is specific to our case: the probability of not selecting (PoNS) reg R. The overall PoNS is the product of the PoNS for each use: PAll = P1 * P2 * ... * PK (this is computed once). Suppose we want the PoNS for a particular register use, Use2. It will be PAll / P2. The numerator of the PAll fraction has P2's numerator as a factor, and the same holds for the denominator. That way we don't need the optimization.
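
For readers following the exchange above: the Fraction class from this revision is not shown in the archive, so the following is only a minimal sketch of the two ideas being discussed, not the patch's code. It assumes a hypothetical `Fraction` with 64-bit parts, a hypothetical `shrink()` helper that halves both parts until they fit in 31 bits (the lossy reduction evstupac describes), and an `operator/` written as multiplication by the reciprocal (the form qcolombet says he would have expected).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch; names and layout are assumptions, not the patch's code.
struct Fraction {
  uint64_t Numerator;
  uint64_t Denominator;

  Fraction(uint64_t N, uint64_t D) : Numerator(N), Denominator(D) {
    assert(D != 0 && "Denominator must be non-zero");
    shrink();
  }

  // Halve both parts until each fits in 31 bits, so the products formed in
  // operator+ and operator* stay below 2^63. This drops low bits instead of
  // searching for a common divisor.
  void shrink() {
    const uint64_t Limit = uint64_t(1) << 31;
    while (Numerator >= Limit || Denominator >= Limit) {
      Numerator >>= 1;
      Denominator >>= 1;
    }
    assert(Denominator != 0 && "Value too large for this sketch");
  }

  Fraction operator+(const Fraction &RHS) const {
    return Fraction(Numerator * RHS.Denominator + RHS.Numerator * Denominator,
                    Denominator * RHS.Denominator);
  }

  Fraction operator*(const Fraction &RHS) const {
    return Fraction(Numerator * RHS.Numerator, Denominator * RHS.Denominator);
  }

  // Division as multiplication by the reciprocal.
  Fraction operator/(const Fraction &RHS) const {
    assert(RHS.Numerator != 0 && "Division by zero");
    return *this * Fraction(RHS.Denominator, RHS.Numerator);
  }

  // Compare by cross-multiplication; safe because both parts are < 2^31.
  bool operator==(const Fraction &RHS) const {
    return Numerator * RHS.Denominator == RHS.Numerator * Denominator;
  }
};
```

With this sketch, `(Fraction(1, 2) * Fraction(1, 3)) / Fraction(1, 3)` compares equal to `Fraction(1, 2)`, which is the PAll / P2 pattern from the reply above. Whether the reciprocal form or the patch's specialized operator is preferable is exactly the open question in the thread.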

What's the rationale for using rationals here instead of a (simpler) fixed-point representation, if we want to get rid of float?

What's the rationale for using rationals here instead of a (simpler) fixed-point representation, if we want to get rid of float?

No specific reason. Actually it is a kind of. I've looked into the LLVM support classes and have not found an appropriate one.

PING.
Should I rewrite this using fixed point?
Or is the newly implemented Fraction class acceptable?

PING.

Please follow @scanon's recommendation.

Add fixed point instead of fraction.

Drive by comment: how about putting the FixedPoint64 in ADT and adding one or two unit tests?

In D30527#760563, @sanjoy wrote:

Drive by comment: how about putting the FixedPoint64 in ADT and adding one or two unit tests?

The class supports only unsigned values and misses some general operators. However I'm ok with adding it to ADT (and maybe supporting more later).
As for tests, this change should not change anything. The behavior should be the same. (Floats are replaced with fixed point to avoid potentially different results on different CPUs.)
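
For context, a minimal sketch of what an unsigned fixed-point type for probabilities could look like. This is not the FixedPoint64 class from this revision (its source is not part of the archive); the name `UFixed31`, the choice of 31 fractional bits, and the range restrictions enforced by the asserts are all assumptions made to keep every intermediate product inside 64 bits without a wider integer type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical unsigned fixed-point value with 31 fractional bits.
// The represented value is Raw / 2^31.
class UFixed31 {
  static constexpr unsigned FracBits = 31;
  static constexpr uint64_t One = uint64_t(1) << FracBits;
  uint64_t Raw = 0;
  explicit UFixed31(uint64_t R) : Raw(R) {}

public:
  UFixed31() = default;

  // Build N/D for a probability, i.e. 0 <= N/D <= 1. Truncates toward zero.
  static UFixed31 fromRatio(uint64_t N, uint64_t D) {
    assert(D != 0 && N <= D && D <= UINT32_MAX && "Expected 0 <= N/D <= 1");
    return UFixed31((N << FracBits) / D);
  }

  uint64_t raw() const { return Raw; }

  UFixed31 operator+(UFixed31 RHS) const { return UFixed31(Raw + RHS.Raw); }

  // Both operands must be <= 1 so that Raw * RHS.Raw fits in 64 bits.
  UFixed31 operator*(UFixed31 RHS) const {
    assert(Raw <= One && RHS.Raw <= One && "Operands must be <= 1");
    return UFixed31((Raw * RHS.Raw) >> FracBits);
  }

  // Only supports quotients <= 1 with a divisor <= 1 (e.g. PAll / P2).
  UFixed31 operator/(UFixed31 RHS) const {
    assert(RHS.Raw != 0 && Raw <= RHS.Raw && RHS.Raw <= One);
    return UFixed31((Raw << FracBits) / RHS.Raw);
  }

  bool operator==(UFixed31 RHS) const { return Raw == RHS.Raw; }
  bool operator<(UFixed31 RHS) const { return Raw < RHS.Raw; }
};
```

Because every operation is plain integer arithmetic with truncation, the results are bit-identical on any host, which is the property the thread is after. A type like this is also easy to unit-test in isolation, for example by checking that `fromRatio(1, 2) * fromRatio(1, 2)` equals `fromRatio(1, 4)`.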

In D30527#762673, @evstupac wrote:

As for tests, this change should not change anything. The behavior should be the same. (Floats are replaced with fixed point to avoid potentially different results on different CPUs.)

I think that's contradictory -- you're saying that this change must be NFC; but it will avoid potential different results on different CPUs?

In any case, by "test" I meant testing the FixedPoint64 class itself.

I think that's contradictory -- you're saying that this change must be NFC; but it will avoid potential different results on different CPUs?

I don't have a case in mind. However, I agreed with Quentin that floats "can introduce subtle difference from one target to other". That comes from D29862.

In any case, by "test" I meant testing the FixedPoint64 class itself.

Ok. I'll add such.

I like @sanjoy's suggestion.
@scanon Could you do the review? I would prefer if an expert can look at it.
Thanks

Use APFloat instead of new class.

@gottesmm can you take a look at this? You're more familiar with the APFloat API than I am.

evstupac added reviewers: gottesmm, scanon. May 2 2018, 12:11 PM

PING.

PING

PING.

I think it might help if the motivational part would be specified in the differential's description.

Why is this wanted?
With what does this help?
Does this change affect anything?
Can the change be tested?
etc

In D30527#1125545, @lebedev.ri wrote:

I think it might help if the motivational part would be specified in the differential's description.

Why is this wanted?

It was requested by Quentin in https://reviews.llvm.org/D29862

With what does this help?

It potentially helps to avoid different results on different CPUs due to use of floating point arithmetic.

Does this change affect anything?

No. It is NFC on all CPUs I have in my pool.

Can the change be tested?

I don't see how.

etc

evstupac edited the summary of this revision. Jun 7 2018, 2:05 PM

PING

This patch looks good to me.
I think we can rebase and land this?

Herald added a project: Restricted Project. May 16 2022, 1:43 AM

Revision Contents

Path

Size

lib/

Transforms/

Scalar/

LoopStrengthReduce.cpp

51 lines

Diff 143853

lib/Transforms/Scalar/LoopStrengthReduce.cpp

Show First 20 Lines • Show All 1,104 Lines • ▼ Show 20 Lines	static bool isEqual(const SmallVector<const SCEV *, 4> &LHS,
return LHS == RHS;		return LHS == RHS;
}		}
};		};

/// This class holds the state that LSR keeps for each use in IVUsers, as well		/// This class holds the state that LSR keeps for each use in IVUsers, as well
/// as uses invented by LSR itself. It includes information about what kinds of		/// as uses invented by LSR itself. It includes information about what kinds of
/// things can be folded into the user, information about the user itself, and		/// things can be folded into the user, information about the user itself, and
/// information about how the use may be satisfied. TODO: Represent multiple		/// information about how the use may be satisfied. TODO: Represent multiple
/// users of the same expression in common?		/// users of the same expression in common?
class LSRUse {		class LSRUse {
DenseSet<SmallVector<const SCEV *, 4>, UniquifierDenseMapInfo> Uniquifier;		DenseSet<SmallVector<const SCEV *, 4>, UniquifierDenseMapInfo> Uniquifier;

public:		public:
/// An enum for a kind of use, indicating what types of scaled and immediate		/// An enum for a kind of use, indicating what types of scaled and immediate
/// operands it might support.		/// operands it might support.
enum KindType {		enum KindType {
Basic, ///< A normal use, with no folding.		Basic, ///< A normal use, with no folding.
Special, ///< A special case of basic, allowing -1 scales.		Special, ///< A special case of basic, allowing -1 scales.
Address, ///< An address use; folding according to TargetLowering		Address, ///< An address use; folding according to TargetLowering
ICmpZero ///< An equality icmp with both operands folded into one.		ICmpZero ///< An equality icmp with both operands folded into one.
// TODO: Add a generic icmp too?		// TODO: Add a generic icmp too?
};		};

using SCEVUseKindPair = PointerIntPair<const SCEV *, 2, KindType>;		using SCEVUseKindPair = PointerIntPair<const SCEV *, 2, KindType>;

KindType Kind;		KindType Kind;
MemAccessTy AccessTy;		MemAccessTy AccessTy;

/// The list of operands which are to be replaced.		/// The list of operands which are to be replaced.
SmallVector<LSRFixup, 8> Fixups;		SmallVector<LSRFixup, 8> Fixups;

/// Keep track of the min and max offsets of the fixups.		/// Keep track of the min and max offsets of the fixups.
int64_t MinOffset = std::numeric_limits<int64_t>::max();		int64_t MinOffset = std::numeric_limits<int64_t>::max();
int64_t MaxOffset = std::numeric_limits<int64_t>::min();		int64_t MaxOffset = std::numeric_limits<int64_t>::min();

Show All 33 Lines	void pushFixup(LSRFixup &f) {
Fixups.push_back(f);		Fixups.push_back(f);
if (f.Offset > MaxOffset)		if (f.Offset > MaxOffset)
MaxOffset = f.Offset;		MaxOffset = f.Offset;
if (f.Offset < MinOffset)		if (f.Offset < MinOffset)
MinOffset = f.Offset;		MinOffset = f.Offset;
}		}

bool HasFormulaWithSameRegs(const Formula &F) const;		bool HasFormulaWithSameRegs(const Formula &F) const;
float getNotSelectedProbability(const SCEV *Reg) const;		APFloat getNotSelectedProbability(const SCEV *Reg) const;
bool InsertFormula(const Formula &F, const Loop &L);		bool InsertFormula(const Formula &F, const Loop &L);
void DeleteFormula(Formula &F);		void DeleteFormula(Formula &F);
void RecomputeRegs(size_t LUIdx, RegUseTracker &Reguses);		void RecomputeRegs(size_t LUIdx, RegUseTracker &Reguses);

void print(raw_ostream &OS) const;		void print(raw_ostream &OS) const;
void dump() const;		void dump() const;
};		};

▲ Show 20 Lines • Show All 289 Lines • ▼ Show 20 Lines	bool LSRUse::HasFormulaWithSameRegs(const Formula &F) const {
SmallVector<const SCEV *, 4> Key = F.BaseRegs;		SmallVector<const SCEV *, 4> Key = F.BaseRegs;
if (F.ScaledReg) Key.push_back(F.ScaledReg);		if (F.ScaledReg) Key.push_back(F.ScaledReg);
// Unstable sort by host order ok, because this is only used for uniquifying.		// Unstable sort by host order ok, because this is only used for uniquifying.
llvm::sort(Key.begin(), Key.end());		llvm::sort(Key.begin(), Key.end());
return Uniquifier.count(Key);		return Uniquifier.count(Key);
}		}

/// The function returns a probability of selecting formula without Reg.		/// The function returns a probability of selecting formula without Reg.
float LSRUse::getNotSelectedProbability(const SCEV *Reg) const {		APFloat LSRUse::getNotSelectedProbability(const SCEV *Reg) const {
unsigned FNum = 0;		unsigned FNum = 0;
for (const Formula &F : Formulae)		for (const Formula &F : Formulae)
if (F.referencesReg(Reg))		if (F.referencesReg(Reg))
FNum++;		FNum++;
return ((float)(Formulae.size() - FNum)) / Formulae.size();		APFloat Ret(0.0), Div(0.0);
		Ret.convertFromAPInt(APInt(64, Formulae.size() - FNum), false,
		APFloat::rmNearestTiesToEven);
		Div.convertFromAPInt(APInt(64, Formulae.size()), false,
		APFloat::rmNearestTiesToEven);
		return Ret / Div;
}		}

/// If the given formula has not yet been inserted, add it to the list, and		/// If the given formula has not yet been inserted, add it to the list, and
/// return true. Return false otherwise. The formula must be in canonical form.		/// return true. Return false otherwise. The formula must be in canonical form.
bool LSRUse::InsertFormula(const Formula &F, const Loop &L) {		bool LSRUse::InsertFormula(const Formula &F, const Loop &L) {
assert(F.isCanonical(L) && "Invalid canonical representation");		assert(F.isCanonical(L) && "Invalid canonical representation");

if (!Formulae.empty() && RigidFormula)		if (!Formulae.empty() && RigidFormula)
▲ Show 20 Lines • Show All 3,078 Lines • ▼ Show 20 Lines	void LSRInstance::NarrowSearchSpaceByDeletingCostlyFormulas() {

// Set of Regs wich will be 100% used in final solution.		// Set of Regs wich will be 100% used in final solution.
// Used in each formula of a solution (in example above this is reg(c)).		// Used in each formula of a solution (in example above this is reg(c)).
// We can skip them in calculations.		// We can skip them in calculations.
SmallPtrSet<const SCEV *, 4> UniqRegs;		SmallPtrSet<const SCEV *, 4> UniqRegs;
DEBUG(dbgs() << "The search space is too complex.\n");		DEBUG(dbgs() << "The search space is too complex.\n");

// Map each register to probability of not selecting		// Map each register to probability of not selecting
DenseMap <const SCEV *, float> RegNumMap;		DenseMap <const SCEV *, APFloat> RegNumMap;
for (const SCEV *Reg : RegUses) {		for (const SCEV *Reg : RegUses) {
if (UniqRegs.count(Reg))		if (UniqRegs.count(Reg))
continue;		continue;
float PNotSel = 1;		APFloat PNotSel(1.0);
for (const LSRUse &LU : Uses) {		for (const LSRUse &LU : Uses) {
if (!LU.Regs.count(Reg))		if (!LU.Regs.count(Reg))
continue;		continue;
float P = LU.getNotSelectedProbability(Reg);		APFloat P = LU.getNotSelectedProbability(Reg);
if (P != 0.0)		if (P.isNonZero())
PNotSel *= P;		PNotSel = PNotSel * P;
else		else
UniqRegs.insert(Reg);		UniqRegs.insert(Reg);
}		}
RegNumMap.insert(std::make_pair(Reg, PNotSel));		RegNumMap.insert(std::make_pair(Reg, PNotSel));
}		}

DEBUG(dbgs() << "Narrowing the search space by deleting costly formulas\n");		DEBUG(dbgs() << "Narrowing the search space by deleting costly formulas\n");

// Delete formulas where registers number expectation is high.		// Delete formulas where registers number expectation is high.
for (size_t LUIdx = 0, NumUses = Uses.size(); LUIdx != NumUses; ++LUIdx) {		for (size_t LUIdx = 0, NumUses = Uses.size(); LUIdx != NumUses; ++LUIdx) {
LSRUse &LU = Uses[LUIdx];		LSRUse &LU = Uses[LUIdx];
// If nothing to delete - continue.		// If nothing to delete - continue.
if (LU.Formulae.size() < 2)		if (LU.Formulae.size() < 2)
continue;		continue;
// This is temporary solution to test performance. Float should be		// This is temporary solution to test performance. Float should be
// replaced with round independent type (based on integers) to avoid		// replaced with round independent type (based on integers) to avoid
// different results for different target builds.		// different results for different target builds.
float FMinRegNum = LU.Formulae[0].getNumRegs();		APFloat FMinRegNum(0.0), FMinARegNum(0.0);
float FMinARegNum = LU.Formulae[0].getNumRegs();		FMinRegNum.convertFromAPInt(APInt(64, LU.Formulae[0].getNumRegs()), false,
		APFloatBase::rmNearestTiesToEven);
		FMinARegNum = FMinRegNum;
size_t MinIdx = 0;		size_t MinIdx = 0;
for (size_t i = 0, e = LU.Formulae.size(); i != e; ++i) {		for (size_t i = 0, e = LU.Formulae.size(); i != e; ++i) {
Formula &F = LU.Formulae[i];		Formula &F = LU.Formulae[i];
float FRegNum = 0;		APFloat FRegNum(0.0);
float FARegNum = 0;		APFloat FARegNum(0.0);
for (const SCEV *BaseReg : F.BaseRegs) {		for (const SCEV *BaseReg : F.BaseRegs) {
if (UniqRegs.count(BaseReg))		if (UniqRegs.count(BaseReg))
continue;		continue;
FRegNum += RegNumMap[BaseReg] / LU.getNotSelectedProbability(BaseReg);		FRegNum = FRegNum + (RegNumMap.find(BaseReg)->second /
		LU.getNotSelectedProbability(BaseReg));
if (isa<SCEVAddRecExpr>(BaseReg))		if (isa<SCEVAddRecExpr>(BaseReg))
FARegNum +=		FARegNum = FARegNum + (RegNumMap.find(BaseReg)->second /
RegNumMap[BaseReg] / LU.getNotSelectedProbability(BaseReg);		LU.getNotSelectedProbability(BaseReg));
}		}
if (const SCEV *ScaledReg = F.ScaledReg) {		if (const SCEV *ScaledReg = F.ScaledReg) {
if (!UniqRegs.count(ScaledReg)) {		if (!UniqRegs.count(ScaledReg)) {
FRegNum +=		FRegNum = FRegNum + (RegNumMap.find(ScaledReg)->second /
RegNumMap[ScaledReg] / LU.getNotSelectedProbability(ScaledReg);		LU.getNotSelectedProbability(ScaledReg));
if (isa<SCEVAddRecExpr>(ScaledReg))		if (isa<SCEVAddRecExpr>(ScaledReg))
FARegNum +=		FARegNum = FARegNum + (RegNumMap.find(ScaledReg)->second /
RegNumMap[ScaledReg] / LU.getNotSelectedProbability(ScaledReg);		LU.getNotSelectedProbability(ScaledReg));
}		}
}		}
if (FMinRegNum > FRegNum \|\|		if (FMinRegNum.compare(FRegNum) == APFloat::cmpGreaterThan \|\|
(FMinRegNum == FRegNum && FMinARegNum > FARegNum)) {		(FMinRegNum.compare(FRegNum) == APFloat::cmpEqual &&
		FMinARegNum.compare(FARegNum) == APFloat::cmpGreaterThan)) {
FMinRegNum = FRegNum;		FMinRegNum = FRegNum;
FMinARegNum = FARegNum;		FMinARegNum = FARegNum;
MinIdx = i;		MinIdx = i;
}		}
}		}
DEBUG(dbgs() << " The formula "; LU.Formulae[MinIdx].print(dbgs());		DEBUG(dbgs() << " The formula "; LU.Formulae[MinIdx].print(dbgs());
dbgs() << " with min reg num " << FMinRegNum << '\n');		dbgs() << " with min reg num " << FMinRegNum << '\n');
if (MinIdx != 0)		if (MinIdx != 0)
▲ Show 20 Lines • Show All 929 Lines • Show Last 20 Lines
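
To make the arithmetic in the diff above easier to follow, here is a standalone sketch of the heuristic as it reads from the shown code. It is a simplified model and not LLVM code: registers are plain strings, a formula is just the set of registers it references, `double` stands in for the float/APFloat values, the tie-breaker on SCEVAddRecExpr registers is left out, and the minimum is seeded from the first formula's computed value rather than from `getNumRegs()`. The names `FormulaRegs`, `UseFormulae`, `notSelectedProbability`, and `pickCheapestFormula` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins: a formula is the set of registers it references,
// and a use is the list of its candidate formulae.
using FormulaRegs = std::set<std::string>;
using UseFormulae = std::vector<FormulaRegs>;

// Share of the use's formulae that do not reference Reg.
double notSelectedProbability(const UseFormulae &U, const std::string &Reg) {
  assert(!U.empty() && "A use needs at least one formula");
  std::size_t FNum = 0;
  for (const FormulaRegs &F : U)
    if (F.count(Reg))
      ++FNum;
  return double(U.size() - FNum) / double(U.size());
}

// Index of the formula of Uses[UseIdx] with the lowest expected number of
// registers that no other use is likely to select anyway.
std::size_t pickCheapestFormula(const std::vector<UseFormulae> &Uses,
                                std::size_t UseIdx) {
  // Map each register to the probability that no use selects it.
  std::map<std::string, double> PNotSel;
  std::set<std::string> UniqRegs; // Registers some use references always.
  for (const UseFormulae &U : Uses)
    for (const FormulaRegs &F : U)
      for (const std::string &Reg : F)
        PNotSel.emplace(Reg, 1.0);
  for (auto &Entry : PNotSel)
    for (const UseFormulae &U : Uses) {
      double P = notSelectedProbability(U, Entry.first);
      if (P != 0.0)
        Entry.second *= P;
      else
        UniqRegs.insert(Entry.first);
    }

  const UseFormulae &U = Uses[UseIdx];
  std::size_t MinIdx = 0;
  double MinRegNum = 0.0;
  for (std::size_t I = 0; I != U.size(); ++I) {
    double RegNum = 0.0;
    for (const std::string &Reg : U[I])
      if (!UniqRegs.count(Reg))
        // Dividing out this use's own factor leaves the probability that
        // none of the other uses selects Reg.
        RegNum += PNotSel[Reg] / notSelectedProbability(U, Reg);
    if (I == 0 || MinRegNum > RegNum) {
      MinRegNum = RegNum;
      MinIdx = I;
    }
  }
  return MinIdx;
}
```

For two uses with formulae `{b}, {a}` and `{a}, {c}`, the shared register `a` scores 0.25 / 0.5 = 0.5 while `b` and `c` score 1.0, so the sketch picks the formula using `a` for both uses. The comparisons on these accumulated values are where the choice of float, Fraction, fixed point, or APFloat matters.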
