This is an archive of the discontinued LLVM Phabricator instance.


Differential D30527

Replacing float with new class Fraction for LSR alternative way of resolving complex solution
Needs Review · Public

Authored by evstupac on Mar 2 2017, 1:19 AM.

Details

Reviewers

qcolombet
gottesmm
scanon

Summary

Floating-point arithmetic can potentially produce different results on different CPUs. This patch helps to avoid that.
This change is a follow-up to https://reviews.llvm.org/D29862.
It should be NFC (at least on all CPUs I have in my pool).

Diff Detail

Repository: rL LLVM

Event Timeline

evstupac created this revision. · Mar 2 2017, 1:19 AM

Herald added a subscriber: mzolotukhin. · Mar 2 2017, 1:19 AM

qcolombet added inline comments. · Mar 7 2017, 4:00 PM

lib/Transforms/Scalar/LoopStrengthReduce.cpp
1069	Could you add a comment explaining how the optimization works? The reduction does not look right to me, but my math classes are way behind me :P
1082	Add an assert that D != 0.
1087	I am confused about the meaning of this operator. For divide, I would have expected something like: return operator*(Fraction(Divider.Denominator, Divider.Numerator));

evstupac added inline comments. · Mar 7 2017, 4:48 PM

lib/Transforms/Scalar/LoopStrengthReduce.cpp
1069	I expect Numerator and Denominator to fit in 31 bits in most cases. However, if either of them exceeds 31 bits, operator+ could overflow. To avoid this we need to reduce the fraction. Converting N/D to (N/2)/(D/2) can lose precision, but it is much faster than searching for common divisors. The precision lost is no more than ABS(N/D - (N/2)/(D/2)), which is no more than max(D/2, N/2)/(D*D/2). Since we are working with probabilities, N <= D, so the loss should be less than 1/D at each step. That means the precision loss is less than 1/(2^31), as D > 2^31, which is acceptable. I'll add the comment to the sources.
1082	ok.
1087	Well, this is specific to our case: the probability of not selecting (PoNS) reg R. The overall PoNS is the product of the PoNS for each use: PAll = P1 * P2 * ... * PK (this is computed once). Suppose we want the PoNS for a particular register use, Use2. It will be PAll / P2. The numerator of the PAll fraction already contains P2's numerator as a factor, and the same holds for the denominator. That way we don't need the reduction.
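
The Fraction revision discussed in these inline comments is not the diff shown in this archive, so the following is only a hypothetical sketch of the idea, not the patch's code. It assumes a Fraction struct with 64-bit Num and Den fields, a reduced() helper that halves both parts until each fits in 31 bits, and a sameValue() helper that compares by cross-multiplication; all of these names are made up for illustration. It shows that PAll / P2 has the same value as the product of the remaining probabilities.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the Fraction class under review. All names are
// assumptions made for illustration.
struct Fraction {
  uint64_t Num;
  uint64_t Den;

  // Halve both parts until each fits in 31 bits. Assumes Num <= Den
  // (a probability), so Den never reaches zero and each halving step
  // changes the value by at most 1/Den.
  static Fraction reduced(uint64_t N, uint64_t D) {
    assert(D != 0 && "Zero denominator!");
    while ((N >> 31) != 0 || (D >> 31) != 0) {
      N >>= 1;
      D >>= 1;
    }
    return Fraction{N, D};
  }

  // Assumes both operands came from reduced(), so their parts fit in
  // 31 bits and the 64-bit products below cannot overflow.
  Fraction operator*(const Fraction &R) const {
    return reduced(Num * R.Num, Den * R.Den);
  }

  Fraction operator/(const Fraction &R) const {
    assert(R.Num != 0 && "Division by zero!");
    return reduced(Num * R.Den, Den * R.Num);
  }
};

// Compare two fractions exactly by cross-multiplication (31-bit parts).
inline bool sameValue(const Fraction &A, const Fraction &B) {
  return A.Num * B.Den == B.Num * A.Den;
}

// PoNS for three uses of one register: 1 of 2, 2 of 3 and 3 of 4
// formulae do not reference it.
inline Fraction ponsWithoutUse2() {
  Fraction P1 = Fraction::reduced(1, 2);
  Fraction P2 = Fraction::reduced(2, 3);
  Fraction P3 = Fraction::reduced(3, 4);
  Fraction PAll = P1 * P2 * P3; // 6/24, computed once
  return PAll / P2;             // 18/48, same value as P1 * P3 = 3/8
}
```

The quotient is not in lowest terms (18/48 rather than 3/8), which is harmless as long as both parts stay within 31 bits; reduced() only steps in when they do not.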

What's the rationale for using rationals here instead of a (simpler) fixed-point representation, if we want to get rid of float?

What's the rationale for using rationals here instead of a (simpler) fixed-point representation, if we want to get rid of float?

No specific reason. Actually it is a kind of. I looked through the LLVM support classes and did not find an appropriate one.

PING.
Should I rewrite this using fixed point?
Or is the newly implemented Fraction class acceptable?

PING.

Please follow @scanon's recommendation.

Add fixed point instead of fraction.

Drive by comment: how about putting the FixedPoint64 in ADT and adding one or two unit tests?

In D30527#760563, @sanjoy wrote:

Drive by comment: how about putting the FixedPoint64 in ADT and adding one or two unit tests?

The class supports only unsigned values and lacks some general operators. However, I'm OK with adding it to ADT (and maybe supporting more later).
As for tests, this change should not change anything. The behavior should be the same. (Floats are replaced with fixed point to avoid potentially different results on different CPUs.)

In D30527#762673, @evstupac wrote:

As for tests, this change should not change anything. The behavior should be the same. (Floats are replaced with fixed point to avoid potentially different results on different CPUs.)

I think that's contradictory -- you're saying that this change must be NFC; but it will avoid potential different results on different CPUs?

In any case, by "test" I meant testing the FixedPoint64 class itself.

I think that's contradictory -- you're saying that this change must be NFC; but it will avoid potential different results on different CPUs?

I don't have a specific case in mind. However, I agreed with Quentin that floats "can introduce subtle difference from one target to other". That comes from D29862.

In any case, by "test" I meant testing the FixedPoint64 class itself.

Ok. I'll add such.
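
A standalone check of the kind requested above could look roughly like the sketch below. It does not use the patch's FixedPoint64. Instead it assumes a hypothetical Fixed32x32 type (unsigned 32.32 fixed point) whose multiply and divide go through the unsigned __int128 extension available in GCC and Clang, and a made-up fixedPointSelfCheck() helper. Every result is defined by integer truncation, so the raw bits should be the same on every host.

```cpp
#include <cassert>
#include <cstdint>

// 128-bit intermediate type; a GCC/Clang extension, not standard C++.
__extension__ typedef unsigned __int128 UInt128;

// Hypothetical unsigned 32.32 fixed-point value: the high 32 bits hold the
// integer part, the low 32 bits the fraction. Not the patch's FixedPoint64.
struct Fixed32x32 {
  uint64_t Raw;

  static Fixed32x32 fromInt(uint32_t N) {
    return Fixed32x32{static_cast<uint64_t>(N) << 32};
  }

  // Widen to 128 bits, multiply, then drop the extra 32 fraction bits.
  // The result is truncated; overflow past 32 integer bits wraps.
  Fixed32x32 operator*(Fixed32x32 R) const {
    UInt128 Wide = static_cast<UInt128>(Raw) * R.Raw;
    return Fixed32x32{static_cast<uint64_t>(Wide >> 32)};
  }

  // Pre-scale the dividend by 2^32 so the quotient keeps 32 fraction bits.
  Fixed32x32 operator/(Fixed32x32 R) const {
    assert(R.Raw != 0 && "Division by zero!");
    UInt128 Wide = static_cast<UInt128>(Raw) << 32;
    return Fixed32x32{static_cast<uint64_t>(Wide / R.Raw)};
  }

  bool operator==(Fixed32x32 R) const { return Raw == R.Raw; }
};

// Exercises a few values whose raw bit patterns are easy to predict by hand.
inline bool fixedPointSelfCheck() {
  Fixed32x32 One = Fixed32x32::fromInt(1);
  Fixed32x32 Three = Fixed32x32::fromInt(3);
  Fixed32x32 Four = Fixed32x32::fromInt(4);

  Fixed32x32 ThreeQuarters = Three / Four; // exactly 0.75
  Fixed32x32 Third = One / Three;          // 1/3, truncated
  Fixed32x32 AlmostOne = Third * Three;    // one unit below 1.0

  return ThreeQuarters.Raw == 0xC0000000ull &&
         Third.Raw == 0x55555555ull &&
         AlmostOne.Raw == 0xFFFFFFFFull &&
         !(AlmostOne == One);
}
```

Checks like these pin down the truncation behavior itself, independent of whether LSR's formula selection changes.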

I like @sanjoy's suggestion.
@scanon Could you do the review? I would prefer if an expert can look at it.
Thanks

Use APFloat instead of new class.

@gottesmm can you take a look at this? You're more familiar with the APFloat API than I am.

evstupac added reviewers: gottesmm, scanon. · May 2 2018, 12:11 PM

PING.

PING

PING.

I think it might help if the motivational part would be specified in the differential's description.

Why is this wanted?
With what does this help?
Does this change affect anything?
Can the change be tested?
etc

In D30527#1125545, @lebedev.ri wrote:

I think it might help if the motivational part would be specified in the differential's description.

Why is this wanted?

It was requested by Quentin in https://reviews.llvm.org/D29862

With what does this help?

It potentially helps to avoid different results on different CPUs due to use of floating point arithmetic.

Does this change affect anything?

No. It is NFC on all CPUs I have in my pool.

Can the change be tested?

I don't see how.

etc

evstupac edited the summary of this revision. · Jun 7 2018, 2:05 PM

PING

This patch looks good to me.
I think we can rebase and land this?

Herald added a project: Restricted Project. · May 16 2022, 1:43 AM

Revision Contents

Path	Size
lib/Transforms/Scalar/LoopStrengthReduce.cpp	91 lines

Diff 99629

lib/Transforms/Scalar/LoopStrengthReduce.cpp

[... 1,056 lines not shown ...]	struct UniquifierDenseMapInfo {
}		}

static bool isEqual(const SmallVector<const SCEV *, 4> &LHS,		static bool isEqual(const SmallVector<const SCEV *, 4> &LHS,
const SmallVector<const SCEV *, 4> &RHS) {		const SmallVector<const SCEV *, 4> &RHS) {
return LHS == RHS;		return LHS == RHS;
}		}
};		};

		/// Helper class to count probability as a fixed point instead of float.
		class FixedPoint64 {
		private:
		uint64_t Value;
		FixedPoint64(uint64_t N) : Value(N) {}
		FixedPoint64(uint32_t Hi, uint32_t Lo) {
		Value = (((uint64_t)Hi) << 32) | Lo;
		}
		uint32_t getLo() const {
		return (uint32_t)Value;
		}
		uint32_t getHi() const {
		return (uint32_t)(Value >> 32);
		}
		public:
		FixedPoint64() : Value(0) {}
		FixedPoint64(uint32_t N) {
		Value = FixedPoint64(N, 0).Value;
		}
		double getFloat() const {
		return (float)Value / ((uint64_t)1 << 32);
		}
		FixedPoint64 operator+(const FixedPoint64 Add) const {
		return FixedPoint64(Value + Add.Value);
		}
		FixedPoint64 operator/(const FixedPoint64 &Divider) const {
		assert(Divider.Value && "Division by zero!");
		uint32_t Hi = (uint32_t)(Value / Divider.Value);
		uint32_t Lo = (uint32_t)((Value << 32) / Divider.Value);
		if (Divider.Value >> 32)
		Lo += (uint32_t)((Value) / (Divider.Value >> 32));
		return FixedPoint64(Hi, Lo);
		}
		FixedPoint64 operator*(const FixedPoint64 Multiplier) const {
		uint64_t Sum = (uint64_t)this->getHi() * Multiplier.getLo();
		Sum += (uint64_t)this->getLo() * Multiplier.getHi();
		Sum += ((uint64_t)this->getLo() * Multiplier.getLo()) >> 32;
		uint32_t Hi = (uint32_t)(this->getHi() * Multiplier.getHi() + (Sum >> 32));
		return FixedPoint64(Hi, (uint32_t)Sum);
		}
		bool operator>(const FixedPoint64 Right) const {
		return Value > Right.Value;
		}
		bool operator!=(const FixedPoint64 Right) const {
		return Value != Right.Value;
		}
		bool operator==(const FixedPoint64 Right) const {
		return Value == Right.Value;
		}
		};

/// This class holds the state that LSR keeps for each use in IVUsers, as well		/// This class holds the state that LSR keeps for each use in IVUsers, as well
/// as uses invented by LSR itself. It includes information about what kinds of		/// as uses invented by LSR itself. It includes information about what kinds of
/// things can be folded into the user, information about the user itself, and		/// things can be folded into the user, information about the user itself, and
/// information about how the use may be satisfied. TODO: Represent multiple		/// information about how the use may be satisfied. TODO: Represent multiple
/// users of the same expression in common?		/// users of the same expression in common?
class LSRUse {		class LSRUse {
DenseSet<SmallVector<const SCEV *, 4>, UniquifierDenseMapInfo> Uniquifier;		DenseSet<SmallVector<const SCEV *, 4>, UniquifierDenseMapInfo> Uniquifier;

[... 59 lines not shown ...]	void pushFixup(LSRFixup &f) {
Fixups.push_back(f);		Fixups.push_back(f);
if (f.Offset > MaxOffset)		if (f.Offset > MaxOffset)
MaxOffset = f.Offset;		MaxOffset = f.Offset;
if (f.Offset < MinOffset)		if (f.Offset < MinOffset)
MinOffset = f.Offset;		MinOffset = f.Offset;
}		}

bool HasFormulaWithSameRegs(const Formula &F) const;		bool HasFormulaWithSameRegs(const Formula &F) const;
float getNotSelectedProbability(const SCEV *Reg) const;		FixedPoint64 getNotSelectedProbability(const SCEV *Reg) const;
bool InsertFormula(const Formula &F, const Loop &L);		bool InsertFormula(const Formula &F, const Loop &L);
void DeleteFormula(Formula &F);		void DeleteFormula(Formula &F);
void RecomputeRegs(size_t LUIdx, RegUseTracker &Reguses);		void RecomputeRegs(size_t LUIdx, RegUseTracker &Reguses);

void print(raw_ostream &OS) const;		void print(raw_ostream &OS) const;
void dump() const;		void dump() const;
};		};

[... 263 lines not shown ...]	bool LSRUse::HasFormulaWithSameRegs(const Formula &F) const {
SmallVector<const SCEV *, 4> Key = F.BaseRegs;		SmallVector<const SCEV *, 4> Key = F.BaseRegs;
if (F.ScaledReg) Key.push_back(F.ScaledReg);		if (F.ScaledReg) Key.push_back(F.ScaledReg);
// Unstable sort by host order ok, because this is only used for uniquifying.		// Unstable sort by host order ok, because this is only used for uniquifying.
std::sort(Key.begin(), Key.end());		std::sort(Key.begin(), Key.end());
return Uniquifier.count(Key);		return Uniquifier.count(Key);
}		}

/// The function returns a probability of selecting formula without Reg.		/// The function returns a probability of selecting formula without Reg.
float LSRUse::getNotSelectedProbability(const SCEV *Reg) const {		FixedPoint64 LSRUse::getNotSelectedProbability(const SCEV *Reg) const {
unsigned FNum = 0;		uint32_t FSize = Formulae.size();
		uint32_t FNum = FSize;
for (const Formula &F : Formulae)		for (const Formula &F : Formulae)
if (F.referencesReg(Reg))		if (F.referencesReg(Reg))
FNum++;		FNum--;
return ((float)(Formulae.size() - FNum)) / Formulae.size();		return FixedPoint64(FNum) / FixedPoint64(FSize);
}		}

/// If the given formula has not yet been inserted, add it to the list, and		/// If the given formula has not yet been inserted, add it to the list, and
/// return true. Return false otherwise. The formula must be in canonical form.		/// return true. Return false otherwise. The formula must be in canonical form.
bool LSRUse::InsertFormula(const Formula &F, const Loop &L) {		bool LSRUse::InsertFormula(const Formula &F, const Loop &L) {
assert(F.isCanonical(L) && "Invalid canonical representation");		assert(F.isCanonical(L) && "Invalid canonical representation");

if (!Formulae.empty() && RigidFormula)		if (!Formulae.empty() && RigidFormula)
[... 2,924 lines not shown ...]	void LSRInstance::NarrowSearchSpaceByDeletingCostlyFormulas() {

// Set of Regs wich will be 100% used in final solution.		// Set of Regs wich will be 100% used in final solution.
// Used in each formula of a solution (in example above this is reg(c)).		// Used in each formula of a solution (in example above this is reg(c)).
// We can skip them in calculations.		// We can skip them in calculations.
SmallPtrSet<const SCEV *, 4> UniqRegs;		SmallPtrSet<const SCEV *, 4> UniqRegs;
DEBUG(dbgs() << "The search space is too complex.\n");		DEBUG(dbgs() << "The search space is too complex.\n");

// Map each register to probability of not selecting		// Map each register to probability of not selecting
DenseMap <const SCEV *, float> RegNumMap;		DenseMap <const SCEV *, FixedPoint64> RegNumMap;
for (const SCEV *Reg : RegUses) {		for (const SCEV *Reg : RegUses) {
if (UniqRegs.count(Reg))		if (UniqRegs.count(Reg))
continue;		continue;
float PNotSel = 1;		FixedPoint64 PNotSel((uint32_t)1);
for (const LSRUse &LU : Uses) {		for (const LSRUse &LU : Uses) {
if (!LU.Regs.count(Reg))		if (!LU.Regs.count(Reg))
continue;		continue;
float P = LU.getNotSelectedProbability(Reg);		FixedPoint64 P = LU.getNotSelectedProbability(Reg);
if (P != 0.0)		if (P != FixedPoint64((uint32_t)0))
PNotSel *= P;		PNotSel = PNotSel * P;
else		else
UniqRegs.insert(Reg);		UniqRegs.insert(Reg);
}		}
RegNumMap.insert(std::make_pair(Reg, PNotSel));		RegNumMap.insert(std::make_pair(Reg, PNotSel));
}		}

DEBUG(dbgs() << "Narrowing the search space by deleting costly formulas\n");		DEBUG(dbgs() << "Narrowing the search space by deleting costly formulas\n");

// Delete formulas where registers number expectation is high.		// Delete formulas where registers number expectation is high.
for (size_t LUIdx = 0, NumUses = Uses.size(); LUIdx != NumUses; ++LUIdx) {		for (size_t LUIdx = 0, NumUses = Uses.size(); LUIdx != NumUses; ++LUIdx) {
LSRUse &LU = Uses[LUIdx];		LSRUse &LU = Uses[LUIdx];
// If nothing to delete - continue.		// If nothing to delete - continue.
if (LU.Formulae.size() < 2)		if (LU.Formulae.size() < 2)
continue;		continue;
// This is temporary solution to test performance. Float should be		// This is temporary solution to test performance. Float should be
// replaced with round independent type (based on integers) to avoid		// replaced with round independent type (based on integers) to avoid
// different results for different target builds.		// different results for different target builds.
float FMinRegNum = LU.Formulae[0].getNumRegs();		FixedPoint64 FMinRegNum((uint32_t)LU.Formulae[0].getNumRegs());
float FMinARegNum = LU.Formulae[0].getNumRegs();		FixedPoint64 FMinARegNum((uint32_t)LU.Formulae[0].getNumRegs());
size_t MinIdx = 0;		size_t MinIdx = 0;
for (size_t i = 0, e = LU.Formulae.size(); i != e; ++i) {		for (size_t i = 0, e = LU.Formulae.size(); i != e; ++i) {
Formula &F = LU.Formulae[i];		Formula &F = LU.Formulae[i];
float FRegNum = 0;		FixedPoint64 FRegNum;
float FARegNum = 0;		FixedPoint64 FARegNum;
for (const SCEV *BaseReg : F.BaseRegs) {		for (const SCEV *BaseReg : F.BaseRegs) {
if (UniqRegs.count(BaseReg))		if (UniqRegs.count(BaseReg))
continue;		continue;
FRegNum += RegNumMap[BaseReg] / LU.getNotSelectedProbability(BaseReg);		FRegNum = FRegNum +
		RegNumMap[BaseReg] / LU.getNotSelectedProbability(BaseReg);
if (isa<SCEVAddRecExpr>(BaseReg))		if (isa<SCEVAddRecExpr>(BaseReg))
FARegNum +=		FARegNum = FARegNum +
RegNumMap[BaseReg] / LU.getNotSelectedProbability(BaseReg);		RegNumMap[BaseReg] / LU.getNotSelectedProbability(BaseReg);
}		}
if (const SCEV *ScaledReg = F.ScaledReg) {		if (const SCEV *ScaledReg = F.ScaledReg) {
if (!UniqRegs.count(ScaledReg)) {		if (!UniqRegs.count(ScaledReg)) {
FRegNum +=		FRegNum = FRegNum +
RegNumMap[ScaledReg] / LU.getNotSelectedProbability(ScaledReg);		RegNumMap[ScaledReg] / LU.getNotSelectedProbability(ScaledReg);
if (isa<SCEVAddRecExpr>(ScaledReg))		if (isa<SCEVAddRecExpr>(ScaledReg))
FARegNum +=		FARegNum = FARegNum +
RegNumMap[ScaledReg] / LU.getNotSelectedProbability(ScaledReg);		RegNumMap[ScaledReg] / LU.getNotSelectedProbability(ScaledReg);
}		}
}		}
if (FMinRegNum > FRegNum ||		if (FMinRegNum > FRegNum ||
(FMinRegNum == FRegNum && FMinARegNum > FARegNum)) {		(FMinRegNum == FRegNum && FMinARegNum > FARegNum)) {
FMinRegNum = FRegNum;		FMinRegNum = FRegNum;
FMinARegNum = FARegNum;		FMinARegNum = FARegNum;
MinIdx = i;		MinIdx = i;
}		}
}		}
DEBUG(dbgs() << " The formula "; LU.Formulae[MinIdx].print(dbgs());		DEBUG(dbgs() << " The formula "; LU.Formulae[MinIdx].print(dbgs());
dbgs() << " with min reg num " << FMinRegNum << '\n');		dbgs() << " with min reg num " << FMinRegNum.getFloat() << '\n');
if (MinIdx != 0)		if (MinIdx != 0)
std::swap(LU.Formulae[MinIdx], LU.Formulae[0]);		std::swap(LU.Formulae[MinIdx], LU.Formulae[0]);
while (LU.Formulae.size() != 1) {		while (LU.Formulae.size() != 1) {
DEBUG(dbgs() << " Deleting "; LU.Formulae.back().print(dbgs());		DEBUG(dbgs() << " Deleting "; LU.Formulae.back().print(dbgs());
dbgs() << '\n');		dbgs() << '\n');
LU.Formulae.pop_back();		LU.Formulae.pop_back();
}		}
LU.RecomputeRegs(LUIdx, RegUses);		LU.RecomputeRegs(LUIdx, RegUses);
[... 928 lines not shown ...]
