This is an archive of the discontinued LLVM Phabricator instance.


Differential D85504

[Reassociate] [PowerPC] stop common out mul factors if fma is preferred on target
Abandoned · Public

Authored by shchenz on Aug 7 2020, 12:29 AM.


Details

Reviewers

lebedev.ri
spatel
efriedma
qcolombet
Whitney
jsji

Group Reviewers

Restricted Project

Summary

For the expression A * B + A * C, the Reassociate pass (function OptimizeAdd) currently rewrites it to A * (B + C), saving one mul because A is a common mul factor.

However, this transformation has no benefit at all on the PowerPC target. In fact, if the target prefers fma, it generates worse IR.

On the PowerPC target:
A * B + A * C can be generated as fma(A, C, fmul(A, B));
A * (B + C) can be generated as fmul(A, fadd(B, C));

fma, fmul, fadd, and fsub all have the same latency on PowerPC, so there is no CPU cycle benefit: either form is two dependent operations. Reducing the number of muls also reduces the number of fmas that can be formed, so this is not a beneficial transformation on PowerPC.

This patch bails out of this optimization early, to expose more fma folding opportunities and to save some compile time on the PowerPC target.
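
The two forms compared in the summary can be written out with `std::fma` from `<cmath>`. This is only a sketch for illustration: the function names `unfactored` and `factored` are made up here, and the per-line instruction comments assume the compiler emits one instruction per operation, which is not guaranteed.

```cpp
#include <cmath>

// A * B + A * C kept unfactored: one fmul feeding one fma.
double unfactored(double A, double B, double C) {
  double AB = A * B;          // fmul
  return std::fma(A, C, AB);  // fma computes A * C + AB
}

// A * (B + C) after factoring out A: one fadd feeding one fmul.
double factored(double A, double B, double C) {
  double Sum = B + C;  // fadd
  return A * Sum;      // fmul
}
```

Both functions are a chain of two dependent operations, so with equal latencies neither is faster in isolation. The difference is that the factored form no longer contains a multiply-add pair that could be fused.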

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

shchenz created this revision. Aug 7 2020, 12:29 AM

Herald added a project: Restricted Project. Aug 7 2020, 12:29 AM

Herald added subscribers: llvm-commits, steven.zhang, wuzish and 3 others.

shchenz requested review of this revision. Aug 7 2020, 12:29 AM

shchenz updated this revision to Diff 283829. Aug 7 2020, 12:39 AM

(can you please maybe spellcheck both the patch description and the wording within the patch? it is really hard to read)

I'm not really sure this is the way forward.

This is a target-agnostic pass; if we add arbitrary restrictions, it will likely miss some intended transforms
As usual, just preventing some transform is not the solution, because you can still have "bad" input that will still be lowered badly

IOW,

why not reverse this, as usual, in DAGCombine
this needs something like D67383 Tree-Height-Reduction, i guess, for more general reverse transform

Harbormaster completed remote builds in B67424: Diff 283826. Aug 7 2020, 12:49 AM

shchenz edited the summary of this revision. Aug 7 2020, 1:02 AM

shchenz edited the summary of this revision.

In D85504#2202000, @lebedev.ri wrote:

(can you please maybe spellcheck both the patch description and the wording within the patch? it is really hard to read)

Done

I'm not really sure this is the way forward.

This is a target-agnostic pass; if we add arbitrary restrictions, it will likely miss some intended transforms

Could you be more specific? I verified this change on PowerPC for some benchmarks, no degradation found.

As usual, just preventing some transform is not the solution, because you can still have "bad" input that will still be lowered badly

Agreed. But if we know that a transformation has no benefit at all, or has a negative impact, on some specific targets, shouldn't we stop the transformation on those targets? Reversing the transformation in a later pass makes the compiler spend compile time on both the harmful transformation and its reversal. I am also worried about the feasibility of reversing this "complex" transformation in a later pass like DAGCombine...

shchenz edited the summary of this revision. Aug 7 2020, 1:14 AM

Harbormaster completed remote builds in B67425: Diff 283829. Aug 7 2020, 1:29 AM

I'm not sure why i'm added as a reviewer on this patch, i don't work on ppc, i never used it, and i never really worked on that pass.

In D85504#2202066, @shchenz wrote:

In D85504#2202000, @lebedev.ri wrote:

(can you please maybe spellcheck both the patch description and the wording within the patch? it is really hard to read)

Done

I'm not really sure this is the way forward.

This is a target-agnostic pass; if we add arbitrary restrictions, it will likely miss some intended transforms

Could you be more specific? I verified this change on PowerPC for some benchmarks, no degradation found.

Perhaps something like a*b * c*d + e*f * c*d, surely it should be (a*b + e*f) * c * d,
but won't this patch keep it as a*b*c*d + e*f*c*d?

As usual, just preventing some transform is not the solution, because you can still have "bad" input that will still be lowered badly

Agreed. But if we know that a transformation has no benefit at all, or has a negative impact, on some specific targets, shouldn't we stop the transformation on those targets? Reversing the transformation in a later pass makes the compiler spend compile time on both the harmful transformation and its reversal. I am also worried about the feasibility of reversing this "complex" transformation in a later pass like DAGCombine...

llvm/lib/Transforms/Scalar/Reassociate.cpp
1500	So we also prevent the transform for integers now? But there's no FMA for them?

Good catch for the a*b * c*d + e*f * c*d case, Roman. That's a degradation after this patch. Thanks for your good review. ^-^
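
To see why the a*b * c*d + e*f * c*d case regresses, it helps to count floating-point operations in each form. The sketch below is illustrative only: `OpCounter` is a hypothetical helper invented here (it is not part of LLVM), and it assumes that a multiply whose result feeds an add is always fused into a single fma.

```cpp
#include <cmath>

// Hypothetical helper: performs FP operations and counts how many were issued.
struct OpCounter {
  int Ops = 0;
  double mul(double X, double Y) { ++Ops; return X * Y; }
  double fma(double X, double Y, double Z) { ++Ops; return std::fma(X, Y, Z); }
};

// a*b*c*d + e*f*c*d with no factoring: five multiplies and one fma.
double unfactored(OpCounter &K, double a, double b, double c, double d,
                  double e, double f) {
  double abcd = K.mul(K.mul(K.mul(a, b), c), d);
  double efc = K.mul(K.mul(e, f), c);
  return K.fma(efc, d, abcd);
}

// (a*b + e*f) * c * d: three multiplies and one fma.
double factored(OpCounter &K, double a, double b, double c, double d,
                double e, double f) {
  double ab = K.mul(a, b);
  double sum = K.fma(e, f, ab);
  return K.mul(K.mul(sum, c), d);
}
```

Under these assumptions the unfactored form issues six operations and the factored form four, so disabling the factoring unconditionally gives up a real saving in this case.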

The reassociate pass (like instcombine) is supposed to be a target-independent canonicalization pass:

  // This pass reassociates commutative expressions in an order that is designed
  // to promote better constant propagation, GCSE, LICM, PRE, etc.

So it's intentionally avoiding using TTI/TLI and trying not to do all forms of reassociation. As Roman mentioned, see D67383 as an example of a possible TTI-aware reassociation pass.

So either we need to reverse this in DAGCombine and/or PPC-specific codegen, or we need to make an argument that the canonicalization does not make sense in general (for FP only?).
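
As a rough illustration of what "reversing" the canonical form in a later, target-aware stage could look like, here is a toy expression-tree rewrite. It is not DAGCombine code and uses no LLVM API: `Expr`, `var`, `bin`, `distribute`, `print`, and the `PreferFMA` flag are all hypothetical names chosen for this sketch, and it only handles the single pattern A * (B + C).

```cpp
#include <memory>
#include <string>
#include <utility>

// Toy expression node; Kind is 'v' (variable), '+' or '*'.
struct Expr {
  char Kind;
  std::string Name;                // used when Kind == 'v'
  std::shared_ptr<Expr> LHS, RHS;  // used for '+' and '*'
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr var(const std::string &N) {
  return std::make_shared<Expr>(Expr{'v', N, nullptr, nullptr});
}
ExprPtr bin(char K, ExprPtr L, ExprPtr R) {
  return std::make_shared<Expr>(Expr{K, "", std::move(L), std::move(R)});
}

// Rewrites A * (B + C) into A * B + A * C, but only when the hypothetical
// PreferFMA flag says the target would rather see multiply-add pairs.
ExprPtr distribute(const ExprPtr &E, bool PreferFMA) {
  if (!PreferFMA || E->Kind != '*' || !E->RHS || E->RHS->Kind != '+')
    return E;  // Pattern does not match; leave the expression untouched.
  const ExprPtr &A = E->LHS;
  return bin('+', bin('*', A, E->RHS->LHS), bin('*', A, E->RHS->RHS));
}

// Renders an expression with explicit parentheses.
std::string print(const ExprPtr &E) {
  if (E->Kind == 'v')
    return E->Name;
  return "(" + print(E->LHS) + " " + E->Kind + " " + print(E->RHS) + ")";
}
```

A real combine would need more than this pattern match, for example checking that the floating-point flags on the instructions permit reassociation and that the rewrite is actually profitable for the surrounding code.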

There seems to be some (weak) correlation between this and https://reviews.llvm.org/D84309 which I am reworking to remove it from canonicalization. It may be worth looking into whether we can catch both issues in CGP.

Thanks @spatel @nemanjai for your comments. So it is common sense that we should not add a TTI/TLI hook inside the Reassociate pass, as @lebedev.ri already kindly pointed out. I have learned this now ^_^.
I will investigate all the suggested places to do the reversal (DAGCombine, other PPC specific pass or CGP) later.

We will use another solution for this issue.

Revision Contents

Path	Size
llvm/include/llvm/Analysis/TargetTransformInfo.h	7 lines
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h	1 line
llvm/include/llvm/CodeGen/BasicTTIImpl.h	4 lines
llvm/include/llvm/CodeGen/TargetLowering.h	1 line
llvm/include/llvm/Transforms/Scalar/Reassociate.h	3 lines
llvm/lib/Analysis/TargetTransformInfo.cpp	4 lines
llvm/lib/Target/PowerPC/PPCISelLowering.h	2 lines
llvm/lib/Target/PowerPC/PPCISelLowering.cpp	11 lines
llvm/lib/Transforms/Scalar/Reassociate.cpp	25 lines
llvm/test/Transforms/Reassociate/PowerPC/lit.local.cfg	2 lines
llvm/test/Transforms/Reassociate/PowerPC/prefer-fma.ll	51 lines

Diff 283826

llvm/include/llvm/Analysis/TargetTransformInfo.h

[...]
/// Ty2. e.g. On x86 it's free to truncate a i32 value in register EAX to i16		/// Ty2. e.g. On x86 it's free to truncate a i32 value in register EAX to i16
/// by referencing its sub-register AX.		/// by referencing its sub-register AX.
bool isTruncateFree(Type *Ty1, Type *Ty2) const;		bool isTruncateFree(Type *Ty1, Type *Ty2) const;

/// Return true if it is profitable to hoist instruction in the		/// Return true if it is profitable to hoist instruction in the
/// then/else to before if.		/// then/else to before if.
bool isProfitableToHoist(Instruction *I) const;		bool isProfitableToHoist(Instruction *I) const;

		/// Return true if it is profitable to generate fma instructions.
		bool isProfitableToGenerateFMA(Instruction *I) const;

bool useAA() const;		bool useAA() const;

/// Return true if this type is legal.		/// Return true if this type is legal.
bool isTypeLegal(Type *Ty) const;		bool isTypeLegal(Type *Ty) const;

/// Return true if switches should be turned into lookup tables for the		/// Return true if switches should be turned into lookup tables for the
/// target.		/// target.
bool shouldBuildLookupTables() const;		bool shouldBuildLookupTables() const;
[...]
virtual bool hasVolatileVariant(Instruction *I, unsigned AddrSpace) = 0;		virtual bool hasVolatileVariant(Instruction *I, unsigned AddrSpace) = 0;
virtual bool prefersVectorizedAddressing() = 0;		virtual bool prefersVectorizedAddressing() = 0;
virtual int getScalingFactorCost(Type *Ty, GlobalValue *BaseGV,		virtual int getScalingFactorCost(Type *Ty, GlobalValue *BaseGV,
int64_t BaseOffset, bool HasBaseReg,		int64_t BaseOffset, bool HasBaseReg,
int64_t Scale, unsigned AddrSpace) = 0;		int64_t Scale, unsigned AddrSpace) = 0;
virtual bool LSRWithInstrQueries() = 0;		virtual bool LSRWithInstrQueries() = 0;
virtual bool isTruncateFree(Type *Ty1, Type *Ty2) = 0;		virtual bool isTruncateFree(Type *Ty1, Type *Ty2) = 0;
virtual bool isProfitableToHoist(Instruction *I) = 0;		virtual bool isProfitableToHoist(Instruction *I) = 0;
		virtual bool isProfitableToGenerateFMA(Instruction *I) = 0;
virtual bool useAA() = 0;		virtual bool useAA() = 0;
virtual bool isTypeLegal(Type *Ty) = 0;		virtual bool isTypeLegal(Type *Ty) = 0;
virtual bool shouldBuildLookupTables() = 0;		virtual bool shouldBuildLookupTables() = 0;
virtual bool shouldBuildLookupTablesForConstant(Constant *C) = 0;		virtual bool shouldBuildLookupTablesForConstant(Constant *C) = 0;
virtual bool useColdCCForColdCall(Function &F) = 0;		virtual bool useColdCCForColdCall(Function &F) = 0;
virtual unsigned getScalarizationOverhead(VectorType *Ty,		virtual unsigned getScalarizationOverhead(VectorType *Ty,
const APInt &DemandedElts,		const APInt &DemandedElts,
bool Insert, bool Extract) = 0;		bool Insert, bool Extract) = 0;
[...]
}		}
bool LSRWithInstrQueries() override { return Impl.LSRWithInstrQueries(); }		bool LSRWithInstrQueries() override { return Impl.LSRWithInstrQueries(); }
bool isTruncateFree(Type *Ty1, Type *Ty2) override {		bool isTruncateFree(Type *Ty1, Type *Ty2) override {
return Impl.isTruncateFree(Ty1, Ty2);		return Impl.isTruncateFree(Ty1, Ty2);
}		}
bool isProfitableToHoist(Instruction *I) override {		bool isProfitableToHoist(Instruction *I) override {
return Impl.isProfitableToHoist(I);		return Impl.isProfitableToHoist(I);
}		}
		bool isProfitableToGenerateFMA(Instruction *I) override {
		return Impl.isProfitableToGenerateFMA(I);
		}
bool useAA() override { return Impl.useAA(); }		bool useAA() override { return Impl.useAA(); }
bool isTypeLegal(Type *Ty) override { return Impl.isTypeLegal(Ty); }		bool isTypeLegal(Type *Ty) override { return Impl.isTypeLegal(Ty); }
bool shouldBuildLookupTables() override {		bool shouldBuildLookupTables() override {
return Impl.shouldBuildLookupTables();		return Impl.shouldBuildLookupTables();
}		}
bool shouldBuildLookupTablesForConstant(Constant *C) override {		bool shouldBuildLookupTablesForConstant(Constant *C) override {
return Impl.shouldBuildLookupTablesForConstant(C);		return Impl.shouldBuildLookupTablesForConstant(C);
}		}
[...]

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

[...]
return -1;		return -1;
}		}

bool LSRWithInstrQueries() { return false; }		bool LSRWithInstrQueries() { return false; }

bool isTruncateFree(Type *Ty1, Type *Ty2) { return false; }		bool isTruncateFree(Type *Ty1, Type *Ty2) { return false; }

bool isProfitableToHoist(Instruction *I) { return true; }		bool isProfitableToHoist(Instruction *I) { return true; }
		bool isProfitableToGenerateFMA(Instruction *I) { return false; }

bool useAA() { return false; }		bool useAA() { return false; }

bool isTypeLegal(Type *Ty) { return false; }		bool isTypeLegal(Type *Ty) { return false; }

bool shouldBuildLookupTables() { return true; }		bool shouldBuildLookupTables() { return true; }
bool shouldBuildLookupTablesForConstant(Constant *C) { return true; }		bool shouldBuildLookupTablesForConstant(Constant *C) { return true; }

[...]

llvm/include/llvm/CodeGen/BasicTTIImpl.h

[...]
bool isTruncateFree(Type *Ty1, Type *Ty2) {		bool isTruncateFree(Type *Ty1, Type *Ty2) {
return getTLI()->isTruncateFree(Ty1, Ty2);		return getTLI()->isTruncateFree(Ty1, Ty2);
}		}

bool isProfitableToHoist(Instruction *I) {		bool isProfitableToHoist(Instruction *I) {
return getTLI()->isProfitableToHoist(I);		return getTLI()->isProfitableToHoist(I);
}		}

		bool isProfitableToGenerateFMA(Instruction *I) {
		return getTLI()->isProfitableToGenerateFMA(I);
		}

bool useAA() const { return getST()->useAA(); }		bool useAA() const { return getST()->useAA(); }

bool isTypeLegal(Type *Ty) {		bool isTypeLegal(Type *Ty) {
EVT VT = getTLI()->getValueType(DL, Ty);		EVT VT = getTLI()->getValueType(DL, Ty);
return getTLI()->isTypeLegal(VT);		return getTLI()->isTypeLegal(VT);
}		}

int getGEPCost(Type *PointeeType, const Value *Ptr,		int getGEPCost(Type *PointeeType, const Value *Ptr,
[...]

llvm/include/llvm/CodeGen/TargetLowering.h

[...]
return false;		return false;
}		}

virtual bool isTruncateFree(EVT FromVT, EVT ToVT) const {		virtual bool isTruncateFree(EVT FromVT, EVT ToVT) const {
return false;		return false;
}		}

virtual bool isProfitableToHoist(Instruction *I) const { return true; }		virtual bool isProfitableToHoist(Instruction *I) const { return true; }
		virtual bool isProfitableToGenerateFMA(Instruction *I) const { return false; }

/// Return true if the extension represented by \p I is free.		/// Return true if the extension represented by \p I is free.
/// Unlikely the is[Z|FP]ExtFree family which is based on types,		/// Unlikely the is[Z|FP]ExtFree family which is based on types,
/// this method can use the context provided by \p I to decide		/// this method can use the context provided by \p I to decide
/// whether or not \p I is free.		/// whether or not \p I is free.
/// This method extends the behavior of the is[Z|FP]ExtFree family.		/// This method extends the behavior of the is[Z|FP]ExtFree family.
/// In other words, if is[Z|FP]Free returns true, then this method		/// In other words, if is[Z|FP]Free returns true, then this method
/// returns true as well. The converse is not true.		/// returns true as well. The converse is not true.
[...]

llvm/include/llvm/Transforms/Scalar/Reassociate.h

[...]
// (starting at 2), which effectively gives values in deep loops higher rank		// (starting at 2), which effectively gives values in deep loops higher rank
// than values not in loops.		// than values not in loops.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_TRANSFORMS_SCALAR_REASSOCIATE_H		#ifndef LLVM_TRANSFORMS_SCALAR_REASSOCIATE_H
#define LLVM_TRANSFORMS_SCALAR_REASSOCIATE_H		#define LLVM_TRANSFORMS_SCALAR_REASSOCIATE_H

		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/PostOrderIterator.h"		#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/ADT/SetVector.h"		#include "llvm/ADT/SetVector.h"
#include "llvm/IR/PassManager.h"		#include "llvm/IR/PassManager.h"
#include "llvm/IR/ValueHandle.h"		#include "llvm/IR/ValueHandle.h"
#include <deque>		#include <deque>

namespace llvm {		namespace llvm {

class APInt;		class APInt;
class BasicBlock;		class BasicBlock;
[...]
bool isValid() const { return Value1 && Value2; }		bool isValid() const { return Value1 && Value2; }
};		};
DenseMap<std::pair<Value *, Value *>, PairMapValue> PairMap[NumBinaryOps];		DenseMap<std::pair<Value *, Value *>, PairMapValue> PairMap[NumBinaryOps];

bool MadeChange;		bool MadeChange;

public:		public:
PreservedAnalyses run(Function &F, FunctionAnalysisManager &);		PreservedAnalyses run(Function &F, FunctionAnalysisManager &);
		bool runImpl(Function &F, const TargetTransformInfo *TTI);

private:		private:
void BuildRankMap(Function &F, ReversePostOrderTraversal<Function *> &RPOT);		void BuildRankMap(Function &F, ReversePostOrderTraversal<Function *> &RPOT);
unsigned getRank(Value *V);		unsigned getRank(Value *V);
void canonicalizeOperands(Instruction *I);		void canonicalizeOperands(Instruction *I);
void ReassociateExpression(BinaryOperator *I);		void ReassociateExpression(BinaryOperator *I);
void RewriteExprTree(BinaryOperator *I,		void RewriteExprTree(BinaryOperator *I,
SmallVectorImpl<reassociate::ValueEntry> &Ops);		SmallVectorImpl<reassociate::ValueEntry> &Ops);
[...]
Value *RemoveFactorFromExpression(Value *V, Value *Factor);		Value *RemoveFactorFromExpression(Value *V, Value *Factor);
void EraseInst(Instruction *I);		void EraseInst(Instruction *I);
void RecursivelyEraseDeadInsts(Instruction *I, OrderedSet &Insts);		void RecursivelyEraseDeadInsts(Instruction *I, OrderedSet &Insts);
void OptimizeInst(Instruction *I);		void OptimizeInst(Instruction *I);
Instruction *canonicalizeNegFPConstantsForOp(Instruction *I, Instruction *Op,		Instruction *canonicalizeNegFPConstantsForOp(Instruction *I, Instruction *Op,
Value *OtherOp);		Value *OtherOp);
Instruction *canonicalizeNegFPConstants(Instruction *I);		Instruction *canonicalizeNegFPConstants(Instruction *I);
void BuildPairMap(ReversePostOrderTraversal<Function *> &RPOT);		void BuildPairMap(ReversePostOrderTraversal<Function *> &RPOT);
		const TargetTransformInfo *TTI;
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_TRANSFORMS_SCALAR_REASSOCIATE_H		#endif // LLVM_TRANSFORMS_SCALAR_REASSOCIATE_H

llvm/lib/Analysis/TargetTransformInfo.cpp

[...]
	bool TargetTransformInfo::isTruncateFree(Type *Ty1, Type *Ty2) const {			bool TargetTransformInfo::isTruncateFree(Type *Ty1, Type *Ty2) const {
	return TTIImpl->isTruncateFree(Ty1, Ty2);			return TTIImpl->isTruncateFree(Ty1, Ty2);
	}			}

	bool TargetTransformInfo::isProfitableToHoist(Instruction *I) const {			bool TargetTransformInfo::isProfitableToHoist(Instruction *I) const {
	return TTIImpl->isProfitableToHoist(I);			return TTIImpl->isProfitableToHoist(I);
	}			}

				bool TargetTransformInfo::isProfitableToGenerateFMA(Instruction *I) const {
				return TTIImpl->isProfitableToGenerateFMA(I);
				}

	bool TargetTransformInfo::useAA() const { return TTIImpl->useAA(); }			bool TargetTransformInfo::useAA() const { return TTIImpl->useAA(); }

	bool TargetTransformInfo::isTypeLegal(Type *Ty) const {			bool TargetTransformInfo::isTypeLegal(Type *Ty) const {
	return TTIImpl->isTypeLegal(Ty);			return TTIImpl->isTypeLegal(Ty);
	}			}

	bool TargetTransformInfo::shouldBuildLookupTables() const {			bool TargetTransformInfo::shouldBuildLookupTables() const {
	return TTIImpl->shouldBuildLookupTables();			return TTIImpl->shouldBuildLookupTables();
[...]

llvm/lib/Target/PowerPC/PPCISelLowering.h

[...]
bool isFMAFasterThanFMulAndFAdd(const Function &F, Type *Ty) const override;		bool isFMAFasterThanFMulAndFAdd(const Function &F, Type *Ty) const override;

/// isProfitableToHoist - Check if it is profitable to hoist instruction		/// isProfitableToHoist - Check if it is profitable to hoist instruction
/// \p I to its dominator block.		/// \p I to its dominator block.
/// For example, it is not profitable if \p I and it's only user can form a		/// For example, it is not profitable if \p I and it's only user can form a
/// FMA instruction, because Powerpc prefers FMADD.		/// FMA instruction, because Powerpc prefers FMADD.
bool isProfitableToHoist(Instruction *I) const override;		bool isProfitableToHoist(Instruction *I) const override;

		bool isProfitableToGenerateFMA(Instruction *I) const override;

const MCPhysReg *getScratchRegisters(CallingConv::ID CC) const override;		const MCPhysReg *getScratchRegisters(CallingConv::ID CC) const override;

// Should we expand the build vector with shuffles?		// Should we expand the build vector with shuffles?
bool		bool
shouldExpandBuildVectorWithShuffles(EVT VT,		shouldExpandBuildVectorWithShuffles(EVT VT,
unsigned DefinedValues) const override;		unsigned DefinedValues) const override;

/// createFastISel - This method returns a target-specific FastISel object,		/// createFastISel - This method returns a target-specific FastISel object,
[...]

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

[...]
return false;		return false;
}		}
default:		default:
return true;		return true;
}		}
return true;		return true;
}		}

		bool PPCTargetLowering::isProfitableToGenerateFMA(Instruction *I) const {
		const TargetOptions &Options = getTargetMachine().Options;
		const Function *F = I->getFunction();
		const DataLayout &DL = F->getParent()->getDataLayout();
		Type *Ty = I->getOperand(0)->getType();

		return isFMAFasterThanFMulAndFAdd(*F, Ty) &&
		isOperationLegalOrCustom(ISD::FMA, getValueType(DL, Ty)) &&
		(Options.AllowFPOpFusion == FPOpFusion::Fast || Options.UnsafeFPMath);
		}

const MCPhysReg *		const MCPhysReg *
PPCTargetLowering::getScratchRegisters(CallingConv::ID) const {		PPCTargetLowering::getScratchRegisters(CallingConv::ID) const {
// LR is a callee-save register, but we must treat it as clobbered by any call		// LR is a callee-save register, but we must treat it as clobbered by any call
// site. Hence we include LR in the scratch registers, which are in turn added		// site. Hence we include LR in the scratch registers, which are in turn added
// as implicit-defs for stackmaps and patchpoints. The same reasoning applies		// as implicit-defs for stackmaps and patchpoints. The same reasoning applies
// to CTR, which is used by any indirect call.		// to CTR, which is used by any indirect call.
static const MCPhysReg ScratchRegs[] = {		static const MCPhysReg ScratchRegs[] = {
PPC::X12, PPC::LR8, PPC::CTR8, 0		PPC::X12, PPC::LR8, PPC::CTR8, 0
[...]

llvm/lib/Transforms/Scalar/Reassociate.cpp

[...]
continue;		continue;

unsigned FoundX = FindInOperandList(Ops, i, X);		unsigned FoundX = FindInOperandList(Ops, i, X);
if (FoundX == i)		if (FoundX == i)
continue;		continue;

// Remove X and -X from the operand list.		// Remove X and -X from the operand list.
if (Ops.size() == 2 &&		if (Ops.size() == 2 &&
(match(TheOp, m_Neg(m_Value())) || match(TheOp, m_FNeg(m_Value()))))		(match(TheOp, m_Neg(m_Value())) || match(TheOp, m_FNeg(m_Value()))))
		lebedev.ri: So we also prevent the transform for integers now? But there's no FMA for them?
return Constant::getNullValue(X->getType());		return Constant::getNullValue(X->getType());

// Remove X and ~X from the operand list.		// Remove X and ~X from the operand list.
if (Ops.size() == 2 && match(TheOp, m_Not(m_Value())))		if (Ops.size() == 2 && match(TheOp, m_Not(m_Value())))
return Constant::getAllOnesValue(X->getType());		return Constant::getAllOnesValue(X->getType());

Ops.erase(Ops.begin()+i);		Ops.erase(Ops.begin()+i);
if (i < FoundX)		if (i < FoundX)
--FoundX;		--FoundX;
else		else
--i; // Need to back up an extra one.		--i; // Need to back up an extra one.
Ops.erase(Ops.begin()+FoundX);		Ops.erase(Ops.begin()+FoundX);
++NumAnnihil;		++NumAnnihil;
--i; // Revisit element.		--i; // Revisit element.
e -= 2; // Removed two elements.		e -= 2; // Removed two elements.

// if X and ~X we append -1 to the operand list.		// if X and ~X we append -1 to the operand list.
if (match(TheOp, m_Not(m_Value()))) {		if (match(TheOp, m_Not(m_Value()))) {
Value *V = Constant::getAllOnesValue(X->getType());		Value *V = Constant::getAllOnesValue(X->getType());
Ops.insert(Ops.end(), ValueEntry(getRank(V), V));		Ops.insert(Ops.end(), ValueEntry(getRank(V), V));
e += 1;		e += 1;
}		}
}		}

		// On some target, add-mul operation is more preferable, common out mul factor
		// has no gain at all and breaks the add-mul folding, so bail out early.
		if (TTI->isProfitableToGenerateFMA(I))
		return nullptr;

// Scan the operand list, checking to see if there are any common factors		// Scan the operand list, checking to see if there are any common factors
// between operands. Consider something like A*A+A*B*C+D. We would like to		// between operands. Consider something like A*A+A*B*C+D. We would like to
// reassociate this to A*(A+B*C)+D, which reduces the number of multiplies.		// reassociate this to A*(A+B*C)+D, which reduces the number of multiplies.
// To efficiently find this, we count the number of times a factor occurs		// To efficiently find this, we count the number of times a factor occurs
// for any ADD operands that are MULs.		// for any ADD operands that are MULs.
DenseMap<Value*, unsigned> FactorOccurrences;		DenseMap<Value*, unsigned> FactorOccurrences;

// Keep track of each multiply we see, to avoid triggering on (X*4)+(X*4)		// Keep track of each multiply we see, to avoid triggering on (X*4)+(X*4)
[...]
++res.first->second.Score;		++res.first->second.Score;
}		}
}		}
}		}
}		}
}		}
}		}

PreservedAnalyses ReassociatePass::run(Function &F, FunctionAnalysisManager &) {		bool ReassociatePass::runImpl(Function &F, const TargetTransformInfo *_TTI) {
// Get the functions basic blocks in Reverse Post Order. This order is used by		// Get the functions basic blocks in Reverse Post Order. This order is used by
// BuildRankMap to pre calculate ranks correctly. It also excludes dead basic		// BuildRankMap to pre calculate ranks correctly. It also excludes dead basic
// blocks (it has been seen that the analysis in this pass could hang when		// blocks (it has been seen that the analysis in this pass could hang when
// analysing dead basic blocks).		// analysing dead basic blocks).
ReversePostOrderTraversal<Function *> RPOT(&F);		ReversePostOrderTraversal<Function *> RPOT(&F);

// Calculate the rank map for F.		// Calculate the rank map for F.
BuildRankMap(F, RPOT);		BuildRankMap(F, RPOT);

// Build the pair map before running reassociate.		// Build the pair map before running reassociate.
// Technically this would be more accurate if we did it after one round		// Technically this would be more accurate if we did it after one round
// of reassociation, but in practice it doesn't seem to help much on		// of reassociation, but in practice it doesn't seem to help much on
// real-world code, so don't waste the compile time running reassociate		// real-world code, so don't waste the compile time running reassociate
// twice.		// twice.
// If a user wants, they could expicitly run reassociate twice in their		// If a user wants, they could expicitly run reassociate twice in their
// pass pipeline for further potential gains.		// pass pipeline for further potential gains.
// It might also be possible to update the pair map during runtime, but the		// It might also be possible to update the pair map during runtime, but the
// overhead of that may be large if there's many reassociable chains.		// overhead of that may be large if there's many reassociable chains.
BuildPairMap(RPOT);		BuildPairMap(RPOT);

MadeChange = false;		MadeChange = false;
		TTI = _TTI;

// Traverse the same blocks that were analysed by BuildRankMap.		// Traverse the same blocks that were analysed by BuildRankMap.
for (BasicBlock *BI : RPOT) {		for (BasicBlock *BI : RPOT) {
assert(RankMap.count(&*BI) && "BB should be ranked.");		assert(RankMap.count(&*BI) && "BB should be ranked.");
// Optimize every instruction in the basic block.		// Optimize every instruction in the basic block.
for (BasicBlock::iterator II = BI->begin(), IE = BI->end(); II != IE;)		for (BasicBlock::iterator II = BI->begin(), IE = BI->end(); II != IE;)
if (isInstructionTriviallyDead(&*II)) {		if (isInstructionTriviallyDead(&*II)) {
EraseInst(&*II++);		EraseInst(&*II++);
[...]
}		}

// We are done with the rank map and pair map.		// We are done with the rank map and pair map.
RankMap.clear();		RankMap.clear();
ValueRankMap.clear();		ValueRankMap.clear();
for (auto &Entry : PairMap)		for (auto &Entry : PairMap)
Entry.clear();		Entry.clear();

if (MadeChange) {		return MadeChange;

		}

		PreservedAnalyses ReassociatePass::run(Function &F,
		FunctionAnalysisManager &AM) {
		const TargetTransformInfo *TTI = &AM.getResult<TargetIRAnalysis>(F);

		if (runImpl(F, TTI)) {
PreservedAnalyses PA;		PreservedAnalyses PA;
PA.preserveSet<CFGAnalyses>();		PA.preserveSet<CFGAnalyses>();
PA.preserve<AAManager>();		PA.preserve<AAManager>();
PA.preserve<BasicAA>();		PA.preserve<BasicAA>();
PA.preserve<GlobalsAA>();		PA.preserve<GlobalsAA>();
return PA;		return PA;
}		}

[...]

ReassociateLegacyPass() : FunctionPass(ID) {		ReassociateLegacyPass() : FunctionPass(ID) {
initializeReassociateLegacyPassPass(*PassRegistry::getPassRegistry());		initializeReassociateLegacyPassPass(*PassRegistry::getPassRegistry());
}		}

bool runOnFunction(Function &F) override {		bool runOnFunction(Function &F) override {
if (skipFunction(F))		if (skipFunction(F))
return false;		return false;
		const TargetTransformInfo *TTI =
		&getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);

FunctionAnalysisManager DummyFAM;		return Impl.runImpl(F, TTI);
auto PA = Impl.run(F, DummyFAM);
return !PA.areAllPreserved();
}		}

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.setPreservesCFG();		AU.setPreservesCFG();
AU.addPreserved<AAResultsWrapperPass>();		AU.addPreserved<AAResultsWrapperPass>();
AU.addPreserved<BasicAAWrapperPass>();		AU.addPreserved<BasicAAWrapperPass>();
AU.addPreserved<GlobalsAAWrapperPass>();		AU.addPreserved<GlobalsAAWrapperPass>();
		AU.addRequired<TargetTransformInfoWrapperPass>();
}		}
};		};

} // end anonymous namespace		} // end anonymous namespace

char ReassociateLegacyPass::ID = 0;		char ReassociateLegacyPass::ID = 0;

INITIALIZE_PASS(ReassociateLegacyPass, "reassociate",		INITIALIZE_PASS(ReassociateLegacyPass, "reassociate",
"Reassociate expressions", false, false)		"Reassociate expressions", false, false)

// Public interface to the Reassociate pass		// Public interface to the Reassociate pass
FunctionPass *llvm::createReassociatePass() {		FunctionPass *llvm::createReassociatePass() {
return new ReassociateLegacyPass();		return new ReassociateLegacyPass();
}		}
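
The change above moves the transformation into `runImpl`, which reports whether anything changed, and leaves both pass-manager entry points as thin wrappers that fetch `TargetTransformInfo` and forward it. A minimal standalone sketch of that shape follows. All `Mock*` types and the `PrefersFMA` flag are hypothetical stand-ins, not LLVM APIs; the actual TTI query added by the patch is not shown in this part of the diff.

```cpp
#include <string>

// Hypothetical stand-ins for illustration only; these are not LLVM types.
struct MockTTI { bool PrefersFMA; };
struct MockFunction { std::string Name; int NumCommonFactors; };
struct MockPreserved { bool AllPreserved; };

class MockReassociate {
public:
  // Core logic: returns true only if the function was modified.
  bool runImpl(MockFunction &F, const MockTTI *TTI) {
    // Assumed bail-out: skip factoring when the target prefers fma.
    if (TTI && TTI->PrefersFMA)
      return false;
    bool MadeChange = F.NumCommonFactors > 0;
    F.NumCommonFactors = 0; // pretend every common factor was pulled out
    return MadeChange;
  }

  // New-pass-manager style wrapper: maps "changed" to "not all preserved".
  MockPreserved run(MockFunction &F, const MockTTI &TTI) {
    return MockPreserved{!runImpl(F, &TTI)};
  }
};

// Legacy style wrapper: forwards the bool result unchanged.
bool runOnFunctionLegacy(MockReassociate &Impl, MockFunction &F,
                         const MockTTI &TTI) {
  return Impl.runImpl(F, &TTI);
}
```

With this split, the legacy wrapper no longer needs a dummy analysis manager: it obtains the target information itself and calls the shared implementation directly.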

llvm/test/Transforms/Reassociate/PowerPC/lit.local.cfg

This file was added.

				if not 'PowerPC' in config.root.targets:
				    config.unsupported = True

llvm/test/Transforms/Reassociate/PowerPC/prefer-fma.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -reassociate -S | FileCheck %s

				target datalayout = "e-m:e-i64:64-n32:64"
				target triple = "powerpc64le-unknown-linux-gnu"

				; Check that on PowerPC targets we don't common out mul factors.

				; a * b + a * c -> a * (b + c) has no gain on PowerPC.
				; before transform: a * b + a * c -> fma(a*b, a, c)
				; after transform: a * (b + c) -> fmul(a, b + c)
				; fma, fmul and fadd all have the same latency on PowerPC.
				define double @foo(double %0, double %1, double %2) {
				; CHECK-LABEL: @foo(
				; CHECK-NEXT: [[TMP4:%.*]] = fmul double [[TMP0:%.*]], [[TMP1:%.*]]
				; CHECK-NEXT: [[TMP5:%.*]] = fmul double [[TMP0]], [[TMP2:%.*]]
				; CHECK-NEXT: [[TMP6:%.*]] = fadd double [[TMP4]], [[TMP5]]
				; CHECK-NEXT: ret double [[TMP6]]
				;
				%4 = fmul double %0, %1
				%5 = fmul double %0, %2
				%6 = fadd double %4, %5
				ret double %6
				}

				; a + 1.47 * b - 1.47 * c + 2.47 * d - 2.47 * e -> a + 1.47 * (b - c) + 2.47 * (d - e) also has no gain.
				; before transform: fma( fma( fma( fma(a, 1.47, b), -1.47, c), 2.47, d), -2.47, e)
				; after transform: fma( fma(a, 1.47, sub(b, c)), 2.47, sub(d, e))
				; fma, fmul and fsub all have the same latency on PowerPC. We also lose fma folding opportunities.
				define double @fmaChain(double %0, double %1, double %2, double %3, double %4) {
				; CHECK-LABEL: @fmaChain(
				; CHECK-NEXT: [[TMP6:%.*]] = fmul double [[TMP1:%.*]], 1.470000e+00
				; CHECK-NEXT: [[TMP7:%.*]] = fadd double [[TMP0:%.*]], [[TMP6]]
				; CHECK-NEXT: [[TMP8:%.*]] = fmul double [[TMP2:%.*]], 1.470000e+00
				; CHECK-NEXT: [[TMP9:%.*]] = fsub double [[TMP7]], [[TMP8]]
				; CHECK-NEXT: [[TMP10:%.*]] = fmul double [[TMP3:%.*]], 2.470000e+00
				; CHECK-NEXT: [[TMP11:%.*]] = fadd double [[TMP9]], [[TMP10]]
				; CHECK-NEXT: [[TMP12:%.*]] = fmul double [[TMP4:%.*]], 2.470000e+00
				; CHECK-NEXT: [[TMP13:%.*]] = fsub double [[TMP11]], [[TMP12]]
				; CHECK-NEXT: ret double [[TMP13]]
				;
				%6 = fmul double %1, 1.470000e+00
				%7 = fadd double %6, %0
				%8 = fmul double %2, 1.470000e+00
				%9 = fsub double %7, %8
				%10 = fmul double %3, 2.470000e+00
				%11 = fadd double %9, %10
				%12 = fmul double %4, 2.470000e+00
				%13 = fsub double %11, %12
				ret double %13
				}
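
The comments in the test above argue by counting operations. The sketch below spells out both forms of each expression in plain C++ so the counts can be read off directly. It is only an illustration: the function names are made up here, `std::fma(x, y, z)` computes `x * y + z` (the test comments put the addend first instead), and the premise that fma, fmul, fadd and fsub cost the same on PowerPC is taken from the patch description, not measured.

```cpp
#include <cmath>

// a*b + a*c kept unfactored: one fmul feeding one fma (2 operations).
double unfactored(double a, double b, double c) {
  double ab = a * b;         // fmul
  return std::fma(a, c, ab); // fma: a*c + ab
}

// a*(b + c) after factoring: one fadd and one fmul (2 operations, no fma).
double factored(double a, double b, double c) {
  double sum = b + c; // fadd
  return a * sum;     // fmul
}

// a + 1.47*b - 1.47*c + 2.47*d - 2.47*e kept unfactored: four chained fmas.
double chainUnfactored(double a, double b, double c, double d, double e) {
  double t = std::fma(1.47, b, a); // a + 1.47*b
  t = std::fma(-1.47, c, t);       // ... - 1.47*c
  t = std::fma(2.47, d, t);        // ... + 2.47*d
  return std::fma(-2.47, e, t);    // ... - 2.47*e
}

// a + 1.47*(b - c) + 2.47*(d - e) after factoring: two fsubs and two fmas.
double chainFactored(double a, double b, double c, double d, double e) {
  double t = std::fma(1.47, b - c, a); // fsub, then fma
  return std::fma(2.47, d - e, t);     // fsub, then fma
}
```

In both pairs the operation count is the same before and after factoring, which is the "no gain" claim; the factored forms simply trade fma candidates for separate add/sub instructions. The two forms can differ in the last bits of the result because they round at different points.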

Revision Contents

Diff 283826
