This is an archive of the discontinued LLVM Phabricator instance.

Fix for lost FastMathFlags in 4 optimizations.
Abandoned · Public

Authored by v_klochkov on Nov 8 2016, 5:47 PM.

Details

Reviewers
dberlin
Summary

Hello,

Please review these trivial fixes in 4 different optimizations
that occasionally and incorrectly lost FastMathFlags on FP operations.

I have more fixes for many cases in InstCombiner module,
but those are a bit more difficult and should be submitted for code-review separately.

Also, I added the developers whose lines I changed to the Subscribers list (I used svn blame to build that list).

Thank you,
Vyacheslav Klochkov

lib/Transforms/Scalar/GVN.cpp

Before GVN:  
  t1 = <fast> a+b
  store t1 to [mem1]
  ...
  t2 = load [mem1]
  use t2
After GVN:
  t1 = a+b // this operation must retain the <fast> flags!
  store t1 to [mem1]
  ...
  use t1 instead of the original use of t2 (where t2 = load from [mem1]).

The uses of t2 were replaced by uses of t1.
Unfortunately, both t1 and t2 can be cast to <FPMathOperator>
(the load can be cast as well because the result of the load is FP),
but it is clear that the fast-math flags of t1 and t2 must NOT be AND-ed here.
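A minimal sketch of the intended guard, with hypothetical names (patchFPFlags, ReplacedWith, and Old are illustrative, not the actual GVN code): when replacing the loaded value with the earlier stored value, merge flags only when both sides are genuine FP math operations, and skip the AND entirely when the value being replaced is a load, since a load carries no fast-math flags of its own.

  #include "llvm/IR/Instructions.h"
  #include "llvm/IR/Operator.h"
  using namespace llvm;

  // Hypothetical helper, illustrative only. `ReplacedWith` is t1 (the FP
  // add that stays) and `Old` is t2 (the load being removed).
  static void patchFPFlags(Instruction *ReplacedWith, Instruction *Old) {
    if (!isa<FPMathOperator>(ReplacedWith))
      return;
    // A load casts to FPMathOperator merely because its result type is FP;
    // it has no fast-math flags, so AND-ing with it would clear t1's flags.
    if (isa<LoadInst>(Old))
      return;
    if (isa<FPMathOperator>(Old))
      ReplacedWith->andIRFlags(Old);
  }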

lib/Transforms/Scalar/Reassociate.cpp

Before:
  ((a*b)*c)*d // all MUL operations have <fast> math flags.
After:
  (a*b)*(c*d) // all MUL operations lost <fast> math flags.
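A minimal sketch of the idea behind the fix (illustrative names, not the actual Reassociate change): when the pass re-creates the multiply tree, propagate the fast-math flags of the original expression root to every instruction the builder emits, as majnemer also suggests in the inline comments below.

  #include "llvm/IR/IRBuilder.h"
  #include "llvm/IR/Operator.h"
  using namespace llvm;

  // Illustrative only: rebuild ((a*b)*c)*d as (a*b)*(c*d) without dropping
  // the fast-math flags of the original tree root `Root`.
  static Value *rebuildBalancedFMul(Instruction *Root, Value *A, Value *B,
                                    Value *C, Value *D) {
    IRBuilder<> Builder(Root);
    // Every instruction created below inherits the root's flags.
    Builder.setFastMathFlags(cast<FPMathOperator>(Root)->getFastMathFlags());
    Value *LHS = Builder.CreateFMul(A, B); // a*b, keeps <fast>
    Value *RHS = Builder.CreateFMul(C, D); // c*d, keeps <fast>
    return Builder.CreateFMul(LHS, RHS);   // (a*b)*(c*d), keeps <fast>
  }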

lib/Transforms/Vectorize/SLPVectorizer.cpp

FastMath flags were not propagated to a newly created FCmp operation.
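A minimal sketch of the pattern involved (illustrative names; the fix actually landed later as D26543): when a vector FCmp replaces a group of scalar compares, seed the new instruction with the first scalar's flags and AND in the rest, so the result is only as relaxed as every scalar compare allows.

  #include "llvm/ADT/ArrayRef.h"
  #include "llvm/IR/IRBuilder.h"
  #include "llvm/IR/Instructions.h"
  using namespace llvm;

  // Illustrative only: emit the vectorized compare and merge the fast-math
  // flags of all scalar compares it replaces.
  static Value *vectorizeFCmp(IRBuilder<> &Builder, CmpInst::Predicate Pred,
                              Value *VecLHS, Value *VecRHS,
                              ArrayRef<Instruction *> Scalars) {
    Value *V = Builder.CreateFCmp(Pred, VecLHS, VecRHS);
    if (auto *VecCmp = dyn_cast<FCmpInst>(V)) {
      VecCmp->copyFastMathFlags(Scalars.front());
      for (Instruction *S : Scalars.drop_front())
        VecCmp->andIRFlags(S); // keep only flags common to all scalars
    }
    return V;
  }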

lib/CodeGen/CodeGenPrepare.cpp

FastMath flags were not propagated to a newly created FCmp operation.
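The fix here follows the same pattern as the SLP sketch above: copy the fast-math flags of the original scalar FCmp onto the compare that CodeGenPrepare re-creates (for example via Instruction::copyFastMathFlags).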

Event Timeline

v_klochkov updated this revision to Diff 77294. Nov 8 2016, 5:47 PM
v_klochkov retitled this revision to Fix for lost FastMathFlags in 4 optimizations.
v_klochkov updated this object.

Hi,

The SLP changes look good to me, but can we also add a test for it?

Thanks,
Michael

davide added a subscriber: davide. Nov 9 2016, 3:47 PM

You can probably split these into independent patches to facilitate review. Also, each of these cases needs a test.

v_klochkov planned changes to this revision. Nov 10 2016, 9:54 AM

Ok, thank you for the response.
Initially I did not know how to test these cases, but now I have some ideas.
Hopefully the -run-pass switch will let me run just one pass and check that FastMathFlags do not get lost in the affected opt passes.

For SLP you can take a look at Transforms/SLPVectorizer/X86/propagate_ir_flags.ll. I think in your case the test would be very similar.

Michael

majnemer added inline comments. Nov 10 2016, 10:27 AM
llvm/lib/Transforms/Scalar/Reassociate.cpp
1783–1785

This would be more obvious if you did: Builder.setFastMathFlags(cast<FPMathOperator>(I)->getFastMathFlags())

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2425

auto *CI

2431–2432

Please consistently brace.

Thank you for the code review and comments.
I decided to follow Davide's advice and split this patch into 4 smaller, independent patches.

The first part (SLPVectorizer) was submitted here: https://reviews.llvm.org/D26543
The new SLPVectorizer patch is a bit different: the original patch was
not quite correct because it did not AND the IR flags of the scalar FCmp operations.

Thank you,
v_klochkov

v_klochkov abandoned this revision.Mar 9 2018, 3:51 PM

3 of the 4 optimizations mentioned here (Reassociate, SLP, and GVN) have been fixed by me in separate, smaller patches.

Only 1 optimization (CodeGenPrepare) has NOT been fixed yet. Sorry, I do not have the time and tools to do that now. Perhaps there are volunteers who can do it later.