This is an archive of the discontinued LLVM Phabricator instance.

Prevent hoisting fmul from THEN/ELSE to IF if there is fmsub/fmadd opportunity
ClosedPublic

Authored by • hulx2000 on Jan 30 2015, 10:06 AM.

Details

Reviewers

Jiangning
hfinkel
mcrosier
apazos

Summary

For the following code:

if (y == 0)

{
y = -x * a;
b = a;
}

else

{
y -= x * a;
a = b;
}

HoistThenElseCodeToIf in SimplifyCFG.cpp hoists the multiply out of the THEN/ELSE blocks, which stops the compiler from forming fmsub later. This is not always a wise thing to do, because targets that support fused instructions can no longer take advantage of them; on some targets, fmsub/fmadd is more efficient than fmul + fadd/fsub. In addition, looking at the code generated by gcc, I can see that gcc does not do this kind of hoisting, which leads to better performance on the SPEC benchmark I was analyzing.

This patch teaches HoistThenElseCodeToIf to stop doing that kind of hoisting by using a target hook that checks for fmadd/fmsub opportunities. It improved spec2006/soplex by 2% consistently on a cortex-a57 core, with no significant gain for the other spec2000/spec2006 benchmarks.
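To make the trade-off in the summary concrete, here is a minimal sketch of the two shapes of the example code, with std::fma standing in for the fused instruction. The function names unhoisted, hoisted and checkShapesAgree are illustrative only and are not part of the patch; which machine instructions a compiler actually emits depends on the target and on the floating-point options in effect.

```cpp
#include <cassert>
#include <cmath>

// Shape the patch tries to preserve: the multiply stays next to its
// add/sub user inside each branch, so a backend may fuse the pair.
double unhoisted(double x, double a, double y) {
  if (y == 0)
    return -(x * a);          // negated multiply
  return std::fma(-x, a, y);  // y - x * a as one fused operation
}

// Shape after hoisting: the multiply is computed once before the branch,
// leaving a separate subtract in the else path.
double hoisted(double x, double a, double y) {
  double t = x * a;
  if (y == 0)
    return -t;
  return y - t;
}

// For inputs whose product is exactly representable, both shapes agree.
void checkShapesAgree() {
  assert(unhoisted(2.0, 3.0, 0.0) == hoisted(2.0, 3.0, 0.0));
  assert(unhoisted(2.0, 3.0, 10.0) == hoisted(2.0, 3.0, 10.0));
}
```

When the product is not exactly representable, the two shapes can differ in the last bit, because the fused form rounds once while the separate multiply and subtract round twice.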

Diff Detail

Repository: rL LLVM

Event Timeline

• hulx2000 updated this revision to Diff 19043.Jan 30 2015, 10:06 AM

• hulx2000 retitled this revision from to Prevent hoisting fmul from THEN/ELSE to IF if there is fmsub/fmadd opportunity.

• hulx2000 updated this object.

• hulx2000 edited the test plan for this revision. (Show Details)

• hulx2000 set the repository for this revision to rL LLVM.

• hulx2000 updated this object.

• hulx2000 updated this object.Jan 30 2015, 10:21 AM

• hulx2000 added a subscriber: Unknown Object (MLST).

• hulx2000 updated this object.Jan 30 2015, 10:25 AM

hfinkel added a subscriber: hfinkel.Jan 30 2015, 11:15 AM

hfinkel added inline comments.

include/llvm/Analysis/TargetTransformInfo.h
339	This name is pretty long. How about just isProfitableToHoist?
lib/Target/AArch64/AArch64ISelLowering.cpp
6726	I think you can make this the default implementation. PPC will want essentially the same implementation.
6731	Space before Only.
6743	Space after VT (you should probably run clang-format on the entire patch)

• hulx2000 added reviewers: apazos, mcrosier, Jiangning, hfinkel.Jan 30 2015, 11:17 AM

• hulx2000 removed a subscriber: hfinkel.

Made it the default as Hal suggested. The unit test remains under AArch64; whoever wants to support other targets, please add tests.

Adjusting for my comment below, LGTM.

include/llvm/Target/TargetLowering.h
27	You don't want enableAggressiveFMAFusion(VT) here. enableAggressiveFMAFusion(VT) affects whether we combine to form an FMA even when the fmul has more than one use, combine FMAs with other nodes to form multiple FMAs, whether we combine them through fpext nodes, etc. -- but it does not affect fusion legality (and, thus, whether a mul will be combined with a fsub/fadd to form an FMA).

This revision is now accepted and ready to land.Feb 3 2015, 4:36 PM

removed the check of enableAggressiveFMAFusion

LGTM.

Lawrence,
The patch does not apply cleanly to top of trunk. Please rebase and upload a new patch. Once complete, I'd be happy to commit the patch on your behalf.

Chad

• hulx2000 edited edge metadata.Feb 19 2015, 11:08 AM

• hulx2000 set the repository for this revision to rL LLVM.

rebased patch

Committed in r230241.

IMO this patch should not have been committed.

lib/Target/AArch64/AArch64ISelLowering.cpp
6540	I think that this is only avoiding a bigger issue: instruction selection currently happens on a single basic block. A better fix would be a machine pass that pairs multiplies with add/sub to form more FMAs.
6561	If both the multiply and the add/sub it feeds into can be hoisted, this code will still prevent the multiplies from being hoisted.

• hulx2000 added inline comments.Apr 8 2016, 3:18 PM

lib/Target/AArch64/AArch64ISelLowering.cpp
6540	Could you please explain what you mean by avoiding a bigger issue?
6561	I do agree about that.

sebpop added inline comments.Apr 8 2016, 6:37 PM

lib/Target/AArch64/AArch64ISelLowering.cpp
6540	Instruction selection happens on a single basic block, and it cannot form FMAs across BBs. The long term solution is to have an instruction selection that can see further than a BB. Until then probably we could have an MI pass that matches FMAs across BB boundaries.

sebpop mentioned this in D100963: [AArch64] Revert "Prevent hoisting fmul from THEN/ELSE to IF if there is fmsub/fmadd opportunity.".Apr 21 2021, 8:23 AM

Revision Contents

Path

Size

include/

llvm/

Analysis/

TargetTransformInfo.h

8 lines

TargetTransformInfoImpl.h

2 lines

CodeGen/

BasicTTIImpl.h

4 lines

Target/

TargetLowering.h

2 lines

lib/

Analysis/

TargetTransformInfo.cpp

4 lines

Target/

AArch64/

AArch64ISelLowering.h

3 lines

AArch64ISelLowering.cpp

28 lines

Transforms/

Utils/

SimplifyCFG.cpp

8 lines

test/

Transforms/

SimplifyCFG/

AArch64/

lit.local.cfg

5 lines

prefer-fma.ll

72 lines

Diff 20320

include/llvm/Analysis/TargetTransformInfo.h

Context not available.
	/// by referencing its sub-register AX.	/// by referencing its sub-register AX.
	bool isTruncateFree(Type *Ty1, Type *Ty2) const;	bool isTruncateFree(Type *Ty1, Type *Ty2) const;

		/// \brief Return true if it is profitable to hoist instruction in the
		/// then/else to before if.
		bool isProfitableToHoist(Instruction *I) const;

	/// \brief Return true if this type is legal.	/// \brief Return true if this type is legal.
	bool isTypeLegal(Type *Ty) const;	bool isTypeLegal(Type *Ty) const;

Context not available.
	int64_t BaseOffset, bool HasBaseReg,	int64_t BaseOffset, bool HasBaseReg,
	int64_t Scale) = 0;	int64_t Scale) = 0;
	virtual bool isTruncateFree(Type *Ty1, Type *Ty2) = 0;	virtual bool isTruncateFree(Type *Ty1, Type *Ty2) = 0;
		virtual bool isProfitableToHoist(Instruction *I) = 0;
	virtual bool isTypeLegal(Type *Ty) = 0;	virtual bool isTypeLegal(Type *Ty) = 0;
	virtual unsigned getJumpBufAlignment() = 0;	virtual unsigned getJumpBufAlignment() = 0;
	virtual unsigned getJumpBufSize() = 0;	virtual unsigned getJumpBufSize() = 0;
Context not available.
	bool isTruncateFree(Type *Ty1, Type *Ty2) override {	bool isTruncateFree(Type *Ty1, Type *Ty2) override {
	return Impl.isTruncateFree(Ty1, Ty2);	return Impl.isTruncateFree(Ty1, Ty2);
	}	}
		bool isProfitableToHoist(Instruction *I) override {
		return Impl.isProfitableToHoist(I);
		}
	bool isTypeLegal(Type *Ty) override { return Impl.isTypeLegal(Ty); }	bool isTypeLegal(Type *Ty) override { return Impl.isTypeLegal(Ty); }
	unsigned getJumpBufAlignment() override { return Impl.getJumpBufAlignment(); }	unsigned getJumpBufAlignment() override { return Impl.getJumpBufAlignment(); }
	unsigned getJumpBufSize() override { return Impl.getJumpBufSize(); }	unsigned getJumpBufSize() override { return Impl.getJumpBufSize(); }
Context not available.

include/llvm/Analysis/TargetTransformInfoImpl.h

Context not available.

	bool isTruncateFree(Type *Ty1, Type *Ty2) { return false; }	bool isTruncateFree(Type *Ty1, Type *Ty2) { return false; }

		bool isProfitableToHoist(Instruction *I) { return true; }

	bool isTypeLegal(Type *Ty) { return false; }	bool isTypeLegal(Type *Ty) { return false; }

	unsigned getJumpBufAlignment() { return 0; }	unsigned getJumpBufAlignment() { return 0; }
Context not available.

include/llvm/CodeGen/BasicTTIImpl.h

Context not available.
	return getTLI()->isTruncateFree(Ty1, Ty2);	return getTLI()->isTruncateFree(Ty1, Ty2);
	}	}

		bool isProfitableToHoist(Instruction *I) {
		return getTLI()->isProfitableToHoist(I);
		}

	bool isTypeLegal(Type *Ty) {	bool isTypeLegal(Type *Ty) {
	EVT VT = getTLI()->getValueType(Ty);	EVT VT = getTLI()->getValueType(Ty);
	return getTLI()->isTypeLegal(VT);	return getTLI()->isTypeLegal(VT);
Context not available.

include/llvm/Target/TargetLowering.h

Context not available.
	return false;	return false;
	}	}

		virtual bool isProfitableToHoist(Instruction *I) const { return true; }

	/// Return true if any actual instruction that defines a value of type Ty1	/// Return true if any actual instruction that defines a value of type Ty1
	/// implicitly zero-extends the value to Ty2 in the result register.	/// implicitly zero-extends the value to Ty2 in the result register.
	///	///
Context not available.

lib/Analysis/TargetTransformInfo.cpp

Context not available.
	return TTIImpl->isTruncateFree(Ty1, Ty2);	return TTIImpl->isTruncateFree(Ty1, Ty2);
	}	}

		bool TargetTransformInfo::isProfitableToHoist(Instruction *I) const {
		return TTIImpl->isProfitableToHoist(I);
		}

	bool TargetTransformInfo::isTypeLegal(Type *Ty) const {	bool TargetTransformInfo::isTypeLegal(Type *Ty) const {
	return TTIImpl->isTypeLegal(Ty);	return TTIImpl->isTypeLegal(Ty);
	}	}
Context not available.

lib/Target/AArch64/AArch64ISelLowering.h

Context not available.
	#include "llvm/CodeGen/CallingConvLower.h"	#include "llvm/CodeGen/CallingConvLower.h"
	#include "llvm/CodeGen/SelectionDAG.h"	#include "llvm/CodeGen/SelectionDAG.h"
	#include "llvm/IR/CallingConv.h"	#include "llvm/IR/CallingConv.h"
		#include "llvm/IR/Instruction.h"
	#include "llvm/Target/TargetLowering.h"	#include "llvm/Target/TargetLowering.h"

	namespace llvm {	namespace llvm {
Context not available.
	bool isTruncateFree(Type *Ty1, Type *Ty2) const override;	bool isTruncateFree(Type *Ty1, Type *Ty2) const override;
	bool isTruncateFree(EVT VT1, EVT VT2) const override;	bool isTruncateFree(EVT VT1, EVT VT2) const override;

		bool isProfitableToHoist(Instruction *I) const override;

	bool isZExtFree(Type *Ty1, Type *Ty2) const override;	bool isZExtFree(Type *Ty1, Type *Ty2) const override;
	bool isZExtFree(EVT VT1, EVT VT2) const override;	bool isZExtFree(EVT VT1, EVT VT2) const override;
	bool isZExtFree(SDValue Val, EVT VT2) const override;	bool isZExtFree(SDValue Val, EVT VT2) const override;
Context not available.

lib/Target/AArch64/AArch64ISelLowering.cpp

Context not available.
	return NumBits1 > NumBits2;	return NumBits1 > NumBits2;
	}	}

		/// Check if it is profitable to hoist instruction in then/else to if.
		/// Not profitable if I and its user can form an FMA instruction
		/// because we prefer FMSUB/FMADD.
		bool AArch64TargetLowering::isProfitableToHoist(Instruction *I) const {
		if (I->getOpcode() != Instruction::FMul)
		return true;

		if (I->getNumUses() != 1)
		return true;

		Instruction *User = I->user_back();

		if (User &&
		!(User->getOpcode() == Instruction::FSub ||
		User->getOpcode() == Instruction::FAdd))
		return true;

		const TargetOptions &Options = getTargetMachine().Options;
		EVT VT = getValueType(User->getOperand(0)->getType());

		if (isFMAFasterThanFMulAndFAdd(VT) &&
		isOperationLegalOrCustom(ISD::FMA, VT) &&
		(Options.AllowFPOpFusion == FPOpFusion::Fast || Options.UnsafeFPMath))
		return false;

		return true;
		}
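
The decision made by the hook above can be modelled outside of LLVM. The following is a simplified sketch, not the patch itself: Op, ToyInst and toyIsProfitableToHoist are hypothetical stand-ins for LLVM's Instruction API, and the three target conditions (FMA faster than fmul + fadd, FMA legal or custom for the type, fusion permitted by the FP options) are collapsed into a single fusionAllowed flag.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for LLVM's Instruction class and opcode enum.
enum class Op { FMul, FAdd, FSub, Load, Other };

struct ToyInst {
  Op opcode;
  std::vector<const ToyInst *> users;  // instructions that consume this value
};

// Follows the structure of the hook: only a single-use fmul feeding an
// fadd/fsub is held back, and only when fusion is allowed (one flag here
// replaces the target and option queries of the real code).
bool toyIsProfitableToHoist(const ToyInst &I, bool fusionAllowed) {
  if (I.opcode != Op::FMul)
    return true;
  if (I.users.size() != 1)
    return true;
  const ToyInst *User = I.users.back();
  if (User->opcode != Op::FAdd && User->opcode != Op::FSub)
    return true;
  return !fusionAllowed;
}
```

As in the patch, the check looks only at the multiply and its single user, so it reports the multiply as unprofitable to hoist even when the add/sub it feeds could have been hoisted along with it.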

	// All 32-bit GPR operations implicitly zero the high-half of the corresponding	// All 32-bit GPR operations implicitly zero the high-half of the corresponding
	// 64-bit GPR.	// 64-bit GPR.
	bool AArch64TargetLowering::isZExtFree(Type *Ty1, Type *Ty2) const {	bool AArch64TargetLowering::isZExtFree(Type *Ty1, Type *Ty2) const {
Context not available.

lib/Transforms/Utils/SimplifyCFG.cpp

Context not available.
	/// HoistThenElseCodeToIf - Given a conditional branch that goes to BB1 and	/// HoistThenElseCodeToIf - Given a conditional branch that goes to BB1 and
	/// BB2, hoist any common code in the two blocks up into the branch block. The	/// BB2, hoist any common code in the two blocks up into the branch block. The
	/// caller of this function guarantees that BI's block dominates BB1 and BB2.	/// caller of this function guarantees that BI's block dominates BB1 and BB2.
	static bool HoistThenElseCodeToIf(BranchInst *BI, const DataLayout *DL) {	static bool HoistThenElseCodeToIf(BranchInst *BI, const DataLayout *DL,
		const TargetTransformInfo &TTI) {
	// This does very trivial matching, with limited scanning, to find identical	// This does very trivial matching, with limited scanning, to find identical
	// instructions in the two blocks. In particular, we don't want to get into	// instructions in the two blocks. In particular, we don't want to get into
	// O(M*N) situations here where M and N are the sizes of BB1 and BB2. As	// O(M*N) situations here where M and N are the sizes of BB1 and BB2. As
Context not available.
	if (isa<TerminatorInst>(I1))	if (isa<TerminatorInst>(I1))
	goto HoistTerminator;	goto HoistTerminator;

		if (!TTI.isProfitableToHoist(I1) || !TTI.isProfitableToHoist(I2))
		return Changed;

	// For a normal instruction, we just move one to right before the branch,	// For a normal instruction, we just move one to right before the branch,
	// then replace all uses of the other with the first. Finally, we remove	// then replace all uses of the other with the first. Finally, we remove
	// the now redundant second instruction.	// the now redundant second instruction.
Context not available.
	// can hoist it up to the branching block.	// can hoist it up to the branching block.
	if (BI->getSuccessor(0)->getSinglePredecessor()) {	if (BI->getSuccessor(0)->getSinglePredecessor()) {
	if (BI->getSuccessor(1)->getSinglePredecessor()) {	if (BI->getSuccessor(1)->getSinglePredecessor()) {
	if (HoistThenElseCodeToIf(BI, DL))	if (HoistThenElseCodeToIf(BI, DL, TTI))
	return SimplifyCFG(BB, TTI, BonusInstThreshold, DL, AC) | true;	return SimplifyCFG(BB, TTI, BonusInstThreshold, DL, AC) | true;
	} else {	} else {
	// If Successor #1 has multiple preds, we may be able to conditionally	// If Successor #1 has multiple preds, we may be able to conditionally
Context not available.
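
The effect of the new check in the hoisting loop can be illustrated with a toy model. This is a heavily simplified sketch under stated assumptions: instructions are plain strings, toyCountHoistable is a hypothetical name, and the real HoistThenElseCodeToIf does considerably more work (terminator handling, for example). The point it shows is that the scan walks both blocks in lockstep and stops at the first pair the target reports as unprofitable, keeping whatever was hoisted before that point.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy model: each block is a list of instruction strings. Returns how many
// leading instructions would be hoisted into the branch block.
std::size_t toyCountHoistable(
    const std::vector<std::string> &thenBlock,
    const std::vector<std::string> &elseBlock,
    const std::function<bool(const std::string &)> &isProfitable) {
  std::size_t hoisted = 0;
  while (hoisted < thenBlock.size() && hoisted < elseBlock.size()) {
    const std::string &i1 = thenBlock[hoisted];
    const std::string &i2 = elseBlock[hoisted];
    if (i1 != i2)
      break;  // only identical instructions are candidates
    if (!isProfitable(i1) || !isProfitable(i2))
      break;  // the added check: stop here, keep what was already hoisted
    ++hoisted;
  }
  return hoisted;
}
```

With a predicate that always returns true, the common load and fmul at the top of two blocks are both hoisted; with a predicate that rejects fmul, only the load is hoisted and the multiply stays next to its user in each block.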

test/Transforms/SimplifyCFG/AArch64/lit.local.cfg

This file was added.

				config.suffixes = ['.ll']

				targets = set(config.root.targets_to_build.split())
				if not 'AArch64' in targets:
				config.unsupported = True

test/Transforms/SimplifyCFG/AArch64/prefer-fma.ll

This file was added.

				; RUN: opt < %s -mtriple=aarch64-linux-gnu -simplifycfg -enable-unsafe-fp-math -S >%t
				; RUN: FileCheck %s < %t
				; ModuleID = 't.cc'

				; Function Attrs: nounwind
				define double @_Z3fooRdS_S_S_(double* dereferenceable(8) %x, double* dereferenceable(8) %y, double* dereferenceable(8) %a) #0 {
				entry:
				%0 = load double* %y, align 8
				%cmp = fcmp oeq double %0, 0.000000e+00
				%1 = load double* %x, align 8
				br i1 %cmp, label %if.then, label %if.else

				; fadd (const, (fmul x, y))
				if.then: ; preds = %entry
				; CHECK-LABEL: if.then:
				; CHECK: %3 = fmul fast double %1, %2
				; CHECK-NEXT: %mul = fadd fast double 1.000000e+00, %3
				%2 = load double* %a, align 8
				%3 = fmul fast double %1, %2
				%mul = fadd fast double 1.000000e+00, %3
				store double %mul, double* %y, align 8
				br label %if.end

				; fsub ((fmul x, y), z)
				if.else: ; preds = %entry
				; CHECK-LABEL: if.else:
				; CHECK: %mul1 = fmul fast double %1, %2
				; CHECK-NEXT: %sub1 = fsub fast double %mul1, %0
				%4 = load double* %a, align 8
				%mul1 = fmul fast double %1, %4
				%sub1 = fsub fast double %mul1, %0
				store double %sub1, double* %y, align 8
				br label %if.end

				if.end: ; preds = %if.else, %if.then
				%5 = load double* %y, align 8
				%cmp2 = fcmp oeq double %5, 2.000000e+00
				%6 = load double* %x, align 8
				br i1 %cmp2, label %if.then2, label %if.else2

				; fsub (x, (fmul y, z))
				if.then2: ; preds = %entry
				; CHECK-LABEL: if.then2:
				; CHECK: %7 = fmul fast double %5, 3.000000e+00
				; CHECK-NEXT: %mul2 = fsub fast double %6, %7
				%7 = load double* %a, align 8
				%8 = fmul fast double %6, 3.0000000e+00
				%mul2 = fsub fast double %7, %8
				store double %mul2, double* %y, align 8
				br label %if.end2

				; fsub (fneg((fmul x, y)), const)
				if.else2: ; preds = %entry
				; CHECK-LABEL: if.else2:
				; CHECK: %mul3 = fmul fast double %5, 3.000000e+00
				; CHECK-NEXT: %neg = fsub fast double 0.000000e+00, %mul3
				; CHECK-NEXT: %sub2 = fsub fast double %neg, 3.000000e+00
				%mul3 = fmul fast double %6, 3.0000000e+00
				%neg = fsub fast double 0.0000000e+00, %mul3
				%sub2 = fsub fast double %neg, 3.0000000e+00
				store double %sub2, double* %y, align 8
				br label %if.end2

				if.end2: ; preds = %if.else, %if.then
				%9 = load double* %x, align 8
				%10 = load double* %y, align 8
				%add = fadd fast double %9, %10
				%11 = load double* %a, align 8
				%add2 = fadd fast double %add, %11
				ret double %add2
				}