This is an archive of the discontinued LLVM Phabricator instance.

Differential D77558

PowerPC: Don't hoist float multiply + add to fused operation on SPE
Closed, Public

Authored by jhibbits on Apr 6 2020, 8:12 AM.

Details

Reviewers

shchenz
nemanjai

Group Reviewers

Restricted Project

Summary

SPE doesn't have an fmadd instruction, so don't bother hoisting a
multiply and add sequence to this, as it'd become just a library call.
Hoisting happens too late for the CTR usability test to veto using the CTR
in a loop, and results in an assert "Invalid PPC CTR loop!".

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jhibbits created this revision. Apr 6 2020, 8:12 AM

Herald added a project: Restricted Project. Apr 6 2020, 8:12 AM

Herald added subscribers: llvm-commits, shchenz, kbarton and 2 others.

Harbormaster completed remote builds in B51955: Diff 255351. Apr 6 2020, 9:45 AM

Herald added a subscriber: wuzish. Apr 6 2020, 9:45 AM

lkail added a reviewer: shchenz. Apr 6 2020, 3:20 PM

This happens too late for the CTR usability test to veto using the CTR in a loop, and results in an assert "Invalid PPC CTR loop!".

Does this mean that not hoisting the fmul makes some case hit the "Invalid PPC CTR loop!" assertion? That is a little surprising to me. Whether or not the fmul is hoisted should not affect the CTR register.
Looking forward to your test case.

In D77558#1966207, @shchenz wrote:

This happens too late for the CTR usability test to veto using the CTR in a loop, and results in an assert "Invalid PPC CTR loop!".

Does this mean that not hoisting the fmul makes some case hit the "Invalid PPC CTR loop!" assertion? That is a little surprising to me. Whether or not the fmul is hoisted should not affect the CTR register.
Looking forward to your test case.

Sorry, on further reading of the summary I can see it is a little confusing. *Hoisting* fmul + fadd to fma results in a library call, because SPE doesn't have an fma instruction. Unfortunately, this transform is performed long after a loop is converted to a CTR-based bdnz loop, so it can't be caught at loop-transform time to block that conversion. In addition to triggering that assert, hoisting two instructions into a function call is quite a pessimization anyway :)

I'll update the summary to clarify this.
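The ordering problem described in this reply can be sketched with a toy model. This is only an illustration under stated assumptions, not LLVM code: the `Op` enum, `bodyAllowsCTRLoop`, and `fuseMulAdd` are hypothetical names, and the real CTR usability test looks at far more than calls. The model assumes the check runs first on the unfused body, and that a later combine turns fmul + fadd into a library call when the target claims FMA is profitable but has no hardware FMA.

```cpp
#include <cstddef>
#include <vector>

// Toy opcodes; illustrative only, not LLVM's real node kinds.
enum class Op { FMul, FAdd, FMA, LibCall, Other };

// Stand-in for the CTR usability test: in this model, any call in the loop
// body vetoes the bdnz form because a call may clobber CTR.
bool bodyAllowsCTRLoop(const std::vector<Op> &Body) {
  for (Op O : Body)
    if (O == Op::LibCall)
      return false;
  return true;
}

// Stand-in for the later combine: fuse each adjacent FMul, FAdd pair when the
// target claims FMA is profitable. Without a hardware FMA the fused node
// becomes a library call.
std::vector<Op> fuseMulAdd(const std::vector<Op> &Body, bool ClaimsFMAFaster,
                           bool HasHardwareFMA) {
  std::vector<Op> Out;
  for (std::size_t I = 0; I < Body.size(); ++I) {
    bool IsPair = ClaimsFMAFaster && Body[I] == Op::FMul &&
                  I + 1 < Body.size() && Body[I + 1] == Op::FAdd;
    if (IsPair) {
      Out.push_back(HasHardwareFMA ? Op::FMA : Op::LibCall);
      ++I; // skip the FAdd that was folded into the fused node
    } else {
      Out.push_back(Body[I]);
    }
  }
  return Out;
}
```

In this model, a body of `{Other, FMul, FAdd, Other}` passes the check before fusion, but once it is fused with `ClaimsFMAFaster = true` and `HasHardwareFMA = false` it contains a `LibCall` and would no longer pass. Answering "not faster" for SPE keeps the body unchanged, so the earlier decision stays valid.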

jhibbits edited the summary of this revision. Apr 7 2020, 7:06 AM

adalava added a subscriber: adalava. Jul 5 2022, 12:20 PM

Herald added a project: Restricted Project. Jul 5 2022, 12:20 PM

I think this is the right thing to do regardless of whether it affects CTR loops or not. The FMA is never faster with SPE and that should be marked as such. But yes, we still need a test case.
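As a rough, standalone model of what "marked as such" means for the profitability hook, the following sketch mirrors the decision logic shown in the diff at the end of this page. `FPKind`, `SubtargetModel`, and `isFMAFasterModel` are hypothetical stand-ins for the real LLVM types and method, and `EnableQuadPrecision` is passed as a plain parameter here rather than read from an option.

```cpp
// Hypothetical stand-ins for the real LLVM types; only the decision logic
// follows the patch under review.
enum class FPKind { Float, Double, FP128, Other };

struct SubtargetModel {
  bool HasSPE = false;
  bool HasP9Vector = false;
};

// Returns true when fusing fmul + fadd into an FMA is considered profitable.
bool isFMAFasterModel(const SubtargetModel &ST, FPKind Kind,
                      bool EnableQuadPrecision) {
  // SPE has no fused multiply-add, so fusing is never profitable there.
  if (ST.HasSPE)
    return false;
  switch (Kind) {
  case FPKind::Float:
  case FPKind::Double:
    return true;
  case FPKind::FP128:
    return EnableQuadPrecision && ST.HasP9Vector;
  case FPKind::Other:
    return false;
  }
  return false;
}
```

The early return comes before the type switch, so in this model an SPE subtarget answers "not faster" for every type, independent of any CTR loop concerns.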

Add test case into fma-assoc test. This appears to be the simplest test to validate the change.

Harbormaster completed remote builds in B174032: Diff 442752. Jul 6 2022, 7:46 PM

@nemanjai Any further comment on this, or is it good to go?

LGTM.

This revision is now accepted and ready to land. Aug 9 2022, 8:01 AM

https://github.com/llvm/llvm-project/commit/f43b2285815961da057af1a772bc31d0152d286b

Revision Contents

Path: llvm/lib/Target/PowerPC/PPCISelLowering.cpp
Size: 2 lines

Diff 255351

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

 bool PPCTargetLowering::isFMAFasterThanFMulAndFAdd(const MachineFunction &MF,
                                                    EVT VT) const {
   return isFMAFasterThanFMulAndFAdd(
       MF.getFunction(), VT.getTypeForEVT(MF.getFunction().getContext()));
 }

 bool PPCTargetLowering::isFMAFasterThanFMulAndFAdd(const Function &F,
                                                    Type *Ty) const {
+  if (Subtarget.hasSPE())
+    return false;
   switch (Ty->getScalarType()->getTypeID()) {
   case Type::FloatTyID:
   case Type::DoubleTyID:
     return true;
   case Type::FP128TyID:
     return EnableQuadPrecision && Subtarget.hasP9Vector();
   default:
     return false;