This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

- lib/Target/AArch64/AArch64ISelLowering.h
- lib/Target/AArch64/AArch64ISelLowering.cpp (2 inline comments)
- test/CodeGen/AArch64/fma-aggressive.ll (11 inline comments)
- test/CodeGen/AArch64/fma-simple.ll (1 inline comment)

Differential D40696

Enable aggressive FMA on T99 and provide AArch64 option for other micro-arch's
Closed, Public

Authored by steleman on Nov 30 2017, 5:19 PM.


Details

Reviewers

joel_k_jones
fhahn
evandro
mcrosier

Commits

rG0715092c65ba: [AArch64] Enable aggressive FMA on T99 and provide AArch64 options for others.
rL323474: [AArch64] Enable aggressive FMA on T99 and provide AArch64 options for others.

Summary

This patch enables aggressive FMA by default on T99, and provides a -mllvm option to enable the same on other AArch64 micro-architectures (-mllvm -aarch64-enable-aggressive-fma).

A test case demonstrating the effects on T99 is included.

Diff Detail

Repository: rL LLVM

Event Timeline

steleman created this revision.Nov 30 2017, 5:19 PM

Herald added subscribers: kristof.beyls, javed.absar, rengolin, aemerson. Nov 30 2017, 5:20 PM

steleman mentioned this in D40177: performance improvements for ThunderX2 T99. Nov 30 2017, 5:23 PM

My points from D40177 still stand.

In D40696#941537, @MatzeB wrote:

My points from D40177 still stand.

Your points from D40177 require unnecessary changes to too many files, and do not prevent changes to AArch64ISelLowering.cpp whenever a new micro-arch wants to enable aggressive FMA.

Furthermore, every single time a new micro-arch wants to enable aggressive FMA, they will always have to edit two files, AArch64.td and AArch64ISelLowering.cpp, instead of just adding a new line to the existing switch() statement in AArch64TargetLowering::enableAggressiveFMAFusion().

In D40696#941556, @steleman wrote:

In D40696#941537, @MatzeB wrote:

My points from D40177 still stand.

Your points from D40177 require unnecessary changes to too many files, and do not prevent changes to AArch64ISelLowering.cpp whenever a new micro-arch wants to enable aggressive FMA.

Furthermore, every single time a new micro-arch wants to enable aggressive FMA, they will always have to edit two files, AArch64.td and AArch64ISelLowering.cpp, instead of just adding a new line to the existing switch() statement in AArch64TargetLowering::enableAggressiveFMAFusion().

You would not need to edit AArch64ISelLowering again after adding a subtarget feature.
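To make the two positions concrete, here is a toy C++ model. Every name in it (`CPU`, `cpuFeatureTable`, `Subtarget`, the two `enableAggressiveFMAVia*` functions) is hypothetical and is not LLVM's actual API. The first function keeps the policy in a switch on the processor family, as in the original diff; the second only reads a feature bit that a per-CPU table (standing in for AArch64.td) sets.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-ins for LLVM types; not the real AArch64Subtarget API.
enum class CPU { Generic, ThunderX2T99, NewCore };

// Approach 1 (original diff): the policy lives in the lowering hook as a
// switch on the processor family, so every new CPU means editing this code.
bool enableAggressiveFMAViaSwitch(CPU Proc, bool IsFloatingPoint,
                                  bool CommandLineFlag) {
  switch (Proc) {
  case CPU::ThunderX2T99:
    return IsFloatingPoint; // always on for T99
  default:
    return IsFloatingPoint && CommandLineFlag;
  }
}

// Approach 2 (reviewer suggestion): each CPU declares its features in a
// table (playing the role of AArch64.td); the hook only tests one bit.
const std::map<CPU, std::set<std::string>> &cpuFeatureTable() {
  static const std::map<CPU, std::set<std::string>> Table = {
      {CPU::Generic, {}},
      {CPU::ThunderX2T99, {"aggressive-fma"}},
      {CPU::NewCore, {"aggressive-fma"}}, // new CPU: a table edit only
  };
  return Table;
}

struct Subtarget {
  bool HasAggressiveFMA = false;
  explicit Subtarget(CPU Proc) {
    const auto &Features = cpuFeatureTable().at(Proc);
    HasAggressiveFMA = Features.count("aggressive-fma") != 0;
  }
};

bool enableAggressiveFMAViaFeature(const Subtarget &ST, bool IsFloatingPoint) {
  return IsFloatingPoint && ST.HasAggressiveFMA;
}
```

With the switch version, `NewCore` falls into the `default:` case until someone edits the function; with the table version it picks up the behaviour from its table entry alone.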

fhahn added a subscriber: fhahn.Dec 1 2017, 1:27 AM

fhahn added inline comments.Dec 1 2017, 8:00 AM

test/CodeGen/AArch64/fma-aggressive.ll
2	I think it would be good to also add a check that aggressive FMA is not enabled for other micro-architectures.
324	Can we drop the metadata from here to the end, or is it required for the test?

steleman added inline comments.Dec 1 2017, 8:09 AM

test/CodeGen/AArch64/fma-aggressive.ll
2	OK I'll add another test named 'fma-not-aggressive.ll' :-)
324	I'll check and if it works without the metadata, I'll remove it.

fhahn added inline comments.Dec 1 2017, 9:21 AM

test/CodeGen/AArch64/fma-aggressive.ll
2	Thanks. You should be able to just use a `check-prefix` in the same file, e.g. as in https://github.com/llvm-mirror/llvm/blob/master/test/CodeGen/AArch64/preferred-function-alignment.ll
324	Great, thanks. On second look, I think there is no need for the test case to be that complicated; for example, do we need all those global variables, function calls, ifs, and loops? AFAIK DAGCombine only looks at nodes and uses and does not have a cost model tied to loop iterations. Unless I am missing something, it should be enough to have a couple of basic blocks with fadd/fmul instructions to test this, e.g. fadd/fmul with multiple users, or the more complex patterns guarded by `if (Aggressive)` in `visitFADDForFMACombine`. A much simpler test case will make it easier to debug if it fails in the future, and also easier to understand when reviewing whether it does the right thing.

steleman added inline comments.Dec 4 2017, 7:54 AM

test/CodeGen/AArch64/fma-aggressive.ll

324

It's not as simple as you think.

DAGCombiner.cpp:9636

// fold (fmul (fadd x, +1.0), y) -> (fma x, y, y)
// fold (fmul (fadd x, -1.0), y) -> (fma x, y, (fneg y))
auto FuseFADD = [&](SDValue X, SDValue Y) {
  if (X.getOpcode() == ISD::FADD && (Aggressive || X->hasOneUse())) {
    auto XC1 = isConstOrConstSplatFP(X.getOperand(1));
    if (XC1 && XC1->isExactlyValue(+1.0))
      return DAG.getNode(PreferredFusedOpcode, SL, VT, X.getOperand(0), Y, Y);
    if (XC1 && XC1->isExactlyValue(-1.0))
      return DAG.getNode(PreferredFusedOpcode, SL, VT, X.getOperand(0), Y,
                         DAG.getNode(ISD::FNEG, SL, VT, Y));
  }
  return SDValue();
};

Note:

if (X.getOpcode() == ISD::FADD && (Aggressive || X->hasOneUse()))

What we want to do here is make sure that X has more than one use, i.e. that X->hasOneUse() evaluates to false. That's the only way of testing that FMA happens when Aggressive is true.

In order to make sure that X ends up having more than one use, the code needs to be complicated.

This code isn't lifted from some existing program. I wrote it specifically for this purpose. If there was no need for these loops, ifs, global variables, etc, I wouldn't have written them.
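The guard in that snippet reduces to a predicate on the fadd's use count, and the fold itself relies on the identity (x + 1.0) * y = x*y + y (and (x - 1.0) * y = x*y - y). The sketch below is a scalar C++ illustration of both, using `std::fma` as a stand-in for the fused node; the function names are made up for this example and are not DAGCombiner code.

```cpp
#include <cassert>
#include <cmath>

// Toy model of the guard: the fold (fmul (fadd x, +/-1.0), y) -> fma(x, y, +/-y)
// is attempted when the fadd has a single use, or unconditionally when
// aggressive fusion is enabled.
bool shouldFuseFAdd(bool Aggressive, unsigned NumUsesOfFAdd) {
  return Aggressive || NumUsesOfFAdd == 1;
}

// Unfused form: (x + c) * y, where c is +1.0 or -1.0 (two roundings).
double unfused(double X, double C, double Y) { return (X + C) * Y; }

// Fused form produced by the fold: x*y + c*y with a single rounding.
// Since c is +/-1.0, c*y is exact (it is y or its negation).
double fused(double X, double C, double Y) {
  assert(C == 1.0 || C == -1.0);
  return std::fma(X, Y, C * Y);
}
```

For inputs where every intermediate is exact, as in the checks below, both forms agree. In general the fused form rounds once instead of twice, so results can differ in the last bit, which is why this kind of fold is tied to relaxed FP semantics such as the `fast` flags in the test IR.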

fhahn added inline comments.Dec 4 2017, 8:29 AM

test/CodeGen/AArch64/fma-aggressive.ll
324	You do not need complicated code to get multiple uses. In the function below, %mul has 2 uses and will only be combined by DAGCombine if aggressive FMA fusion is enabled (that's the pattern defined in DAGCombiner.cpp:9099):

define double @test(double %x, double %y, double %z) {
  %mul = fmul fast double %x, %y
  %add = fadd fast double %mul, %z
  %use2 = fdiv fast double %mul, %add
  ret double %use2
}

On a related note, the MachineCombiner ignores enableAggressiveFMAFusion, so at -O3, the example won't be combined. I plan to put up a patch soon that updates the machine combiner to consider combining multiple uses, if profitable.
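A scalar C++ reading of that `@test` function (plain C++, not what llc actually emits) shows what aggressive fusion changes: the fadd becomes an FMA even though %mul has a second user, so the separate multiply has to stay for the fdiv. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Scalar model of @test without fusion: %mul has two users,
// the fadd and the fdiv.
double testUnfused(double X, double Y, double Z) {
  double Mul = X * Y;   // %mul
  double Add = Mul + Z; // %add
  return Mul / Add;     // %use2
}

// What aggressive fusion is allowed to produce: the fadd is fused even though
// %mul has another user, so x*y is still computed separately for the fdiv
// (the multiply is effectively done twice, once inside the FMA).
double testAggressive(double X, double Y, double Z) {
  double Mul = X * Y;             // still needed by the fdiv
  double Add = std::fma(X, Y, Z); // fused multiply-add, single rounding
  return Mul / Add;
}
```

Whether trading the extra multiply for the fused operation pays off is the micro-architecture-dependent question that enableAggressiveFMAFusion answers.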

steleman added inline comments.Dec 5 2017, 9:11 AM

test/CodeGen/AArch64/fma-aggressive.ll
324	I will be happy to include your test case in a separate test for Aggressive FMA, but I do not see how your test demonstrates Aggressive FMA as triggered by the compilation of a real program, written in a language that software developers realistically write software in. Also: your test does not provide any method of validating the results produced by enabling Aggressive FMA. Nor does it provide an easy method of detecting that some potential future changes have introduced a regression. For Aggressive FMA, I believe that it is important to have a real-life program that can be compiled and run, and whose results can be compared against the results produced by a different compiler, preferably on a different architecture.

Updated large test case to use --check-prefix={CHECK-FMA|CHECK-GENERIC}.
Included Florian Hahn's small test case for FMA.
Metadata in the large LLVM IR test case is required.

Thanks! I agree with Matthias that a target feature would be better for enabling/disabling aggressive FMA. In AArch64.td there are already quite a lot of similar target features, like FeatureFuseAES or FeatureSlowSTRQro, that enable certain optimizations for some micro-architectures.

test/CodeGen/AArch64/fma-aggressive.ll
324	"Also: your test does not provide any method of validating the results produced by enabling Aggressive FMA. Nor does it provide an easy method of detecting that some potential future changes have introduced a regression."

Sorry, I just intended to highlight how to create a function that tests an isolated pattern triggered by aggressive FMA. Ideally we would have similar functions for all important patterns; then the test should catch all relevant regressions in DAGCombine with aggressive FMA.

"For Aggressive FMA, I believe that it is important to have a real-life program that can be compiled and run, and whose results can be compared against the results produced by a different compiler, preferably on a different architecture."

Agreed! But for that, the LLVM test-suite is probably a better place to make sure there are no performance regressions, as it uses Clang directly, like users would; see http://llvm.org/docs/TestingGuide.html. `llvm/test` should contain small, isolated regression tests. I see why it is tempting to add a big test case like that to llvm/test, but 1) it still only guards against changes in CodeGen, while opt or clang could mess up the IR to begin with, and 2) it makes the regression test suite more fragile than it needs to be. If others are happy with the big test, I won't object, though.

MatzeB added inline comments.Dec 7 2017, 1:02 PM

test/CodeGen/AArch64/fma-aggressive.ll
324	Yes, bigger tests are better off in the test-suite; you summed up LLVM's testing strategy nicely!

steleman added a child revision: D41810: test case for Aggressive FMA on AArch64. Jan 7 2018, 11:56 AM

Updated diff with latest changes:

Added FeatureAggressiveFMAFloat and FeatureAggressiveFMADouble in AArch64.td as micro-arch features.
Added EnableAggressiveFMA cl::opt in AArch64ISelLowering.cpp.
Implemented AArch64TargetLowering::enableAggressiveFMAFusion in terms of the changes from AArch64.td.
I would like the test in test/CodeGen/AArch64/fma-aggressive.ll to be included in this changeset, simply because the run-time test from D41810 has no way of determining whether AggressiveFMA was enabled.
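For illustration only, here is a hypothetical C++ model of the per-type scheme in this update, where two independent feature bits gate float and double separately. The struct and enum are made up for this sketch; real code would query AArch64Subtarget and the scalar type of the EVT.

```cpp
#include <cassert>

// Hypothetical model of FeatureAggressiveFMAFloat / FeatureAggressiveFMADouble.
enum class ScalarKind { F32, F64, Other };

struct SubtargetFeatures {
  bool AggressiveFMAFloat = false;
  bool AggressiveFMADouble = false;
};

// ElementKind is the scalar element type of the (possibly vector) value type,
// so scalar and vector doubles are treated alike in this sketch.
bool enableAggressiveFMAFusion(const SubtargetFeatures &ST,
                               ScalarKind ElementKind) {
  switch (ElementKind) {
  case ScalarKind::F32:
    return ST.AggressiveFMAFloat;
  case ScalarKind::F64:
    return ST.AggressiveFMADouble;
  default:
    return false; // not covered by either feature bit in this sketch
  }
}
```

A core that wanted fusion for doubles only would set just the double bit, with no CPU-specific code in the lowering hook.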

Ping!

It looks like the latest changes address the earlier concerns expressed in the review comments:

Use target feature: Present in AArch64.td, AArch64ISelLowering.h, AArch64ISelLowering.cpp, and AArch64Subtarget.h
Simplified test case: Present in test/CodeGen/AArch64/fma-simple.ll
Combined optimized/missed in .ll tests: Present in fma-aggressive.ll and fma-simple.ll

Is there anything else that is needed?

Thanks for updating the patch! I have a few suggestions about reducing fma-aggressive.ll, but besides that and removing aarch64-enable-aggressive-fma it looks quite good.

I would like the test in test/CodeGen/AArch64/fma-aggressive.ll to be included in this changeset, simply because the run-time test from D41810 has no way of determining whether AggressiveFMA was enabled.

It's fine to have a more complex test with more patterns than test/CodeGen/AArch64/fma-simple.ll, but test/CodeGen/AArch64/fma-aggressive.ll still contains lots of stuff unrelated to FMA fusion, e.g. define double @reset(double %x, double %y) local_unnamed_addr #0 is not used, the attributes and debug metadata are not needed, the global variables are not needed, the calls to fprintf/fwrite and so on are not needed, and the loop should not have an impact on the DAGCombiner.

I think the following test function should still expose patterns very similar to the original fma-aggressive.ll, but only focuses on the relevant bits for fusion:

define double @test1(double %a, double %b, double %conv, i32 %rem) {
entry:
  %conv.neg = fsub fast double -0.000000e+00, %conv
  %cmp3 = icmp eq i32 %rem, 0
  %. = select i1 %cmp3, double 1.000000e+00, double -1.000000e+00
  %add = fadd fast double %a, 1.000000e+00
  %add8 = fadd fast double %add, %.
  %mul = fmul fast double %add8, %a
  %add9 = fadd fast double %mul, 1.000000e+01
  %mul10 = fmul fast double %add9, 2.000000e+00
  %add11 = fsub fast double %conv.neg, %mul
  %sub = fadd fast double %add11, %mul10
  %mul14 = fmul fast double %sub, %mul
  %mul15 = fmul fast double %mul14, %.
  %sub17 = fadd fast double %mul15, %mul14
  br i1 %cmp3, label %if.then22, label %if.else27

if.then22:
  %add23 = fadd fast double %sub17, -1.000000e+00
  %sub24 = fadd fast double %sub17, 1.000000e+00
  %mul25 = fmul fast double %add23, %sub24
  %sub26 = fsub fast double 1.000000e+00, %mul25
  br label %if.end32

if.else27:
  %sub28 = fsub fast double -1.000000e+00, %sub17
  %sub29 = fadd fast double %sub17, 1.000000e+00
  %mul30 = fmul fast double %sub28, %sub29
  %add31 = fadd fast double %mul30, 1.000000e+00
  br label %if.end32

if.end32:                                         ; preds = %if.else27, %if.then22
  %b.1 = phi double [ %sub26, %if.then22 ], [ %add31, %if.else27 ]
  %sub33 = fsub fast double 1.000000e+00, %sub17
  %mul34 = fmul fast double %sub17, %conv
  %mul35 = fmul fast double %mul34, %sub33
  %sub36 = fsub fast double %b.1, %.
  %add37 = fadd fast double %b.1, %mul35
  %add38 = fadd fast double %add37, %sub36
  ret double %add38
}

If you think this is not sufficient for some reason, please at least remove the unneeded metadata and unused functions from the test case, replace the calls to fprintf & co. that are needed to give the FMA results uses with simpler functions (or return the result), and replace the globals by adding parameters if required. Also, the other ifs should not matter to the combiner either, so I would simplify the CFG there as well.

Simplified test case: Present in test/CodeGen/AArch64/fma-simple.ll

Pending simplification of fma-aggressive.ll, I would move this test function to fma-aggressive.ll too.

lib/Target/AArch64/AArch64.td
152 ↗	(On Diff #128884)	How likely is it that aggressive FMA is beneficial for floats or doubles only? IMO it's probably enough to just have `-aggressive-fma` for now.
lib/Target/AArch64/AArch64ISelLowering.cpp
116	I think there is no need for this option any longer; `-mattr=+aggressive-fma-float,+aggressive-fma-double` can be used instead.
10977	If we are only using subtarget features here, we can move this to the header, I think.
test/CodeGen/AArch64/fma-simple.ll
3	Please also add a test that enables aggressive FMA fusion using -mattr for a non-ThunderX2 CPU.

steleman added inline comments.Jan 22 2018, 8:26 AM

lib/Target/AArch64/AArch64.td
152 ↗	(On Diff #128884)	Could you please clarify this? That is, are you referring to quad-precision? T99 doesn't have quad-precision FMA, only floats and doubles. Are there any AArch64 micro-arches that can do quad-precision FMA? Or are you asking about SIMD?

fhahn added inline comments.Jan 22 2018, 8:31 AM

lib/Target/AArch64/AArch64.td
152 ↗	(On Diff #128884)	Sorry, I meant I think a single option to enable aggressive FMA would be enough and separate options for float and double seems too fine grained, unless I am missing something :)

steleman added inline comments.Jan 22 2018, 8:42 AM

lib/Target/AArch64/AArch64.td
152 ↗	(On Diff #128884)	Aaah, OK. Got it. Here's my thinking: there may be some AArch64 micro-arches that would want to enable FMA for floats but not doubles, or for doubles but not floats, because of latency or some other constraint. So the idea is to allow this differentiation. Now that FMA is a Subtarget feature, if we don't have this differentiation between floats and doubles, then we're back to the original implementation: the micro-arch Subtarget will have to switch() on the CPU and EVT type in AArch64ISelLowering.cpp (AArch64TargetLowering::enableAggressiveFMAFusion()). That was frowned upon, which is why I created these two FMA Subtarget features. And then there's also FMLA and FMLS. For T99 it doesn't matter because all the FMA instructions have the same latency. But I'm not sure this applies to all the other AArch64 micro-arches.

fhahn added inline comments.Jan 22 2018, 9:06 AM

lib/Target/AArch64/AArch64.td
152 ↗	(On Diff #128884)	I see what you mean; I just think that we should make our lives more complicated only when there actually are micro-arches that want to enable aggressive FMA for double or float only ;)

"Now that FMA is a Subtarget feature, if we don't have this differentiation between floats and doubles, then we're back to the original implementation: the micro-arch Subtarget will have to switch() on the CPU and EVT type in AArch64ISelLowering.cpp (AArch64TargetLowering::enableAggressiveFMAFusion()). That was frowned upon, which is why I created these two FMA Subtarget features."

I am not sure I follow. I think @MatzeB 's issue was using `getProcFamily()` and it should be fine to just check the subtarget feature in enableAggressiveFMAFusion. @MatzeB summarized some benefits of making it a subtarget feature here: https://reviews.llvm.org/D40177#936974

I am not sure I follow. I think @MatzeB 's issue was using getProcFamily() and it should be fine to just check the subtarget feature in enableAggressiveFMAFusion. @MatzeB summarized some benefits of making it a subtarget feature here: https://reviews.llvm.org/D40177#936974

If there's only one EnableAggressiveFMA boolean in AArch64.td:

What happens if a certain micro-arch wants to enable AggressiveFMA for scalar doubles and vector doubles, but not for scalar floats and vector floats? (as an example).

Won't they have to call getProcFamily() in enableAggressiveFMAFusion()?

As there is only one boolean in AArch64.td, AggressiveFMA will be either enabled for all the floating-point types, or disabled for all the floating-point types, scalars or vectors. The only way of knowing which particular floating-point type is being tested for AggressiveFMA is by testing the type of the EVT, and then making a decision based on getProcFamily() and EVT type.

Is it safe to speculate that each and every AArch64 micro-arch wants AggressiveFMA enabled or disabled for all possible floating-point types?

In D40696#984012, @steleman wrote:

I am not sure I follow. I think @MatzeB 's issue was using getProcFamily() and it should be fine to just check the subtarget feature in enableAggressiveFMAFusion. @MatzeB summarized some benefits of making it a subtarget feature here: https://reviews.llvm.org/D40177#936974

If there's only one EnableAggressiveFMA boolean in AArch64.td:

What happens if a certain micro-arch wants to enable AggressiveFMA for scalar doubles and vector doubles, but not for scalar floats and vector floats? (as an example).

Won't they have to call getProcFamily() in enableAggressiveFMAFusion()?

As there is only one boolean in AArch64.td, AggressiveFMA will be either enabled for all the floating-point types, or disabled for all the floating-point types, scalars or vectors. The only way of knowing which particular floating-point type is being tested for AggressiveFMA is by testing the type of the EVT, and then making a decision based on getProcFamily() and EVT type.

Is it safe to speculate that each and every AArch64 micro-arch wants AggressiveFMA enabled or disabled for all possible floating-point types?

My 2 cents: In general, we could think of probably thousands of subtarget features describing how different potential micro-architectures might handle specific instructions or sequences of instructions.
Implementing all of those thousands of subtarget features without knowing that it actually makes a difference for any of the micro-architectures that LLVM does know about doesn't make sense.
Following that same logic, I'd choose to have 1 subtarget feature introduced in this patch, assuming IIUC that you only really need one for the functionality you introduce in this patch.
If in the future it becomes clear there is a micro-architecture for which in LLVM we want to model different behaviour with respect to AggressiveFMA for floats, doubles, vectors, or something else, we can introduce new subtarget features then.
Introducing them now, without there being a demonstrated need for it, feels a bit like premature optimization to me.
I hope that makes sense?

Updated per latest comments:

Removed AggressiveFMAFloat and AggressiveFMADouble from AArch64.td.
Created FeatureAggressiveFMA in AArch64.td.
Merged test cases into one, including the IR test case from Florian Hahn.

LGTM with 2 small comments; I think all comments have been addressed now, thanks! Please wait another day or so before committing, to give people some time to raise additional concerns.

When committing, please adjust the commit title & message to mention the target feature rather than the option. And it would be great if you could prefix the title again with [AArch64].

lib/Target/AArch64/AArch64.td
153 ↗	(On Diff #131192)	Just `aggressive-fma` seems more in line with the existing target feature names. `+aggressive-fma` enables the target feature, `-aggressive-fma` explicitly disables it.
409 ↗	(On Diff #131192)	nit: move it to the other FeatureXXX in the list.
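As a sketch of the `+`/`-` convention mentioned above, the following hypothetical C++ helper applies a comma-separated, `-mattr`-style list to a CPU's default feature set. It only models the semantics discussed in this review; it is not LLVM's SubtargetFeatures parser, and the default feature sets used with it are assumptions.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Applies entries such as "+aggressive-fma" or "-aggressive-fma" on top of a
// CPU's default features: '+' adds the named feature, '-' removes it.
std::set<std::string> applyFeatureString(std::set<std::string> Features,
                                         const std::string &Attrs) {
  std::stringstream SS(Attrs);
  std::string Item;
  while (std::getline(SS, Item, ',')) {
    if (Item.size() < 2)
      continue; // skip empty or malformed entries in this sketch
    const std::string Name = Item.substr(1);
    if (Item[0] == '+')
      Features.insert(Name);
    else if (Item[0] == '-')
      Features.erase(Name);
  }
  return Features;
}

bool hasAggressiveFMA(const std::set<std::string> &Features) {
  return Features.count("aggressive-fma") != 0;
}
```

Under these assumptions, `+aggressive-fma` turns the optimization on for a generic core, and `-aggressive-fma` turns it off again on a core that enables it by default, such as T99.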

This revision is now accepted and ready to land. Jan 24 2018, 1:38 AM

Updated per latest comments.

Closed by commit rL323474: [AArch64] Enable aggressive FMA on T99 and provide AArch64 options for others. (authored by joel_k_jones). Jan 25 2018, 1:57 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                            Size
lib/Target/AArch64/AArch64ISelLowering.h        3 lines
lib/Target/AArch64/AArch64ISelLowering.cpp      21 lines
test/CodeGen/AArch64/fma-aggressive.ll          346 lines
test/CodeGen/AArch64/fma-simple.ll              13 lines

Diff 126011

lib/Target/AArch64/AArch64ISelLowering.h

Context not available.
     return VT == MVT::f32 || VT == MVT::f64;
   }

   bool supportSplitCSR(MachineFunction *MF) const override {
     return MF->getFunction()->getCallingConv() == CallingConv::CXX_FAST_TLS &&
            MF->getFunction()->hasFnAttribute(Attribute::NoUnwind);
   }
   void initializeSplitCSR(MachineBasicBlock *Entry) const override;
   void insertCopiesSplitCSR(
       MachineBasicBlock *Entry,
       const SmallVectorImpl<MachineBasicBlock *> &Exits) const override;

   bool supportSwiftError() const override {
     return true;
   }

+  /// Enable aggressive FMA fusion on targets that want it.
+  bool enableAggressiveFMAFusion(EVT VT) const override;
+
   /// Returns the size of the platform's va_list object.
   unsigned getVaListSizeInBits(const DataLayout &DL) const override;

   /// Returns true if \p VecTy is a legal interleaved access type. This
   /// function checks the vector element type and the overall width of the
   /// vector.
   bool isLegalInterleavedAccessType(VectorType *VecTy,
                                     const DataLayout &DL) const;

   /// Returns the number of interleaved accesses that will be generated when
   /// lowering accesses of the given type.
   unsigned getNumInterleavedAccesses(VectorType *VecTy,
                                      const DataLayout &DL) const;

   MachineMemOperand::Flags getMMOFlags(const Instruction &I) const override;

Context not available.

lib/Target/AArch64/AArch64ISelLowering.cpp

Context not available.
                           cl::init(false));

 // FIXME: The necessary dtprel relocations don't seem to be supported
 // well in the GNU bfd and gold linkers at the moment. Therefore, by
 // default, for now, fall back to GeneralDynamic code generation.
 cl::opt<bool> EnableAArch64ELFLocalDynamicTLSGeneration(
     "aarch64-elf-ldtls-generation", cl::Hidden,
     cl::desc("Allow AArch64 Local Dynamic TLS code generation"),
     cl::init(false));

 static cl::opt<bool>
 EnableOptimizeLogicalImm("aarch64-enable-logical-imm", cl::Hidden,
                          cl::desc("Enable AArch64 logical imm instruction "
                                   "optimization"),
                          cl::init(true));

+static cl::opt<bool>
+EnableAggressiveFMA("aarch64-enable-aggressive-fma", cl::Hidden,
+                    cl::desc("Enable AArch64 aggressive fused "
+                             "multiply-add"),
+                    cl::init(false));
+
 /// Value type used for condition codes.
 static const MVT MVT_CC = MVT::i32;

 AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
                                              const AArch64Subtarget &STI)
     : TargetLowering(TM), Subtarget(&STI) {
   // AArch64 doesn't have comparisons which set GPRs or setcc instructions, so
   // we have to make something up. Arbitrarily, choose ZeroOrOne.
   setBooleanContents(ZeroOrOneBooleanContent);
   // When comparing vectors the result sets the different elements in the
   // vector to all-one or all-zero.
   setBooleanVectorContents(ZeroOrNegativeOneBooleanContent);

   // Set up the register classes.
   addRegisterClass(MVT::i32, &AArch64::GPR32allRegClass);
   addRegisterClass(MVT::i64, &AArch64::GPR64allRegClass);
Context not available.
   }
 }

 bool AArch64TargetLowering::isIntDivCheap(EVT VT, AttributeList Attr) const {
   // Integer division on AArch64 is expensive. However, when aggressively
   // optimizing for code size, we prefer to use a div instruction, as it is
   // usually smaller than the alternative sequence.
   // The exception to this is vector division. Since AArch64 doesn't have vector
   // integer division, leaving the division as-is is a loss even in terms of
   // size, because it will have to be scalarized, while the alternative code
   // sequence can be performed in vector form.
   bool OptSize =
       Attr.hasAttribute(AttributeList::FunctionIndex, Attribute::MinSize);
   return OptSize && !VT.isVector();
 }

+bool AArch64TargetLowering::enableAggressiveFMAFusion(EVT VT) const {
+  unsigned PF = static_cast<unsigned>(Subtarget->getProcFamily());
+  switch(PF) {
+  default:
+    return VT.isFloatingPoint() && EnableAggressiveFMA.getValue();
+    break;
+  case AArch64Subtarget::ThunderX2T99:
+    // Always enabled on Cavium T99.
+    return VT.isFloatingPoint();
+    break;
+  }
+
+  return false;
+}
+
 unsigned
 AArch64TargetLowering::getVaListSizeInBits(const DataLayout &DL) const {
   if (Subtarget->isTargetDarwin() || Subtarget->isTargetWindows())
     return getPointerTy(DL).getSizeInBits();

   return 3 * getPointerTy(DL).getSizeInBits() + 2 * 32;
 }
Context not available.

test/CodeGen/AArch64/fma-aggressive.ll

; RUN: llc -O2 -mtriple=aarch64-none-linux-gnu -mcpu=thunderx2t99 -fp-contract=fast < %s | FileCheck %s --check-prefix=CHECK-FMA
; RUN: llc -O2 -mtriple=aarch64-none-linux-gnu -mcpu=generic < %s | FileCheck %s --check-prefix=CHECK-GENERIC
				; /* This test program demonstrates the effects of enabling aggressive FMA
				; * on AArch64. With aggressive FMA enabled, CodeGen will fuse instructions
				; * for SDValues with one or more use. With aggressive FMA disabled, this
				; * fusion does not happen.
				; */
				;
				; /* clang -O2 -std=c99 -Wall -mcpu=thunderx2t99 -march=armv8.1-a+lse
				; * -funroll-loops -ffast-math -Xclang -menable-unsafe-fp-math
				; * -emit-llvm -S fma.c -o fma.ll
				; */
				;
				; #include <stdio.h>
				; #include <stdlib.h>
				; #include <math.h>
				;
				; static const double AE[] = { -0.00, 0.00, 0.00, 0.00, 0.00, 0.00,
				; 0.00, 0.00, 0.00, 0.00 };
				;
				; static const double BE[] = { 1102499.00, -2.00, -3.00, -4.00, -5.00,
				; -6.00, -7.00, -8.00, -9.00, -10.00 };
				;
				; double reset(double x, double y)
				; {
				; double i;
				; if (modf(x, &i) == 0.0)
				; return x + y;
				;
				; return x - y;
				; }
				;
				; int main(int argc, char* argv[])
				; {
				; int z;
				; if (argc >= 2)
				; z = atoi(argv[1]);
				; else
				; z = 10;
				;
				; double a = 3.0;
				; double b = 5.0;
				; double c = 10.0;
				; (void) fprintf(stderr, "a=%lf b=%lf c=%lf\n", a, b, c);
				;
				; for (int i = 0; i < z; ++i) {
				; double x = (double) i;
				; double p1 = 1.0;
				; double n1 = -1.0;
				; double y;
				;
				; if ((i % 2) == 0)
				; y = p1;
				; else
				; y = n1;
				;
				; a *= y + p1 + a;
				; a = y + ((c + a) 2.0) - (a + y + x);
				; a -= n1 * (y * a);
				;
				; (void) fprintf(stderr, "a=%lf b=%lf c=%lf\n", a, b, c);
				;
				; if ((i % 2) == 0)
				; b = p1 - ((n1 + a) * (a - n1));
				; else
				; b = p1 + ((n1 - a) * (a - n1));
				;
				; a = (p1 - a) x;
				; b -= reset(x, b - y + (a + b));
				;
				; (void) fprintf(stderr, "a=%lf b=%lf c=%lf", a, b, c);
				; if ((a == AE[i]) && (b == BE[i]))
				; (void) fprintf(stderr, "\t-----> PASS.");
				; else {
				; (void) fprintf(stderr, "\t-----> FAIL: ");
				; if (a != AE[i])
				; (void) fprintf(stderr, "(a == %lf, expected %lf)", a, AE[i]);
				; else if (b != BE[i])
				; (void) fprintf(stderr, "(b == %lf, expected %lf)", b, BE[i]);
				; }
				;
				; (void) fprintf(stderr, "\n\n");
				; }
				;
				; return 0;
				; }
				;
				; ModuleID = 'fma.c'
				source_filename = "fma.c"
				target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64-unknown-linux-gnu"

				%struct._IO_FILE = type { i32, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, %struct._IO_marker*, %struct._IO_FILE*, i32, i32, i64, i16, i8, [1 x i8], i8*, i64, i8*, i8*, i8*, i8*, i64, i32, [20 x i8] }
				%struct._IO_marker = type { %struct._IO_marker*, %struct._IO_FILE*, i32 }

				@stderr = external local_unnamed_addr global %struct._IO_FILE*, align 8
				@.str = private unnamed_addr constant [19 x i8] c"a=%lf b=%lf c=%lf\0A\00", align 1
				@.str.1 = private unnamed_addr constant [18 x i8] c"a=%lf b=%lf c=%lf\00", align 1
				@AE = internal unnamed_addr constant [10 x double] [double -0.000000e+00, double 0.000000e+00, double 0.000000e+00, double 0.000000e+00, double 0.000000e+00, double 0.000000e+00, double 0.000000e+00, double 0.000000e+00, double 0.000000e+00, double 0.000000e+00], align 8
				@BE = internal unnamed_addr constant [10 x double] [double 0x4130D2A300000000, double -2.000000e+00, double -3.000000e+00, double -4.000000e+00, double -5.000000e+00, double -6.000000e+00, double -7.000000e+00, double -8.000000e+00, double -9.000000e+00, double -1.000000e+01], align 8
				@.str.2 = private unnamed_addr constant [14 x i8] c"\09-----> PASS.\00", align 1
				@.str.3 = private unnamed_addr constant [15 x i8] c"\09-----> FAIL: \00", align 1
				@.str.4 = private unnamed_addr constant [24 x i8] c"(a == %lf, expected %lf\00", align 1
				@.str.5 = private unnamed_addr constant [24 x i8] c"(b == %lf, expected %lf\00", align 1
				@.str.6 = private unnamed_addr constant [3 x i8] c"\0A\0A\00", align 1

				; Function Attrs: nounwind
				define double @reset(double %x, double %y) local_unnamed_addr #0 {
				; CHECK-LABEL: reset:
				entry:
				%i = alloca double, align 8
				%0 = bitcast double* %i to i8*
				call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0) #3
				%call = call fast double @modf(double %x, double* nonnull %i) #3
				%cmp = fcmp fast oeq double %call, 0.000000e+00
				%1 = fsub fast double -0.000000e+00, %y
				%retval.0.p = select i1 %cmp, double %y, double %1
				%retval.0 = fadd fast double %retval.0.p, %x
				call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %0) #3
				ret double %retval.0
				}

				; Function Attrs: argmemonly nounwind
				declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1

				; Function Attrs: nounwind
				declare double @modf(double, double* nocapture) local_unnamed_addr #2

				; Function Attrs: argmemonly nounwind
				declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1

				; Function Attrs: nounwind
				define i32 @main(i32 %argc, i8** nocapture readonly %argv) local_unnamed_addr #0 {
				entry:
				%i.i = alloca double, align 8
				%cmp = icmp sgt i32 %argc, 1
				br i1 %cmp, label %if.end, label %if.end.thread

				if.end.thread: ; preds = %entry
				%0 = load %struct._IO_FILE*, %struct._IO_FILE** @stderr, align 8, !tbaa !2
				%call1140 = tail call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %0, i8* getelementptr inbounds ([19 x i8], [19 x i8]* @.str, i64 0, i64 0), double 3.000000e+00, double 5.000000e+00, double 1.000000e+01) #4
				br label %for.body.lr.ph

				if.end: ; preds = %entry
				%arrayidx = getelementptr inbounds i8*, i8** %argv, i64 1
				%1 = load i8*, i8** %arrayidx, align 8, !tbaa !2
				%call.i = tail call i64 @strtol(i8* nocapture nonnull %1, i8** null, i32 10) #3
				%conv.i = trunc i64 %call.i to i32
				%2 = load %struct._IO_FILE*, %struct._IO_FILE** @stderr, align 8, !tbaa !2
				%call1 = tail call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %2, i8* getelementptr inbounds ([19 x i8], [19 x i8]* @.str, i64 0, i64 0), double 3.000000e+00, double 5.000000e+00, double 1.000000e+01) #4
				%cmp2136 = icmp sgt i32 %conv.i, 0
				br i1 %cmp2136, label %for.body.lr.ph, label %for.cond.cleanup

				for.body.lr.ph: ; preds = %if.end.thread, %if.end
				%z.0142 = phi i64 [ 10, %if.end.thread ], [ %call.i, %if.end ]
				%3 = bitcast double* %i.i to i8*
				%wide.trip.count = and i64 %z.0142, 4294967295
				br label %for.body

				for.cond.cleanup: ; preds = %if.end72, %if.end
				ret i32 0

				for.body: ; preds = %if.end72, %for.body.lr.ph
				; CHECK-FMA: fadd d0, d9, d13
				; CHECK-FMA: tst w26, #0x1
				; CHECK-FMA: fcsel d15, d13, d12, eq
				; CHECK-GENERIC: tst w26, #0x1
				; CHECK-GENERIC: fcsel d15, d13, d12, eq
				; CHECK-GENERIC: fadd d1, d13, d15
				; CHECK-GENERIC: fadd d1, d9, d1
				; CHECK-GENERIC: fneg d0, d11
				; CHECK-GENERIC: fmul d1, d1, d9
				; CHECK-GENERIC: fadd d2, d1, d8
				; CHECK-GENERIC: fsub d0, d0, d1
				; CHECK-GENERIC: ldr x0, [x28, :lo12:stderr]
				; CHECK-FMA: ldr x0, [x28, :lo12:stderr]
				; CHECK-FMA: fadd d0, d0, d15
				; CHECK-FMA: fmul d1, d0, d9
				; CHECK-FMA: fmadd d2, d0, d9, d8
				; CHECK-FMA: fnmadd d0, d0, d9, d11
				; CHECK-FMA: mov x1, x19
				; CHECK-FMA: fmadd d0, d2, d14, d0
				; CHECK-GENERIC: fmadd d0, d2, d14, d0
				; CHECK-FMA: mov v2.16b, v8.16b
				; CHECK-FMA: fmul d0, d0, d1
				; CHECK-GENERIC: fmul d0, d0, d1
				; CHECK-FMA: mov v1.16b, v10.16b
				; CHECK-FMA: fmadd d9, d0, d15, d0
				; CHECK-FMA: mov v0.16b, v9.16b
				; CHECK-GENERIC: fmadd d9, d0, d15, d0
				; CHECK-GENERIC: tbnz w26, #0, .LBB1_6
				; CHECK-GENERIC: fsub d0, d13, d9
				; CHECK-GENERIC: b .LBB1_7
				; CHECK-GENERIC: fmadd d10, d0, d1, d13
				; CHECK-GENERIC: fsub d0, d10, d15
				; CHECK-GENERIC: fadd d0, d9, d0
				; CHECK-GENERIC: fadd d15, d10, d0
				; CHECK-GENERIC: add x0, sp, #8
				; CHECK-FMA: fsub d0, d12, d9
				; CHECK-FMA: fadd d1, d9, d13
				; CHECK-FMA: fsub d2, d13, d9
				; CHECK-FMA: tst w26, #0x1
				; CHECK-FMA: add x0, sp, #8
				; CHECK-FMA: fmadd d0, d0, d1, d13
				; CHECK-FMA: fmadd d1, d2, d1, d13
				; CHECK-FMA: fcsel d10, d0, d1, ne
				; CHECK-FMA: fmul d0, d9, d11
				; CHECK-FMA: fsub d1, d10, d15
				; CHECK-FMA: fmsub d9, d9, d0, d0
				; CHECK-FMA: fmadd d0, d0, d2, d10
				; CHECK-FMA: fadd d15, d0, d1
				; CHECK-GENERIC: mov x1, x20
				; CHECK-GENERIC: mov v0.16b, v9.16b
				; CHECK-GENERIC: mov v1.16b, v10.16b
				; CHECK-GENERIC: mov v2.16b, v8.16b
				; CHECK-GENERIC: b.ne .LBB1_10
				; CHECK-GENERIC: b.ne .LBB1_10
				; CHECK-GENERIC: adrp x1, .L.str.5
				; CHECK-GENERIC: mov v0.16b, v10.16b
				%indvars.iv = phi i64 [ 0, %for.body.lr.ph ], [ %indvars.iv.next, %if.end72 ]
				%a.0139 = phi double [ 3.000000e+00, %for.body.lr.ph ], [ %mul35, %if.end72 ]
				%b.0138 = phi double [ 5.000000e+00, %for.body.lr.ph ], [ %sub40, %if.end72 ]
				%4 = trunc i64 %indvars.iv to i32
				%conv = sitofp i32 %4 to double
				%conv.neg = fsub fast double -0.000000e+00, %conv
				%rem = and i32 %4, 1
				%cmp3 = icmp eq i32 %rem, 0
				%. = select i1 %cmp3, double 1.000000e+00, double -1.000000e+00
				%add = fadd fast double %a.0139, 1.000000e+00
				%add8 = fadd fast double %add, %.
				%mul = fmul fast double %add8, %a.0139
				%add9 = fadd fast double %mul, 1.000000e+01
				%mul10 = fmul fast double %add9, 2.000000e+00
				%add11 = fsub fast double %conv.neg, %mul
				%sub = fadd fast double %add11, %mul10
				%mul14 = fmul fast double %sub, %mul
				%mul15 = fmul fast double %mul14, %.
				%sub17 = fadd fast double %mul15, %mul14
				%5 = load %struct._IO_FILE*, %struct._IO_FILE** @stderr, align 8, !tbaa !2
				%call18 = tail call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %5, i8* getelementptr inbounds ([19 x i8], [19 x i8]* @.str, i64 0, i64 0), double %sub17, double %b.0138, double 1.000000e+01) #4
				br i1 %cmp3, label %if.then22, label %if.else27

				if.then22: ; preds = %for.body
				%add23 = fadd fast double %sub17, -1.000000e+00
				%sub24 = fadd fast double %sub17, 1.000000e+00
				%mul25 = fmul fast double %add23, %sub24
				%sub26 = fsub fast double 1.000000e+00, %mul25
				br label %if.end32

				if.else27: ; preds = %for.body
				%sub28 = fsub fast double -1.000000e+00, %sub17
				%sub29 = fadd fast double %sub17, 1.000000e+00
				%mul30 = fmul fast double %sub28, %sub29
				%add31 = fadd fast double %mul30, 1.000000e+00
				br label %if.end32

				if.end32: ; preds = %if.else27, %if.then22
				%b.1 = phi double [ %sub26, %if.then22 ], [ %add31, %if.else27 ]
				%sub33 = fsub fast double 1.000000e+00, %sub17
				%mul34 = fmul fast double %sub17, %conv
				%mul35 = fmul fast double %mul34, %sub33
				%sub36 = fsub fast double %b.1, %.
				%add37 = fadd fast double %b.1, %mul35
				%add38 = fadd fast double %add37, %sub36
				call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %3) #3
				%call.i135 = call fast double @modf(double %conv, double* nonnull %i.i) #3
				%cmp.i = fcmp fast oeq double %call.i135, 0.000000e+00
				%6 = fsub fast double -0.000000e+00, %add38
				%retval.0.p.i = select i1 %cmp.i, double %add38, double %6
				call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %3) #3
				%retval.0.i.neg = fsub fast double %b.1, %conv
				%sub40 = fsub fast double %retval.0.i.neg, %retval.0.p.i
				%7 = load %struct._IO_FILE*, %struct._IO_FILE** @stderr, align 8, !tbaa !2
				%call41 = tail call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %7, i8* getelementptr inbounds ([18 x i8], [18 x i8]* @.str.1, i64 0, i64 0), double %mul35, double %sub40, double 1.000000e+01) #4
				%arrayidx42 = getelementptr inbounds [10 x double], [10 x double]* @AE, i64 0, i64 %indvars.iv
				%8 = load double, double* %arrayidx42, align 8, !tbaa !6
				%cmp43 = fcmp fast oeq double %mul35, %8
				br i1 %cmp43, label %land.lhs.true, label %if.else51

				land.lhs.true: ; preds = %if.end32
				%arrayidx46 = getelementptr inbounds [10 x double], [10 x double]* @BE, i64 0, i64 %indvars.iv
				%9 = load double, double* %arrayidx46, align 8, !tbaa !6
				%cmp47 = fcmp fast oeq double %sub40, %9
				br i1 %cmp47, label %if.then49, label %if.else51

				if.then49: ; preds = %land.lhs.true
				%10 = load %struct._IO_FILE*, %struct._IO_FILE** @stderr, align 8, !tbaa !2
				%11 = tail call i64 @fwrite(i8* getelementptr inbounds ([14 x i8], [14 x i8]* @.str.2, i64 0, i64 0), i64 13, i64 1, %struct._IO_FILE* %10) #4
				br label %if.end72

				if.else51: ; preds = %land.lhs.true, %if.end32
				%12 = load %struct._IO_FILE*, %struct._IO_FILE** @stderr, align 8, !tbaa !2
				%13 = tail call i64 @fwrite(i8* getelementptr inbounds ([15 x i8], [15 x i8]* @.str.3, i64 0, i64 0), i64 14, i64 1, %struct._IO_FILE* %12) #4
				%cmp55 = fcmp fast une double %mul35, %8
				br i1 %cmp55, label %if.then57, label %if.else61

				if.then57: ; preds = %if.else51
				%14 = load %struct._IO_FILE*, %struct._IO_FILE** @stderr, align 8, !tbaa !2
				%call60 = tail call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %14, i8* getelementptr inbounds ([24 x i8], [24 x i8]* @.str.4, i64 0, i64 0), double %mul35, double %8) #4
				br label %if.end72

				if.else61: ; preds = %if.else51
				%arrayidx63 = getelementptr inbounds [10 x double], [10 x double]* @BE, i64 0, i64 %indvars.iv
				%15 = load double, double* %arrayidx63, align 8, !tbaa !6
				%cmp64 = fcmp fast une double %sub40, %15
				br i1 %cmp64, label %if.then66, label %if.end72

				if.then66: ; preds = %if.else61
				%16 = load %struct._IO_FILE*, %struct._IO_FILE** @stderr, align 8, !tbaa !2
				%call69 = tail call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %16, i8* getelementptr inbounds ([24 x i8], [24 x i8]* @.str.5, i64 0, i64 0), double %sub40, double %15) #4
				br label %if.end72

				if.end72: ; preds = %if.then57, %if.then66, %if.else61, %if.then49
				%17 = load %struct._IO_FILE*, %struct._IO_FILE** @stderr, align 8, !tbaa !2
				%18 = tail call i64 @fwrite(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str.6, i64 0, i64 0), i64 2, i64 1, %struct._IO_FILE* %17) #4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %wide.trip.count
				br i1 %exitcond, label %for.cond.cleanup, label %for.body
				}

				; Function Attrs: nounwind
				declare i32 @fprintf(%struct._IO_FILE* nocapture, i8* nocapture readonly, ...) local_unnamed_addr #2

				; Function Attrs: nounwind
				fhahn: Can we drop the metadata from here to the end, or is it required for the test?
				steleman: I'll check, and if it works without the metadata, I'll remove it.
				fhahn: Great, thanks. On second look, I think there is no need for the test case to be that complicated. For example, do we need all those global variables, function calls, ifs and loops? AFAIK DAGCombine only looks at nodes and uses and does not have a cost model tied to loop iterations. Unless I am missing something, it should be enough to have a couple of basic blocks with fadd/fmul instructions to test this, e.g. fadd/fmul with multiple users, or the more complex patterns guarded by `if (Aggressive)` in `visitFADDForFMACombine`. A much simpler test case will make it easier to debug if it fails in the future, and also easier to understand when reviewing whether it does the right thing.
				steleman: It's not as simple as you think. DAGCombiner.cpp:9636:

				    // fold (fmul (fadd x, +1.0), y) -> (fma x, y, y)
				    // fold (fmul (fadd x, -1.0), y) -> (fma x, y, (fneg y))
				    auto FuseFADD = [&](SDValue X, SDValue Y) {
				      if (X.getOpcode() == ISD::FADD && (Aggressive || X->hasOneUse())) {
				        auto XC1 = isConstOrConstSplatFP(X.getOperand(1));
				        if (XC1 && XC1->isExactlyValue(+1.0))
				          return DAG.getNode(PreferredFusedOpcode, SL, VT, X.getOperand(0), Y, Y);
				        if (XC1 && XC1->isExactlyValue(-1.0))
				          return DAG.getNode(PreferredFusedOpcode, SL, VT, X.getOperand(0), Y,
				                             DAG.getNode(ISD::FNEG, SL, VT, Y));
				      }
				      return SDValue();
				    };

				Note: if (X.getOpcode() == ISD::FADD && (Aggressive || X->hasOneUse()))
				What we want to do here is make sure that X has more than one use, i.e. that X->hasOneUse() evaluates to false. That's the only way of testing that FMA happens when Aggressive is true. In order to make sure that X ends up having more than one use, the code needs to be complicated. This code isn't lifted from some existing program; I wrote it specifically for this purpose. If there was no need for these loops, ifs, global variables, etc., I wouldn't have written them.
				fhahn: You do not need complicated code to get multiple uses. In the function below, %mul has 2 uses and will only be combined by DAGCombine if aggressive FMA fusion is enabled (that's the pattern defined in DAGCombiner.cpp:9099):

				    define double @test(double %x, double %y, double %z) {
				      %mul = fmul fast double %x, %y
				      %add = fadd fast double %mul, %z
				      %use2 = fdiv fast double %mul, %add
				      ret double %use2
				    }
				steleman: I will be happy to include your test case in a separate and different test for Aggressive FMA, but I do not see how your test demonstrates Aggressive FMA as triggered by the compilation of a real program, written in a language that software developers realistically write software in. Also: your test does not provide any method of validating the results produced by enabling Aggressive FMA. Nor does it provide an easy method of detecting that some potential future changes have introduced a regression. For Aggressive FMA, I believe that it is important to have a real-life program that can be compiled and run, and whose results can be compared against the results produced by a different compiler, preferably on a different architecture.
				fhahn:
				> Also: your test does not provide any method of validating the results produced by enabling Aggressive FMA. Nor does it provide an easy method of detecting that some potential future changes have introduced a regression.

				Sorry, I just intended to highlight how to create a function that tests an isolated pattern that is triggered by aggressive FMA. Ideally we would have similar functions for all important patterns. Then the test should catch all relevant regressions in DAGCombine with aggressive FMA.

				> For Aggressive FMA, I believe that it is important to have a real-life program that can be compiled and run, and whose results can be compared against the results produced by a different compiler, preferably on a different architecture.

				Agreed! But for that, the LLVM test-suite is probably a better place to make sure there are no performance regressions, as it uses Clang directly, like users would; see http://llvm.org/docs/TestingGuide.html. `llvm/test` should contain small, isolated regression tests. I see why it is tempting to add a big test case like that to llvm/test, but 1) it still only guards against changes in CodeGen, while opt or clang could mess up the IR to begin with, and 2) it makes the regression test suite more fragile than it needs to be. If others are happy with the big test, I won't object, though.
				MatzeB: Yes, bigger tests are better off in the test-suite; you summed up LLVM's testing strategy nicely!
				declare i64 @strtol(i8* readonly, i8** nocapture, i32) local_unnamed_addr #2

				; Function Attrs: nounwind
				declare i64 @fwrite(i8* nocapture, i64, i64, %struct._IO_FILE* nocapture) local_unnamed_addr #3

				attributes #0 = { nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="thunderx2t99" "target-features"="+lse,+neon,+v8.1a" "unsafe-fp-math"="true" "use-soft-float"="false" }
				attributes #1 = { argmemonly nounwind }
				attributes #2 = { nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="thunderx2t99" "target-features"="+lse,+neon,+v8.1a" "unsafe-fp-math"="true" "use-soft-float"="false" }
				attributes #3 = { nounwind }
				attributes #4 = { cold }

				!llvm.module.flags = !{!0}
				!llvm.ident = !{!1}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{!"clang version 6.0.0 (http://llvm.org/git/clang.git 9f9177d3ef72580ca29e8844327f63d7aa1908af) (http://llvm.org/git/llvm.git 3e48a4f4584fcf21e300affe64eb228647f4bb13)"}
				!2 = !{!3, !3, i64 0}
				!3 = !{!"any pointer", !4, i64 0}
				!4 = !{!"omnipotent char", !5, i64 0}
				!5 = !{!"Simple C/C++ TBAA"}
				!6 = !{!7, !7, i64 0}
				!7 = !{!"double", !4, i64 0}
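The gate steleman points to in the FuseFADD snippet, `(Aggressive || X->hasOneUse())`, can be modelled outside LLVM. The following is a minimal C++ sketch, assuming a single scalar `(fadd x, c)` node summarized only by its use count. `FAddNode`, `fuse_fadd` and `unfused` are hypothetical names for this illustration, not SelectionDAG APIs, and the sketch ignores details of the real combine such as commuted operands and vector splat constants.

```cpp
// Toy model of the FuseFADD gate quoted from DAGCombiner.cpp in the review
// thread. FAddNode, fuse_fadd and unfused are hypothetical illustration
// names, not LLVM SelectionDAG APIs.
#include <cassert>
#include <cmath>

// Stand-in for an ISD::FADD node "(fadd x, c)" with a constant operand c,
// plus the number of users the node has in the DAG.
struct FAddNode {
  double x;          // plays the role of X.getOperand(0)
  double c;          // plays the role of the constant X.getOperand(1)
  unsigned numUses;  // X->hasOneUse() corresponds to numUses == 1
};

// fold (fmul (fadd x, +1.0), y) -> (fma x, y, y)
// fold (fmul (fadd x, -1.0), y) -> (fma x, y, (fneg y))
// Returns true and writes the fused value to *out when the fold fires.
bool fuse_fadd(const FAddNode &X, double y, bool aggressive, double *out) {
  assert(out != nullptr);
  // Same condition as the quoted code: a multi-use fadd is only fused
  // when aggressive FMA fusion is enabled.
  if (!(aggressive || X.numUses == 1))
    return false;
  if (X.c == +1.0) {
    *out = std::fma(X.x, y, y);
    return true;
  }
  if (X.c == -1.0) {
    *out = std::fma(X.x, y, -y);
    return true;
  }
  return false;  // other constants are not handled by this fold
}

// Unfused reference value: (fmul (fadd x, c), y).
double unfused(const FAddNode &X, double y) { return (X.x + X.c) * y; }
```

With two users and `aggressive == false` the fold is skipped, which is the situation the large test is built to provoke. For exactly representable inputs such as x = 3, y = 5, the fused and unfused values agree (20 for the +1.0 fold, 10 for the -1.0 fold).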

test/CodeGen/AArch64/fma-simple.ll

				; RUN: llc -O2 -mtriple=aarch64-none-linux-gnu -mcpu=thunderx2t99 -fp-contract=fast < %s | FileCheck %s --check-prefix=CHECK-FMA
				; RUN: llc -O2 -mtriple=aarch64-none-linux-gnu -mcpu=generic < %s | FileCheck %s --check-prefix=CHECK-GENERIC
				define double @test(double %x, double %y, double %z) {
				fhahn: Please also add a test that enables aggressive FMA fusion using mattr for non-ThunderX2 CPUs.
				; CHECK-FMA: fmul d3, d0, d1
				; CHECK-FMA: fmadd d0, d0, d1, d2
				; CHECK-GENERIC: fmul d0, d0, d1
				; CHECK-GENERIC: fadd d1, d0, d2
				%mul = fmul fast double %x, %y
				%add = fadd fast double %mul, %z
				%use2 = fdiv fast double %mul, %add
				ret double %use2
				}
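fhahn's `@test` function isolates the pattern where the `fmul` result has two users. As a rough scalar model, assuming plain C++ doubles stand in for the `d` registers, the two CHECK sequences above can be mirrored as follows. `test_generic` and `test_aggressive` are hypothetical names, and this only illustrates the shape of the code; it is not what llc emits.

```cpp
// Scalar model of the @test function in fma-simple.ll. Function names are
// hypothetical; the comments map each step to the CHECK lines above.
#include <cassert>
#include <cmath>

// CHECK-GENERIC shape: the fadd consumes the fmul result, so the fdiv
// depends on the chain fmul -> fadd.
double test_generic(double x, double y, double z) {
  double mul = x * y;    // fmul d0, d0, d1
  double add = mul + z;  // fadd d1, d0, d2
  return mul / add;      // %use2 = fdiv %mul, %add
}

// CHECK-FMA shape: %mul has two users, yet with aggressive fusion the fadd
// is still folded into an fma. The product is kept for the fdiv, so x * y
// is effectively computed twice, but the fmadd no longer depends on the fmul.
double test_aggressive(double x, double y, double z) {
  double mul = x * y;              // fmul d3, d0, d1
  double add = std::fma(x, y, z);  // fmadd d0, d0, d1, d2
  return mul / add;
}
```

For exactly representable inputs both shapes agree, e.g. (3, 5, 5) gives 0.75. In general the fused form skips the rounding of x * y before the addition, so the two can differ in the last bits; that is why the fold is only done under relaxed FP semantics such as the `fast` flags and `-fp-contract=fast` used in the RUN lines.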

Revision Contents

Diff 126011

lib/Target/AArch64/AArch64ISelLowering.h

lib/Target/AArch64/AArch64ISelLowering.cpp

test/CodeGen/AArch64/fma-aggressive.ll

test/CodeGen/AArch64/fma-simple.ll
