This is an archive of the discontinued LLVM Phabricator instance.

Differential D40696

Enable aggressive FMA on T99 and provide AArch64 option for other micro-arch's
ClosedPublic

Authored by steleman on Nov 30 2017, 5:19 PM.

Details

Reviewers

joel_k_jones
fhahn
evandro
mcrosier

Commits

rG0715092c65ba: [AArch64] Enable aggressive FMA on T99 and provide AArch64 options for others.
rL323474: [AArch64] Enable aggressive FMA on T99 and provide AArch64 options for others.

Summary

This patch enables aggressive FMA by default on T99, and provides a -mllvm option to enable the same on other AArch64 micro-arch's (-mllvm -aarch64-enable-aggressive-fma).

Test case demonstrating the effects on T99 is included.

Diff Detail

Repository: rL LLVM

Event Timeline

steleman created this revision. Nov 30 2017, 5:19 PM

Herald added subscribers: kristof.beyls, javed.absar, rengolin, aemerson. Nov 30 2017, 5:20 PM

steleman mentioned this in D40177: performance improvements for ThunderX2 T99. Nov 30 2017, 5:23 PM

My points from D40177 still stand.

In D40696#941537, @MatzeB wrote:

My points from D40177 still stand.

Your points from D40177 require unnecessary changes to too many files, and do not prevent changes to AArch64ISelLowering.cpp whenever a new micro-arch wants to enable aggressive FMA.

Furthermore, every single time a new micro-arch wants to enable aggressive FMA, they will always have to edit two TU's instead of just one: AArch64.td and AArch64ISelLowering.cpp, instead of just adding a new line to the existing switch() statement in AArch64TargetLowering::enableAggressiveFMAFusion().

In D40696#941556, @steleman wrote:

In D40696#941537, @MatzeB wrote:

My points from D40177 still stand.

Your points from D40177 require unnecessary changes to too many files, and do not prevent changes to AArch64ISelLowering.cpp whenever a new micro-arch wants to enable aggressive FMA.

Furthermore, every single time a new micro-arch wants to enable aggressive FMA, they will always have to edit two TU's instead of just one: AArch64.td and AArch64ISelLowering.cpp, instead of just adding a new line to the existing switch() statement in AArch64TargetLowering::enableAggressiveFMAFusion().

You would not need to edit AArch64ISelLowering again after adding a subtargetfeature.
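
The point about a subtarget feature can be illustrated outside of LLVM. The following is a minimal sketch, not LLVM code: `Subtarget`, `makeSubtarget` and `wantsAggressiveFMA` are hypothetical names, and the feature sets given to each CPU are illustrative only. It shows that once the CPU definition carries the feature, the lowering-side predicate only tests the feature and never needs to name a CPU.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model of a subtarget: a bag of feature names.
struct Subtarget {
  std::set<std::string> features;
  bool hasFeature(const std::string &name) const {
    return features.count(name) != 0;
  }
};

// Hypothetical stand-in for the CPU definitions (the role AArch64.td plays).
// A new micro-arch that wants the behaviour is edited here and nowhere else.
Subtarget makeSubtarget(const std::string &cpu) {
  Subtarget st;
  st.features.insert("neon"); // illustrative baseline feature
  if (cpu == "thunderx2t99")
    st.features.insert("aggressive-fma");
  return st;
}

// Hypothetical stand-in for the lowering hook: it checks the feature only,
// so it does not change when another CPU opts in.
bool wantsAggressiveFMA(const Subtarget &st) {
  return st.hasFeature("aggressive-fma");
}
```

In this model a user-supplied attribute would simply insert `"aggressive-fma"` into the set for any CPU, which is the per-invocation override the feature approach gives for free.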

fhahn added a subscriber: fhahn. Dec 1 2017, 1:27 AM

fhahn added inline comments. Dec 1 2017, 8:00 AM

test/CodeGen/AArch64/fma-aggressive.ll
2	I think it would be good to also add a check that aggressive FMA is not enabled for other micro architectures.
324	Can we drop the meta data from here to the end or is it required for the test?

steleman added inline comments. Dec 1 2017, 8:09 AM

test/CodeGen/AArch64/fma-aggressive.ll
2	OK I'll add another test named 'fma-not-aggressive.ll' :-)
324	I'll check and if it works without the metadata, I'll remove it.

fhahn added inline comments. Dec 1 2017, 9:21 AM

test/CodeGen/AArch64/fma-aggressive.ll
2	Thanks. You should be able to just use a `check-prefix` in the same file, e.g. as in https://github.com/llvm-mirror/llvm/blob/master/test/CodeGen/AArch64/preferred-function-alignment.ll
324	Great thanks. On second look, I think there is no need for the test case to be that complicated, for example, do we need all those global variables, function calls, ifs, loops? AFAIK DagCombine only looks at nodes and uses and does not have a cost model tied to loop iterations. Unless I am missing something, it should be enough to have a couple of basic blocks with fadd/fmul instructions to test this, e.g. fadd/fmul with multiple users, or the more complex patterns guarded by `if (Aggressive)` in `visitFADDForFMACombine`. A much simpler test case will make it easier to debug if it fails in the future and also easier to understand when reviewing if it does the right thing.

steleman added inline comments. Dec 4 2017, 7:54 AM

test/CodeGen/AArch64/fma-aggressive.ll

324

It's not as simple as you think.

DAGCombiner.cpp:9636

// fold (fmul (fadd x, +1.0), y) -> (fma x, y, y)
// fold (fmul (fadd x, -1.0), y) -> (fma x, y, (fneg y))
auto FuseFADD = [&](SDValue X, SDValue Y) {
  if (X.getOpcode() == ISD::FADD && (Aggressive || X->hasOneUse())) {
    auto XC1 = isConstOrConstSplatFP(X.getOperand(1));
    if (XC1 && XC1->isExactlyValue(+1.0))
      return DAG.getNode(PreferredFusedOpcode, SL, VT, X.getOperand(0), Y, Y);
    if (XC1 && XC1->isExactlyValue(-1.0))
      return DAG.getNode(PreferredFusedOpcode, SL, VT, X.getOperand(0), Y,
                         DAG.getNode(ISD::FNEG, SL, VT, Y));
  }
  return SDValue();
};

Note:

if (X.getOpcode() == ISD::FADD && (Aggressive || X->hasOneUse()))

What we want to do here is make sure that X has more than one use. I.e. X->hasOneUse() evaluates to false. That's the only way of testing that FMA happens when Aggressive is true.

In order to make sure that X ends up having more than one use, the code needs to be complicated.

This code isn't lifted from some existing program. I wrote it specifically for this purpose. If there was no need for these loops, ifs, global variables, etc, I wouldn't have written them.
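
The guard discussed in this comment can be modelled in isolation. This is a minimal sketch under stated assumptions, not the DAGCombiner code: `Node`, `Opcode` and `canFuseFAdd` are hypothetical stand-ins for `SDValue`, `ISD` opcodes and the condition inside the quoted lambda.

```cpp
#include <cassert>

// Hypothetical stand-in for a DAG node: only the facts the guard inspects.
enum class Opcode { FAdd, FMul, Other };

struct Node {
  Opcode opcode;
  unsigned numUses; // how many other nodes consume this value
  bool hasOneUse() const { return numUses == 1; }
};

// Mirrors the shape of
//   X.getOpcode() == ISD::FADD && (Aggressive || X->hasOneUse())
bool canFuseFAdd(const Node &x, bool aggressive) {
  return x.opcode == Opcode::FAdd && (aggressive || x.hasOneUse());
}
```

With a single use the guard passes in both modes, so only an operand with two or more uses distinguishes aggressive from non-aggressive fusion. That is the property a test for this feature has to set up.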

fhahn added inline comments. Dec 4 2017, 8:29 AM

test/CodeGen/AArch64/fma-aggressive.ll
324	You do not need complicated code to get multiple uses. In the function below, %mul has 2 uses and will only be combined by DAGCombine if aggressive FMA fusion is enabled (that's the pattern defined in DAGCombiner.cpp:9099):

define double @test(double %x, double %y, double %z) {
  %mul = fmul fast double %x, %y
  %add = fadd fast double %mul, %z
  %use2 = fdiv fast double %mul, %add
  ret double %use2
}

On a related note, the MachineCombiner ignores enableAggressiveFMAFusion, so at -O3, the example won't be combined. I plan to put up a patch soon that updates the machine combiner to consider combining multiple uses, if profitable.

steleman added inline comments. Dec 5 2017, 9:11 AM

test/CodeGen/AArch64/fma-aggressive.ll
324	I will be happy to include your test case in a separate, and different test for Aggressive FMA, but I do not see how your test demonstrates Aggressive FMA as triggered by the compilation of a real program, written in a language that software developers realistically write software in. Also: your test does not provide any method of validating the results produced by enabling Aggressive FMA. Nor does it provide an easy method of detecting that some potential future changes have introduced a regression. For Aggressive FMA, I believe that it is important to have a real-life program that can be compiled and run, and whose results can be compared against the results produced by a different compiler, preferably on a different architecture.
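
Why validating numerical results matters here can be shown with a small, self-contained sketch. It is not taken from the patch or its tests: `mulThenAdd`, `fused` and `sampleInput` are hypothetical helpers, and the input value is chosen by hand so that rounding once (fused) and rounding twice (separate multiply and add) give different doubles.

```cpp
#include <cassert>
#include <cmath>

// Unfused: the product is rounded to double before the addition.
// `volatile` keeps the compiler from contracting the two operations itself.
double mulThenAdd(double a, double b, double c) {
  volatile double product = a * b;
  return product + c;
}

// Fused: std::fma computes a * b + c with a single rounding at the end.
double fused(double a, double b, double c) {
  return std::fma(a, b, c);
}

// a = 1 + 2^-27, so a * a = 1 + 2^-26 + 2^-54 exactly; the last term is
// lost when the product is rounded to double on its own.
double sampleInput() {
  return 1.0 + std::ldexp(1.0, -27);
}
```

For this input, `mulThenAdd(a, a, -1.0)` yields 2^-26 while `fused(a, a, -1.0)` yields 2^-26 + 2^-54. Differences of this kind are what a run-time comparison against another compiler or architecture would surface.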

Updated large test case to use --check-prefix={CHECK-FMA|CHECK-GENERIC}.
Included Florian Hahn's small test case for FMA.
Metadata in the large LLVM IR test case is required.

Thanks! I agree with Matthias in that a target feature would be better to enable/disable aggressive FMA. In AArch64.td there are already quite a lot of similar target features, like FeatureFuseAES or FeatureSlowSTRQro that enable certain optimizations for some micro architectures.

test/CodeGen/AArch64/fma-aggressive.ll
324	"Also: your test does not provide any method of validating the results produced by enabling Aggressive FMA. Nor does it provide an easy method of detecting that some potential future changes have introduced a regression."

Sorry, I just intended to highlight how to create a function that tests an isolated pattern that is triggered by aggressive FMA. Ideally we would have similar functions for all important patterns. Then the test should catch all relevant regressions in DAGCombine with aggressive FMA.

"For Aggressive FMA, I believe that it is important to have a real-life program that can be compiled and run, and whose results can be compared against the results produced by a different compiler, preferably on a different architecture."

Agreed! But for that, the LLVM test-suite is probably a better place to make sure there are no performance regressions as it uses Clang directly, like users would, see http://llvm.org/docs/TestingGuide.html. `llvm/test` should contain small, isolated regression tests. I see why it is tempting to add a big test case like that to llvm/test, but 1) it still only guards against changes in Codegen, but opt or clang could mess up the IR to begin with and 2) it makes the regression test suite more fragile than it needs to be. If others are happy with the big test, I won't object though.

MatzeB added inline comments. Dec 7 2017, 1:02 PM

test/CodeGen/AArch64/fma-aggressive.ll
324	Yes, bigger tests are better off in the test-suite; you summed up LLVM's testing strategy nicely!

steleman added a child revision: D41810: test case for Aggressive FMA on AArch64. Jan 7 2018, 11:56 AM

Updated diff with latest changes:

Added FeatureAggressiveFMAFloat and FeatureAggressiveFMADouble in AArch64.td as micro-arch features.
Added EnableAggressiveFMA cl::opt in AArch64ISelLowering.cpp.
Implemented AArch64TargetLowering::enableAggressiveFMAFusion in terms of the changes from AArch64.td.
I would like the test in test/CodeGen/AArch64/fma-aggressive.ll to be included in this changeset, simply because the run-time test from D41810 has no way of determining whether or not AggressiveFMA was enabled.

Ping!

It looks like the latest changes address the earlier concerns expressed in the review comments:

Use target feature: Present in AArch64.td, AArch64ISelLowering.h, AArch64ISelLowering.cpp, and AArch64Subtarget.h
Simplified test case: Present in test/CodeGen/AArch64/fma-simple.ll
Combined optimized/missed in .ll tests: Present in fma-aggressive.ll and fma-simple.ll

Is there anything else that is needed?

Thanks for updating the patch! I have a few suggestions about reducing fma-aggressive.ll , but besides that and removing aarch64-enable-aggressive-fma it looks quite good.

I would like the test in test/CodeGen/AArch64/fma-aggressive.ll to be included in this changeset, simply because the run-time test from D41810 has no way of determining whether or not AggressiveFMA was enabled.

It's fine to have a more complex test with more patterns than test/CodeGen/AArch64/fma-simple.ll, but test/CodeGen/AArch64/fma-aggressive.ll still contains lots of stuff unrelated to the FMA fusion, e.g. define double @reset(double %x, double %y) local_unnamed_addr #0 is not used, the attributes and debug metadata are not needed, the global variables are not needed, the calls to fprintf/fwrite and so on are not needed, and the loop should not have an impact on the DAGCombiner.

I think the following test function should still expose patterns very similar to the original fma-aggressive.ll, but only focuses on the relevant bits for fusion:

define double @test1(double %a, double %b, double %conv, i32 %rem) {
entry:
  %conv.neg = fsub fast double -0.000000e+00, %conv
  %cmp3 = icmp eq i32 %rem, 0
  %. = select i1 %cmp3, double 1.000000e+00, double -1.000000e+00
  %add = fadd fast double %a, 1.000000e+00
  %add8 = fadd fast double %add, %.
  %mul = fmul fast double %add8, %a
  %add9 = fadd fast double %mul, 1.000000e+01
  %mul10 = fmul fast double %add9, 2.000000e+00
  %add11 = fsub fast double %conv.neg, %mul
  %sub = fadd fast double %add11, %mul10
  %mul14 = fmul fast double %sub, %mul
  %mul15 = fmul fast double %mul14, %.
  %sub17 = fadd fast double %mul15, %mul14
  br i1 %cmp3, label %if.then22, label %if.else27

if.then22:
  %add23 = fadd fast double %sub17, -1.000000e+00
  %sub24 = fadd fast double %sub17, 1.000000e+00
  %mul25 = fmul fast double %add23, %sub24
  %sub26 = fsub fast double 1.000000e+00, %mul25
  br label %if.end32

if.else27:
  %sub28 = fsub fast double -1.000000e+00, %sub17
  %sub29 = fadd fast double %sub17, 1.000000e+00
  %mul30 = fmul fast double %sub28, %sub29
  %add31 = fadd fast double %mul30, 1.000000e+00
  br label %if.end32

if.end32:                                         ; preds = %if.else27, %if.then22
  %b.1 = phi double [ %sub26, %if.then22 ], [ %add31, %if.else27 ]
  %sub33 = fsub fast double 1.000000e+00, %sub17
  %mul34 = fmul fast double %sub17, %conv
  %mul35 = fmul fast double %mul34, %sub33
  %sub36 = fsub fast double %b.1, %.
  %add37 = fadd fast double %b.1, %mul35
  %add38 = fadd fast double %add37, %sub36
  ret double %add38
}

If you think this is not sufficient for some reason, please at least remove the unneeded metadata, unused functions from the test case, replace the calls to fprintf & co that are needed to have uses for the FMA results with simpler functions (or return the result) and replace the globals by adding parameters if required. Also the other ifs should not matter to the combiner either, so I would simplify the CFG there as well.

Simplified test case: Present in test/CodeGen/AArch64/fma-simple.ll

Pending simplification of fma-aggressive.ll, I would move this test function to fma-aggressive.ll too.

lib/Target/AArch64/AArch64.td
152	How likely is it that aggressive FMA is beneficial for floats or doubles only? IMO it's probably enough to just have `-aggressive-fma` for now.
lib/Target/AArch64/AArch64ISelLowering.cpp
116	I think there is no need for this option any longer, `-mattr=+aggressive-fma-float,+aggressive-fma-double` can be used instead.
10977	If we are only using subtarget features here, we can move this to the header I think.
test/CodeGen/AArch64/fma-simple.ll
2 ↗	(On Diff #128884)	Please also add a test that enables aggressive fma fusion using mattr for non-thunderX2.

steleman added inline comments. Jan 22 2018, 8:26 AM

lib/Target/AArch64/AArch64.td
152	Could you please clarify this? I.e. are you referring to quad-precision? T99 doesn't have quad-precision FMA. Only floats and doubles. Are there any AArch64 micro-arch'es that can do quad-precision FMA? Or are you asking about SIMD?

fhahn added inline comments. Jan 22 2018, 8:31 AM

lib/Target/AArch64/AArch64.td
152	Sorry, I meant I think a single option to enable aggressive FMA would be enough and separate options for float and double seems too fine grained, unless I am missing something :)

steleman added inline comments. Jan 22 2018, 8:42 AM

lib/Target/AArch64/AArch64.td
152	Aaah, OK. Got it. Here's my thinking: There may be some AArch64 micro-arch'es that would want to enable FMA for floats, but not doubles. Or doubles, but not floats. Because of latency, or some other constraint. So the idea is to allow this differentiation. Now that FMA is a Subtarget feature, if we don't have this differentiation between floats and doubles, then we're back to the original implementation: the micro-arch Subtarget will have to switch() on the CPU and EVT type in AArch64ISelLowering.cpp (AArch64TargetLowering::enableAggressiveFMAFusion()). That was frowned upon, which is why I created these two FMA Subtarget features. And then there's also FMLA and FMLS. For T99 it doesn't matter because all the FMA instructions have the same latency. But I'm not sure this applies to all the other AArch64 micro-arch'es.

fhahn added inline comments. Jan 22 2018, 9:06 AM

lib/Target/AArch64/AArch64.td
152	I see what you mean, I just think that we should make our lives more complicated only when there actually are micro-arch'es that want to enable aggressive FMA for double or float only ;)

"Now that FMA is a Subtarget feature, if we don't have this differentiation between floats and doubles, then we're back to the original implementation: the micro-arch Subtarget will have to switch() on the CPU and EVT type in AArch64ISelLowering.cpp (AArch64TargetLowering::enableAggressiveFMAFusion()). That was frowned upon, which is why I created these two FMA Subtarget features."

I am not sure I follow. I think @MatzeB's issue was using `getProcFamily()` and it should be fine to just check the subtarget feature in enableAggressiveFMAFusion. @MatzeB summarized some benefits of making it a subtarget feature here: https://reviews.llvm.org/D40177#936974

I am not sure I follow. I think @MatzeB 's issue was using getProcFamily() and it should be fine to just check the subtarget feature in enableAggressiveFMAFusion. @MatzeB summarized some benefits of making it a subtarget feature here: https://reviews.llvm.org/D40177#936974

If there's only one EnableAggressiveFMA boolean in AArch64.td:

What happens if a certain micro-arch wants to enable AggressiveFMA for scalar doubles and vector doubles, but not for scalar floats and vector floats? (as an example).

Won't they have to call getProcFamily() in enableAggressiveFMAFusion()?

As there is only one boolean in AArch64.td, AggressiveFMA will be either enabled for all the floating-point types, or disabled for all the floating-point types, scalars or vectors. The only way of knowing which particular floating-point type is being tested for AggressiveFMA is by testing the type of the EVT, and then making a decision based on getProcFamily() and EVT type.

Is it safe to speculate that each and every AArch64 micro-arch wants AggressiveFMA enabled or disabled for all possible floating-point types?

In D40696#984012, @steleman wrote:

I am not sure I follow. I think @MatzeB 's issue was using getProcFamily() and it should be fine to just check the subtarget feature in enableAggressiveFMAFusion. @MatzeB summarized some benefits of making it a subtarget feature here: https://reviews.llvm.org/D40177#936974

If there's only one EnableAggressiveFMA boolean in AArch64.td:

What happens if a certain micro-arch wants to enable AggressiveFMA for scalar doubles and vector doubles, but not for scalar floats and vector floats? (as an example).

Won't they have to call getProcFamily() in enableAggressiveFMAFusion()?

As there is only one boolean in AArch64.td, AggressiveFMA will be either enabled for all the floating-point types, or disabled for all the floating-point types, scalars or vectors. The only way of knowing which particular floating-point type is being tested for AggressiveFMA is by testing the type of the EVT, and then making a decision based on getProcFamily() and EVT type.

Is it safe to speculate that each and every AArch64 micro-arch wants AggressiveFMA enabled or disabled for all possible floating-point types?

My 2 cents: In general, we could think of probably thousands of subtarget features describing how different potential micro-architectures might handle specific instructions or sequences of instructions.
Implementing all of those thousands of subtarget features without knowing that it actually makes a difference for any of the micro-architectures that LLVM does know about doesn't make sense.
Following that same logic, I'd choose to have 1 subtarget feature introduced in this patch, assuming IIUC that you only really need one for the functionality you introduce in this patch.
If in the future it becomes clear there is a micro-architecture for which in LLVM we want to model different behaviour with respect to AggressiveFMA for floats, doubles, vectors, or something else, we can introduce new subtarget features then.
Introducing them now, without there being a demonstrated need for it, feels a bit like premature optimization to me.
I hope that makes sense?
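
The single-feature design argued for above can be sketched as follows. This is an illustrative model only: `ValueType`, `isFloatingPoint`, `Subtarget` and `enableAggressiveFusion` are hypothetical stand-ins, not the LLVM `EVT` or `AArch64Subtarget` classes. One boolean covers every floating-point type, and the value type is still available to the predicate if a finer split is ever needed.

```cpp
#include <cassert>

// Hypothetical value types; the real code receives an EVT.
enum class ValueType { I32, F32, F64, V4F32, V2F64 };

bool isFloatingPoint(ValueType vt) {
  return vt != ValueType::I32;
}

// Hypothetical subtarget with the single flag kept by the final patch.
struct Subtarget {
  bool hasAggressiveFMA = false;
};

// One feature bit decides for all floating-point types, scalar or vector.
// A later per-type feature would add another flag and test `vt` here,
// without consulting the CPU family.
bool enableAggressiveFusion(const Subtarget &st, ValueType vt) {
  return st.hasAggressiveFMA && isFloatingPoint(vt);
}
```

The design choice is to defer the extra flags until a micro-architecture demonstrates a need, since adding one later only touches the feature list and this predicate.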

Updated per latest comments:

Removed AggressiveFMAFloat and AggressiveFMADouble from AArch64.td.
Created FeatureAggressiveFMA in AArch64.td.
Merged test cases into one, including the IR test case from Florian Hahn.

LGTM with 2 small comments, I think all comments should be addressed now, thanks! Please wait with committing for another day or so, to give people some time to raise additional concerns.

When committing, please adjust the commit title & message to mention the target feature rather than the option. And it would be great if you could prefix the title again with [AArch64]

lib/Target/AArch64/AArch64.td
153	Just `aggressive-fma` seems more in line with the existing target feature names. `+aggressive-fma` enables the target feature, `-aggressive-fma` explicitly disables it.
409	nit: move it to the other FeatureXXX in the list.

This revision is now accepted and ready to land. Jan 24 2018, 1:38 AM

Updated per latest comments.

Closed by commit rL323474: [AArch64] Enable aggressive FMA on T99 and provide AArch64 options for others. (authored by joel_k_jones). Jan 25 2018, 1:57 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                          Size
lib/Target/AArch64/AArch64.td                 9 lines
lib/Target/AArch64/AArch64ISelLowering.h      3 lines
lib/Target/AArch64/AArch64ISelLowering.cpp    4 lines
lib/Target/AArch64/AArch64Subtarget.h         2 lines
test/CodeGen/AArch64/fma-aggressive.ll        103 lines

Diff 131192

lib/Target/AArch64/AArch64.td

Context not available.

 def FeatureDotProd : SubtargetFeature<
     "dotprod", "HasDotProd", "true",
     "Enable dot product support">;

 def FeatureNoNegativeImmediates : SubtargetFeature<"no-neg-immediates",
     "NegativeImmediates", "false",
     "Convert immediates and instructions "
     "to their negated or complemented "
     "equivalent when the immediate does "
     "not fit in the encoding.">;

 def FeatureLSLFast : SubtargetFeature<
     "lsl-fast", "HasLSLFast", "true",
     "CPU has a fastpath logical shift of up to 3 places">;

+def FeatureAggressiveFMA :
+  SubtargetFeature<"enable-aggressive-fma",
+                   "HasAggressiveFMA",
+                   "true",
+                   "Enable Aggressive FMA for floating-point.">;
+
 //===----------------------------------------------------------------------===//
 // Architectures.
 //

 def HasV8_1aOps : SubtargetFeature<"v8.1a", "HasV8_1aOps", "true",
   "Support ARM v8.1a instructions", [FeatureCRC, FeatureLSE, FeatureRDM]>;

 def HasV8_2aOps : SubtargetFeature<"v8.2a", "HasV8_2aOps", "true",
 (16 lines not shown)
     FeatureCRC,
     FeatureCrypto,
     FeatureFPARMv8,
     FeatureArithmeticBccFusion,
     FeatureNEON,
     FeaturePostRAScheduler,
     FeaturePredictableSelectIsExpensive,
     FeatureLSE,
-    HasV8_1aOps]>;
+    HasV8_1aOps,
+    FeatureAggressiveFMA]>;

 def ProcThunderX : SubtargetFeature<"thunderx", "ARMProcFamily", "ThunderX",
     "Cavium ThunderX processors", [
     FeatureCRC,
     FeatureCrypto,
     FeatureFPARMv8,
     FeaturePerfMon,
     FeaturePostRAScheduler,
     FeaturePredictableSelectIsExpensive,
     FeatureNEON]>;

 def ProcThunderXT88 : SubtargetFeature<"thunderxt88", "ARMProcFamily",
     "ThunderXT88",
     "Cavium ThunderX processors", [
     FeatureCRC,
     FeatureCrypto,
Context not available.

lib/Target/AArch64/AArch64ISelLowering.h

Context not available.
     return VT == MVT::f32 || VT == MVT::f64;
   }

   bool supportSplitCSR(MachineFunction *MF) const override {
     return MF->getFunction().getCallingConv() == CallingConv::CXX_FAST_TLS &&
            MF->getFunction().hasFnAttribute(Attribute::NoUnwind);
   }
   void initializeSplitCSR(MachineBasicBlock *Entry) const override;
   void insertCopiesSplitCSR(
       MachineBasicBlock *Entry,
       const SmallVectorImpl<MachineBasicBlock *> &Exits) const override;

   bool supportSwiftError() const override {
     return true;
   }

+  /// Enable aggressive FMA fusion on targets that want it.
+  bool enableAggressiveFMAFusion(EVT VT) const override;
+
   /// Returns the size of the platform's va_list object.
   unsigned getVaListSizeInBits(const DataLayout &DL) const override;

   /// Returns true if \p VecTy is a legal interleaved access type. This
   /// function checks the vector element type and the overall width of the
   /// vector.
   bool isLegalInterleavedAccessType(VectorType *VecTy,
                                     const DataLayout &DL) const;

   /// Returns the number of interleaved accesses that will be generated when
   /// lowering accesses of the given type.
   unsigned getNumInterleavedAccesses(VectorType *VecTy,
                                      const DataLayout &DL) const;

   MachineMemOperand::Flags getMMOFlags(const Instruction &I) const override;

Context not available.

lib/Target/AArch64/AArch64ISelLowering.cpp

Context not available.
	}	}
	}	}

		fhahnUnsubmitted Not Done Reply Inline Actions If we are only using subtarget features here, we can move this to the header I think. fhahn: If we are only using subtarget features here, we can move this to the header I think.
 bool AArch64TargetLowering::isIntDivCheap(EVT VT, AttributeList Attr) const {
   // Integer division on AArch64 is expensive. However, when aggressively
   // optimizing for code size, we prefer to use a div instruction, as it is
   // usually smaller than the alternative sequence.
   // The exception to this is vector division. Since AArch64 doesn't have vector
   // integer division, leaving the division as-is is a loss even in terms of
   // size, because it will have to be scalarized, while the alternative code
   // sequence can be performed in vector form.
   bool OptSize =
       Attr.hasAttribute(AttributeList::FunctionIndex, Attribute::MinSize);
   return OptSize && !VT.isVector();
 }

+bool AArch64TargetLowering::enableAggressiveFMAFusion(EVT VT) const {
+  return Subtarget->hasAggressiveFMA() && VT.isFloatingPoint();
+}

 unsigned
 AArch64TargetLowering::getVaListSizeInBits(const DataLayout &DL) const {
   if (Subtarget->isTargetDarwin() || Subtarget->isTargetWindows())
     return getPointerTy(DL).getSizeInBits();

   return 3 * getPointerTy(DL).getSizeInBits() + 2 * 32;
 }

 void AArch64TargetLowering::finalizeLowering(MachineFunction &MF) const {
   MF.getFrameInfo().computeMaxCallFrameSize(MF);
   TargetLoweringBase::finalizeLowering(MF);
 }

lib/Target/AArch64/AArch64Subtarget.h

   bool HasV8_3aOps = false;

   bool HasFPARMv8 = false;
   bool HasNEON = false;
   bool HasCrypto = false;
   bool HasDotProd = false;
   bool HasCRC = false;
   bool HasLSE = false;
   bool HasRAS = false;
   bool HasRDM = false;
   bool HasPerfMon = false;
   bool HasFullFP16 = false;
   bool HasSPE = false;
   bool HasLSLFast = false;
   bool HasSVE = false;
   bool HasRCPC = false;
+  bool HasAggressiveFMA = false;

   // HasZeroCycleRegMove - Has zero-cycle register mov instructions.
   bool HasZeroCycleRegMove = false;

   // HasZeroCycleZeroing - Has zero-cycle zeroing instructions.
   bool HasZeroCycleZeroing = false;
   bool HasZeroCycleZeroingFPWorkaround = false;

 (16 lines not shown)
   bool supportsAddressTopByteIgnored() const;

   bool hasPerfMon() const { return HasPerfMon; }
   bool hasFullFP16() const { return HasFullFP16; }
   bool hasSPE() const { return HasSPE; }
   bool hasLSLFast() const { return HasLSLFast; }
   bool hasSVE() const { return HasSVE; }
   bool hasRCPC() const { return HasRCPC; }
+  bool hasAggressiveFMA() const { return HasAggressiveFMA; }

   bool isLittleEndian() const { return IsLittle; }

   bool isTargetDarwin() const { return TargetTriple.isOSDarwin(); }
   bool isTargetIOS() const { return TargetTriple.isiOS(); }
   bool isTargetLinux() const { return TargetTriple.isOSLinux(); }
   bool isTargetWindows() const { return TargetTriple.isOSWindows(); }
   bool isTargetAndroid() const { return TargetTriple.isAndroid(); }
   bool isTargetFuchsia() const { return TargetTriple.isOSFuchsia(); }

   bool isTargetCOFF() const { return TargetTriple.isOSBinFormatCOFF(); }
   bool isTargetELF() const { return TargetTriple.isOSBinFormatELF(); }
   bool isTargetMachO() const { return TargetTriple.isOSBinFormatMachO(); }

   bool useAA() const override { return UseAA; }


test/CodeGen/AArch64/fma-aggressive.ll

				; RUN: llc -O2 -mtriple=aarch64-none-linux-gnu -mcpu=thunderx2t99 -fp-contract=fast < %s | FileCheck %s --check-prefix=CHECK-FMA
				; RUN: llc -O2 -mtriple=aarch64-none-linux-gnu -mcpu=generic < %s | FileCheck %s --check-prefix=CHECK-GENERIC
fhahn: I think it would be good to also add a check that aggressive FMA is not enabled for other microarchitectures.
steleman (author): OK, I'll add another test named 'fma-not-aggressive.ll' :-)
fhahn: Thanks. You should be able to just use a `check-prefix` in the same file, e.g. as in https://github.com/llvm-mirror/llvm/blob/master/test/CodeGen/AArch64/preferred-function-alignment.ll
				; RUN: llc -O2 -mtriple=aarch64-none-linux-gnu -mcpu=generic -fp-contract=fast -mattr=+enable-aggressive-fma < %s | FileCheck %s --check-prefix=CHECK-FMA-FORCED
				define double @test(double %x, double %y, double %z) {
				; CHECK-FMA: fmul d3, d0, d1
				; CHECK-FMA: fmadd d0, d0, d1, d2
				; CHECK-FMA-FORCED: fmul d3, d0, d1
				; CHECK-FMA-FORCED: fmadd d0, d0, d1, d2
				; CHECK-GENERIC: fmul d0, d0, d1
				; CHECK-GENERIC: fadd d1, d0, d2
				%mul = fmul fast double %x, %y
				%add = fadd fast double %mul, %z
				%use2 = fdiv fast double %mul, %add
				ret double %use2
				}

				define double @test1(double %a, double %b, double %conv, i32 %rem) {
				entry:
				%conv.neg = fsub fast double -0.000000e+00, %conv
				%cmp3 = icmp eq i32 %rem, 0
				%. = select i1 %cmp3, double 1.000000e+00, double -1.000000e+00
				; CHECK-GENERIC: fmov d4, #-1.00000000
				; CHECK-GENERIC: fmov d1, #1.00000000
				; CHECK-FMA: fmov d3, #1.00000000
				; CHECK-FMA: fmov d1, #-1.00000000
				; CHECK-FMA-FORCED: fmov d4, #-1.00000000
				; CHECK-FMA-FORCED: fmov d1, #1.00000000
				%add = fadd fast double %a, 1.000000e+00
				%add8 = fadd fast double %add, %.
				%mul = fmul fast double %add8, %a
				%add9 = fadd fast double %mul, 1.000000e+01
				; CHECK-GENERIC: fneg d5, d2
				; CHECK-GENERIC: fmov d6, #10.00000000
				; CHECK-GENERIC: fmul d0, d7, d0
				; CHECK-GENERIC: fadd d6, d0, d6
				; CHECK-FMA: fmadd d7, d5, d0, d7
				; CHECK-FMA: fnmadd d0, d5, d0, d2
				; CHECK-FMA-FORCED: fmadd d5, d6, d0, d5
				; CHECK-FMA-FORCED: fnmadd d0, d6, d0, d2
				%mul10 = fmul fast double %add9, 2.000000e+00
				%add11 = fsub fast double %conv.neg, %mul
				%sub = fadd fast double %add11, %mul10
				; CHECK-GENERIC: fmov d7, #2.00000000
				; CHECK-GENERIC: fmadd d5, d6, d7, d5
				; CHECK-GENERIC: fmul d0, d5, d0
				; CHECK-FMA: fmov d5, #2.00000000
				; CHECK-FMA: fmadd d0, d7, d5, d0
				; CHECK-FMA-FORCED: fmov d6, #2.00000000
				; CHECK-FMA-FORCED: fmadd d0, d5, d6, d0
				%mul14 = fmul fast double %sub, %mul
				%mul15 = fmul fast double %mul14, %.
				%sub17 = fadd fast double %mul15, %mul14
				; CHECK-GENERIC: fmadd d0, d0, d3, d0
				; CHECK-FMA: fmul d0, d0, d6
				; CHECK-FMA: fmadd d0, d0, d4, d0
				; CHECK-FMA-FORCED: fmul d0, d0, d7
				; CHECK-FMA-FORCED: fmadd d0, d0, d3, d0
				br i1 %cmp3, label %if.then22, label %if.else27

				if.then22:
				%add23 = fadd fast double %sub17, -1.000000e+00
				%sub24 = fadd fast double %sub17, 1.000000e+00
				%mul25 = fmul fast double %add23, %sub24
				%sub26 = fsub fast double 1.000000e+00, %mul25
				; CHECK-GENERIC: fsub d4, d4, d0
				; CHECK-GENERIC: fadd d5, d0, d1
				; CHECK-GENERIC: fmadd d4, d4, d5, d1
				; CHECK-FMA: fmadd d5, d5, d6, d3
				; CHECK-FMA: fmsub d1, d1, d6, d3
				; CHECK-FMA-FORCED: fmadd d4, d4, d5, d1
				; CHECK-FMA-FORCED: b .LBB1_3
				; CHECK-FMA-FORCED: fadd d4, d0, d4
				; CHECK-FMA-FORCED: fadd d5, d0, d1
				; CHECK-FMA-FORCED: fmsub d4, d4, d5, d1
				br label %if.end32

				if.else27:
				%sub28 = fsub fast double -1.000000e+00, %sub17
				%sub29 = fadd fast double %sub17, 1.000000e+00
				%mul30 = fmul fast double %sub28, %sub29
				%add31 = fadd fast double %mul30, 1.000000e+00
				; CHECK-GENERIC: fadd d4, d0, d4
				; CHECK-GENERIC: fadd d5, d0, d1
				; CHECK-GENERIC: fmsub d4, d4, d5, d1
				; CHECK-FMA: fsub d2, d1, d4
				; CHECK-FMA: fmadd d0, d0, d3, d1
				; CHECK-FMA: fadd d0, d0, d2
				; CHECK-FMA-FORCED: fsub d1, d1, d0
				; CHECK-FMA-FORCED: fmadd d0, d0, d1, d4
				; CHECK-FMA-FORCED: fadd d0, d0, d2
				br label %if.end32

				if.end32: ; preds = %if.else27, %if.then22
				%b.1 = phi double [ %sub26, %if.then22 ], [ %add31, %if.else27 ]
				%sub33 = fsub fast double 1.000000e+00, %sub17
				%mul34 = fmul fast double %sub17, %conv
				%mul35 = fmul fast double %mul34, %sub33
				%sub36 = fsub fast double %b.1, %.
				%add37 = fadd fast double %b.1, %mul35
				%add38 = fadd fast double %add37, %sub36
				ret double %add38
				}

fhahn: Can we drop the metadata from here to the end, or is it required for the test?

steleman (author): I'll check, and if it works without the metadata, I'll remove it.

fhahn: Great, thanks. On second look, I think there is no need for the test case to be that complicated. For example, do we need all those global variables, function calls, ifs, and loops? AFAIK DAGCombine only looks at nodes and uses and does not have a cost model tied to loop iterations. Unless I am missing something, it should be enough to have a couple of basic blocks with fadd/fmul instructions to test this, e.g. fadd/fmul with multiple users, or the more complex patterns guarded by `if (Aggressive)` in `visitFADDForFMACombine`. A much simpler test case will be easier to debug if it fails in the future, and easier to understand when reviewing whether it does the right thing.

steleman (author): It's not as simple as you think.

DAGCombiner.cpp:9636

```
// fold (fmul (fadd x, +1.0), y) -> (fma x, y, y)
// fold (fmul (fadd x, -1.0), y) -> (fma x, y, (fneg y))
auto FuseFADD = [&](SDValue X, SDValue Y) {
  if (X.getOpcode() == ISD::FADD && (Aggressive || X->hasOneUse())) {
    auto XC1 = isConstOrConstSplatFP(X.getOperand(1));
    if (XC1 && XC1->isExactlyValue(+1.0))
      return DAG.getNode(PreferredFusedOpcode, SL, VT, X.getOperand(0), Y, Y);
    if (XC1 && XC1->isExactlyValue(-1.0))
      return DAG.getNode(PreferredFusedOpcode, SL, VT, X.getOperand(0), Y,
                         DAG.getNode(ISD::FNEG, SL, VT, Y));
  }
  return SDValue();
};
```

Note:

```
if (X.getOpcode() == ISD::FADD && (Aggressive || X->hasOneUse()))
```

What we want to do here is make sure that X has more than one use, i.e. that X->hasOneUse() evaluates to false. That's the only way of testing that FMA happens when Aggressive is true.

In order to make sure that X ends up having more than one use, the code needs to be complicated.

This code isn't lifted from some existing program. I wrote it specifically for this purpose. If there was no need for these loops, ifs, global variables, etc., I wouldn't have written them.
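
For readers following the thread, the guard quoted above can be modeled outside LLVM. The following is a minimal sketch, not the DAGCombiner code: `ToyNode`, `wouldFuseFAdd`, `fusedPlusOne`, and `fusedMinusOne` are hypothetical names invented for illustration, and the use count is supplied by hand rather than derived from a real SelectionDAG. It shows only that a multi-use fadd is folded when the aggressive flag is set, and that the folded forms compute the same value as the original expression for exactly representable inputs.

```cpp
#include <cassert>
#include <cmath>

// Toy stand-in for an SDValue: just enough state to model the quoted check.
// These names are hypothetical and are not LLVM's real data structures.
enum class Opcode { FAdd, Other };

struct ToyNode {
  Opcode Op;        // operation that produced this value
  double Constant;  // right-hand constant operand of an FAdd (e.g. +1.0)
  unsigned NumUses; // how many other nodes consume this value
};

// Models `X.getOpcode() == ISD::FADD && (Aggressive || X->hasOneUse())`
// followed by the +1.0 / -1.0 constant test from the quoted lambda.
bool wouldFuseFAdd(const ToyNode &X, bool Aggressive) {
  assert(X.NumUses >= 1 && "a value being combined has at least one user");
  if (X.Op != Opcode::FAdd)
    return false;
  if (!Aggressive && X.NumUses != 1)
    return false;
  return X.Constant == 1.0 || X.Constant == -1.0;
}

// The rewritten forms:
//   (x + 1.0) * y  ->  fma(x, y, y)
//   (x - 1.0) * y  ->  fma(x, y, -y)
double fusedPlusOne(double X, double Y) { return std::fma(X, Y, Y); }
double fusedMinusOne(double X, double Y) { return std::fma(X, Y, -Y); }
```

With two uses, `wouldFuseFAdd` returns true only when `Aggressive` is true, which is the situation the test is meant to exercise.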

fhahn: You do not need complicated code to get multiple uses. In the function below, %mul has 2 uses and will only be combined by DAGCombine if aggressive FMA fusion is enabled (that's the pattern defined in DAGCombiner.cpp:9099).

    define double @test(double %x, double %y, double %z) {
      %mul = fmul fast double %x, %y
      %add = fadd fast double %mul, %z
      %use2 = fdiv fast double %mul, %add
      ret double %use2
    }
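
The effect on this small function can be sketched in plain C++. This is an illustrative model under stated assumptions, not compiler output: `testUnfused` and `testFused` are hypothetical names, and the mapping to the `CHECK-GENERIC` and `CHECK-FMA` lines of the test is only approximate. The point is that the multiply survives in both forms because the division still uses it, while the aggressive form additionally folds the multiply into the add.

```cpp
#include <cassert>
#include <cmath>

// Unfused form: fmul, fadd, fdiv. The product is computed once and
// consumed by both the add and the divide.
double testUnfused(double X, double Y, double Z) {
  double Mul = X * Y;
  double Add = Mul + Z;
  assert(Add != 0.0 && "toy guard: keep the division well defined");
  return Mul / Add;
}

// Aggressively fused form: fmul, fmadd, fdiv. The fmul is kept because the
// divide still needs the product; the add becomes a fused multiply-add that
// repeats the multiplication internally.
double testFused(double X, double Y, double Z) {
  double Mul = X * Y;
  double Add = std::fma(X, Y, Z);
  assert(Add != 0.0 && "toy guard: keep the division well defined");
  return Mul / Add;
}
```

For inputs whose intermediate results are exactly representable, such as 2.0, 3.0, and 4.0, both forms return the same value. In general the fused form rounds once where the unfused form rounds twice, so the two can differ in the last bit.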

steleman (author): I will be happy to include your test case in a separate test for Aggressive FMA, but I do not see how your test demonstrates Aggressive FMA as triggered by the compilation of a real program, written in a language that software developers realistically write software in.

Also: your test does not provide any method of validating the results produced by enabling Aggressive FMA. Nor does it provide an easy method of detecting that some potential future changes have introduced a regression.

For Aggressive FMA, I believe that it is important to have a real-life program that can be compiled and run, and whose results can be compared against the results produced by a different compiler, preferably on a different architecture.

fhahn:

> Also: your test does not provide any method of validating the results produced by enabling Aggressive FMA. Nor does it provide an easy method of detecting that some potential future changes have introduced a regression.

Sorry, I just intended to highlight how to create a function that tests an isolated pattern that is triggered by aggressive FMA. Ideally we would have similar functions for all important patterns. Then the test should catch all relevant regressions in DAGCombine with aggressive FMA.

> For Aggressive FMA, I believe that it is important to have a real-life program that can be compiled and run, and whose results can be compared against the results produced by a different compiler, preferably on a different architecture.

Agreed! But for that, the LLVM test-suite is probably a better place to make sure there are no performance regressions, as it uses Clang directly, like users would; see http://llvm.org/docs/TestingGuide.html. `llvm/test` should contain small, isolated regression tests. I see why it is tempting to add a big test case like that to llvm/test, but 1) it still only guards against changes in Codegen, while opt or clang could mess up the IR to begin with, and 2) it makes the regression test suite more fragile than it needs to be.

If others are happy with the big test, I won't object though.

MatzeB: Yes, bigger tests are better off in the test-suite; you summed up LLVM's testing strategy nicely!

Diff 131192
