This is an archive of the discontinued LLVM Phabricator instance.

Enable aggressive FMA on T99 and provide AArch64 option for other micro-arch's
ClosedPublic

Authored by steleman on Nov 30 2017, 5:19 PM.

Details

Summary

This patch enables aggressive FMA by default on T99, and provides a -mllvm option to enable the same on other AArch64 micro-arch's (-mllvm -aarch64-enable-aggressive-fma).

Test case demonstrating the effects on T99 is included.

Diff Detail

Repository
rL LLVM

Event Timeline

steleman created this revision.Nov 30 2017, 5:19 PM
MatzeB added a subscriber: MatzeB.Nov 30 2017, 5:26 PM

My points from D40177 still stand.

My points from D40177 still stand.

Your points from D40177 require unnecessary changes to too many files, and do not prevent changes to AArch64ISelLowering.cpp whenever a new micro-arch wants to enable aggressive FMA.

Furthermore, every single time a new micro-arch wants to enable aggressive FMA, they will always have to edit two TU's instead of just one: AArch64.td and AArch64ISelLowering.cpp, instead of just adding a new line to the existing switch() statement in AArch64TargetLowering::enableAggressiveFMAFusion().

My points from D40177 still stand.

Your points from D40177 require unnecessary changes to too many files, and do not prevent changes to AArch64ISelLowering.cpp whenever a new micro-arch wants to enable aggressive FMA.

Furthermore, every single time a new micro-arch wants to enable aggressive FMA, they will always have to edit two TU's instead of just one: AArch64.td and AArch64ISelLowering.cpp, instead of just adding a new line to the existing switch() statement in AArch64TargetLowering::enableAggressiveFMAFusion().

You would not need to edit AArch64ISelLowering again after adding a subtargetfeature.

fhahn added a subscriber: fhahn.Dec 1 2017, 1:27 AM
fhahn added inline comments.Dec 1 2017, 8:00 AM
test/CodeGen/AArch64/fma-aggressive.ll
1 ↗(On Diff #125061)

I think it would be good to also add a check that aggressive FMA is not enabled for other micro architectures.

323 ↗(On Diff #125061)

Can we drop the meta data from here to the end or is it required for the test?

steleman added inline comments.Dec 1 2017, 8:09 AM
test/CodeGen/AArch64/fma-aggressive.ll
1 ↗(On Diff #125061)

OK I'll add another test named 'fma-not-aggressive.ll' :-)

323 ↗(On Diff #125061)

I'll check and if it works without the metadata, I'll remove it.

fhahn added inline comments.Dec 1 2017, 9:21 AM
test/CodeGen/AArch64/fma-aggressive.ll
1 ↗(On Diff #125061)

Thanks. You should be able just use a check-prefix in the same file, e.g. as in https://github.com/llvm-mirror/llvm/blob/master/test/CodeGen/AArch64/preferred-function-alignment.ll

323 ↗(On Diff #125061)

Great thanks. On second look, I think there is no need for the test case to be that complicated, for example, do we need all those global variables, function calls, ifs, loops? AFAIK DagCombine only looks at nodes and uses and does not have a cost model tied to loop iterations.

Unless I am missing something, it should be enough to have a couple of basic blocks with fadd/fmul instructions to test this, e.g. fadd/fmul with multiple users, or the more complex patterns guarded by if (Aggressive) in visitFADDForFMACombine.

A much simpler test case will make it easier to debug if it fails in the future and also easier to understand when reviewing if it does the right thing.

steleman added inline comments.Dec 4 2017, 7:54 AM
test/CodeGen/AArch64/fma-aggressive.ll
323 ↗(On Diff #125061)

It's not as simple as you think.

DAGCombiner.cpp:9636

// fold (fmul (fadd x, +1.0), y) -> (fma x, y, y)
// fold (fmul (fadd x, -1.0), y) -> (fma x, y, (fneg y))
auto FuseFADD = [&](SDValue X, SDValue Y) {
  if (X.getOpcode() == ISD::FADD && (Aggressive || X->hasOneUse())) {
    auto XC1 = isConstOrConstSplatFP(X.getOperand(1));
    if (XC1 && XC1->isExactlyValue(+1.0))
      return DAG.getNode(PreferredFusedOpcode, SL, VT, X.getOperand(0), Y, Y);
    if (XC1 && XC1->isExactlyValue(-1.0))
      return DAG.getNode(PreferredFusedOpcode, SL, VT, X.getOperand(0), Y,
                         DAG.getNode(ISD::FNEG, SL, VT, Y));
  }
  return SDValue();
};

Note:

if (X.getOpcode() == ISD::FADD && (Aggressive || X->hasOneUse()))

What we want to do here is make sure that X has more than one use. I.e. X->hasOneUse() evaluates to false. That's the only way of testing that FMA happens when Aggressive is true.

In order to make sure that X ends up having more than one use, the code needs to be complicated.

This code isn't lifted from some existing program. I wrote it specifically for this purpose. If there was no need for these loops, ifs, global variables, etc, I wouldn't have written them.

fhahn added inline comments.Dec 4 2017, 8:29 AM
test/CodeGen/AArch64/fma-aggressive.ll
323 ↗(On Diff #125061)

You do not need complicated code to have get multiple uses. In the function below, %mul has 2 uses and will only be combined by DAGCombine, if aggressive FMA fusion is enabled (that's the pattern defined in DAGCombiner.cpp:9099)

define double @test(double %x, double %y, double %z) {
  %mul = fmul fast double %x, %y
  %add = fadd fast double %mul, %z
  %use2 = fdiv fast double %mul, %add
  ret double %use2
}
fhahn added a comment.Dec 4 2017, 8:35 AM

On a related note, the MachineCombiner ignores enableAggressiveFMAFusion, so at -O3, the example won't be combined. I plan to put up a patch soon that updates the machine combiner to consider combining multiple uses, if profitable.

steleman added inline comments.Dec 5 2017, 9:11 AM
test/CodeGen/AArch64/fma-aggressive.ll
323 ↗(On Diff #125061)

I will be happy to include your test case in a separate, and different test for Aggressive FMA, but I do not see how your test demonstrates Aggressive FMA as triggered by the compilation of a real program, written in a language that software developers realistically write software in.

Also: your test does not provide any method of validating the results produced by enabling Aggressive FMA. Nor does it provide an easy method of detecting that some potential future changes have introduced a regression.

For Aggressive FMA, I believe that it is important to have a real-life program that can be compiled and run, and whose results can be compared against the results produced by a different compiler, preferably on a different architecture.

steleman updated this revision to Diff 126011.Dec 7 2017, 11:48 AM

Updated large test case to use --check-prefix={CHECK-FMA|CHECK-GENERIC}.
Included Florian Hahn's small test case for FMA.
Metadata in the large LLVM IR test case is required.

fhahn added a reviewer: fhahn.Dec 7 2017, 12:05 PM

Thanks! I agree with Matthias in that a target feature would be better to enable/disable aggressive FMA. In AArch64.td there are already quite a lot of similar target features, like FeatureFuseAES or FeatureSlowSTRQro that enable certain optimizations for some micro architectures.

test/CodeGen/AArch64/fma-aggressive.ll
323 ↗(On Diff #125061)

Also: your test does not provide any method of validating the results produced by enabling Aggressive FMA. Nor does it provide an easy method of detecting that some potential future changes have introduced a regression.

Sorry, I just intended to highlight how to create a function that tests an isolated pattern that is triggered by aggressive FMA. Ideally we would have similar functions for all important patterns. Then the test should catch all relevant regressions in DAGCombine with aggressive FMA.

For Aggressive FMA, I believe that it is important to have a real-life program that can be compiled and run, and whose results can be compared against the results produced by a different compiler, preferably on a different architecture.

Agreed! But for that, the LLVM test-suite is probably a better place to make sure there are no performance regressions as it uses Clang directly, like users would, see http://llvm.org/docs/TestingGuide.html .

llvm/test should contain small, isolated regression tests. I see why it is tempting to add a big test case like that to llvm/test, but 1) it still only guards against changes in Codegen, but opt or clang could mess up the IR to begin with and 2) it makes the regression test suite more fragile than it needs to be.

If others are happy with the big test, I won't object though

MatzeB added inline comments.Dec 7 2017, 1:02 PM
test/CodeGen/AArch64/fma-aggressive.ll
323 ↗(On Diff #125061)

Yes bigger tests are better of in the test-suite; you summed up llvms testing strategy nicely!

steleman updated this revision to Diff 128884.Jan 7 2018, 12:01 PM

Updated diff with latest changes:

  1. Added FeatureAggressiveFMAFloat and FeatureAggressiveFMADouble in AArch64.td as micro-arch features.
  2. Added EnableAggressiveFMA cl::opt in AArch64ISelLowering.cpp.
  3. Implemented AArch64TargetLowering::enableAggressiveFMAFusion in terms of the changes from AArch64.td.
  4. I would like the test in test/CodeGen/AArch64/fma-aggressive.ll to be included in this changeset, simply because the run-time test from D41810 has no way of determining whether or not AggressiveFMA was enabled, or not.

It looks like the latest changes address the earlier concerns expressed in the review comments:

  • Use target feature: Present in AArch64.td, AArch64ISelLowering.h, AArch64ISelLowering.cpp, and AArch64Subtarget.h
  • Simplified test case: Present in test/CodeGen/AArch64/fma-simple.ll
  • Combined optimized/missed in .ll tests: Present in fma-aggressive.ll and fma-simple.ll

Is there anything else that is needed?

Thanks for updating the patch! I have a few suggestions about reducing fma-aggressive.ll , but besides that and removing aarch64-enable-aggressive-fma it looks quite good.

I would like the test in test/CodeGen/AArch64/fma-aggressive.ll to be included in this changeset, simply because the run-time test from D41810 has no way of determining whether or not AggressiveFMA was enabled, or not.

It's fine to have a more complex test with more patterns than test/CodeGen/AArch64/fma-simple.ll, but test/CodeGen/AArch64/fma-aggressive.ll still contains lots of stuff unrelated the FMA fusion, e.g. define double @reset(double %x, double %y) local_unnamed_addr #0 is not used, the attributes and debug metadata is not needed, the global variables are not needed, the calls to fprint/frwite and so on are not needed and the loop should have not have an impact on the DAGCombiner.

I think the following test function should still expose patterns very similar to the original fma-aggressive.ll, but only focuses on the relevant bits for fusion:

define double @test1(double %a, double %b, double %conv, i32 %rem) {
entry:
  %conv.neg = fsub fast double -0.000000e+00, %conv
  %cmp3 = icmp eq i32 %rem, 0
  %. = select i1 %cmp3, double 1.000000e+00, double -1.000000e+00
  %add = fadd fast double %a, 1.000000e+00
  %add8 = fadd fast double %add, %.
  %mul = fmul fast double %add8, %a
  %add9 = fadd fast double %mul, 1.000000e+01
  %mul10 = fmul fast double %add9, 2.000000e+00
  %add11 = fsub fast double %conv.neg, %mul
  %sub = fadd fast double %add11, %mul10
  %mul14 = fmul fast double %sub, %mul
  %mul15 = fmul fast double %mul14, %.
  %sub17 = fadd fast double %mul15, %mul14
 br i1 %cmp3, label %if.then22, label %if.else27

if.then22:
  %add23 = fadd fast double %sub17, -1.000000e+00
  %sub24 = fadd fast double %sub17, 1.000000e+00
  %mul25 = fmul fast double %add23, %sub24
  %sub26 = fsub fast double 1.000000e+00, %mul25
  br label %if.end32

if.else27:
  %sub28 = fsub fast double -1.000000e+00, %sub17
  %sub29 = fadd fast double %sub17, 1.000000e+00
  %mul30 = fmul fast double %sub28, %sub29
  %add31 = fadd fast double %mul30, 1.000000e+00
  br label %if.end32

if.end32:                                         ; preds = %if.else27, %if.then22
  %b.1 = phi double [ %sub26, %if.then22 ], [ %add31, %if.else27 ]
  %sub33 = fsub fast double 1.000000e+00, %sub17
  %mul34 = fmul fast double %sub17, %conv
  %mul35 = fmul fast double %mul34, %sub33
  %sub36 = fsub fast double %b.1, %.
  %add37 = fadd fast double %b.1, %mul35
  %add38 = fadd fast double %add37, %sub36
  ret double %add38
}

If you think this is not sufficient for some reason, please at least remove the unneeded metadata, unused functions from the test case, replace the calls to fprintf & co that are needed to have uses for the FMA results with simpler functions (or return the result) and replace the globals by adding parameters if required. Also the other ifs should not matter to the combiner either, so I would simplify the CFG there as well.

Simplified test case: Present in test/CodeGen/AArch64/fma-simple.ll

pending simplification in fma-aggressive, I would move this test function to fma-aggressive.ll too.

lib/Target/AArch64/AArch64.td
152 ↗(On Diff #128884)

How likely is it that we aggressive FMA is beneficial for floats or double only? IMO it's probably enough to just have -aggressive-fma for now.

lib/Target/AArch64/AArch64ISelLowering.cpp
116 ↗(On Diff #128884)

I think there is no need for this option any longer, -mattr=+aggressive-fma-float+aggressive-fma-double can be used instead.

10977 ↗(On Diff #128884)

If we are only using subtarget features here, we can move this to the header I think.

test/CodeGen/AArch64/fma-simple.ll
2 ↗(On Diff #128884)

Please also add a test that enables aggressive fma fusion using mattr for non-thunderX2.

steleman added inline comments.Jan 22 2018, 8:26 AM
lib/Target/AArch64/AArch64.td
152 ↗(On Diff #128884)

Could you please clarify this? I.e. are you referring to quad-precision?

T99 doesn't have quad-precision FMA. Only floats and doubles. Are there any AArch64 micro-arch'es that can do quad-precision FMA?

Or are you asking about SIMD?

fhahn added inline comments.Jan 22 2018, 8:31 AM
lib/Target/AArch64/AArch64.td
152 ↗(On Diff #128884)

Sorry, I meant I think a single option to enable aggressive FMA would be enough and separate options for float and double seems too fine grained, unless I am missing something :)

steleman added inline comments.Jan 22 2018, 8:42 AM
lib/Target/AArch64/AArch64.td
152 ↗(On Diff #128884)

Aaah, OK. Got it.

Here's my thinking:

There may be some AArch64 micro-arch'es that would want to enable FMA for floats, but not doubles. Or doubles, but not floats. Because of latency, or some other constraint.

So the idea is to allow this differentiation.

Now that FMA is a Subtarget feature, if we don't have this differentiation between floats and doubles, then we're back to the original implementation: the micro-arch Subtarget will have to switch() on the CPU and EVT type in AArch64ISelLowering.cpp (AArch64TargetLowering::enableAggressiveFMAFusion()). That was frowned upon, which is why I created these two FMA Subtarget features.

And then there's also FMLA and FMLS.

For T99 it doesn't matter because all the FMA instructions have the same latency. But I'm not sure this applies to all the other AArch64 micro-arch'es.

fhahn added inline comments.Jan 22 2018, 9:06 AM
lib/Target/AArch64/AArch64.td
152 ↗(On Diff #128884)

I see what you mean, I just think that we should make our lives more complicated only when there actually are micro-arch'es that want to enable aggressive FMA for double or float only ;)

Now that FMA is a Subtarget feature, if we don't have this differentiation between floats and doubles, then we're back to the original implementation: the micro-arch Subtarget will have to switch() on the CPU and EVT type in AArch64ISelLowering.cpp (AArch64TargetLowering::enableAggressiveFMAFusion()). That was frowned upon, which is why I created these two FMA Subtarget features.

I am not sure I follow. I think @MatzeB 's issue was using getProcFamily() and it should be fine to just check the subtarget feature in enableAggressiveFMAFusion. @MatzeB summarized some benefits of making it a subtarget feature here: https://reviews.llvm.org/D40177#936974

I am not sure I follow. I think @MatzeB 's issue was using getProcFamily() and it should be fine to just check the subtarget feature in enableAggressiveFMAFusion. @MatzeB summarized some benefits of making it a subtarget feature here: https://reviews.llvm.org/D40177#936974

If there's only one EnableAggressiveFMA boolean in AArch64.td:

What happens if a certain micro-arch wants to enable AggressiveFMA for scalar doubles and vector doubles, but not for scalar floats and vector floats? (as an example).

Won't they have to call getProcFamily() in enableAggressiveFMAFusion()?

As there is only one boolean in AArch64.td, AggressiveFMA will be either enabled for all the floating-point types, or disabled for all the floating-point types, scalars or vectors. The only way of knowing which particular floating-point type is being tested for AggressiveFMA is by testing the type of the EVT, and then making a decision based on getProcFamily() and EVT type.

Is it safe to speculate that each and every AArch64 micro-arch wants AggressiveFMA enabled or disabled for all possible floating-point types?

I am not sure I follow. I think @MatzeB 's issue was using getProcFamily() and it should be fine to just check the subtarget feature in enableAggressiveFMAFusion. @MatzeB summarized some benefits of making it a subtarget feature here: https://reviews.llvm.org/D40177#936974

If there's only one EnableAggressiveFMA boolean in AArch64.td:

What happens if a certain micro-arch wants to enable AggressiveFMA for scalar doubles and vector doubles, but not for scalar floats and vector floats? (as an example).

Won't they have to call getProcFamily() in enableAggressiveFMAFusion()?

As there is only one boolean in AArch64.td, AggressiveFMA will be either enabled for all the floating-point types, or disabled for all the floating-point types, scalars or vectors. The only way of knowing which particular floating-point type is being tested for AggressiveFMA is by testing the type of the EVT, and then making a decision based on getProcFamily() and EVT type.

Is it safe to speculate that each and every AArch64 micro-arch wants AggressiveFMA enabled or disabled for all possible floating-point types?

My 2 cents: In general, we could think of probably thousands of subtarget features describing how different potential micro-architectures might handle specific instructions or sequences of instructions.
Implementing all of those thousands of subtarget features without knowing that it actually makes a difference for any of the micro-architectures that LLVM does know about doesn't make sense.
Following that same logic, I'd choose to have 1 subtarget feature introduced in this patch, assuming IIUC that you only really need one for the functionality you introduce in this patch.
If in the future it becomes clear there is a micro-architecture for which in LLVM we want to model different behaviour with respect to AggressiveFMA for floats, doubles, vectors, or something else, we can introduce new subtarget features then.
Introducing them now, without there being a demonstrated need for it, feels a bit like premature optimization to me.
I hope that makes sense?

steleman updated this revision to Diff 131192.Jan 23 2018, 8:28 PM

Updated per latest comments:

  1. Removed AggressiveFMAFloat and AggressiveFMADouble from AArch64.td.
  2. Created FeatureAggressiveFMA in AArch64.td.
  3. Merged test cases into one, including the IR test case from Florian Hahn.
fhahn accepted this revision.Jan 24 2018, 1:38 AM

LGTM with 2 small comments, I think all comments should be addressed now, thanks! Please wait with committing for another day or so, to give people some time to raise additional concerns.

When committing, please adjust the commit title & message to mention the target feature rather than the option. And it would be great if you could prefix the title again with [AArch64]

lib/Target/AArch64/AArch64.td
153 ↗(On Diff #131192)

Just aggressive-fma seems more in line with the existing target feature names. +aggressive-fma enables the target feature, -aggressive-fma explicitly disables it.

409 ↗(On Diff #131192)

nit: move it to the other FeatureXXX in the list.

This revision is now accepted and ready to land.Jan 24 2018, 1:38 AM
steleman updated this revision to Diff 131303.Jan 24 2018, 9:58 AM

Updated per latest comments.

This revision was automatically updated to reflect the committed changes.