This patch enables aggressive FMA by default on T99, and provides a -mllvm option to enable the same on other AArch64 micro-arch's (-mllvm -aarch64-enable-aggressive-fma).
Test case demonstrating the effects on T99 is included.
Differential D40696
Enable aggressive FMA on T99 and provide AArch64 option for other micro-arch's steleman on Nov 30 2017, 5:19 PM. Authored by
Details This patch enables aggressive FMA by default on T99, and provides a -mllvm option to enable the same on other AArch64 micro-arch's (-mllvm -aarch64-enable-aggressive-fma). Test case demonstrating the effects on T99 is included.
Diff Detail
Event TimelineComment Actions Your points from D40177 require unnecessary changes to too many files, and do not prevent changes to AArch64ISelLowering.cpp whenever a new micro-arch wants to enable aggressive FMA. Furthermore, every single time a new micro-arch wants to enable aggressive FMA, they will always have to edit two TU's instead of just one: AArch64.td and AArch64ISelLowering.cpp, instead of just adding a new line to the existing switch() statement in AArch64TargetLowering::enableAggressiveFMAFusion(). Comment Actions You would not need to edit AArch64ISelLowering again after adding a subtargetfeature.
Comment Actions On a related note, the MachineCombiner ignores enableAggressiveFMAFusion, so at -O3, the example won't be combined. I plan to put up a patch soon that updates the machine combiner to consider combining multiple uses, if profitable.
Comment Actions Updated large test case to use --check-prefix={CHECK-FMA|CHECK-GENERIC}. Comment Actions Thanks! I agree with Matthias in that a target feature would be better to enable/disable aggressive FMA. In AArch64.td there are already quite a lot of similar target features, like FeatureFuseAES or FeatureSlowSTRQro that enable certain optimizations for some micro architectures.
Comment Actions Updated diff with latest changes:
Comment Actions It looks like the latest changes address the earlier concerns expressed in the review comments:
Is there anything else that is needed? Comment Actions Thanks for updating the patch! I have a few suggestions about reducing fma-aggressive.ll , but besides that and removing aarch64-enable-aggressive-fma it looks quite good.
It's fine to have a more complex test with more patterns than test/CodeGen/AArch64/fma-simple.ll, but test/CodeGen/AArch64/fma-aggressive.ll still contains lots of stuff unrelated the FMA fusion, e.g. define double @reset(double %x, double %y) local_unnamed_addr #0 is not used, the attributes and debug metadata is not needed, the global variables are not needed, the calls to fprint/frwite and so on are not needed and the loop should have not have an impact on the DAGCombiner. I think the following test function should still expose patterns very similar to the original fma-aggressive.ll, but only focuses on the relevant bits for fusion: define double @test1(double %a, double %b, double %conv, i32 %rem) { entry: %conv.neg = fsub fast double -0.000000e+00, %conv %cmp3 = icmp eq i32 %rem, 0 %. = select i1 %cmp3, double 1.000000e+00, double -1.000000e+00 %add = fadd fast double %a, 1.000000e+00 %add8 = fadd fast double %add, %. %mul = fmul fast double %add8, %a %add9 = fadd fast double %mul, 1.000000e+01 %mul10 = fmul fast double %add9, 2.000000e+00 %add11 = fsub fast double %conv.neg, %mul %sub = fadd fast double %add11, %mul10 %mul14 = fmul fast double %sub, %mul %mul15 = fmul fast double %mul14, %. %sub17 = fadd fast double %mul15, %mul14 br i1 %cmp3, label %if.then22, label %if.else27 if.then22: %add23 = fadd fast double %sub17, -1.000000e+00 %sub24 = fadd fast double %sub17, 1.000000e+00 %mul25 = fmul fast double %add23, %sub24 %sub26 = fsub fast double 1.000000e+00, %mul25 br label %if.end32 if.else27: %sub28 = fsub fast double -1.000000e+00, %sub17 %sub29 = fadd fast double %sub17, 1.000000e+00 %mul30 = fmul fast double %sub28, %sub29 %add31 = fadd fast double %mul30, 1.000000e+00 br label %if.end32 if.end32: ; preds = %if.else27, %if.then22 %b.1 = phi double [ %sub26, %if.then22 ], [ %add31, %if.else27 ] %sub33 = fsub fast double 1.000000e+00, %sub17 %mul34 = fmul fast double %sub17, %conv %mul35 = fmul fast double %mul34, %sub33 %sub36 = fsub fast double %b.1, %. %add37 = fadd fast double %b.1, %mul35 %add38 = fadd fast double %add37, %sub36 ret double %add38 } If you think this is not sufficient for some reason, please at least remove the unneeded metadata, unused functions from the test case, replace the calls to fprintf & co that are needed to have uses for the FMA results with simpler functions (or return the result) and replace the globals by adding parameters if required. Also the other ifs should not matter to the combiner either, so I would simplify the CFG there as well.
pending simplification in fma-aggressive, I would move this test function to fma-aggressive.ll too.
Comment Actions
If there's only one EnableAggressiveFMA boolean in AArch64.td: What happens if a certain micro-arch wants to enable AggressiveFMA for scalar doubles and vector doubles, but not for scalar floats and vector floats? (as an example). Won't they have to call getProcFamily() in enableAggressiveFMAFusion()? As there is only one boolean in AArch64.td, AggressiveFMA will be either enabled for all the floating-point types, or disabled for all the floating-point types, scalars or vectors. The only way of knowing which particular floating-point type is being tested for AggressiveFMA is by testing the type of the EVT, and then making a decision based on getProcFamily() and EVT type. Is it safe to speculate that each and every AArch64 micro-arch wants AggressiveFMA enabled or disabled for all possible floating-point types? Comment Actions My 2 cents: In general, we could think of probably thousands of subtarget features describing how different potential micro-architectures might handle specific instructions or sequences of instructions. Comment Actions Updated per latest comments:
Comment Actions LGTM with 2 small comments, I think all comments should be addressed now, thanks! Please wait with committing for another day or so, to give people some time to raise additional concerns. When committing, please adjust the commit title & message to mention the target feature rather than the option. And it would be great if you could prefix the title again with [AArch64]
|
How likely is it that we aggressive FMA is beneficial for floats or double only? IMO it's probably enough to just have -aggressive-fma for now.