Page MenuHomePhabricator

[Reassociate] [PowerPC] stop common out mul factors if fma is preferred on target
AbandonedPublic

Authored by shchenz on Aug 7 2020, 12:29 AM.

Details

Reviewers
lebedev.ri
spatel
efriedma
qcolombet
Whitney
jsji
Group Reviewers
Restricted Project
Summary

For the case A * B + A * C, now, reassociate pass(function OptimizeAdd) will change it to A * (B + C) to save one mul as A is a common mul factor .

But transformation like above has no benefit at all on PowerPC target. In fact, if target prefers fma, it generates worse IR.

Because on PowerPC target:
A * B + A * C can be generated as fma(fmul(A,B), A, C);
A * (B + C) can be generated as fmul(A, fadd(B, C));

fma, fmul, fadd, fsub all have same latency on PowerPC arch, so no cpu cycle benefit.
Reducing number of mul also makes number of fma reduce. So this is not a benefit transformation on PowerPC.

This patch tries to bail out this opt early to expose more fma folding opportunities and save some compile time on PowerPC target.

Diff Detail

Event Timeline

shchenz created this revision.Aug 7 2020, 12:29 AM
shchenz requested review of this revision.Aug 7 2020, 12:29 AM
shchenz updated this revision to Diff 283829.Aug 7 2020, 12:39 AM

(can you please maybe spellcheck both the patch description and the wording within the patch? it is really hard to read)

I'm not really sure this is the way forward.

  1. This is a target-agnostic pass, it we add arbitrary restrictions it will likely miss some intended transforms
  2. As usual, just preventing some transform is not the solution, because you can still have "bad" input that will still be lowered badly

IOW,

  1. why not reverse this, as usual, in DAGCombine
  2. this needs something like D67383 Tree-Height-Reduction, i guess, for more general reverse transform
shchenz edited the summary of this revision. (Show Details)Aug 7 2020, 1:02 AM
shchenz edited the summary of this revision. (Show Details)
shchenz added a comment.EditedAug 7 2020, 1:11 AM

(can you please maybe spellcheck both the patch description and the wording within the patch? it is really hard to read)

Done

I'm not really sure this is the way forward.

  1. This is a target-agnostic pass, it we add arbitrary restrictions it will likely miss some intended transforms

Could you be more specific? I verified this change on PowerPC for some benchmarks, no degradation found.

  1. As usual, just preventing some transform is not the solution, because you can still have "bad" input that will still be lowered badly

Agree. But if we know that one transformation has no benefit at all or it has a negative impact on some specific targets, should we stop the transformation on these targets? Reversing the transformations back in later pass will make compiler spend compiling time both on the harmful transformations and on the reversed transformations? also I am worried about the feasibility of reversing this "complex" transformation in later pass like DAGCombine...

shchenz edited the summary of this revision. (Show Details)Aug 7 2020, 1:14 AM

I'm not sure why i'm added as a reviewer on this patch, i don't work on ppc, i never used it, and i never really worked on that pass.

(can you please maybe spellcheck both the patch description and the wording within the patch? it is really hard to read)

Done

I'm not really sure this is the way forward.

  1. This is a target-agnostic pass, it we add arbitrary restrictions it will likely miss some intended transforms

Could you be more specific? I verified this change on PowerPC for some benchmarks, no degradation found.

Perhaps something like a*b * c*d + e*f * c*d, surely it should be (a*b + e*f) * c * d,
but won't this patch keep it as a*b*c*d + e*f*c*d?

  1. As usual, just preventing some transform is not the solution, because you can still have "bad" input that will still be lowered badly

Agree. But if we know that one transformation has no benefit at all or it has a negative impact on some specific targets, should we stop the transformation on these targets? Reversing the transformations back in later pass will make compiler spend compiling time both on the harmful transformations and on the reversed transformations? also I am worried about the feasibility of reversing this "complex" transformation in later pass like DAGCombine...

llvm/lib/Transforms/Scalar/Reassociate.cpp
1500

So we also prevent the transform for integers now?
But there's no FMA for them?

shchenz planned changes to this revision.Aug 7 2020, 2:07 AM

Good catch for the a*b * c*d + e*f * c*d case, Roman. That's a degradation after this patch. Thanks for your good review. ^-^

spatel added a comment.Aug 7 2020, 7:50 AM

The reassociate pass (like instcombine) is supposed to be a target-independent canonicalization pass:

  // This pass reassociates commutative expressions in an order that is designed
  // to promote better constant propagation, GCSE, LICM, PRE, etc.

So it's intentionally avoiding using TTI/TLI and trying not to do all forms of reassociation. As Roman mentioned, see D67383 as an example of a possible TTI-aware reassociation pass.

So either we need to reverse this in DAGCombine and/or PPC-specific codegen, or we need to make an argument that the canonicalization does not make sense in general (for FP only?).

There seems to be some (weak) correlation between this and https://reviews.llvm.org/D84309 which I am reworking to remove it from canonicalization. It may be worth looking into whether we can catch both issues in CGP.

Thanks @spatel @nemanjai for your comments. So it should be a common-sense that we should not add TTI/TLI hook inside Reassociate Pass which was already kindly pointed out by @lebedev.ri . I have learned this now ^_^.
I will investigate all the suggested places to do the reversal (DAGCombine, other PPC specific pass or CGP) later.

shchenz abandoned this revision.Thu, Sep 10, 10:11 PM

we will use other solution for this issue.