This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Avoid sinking fdiv into a loop
AbandonedPublic

Authored by david-arm on Feb 14 2023, 2:06 PM.

Details

Summary

This an alternate to D87479 that uses LoopInfo when available
to more precisely target the problematic transform. If we
know the fmul lives in a loop then we will only sink the fdiv
if it's not loop invariant. Otherwise, if we don't have any
loop information, or the fmul is not in a loop we only combine
the fmul and fdiv if they are in the same block. Allowing the
transform early in the optimization pipeline (because there
may not be any LoopInfo yet) doesn't seem like a bad thing.

As discussed in D143631, there's a larger issue between
reassociation/combining/LICM because this doesn't solve a more
general problem that we can show in an example with no fdivs:
https://godbolt.org/z/xrs9xfGrf

...but this does manage to avoid the problem in the motivating
test for D143631.

Co-authored-by: Sanjay Patel <spatel@rotateright.com>

Diff Detail

Event Timeline

spatel created this revision.Feb 14 2023, 2:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 14 2023, 2:06 PM
spatel requested review of this revision.Feb 14 2023, 2:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 14 2023, 2:06 PM
nikic requested changes to this revision.Feb 14 2023, 11:35 PM

Before we do this, we should change InstCombine to accept a pass parameter that determines whether LoopInfo is used or not. Currently, we will use it if it happens to be cached, which is very fragile under the new pass manager. I think the particular motivating case will work because InstCombine is preceded by an AlignmentFromAssumption pass, which requires SCEV, which requires LoopInfo, and which preserves CFG. But in other pipeline positions we will randomly either use or not use LoopInfo, depending on whether passes in between happened to make a change or not.

Instead we should specify whether each specific InstCombine invocation should be using LoopInfo or not (probably based on guaranteed availability at that pipeline position).

This revision now requires changes to proceed.Feb 14 2023, 11:35 PM
david-arm commandeered this revision.Mar 22 2023, 2:30 AM
david-arm edited reviewers, added: spatel; removed: david-arm.

I have spoken with @spatel who said he is unlikely to have much time to progress this patch and he's happy for me to commandeer it. I would like to make progress on this because it is an important fix on AArch64 for the SPEC2017 benchmark parest.

@nikic do you have any thoughts about whether it's worth re-opening D144274 and potentially using the LoopInfo in more places to fix the compile-time issue you saw?

Hi @david-arm, I can see the value of adding LoopInfo and using that to avoid reversing some of LICM's decisions, but I believe this patch makes the decision the wrong way around.
Rather than making the combine unless it can prove that it doesn't reverse LICM, I'd rather see InstCombine not make the combine unless it can prove that it's not reversing LICM.

To clarify, at the moment InstCombine has the following behaviour:

(A) Combine the reciprocal + fmul => fdiv always

With this patch, the behaviour changes when LoopInfo is available:

(B) Combine the reciprocal + fmul => fdiv only if the reciprocal has not been hoisted (requires LoopInfo)

We could add a new, more conservative mode where InstCombine does the following:

(C) Combine the reciprocal + fmul => fdiv only if these live in the same basic block.

Then we can have the following behaviours:

  1. Default:
    • If LoopInfo is not available: (C)
    • If LoopInfo is cached: (B)
    • With this new default, it's no longer possible for InstCombine and LICM to conflict, because InstCombine is by default more conservative. This means that having forgotten to enable LoopInfo (when another instance of InstCombine is added to the pipeline) can't lead to regressions for this case.
  2. Explicit option to force canonical form regardless of LoopInfo: (A)
    • This way it's still possible to run InstCombine in the mode it currently runs.
    • InstCombine can be forced to run in this mode earlier on in the pipeline before LICM has run.
  3. Explicit option to force availability of LoopInfo (i.e. this patch as it is now): (B)
    • I'm not sure how useful this option is in practice, but at least it allows InstCombine to always make the most informed decision, regardless of LoopInfo being intact from a previous pass.

What do you think?

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
663–669

nit: there is little value in creating a lambda and using it only once?

Matt added a subscriber: Matt.Apr 24 2023, 2:06 PM
david-arm updated this revision to Diff 521590.May 12 2023, 2:46 AM
david-arm retitled this revision from [InstCombine] avoid sinking fdiv into a loop to [InstCombine] Avoid sinking fdiv into a loop.
david-arm edited the summary of this revision. (Show Details)
  • Address review comments by only applying the transformation when either a) we know the fmul lives in a loop and the fdiv is loop invariant, or b) when we know the fmul and fdiv live in the same block.
david-arm marked an inline comment as done.May 12 2023, 2:47 AM

Did you do any performance runs to check if there were regressions? If so, we'd need to change some of the InstCombine instances to run in a different mode (suggestion (2)) by explicitly forcing the canonical form regardless of LoopInfo. If not, I'm not sure if such an option is still worth adding.

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
663–669

Is this equivalent to:

bool ShouldSink = L ? !L->isLoopInvariant(FDiv) : cast<Instruction>(FDiv)->getParent() == I.getParent();

?

david-arm updated this revision to Diff 521641.May 12 2023, 7:08 AM
  • Refactored logic.
david-arm marked an inline comment as done.May 12 2023, 7:13 AM

Did you do any performance runs to check if there were regressions? If so, we'd need to change some of the InstCombine instances to run in a different mode (suggestion (2)) by explicitly forcing the canonical form regardless of LoopInfo. If not, I'm not sure if such an option is still worth adding.

I didn't see any change in performance when running SPEC2017 on neoverse-v1 and on a X86 machine. On neoverse-v1 I also ran some additional HPC benchmarks and some LLVM test suite workloads without seeing any regressions.

sdesmalen accepted this revision.May 12 2023, 7:39 AM

Thanks for the refactor and collecting some benchmark stats @david-arm!

The patch LGTM. @nikic are you happy to accept it as well?

nikic added a comment.May 15 2023, 3:41 AM

My general inclination would be not to do this -- I spent some time looking into it, and I think we would be better off dropping the LoopInfo dependency from InstCombine. Unless we actually want to make it a hard dependency, this is always going to be fragile. It also only addresses one pretty specific case (though an important one, of course), while InstCombine potentially sinking instructions into loops is a generic problem.

My preference for solving this would be to enable the final LICM run for the full LTO pipeline instead. I hate to do this because it has significant compile-time impact (I've been working on reducing LICM compile-time, and it is now much cheaper than when this was first discussed years ago, but it's still fairly expensive). But I think this is the right thing to do despite that. My longer-term goal would be to unify the thin LTO and full LTO optimization pipelines to the degree it is possible, and adding that final LICM run would be required for that as well. As long as we have the pipeline differences, we're going to see these kinds of optimization failures in full LTO, simply because the major production users all use thin LTO and full LTO nowadays tends to be an afterthought only.

david-arm abandoned this revision.Jul 10 2023, 2:36 AM