This is an archive of the discontinued LLVM Phabricator instance.

[LoopSink] Allow sinking to PHI-use
ClosedPublic

Authored by wenlei on Jun 12 2023, 9:29 PM.

Details

Summary

This change allows sinking defs from loop preheader with PHI-use into loop body. Loop sink can now see through PHI-use and select incoming blocks of value being used as candidate sink destination.

It makes loop sink more effective so more LICM can be undone if proven unprofitable with profile info. It addresses the motivating case in D87551, without resorting to profile guided LICM which breaks canonicalization.

Diff Detail

Event Timeline

wenlei created this revision.Jun 12 2023, 9:29 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 12 2023, 9:29 PM
wenlei requested review of this revision.Jun 12 2023, 9:29 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 12 2023, 9:29 PM
nikic requested changes to this revision.Jun 13 2023, 1:17 AM

Looks reasonable conceptually. Could be extended with critical edge splitting, but not sure how profitable that would be.

llvm/lib/Transforms/Scalar/LoopSink.cpp
182

Might it make sense to move the phi node handling before this check? It should be fine to sink an instruction that is used outside the loop into the corresponding exiting block (i.e. the predecessor of the lcssa phi node), and this may be profitable if the exiting block is cold.

194

You don't need this loop, use PN->getIncomingBlock(U) instead.

llvm/test/Transforms/LICM/loopsink-phi.ll
2

aa-pipeline unnecessary

10

Use update_test_checks.py.

12

Remove !dbg metadata and name instructions in test.

This revision now requires changes to proceed.Jun 13 2023, 1:17 AM
wenlei marked 3 inline comments as done.Jun 13 2023, 8:48 AM
wenlei added inline comments.
llvm/lib/Transforms/Scalar/LoopSink.cpp
182

I was thinking about that too. But with some digging I found this was considered in the original loop sink patch (D22778) and then was decided to keep it strictly a loop sink pass, instead of general sinking. Perhaps makes sense to keep it that way.

Original comment/reply:

Isn't it still okay to try to sink in outside the loop if the user block is cold enough?

That would become general purpose sinking instead of loop-sinking. And we need to handle alias/invariant differently.

194

We need to find all incoming blocks of a value, but PN->getIncomingBlock(U) only return one (first) instance? The loop doesn't "break" when a value is found.

E.g. for value %326, we need to find the 4 incoming blocks for all 4 uses in the PHI.

%343         = phi i64 [ %326, %338 ], [ %291, %305 ], [ %326, %332 ], [ %326, %334 ], [ %326, %337 ]
wenlei updated this revision to Diff 530925.Jun 13 2023, 8:49 AM

address comments

nikic added inline comments.Jun 13 2023, 9:45 AM
llvm/lib/Transforms/Scalar/LoopSink.cpp
194

The outer loop already iterates over all uses. The other phi operands will be visited by it later.

llvm/test/Transforms/LICM/loopsink-phi.ll
49

This !prof metadata probably not needed. In fact, the entire call looks unnecessary, can just be an arg.

143

Drop dead debug metadata.

wenlei marked 2 inline comments as done.Jun 13 2023, 10:24 AM
wenlei added inline comments.
llvm/lib/Transforms/Scalar/LoopSink.cpp
194

Ah, I see what you meant. Changed - thanks.

wenlei updated this revision to Diff 530973.Jun 13 2023, 10:25 AM

address comments

nikic accepted this revision.Jun 13 2023, 11:18 AM

LGTM

llvm/test/Transforms/LICM/loopsink-phi.ll
49

You probably don't need this check and can just make .j.check.preheader the entry block?

99

Can drop this one from the module flags.

130

!31 is unused.

This revision is now accepted and ready to land.Jun 13 2023, 11:18 AM
wenlei marked 3 inline comments as done.Jun 13 2023, 12:13 PM

Appreciate the quick review!

llvm/test/Transforms/LICM/loopsink-phi.ll
49

indeed, removed bb and ret block.

wenlei updated this revision to Diff 531026.Jun 13 2023, 12:13 PM
wenlei marked an inline comment as done.

simplify tests

This revision was landed with ongoing or failed builds.Jun 13 2023, 1:07 PM
This revision was automatically updated to reflect the committed changes.

Heads-up: we are noticing some clang crashes due to this commit. We are currently working on a reproducer and will post as soon as we get one.

alexfh added a subscriber: alexfh.Jun 21 2023, 7:35 AM

We're still trying to get a reproducer, but it's somewhat complicated given that the issue happens in ThinLTO builds. The problem manifests as:

Instruction does not dominate all uses!
  %1480 = getelementptr inbounds %"class.absl::AlphaNum", ptr %24, i64 0, i32 1
  %1478 = phi ptr [ %1476, %1474 ], [ %1480, %1472 ]
fatal error: error in backend: Broken module found, compilation aborted!

Please revert while we're preparing a test case!

We're still trying to get a reproducer, but it's somewhat complicated given that the issue happens in ThinLTO builds. The problem manifests as:

Instruction does not dominate all uses!
  %1480 = getelementptr inbounds %"class.absl::AlphaNum", ptr %24, i64 0, i32 1
  %1478 = phi ptr [ %1476, %1474 ], [ %1480, %1472 ]
fatal error: error in backend: Broken module found, compilation aborted!

Please revert while we're preparing a test case!

I reverted the commit in c96c85aba28089ec840a867e289782831e9637ee. We'll post a reduced test case as soon as we have one.

$ cat /tmp/a.ll
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-grtev4-linux-gnu"

%struct.blam = type { %struct.blam.0, [32 x i8] }
%struct.blam.0 = type { ptr, i64 }

define internal void @wibble() !prof !0 {
bb:
  %getelementptr = getelementptr %struct.blam, ptr null, i64 0, i32 1
  br label %bb1

bb1:                                              ; preds = %bb7, %bb3, %bb
  br i1 false, label %bb2, label %bb3

bb2:                                              ; preds = %bb1
  ret void

bb3:                                              ; preds = %bb4, %bb1
  switch i32 0, label %bb5 [
    i32 1, label %bb4
    i32 0, label %bb1
  ], !prof !1

bb4:                                              ; preds = %bb3
  br i1 false, label %bb3, label %bb6

bb5:                                              ; preds = %bb3
  br i1 false, label %bb6, label %bb7

bb6:                                              ; preds = %bb5, %bb4
  br label %bb7

bb7:                                              ; preds = %bb6, %bb5
  %phi = phi ptr [ null, %bb6 ], [ %getelementptr, %bb5 ]
  store ptr %getelementptr, ptr null, align 8
  br label %bb1
}

!0 = !{!"function_entry_count", i64 1}
!1 = !{!"branch_weights", i32 1, i32 188894, i32 287400}
$ opt -p loop-sink /tmp/a.ll
Instruction does not dominate all uses!                                                                                                                                                                                                                                                   
  %getelementptr1 = getelementptr %struct.blam, ptr null, i64 0, i32 1                                                                                                                                                                                                                    
  %phi = phi ptr [ null, %bb6 ], [ %getelementptr1, %bb5 ]                                                                                                                                                                                                                                

Oops, sorry for the breakage. Thanks for the repro, I will take a look.

wenlei added inline comments.Jun 23 2023, 9:08 AM
llvm/lib/Transforms/Scalar/LoopSink.cpp
257

When PHI-use is allowed, we also need a tweak here, otherwise the reported ICE could happen.

return cast<Instruction>(U.getUser())->getParent() == N;

-->

return cast<Instruction>(U.getUser())->getParent() == N &&
             !isa<PHINode>(cast<Instruction>(U.getUser()));

This is because the code assumed inserted clone is before all uses in the block, which is not true for PHI-use. The PHI-use is being taken care of by defs in its incoming blocks so we don't need to update PHI-use.

I will reland with this fix included.