This is an archive of the discontinued LLVM Phabricator instance.

[CodeGenPrepare] Estimate liveness of loop invariants when checking for address folding profitability
AcceptedPublic

Authored by chill on Feb 13 2023, 2:50 AM.

Details

Summary

When checking the profitability of folding an address computation
into a memory instruction, the compiler tries to determine the liveness
of the values, comprising the address, at the point of the memory instruction.
This patch improves on the live variable estimates by including
the loop invariants which are references in the loop body.

Diff Detail

Event Timeline

chill created this revision.Feb 13 2023, 2:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 13 2023, 2:50 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
chill requested review of this revision.Feb 13 2023, 2:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 13 2023, 2:50 AM
chill retitled this revision from [CodeGenPrepare] Loop invariant liveness to [CodeGenPrepare] Estimate liveness of loop invariants when checking for address folding profitability.Feb 13 2023, 6:52 AM
chill edited the summary of this revision. (Show Details)
chill updated this revision to Diff 503841.Mar 9 2023, 10:35 AM
chill updated this revision to Diff 509399.Mar 29 2023, 9:32 AM

Chages: make the limit on loop invariant users to scan a parameter

dmgreen added inline comments.
llvm/lib/CodeGen/CodeGenPrepare.cpp
274

-> loop invariant

5124

"If the value is loop invariant"

chill updated this revision to Diff 515008.Apr 19 2023, 10:06 AM
chill marked 2 inline comments as done.
dmgreen accepted this revision.Apr 20 2023, 5:54 AM

I ran some tests and this showed some nice improvements. LGTM, thanks.

This revision is now accepted and ready to land.Apr 20 2023, 5:54 AM
This revision was landed with ongoing or failed builds.Apr 24 2023, 2:26 AM
This revision was automatically updated to reflect the committed changes.

This discovers unstable behavior.
For me, MCA/ResourceManaget.o sometimes differs.
Reproduced by stage1 clang.

Something sensitive might be in LoopInfo.

llvm/lib/CodeGen/CodeGenPrepare.cpp
5127

L is unstable.

This discovers unstable behavior.
For me, MCA/ResourceManaget.o sometimes differs.
Reproduced by stage1 clang.

Same here, I also bisected to this commit. I found it separately via CodeGen/PrologEpilogInserter.o which differs ~10-20% of the time. The minimal set of flags to reproduce it with (for me at least) is -O3 -fno-exceptions.

chill reopened this revision.Apr 28 2023, 1:43 AM

Thank you, I'll have a look.

This revision is now accepted and ready to land.Apr 28 2023, 1:43 AM
chill planned changes to this revision.Apr 28 2023, 1:43 AM

@chill, are you able to reproduce this?

I was reducing this last night so that I could give you a .ll test case that exhibits this behavior, but that was derailed by something and I have to start over. If you can already reproduce it yourself, I'll just stop the process.

chill added a comment.Apr 28 2023, 7:57 AM

@chill, are you able to reproduce this?

I was reducing this last night so that I could give you a .ll test case that exhibits this behavior, but that was derailed by something and I have to start over. If you can already reproduce it yourself, I'll just stop the process.

No, I don't have anything (I've not done anything for this problem either, I'm busy elsewhere).

Any input/data would be greatly appreciated!

I'm not making very good progress with a test case. So far the only interesting insights I may have:

  • This only reproduces in optimized builds. Setting -UNDEBUG makes the problem go away. Maybe debug builds are normalizing something?
  • The problem obvious goes away with -mllvm -cgp-max-loop-inv-users-to-scan=0. One of my guesses was that V->uses() ordering might be non-deterministic, so you could get different answers in isUsedInLoop if the loop is terminated before you visit the instruction that would cause you to return true. That is not the case: the problem still exists with -mllvm -cgp-max-loop-inv-users-to-scan=10000.

Otherwise, if I do the usual process of getting clang to dump the IR and pipe that into opt, or various things like that, the process is deterministic. If you have any guess for what might get a reproducer, I can give it a shot.

All the operations involved here should be deterministic. You're probably looking for a bug elsewhere. One possibility is that LoopInfo doesn't contain the correct information... for example, the CFG changes, but that change isn't recorded in LoopInfo.

All the operations involved here should be deterministic. You're probably looking for a bug elsewhere. One possibility is that LoopInfo doesn't contain the correct information... for example, the CFG changes, but that change isn't recorded in LoopInfo.

Ack. To clarify, I'm fairly confident this patch _causes_ non-determinism, as I can bisect to this commit and reproduce it, and not at the commit prior to it, etc. But of course this might be doing a valid, deterministic transformation that triggers an existing non-determinism bug later on, or something like that.

chill added a comment.May 3 2023, 6:47 AM

Reproduced with a stage2 compiler when compiling llvm/lib/MCA/HardwareUnits/ResourceManager.cpp.

chill updated this revision to Diff 520757.May 9 2023, 10:42 AM

Minor update on top of D150210

This revision is now accepted and ready to land.May 9 2023, 10:42 AM
chill marked an inline comment as done.May 9 2023, 10:47 AM
rupprecht accepted this revision.May 9 2023, 3:53 PM

No non-determinism detected with this on top of D150210, so LGTM. Thanks!

chill updated this revision to Diff 521367.May 11 2023, 10:35 AM
chill updated this revision to Diff 530938.Jun 13 2023, 9:20 AM

Rebase/refresh, no changes