This is an archive of the discontinued LLVM Phabricator instance.

[Phase Ordering] Don't speculate in SimplifyCFG before PGO annotation
Needs ReviewPublic

Authored by tejohnson on Jul 21 2023, 2:33 PM.

Details

Reviewers
aeubanks
nikic
Summary

SimplifyCFG contains a number of transformations that perform
speculation, and will use profile data if it exists in the IR.
However, the first invocations of SimplifyCFG are before both
instrumented and sample PGO annotation. We found cases where
unprofitable speculation was performed in these early invocations, when
subsequent PGO profile data would have prevented it.

Disable the SpeculateBlocks flag before PGO annotation, in compiles
where PGO is provided.

This change resulted in noticeable and significant performance
improvements in multiple large internal codes.

Finally, remove an old test that had a difference with this change
(clang/test/CodeGen/pgo-sample-preparation.c), however, was testing
a pipeline configuration that no longer exists in the same way as when
the test was added. The code change being checked does not seem
indicative of a performance issue as the critical path remains the same,
and I could not find any description of a performance issue with the
original commit.

Diff Detail

Event Timeline

tejohnson created this revision.Jul 21 2023, 2:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 21 2023, 2:33 PM
tejohnson requested review of this revision.Jul 21 2023, 2:33 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJul 21 2023, 2:33 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

the pipeline change and simplifycfg change should be split into two changes

llvm/lib/Passes/PassBuilderPipelines.cpp
277

can we just drop the flag and make this change?

llvm/test/Transforms/SimplifyCFG/fold-branch-to-common-dest.ll
3 ↗(On Diff #543071)

these checks won't work for a update_test_checks.py test. should add tests like the one added in https://reviews.llvm.org/D153391

llvm/test/Transforms/SimplifyCFG/pipeline-delay-speculation-pgo.ll
1 ↗(On Diff #543071)

do you have a phase ordering test instead?

tejohnson added inline comments.Jul 24 2023, 3:50 PM
llvm/lib/Passes/PassBuilderPipelines.cpp
277

Ok, I will remove.

llvm/test/Transforms/SimplifyCFG/fold-branch-to-common-dest.ll
3 ↗(On Diff #543071)

Can you clarify - are you saying remove the changes from this test and add a new test for this change like llvm/test/Transforms/SimplifyCFG/speculate-blocks.ll, or change the checks here?

llvm/test/Transforms/SimplifyCFG/pipeline-delay-speculation-pgo.ll
1 ↗(On Diff #543071)

Can you clarify what you are referring to?

aeubanks added inline comments.Jul 24 2023, 3:56 PM
llvm/test/Transforms/SimplifyCFG/fold-branch-to-common-dest.ll
3 ↗(On Diff #543071)

revert the changes here (update_test_checks.py tests shouldn't have manually added CHECK lines) and add new function(s) to llvm/test/Transforms/SimplifyCFG/speculate-blocks.ll that show the difference between speculate-blocks and no-speculate-blocks

llvm/test/Transforms/SimplifyCFG/pipeline-delay-speculation-pgo.ll
1 ↗(On Diff #543071)

llvm/test/Transforms/PhaseOrdering has IR tests that get run through an entire pipeline (e.g. thinlto-pre-link<O3>) to verify some property of the entire pipeline. ideally you can get some somewhat-reduced IR that exhibits the problem you were seeing before the change and show that it improves with the pipeline change. you'll need a profile as part of that test I suppose

the pipeline change and simplifycfg change should be split into two changes

SimplifyCFG change split into D156194.

tejohnson marked 2 inline comments as done.Jul 25 2023, 2:59 PM
tejohnson updated this revision to Diff 544109.Jul 25 2023, 3:00 PM

Address comments

tejohnson edited the summary of this revision. (Show Details)Jul 25 2023, 3:00 PM
aeubanks added inline comments.Jul 28 2023, 10:32 AM
llvm/test/Transforms/PhaseOrdering/simplifycfg-speculate-blocks.ll
2

hmm typically these phase ordering tests use llvm/utils/update_test_checks.py, but that doesn't support the llvm-profdata RUN line. I think a non-update_test_checks test is probably fine for this, @nikic does that make sense?

23

lto-pre-link

84

the debug info is necessary for sample profile to work I presume?

I think some of this can still be simplified, like PIC Level, etc

I'm not a fan of the PGO conditional behavior here. I'd prefer to disable speculation unconditionally if that is feasible. This is basically what D153392 did. While the specific test case there was addressed in a different way, if we take it in conjunction with this one, I think we have enough motivation to do the change.

Another piece of motivation would be that the EarlyFPM change would allow us to move LowerExpectIntrinsic to the end of EarlyFPM again, which would make expect intrinsics work in Rust again. The pass was moved earlier to make the first SimplifyCFG run take them into account, but that would no longer be a problem if we disable speculation.

And another piece of motivation would be that we could move SimplifyCFG after SROA, where it can be much more effective (but doing so with speculation breaks IPSCCP too much). I think doing that needs some further changes, but it's a move in the right direction.

So I think the EarlyFPM change at least seems like something we should pretty clearly do independently of PGO. The addPGOInstrPasses() change is of course PGO specific anyway. I think only the GlobalCleanupPM change may not be completely clear cut.

llvm/test/Transforms/PhaseOrdering/simplifycfg-speculate-blocks.ll
2

I thought UTC supports unrecognized commands (by just executing them)... If not, a non-UTC test is fine.

I'm not a fan of the PGO conditional behavior here. I'd prefer to disable speculation unconditionally if that is feasible. This is basically what D153392 did. While the specific test case there was addressed in a different way, if we take it in conjunction with this one, I think we have enough motivation to do the change.

It looks like D153392 does roughly the same disabling in practice, but you are right it doesn't gate on whether we will eventually feed back profile info. For the issue I was trying to solve, it was because of the PGO based profitability checks in SimplifyCFG speculation, but if it makes sense in other cases to disable it until later, then that works for me. BTW doesn't look like that patch disables speculation in addPGOInstrPasses, which we need for the issue I'm trying to solve.

Another piece of motivation would be that the EarlyFPM change would allow us to move LowerExpectIntrinsic to the end of EarlyFPM again, which would make expect intrinsics work in Rust again. The pass was moved earlier to make the first SimplifyCFG run take them into account, but that would no longer be a problem if we disable speculation.

And another piece of motivation would be that we could move SimplifyCFG after SROA, where it can be much more effective (but doing so with speculation breaks IPSCCP too much). I think doing that needs some further changes, but it's a move in the right direction.

So I think the EarlyFPM change at least seems like something we should pretty clearly do independently of PGO. The addPGOInstrPasses() change is of course PGO specific anyway. I think only the GlobalCleanupPM change may not be completely clear cut.

For the case I was looking at, with instrumentation PGO, any of the pre-feedback SimplifyCFG speculation passes would do the wrong thing without that PGO data.

tejohnson marked 2 inline comments as done.Jul 28 2023, 6:06 PM
tejohnson added inline comments.
llvm/test/Transforms/PhaseOrdering/simplifycfg-speculate-blocks.ll
84

Right, that's need for SamplePGO. Removed the other stuff that wasn't necessary.

tejohnson updated this revision to Diff 545318.Jul 28 2023, 6:06 PM
tejohnson marked an inline comment as done.

Address test comments

can we try not gating this on PGO as suggested? minimizing differences between pipelines is nice, and as mentioned it'll help with other cases

llvm/test/Transforms/PhaseOrdering/simplifycfg-speculate-blocks.ll
2

update_test_checks.py doesn't handle %t (maybe that's an improvement we could make to the script)

can we try not gating this on PGO as suggested? minimizing differences between pipelines is nice, and as mentioned it'll help with other cases

Sorry for the delay, was OOO for a bit and this ended up being a bit more complicated. Since I added a change to guard branch folding with the simplifyBlocks flag in D156194, there were additional tests that had differences that I have been trying to analyze and mitigate, with partial but not complete success (i.e. additional to what was seen when @aeubanks tried this in D153392):

  1. llvm/test/CodeGen/Hexagon/loop-idiom/pmpy-mod.ll

The Hexagon loop idiom recognition pass is a little rigid in the detected code sequences, which were different because once we did the branch folding in a later SimplifyCFG invocation, followed by InstCombine, there was no EarlyCSE before the hexagon passes to do the cleanup required to get the old sequence. I can fix this by adding an invocation of EarlyCSE after those passes in buildFunctionSimplificationPipeline. There were no test changes from that fix, other than some tests that check the order of passes in each pipeline.

  1. llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll

Not all function in this test are affected, but several are affected due to the phase ordering, specifically we stopped doing some of the branch folding completely which blocked some of the vectorization handling. In one of the functions I examined I found this was due to the fact that an EarlyCSE pass before we reran SimplifyCFG with speculation enabled did some commoning that blocked branch folding due to a limitation in the handling there. Specifically, the one added in https://reviews.llvm.org/rG909cba969981032c5740774ca84a34b7f76b909b. Undoing that does make the branch folding kick in more in this test, but isn't correct in general (it sounds like that patch was submitted to address a problem that was difficult to solve more optimally). In fact, that change had some similar affects on this test to my change, but there have been quite a few changes impacting this test over the past few years.

  1. clang/test/Headers/__clang_hip_math.hip

Some fairly superficial diffs that don't appear important to the test.

Thoughts on how should I proceed? I can send the simple change I did for 1) above separately, since it doesn't seem to do any harm and presumably EarlyCSE is cheap. For 2) however, I'm not sure how much effort I should invest to get the test to behave exactly the same as before. There is clearly a limitation with SimplifyCFG that has been looked at before but isn't easy to solve.

llvm/test/Transforms/PhaseOrdering/simplifycfg-speculate-blocks.ll
2

I'm not sure what the alternative is here to using %t - do we need make the test support update_test_checks.py?