This is an archive of the discontinued LLVM Phabricator instance.

[Pipeline] Remove GlobalCleanupPM
AcceptedPublic

Authored by aeubanks on Mar 3 2023, 12:33 PM.

Details

Summary

This is mostly unnecessary given we're about to run the simplification pipeline anyway.

A couple of remarks, some to do with not regressing PhaseOrdering tests.
The main issue with just removing GlobalCleanupPM is that we no longer run InstCombine before the inliner.
Another issue is that CGSCC passes before the function simplification pipeline may have less precise IR to work with.
SROA -> EarlyCSE -> InstCombine -> SimplifyCFG should be the passes at the top of the function simplification pipeline.
InstCombine/SimplifyCFG need to run before JumpThreading, so move JumpThreading after those.
This also adds an extra EarlyCSE in the -O1 pipeline to match where GVN runs in other pipelines, otherwise we never run EarlyCSE after InstCombine.

Fairly substantial compile time wins:
https://llvm-compile-time-tracker.com/compare.php?from=a5595e9f1feb5960132ffcef539ab425abbe97cc&to=2baa33baa9bfa722329a728a100610daba60c559&stat=instructions:u

IR diffs on llvm-test-suite:
https://github.com/aeubanks/llvm-ir-diff/commit/f833481abce7d2310c84029a5df08d4d9bd943e5
The diffs mostly look neutral except for some gcc torture suite files which are arguably unreasonable.

This regresses D117091 because we now do not run instcombine before the inliner, losing return value alignment info.
This regresses pr52289.ll which was a fuzzed test case to begin with (and never properly fixed).

Diff Detail

Event Timeline

aeubanks created this revision.Mar 3 2023, 12:33 PM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: hiraditya. · View Herald Transcript
aeubanks requested review of this revision.Mar 3 2023, 12:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2023, 12:33 PM
aeubanks retitled this revision from [Pipeline] Remove GlobalCleanupPM to [WIP][Pipeline] Remove GlobalCleanupPM.Mar 3 2023, 12:34 PM

D145210 fixes cgscc-devirt-iteration.ll

@spatel @anton-afanasyev for the instcombine issue

I see some regressions on internal benchmarks from this, investigating those

llvm/test/Transforms/PhaseOrdering/X86/pr52289.ll
9

this is an instcombine issue related to https://github.com/llvm/llvm-project/issues/51631 which has some bugs split out from it that are still open

aeubanks updated this revision to Diff 507053.Mar 21 2023, 10:51 AM

rebase, update tests

aeubanks retitled this revision from [WIP][Pipeline] Remove GlobalCleanupPM to [Pipeline] Remove GlobalCleanupPM.Mar 22 2023, 1:12 PM
aeubanks added inline comments.
llvm/test/Transforms/PhaseOrdering/X86/pr52289.ll
9

https://github.com/llvm/llvm-project/issues/51744 and https://github.com/llvm/llvm-project/issues/51748 are the open issues for this. is it ok to regress this for now given that the original motivation for this was found via fuzzing and not real world code?

aeubanks updated this revision to Diff 508167.Mar 24 2023, 11:04 AM

add FIXME to phase ordering test

I think this (removing GlobalCleanupPM) makes sense at a high level, but this is a pretty substantial pipeline change and will need more thorough evaluation (e.g. looking at IR diffs on test suite). For example, it is not at all obvious to me that moving InstCombine before EarlyCSE is harmless, given the impact this may have on one-use heuristics.

got IR diffs for llvm-test-suite by dumping the IR in clang after running the optimization pipeline (-O3), IR file name is md5 of the module identifier

@@@ -1093,6 -1093,6 +1096,16 @@@ void EmitAssemblyHelper::EmitAssembly(B
  
    std::unique_ptr<llvm::ToolOutputFile> ThinLinkOS, DwoOS;
    RunOptimizationPipeline(Action, OS, ThinLinkOS);
++  if (!TheModule->getModuleIdentifier().empty()) {
++    std::error_code EC;
++    std::string FileName = "/usr/local/google/home/aeubanks/tmp/wow3/";
++    FileName += llvm::utohexstr(llvm::MD5Hash(TheModule->getModuleIdentifier()), false, 16);
++    FileName += ".ll";
++    llvm::raw_fd_ostream OS(FileName, EC);
++    if (EC)
++      report_fatal_error(createStringError(EC, "???"));
++    TheModule->print(OS, nullptr);
++  }
    RunCodegenPipeline(Action, OS, DwoOS);

all IR diffs: https://github.com/aeubanks/llvm-ir-diff/commit/ca19575407a94933d10e9b493784cce47751d0aa
diffs with <50 lines changed: https://github.com/aeubanks/llvm-ir-diff/commit/5592d7e71191501b6029af1daa1ee438076f4913

I'm going to be out for a couple weeks, but a couple initial observations:
4508D45179E820C5.ll: we don't run instcombine anymore before getting to the function simplification pipeline, and that causes argpromo to fail. perhaps we should add back in an early instcombine somewhere. (but the forced noinline in the file prevents obvious inlining)
1A365B06B0D46C86.ll: select i1 %37, i32 %38, i32 %35 becomes call i32 @llvm.smin.i32(i32 %36, i32 4095)
1ACE620D6E75417F.ll: extra memset
A4B000DC4289863A.ll: missed used once global -> alloca

will see if running an early instcombine resolves most of these issues

Nit: the diffs would be easier to analyze with "-fno-discard-value-names".

aeubanks updated this revision to Diff 513731.Apr 14 2023, 1:35 PM
aeubanks edited the summary of this revision. (Show Details)

fix clang test

Nit: the diffs would be easier to analyze with "-fno-discard-value-names".

done, see commit description

the IR diffs look pretty good now, I'm not seeing any major regressions

this regresses clang/test/Headers/mm_malloc.c (added here) because we now don't run instcombine before the inliner, and we lose return value alignment info when inlining. perhaps this is still a tradeoff worth making

aeubanks updated this revision to Diff 523175.May 17 2023, 2:28 PM

rebase

keep the existing EarlyCSE/InstCombine ordering

nikic added inline comments.May 17 2023, 2:42 PM
llvm/lib/Passes/PassBuilderPipelines.cpp
419

Why the extra EarlyCSE pass in the O1 pipeline?

495

Any particular reason why InstCombine and AggressiveInstCombine are no longer directly next to each other?

aeubanks updated this revision to Diff 523182.May 17 2023, 2:48 PM
aeubanks edited the summary of this revision. (Show Details)

update commit message

aeubanks added inline comments.May 17 2023, 2:50 PM
llvm/lib/Passes/PassBuilderPipelines.cpp
419

forgot to update the commit message, see that

495

we need InstCombine -> SimplifyCFG, so swapped them (see updated commit message). I don't necessarily see why AIC needs to be right after IC

asbirlea accepted this revision.May 24 2023, 2:50 PM

This went through a few rounds of testing on our internal benchmarks and it's at a point where there are no meaningful run-time regressions observed, but the compile-time improvements remain and are significant enough to move fwd with the change.
Approving to move things forward. If folks do encounter regressions from this change before or after landing, please flag them here.

This revision is now accepted and ready to land.May 24 2023, 2:50 PM
nikic added a comment.May 25 2023, 2:54 AM

Unfortunately I'm seeing a number of Rust optimization regressions with this change. I'll try to reduce those to some PhaseOrdering tests.

nikic added a comment.May 25 2023, 3:05 AM

Here are two inputs that currently fold down to constants and no longer do so with this patch: https://gist.github.com/nikic/4a714ea550bf2252543570585f642af2 These need further reduction for PhaseOrdering tests, but should be a good starting point for analysis...

@nikic could you try running this over the rust tests again?

nikic added a comment.Jul 7 2023, 1:29 AM

@nikic could you try running this over the rust tests again?

Ignoring some unrelated powerpc data layout failures, these codegen tests fail:

[codegen] tests/codegen/issues/issue-45222.rs
[codegen] tests/codegen/issues/issue-45466.rs
[codegen] tests/codegen/issues/issue-69101-bounds-check.rs
[codegen] tests/codegen/iter-repeat-n-trivial-drop.rs
[codegen] tests/codegen/slice-as_chunks.rs

The iter-repeat-n-trivial-drop.rs and slice-as_chunks.rs failures are cosmetic.

issues-45222.rs is the same as last time (loop not folded to constant).

issue-45466.rs fails to convert a loop into a memset 0 (this looks like the simplest test case, because it has essentially just one function in initial IR).

issue-69101-bounds-check.rs leaves behind panic_bounds_checks in already_sliced_no_bounds_check.

I've put the input IR for these three failure up at https://gist.github.com/nikic/9ec0bc02d451e7abd0f8d78800c2206c.