This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Only perform one iteration
ClosedPublic

Authored by nikic on Jul 6 2023, 1:40 AM.

Details

Summary

InstCombine is a worklist-driven algorithm, which works roughly as follows:

  • All instructions are initially pushed to the worklist. The initial order is (roughly) in program order / RPO.
  • All newly inserted instructions get added to the worklist.
  • When an instruction is folded, its users get added back to the worklist.
  • When the use-count of an instruction decreases, it gets added back to the worklist.
  • ...plus a bunch of other heuristics on when we should revisit instructions.

On top of the worklist algorithm, InstCombine layers an additional fix-point iteration: If any fold was performed in the previous iteration, then InstCombine will re-populate the worklist from scratch and fold the entire function again. This continues until a fix-point is reached.

In the vast majority of cases, InstCombine will reach a fix-point within a single iteration: However, a second iteration is performed to verify that this is indeed the fixpoint. We can see this in the statistics for llvm-test-suite:

"instcombine.NumOneIteration": 411380,
"instcombine.NumTwoIterations": 117921,
"instcombine.NumThreeIterations": 236,
"instcombine.NumFourOrMoreIterations": 2,

The way to read these numbers is that in 411380 cases, InstCombine performs no folds. In 117921 cases it performs a fold and reaches the fix-point within one iteration (the second iteration verifies the fixpoint). In the remaining 238 cases, more than one iteration is needed to reach the fixpoint.

In other words, only in 0.04% of cases are additional iterations needed to reach a fixpoint. Conversely, in 22.3% of cases InstCombine performs a completely useless extra iteration to verify the fix point.

This patch proposes to remove the fixpoint iteration from InstCombine, and to always only perform a single iteration. This results in a major compile-time improvement: http://llvm-compile-time-tracker.com/compare.php?from=b7e38ff22326d7bcbd01f080dc91f47be25e703e&to=40936c7e9324ce41819483f2c02f5bbcefa292a0&stat=instructions%3Au We get a 4-5% compile-time reduction at negligible codegen impact. (These numbers include D75362, which is a non-trivial regression when taken by itself. Most of the size-text changes are also due to that patch, not this one.)

This explicitly does accept that we will not reach a fixpoint in all cases. However, this is mitigated by two factors: First, the data suggests that this happens very rarely in practice. Second, InstCombine runs many times during the optimization pipeline (8 times even without LTO), so there are many chances to recover such cases.

In order to prevent accidental optimization regressions in the future, this implements a default-enabled verify-fixpoint option, which will make sure that the fix point has indeed been reached after a single iteration. This means that tests where this is not the case need to be explicitly annotated. The actual optimization pipeline will disable this option, as failure to reach the fix point is expected to happen there (in rare cases, as described above).

Depends on D75362.

Diff Detail

Event Timeline

nikic created this revision.Jul 6 2023, 1:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 6 2023, 1:40 AM
nikic requested review of this revision.Jul 6 2023, 1:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 6 2023, 1:40 AM
nikic edited the summary of this revision. (Show Details)Jul 6 2023, 1:45 AM
nikic added a comment.Jul 6 2023, 1:57 AM

Left some comments on test diffs. I don't think any of the remaining cases are particularly problematic, though the phi and freeze cases are something that may be worth fixing.

llvm/test/Analysis/ValueTracking/numsignbits-from-assume.ll
51

This is related to backwards-propagation of assumes: Assumes can affect guaranteed-to-transfer instructions in a limited window before the assume. We may fail to fold such cases in one iteration if we first need to fold instructions to bring the assume into a recognized form. Here the assume is only recognized by AC after ule is converted to ult, at which point the add before has already been visited.

I don't think this issue matters in practice.

llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll
31

This is caused by details of how we canonicalize phi operand order. This is easy to fix, it just has annoying test fallout.

llvm/test/Transforms/InstCombine/pr55228.ll
11

This happens because the initializer of the global is not fully folded. This is not a problem when run in a real optimization pipeline, because GlobalOpt will handle such cases earlier.

llvm/test/Transforms/InstCombine/shift.ll
1718

I didn't bother looking into this, because it's a fuzzer test case.

llvm/test/Transforms/PGOProfile/chr.ll
1936

At the time we process this freeze, j.fr hasn't been introduced yet, so we would have to introduce two freeze instructions. We could fix this by allowing the creation of more than one freeze when pushing upward. Especially for icmps that is probably beneficial.

llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll
119

This is the same backwards reasoning assume issue mentioned above.

This seems worthwhile in pursuing, but I don't know very much about how IC worklists are managed/sorted - your summary implies there are a number of workarounds in there, would these benefit from being cleaned up before/after this change?

Regarding your verify-fixpoint proposal - would it mean we don't have much "no-verify-fixpoint" test coverage apart from phase-ordering or similar tests?

nikic added a comment.Jul 6 2023, 7:33 AM

This seems worthwhile in pursuing, but I don't know very much about how IC worklists are managed/sorted - your summary implies there are a number of workarounds in there, would these benefit from being cleaned up before/after this change?

Normally, InstCombine worklist management is handled implicitly, using a combination of IRBuilder callbacks and standard helpers like replaceInstUsesWith(). Things work automatically as long as folds just replace one sequence of instructions with another. However, for folds that do non-local changes (e.g. looping over users and doing extra replacements there), it may be necessary to perform manual worklist management. I've been working on adding that manual worklist management in all the places that were missing it over the last few weeks, and I'm not aware of any remaining issues. (The most common case is folds leaving behind dead instructions without queuing them for DCE.)

Regarding your verify-fixpoint proposal - would it mean we don't have much "no-verify-fixpoint" test coverage apart from phase-ordering or similar tests?

Right. Ideally we would always verify the fixpoint for tests (so that an explicitly opt-out is required for cases that don't reach the fixpoint), and not verify it outside tests. Verifying it for -passes=instcombine but not -passes='default<O3>' would be the heuristic for that.

the instcombine<no-verify-fixpoint> approach sgtm

re:

All instructions are initially pushed to the worklist. The initial order is (roughly) in program order / RPO.
All newly inserted instructions get added to the worklist.
When an instruction is folded, its users get added back to the worklist.
When the use-count of an instruction decreases, it gets added back to the worklist.
...plus a bunch of other heuristics on when we should revisit instructions.

What does it look like if instead of decreasing iteration count, we change re-insertion
logic based on iteration?
I.e:
iteration 1 -> do everything
iteration 2+ -> only re-add newly created insn or insn that are now single-use.

nikic added a comment.Jul 7 2023, 1:52 AM

All instructions are initially pushed to the worklist. The initial order is (roughly) in program order / RPO.
All newly inserted instructions get added to the worklist.
When an instruction is folded, its users get added back to the worklist.
When the use-count of an instruction decreases, it gets added back to the worklist.
...plus a bunch of other heuristics on when we should revisit instructions.

What does it look like if instead of decreasing iteration count, we change re-insertion
logic based on iteration?
I.e:
iteration 1 -> do everything
iteration 2+ -> only re-add newly created insn or insn that are now single-use.

I don't think I understand your suggestion here. This sparse reprocessing is what the worklist is for -- and we do want to perform the reprocessing as part of the same iteration, not a later one, to make sure that folds working on later instructions see already folded operands, even if arriving at them requires multiple folds. If we delayed all reprocessing until a second iteration, folds would see operands after a single round of folding was applied to them, rather than in their final form.

nikic updated this revision to Diff 544774.Jul 27 2023, 7:45 AM
nikic edited the summary of this revision. (Show Details)

Implement fix-point verification.

nikic updated this revision to Diff 544777.Jul 27 2023, 7:50 AM

Move stat update.

aeubanks added inline comments.Jul 27 2023, 10:51 AM
llvm/lib/Passes/PassBuilderPipelines.cpp
369 ↗(On Diff #544777)

imo the InstCombinePass constructor should default to no-verify-fixpoint, but parseInstCombineOptions should by default set verify-fixpoint, since we typically call the InstCombinePass constructor from pass pipelines

nikic updated this revision to Diff 545068.Jul 28 2023, 1:58 AM

Move default to parseInstCombineOptions().

nikic marked an inline comment as done.Jul 28 2023, 1:58 AM
nikic added inline comments.
llvm/lib/Passes/PassBuilderPipelines.cpp
369 ↗(On Diff #544777)

Good point. In fact, I missed some InstCombinePass() uses in BackendUtil in the previous patch. Doing this in option parsing makes sure all C++ uses of InstCombinePass don't get fixpoint verification.

nikic marked an inline comment as done.Jul 28 2023, 1:59 AM
nikic added inline comments.
llvm/lib/Passes/PassRegistry.def
328 ↗(On Diff #545068)

This didn't get removed when FUNCTION_PASS_WITH_PARAMS was added below.

are you still looking into some of the remaining cases, or is this in a state you want to land now?

are you still looking into some of the remaining cases, or is this in a state you want to land now?

This is ready to land as far as I'm concerned.

aeubanks accepted this revision.Jul 28 2023, 10:40 AM

lgtm

llvm/test/Transforms/InstCombine/shift.ll
5
This revision is now accepted and ready to land.Jul 28 2023, 10:40 AM
This revision was landed with ongoing or failed builds.Jul 31 2023, 1:57 AM
This revision was automatically updated to reflect the committed changes.
bjope added a subscriber: bjope.Jul 31 2023, 6:20 AM
bjope added inline comments.
llvm/lib/Passes/PassRegistry.def
511 ↗(On Diff #545541)

We should add "no-verify-fixpoint;verify-fixpoint;" here, right?

bjope added inline comments.Jul 31 2023, 6:23 AM
llvm/lib/Passes/PassRegistry.def
511 ↗(On Diff #545541)

Also noticed that instcombine<verify-fixpoint> will be tricky to use in fuzzy testing with random pipelines. So I think we will avoid that.

bjope added inline comments.Jul 31 2023, 12:47 PM
llvm/lib/Passes/PassRegistry.def
511 ↗(On Diff #545541)
nikic added inline comments.Jul 31 2023, 12:50 PM
llvm/lib/Passes/PassRegistry.def
511 ↗(On Diff #545541)

Thank you! And yes, for fuzzing purposes, instcombine<no-verify-fixpoint> should be used.