This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Aggregate reconstruction simplification (PR47060)
ClosedPublic

Authored by lebedev.ri on Aug 11 2020, 2:34 PM.

Details

Summary

This pattern happens in clang C++ exception lowering code, on unwind branch.
We end up having a landingpad block after each invoke, where RAII
cleanup is performed, and the elements of an aggregate {i8*, i32}
holding exception info are extractvalue'd, and we then branch to common block
that takes extracted i8* and i32 elements (via phi nodes),
form a new aggregate, and finally resume's the exception.

The problem is that, if the cleanup block is effectively empty,
it shouldn't be there, there shouldn't be that landingpad and resume,
said invoke should be a call.

Indeed, we do that simplification in e.g. SimplifyCFG SimplifyCFGOpt::simplifyResume().
But the thing is, all this extra extractvalue + phi + insertvalue cruft,
while it is pointless, does not look like "empty cleanup block".
So the SimplifyCFGOpt::simplifyResume() fails, and the exception is has
higher cost than it could have on unwind branch :S

This doesn't happen *that* often, but it will basically happen once per C++
function with complex CFG that called more than one other function
that isn't known to be nounwind.

I think, this is a missing fold in InstCombine, so i've implemented it.

I think, the algorithm/implementation is rather self-explanatory:

  1. Find a chain of insertvalue's that fully tell us the initializer of the aggregate.
  2. For each element, try to find from which aggregate it was extracted. If it was extracted from the aggregate with identical type, from identical element index, great.
  3. If all elements were found to have been extracted from the same aggregate, then we can just use said original source aggregate directly, instead of re-creating it.
  4. If we fail to find said aggregate when looking only in the current block, we need be PHI-aware - we might have different source aggregate when coming from each predecessor.

I'm not sure if this already handles everything, and there are some FIXME's,
i'll deal with all that later in followups.

I'd be fine with going with post-commit review here code-wise,
but just in case there are thoughts, i'm posting this.

On RawSpeed, for example, this has the following effect:

| statistic name                                    | baseline | proposed |     Δ |       % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified |        0 |     1253 |  1253 |   0.00% |  0.00% |
| simplifycfg.NumInvokes                            |      948 |     1355 |   407 |  42.93% | 42.93% |
| instcount.NumInsertValueInst                      |     4382 |     3210 | -1172 | -26.75% | 26.75% |
| simplifycfg.NumSinkCommonCode                     |      574 |      458 |  -116 | -20.21% | 20.21% |
| simplifycfg.NumSinkCommonInstrs                   |     1154 |      921 |  -233 | -20.19% | 20.19% |
| instcount.NumExtractValueInst                     |    29017 |    26397 | -2620 |  -9.03% |  9.03% |
| instcombine.NumDeadInst                           |   166618 |   174705 |  8087 |   4.85% |  4.85% |
| instcount.NumPHIInst                              |    51526 |    50678 |  -848 |  -1.65% |  1.65% |
| instcount.NumLandingPadInst                       |    20865 |    20609 |  -256 |  -1.23% |  1.23% |
| instcount.NumInvokeInst                           |    34023 |    33675 |  -348 |  -1.02% |  1.02% |
| simplifycfg.NumSimpl                              |   113634 |   114708 |  1074 |   0.95% |  0.95% |
| instcombine.NumSunkInst                           |    15030 |    14930 |  -100 |  -0.67% |  0.67% |
| instcount.TotalBlocks                             |   219544 |   219024 |  -520 |  -0.24% |  0.24% |
| instcombine.NumCombined                           |   644562 |   645805 |  1243 |   0.19% |  0.19% |
| instcount.TotalInsts                              |  2139506 |  2135377 | -4129 |  -0.19% |  0.19% |
| instcount.NumBrInst                               |   156988 |   156821 |  -167 |  -0.11% |  0.11% |
| instcount.NumCallInst                             |  1206144 |  1207076 |   932 |   0.08% |  0.08% |
| instcount.NumResumeInst                           |     5193 |     5190 |    -3 |  -0.06% |  0.06% |
| asm-printer.EmittedInsts                          |   948580 |   948299 |  -281 |  -0.03% |  0.03% |
| instcount.TotalFuncs                              |    11509 |    11507 |    -2 |  -0.02% |  0.02% |
| inline.NumDeleted                                 |    97595 |    97597 |     2 |   0.00% |  0.00% |
| inline.NumInlined                                 |   210514 |   210522 |     8 |   0.00% |  0.00% |

So we manage to increase the amount of invoke -> call conversions in SimplifyCFG by almost a half,
and there is a very apparent decrease in instruction and basic block count.

On vanilla llvm-test-suite:

| statistic name                                    | baseline | proposed |     Δ |       % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified |        0 |      744 |   744 |   0.00% |  0.00% |
| instcount.NumInsertValueInst                      |     2705 |     2053 |  -652 | -24.10% | 24.10% |
| simplifycfg.NumInvokes                            |     1212 |     1424 |   212 |  17.49% | 17.49% |
| instcount.NumExtractValueInst                     |    21681 |    20139 | -1542 |  -7.11% |  7.11% |
| simplifycfg.NumSinkCommonInstrs                   |    14575 |    14361 |  -214 |  -1.47% |  1.47% |
| simplifycfg.NumSinkCommonCode                     |     6815 |     6743 |   -72 |  -1.06% |  1.06% |
| instcount.NumLandingPadInst                       |    14851 |    14712 |  -139 |  -0.94% |  0.94% |
| instcount.NumInvokeInst                           |    27510 |    27332 |  -178 |  -0.65% |  0.65% |
| instcombine.NumDeadInst                           |  1438173 |  1443371 |  5198 |   0.36% |  0.36% |
| instcount.NumResumeInst                           |     2880 |     2872 |    -8 |  -0.28% |  0.28% |
| instcombine.NumSunkInst                           |    55187 |    55076 |  -111 |  -0.20% |  0.20% |
| instcount.NumPHIInst                              |   321366 |   320916 |  -450 |  -0.14% |  0.14% |
| instcount.TotalBlocks                             |   886816 |   886493 |  -323 |  -0.04% |  0.04% |
| instcount.TotalInsts                              |  7663845 |  7661108 | -2737 |  -0.04% |  0.04% |
| simplifycfg.NumSimpl                              |   886791 |   887171 |   380 |   0.04% |  0.04% |
| instcount.NumCallInst                             |   553552 |   553733 |   181 |   0.03% |  0.03% |
| instcombine.NumCombined                           |  3200512 |  3201202 |   690 |   0.02% |  0.02% |
| instcount.NumBrInst                               |   741794 |   741656 |  -138 |  -0.02% |  0.02% |
| simplifycfg.NumHoistCommonInstrs                  |    14443 |    14445 |     2 |   0.01% |  0.01% |
| asm-printer.EmittedInsts                          |  7978085 |  7977916 |  -169 |   0.00% |  0.00% |
| inline.NumDeleted                                 |    73188 |    73189 |     1 |   0.00% |  0.00% |
| inline.NumInlined                                 |   291959 |   291968 |     9 |   0.00% |  0.00% |

Roughly similar effect, less instructions and blocks total.

See also: rGe492f0e03b01a5e4ec4b6333abb02d303c3e479e.

Compile-time wise, this appears to be roughly geomean-neutral:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=instructions

And this is a win size-wize in general:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=size-text

See https://bugs.llvm.org/show_bug.cgi?id=47060

Diff Detail

Unit TestsFailed

Event Timeline

lebedev.ri created this revision.Aug 11 2020, 2:34 PM
lebedev.ri requested review of this revision.Aug 11 2020, 2:34 PM
lebedev.ri edited the summary of this revision. (Show Details)
lebedev.ri edited the summary of this revision. (Show Details)Aug 11 2020, 2:55 PM

One last tiny cleanup.

aqjune added a subscriber: aqjune.Aug 12 2020, 10:53 AM

@ reviewers - i'm not so much interested in deep code/algo review,
but more like in the general direction disscussion, like, is this okay for instcombine? :)

I guess pre-merge check failures are relevant, and they show exactly the problem; will resolve in a bit.

Fix (overstepping) clang test being affected by this change.

Herald added a project: Restricted Project. · View Herald TranscriptAug 13 2020, 1:30 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Drop unneeded header include.

@ reviewers - i'm not so much interested in deep code/algo review,
but more like in the general direction disscussion, like, is this okay for instcombine? :)

The test diffs look great, and it seems to at least pay for itself in compile-time impact, so I think it's a good direction. There's always a question of "is this substantial enough (or would it grow enough with the 'TODO' items) to be a stand-alone pass?".

Haven't had a chance to look at the code itself yet. Do we have tests where there are extra uses of the extracted values?

@spatel thank you for taking a look!

@ reviewers - i'm not so much interested in deep code/algo review,
but more like in the general direction disscussion, like, is this okay for instcombine? :)

The test diffs look great, and it seems to at least pay for itself in compile-time impact, so I think it's a good direction.

There's always a question of "is this substantial enough (or would it grow enough with the 'TODO' items) to be a stand-alone pass?".

Yeah, i'm not sure. I don't really expect it to grow *that* much further after addressing TODO's.
This is a weird mix of GVN and CSE, with PHI awareness ontop. It doesn't really fit anywhere.
Also, i don't have any info on how making this fold a separate pass (and it's placement) would affect the results.

Haven't had a chance to look at the code itself yet.

Do we have tests where there are extra uses of the extracted values?

Sure we do :)

; This fold does not care whether or not intermediate instructions have extra uses.
define { i32, i32 } @test12({ i32, i32 } %srcagg) {

in llvm/test/Transforms/InstCombine/aggregate-reconstruction.ll.

lebedev.ri edited the summary of this revision. (Show Details)Aug 14 2020, 5:47 AM
spatel accepted this revision.Aug 16 2020, 10:58 AM

This seems similar in spirit to the recursive element tracking that we do in collectShuffleElements() in this file or foldIdentityShuffles() in InstSimplify, but a bit more complicated (because of the phi translation?).
I've pointed out a few minor potential improvements, otherwise LGTM.

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
731

IIUC, you're intentionally giving the placeholder objects the same names as the enum values, but that confused me. Would it make sense to call this "InvalidValue" or similar?

779

Similar comment about the name as above for NotFound (and might be better to declare it on the next line after above?).

781

Spelling - "SourceAggregate"

856

Instead of switch, could we reduce to:

if (Describe(SourceAggForElt) != SourceAggregate::Found)
  return SourceAggForElt;
908

Best to give that "64" a name in case we want to experiment with other settings.

This revision is now accepted and ready to land.Aug 16 2020, 10:58 AM

@spatel thank you for taking a look!

This seems similar in spirit to the recursive element tracking that we do in collectShuffleElements() in this file or foldIdentityShuffles() in InstSimplify, but a bit more complicated (because of the phi translation?).

I actually haven't seen those, but i would imagine that yes, having to deal with PHI's makes things slightly more complicated.

I've pointed out a few minor potential improvements, otherwise LGTM.

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
731

IIUC, you're intentionally giving the placeholder objects the same names as the enum values, but that confused me.

Yes. I first added enum, and then thought that it would be better to not
spell None; // SourceAggegate::NotFound / nullptr; // SourceAggegate::FoundMismatch
but literally just give them a name and spell them as close to the enum
values as possible, for obvious reasons of avoiding duplication/errors.

Would it make sense to call this "InvalidValue" or similar?

Do we want to change both the enum values and the variable names,
or just variables to be different from the enum values?

NotFound seems like a good fit to me, because with it we literally mean
that we can't tell what scalar value that element of an aggregate has.

InvalidValue doesn't really have any connotation to me,
but it does look kinda sorta similar to UndefValue, which i want to avoid.

So i'm not sure here.

856

Hm, i guess, thanks!
But we still need the inner switch.

908

Do you mean a cl::opt, or just a local static const variable?
I'll do the latter for now.

lebedev.ri marked 4 inline comments as done.

Patch updated - addressed all nits other than the question about decoupling enum values from variable names.

This revision was landed with ongoing or failed builds.Aug 16 2020, 1:28 PM
This revision was automatically updated to reflect the committed changes.