This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] Merge similar fatal error blocks ending in unreachable
AcceptedPublic

Authored by rnk on Feb 1 2017, 2:53 PM.

Details

Summary

The goal is to canonicalize these examples to the same code:

void f(unsigned x) {
  if (x < 7) abort();
  if (x > 40) abort();
}
  • void g(unsigned x) { if (x < 7 || x > 40) abort(); }

Macros similar to 'assert' are very popular, and not all assertions are
compiled out from release builds. Assertion failures are usually very
cold, so we should try to optimize them for code size as much as
possible. Merging blocks ending in noreturn calls followed by
unreachable helps with that. It also enables follow-on optimizations
from instcombine, which can turn the above example into something like:

void f(unsigned x) {
  if (x - 7 > 33) abort();
}

These code patterns appear in MSVC's std::string destructor, which is
analyzed in PR31774, but they appear in LLVM Release+Asserts builds as
well.

This would be a net code size win on Release+Asserts clang if it weren't
for the inliner. By default, my clang is built with -O3, and this patch
increases the code size of that binary by 6% (70MB -> 75MB) and
regresses performance of compiling tramp3d-v4.cpp from test-suite by a
tiny amount. If I rebuild clang with -O1, this patch saves 111KB of code
from the 97MB of code in clang, which isn't much, but is nice. It
improves performance by 1.2% (42.50s -> 41.97s).

As a result, this transformation seems like a nice canonicalization, but
it's off by default for now. The -merge-noreturn-bbs flag enables it.
We can enable it by default when we address the issues in the inliner.

Diff Detail

Event Timeline

rnk created this revision.Feb 1 2017, 2:53 PM
hans edited edge metadata.Feb 2 2017, 10:04 AM

Since this is off by default, it seems the changes to existing test files are unrelated?

Can you comment on how this relates to https://reviews.llvm.org/D29153 ?

lib/Transforms/Scalar/SimplifyCFGPass.cpp
185 ↗(On Diff #86732)

nit: one "is" too many

191 ↗(On Diff #86732)

No big deal, but I'd spell out auto for E1 and E2 too, to make the declarations look more conventional. It's the same number of characters anyway :-)

234 ↗(On Diff #86732)

We haven't introduced any phis yet though, so this is just cleaning up to avoid left-over phis from previous passes that would get in the way of this transformation?

237 ↗(On Diff #86732)

Is there a reason for doing I1++ here instead of in the for-loop?

255 ↗(On Diff #86732)

nit: "which is" ... what? :-)

361 ↗(On Diff #86732)

Since there's nothing that could have changed yet, maybe just return false here and declare Changed further down?

365 ↗(On Diff #86732)

I found this comment a little confusing. Isn't it more like, for each bucket, attempt to merge each block in the bucket with the first, canonical, block?

369 ↗(On Diff #86732)

It's not clear to me why this needs multiple iterations. If merging a block the first time doesn't succeed, why would it work after others have been merged?

rnk marked 3 inline comments as done.Feb 2 2017, 10:39 AM
In D29428#664902, @hans wrote:

Since this is off by default, it seems the changes to existing test files are unrelated?

Somewhat. The other test changes make them resilient to changing the default for -merge-noreturn-bbs. That seems like a useful improvement that I don't want to lose track of.

Can you comment on how this relates to https://reviews.llvm.org/D29153 ?

It's basically the same optimization, except more aggressive and on the IR-level, and I think we should do both. We duplicate a fair amount of CFG updating logic between MI and IR, and this is good because MI knows how to split the critical edges that this pass introduces with these phis. It also knows how to tail duplicate. See the X86 codegen test that shows that we generate good code for glibc's __assert_fail calls, but we still tail duplicate when the shared tail is too small.

lib/Transforms/Scalar/SimplifyCFGPass.cpp
234 ↗(On Diff #86732)

We modify the CFG below, but we only update phis with uses. Phis without uses (dead phis) need to be removed so that the IR remains valid.

237 ↗(On Diff #86732)

eraseFromParent would invalidate I1.

365 ↗(On Diff #86732)

reworded

369 ↗(On Diff #86732)

I can't construct a test case for this because I hash pointer values, but imagine that we have two pairs of identical noreturn blocks, and they all hash to the same bucket. We pick one, iterate the other three, merge the one that matches, and remove them from the bucket. Then we repeat with the remaining two and merge them together.

We could test this scenario by adding a command line flag that makes every block hash to '0'. Do you think it's worth it?

rnk updated this revision to Diff 86853.Feb 2 2017, 10:40 AM
  • style and comments
hans accepted this revision.Feb 2 2017, 11:59 AM
In D29428#664945, @rnk wrote:
In D29428#664902, @hans wrote:

Since this is off by default, it seems the changes to existing test files are unrelated?

Somewhat. The other test changes make them resilient to changing the default for -merge-noreturn-bbs. That seems like a useful improvement that I don't want to lose track of.

sgtm

lib/Transforms/Scalar/SimplifyCFGPass.cpp
369 ↗(On Diff #86732)

Oh, I just didn't consider the hash collisions. This makes sense, and I don't think it's worth adding a flag to test it. Maybe it's possible to explain it in a comment though.

This revision is now accepted and ready to land.Feb 2 2017, 11:59 AM
iteratee edited edge metadata.May 2 2017, 4:45 PM

Did this ever get committed?

rnk added a comment.May 2 2017, 5:24 PM

To evaluate this patch, I compiled clang with assertions enabled on Linux. It fires often there, and in interesting ways because after inlining there are many different assertions from many different files and lines in the same function.

It looks like I didn't save or paste the results for it anywhere. I remember they were disappointingly negative, both in code size and performance, so I never came back to re-evaluate this. In isolation, this patch should reduce code size, but I suspect that because simplifycfg runs before inlining, it affects inline cost analysis and we over-inline.

It's hard to get excited about pursuing this further because this pass is a bit redundant with MI-level tail merging, which has a better cost model and doesn't impact inlining.

rnk updated this revision to Diff 240009.Jan 23 2020, 2:18 PM
  • three year later update, I forgot we still don't do this
Herald added a project: Restricted Project. · View Herald TranscriptJan 23 2020, 2:18 PM
Herald added a subscriber: hiraditya. · View Herald Transcript

Unit tests: fail. 62149 tests passed, 5 failed and 811 were skipped.

failed: libc++.std/language_support/cmp/cmp_partialord/partialord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongeq/cmp.strongeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongord/strongord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakeq/cmp.weakeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakord/weakord.pass.cpp

clang-tidy: fail. clang-tidy found 0 errors and 3 warnings. 0 of them are added as review comments below (why?).

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

rnk added a comment.Jan 23 2020, 4:00 PM

Some new numbers:

$ du -b build_a/bin/clang-11  build_b/bin/clang-11                                                                                                                                                                         
136375664       build_a/bin/clang-11                                                                         
141186840       build_b/bin/clang-11 

So, enabling this patch in build_b increases the size of clang with asserts enabled by 4.5MB. Odd.

sanjoy resigned from this revision.Jan 29 2022, 5:27 PM