Page MenuHomePhabricator

[SimplifyCFG] Merge similar fatal error blocks ending in unreachable
AcceptedPublic

Authored by rnk on Feb 1 2017, 2:53 PM.

Details

Summary

The goal is to canonicalize these examples to the same code:

void f(unsigned x) {
  if (x < 7) abort();
  if (x > 40) abort();
}
  • void g(unsigned x) { if (x < 7 || x > 40) abort(); }

Macros similar to 'assert' are very popular, and not all assertions are
compiled out from release builds. Assertion failures are usually very
cold, so we should try to optimize them for code size as much as
possible. Merging blocks ending in noreturn calls followed by
unreachable helps with that. It also enables follow-on optimizations
from instcombine, which can turn the above example into something like:

void f(unsigned x) {
  if (x - 7 > 33) abort();
}

These code patterns appear in MSVC's std::string destructor, which is
analyzed in PR31774, but they appear in LLVM Release+Asserts builds as
well.

This would be a net code size win on Release+Asserts clang if it weren't
for the inliner. By default, my clang is built with -O3, and this patch
increases the code size of that binary by 6% (70MB -> 75MB) and
regresses performance of compiling tramp3d-v4.cpp from test-suite by a
tiny amount. If I rebuild clang with -O1, this patch saves 111KB of code
from the 97MB of code in clang, which isn't much, but is nice. It
improves performance by 1.2% (42.50s -> 41.97s).

As a result, this transformation seems like a nice canonicalization, but
it's off by default for now. The -merge-noreturn-bbs flag enables it.
We can enable it by default when we address the issues in the inliner.

Event Timeline

rnk created this revision.Feb 1 2017, 2:53 PM
hans edited edge metadata.Feb 2 2017, 10:04 AM

Since this is off by default, it seems the changes to existing test files are unrelated?

Can you comment on how this relates to https://reviews.llvm.org/D29153 ?

lib/Transforms/Scalar/SimplifyCFGPass.cpp
185

nit: one "is" too many

191

No big deal, but I'd spell out auto for E1 and E2 too, to make the declarations look more conventional. It's the same number of characters anyway :-)

234

We haven't introduced any phis yet though, so this is just cleaning up to avoid left-over phis from previous passes that would get in the way of this transformation?

237

Is there a reason for doing I1++ here instead of in the for-loop?

255

nit: "which is" ... what? :-)

361

Since there's nothing that could have changed yet, maybe just return false here and declare Changed further down?

365

I found this comment a little confusing. Isn't it more like, for each bucket, attempt to merge each block in the bucket with the first, canonical, block?

369

It's not clear to me why this needs multiple iterations. If merging a block the first time doesn't succeed, why would it work after others have been merged?

rnk marked 3 inline comments as done.Feb 2 2017, 10:39 AM
In D29428#664902, @hans wrote:

Since this is off by default, it seems the changes to existing test files are unrelated?

Somewhat. The other test changes make them resilient to changing the default for -merge-noreturn-bbs. That seems like a useful improvement that I don't want to lose track of.

Can you comment on how this relates to https://reviews.llvm.org/D29153 ?

It's basically the same optimization, except more aggressive and on the IR-level, and I think we should do both. We duplicate a fair amount of CFG updating logic between MI and IR, and this is good because MI knows how to split the critical edges that this pass introduces with these phis. It also knows how to tail duplicate. See the X86 codegen test that shows that we generate good code for glibc's __assert_fail calls, but we still tail duplicate when the shared tail is too small.

lib/Transforms/Scalar/SimplifyCFGPass.cpp
234

We modify the CFG below, but we only update phis with uses. Phis without uses (dead phis) need to be removed so that the IR remains valid.

237

eraseFromParent would invalidate I1.

365

reworded

369

I can't construct a test case for this because I hash pointer values, but imagine that we have two pairs of identical noreturn blocks, and they all hash to the same bucket. We pick one, iterate the other three, merge the one that matches, and remove them from the bucket. Then we repeat with the remaining two and merge them together.

We could test this scenario by adding a command line flag that makes every block hash to '0'. Do you think it's worth it?

rnk updated this revision to Diff 86853.Feb 2 2017, 10:40 AM
  • style and comments
hans accepted this revision.Feb 2 2017, 11:59 AM
In D29428#664945, @rnk wrote:
In D29428#664902, @hans wrote:

Since this is off by default, it seems the changes to existing test files are unrelated?

Somewhat. The other test changes make them resilient to changing the default for -merge-noreturn-bbs. That seems like a useful improvement that I don't want to lose track of.

sgtm

lib/Transforms/Scalar/SimplifyCFGPass.cpp
369

Oh, I just didn't consider the hash collisions. This makes sense, and I don't think it's worth adding a flag to test it. Maybe it's possible to explain it in a comment though.

This revision is now accepted and ready to land.Feb 2 2017, 11:59 AM
iteratee edited edge metadata.May 2 2017, 4:45 PM

Did this ever get committed?

rnk added a comment.May 2 2017, 5:24 PM

To evaluate this patch, I compiled clang with assertions enabled on Linux. It fires often there, and in interesting ways because after inlining there are many different assertions from many different files and lines in the same function.

It looks like I didn't save or paste the results for it anywhere. I remember they were disappointingly negative, both in code size and performance, so I never came back to re-evaluate this. In isolation, this patch should reduce code size, but I suspect that because simplifycfg runs before inlining, it affects inline cost analysis and we over-inline.

It's hard to get excited about pursuing this further because this pass is a bit redundant with MI-level tail merging, which has a better cost model and doesn't impact inlining.