This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Apply loop guards when computing max BTC for arbitrary steps.
ClosedPublic

Authored by fhahn on Nov 10 2021, 9:33 AM.

Details

Summary

Similar to other cases in the current function (e.g. when the step is 1 or
-1), applying loop guards can lead to tighter upper bounds for the
backedge-taken counts.

Fixes PR52464.
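
For illustration only (a hypothetical example, not the test case added in this review), here is a guarded loop with a non-unit step where applying the loop guard can tighten the bound:

  // Hypothetical illustration, not the test case from this review.
  // The induction step is 2, so the generic "arbitrary step" path is used to
  // compute the constant max backedge-taken count.  The dominating guard
  // (n <= 8 on the path that reaches the loop) lets applyLoopGuards tighten
  // the bound to a small constant instead of one derived from the full range
  // of 'unsigned'.
  void write_even(unsigned n, int *a) {
    if (n > 8)
      return;                            // guard: n <= 8 below this point
    for (unsigned i = 0; i < n; i += 2)  // step of 2, not 1 or -1
      a[i] = 0;
  }

Without the guard, the constant max backedge-taken count has to be derived from the full range of the induction variable's type; with the guard applied, it can be a small constant.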

Diff Detail

Event Timeline

fhahn created this revision.Nov 10 2021, 9:33 AM
fhahn requested review of this revision.Nov 10 2021, 9:33 AM
Herald added a project: Restricted Project.Nov 10 2021, 9:33 AM
nikic added inline comments.Nov 10 2021, 9:46 AM
llvm/lib/Analysis/ScalarEvolution.cpp
9664

Same as in other cases, this assertion likely doesn't always hold.

mkazantsev added inline comments.Nov 10 2021, 9:58 PM
llvm/lib/Analysis/ScalarEvolution.cpp
9664

Isn't a breach of this a bug we want to fix?

I'd rather move the assert inside applyLoopGuards. If it breaks, that's a sign we should improve our reasoning. I can imagine it could be breaking because of issues with caching (e.g. if addrecs are involved), though. So, unless the failures are widespread and disruptive, I'd prefer it to stay and catch opportunities for us.

I'd rather move the assert inside applyLoopGuards. If it breaks, that's a sign we should improve our reasoning. I can imagine it could be breaking because of issues with caching (e.g. if addrecs are involved), though. So, unless the failures are widespread and disruptive, I'd prefer it to stay and catch opportunities for us.

We already know that the assertion is going to fail and also know why. See https://reviews.llvm.org/D102267#inline-967507 for two suggestions on how to avoid pessimization, or at least the known cases of it. This should either be addressed first, or the assert should be omitted.

fhahn updated this revision to Diff 386803.Nov 12 2021, 4:23 AM

Replace the over-aggressive assertion with selecting the minimum max BTC. Added a test case where the assert is violated in 69c1cbe20f5d.

fhahn marked 2 inline comments as done.Nov 12 2021, 4:29 AM
fhahn added inline comments.
llvm/lib/Analysis/ScalarEvolution.cpp
9664

Same as in other cases, this assertion likely doesn't always hold.

Yeah, turns out I was a bit too optimistic. I didn't hit the assertion when building a larger number of projects, but I managed to construct a crashing test from other existing problematic tests. I added the test in 69c1cbe20f5d and turned the assertion into a select of the minimum.
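
For illustration, a rough sketch of the "select the minimum" idea (assumed shape; tighterUnsignedMax is a hypothetical helper name, not code from the committed diff, though getUnsignedRangeMax and applyLoopGuards are existing ScalarEvolution APIs):

  #include "llvm/ADT/APInt.h"
  #include "llvm/Analysis/ScalarEvolution.h"
  using namespace llvm;

  // Hypothetical sketch: compute the unsigned max of the expression both with
  // and without loop-guard information and keep the tighter (smaller) bound,
  // instead of asserting that the guarded bound is never worse.
  static APInt tighterUnsignedMax(ScalarEvolution &SE, const Loop *L,
                                  const SCEV *Expr) {
    APInt Plain = SE.getUnsignedRangeMax(Expr);
    APInt Guarded = SE.getUnsignedRangeMax(SE.applyLoopGuards(Expr, L));
    return APIntOps::umin(Plain, Guarded);
  }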

I am also looking into weeding out the cases where applying the guard info pessimizes results. Unfortunately there are some cases where it pessimizes the range even if we only allow compares with constants. I'm still digging into the details, but it might be a case where the original range is not correct. I'm planning on sharing more on that soon.

nikic accepted this revision.Nov 12 2021, 9:03 AM

LG

llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info-rewrite-expressions.ll
68

The zext is not in the condition here, so it doesn't really seem relevant?

This revision is now accepted and ready to land.Nov 12 2021, 9:03 AM
reames accepted this revision.Nov 12 2021, 9:11 AM

LGTM as well. Amusingly, I'd stumbled into a variant of this as well just yesterday. :)

fhahn updated this revision to Diff 387552.Nov 16 2021, 3:17 AM

rebase after landing D113577

This revision was landed with ongoing or failed builds.Nov 17 2021, 3:01 AM
This revision was automatically updated to reflect the committed changes.
fhahn marked an inline comment as done.Nov 17 2021, 4:32 AM
fhahn added inline comments.
llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info-rewrite-expressions.ll
68

Thanks, I adjusted the name to @rewrite_zext_with_info_from_icmp_ne to hopefully make it a bit clearer in the committed version.

alexfh added a subscriber: alexfh.Dec 10 2021, 9:29 AM

Hi Florian,

this revision is causing compiler crashes on a number of translation units in our code. This _may_ be due to an increase in stack depth, but I'm not sure yet. Is a significantly increased stack depth expected after this patch? Is it justifiable?

The failures look more or less like this:

1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module '...'.
4.	Running pass 'Loop Pass Manager' on function '@...'
5.	Running pass 'Induction Variable Users' on basic block '%107409'
  #0 0x0000562126256c38 llvm::sys::RunSignalHandlers() (clang+0x6c56c38)
  #1 0x000056212625926c SignalHandler(int) (clang+0x6c5926c)
  #2 0x00007faea56f9750 __restore_rt (libpthread.so.0+0x15750)
  #3 0x0000562125ec3b1b computeKnownBitsFromAssume(llvm::Value const*, llvm::KnownBits&, unsigned int, (anonymous namespace)::Query const&) (clang+0x68c3b1b)
  #4 0x0000562125eae55d computeKnownBits(llvm::Value const*, llvm::KnownBits&, unsigned int, (anonymous namespace)::Query const&) (clang+0x68ae55d)
  #5 0x0000562125ec2de5 computeKnownBitsFromOperator(llvm::Operator const*, llvm::APInt const&, llvm::KnownBits&, unsigned int, (anonymous namespace)::Query const&) (clang+0x68c2de5)
  #6 0x0000562125eaeb6e computeKnownBits(llvm::Value const*, llvm::APInt const&, llvm::KnownBits&, unsigned int, (anonymous namespace)::Query const&) (clang+0x68aeb6e)
  #7 0x0000562125eae55d computeKnownBits(llvm::Value const*, llvm::KnownBits&, unsigned int, (anonymous namespace)::Query const&) (clang+0x68ae55d)
  #8 0x0000562125eaecd9 llvm::computeKnownBits(llvm::Value const*, llvm::DataLayout const&, unsigned int, llvm::AssumptionCache*, llvm::Instruction const*, llvm::DominatorTree const*, llvm::OptimizationRemarkEmitter*, bool) (clang+0x68aecd9)
  #9 0x0000562125e47339 llvm::ScalarEvolution::GetMinTrailingZerosImpl(llvm::SCEV const*) (clang+0x6847339)
 #10 0x0000562125e3143a llvm::ScalarEvolution::GetMinTrailingZeros(llvm::SCEV const*) (clang+0x683143a)
 #11 0x0000562125e482ea llvm::ScalarEvolution::getRangeRef(llvm::SCEV const*, llvm::ScalarEvolution::RangeSignHint) (clang+0x68482ea)
 #12 0x0000562125e3ba27 StrengthenNoWrapFlags(llvm::ScalarEvolution*, llvm::SCEVTypes, llvm::ArrayRef<llvm::SCEV const*>, llvm::SCEV::NoWrapFlags) (clang+0x683ba27)
 #13 0x0000562125e2facf llvm::ScalarEvolution::getAddExpr(llvm::SmallVectorImpl<llvm::SCEV const*>&, llvm::SCEV::NoWrapFlags, unsigned int) (clang+0x682facf)
 #14 0x0000562125e3c19d llvm::ScalarEvolution::getGEPExpr(llvm::GEPOperator*, llvm::SmallVectorImpl<llvm::SCEV const*> const&) (clang+0x683c19d)
 #15 0x0000562125e46e7e llvm::ScalarEvolution::createNodeForGEP(llvm::GEPOperator*) (clang+0x6846e7e)
 #16 0x0000562125e3fe3d llvm::ScalarEvolution::createSCEV(llvm::Value*) (clang+0x683fe3d)
 #17 0x0000562125e38c60 llvm::ScalarEvolution::getSCEV(llvm::Value*) (clang+0x6838c60)
 #18 0x0000562125e452f4 llvm::ScalarEvolution::createAddRecFromPHI(llvm::PHINode*) (clang+0x68452f4)
 #19 0x0000562125e46938 llvm::ScalarEvolution::createNodeForPHI(llvm::PHINode*) (clang+0x6846938)
 #20 0x0000562125e3ff14 llvm::ScalarEvolution::createSCEV(llvm::Value*) (clang+0x683ff14)
 #21 0x0000562125e38c60 llvm::ScalarEvolution::getSCEV(llvm::Value*) (clang+0x6838c60)
 #22 0x0000562125e3bd47 llvm::ScalarEvolution::getGEPExpr(llvm::GEPOperator*, llvm::SmallVectorImpl<llvm::SCEV const*> const&) (clang+0x683bd47)
 #23 0x0000562125e46e7e llvm::ScalarEvolution::createNodeForGEP(llvm::GEPOperator*) (clang+0x6846e7e)
 #24 0x0000562125e3fe3d llvm::ScalarEvolution::createSCEV(llvm::Value*) (clang+0x683fe3d)
 #25 0x0000562125e38c60 llvm::ScalarEvolution::getSCEV(llvm::Value*) (clang+0x6838c60)
 #26 0x0000562125e4dd1d llvm::ScalarEvolution::applyLoopGuards(llvm::SCEV const*, llvm::Loop const*) (clang+0x684dd1d)
...

I'm now working on a self-contained test case, but if you have any ideas for fixes, I can verify them on our real code.

Regards,
Alex

I've created a standalone test case for the issue:

Relevant compiler invocations:

$ ./clang-before -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -fcolor-diagnostics -xc++ -std=c++17 -w -fsized-deallocation -O2 q2.cc

$ ./clang-after -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -fcolor-diagnostics -xc++ -std=c++17 -w -fsized-deallocation -O2 q2.cc
Stack dump:
0.      Program arguments: ./clang-after -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -fcolor-diagnostics -xc++ -std=c++17 -w -fsized-deallocation -O2 q2.cc
1.      <eof> parser at end of file
2.      Code generation
3.      Running pass 'Function Pass Manager' on module 'q2.cc'.
4.      Running pass 'Loop Pass Manager' on function '@_Z1fv'
5.      Running pass 'Induction Variable Users' on basic block '%arraydestroy.body85833'
...

I observe the problem on Linux on x86-64.

Please investigate the failure. If there's no obvious fix, please revert the patch while you're working on a solution.

A bit cleaner test case:

$ ./clang-before -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -O2 q2.cc
$ ./clang-after -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -O2 q2.cc
Stack dump:
0.      Program arguments: ./clang-after -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -O2 q2.cc
1.      <eof> parser at end of file
2.      Code generation
3.      Running pass 'Function Pass Manager' on module 'q2.cc'.
4.      Running pass 'Loop Pass Manager' on function '@_Z1fv'
5.      Running pass 'Induction Variable Users' on basic block '%arraydestroy.body86029'
...

Alternatively, the corresponding IR file:

$ ./clang-after -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -O2 q2.ll
Stack dump:
0.      Program arguments: ./clang-after -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -O2 q2.ll
1.      Code generation
2.      Running pass 'Function Pass Manager' on module 'q2.ll'.
3.      Running pass 'Loop Pass Manager' on function '@_Z1fv'
4.      Running pass 'Induction Variable Users' on basic block '%arraydestroy.body86029'
...
fhahn marked an inline comment as done.Dec 14 2021, 3:51 AM

A bit cleaner test case:

$ ./clang-before -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -O2 q2.cc
$ ./clang-after -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -O2 q2.cc
Stack dump:
0.      Program arguments: ./clang-after -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -O2 q2.cc
1.      <eof> parser at end of file
2.      Code generation
3.      Running pass 'Function Pass Manager' on module 'q2.cc'.
4.      Running pass 'Loop Pass Manager' on function '@_Z1fv'
5.      Running pass 'Induction Variable Users' on basic block '%arraydestroy.body86029'
...

Alternatively, the corresponding IR file:

$ ./clang-after -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -O2 q2.ll
Stack dump:
0.      Program arguments: ./clang-after -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -O2 q2.ll
1.      Code generation
2.      Running pass 'Function Pass Manager' on module 'q2.ll'.
3.      Running pass 'Loop Pass Manager' on function '@_Z1fv'
4.      Running pass 'Induction Variable Users' on basic block '%arraydestroy.body86029'
...

Thanks for the heads-up, I'll take a look now. But from the stack trace you shared, I doubt that the patch itself is causing the crash; rather, it exposes an existing bug in GetMinTrailingZeros/computeKnownBits.

Thanks for the heads-up, I'll take a look now. But from the stack trace you shared, I doubt that the patch itself is causing the crash; rather, it exposes an existing bug in GetMinTrailingZeros/computeKnownBits.

Right, I've seen this sort of stuff a few times recently. I guess in a sufficiently complex system this is not a rare occurrence. Unfortunately, the burden of mitigating the underlying issue in practice frequently lies with the author of the patch exposing the issue (unless someone else volunteers to do this).

Do you see an obvious fix for the bug you mentioned or should we revert this patch while you're investigating?

Unfortunately, the burden of mitigating the underlying issue in practice frequently lies with the author of the patch exposing the issue (unless someone else volunteers to do this).

This isn't quite as clear-cut as you make it out to be here. Yes, we will frequently fix issues exposed in the process of landing an unrelated change. However, the standard is not (and has never been) to fix any reported issue that happens to be exposed. We will certainly fix upstream tests, common workloads, and quickly reported issues, but at some point the responsibility shifts to the downstream maintainer. That's one of the reasons rapid reporting is so strongly encouraged.

To be clear, I'm not commenting on which case this situation might be. I'm just making the general point that this is a lot more complicated than your wording implies.

Unfortunately, the burden of mitigating the underlying issue in practice frequently lies with the author of the patch exposing the issue (unless someone else volunteers to do this).

This isn't quite as clear-cut as you make it out to be here. Yes, we will frequently fix issues exposed in the process of landing an unrelated change. However, the standard is not (and has never been) to fix any reported issue that happens to be exposed. We will certainly fix upstream tests, common workloads, and quickly reported issues, but at some point the responsibility shifts to the downstream maintainer. That's one of the reasons rapid reporting is so strongly encouraged.

To be clear, I'm not commenting on which case this situation might be. I'm just making the general point that this is a lot more complicated than your wording implies.

Thanks for the comment. Note that I used "frequently", not "always". I'm not trying to impose any rules, just stating my observation.

As for rapid vs delayed reporting, IMO, the main difference is whether the patch got dependencies (in LLVM code, in downstream LLVM dependencies, in code compiled by LLVM). Does it sound right to you?

As for rapid vs delayed reporting, IMO, the main difference is whether the patch got dependencies (in LLVM code, in downstream LLVM dependencies, in code compiled by LLVM). Does it sound right to you?

Can you rephrase this? I'm not sure what you mean by "got dependencies", and am not sure what you're trying to ask here.

fhahn added a comment.Dec 15 2021, 6:36 AM

I took a closer look and confirmed that the patch indeed slightly increases the call-stack depth for the reproducer, and this causes the stack overflow.

The issue is roughly the following: we apply the loop guards to the last loop in a large chain of loops. The loop guard itself is the exit condition of an earlier loop. To construct the SCEV for that condition, we compute the exit count along one of the code paths. This in turn applies the loop guards of an earlier predecessor loop. To construct that SCEV, we again need to compute the exit count of that loop, applying its loop guards, and so on.

This can be fixed by adjusting the order in which we apply loop guards. At the moment, the guard closest to the starting loop is evaluated first, triggering the large evaluation chain. If we instead apply the earliest guard first, no excessive call chain is needed. This seems to have a tiny positive compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=529833377ccdf4381f8bc9961bfa96ec4f5e2eed&to=b07452a70ac22ae4b1f0dbf4b84df3ee44c171a1&stat=instructions

It is not quite NFC, because the order in which guards are applied can impact the result in some cases. I looked into some of them, and so far it looks like the new order makes applying loop guards slightly more effective. I am planning on isolating a test that shows the improved analysis results and committing the fix soonish.
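
A minimal sketch of the kind of loop chain that triggers this recursion (a hypothetical illustration in source form, not the reported reproducer):

  // Hypothetical illustration of the recursion pattern, not the reported
  // reproducer.  Each loop's bound is produced by the previous loop, so the
  // dominating guard of loop N is the exit condition of loop N-1.  Applying
  // the guard closest to the last loop first means building its SCEV, which
  // needs the exit count of loop N-1, whose guards in turn pull in loop N-2,
  // and so on down the chain -- imagine a few thousand of these loops.
  unsigned chain(unsigned n, unsigned *a) {
    unsigned i = 0;
    for (; i < n; ++i)
      a[i] = 0;         // loop 1: exits with i == n
    unsigned j = 0;
    for (; j < i; ++j)
      a[j] += 1;        // loop 2: bounded by loop 1's final value of i
    unsigned k = 0;
    for (; k < j; ++k)
      a[k] += 2;        // loop 3: bounded by loop 2's final value of j
    return k;
  }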

I took a closer look and confirmed that the patch indeed slightly increases the call-stack depth for the reproducer, and this causes the stack overflow.

The issue is roughly the following: we apply the loop guards to the last loop in a large chain of loops. The loop guard itself is the exit condition of an earlier loop. To construct the SCEV for that condition, we compute the exit count along one of the code paths. This in turn applies the loop guards of an earlier predecessor loop. To construct that SCEV, we again need to compute the exit count of that loop, applying its loop guards, and so on.

This can be fixed by adjusting the order in which we apply loop guards. At the moment, the guard closest to the starting loop is evaluated first, triggering the large evaluation chain. If we instead apply the earliest guard first, no excessive call chain is needed. This seems to have a tiny positive compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=529833377ccdf4381f8bc9961bfa96ec4f5e2eed&to=b07452a70ac22ae4b1f0dbf4b84df3ee44c171a1&stat=instructions

It is not quite NFC, because the order in which guards are applied can impact the result in some cases. I looked into some of them, and so far it looks like the new order makes applying loop guards slightly more effective. I am planning on isolating a test that shows the improved analysis results and committing the fix soonish.

Thanks! I guess I can come up with a couple of distinct test cases that have the same effect (a compiler stack overflow). Would you like to have a look at them as well?

fhahn added a comment.Dec 15 2021, 8:23 AM

I took a closer look and confirmed that the patch indeed slightly increases the call-stack depth for the reproducer, and this causes the stack overflow.

The issue is roughly the following: we apply the loop guards to the last loop in a large chain of loops. The loop guard itself is the exit condition of an earlier loop. To construct the SCEV for that condition, we compute the exit count along one of the code paths. This in turn applies the loop guards of an earlier predecessor loop. To construct that SCEV, we again need to compute the exit count of that loop, applying its loop guards, and so on.

This can be fixed by adjusting the order in which we apply loop guards. At the moment, the guard closest to the starting loop is evaluated first, triggering the large evaluation chain. If we instead apply the earliest guard first, no excessive call chain is needed. This seems to have a tiny positive compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=529833377ccdf4381f8bc9961bfa96ec4f5e2eed&to=b07452a70ac22ae4b1f0dbf4b84df3ee44c171a1&stat=instructions

It is not quite NFC, because the order in which guards are applied can impact the result in some cases. I looked into some of them, and so far it looks like the new order makes applying loop guards slightly more effective. I am planning on isolating a test that shows the improved analysis results and committing the fix soonish.

Thanks! I guess I can come up with a couple of distinct test cases that have the same effect (a compiler stack overflow). Would you like to have a look at them as well?

The fix is available here: https://github.com/llvm/llvm-project/commit/b07452a70ac22ae4b1f0dbf4b84df3ee44c171a1, in case you want to give it a try against the other test cases.

(general discussion, not necessarily related to this patch)

As for rapid vs delayed reporting, IMO, the main difference is whether the patch got dependencies (in LLVM code, in downstream LLVM dependencies, in code compiled by LLVM). Does it sound right to you?

Can you rephrase this? I'm not sure what you mean by "got dependencies", and am not sure what you're trying to ask here.

You mentioned quickly reported issues ("We will certainly fix upstream tests, common workloads, and quickly reported issues, but at some point, the responsibility shifts to the downstream maintainer. That's one of the reasons rapid reporting is so strongly encouraged."), and I wanted to gauge my understanding of this. I see multiple reasons why quickly reported issues may be handled differently:

  1. as time passes, the author of the patch may have switched context, and it would take more effort to fix the issue;
  2. the patch that caused the issue may be more difficult to revert cleanly, or may even have changed significantly, when
    1. more patches have landed touching the same lines of code,
    2. other project code has started depending on the new interfaces or behavior introduced by the patch, or
    3. something outside the project has started depending on the new APIs, features, or behavior introduced by the patch;
  3. the quicker the issue gets reported, the higher the chances that it's a widespread issue with a higher impact.

Do these sound about right? Am I missing something else? What's your view on the relative importance of the factors above?

I took a closer look and confirmed that the patch indeed slightly increases the call-stack depth for the reproducer, and this causes the stack overflow.

The issue is roughly the following: we apply the loop guards to the last loop in a large chain of loops. The loop guard itself is the exit condition of an earlier loop. To construct the SCEV for that condition, we compute the exit count along one of the code paths. This in turn applies the loop guards of an earlier predecessor loop. To construct that SCEV, we again need to compute the exit count of that loop, applying its loop guards, and so on.

This can be fixed by adjusting the order in which we apply loop guards. At the moment, the guard closest to the starting loop is evaluated first, triggering the large evaluation chain. If we instead apply the earliest guard first, no excessive call chain is needed. This seems to have a tiny positive compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=529833377ccdf4381f8bc9961bfa96ec4f5e2eed&to=b07452a70ac22ae4b1f0dbf4b84df3ee44c171a1&stat=instructions

It is not quite NFC, because the order in which guards are applied can impact the result in some cases. I looked into some of them, and so far it looks like the new order makes applying loop guards slightly more effective. I am planning on isolating a test that shows the improved analysis results and committing the fix soonish.

Thanks! I guess I can come up with a couple of distinct test cases that have the same effect (a compiler stack overflow). Would you like to have a look at them as well?

The fix is available here: https://github.com/llvm/llvm-project/commit/b07452a70ac22ae4b1f0dbf4b84df3ee44c171a1, in case you want to give it a try against the other test cases.

The patch fixes the crash, but it looks like the compilation time with the patch grows by a factor of ~7 for one of the problematic files. I'll try to make a cleaner experiment with two compilers built with exactly the same compiler options, but I doubt it will change much.

(general discussion, not necessarily related to this patch)

(snip)
Do these sound about right? Am I missing something else? What's your view on the relative importance of the factors above?

You hit a bunch of important points. Here are a couple of extra ones:

  • We want incentives to encourage following upstream development. We don't want large amounts of testing happening far from ToT. We want to encourage testing as close to ToT as practical because that has the broadest community benefit.
  • Some of our contributors are volunteers. If they spent all of their time fixing issues reported with long delays, they might choose not to contribute. (This is not abstract - there are areas of LLVM I do not spend my volunteer time on because the overhead of working in those areas is too high.)
  • Some of our contributors are employees. Generally, in most organizations, getting funding for open-ended maintenance is hard. Having bugs found quickly, so that fixing them falls within the scope of the current project, makes them much more likely to be fixed at all.

There are many, many interacting reasons here. None of them is "the reason"; all contribute in some way.

I took a closer look and confirmed that the patch indeed slightly increases the call-stack depth for the reproducer, and this causes the stack overflow.

The issue is roughly the following: we apply the loop guards to the last loop in a large chain of loops. The loop guard itself is the exit condition of an earlier loop. To construct the SCEV for that condition, we compute the exit count along one of the code paths. This in turn applies the loop guards of an earlier predecessor loop. To construct that SCEV, we again need to compute the exit count of that loop, applying its loop guards, and so on.

This can be fixed by adjusting the order in which we apply loop guards. At the moment, the guard closest to the starting loop is evaluated first, triggering the large evaluation chain. If we instead apply the earliest guard first, no excessive call chain is needed. This seems to have a tiny positive compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=529833377ccdf4381f8bc9961bfa96ec4f5e2eed&to=b07452a70ac22ae4b1f0dbf4b84df3ee44c171a1&stat=instructions

It is not quite NFC, because the order in which guards are applied can impact the result in some cases. I looked into some of them, and so far it looks like the new order makes applying loop guards slightly more effective. I am planning on isolating a test that shows the improved analysis results and committing the fix soonish.

Thanks! I guess I can come up with a couple of distinct test cases that have the same effect (a compiler stack overflow). Would you like to have a look at them as well?

The fix is available here: https://github.com/llvm/llvm-project/commit/b07452a70ac22ae4b1f0dbf4b84df3ee44c171a1, in case you want to give it a try against the other test cases.

The patch fixes the crash, but it looks like the compilation time with the patch grows by a factor of ~7 for one of the problematic files. I'll try to make a cleaner experiment with two compilers built with exactly the same compiler options, but I doubt it will change much.

Scratch that. It wasn't a valid experiment at all. Looking further.

Scratch that. It wasn't a valid experiment at all. Looking further.

Ok, please let me know if there's a regression. Otherwise I'll go ahead with the fix, as the compile-time-tracker shows this to be compile-time neutral (or a tiny bit positive).

Scratch that. It wasn't a valid experiment at all. Looking further.

Ok, please let me know if there's a regression. Otherwise I'll go ahead with the fix, as the compile-time-tracker shows this to be compile-time neutral (or a tiny bit positive).

https://reviews.llvm.org/rGf5f421e0eefa492545c3848e318c21ed04cb1ddd fixes all the problematic cases related to this patch. Compile time doesn't seem to be affected. Thanks!