This is an archive of the discontinued LLVM Phabricator instance.

[CaptureTracking] Avoid overly restrictive dominates check
AbandonedPublic

Authored by anna on Nov 3 2020, 8:26 AM.

Details

Summary

CapturesBefore tracker has an overly restrictive dominates check when
the BeforeHere and the capture point are in different basic blocks.
All we need to check is that there is no path from the capture point
to BeforeHere (which is less stricter than the dominates check).
See added testcase in one of the users of CapturesBefore.

Diff Detail

Event Timeline

anna created this revision.Nov 3 2020, 8:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 3 2020, 8:26 AM
anna requested review of this revision.Nov 3 2020, 8:26 AM
jdoerfert accepted this revision.Nov 3 2020, 9:01 AM

LGTM. Attributor does the same with an "optimistic" version of reachability via AAReachability. I guess that is why no test there changed.

This revision is now accepted and ready to land.Nov 3 2020, 9:01 AM
anna added a comment.Nov 3 2020, 10:55 AM

LGTM. Attributor does the same with an "optimistic" version of reachability via AAReachability. I guess that is why no test there changed.

Thanks Johannes. We see these limitations in clients that use this analysis downstream and I was surprised we didn't have a TODO testcase upstream either (i.e. no test cases failing).

This revision was landed with ongoing or failed builds.Nov 5 2020, 8:39 AM
This revision was automatically updated to reflect the committed changes.
nikic reopened this revision.Nov 5 2020, 9:29 AM
This revision is now accepted and ready to land.Nov 5 2020, 9:29 AM
anna added a comment.Nov 5 2020, 9:29 AM

thanks for the headsup. I've reverted the change and checking the testcase offline.

nikic requested changes to this revision.Nov 5 2020, 9:31 AM
nikic added a subscriber: nikic.

This change causes a large compile-time regression on some benchmarks: https://llvm-compile-time-tracker.com/compare.php?from=79d16764dd29aeddb7e6400e6b2d89d31653886c&to=15694fd6ad955c6a16b446a6324364111a49ae8b&stat=instructions

tramp3d-v4 is particularly badly affected, with a 25% instruction count regression in the ThinLTO configuration.

This revision now requires changes to proceed.Nov 5 2020, 9:31 AM
anna added a comment.Nov 5 2020, 9:56 AM

This change causes a large compile-time regression on some benchmarks: https://llvm-compile-time-tracker.com/compare.php?from=79d16764dd29aeddb7e6400e6b2d89d31653886c&to=15694fd6ad955c6a16b446a6324364111a49ae8b&stat=instructions

tramp3d-v4 is particularly badly affected, with a 25% instruction count regression in the ThinLTO configuration.

Thanks for pointing that out. I was under the impression the path check is more expensive than the dominates check (and the dominates was done to prevent compile time increase) and our bailouts are also higher (32 basic blocks for the path check).
It looks like this limitation of checking for dominates exists to avoid compile time regressions of this sort.

nikic added a comment.Nov 5 2020, 11:50 AM

This change causes a large compile-time regression on some benchmarks: https://llvm-compile-time-tracker.com/compare.php?from=79d16764dd29aeddb7e6400e6b2d89d31653886c&to=15694fd6ad955c6a16b446a6324364111a49ae8b&stat=instructions

tramp3d-v4 is particularly badly affected, with a 25% instruction count regression in the ThinLTO configuration.

Thanks for pointing that out. I was under the impression the path check is more expensive than the dominates check (and the dominates was done to prevent compile time increase) and our bailouts are also higher (32 basic blocks for the path check).
It looks like this limitation of checking for dominates exists to avoid compile time regressions of this sort.

Right. It may be possible to salvage this though. One very simple thing to try would be to cache the reachability query. This should make it much cheaper if there are many uses in the same block.

One thing I thought about: Do we have a test where the "nocapture use" has a second operand that captures?

nikic added a comment.Nov 8 2020, 11:57 AM

I've applied a couple of fixes to CaptureTracking, the main ones are: https://github.com/llvm/llvm-project/commit/d35366bccae0016418660337ce94e3d7d0ff391e is a potential correctness fix, https://github.com/llvm/llvm-project/commit/f63ab188c63be12871da75bfc5801a7fc752769b is a compile-time improvement.

I also gave caching the reachability query a try: https://llvm-compile-time-tracker.com/compare.php?from=5225c102649b5469d61385e598b744ac8f3dd1da&to=6d1d73dec948a0cbbf0dd6261d5664c4bfce92b1&stat=instructions The result is better, but we still have a 6% regression on tramp3d-v4 with ThinLTO (down from 25%).

One remaining problem in the current CaptureTracking implementation is that the use limit is not fully enforced: It's currently a limit on the the maximum number of direct uses, not transitive uses. If the value has 20 GEP uses, each of which has 20 uses itself, we'll happily explore all 400 of them.

anna added a comment.Nov 10 2020, 10:09 AM

Hi @nikic, some comments below.

I've applied a couple of fixes to CaptureTracking, the main ones are: https://github.com/llvm/llvm-project/commit/d35366bccae0016418660337ce94e3d7d0ff391e is a potential correctness fix, https://github.com/llvm/llvm-project/commit/f63ab188c63be12871da75bfc5801a7fc752769b is a compile-time improvement.

Ah, I think your correctness fix is what Johannes was implying by the "multiple args captured" in a single call.

I also gave caching the reachability query a try: https://llvm-compile-time-tracker.com/compare.php?from=5225c102649b5469d61385e598b744ac8f3dd1da&to=6d1d73dec948a0cbbf0dd6261d5664c4bfce92b1&stat=instructions The result is better, but we still have a 6% regression on tramp3d-v4 with ThinLTO (down from 25%).

Wow. this is great. Thank you for working on this! I'm curious if you have already landed the patch or have it for review? Perhaps further improvement will help reduce the compile time regression here.

One remaining problem in the current CaptureTracking implementation is that the use limit is not fully enforced: It's currently a limit on the the maximum number of direct uses, not transitive uses. If the value has 20 GEP uses, each of which has 20 uses itself, we'll happily explore all 400 of them.

Yeah, you're completely right. We should add a TODO/FIXME at the point we're capturing the direct uses for the worklist (there's a chance this triggers runtime degradations by fixing the "transitive uses" bug).
We also have a comment stating that this max uses restriction can be completely lifted if we cache the capture tracking analysis. The potential downside is that keeping track/adding incremental updates can be a source of bugs.

anna abandoned this revision.Sep 13 2021, 6:53 AM

This has since landed (see above comments from @nikic).