DeadStoreElimination currently calls getModRefInfo() for each combination of
exit block and alloca (and similarly local allocs). Each one of these calls
is checking whether the given memory location is a non-escaping local. In
the case of one function I have, this is ~5,000 exit blocks * ~13,000 allocas =
~65 million calls to getModRefInfo(). As a result it spends ~57s on
DeadStoreElimination. Unfortunately, DeadStoreElimination finds that it's not
able to make any changes at all so getModRefInfo() is returning the same result
5,000 times for each allocation. While none of the calls to getModRefInfo() are
redundant, 99.98% of the checks that the argument is a non-escaping local inside
it (which involves an expensive call to PointerMayBeCaptured()) are redundant
since this portion of getModRefInfo() depends on a property of the Value being
queried and not the particular call site.
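
The arithmetic above can be reproduced with a small standalone sketch. The
struct and function names here are hypothetical and are not part of LLVM; the
only inputs are the approximate counts quoted above, and the assumption is
that one capture check per alloca would have been enough.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of LLVM. It models how many capture checks
// are repeated when a per-value property is re-derived for every
// (exit block, alloca) pair.
struct CaptureCheckCounts {
  std::uint64_t Total;     // one check per (exit block, alloca) query
  std::uint64_t Necessary; // one check per alloca would suffice
  double RedundantPercent; // share of checks that repeat earlier work
};

inline CaptureCheckCounts countCaptureChecks(std::uint64_t ExitBlocks,
                                             std::uint64_t Allocas) {
  CaptureCheckCounts C;
  C.Total = ExitBlocks * Allocas;
  C.Necessary = C.Total == 0 ? 0 : Allocas;
  C.RedundantPercent =
      C.Total == 0 ? 0.0 : 100.0 * (C.Total - C.Necessary) / C.Total;
  return C;
}
```

With 5,000 exit blocks and 13,000 allocas this gives 65,000,000 checks, of
which only 13,000 are needed, which is where the 99.98% figure comes from.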
DeadStoreElimination knows that all of its queries are about locals
(or equivalent to a local, such as a non-escaping heap alloc). It doesn't cause
non-escaping locals to escape since it's only removing dead stores, and it
knows when a change may cause an escaping local to stop escaping. Therefore
it has everything it needs to cache the result of the expensive
PointerMayBeCaptured() call and provide it to future getModRefInfo() calls.
This patch introduces a means to do that by extending MemoryLocation with a
KnownFlags member that records pre-existing knowledge, which its clients can
use to elide particularly expensive checks. This patch currently applies
this caching very conservatively within DeadStoreElimination. Any change at all
to the IR flushes the whole cache. This is partly to keep the overhead of the
cache maintenance down and partly to keep it simple. In the worst case, we end
up doing all the PointerMayBeCaptured() calls with a very small amount of
overhead to maintain the cache.
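
The flush-on-any-change policy can be illustrated with a minimal standalone
sketch. This is not the patch's implementation: the patch carries the cached
knowledge in MemoryLocation's KnownFlags, whereas the sketch below uses a
plain map. The Value struct, the NonEscapingLocalCache class, and the injected
check function are all hypothetical stand-ins, the last one for the expensive
PointerMayBeCaptured()-based test.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an IR value (llvm::Value in the real pass).
struct Value {
  int Id;
};

// Illustrative cache only: remembers the result of an expensive per-value
// check and forgets everything whenever the IR changes.
class NonEscapingLocalCache {
public:
  explicit NonEscapingLocalCache(std::function<bool(const Value *)> Check)
      : ExpensiveCheck(std::move(Check)) {}

  // Returns the cached answer if present; otherwise runs the expensive
  // check once and records its result.
  bool isNonEscapingLocal(const Value *V) {
    auto It = Cache.find(V);
    if (It != Cache.end())
      return It->second;
    bool Result = ExpensiveCheck(V);
    Cache.emplace(V, Result);
    return Result;
  }

  // Conservative invalidation: any IR change drops the whole cache.
  void invalidateAll() { Cache.clear(); }

private:
  std::function<bool(const Value *)> ExpensiveCheck;
  std::unordered_map<const Value *, bool> Cache;
};
```

Under this policy, repeated queries for the same value between IR changes run
the expensive check once. If every query is followed by a change, the check
runs every time and the only extra cost is the map lookup and clear, which
matches the worst case described above.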
CTMark isn't significantly affected by this patch. With 10x multisampling, I see
two regressions:
- pairlocalalign: 0.10%
- sqlite3-link: 0.66%
and several minor improvements, of which the top 3 are:
- 7-zip-benchmark-link: -1.37%
- bullet-link: -0.92%
- kc-link: -0.82%
and overall the geomean has improved slightly (-0.21%). The resulting binaries
are unchanged. The more interesting result is the motivating function mentioned
above. Previously, DeadStoreElimination was taking ~57s on this function and now
takes ~20s (-65%).
Add an assert before this to check: