This is an archive of the discontinued LLVM Phabricator instance.

[MemorySSA] Add caching results of reaching LiveOnEntry MemoryDef to ClobberWalker
Needs ReviewPublic

Authored by eastig on Jan 15 2019, 7:02 AM.

Details

Summary

There is the project VIXL (https://git.linaro.org/arm/vixl.git), AArch32 and AArch64 Runtime Code Generation Library. The library is used by such projects as AOSP and HHVM. The library has C++ sources which trigger quadratic running time in MemorySSA.
For example, the compilation time of https://git.linaro.org/arm/vixl.git/tree/test/aarch32/test-assembler-cond-rd-operand-rn-in-it-block-t32.cc is 5 minutes. G++ 5.4 compiles it in 17 seconds. The file contains a big global array of structs for which an initialiser is generated. The issue was found by Google. At the moment a workaround to turn optimisations off for the sources is used: https://android.googlesource.com/platform/external/vixl/+/master/test/aarch32/test-utils-aarch32.h.

This patch is a prototype of caching some results in ClobberWalker. The idea of caching is based on a pattern I have seen. When there is a walk path leading to a LiveOnEntry MemoryDef its result can be reused as soon as we get on the path again.

The patch reduces the compilation time from 5 minutes to 25 seconds.
Command to reproduce:

clang++ -c -Wall -Werror -fdiagnostics-show-option -Wextra -Wredundant-decls -pedantic -Wwrite-strings -Wunused -DVIXL_INCLUDE_TARGET_A32 -DVIXL_INCLUDE_TARGET_T32 -DVIXL_INCLUDE_TARGET_A64 -O3 -DVIXL_CODE_BUFFER_MMAP -DVIXL_INCLUDE_SIMULATOR_AARCH64 -Wmissing-noreturn  -Wunreachable-code -Wno-long-long -Wno-variadic-macros -Isrc -Itest test/aarch32/test-assembler-cond-rd-operand-rn-in-it-block-t32.cc

Testing results:

  1. check-all passed
  2. VIXL tests passed
  3. Bootstrapped clang
    1. check-all passed
    2. VIXL tests passed

Diff Detail

Event Timeline

eastig created this revision.Jan 15 2019, 7:02 AM

Thanks for this!

It's unrelated to the general problem here, but for this repro in particular, it looks pretty simple to reduce compile time to ~2 seconds on clang ToT. Clang needs to emit a dynamic initializer for kTests because neither Condition nor Register are constexpr. If you tag kTests with [[clang::require_constant_initialization]] and address the complaints it has, it looks like just a handful of additions of constexpr will allow clang to elide the dynamic initializer entirely.

Back on topic, since we already keep per-Memory{Use,Def} caches of where things point to, and keeping up with those has been a constant source of bugs, I'm pretty reluctant to have a second cache here to have to reason about.

We originally had a similar cache embedded inside of the clobber walker (https://reviews.llvm.org/D31576). It was removed because it was a net loss (or neutral) in cycles and memory in all of the cases we could find. In general, for caches where you can't trivially figure out the things that point to a cached clobber (as we have here), we're in the unfortunate situation of needing to drop the entire thing on basically every meaningful update of any MemoryAccess, which makes it difficult for them to provide great value in general.

Looking at the repro, it appears that, after optimizations, we end up with a single basic block that consists of over 20,000 (!) lines of IR. Around 19,000 of these are stores. So, it makes sense that our current walker spins a lot for this. :)

When we first landed MemorySSA, there were talks about flagging pathological blocks like that and just refusing to walk/optimize them at all. For the reasons detailed above, I'd strongly prefer we do that over keeping a cache, unless a new cache is a very solid win in general. If you find that idea reasonable, it looks like the MaxCheckLimit flag is only respected in our OptimizeUses walker; I'd be happy to have some logic in our CachingWalker that says "if we walk more than MaxCheckLimit defs in a single block, give up."

What do you think?

lib/Analysis/MemorySSA.cpp
599

If we end up going forward with this, please make caching optional, so that we can have an EXPENSIVE_CHECKS assert to verify that we get the same result with/without our cache (without dropping our cache), as we did before.

Hi George,

Thank you for the comments and advice.

There is a problem with using constexpr. The VIXL library is built with c++98 by default. I'll speak to VIXL developers whether it is still a requirement.

I agree with you any caching is a source of bugs. I prefer using heuristics to adding functionality producing bugs.
I am not sure the current caching prototype is bug-free. I made some assumptions which might not be always true.

As the case is definitely pathological I'd be happy to have some kind of heuristics to filter it out.
I appreciate if someone familiar with the MemorySSA code adds the heuristics. If everyone is busy, I can try to do.

Thanks,
Evgeny

Herald added a project: Restricted Project. · View Herald TranscriptFeb 13 2019, 4:50 PM
Herald added a subscriber: jdoerfert. · View Herald Transcript

FWIW, the cause of the slowdown is the usage of MemorySSA in EarlyCSE.
Here's an alternative solution: D58248.
On my machine compile time goes from ~41s to ~11s (with vixl ToT today).

FWIW, the cause of the slowdown is the usage of MemorySSA in EarlyCSE.
Here's an alternative solution: D58248.
On my machine compile time goes from ~41s to ~11s (with vixl ToT today).

Thank you, Alina, for working on this.

asbirlea resigned from this revision.Sep 15 2021, 12:28 PM
sanjoy resigned from this revision.Jan 29 2022, 5:24 PM