This is an archive of the discontinued LLVM Phabricator instance.

Add PassManagerImpl.h to hide implementation details
ClosedPublic

Authored by rnk on Jan 31 2020, 4:18 PM.

Details

Summary

ClangBuildAnalyzer results show that a lot of time is spent
instantiating AnalysisManager::getResultImpl across the code base:

**** Templates that took longest to instantiate:
 50445 ms: llvm::AnalysisManager<llvm::Function>::getResultImpl (412 times, avg 122 ms)
 47797 ms: llvm::AnalysisManager<llvm::Function>::getResult<llvm::TargetLibraryAnalysis> (389 times, avg 122 ms)
 46894 ms: std::tie<const unsigned long long, const bool> (2452 times, avg 19 ms)
 43851 ms: llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096, 4096>::Allocate (3228 times, avg 13 ms)
 33911 ms: std::tie<const unsigned int, const unsigned int, const unsigned int, const unsigned int> (897 times, avg 37 ms)
 33854 ms: std::tie<const unsigned long long, const unsigned long long> (1897 times, avg 17 ms)
 27886 ms: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string (11156 times, avg 2 ms)

I mentioned this result to @chandlerc, and he suggested this direction.

AnalysisManager is already explicitly instantiated, and getResultImpl
doesn't need to be inlined. Move the definition to an Impl header, and
include that header in files that explicitly instantiate
AnalysisManager. There are only four (real) IR units:

  • function
  • module
  • loop
  • cgscc

Looking at a specific transform (ArgumentPromotion.cpp), here are three
compilations before & after this change:

BEFORE:
$ for i in $(seq 3) ; do ./ccit.bat ; done
peak memory: 258.15MB
real: 0m6.297s
peak memory: 257.54MB
real: 0m5.906s
peak memory: 257.47MB
real: 0m6.219s

AFTER:
$ for i in $(seq 3) ; do ./ccit.bat ; done
peak memory: 235.35MB
real: 0m5.454s
peak memory: 234.72MB
real: 0m5.235s
peak memory: 234.39MB
real: 0m5.469s

The 20MB of memory saved seems real, and the time improvement seems like
it is there.

Diff Detail

Event Timeline

rnk created this revision.Jan 31 2020, 4:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 31 2020, 4:18 PM

Unit tests: pass. 62378 tests passed, 0 failed and 839 were skipped.

clang-tidy: fail. clang-tidy found 0 errors and 1 warnings. 0 of them are added as review comments below (why?).

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

MaskRay edited the summary of this revision. (Show Details)Jan 31 2020, 9:10 PM
MaskRay accepted this revision.Feb 2 2020, 7:34 PM

include/llvm/IR/PassManager.h is directly included by ~300 files. In most cases the moved functions are not used. Adding PassManagerImpl.h to files (4) that need them sounds good.

This revision is now accepted and ready to land.Feb 2 2020, 7:34 PM
This revision was automatically updated to reflect the committed changes.

It looks like this may have broken the modules build? Could you take a look?

(To reproduce, pass -DLLVM_ENABLE_MODULES=1 to cmake.)

http://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/7775/

project/llvm/include/llvm/IR/PassManagerImpl.h:18:
In file included from <module-includes>:1:
/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/IR/Argument.h:19:10: fatal error: cyclic dependency in module 'LLVM_IR': LLVM_IR -> LLVM_intrinsic_gen -> LLVM_IR
#include "llvm/IR/Value.h"
         ^
While building module 'LLVM_IR' imported from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/IR/DIBuilder.cpp:13:
In file included from <module-includes>:36:
/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/IR/PassManagerImpl.h:18:10: fatal error: could not build module 'LLVM_intrinsic_gen'
#include "llvm/IR/PassManager.h"
 ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/IR/DIBuilder.cpp:13:10: fatal error: could not build module 'LLVM_IR'
#include "llvm/IR/DIBuilder.h"
 ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
3 errors generated.
aprantl added subscribers: bruno, dblaikie.
aprantl added a subscriber: jkorous.

Any idea why is this class being instantiated/spending a lot of time on that instantiation if it's the subject of an explicit instantiation decl/def?

dblaikie added a subscriber: rsmith.Feb 3 2020, 2:22 PM

@rsmith - ping, just in case you've got thoughts on whether this should be a compile time performance improvement or a bug/problem in the compiler to be fixed (or something else about the way the source should be structured to avoid the cost)

rnk added a comment.Feb 3 2020, 2:36 PM

@aprantl, thanks, I fixed it in a follow-up module.map change: rGf8c4d70d1138.

Any idea why is this class being instantiated/spending a lot of time on that instantiation if it's the subject of an explicit instantiation decl/def?

As for why it's instantiated, most TUs instantiate AnalysisManager::getResult. I see 83 files with call sites in lib/Transforms. So, it is actually commonly required functionality. I haven't nailed down if it gets instantiated accidentally just from including PassManager.h. I think the answer is no, but either way this is an improvement.

As for why it's expensive, there are two DenseMaps in AnalysisManager, and this function inserts into both maps. In -ftime-trace, those instantiations happen to be attributed to getResultImpl. It's possible that some of those DenseMapPair & std::pair instantiations happen later anyway after this change, but you can see the memory usage reduction is real.

In D73817#1855867, @rnk wrote:

@aprantl, thanks, I fixed it in a follow-up module.map change: rGf8c4d70d1138.

Any idea why is this class being instantiated/spending a lot of time on that instantiation if it's the subject of an explicit instantiation decl/def?

As for why it's instantiated, most TUs instantiate AnalysisManager::getResult. I see 83 files with call sites in lib/Transforms. So, it is actually commonly required functionality. I haven't nailed down if it gets instantiated accidentally just from including PassManager.h. I think the answer is no, but either way this is an improvement.

But there's an explicit instantiation declaration, right? Which should prevent any implicit instantiation.

As for why it's expensive, there are two DenseMaps in AnalysisManager, and this function inserts into both maps. In -ftime-trace, those instantiations happen to be attributed to getResultImpl. It's possible that some of those DenseMapPair & std::pair instantiations happen later anyway after this change, but you can see the memory usage reduction is real.

My understanding is that the explicit instantiation declaration should actually make it invalid for the C++ compiler to instantiate the function at all, except where it sees the explicit instantiation definition.

For instance this example:

template<typename T>
void f1() {

typename T::u v;

}
extern template void f1<int>();
int main() {

f1<int>();

}

Successfully compiles because the compiler doesn't instantiate the contents of f1<int> due to the presence of the explicit instantiation declaration. If you add the "inline" keyword to the f1 declaration, then the compiler is allowed to instantiate f1<int> in this TU and it does reject it due to "T::u" being invalid for T=int.

Thanks a lot, Reid!

rnk added a comment.Feb 4 2020, 5:50 PM

But there's an explicit instantiation declaration, right? Which should prevent any implicit instantiation.

I think Clang will instantiate the template anyway to make an available_externally definition for optimization purposes.

As for why it's expensive, there are two DenseMaps in AnalysisManager, and this function inserts into both maps. In -ftime-trace, those instantiations happen to be attributed to getResultImpl. It's possible that some of those DenseMapPair & std::pair instantiations happen later anyway after this change, but you can see the memory usage reduction is real.

My understanding is that the explicit instantiation declaration should actually make it invalid for the C++ compiler to instantiate the function at all, except where it sees the explicit instantiation definition.

For instance this example:

template<typename T>
void f1() {

typename T::u v;

}
extern template void f1<int>();
int main() {

f1<int>();

}

Successfully compiles because the compiler doesn't instantiate the contents of f1<int> due to the presence of the explicit instantiation declaration. If you add the "inline" keyword to the f1 declaration, then the compiler is allowed to instantiate f1<int> in this TU and it does reject it due to "T::u" being invalid for T=int.

That's interesting. Clang produces the error if you put the function in a class template:

template <typename T> struct Foo {
  static void f1() { typename T::u v; }
};
extern template struct Foo<int>;
int main() { Foo<int>::f1(); }

It also errors if you add inline:

template <typename T> inline void f1() { typename T::u v; }
extern template void f1<int>();
int main() { f1<int>(); }

I think this proves the theory about available_externally definitions. In other words, extern template isn't as useful as we think it is.

In D73817#1858364, @rnk wrote:

But there's an explicit instantiation declaration, right? Which should prevent any implicit instantiation.

I think Clang will instantiate the template anyway to make an available_externally definition for optimization purposes.

My understanding is that it's not allowed to do this if the inline keyword is not present (which it doesn't appear to be in this case on any of the functions... oh, right, because they're defined inline so they're implicitly 'inline').

As for why it's expensive, there are two DenseMaps in AnalysisManager, and this function inserts into both maps. In -ftime-trace, those instantiations happen to be attributed to getResultImpl. It's possible that some of those DenseMapPair & std::pair instantiations happen later anyway after this change, but you can see the memory usage reduction is real.

My understanding is that the explicit instantiation declaration should actually make it invalid for the C++ compiler to instantiate the function at all, except where it sees the explicit instantiation definition.

For instance this example:

template<typename T>
void f1() {

typename T::u v;

}
extern template void f1<int>();
int main() {

f1<int>();

}

Successfully compiles because the compiler doesn't instantiate the contents of f1<int> due to the presence of the explicit instantiation declaration. If you add the "inline" keyword to the f1 declaration, then the compiler is allowed to instantiate f1<int> in this TU and it does reject it due to "T::u" being invalid for T=int.

That's interesting. Clang produces the error if you put the function in a class template:

template <typename T> struct Foo {
  static void f1() { typename T::u v; }
};
extern template struct Foo<int>;
int main() { Foo<int>::f1(); }

Ah, right - I missed the implicit inline in class members.

It also errors if you add inline:

template <typename T> inline void f1() { typename T::u v; }
extern template void f1<int>();
int main() { f1<int>(); }

I think this proves the theory about available_externally definitions. In other words, extern template isn't as useful as we think it is.

Except it can only do this if they're 'inline' (implicitly or explicitly), and with optimizations enabled. But if there's no particular benefit to inlining these, I can see the desire to disable that even in optimized builds by having the explicit instantiation.

With that in mind, I believe you could get the same functionality with your patch without splitting into a separate file that would need to be #included in certain places, by putting those definitions outside the class definition (as they are in the separate file) after the class definition in the same file.

This would make it harder to break the build (I believe this change broke some polly compilation - at least I couldn't figure out how to fix it & eventually removed polly from my LLVM_ENABLE_PROJECTS build) by keeping the header standalone/implicitly instantiable, but still getting the compile time benefits of the explicit instantiations

rnk added a comment.Feb 10 2020, 2:48 PM

Looks like polly was fixed here: rGd4c8230a0fdbcaf65

I guess I like the current code structure, but I don't feel that strongly about it. Even if moving the code back into the header but moving it out of the class body gets the same compile time benefits, it puts tokens into a widely included header that we don't need. The current structure raises the cost of instantiating new Pass & Analysis managers. You have to try to link, spot the link error, find the symbol, add the include as appropriate. Given how few instantiations (IR units) we expect there to be, it seems like a reasonable tradeoff.