Reduce memory footprint of AST Reader/Writer:
- Adjust internal data containers' element type.
- Switch to set for deduplication of deferred diags.
Differential D101793
[clang][AST] Improve AST Reader/Writer memory footprint weiwang on May 3 2021, 4:02 PM. Authored by
Details Reduce memory footprint of AST Reader/Writer:
Diff Detail
Event TimelineComment Actions We've seen a huge memory footprint from AST Reader/Writer in a single CU with module enabled from internal workloads. Upon further analysis, the content of vector DeclsToCheckForDeferredDiags seems mostly redundant. In one case, 1,734,387,685 out of 1,734,404,000 elements are the same. While this may indicate something wrong with the source itself, it also suggests that compiler would be better to perform deduplication on this type of Decl ID. This change also tries to restrict the data length of several other vectors which only deal with 32-bit unsigned int. Comment Actions Decls in Sema::DeclsToCheckForDeferredDiags is supposed to be unique. Therefore the fact that '1,734,387,685 out of 1,734,404,000 elements are the same' is surprising. Did this happen when you compile the source code and write AST? What language was the source code? C++, OpenMP, or CUDA? What was the decl that got duplicated? Thanks. Comment Actions Sorry for the late reply. This happens during a single compilation instance, and emits a module file from a big list of module map files and headers. I believe the code is pure C++. Because of the huge amount of duplications, the memory RSS shoots to over 50GB. I'll update once having more information. Comment Actions Finally dealt with the other issues I need to take care. Let's resume the discussion. I printed out the decls that are duplicated. The list of very long, but the top ones all seem from glibc headers. For example (the number after 'c:' is the duplication count), static inline __uint16_t __bswap_16(__uint16_t __bsx) { return ((__uint16_t)((((__bsx) >> 8) & 255) | (((__bsx) & 255) << 8))); } c:8581627 static inline __uint16_t __bswap_16(__uint16_t __bsx) c:8581627 static inline __uint32_t __bswap_32(__uint32_t __bsx) { return ((((__bsx) & 4278190080U) >> 24) | (((__bsx) & 16711680U) >> 8) | (((__bsx) & 65280U) << 8) | (((__bsx) & 255U) << 24)); } c:8581627 static inline __uint32_t __bswap_32(__uint32_t __bsx) c:8581627 static inline __uint64_t __bswap_64(__uint64_t __bsx) { return ((((__bsx) & 18374686479671623680ULL) >> 56) | (((__bsx) & 71776119061217280ULL) >> 40) | (((__bsx) & 280375465082880ULL) >> 24) | (((__bsx) & 1095216660480ULL) >> 8) | (((__bsx) & 4278190080ULL) << 8) | (((__bsx) & 16711680ULL) << 24) | (((__bsx) & 65280ULL) << 40) | (((__bsx) & 255ULL) << 56)); } c:8581627 static inline __uint64_t __bswap_64(__uint64_t __bsx) c:8581627 static inline __uint16_t __uint16_identity(__uint16_t __x) { return __x; } c:8581627 static inline __uint32_t __uint32_identity(__uint32_t __x) { return __x; } c:8581627 static inline __uint64_t __uint64_identity(__uint64_t __x) { return __x; } c:8581627 inline int iscanonical(float __val) { return ((void)(typeof (__val))(__val) , 1); } c:8581627 inline int iscanonical(double __val) { return ((void)(typeof (__val))(__val) , 1); } c:8581627 inline int iscanonical(long double __val) { return __iscanonicall(__val); } c:8581627 inline int issignaling(float __val) { return __issignalingf(__val); } c:8581627 inline int issignaling(double __val) { return __issignaling(__val); } c:8581627 inline int issignaling(long double __val) { return __issignalingl(__val); } c:8581627 static int __call(float __x, float __y) throw() { return __iseqsigf(__x, __y); } c:8581627 static int __call(double __x, double __y) throw() { return __iseqsig(__x, __y); } c:8581627 static int __call(long double __x, long double __y) throw() { return __iseqsigl(__x, __y); } c:8581627 I think the source codebase is a mix of conventional headers and modules, I don't know if that makes any difference. Comment Actions I think the root cause might be duplicated decls are added to Sema::DeclsToCheckForDeferredDiags defined in https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Sema/Sema.h#L1789 When compiling source codes, a decl is added only once. However if modules are imported, duplicate decls may be added. We need to avoid adding duplicate decls to Sema::DeclsToCheckForDeferredDiags. However we cannot simply change it to a set since the order is important, otherwise the error message for later code may show up earlier, causing confusion for users. I would suggest to change its type to SetVector, which keeps the order and also avoids duplicates. Comment Actions Thanks for the suggestion! It does make more sense to use SetVector here. I will try this and report back. Also I think such decls are first read from AST files with ASTReader::ReadAST and filled into ASTReader::DeclsToCheckForDeferredDiags (a SmallVector), then into Sema::DeclsToCheckForDeferredDiags as external source in Sema::emitDeferredDiags. I Wonder if we set ASTReader::DeclsToCheckForDeferredDiags to be llvm::SmallSetVector as this diff currently does, would it eliminate duplication all along the way? Comment Actions Tried to make Sema::DeclsToCheckForDeferredDiags llvm::SmallSetVector. The heap RSS did drop significantly (from peak 100GB to 59GB) , but not as good as the current fix (peak 26GB), which makes ASTReader::DeclsToCheckForDeferredDiags llvm::SmallSetVector. I think the reason is that the duplicated decls are read from multiple module file sources (ASTReader::ReadAST() -> ASTReader::ReadASTBlock()), then stored into ASTReader::DeclsToCheckForDeferredDiags, then goes into Sema::DeclsToCheckForDeferredDiags in ASTReader::ReadDeclsToCheckForDeferredDiags(). Doing dedup at the early stage when the decls were just read in ASTReader is more effective at reducing RSS. Comment Actions What if you use SmallSetVector for both Sema::DeclsToCheckForDeferredDiags and ASTReader::DeclsToCheckForDeferredDiags? Does it cause extra memory usage compared to using it only for ASTReader::DeclsToCheckForDeferredDiags? Thanks. Comment Actions There would be a slight increase in memory usage since SmallSetVector requires more memory than just SmallVector internally, but given the majority the RSS comes from duplicated decls, I don't think it's an issue by making both SmallSetVector. If you think it's better to change both, I'll update the diff. Comment Actions make both ASTReader::DeclsToCheckForDeferredDiags and Sema::DeclsToCheckForDeferredDiags SmallSetVector Comment Actions Thanks for the approval! Just want to understand the list of "decls to check for deferred diagnostics" better, where are these decls coming from? And why do they need to be checked for warnings? I see decls from libc are in the list, but I have no idea why are they selected. Comment Actions For offloading languages e.g. OpenMP/CUDA/HIP, there are apparent errors in functions shared between host and device. However, unless these functions are sure to be emitted on device or host, these errors should not be emitted. These errors are so called deferred error messages. The function decls which need to be checked are recorded. After AST is finalized, the AST of these functions are iterated. If a function is found sure to be emitted, the deferred error message in it are emitted. Comment Actions Thanks! So the DeclsToCheckForDeferredDiags contains the candidate decls to be checked. The decls are selected because they would generate diags in the context of offloading languages, but whether or not an diag will be emitted is deferred till traversal of the AST is performed. I still don't quite understand why would libc functions be in the candidate list? They look simple enough (and I think they've been there forever). For example __uint16_identity (https://code.woboq.org/userspace/glibc/bits/uintn-identity.h.html#32), static inline __uint16_t __uint16_identity(__uint16_t __x) { return __x; } I don't see how this function would generate diag either on host or device. |