This is an archive of the discontinued LLVM Phabricator instance.

Differential D117348

[Preprocessor] Reduce the memory overhead of `#define` directives
Closed, Public

Authored by arphaman on Jan 14 2022, 11:14 AM.

Details

Reviewers

ravikandhadai
egorzhdan
aaron.ballman
rsmith
ributzka

Commits

rG00cd6c04202a: [Preprocessor] Reduce the memory overhead of `#define` directives (Recommit)
rG0d9b91524ea4: [Preprocessor] Reduce the memory overhead of `#define` directives

Summary

Recently we observed high memory pressure caused by clang during some parallel builds. We discovered that several of our projects have a very large number of #define directives in their TUs (on the order of millions), which caused huge memory consumption in clang due to the many allocations for MacroInfo. We would like to reduce clang's memory overhead for a single #define, which lowers the overhead for these files and, in turn, the memory pressure on the system during highly parallel builds. This change achieves that by removing the SmallVector in MacroInfo and instead storing the tokens in an array allocated using the bump pointer allocator, after all tokens are lexed.
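
The approach described above can be sketched outside of clang. This is a minimal illustration, not the patch itself: `Tok`, `BumpArena`, `MacroDef`, and `defineMacro` are hypothetical stand-ins for `Token`, the preprocessor's bump pointer allocator, `MacroInfo`, and the lexing loop. The body is gathered in a scratch buffer first and copied once into an exactly sized arena array.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Hypothetical stand-in for a lexed token.
struct Tok {
  unsigned Loc;
  unsigned Kind;
};

// Hypothetical, deliberately tiny bump allocator: it hands out slices of one
// fixed slab and never frees individual allocations.
class BumpArena {
  std::unique_ptr<unsigned char[]> Slab;
  std::size_t Capacity;
  std::size_t Used = 0;

public:
  explicit BumpArena(std::size_t Bytes)
      : Slab(new unsigned char[Bytes]), Capacity(Bytes) {}

  template <typename T> T *allocate(std::size_t N) {
    // Round the cursor up to the alignment of T, then bump it past N objects.
    std::size_t Start = (Used + alignof(T) - 1) / alignof(T) * alignof(T);
    std::size_t End = Start + N * sizeof(T);
    if (End > Capacity)
      throw std::bad_alloc();
    Used = End;
    T *First = reinterpret_cast<T *>(Slab.get() + Start);
    for (std::size_t I = 0; I != N; ++I)
      new (First + I) T();
    return First;
  }

  std::size_t bytesUsed() const { return Used; }
};

// Hypothetical macro record: a pointer and a count instead of an owned vector.
class MacroDef {
  const Tok *Tokens = nullptr;
  unsigned NumTokens = 0;

public:
  void setTokens(const std::vector<Tok> &Body, BumpArena &Arena) {
    assert(Tokens == nullptr && NumTokens == 0 && "Token list already set!");
    if (Body.empty())
      return; // An empty body costs no arena memory at all.
    Tok *Storage = Arena.allocate<Tok>(Body.size());
    std::copy(Body.begin(), Body.end(), Storage);
    Tokens = Storage;
    NumTokens = static_cast<unsigned>(Body.size());
  }

  const Tok *begin() const { return Tokens; }
  const Tok *end() const { return Tokens + NumTokens; }
  unsigned size() const { return NumTokens; }
};

// Lex-then-commit: collect the body in a scratch buffer, then store it once.
inline MacroDef defineMacro(const std::vector<unsigned> &Kinds,
                            BumpArena &Arena) {
  std::vector<Tok> Scratch;
  unsigned Loc = 0;
  for (unsigned Kind : Kinds) {
    Tok T;
    T.Loc = Loc++;
    T.Kind = Kind;
    Scratch.push_back(T);
  }
  MacroDef MI;
  MI.setTokens(Scratch, Arena);
  return MI;
}
```

The scratch buffer is reused per definition and then discarded, so the only persistent cost per macro body in this sketch is the exact token array in the arena.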

The added unit test with 1000000 #define directives illustrates the problem. Prior to this change, on arm64 macOS, clang's PP bump pointer allocator allocated 272007616 bytes, and used roughly 272 bytes per #define. After this change, clang's PP bump pointer allocator allocates 120002016 bytes, and uses only roughly 120 bytes per #define.
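
The per-directive figures follow directly from the totals quoted above. As a quick check (the helper name `bytesPerDefine` is only illustrative):

```cpp
#include <cassert>
#include <cstdint>

// Average allocator bytes per #define, rounded down.
constexpr std::uint64_t bytesPerDefine(std::uint64_t TotalBytes,
                                       std::uint64_t NumDefines) {
  return TotalBytes / NumDefines;
}

// Totals taken from the summary: 1000000 #define directives in the unit test.
constexpr std::uint64_t Before = bytesPerDefine(272007616, 1000000);
constexpr std::uint64_t After = bytesPerDefine(120002016, 1000000);

// Whole-percent reduction in bytes per #define, rounded down.
constexpr std::uint64_t PercentSaved = (Before - After) * 100 / Before;
```

With these inputs, `Before` is 272, `After` is 120, and `PercentSaved` is 55, so the allocator footprint per directive drops by a bit more than half in that test.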

For an example test file that we have internally with 7.8 million #define directives, this change produces the following improvement on arm64 macOS: Persistent allocation footprint for this test case file as it's being compiled to LLVM IR went down 22% from 5.28 GB to 4.07 GB and the total allocations went down 14% from 8.26 GB to 7.05 GB. Furthermore, this change reduced the total number of allocations made by the system for this clang invocation from 1454853 to 133663, an order of magnitude improvement.
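
The percentages above can be re-derived from the quoted figures. A small sketch (the helper name `percentReduction` is only illustrative):

```cpp
#include <cassert>

// Relative reduction from Before to After, in percent.
constexpr double percentReduction(double Before, double After) {
  return (Before - After) / Before * 100.0;
}

// Figures from the summary (sizes in GB).
constexpr double PersistentPct = percentReduction(5.28, 4.07); // about 22.9
constexpr double TotalPct = percentReduction(8.26, 7.05);      // about 14.6

// Ratio of allocation counts before and after, rounded down.
constexpr long AllocationRatio = 1454853L / 133663L;
```

The allocation count shrinks by a factor of roughly 10.9, which matches the "order of magnitude" description.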

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

arphaman created this revision. Jan 14 2022, 11:14 AM

Herald added subscribers: ributzka, kristof.beyls, mgorny. Jan 14 2022, 11:14 AM

arphaman requested review of this revision. Jan 14 2022, 11:14 AM

Harbormaster completed remote builds in B143461: Diff 400076. Jan 14 2022, 12:58 PM

LGTM

This revision is now accepted and ready to land. Jan 14 2022, 2:41 PM

LGTM!

Just some minor nits from me, but generally LG.

clang/include/clang/Lex/MacroInfo.h
243	I think this should be a `const_tokens_iterator` instead (and it's fine that we don't expose a non-const interface for the iterator).
256	Should we assert that we've not already allocated tokens before?
clang/lib/Lex/MacroInfo.cpp
33	Should we do this dance for 32-bit systems as well?
61	Please spell out the type.

dexonsmith added a subscriber: dexonsmith. Jan 18 2022, 2:29 PM

dexonsmith added inline comments.

clang/lib/Lex/MacroInfo.cpp
33	Do I remember correctly that `SourceLocation`'s size recently became configurable? Or maybe it will be soon? Should that be factored in somehow?

aaron.ballman added inline comments. Jan 19 2022, 7:25 AM

clang/lib/Lex/MacroInfo.cpp
33	Are you thinking about this review https://reviews.llvm.org/D97204 or something else?

dexonsmith added inline comments. Jan 19 2022, 1:35 PM

clang/lib/Lex/MacroInfo.cpp
33	Yes, I think that's the one.

aaron.ballman added inline comments. Jan 20 2022, 8:16 AM

clang/lib/Lex/MacroInfo.cpp
33	Yeah, it's probably not a bad idea to use `sizeof(SourceLocation)` instead of calculating the size manually for that bit.

Thanks, that feedback makes sense. I'll update the patch today.

Update to address review feedback. Remove appendToken, which is not needed since we can just call setTokens instead (it's a new macro info).

clang/lib/Lex/MacroInfo.cpp
33	Good idea. Done.
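
The size check discussed in this thread can be illustrated in isolation. This is a sketch under stated assumptions, not the code from the patch: `Loc` and `Info` are hypothetical stand-ins, and the field layout only loosely mirrors the idea of a record with two source locations, two pointers, and four 4-byte fields. The expected size is written in terms of `sizeof(Loc)` rather than a hard-coded byte count, and the check only applies where pointers are 8 bytes.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Hypothetical 4-byte location type.
struct Loc {
  std::uint32_t Raw;
};

// Hypothetical record: two locations, two pointers, four unsigned fields.
struct Info {
  Loc Begin;
  Loc End;
  void **Params;
  const void *Tokens;
  unsigned NumParams;
  unsigned NumTokens;
  unsigned Length;
  unsigned Flags;
};

// Primary template: for pointer sizes we make no claim about, accept any size.
template <int PointerSize> struct SizeChecker {
  static constexpr bool AsExpected = true;
};

// Specialization for 8-byte pointers: 32 bytes of pointers and counters plus
// two locations, with the location size taken from the type itself.
template <> struct SizeChecker<8> {
  static constexpr bool AsExpected = sizeof(Info) == 32 + sizeof(Loc) * 2;
};

static_assert(SizeChecker<sizeof(void *)>::AsExpected,
              "Unexpected size of Info");

} // namespace sketch
```

Selecting the check through a template parameter keeps the assertion from firing on targets with a different pointer width, while still catching accidental growth of the record on the common 64-bit case.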

Harbormaster completed remote builds in B147027: Diff 405103. Feb 1 2022, 5:15 PM

LGTM once @aaron.ballman is happy.

Thanks, this LGTM as well! I don't think the precommit CI pipeline failures are related from what I can tell.

This revision was landed with ongoing or failed builds. Feb 11 2022, 3:01 PM

Closed by commit rG0d9b91524ea4: [Preprocessor] Reduce the memory overhead of `#define` directives (authored by arphaman).

This revision was automatically updated to reflect the committed changes.

arphaman added a commit: rG0d9b91524ea4: [Preprocessor] Reduce the memory overhead of `#define` directives.

Herald added a project: Restricted Project. Feb 11 2022, 3:01 PM

Very cool! Looks like it broke lldb builds though: http://45.33.8.238/linux/68321/step_4.txt

Yep, I just noticed. Reverting for now and will fix LLDB before recommitting.

Revert:

To github.com:llvm/llvm-project.git

bdf573652138..3f05192c4c40  main -> main

arphaman added a reverting change: rG3f05192c4c40: Revert "[Preprocessor] Reduce the memory overhead of `#define` directives". Feb 11 2022, 3:54 PM

abrachet mentioned this in D119598: [sanitizers] Fix missing header for mac builds. Feb 11 2022, 4:14 PM

arphaman added a commit: rG00cd6c04202a: [Preprocessor] Reduce the memory overhead of `#define` directives (Recommit). Feb 14 2022, 9:28 AM

Revision Contents

Path	Size
clang/include/clang/Lex/MacroInfo.h	48 lines
clang/lib/Lex/MacroInfo.cpp	26 lines
clang/lib/Lex/PPDirectives.cpp	37 lines
clang/lib/Serialization/ASTReader.cpp	12 lines
clang/lib/Serialization/ASTWriter.cpp	1 line
clang/unittests/Lex/CMakeLists.txt	1 line
clang/unittests/Lex/PPMemoryAllocationsTest.cpp	97 lines

Diff 408070

clang/include/clang/Lex/MacroInfo.h

[48 lines hidden]	class MacroInfo {
/// The list of arguments for a function-like macro.		/// The list of arguments for a function-like macro.
///		///
/// ParameterList points to the first of NumParameters pointers.		/// ParameterList points to the first of NumParameters pointers.
///		///
/// This can be empty, for, e.g. "#define X()". In a C99-style variadic		/// This can be empty, for, e.g. "#define X()". In a C99-style variadic
/// macro, this includes the \c __VA_ARGS__ identifier on the list.		/// macro, this includes the \c __VA_ARGS__ identifier on the list.
IdentifierInfo **ParameterList = nullptr;		IdentifierInfo **ParameterList = nullptr;

		/// This is the list of tokens that the macro is defined to.
		const Token *ReplacementTokens = nullptr;

/// \see ParameterList		/// \see ParameterList
unsigned NumParameters = 0;		unsigned NumParameters = 0;

/// This is the list of tokens that the macro is defined to.		/// \see ReplacementTokens
SmallVector<Token, 8> ReplacementTokens;		unsigned NumReplacementTokens = 0;

/// Length in characters of the macro definition.		/// Length in characters of the macro definition.
mutable unsigned DefinitionLength;		mutable unsigned DefinitionLength;
mutable bool IsDefinitionLengthCached : 1;		mutable bool IsDefinitionLengthCached : 1;

/// True if this macro is function-like, false if it is object-like.		/// True if this macro is function-like, false if it is object-like.
bool IsFunctionLike : 1;		bool IsFunctionLike : 1;

[155 lines hidden]	public:
bool isAllowRedefinitionsWithoutWarning() const {		bool isAllowRedefinitionsWithoutWarning() const {
return IsAllowRedefinitionsWithoutWarning;		return IsAllowRedefinitionsWithoutWarning;
}		}

/// Return true if we should emit a warning if the macro is unused.		/// Return true if we should emit a warning if the macro is unused.
bool isWarnIfUnused() const { return IsWarnIfUnused; }		bool isWarnIfUnused() const { return IsWarnIfUnused; }

/// Return the number of tokens that this macro expands to.		/// Return the number of tokens that this macro expands to.
unsigned getNumTokens() const { return ReplacementTokens.size(); }		unsigned getNumTokens() const { return NumReplacementTokens; }

const Token &getReplacementToken(unsigned Tok) const {		const Token &getReplacementToken(unsigned Tok) const {
assert(Tok < ReplacementTokens.size() && "Invalid token #");		assert(Tok < NumReplacementTokens && "Invalid token #");
return ReplacementTokens[Tok];		return ReplacementTokens[Tok];
}		}

using tokens_iterator = SmallVectorImpl<Token>::const_iterator;		using const_tokens_iterator = const Token *;

tokens_iterator tokens_begin() const { return ReplacementTokens.begin(); }		const_tokens_iterator tokens_begin() const { return ReplacementTokens; }
tokens_iterator tokens_end() const { return ReplacementTokens.end(); }		const_tokens_iterator tokens_end() const {
bool tokens_empty() const { return ReplacementTokens.empty(); }		return ReplacementTokens + NumReplacementTokens;
ArrayRef<Token> tokens() const { return ReplacementTokens; }		}
		bool tokens_empty() const { return NumReplacementTokens == 0; }
		ArrayRef<Token> tokens() const {
		return llvm::makeArrayRef(ReplacementTokens, NumReplacementTokens);
		}

/// Add the specified token to the replacement text for the macro.		llvm::MutableArrayRef<Token>
void AddTokenToBody(const Token &Tok) {		allocateTokens(unsigned NumTokens, llvm::BumpPtrAllocator &PPAllocator) {
		assert(ReplacementTokens == nullptr && NumReplacementTokens == 0 &&
		"Token list already allocated!");
		NumReplacementTokens = NumTokens;
		Token *NewReplacementTokens = PPAllocator.Allocate<Token>(NumTokens);
		ReplacementTokens = NewReplacementTokens;
		return llvm::makeMutableArrayRef(NewReplacementTokens, NumTokens);
		}

		void setTokens(ArrayRef<Token> Tokens, llvm::BumpPtrAllocator &PPAllocator) {
assert(		assert(
!IsDefinitionLengthCached &&		!IsDefinitionLengthCached &&
"Changing replacement tokens after definition length got calculated");		"Changing replacement tokens after definition length got calculated");
ReplacementTokens.push_back(Tok);		assert(ReplacementTokens == nullptr && NumReplacementTokens == 0 &&
		"Token list already set!");
		if (Tokens.empty())
		return;

		NumReplacementTokens = Tokens.size();
		Token *NewReplacementTokens = PPAllocator.Allocate<Token>(Tokens.size());
		std::copy(Tokens.begin(), Tokens.end(), NewReplacementTokens);
		ReplacementTokens = NewReplacementTokens;
}		}

/// Return true if this macro is enabled.		/// Return true if this macro is enabled.
///		///
/// In other words, that we are not currently in an expansion of this macro.		/// In other words, that we are not currently in an expansion of this macro.
bool isEnabled() const { return !IsDisabled; }		bool isEnabled() const { return !IsDisabled; }

void EnableMacro() {		void EnableMacro() {
[350 lines hidden]
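
To make the shape of the header change concrete, here is a reduced sketch of the pointer-plus-count storage with a read-only iterator interface. All names here are hypothetical: `Tok` is a stand-in token type, `InlineBody` only approximates a small vector with eight inline slots, and the sizes are those of this sketch, not measurements of clang.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical token type.
struct Tok {
  unsigned Loc;
  unsigned Kind;
  void *Data;
};

// Rough model of a small vector with 8 inline slots: the inline buffer is
// part of every object whether or not the body ever uses it.
struct InlineBody {
  Tok *Begin;
  unsigned Size;
  unsigned Capacity;
  Tok Inline[8];
};

// Pointer-plus-count storage: the tokens live elsewhere (for example in an
// arena), and the object only records where they are and how many there are.
class CompactBody {
  const Tok *Tokens = nullptr;
  unsigned NumTokens = 0;

public:
  CompactBody() = default;
  CompactBody(const Tok *T, unsigned N) : Tokens(T), NumTokens(N) {}

  // A plain const pointer is a valid random-access iterator, so no
  // container type is needed to expose a read-only range.
  using const_tokens_iterator = const Tok *;
  const_tokens_iterator tokens_begin() const { return Tokens; }
  const_tokens_iterator tokens_end() const { return Tokens + NumTokens; }
  bool tokens_empty() const { return NumTokens == 0; }
  unsigned getNumTokens() const { return NumTokens; }
};

static_assert(sizeof(CompactBody) < sizeof(InlineBody),
              "pointer-plus-count storage should be smaller than inline slots");
```

Exposing only a const iterator fits the design in the diff: the token array is written once when it is set and is read-only afterwards.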

clang/lib/Lex/MacroInfo.cpp

[22 lines hidden]
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
#include "llvm/Support/Compiler.h"		#include "llvm/Support/Compiler.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include <cassert>		#include <cassert>
#include <utility>		#include <utility>

using namespace clang;		using namespace clang;

		namespace {

		// MacroInfo is expected to take 40 bytes on platforms with an 8 byte pointer
		// and 4 byte SourceLocation.
		template <int> class MacroInfoSizeChecker {
		public:
		constexpr static bool AsExpected = true;
		};
		template <> class MacroInfoSizeChecker<8> {
		public:
		constexpr static bool AsExpected =
		sizeof(MacroInfo) == (32 + sizeof(SourceLocation) * 2);
		};

		static_assert(MacroInfoSizeChecker<sizeof(void *)>::AsExpected,
		"Unexpected size of MacroInfo");

		} // end namespace

MacroInfo::MacroInfo(SourceLocation DefLoc)		MacroInfo::MacroInfo(SourceLocation DefLoc)
: Location(DefLoc), IsDefinitionLengthCached(false), IsFunctionLike(false),		: Location(DefLoc), IsDefinitionLengthCached(false), IsFunctionLike(false),
IsC99Varargs(false), IsGNUVarargs(false), IsBuiltinMacro(false),		IsC99Varargs(false), IsGNUVarargs(false), IsBuiltinMacro(false),
HasCommaPasting(false), IsDisabled(false), IsUsed(false),		HasCommaPasting(false), IsDisabled(false), IsUsed(false),
IsAllowRedefinitionsWithoutWarning(false), IsWarnIfUnused(false),		IsAllowRedefinitionsWithoutWarning(false), IsWarnIfUnused(false),
UsedForHeaderGuard(false) {}		UsedForHeaderGuard(false) {}

unsigned MacroInfo::getDefinitionLengthSlow(const SourceManager &SM) const {		unsigned MacroInfo::getDefinitionLengthSlow(const SourceManager &SM) const {
assert(!IsDefinitionLengthCached);		assert(!IsDefinitionLengthCached);
IsDefinitionLengthCached = true;		IsDefinitionLengthCached = true;

		ArrayRef<Token> ReplacementTokens = tokens();
if (ReplacementTokens.empty())		if (ReplacementTokens.empty())
return (DefinitionLength = 0);		return (DefinitionLength = 0);

const Token &firstToken = ReplacementTokens.front();		const Token &firstToken = ReplacementTokens.front();
const Token &lastToken = ReplacementTokens.back();		const Token &lastToken = ReplacementTokens.back();
SourceLocation macroStart = firstToken.getLocation();		SourceLocation macroStart = firstToken.getLocation();
SourceLocation macroEnd = lastToken.getLocation();		SourceLocation macroEnd = lastToken.getLocation();
assert(macroStart.isValid() && macroEnd.isValid());		assert(macroStart.isValid() && macroEnd.isValid());
[21 lines hidden]
/// if they use different identifiers for the function macro parameters.		/// if they use different identifiers for the function macro parameters.
/// Otherwise the comparison is lexical and this implements the rules in		/// Otherwise the comparison is lexical and this implements the rules in
/// C99 6.10.3.		/// C99 6.10.3.
bool MacroInfo::isIdenticalTo(const MacroInfo &Other, Preprocessor &PP,		bool MacroInfo::isIdenticalTo(const MacroInfo &Other, Preprocessor &PP,
bool Syntactically) const {		bool Syntactically) const {
bool Lexically = !Syntactically;		bool Lexically = !Syntactically;

// Check # tokens in replacement, number of args, and various flags all match.		// Check # tokens in replacement, number of args, and various flags all match.
if (ReplacementTokens.size() != Other.ReplacementTokens.size() ||		if (getNumTokens() != Other.getNumTokens() ||
getNumParams() != Other.getNumParams() ||		getNumParams() != Other.getNumParams() ||
isFunctionLike() != Other.isFunctionLike() ||		isFunctionLike() != Other.isFunctionLike() ||
isC99Varargs() != Other.isC99Varargs() ||		isC99Varargs() != Other.isC99Varargs() ||
isGNUVarargs() != Other.isGNUVarargs())		isGNUVarargs() != Other.isGNUVarargs())
return false;		return false;

if (Lexically) {		if (Lexically) {
// Check arguments.		// Check arguments.
for (param_iterator I = param_begin(), OI = Other.param_begin(),		for (param_iterator I = param_begin(), OI = Other.param_begin(),
E = param_end();		E = param_end();
I != E; ++I, ++OI)		I != E; ++I, ++OI)
if (*I != *OI) return false;		if (*I != *OI) return false;
}		}

// Check all the tokens.		// Check all the tokens.
for (unsigned i = 0, e = ReplacementTokens.size(); i != e; ++i) {		for (unsigned i = 0; i != NumReplacementTokens; ++i) {
const Token &A = ReplacementTokens[i];		const Token &A = ReplacementTokens[i];
const Token &B = Other.ReplacementTokens[i];		const Token &B = Other.ReplacementTokens[i];
if (A.getKind() != B.getKind())		if (A.getKind() != B.getKind())
return false;		return false;

// If this isn't the first first token, check that the whitespace and		// If this isn't the first first token, check that the whitespace and
// start-of-line characteristics match.		// start-of-line characteristics match.
if (i != 0 &&		if (i != 0 &&
[48 lines hidden]	if (IsFunctionLike) {
if (IsC99Varargs || IsGNUVarargs) {		if (IsC99Varargs || IsGNUVarargs) {
if (NumParameters && IsC99Varargs) Out << ", ";		if (NumParameters && IsC99Varargs) Out << ", ";
Out << "...";		Out << "...";
}		}
Out << ")";		Out << ")";
}		}

bool First = true;		bool First = true;
for (const Token &Tok : ReplacementTokens) {		for (const Token &Tok : tokens()) {
// Leading space is semantically meaningful in a macro definition,		// Leading space is semantically meaningful in a macro definition,
// so preserve it in the dump output.		// so preserve it in the dump output.
if (First || Tok.hasLeadingSpace())		if (First || Tok.hasLeadingSpace())
Out << " ";		Out << " ";
First = false;		First = false;

if (const char *Punc = tok::getPunctuatorSpelling(Tok.getKind()))		if (const char *Punc = tok::getPunctuatorSpelling(Tok.getKind()))
Out << Punc;		Out << Punc;
[80 lines hidden]

clang/lib/Lex/PPDirectives.cpp

[2,705 lines hidden]	if (isInvalid)
Diag(Tok, diag::ext_missing_whitespace_after_macro_name);		Diag(Tok, diag::ext_missing_whitespace_after_macro_name);
else		else
Diag(Tok, diag::warn_missing_whitespace_after_macro_name);		Diag(Tok, diag::warn_missing_whitespace_after_macro_name);
}		}

if (!Tok.is(tok::eod))		if (!Tok.is(tok::eod))
LastTok = Tok;		LastTok = Tok;

		SmallVector<Token, 16> Tokens;

// Read the rest of the macro body.		// Read the rest of the macro body.
if (MI->isObjectLike()) {		if (MI->isObjectLike()) {
// Object-like macros are very simple, just read their body.		// Object-like macros are very simple, just read their body.
while (Tok.isNot(tok::eod)) {		while (Tok.isNot(tok::eod)) {
LastTok = Tok;		LastTok = Tok;
MI->AddTokenToBody(Tok);		Tokens.push_back(Tok);
// Get the next token of the macro.		// Get the next token of the macro.
LexUnexpandedToken(Tok);		LexUnexpandedToken(Tok);
}		}
} else {		} else {
// Otherwise, read the body of a function-like macro. While we are at it,		// Otherwise, read the body of a function-like macro. While we are at it,
// check C99 6.10.3.2p1: ensure that # operators are followed by macro		// check C99 6.10.3.2p1: ensure that # operators are followed by macro
// parameters in function-like macro expansions.		// parameters in function-like macro expansions.

VAOptDefinitionContext VAOCtx(*this);		VAOptDefinitionContext VAOCtx(*this);

while (Tok.isNot(tok::eod)) {		while (Tok.isNot(tok::eod)) {
LastTok = Tok;		LastTok = Tok;

if (!Tok.isOneOf(tok::hash, tok::hashat, tok::hashhash)) {		if (!Tok.isOneOf(tok::hash, tok::hashat, tok::hashhash)) {
MI->AddTokenToBody(Tok);		Tokens.push_back(Tok);

if (VAOCtx.isVAOptToken(Tok)) {		if (VAOCtx.isVAOptToken(Tok)) {
// If we're already within a VAOPT, emit an error.		// If we're already within a VAOPT, emit an error.
if (VAOCtx.isInVAOpt()) {		if (VAOCtx.isInVAOpt()) {
Diag(Tok, diag::err_pp_vaopt_nested_use);		Diag(Tok, diag::err_pp_vaopt_nested_use);
return nullptr;		return nullptr;
}		}
// Ensure VAOPT is followed by a '(' .		// Ensure VAOPT is followed by a '(' .
LexUnexpandedToken(Tok);		LexUnexpandedToken(Tok);
if (Tok.isNot(tok::l_paren)) {		if (Tok.isNot(tok::l_paren)) {
Diag(Tok, diag::err_pp_missing_lparen_in_vaopt_use);		Diag(Tok, diag::err_pp_missing_lparen_in_vaopt_use);
return nullptr;		return nullptr;
}		}
MI->AddTokenToBody(Tok);		Tokens.push_back(Tok);
VAOCtx.sawVAOptFollowedByOpeningParens(Tok.getLocation());		VAOCtx.sawVAOptFollowedByOpeningParens(Tok.getLocation());
LexUnexpandedToken(Tok);		LexUnexpandedToken(Tok);
if (Tok.is(tok::hashhash)) {		if (Tok.is(tok::hashhash)) {
Diag(Tok, diag::err_vaopt_paste_at_start);		Diag(Tok, diag::err_vaopt_paste_at_start);
return nullptr;		return nullptr;
}		}
continue;		continue;
} else if (VAOCtx.isInVAOpt()) {		} else if (VAOCtx.isInVAOpt()) {
if (Tok.is(tok::r_paren)) {		if (Tok.is(tok::r_paren)) {
if (VAOCtx.sawClosingParen()) {		if (VAOCtx.sawClosingParen()) {
const unsigned NumTokens = MI->getNumTokens();		assert(Tokens.size() >= 3 &&
assert(NumTokens >= 3 && "Must have seen at least __VA_OPT__( "		"Must have seen at least __VA_OPT__( "
"and a subsequent tok::r_paren");		"and a subsequent tok::r_paren");
if (MI->getReplacementToken(NumTokens - 2).is(tok::hashhash)) {		if (Tokens[Tokens.size() - 2].is(tok::hashhash)) {
Diag(Tok, diag::err_vaopt_paste_at_end);		Diag(Tok, diag::err_vaopt_paste_at_end);
return nullptr;		return nullptr;
}		}
}		}
} else if (Tok.is(tok::l_paren)) {		} else if (Tok.is(tok::l_paren)) {
VAOCtx.sawOpeningParen(Tok.getLocation());		VAOCtx.sawOpeningParen(Tok.getLocation());
}		}
}		}
// Get the next token of the macro.		// Get the next token of the macro.
LexUnexpandedToken(Tok);		LexUnexpandedToken(Tok);
continue;		continue;
}		}

// If we're in -traditional mode, then we should ignore stringification		// If we're in -traditional mode, then we should ignore stringification
// and token pasting. Mark the tokens as unknown so as not to confuse		// and token pasting. Mark the tokens as unknown so as not to confuse
// things.		// things.
if (getLangOpts().TraditionalCPP) {		if (getLangOpts().TraditionalCPP) {
Tok.setKind(tok::unknown);		Tok.setKind(tok::unknown);
MI->AddTokenToBody(Tok);		Tokens.push_back(Tok);

// Get the next token of the macro.		// Get the next token of the macro.
LexUnexpandedToken(Tok);		LexUnexpandedToken(Tok);
continue;		continue;
}		}

if (Tok.is(tok::hashhash)) {		if (Tok.is(tok::hashhash)) {
// If we see token pasting, check if it looks like the gcc comma		// If we see token pasting, check if it looks like the gcc comma
// pasting extension. We'll use this information to suppress		// pasting extension. We'll use this information to suppress
// diagnostics later on.		// diagnostics later on.

// Get the next token of the macro.		// Get the next token of the macro.
LexUnexpandedToken(Tok);		LexUnexpandedToken(Tok);

if (Tok.is(tok::eod)) {		if (Tok.is(tok::eod)) {
MI->AddTokenToBody(LastTok);		Tokens.push_back(LastTok);
break;		break;
}		}

unsigned NumTokens = MI->getNumTokens();		if (!Tokens.empty() && Tok.getIdentifierInfo() == Ident__VA_ARGS__ &&
if (NumTokens && Tok.getIdentifierInfo() == Ident__VA_ARGS__ &&		Tokens[Tokens.size() - 1].is(tok::comma))
MI->getReplacementToken(NumTokens-1).is(tok::comma))
MI->setHasCommaPasting();		MI->setHasCommaPasting();

// Things look ok, add the '##' token to the macro.		// Things look ok, add the '##' token to the macro.
MI->AddTokenToBody(LastTok);		Tokens.push_back(LastTok);
continue;		continue;
}		}

// Our Token is a stringization operator.		// Our Token is a stringization operator.
// Get the next token of the macro.		// Get the next token of the macro.
LexUnexpandedToken(Tok);		LexUnexpandedToken(Tok);

// Check for a valid macro arg identifier or __VA_OPT__.		// Check for a valid macro arg identifier or __VA_OPT__.
if (!VAOCtx.isVAOptToken(Tok) &&		if (!VAOCtx.isVAOptToken(Tok) &&
(Tok.getIdentifierInfo() == nullptr ||		(Tok.getIdentifierInfo() == nullptr ||
MI->getParameterNum(Tok.getIdentifierInfo()) == -1)) {		MI->getParameterNum(Tok.getIdentifierInfo()) == -1)) {

// If this is assembler-with-cpp mode, we accept random gibberish after		// If this is assembler-with-cpp mode, we accept random gibberish after
// the '#' because '#' is often a comment character. However, change		// the '#' because '#' is often a comment character. However, change
// the kind of the token to tok::unknown so that the preprocessor isn't		// the kind of the token to tok::unknown so that the preprocessor isn't
// confused.		// confused.
if (getLangOpts().AsmPreprocessor && Tok.isNot(tok::eod)) {		if (getLangOpts().AsmPreprocessor && Tok.isNot(tok::eod)) {
LastTok.setKind(tok::unknown);		LastTok.setKind(tok::unknown);
MI->AddTokenToBody(LastTok);		Tokens.push_back(LastTok);
continue;		continue;
} else {		} else {
Diag(Tok, diag::err_pp_stringize_not_parameter)		Diag(Tok, diag::err_pp_stringize_not_parameter)
<< LastTok.is(tok::hashat);		<< LastTok.is(tok::hashat);
return nullptr;		return nullptr;
}		}
}		}

// Things look ok, add the '#' and param name tokens to the macro.		// Things look ok, add the '#' and param name tokens to the macro.
MI->AddTokenToBody(LastTok);		Tokens.push_back(LastTok);

// If the token following '#' is VAOPT, let the next iteration handle it		// If the token following '#' is VAOPT, let the next iteration handle it
// and check it for correctness, otherwise add the token and prime the		// and check it for correctness, otherwise add the token and prime the
// loop with the next one.		// loop with the next one.
if (!VAOCtx.isVAOptToken(Tok)) {		if (!VAOCtx.isVAOptToken(Tok)) {
MI->AddTokenToBody(Tok);		Tokens.push_back(Tok);
LastTok = Tok;		LastTok = Tok;

// Get the next token of the macro.		// Get the next token of the macro.
LexUnexpandedToken(Tok);		LexUnexpandedToken(Tok);
}		}
}		}
if (VAOCtx.isInVAOpt()) {		if (VAOCtx.isInVAOpt()) {
assert(Tok.is(tok::eod) && "Must be at End Of preprocessing Directive");		assert(Tok.is(tok::eod) && "Must be at End Of preprocessing Directive");
Diag(Tok, diag::err_pp_expected_after)		Diag(Tok, diag::err_pp_expected_after)
<< LastTok.getKind() << tok::r_paren;		<< LastTok.getKind() << tok::r_paren;
Diag(VAOCtx.getUnmatchedOpeningParenLoc(), diag::note_matching) << tok::l_paren;		Diag(VAOCtx.getUnmatchedOpeningParenLoc(), diag::note_matching) << tok::l_paren;
return nullptr;		return nullptr;
}		}
}		}
MI->setDefinitionEndLoc(LastTok.getLocation());		MI->setDefinitionEndLoc(LastTok.getLocation());

		MI->setTokens(Tokens, BP);
return MI;		return MI;
}		}
/// HandleDefineDirective - Implements \#define. This consumes the entire macro		/// HandleDefineDirective - Implements \#define. This consumes the entire macro
/// line then lets the caller lex the next real token.		/// line then lets the caller lex the next real token.
void Preprocessor::HandleDefineDirective(		void Preprocessor::HandleDefineDirective(
Token &DefineTok, const bool ImmediatelyAfterHeaderGuard) {		Token &DefineTok, const bool ImmediatelyAfterHeaderGuard) {
++NumDefined;		++NumDefined;

[136 lines hidden]	if (!getLangOpts().CPlusPlus && getLangOpts().MSVCCompat &&
MacroNameTok.getIdentifierInfo()->isStr("assert") &&		MacroNameTok.getIdentifierInfo()->isStr("assert") &&
!isMacroDefined("static_assert")) {		!isMacroDefined("static_assert")) {
MacroInfo *MI = AllocateMacroInfo(SourceLocation());		MacroInfo *MI = AllocateMacroInfo(SourceLocation());

Token Tok;		Token Tok;
Tok.startToken();		Tok.startToken();
Tok.setKind(tok::kw__Static_assert);		Tok.setKind(tok::kw__Static_assert);
Tok.setIdentifierInfo(getIdentifierInfo("_Static_assert"));		Tok.setIdentifierInfo(getIdentifierInfo("_Static_assert"));
MI->AddTokenToBody(Tok);		MI->setTokens({Tok}, BP);
(void)appendDefMacroDirective(getIdentifierInfo("static_assert"), MI);		(void)appendDefMacroDirective(getIdentifierInfo("static_assert"), MI);
}		}
}		}

/// HandleUndefDirective - Implements \#undef.		/// HandleUndefDirective - Implements \#undef.
///		///
void Preprocessor::HandleUndefDirective() {		void Preprocessor::HandleUndefDirective() {
++NumUndefined;		++NumUndefined;
[305 lines hidden]

clang/lib/Serialization/ASTReader.cpp

[1,688 lines hidden]	MacroInfo *ASTReader::ReadMacroRecord(ModuleFile &F, uint64_t Offset) {
if (llvm::Error Err = Stream.JumpToBit(Offset)) {		if (llvm::Error Err = Stream.JumpToBit(Offset)) {
// FIXME this drops errors on the floor.		// FIXME this drops errors on the floor.
consumeError(std::move(Err));		consumeError(std::move(Err));
return nullptr;		return nullptr;
}		}
RecordData Record;		RecordData Record;
SmallVector<IdentifierInfo*, 16> MacroParams;		SmallVector<IdentifierInfo*, 16> MacroParams;
MacroInfo *Macro = nullptr;		MacroInfo *Macro = nullptr;
		llvm::MutableArrayRef<Token> MacroTokens;

while (true) {		while (true) {
// Advance to the next record, but if we get to the end of the block, don't		// Advance to the next record, but if we get to the end of the block, don't
// pop it (removing all the abbreviations from the cursor) since we want to		// pop it (removing all the abbreviations from the cursor) since we want to
// be able to reseek within the block and read entries.		// be able to reseek within the block and read entries.
unsigned Flags = BitstreamCursor::AF_DontPopBlockAtEnd;		unsigned Flags = BitstreamCursor::AF_DontPopBlockAtEnd;
Expected<llvm::BitstreamEntry> MaybeEntry =		Expected<llvm::BitstreamEntry> MaybeEntry =
Stream.advanceSkippingSubblocks(Flags);		Stream.advanceSkippingSubblocks(Flags);
[38 lines hidden]	case PP_MACRO_FUNCTION_LIKE: {
return Macro;		return Macro;

unsigned NextIndex = 1; // Skip identifier ID.		unsigned NextIndex = 1; // Skip identifier ID.
SourceLocation Loc = ReadSourceLocation(F, Record, NextIndex);		SourceLocation Loc = ReadSourceLocation(F, Record, NextIndex);
MacroInfo *MI = PP.AllocateMacroInfo(Loc);		MacroInfo *MI = PP.AllocateMacroInfo(Loc);
MI->setDefinitionEndLoc(ReadSourceLocation(F, Record, NextIndex));		MI->setDefinitionEndLoc(ReadSourceLocation(F, Record, NextIndex));
MI->setIsUsed(Record[NextIndex++]);		MI->setIsUsed(Record[NextIndex++]);
MI->setUsedForHeaderGuard(Record[NextIndex++]);		MI->setUsedForHeaderGuard(Record[NextIndex++]);
		MacroTokens = MI->allocateTokens(Record[NextIndex++],
		PP.getPreprocessorAllocator());
if (RecType == PP_MACRO_FUNCTION_LIKE) {		if (RecType == PP_MACRO_FUNCTION_LIKE) {
// Decode function-like macro info.		// Decode function-like macro info.
bool isC99VarArgs = Record[NextIndex++];		bool isC99VarArgs = Record[NextIndex++];
bool isGNUVarArgs = Record[NextIndex++];		bool isGNUVarArgs = Record[NextIndex++];
bool hasCommaPasting = Record[NextIndex++];		bool hasCommaPasting = Record[NextIndex++];
MacroParams.clear();		MacroParams.clear();
unsigned NumArgs = Record[NextIndex++];		unsigned NumArgs = Record[NextIndex++];
for (unsigned i = 0; i != NumArgs; ++i)		for (unsigned i = 0; i != NumArgs; ++i)
[28 lines hidden]	case PP_MACRO_FUNCTION_LIKE: {
++NumMacrosRead;		++NumMacrosRead;
break;		break;
}		}

case PP_TOKEN: {		case PP_TOKEN: {
// If we see a TOKEN before a PP_MACRO_*, then the file is		// If we see a TOKEN before a PP_MACRO_*, then the file is
// erroneous, just pretend we didn't see this.		// erroneous, just pretend we didn't see this.
if (!Macro) break;		if (!Macro) break;
		if (MacroTokens.empty()) {
		Error("unexpected number of macro tokens for a macro in AST file");
		return Macro;
		}

unsigned Idx = 0;		unsigned Idx = 0;
Token Tok = ReadToken(F, Record, Idx);		MacroTokens[0] = ReadToken(F, Record, Idx);
Macro->AddTokenToBody(Tok);		MacroTokens = MacroTokens.drop_front();
break;		break;
}		}
}		}
}		}
}		}

PreprocessedEntityID		PreprocessedEntityID
ASTReader::getGlobalPreprocessedEntityID(ModuleFile &M,		ASTReader::getGlobalPreprocessedEntityID(ModuleFile &M,
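
The reader change above stops appending tokens one at a time and instead fills an array whose size is known before any PP_TOKEN record is read. A minimal, self-contained sketch of that pattern follows. `Tok`, `TokSlots`, `MacroBody`, and `readToken` are hypothetical stand-ins, not clang's types, and a `std::unique_ptr` array stands in for the preprocessor's bump pointer allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for clang::Token; only an ID is stored here.
struct Tok {
  int ID;
};

// Non-owning view of the slots that are still unfilled, similar in spirit to
// the MacroTokens array reference used in the reader.
struct TokSlots {
  Tok *Data;
  std::size_t Size;

  bool empty() const { return Size == 0; }
  Tok &front() {
    assert(!empty() && "no free slot left");
    return *Data;
  }
  TokSlots dropFront() const {
    assert(!empty() && "no free slot left");
    return TokSlots{Data + 1, Size - 1};
  }
};

// Hypothetical macro body that holds exactly as many tokens as announced.
class MacroBody {
  std::unique_ptr<Tok[]> Storage; // assumption: the patch uses a bump allocator
  std::size_t NumTokens = 0;

public:
  // Allocates all slots once and returns a view over them.
  TokSlots allocateTokens(std::size_t N) {
    Storage = std::make_unique<Tok[]>(N);
    NumTokens = N;
    return TokSlots{Storage.get(), N};
  }
  std::size_t size() const { return NumTokens; }
  const Tok &operator[](std::size_t I) const { return Storage[I]; }
};

// Stores one token in the next free slot and shrinks the view. Returns false
// when every announced slot is already filled, which corresponds to the
// "unexpected number of macro tokens" error path in the reader.
bool readToken(TokSlots &Remaining, Tok T) {
  if (Remaining.empty())
    return false;
  Remaining.front() = T;
  Remaining = Remaining.dropFront();
  return true;
}
```

In this sketch the caller keeps the `TokSlots` value returned by `allocateTokens` and passes it to `readToken` once per token record; an extra token is rejected instead of growing the storage.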

clang/lib/Serialization/ASTWriter.cpp

[2,425 unchanged lines hidden; enclosing context: for (unsigned I = 0, N = MacroInfosToEmit.size(); I != N; ++I) {]
assert((Offset >> 32) == 0 && "Macro offset too large");		assert((Offset >> 32) == 0 && "Macro offset too large");
MacroOffsets[Index] = Offset;		MacroOffsets[Index] = Offset;

AddIdentifierRef(Name, Record);		AddIdentifierRef(Name, Record);
AddSourceLocation(MI->getDefinitionLoc(), Record);		AddSourceLocation(MI->getDefinitionLoc(), Record);
AddSourceLocation(MI->getDefinitionEndLoc(), Record);		AddSourceLocation(MI->getDefinitionEndLoc(), Record);
Record.push_back(MI->isUsed());		Record.push_back(MI->isUsed());
Record.push_back(MI->isUsedForHeaderGuard());		Record.push_back(MI->isUsedForHeaderGuard());
		Record.push_back(MI->getNumTokens());
unsigned Code;		unsigned Code;
if (MI->isObjectLike()) {		if (MI->isObjectLike()) {
Code = PP_MACRO_OBJECT_LIKE;		Code = PP_MACRO_OBJECT_LIKE;
} else {		} else {
Code = PP_MACRO_FUNCTION_LIKE;		Code = PP_MACRO_FUNCTION_LIKE;

Record.push_back(MI->isC99Varargs());		Record.push_back(MI->isC99Varargs());
Record.push_back(MI->isGNUVarargs());		Record.push_back(MI->isGNUVarargs());
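
The writer now pushes the token count into the macro record, and the reader consumes it at the same position so it can size the token array up front. The sketch below illustrates only that ordering contract between the two sides. `MacroHeader`, `writeHeader`, `readHeader`, and `NumHeaderFields` are hypothetical names, and the real record also carries an identifier reference and two source locations before these fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified subset of the per-macro fields.
struct MacroHeader {
  bool IsUsed;
  bool UsedForHeaderGuard;
  std::uint64_t NumTokens;
};

using Record = std::vector<std::uint64_t>;

// Number of record slots the simplified header occupies.
const std::size_t NumHeaderFields = 3;

// Writer side: append the fields in a fixed order, with the token count last.
void writeHeader(const MacroHeader &H, Record &R) {
  const std::size_t Before = R.size();
  R.push_back(H.IsUsed ? 1 : 0);
  R.push_back(H.UsedForHeaderGuard ? 1 : 0);
  R.push_back(H.NumTokens);
  assert(R.size() == Before + NumHeaderFields && "writer and reader disagree");
}

// Reader side: consume the fields in exactly the writer's order. Returns false
// and leaves NextIndex untouched if the record is too short.
bool readHeader(const Record &R, std::size_t &NextIndex, MacroHeader &H) {
  if (NextIndex > R.size() || R.size() - NextIndex < NumHeaderFields)
    return false;
  H.IsUsed = R[NextIndex++] != 0;
  H.UsedForHeaderGuard = R[NextIndex++] != 0;
  H.NumTokens = R[NextIndex++];
  return true;
}
```

Because the count travels inside the record, a reader built this way can allocate storage for `NumTokens` entries before decoding any token, at the cost of a format change that both sides must agree on.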

clang/unittests/Lex/CMakeLists.txt

	set(LLVM_LINK_COMPONENTS			set(LLVM_LINK_COMPONENTS
	Support			Support
	)			)

	add_clang_unittest(LexTests			add_clang_unittest(LexTests
	DependencyDirectivesSourceMinimizerTest.cpp			DependencyDirectivesSourceMinimizerTest.cpp
	HeaderMapTest.cpp			HeaderMapTest.cpp
	HeaderSearchTest.cpp			HeaderSearchTest.cpp
	LexerTest.cpp			LexerTest.cpp
	PPCallbacksTest.cpp			PPCallbacksTest.cpp
	PPConditionalDirectiveRecordTest.cpp			PPConditionalDirectiveRecordTest.cpp
				PPMemoryAllocationsTest.cpp
	)			)

	clang_target_link_libraries(LexTests			clang_target_link_libraries(LexTests
	PRIVATE			PRIVATE
	clangAST			clangAST
	clangBasic			clangBasic
	clangLex			clangLex
	clangParse			clangParse
	clangSema			clangSema
	)			)

clang/unittests/Lex/PPMemoryAllocationsTest.cpp

This file was added.

				//===- unittests/Lex/PPMemoryAllocationsTest.cpp - ----------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===--------------------------------------------------------------===//

				#include "clang/Basic/Diagnostic.h"
				#include "clang/Basic/DiagnosticOptions.h"
				#include "clang/Basic/FileManager.h"
				#include "clang/Basic/LangOptions.h"
				#include "clang/Basic/SourceManager.h"
				#include "clang/Basic/TargetInfo.h"
				#include "clang/Basic/TargetOptions.h"
				#include "clang/Lex/HeaderSearch.h"
				#include "clang/Lex/HeaderSearchOptions.h"
				#include "clang/Lex/ModuleLoader.h"
				#include "clang/Lex/Preprocessor.h"
				#include "clang/Lex/PreprocessorOptions.h"
				#include "gtest/gtest.h"

				using namespace clang;

				namespace {

				class PPMemoryAllocationsTest : public ::testing::Test {
				protected:
				PPMemoryAllocationsTest()
				: FileMgr(FileMgrOpts), DiagID(new DiagnosticIDs()),
				Diags(DiagID, new DiagnosticOptions, new IgnoringDiagConsumer()),
				SourceMgr(Diags, FileMgr), TargetOpts(new TargetOptions) {
				TargetOpts->Triple = "x86_64-apple-darwin11.1.0";
				Target = TargetInfo::CreateTargetInfo(Diags, TargetOpts);
				}

				FileSystemOptions FileMgrOpts;
				FileManager FileMgr;
				IntrusiveRefCntPtr<DiagnosticIDs> DiagID;
				DiagnosticsEngine Diags;
				SourceManager SourceMgr;
				LangOptions LangOpts;
				std::shared_ptr<TargetOptions> TargetOpts;
				IntrusiveRefCntPtr<TargetInfo> Target;
				};

				TEST_F(PPMemoryAllocationsTest, PPMacroDefinesAllocations) {
				std::string Source;
				size_t NumMacros = 1000000;
				{
				llvm::raw_string_ostream SourceOS(Source);

				// Create a combination of 1 or 3 token macros.
				for (size_t I = 0; I < NumMacros; ++I) {
				SourceOS << "#define MACRO_ID_" << I << " ";
				if ((I % 2) == 0)
				SourceOS << "(" << I << ")";
				else
				SourceOS << I;
				SourceOS << "\n";
				}
				}

				std::unique_ptr<llvm::MemoryBuffer> Buf =
				llvm::MemoryBuffer::getMemBuffer(Source);
				SourceMgr.setMainFileID(SourceMgr.createFileID(std::move(Buf)));

				TrivialModuleLoader ModLoader;
				HeaderSearch HeaderInfo(std::make_shared<HeaderSearchOptions>(), SourceMgr,
				Diags, LangOpts, Target.get());
				Preprocessor PP(std::make_shared<PreprocessorOptions>(), Diags, LangOpts,
				SourceMgr, HeaderInfo, ModLoader,
				/*IILookup =*/nullptr,
				/*OwnsHeaderSearch =*/false);
				PP.Initialize(*Target);
				PP.EnterMainSourceFile();

				while (1) {
				Token tok;
				PP.Lex(tok);
				if (tok.is(tok::eof))
				break;
				}

				size_t NumAllocated = PP.getPreprocessorAllocator().getBytesAllocated();
				float BytesPerDefine = float(NumAllocated) / float(NumMacros);
				llvm::errs() << "Num preprocessor allocations for " << NumMacros
				<< " #define: " << NumAllocated << "\n";
				llvm::errs() << "Bytes per #define: " << BytesPerDefine << "\n";
				// On arm64-apple-macos, we get around 120 bytes per define.
				// Assume a reasonable upper bound based on that number that we don't want
				// to exceed when storing information about a macro #define with 1 or 3
				// tokens.
				EXPECT_LT(BytesPerDefine, 130.0f);
				}

				} // anonymous namespace
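
The check at the end of the test is plain arithmetic: total allocator bytes divided by the number of macros, compared against a 130-byte bound. As a worked illustration, the sketch below applies the same computation to the two totals quoted in the revision summary for arm64 macOS (272007616 bytes before the change, 120002016 bytes after, each for 1000000 macros). `bytesPerDefine` and `withinBudget` are hypothetical helper names, not part of the patch.

```cpp
#include <cassert>
#include <cstddef>

// Average allocator bytes per macro, computed the same way as in the test.
float bytesPerDefine(std::size_t NumAllocated, std::size_t NumMacros) {
  assert(NumMacros != 0 && "need at least one macro");
  return float(NumAllocated) / float(NumMacros);
}

// Hypothetical helper mirroring EXPECT_LT(BytesPerDefine, 130.0f).
bool withinBudget(std::size_t NumAllocated, std::size_t NumMacros,
                  float Limit = 130.0f) {
  return bytesPerDefine(NumAllocated, NumMacros) < Limit;
}
```

With the summary's figures, the pre-change total works out to roughly 272 bytes per `#define` and fails the bound, while the post-change total works out to roughly 120 bytes and passes it. Other hosts may produce different totals, so the bound is an assumption tied to the platform measured in the summary.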

Diff 408070