This is an archive of the discontinued LLVM Phabricator instance.


Differential D83296

[clang-format] Add a MacroExpander.
ClosedPublic

Authored by klimek on Jul 7 2020, 4:54 AM.

Details

Reviewers

sammccall

Commits

rGe336b74c995d: [clang-format] Add a MacroExpander.

Summary

The MacroExpander allows expanding simple (non-recursive) macro
definitions from a macro identifier token and macro arguments. It
annotates the tokens with a newly introduced MacroContext that keeps
track of the role a token played in expanding the macro in order to
be able to reconstruct the macro expansion from an expanded (formatted)
token stream.

Made Token explicitly copy-able to enable copying tokens from the parsed
macro definition.
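
The summary can be illustrated with a toy model. This is a minimal sketch, not the patch's implementation: `Tok`, `Role`, `Macro`, `expand` and `visibleText` are hypothetical stand-ins that work on plain strings rather than on `FormatToken`. It shows body tokens being tagged as hidden and substituted argument tokens being tagged as expanded arguments, which is the information needed to map the expanded stream back to the original call.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are not clang-format types.
enum class Role { Hidden, ExpandedArg };

struct Tok {
  std::string Text;
  Role R;
};

struct Macro {
  std::vector<std::string> Params;
  std::vector<std::string> Body;
};

// Expands one call of M. Args[i] holds the tokens of the i-th argument.
// Body tokens become Hidden; tokens substituted from an argument become
// ExpandedArg. A parameter without a matching argument expands to nothing.
std::vector<Tok> expand(const Macro &M,
                        const std::vector<std::vector<std::string>> &Args) {
  std::map<std::string, std::size_t> Index;
  for (std::size_t I = 0; I < M.Params.size(); ++I)
    Index[M.Params[I]] = I;

  std::vector<Tok> Out;
  for (const std::string &T : M.Body) {
    auto It = Index.find(T);
    if (It == Index.end()) {
      Out.push_back({T, Role::Hidden});
      continue;
    }
    if (It->second < Args.size())
      for (const std::string &A : Args[It->second])
        Out.push_back({A, Role::ExpandedArg});
  }
  return Out;
}

// Joins the tokens that were visible in the macro call, dropping Hidden ones.
std::string visibleText(const std::vector<Tok> &Toks) {
  std::string S;
  for (const Tok &T : Toks) {
    if (T.R != Role::ExpandedArg)
      continue;
    if (!S.empty())
      S += ' ';
    S += T.Text;
  }
  return S;
}
```

With `A(X) = [ X ]` and the call `A(a + b)`, the sketch yields five tokens: `[` and `]` are hidden, while `a`, `+` and `b` keep their link to the call's argument.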

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

klimek created this revision. Jul 7 2020, 4:54 AM

Herald added a project: Restricted Project. Jul 7 2020, 4:54 AM

Herald added subscribers: cfe-commits, mgorny.

MyDeveloperDay added a project: Restricted Project. Jul 7 2020, 5:32 AM

Harbormaster failed remote builds in B63182: Diff 275998! Jul 7 2020, 7:16 AM

Monday-morning ping.

In D83296#2146870, @klimek wrote:

Monday-morning ping.

Thanks for the reminder here... however this is taking me a bit to get my head around, and we've got a release branch cut scheduled for a couple of days that we're trying to polish for.
AFAICT there's significant followup work still needed to make use of this - are you wanting this to land in the 11 release? Else i'd probably come back to this after the cut...

In D83296#2146970, @sammccall wrote:

Thanks for the reminder here... however this is taking me a bit to get my head around, and we've got a release branch cut scheduled for a couple of days that we're trying to polish for.
AFAICT there's significant followup work still needed to make use of this - are you wanting this to land in the 11 release? Else i'd probably come back to this after the cut...

Sorry, this is not urgent - the 11 cut has clear priority. My default is to ping stuff every week unless somebody tells me they prefer me not doing that :)

MyDeveloperDay added a subscriber: MyDeveloperDay. Jul 13 2020, 6:06 AM

MyDeveloperDay added inline comments.

clang/lib/Format/MacroExpander.cpp
151	elide braces?
clang/unittests/Format/MacroExpanderTest.cpp
56	when these assertions fail you have no idea which of the various calls is actually failing; how about passing in `__FILE__`, `__LINE__` and then adding that to the output?
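
The suggestion above can be sketched as follows. This is an assumption-laden illustration, not the helper from the patch: `checkTokens` and `CHECK_TOKENS` are hypothetical names, and the comparison works on plain strings. The point is only that a macro wrapper captures the caller's location, so every failing call reports its own line. In GoogleTest-based tests, `SCOPED_TRACE` serves a similar purpose.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not the one from the patch: compares two token lists
// and returns a failure message prefixed with the caller's location.
// Returns an empty string when the lists are equal.
std::string checkTokens(const char *File, int Line,
                        const std::vector<std::string> &Expected,
                        const std::vector<std::string> &Actual) {
  if (Expected == Actual)
    return "";
  std::ostringstream OS;
  OS << File << ":" << Line << ": token mismatch, expected "
     << Expected.size() << " tokens, got " << Actual.size();
  return OS.str();
}

// The macro expands at the call site, so __FILE__ and __LINE__ identify the
// failing call rather than the shared helper.
#define CHECK_TOKENS(Expected, Actual)                                         \
  checkTokens(__FILE__, __LINE__, (Expected), (Actual))
```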

MyDeveloperDay added inline comments. Jul 13 2020, 6:06 AM

clang/unittests/Format/MacroExpanderTest.cpp
6	are you using this?

Sorry for the long delay, I've made up for it with extra comments :-\

This looks really well-thought-out and I'm rationalizing my pickiness as:

this is conceptually complicated
I expect this code to live a long time and be read and "modified around" by lots of people

Some of the comments/requests for doc might strictly be more in scope for in later patches (documenting functionality that doesn't exist yet). Those docs would help *me* now but happy if you'd rather briefly explain and add them later.

clang/lib/Format/FormatToken.h
168	this is a great example, it might be a little more clear with more distinct chars and some vertical alignment:
	Given X(A)=[A], Y(A)=<A>, X({ Y(0) } ) expands as
	                  [ { < 0 > } ]
	StartOfExpansion  1   1
	ExpandedFrom[0]   X X X X X X X
	ExpandedFrom[1]       Y Y Y
	You could extend this to cover all the fields and hoist it to be a comment on MacroContext if you like, I think the concreteness helps.
178	why the asymmetry between start/end? given `ID(x)=X`, `ID(ID(0))` yields `0` which starts and ends two expansions, right? Consider making them both integer, even if you don't need it at this point. (also 64 bits, really?)
185	this isn't used in this patch - can we leave it out until used?
447	if you're not extremely concerned about memory layout, I'd consider making this an `Optional<MacroContext>` with nullopt replacing the current MR_None. This reduces the number of implicit invariants (AIUI MR_None can't be combined with any other fields being set) and means the name MacroContext more closely fits the thing it's modeling.
702	const. I guess it doesn't matter, but copyFrom would seem a little less weird to me in an OOP/encapsulation sense. I do like this explicit form rather than clone() + move constructor though, as pointer identity is pretty important for tokens.
clang/lib/Format/MacroExpander.cpp
36	Tokens -> Expansion? (semantics rather than type)
39	Dmitri gave a tech talk on dropping comments like these :-)
43	who's responsible for establishing this? AIUI this will fail if e.g. `Macros` contains a string that contains only whitespace, which is a slightly weird precondition.
64	assert instead? Caller checks this
82	this assumes the expansion is nonempty, which the grammar doesn't. while{} instead?
114	weird param name!
117	This is a slightly spooky buffer name - it's the magic name the PP uses for pasted tokens. A closer fit for config is maybe "<command line>" (like macro definitions passed with `-D`). Is it necessary to use one of clang's magic buffer names at all? If so, comment! Else maybe "<clang-format style>" or something?
134	is the caller responsible for checking the #args matches #params? If so, document and assert here? Looking at the implementation, it seems you don't expand if there are too few args, and expand if there are too many args (ignoring the last ones). Maybe it doesn't matter, but it'd be nice to be more consistent here. (Probably worth calling out somewhere explicitly that variadic macros are not supported)
142	This doesn't depend on args, so we could compute this mapping when the Definition is constructed and encapsulate it there. (Maybe performance doesn't matter, I'd also find this a little clearer. But if the allocation doesn't matter, we shouldn't be using SmallVector...)
168	skip the parameter -> treat the parameter as empty? (My first guess was this meant given `ID(X)=X`, `ID()` would expand to `X`.)
186	nit: Result
190	"tokens that were not part of the macro argument" --> "tokens from the macro body"?
195	(I don't know exactly how this is used, but consider whether you mean "do not need to", "should not" or "cannot" here)
199	this threw me for a loop... it's EOF right? It's not explicitly mentioned, so maybe either add a comment or `&& Result.back()->is(tok::eof)`. This makes the `size-2` less cryptic too.
201	Why not set StartOfExpansion in the same way, to avoid tracking the `First` state?
clang/lib/Format/MacroExpander.h
10 ↗	(On Diff #275998)	I think this comment is too short (and doesn't really say much that you can't get from the filename). IME many people's mental model of macros is based on how they're used rather than how they formally work, so I think it's worth spelling out even the obvious implementation choices. I'd like to see:
	- rough description of the scope/goal of the feature (clang-format doesn't see macro definitions, so macro uses tend to be pseudo-parsed as function calls. When this isn't appropriate [example], a macro definition can be provided as part of the style. When such a macro is used in the code, clang-format will expand it as the preprocessor would before determining the role of tokens. [example])
	- explicitly call out here that only a single level of expansion is supported, which is a divergence from the real preprocessor. (This influences both the implementation and the mental model)
	- what the MacroExpander does and how it relates to MacroContext. I think this should give the input and output token streams names, as it's often important to clearly distinguish the two. (TokenBuffer uses "Spelled tokens" and "Expanded tokens" for this, which is not the same as how these terms are used in SourceManager, but related and hasn't been confusing in practice).
	- a rough sketch of how the mismatch between what was annotated and what must be formatted is resolved e.g. -- just guessing here -- the spelled stream is formatted but for macro args, the annotations from the expanded stream are used.
	(I'm assuming this is the best file to talk about the implementation of this feature in general - I'm really hoping that there is such a file. If there are a bunch of related utilities you might even consider renaming the header as an umbrella to make them easier to document. This is a question of taste...)
40 ↗	(On Diff #275998)	You define "simple" in the patch description as non-recursive - can you just say "non-recursive" here? Or better spell out what that means (no macro can refer to another macro)
44 ↗	(On Diff #275998)	nit: I think `using` is usually considered more readable
49 ↗	(On Diff #275998)	This seems to precisely define the grammar but leave me guessing as to the semantics. I'd at least suggest 'exp' -> 'expansion'. Personally I'm partial to examples instead :-)
50 ↗	(On Diff #275998)	"PI 3.14" or "NOT(X) !X" seems much less familiar than "PI=3.14" or "NOT(X)=!X" as accepted by `-D`. It also resolves an ambiguity: is "DISCARD_ERROR ( void ) ( err )" an object or function macro? The extra redundancy in the grammar should make it easier to detect errors too.
51 ↗	(On Diff #275998)	this grammar allows object macros, but disallows function macros with no args. intended? (FWIW this grammar also allows "ID(X X": an object macro "ID" which expands to "(X X". But the implementation, probably correctly, doesn't)
54 ↗	(On Diff #275998)	The signature doesn't allow errors to be reported, which is a little unfortunate but seems hard to fix properly (so that errors are reported when the config is parsed) - the "style is a simple struct" is hard to reconcile with this data structure. Silent discard on error should probably be mentioned in the constructor.
56 ↗	(On Diff #275998)	Why are the macro definitions in an arbitrary specified encoding? I'd hope by the time we've parsed the config, our strings are UTF-8. (On disk, YAML can be UTF-16 per spec, but...)
62 ↗	(On Diff #275998)	const (and an expand)
66 ↗	(On Diff #275998)	(I can't see the real callsite but...) If we care about performance here, is 8 a little small? should we have a `vector &Out` instead?
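
The `NAME(params)=body` form requested in the comments above removes the object-like versus function-like ambiguity, because everything before the first `=` is the signature. A minimal sketch under stated assumptions: `ParsedMacro`, `trim` and `parseMacro` are hypothetical, they operate on raw strings rather than lexed tokens, and they do not validate identifier characters. Unlike the patch as described later in the thread, this sketch rejects a trailing comma in the parameter list.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical result type for illustration; not the patch's Definition.
struct ParsedMacro {
  std::string Name;
  std::vector<std::string> Params;
  std::string Body;
  bool ObjectLike = true;
};

static std::string trim(const std::string &S) {
  const char *WS = " \t";
  std::size_t B = S.find_first_not_of(WS);
  if (B == std::string::npos)
    return "";
  std::size_t E = S.find_last_not_of(WS);
  return S.substr(B, E - B + 1);
}

// Parses "NAME=body", "NAME(A,B)=body", or a bare "NAME" / "NAME(A,B)".
// Returns std::nullopt on malformed input instead of guessing.
std::optional<ParsedMacro> parseMacro(const std::string &Text) {
  std::size_t Eq = Text.find('=');
  std::string Sig = trim(Text.substr(0, Eq)); // whole text if there is no '='
  ParsedMacro M;
  if (Eq != std::string::npos)
    M.Body = trim(Text.substr(Eq + 1));

  std::size_t Open = Sig.find('(');
  if (Open == std::string::npos) {
    M.Name = Sig;
  } else {
    if (Sig.back() != ')')
      return std::nullopt; // e.g. "ID(X" has no closing parenthesis
    M.Name = trim(Sig.substr(0, Open));
    M.ObjectLike = false;
    std::string List = trim(Sig.substr(Open + 1, Sig.size() - Open - 2));
    std::size_t Start = 0;
    while (!List.empty() && Start <= List.size()) {
      std::size_t Comma = List.find(',', Start);
      if (Comma == std::string::npos)
        Comma = List.size();
      std::string P = trim(List.substr(Start, Comma - Start));
      if (P.empty())
        return std::nullopt; // empty parameter name, e.g. "F(A,)"
      M.Params.push_back(P);
      Start = Comma + 1;
    }
  }
  if (M.Name.empty())
    return std::nullopt;
  return M;
}
```

With this shape, `NOT(X)=!X` parses as a function-like macro with one parameter, `PI=3.14` as an object-like macro, and `ID(X` is rejected rather than silently treated as an object-like macro named `ID`.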

curdeius added a subscriber: curdeius. Jul 22 2020, 2:18 PM

curdeius added inline comments.

clang/lib/Format/MacroExpander.cpp
47	Nit: typo "corresponding Definition".
111	Why isn't it defaulted?

Addressed code review comments.

Harbormaster completed remote builds in B66219: Diff 281611. Jul 29 2020, 8:35 AM

klimek added inline comments. Jul 29 2020, 8:37 AM

clang/lib/Format/MacroExpander.cpp
36	Changed to "Body".
82	I have no clue how this ever worked tbh O.O Has been reworked as part of the move to use = to separate the macro signature from the body.
114	Copy-paste gone wrong I assume.
117	We need source locations, and apparently only: <built-in>, <inline asm> and <scratch space> are allowed to have source locations.
134	Added docs in the class comment for MacroExpander. (so far I always expand, too few -> empty, too many -> ignore)
195	Replaced with "are not".
clang/lib/Format/MacroExpander.h
10 ↗	(On Diff #275998)	Moved to Macros.h and added file comment that tries to address all of these.
66 ↗	(On Diff #275998)	I don't think we particularly care about performance here, but the llvm docs say I should use SmallVector. Happy to bump down to 0 if you feel that the magic 8 is a problem here as a gut-feeling premature optimization. https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h
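
The arity policy described in the reply above ("too few -> empty, too many -> ignore") can be expressed in one step. A minimal sketch, assuming arguments are modeled as vectors of strings; `bindArgs` is a hypothetical name and not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Normalizes the argument list to exactly NumParams entries.
// std::vector::resize drops surplus arguments and appends empty ones for
// missing arguments, which matches "too many -> ignore, too few -> empty".
std::vector<std::vector<std::string>>
bindArgs(std::size_t NumParams, std::vector<std::vector<std::string>> Args) {
  Args.resize(NumParams);
  return Args;
}
```

After this normalization, substitution code can index `Args[I]` for every parameter without a bounds check. Whether silently accepting a mismatched call is desirable is exactly the design question raised in the review.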

JakeMerdichAMD added a subscriber: JakeMerdichAMD. Aug 4 2020, 7:48 AM

Somehow I missed the email from your replies.

Mostly nits that you can take or leave, but I think potential bugs around functionlike-vs-objectlike and multiple-expansion of args.

clang/lib/Format/FormatToken.h
179	"context" is often pretty vague - "MacroSource" isn't a brilliant name but at least seems to hint at the direction (that the FormatToken is the expanded token and the MacroSource gives information about what it was expanded from) I don't feel strongly about this though, up to you.
705	nit: comment -> copyFrom
clang/lib/Format/MacroExpander.cpp
2	nit: banner is for wrong filename
82	this accepts `FOO(A,B,)=...` as equivalent to `FOO(A,B)=...`. Not sure if worth fixing.
89	(nit: I'd probably find this easier to follow as `if (equal) else if (eof) else` with parseTail inlined, but up to you)
132	uber-nit: seems like this loop belongs in the caller
143	nit: this is a copy for what seems like no reason - move `Parser.parse()` inline to this line?
156	lookup() returns a value, so this is a copy (with lifetime-extension) I think you want `*find`
179	please use a different name for this variable, or the parameter it shadows, or preferably both!
180	nit: "part of a macro argument at multiple levels"? (Current text suggests to me that it can be arg 0 and arg 1 of the same macro)
187	you're pushing here without copying. This means the original tokens from the ArgsList are mutated. Maybe we own them, but this seems at least wrong for multiple expansion of the same arg. e.g.
	#define M(X,Y) X Y X
	M(1,2)
	Will expand to:
	1, ExpandedArg, ExpandedFrom = [M, M] // should just be one M
	2, ExpandedArg, ExpandedFrom = [M]
	1, ExpandedArg, ExpandedFrom = [M, M] // this is the same token pointer as the first one
	Maybe it would be better if pushToken performed the copy, and returned a mutable pointer to the copy. (If you can make the input const, that would prevent this type of bug)
clang/lib/Format/Macros.h
83	Is this saying that the functionlike vs objectlike distinction is not preserved? This doesn't seem safe (unless the caller is required to retain this information). e.g.
	#define NUMBER int
	using Calculator = NUMBER(); // does expansion consume () or not?
87	(Seems a little odd that these pointers to external FormatTokens aren't const... I can believe there's a reason though)
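
The multiple-expansion problem raised for line 187 comes down to pointer aliasing. The following is a minimal sketch with hypothetical types (`Tok`, `expandShared`, `expandCopied`), not the patch's code: when the same argument token pointer is pushed twice and annotated in place, the annotation is recorded twice on one object, whereas copying gives each use its own token.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical token type; ExpandedFrom lists the macros it was expanded from.
struct Tok {
  std::string Text;
  std::vector<std::string> ExpandedFrom;
};

// Aliasing variant: annotates the argument token itself. Uses[i] is the index
// of the argument substituted at the i-th parameter use in the macro body.
std::vector<Tok *> expandShared(const std::string &MacroName,
                                const std::vector<std::size_t> &Uses,
                                const std::vector<Tok *> &Args) {
  std::vector<Tok *> Out;
  for (std::size_t I : Uses) {
    Args[I]->ExpandedFrom.push_back(MacroName); // mutates the caller's token
    Out.push_back(Args[I]);
  }
  return Out;
}

// Copying variant: every use gets a fresh token, the inputs stay untouched.
std::vector<std::unique_ptr<Tok>>
expandCopied(const std::string &MacroName,
             const std::vector<std::size_t> &Uses,
             const std::vector<Tok *> &Args) {
  std::vector<std::unique_ptr<Tok>> Out;
  for (std::size_t I : Uses) {
    auto Copy = std::make_unique<Tok>(*Args[I]);
    Copy->ExpandedFrom.push_back(MacroName);
    Out.push_back(std::move(Copy));
  }
  return Out;
}
```

For `M(X,Y) X Y X` called as `M(1,2)` (uses `{0, 1, 0}`), the aliasing variant leaves the token `1` with `ExpandedFrom = [M, M]`, as in the comment above; the copying variant produces two distinct tokens with one entry each.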

klimek added inline comments. Aug 19 2020, 9:26 AM

clang/lib/Format/MacroExpander.cpp
187	Ugh. I'll need to take a deeper look, but generally, the problem is we don't want to copy - we're mutating the data of the token while formatting the expanded token stream, and then re-use that info when formatting the original stream. We could copy, add a reference to the original token, and then have a step that adapts in the end, and perhaps that's cleaner overall anyway, but will be quite a change. The alternative is that I'll look into how to specifically handle double-expansion (or ... forbid it).

sammccall added inline comments. Aug 19 2020, 10:12 AM

clang/lib/Format/MacroExpander.cpp
187	(or ... forbid it).
	I'm starting to think this is the best option. The downsides seem pretty acceptable to me:
	- it's another wart to document: on the other hand it simplifies the conceptual model, I think it helps users understand the deeper behavior
	- some macros require simplification rather than supplying the actual definition: already crossed this bridge by not supporting macros in macro bodies, variadics, pasting...
	- loses information: one expansion is enough to establish which part of the grammar the arguments form in realistic cases. (Even in pathological cases, preserving the conflicting info only helps you if you have a plan to resolve the conflicts)
	it's another wart to document: Are there any others?

klimek added inline comments. Aug 20 2020, 2:32 AM

clang/lib/Format/MacroExpander.cpp
187	My main concern is that it's probably the most surprising feature to not support.

Just checking this is waiting on you rather than me...

If the multiple-expansion thing is blocking progress, I think we're much better off getting a limited version of this feature landed than losing momentum trying to solve it.

Worked in review comments.

clang/lib/Format/FormatToken.h
179	MacroSource sounds like it is about the macro source (i.e. the tokens written for the macro). I'd be happy to rename to MacroExpansion or MacroExpansionInfo or somesuch if you think that helps?
clang/lib/Format/MacroExpander.cpp
82	We're generally accepting too much; I'd either want to restrict it fully, or basically be somewhat minimum/forgiving. Given that we can't get errors back to the user, I was aiming for the latter.
89	I basically like having the implementation match the BNF. That said, not feeling strongly about it. You're saying you'd duplicate the Def.Body.push_back in the if (eof)?
	if (Current->is(tok::equal)) {
	  nextToken();
	  // inline parseTail
	} else if (Current->is(tok::eof)) {
	  Def.Body.push_back(Current);
	} else {
	  return false;
	}
	Generally, I personally find it easier to read the early exit.
143	Reason is that we need the name.
187	Forbade multi-expansion.
clang/lib/Format/Macros.h
83	Fixed. I thought we'd get away without it, but it's simple enough to fix and we have enough surprises as is.
87	We modify the tokens by adding the macro context.

Harbormaster completed remote builds in B72274: Diff 292970. Sep 19 2020, 10:27 AM

Ship it!

clang/lib/Format/FormatToken.h
179	Oops, accidental "source" pun. MacroOrigin would be another name in the same spirit. But MacroExpansion sounds good to me, too.
clang/lib/Format/MacroExpander.cpp
143	oops, right. std::move() the RHS? (mostly I just find the copies surprising, so draws attention)
191	nit: this is confusingly a const reference to a non-const pointer... `auto *` or `FormatToken *`?
clang/unittests/Format/MacroExpanderTest.cpp
184	may want a test that uses of an arg after the first are not expanded, because that "guards" a bunch of nasty potential bugs
clang/unittests/Format/TestLexer.h
16	I guess clang-tidy wants ..._TESTLEXER_H here
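
The behavior the test comment above refers to, that uses of an argument after the first are not expanded, can be sketched with a set of already-expanded parameters. This is a hedged illustration on plain strings; `expandOnce` is a hypothetical name, and the real implementation works on `FormatToken` pointers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Substitutes each parameter only at its first use in the body. Later uses
// of the same parameter are left in place as ordinary body tokens, so no
// argument token ever appears twice in the output.
std::vector<std::string>
expandOnce(const std::vector<std::string> &Params,
           const std::vector<std::string> &Body,
           const std::vector<std::vector<std::string>> &Args) {
  std::set<std::string> Expanded;
  std::vector<std::string> Out;
  for (const std::string &T : Body) {
    auto P = std::find(Params.begin(), Params.end(), T);
    bool IsParam = P != Params.end();
    // insert() returns false in .second if T was already expanded once.
    if (!IsParam || !Expanded.insert(T).second) {
      Out.push_back(T);
      continue;
    }
    std::size_t I = static_cast<std::size_t>(P - Params.begin());
    if (I < Args.size())
      Out.insert(Out.end(), Args[I].begin(), Args[I].end());
  }
  return Out;
}
```

For `M(X,Y) X Y X` called as `M(1,2)`, the sketch yields `1 2 X`: the second `X` stays a body token, which sidesteps the aliasing issue discussed earlier in the thread at the cost of not modeling the full preprocessor behavior.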

This revision is now accepted and ready to land. Sep 22 2020, 11:33 AM

This revision was landed with ongoing or failed builds. Sep 25 2020, 5:09 AM

Closed by commit rGe336b74c995d: [clang-format] Add a MacroExpander. (authored by klimek).

This revision was automatically updated to reflect the committed changes.

klimek marked 5 inline comments as done.

klimek added a commit: rGe336b74c995d: [clang-format] Add a MacroExpander..

What does this change mean for users of clang-format -- better formatting of complicated (e.g. multi-line) macro invocations?

In D83296#2299062, @nridge wrote:

What does this change mean for users of clang-format -- better formatting of complicated (e.g. multi-line) macro invocations?

Nothing from this change is exposed yet, it's part of a series.
The end goal is as you say: perfect formatting of code in and around arbitrary macros, by passing (simplified) macro definitions as configuration.

D88299 is next, Manuel assures me it gets easier from there :-)

In D83296#2299062, @nridge wrote:

What does this change mean for users of clang-format -- better formatting of complicated (e.g. multi-line) macro invocations?

In addition to what Sam said, this also attempts to be an improvement in maintainability. Given this is a fairly complex change, you might ask how this helps :)
The idea is that we bundle the complexity of macro handling in a clearly separated part of the code that can be tested and developed ~on its own.
Currently, we have multiple macro regex settings that then lead to random code throughout clang-format that tries to handle those identifiers specially.
Once this is done, we can delete all those settings, as the more generalized macro configuration will supersede them.

clang/lib/Format/MacroExpander.cpp
191	Yikes, thanks for catching!
clang/unittests/Format/MacroExpanderTest.cpp
184	Discussed offline: the above test tests exactly this.

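Revision Contents

Path	Size
clang/lib/Format/CMakeLists.txt	1 line
clang/lib/Format/FormatToken.h	76 lines
clang/lib/Format/MacroExpander.cpp	225 lines
clang/lib/Format/Macros.h	141 lines
clang/unittests/Format/CMakeLists.txt	1 line
clang/unittests/Format/MacroExpanderTest.cpp	187 lines
clang/unittests/Format/TestLexer.h	88 lines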

Diff 294285

clang/lib/Format/CMakeLists.txt

 set(LLVM_LINK_COMPONENTS support)

 add_clang_library(clangFormat
   AffectedRangeManager.cpp
   BreakableToken.cpp
   ContinuationIndenter.cpp
   Format.cpp
   FormatToken.cpp
   FormatTokenLexer.cpp
+  MacroExpander.cpp
   NamespaceEndCommentsFixer.cpp
   SortJavaScriptImports.cpp
   TokenAnalyzer.cpp
   TokenAnnotator.cpp
   UnwrappedLineFormatter.cpp
   UnwrappedLineParser.cpp
   UsingDeclarationsSorter.cpp
   WhitespaceManager.cpp

   LINK_LIBS
   clangBasic
   clangLex
   clangToolingCore
   clangToolingInclusions
   )

clang/lib/Format/FormatToken.h

[... 130 unchanged lines not shown ...]
 // Represents what type of block a set of braces open.
 enum BraceBlockKind { BK_Unknown, BK_Block, BK_BracedInit };

 // The packing kind of a function's parameters.
 enum ParameterPackingKind { PPK_BinPacked, PPK_OnePerLine, PPK_Inconclusive };

 enum FormatDecision { FD_Unformatted, FD_Continue, FD_Break };

+/// Roles a token can take in a configured macro expansion.
+enum MacroRole {
+  /// The token was expanded from a macro argument when formatting the expanded
+  /// token sequence.
+  MR_ExpandedArg,
+  /// The token is part of a macro argument that was previously formatted as
+  /// expansion when formatting the unexpanded macro call.
+  MR_UnexpandedArg,
+  /// The token was expanded from a macro definition, and is not visible as part
+  /// of the macro call.
+  MR_Hidden,
+};
+
+struct FormatToken;
+
+/// Contains information on the token's role in a macro expansion.
+///
+/// Given the following definitions:
+/// A(X) = [ X ]
+/// B(X) = < X >
+/// C(X) = X
+///
+/// Consider the macro call:
+/// A({B(C(C(x)))}) -> [{<x>}]
+///
+/// In this case, the tokens of the unexpanded macro call will have the
+/// following relevant entries in their macro context (note that formatting
+/// the unexpanded macro call happens after formatting the expanded macro
+/// call):
+///                   A( { B( C( C(x) ) ) } )
+/// Role:             NN U NN NN NNUN N N U N  (N=None, U=UnexpandedArg)
+///
+///                   [  { <       x    > } ]
+/// Role:             H  E H       E    H E H  (H=Hidden, E=ExpandedArg)
+/// ExpandedFrom[0]:  A  A A       A    A A A
+/// ExpandedFrom[1]:       B       B    B
+/// ExpandedFrom[2]:               C
+/// ExpandedFrom[3]:               C
+/// StartOfExpansion: 1  0 1       2    0 0 0
+/// EndOfExpansion:   0  0 0       2    1 0 1
+struct MacroExpansion {
+  MacroExpansion(MacroRole Role) : Role(Role) {}
+
+  /// The token's role in the macro expansion.
+  /// When formatting an expanded macro, all tokens that are part of macro
+  /// arguments will be MR_ExpandedArg, while all tokens that are not visible in
+  /// the macro call will be MR_Hidden.
+  /// When formatting an unexpanded macro call, all tokens that are part of
+  /// macro arguments will be MR_UnexpandedArg.
+  MacroRole Role;
+
+  /// The stack of macro call identifier tokens this token was expanded from.
+  llvm::SmallVector<FormatToken *, 1> ExpandedFrom;
+
+  /// The number of expansions of which this macro is the first entry.
+  unsigned StartOfExpansion = 0;
+
+  /// The number of currently open expansions in \c ExpandedFrom this macro is
+  /// the last token in.
+  unsigned EndOfExpansion = 0;
+};

 class TokenRole;
 class AnnotatedLine;

 /// A wrapper around a \c Token storing information about the
 /// whitespace characters preceding it.
 struct FormatToken {
   FormatToken()
       : HasUnescapedNewline(false), IsMultiline(false), IsFirst(false),
[... 11 unchanged lines not shown; context: struct FormatToken { ...]
   /// The raw text of the token.
   ///
   /// Contains the raw token text without leading whitespace and without leading
   /// escaped newlines.
   StringRef TokenText;

   /// A token can have a special role that can carry extra information
   /// about the token's formatting.
-  std::unique_ptr<TokenRole> Role;
+  /// FIXME: Make FormatToken for parsing and AnnotatedToken two different
+  /// classes and make this a unique_ptr in the AnnotatedToken class.
+  std::shared_ptr<TokenRole> Role;

   /// The range of the whitespace immediately preceding the \c Token.
   SourceRange WhitespaceRange;

   /// Whether there is at least one unescaped newline before the \c
   /// Token.
   unsigned HasUnescapedNewline : 1;

[... 198 unchanged lines not shown; context: public: ...]

   /// The next token in the unwrapped line.
   FormatToken *Next = nullptr;

   /// If this token starts a block, this contains all the unwrapped lines
   /// in it.
   SmallVector<AnnotatedLine *, 1> Children;

+  // Contains all attributes related to how this token takes part
+  // in a configured macro expansion.
+  llvm::Optional<MacroExpansion> MacroCtx;

   bool is(tok::TokenKind Kind) const { return Tok.is(Kind); }
   bool is(TokenType TT) const { return getType() == TT; }
   bool is(const IdentifierInfo *II) const {
     return II && II == Tok.getIdentifierInfo();
   }
   bool is(tok::PPKeywordKind Kind) const {
     return Tok.getIdentifierInfo() &&
            Tok.getIdentifierInfo()->getPPKeywordID() == Kind;
[... 237 unchanged lines not shown; context: const FormatToken *getNamespaceToken() const { ...]
     if (NamespaceTok && NamespaceTok->isOneOf(tok::kw_inline, tok::kw_export))
       NamespaceTok = NamespaceTok->getNextNonComment();
     return NamespaceTok &&
                    NamespaceTok->isOneOf(tok::kw_namespace, TT_NamespaceMacro)
                ? NamespaceTok
                : nullptr;
   }

+  void copyFrom(const FormatToken &Tok) { *this = Tok; }

 private:
-  // Disallow copying.
+  // Only allow copying via the explicit copyFrom method.
   FormatToken(const FormatToken &) = delete;
-  void operator=(const FormatToken &) = delete;
+  FormatToken &operator=(const FormatToken &) = default;

   template <typename A, typename... Ts>
   bool startsSequenceInternal(A K1, Ts... Tokens) const {
     if (is(tok::comment) && Next)
       return Next->startsSequenceInternal(K1, Tokens...);
     return is(K1) && Next && Next->startsSequenceInternal(Tokens...);
   }

[... 473 unchanged lines not shown ...]
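
The copying changes in this file follow a recognizable pattern: the copy constructor stays deleted, the copy assignment operator becomes private and defaulted, and a public `copyFrom` is the only way to copy. A minimal standalone sketch of that pattern, with hypothetical `Token` and `Role` types rather than the real `FormatToken`. It also shows why the `Role` member has to change type: with a `std::unique_ptr` member the defaulted copy assignment would be defined as deleted, while `std::shared_ptr` is copy-assignable.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical stand-in for TokenRole.
struct Role {
  int Kind = 0;
};

class Token {
public:
  Token() = default;

  std::string Text;
  // shared_ptr keeps the defaulted operator= usable; the copy then shares the
  // Role object with the original instead of deep-copying it.
  std::shared_ptr<Role> R;

  // The only supported way to copy: explicit and visible at the call site.
  void copyFrom(const Token &Other) { *this = Other; }

private:
  // Only allow copying via the explicit copyFrom method.
  Token(const Token &) = delete;
  Token &operator=(const Token &) = default;
};
```

Outside the class, `Token B = A;` and `B = A;` both fail to compile, while `B.copyFrom(A)` works, so accidental copies that would break pointer identity remain impossible.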

clang/lib/Format/MacroExpander.cpp

This file was added.

				//===--- MacroExpander.cpp - Format C++ code --------------------*- C++ -*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This file contains the implementation of MacroExpander, which handles macro
				/// configuration and expansion while formatting.
				///
				//===----------------------------------------------------------------------===//

				#include "Macros.h"

				#include "Encoding.h"
				#include "FormatToken.h"
				#include "FormatTokenLexer.h"
				#include "clang/Basic/TokenKinds.h"
				#include "clang/Format/Format.h"
				#include "clang/Lex/HeaderSearch.h"
				#include "clang/Lex/HeaderSearchOptions.h"
				#include "clang/Lex/Lexer.h"
				#include "clang/Lex/ModuleLoader.h"
				#include "clang/Lex/Preprocessor.h"
				#include "clang/Lex/PreprocessorOptions.h"
				#include "llvm/ADT/StringSet.h"
				#include "llvm/Support/ErrorHandling.h"

				namespace clang {
				namespace format {

				struct MacroExpander::Definition {
				StringRef Name;
				sammccall (Not Done): Tokens -> Expansion? (semantics rather than type)
				klimek (Author, Done): Changed to "Body".
				SmallVector<FormatToken *, 8> Params;
				SmallVector<FormatToken *, 8> Body;

				sammccall (Done): Dmitri gave a tech talk on dropping comments like these :-)
				// Map from each argument's name to its position in the argument list.
				// With "M(x, y) x + y":
				// x -> 0
				// y -> 1
				sammccall (Done): who's responsible for establishing this? AIUI this will fail if e.g. `Macros` contains a string that contains only whitespace, which is a slightly weird precondition.
				llvm::StringMap<size_t> ArgMap;

				bool ObjectLike = true;
				};
				curdeius (Done): Nit: typo "corresponding Definition".

				class MacroExpander::DefinitionParser {
				public:
				DefinitionParser(ArrayRef<FormatToken *> Tokens) : Tokens(Tokens) {
				assert(!Tokens.empty());
				Current = Tokens[0];
				}

				// Parse the token stream and return the corresponding Definition object.
				// Returns an empty definition object with a null-Name on error.
				MacroExpander::Definition parse() {
				if (!Current->is(tok::identifier))
				return {};
				Def.Name = Current->TokenText;
				nextToken();
				if (Current->is(tok::l_paren)) {
				Def.ObjectLike = false;
				sammccall (Done): assert instead? Caller checks this
				if (!parseParams())
				return {};
				}
				if (!parseExpansion())
				return {};

				return Def;
				}

				private:
				bool parseParams() {
				assert(Current->is(tok::l_paren));
				nextToken();
				while (Current->is(tok::identifier)) {
				Def.Params.push_back(Current);
				Def.ArgMap[Def.Params.back()->TokenText] = Def.Params.size() - 1;
				nextToken();
				if (Current->isNot(tok::comma))
				sammccall (Not Done): this assumes the expansion is nonempty, which the grammar doesn't. while{} instead?
				klimek (Author, Done): I have no clue how this ever worked tbh O.O Has been reworked as part of the move to use = to separate the macro signature from the body.
				sammccall (Not Done): this accepts `FOO(A,B,)=...` as equivalent to `FOO(A,B)=...`. Not sure if worth fixing.
				klimek (Author, Done): We're generally accepting too much; I'd either want to restrict it fully, or basically be somewhat minimum/forgiving. Given that we can't get errors back to the user, I was aiming for the latter.
				break;
				nextToken();
				}
				if (Current->isNot(tok::r_paren))
				return false;
				nextToken();
				return true;
				sammccall (Not Done): (nit: I'd probably find this easier to follow as `if (equal) else if (eof) else` with parseTail inlined, but up to you)
				klimek (Author, Done): I basically like having the implementation match the BNF. That said, not feeling strongly about it. You're saying you'd duplicate the Def.Body.push_back in the if (eof)?
				  if (Current->is(tok::equal)) {
				    nextToken();
				    // inline parseTail
				  } else if (Current->is(tok::eof)) {
				    Def.Body.push_back(Current);
				  } else {
				    return false;
				  }
				Generally, I personally find it easier to read the early exit.
				}

				bool parseExpansion() {
				if (!Current->isOneOf(tok::equal, tok::eof))
				return false;
				if (Current->is(tok::equal))
				nextToken();
				parseTail();
				return true;
				}

				void parseTail() {
				while (Current->isNot(tok::eof)) {
				Def.Body.push_back(Current);
				nextToken();
				}
				Def.Body.push_back(Current);
				}

				void nextToken() {
				if (Pos + 1 < Tokens.size())
				++Pos;
				curdeius (Done): Why isn't it defaulted?
				Current = Tokens[Pos];
				Current->Finalized = true;
				}
				sammccall (Not Done): weird param name!
				klimek (Author, Done): Copy-paste gone wrong I assume.

				size_t Pos = 0;
				FormatToken *Current = nullptr;
				sammccall (Not Done): This is a slightly spooky buffer name - it's the magic name the PP uses for pasted tokens. A closer fit for config is maybe "<command line>" (like macro definitions passed with `-D`). Is it necessary to use one of clang's magic buffer names at all? If so, comment! Else maybe "<clang-format style>" or something?
				klimek (Author, Done): We need source locations, and apparently only <built-in>, <inline asm> and <scratch space> are allowed to have source locations.
				Definition Def;
				ArrayRef<FormatToken *> Tokens;
				};

				MacroExpander::MacroExpander(
				const std::vector<std::string> &Macros, clang::SourceManager &SourceMgr,
				const FormatStyle &Style,
				llvm::SpecificBumpPtrAllocator<FormatToken> &Allocator,
				IdentifierTable &IdentTable)
				: SourceMgr(SourceMgr), Style(Style), Allocator(Allocator),
				IdentTable(IdentTable) {
				for (const std::string &Macro : Macros) {
				parseDefinition(Macro);
				}
				}
				sammccall (Done): uber-nit: seems like this loop belongs in the caller

				MacroExpander::~MacroExpander() = default;
				sammccall (Not Done): is the caller responsible for checking the #args matches #params? If so, document and assert here? Looking at the implementation, it seems you don't expand if there are too few args, and expand if there are too many args (ignoring the last ones). Maybe it doesn't matter, but it'd be nice to be more consistent here. (Probably worth calling out somewhere explicitly that variadic macros are not supported)
				klimek (Author, Done): Added docs in the class comment for MacroExpander. (so far I always expand, too few -> empty, too many -> ignore)

				void MacroExpander::parseDefinition(const std::string &Macro) {
				Buffers.push_back(
				llvm::MemoryBuffer::getMemBufferCopy(Macro, "<scratch space>"));
				clang::FileID FID =
				SourceMgr.createFileID(SourceManager::Unowned, Buffers.back().get());
				FormatTokenLexer Lex(SourceMgr, FID, 0, Style, encoding::Encoding_UTF8,
				Allocator, IdentTable);
				sammccall (Done): This doesn't depend on args, so we could compute this mapping when the Definition is constructed and encapsulate it there. (Maybe performance doesn't matter, I'd also find this a little clearer. But if the allocation doesn't matter, we shouldn't be using SmallVector...)
				const auto Tokens = Lex.lex();
				sammccall (Not Done): nit: this is a copy for what seems like no reason - move `Parser.parse()` inline to this line?
				klimek (Author, Done): Reason is that we need the name.
				sammccall (Done): oops, right. std::move() the RHS? (mostly I just find the copies surprising, so draws attention)
				if (!Tokens.empty()) {
				DefinitionParser Parser(Tokens);
				auto Definition = Parser.parse();
				Definitions[Definition.Name] = std::move(Definition);
				}
				}

				bool MacroExpander::defined(llvm::StringRef Name) const {
				MyDeveloperDay (Done): elide braces?
				return Definitions.find(Name) != Definitions.end();
				}

				bool MacroExpander::objectLike(llvm::StringRef Name) const {
				return Definitions.find(Name)->second.ObjectLike;
				sammccall (Done): lookup() returns a value, so this is a copy (with lifetime-extension). I think you want `*find`
				}

				llvm::SmallVector<FormatToken *, 8> MacroExpander::expand(FormatToken *ID,
				ArgsList Args) const {
				assert(defined(ID->TokenText));
				SmallVector<FormatToken *, 8> Result;
				const Definition &Def = Definitions.find(ID->TokenText)->second;

				// Expand each argument at most once.
				llvm::StringSet<> ExpandedArgs;

				// Adds the given token to Result.
				sammccall (Done): skip the parameter -> treat the parameter as empty? (My first guess was this meant given `ID(X)=X`, `ID()` would expand to `X`.)
				auto pushToken = [&](FormatToken *Tok) {
				Tok->MacroCtx->ExpandedFrom.push_back(ID);
				Result.push_back(Tok);
				};

				// If Tok references a parameter, adds the corresponding argument to Result.
				// Returns false if Tok does not reference a parameter.
				auto expandArgument = [&](FormatToken *Tok) -> bool {
				// If the current token references a parameter, expand the corresponding
				// argument.
				if (!Tok->is(tok::identifier) || ExpandedArgs.contains(Tok->TokenText))
				sammccall (Done): please use a different name for this variable, or the parameter it shadows, or preferably both!
				return false;
				sammccall (Done): nit: "part of a macro argument at multiple levels"? (Current text suggests to me that it can be arg 0 and arg 1 of the same macro)
				ExpandedArgs.insert(Tok->TokenText);
				auto I = Def.ArgMap.find(Tok->TokenText);
				if (I == Def.ArgMap.end())
				return false;
				// If there are fewer arguments than referenced parameters, treat the
				// parameter as empty.
				sammccall (Done): nit: Result
				// FIXME: Potentially fully abort the expansion instead.
				sammccall (Not Done): you're pushing here without copying. This means the original tokens from the ArgsList are mutated. Maybe we own them, but this seems at least wrong for multiple expansion of the same arg. e.g.
				  #define M(X,Y) X Y X
				  M(1,2)
				Will expand to:
				  1, ExpandedArg, ExpandedFrom = [M, M] // should just be one M
				  2, ExpandedArg, ExpandedFrom = [M]
				  1, ExpandedArg, ExpandedFrom = [M, M] // this is the same token pointer as the first one
				Maybe it would be better if pushToken performed the copy, and returned a mutable pointer to the copy. (If you can make the input const, that would prevent this type of bug)
				klimek (Author, Done): Ugh. I'll need to take a deeper look, but generally, the problem is we don't want to copy - we're mutating the data of the token while formatting the expanded token stream, and then re-use that info when formatting the original stream. We could copy, add a reference to the original token, and then have a step that adapts in the end, and perhaps that's cleaner overall anyway, but will be quite a change. The alternative is that I'll look into how to specifically handle double-expansion (or ... forbid it).
				sammccall (Not Done):
				> (or ... forbid it).
				I'm starting to think this is the best option. The downsides seem pretty acceptable to me:
				- it's another wart to document: on the other hand it simplifies the conceptual model, I think it helps users understand the deeper behavior
				- some macros require simplification rather than supplying the actual definition: already crossed this bridge by not supporting macros in macro bodies, variadics, pasting...
				- loses information: one expansion is enough to establish which part of the grammar the arguments form in realistic cases. (Even in pathological cases, preserving the conflicting info only helps you if you have a plan to resolve the conflicts)
				it's another wart to document: Are there any others?
				klimek (Author, Done): My main concern is that it's probably the most surprising feature to not support.
				klimek (Author, Done): Forbade multi-expansion.
				if (I->getValue() >= Args.size())
				return true;
				for (FormatToken *Arg : Args[I->getValue()]) {
				sammccall (Done): "tokens that were not part of the macro argument" --> "tokens from the macro body"?
				// A token can be part of a macro argument at multiple levels.
				sammccall (Done): nit: this is confusingly a const reference to a non-const pointer... `auto *` or `FormatToken *`?
				klimek (Author, Done): Yikes, thanks for catching!
				// For example, with "ID(x) x":
				// in ID(ID(x)), 'x' is expanded first as argument to the inner
				// ID, then again as argument to the outer ID. We keep the macro
				// role the token had from the inner expansion.
				sammccall (Not Done): (I don't know exactly how this is used, but consider whether you mean "do not need to", "should not" or "cannot" here)
				klimek (Author, Done): Replaced with "are not".
				if (!Arg->MacroCtx)
				Arg->MacroCtx = MacroExpansion(MR_ExpandedArg);
				pushToken(Arg);
				}
				sammccall (Done): this threw me for a loop... it's EOF right? It's not explicitly mentioned, so maybe either add a comment or `&& Result.back()->is(tok::eof)`. This makes the `size-2` less cryptic too.
				return true;
				};
				sammccall (Done): Why not set StartOfExpansion in the same way, to avoid tracking the `First` state?

				// Expand the definition into Result.
				for (FormatToken *Tok : Def.Body) {
				if (expandArgument(Tok))
				continue;
				// Create a copy of the tokens from the macro body, i.e. tokens that were
				// not provided by user code.
				FormatToken *New = new (Allocator.Allocate()) FormatToken;
				New->copyFrom(*Tok);
				assert(!New->MacroCtx);
				// Tokens that are not part of the user code are not formatted.
				New->MacroCtx = MacroExpansion(MR_Hidden);
				pushToken(New);
				}
				assert(Result.size() >= 1 && Result.back()->is(tok::eof));
				if (Result.size() > 1) {
				++Result[0]->MacroCtx->StartOfExpansion;
				++Result[Result.size() - 2]->MacroCtx->EndOfExpansion;
				}
				return Result;
				}

				} // namespace format
				} // namespace clang
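The control flow of DefinitionParser above can be approximated on plain strings. This is a minimal sketch under stated assumptions: tokens are pre-split `std::string`s, running past the end of the vector stands in for `tok::eof`, and `SketchDefinition`, `isIdentifier` and `parseSketch` are hypothetical names. Unlike the real parser it does not append an eof token to the body.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical simplified model of MacroExpander::Definition.
struct SketchDefinition {
  std::string Name;
  std::vector<std::string> Params;
  std::vector<std::string> Body;
  std::map<std::string, std::size_t> ArgMap; // parameter name -> position
  bool ObjectLike = true;
};

inline bool isIdentifier(const std::string &Tok) {
  if (Tok.empty() || std::isdigit(static_cast<unsigned char>(Tok[0])))
    return false;
  for (unsigned char C : Tok)
    if (!std::isalnum(C) && C != '_')
      return false;
  return true;
}

// Parses <id> ["(" <params> ")"] ["=" <tail>]; returns nullopt on error.
inline std::optional<SketchDefinition>
parseSketch(const std::vector<std::string> &Tokens) {
  std::size_t Pos = 0;
  // An empty string plays the role of eof once the input is exhausted.
  auto Cur = [&]() -> std::string {
    return Pos < Tokens.size() ? Tokens[Pos] : std::string();
  };
  SketchDefinition Def;
  if (!isIdentifier(Cur()))
    return std::nullopt;
  Def.Name = Cur();
  ++Pos;
  if (Cur() == "(") {
    Def.ObjectLike = false;
    ++Pos;
    while (isIdentifier(Cur())) {
      Def.ArgMap[Cur()] = Def.Params.size();
      Def.Params.push_back(Cur());
      ++Pos;
      if (Cur() != ",")
        break;
      ++Pos;
    }
    if (Cur() != ")")
      return std::nullopt;
    ++Pos;
  }
  if (Pos < Tokens.size()) { // Not at eof: an "=" must introduce the body.
    if (Cur() != "=")
      return std::nullopt;
    ++Pos;
  }
  for (; Pos < Tokens.size(); ++Pos)
    Def.Body.push_back(Tokens[Pos]);
  return Def;
}
```

Like the reviewed code, this sketch is forgiving about a trailing comma in the parameter list (`FOO(A,B,)=...`), and it rejects inputs such as `A(`, `D(a a` and `F(,)`.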

clang/lib/Format/Macros.h

This file was added.

				//===--- MacroExpander.h - Format C++ code -----------------------*- C++ -*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This file contains the main building blocks of macro support in
				/// clang-format.
				///
				/// In order to not violate the requirement that clang-format can format files
				/// in isolation, clang-format's macro support uses expansions users provide
				/// as part of clang-format's style configuration.
				///
				/// Macro definitions are of the form "MACRO(p1, p2)=p1 + p2", but only support
				/// one level of expansion (\see MacroExpander for a full description of what
				/// is supported).
				///
				/// As part of parsing, clang-format uses the MacroExpander to expand the
				/// spelled token streams into expanded token streams when it encounters a
				/// macro call. The UnwrappedLineParser continues to parse UnwrappedLines
				/// from the expanded token stream.
				/// After the expanded unwrapped lines are parsed, the MacroUnexpander matches
				/// the spelled token stream into unwrapped lines that best resemble the
				/// structure of the expanded unwrapped lines.
				///
				/// When formatting, clang-format formats the expanded unwrapped lines first,
				/// determining the token types. Next, it formats the spelled unwrapped lines,
				/// keeping the token types fixed, while allowing other formatting decisions
				/// to change.
				///
				//===----------------------------------------------------------------------===//

				#ifndef CLANG_LIB_FORMAT_MACROS_H
				#define CLANG_LIB_FORMAT_MACROS_H

				#include <string>
				#include <unordered_map>
				#include <vector>

				#include "Encoding.h"
				#include "FormatToken.h"
				#include "llvm/ADT/ArrayRef.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/ADT/StringRef.h"

				namespace llvm {
				class MemoryBuffer;
				} // namespace llvm

				namespace clang {
				class IdentifierTable;
				class SourceManager;

				namespace format {
				struct FormatStyle;

				/// Takes a set of macro definitions as strings and allows expanding calls to
				/// those macros.
				///
				/// For example:
				/// Definition: A(x, y)=x + y
				/// Call : A(int a = 1, 2)
				/// Expansion : int a = 1 + 2
				///
				/// Expansion does not check arity of the definition.
				/// If fewer arguments than expected are provided, the remaining parameters
				/// are considered empty:
				/// Call : A(a)
				/// Expansion: a +
				/// If more arguments than expected are provided, they will be discarded.
				///
				/// The expander does not support:
				/// - recursive expansion
				/// - stringification
				/// - concatenation
				/// - variadic macros
				///
				/// Furthermore, only a single expansion of each macro argument is supported,
				/// so that we cannot get conflicting formatting decisions from different
				sammccall (Done): Is this saying that the functionlike vs objectlike distinction is not preserved? This doesn't seem safe (unless the caller is required to retain this information). e.g.
				  #define NUMBER int
				  using Calculator = NUMBER(); // does expansion consume () or not?
				klimek (Author, Done): Fixed. I thought we'd get away without it, but it's simple enough to fix and we have enough surprises as is.
				/// expansions.
				/// Definition: A(x)=x+x
				/// Call : A(id)
				/// Expansion : id+x
				sammccall (Not Done): (Seems a little odd that these pointers to external FormatTokens aren't const... I can believe there's a reason though)
				klimek (Author, Done): We modify the tokens by adding the macro context.
				///
				class MacroExpander {
				public:
				using ArgsList = llvm::ArrayRef<llvm::SmallVector<FormatToken *, 8>>;

				/// Construct a macro expander from a set of macro definitions.
				/// Macro definitions must be encoded as UTF-8.
				///
				/// Each entry in \p Macros must conform to the following simple
				/// macro-definition language:
				/// <definition> ::= <id> <expansion> | <id> "(" <params> ")" <expansion>
				/// <params> ::= <id-list> | ""
				/// <id-list> ::= <id> | <id> "," <params>
				/// <expansion> ::= "=" <tail> | <eof>
				/// <tail> ::= <tok> <tail> | <eof>
				///
				/// Macros that cannot be parsed will be silently discarded.
				///
				MacroExpander(const std::vector<std::string> &Macros,
				clang::SourceManager &SourceMgr, const FormatStyle &Style,
				llvm::SpecificBumpPtrAllocator<FormatToken> &Allocator,
				IdentifierTable &IdentTable);
				~MacroExpander();

				/// Returns whether a macro \p Name is defined.
				bool defined(llvm::StringRef Name) const;

				/// Returns whether the macro has no arguments and should not consume
				/// subsequent parentheses.
				bool objectLike(llvm::StringRef Name) const;

				/// Returns the expanded stream of format tokens for \p ID, where
				/// each element in \p Args is a positional argument to the macro call.
				llvm::SmallVector<FormatToken *, 8> expand(FormatToken *ID,
				ArgsList Args) const;

				private:
				struct Definition;
				class DefinitionParser;

				void parseDefinition(const std::string &Macro);

				clang::SourceManager &SourceMgr;
				const FormatStyle &Style;
				llvm::SpecificBumpPtrAllocator<FormatToken> &Allocator;
				IdentifierTable &IdentTable;
				std::vector<std::unique_ptr<llvm::MemoryBuffer>> Buffers;
				llvm::StringMap<Definition> Definitions;
				};

				} // namespace format
				} // namespace clang

				#endif
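The expansion rules documented in the MacroExpander class comment (missing arguments are treated as empty, extra arguments are discarded, each parameter is expanded at most once) can be sketched on plain strings. This is a hedged illustration, not the reviewed implementation: `SketchMacro`, `expandSketch` and `joinSketch` are hypothetical names, and no MacroContext is tracked.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;

// Hypothetical model of a parsed definition, e.g. A(x, y)=x + y.
struct SketchMacro {
  std::map<std::string, std::size_t> ArgMap; // parameter name -> position
  Tokens Body;
};

inline Tokens expandSketch(const SketchMacro &Def,
                           const std::vector<Tokens> &Args) {
  Tokens Result;
  std::set<std::string> ExpandedArgs; // Each parameter expands at most once.
  for (const std::string &Tok : Def.Body) {
    auto I = Def.ArgMap.find(Tok);
    if (I != Def.ArgMap.end() && ExpandedArgs.insert(Tok).second) {
      // Fewer arguments than parameters: treat the parameter as empty.
      if (I->second < Args.size())
        Result.insert(Result.end(), Args[I->second].begin(),
                      Args[I->second].end());
      continue;
    }
    // A body token, or a repeated use of an already expanded parameter.
    Result.push_back(Tok);
  }
  return Result;
}

// Concatenates token texts without separators, like text() in the tests.
inline std::string joinSketch(const Tokens &T) {
  std::string Out;
  for (const std::string &S : T)
    Out += S;
  return Out;
}
```

Arguments beyond the last parameter are never looked up, so they are dropped without any special handling.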

clang/unittests/Format/CMakeLists.txt

add_clang_unittest(FormatTests		add_clang_unittest(FormatTests
FormatTestJS.cpp		FormatTestJS.cpp
FormatTestJava.cpp		FormatTestJava.cpp
FormatTestObjC.cpp		FormatTestObjC.cpp
FormatTestProto.cpp		FormatTestProto.cpp
FormatTestRawStrings.cpp		FormatTestRawStrings.cpp
FormatTestSelective.cpp		FormatTestSelective.cpp
FormatTestTableGen.cpp		FormatTestTableGen.cpp
FormatTestTextProto.cpp		FormatTestTextProto.cpp
		MacroExpanderTest.cpp
NamespaceEndCommentsFixerTest.cpp		NamespaceEndCommentsFixerTest.cpp
SortImportsTestJS.cpp		SortImportsTestJS.cpp
SortImportsTestJava.cpp		SortImportsTestJava.cpp
SortIncludesTest.cpp		SortIncludesTest.cpp
UsingDeclarationsSorterTest.cpp		UsingDeclarationsSorterTest.cpp
)		)

clang_target_link_libraries(FormatTests		clang_target_link_libraries(FormatTests
PRIVATE		PRIVATE
clangBasic		clangBasic
clangFormat		clangFormat
clangFrontend		clangFrontend
clangRewrite		clangRewrite
clangToolingCore		clangToolingCore
)		)

clang/unittests/Format/MacroExpanderTest.cpp

This file was added.

				#include "../../lib/Format/Macros.h"
				#include "TestLexer.h"
				#include "clang/Basic/FileManager.h"

				#include "gtest/gtest.h"

				MyDeveloperDay (Done): are you using this?
				namespace clang {
				namespace format {

				namespace {

				class MacroExpanderTest : public ::testing::Test {
				public:
				std::unique_ptr<MacroExpander>
				create(const std::vector<std::string> &MacroDefinitions) {
				return std::make_unique<MacroExpander>(MacroDefinitions,
				Lex.SourceMgr.get(), Lex.Style,
				Lex.Allocator, Lex.IdentTable);
				}

				std::string expand(MacroExpander &Macros, llvm::StringRef Name,
				const std::vector<std::string> &Args = {}) {
				EXPECT_TRUE(Macros.defined(Name));
				return text(Macros.expand(Lex.id(Name), lexArgs(Args)));
				}

				llvm::SmallVector<TokenList, 1>
				lexArgs(const std::vector<std::string> &Args) {
				llvm::SmallVector<TokenList, 1> Result;
				for (const auto &Arg : Args) {
				Result.push_back(uneof(Lex.lex(Arg)));
				}
				return Result;
				}

				struct MacroAttributes {
				clang::tok::TokenKind Kind;
				MacroRole Role;
				unsigned Start;
				unsigned End;
				llvm::SmallVector<FormatToken *, 1> ExpandedFrom;
				};

				void expectAttributes(const TokenList &Tokens,
				const std::vector<MacroAttributes> &Attributes,
				const std::string &File, unsigned Line) {
				EXPECT_EQ(Tokens.size(), Attributes.size()) << text(Tokens);
				for (size_t I = 0, E = Tokens.size(); I != E; ++I) {
				if (I >= Attributes.size())
				continue;
				std::string Context =
				("for token " + llvm::Twine(I) + ": " + Tokens[I]->Tok.getName() +
				" / " + Tokens[I]->TokenText)
				.str();
				EXPECT_TRUE(Tokens[I]->is(Attributes[I].Kind))
				<< Context << " in " << text(Tokens) << " at " << File << ":" << Line;
				MyDeveloperDay (Done): when these assertions fail you have no idea which of the various calls is actually failing. How about passing in __FILE__, __LINE__ then adding that to the output?
				EXPECT_EQ(Tokens[I]->MacroCtx->Role, Attributes[I].Role)
				<< Context << " in " << text(Tokens) << " at " << File << ":" << Line;
				EXPECT_EQ(Tokens[I]->MacroCtx->StartOfExpansion, Attributes[I].Start)
				<< Context << " in " << text(Tokens) << " at " << File << ":" << Line;
				EXPECT_EQ(Tokens[I]->MacroCtx->EndOfExpansion, Attributes[I].End)
				<< Context << " in " << text(Tokens) << " at " << File << ":" << Line;
				EXPECT_EQ(Tokens[I]->MacroCtx->ExpandedFrom, Attributes[I].ExpandedFrom)
				<< Context << " in " << text(Tokens) << " at " << File << ":" << Line;
				}
				}

				TestLexer Lex;
				};

				#define EXPECT_ATTRIBUTES(Tokens, Attributes) \
				expectAttributes(Tokens, Attributes, __FILE__, __LINE__)

				TEST_F(MacroExpanderTest, SkipsDefinitionOnError) {
				auto Macros =
				create({"A(", "B(,", "C(a,", "D(a a", "E(a, a", "F(,)", "G(a;"});
				for (const auto *Name : {"A", "B", "C", "D", "E", "F", "G"}) {
				EXPECT_FALSE(Macros->defined(Name)) << "for Name " << Name;
				}
				}

				TEST_F(MacroExpanderTest, ExpandsWithoutArguments) {
				auto Macros = create({
				"A",
				"B=b",
				"C=c + c",
				"D()",
				});
				EXPECT_TRUE(Macros->objectLike("A"));
				EXPECT_TRUE(Macros->objectLike("B"));
				EXPECT_TRUE(Macros->objectLike("C"));
				EXPECT_TRUE(!Macros->objectLike("D"));
				EXPECT_EQ("", expand(*Macros, "A"));
				EXPECT_EQ("b", expand(*Macros, "B"));
				EXPECT_EQ("c+c", expand(*Macros, "C"));
				EXPECT_EQ("", expand(*Macros, "D"));
				}

				TEST_F(MacroExpanderTest, ExpandsWithArguments) {
				auto Macros = create({
				"A(x)",
				"B(x, y)=x + y",
				});
				EXPECT_EQ("", expand(*Macros, "A", {"a"}));
				EXPECT_EQ("b1+b2+b3", expand(*Macros, "B", {"b1", "b2 + b3"}));
				EXPECT_EQ("x+", expand(*Macros, "B", {"x"}));
				}

				TEST_F(MacroExpanderTest, AttributizesTokens) {
				auto Macros = create({
				"A(x, y)={ x + y; }",
				"B(x, y)=x + 3 + y",
				});
				auto *A = Lex.id("A");
				auto AArgs = lexArgs({"a1 * a2", "a3 * a4"});
				auto Result = Macros->expand(A, AArgs);
				EXPECT_EQ(11U, Result.size()) << text(Result) << " / " << Result;
				EXPECT_EQ("{a1*a2+a3*a4;}", text(Result));
				std::vector<MacroAttributes> Attributes = {
				{tok::l_brace, MR_Hidden, 1, 0, {A}},
				{tok::identifier, MR_ExpandedArg, 0, 0, {A}},
				{tok::star, MR_ExpandedArg, 0, 0, {A}},
				{tok::identifier, MR_ExpandedArg, 0, 0, {A}},
				{tok::plus, MR_Hidden, 0, 0, {A}},
				{tok::identifier, MR_ExpandedArg, 0, 0, {A}},
				{tok::star, MR_ExpandedArg, 0, 0, {A}},
				{tok::identifier, MR_ExpandedArg, 0, 0, {A}},
				{tok::semi, MR_Hidden, 0, 0, {A}},
				{tok::r_brace, MR_Hidden, 0, 1, {A}},
				{tok::eof, MR_Hidden, 0, 0, {A}},
				};
				EXPECT_ATTRIBUTES(Result, Attributes);

				auto *B = Lex.id("B");
				auto BArgs = lexArgs({"b1", "b2"});
				Result = Macros->expand(B, BArgs);
				EXPECT_EQ(6U, Result.size()) << text(Result) << " / " << Result;
				EXPECT_EQ("b1+3+b2", text(Result));
				Attributes = {
				{tok::identifier, MR_ExpandedArg, 1, 0, {B}},
				{tok::plus, MR_Hidden, 0, 0, {B}},
				{tok::numeric_constant, MR_Hidden, 0, 0, {B}},
				{tok::plus, MR_Hidden, 0, 0, {B}},
				{tok::identifier, MR_ExpandedArg, 0, 1, {B}},
				{tok::eof, MR_Hidden, 0, 0, {B}},
				};
				EXPECT_ATTRIBUTES(Result, Attributes);
				}

				TEST_F(MacroExpanderTest, RecursiveExpansion) {
				auto Macros = create({
				"A(x)=x",
				"B(x)=x",
				"C(x)=x",
				});

				auto *A = Lex.id("A");
				auto *B = Lex.id("B");
				auto *C = Lex.id("C");

				auto Args = lexArgs({"id"});
				auto CResult = uneof(Macros->expand(C, Args));
				auto BResult = uneof(Macros->expand(B, CResult));
				auto AResult = uneof(Macros->expand(A, BResult));

				std::vector<MacroAttributes> Attributes = {
				{tok::identifier, MR_ExpandedArg, 3, 3, {C, B, A}},
				};
				EXPECT_ATTRIBUTES(AResult, Attributes);
				}

				TEST_F(MacroExpanderTest, SingleExpansion) {
				auto Macros = create({"A(x)=x+x"});
				auto *A = Lex.id("A");
				auto Args = lexArgs({"id"});
				auto Result = uneof(Macros->expand(A, Args));
				std::vector<MacroAttributes> Attributes = {
				{tok::identifier, MR_ExpandedArg, 1, 0, {A}},
				{tok::plus, MR_Hidden, 0, 0, {A}},
				{tok::identifier, MR_Hidden, 0, 1, {A}},
				};
				EXPECT_ATTRIBUTES(Result, Attributes);
				}

				sammccall: may want a test that uses of an arg after the first are not expanded, because that "guards" a bunch of nasty potential bugs
				klimek: Discussed offline: the above test tests exactly this.
				} // namespace
				} // namespace format
				} // namespace clang
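
The SingleExpansion test above checks that for `A(x)=x+x` only the first use of the argument is tagged MR_ExpandedArg, while the second use is tagged MR_Hidden like the rest of the macro body. The following is a minimal stand-alone sketch of that rule. It is a toy model, not clang-format's MacroExpander: the names ToyRole, ToyToken, and expandToy are hypothetical, and it assumes a single parameter and a single-token argument.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; none of these names exist in
// clang-format.
enum class ToyRole { ExpandedArg, Hidden };

struct ToyToken {
  std::string Text;
  ToyRole Role;
};

// Expands a single-parameter macro body given as a list of token texts.
// The first occurrence of the parameter is tagged ExpandedArg; every other
// token, including later occurrences of the parameter, is tagged Hidden.
inline std::vector<ToyToken> expandToy(const std::vector<std::string> &Body,
                                       const std::string &Param,
                                       const std::string &Arg) {
  std::vector<ToyToken> Result;
  bool ArgUsed = false;
  for (const std::string &Tok : Body) {
    if (Tok == Param) {
      Result.push_back(
          {Arg, ArgUsed ? ToyRole::Hidden : ToyRole::ExpandedArg});
      ArgUsed = true;
    } else {
      Result.push_back({Tok, ToyRole::Hidden});
    }
  }
  return Result;
}
```

Calling `expandToy({"x", "+", "x"}, "x", "id")` models the expansion of `A(id)` in the test: three tokens come back, and only the first carries the ExpandedArg role.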

clang/unittests/Format/TestLexer.h

This file was added.

				//===--- TestLexer.h - Format C++ code --------------------------*- C++ -*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This file contains a TestLexer to create FormatTokens from strings.
				///
				//===----------------------------------------------------------------------===//

				#ifndef CLANG_UNITTESTS_FORMAT_TESTLEXER_H
				#define CLANG_UNITTESTS_FORMAT_TESTLEXER_H
				sammccall: I guess clang-tidy wants ..._TESTLEXER_H here

				#include "../../lib/Format/FormatTokenLexer.h"

				#include "clang/Basic/FileManager.h"
				#include "clang/Basic/SourceManager.h"

				#include <numeric>
				#include <ostream>

				namespace clang {
				namespace format {

				typedef llvm::SmallVector<FormatToken *, 8> TokenList;

				inline std::ostream &operator<<(std::ostream &Stream, const FormatToken &Tok) {
				Stream << "(" << Tok.Tok.getName() << ", \"" << Tok.TokenText.str() << "\")";
				return Stream;
				}
				inline std::ostream &operator<<(std::ostream &Stream, const TokenList &Tokens) {
				Stream << "{";
				for (size_t I = 0, E = Tokens.size(); I != E; ++I) {
				Stream << (I > 0 ? ", " : "") << *Tokens[I];
				}
				Stream << "}";
				return Stream;
				}

				inline TokenList uneof(const TokenList &Tokens) {
				assert(!Tokens.empty() && Tokens.back()->is(tok::eof));
				return TokenList(Tokens.begin(), std::prev(Tokens.end()));
				}

				inline std::string text(llvm::ArrayRef<FormatToken *> Tokens) {
				return std::accumulate(Tokens.begin(), Tokens.end(), std::string(),
				[](const std::string &R, FormatToken *Tok) {
				return (R + Tok->TokenText).str();
				});
				}

				class TestLexer {
				public:
				TestLexer() : SourceMgr("test.cpp", "") {}

				TokenList lex(llvm::StringRef Code) {
				Buffers.push_back(
				llvm::MemoryBuffer::getMemBufferCopy(Code, "<scratch space>"));
				clang::FileID FID = SourceMgr.get().createFileID(SourceManager::Unowned,
				Buffers.back().get());
				FormatTokenLexer Lex(SourceMgr.get(), FID, 0, Style, Encoding, Allocator,
				IdentTable);
				auto Result = Lex.lex();
				return TokenList(Result.begin(), Result.end());
				}

				FormatToken *id(llvm::StringRef Code) {
				auto Result = uneof(lex(Code));
				assert(Result.size() == 1U && "Code must expand to 1 token.");
				return Result[0];
				}

				FormatStyle Style = getLLVMStyle();
				encoding::Encoding Encoding = encoding::Encoding_UTF8;
				std::vector<std::unique_ptr<llvm::MemoryBuffer>> Buffers;
				clang::SourceManagerForFile SourceMgr;
				llvm::SpecificBumpPtrAllocator<FormatToken> Allocator;
				IdentifierTable IdentTable;
				};

				} // namespace format
				} // namespace clang

				#endif // CLANG_UNITTESTS_FORMAT_TESTLEXER_H
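
The `uneof` and `text` helpers in this header are small enough to model without any LLVM dependency. The sketch below assumes a toy token list of plain strings with a literal `"<eof>"` marker. ToyTokens, toyUneof, and toyText are hypothetical names; the real helpers operate on `FormatToken *`.

```cpp
#include <cassert>
#include <iterator>
#include <numeric>
#include <string>
#include <vector>

// Toy stand-in for a token list: plain strings, with "<eof>" as the
// end-of-file marker. Hypothetical; not part of clang-format.
using ToyTokens = std::vector<std::string>;

// Drops the trailing end-of-file marker, mirroring uneof().
inline ToyTokens toyUneof(const ToyTokens &Tokens) {
  assert(!Tokens.empty() && Tokens.back() == "<eof>");
  return ToyTokens(Tokens.begin(), std::prev(Tokens.end()));
}

// Concatenates the token texts with no separator, mirroring text().
inline std::string toyText(const ToyTokens &Tokens) {
  return std::accumulate(Tokens.begin(), Tokens.end(), std::string(),
                         [](const std::string &R, const std::string &Tok) {
                           return R + Tok;
                         });
}
```

With the toy list `{"b1", "+", "3", "+", "b2", "<eof>"}`, `toyUneof` leaves five tokens and `toyText` over those five yields `b1+3+b2`, the same string the `B(b1, b2)` expansion test compares against.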