This is an archive of the discontinued LLVM Phabricator instance.

Differential D88299

[clang-format] Add MacroUnexpander.
ClosedPublic

Authored by klimek on Sep 25 2020, 6:28 AM.

Details

Reviewers

sammccall
rymiel
HazardyKnusperkeks
owenpan
MyDeveloperDay

Commits

rGd6d0dc1f4537: [clang-format] Add MacroUnexpander.

Summary

MacroUnexpander applies the structural formatting of expanded lines into
UnwrappedLines to the corresponding unexpanded macro calls, resulting in
UnwrappedLines for the macro calls the user typed.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

klimek created this revision. Sep 25 2020, 6:28 AM

Herald added a project: Restricted Project. Sep 25 2020, 6:28 AM

Herald added subscribers: cfe-commits, mgorny.

klimek requested review of this revision. Sep 25 2020, 6:28 AM

sammccall mentioned this in D83296: [clang-format] Add a MacroExpander. Sep 28 2020, 1:46 PM

This is magnificently difficult :-)
I've sunk a fair few hours into understanding it now, and need to send some comments based on the interface+tests.
Some of these will be misguided or answered by the implementation, but I'm afraid I can't fit it all into my head (on a reasonable timeframe) until I understand it better.

clang/lib/Format/FormatToken.h
500	I can't really understand from the comment when this is supposed to be set, and there are no tests of it. (The comment is vague: is a "parent" the inverse of FormatToken::Children here? Is this scenario when the parents in question are new, or their children are new, or both? What part of the code is "formatting", and why would it otherwise skip the children?)
clang/lib/Format/Macros.h
135	"matches formatted lines" probably describes the hard technical problem it has to solve, but not so much what it does for the caller: what the transformation is between its inputs and its outputs. Is it something like: Rewrites expanded code (containing tokens expanded from macros) into spelled code (containing tokens for the macro call itself). Token types are preserved, so macro arguments in the output have semantically-correct types from their expansion. This is the point of expansion/unexpansion: to allow this information to be used in formatting. [Is it just tokentypes? I guess it's also Role and MustBreakBefore and some other stuff like that?]
143	I'm a bit confused by these arrows. It doesn't seem that they each point to an unwrappedline passed to addLine?
143	This example didn't really help me understand the interface of this class, it seems to be a special case:
	the input is a single block construct (rather than e.g. a whole program), but it's not clear why that's the case.
	the output (unexpanded form) consists of exactly a macro call with no leading/trailing tokens, which isn't true in general.
	If the idea is to provide as input the minimal range of lines that span the macro, we should say so somewhere. But I would like to understand why we're not simply processing the whole file.
148	this says "creates the unwrapped lines" but getResult() returns only one. Does the plural here refer to the tree? Maybe just use singular or "the unwrappedlinetree"?
155	I get the symmetry between the expander/unexpander classes, but the name is making it harder for me to follow the code.
	the extra compound+negation in the name makes it hard/slow to understand, as I'm always thinking first about expansion
	the fact that expansion/unexpansion are not actually opposites completely breaks my intuition. It also creates two different meanings of "unexpanded" that led me down the garden path a bit (e.g. in the test).
	Would you consider `MacroCollapser`? It's artificial, but expand/collapse are well-known antonyms without being the same word.
	(Incidentally, I just realized this is why I find "UnwrappedLine" so slippery - the ambiguity between "this isn't wrapped" and "this was unwrapped")
165	I find this hard to follow.
	again, the "match" part is an implementation detail that sounds interesting but isn't :-)
	"from a macro" sounds like one in particular, but is actually every macro
	the bit about "getResult" is a separate point that feels shoehorned in
	What about: "Replaces tokens that were expanded from macros with the original macro calls. The result is stored and available in getResult()"
176	how can this be the case if the input can have multiple lines and the output only one? Is the return value a synthetic parent of the translated lines? Or is there a hidden requirement on the caller here that we don't keep feeding in lines unless we're continuing a macro call and therefore know we'll end up with one line? This stuff could be clarified in docs but again I have to ask, can't we sidestep the whole issue by processing the whole file and returning all the lines? (This is somewhat answered in the implementation, though that doesn't seem like the right place)
238	The explanation here seems to be proving the converse: if we didn't use this representation, then the code wouldn't work. However what I'm missing is an explanation of why it is correct/sensible. After staring at the tests, I'm still not sure, since the tests seem to postprocess the "natural" output the same way before asserting equality. My tentative conclusion is it would be clearest to move this "in the end" step to the caller of getResult(), as it seems to have more to do with formatting than unexpansion. But I haven't looked in detail at that caller, maybe I'm wrong...
clang/unittests/Format/MacroUnexpanderTest.cpp
1 ↗	(On Diff #294294)	All of these tests use both the expander and unexpander (so need consider behavior/bugs of both). Is it possible to test the unexpander in isolation? (Context: I'm trying to use the tests to understand what the class does, but the inputs aren't visible)
1 ↗	(On Diff #294294)	Having read all the tests here, they seem to follow exactly the same schema:
	some macro definitions
	some sequence of expand() calls, simulating expanding some macro-heavy code
	verify the expanded token sequence, rearranging it into expected UnwrappedLine structure
	call unexpand() and check that the result has the expected UnwrappedLine structure
	You do this using various fairly-general helpers and DSLs, but don't really combine them in different ways... the tests are somewhat readable and error messages are OK, but if these are important tests, it might be worth looking at a data-driven test. e.g.
	Case NestedChildBlocks;
	NestedChildBlocks.Macros = {"ID(x)=x", "CALL(x)=f([] { x })"};
	NestedChildBlocks.Original = "ID(CALL(CALL(return a * b;)))";
	NestedChildBlocks.Expanded = R"cpp( { f([] { f([] { return a * b; }) }) } )cpp"; // indentation shows structure
	NestedChildBlocks.Unexpanded = R"cpp( ID( { CALL(CALL(return a * b;)) } ) )cpp";
	NestedChildBlocks.verify();
	Definitely involves a bit of reinventing wheels though.
20 ↗	(On Diff #294294)	docs for this class/major members?
28 ↗	(On Diff #294294)	this name (and all the other "Unexpanded*" in this class) is misleading, because there's a process called "unexpanding" but these aren't the output of it. I'd suggest "spelled", though I do think renaming the unexpander would also be worthwhile.
93 ↗	(On Diff #294294)	this needs docs (it's a cool technique! no need to make the reader decipher it though)
96 ↗	(On Diff #294294)	you always consume(lex(...)) - taking a StringRef directly instead would make it clearer that the identity of the temporary tokens doesn't matter
125 ↗	(On Diff #294294)	`create()` is a strange name for the expander in an unexpander test
141 ↗	(On Diff #294294)	FWIW, this seems confusing to me - line() has overloads that are simple, and then this one that does the same nontrivial stitching that the production code does. The fact that the a/b lines are not sibling sub-lines in `line({tokens("a"), tokens("b")})` but they are sibling tokens in `line(lex("a b"))` is hard to keep track of in the tests. If the stitching really is necessary, I think it's important for the expected output to also be shown in its stitched form.
155 ↗	(On Diff #294294)	why is this "tokens" and not "chunk"?
207 ↗	(On Diff #294294)	you have lots of assertions that this is true, but none that it is false
242 ↗	(On Diff #294294)	the high-level point of this test seems to be that by looking at the post-expansion context, we can determine that b is a declared pointer so we don't put a space before it. And everywhere these tokens are mentioned/verified here, the spacing is correct, as if it were propagated... but of course the spacing is actually ignored everywhere and underneath it's just sequences of tokens. This makes it hard to track how well this is testing what it really wants to test. Is it possible to mark the asterisk with the correct tokentype at the point where formatting would occur, and then verify that the unexpanded form (i.e. `Chunk1.Tokens[1]`) still has the tokentype set? (It's probably possible to prove this by inspection by understanding what U1 contains, how Matcher works etc, but I think it'd still be illustrative)
245 ↗	(On Diff #294294)	what does id() mean if not identifier?
406 ↗	(On Diff #294294)	because this example isn't realistic, it'd be nice to have the comment here showing what formatting you're simulating

Adapted based on review comments.

clang/lib/Format/FormatToken.h
500	Rewrote comment.
clang/lib/Format/Macros.h
135	It's not the token info, this we'd trivially have by using the original token sequence which is still lying around (we re-use the same tokens). Reworded.
143	Why not? That is the intention? Note that high-level we do not pass class definitions in one unwrapped line; if they ever go into a single line it's done in the end by a special merging pass (yes, that's a bit confusing).
143	Re: input is a single construct - that is not the case (see other comment) Re: leading / trailing tokens: I didn't want to make it too complex in the example.
148	Fixed comment.
155	Happy to rename, but first wanted to check in - I use "unexpanded token stream" quite often to refer to the macro call. Perhaps we should also find different wording for that then? Perhaps we should call this MacroLineMatcher btw? This is not creating anything new - the resulting token sequence is the "unexpanded token sequence" with the exact same tokens, the special thing is that they're matched into unwrapped lines.
165	I think the match part is important, as it's matching unwrapped lines, which is the heart of the algorithm.
176	Reworded; the reason why we have the single-line anyway is that:
	a macro call is something we generally want to have in one unwrapped line
	the tokens (or other macro calls) that go into the same instance of MacroUnexpander only consist of tokens that do not have an unwrapped line break and macro calls
	Thus, we want the output to be in a single unwrapped line, as we're otherwise majorly confusing ~everything else in the formatter.
238	This is basically what I wrote before - in the end, that the expanded code creates multiple unwrapped lines is the weird thing, as the input is fundamentally a single unwrapped line (the macro call plus a bit of stuff around it). Thus, it's quite natural for the unexpander to return a single unwrapped line that represents the original structure. Not sure how to best put this in words.
clang/unittests/Format/MacroUnexpanderTest.cpp
1 ↗	(On Diff #294294)	Not sure I understand what you mean - everything that's interesting about the inputs is spelled out in the test - namely, what the structure of unwrapped lines going into the unexpander is.
1 ↗	(On Diff #294294)	It seems like the main thing this does is replacing the structure how we build unwrapped lines with a DSL that gets parsed into unwrapped lines by indentation? I personally find that significantly less readable unless we'd create really good error messages if the indentation doesn't line up. In your example "Unexpanded" I have problems understanding exactly what goes into what unwrapped line intuitively - for example, {} can be one unwrapped line or 2 different ones. How do we decide when an unwrapped line is finished?
28 ↗	(On Diff #294294)	This comment made it clear why Unwrapper is a really bad name, because it's also not about collapsing.
96 ↗	(On Diff #294294)	Yeah, I thought that I want Matcher and the test to be more decoupled, but passing in the Lexer is not a big thing, so changed.
141 ↗	(On Diff #294294)	Would renaming tokens() to chunk() help with that? The idea is that in the test I mostly need to create a single line from chunks of tokens. Also happy to rename this function if it helps more?
155 ↗	(On Diff #294294)	My thought was that Chunk is just a type for a chunk of a line, and I can create one from a set of tokens (via tokens()) or from a set of child unwrapped lines (via children()).
207 ↗	(On Diff #294294)	Yeah, that's a white-box assertion - finished() is false the vast majority of the time, so testing the true cases is the important part - happy to add tests if you think it's worth it.
242 ↗	(On Diff #294294)	I don't care about anything about the tokens other than that they have pointer identity and that the same tokens end up in the right unwrapped lines in the result (thus match also checks token identity).
245 ↗	(On Diff #294294)	Sigh, yeah, good point - initially this was used for identifiers, but it really means "lex exactly one token". Do you have a good name that is short (it's used really often) and descriptive?
406 ↗	(On Diff #294294)	As before, formatting is (to me) not relevant here - what's relevant is that we don't crash and the resulting unwrapped lines contain the right tokens.

Harbormaster completed remote builds in B76711: Diff 301253. Oct 28 2020, 6:15 AM

Thanks a lot for all the time explaining, I get this at least at a high level now.
I have trouble following the ins and outs of the algorithm, but I think I understand most of it and tests cover the rest.

Regarding tests: I have trouble reasoning about whether all cases are tested. I wonder if we can easily throw this against (internal) coverage + mutation testing infrastructure? I'm not a massive believer in that stuff usually, but this seems like a good candidate.

Lots of comments around making this easier for confused people like me to understand, partly because I don't understand it well enough to suggest more substantive changes.
Throughout, I think it'd be nice to be explicit about:

which structure tokens/lines are part of: spelled/expanded/unexpanded. (Including getting rid of input/output/incoming/original, which AFAICT are synonyms for these)
which lines are complete, and which are partial (being added to as we go)

It took me a while to understand the control flow and to tell "who's driving" - I think it was hard for me to see that processNextUnexpanded was basically a leaf, and that unexpand/startUnexpansion/endUnexpansion/continueUnexpansionUntil were the layer above, and add() was the top. Maybe there's naming that would clarify this better, but I don't have great suggestions, and I'm not sure if this is a problem other people would have.
Maybe an example trace showing the internal method calls when parsing a small example might help... but again, not sure.

clang/lib/Format/MacroUnexpander.cpp
52 ↗	(On Diff #301253)	this doesn't use any state, right? it could be static or even private to the impl file. replacing std::function with a template param allows this loop to be specialized for the one callsite - up to you, maybe it doesn't matter much, but it's not very invasive
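A minimal sketch of the template-parameter suggestion above, using only the standard library. `ToyToken`, `forEachToken` and `collectIds` are hypothetical names made up for illustration; this is not the code under review, only the general shape of a stateless traversal whose callback type is a template parameter rather than a `std::function`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a token that may own child lines; this is not the
// real clang::format::FormatToken.
struct ToyToken {
  int Id = 0;
  std::vector<std::vector<ToyToken>> Children;
};

// Free function template: it uses no object state, so it could live in an
// anonymous namespace of the implementation file. Because the callable type is
// a template parameter, the compiler can specialize (and inline) the loop for
// each call site instead of going through std::function's type erasure.
template <typename Callable>
void forEachToken(const std::vector<ToyToken> &Line, Callable &&Call,
                  const ToyToken *Parent = nullptr) {
  for (const ToyToken &Tok : Line) {
    Call(Tok, Parent);
    // Visit child lines, passing the owning token as their parent.
    for (const std::vector<ToyToken> &Child : Tok.Children)
      forEachToken(Child, Call, &Tok);
  }
}

// Demonstration helper: collects token ids in visiting order.
inline std::vector<int> collectIds(const std::vector<ToyToken> &Line) {
  std::vector<int> Ids;
  forEachToken(Line, [&](const ToyToken &Tok, const ToyToken *) {
    Ids.push_back(Tok.Id);
  });
  return Ids;
}
```

Whether the specialization matters for performance here is an open question, as the comment itself says; the main effect is that the helper no longer needs to be a member.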
77 ↗	(On Diff #301253)	I can't work out what "it" refers to in this sentence. (and "spelled" token stream?)
82 ↗	(On Diff #301253)	It would be helpful to complete the example by spelling out which token you're adding, which the correct parent is, and which tokens you need to "expand over" to make it available. I think the answer to the first two is - when you're adding the `a` then its proper parent is the inner `(` in `BRACED(BRACED(`... but I don't know the last part.
263 ↗	(On Diff #301253)	maybe unexpandActiveMacroUntil or so? Something to hint that this stops at the end of the top-of-stack macro. (and to avoid the term "unexpansion stream" which isn't used/defined anywhere else)
269 ↗	(On Diff #301253)	nit: while
273 ↗	(On Diff #301253)	nit: this assert is just the opposite of the while condition, drop it?
322 ↗	(On Diff #301253)	want to keep this?
391 ↗	(On Diff #301253)	nit: finished()
clang/lib/Format/Macros.h
25–29	This would be a good place to explicitly introduce the name "unexpanded" for what comes out of the unexpander. This para gives names to the token streams, but not as clearly to the UnwrappedLines parsed out of them. I think the fact that the tokens alias between the streams/lines, and so the final formatting of the expanded lines "writes through" tokentype etc to the unexpanded lines, is an important design point worth emphasizing. (This part is mostly structure around "what happens", with the data sets secondary - I think I'd personally find the reverse easier to follow but YMMV)
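The "writes through" point above can be illustrated with a toy model. This is a sketch under simplifying assumptions: `ToyToken`, `ToyLine` and `annotateExpanded` are hypothetical, and the annotation rule is invented for the example. The only thing it is meant to show is that two line structures holding pointers to the same token objects see each other's annotations.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, heavily simplified token; the real FormatToken carries far
// more state.
struct ToyToken {
  std::string Text;
  std::string Type = "unknown"; // Stand-in for a token type annotation.
};

// Both the expanded and the reconstructed line hold pointers to the same
// token objects, so neither owns a private copy.
using ToyLine = std::vector<ToyToken *>;

// Pretend annotation pass over the expanded line: marks every "*" that is
// neither the first nor the last token of the line. The rule is an assumption
// made up for this example, not clang-format's logic.
inline void annotateExpanded(const ToyLine &Expanded) {
  for (std::size_t I = 1; I + 1 < Expanded.size(); ++I)
    if (Expanded[I]->Text == "*")
      Expanded[I]->Type = "PointerOrReference";
}
```

With a spelled call such as `ID(int *b);`, the expanded line would contain `int * b ;` and the reconstructed line `ID ( int * b ) ;`. After annotating the expanded line, the `*` reached through the reconstructed line carries the same type, because it is the same object.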
30–31	would s/formats/annotates/ be inaccurate? This is just my poor understanding of the code, but it wasn't obvious to me that annotation is closely associated with formatting and not with parsing UnwrappedLines.
136	I know it's the common case, but I think saying "the macro call" here is misleading, because it quickly becomes apparent reading the code that the scope isn't one macro call, and (at least for me) it's easy to get hung up on not understanding what the scope is. (AIUI the scope is actually one UL of output... so the use of plural there is also confusing). I think a description could be something like:
	Converts a sequence of UnwrappedLines containing expanded macros into a single UnwrappedLine containing the macro calls. This UnwrappedLine may be broken into child lines, in a way that best conveys the structure of the expanded code.
	...
	In the simplest case, a spelled UnwrappedLine contains one macro, and after expanding it we have one expanded UnwrappedLine. In general, macro expansions can span UnwrappedLines, and multiple macros can contribute tokens to the same line. We keep consuming expanded lines until:
	all expansions that started have finished (we're not chopping any macros in half)
	and we've reached the end of a spelled unwrapped line
	A single UnwrappedLine represents this chunk of code.
	After this point, the state of the spelled/expanded stream is "in sync" (both at the start of an UnwrappedLine, with no macros open), so the Unexpander can be thrown away and parsing can continue.
	(and then launch into an example)
143	"Re: input is a single construct - that is not the case (see other comment)"
	A class is definitely a single construct :-) It sounds like that's not significant to the MacroUnexpander though, so it feels like a red herring to me.
	"Re: leading / trailing tokens: I didn't want to make it too complex in the example."
	That seems fine, I think the complexities of the general case need to be mentioned somewhere because the API reflects them. But you're right, the primary example should be simple. I think a tricky example (maybe the `#define M ; x++` one?) could be given on one of addLine/finish/getResult maybe.
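The "keep consuming expanded lines until all started expansions have finished" loop can be sketched as a toy. Everything here is hypothetical: `ToyExpandedLine` records how many macro expansions a line opens and closes (the real code would derive this from the tokens), and the second condition from the comment, reaching the end of a spelled unwrapped line, is deliberately left out to keep the model small.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of one expanded line: how many macro expansions start
// on it and how many end on it.
struct ToyExpandedLine {
  int MacrosOpened = 0;
  int MacrosClosed = 0;
};

// Toy model of the driver-facing interface: feed lines, ask whether all
// expansions that started have finished.
class ToyReconstructor {
public:
  void addLine(const ToyExpandedLine &Line) {
    OpenMacros += Line.MacrosOpened;
    assert(Line.MacrosClosed <= OpenMacros &&
           "closed a macro that was never opened");
    OpenMacros -= Line.MacrosClosed;
    ++LinesAdded;
  }

  // True once at least one line was added and no expansion is still open.
  bool finished() const { return LinesAdded > 0 && OpenMacros == 0; }

private:
  int OpenMacros = 0;
  int LinesAdded = 0;
};

// Consumes lines starting at Start until the reconstructor reports finished()
// or the input runs out. Returns the index one past the last consumed line.
inline std::size_t consumeOneChunk(const std::vector<ToyExpandedLine> &Lines,
                                   std::size_t Start) {
  assert(Start < Lines.size() && "need at least one line to consume");
  ToyReconstructor Reconstructor;
  std::size_t Next = Start;
  do {
    Reconstructor.addLine(Lines[Next++]);
  } while (!Reconstructor.finished() && Next < Lines.size());
  return Next;
}
```

For something like `#define M ; x++`, the first expanded line would open `M` without closing it and the second would close it, so `finished()` stays false between the two `addLine()` calls and the chunk spans both lines.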
155	I think unexpanded/unexpander are reasonable names, having understood this better, but with caveats. It's important to distinguish between the pre-expansion state ("spelled"?) and the post-unexpansion state ("unexpanded?"). The UnwrappedLines are vitally different, but the token stream is the same as you point out. When referring to the token stream, I think "spelled" is probably less confusing (that's where the token stream fundamentally comes from), and explicitly mentioning somewhere that the token sequence encoded by the unexpanded lines is the same original spelled stream.
175	const? or do we not care
175	Maybe comment that when finished() is true, it's possible to call getResult() and stop processing... but also valid to continue calling addLine(), if this isn't a good place to stop.
183	Maybe a note like "this representation is chosen because it can be opaque to the UnwrappedLineParser, but the Formatter treats it appropriately" or something. I think it should be clear that this representation isn't really "natural" at this layer (or if it is, why). Maybe an example would help.
190	ASCII art of some sort would help :-)
193	nit: giving `getResult()` a side-effect but also making it idempotent is a bit clever/confusing. Either exposing `void finalize();` + `UnwrappedLine get() const`, or `UnwrappedLine takeResult() &&`, seems a little more obvious.
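A minimal sketch of the `takeResult() &&` alternative mentioned above. `ToyBuilder` and its token strings are hypothetical; the point is only the rvalue ref-qualifier, which makes the destructive nature of the call visible at the call site.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical builder standing in for a class that accumulates output and
// hands it over exactly once.
class ToyBuilder {
public:
  void add(std::string Token) { Tokens.push_back(std::move(Token)); }

  // Rvalue-qualified: only callable on an expiring object, e.g.
  // std::move(Builder).takeResult(). The result is moved out, so no
  // "already finalized" state flag is needed to make repeated calls safe.
  std::vector<std::string> takeResult() && {
    finalize();
    return std::move(Tokens);
  }

private:
  // Post-processing that runs once, right before the result is handed over.
  void finalize() { Tokens.push_back("<eol>"); }

  std::vector<std::string> Tokens;
};

// Usage: build, then explicitly give up the builder to obtain the result.
inline std::vector<std::string> buildExample() {
  ToyBuilder Builder;
  Builder.add("ID");
  Builder.add("(");
  Builder.add(")");
  return std::move(Builder).takeResult();
}
```

Calling `Builder.takeResult()` on an lvalue does not compile, which is the property the comment is after: the side effect can no longer hide behind a getter-looking name.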
287	consider calling these NextSpelled and EndSpelled to be really explicit? Since the type doesn't really clarify which sequence is being referred to.
292	I think this is a confusing use of "unexpanded". These macros that we're in the process of unexpanding. So the past tense doesn't seem right, they don't seem more "unexpanded" than they do "expanded", at least to me. Maybe ActiveExpansions or so?
clang/unittests/Format/MacroUnexpanderTest.cpp
541 ↗	(On Diff #301253)	nit: Result

sammccall added inline comments. Feb 3 2021, 7:56 AM

clang/lib/Format/MacroUnexpander.cpp
66 ↗	(On Diff #301253)	I find using all of spelled/expanded/unexpanded, input/incoming/output/outgoing, and original, unclear. (Particularly after OriginalParent is propagated around by that name without comment, and because "original" refers to the expanded structure, which is not the first one created) Can we call this "ExpandedParent in the expanded unwrapped line"?
66 ↗	(On Diff #301253)	Unexand -> unexpand
80 ↗	(On Diff #301253)	(in these examples throughout, #define BRACED(a) {a} would make it clearer that this is a macro definition and not code)
84 ↗	(On Diff #301253)	is it possible that you may need to unexpand more than the innermost macro? e.g. BRACED(ID(ID(BRACED(ID(ID(a)))))) expands to {{a}} and the parents of `a` and inner `{` each come from a couple of macro-levels up. (I don't totally understand the logic here, so the answer's probably "no"...)
103 ↗	(On Diff #301253)	I had trouble understanding what this function does at a high level: i.e. why you'd want this. Maybe:
	Adjusts the stack of active (unexpanded) lines so we're ready to push tokens. The tokens to be pushed are children of ExpandedParent (in the expanded code). This may entail:
	- creating a new line, if the parent is on the active line
	- popping active lines, if the parent is further up the stack
	and s/First/ForceNewLine/ to avoid documenting it? (impl looks good though once I understood what it does!)
156 ↗	(On Diff #301253)	nit: s/find/lookup/, then you only have to deal with pointers and this becomes a bit easier to read IMO
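To illustrate the find-versus-lookup nit above: `llvm::DenseMap::lookup` returns the mapped value, or a default-constructed one if the key is absent, so pointer-valued maps yield `nullptr` for missing keys. The sketch below mimics that with the standard library only; `lookupOrDefault`, `parentViaFind`, `ToyToken` and `ParentMap` are hypothetical names.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical token type; only its address matters here.
struct ToyToken {
  int Id = 0;
};

using ParentMap = std::unordered_map<const ToyToken *, ToyToken *>;

// Iterator-based version: the caller has to handle end() and ->second.
inline ToyToken *parentViaFind(const ParentMap &Map, const ToyToken *Key) {
  auto It = Map.find(Key);
  if (It == Map.end())
    return nullptr;
  return It->second;
}

// lookup-style helper: returns the mapped value, or a value-initialized one
// (nullptr for pointers) when the key is missing.
template <typename MapT>
typename MapT::mapped_type
lookupOrDefault(const MapT &Map, const typename MapT::key_type &Key) {
  auto It = Map.find(Key);
  return It == Map.end() ? typename MapT::mapped_type() : It->second;
}
```

With the helper, call sites deal with plain pointers, e.g. `if (ToyToken *P = lookupOrDefault(Map, Tok))`, instead of comparing iterators.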
173 ↗	(On Diff #301253)	This raises another point: a macro can have an empty body. AFAICT this fundamentally isn't supported here, as we're driven by the expanded token stream. I guess it makes sense to handle this elsewhere in clang-format (or even not handle it) but it should be documented somewhere.
237 ↗	(On Diff #301253)	assert that this number is equal to StartOfExpansion?
238 ↗	(On Diff #301253)	nit: index arithmetic obscures what's going on a bit. You could write
	ArrayRef<FormatToken *> StartedMacros = makeArrayRef(Token->MacroCtx->ExpandedFrom).drop_front(Unexpanded.size());
	for (FormatToken *ID : llvm::reverse(StartedMacros)) { ... }
	but up to you
	It's not obvious to me why we're iterating in reverse order BTW: i.e. why the order of the `Unexpanded` stack is opposite the order of the `ExpandedFrom` stack. So maybe just a comment to reinforce that, like (// ExpandedFrom is outside-in order, we want inside-out)
282 ↗	(On Diff #301253)	nit: Token->MacroCtx is checked at the callsite, and startUnexpansion asserts it as a precondition - do that here too for symmetry?
303 ↗	(On Diff #301253)	finishe -> finish
310 ↗	(On Diff #301253)	I can't work out if this is supposed to say comma or comment :-) If comma - is that a thing? If comment - why would a comment be considered part of the unexpansion of a macro invocation, rather than a sibling to the macro? How do we know what trails is a comment - should we have an assertion?
319 ↗	(On Diff #301253)	describe the return value, it's not obvious
345 ↗	(On Diff #301253)	nit: easier to see the three cases being handled independently (comma, rparen, lparen) if they're all siblings instead of grouping two together.
348 ↗	(On Diff #301253)	This line is both subtle and cryptic :-).
	// New unwrapped-lines from unexpansion are normally "chained" - treated as
	// children of the last token of the previous line.
	// Redirect them to be treated as children of the comma instead.
	// (only affects lines added before we push more tokens into MacroStructure.Line)
366 ↗	(On Diff #301253)	This feels like a stupid question, but how do we know that this non-macro-parenthesis has anything to do with macros? (Fairly sure the answer is "processNextUnexpanded() is only called when the token is macro-related", it'd be nice to have this precondition spelled out somewhere)
367 ↗	(On Diff #301253)	please assign to the struct fields or use /*Foo=*/ comments. All the data flow around this struct is local but the order of fields isn't :-)
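For reference, the options discussed for initializing such a struct look roughly like this. `ToyCallState` and its fields are hypothetical and do not reflect the actual struct in the patch; the sketch only contrasts positional initialization with argument comments of the form `/*Foo=*/` and with per-field assignment.

```cpp
#include <cassert>

// Hypothetical three-field state struct.
struct ToyCallState {
  int LineIndex = 0;
  int RedirectFrom = -1;
  int RedirectTo = -1;
};

// Positional aggregate initialization: correct, but the reader has to look up
// the field order to know which value is which.
inline ToyCallState makePositional() { return ToyCallState{3, 7, 9}; }

// Same values with argument comments naming each field.
inline ToyCallState makeCommented() {
  return ToyCallState{/*LineIndex=*/3, /*RedirectFrom=*/7, /*RedirectTo=*/9};
}

// Same values assigned field by field; independent of declaration order.
inline ToyCallState makeAssigned() {
  ToyCallState State;
  State.RedirectTo = 9;
  State.RedirectFrom = 7;
  State.LineIndex = 3;
  return State;
}

// Field-wise comparison used to show that all three forms agree.
inline bool sameState(const ToyCallState &A, const ToyCallState &B) {
  return A.LineIndex == B.LineIndex && A.RedirectFrom == B.RedirectFrom &&
         A.RedirectTo == B.RedirectTo;
}
```

A constructor with named parameters is a further option that keeps call sites short while documenting the order in one place.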
369 ↗	(On Diff #301253)	nit: = MacroCallStructure.back().RedirectParentTo This line + line 361 are the keys to understanding what the RedirectParentFrom/To do, so it'd be helpful for them to explicitly use those variables. (or use the source expressions for both... but in that case the fields may need docs)
396 ↗	(On Diff #301253)	nit: pull out a named reference for Output.Tokens.front() after the size assertion. assert(NullToken.is(tok::null) && !NullToken.children.empty()) may even obviate the need for a comment :-)
398 ↗	(On Diff #301253)	resulting
406 ↗	(On Diff #301253)	hmm, when is this possible? are these literal blank lines? where do they come from?
409 ↗	(On Diff #301253)	Isn't this just the end of line from the previous iteration of this loop? Why the global map populated by pushToken? (I feel like i'm missing something obvious here)
411 ↗	(On Diff #301253)	nit: again .lookup()
423 ↗	(On Diff #301253)	nit: i'd consider calling this appendToken (since lots of the stuff in this class deals with stacks but this doesn't) (I say a lot of nasty things about java but ArrayList.add is a 1000x better name than vector::push_back)
clang/lib/Format/Macros.h
164	any reason for std::map rather than DenseMap, here and elsewhere? (Only good reasons i'm aware of are sorting and pointer stability)
219	you have this "no expansions are open, but we already didn't find any" state. The effect of this is that finished() returns false so the caller keeps looping. But a correct caller will never rely on this:
	the first line a caller feeds must have macro tokens in it, or our output will be garbage AFAICT
	calling getResult() without feeding in any lines is definitely not correct
	It seems we could rather assert on these two conditions, and eliminate the Start/InProgress distinction. That way incorrect usage is an assertion crash, rather than turning into an infinite loop.
221	similarly, the InProgress/Finalized distinction would be eliminated by making `takeResult()` destructive, and requiring (through the type system or an assert) that it be called only once. It doesn't seem that it needs to be part of the NDEBUG runtime control flow.
250	This is very closely related to what you return from "getResult" - not quite the same, but Output vs Result doesn't seem to hint at the difference. Could we use the same name for both?
254	ActiveUnexpandedLines? (Line is very overloaded here)
279	ParentInExpandedToParentInUnexpanded? (current name implies that it maps a token to its parent. It also uses the input/output names, rather than expanded/unexpanded - it would be nice to be consistent)
315	nit: if you're going to specify SmallVector sizes, I don't understand why you'd size Unexpanded vs MacroCallStructure differently - they're going to be the same size, right? These days you can leave off the size entirely though, and have it pick a value
clang/unittests/Format/MacroUnexpanderTest.cpp
202 ↗	(On Diff #301253)	There's no test for a macro that expands to nothing - is this supported?
207 ↗	(On Diff #294294)	Yes - a basic test that it's not always true would be useful I think (maybe the `#define M ;x++` case would be useful for showing the expected loop and finished() values)

Work in review comments.

Harbormaster completed remote builds in B136238: Diff 390057. Nov 26 2021, 8:29 AM

Noticed I should have waited with the renaming of the file until the review is done :( Sorry for the extra confusion.

clang/lib/Format/MacroUnexpander.cpp
77 ↗	(On Diff #301253)	Changed to "given token" - it refers to the token. It's reconstructed, not spelled, like the example below explains: we do have hidden tokens that we want to find when we're in the middle of a recursive expansion. I wouldn't call the hidden tokens here "spelled" - would you?
82 ↗	(On Diff #301253)	Found out that I was missing a unit test. Added a unit test, and now explained the unit test here in the comment. PTAL.
84 ↗	(On Diff #301253)	A token on a higher level must always also be there on a lower level. Calls in this example are: BRACED({a}) ID({a}) ID({a}) BRACED(a) ID(a) ID(a) When the next token comes in, we thus always find it in the higher level (open) macro call. When we reconstruct for that token, we then reconstruct multiple macro calls at once, if it is the first token in multiple macro calls.
173 ↗	(On Diff #301253)	I think the right point to document and handle it is at the next layer, where we integrate all of this into clang-format.
238 ↗	(On Diff #301253)	Oooh, this is nice. s/drop_front/drop_back/. Added comment.
282 ↗	(On Diff #301253)	Thanks for spotting, this was from a time where the code looked differently (the assert above also didn't make sense any more in that form).
310 ↗	(On Diff #301253)	Added words. Re: the trailing comment, that's an idiosyncrasy of how clang-format handles trailing comments - it parses them with the token preceding them. We could probably spend more complexity on the call side, trying to break them out, but I'm not sure that's better, given that this point already needs to be fairly reliable against all user code.
348 ↗	(On Diff #301253)	This is partially captured in the comment above. Added a shorter description here, let me know if that's not enough.
366 ↗	(On Diff #301253)	Added a comment above the restructured !Token->MacroCtx check.
367 ↗	(On Diff #301253)	Added a constructor.
369 ↗	(On Diff #301253)	Line 361 was not actually necessary; I first thought that was a bug, but then dug in, and noticed that we weren't able to hit that code path (anymore, after restructuring). Completely renamed the members and documented them, added some more asserts to make things hopefully clearer - thanks for pointing it out, those were clearly underdocumented; I hope it's now more clear.
406 ↗	(On Diff #301253)	Looks like you're right; this looks like it was left over from a previous different structure, I'll make sure to fuzz it thoroughly.
409 ↗	(On Diff #301253)	Nice catch, the complexity here was also from a previous iteration where the structure of the algorithm was (even) more complex, where we needed to also do stitching for non-top-level lines. Now able to completely get rid of PreviousToken and TokenToPrevious \o/
clang/lib/Format/Macros.h
30–31	Said "annotates and formats" now - yes, the fact that annotating and formatting is so coupled is an admittedly weird choice of the initial design.
136	Thanks, that's a really good write-up!
143	`#define M ; x++` seems to be similarly tricky to `#define CLASSA(x) class A x`. We get multiple calls to addLine, in between which finished() is false.
193	Done.
219	Changed to asserts.
279	Called it SpelledParentToReconstructedParent.
287	Called them SpelledI and SpelledE in llvm tradition.
315	I did not know that, that's awesome!
clang/unittests/Format/MacroUnexpanderTest.cpp
207 ↗	(On Diff #294294)	We got a couple of tests like that, added checks for negative finished in between.
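
To illustrate the addLine()/finished() contract mentioned in the reply on Macros.h line 143, here is a minimal toy model. It is only a sketch and not the MacroCallReconstructor implementation: ToyToken, ToyReconstructor, the two counters and the helper functions are hypothetical names introduced for illustration, and the token split of the expanded code is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a token: its text plus how many macro
// expansions it opens and closes (assumed fields, for illustration only).
struct ToyToken {
  std::string Text;
  unsigned StartOfExpansion;
  unsigned EndOfExpansion;
};

// Toy model of the contract: lines are added one by one, and finished()
// only becomes true once every opened expansion has been closed again.
class ToyReconstructor {
public:
  void addLine(const std::vector<ToyToken> &Line) {
    for (const ToyToken &T : Line) {
      if (T.StartOfExpansion > 0)
        SawExpansion = true;
      OpenExpansions += T.StartOfExpansion;
      assert(OpenExpansions >= T.EndOfExpansion);
      OpenExpansions -= T.EndOfExpansion;
    }
  }

  bool finished() const { return SawExpansion && OpenExpansions == 0; }

private:
  unsigned OpenExpansions = 0;
  bool SawExpansion = false;
};

// Assuming "#define M ; x++", the spelled code "int a M;" expands to
// "int a; x++;", which arrives as two separate lines.
std::vector<ToyToken> firstExpandedLine() {
  return {{"int", 0, 0}, {"a", 0, 0}, {";", 1, 0}};
}

std::vector<ToyToken> secondExpandedLine() {
  return {{"x", 0, 0}, {"++", 0, 1}, {";", 0, 0}};
}
```

In this model, finished() is false after the first line is added, because the expansion of M is still open, and becomes true once the second line has been added.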

OK, I think we've reached the point where:

the impact is very clear, this solves a whole class of clang-format's biggest problems
the idea clearly works (there are good tests)
the implementation is very well-documented: I can't really improve my understanding further by asking for things to be better explained
I can't make clear suggestions to simplify - you've applied all my low-level suggestions and my high-level understanding is poor
I still don't feel like I really understand how it works, but that's not really different than the other big pieces of clang-format

So I think all that's left to do here is ship it. It makes me uneasy that the core of clang-format is functionally magic (could anyone other than you and Daniel reproduce it after nuclear apocalypse?) but this doesn't really change that state.

clang/lib/Format/MacroCallReconstructor.cpp
54	if you want to keep these LLVM_DEBUGs, maybe this should be "MCR" or so instead of "unex"?
63	you might want an assertion that Result has one token with one child (it's pretty obvious in finalize() but less so here)
99	nit: I think this would be clearer by naming the result: `if (bool PassedMacroComma = reconstruct...)` (because it's not clear from the name what the function returns, and documenting it would help only a little)
109	(this is the else branch of a negated condition, consider swapping the branches to avoid double-negative)
115	liens -> lines (unless there's a really weird metaphor going on here!)
233	this FIXME looks obsolete?
clang/lib/Format/Macros.h
190	nit: getResult()->takeResult() in comment now
201	you could give this a name like "tail form", and then refer to it in docs of MacroCallReconstructor::Result, in MacroCallReconstructor.cpp:482, etc. Up to you.
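
The nit on line 99 above (naming a bool result in the condition) can be sketched in isolation. This is a hedged, self-contained example, not code from the patch: skipUntil and startsNewArgument are hypothetical helpers operating on plain strings rather than on FormatToken.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: advances Pos until Items[Pos] == Target (or the end)
// and returns whether a "," was skipped on the way.
bool skipUntil(const std::vector<std::string> &Items, std::size_t &Pos,
               const std::string &Target) {
  bool PassedComma = false;
  while (Pos < Items.size() && Items[Pos] != Target) {
    PassedComma = PassedComma || Items[Pos] == ",";
    ++Pos;
  }
  return PassedComma;
}

// Declaring a named bool in the condition documents at the call site what
// the returned value means, without needing to look up the callee.
bool startsNewArgument(const std::vector<std::string> &Items, std::size_t &Pos,
                       const std::string &Target) {
  bool First = false;
  if (bool PassedComma = skipUntil(Items, Pos, Target))
    First = true;
  return First;
}
```

The variable is scoped to the if statement, so it adds a name without widening the surrounding scope.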

This revision is now accepted and ready to land. Jul 11 2022, 10:46 AM

Herald added a project: Restricted Project. Jul 11 2022, 10:46 AM

klimek marked 7 inline comments as done. Jul 11 2022, 12:48 PM

klimek added inline comments.

clang/lib/Format/Macros.h
201	I'm somewhat unhappy with the term "tail form"; happy to do this in a subsequent change if we find a better name.

Address review comments.

Harbormaster completed remote builds in B174724: Diff 443729. Jul 11 2022, 1:47 PM

This revision was landed with ongoing or failed builds. Jul 12 2022, 12:12 AM

Closed by commit rGd6d0dc1f4537: [clang-format] Add MacroUnexpander. (authored by klimek).

This revision was automatically updated to reflect the committed changes.

klimek added a commit: rGd6d0dc1f4537: [clang-format] Add MacroUnexpander..

Does this patch change the formatting behaviour of clang-format?

If so, are there any test cases that show before/after formatting? The MacroCallReconstructorTest unit test seems like it's testing an internal interface.

In D88299#3660772, @nridge wrote:

Does this patch change the formatting behaviour of clang-format?

If so, are there any test cases that show before/after formatting? The MacroCallReconstructorTest unit test seems like it's testing an internal interface.

No, this is prep-work for the real change, which I'm planning to send out soon.

Thanks! (I was intrigued by Sam's "solves a whole class of clang-format's biggest problems" comment :-))

In D88299#3660779, @nridge wrote:

Thanks! (I was intrigued by Sam's "solves a whole class of clang-format's biggest problems" comment :-))

The end-result hopefully will :)

A bit surprising such a big change...

It looks like some of the braces in the code should be removed for example those surrounding one-line for bodies. Sorry if it is not my job to point this out, but MyDeveloperDay has not said anything.
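
As a small illustration of the style point above, the two functions below behave identically; the second omits the braces around a one-line for body. This is a generic sketch with hypothetical function names, not code taken from the patch.

```cpp
#include <cassert>
#include <vector>

// Braces around a single-statement loop body.
int sumBraced(const std::vector<int> &Values) {
  int Sum = 0;
  for (int V : Values) {
    Sum += V;
  }
  return Sum;
}

// Same loop with the braces removed.
int sumUnbraced(const std::vector<int> &Values) {
  int Sum = 0;
  for (int V : Values)
    Sum += V;
  return Sum;
}
```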

@sstwcw thanks for pointing it out. See D134329.

In D88299#3804910, @owenpan wrote:

@sstwcw thanks for pointing it out. See D134329.

Thanks Owen, really appreciate this! And sorry for not getting to it yet myself :(

@klimek np!

rymiel mentioned this in D147176: [clang-format] NFC ensure Style operator== remains sorted for ease of editing. Mar 29 2023, 2:05 PM

@klimek can you have a look at https://github.com/llvm/llvm-project/issues/64275?

clang/lib/Format/MacroCallReconstructor.cpp
223	This fails as reported in https://github.com/llvm/llvm-project/issues/64275.

Herald added reviewers: rymiel, HazardyKnusperkeks, owenpan, MyDeveloperDay. Aug 5 2023, 1:01 PM

Revision Contents

Path	Size
clang/lib/Format/CMakeLists.txt	1 line
clang/lib/Format/FormatToken.h	9 lines
clang/lib/Format/MacroCallReconstructor.cpp	569 lines
clang/lib/Format/Macros.h	277 lines
clang/lib/Format/UnwrappedLineParser.h	3 lines
clang/unittests/Format/CMakeLists.txt	1 line
clang/unittests/Format/MacroCallReconstructorTest.cpp	688 lines

Diff 443839

clang/lib/Format/CMakeLists.txt

	set(LLVM_LINK_COMPONENTS support)			set(LLVM_LINK_COMPONENTS support)

	add_clang_library(clangFormat			add_clang_library(clangFormat
	AffectedRangeManager.cpp			AffectedRangeManager.cpp
	BreakableToken.cpp			BreakableToken.cpp
	ContinuationIndenter.cpp			ContinuationIndenter.cpp
	DefinitionBlockSeparator.cpp			DefinitionBlockSeparator.cpp
	Format.cpp			Format.cpp
	FormatToken.cpp			FormatToken.cpp
	FormatTokenLexer.cpp			FormatTokenLexer.cpp
				MacroCallReconstructor.cpp
	MacroExpander.cpp			MacroExpander.cpp
	NamespaceEndCommentsFixer.cpp			NamespaceEndCommentsFixer.cpp
	QualifierAlignmentFixer.cpp			QualifierAlignmentFixer.cpp
	SortJavaScriptImports.cpp			SortJavaScriptImports.cpp
	TokenAnalyzer.cpp			TokenAnalyzer.cpp
	TokenAnnotator.cpp			TokenAnnotator.cpp
	UnwrappedLineFormatter.cpp			UnwrappedLineFormatter.cpp
	UnwrappedLineParser.cpp			UnwrappedLineParser.cpp

clang/lib/Format/FormatToken.h

		public:
/// If this token starts a block, this contains all the unwrapped lines		/// If this token starts a block, this contains all the unwrapped lines
/// in it.		/// in it.
SmallVector<AnnotatedLine *, 1> Children;		SmallVector<AnnotatedLine *, 1> Children;

// Contains all attributes related to how this token takes part		// Contains all attributes related to how this token takes part
// in a configured macro expansion.		// in a configured macro expansion.
llvm::Optional<MacroExpansion> MacroCtx;		llvm::Optional<MacroExpansion> MacroCtx;

		/// When macro expansion introduces nodes with children, those are marked as
		sammccall (Not Done): I can't really understand from the comment when this is supposed to be set, and there are no tests of it. (The comment is vague: is a "parent" the inverse of FormatToken::Children here? Is this scenario when the parents in question are new, or their children are new, or both? What part of the code is "formatting", and why would it otherwise skip the children?)
		klimek (Author, Done): Rewrote comment.
		/// \c MacroParent.
		/// FIXME: The formatting code currently hard-codes the assumption that
		/// child nodes are introduced by blocks following an opening brace.
		/// This is deeply baked into the code and disentangling this will require
		/// significant refactorings. \c MacroParent allows us to special-case the
		/// cases in which we treat parents as block-openers for now.
		bool MacroParent = false;

bool is(tok::TokenKind Kind) const { return Tok.is(Kind); }		bool is(tok::TokenKind Kind) const { return Tok.is(Kind); }
bool is(TokenType TT) const { return getType() == TT; }		bool is(TokenType TT) const { return getType() == TT; }
bool is(const IdentifierInfo *II) const {		bool is(const IdentifierInfo *II) const {
return II && II == Tok.getIdentifierInfo();		return II && II == Tok.getIdentifierInfo();
}		}
bool is(tok::PPKeywordKind Kind) const {		bool is(tok::PPKeywordKind Kind) const {
return Tok.getIdentifierInfo() &&		return Tok.getIdentifierInfo() &&
Tok.getIdentifierInfo()->getPPKeywordID() == Kind;		Tok.getIdentifierInfo()->getPPKeywordID() == Kind;

clang/lib/Format/MacroCallReconstructor.cpp

This file was added.

				//===--- MacroCallReconstructor.cpp - Format C++ code -----------*- C++ -*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This file contains the implementation of MacroCallReconstructor, which fits
				/// a reconstructed macro call to a parsed set of UnwrappedLines.
				///
				//===----------------------------------------------------------------------===//

				#include "Macros.h"

				#include "UnwrappedLineParser.h"
				#include "clang/Basic/TokenKinds.h"
				#include "llvm/ADT/DenseSet.h"
				#include "llvm/Support/Debug.h"
				#include <cassert>

				#define DEBUG_TYPE "format-reconstruct"

				namespace clang {
				namespace format {

				// Call \p Call for each token in the unwrapped line given, passing
				// the token, its parent and whether it is the first token in the line.
				template <typename T>
				void forEachToken(const UnwrappedLine &Line, const T &Call,
				FormatToken *Parent = nullptr) {
				bool First = true;
				for (const auto &N : Line.Tokens) {
				Call(N.Tok, Parent, First);
				First = false;
				for (const auto &Child : N.Children) {
				forEachToken(Child, Call, N.Tok);
				}
				}
				}

				MacroCallReconstructor::MacroCallReconstructor(
				unsigned Level,
				const llvm::DenseMap<FormatToken *, std::unique_ptr<UnwrappedLine>>
				&ActiveExpansions)
				: Level(Level), IdToReconstructed(ActiveExpansions) {
				Result.Tokens.push_back(std::make_unique<LineNode>());
				ActiveReconstructedLines.push_back(&Result);
				}

				void MacroCallReconstructor::addLine(const UnwrappedLine &Line) {
				assert(State != Finalized);
				sammccall (Done): if you want to keep these LLVM_DEBUGs, maybe this should be "MCR" or so instead of "unex"?
				LLVM_DEBUG(llvm::dbgs() << "MCR: new line...\n");
				forEachToken(Line, [&](FormatToken *Token, FormatToken *Parent, bool First) {
				add(Token, Parent, First);
				});
				assert(InProgress || finished());
				}

				UnwrappedLine MacroCallReconstructor::takeResult() && {
				finalize();
				sammccall (Done): you might want an assertion that Result has one token with one child (it's pretty obvious in finalize() but less so here)
				assert(Result.Tokens.size() == 1 && Result.Tokens.front()->Children.size() == 1);
				UnwrappedLine Final =
				createUnwrappedLine(*Result.Tokens.front()->Children.front(), Level);
				assert(!Final.Tokens.empty());
				return Final;
				}

				// Reconstruct the position of the next \p Token, given its parent \p
				// ExpandedParent in the incoming unwrapped line. \p First specifies whether it
				// is the first token in a given unwrapped line.
				void MacroCallReconstructor::add(FormatToken *Token,
				FormatToken *ExpandedParent, bool First) {
				LLVM_DEBUG(
				llvm::dbgs() << "MCR: Token: " << Token->TokenText << ", Parent: "
				<< (ExpandedParent ? ExpandedParent->TokenText : "<null>")
				<< ", First: " << First << "\n");
				// In order to be able to find the correct parent in the reconstructed token
				// stream, we need to continue the last open reconstruction until we find the
				// given token if it is part of the reconstructed token stream.
				//
				// Note that hidden tokens can be part of the reconstructed stream in nested
				// macro calls.
				// For example, given
				// #define C(x, y) x y
				// #define B(x) {x}
				// And the call:
				// C(a, B(b))
				// The outer macro call will be C(a, {b}), and the hidden token '}' can be
				// found in the reconstructed token stream of that expansion level.
				// In the expanded token stream
				// a {b}
				// 'b' is a child of '{'. We need to continue the open expansion of the ','
				// in the call of 'C' in order to correctly set the ',' as the parent of '{',
				// so we later set the spelled token 'b' as a child of the ','.
				if (!ActiveExpansions.empty() && Token->MacroCtx &&
				(Token->MacroCtx->Role != MR_Hidden ||
				sammccall (Done): nit: I think this would be clearer by naming the result: `if (bool PassedMacroComma = reconstruct...)` (because it's not clear from the name what the function returns, and documenting it would help only a little)
				ActiveExpansions.size() != Token->MacroCtx->ExpandedFrom.size())) {
				if (bool PassedMacroComma = reconstructActiveCallUntil(Token))
				First = true;
				}

				prepareParent(ExpandedParent, First);

				if (Token->MacroCtx) {
				// If this token was generated by a macro call, add the reconstructed
				// equivalent of the token.
				sammccall (Done): (this is the else branch of a negated condition, consider swapping the branches to avoid double-negative)
				reconstruct(Token);
				} else {
				// Otherwise, we add it to the current line.
				appendToken(Token);
				}
				}
				sammccall (Done): liens -> lines (unless there's a really weird metaphor going on here!)

				// Adjusts the stack of active reconstructed lines so we're ready to push
				// tokens. The tokens to be pushed are children of ExpandedParent in the
				// expanded code.
				//
				// This may entail:
				// - creating a new line, if the parent is on the active line
				// - popping active lines, if the parent is further up the stack
				//
				// Postcondition:
				// ActiveReconstructedLines.back() is the line that has \p ExpandedParent or its
				// reconstructed replacement token as a parent (when possible) - that is, the
				// last token in \c ActiveReconstructedLines[ActiveReconstructedLines.size()-2]
				// is the parent of ActiveReconstructedLines.back() in the reconstructed
				// unwrapped line.
				void MacroCallReconstructor::prepareParent(FormatToken *ExpandedParent,
				bool NewLine) {
				LLVM_DEBUG({
				llvm::dbgs() << "ParentMap:\n";
				debugParentMap();
				});
				// We want to find the parent in the new unwrapped line, where the expanded
				// parent might have been replaced during reconstruction.
				FormatToken *Parent = getParentInResult(ExpandedParent);
				LLVM_DEBUG(llvm::dbgs() << "MCR: New parent: "
				<< (Parent ? Parent->TokenText : "<null>") << "\n");

				FormatToken *OpenMacroParent = nullptr;
				if (!MacroCallStructure.empty()) {
				// Inside a macro expansion, it is possible to lose track of the correct
				// parent - either because it is already popped, for example because it was
				// in a different macro argument (e.g. M({, })), or when we work on invalid
				// code.
				// Thus, we use the innermost macro call's parent as the parent at which
				// we stop; this allows us to stay within the macro expansion and keeps
				// any problems confined to the extent of the macro call.
				OpenMacroParent =
				getParentInResult(MacroCallStructure.back().MacroCallLParen);
				LLVM_DEBUG(llvm::dbgs()
				<< "MacroCallLParen: "
				<< MacroCallStructure.back().MacroCallLParen->TokenText
				<< ", OpenMacroParent: "
				<< (OpenMacroParent ? OpenMacroParent->TokenText : "<null>")
				<< "\n");
				}
				if (NewLine ||
				(!ActiveReconstructedLines.back()->Tokens.empty() &&
				Parent == ActiveReconstructedLines.back()->Tokens.back()->Tok)) {
				// If we are at the first token in a new line, we want to also
				// create a new line in the resulting reconstructed unwrapped line.
				while (ActiveReconstructedLines.back()->Tokens.empty() ||
				(Parent != ActiveReconstructedLines.back()->Tokens.back()->Tok &&
				ActiveReconstructedLines.back()->Tokens.back()->Tok !=
				OpenMacroParent)) {
				ActiveReconstructedLines.pop_back();
				assert(!ActiveReconstructedLines.empty());
				}
				assert(!ActiveReconstructedLines.empty());
				ActiveReconstructedLines.back()->Tokens.back()->Children.push_back(
				std::make_unique<Line>());
				ActiveReconstructedLines.push_back(
				&*ActiveReconstructedLines.back()->Tokens.back()->Children.back());
				} else if (parentLine().Tokens.back()->Tok != Parent) {
				// If we're not the first token in a new line, pop lines until we find
				// the child of \c Parent in the stack.
				while (Parent != parentLine().Tokens.back()->Tok &&
				parentLine().Tokens.back()->Tok &&
				parentLine().Tokens.back()->Tok != OpenMacroParent) {
				ActiveReconstructedLines.pop_back();
				assert(!ActiveReconstructedLines.empty());
				}
				}
				assert(!ActiveReconstructedLines.empty());
				}

				// For a given \p Parent in the incoming expanded token stream, find the
				// corresponding parent in the output.
				FormatToken *MacroCallReconstructor::getParentInResult(FormatToken *Parent) {
				FormatToken *Mapped = SpelledParentToReconstructedParent.lookup(Parent);
				if (!Mapped)
				return Parent;
				for (; Mapped; Mapped = SpelledParentToReconstructedParent.lookup(Parent)) {
				Parent = Mapped;
				}
				// If we use a different token than the parent in the expanded token stream
				// as parent, mark it as a special parent, so the formatting code knows it
				// needs to have its children formatted.
				Parent->MacroParent = true;
				return Parent;
				}

				// Reconstruct a \p Token that was expanded from a macro call.
				void MacroCallReconstructor::reconstruct(FormatToken *Token) {
				assert(Token->MacroCtx);
				// A single token can be the only result of a macro call:
				// Given: #define ID(x, y) ;
				// And the call: ID(<some>, <tokens>)
				// ';' in the expanded stream will reconstruct all of ID(<some>, <tokens>).
				if (Token->MacroCtx->StartOfExpansion) {
				startReconstruction(Token);
				// If the order of tokens in the expanded token stream is not the
				// same as the order of tokens in the reconstructed stream, we need
				// to reconstruct tokens that arrive later in the stream.
				if (Token->MacroCtx->Role != MR_Hidden) {
				reconstructActiveCallUntil(Token);
				}
				}
				assert(!ActiveExpansions.empty());
				owenpan (Not Done): This fails as reported in https://github.com/llvm/llvm-project/issues/64275.
				if (ActiveExpansions.back().SpelledI != ActiveExpansions.back().SpelledE) {
				assert(ActiveExpansions.size() == Token->MacroCtx->ExpandedFrom.size());
				if (Token->MacroCtx->Role != MR_Hidden) {
				// The current token in the reconstructed token stream must be the token
				// we're looking for - we either arrive here after startReconstruction,
				// which initiates the stream to the first token, or after
				// reconstructActiveCallUntil skipped until the expected token in the
				// reconstructed stream at the start of add(...).
				assert(ActiveExpansions.back().SpelledI->Tok == Token);
				processNextReconstructed();
				sammccall (Done): this FIXME looks obsolete?
				} else if (!currentLine()->Tokens.empty()) {
				// Map all hidden tokens to the last visible token in the output.
				// If the hidden token is a parent, we'll use the last visible
				// token as the parent of the hidden token's children.
				SpelledParentToReconstructedParent[Token] =
				currentLine()->Tokens.back()->Tok;
				} else {
				for (auto I = ActiveReconstructedLines.rbegin(),
				E = ActiveReconstructedLines.rend();
				I != E; ++I) {
				if (!(*I)->Tokens.empty()) {
				SpelledParentToReconstructedParent[Token] = (*I)->Tokens.back()->Tok;
				break;
				}
				}
				}
				}
				if (Token->MacroCtx->EndOfExpansion)
				endReconstruction(Token);
				}

				// Given a \p Token that starts an expansion, reconstruct the beginning of the
				// macro call.
				// For example, given: #define ID(x) x
				// And the call: ID(int a)
				// Reconstructs: ID(
				void MacroCallReconstructor::startReconstruction(FormatToken *Token) {
				assert(Token->MacroCtx);
				assert(!Token->MacroCtx->ExpandedFrom.empty());
				assert(ActiveExpansions.size() <= Token->MacroCtx->ExpandedFrom.size());
				#ifndef NDEBUG
				// Check that the token's reconstruction stack matches our current
				// reconstruction stack.
				for (size_t I = 0; I < ActiveExpansions.size(); ++I) {
				assert(ActiveExpansions[I].ID ==
				Token->MacroCtx
				->ExpandedFrom[Token->MacroCtx->ExpandedFrom.size() - 1 - I]);
				}
				#endif
				// Start reconstruction for all calls for which this token is the first token
				// generated by the call.
				// Note that the token's expanded from stack is inside-to-outside, and the
				// expansions for which this token is not the first are the outermost ones.
				ArrayRef<FormatToken *> StartedMacros =
				makeArrayRef(Token->MacroCtx->ExpandedFrom)
				.drop_back(ActiveExpansions.size());
				assert(StartedMacros.size() == Token->MacroCtx->StartOfExpansion);
				// We reconstruct macro calls outside-to-inside.
				for (FormatToken *ID : llvm::reverse(StartedMacros)) {
				// We found a macro call to be reconstructed; the next time our
				// reconstruction stack is empty we know we finished a reconstruction.
				#ifndef NDEBUG
				State = InProgress;
				#endif
				// Put the reconstructed macro call's token into our reconstruction stack.
				auto IU = IdToReconstructed.find(ID);
				assert(IU != IdToReconstructed.end());
				ActiveExpansions.push_back(
				{ID, IU->second->Tokens.begin(), IU->second->Tokens.end()});
				// Process the macro call's identifier.
				processNextReconstructed();
				if (ActiveExpansions.back().SpelledI == ActiveExpansions.back().SpelledE)
				continue;
				if (ActiveExpansions.back().SpelledI->Tok->is(tok::l_paren)) {
				// Process the optional opening parenthesis.
				processNextReconstructed();
				}
				}
				}

				// Add all tokens in the reconstruction stream to the output until we find the
				// given \p Token.
				bool MacroCallReconstructor::reconstructActiveCallUntil(FormatToken *Token) {
				assert(!ActiveExpansions.empty());
				bool PassedMacroComma = false;
				// FIXME: If Token was already expanded earlier, due to
				// a change in order, we will not find it, but need to
				// skip it.
				while (ActiveExpansions.back().SpelledI != ActiveExpansions.back().SpelledE &&
				ActiveExpansions.back().SpelledI->Tok != Token) {
				PassedMacroComma = processNextReconstructed() || PassedMacroComma;
				}
				return PassedMacroComma;
				}

				// End all reconstructions for which \p Token is the final token.
				void MacroCallReconstructor::endReconstruction(FormatToken *Token) {
				assert(Token->MacroCtx &&
				(ActiveExpansions.size() >= Token->MacroCtx->EndOfExpansion));
				for (size_t I = 0; I < Token->MacroCtx->EndOfExpansion; ++I) {
				#ifndef NDEBUG
				// Check all remaining tokens but the final closing parenthesis and optional
				// trailing comment were already reconstructed at an inner expansion level.
				for (auto T = ActiveExpansions.back().SpelledI;
				T != ActiveExpansions.back().SpelledE; ++T) {
				FormatToken *Token = T->Tok;
				bool ClosingParen = (std::next(T) == ActiveExpansions.back().SpelledE ||
				std::next(T)->Tok->isTrailingComment()) &&
				!Token->MacroCtx && Token->is(tok::r_paren);
				bool TrailingComment = Token->isTrailingComment();
				bool PreviousLevel =
				Token->MacroCtx &&
				(ActiveExpansions.size() < Token->MacroCtx->ExpandedFrom.size());
				if (!ClosingParen && !TrailingComment && !PreviousLevel) {
				llvm::dbgs() << "At token: " << Token->TokenText << "\n";
				}
				// In addition to the following cases, we can also run into this
				// when a macro call had more arguments than expected; in that case,
				// the comma and the remaining tokens in the macro call will potentially
				// end up in the line when we finish the expansion.
				// FIXME: Add the information which arguments are unused, and assert
				// one of the cases below plus reconstructed macro argument tokens.
				// assert(ClosingParen || TrailingComment || PreviousLevel);
				}
				#endif
				// Handle the remaining open tokens:
				// - expand the closing parenthesis, if it exists, including an optional
				// trailing comment
				// - handle tokens that were already reconstructed at an inner expansion
				// level
				// - handle tokens when a macro call had more than the expected number of
				// arguments, i.e. when #define M(x) is called as M(a, b, c) we'll end
				// up with the sequence ", b, c)" being open at the end of the
				// reconstruction; we want to gracefully handle that case
				//
				// FIXME: See the above debug-check for what we will need to do to be
				// able to assert this.
				for (auto T = ActiveExpansions.back().SpelledI;
				T != ActiveExpansions.back().SpelledE; ++T) {
				processNextReconstructed();
				}
				ActiveExpansions.pop_back();
				}
				}

				void MacroCallReconstructor::debugParentMap() const {
				llvm::DenseSet<FormatToken *> Values;
				for (const auto &P : SpelledParentToReconstructedParent)
				Values.insert(P.second);

				for (const auto &P : SpelledParentToReconstructedParent) {
				if (Values.contains(P.first))
				continue;
				llvm::dbgs() << (P.first ? P.first->TokenText : "<null>");
				for (auto I = SpelledParentToReconstructedParent.find(P.first),
				E = SpelledParentToReconstructedParent.end();
				I != E; I = SpelledParentToReconstructedParent.find(I->second)) {
				llvm::dbgs() << " -> " << (I->second ? I->second->TokenText : "<null>");
				}
				llvm::dbgs() << "\n";
				}
				}

				// If visible, add the next token of the reconstructed token sequence to the
				// output. Returns whether reconstruction passed a comma that is part of a
				// macro call.
				bool MacroCallReconstructor::processNextReconstructed() {
				FormatToken *Token = ActiveExpansions.back().SpelledI->Tok;
				++ActiveExpansions.back().SpelledI;
				if (Token->MacroCtx) {
				// Skip tokens that are not part of the macro call.
				if (Token->MacroCtx->Role == MR_Hidden) {
				return false;
				}
				// Skip tokens we already expanded during an inner reconstruction.
				// For example, given: #define ID(x) {x}
				// And the call: ID(ID(f))
				// We get two reconstructions:
				// ID(f) -> {f}
				// ID({f}) -> {{f}}
				// We reconstruct f during the first reconstruction, and skip it during the
				// second reconstruction.
				if (ActiveExpansions.size() < Token->MacroCtx->ExpandedFrom.size()) {
				return false;
				}
				}
				// Tokens that do not have a macro context are tokens that are part of the
				// macro call that have not taken part in expansion.
				if (!Token->MacroCtx) {
				// Put the parentheses and commas of a macro call into the same line;
				// if the arguments produce new unwrapped lines, they will become children
				// of the corresponding opening parenthesis or comma tokens in the
				// reconstructed call.
				if (Token->is(tok::l_paren)) {
				MacroCallStructure.push_back(MacroCallState(
				currentLine(), parentLine().Tokens.back()->Tok, Token));
				// All tokens that are children of the previous line's last token in the
				// reconstructed token stream will now be children of the l_paren token.
				// For example, for the line containing the macro calls:
				// auto x = ID({ID(2)});
				// We will build up a map <null> -> ( -> ( with the first and second
				// l_paren of the macro call respectively. New lines that come in with a
				// <null> parent will then become children of the l_paren token of the
				// currently innermost macro call.
				SpelledParentToReconstructedParent[MacroCallStructure.back()
				.ParentLastToken] = Token;
				appendToken(Token);
				prepareParent(Token, /*NewLine=*/true);
				Token->MacroParent = true;
				return false;
				}
				if (!MacroCallStructure.empty()) {
				if (Token->is(tok::comma)) {
				// Make new lines inside the next argument children of the comma token.
				SpelledParentToReconstructedParent
				[MacroCallStructure.back().Line->Tokens.back()->Tok] = Token;
				Token->MacroParent = true;
				appendToken(Token, MacroCallStructure.back().Line);
				prepareParent(Token, /*NewLine=*/true);
				return true;
				}
				if (Token->is(tok::r_paren)) {
				appendToken(Token, MacroCallStructure.back().Line);
				SpelledParentToReconstructedParent.erase(
				MacroCallStructure.back().ParentLastToken);
				MacroCallStructure.pop_back();
				return false;
				}
				}
				}
				// Note that any tokens that are tagged with MR_None have been passed as
				// arguments to the macro that have not been expanded, for example:
				// Given: #define ID(x) x
				// When calling: ID(a, b)
				// 'b' will be part of the reconstructed token stream, but tagged MR_None.
				// Given that erroring out in this case would be disruptive, we continue
				// pushing the (unformatted) token.
				// FIXME: This can lead to unfortunate formatting decisions - give the user
				// a hint that their macro definition is broken.
				appendToken(Token);
				return false;
				}

				void MacroCallReconstructor::finalize() {
				#ifndef NDEBUG
				assert(State != Finalized && finished());
				State = Finalized;
				#endif

				// We created corresponding unwrapped lines for each incoming line as children
				// of the toplevel null token.
				assert(Result.Tokens.size() == 1 && !Result.Tokens.front()->Children.empty());
				LLVM_DEBUG({
				llvm::dbgs() << "Finalizing reconstructed lines:\n";
				debug(Result, 0);
				});

				// The first line becomes the top level line in the resulting unwrapped line.
				LineNode &Top = *Result.Tokens.front();
				auto *I = Top.Children.begin();
				// Every subsequent line will become a child of the last token in the previous
				// line, which is the token prior to the first token in the line.
				LineNode *Last = (*I)->Tokens.back().get();
				++I;
				for (auto *E = Top.Children.end(); I != E; ++I) {
				assert(Last->Children.empty());
				Last->Children.push_back(std::move(*I));

				// Mark the previous line's last token as generated by a macro expansion
				// so the formatting algorithm can take that into account.
				Last->Tok->MacroParent = true;

				Last = Last->Children.back()->Tokens.back().get();
				}
				Top.Children.resize(1);
				}

				void MacroCallReconstructor::appendToken(FormatToken *Token, Line *L) {
				L = L ? L : currentLine();
				LLVM_DEBUG(llvm::dbgs() << "-> " << Token->TokenText << "\n");
				L->Tokens.push_back(std::make_unique<LineNode>(Token));
				}

				UnwrappedLine MacroCallReconstructor::createUnwrappedLine(const Line &Line,
				int Level) {
				UnwrappedLine Result;
				Result.Level = Level;
				for (const auto &N : Line.Tokens) {
				Result.Tokens.push_back(N->Tok);
				UnwrappedLineNode &Current = Result.Tokens.back();
				for (const auto &Child : N->Children) {
				if (Child->Tokens.empty())
				continue;
				Current.Children.push_back(createUnwrappedLine(*Child, Level + 1));
				}
				if (Current.Children.size() == 1 &&
				Current.Tok->isOneOf(tok::l_paren, tok::comma)) {
				Result.Tokens.splice(Result.Tokens.end(),
				Current.Children.front().Tokens);
				Current.Children.clear();
				}
				}
				return Result;
				}

				void MacroCallReconstructor::debug(const Line &Line, int Level) {
				for (int i = 0; i < Level; ++i)
				llvm::dbgs() << " ";
				for (const auto &N : Line.Tokens) {
				if (!N)
				continue;
				if (N->Tok)
				llvm::dbgs() << N->Tok->TokenText << " ";
				for (const auto &Child : N->Children) {
				llvm::dbgs() << "\n";
				debug(*Child, Level + 1);
				for (int i = 0; i < Level; ++i)
				llvm::dbgs() << " ";
				}
				}
				llvm::dbgs() << "\n";
				}

				MacroCallReconstructor::Line &MacroCallReconstructor::parentLine() {
				return **std::prev(std::prev(ActiveReconstructedLines.end()));
				}

				MacroCallReconstructor::Line *MacroCallReconstructor::currentLine() {
				return ActiveReconstructedLines.back();
				}

				MacroCallReconstructor::MacroCallState::MacroCallState(
				MacroCallReconstructor::Line *Line, FormatToken *ParentLastToken,
				FormatToken *MacroCallLParen)
				: Line(Line), ParentLastToken(ParentLastToken),
				MacroCallLParen(MacroCallLParen) {
				LLVM_DEBUG(
				llvm::dbgs() << "ParentLastToken: "
				<< (ParentLastToken ? ParentLastToken->TokenText : "<null>")
				<< "\n");

				assert(MacroCallLParen->is(tok::l_paren));
				}

				} // namespace format
				} // namespace clang
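
The stitching loop in finalize() above can be tried out in isolation. What follows is only a minimal sketch under stated assumptions, not the clang-format code: `Node`, `Line`, `makeLine` and `stitch` are hypothetical stand-ins (plain strings replace `FormatToken`, `std::vector` replaces `llvm::SmallVector`), and only the control flow is meant to mirror the loop in the diff.

```cpp
#include <cassert>
#include <initializer_list>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Line;

// Hypothetical stand-in for LineNode: a token plus the lines hanging off it.
struct Node {
  std::string Tok;          // Stand-in for FormatToken *Tok.
  bool MacroParent = false; // Stand-in for FormatToken::MacroParent.
  std::vector<std::unique_ptr<Line>> Children;
};

// Hypothetical stand-in for MacroCallReconstructor::Line.
struct Line {
  std::vector<std::unique_ptr<Node>> Tokens;
};

// Builds a line from token texts (illustration helper, not in the diff).
std::unique_ptr<Line> makeLine(std::initializer_list<std::string> Toks) {
  auto L = std::make_unique<Line>();
  for (const auto &T : Toks) {
    auto N = std::make_unique<Node>();
    N->Tok = T;
    L->Tokens.push_back(std::move(N));
  }
  return L;
}

// Mirrors the loop in finalize(): Top initially holds every reconstructed
// line as a child. Afterwards only the first line remains directly under
// Top, and each later line is a child of the last token of its predecessor.
// Every line is assumed to contain at least one token.
void stitch(Node &Top) {
  assert(!Top.Children.empty());
  auto I = Top.Children.begin();
  Node *Last = (*I)->Tokens.back().get();
  ++I;
  for (auto E = Top.Children.end(); I != E; ++I) {
    assert(Last->Children.empty());
    Last->Children.push_back(std::move(*I));
    // Flag the token that now carries the next line as its child.
    Last->MacroParent = true;
    Last = Last->Children.back()->Tokens.back().get();
  }
  // The moved-from entries are null; drop them.
  Top.Children.resize(1);
}
```

Feeding `stitch` a `Top` node whose children are the three lines `class A {`, `public :` and `void x ( ) ;` leaves one line under `Top`, with `public :` hanging off `{` and `void x ( ) ;` hanging off `:`. This is the single-line shape that the documentation of takeResult() in Macros.h describes.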

clang/lib/Format/Macros.h

//===--- MacroExpander.h - Format C++ code ----------------------*- C++ -*-===//		//===--- Macros.h - Format C++ code -----------------------------*- C++ -*-===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
///		///
/// \file		/// \file
/// This file contains the main building blocks of macro support in		/// This file contains the main building blocks of macro support in
/// clang-format.		/// clang-format.
///		///
/// In order to not violate the requirement that clang-format can format files		/// In order to not violate the requirement that clang-format can format files
/// in isolation, clang-format's macro support uses expansions users provide		/// in isolation, clang-format's macro support uses expansions users provide
/// as part of clang-format's style configuration.		/// as part of clang-format's style configuration.
///		///
/// Macro definitions are of the form "MACRO(p1, p2)=p1 + p2", but only support		/// Macro definitions are of the form "MACRO(p1, p2)=p1 + p2", but only support
/// one level of expansion (\see MacroExpander for a full description of what		/// one level of expansion (\see MacroExpander for a full description of what
/// is supported).		/// is supported).
///		///
/// As part of parsing, clang-format uses the MacroExpander to expand the		/// As part of parsing, clang-format uses the MacroExpander to expand the
/// spelled token streams into expanded token streams when it encounters a		/// spelled token streams into expanded token streams when it encounters a
/// macro call. The UnwrappedLineParser continues to parse UnwrappedLines		/// macro call. The UnwrappedLineParser continues to parse UnwrappedLines
/// from the expanded token stream.		/// from the expanded token stream.
/// After the expanded unwrapped lines are parsed, the MacroUnexpander matches		/// After the expanded unwrapped lines are parsed, the MacroCallReconstructor
/// the spelled token stream into unwrapped lines that best resemble the		/// matches the spelled token stream into unwrapped lines that best resemble the
/// structure of the expanded unwrapped lines.		/// structure of the expanded unwrapped lines. These reconstructed unwrapped
///		/// lines are aliasing the tokens in the expanded token stream, so that token
/// When formatting, clang-format formats the expanded unwrapped lines first,		/// annotations will be reused when formatting the spelled macro calls.
		sammccall (Done): This would be a good place to explicitly introduce the name "unexpanded" for what comes out of the unexpander. This para gives names to the token streams, but not as clearly to the UnwrappedLines parsed out of them. I think the fact that the tokens alias between the streams/lines, and so the final formatting of the expanded lines "writes through" tokentype etc to the unexpanded lines, is an important design point worth emphasizing. (This part is mostly structure around "what happens", with the data sets secondary - I think I'd personally find the reverse easier to follow but YMMV)
/// determining the token types. Next, it formats the spelled unwrapped lines,		///
/// keeping the token types fixed, while allowing other formatting decisions		/// When formatting, clang-format annotates and formats the expanded unwrapped
		sammccall (Not Done): would s/formats/annotates/ be inaccurate? This is just my poor understanding of the code, but it wasn't obvious to me that annotation is closely associated with formatting and not with parsing UnwrappedLines.
		klimek (Done): Said "annotates and formats" now - yes, the fact that annotating and formatting are so coupled is an admittedly weird choice of the initial design.
/// to change.		/// lines first, determining the token types. Next, it formats the spelled
		/// unwrapped lines, keeping the token types fixed, while allowing other
		/// formatting decisions to change.
///		///
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef CLANG_LIB_FORMAT_MACROS_H		#ifndef CLANG_LIB_FORMAT_MACROS_H
#define CLANG_LIB_FORMAT_MACROS_H		#define CLANG_LIB_FORMAT_MACROS_H

		#include <list>
		#include <map>
#include <string>		#include <string>
#include <unordered_map>
#include <vector>		#include <vector>

#include "Encoding.h"
#include "FormatToken.h"		#include "FormatToken.h"
#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"

namespace llvm {
class MemoryBuffer;
} // namespace llvm

namespace clang {		namespace clang {
class IdentifierTable;
class SourceManager;

namespace format {		namespace format {
struct FormatStyle;
		struct UnwrappedLine;
		struct UnwrappedLineNode;

/// Takes a set of macro definitions as strings and allows expanding calls to		/// Takes a set of macro definitions as strings and allows expanding calls to
/// those macros.		/// those macros.
///		///
/// For example:		/// For example:
/// Definition: A(x, y)=x + y		/// Definition: A(x, y)=x + y
/// Call : A(int a = 1, 2)		/// Call : A(int a = 1, 2)
/// Expansion : int a = 1 + 2		/// Expansion : int a = 1 + 2
(62 unchanged lines not shown)
		private:
clang::SourceManager &SourceMgr;		clang::SourceManager &SourceMgr;
const FormatStyle &Style;		const FormatStyle &Style;
llvm::SpecificBumpPtrAllocator<FormatToken> &Allocator;		llvm::SpecificBumpPtrAllocator<FormatToken> &Allocator;
IdentifierTable &IdentTable;		IdentifierTable &IdentTable;
std::vector<std::unique_ptr<llvm::MemoryBuffer>> Buffers;		std::vector<std::unique_ptr<llvm::MemoryBuffer>> Buffers;
llvm::StringMap<Definition> Definitions;		llvm::StringMap<Definition> Definitions;
};		};

		/// Converts a sequence of UnwrappedLines containing expanded macros into a
		sammccall (Not Done): "matches formatted lines" probably describes the hard technical problem it has to solve, but not so much what it does for the caller: what the transformation is between its inputs and its outputs. Is it something like: "Rewrites expanded code (containing tokens expanded from macros) into spelled code (containing tokens for the macro call itself). Token types are preserved, so macro arguments in the output have semantically-correct types from their expansion. This is the point of expansion/unexpansion: to allow this information to be used in formatting." [Is it just tokentypes? I guess it's also Role and MustBreakBefore and some other stuff like that?]
		klimek (Done): It's not the token info, this we'd trivially have by using the original token sequence which is still lying around (we re-use the same tokens). Reworded.
		/// single UnwrappedLine containing the macro calls. This UnwrappedLine may be
		sammccall (Done): I know it's the common case, but I think saying "the macro call" here is misleading, because it quickly becomes apparent reading the code that the scope isn't one macro call, and (at least for me) it's easy to get hung up on not understanding what the scope is. (AIUI the scope is actually one UL of output... so the use of plural there is also confusing). I think a description could be something like:
		  Converts a sequence of UnwrappedLines containing expanded macros into a single UnwrappedLine containing the macro calls. This UnwrappedLine may be broken into child lines, in a way that best conveys the structure of the expanded code. ...
		  In the simplest case, a spelled UnwrappedLine contains one macro, and after expanding it we have one expanded UnwrappedLine. In general, macro expansions can span UnwrappedLines, and multiple macros can contribute tokens to the same line. We keep consuming expanded lines until:
		  - all expansions that started have finished (we're not chopping any macros in half)
		  - and we've reached the end of a spelled unwrapped line
		  A single UnwrappedLine represents this chunk of code. After this point, the state of the spelled/expanded stream is "in sync" (both at the start of an UnwrappedLine, with no macros open), so the Unexpander can be thrown away and parsing can continue.
		(and then launch into an example)
		klimek (Done): Thanks, that's a really good write-up!
		/// broken into child lines, in a way that best conveys the structure of the
		/// expanded code.
		///
		/// In the simplest case, a spelled UnwrappedLine contains one macro, and after
		/// expanding it we have one expanded UnwrappedLine. In general, macro
		/// expansions can span UnwrappedLines, and multiple macros can contribute
		/// tokens to the same line. We keep consuming expanded lines until:
		sammccall (Not Done): I'm a bit confused by these arrows. It doesn't seem that they each point to an unwrappedline passed to addLine?
		klimek (Done): Why not? That is the intention? Note that high-level we do not pass class definitions in one unwrapped line; if they ever go into a single line it's done in the end by a special merging pass (yes, that's a bit confusing).
		sammccall (Not Done): This example didn't really help me understand the interface of this class, it seems to be a special case:
		  - the input is a single block construct (rather than e.g. a whole program), but it's not clear why that's the case.
		  - the output (unexpanded form) consists of exactly a macro call with no leading/trailing tokens, which isn't true in general
		If the idea is to provide as input the minimal range of lines that span the macro, we should say so somewhere. But I would like to understand why we're not simply processing the whole file.
		klimek (Done): Re: input is a single construct - that is not the case (see other comment). Re: leading / trailing tokens: I didn't want to make it too complex in the example.
		sammccall (Not Done):
		  > Re: input is a single construct - that is not the case (see other comment)
		  A class is definitely a single construct :-) It sounds like that's not significant to the MacroUnexpander though, so it feels like a red herring to me.
		  > Re: leading / trailing tokens: I didn't want to make it too complex in the example.
		  That seems fine, I think the complexities of the general case need to be mentioned somewhere because the API reflects them. But you're right, the primary example should be simple. I think a tricky example (maybe the `#define M ; x++` one?) could be given on one of addLine/finish/getResult maybe.
		klimek (Done): `#define M ; x++` seems to be similarly tricky to `#define CLASSA(x) class A x`. We get multiple calls to addLine, in between which finished() is false.
		/// * all expansions that started have finished (we're not chopping any macros
		/// in half)
		/// * and we've reached the end of a spelled unwrapped line.
		///
		/// A single UnwrappedLine represents this chunk of code.
		sammccall (Not Done): this says "creates the unwrapped lines" but getResult() returns only one. Does the plural here refer to the tree? Maybe just use singular or "the unwrappedlinetree"?
		klimek (Done): Fixed comment.
		///
		/// After this point, the state of the spelled/expanded stream is "in sync"
		/// (both at the start of an UnwrappedLine, with no macros open), so the
		/// Unexpander can be thrown away and parsing can continue.
		///
		/// Given a mapping from the macro name identifier token in the macro call
		/// to the tokens of the macro call, for example:
		sammccall (Not Done): I get the symmetry between the expander/unexpander classes, but the name is making it harder for me to follow the code.
		  - the extra compound+negation in the name makes it hard/slow to understand, as I'm always thinking first about expansion
		  - the fact that expansion/unexpansion are not actually opposites completely breaks my intuition. It also creates two different meanings of "unexpanded" that led me down the garden path a bit (e.g. in the test).
		Would you consider `MacroCollapser`? It's artificial, but expand/collapse are well-known antonyms without being the same word. (Incidentally, I just realized this is why I find "UnwrappedLine" so slippery - the ambiguity between "this isn't wrapped" and "this was unwrapped")
		klimek (Done): Happy to rename, but first wanted to check in - I use "unexpanded token stream" quite often to refer to the macro call. Perhaps we should also find different wording for that then? Perhaps we should call this MacroLineMatcher btw? This is not creating anything new - the resulting token sequence is the "unexpanded token sequence" with the exact same tokens, the special thing is that they're matched into unwrapped lines.
		sammccall (Done): I think unexpanded/unexpander are reasonable names, having understood this better, but with caveats. It's important to distinguish between the pre-expansion state ("spelled"?) and the post-unexpansion state ("unexpanded"?). The UnwrappedLines are vitally different, but the token stream is the same as you point out. When referring to the token stream, I think "spelled" is probably less confusing (that's where the token stream fundamentally comes from), and explicitly mentioning somewhere that the token sequence encoded by the unexpanded lines is the same original spelled stream.
		/// CLASSA -> CLASSA({public: void x();})
		///
		/// When getting the formatted lines of the expansion via the \c addLine method
		/// (each '->' specifies a call to \c addLine ):
		/// -> class A {
		/// -> public:
		/// -> void x();
		/// -> };
		///
		sammccall (Done): any reason for std::map rather than DenseMap, here and elsewhere? (Only good reasons I'm aware of are sorting and pointer stability)
		/// Creates the tree of unwrapped lines containing the macro call tokens so that
		sammccall (Not Done): I find this hard to follow.
		  - again, the "match" part is an implementation detail that sounds interesting but isn't :-)
		  - "from a macro" sounds like one in particular, but is actually every macro
		  - the bit about "getResult" is a separate point that feels shoehorned in
		What about: "Replaces tokens that were expanded from macros with the original macro calls. The result is stored and available in getResult()"
		klimek (Done): I think the match part is important, as it's matching unwrapped lines, which is the heart of the algorithm.
		/// the macro call tokens fit the semantic structure of the expanded formatted
		/// lines:
		/// -> CLASSA({
		/// -> public:
		/// -> void x();
		/// -> })
		class MacroCallReconstructor {
		public:
		/// Create a Reconstructor whose resulting \p UnwrappedLine will start at
		/// \p Level, using the map from name identifier token to the corresponding
		sammccall (Done): const? or do we not care
		sammccall (Done): Maybe comment that when finished() is true, it's possible to call getResult() and stop processing... but also valid to continue calling addLine(), if this isn't a good place to stop.
		/// tokens of the spelled macro call.
		sammccall (Not Done): how can this be the case if the input can have multiple lines and the output only one? Is the return value a synthetic parent of the translated lines? Or is there a hidden requirement on the caller here that we don't keep feeding in lines unless we're continuing a macro call and therefore know we'll end up with one line? This stuff could be clarified in docs but again I have to ask, can't we sidestep the whole issue by processing the whole file and returning all the lines? (This is somewhat answered in the implementation, though that doesn't seem like the right place)
		klimek (Done): Reworded; the reason why we have the single-line anyway is that:
		  - a macro call is something we generally want to have in one unwrapped line
		  - the tokens (or other macro calls) that go into the same instance of MacroUnexpander only consist of tokens that do not have an unwrapped line break and macro calls
		Thus, we want the output to be in a single unwrapped line, as we're otherwise majorly confusing ~everything else in the formatter.
		MacroCallReconstructor(
		unsigned Level,
		const llvm::DenseMap<FormatToken *, std::unique_ptr<UnwrappedLine>>
		&ActiveExpansions);

		/// For the given \p Line, match all occurrences of tokens expanded from a
		/// macro to unwrapped lines in the spelled macro call so that the resulting
		sammccall (Done): Maybe a note like "this representation is chosen because it can be opaque to the UnwrappedLineParser, but the Formatter treats it appropriately" or something. I think it should be clear that this representation isn't really "natural" at this layer (or if it is, why). Maybe an example would help.
		/// tree of unwrapped lines best resembles the structure of unwrapped lines
		/// passed in via \c addLine.
		void addLine(const UnwrappedLine &Line);

		/// Check whether at the current state there is no open macro expansion
		/// that needs to be processed to finish a macro call.
		/// Only when \c finished() is true, \c takeResult() can be called to retrieve
		sammccall (Done): ASCII art of some sort would help :-)
		sammccall (Done): nit: getResult()->takeResult() in comment now
		/// the resulting \c UnwrappedLine.
		/// If there are multiple subsequent macro calls within an unwrapped line in
		/// the spelled token stream, the calling code may also continue to call
		sammccall (Done): nit: giving `getResult()` a side-effect but also making it idempotent is a bit clever/confusing. Either exposing `void finalize();` + `UnwrappedLine get() const`, or `UnwrappedLine takeResult() &&`, seems a little more obvious.
		klimek (Done): Done.
		/// \c addLine() when \c finished() is true.
		bool finished() const { return ActiveExpansions.empty(); }

		/// Retrieve the formatted \c UnwrappedLine containing the original
		/// macro calls, formatted according to the expanded token stream received
		/// via \c addLine().
		/// Generally, this line tries to have the same structure as the expanded,
		/// formatted unwrapped lines handed in via \c addLine(), with the exception
		sammccall (Not Done): you could give this a name like "tail form", and then refer to it in docs of MacroCallReconstructor::Result, in MacroCallReconstructor.cpp:482, etc. Up to you.
		klimek (Done): I'm somewhat unhappy with the term "tail form"; happy to do this in a subsequent change if we find a better name.
		/// that for multiple top-level lines, each subsequent line will be the
		/// child of the last token in its predecessor. This representation is chosen
		/// because it is a precondition to the formatter that we get what looks like
		/// a single statement in a single \c UnwrappedLine (i.e. matching parens).
		///
		/// If a token in a macro argument is a child of a token in the expansion,
		/// the parent will be the corresponding token in the macro call.
		/// For example:
		/// #define C(a, b) class C { a b
		/// C(int x;, int y;)
		/// would expand to
		/// class C { int x; int y;
		/// where in a formatted line "int x;" and "int y;" would both be new separate
		/// lines.
		///
		/// In the result, "int x;" will be a child of the opening parenthesis in "C("
		/// and "int y;" will be a child of the "," token:
		/// C (
		sammccall (Done): you have this "no expansions are open, but we already didn't find any" state. The effect of this is that finished() returns false so the caller keeps looping. But a correct caller will never rely on this:
		  - the first line a caller feeds must have macro tokens in it, or our output will be garbage AFAICT
		  - calling getResult() without feeding in any lines is definitely not correct
		It seems we could rather assert on these two conditions, and eliminate the Start/InProgress distinction. That way incorrect usage is an assertion crash, rather than turning into an infinite loop.
		klimek (Done): Changed to asserts.
		/// \- int x;
		/// ,
		sammccall (Done): similarly, the InProgress/Finalized distinction would be eliminated by making `takeResult()` destructive, and requiring (through the type system or an assert) that it be called only once. It doesn't seem that it needs to be part of the NDEBUG runtime control flow.
		/// \- int y;
		/// )
		UnwrappedLine takeResult() &&;

		private:
		void add(FormatToken *Token, FormatToken *ExpandedParent, bool First);
		void prepareParent(FormatToken *ExpandedParent, bool First);
		FormatToken *getParentInResult(FormatToken *Parent);
		void reconstruct(FormatToken *Token);
		void startReconstruction(FormatToken *Token);
		bool reconstructActiveCallUntil(FormatToken *Token);
		void endReconstruction(FormatToken *Token);
		bool processNextReconstructed();
		void finalize();

		struct Line;

		sammccall (Not Done): The explanation here seems to be proving the converse: if we didn't use this representation, then the code wouldn't work. However what I'm missing is an explanation of why it is correct/sensible. After staring at the tests, I'm still not sure, since the tests seem to postprocess the "natural" output the same way before asserting equality. My tentative conclusion is it would be clearest to move this "in the end" step to the caller of getResult(), as it seems to have more to do with formatting than unexpansion. But I haven't looked in detail at that caller, maybe I'm wrong...
		klimek (Done): This is basically what I wrote before - in the end, that the expanded code creates multiple unwrapped lines is the weird thing, as the input is fundamentally a single unwrapped line (the macro call plus a bit of stuff around it). Thus, it's quite natural for the unexpander to return a single unwrapped line that represents the original structure. Not sure how to best put this in words.
		void appendToken(FormatToken *Token, Line *L = nullptr);
		UnwrappedLine createUnwrappedLine(const Line &Line, int Level);
		void debug(const Line &Line, int Level);
		Line &parentLine();
		Line *currentLine();
		void debugParentMap() const;

		#ifndef NDEBUG
		enum ReconstructorState {
		Start, // No macro expansion was found in the input yet.
		InProgress, // During a macro reconstruction.
		Finalized, // Past macro reconstruction, the result is finalized.
		sammccall (Done): This is very closely related to what you return from "getResult" - not quite the same, but Output vs Result doesn't seem to hint at the difference. Could we use the same name for both?
		};
		ReconstructorState State = Start;
		#endif

		sammccall (Done): ActiveUnexpandedLines? (Line is very overloaded here)
		// Node in which we build up the resulting unwrapped line; this type is
		// analogous to UnwrappedLineNode.
		struct LineNode {
		LineNode() = default;
		LineNode(FormatToken *Tok) : Tok(Tok) {}
		FormatToken *Tok = nullptr;
		llvm::SmallVector<std::unique_ptr<Line>> Children;
		};

		// Line in which we build up the resulting unwrapped line.
		// FIXME: Investigate changing UnwrappedLine to a pointer type and using it
		// instead of rolling our own type.
		struct Line {
		llvm::SmallVector<std::unique_ptr<LineNode>> Tokens;
		};

		// The line in which we collect the resulting reconstructed output.
		// To reduce special cases in the algorithm, the first level of the line
		// contains a single null token that has the reconstructed incoming
		// lines as children.
		// In the end, we stitch the lines together so that each subsequent line
		// is a child of the last token of the previous line. This is necessary
		// in order to format the overall expression as a single logical line -
		// if we created separate lines, we'd format them with their own top-level
		// indent depending on the semantic structure, which is not desired.
		sammccall (Done): ParentInExpandedToParentInUnexpanded? (current name implies that it maps a token to its parent. It also uses the input/output names, rather than expanded/unexpanded - it would be nice to be consistent)
		klimek (Done): Called it SpelledParentToReconstructedParent.
		Line Result;

		// Stack of currently "open" lines, where each line's predecessor's last
		// token is the parent token for that line.
		llvm::SmallVector<Line *> ActiveReconstructedLines;

		// Maps from the expanded token to the token that takes its place in the
		// reconstructed token stream in terms of parent-child relationships.
		sammccall (Done): consider calling these NextSpelled and EndSpelled to be really explicit? Since the type doesn't really clarify which sequence is being referred to.
		klimek (Done): Called them SpelledI and SpelledE in llvm tradition.
		// Note that it might take multiple steps to arrive at the correct
		// parent in the output.
		// Given: #define C(a, b) []() { a; b; }
		// And a call: C(f(), g())
		// The structure in the incoming formatted unwrapped line will be:
		sammccall (Done): I think this is a confusing use of "unexpanded". These are macros that we're in the process of unexpanding. So the past tense doesn't seem right, they don't seem more "unexpanded" than they do "expanded", at least to me. Maybe ActiveExpansions or so?
		// []() {
		// |- f();
		// \- g();
		// }
		// with f and g being children of the opening brace.
		// In the reconstructed call:
		// C(f(), g())
		// \- f()
		// \- g()
		// We want f to be a child of the opening parenthesis and g to be a child
		// of the comma token in the macro call.
		// Thus, we map
		// { -> (
		// and add
		// ( -> ,
		// once we're past the comma in the reconstruction.
		llvm::DenseMap<FormatToken *, FormatToken *>
		SpelledParentToReconstructedParent;

		// Keeps track of a single expansion while we're reconstructing tokens it
		// generated.
		struct Expansion {
		// The identifier token of the macro call.
		sammccall: nit: if you're going to specify SmallVector sizes, I don't understand why you'd size Unexpanded and MacroCallStructure differently; they're going to be the same size, right? These days you can leave off the size entirely, though, and have it pick a value.
		klimek: I did not know that, that's awesome!
		FormatToken *ID;
		// Our current position in the reconstruction.
		std::list<UnwrappedLineNode>::iterator SpelledI;
		// The end of the reconstructed token sequence.
		std::list<UnwrappedLineNode>::iterator SpelledE;
		};

		// Stack of macro calls for which we're in the middle of an expansion.
		llvm::SmallVector<Expansion> ActiveExpansions;

		struct MacroCallState {
		MacroCallState(Line *Line, FormatToken *ParentLastToken,
		FormatToken *MacroCallLParen);

		Line *Line;

		// The last token in the parent line or expansion, or nullptr if the macro
		// expansion is on a top-level line.
		//
		// For example, in the macro call:
		// auto f = []() { ID(1); };
		// The MacroCallState for ID will have '{' as ParentLastToken.
		//
		// In the macro call:
		// ID(ID(void f()));
		// The MacroCallState of the outer ID will have nullptr as ParentLastToken,
		// while the MacroCallState for the inner ID will have the '(' of the outer
		// ID as ParentLastToken.
		//
		// In the macro call:
		// ID2(a, ID(b));
		// The MacroCallState of ID will have ',' as ParentLastToken.
		FormatToken *ParentLastToken;

		// The l_paren of this MacroCallState's macro call.
		FormatToken *MacroCallLParen;
		};

		// Keeps track of the lines into which the opening brace/parenthesis &
		// argument separating commas for each level in the macro call go in order to
		// put the corresponding closing brace/parenthesis into the same line in the
		// output and keep track of which parents in the expanded token stream map to
		// which tokens in the reconstructed stream.
		// When an opening brace/parenthesis has children, we want the structure of
		// the output line to be:
		// |- MACRO
		// |- (
		// |  \- <argument>
		// |- ,
		// |  \- <argument>
		// \- )
		llvm::SmallVector<MacroCallState> MacroCallStructure;

		// Level the generated UnwrappedLine will be at.
		const unsigned Level;

		// Maps from identifier of the macro call to an unwrapped line containing
		// all tokens of the macro call.
		const llvm::DenseMap<FormatToken *, std::unique_ptr<UnwrappedLine>>
		&IdToReconstructed;
		};

} // namespace format		} // namespace format
} // namespace clang		} // namespace clang

#endif		#endif
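
The comment on SpelledParentToReconstructedParent above notes that it can take multiple steps to arrive at the correct parent in the output. A minimal standalone sketch of such a chained lookup is given here. It models tokens as strings instead of FormatToken pointers, and the names ParentMap and resolveParent are hypothetical; they are not part of the patch.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the DenseMap<FormatToken *, FormatToken *>;
// tokens are modelled as strings to keep the sketch self-contained.
using ParentMap = std::unordered_map<std::string, std::string>;

// Follows the mapping until no further entry exists and returns the last
// token reached. Assumes the map contains no cycles.
std::string resolveParent(const ParentMap &Map, std::string Tok) {
  for (auto It = Map.find(Tok); It != Map.end(); It = Map.find(Tok))
    Tok = It->second;
  return Tok;
}
```

With only `{ -> (` recorded, a child of `{` resolves to `(`, which corresponds to f() in the example from the comment. Once `( -> ,` has been added, the same lookup resolves to `,`, which corresponds to g().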

clang/lib/Format/UnwrappedLineParser.h

	[14 lines not shown]
	#ifndef LLVM_CLANG_LIB_FORMAT_UNWRAPPEDLINEPARSER_H			#ifndef LLVM_CLANG_LIB_FORMAT_UNWRAPPEDLINEPARSER_H
	#define LLVM_CLANG_LIB_FORMAT_UNWRAPPEDLINEPARSER_H			#define LLVM_CLANG_LIB_FORMAT_UNWRAPPEDLINEPARSER_H

	#include "FormatToken.h"			#include "FormatToken.h"
	#include "clang/Basic/IdentifierTable.h"			#include "clang/Basic/IdentifierTable.h"
	#include "clang/Format/Format.h"			#include "clang/Format/Format.h"
	#include "llvm/ADT/BitVector.h"			#include "llvm/ADT/BitVector.h"
	#include "llvm/Support/Regex.h"			#include "llvm/Support/Regex.h"
				#include <list>
	#include <stack>			#include <stack>
	#include <vector>			#include <vector>

	namespace clang {			namespace clang {
	namespace format {			namespace format {

	struct UnwrappedLineNode;			struct UnwrappedLineNode;

	/// An unwrapped line is a sequence of \c Token, that we would like to			/// An unwrapped line is a sequence of \c Token, that we would like to
	/// put on a single line if there was no column limit.			/// put on a single line if there was no column limit.
	///			///
	/// This is used as a main interface between the \c UnwrappedLineParser and the			/// This is used as a main interface between the \c UnwrappedLineParser and the
	/// \c UnwrappedLineFormatter. The key property is that changing the formatting			/// \c UnwrappedLineFormatter. The key property is that changing the formatting
	/// within an unwrapped line does not affect any other unwrapped lines.			/// within an unwrapped line does not affect any other unwrapped lines.
	struct UnwrappedLine {			struct UnwrappedLine {
	UnwrappedLine();			UnwrappedLine();

	/// The \c Tokens comprising this \c UnwrappedLine.			/// The \c Tokens comprising this \c UnwrappedLine.
	std::vector<UnwrappedLineNode> Tokens;			std::list<UnwrappedLineNode> Tokens;

	/// The indent level of the \c UnwrappedLine.			/// The indent level of the \c UnwrappedLine.
	unsigned Level;			unsigned Level;

	/// Whether this \c UnwrappedLine is part of a preprocessor directive.			/// Whether this \c UnwrappedLine is part of a preprocessor directive.
	bool InPPDirective;			bool InPPDirective;

	bool MustBeDeclaration;			bool MustBeDeclaration;
	[301 lines not shown]
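
The right-hand side changes Tokens from std::vector to std::list, and the Expansion struct in Macros.h stores std::list<UnwrappedLineNode>::iterator positions (SpelledI and SpelledE). One standard-library property that fits this combination is that std::list iterators stay valid when other elements are inserted, which is not true of std::vector. Whether that was the motivation for the change is an assumption; the sketch below only illustrates the guarantee itself, using ints and a hypothetical helper name.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Inserts Value at the front, at the back, and directly before Held, then
// returns what the held iterator points at. For std::list this is
// well-defined, because insertion never invalidates existing iterators.
int valueAfterInsertions(std::list<int> &L, std::list<int>::iterator Held,
                         int Value) {
  L.push_front(Value);
  L.push_back(Value);
  L.insert(Held, Value); // inserts before Held; Held itself stays valid
  return *Held;
}
```

The same sequence of operations on a std::vector could reallocate the storage and leave the held iterator dangling.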

clang/unittests/Format/CMakeLists.txt

[12 lines not shown]	add_clang_unittest(FormatTests
FormatTestJson.cpp		FormatTestJson.cpp
FormatTestObjC.cpp		FormatTestObjC.cpp
FormatTestProto.cpp		FormatTestProto.cpp
FormatTestRawStrings.cpp		FormatTestRawStrings.cpp
FormatTestSelective.cpp		FormatTestSelective.cpp
FormatTestTableGen.cpp		FormatTestTableGen.cpp
FormatTestTextProto.cpp		FormatTestTextProto.cpp
FormatTestVerilog.cpp		FormatTestVerilog.cpp
		MacroCallReconstructorTest.cpp
MacroExpanderTest.cpp		MacroExpanderTest.cpp
NamespaceEndCommentsFixerTest.cpp		NamespaceEndCommentsFixerTest.cpp
QualifierFixerTest.cpp		QualifierFixerTest.cpp
SortImportsTestJS.cpp		SortImportsTestJS.cpp
SortImportsTestJava.cpp		SortImportsTestJava.cpp
SortIncludesTest.cpp		SortIncludesTest.cpp
UsingDeclarationsSorterTest.cpp		UsingDeclarationsSorterTest.cpp
TokenAnnotatorTest.cpp		TokenAnnotatorTest.cpp
[9 lines not shown]

clang/unittests/Format/MacroCallReconstructorTest.cpp

This file was added.

				#include "../../lib/Format/Macros.h"
				#include "../../lib/Format/UnwrappedLineParser.h"
				#include "TestLexer.h"
				#include "llvm/ADT/ArrayRef.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/ADT/StringRef.h"

				#include "gmock/gmock.h"
				#include "gtest/gtest.h"
				#include <map>
				#include <memory>
				#include <vector>

				namespace clang {
				namespace format {
				namespace {

				using UnexpandedMap =
				llvm::DenseMap<FormatToken *, std::unique_ptr<UnwrappedLine>>;

				// Keeps track of a sequence of macro expansions.
				//
				// The expanded tokens are accessible via getTokens(), while a map of macro call
				// identifier token to unexpanded token stream is accessible via
				// getUnexpanded().
				class Expansion {
				public:
				Expansion(TestLexer &Lex, MacroExpander &Macros) : Lex(Lex), Macros(Macros) {}

				// Appends the token stream obtained from expanding the macro Name given
				// the provided arguments, to be later retrieved with getTokens().
				// Returns the list of tokens making up the unexpanded macro call.
				TokenList
				expand(llvm::StringRef Name,
				const SmallVector<llvm::SmallVector<FormatToken *, 8>, 1> &Args) {
				auto *ID = Lex.id(Name);
				auto UnexpandedLine = std::make_unique<UnwrappedLine>();
				UnexpandedLine->Tokens.push_back(ID);
				if (!Args.empty()) {
				UnexpandedLine->Tokens.push_back(Lex.id("("));
				for (auto I = Args.begin(), E = Args.end(); I != E; ++I) {
				if (I != Args.begin())
				UnexpandedLine->Tokens.push_back(Lex.id(","));
				UnexpandedLine->Tokens.insert(UnexpandedLine->Tokens.end(), I->begin(),
				I->end());
				}
				UnexpandedLine->Tokens.push_back(Lex.id(")"));
				}
				Unexpanded[ID] = std::move(UnexpandedLine);

				auto Expanded = uneof(Macros.expand(ID, Args));
				Tokens.append(Expanded.begin(), Expanded.end());

				TokenList UnexpandedTokens;
				for (const UnwrappedLineNode &Node : Unexpanded[ID]->Tokens) {
				UnexpandedTokens.push_back(Node.Tok);
				}
				return UnexpandedTokens;
				}

				TokenList expand(llvm::StringRef Name,
				const std::vector<std::string> &Args = {}) {
				return expand(Name, lexArgs(Args));
				}

				const UnexpandedMap &getUnexpanded() const { return Unexpanded; }

				const TokenList &getTokens() const { return Tokens; }

				private:
				llvm::SmallVector<TokenList, 1>
				lexArgs(const std::vector<std::string> &Args) {
				llvm::SmallVector<TokenList, 1> Result;
				for (const auto &Arg : Args) {
				Result.push_back(uneof(Lex.lex(Arg)));
				}
				return Result;
				}
				llvm::DenseMap<FormatToken *, std::unique_ptr<UnwrappedLine>> Unexpanded;
				llvm::SmallVector<FormatToken *, 8> Tokens;
				TestLexer &Lex;
				MacroExpander &Macros;
				};

				struct Chunk {
				Chunk(llvm::ArrayRef<FormatToken *> Tokens)
				: Tokens(Tokens.begin(), Tokens.end()) {}
				Chunk(llvm::ArrayRef<UnwrappedLine> Children)
				: Children(Children.begin(), Children.end()) {}
				llvm::SmallVector<UnwrappedLineNode, 1> Tokens;
				llvm::SmallVector<UnwrappedLine, 0> Children;
				};

				bool tokenMatches(const FormatToken *Left, const FormatToken *Right) {
				if (Left->getType() == Right->getType() &&
				Left->TokenText == Right->TokenText)
				return true;
				llvm::dbgs() << Left->TokenText << " != " << Right->TokenText << "\n";
				return false;
				}

				// Allows producing chunks of a token list by typing code that lexes to equal tokens.
				//
				// Created from a list of tokens, users call "consume" to get the next chunk
				// of tokens, checking that they match the written code.
				struct Matcher {
				Matcher(const TokenList &Tokens, TestLexer &Lex)
				: Tokens(Tokens), It(this->Tokens.begin()), Lex(Lex) {}

				Chunk consume(StringRef Tokens) {
				TokenList Result;
				for (const FormatToken *Token : uneof(Lex.lex(Tokens))) {
				assert(tokenMatches(*It, Token));
				Result.push_back(*It);
				++It;
				}
				return Chunk(Result);
				}

				TokenList Tokens;
				TokenList::iterator It;
				TestLexer &Lex;
				};

				UnexpandedMap mergeUnexpanded(const UnexpandedMap &M1,
				const UnexpandedMap &M2) {
				UnexpandedMap Result;
				for (const auto &KV : M1) {
				Result[KV.first] = std::make_unique<UnwrappedLine>(*KV.second);
				}
				for (const auto &KV : M2) {
				Result[KV.first] = std::make_unique<UnwrappedLine>(*KV.second);
				}
				return Result;
				}

				class MacroCallReconstructorTest : public ::testing::Test {
				public:
				MacroCallReconstructorTest() : Lex(Allocator, Buffers) {}

				std::unique_ptr<MacroExpander>
				createExpander(const std::vector<std::string> &MacroDefinitions) {
				return std::make_unique<MacroExpander>(MacroDefinitions,
				Lex.SourceMgr.get(), Lex.Style,
				Lex.Allocator, Lex.IdentTable);
				}

				UnwrappedLine line(llvm::ArrayRef<FormatToken *> Tokens) {
				UnwrappedLine Result;
				for (FormatToken *Tok : Tokens) {
				Result.Tokens.push_back(UnwrappedLineNode(Tok));
				}
				return Result;
				}

				UnwrappedLine line(llvm::StringRef Text) { return line({lex(Text)}); }

				UnwrappedLine line(llvm::ArrayRef<Chunk> Chunks) {
				UnwrappedLine Result;
				for (const Chunk &Chunk : Chunks) {
				Result.Tokens.insert(Result.Tokens.end(), Chunk.Tokens.begin(),
				Chunk.Tokens.end());
				assert(!Result.Tokens.empty());
				Result.Tokens.back().Children.append(Chunk.Children.begin(),
				Chunk.Children.end());
				}
				return Result;
				}

				TokenList lex(llvm::StringRef Text) { return uneof(Lex.lex(Text)); }

				Chunk tokens(llvm::StringRef Text) { return Chunk(lex(Text)); }

				Chunk children(llvm::ArrayRef<UnwrappedLine> Children) {
				return Chunk(Children);
				}

				llvm::SpecificBumpPtrAllocator<FormatToken> Allocator;
				std::vector<std::unique_ptr<llvm::MemoryBuffer>> Buffers;
				TestLexer Lex;
				};

				bool matchesTokens(const UnwrappedLine &L1, const UnwrappedLine &L2) {
				if (L1.Tokens.size() != L2.Tokens.size())
				return false;
				for (auto L1It = L1.Tokens.begin(), L2It = L2.Tokens.begin();
				L1It != L1.Tokens.end(); ++L1It, ++L2It) {
				if (L1It->Tok != L2It->Tok)
				return false;
				if (L1It->Children.size() != L2It->Children.size())
				return false;
				for (auto L1ChildIt = L1It->Children.begin(),
				L2ChildIt = L2It->Children.begin();
				L1ChildIt != L1It->Children.end(); ++L1ChildIt, ++L2ChildIt) {
				if (!matchesTokens(*L1ChildIt, *L2ChildIt))
				return false;
				}
				}
				return true;
				}
				MATCHER_P(matchesLine, line, "") { return matchesTokens(arg, line); }

				TEST_F(MacroCallReconstructorTest, Identifier) {
				auto Macros = createExpander({"X=x"});
				Expansion Exp(Lex, *Macros);
				TokenList Call = Exp.expand("X");

				MacroCallReconstructor Unexp(0, Exp.getUnexpanded());
				Unexp.addLine(line(Exp.getTokens()));
				EXPECT_TRUE(Unexp.finished());
				Matcher U(Call, Lex);
				EXPECT_THAT(std::move(Unexp).takeResult(), matchesLine(line(U.consume("X"))));
				}

				TEST_F(MacroCallReconstructorTest, NestedLineWithinCall) {
				auto Macros = createExpander({"C(a)=class X { a; };"});
				Expansion Exp(Lex, *Macros);
				TokenList Call = Exp.expand("C", {"void f()"});

				MacroCallReconstructor Unexp(0, Exp.getUnexpanded());
				Matcher E(Exp.getTokens(), Lex);
				Unexp.addLine(line(E.consume("class X {")));
				EXPECT_FALSE(Unexp.finished());
				Unexp.addLine(line(E.consume("void f();")));
				EXPECT_FALSE(Unexp.finished());
				Unexp.addLine(line(E.consume("};")));
				EXPECT_TRUE(Unexp.finished());
				Matcher U(Call, Lex);
				EXPECT_THAT(std::move(Unexp).takeResult(),
				matchesLine(line(U.consume("C(void f())"))));
				}

				TEST_F(MacroCallReconstructorTest, MultipleLinesInNestedMultiParamsExpansion) {
				auto Macros = createExpander({"C(a, b)=a b", "B(a)={a}"});
				Expansion Exp1(Lex, *Macros);
				TokenList Call1 = Exp1.expand("B", {"b"});
				Expansion Exp2(Lex, *Macros);
				TokenList Call2 = Exp2.expand("C", {uneof(Lex.lex("a")), Exp1.getTokens()});

				UnexpandedMap Unexpanded =
				mergeUnexpanded(Exp1.getUnexpanded(), Exp2.getUnexpanded());
				MacroCallReconstructor Unexp(0, Unexpanded);
				Matcher E(Exp2.getTokens(), Lex);
				Unexp.addLine(line(E.consume("a")));
				EXPECT_FALSE(Unexp.finished());
				Unexp.addLine(line(E.consume("{")));
				EXPECT_FALSE(Unexp.finished());
				Unexp.addLine(line(E.consume("b")));
				EXPECT_FALSE(Unexp.finished());
				Unexp.addLine(line(E.consume("}")));
				EXPECT_TRUE(Unexp.finished());

				Matcher U1(Call1, Lex);
				auto Middle = U1.consume("B(b)");
				Matcher U2(Call2, Lex);
				auto Chunk1 = U2.consume("C(a, ");
				auto Chunk2 = U2.consume("{ b }");
				auto Chunk3 = U2.consume(")");

				EXPECT_THAT(std::move(Unexp).takeResult(),
				matchesLine(line({Chunk1, Middle, Chunk3})));
				}

				TEST_F(MacroCallReconstructorTest, StatementSequence) {
				auto Macros = createExpander({"SEMI=;"});
				Expansion Exp(Lex, *Macros);
				TokenList Call1 = Exp.expand("SEMI");
				TokenList Call2 = Exp.expand("SEMI");
				TokenList Call3 = Exp.expand("SEMI");

				MacroCallReconstructor Unexp(0, Exp.getUnexpanded());
				Matcher E(Exp.getTokens(), Lex);
				Unexp.addLine(line(E.consume(";")));
				EXPECT_TRUE(Unexp.finished());
				Unexp.addLine(line(E.consume(";")));
				EXPECT_TRUE(Unexp.finished());
				Unexp.addLine(line(E.consume(";")));
				EXPECT_TRUE(Unexp.finished());
				Matcher U1(Call1, Lex);
				Matcher U2(Call2, Lex);
				Matcher U3(Call3, Lex);
				EXPECT_THAT(std::move(Unexp).takeResult(),
				matchesLine(line(
				{U1.consume("SEMI"),
				children({line({U2.consume("SEMI"),
				children({line(U3.consume("SEMI"))})})})})));
				}

				TEST_F(MacroCallReconstructorTest, NestedBlock) {
				auto Macros = createExpander({"ID(x)=x"});
				// Test: ID({ ID(a *b); })
				// 1. expand ID(a *b) -> a *b
				Expansion Exp1(Lex, *Macros);
				TokenList Call1 = Exp1.expand("ID", {"a *b"});
				// 2. expand ID({ a *b; })
				TokenList Arg;
				Arg.push_back(Lex.id("{"));
				Arg.append(Exp1.getTokens().begin(), Exp1.getTokens().end());
				Arg.push_back(Lex.id(";"));
				Arg.push_back(Lex.id("}"));
				Expansion Exp2(Lex, *Macros);
				TokenList Call2 = Exp2.expand("ID", {Arg});

				// Consume as-if formatted:
				// {
				// a *b;
				// }
				UnexpandedMap Unexpanded =
				mergeUnexpanded(Exp1.getUnexpanded(), Exp2.getUnexpanded());
				MacroCallReconstructor Unexp(0, Unexpanded);
				Matcher E(Exp2.getTokens(), Lex);
				Unexp.addLine(line(E.consume("{")));
				EXPECT_FALSE(Unexp.finished());
				Unexp.addLine(line(E.consume("a *b;")));
				EXPECT_FALSE(Unexp.finished());
				Unexp.addLine(line(E.consume("}")));
				EXPECT_TRUE(Unexp.finished());

				// Expect lines:
				// ID({
				// ID(a *b);
				// })
				Matcher U1(Call1, Lex);
				Matcher U2(Call2, Lex);
				auto Chunk2Start = U2.consume("ID(");
				auto Chunk2LBrace = U2.consume("{");
				U2.consume("a *b");
				auto Chunk2Mid = U2.consume(";");
				auto Chunk2RBrace = U2.consume("}");
				auto Chunk2End = U2.consume(")");
				auto Chunk1 = U1.consume("ID(a *b)");

				auto Expected = line({Chunk2Start,
				children({
				line(Chunk2LBrace),
				line({Chunk1, Chunk2Mid}),
				line(Chunk2RBrace),
				}),
				Chunk2End});
				EXPECT_THAT(std::move(Unexp).takeResult(), matchesLine(Expected));
				}

				TEST_F(MacroCallReconstructorTest, NestedChildBlocks) {
				auto Macros = createExpander({"ID(x)=x", "CALL(x)=f([] { x })"});
				// Test: ID(CALL(CALL(return a * b;)))
				// 1. expand CALL(return a * b;)
				Expansion Exp1(Lex, *Macros);
				TokenList Call1 = Exp1.expand("CALL", {"return a * b;"});
				// 2. expand CALL(f([] { return a * b; }))
				Expansion Exp2(Lex, *Macros);
				TokenList Call2 = Exp2.expand("CALL", {Exp1.getTokens()});
				// 3. expand ID({ f([] { f([] { return a * b; }) }) })
				TokenList Arg3;
				Arg3.push_back(Lex.id("{"));
				Arg3.append(Exp2.getTokens().begin(), Exp2.getTokens().end());
				Arg3.push_back(Lex.id("}"));
				Expansion Exp3(Lex, *Macros);
				TokenList Call3 = Exp3.expand("ID", {Arg3});

				// Consume as-if formatted in three unwrapped lines:
				// 0: {
				// 1: f([] {
				// f([] {
				// return a * b;
				// })
				// })
				// 2: }
				UnexpandedMap Unexpanded = mergeUnexpanded(
				Exp1.getUnexpanded(),
				mergeUnexpanded(Exp2.getUnexpanded(), Exp3.getUnexpanded()));
				MacroCallReconstructor Unexp(0, Unexpanded);
				Matcher E(Exp3.getTokens(), Lex);
				Unexp.addLine(line(E.consume("{")));
				Unexp.addLine(
				line({E.consume("f([] {"),
				children({line({E.consume("f([] {"),
				children({line(E.consume("return a * b;"))}),
				E.consume("})")})}),
				E.consume("})")}));
				Unexp.addLine(line(E.consume("}")));
				EXPECT_TRUE(Unexp.finished());

				// Expect lines:
				// ID(
				// {
				// CALL(CALL(return a * b;))
				// }
				// )
				Matcher U1(Call1, Lex);
				Matcher U2(Call2, Lex);
				Matcher U3(Call3, Lex);
				auto Chunk3Start = U3.consume("ID(");
				auto Chunk3LBrace = U3.consume("{");
				U3.consume("f([] { f([] { return a * b; }) })");
				auto Chunk3RBrace = U3.consume("}");
				auto Chunk3End = U3.consume(")");
				auto Chunk2Start = U2.consume("CALL(");
				U2.consume("f([] { return a * b; })");
				auto Chunk2End = U2.consume(")");
				auto Chunk1 = U1.consume("CALL(return a * b;)");

				auto Expected = line({
				Chunk3Start,
				children({
				line(Chunk3LBrace),
				line({
				Chunk2Start,
				Chunk1,
				Chunk2End,
				}),
				line(Chunk3RBrace),
				}),
				Chunk3End,
				});
				EXPECT_THAT(std::move(Unexp).takeResult(), matchesLine(Expected));
				}

				TEST_F(MacroCallReconstructorTest, NestedChildrenMultipleArguments) {
				auto Macros = createExpander({"CALL(a, b)=f([] { a; b; })"});
				Expansion Exp(Lex, *Macros);
				TokenList Call = Exp.expand("CALL", {std::string("int a"), "int b"});

				MacroCallReconstructor Unexp(0, Exp.getUnexpanded());
				Matcher E(Exp.getTokens(), Lex);
				Unexp.addLine(line({
				E.consume("f([] {"),
				children({
				line(E.consume("int a;")),
				line(E.consume("int b;")),
				}),
				E.consume("})"),
				}));
				EXPECT_TRUE(Unexp.finished());
				Matcher U(Call, Lex);
				auto Expected = line(U.consume("CALL(int a, int b)"));
				EXPECT_THAT(std::move(Unexp).takeResult(), matchesLine(Expected));
				}

				TEST_F(MacroCallReconstructorTest, ReverseOrderArgumentsInExpansion) {
				auto Macros = createExpander({"CALL(a, b)=b + a"});
				Expansion Exp(Lex, *Macros);
				TokenList Call = Exp.expand("CALL", {std::string("x"), "y"});

				MacroCallReconstructor Unexp(0, Exp.getUnexpanded());
				Matcher E(Exp.getTokens(), Lex);
				Unexp.addLine(line(E.consume("y + x")));
				EXPECT_TRUE(Unexp.finished());
				Matcher U(Call, Lex);
				auto Expected = line(U.consume("CALL(x, y)"));
				EXPECT_THAT(std::move(Unexp).takeResult(), matchesLine(Expected));
				}

				TEST_F(MacroCallReconstructorTest, MultipleToplevelUnwrappedLines) {
				auto Macros = createExpander({"ID(a, b)=a b"});
				Expansion Exp(Lex, *Macros);
				TokenList Call = Exp.expand("ID", {std::string("x; x"), "y"});

				MacroCallReconstructor Unexp(0, Exp.getUnexpanded());
				Matcher E(Exp.getTokens(), Lex);
				Unexp.addLine(line(E.consume("x;")));
				Unexp.addLine(line(E.consume("x y")));
				EXPECT_TRUE(Unexp.finished());
				Matcher U(Call, Lex);
				auto Expected = line({
				U.consume("ID("),
				children({
				line(U.consume("x;")),
				line(U.consume("x")),
				}),
				U.consume(", y)"),
				});
				EXPECT_THAT(std::move(Unexp).takeResult(), matchesLine(Expected));
				}

				TEST_F(MacroCallReconstructorTest, NestedCallsMultipleLines) {
				auto Macros = createExpander({"ID(x)=x"});
				// Test: ID({ID(a * b);})
				// 1. expand ID(a * b)
				Expansion Exp1(Lex, *Macros);
				TokenList Call1 = Exp1.expand("ID", {"a * b"});
				// 2. expand ID({ a * b; })
				Expansion Exp2(Lex, *Macros);
				TokenList Arg2;
				Arg2.push_back(Lex.id("{"));
				Arg2.append(Exp1.getTokens().begin(), Exp1.getTokens().end());
				Arg2.push_back(Lex.id(";"));
				Arg2.push_back(Lex.id("}"));
				TokenList Call2 = Exp2.expand("ID", {Arg2});

				// Consume as-if formatted in three unwrapped lines:
				// 0: {
				// 1: a * b;
				// 2: }
				UnexpandedMap Unexpanded =
				mergeUnexpanded(Exp1.getUnexpanded(), Exp2.getUnexpanded());
				MacroCallReconstructor Unexp(0, Unexpanded);
				Matcher E(Exp2.getTokens(), Lex);
				Unexp.addLine(line(E.consume("{")));
				Unexp.addLine(line(E.consume("a * b;")));
				Unexp.addLine(line(E.consume("}")));
				EXPECT_TRUE(Unexp.finished());

				// Expect lines:
				// ID(
				// {
				// ID(a * b);
				// }
				// )
				Matcher U1(Call1, Lex);
				Matcher U2(Call2, Lex);
				auto Chunk2Start = U2.consume("ID(");
				auto Chunk2LBrace = U2.consume("{");
				U2.consume("a * b");
				auto Chunk2Semi = U2.consume(";");
				auto Chunk2RBrace = U2.consume("}");
				auto Chunk2End = U2.consume(")");
				auto Chunk1 = U1.consume("ID(a * b)");

				auto Expected = line({
				Chunk2Start,
				children({
				line({Chunk2LBrace}),
				line({Chunk1, Chunk2Semi}),
				line({Chunk2RBrace}),
				}),
				Chunk2End,
				});
				EXPECT_THAT(std::move(Unexp).takeResult(), matchesLine(Expected));
				}

				TEST_F(MacroCallReconstructorTest, ParentOutsideMacroCall) {
				auto Macros = createExpander({"ID(a)=a"});
				Expansion Exp(Lex, *Macros);
				TokenList Call = Exp.expand("ID", {std::string("x; y; z;")});

				auto Prefix = tokens("int a = []() {");
				auto Postfix = tokens("}();");
				MacroCallReconstructor Unexp(0, Exp.getUnexpanded());
				Matcher E(Exp.getTokens(), Lex);
				Unexp.addLine(line({
				Prefix,
				children({
				line(E.consume("x;")),
				line(E.consume("y;")),
				line(E.consume("z;")),
				}),
				Postfix,
				}));
				EXPECT_TRUE(Unexp.finished());
				Matcher U(Call, Lex);
				auto Expected = line({
				Prefix,
				children({
				line({
				U.consume("ID("),
				children({
				line(U.consume("x;")),
				line(U.consume("y;")),
				line(U.consume("z;")),
				}),
				U.consume(")"),
				}),
				}),
				Postfix,
				});
				EXPECT_THAT(std::move(Unexp).takeResult(), matchesLine(Expected));
				}

				TEST_F(MacroCallReconstructorTest, UnusedMacroArguments) {
				auto Macros = createExpander({"X=x"});
				Expansion Exp(Lex, *Macros);
				TokenList Call = Exp.expand("X", {"a", "b", "c"});

				MacroCallReconstructor Unexp(0, Exp.getUnexpanded());
				Unexp.addLine(line(Exp.getTokens()));
				EXPECT_TRUE(Unexp.finished());
				Matcher U(Call, Lex);
				EXPECT_THAT(std::move(Unexp).takeResult(),
				matchesLine(line(U.consume("X(a, b, c)"))));
				}

				TEST_F(MacroCallReconstructorTest, UnusedEmptyMacroArgument) {
				auto Macros = createExpander({"X=x"});
				Expansion Exp(Lex, *Macros);
				TokenList Call = Exp.expand("X", {std::string("")});

				MacroCallReconstructor Unexp(0, Exp.getUnexpanded());
				Matcher E(Exp.getTokens(), Lex);
				auto Semi = tokens(";");
				Unexp.addLine(line({E.consume("x"), Semi}));
				EXPECT_TRUE(Unexp.finished());
				Matcher U(Call, Lex);
				EXPECT_THAT(std::move(Unexp).takeResult(),
				matchesLine(line({U.consume("X()"), Semi})));
				}

				TEST_F(MacroCallReconstructorTest, ChildrenSplitAcrossArguments) {
				auto Macros = createExpander({"CALL(a, b)=f([]() a b)"});
				Expansion Exp(Lex, *Macros);
				TokenList Call = Exp.expand("CALL", {std::string("{ a;"), "b; }"});

				MacroCallReconstructor Unexp(0, Exp.getUnexpanded());
				Matcher E(Exp.getTokens(), Lex);
				Unexp.addLine(line({
				E.consume("f([]() {"),
				children({
				line(E.consume("a;")),
				line(E.consume("b;")),
				}),
				E.consume("})"),
				}));
				EXPECT_TRUE(Unexp.finished());
				Matcher U(Call, Lex);
				auto Expected = line({
				U.consume("CALL({"),
				children(line(U.consume("a;"))),
				U.consume(", b; })"),
				});
				EXPECT_THAT(std::move(Unexp).takeResult(), matchesLine(Expected));
				}

				TEST_F(MacroCallReconstructorTest, ChildrenAfterMacroCall) {
				auto Macros = createExpander({"CALL(a, b)=f([]() a b"});
				Expansion Exp(Lex, *Macros);
				TokenList Call = Exp.expand("CALL", {std::string("{ a"), "b"});

				MacroCallReconstructor Unexp(0, Exp.getUnexpanded());
				Matcher E(Exp.getTokens(), Lex);
				auto Semi = tokens(";");
				auto SecondLine = tokens("c d;");
				auto ThirdLine = tokens("e f;");
				auto Postfix = tokens("})");
				Unexp.addLine(line({
				E.consume("f([]() {"),
				children({
				line({E.consume("a b"), Semi}),
				line(SecondLine),
				line(ThirdLine),
				}),
				Postfix,
				}));
				EXPECT_TRUE(Unexp.finished());
				Matcher U(Call, Lex);
				auto Expected = line({
				U.consume("CALL({"),
				children(line(U.consume("a"))),
				U.consume(", b)"),
				Semi,
				children(line({
				SecondLine,
				children(line({
				ThirdLine,
				Postfix,
				})),
				})),
				});
				EXPECT_THAT(std::move(Unexp).takeResult(), matchesLine(Expected));
				}

				TEST_F(MacroCallReconstructorTest, InvalidCodeSplittingBracesAcrossArgs) {
				auto Macros = createExpander({"M(a, b)=(a) (b)"});
				Expansion Exp(Lex, *Macros);
				TokenList Call = Exp.expand("M", {std::string("{"), "x", ""});

				MacroCallReconstructor Unexp(0, Exp.getUnexpanded());
				Matcher E(Exp.getTokens(), Lex);
				auto Prefix = tokens("({");
				Unexp.addLine(line({
				Prefix,
				children({
				line({
				E.consume("({"),
				children({line(E.consume(")(x)"))}),
				}),
				}),
				}));
				EXPECT_TRUE(Unexp.finished());
				Matcher U(Call, Lex);
				auto Expected = line({
				Prefix,
				children({line(U.consume("M({,x,)"))}),
				});
				EXPECT_THAT(std::move(Unexp).takeResult(), matchesLine(Expected));
				}

				} // namespace
				} // namespace format
				} // namespace clang
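
The matchesTokens helper in this test file compares two unwrapped lines token by token and recurses into the child lines attached to each token. A simplified standalone model of that comparison is sketched below. Node, Line and sameStructure are hypothetical stand-ins that compare strings, whereas the helper in the test compares FormatToken pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Line; // forward declaration: a node owns child lines

// One token plus the child lines hanging off it.
struct Node {
  std::string Tok;
  std::vector<Line> Children;
};

struct Line {
  std::vector<Node> Tokens;
};

// Returns true if both lines have the same tokens and, recursively, the same
// child lines attached to the same tokens.
bool sameStructure(const Line &A, const Line &B) {
  if (A.Tokens.size() != B.Tokens.size())
    return false;
  for (std::size_t I = 0; I < A.Tokens.size(); ++I) {
    const Node &L = A.Tokens[I];
    const Node &R = B.Tokens[I];
    if (L.Tok != R.Tok || L.Children.size() != R.Children.size())
      return false;
    for (std::size_t C = 0; C < L.Children.size(); ++C)
      if (!sameStructure(L.Children[C], R.Children[C]))
        return false;
  }
  return true;
}
```

Under this model, two lines with the same flat token sequence still compare unequal if the tokens are attached at different nesting levels, which is the property the expected values in the tests above rely on.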