Page MenuHomePhabricator

[analyzer] Introduce MacroExpansionContext to libAnalysis

Authored by steakhal on Dec 14 2020, 8:19 AM.



Introduce MacroExpansionContext to track what and how macros in a translation
unit expand. This is the first element of the patch-stack in this direction.

The main goal is to substitute the current macro expansion generator in the
PlistsDiagnostics, but all the other DiagnosticsConsumer could benefit from

getExpandedText and getOriginalText are the primary functions of this class.
The former can provide you the text that was the result of the macro expansion
chain starting from a SourceLocation.
While the latter will tell you what text was in the original source code
replaced by the macro expansion chain from that location.

Here is an example:

void bar();
#define retArg(x) x
#define retArgUnclosed retArg(bar()
#define BB CC
#define applyInt BB(int)
#define CC(x) retArgUnclosed

void unbalancedMacros() {
  applyInt  );
//^~~~~~~~~~^ is the substituted range
// Original text is "applyInt  )"
// Expanded text is "bar()"

#define expandArgUnclosedCommaExpr(x) (x, bar(), 1
#define f expandArgUnclosedCommaExpr

void unbalancedMacros2() {
  int x =  f(f(1))  ));  // Look at the parenthesis!
//         ^~~~~~^ is the substituted range
// Original text is "f(f(1))"
// Expanded text is "((1,bar(),1,bar(),1"

Might worth investigating how to provide a reusable component, which could be
used for example by a standalone tool eg. expanding all macros to their

I borrowed the main idea from the PrintPreprocessedOutput.cpp Frontend
component, providing a PPCallbacks instance hooking the preprocessor events.
I'm using that for calculating the source range where tokens will be expanded
to. I'm also using the Preprocessor's OnToken callback, via the
Preprocessor::setTokenWatcher to reconstruct the expanded text.

Unfortunately, I concatenate the token's string representation without any
whitespaces except if the token is an identifier when I emit an extra space
to produce valid code for int var token sequences.
This could be improved later if needed.


  1. D93222 (this one) Introduces the MacroExpansionContext class and unittests
  2. D93223 Create MacroExpansionContext member in AnalysisConsumer and pass down to the diagnostics consumers
  3. D93224 Use the MacroExpansionContext for macro expansions in plists It replaces the 'old' macro expansion mechanism.
  4. D94673 API for CTU macro expansions You should be able to get a MacroExpansionContext for each imported TU. Right now it will just return llvm::None as this is not implemented yet.
  5. FIXME: Implement macro expansion tracking for imported TUs as well.

It would also relieve us from bugs like:

  • [fixed] D86135
  • [confirmed] The __VA_ARGS__ and other macro nitty-gritty, such as how to stringify macro parameters, where to put or swallow commas, etc. are not handled correctly.
  • [confirmed] Unbalanced parenthesis are not well handled - resulting in incorrect expansions or even crashes.
  • [confirmed][crashing]

Diff Detail

Event Timeline

steakhal created this revision.Dec 14 2020, 8:19 AM
steakhal requested review of this revision.Dec 14 2020, 8:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 14 2020, 8:19 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
steakhal retitled this revision from [RFC] Introduce MacroExpansionContext to libAnalysis to [RFC][analyzer] Introduce MacroExpansionContext to libAnalysis.Dec 14 2020, 8:27 AM

Nice and precise work! And special thanks for the unit tests. I've found some minor nits.




I think we should have an Optional<String> as a return type, so we'd be able to detect if no macro expansion happened. (?)


Could this be StringRef also?


We should be able to distinguish between no macro expansion (1) and expansion to an empty string (2). So, I think we should have an Optional<StringRef> as a return type.



martong added inline comments.Dec 15 2020, 4:04 AM

getExpandedText ? Since what you get back is not a macro anymore.


Substituted is a bit ambiguous to me. What about getOriginalText ?

Overall looks good to me, I have some minor nits and questions inline.


It may be more idiomatic to put classes in an anonymous namespace rather than expanding them in a method.


Will we end up recording ranges for intermediate macro expansions? If yes, we might end up recording excessive amount of ranges. Is there any way to only record final expansion ranges in that case?


Why do you need the assignment in &SM = SM?


I am a bit uneasy about this. As far as I understand a macro could be expanded to an empty string. How could the user differentiate between a valid empty expansion and a lookup failure. Also, can this lookup ever fail in a well behaving code? Do we need to add llvm_unreachable() or something?


Similar concerns to above. Do we want an assert? When can this happen?

steakhal marked an inline comment as done.Dec 16 2020, 3:12 AM

I want to highlight, that the 4th part of the stack is not yet done.
Partially because I'm not quite familiar with CTU.
AFAIK, CTU creates compiler invocations, which are going to call Parse at some point.
I'm not sure how to configure, apply the callbacks, etc. on the Preprocessor in the middle of that process.
I'm also reluctant to pass this domain-specific MacroExpansionContext as the 23rd parameter to that generic function (ASTUnit::LoadFromCommandLine).

I'm not even sure if there will be a Preprocessor, which would generate the 'OnToken' or PPCallbacks calls if we are parsing a PCH or some other serialized source.
That would be crucial for this MacroExpansion machinery. I hope you can ensure that I'm on the right path.


Yes, it depends on what interface we want to provide.
If we expect it to be called on source locations that participate in a macro expansion, then it would be somewhat redundant.
I don't mind if it returns an Optional<>.


We could return a StringRef, but that would point to an entry of a DenseMap. That could be invalidated later if the preprocessor still generates tokens.

If we say that these calls should be called after the preprocessing is completely done, it could just return a StringRef.


You are right! I like it.


I'm not sure about this.
If a SourceLocation is not a macro expansion, then the isFileID() would return false AFAIK.
Otherwise, if the result is an empty string, that means that the macro indeed expanded to no tokens.


I agree.


I tried, but I could not hide the MacroExpansionRangeRecorder class as a detail.
Let me know if there is a better way to hide details.


SM is an object member.
The lambda should either capture the this pointer or a pointer/reference to the referred member (SM).
I preferred the latter as the lambda doesn't want to access all of the members except this one.


Yes, I should have put an unreachable there as you suggested.


Yes, I should put an assert there.

xazax.hun added inline comments.Dec 16 2020, 3:22 AM

I don't see any problems with the code you linked. What do you mean by could not hide it better? I think, it is very common to have utility functions with the current file as the visibility.

whisperity added inline comments.Dec 16 2020, 4:12 AM

@steakhal You need to forward declare the class in the detail namespace first within the header (so both the namespace and the qualified name detail::Class is introduced), and then the friend class declaration works as intended, see:

xazax.hun added inline comments.Dec 16 2020, 4:23 AM

Oh, I see the problem now. Somehow I did not see the error message. I would also not mind not hiding the using ExpansionRangeMap ... part,

steakhal updated this revision to Diff 314857.Jan 6 2021, 4:47 AM
steakhal marked 11 inline comments as done.
  • move MacroExpansionRangeRecorder to clang::detail and mark it as a friend class
  • fix comment typo in getExpandedMacroForLocation
  • rename getExpandedMacroForLocation -> getExpandedText
  • rename getSubstitutedTextForLocation -> getOriginalText
  • introduce llvm_unreachable where applicable.

No tests were changed.

martong accepted this revision.Jan 20 2021, 8:36 AM

Nice work! Thanks!


[Range, &SM, &MacroName] would not work? (why?)


Missing newline?

This revision is now accepted and ready to land.Jan 20 2021, 8:36 AM
steakhal marked an inline comment as done.Jan 21 2021, 3:26 AM
steakhal added inline comments.

All top-level macro expansions will insert a new entry to the map. All intermediate macro expansions right after that will have the very same expansion location, so it will just update the 'end' for that expansion chain.
For tracking the final tokens I'm just aggregating them for every expansion location in a similar fashion.

MacroExpansionContext.cpp:35:50: error: capture of non-variable 'clang::detail::MacroExpansionRangeRecorder::SM'
MacroExpansionContext.cpp:38:16: error: 'this' was not captured for this lambda function

I honestly don't know why was that not addressed by clang-format.
Even here, the prebuild bot should have complained about this - if this is an issue.

If you still think I should add that newline, we should also take a look at the .clang-format.

steakhal updated this revision to Diff 318480.Jan 22 2021, 3:59 AM
steakhal marked 2 inline comments as done.
steakhal retitled this revision from [RFC][analyzer] Introduce MacroExpansionContext to libAnalysis to [analyzer] Introduce MacroExpansionContext to libAnalysis.
steakhal edited the summary of this revision. (Show Details)

No changes made besides backporting API changes from the immediate child revision, D93223.
This way the API of MacroExpansionContext won't be changed in later parts of this stack.

steakhal updated this revision to Diff 319011.Jan 25 2021, 8:05 AM

Ignore _Pragma macro expansions, also ignore those during lexing.
Added a unit-test to demonstrate that we don't crash and these expansions are not recorded.

Fix typo in MacroExpansionContext.cpp header comment.

martong accepted this revision.Feb 1 2021, 6:09 AM

Still looks good to me! Thanks for handling the pragma cases!

Szelethus accepted this revision.Feb 9 2021, 7:03 PM

This is amazing. We longed for a sensible implementation for this for a long time. Really liking the unit tests as well!

There are a number of inlines you didn't mark as done but seem to have addressed.

The main goal is to substitute the current macro expansion generator in the PlistsDiagnostics, but all the other DiagnosticsConsumer could benefit from this.

Good! Burn it, bury it, lets forget that it ever existed. It was a stupid idea from the get-go and got so much worse over time.


That might be very tricky. I recall stumbling across the same issue when using the Preprocessor. I think I used this or something similar:


Consider mentioning getExpansionLoc, since that is almost always the source of the expansion loc (no other comes to my mind).


Well that is interesting, so the Range parameter is not (always?) the range of the expansion. Doxygen says that

Called by Preprocessor::HandleMacroExpandedIdentifier when a macro invocation is found.

in which there is a TODO:

// FIXME: We lose macro args info with delayed callback.
Callbacks->MacroExpands(Info.Tok, Info.MD, Info.Range,

I suppose you're handling this corner case?


In the TokenPrinter in the plist implementation, I used Preprocessor::getSpelling().

Szelethus added inline comments.Feb 9 2021, 7:44 PM

Hmm, I'm by no means an expert, but isn't a string-like structure a bit big for a DenseMap?

steakhal marked 4 inline comments as done.Feb 10 2021, 2:47 AM

Thank you @Szelethus for taking the time to review this.
This time I marked the inline comments done where it was applicable :)

I'm gonna investigate some of your comments and if everything goes well I'm planning to commit this in the next week.


Since I'm listening to every new token that was parsed (via the token watcher), the tokens will be absolutely correct. But I'm sacrificing the precise location to acquire it.
However, we should be able to reconstruct the precise position too with some additional bookkeeping. But I'm more interested in correct expansions than whitespace preserving (and also correct) expansions.
That's why I'm not interested in re-lexing anything via clang::Preprocessor::getRawToken.


Ok, I will.


I'm not sure. When I grepped for an associative container having SourceLocation as a key, the only one I found was the DenseMap.
Besides that, the Value should fit into a single cacheline (at least on my X86_64 linux machine).
What data structure you think is a better fit?


It could be, but I'm not sure.
According to the code snippet you mention, the range does not look suspicious. The nullptr is passed as the Args parameter.
So, I think it's a different thing.

Either way, my tests will break if anything changes, so we will be notified for sure.


Oh, I'm gonna investigate!
This code looks really ugly xD.