This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][MacroExpansion] Fix a crash where multiple parameters resolved to __VA_ARGS__
ClosedPublic

Authored by Szelethus on Aug 18 2020, 5:29 AM.

Details

Summary

In short, macro expansions handled the case where a variadic parameter mapped to multiple arguments, but not the other way around. An internal ticket was submitted that demonstrated that we fail an assertion. Macro expansion so far worked by lexing the source code token-by-token and using the Preprocessor to turn these tokens into identifiers or just get their proper spelling, but what this counter intuitively doesn't do, is actually expand these macros, so we have to do the heavy lifting -- in this case, figure out what __VA_ARGS__ expands into. Since this case can only occur in a nested macro, the information we gathered from the containing macro does contain this information. If a parameter resolves to __VA_ARGS__, we need to temporarily stop getting our tokens from the lexer, and get the tokens from what __VA_ARGS__ maps to.

I also found a few more deficiencies I'll have to handle sooner rather then later. I also have 3 commits about to land this is based on, some miscellaneous renames, clarification in the documentation, prettifying some tests.

An educational rant:

For those that didn't have the displeasure of following my macro expansion project, here's why this is so annoying: We really, really suck at understanding macros. D74473 is a recent example where we attempt to get the definition of the EOF macro, but give up rather fast if it isn't something trivial. D54349#1294765 is also memorable for me. Indeed, whenever we hit a macro, we are in deep trouble.

Since CodeChecker isn't an IDE, what we really wanted to achieve is when the bug path goes through a macro, show what it would expand into.

The fundamental problem is, we simply can't ask Preprocessor what a macro expands into without hacking really hard. The HTML output commits about every sin under the sun; hiding under clang/lib/Rewrite/HTMLRewrite.cpp, it const_casts a Preprocessor object, stores most of its inner state (you can't guarantee you got them all) such as the DiagnosticsEngine, some of the user configurations in temporary variables, replaces them with new ones, practically reruns the entire lexing and preprocessing stage of the compiler, and at the end, puts the original inner state back in. This enables it to not have to literally reimplement the preprocessor, but creates a large preprocessed version of the source code, which it is almost impossible to reliably link the two together (answering the question that what a specific macro usage expands into).

Pp-trace, a Clang tool, would be great, but unfortunately it still isn't able to do macro expansions in a single go, just step by step (think about nested macros and variadic arguments).

So, I chose to practically reimplement the entire preprocessor with the goal of taking a single source location of a macro usage, and spitting the entire expansion out. This has a few advantages, namely that it leaves the Preprocessor const, and provides a rather deep understanding of the process of the macro expansion. The problem is, its a nightmare due to the extreme primitiveness of the available tools, and will have to be updated as time goes on.

You can learn a bit more from these patches and discussions:
D52742 (check the patch stack)
http://lists.llvm.org/pipermail/cfe-dev/2018-September/059226.html
http://lists.llvm.org/pipermail/cfe-dev/2017-August/055077.html

Diff Detail

Event Timeline

Szelethus created this revision.Aug 18 2020, 5:29 AM
Szelethus requested review of this revision.Aug 18 2020, 5:29 AM
Szelethus added inline comments.Aug 18 2020, 7:39 AM
clang/lib/StaticAnalyzer/Core/PlistDiagnostics.cpp
1228

Oh, this has to be fixed as well. at() should be fine, we messed something up really bad if the information we gathered from containing macro has no mention of __VA_ARGS__.

I can feel your pain.

The fundamental problem is, we simply can't ask Preprocessor what a macro expands into without hacking really hard.

Can you summarize what is the exact problem (or give a link to a discussion, etc)? Is it an architectural problem in Clang itself? Could we somehow refactor Clang and the Preprocessor to be usable for us? I mean LLVM and Clang has the mindset to build reusable components, the Preprocessor (and the Sema) should be one of them too, not just the AST.

clang/lib/StaticAnalyzer/Core/PlistDiagnostics.cpp
910

typo? injectRange ?

The fundamental problem is, we simply can't ask Preprocessor what a macro expands into without hacking really hard.

Can you summarize what is the exact problem (or give a link to a discussion, etc)? Is it an architectural problem in Clang itself?

I phrased myself poorly. The Preprocessor can tell what a macro usage immediately expands into (otherwise this project would have been practically impossible to implement), what it struggles to get done is show what a macro usage in the source code turns into in the preprocessed translation unit. As is often the case, macros may be nested into one another:

#define MACRO2(ptr) ptr = 0
#define MACRO1(x) MACRO2(x)

int *a = get();
MACRO1(a); // Step 1. Expand MACRO1.
           //        a.) Retrieve the tokens defined for MACRO1.
           //        b.) Resolve parameter x to argument a.
           // Step 2. Expand MACRO2...
*a = 5;

From this code snippet, should we be interested in what MACRO1(a) expands into, Preprocessor can pretty much only deliver on point 1a.) with any convenience. The problem here is that it was simply never engineered to be used outside of the preprocessing stage, so much so, that by the time I worked on this project about a decade after Preprocessor's inception, I still had to turn super trivial methods const. Other than that, the discussions I linked in the summary is pretty much all I can offer.

Could we somehow refactor Clang and the Preprocessor to be usable for us? I mean LLVM and Clang has the mindset to build reusable components, the Preprocessor (and the Sema) should be one of them too, not just the AST.

Working with the Preprocessor seems incredibly tedious and error prone -- mind that this is almost literally the first thing we implemented in Clang, and you can tell. Also, its performance critical, and any meaningful performance impact may need strong justification. While it would be beneficial to have the knowledge I'm crafting being integrated into the actual class, its hard to justify all the effort it would take to do so, especially now this project is so far along anyways. If I has to start from scratch, I would try to explore other approaches, but might still end up just doing what I did here.

To some extent, but this patch unfortunately doesn't solve that bug. The problem I think is that named variadic macros aren't supported at all.

clang/lib/StaticAnalyzer/Core/PlistDiagnostics.cpp
910

Yep.

steakhal requested changes to this revision.Aug 25 2020, 2:55 AM

This preprocessor expansion stuff is definitely not my expertise, nvm here is my review.

However, I observed some strange things happening in the test-cases, that's why I request changes.
I hope that I messed something up at the evaluation of your tests, but please have a closer look at them.

There were a few nits, but nothing serious.
I also liked the previous discussion about the reusable components, and I understand your decision going this way. I'm fine with that.
Keep up your good work. We need this.

clang/lib/StaticAnalyzer/Core/PlistDiagnostics.cpp
883

Optionally -> Additionally?

892–894

It should be more readable if you use tie. That way you can give names to the parts.

896–898

I'm always puzzled if I see a naked new.
Couldn't we use the assignment operator and std::make_unique here?

901

I don't like output parameters.
If we really need them, we should at least have a suspicious name.
Unfortunately, I can not come up with anything useful :|

1152–1163

By lexing one might think we use the actual lexer. Should we change this comment?

1358–1359

What does hashhash mean? I might lack some context though :D

clang/test/Analysis/plist-macros-with-expansion.cpp
479

You don't need an ending semicolon here. It will be already there at the expansion location.
This way you introduce an empty expression after the macro expansion.
The same happens in all the other cases as well.

482–484

Should we really abuse the division by zero checker here?
Can't we just use an ExprInspection call here?
Maybe it requires a specific BugPath visitor, and that is why we do it this way?

509

How did that comma appear there?
https://godbolt.org/z/4En3E5

515–522

This test case is also bad.
https://godbolt.org/z/89s48K

524–531
This revision now requires changes to proceed.Aug 25 2020, 2:55 AM
Szelethus planned changes to this revision.Aug 25 2020, 4:15 AM
Szelethus marked an inline comment as done.

Thanks! I'll get these fixed.

clang/lib/StaticAnalyzer/Core/PlistDiagnostics.cpp
901

This is intentional, it meant to replicate the Lexer's interface. I would prefer to keep this as-is.

1358–1359

# and ## respectively. The test cases you pointed out as flawed refer to this FIXME, though a FIXME in the tests themselves wouldn't hurt.

clang/test/Analysis/plist-macros-with-expansion.cpp
479

You are correct, though the point of macro expansion testing is to see whether we nailed what the preprocessor is supposed to do -- not whether the code it creates makes such sense. In fact, I would argue that most GNU extensions to the preprocessor shouldn't be a thing, but we still need to support it.

482–484

We could totally use ExprInspection -- but I'd argue that using something else isn't an abuse of the specific checker :) Since the entire file is already written this way, and would demand changes in the large plist file, I'd prefer to keep it this way.

I'm not sure about the status of this patch.
If you say that further improvements will be done later and this functionality is enough, I'm fine with that.

clang/lib/StaticAnalyzer/Core/PlistDiagnostics.cpp
901

Sure, be it.

1358–1359

Maybe HashtagHashtag?
Or an example would be even better like: ##__VA_ARGS__

clang/test/Analysis/plist-macros-with-expansion.cpp
479

Oh, now I get it.
I didn't know that this was ann extension lol.

482–484

Perfectly fine. I agree with you knowing this. Thanks.

Szelethus marked an inline comment as done and an inline comment as not done.Aug 25 2020, 8:45 AM

I'll get the nits I didn't object to fixed, thats the status you're looking for :)

clang/lib/StaticAnalyzer/Core/PlistDiagnostics.cpp
1358–1359

If you look a few lines down, you can see that its not up to us to choose this one :^)

Szelethus updated this revision to Diff 288452.Aug 27 2020, 1:59 PM
Szelethus marked 12 inline comments as done.

Fixes according to reviewer comments!

Szelethus added inline comments.Aug 27 2020, 2:00 PM
clang/lib/StaticAnalyzer/Core/PlistDiagnostics.cpp
896–898

Wait, isn't it naked if its not surrounded by smart pointer stuff? In any case, explicit calls to operator new and delete are indeed discouraged by the core guidelines.

1152–1163

I see what you mean, but this is why its phrased as a rough idea, not an in-depth step-by-step description of the process covering corner cases -- not in this comment, at least. Functionally, you could argue that we're lexing even if we get tokens from the injected range.

martong added inline comments.Aug 28 2020, 5:51 AM
clang/lib/StaticAnalyzer/Core/PlistDiagnostics.cpp
896–898

It is not enough to call a smart pointer's constructor with a result of a naked new. Because after the allocation of the object, the object's constructor itself could throw (well, not in LLVM :D) and this could happen before acquiring the ownership by the smart pointer, bumm, we have a leak.

The fundamental problem is, we simply can't ask Preprocessor what a macro expands into without hacking really hard.

Can you summarize what is the exact problem (or give a link to a discussion, etc)? Is it an architectural problem in Clang itself?

I phrased myself poorly. The Preprocessor can tell what a macro usage immediately expands into (otherwise this project would have been practically impossible to implement), what it struggles to get done is show what a macro usage in the source code turns into in the preprocessed translation unit. As is often the case, macros may be nested into one another:

#define MACRO2(ptr) ptr = 0
#define MACRO1(x) MACRO2(x)

int *a = get();
MACRO1(a); // Step 1. Expand MACRO1.
           //        a.) Retrieve the tokens defined for MACRO1.
           //        b.) Resolve parameter x to argument a.
           // Step 2. Expand MACRO2...
*a = 5;

From this code snippet, should we be interested in what MACRO1(a) expands into, Preprocessor can pretty much only deliver on point 1a.) with any convenience. The problem here is that it was simply never engineered to be used outside of the preprocessing stage, so much so, that by the time I worked on this project about a decade after Preprocessor's inception, I still had to turn super trivial methods const. Other than that, the discussions I linked in the summary is pretty much all I can offer.

Could we somehow refactor Clang and the Preprocessor to be usable for us? I mean LLVM and Clang has the mindset to build reusable components, the Preprocessor (and the Sema) should be one of them too, not just the AST.

Working with the Preprocessor seems incredibly tedious and error prone -- mind that this is almost literally the first thing we implemented in Clang, and you can tell. Also, its performance critical, and any meaningful performance impact may need strong justification. While it would be beneficial to have the knowledge I'm crafting being integrated into the actual class, its hard to justify all the effort it would take to do so, especially now this project is so far along anyways. If I has to start from scratch, I would try to explore other approaches, but might still end up just doing what I did here.

To some extent, but this patch unfortunately doesn't solve that bug. The problem I think is that named variadic macros aren't supported at all.

Thanks, for the detailed explanation, makes it easier to understand the reasons!

clang/lib/StaticAnalyzer/Core/PlistDiagnostics.cpp
1227

Why do we have to push back the tokens in case of __VA_ARGS? And what is in PrevParamMap here. Is it possible that at can fail here? Perhaps an example could make this hunk way easier to understand. To be honest, this hunk is a mystique for me in this form.

Szelethus updated this revision to Diff 290290.Sep 7 2020, 7:51 AM

Added some documentation to the code snippet pointed out by @martong.

Szelethus marked 3 inline comments as done.Sep 7 2020, 7:52 AM

Perfectly clear, thank you. However, I would still rely on the others to accept this :|

BTW why does the plist-macros-with-expansion.cpp.plist change? It makes the diff somewhat noisy :s

Perfectly clear, thank you. However, I would still rely on the others to accept this :|

BTW why does the plist-macros-with-expansion.cpp.plist change? It makes the diff somewhat noisy :s

Well, I removed a line, so every other entry about file position is changed in the plist file. I think I could just remove the entire thing altogether, its not like the actual plist output is what we're looking for, at least not in its many-thousand-line entirety.

martong accepted this revision.Sep 8 2020, 6:11 AM

LGTM! Thanks for the clarification and the example you gave.
(I agree with @steakhal and I wish if we could get rid of the many lines not-descriptive plist stuff, but that is rather unrelated)

This revision was not accepted when it landed; it landed in state Needs Review.Sep 11 2020, 5:08 AM
This revision was automatically updated to reflect the committed changes.