Page MenuHomePhabricator

[analyzer][CrossTU] Extend CTU to VarDecls with initializer
ClosedPublic

Authored by r.stahl on May 4 2018, 1:04 AM.

Details

Summary

The existing CTU mechanism imports FunctionDecls where the definition is available in another TU. This patch extends that to VarDecls, to bind more constants.

  • Add VarDecl importing functionality to CrossTranslationUnitContext
  • Import Decls while traversing them in AnalysisConsumer
  • Add VarDecls to CTU external mappings generator
  • Name changes from "external function map" to "external definition map"

Diff Detail

Repository
rC Clang

Event Timeline

r.stahl created this revision.May 4 2018, 1:04 AM

Tests and doc fixes pending. First I'm interested in your thoughts to this change.

It allows to bind more symbolic values to constants if they have been defined and initialized in another TU:

use.c:

extern int * const p;
extern struct S * const s;

def.c:

int * const p = 0;
struct S * const s = { /* complex init */ };

Hi Rafael! I like the change.

lib/CrossTU/CrossTranslationUnit.cpp
399

Are we always sure that Import() returns non-null value of appropriate type? If no, we have to check the result after applying dyn_cast_or_null.

r.stahl updated this revision to Diff 146609.May 14 2018, 7:45 AM
r.stahl marked an inline comment as done.
  • added tests
  • updated doc and var naming
  • addressed review comment

The analyzer can only reason about const variables this way, right? Maybe we should only import the initializers for such variables to have better performance? What do you think?

Also, I wonder what happens with user-defined classes. Will the analyzer evaluate the constructor if the variable is imported?

include/clang/CrossTU/CrossTranslationUnit.h
129

I wonder if we need these overloads. Maybe having only the templated version and having a static assert for the supported types is better? Asking the other reviewers as well.

155

Same question as above.

lib/CrossTU/CrossTranslationUnit.cpp
161

Functions should start with lowercase letter. Maybe these could be static?

200

Fn in the name refers to a function, maybe that could be changed as well.

lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp
350

Will this work for zero initialized globals (i.e. not having an initializer expression) that are defined in the current translation unit? It would be great to have a test case for that.

xazax.hun added inline comments.May 14 2018, 11:04 PM
include/clang/CrossTU/CrossTranslationUnit.h
132

Adding briefs are unnecessary, see https://reviews.llvm.org/rL331834

r.stahl updated this revision to Diff 146789.May 15 2018, 5:36 AM
r.stahl marked 3 inline comments as done.
r.stahl edited the summary of this revision. (Show Details)

I looked through the original patches and found quite a few more occurrences of "function map" and renamed them - including the tool "clang-func-mapping". There is one comment "by using the 'clang-extdef-mapping' packaged with Xcode (on OS X)". Is this implicit from the install targets that the tool name changed or do I have to inform someone?

r.stahl added inline comments.May 15 2018, 5:37 AM
include/clang/CrossTU/CrossTranslationUnit.h
129

They are not required, but I thought they make the interface more clear and help prevent implementation in headers.

lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp
350

If a variable has a definition in the current TU, it may not have another one else where, but you're correct that this case will continue here. It will however only result in a failed lookup.

Similar but backwards: If an extern var is defined in another TU with global zero init.

Overall I'd say that default initializing a constant to zero is pretty uncommon (and invalid in C++), so that in my opinion it's fine to not support it for now. It seems like there is no transparent way to get the initializer including that case or am I missing something?

george.karpenkov resigned from this revision.May 30 2018, 4:27 PM
george.karpenkov added a subscriber: george.karpenkov.

Sorry for the limited activity. Unfortunately, I have very little time reviewing patches lately.
I think we need to answer the following questions:

  • Does this change affect the analysis of the constructors of global objects? If so, how?
  • Do we want to import non-const object's initializer expressions? The analyzer might not end up using the value anyways.
  • How big can the index get with this modification for large projects?

Overall the direction looks good to me, this will be a very useful addition, thanks for working on this.

Sorry for the limited activity. Unfortunately, I have very little time reviewing patches lately.

Thanks for getting around to it!

I think we need to answer the following questions:

  • Does this change affect the analysis of the constructors of global objects? If so, how?

The intention was to only allow the analyzer to resolve constants of arrays, structs and integral values. It seems to me that construction would have to be symbolically executed at the correct point in time and cannot be retroactively looked up - as is done with the constants here.

  • Do we want to import non-const object's initializer expressions? The analyzer might not end up using the value anyways.

It seems like a good idea to not do that, since non-const values are not used. It might become useful if we ever do some kind of straight line execution from static initialization to main.
However for structs it is enough if one of their fields is declared const.

  • How big can the index get with this modification for large projects?

That depends a lot on global usage of course. In LLVM for example there is little impact, but I don't have any numbers, it's taking too long. Taking care of the previous point helps here as well.

Overall the direction looks good to me, this will be a very useful addition, thanks for working on this.

I will prepare something to not store unnecessary entries in the index. I also noticed the AnalysisConsumer change has gone missing with my latest diff - will be fixed.

r.stahl updated this revision to Diff 149959.Jun 5 2018, 5:47 AM
  • add missing AnalysisConsumer diff
  • only collect const qualified vardecls in the extdef index and adjust test

Please let me know if there is anything else that should be addressed.

Note that even though this diff is quite large, most is just renaming things to stay consistent.

whisperity edited the summary of this revision. (Show Details)Dec 14 2018, 5:30 AM
whisperity added a reviewer: martong.

Sorry for the delay. The changes are looking good to me, although I think it might be worth to split this up into two patch, one NFC with the renaming, and one that actually introduces the changes.

@martong could you also take a look?

First, we would also like to test this internally. just to make sure it works as intended. Could you rebase the path to current top of tree?

Thanks!

lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp
346

Maybe we should also opt out for non-const VarDecls to avoid even attempting to import the initializer if we will not find it anyways.

Hi Rafael,

This is a great patch and good improvement, thank you! I had a few minor comments.
(Also, I was thinking about what else could we import in the future, maybe definitions of C++11 enum classes would be also worth to be handled by CTU.)

include/clang/CrossTU/CrossTranslationUnit.h
129

I consider the new overload just great. Later, if we want we still can extend the interface with a template which would call the overloaded functions.

lib/CrossTU/CrossTranslationUnit.cpp
164

The naming here is confusing for me, would be better to be hasInit, because there are cases when a VarDecl has an initializer but it is not a definition:

struct A {
  static const int a = 1 + 2; // VarDecl: has init, but not a definition
};
const int A::a; // VarDecl: definition

In the above case we probably want to import the initializer and we don't care about the definition.

167

hasDefinitionOrInit ?

test/Analysis/Inputs/ctu-other.cpp
86

Could we have another test case when S::a is static?

test/Analysis/func-mapping-test.cpp
18

Could we have one more test when we have a static member variable?

NoQ added a comment.Dec 14 2018, 12:02 PM

It seems like a good idea to not do that, since non-const values are not used. It might become useful if we ever do some kind of straight line execution from static initialization to main.
However for structs it is enough if one of their fields is declared const.

Aaand in C++ there's also the mutable keyword that can cancel the effect of the surrounding const keyword, at least for the purposes of precise memory contents modeling in RegionStore.

The idea looks great to me. It was far from obvious to me that importing variables manually was necessary, nice catch.

I definitely wish for a more direct test for this, i.e. "CTU analysis avoids that specific false positive due to the new functionality".

r.stahl updated this revision to Diff 181263.Jan 11 2019, 6:56 AM
r.stahl marked 12 inline comments as done.

Strip name changes (see D56441); addressed review comments

In my old version I seemed to get away with the tests, but they failed after rebasing. It seems like earlier there was only a single VarDecl for the imported ones with InitExpr. Now after importing there is one without init and a redecl with the init. This is why I changed getInit() in RegionStore to getAnyInititializer. I think these three should be enough, but I'm not sure where else in the analyzer this would have to be changed.

I also noticed that nested struct inits don't work, but this is unrelated to this patch, so I commented out the test for now and will send a separate patch.

lib/CrossTU/CrossTranslationUnit.cpp
167

I simply made it hasBodyOrInit now to be as clear as possible

NoQ added a comment.Jan 28 2019, 5:16 PM

In my old version I seemed to get away with the tests, but they failed after rebasing. It seems like earlier there was only a single VarDecl for the imported ones with InitExpr. Now after importing there is one without init and a redecl with the init. This is why I changed getInit() in RegionStore to getAnyInititializer. I think these three should be enough, but I'm not sure where else in the analyzer this would have to be changed.

Hmm, that is actually pretty interesting and sounds pretty bad: it seems that we are constructing a VarRegion with a non-canonical VarDecl. In other words, it turns out that VarRegion that is constructed for a DeclRefExpr may depend on which of the re-declarations of the variable does DeclRefExpr refer to. Which means that we potentially construct two different VarRegions for the same variable during analysis. It should always refer to the canonical declaration (which should, as far as i understand, be the one that has an initializer on it, if any).

The problem can be easily demonstrated by adding assert(d->isCanonicalDecl() to DeclRegion's constructor and running tests (a few should fail, including ctu-main.c), though i'd rather prefer the constructor to automatically canonicalize the declaration - which statically ensures that this sort of stuff doesn't happen (for a tiny performance cost). I also wonder if DeclRefExpr in a fully constructed AST should always point to the canonical declaration - maybe that's the thing that's broken.

At the same time, i don't have any test cases for the actual change in behavior that such canonicalization causes. If the test case that you had in mind is indeed demonstrating this problem, i'd love to have it. If it turns out that your test case doesn't allow us to demonstrate the problem without CTU, then probably it has something to do with ASTImporter accidentally canonicalizing the the declaration in DeclRefExpr more rarely than the vanilla AST.

In D46421#1374807, @NoQ wrote:

At the same time, i don't have any test cases for the actual change in behavior that such canonicalization causes. If the test case that you had in mind is indeed demonstrating this problem, i'd love to have it. If it turns out that your test case doesn't allow us to demonstrate the problem without CTU, then probably it has something to do with ASTImporter accidentally canonicalizing the the declaration in DeclRefExpr more rarely than the vanilla AST.

This seems unrelated to CTU. The following subset of my test demonstrates this:

// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -verify %s

void clang_analyzer_eval(int);

extern const int extInt;

int main()
{
    clang_analyzer_eval(extInt == 2); // expected-warning{{TRUE}}
}

extern const int extInt = 2;
Breakpoint 1, (anonymous namespace)::RegionStoreManager::getBindingForVar (this=0xa7b420, B=..., R=0xa7d348)
    at /data/work/commitllvm/llvm/tools/clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1948
1948        if (const Expr *Init = VD->getAnyInitializer()) {
(gdb) p VD->getInit()
$1 = (const clang::Expr *) 0x0
(gdb) p VD->getAnyInitializer()
$2 = (const clang::Expr *) 0xa4b630
In D46421#1374807, @NoQ wrote:

At the same time, i don't have any test cases for the actual change in behavior that such canonicalization causes. If the test case that you had in mind is indeed demonstrating this problem, i'd love to have it. If it turns out that your test case doesn't allow us to demonstrate the problem without CTU, then probably it has something to do with ASTImporter accidentally canonicalizing the the declaration in DeclRefExpr more rarely than the vanilla AST.

This seems unrelated to CTU. The following subset of my test demonstrates this:

// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -verify %s

void clang_analyzer_eval(int);

extern const int extInt;

int main()
{
    clang_analyzer_eval(extInt == 2); // expected-warning{{TRUE}}
}

extern const int extInt = 2;
Breakpoint 1, (anonymous namespace)::RegionStoreManager::getBindingForVar (this=0xa7b420, B=..., R=0xa7d348)
    at /data/work/commitllvm/llvm/tools/clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1948
1948        if (const Expr *Init = VD->getAnyInitializer()) {
(gdb) p VD->getInit()
$1 = (const clang::Expr *) 0x0
(gdb) p VD->getAnyInitializer()
$2 = (const clang::Expr *) 0xa4b630

I know you probably tired of me making you split all the patches but I think if it is independent of CTU we should submit this fix as a separate commit. This way we could be more focused having one commit doing one thing.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 1 2019, 6:36 AM

Thank you for working on this!

I have two concerns:

  • The logic when should we import the initializer of a VarDecl is duplicated. I think to have better maintainability it would be better to have only one implementation of the decision in a function and call it multiple times.
  • While the static analyzer is currently the only user of CTU lib it might change in the future. Only importing initializers of const variables is an analyzer specific decision which might not be good for other potential users of the CTU library. Maybe it would be better to be able to configure the ClangExtDefMapGen somehow, so it is able to both export only vardecls that are required by the analyzer or anything?
lib/CrossTU/CrossTranslationUnit.cpp
165

The != nullptr part is usually not written in LLVM codebase.

lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp
356

I wonder what would happen with types that has const fields and user written constructors. In case we will not simulate the effect of the constructor and will not be able to set the const fields maybe we should exclude those as well?

361

Maybe this check could be hoisted so we do not need to repeat in both branches?

368

Redundant != nullptr.

tools/clang-extdef-mapping/ClangExtDefMapGen.cpp
68

Variables should start with a capital letter.

Thanks for the comments! All good points. Nice idea with the constructor, but that can probably happen in a follow up patch.

I know you probably tired of me making you split all the patches but I think if it is independent of CTU we should submit this fix as a separate commit. This way we could be more focused having one commit doing one thing.

No problem if I should do that. However I'm concerned whether this change is correct. Previously this was not required since all VarDecls were canonical. Not sure if this change was intended. I did some digging, but am not familiar enough with the code base to figure out what changed. Does anyone have an idea about this?

But I agree that if it wasn't a deliberate change, we could canonicalize in the DeclRegion constructor - or VarRegion since only those are redeclarable.

Previously this was not required since all VarDecls were canonical. Not sure if this change was intended. I did some digging, but am not familiar enough with the code base to figure out what changed. Does anyone have an idea about this?

We changed the ASTImporter a few months ago to import the redecl chains properly. That means we do not squash ever decl into one decl. Also we link a newly imported decl to an already existing (found by the lookup). We allow only one definition of course. In case of variables we allow only one initializer. This way we can preserve the original structure of the AST in the most precise way.
I don't think this could be a problem, since we may have redecl chains of variables (and other kind of decls) in a single TU too.

Fxd the global variable problem in D57619, thanks! I think you should change back to getInit() once D57619 lands.

I think you should change back to getInit()

I am not entirely sure about this because the initalizer may not be attached to the canonical decl. getInit() gives the initializer of one given specific Decl, however, getAnyInitializer() searches through the whole redecl chain.

NoQ added a comment.Feb 26 2019, 12:58 PM

I think you should change back to getInit()

I am not entirely sure about this because the initalizer may not be attached to the canonical decl. getInit() gives the initializer of one given specific Decl, however, getAnyInitializer() searches through the whole redecl chain.

Ugh, indeed. getDefinition() is also not the thing we're looking for. What a mess :) Tests are welcome!~

P.S. I don't intend to block this patch; you guys know your stuff, i'm just displaying a bit of curiosity and observing how it unfolds.

Okay so I would suggest to go ahead and commit this. Rebased it succeeds without modification.

Still leaves the open problems with the redecls. Should I add the failing test from https://reviews.llvm.org/D46421#1375147 in the same commit marked as expected to fail? Or what is the procedure here?

Okay so I would suggest to go ahead and commit this. Rebased it succeeds without modification.

Still leaves the open problems with the redecls. Should I add the failing test from https://reviews.llvm.org/D46421#1375147 in the same commit marked as expected to fail? Or what is the procedure here?

My last set of comments are also unresolved. Most of them are minor nits, but I would love to get rid of the code duplication between ClangExtDefMapGen and the Clang Static Analyzer regarding when do we consider a variable worth to import. Otherwise the patch looks good to me.

r.stahl marked 3 inline comments as done.Fri, Apr 5, 9:13 AM

My last set of comments are also unresolved. Most of them are minor nits, but I would love to get rid of the code duplication between ClangExtDefMapGen and the Clang Static Analyzer regarding when do we consider a variable worth to import. Otherwise the patch looks good to me.

Sorry, was too long ago, thought this was done!

What would be a good place for this? I'm thinking ExprClassification in the AST lib. There is a tail in IsModifiable that tests the CanQualType for const that could be moved out. And then make a new function bool ContainsConst(const VarDecl *VD) mabye? Shouldn't that be a new patch, because it touches the more critical AST lib?

I cannot think of other users, so I would prefer to put it in the CTU lib for now.

r.stahl updated this revision to Diff 194285.Tue, Apr 9, 4:15 AM
r.stahl marked 4 inline comments as done.
  • addressed review comments
  • created cross_tu::containsConst
  • fixed bug that unions were not exported
  • added TODO test for constructor case
r.stahl added inline comments.Tue, Apr 9, 4:17 AM
lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp
356

I added a test for that and it doesn't seem to work. It just ends up as unknown which is fine, right? Eventually it would be nice to do that of course.

xazax.hun accepted this revision.Thu, Apr 11, 2:06 AM

I have one question, once it is resolved I am fine with committing this.

lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp
352

Do we actually need this branch? I would expect cross_tu::containsConst(VD, *Ctx) to return true for all the cases where we would return true here.

This revision is now accepted and ready to land.Thu, Apr 11, 2:06 AM
r.stahl updated this revision to Diff 196075.Mon, Apr 22, 8:58 AM

@xazax.hun good point and that actually fixes a bug since that branch should also do the deep check. Added that to the tests.

Looks good, thanks. Can you commit this or do you need someone to commit it on your behalf?

This revision was automatically updated to reflect the committed changes.