Page MenuHomePhabricator

[CFG] [analyzer] Heavier CFGConstructor elements.

Authored by NoQ on Jan 29 2018, 5:52 PM.



I'm planning to introduce some sort of CFGConstructor element, which would help the analyzer (and otherwise whoever enables it - i.e. i don't have plans to enable it for sema warnings) to understand how to perform construction of a C++ object without peeking at the next CFGElement(s) or ascending through ParentMap to the parent statements. Because we're doing the worklist thing rather than recursive traversal of the AST, we cannot easily guess, by looking at the CXXConstructExpr, where does it construct into, so having some sort of context seems to be extremely handy.

Because there are so many ways of constructing an object in C++, this element may eventually become quite beefy. For now in the analyzer we only have simple cases in the analyzer, such as construction into a single variable's DeclStmt or into CXXCtorInitializer or into CXXNewExpr; in these cases having just a single parent statement is enough - so the plan is to improve upon this ( Aggregate initialization expressions may contain multiple constructions at once, so for those we'd need extra data to figure out what part of the "whole thing" (element, field, base) is being constructed by the current constructor; in these cases the "whole thing" is not necessarily the initializer list as a temporary - it may still be, say, eventually turned into a plain variable, so this whole context would be expressed. Hopefully this'd also help us solve the problematic CXXDefaultArgExpr problem - where the constructor nested within it is in fact not even a part of the current function's AST, which screws a lot of our invariants.

So for now i propose to allow me to allocate extra data for CFGConstructor in the CFG's BumpPtrAllocator, so that i had enough room for incremental development without constantly thinking about bit manipulations; i eventually revert this decision if i notice that two pointers are indeed enough to hold all the data we need, but for now it seems highly unlikely. On the other hand, i'm fine with replacing it with a pair of regular AST node pointers until i actually need more room.

The auxiliary data class is called ConstructionContext (not prefixing it with CFG or whatever) because i don't plan on keeping it contained within CFG, but rather passing it around actively in the analyzer. In particular, i believe that ConstructionContext should replace CXXBindTemporaryExpr as a (more) unique identifier of a temporary object (for instance, with a backreference to the call site in the CXXDefaultArgExpr case). So, hopefully we'd even be able to use it during destruction.

CFGConstructor is not a sub-class of CFGStmt. It'd be much easier to use in the analyzer if it was a sub-class, but it'd make CFGStmt's isKind() slower, which will affect compilation time regardless of whether CFGConstructors are actually enabled.

The new CFGConstructor element takes place of the respective CFGStmt element whenever we managed to compute its respective ConstructionContext. In order to demonstrate how it works, i added computation of ConstructionContext for constructors into CXXNewExprs. This is far from enough for replacing existing mechanisms in the analyzer, but in a couple of patches i hope i'd be able to actually convert the analyzer to the new behavior. For now the analyzer remains pretty much untouched - just prepared to see the new element.

LiveVariables needed a slight fix as well. If my grep-fu doesn't fail me, it's only used by the analyzer. So only analyzer should be affected.

It might be possible to replace CFGAllocator completely with the new ConstructionContext (inline both the allocator and the constructor during evaluation of CFGConstructor), but i didn't try, and i'm not convinced that it's a good idea.

I made the constructor of CFGStmt and CFGInitializer explicit because it took me an hour to catch a bug when i tried to re-use ExprEngine::ProcessStmt() to process the construct-expression.

Diff Detail

rC Clang

Event Timeline

NoQ created this revision.Jan 29 2018, 5:52 PM
MTC added a subscriber: MTC.Jan 29 2018, 9:47 PM
NoQ updated this revision to Diff 132053.Jan 30 2018, 2:36 PM

Remove the stack of contexts for now. We're not using it anywhere yet, and i'm not sure if it'd be necessary.

I think it will be great to have a more explicit representation of construction in the CFG! Comments in line.


I think it would be good to add a comment saying what Trigger is.


Can you add a comment saying that this is only used by the analyzer's CFG?


A comment here would be helpful to future maintainers. For example, what does it mean for a construction context to be "Current"?


Is it possible that there is already a CurrentConstructionContext? Will this overwrite it?

Can you write an assertion to make sure that this doesn't happen?

Generally I would expect a context to have a well-defined start and end. I think we should be explicit about where we expect it to start and end so we can find violations of our expectations.


Please, let's try to avoid changes that are unrelated to the functionality being added.


I think this really needs to be a flag in AnalyzerOptions so you can keep the existing CFG tests for the behavior that other clients of the CFG rely on and add new tests with this flag explicitly set.


You don't seem to be using the constructor context for anything other than getting the CXXConstructExpr. Will a later patch require having a CFGConstructor here?

This seems like unnecessarily duplicated code. Can we just change ProcessStmt to take a Stmt * rather than a CFGStmt and use it for all statements including constructors?

NoQ updated this revision to Diff 132476.Feb 1 2018, 2:28 PM
NoQ marked 5 inline comments as done.

Address comments. Most importantly, separate out sema and analyzer CFG tests.


Yes, for now it'd be overwritten every time we meet a new constructor (this wasn't the case in the first version of this revision). However, the additional check (which should be eventually replaced with an assertion, but is for now needed) that we're picking the right context in CXXConstructExpr visit, together with only visiting every constructor once during CFG construction during normal visits (because weird stuff like CXXDefaultArgExpr isn't yet supported), guarantees that we're not doing anything wrong.

I should have cleaned this up, but i don't want to invest attention into that because subsequent patches will clearly make things way more complex than that, whenever we start dealing with a multitude of construction contexts or with partially-constructed contexts.

So for now it's a trivial kinda-safe solution.

I should document that though, for sure.


I decreased indent of this whole huge for() loop, so the blame is already screwed (but phabricator carefully hides that), so i think this is a valid part of the patch.


Hmm, well, yeah, i guess, there isn't much value in passing the CFGElement around as long as we can find the current CFGElement any time we want in ExprEngine->currBldrCtx.

But if not for that, it was the whole point to have the construction context there. Will fix.

NoQ updated this revision to Diff 132520.Feb 1 2018, 5:54 PM

Add more targeted tests.

NoQ updated this revision to Diff 132532.Feb 1 2018, 9:25 PM
NoQ marked an inline comment as done.

Cleanup construction contexts after they have been consumed. Add assertions to verify that our understanding of the context's lifespan is reasonable.

I will run the new mode on a large codebase before committing, to see if these assertions ever fail or if i'm breaking anything else.

Prettify ConstructionContext into a class with getters, write explicit constructors as pointed out in other patches.

NoQ updated this revision to Diff 132637.Feb 2 2018, 11:15 AM

Whoops! Actually use the new function with the assertion.

Also disable creating construction contexts when the flag is off.

NoQ added inline comments.Feb 2 2018, 11:18 AM

Fixed now - added the VisitForConstructionContext() wrapper which contains the assertion that checks that we're not overwriting any existing context.

The context is being cleaned up after it's consumed in appendConstructor().

dcoughlin accepted this revision.Feb 2 2018, 1:08 PM

This looks good to me, although as I noted inline I think both the name and the comment for VisitForConstructionContext() are confusing. If you can be more precise I think it would help future maintainers.


I find the name of this method very confusing. It sounds like you are visiting an AST node called a 'ForConstructionContext', but that is not what this function does. It is also not clear from either the name or the comment what the method is supposed to do. Can you be more precise?

You can fix this in a follow-up commit if you like.

This revision is now accepted and ready to land.Feb 2 2018, 1:08 PM
NoQ added inline comments.Feb 2 2018, 1:14 PM

Yeah i guess i just followed the VisitForTemporaryDtors pattern^^ Will fix.

NoQ updated this revision to Diff 132685.Feb 2 2018, 2:25 PM

Rename the method, add comments.

xazax.hun added inline comments.Feb 4 2018, 5:27 AM

Maybe I am getting this wrong, but I think in this case the members will be default initialized and will get indeterminate values.

Default initialization is performed in three situations:

  1. when a base class or a non-static data member is not mentioned in a constructor initializer list and that constructor is called.

The effects of default initialization are:

if T is a non-POD (until C++11) class type ...

if T is an array type, every element of the array is default-initialized;

otherwise, nothing is done: the objects with automatic storage duration (and their subobjects) are initialized to indeterminate values.


So this is one of the places where subclassing would help? Could you measure the compile time regression after making CFGStmt's isKind more complex?

NoQ updated this revision to Diff 132908.Feb 5 2018, 4:32 PM
NoQ marked an inline comment as done.

Fix uninitialized variables.

NoQ added inline comments.Feb 5 2018, 4:48 PM

Ew, right. I should be more careful (no idea how it works).


Yeah, any user of the new mode would be surprised that constructors are not statements, which is indeed annoying. But i wanted to play super safe.

I'm not seeing any compile time regressions yet after making isKind more complex - i.e. ran "clang sqlite3.c" release-without-asserts 100 times before and 100 times after, mean compilation time difference around 0.04% (in the bad direction, i.e. after-time is 0.04% worse than before-time), p-value (Welch's t-test) around 0.7. I guess i could probably run llvm compilation before and after for more coverage, but that wouldn't be statistically reliable.

Do we have any more reliable benchmarks?

NoQ updated this revision to Diff 133139.Feb 6 2018, 7:41 PM

Make CFGConstructor a sub-class of CFGStmt. I failed to notice any compile time regressions for now; if any, they must be super bottlenecked on CFG construction or analysis-based warnings. After poking @rsmith in the chat a little bit, we decided to go the easy way and make constructor a statement.

NoQ added a comment.Feb 8 2018, 1:24 PM

All right, so this change is indeed safe, i.e. no crashes or changes in analyzer behavior so far on my large codebase run. Will commit now.

This revision was automatically updated to reflect the committed changes.
This revision was automatically updated to reflect the committed changes.