This is an archive of the discontinued LLVM Phabricator instance.

[patch] Adding Consumed Analysis to Clang
ClosedPublic

Authored by chriswailes on Jul 29 2013, 6:20 PM.

Details

Reviewers
dblaikie
delesley
Summary

This patch adds the beginnings of an analysis that tracks the "consumed" state of objects (this was previously mentioned on the list as uniqueness analysis).

The analysis currently provides enough functionality to detect when a class like std::unique_ptr is dereferenced after it is created with a default constructor. It can also track the state of an object across the branches of an if-statement if the statement contains a single test of a tracked variable. Lastly, the state of a value that is passed as a RValue References to a function is correctly tracked.

Future work, to follow in the next couple of weeks, will include more precise handling of for- and while-loops, support for member functions that consume an object, and tracking of state across function boundaries.

Diff Detail

Event Timeline

chriswailes updated this revision to Unknown Object (????).Jul 29 2013, 6:55 PM

Changed the code to correctly test for UO_LNot instead of UO_Not, and removed a leftover print statement. Added a test cast for the negation of a test.

Here's a few ideas, mostly micro details. I haven't quite got the macro design sufficiently in my head to respond to that yet.

include/clang/Analysis/Analyses/Consumed.h
40
57

Might want to run clang-format over the code, it'll add things like the space for * here (to become "const VarDecl *")

63

Drop the section headers like this - they're not really done in LLVM code so far as I know.

73

Drop 'void' from the parameter list. This isn't necessary in C++ & not prevalent in the LLVM codebase. (see comment below about removing unnecessarily explicit ctor implementations)

74

I assume this should be a const reference parameter (non-const copy constructors are weird)

Better than that - can you just not define either constructor & rely on the defaults? That way you should hopefully get a move ctor for free. (OK, granted DenseMap would have to be movable for that - but that's someone else's problem unless you want to make it yours)

80

Another 'void' in the param list. Consider this general feedback & I won't call out any other instances I see.

include/clang/Basic/DiagnosticSemaKinds.td
2175

Notice that other diagnostics don't begin with an uppercase - they're sentence fragments, not sentences. Rephrase these diagnostics in a similar manner.

If possible, it'd be nice not to talk about "states" explicitly. Perhaps:

"attempt to use '%0' on a consumed object '%1'" ? Open to other wordsmithing...

lib/Analysis/Consumed.cpp
53

more big headers we don't tend to write - file-local functions are basically by definition 'helper functions'.

61

Clang-format should tidy this up, probably dropping the empty lines & maybe one-lining the case+returns (& perhaps outdenting the cases to line up with the switch - at least I think that's LLVM's preferred style)

104

Could do this in the ctor init list with "ModeStack(1, Disabled)" but I don't know if that's much/any better.

Or you could call "reset" to keep the functionality in one place.

150

Could this ever be less than Call->getNumArgs? (perhaps with variadic functions?)

191

consider having a short return with an inverted condition here, then outdenting the body of this instead of having it all inside the 'if'

196

FIXMEs are by definition deferred, so you don't really need "(Deferred)".

200

Another possible early return.

202

& a 3rd

362

(*x).y -> x->y

368

(*x).y -> x->y

383

(*x).y -> x->y

385

Else after return (just write it as "if (x) return; return;" rather than "if (x) return; else return;" )

398

Do you need to erase/insert just to change the value? You should just be able to do it with "I->second = Unknown";

(oh, you'd need the iterators to be non-const iterators)

403

Looks like 'Other' is unconditionally dereferenced and thus cannot be null - perhaps it should be be passed by const reference instead of pointer?

406

For a short scope iterator, consider just calling it 'I'

409

Not strictly unacceptable, but consider not copying this local and just using I->first or I->second. For a short loop I personally find it easier to read as I can quickly spot the "this is the current element" code" when it directly uses the iterator. Open to disagreement/other ideas, though.

413

Unnecessary { on one expression block

415

Unnecessary blank line

417

Could make this one line (& drop the braces) by calling setState?

426

Map[Var] = State; should suffice for this function

444

I'd consider moving this declaration down to just before the loop so it's clearer what it's for (& since it's not needed for the intervening line of code - so this would adhere to the general principle of least scope)

456

It'd be more idiomatic to name this variable "RD".

456

There's a function further down ( ConsumedAnalyzer::run) that starts with a dyn_cast_or_null and then an "if (!D) return" - this seems tidier than nesting the whole body of the function. Consider doing the same here.

459

Microoptimization:

Rather than find+insert, if you use insert here (with a default value for the boolean 'value'), test the second element of the pair returned and then populate the value if it was newly inserted. Something like this might be nice:

std::pair<CacheMapType::iterator, bool> Entry = ConsumableTypeCache.insert(std::make_pair(RD, false));

if (Entry.second)

Entry.first->second = hasConsumableAttributes(*RD);

return Entry.first->second;

(where the loop over the CXXRecordDecl's methods and attributes is pulled out into a 'hasConsumableAttributes' function)

462

-> rather than (*x).y

462

remove {} from one line block

489

Is this fixme still valid? I remember you spoke to me about this issue & thought you'd ended up punting on this & classifying loop blocks as causing an "unspecified" state.

514

Trailing blank line in a block seems strange (& there's a few more blank lines than I'd expect here - could be wrong though. Again, whatever clang-format does, I'm OK with (though I'm not sure it touches blank lines, in which case I'll express some preferences))

524

This indentation isn't really common in LLVM - clang-format should choose appropriately.

531

Unnecessary {}. Once you remove those, clang-format should choose whether to put the return on the same line as the 'if' or a separate line. I forget which way it goes.

566

Any idea how bad this'll hurt the compiler for large functions? I assume we only track variables with these annotations - so we're not going to kill the compiler tomorrow since nothing is annotated, right?

575

Should this case just be moved to the end of this if/else if chain and just be an unconditional 'else'?

(& what happens if there are more than 2 successors? (switch block?))

596

An owning smart pointer would be nice - perhaps CurrStates should be an OwningPtr<T>?

Will this lead to a double-delete when you cleanup the contents of BlockInfo (hmm, don't seem to be cleaning those up at all)? I'd like a bit more clear ownership (perhaps keeping the ConsumedStateMaps directly in the BlockInfo, rather than pointers)

642

just "return Method->hasAttr<TestsUnconsumedAttr>();" instead? Or perhaps skip the function and make this call directly at the callsites if it's sufficiently legible.

lib/Sema/SemaDeclAttr.cpp
1096

not -> !

(& several more)

1109

Comment (& the one up at 1009) seem a bit unnecessary. The function name is self explanatory.

1155

More one-statement blocks with {}

test/SemaCXX/warn-consumed-analysis.cpp
34

I think we should drop the "unnecessary test" warning for now. Feel free to leave a comment where it would go (probably not even to the level of FIXME, just "A warning could be emitted here about a constant test, but care should be taken regarding things such as macro expansions & templates.")

36

I wonder whether there's a way we could make these diagnostics more legible for the unique_ptr case. Could we have some verbs in the attributes? (I guess at that point we'd need a class level attribute for a consumable type - which would make it easier to decide whether a type was consumable (save you the member search). Is there any other way in which we'd benefit from/use a class level attribute?)

46

Line wrapping here is strange (comment, newline escape, then another comment starter?). Generally we don't worry about 80 cols in tests, so you can just remove the line wrapping.

74

As we discussed - unknown state objects should allow all operations (operations allowable only on valid or only on invalid objects)

test/SemaCXX/warn-consumed-parsing.cpp
11

Weird line wrapping again. If you'd really like to put these on separate lines you can use the @-1 notation (check other tests for examples - I think that's the syntax) to refer to lines by offset, rather than the current line.

If you're going to do that, I'd consider putting the comment above the line, rather than below.

Overall, looks good. I'm letting David comment on style issues, and I agree with his other concerns. My comments are specific to the tracking algorithm.

lib/Analysis/Consumed.cpp
284

I don't think this should be a recursive visitor, or possibly even a visitor at all. You are only looking for very specific patterns in the AST, rather than visiting every sub-expression.

413

Note that when merging two maps, if a variable is defined in one map, but not the other, then the variable is no longer in scope. In that case the variable should be removed from the map, rather than added.

492

// TODO: Handle variable definitions, e.g. bool valid = x.isValid(); if (valid) ...; ...;

575

This case should be a general succ_size() > 1 case.

577

Be careful when reusing CurrStates in a new block. You need to remove the pointer from BlockInfo, otherwise a subsequent back-jump to the current block will end up modifying the variable map for some successor to the current block.

582

You need to distinguish here between a successor to a block that you've already processed (which is jump back to the beginning of a loop) and a successor that you haven't yet processed. In the case of backward jump, you shouldn't do an ordinary merge, you should just check that the assumptions at the end of the loop match those on entry.

596

As David mentions, you need to write a destructor for BlockInfo; this is a memory leak.

chriswailes updated this revision to Unknown Object (????).Aug 6 2013, 6:39 PM

Rewrote the visitor due to the behavior of the CFG traversal. Responded to feedback.

delesley added inline comments.Aug 7 2013, 11:08 AM
include/clang/Analysis/Analyses/Consumed.h
68

CFG blocks are numbered from 0 to N, so you can save both time and memory by using a vector indexed by the block number, rather than a hash table on the pointer.

71

Again, this should be a BitVector, indexed by the block ID, rather than a set.

lib/Analysis/Consumed.cpp
146

Use getOverloadedOperator() == OO_Equal, rather than string comparisons.

467

Indent by 4?

chriswailes updated this revision to Unknown Object (????).Aug 7 2013, 7:20 PM

Switched from using DenseSet and DenseMap to a BlockSet and a dynamically allocated array. Moved from using a RecursiveASTVisitor to a StmtVisitor for ConsumedStmtVisitor.

chriswailes updated this revision to Unknown Object (????).Aug 9 2013, 4:50 PM

Made the code link under release build.

dblaikie accepted this revision.Apr 5 2014, 1:41 PM

This was committed in r188206

dblaikie closed this revision.Apr 5 2014, 1:41 PM