Page MenuHomePhabricator

[analyzer] pr28449 - Move literal rvalue construction away from RegionStore.
ClosedPublic

Authored by NoQ on Aug 27 2016, 2:48 PM.

Details

Summary

When binding string literal regions to char arrays, RegionStore's bindArray() method converts the string literals to their lazy compound values before binding. Bug https://llvm.org/bugs/show_bug.cgi?id=28449 shows that similar behavior is required for handling compound literals (brace initializers).

However, it seems illogical to me that RegionStore modifies the value it was asked to bind - it should be handled by the external code, and RegionStore's bind...() interface should perhaps only bind the given value to the given location, without improvising.

This patch conducts the necessary refactoring to avoid the issue. Now all compound values should be correct to begin with. The patch accidentally fixes the bug.

Additionally, this patch enables loading values from compound-literal regions - it was explicitly disabled, but the problem due to which it was disabled seems already resolved; made the respective tests stronger to see that it's actually working correctly.

Additionally, this patch tweaks the dump method of compound literal regions, addressing a FIXME there and making things prettier.

Didn't notice significant changes on large codebase runs - no new crashes, no changes in positives.

Diff Detail

Repository
rL LLVM

Event Timeline

NoQ updated this revision to Diff 69497.Aug 27 2016, 2:48 PM
NoQ retitled this revision from to [analyzer] pr28449 - Move literal rvalue construction away from RegionStore..
NoQ updated this object.
NoQ added a subscriber: cfe-commits.
NoQ added inline comments.Aug 27 2016, 2:50 PM
lib/StaticAnalyzer/Core/ExprEngineC.cpp
669 ↗(On Diff #69497)

Whoops, was a note to myself, didn't mean to leave it here, sorry. Will actually have a look and update.

dcoughlin edited edge metadata.Sep 9 2016, 11:26 AM

I agree that it is weird that region store modifies the value it was asked to bind and over all this seems like the right approach.

My big concern with this patch is that the logic that looks whether an lval is being stored into a non-reference location and dereferences it seems overly broad. I'm worried that this could paper over other issues by "fixing up" mismatches that are really the result of some other bug. Is there a way to make that logic more targeted? For example, if it only applies to initializing char arrays with string literals, that is a much more precise check for when to load from the lvalue.

lib/StaticAnalyzer/Core/ExprEngineC.cpp
536 ↗(On Diff #69497)

It seems like this comment ("We bound the temp obj region to the CXXConstructorExpr...") is incomplete/misleading (and has been for a while) because this code is executed when there is no constructor. Can it be updated to something more helpful?

665 ↗(On Diff #69497)

I am surprised Sema wouldn't have already added an implicit cast with an lvalue to rvalue conversion when needed. It seems in many cases it already does. Can you be more explicit in your comments about why this is needed and under what situations Sema doesn't handle this logic? Is initializing an array with a string literal the only time this is needed? If so, can the check be more targeted? I'll note that InitListExpr has an isStringLiteralInit() method and that CodeGen has a special case for it.

669 ↗(On Diff #69497)

It is probably also worth looking into what it would take to support unions here.

673 ↗(On Diff #69497)

If you are going to iterate through the initializer expressions in reverse, you also have to iterate through the field decls in reverse. Also, I think it is worth commenting why you iterate in reverse order here.

679 ↗(On Diff #69497)

This logic for checking with an expression is an lvalue and whether the type of the storage location is reference and loading from the lvalue if not is duplicated 3 times in this patch. Can you factor it out and give an meaningful name?

696 ↗(On Diff #69497)

I think it would good to add a test with array fillers if we don't already have them. For example:

int a[2000] = {1, 2, 3};
clang_analyzer_eval(a[0] == 1);
clang_analyzer_eval(a[1999] == 0);
699 ↗(On Diff #69497)

Should there be an assert here to make sure we have enumerated all the cases? I'm worried about the 'etc'. How will we know if we have not considered a case?

NoQ updated this revision to Diff 118625.Oct 11 2017, 8:25 AM

Because i didn't get back to this in a while, and similar crashes keep coming, i decided to leave this refactoring as a FIXME.

dcoughlin accepted this revision.Oct 11 2017, 11:51 AM

OK. Seems reasonable!

This revision is now accepted and ready to land.Oct 11 2017, 11:51 AM
This revision was automatically updated to reflect the committed changes.