This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] In getSVal() API, disable auto-detection of void type as char type.
ClosedPublic

Authored by NoQ on Oct 11 2017, 7:42 AM.

Details

Summary

In D38358, we ended up believing that reading the first byte of the void pointer is not the intended behavior of ProgramState::getSVal(Loc) when the type of the loaded value is not specified/hinted by the user and gets auto-detected as void. Therefore, instead of replacing the auto-detected void type with char and returning the first byte, fail with an easy-to-understand assertion to warn the user that it's better to specify the type explicitly in his case (even if he'd have to specify ACtx.CharTy anyway).

Additionally, allow specifying the type in the ProgramState::getSVal(const MemRegion *) override, because with the new assertion, a new use-case immediately shows up that requires such change. And, generally, it's a good idea to be able to specify the type here.

Diff Detail

Repository
rL LLVM

Event Timeline

NoQ created this revision.Oct 11 2017, 7:42 AM
xazax.hun edited edge metadata.Oct 30 2017, 4:01 AM

I like the idea of making ProgramState::getSVal(const MemRegion *) symmetric to ProgramState::getSVal(Loc).

Just some philosophical questions which should probably be addressed in the future:
But also I wonder, should we maintain all these overloads? I mean, a lot of the APIs accept both SVal/Loc and MemRegion *. Usually, one of the implementations will end up just calling the other after some wrapping/unwrapping. Maybe ripping MemRegion * from the APIs would make it easier to use? E.g.: CallAndMessageChecker just unwraps the region from an SVal just to wrap it back again within the API call. However, we tend to use MemRegion * when storing info in the program state.

Otherwise LGTM!

include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
307 ↗(On Diff #118616)

The whitespace here is a bit off. I know it is not related to this patch, but this could be fixed if we already touch this file.

NoQ added a comment.Oct 30 2017, 4:34 AM

Yeah, cleaning up this API would be great - as long as everybody loves to have the API broken and rewrite stuff.

If we are to simplify some APIs, i'd definitely start with getAsSymExpr()/getAsSymbol()/getAsSymbolicExpression().

Essentially we only need getSVal(MR) because only MR can be bound to, while other types of Loc cannot be bound to. Technically, only (base MR, offset) pairs can be bound to, but this is "implementation detail" that we're still trying to keep that way (even though it leaks like hell everywhere). So i'd rather think of the Loc overload as of a convenient wrapper (i.e., the user receives Loc in some checkLocation callback, so he wants to put it directly into the API to find his binding). And people would still need to operate with regions from time to time (eg., construct a sub-region manually when we expect it to be there, or strip casts).

Also the overload between getSVal(Ex, LC) and getSVal(R, Ty), when they do completely different things, seems confusing.

NoQ edited the summary of this revision. (Show Details)Oct 31 2017, 7:40 AM
dcoughlin accepted this revision.Nov 10 2017, 11:20 AM

LGTM. Thanks!

In D38801#910524, @NoQ wrote:

Also the overload between getSVal(Ex, LC) and getSVal(R, Ty), when they do completely different things, seems confusing.

Yes! I think we should change this (possibly rename them both?). I think it is a big barrier to entry -- I certainly found it super confusing when first getting started on the analyzer code base.

This revision is now accepted and ready to land.Nov 10 2017, 11:20 AM
NoQ updated this revision to Diff 125555.Dec 5 2017, 9:36 AM

Rebase on top of D39862 (attn. George!~).

This revision was automatically updated to reflect the committed changes.