While extending the GenericTaintChecker for my purposes, I'm hitting a case where taint is not propagated although I believe it should be. Currently, the following example propagates taint correctly:
char buf[16]; taint_source(buf); taint_sink(buf);
However, the following fails:
char buf[16]; taint_source(&buf); taint_sink(buf);
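For reference, the two cases can be written out as one self-contained file. This is only a sketch: the description above does not give signatures for taint_source and taint_sink, so the prototypes here are assumptions (taint_source takes void * so that both buf and &buf are accepted without a cast), and the stub bodies exist only so the file compiles and runs on its own.

```cpp
#include <string.h>

/* Hypothetical stand-ins: signatures and bodies are assumptions,
   not taken from the patch. */
static int sink_calls = 0;

void taint_source(void *p) { memset(p, 'A', 1); }
void taint_sink(const char *p) { (void)p; ++sink_calls; }

/* Case 1: the array decays to char *, i.e. a pointer to its first element. */
void pointer_case(void) {
  char buf[16] = {0};
  taint_source(buf);
  taint_sink(buf);
}

/* Case 2: &buf has type char (*)[16]. It holds the same address as in
   case 1, but it points to the whole array rather than to element 0. */
void address_of_array_case(void) {
  char buf[16] = {0};
  taint_source(&buf);
  taint_sink(buf);
}
```

At run time the two functions behave identically; the difference is only in the argument's type, and therefore in which value the analyzer sees behind the pointer.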
In the first example, buf has its symbol correctly extracted (via GenericTaintChecker::getPointedToSymbol()) as a SymbolDerived{conj_$N{char}, element{buf,0 S64b,char}}. That symbol is marked as tainted, and the taint check then correctly finds it using ProgramState::isTainted().
In the second example, the SVal obtained in GenericTaintChecker::getPointedToSymbol() is a LazyCompoundVal, so SVal::getAsSymbol() returns a null SymbolRef and no symbol is marked as tainted.
This change extends GenericTaintChecker to obtain a SymbolRegionValue from the LCV MemRegion in getPointedToSymbol(), and then extends ProgramState::isTainted() to correctly match a SymbolRegionValue{buf} against a SymbolDerived{conj_$N{char}, element{buf,0 S64b,char}}.
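The matching rule can be illustrated with a toy model. The types below (Region, Symbol, isTaintedToy) are hypothetical stand-ins written for this sketch, not the analyzer's MemRegion or SymExpr classes. It also assumes one reading of the description above: a derived symbol counts as tainted when its region lies inside the region of a tainted region-value symbol.

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a memory region; NOT clang's MemRegion.
struct Region {
  std::string name;
  const Region *super; // enclosing region, e.g. buf for element{buf,0}

  explicit Region(std::string n, const Region *s = nullptr)
      : name(std::move(n)), super(s) {}

  // True if this region is r itself or is nested somewhere inside r.
  bool isSubRegionOf(const Region *r) const {
    for (const Region *cur = this; cur; cur = cur->super)
      if (cur == r)
        return true;
    return false;
  }
};

enum class SymKind { RegionValue, Derived };

// Toy stand-in for a symbol: only its kind and associated region.
struct Symbol {
  SymKind kind;
  const Region *region;
};

// Assumed rule: a query symbol is tainted if it is in the tainted set, or if
// it is a derived symbol whose region lies within a tainted region value.
bool isTaintedToy(const Symbol &query, const std::vector<Symbol> &tainted) {
  for (const Symbol &t : tainted) {
    if (t.kind == query.kind && t.region == query.region)
      return true; // direct hit
    if (t.kind == SymKind::RegionValue && query.kind == SymKind::Derived &&
        query.region->isSubRegionOf(t.region))
      return true; // e.g. element{buf,0} inside tainted buf
  }
  return false;
}
```

In this model, tainting a RegionValue symbol for buf makes a Derived symbol over element{buf,0} report as tainted, while a Derived symbol over an unrelated region does not. The real check would work on the analyzer's own symbol and region hierarchy.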
I'm not familiar enough with analyzer internals to be confident in whether this is the right approach to fix this bug, so feedback is welcome.
We need to be careful in the case when we don't have the definition in the current translation unit. In this case we may still have derived symbols by casting the pointer into some blindly guessed type, which may be primitive or have well-defined primitive fields.
By the way, in D26837 I suspect that there are other errors of this kind in the checker, e.g. when a function returns a void pointer, we put taint on symbols of type "void", which is weird.
Adding Alexey who may recall something on this topic.