This patch improves the diagnostics of the alpha.security.taint.TaintPropagation and taint related checkers by showing the "Taint originated here" note at the correct place, where the attacker may inject it. This greatly improves the understandability of the taint reports.
Taint Analysis: The attacker injects the malicious data at the taint source (e.g. getenv() call) which is then propagated and used at taint sink (e.g. exec() call) causing a security vulnerability (e.g. shell injection vulnerability), without data sanitation.
The goal of the checker is to discover and show to the user these potential taint source, sink pairs and the propagation call chain.
In the baseline the taint source was pointing to an invalid location, typically somewhere between the real taint source and sink.
After the fix, the "Taint originated here" tag is correctly shown at the taint source. This is the function call where the attacker can inject a malicious data (e.g. reading from environment variable, reading from file, reading from standard input etc.).
Before the patch the clang static analyzer puts the taint origin note wrongly to the strtol(..) call.
int main(){ char *pathbuf; char *user_data=getenv("USER_INPUT"); char *end; long size=strtol(user_data, &end, 10); // note: Taint originated here. if (size > 0){ pathbuf=(char*) malloc(size+1);//note: Untrusted data is used to specify the buffer size ... // ... free(pathbuf); } return 0; }
After the fix, the taint origin point is correctly annotated at getenv() where the attacker really injects the value.
int main(){ char *pathbuf; char *user_data=getenv("USER_INPUT"); // note: Taint originated here. char *end; long size=strtol(user_data, &end, 10); if (size > 0){ pathbuf=(char*) malloc(size+1);//note: Untrusted data is used to specify the buffer size ... // ... free(pathbuf); } return 0; }
The BugVisitor placing the note was wrongly going back only until introduction of the tainted SVal in the sink.
This patch removes the BugVisitor from the implmeentation and replaces it with 2 new NoteTags. One, in the taintOriginTrackerTag() prints the "taint originated here" Note and the other in taintPropagationExplainerTag() explaining how the taintedness is propagating from argument to argument or to the return value ("Taint propagated to the xth argument").
This implemetnation uses the interestingess bugReport utility to track back the tainted symbols through propagating function calls to the point where the taintedness was introduced by a source function call.
The checker which wishes to emit a Taint related diagnostic must use the categories::TaintedData BugType category and must mark the tainted symbols as interesting. Then the TaintPropagationChecker will automatically generate the "Taint originated here" and the "Taint propagated to..." diagnostic notes.
You can find the new improved reports
And the old reports (look out for "Taint originated here" notes. They are at the wrong place, close to the end of the reports)
A simple example report from curl:
Basline:
New:
The overloads having the extra ReturnFirstOnly parameter shouldn't be visible here in the header.
That is an implementation detail that no users should know about.
Note that having a single default argument overload potentially doubles the variations the user might need to keep in mind when choosing the right one. So there is value in simplicity.