This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Improve the documentation of the alpha.security.taint.TaintPropagation checker
ClosedPublic

Authored by dkrupp on Mar 3 2023, 4:03 AM.

Details

Summary

We extend the checker documentation with the following information

-How the user can mark inline that a data sanitation was performed performed to make a superflous report disappear
-Add references to CWEs and coding guidelines
-Describe the limitations of the checker
-Provide life-like examples

Diff Detail

Event Timeline

dkrupp created this revision.Mar 3 2023, 4:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2023, 4:03 AM
dkrupp requested review of this revision.Mar 3 2023, 4:03 AM
dkrupp added a reviewer: NoQ.Mar 3 2023, 4:11 AM
donat.nagy added inline comments.Mar 3 2023, 8:06 AM
clang/docs/analyzer/checkers.rst
78–80

Why is this paragraph (and the one above it) wrapped inconsistently? If we are touching these docs, perhaps we could re-wrap them to e.g 80 characters / line.

2386–2387

This old word choice is awkward, consider replacing it with e.g. "data from the network".

2389

Why not just "databases"?

2403

If the filename is too long (more than 1014 characters), this is a buffer overflow. I admit that having a secondary unrelated vulnerability makes the example more realistic :), but I think we should still avoid it. (This also appears in other variants of the example code, including the "No vulnerability anymore" one.)

2425

Nitpick: the comment formatting is inconsistent: for example, this is lowercase while most others start with an uppercase letter, or half of the comments have a space after the // while the others don't.

2440–2442

Perhaps explicitly mention that there are no built-in filters.

2456–2460

Separating the actual sanitization and the function that's magically recognized by the taint checker doesn't seem to be a good design pattern. Here csa_sanitize() is just a synonym for the "silence this checker here" marker, which is very confusing, because if someone is not familiar with this locally introduced no-op function, they will think that it's performing actual sanitization! At the very least we should rename this magical no-op to csa_mark_sanitized() or something similar.

The root issue is that in this example we would like to use a verifier function (that determines whether the tainted data is safe) instead of a sanitizer function (that can convert any tainted data into safe data) and our taint handling engine is not prepared to handle conditional Filter effects like "this function removes taint from its first argument if its return value is true".

I think it would be much better to switch to a different example where the "natural" solution is more aligned with the limited toolbox provided by our taint framework (i.e. it's possible define a filter function that actually removes problematic parts of the untrusted input).

2614

Spellcheck: "unknown"

dkrupp updated this revision to Diff 543586.Jul 24 2023, 9:22 AM
dkrupp marked 6 inline comments as done.
  • changed the main example to a data sanitation example (sanitizeFileName()) instead of a data verification example
  • fixed typos
  • fixed sphinx warnings

Thanks @donat.nagy for your review. I addressed your remarks. After patch https://reviews.llvm.org/D155848 these sanitizing examples work properly.

clang/docs/analyzer/checkers.rst
78–80

The formatting of this paragraph should not be impacted by this unrleated change. I will revert all unrelated formatting changes.

2403

True. cmd buffer increased to 2048

2456–2460

I changed this fist example to be a data sanitation example, where the sanitizeFileName(..) function changes the user input to an empty string if the filneme is invalid.

Then in the next example we show the generic csa_mark_sanitized() function and how it can be used to mark the valid code paths of verifier functions.

Mostly LGTM, but I'd like to ask you to ensure consistent line wrapping (to e.g. 72 or 80 characters) in the free-flowing text that was added or rewritten by this commit. (I know that this is a minor question because the linebreaks won't be visible in the final output, but the raw markdown is itself a human-readable format and we should keep it reasonably clean.)

dkrupp updated this revision to Diff 543862.Jul 25 2023, 1:10 AM

-lines wrapped to 80 characters

dkrupp marked 2 inline comments as done.Jul 25 2023, 1:11 AM
donat.nagy accepted this revision.Jul 25 2023, 1:14 AM
This revision is now accepted and ready to land.Jul 25 2023, 1:14 AM
This revision was landed with ongoing or failed builds.Jul 25 2023, 2:35 AM
This revision was automatically updated to reflect the committed changes.
steakhal added inline comments.Jul 25 2023, 3:06 AM
clang/docs/analyzer/checkers.rst
2471

Have you considered unconditionally having this function with an empty body?
That way it would have no "noise" at callsite.

dkrupp added inline comments.Jul 25 2023, 3:50 AM
clang/docs/analyzer/checkers.rst
2471

But that way the program would not compile, because the definition would not be found. Or maybe I misunderstand you.

Maybe in the future we could add an another type of filter function which would support validation type of functions: would sanitize the passed parameter only, if the function returns with non-null, non-zero.

steakhal added inline comments.Jul 25 2023, 5:18 AM
clang/docs/analyzer/checkers.rst
2469–2472

I was thinking of this when I mentioned "function with empty body".
Notice the inline, so that one could put this into a header file without violating ODR.


This way at the callsites, one wouldn't need to ifdef-clutter the code.
I think this would lead to more maintainable code in the long run, with happier users.

dkrupp added inline comments.Jul 25 2023, 10:42 AM
clang/docs/analyzer/checkers.rst
2469–2472

Good Point! I will add this to the next patch to this checker. thanks.