This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][doc] Add user documenation for taint analysis
ClosedPublic

Authored by gamesh411 on Nov 5 2021, 2:44 AM.

Details

Summary

Checker alpha.security.taint.TaintPropagation now has user documentation for
taint analysis with an example showing external YAML configuration format.
The format of the taint configuration file is now documented under the user
documentation of Clang SA.

Diff Detail

Event Timeline

gamesh411 created this revision.Nov 5 2021, 2:44 AM
gamesh411 requested review of this revision.Nov 5 2021, 2:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 5 2021, 2:44 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Excellent work. Don't be afraid of the number of nits I dump on you!
Good job.

clang/docs/analyzer/checkers.rst
2354

Same for the rest.

2367

Per https://reviews.llvm.org/D113004#inline-1078695 we should not advocate users use the -Xclang machinery, we should rather refer to it by other tools such as scan-build. However, we haven't reached a consensus about this decision yet.
Consider moving some parts of this doc to the proposed Configuration documentation file - housing the more user-facing analyzer options.

2372

Please note that this configuration file is not stable (yet).
We might change it in a non-backward compatible way without any notice in the upcoming releases.

clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst
20
42–52

Since there the SrcArgs is empty, fread will work as an unconditional propagator - aka. a taint source.
It's probably better to phrase it something like that.

By doing so you could signify that in the next example you have a non-empty SrcArgs list, thus the propagation is conditional.

I think it would be better to follow this pattern in the corresponding section later as well.

50
77

The literature probably refers to cleansing functions as sanitizers. Wikipedia

90

Previously it was referred to by the key word. Chose one for consistency.

90

I would rather use the operation or function call instead of event everywhere.

92
93

This sentence definitely needs some polishing.

  1. It should emphasize that a source is an unconditional propagation - at least in the current design.
  2. The user might want to enlist or mark a specific function as taint source but definitely not identify them.
95
103
110
115

In the following section, you have an empty line after this marker line. Consolidate these.

118

You sometimes use parameter and in other cases argument seemingly interchangeably. You should consider consolidating these.

134

It's not exactly for this patch, but we should investigate If we could infer this index from the declaration of the function.

gamesh411 marked 13 inline comments as done.Nov 18 2021, 1:48 AM
gamesh411 added inline comments.
clang/docs/analyzer/checkers.rst
2367

Removed the command-line invocation part, and just left a mention to the configuration option.

clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst
134

Good idea, this seems like the way forward, I agree.

gamesh411 updated this revision to Diff 388132.Nov 18 2021, 1:55 AM

Fix the review comments of @steakhal

gamesh411 marked 4 inline comments as done.Nov 18 2021, 1:56 AM
whisperity added inline comments.Nov 18 2021, 4:50 AM
clang/docs/analyzer/checkers.rst
2357–2358

What does it mean that these are "default propagations"? That taint passes through calls to them trivially?

2361

execvp and execvP? What is the capital P for? I can't find this overload in the POSIX docs.

2369

Perhaps we should link to the in-house YAML doc (http://llvm.org/docs/YamlIO.html#introduction-to-yaml) first, and have the user move to other sources from there?

clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst
6

The Clang Static [...]?

6

uses, or can use?

7

Clang -> Clang SA / CSA?

7

So the checker has the defaults all the time, or only by enabling the alias can I get the defaults?

8

Ditto about YAML.

14

I think this wanted to be an anchor? Also see how the syntax highlighting broke in the next section.

24

AFAIK, RST anchors are global identifiers for the project, so maybe we should take care in adding "clang-sa" or some other sort of namespaceing...

(Although this might depend on how exactly the docs are built!)

31

Filters are not mentioned earlier.

32–37

If none of the comments themselves are just commented out ("optional") keys in the YAML, it would be better readable with a bit of formatting.

42

Why is this comment at indent 0?

52

What are these magic numbers and what is their meaning?

126–127

Ensure that the numeric literal shows up as "code" with double backtick.

gamesh411 updated this revision to Diff 389117.Nov 23 2021, 1:28 AM
gamesh411 marked 15 inline comments as done.

tidy up based on comments from whispy

clang/docs/analyzer/checkers.rst
2357–2358

Added just 2 sentences about what default means.
Please see them a bit further up in the new version of the patch.

2361

I have found a refence to this function inside TargetLibraryInfoTest.cpp.

926   │   case LibFunc_execv:
927   │   case LibFunc_execvp:
928   │     return (NumParams == 2 && FTy.getParamType(0)->isPointerTy() &&
929   │             FTy.getParamType(1)->isPointerTy() &&
930   │             FTy.getReturnType()->isIntegerTy(32));
931   │   case LibFunc_execvP:
932   │   case LibFunc_execvpe:
933   │   case LibFunc_execve:
934   │     return (NumParams == 3 && FTy.getParamType(0)->isPointerTy() &&
935   │             FTy.getParamType(1)->isPointerTy() &&
936   │             FTy.getParamType(2)->isPointerTy() &&
937   │             FTy.getReturnType()->isIntegerTy(32));

It seems like an OSX-specific spelling and seems like it has the semantics of execve.
During the implementation of this patch, I just collected the references to functions inside GenericTaintChecker.cpp.

clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst
6

For brevity and clarity, I will leave it at uses, it is just a generic statement about the availability of taint analysis as a method, and I think it would not improve the understanding to change it to can use. IMHO it is implicit, and can use would make people wonder as to what does Clang need to use it etc.

7

Good point, added a line to clarify this.

24

added clangsa- prefix

31

Added a mention with references in the Overview section.

32–37

Reformatted the examples section according to your suggestions.

42

It is because Propagations YAML-keys are used to define both sources and propagations. So to comment the "source-section" 0-indent is used, and to comment on individual entries within, 2-indent comments are used.

I have consolidated the indents and introduced vertical whitespace instead of multiple indentation levels.

52

Added an extra comment on top of the example YAML file.

gamesh411 updated this revision to Diff 389134.Nov 23 2021, 2:41 AM

fix indentation warning
make inline code formatting look better

gamesh411 updated this revision to Diff 389684.Nov 25 2021, 1:03 AM

update with arc diff $(git merge-base HEAD upstream) --update D113251
in order to satisfy workflow pre-merge checks

This revision was not accepted when it landed; it landed in state Needs Review.Nov 28 2021, 7:56 PM
This revision was automatically updated to reflect the committed changes.