This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
clang/docs/analyzer/
-
docs/
-
analyzer/
-
user-docs.rst
-
user-docs/
2/8
Configuration.rst

Differential D113004

[analyzer][docs] Add Configuration section to user-docs
Needs ReviewPublic

Authored by steakhal on Nov 2 2021, 4:28 AM.

Download Raw Diff

Details

Reviewers

NoQ
Szelethus
martong
ASDenysPetrov
balazske
gamesh411

Summary

This change is twofold.

Introduces the Configuration subpage under the user-docs of the analyzer.
Adds a section describing the modeling of flexible array members.

This page would house a more user-friendly digest of the most useful analyzer config options.
It's not aimed to be a complete list of the options, rather be first aid for the users if the analysis has some unintended behavior, such as false positives regarding e.g. FAMs for form 3.
They could use this page to help them decide if there is a known option that they missed and give guidance on how to look for more config options if these don't solve their issue.

Diff Detail

Event Timeline

steakhal created this revision.Nov 2 2021, 4:28 AM

Herald added subscribers: manas, dkrupp, donat.nagy and 7 others. · View Herald TranscriptNov 2 2021, 4:28 AM

steakhal requested review of this revision.Nov 2 2021, 4:28 AM

Big yes to the idea, the config of choice as the first to document seems fine, but I'm not sure we should lecture the user right in the very first paragraph. How about a short description of what a FAM is in the first place, than examples of the 3 types of it, and why we assume that only the first and the second to be such by default. If you insist, a word of caution would be okay in the last paragraph.

Harbormaster completed remote builds in B131938: Diff 384040.Nov 2 2021, 5:28 AM

Thanks for the remarks @Szelethus!
I'm gonna update accordingly.

Briefly introduce FAMs by quoting Wikipedia.
Use double backticks for inline code.

steakhal added a reviewer: gamesh411.Nov 3 2021, 7:18 AM

Harbormaster completed remote builds in B132201: Diff 384425.Nov 3 2021, 7:48 AM

Some -analyzer-config flags indeed deserve better treatment. I appreciate the effort.

clang/docs/analyzer/user-docs/Configuration.rst
12	Most of these options are not intended for end users. Anything with `-cc1` or `-Xclang` is not intended for end users. If end users need this, we should provide user-friendly options (eg., driver flags). In fact, we already have an `-analyzer-config` flag in `scan-build` and `scan-build --help` already documents one user-facing flag: -analyzer-config <options> Provide options to pass through to the analyzer's -analyzer-config flag. Several options are separated with comma: 'key1=val1,key2=val2' Available options: * stable-report-filename=true or false (default) Switch the page naming to: report-<filename>-<function/method name>-<id>.html instead of report-XXXXXX.html This might be a better place to put this info. If you really want a more generic documentation page, you should still use a user-facing tool such as `scan-build` in your examples. Or anyhow redirect to whatever UI the user is using.
73	We recommend not using FAMs at all. The linux kernel uses them all over the place and, let's face it, we're in no position to call Linus Torvalds a bad programmer and tell him to go fix his code before using our tools.

Thanks for the review @NoQ.

clang/docs/analyzer/user-docs/Configuration.rst
12	We mostly use CodeChecker, but I don't want to advocate any specific tool here. On the other hand, it would be still really useful for the users to get an idea of which options we encourage them to tinker with to see if it solves their issue before filing a ticket that the analyzer can't do something. I'm not exactly sure how to do this without using the driver or advocating a specific tool. Aside from this, `scan-build` doesn't seem to be the perfect place for this IMO. Should I stick to documenting these in the `scan-build` perl file? @NoQ @Szelethus @martong WDYT
73	Sure, I'm going to paraphrase this.

steakhal mentioned this in D113251: [analyzer][doc] Add user documenation for taint analysis.Nov 5 2021, 10:43 AM

Szelethus added inline comments.Nov 9 2021, 5:20 AM

clang/docs/analyzer/user-docs/Configuration.rst
12	In fact, we already have an -analyzer-config flag in scan-build and scan-build --help already documents one user-facing flag: We could consider adding all this to `-analyzer-config-help`, and maybe filter developer only configs to `-analyzer-config-help`, similar to how we do it for checkers. Most of these options are not intended for end users. Anything with -cc1 or -Xclang is not intended for end users. If end users need this, we should provide user-friendly options (eg., driver flags). I agree with this sentiment, but not at the cost of these things remaining undocumented. I personally support this patch, even if its precedes bumping some flags to the driver level. Mind that checkers can only be enabled through frontend flags, but are documented as user facing. We could however omit those sentences that use `-cc1`. Just have an intro similar to https://clang.llvm.org/docs/analyzer/checkers.html. Maybe another sentence about "only those configs that are listed on this page are meant for end users!". Aside from this, scan-build doesn't seem to be the perfect place for this IMO. Agreed.

martong added inline comments.Nov 18 2021, 2:17 AM

clang/docs/analyzer/user-docs/Configuration.rst
12	I support @Szelethus 's viewpoint. We could have an `-analyzer-config-help-developer` (and possibly an `-analyzer-config-help-alpha`). Very similarly to what we already have with `-analyzer-checker-option-help-*`. I'd expect 3 follow up patches then: Document the existing cli options. Currently there are many options undocumented: ... -analyzer-checker-help-developer -analyzer-checker-help Display the list of analyzer checkers that are available -analyzer-checker-option-help-alpha -analyzer-checker-option-help-developer -analyzer-checker-option-help ... Implement `-analyzer-config-help-developer`, this would include all options that are intended for debugging. Implement `-analyzer-config-help-alpha`, this would include all options that are intended for brave and friendly users. @NoQ What do you think?

martong added inline comments.Nov 18 2021, 2:22 AM

clang/docs/analyzer/user-docs/Configuration.rst
12	Implement `-analyzer-config-help-developer`, this would include all options that are intended for debugging. Implement `-analyzer-config-help-alpha`, this would include all options that are intended for brave and friendly And once we have this explicit categorization of config options then I think it is perfectly legit to have such a `Configuration.rst` that includes the user facing (non-alpha or alpha) options.

As a user, I'm not sure if this would be the right way moving forward. This patch conflates a lot of parallel (but related) executions of the idea. (I'm not against introducing points 1 and 2 in the same patch, but do not feel like it's right to do it this way!)

If I am having a bunch of false positives related to FAMs, I wouldn't think that looking at a file named Configuration is where I might find the answer.
If I want to consider further reading about what options I have to fine-tune the analyser (especially after realising that --analyser-help-whatever or CodeChecker is not explaining stuff to me in a satisfactory way...), then it looks weird to have a file which has a "<h2>" about Flexible array members and shows the config option at the end, after a lengthy introduction to the particular issue.

In D113004#3139662, @martong wrote:

Document the existing cli options.

As far as I understand we are using TableGen for the flags in CSA too, right? I think TableGen could generate at least the skeleton of the documentation. We sure do it with the supported attributes. I do not advocate that the specific format there is the true solution, but certainly looks better organised than what we are proposing in this patch.

How I would like to enjoy reading about configuration options (without getting into the debate as to what options the user should know about in the first place...).
First, the global (i.e. analyser-wide) configuration options, listed, may or may not categorised, in a sorted way. The "primary key" should be the configuration option (or the conceptual group it belongs to). Maybe not show it like consider-checker-option-as-meaningful but "Consider checker option as meaningful" to make the writing be more human-friendly and less machine-generated. Conversely, that would mean grepping the actual flag will not show up anything unless we kind of repeat ourselves...

For each option, include a simple example, the very striking change that shows how the analysis will change depending on the value of the configuration.
(E.g. in the case of FAMs, make it simple: flag ON -> this warning happens, that warning doesn't. Flag OFF -> this warning does not happen, but that other one does!)
If there needs to be an elaborate documentation for the conceptual background of the configuration, I would prefer if it was its own page -- which is of course cross-referenced from the configuration list.
This page then would contain all the theoretical background and definition of all the various forms developers might end up defining FAMs, the discussion on which is standard, which is not supported or recommended, and why.

As to whether the checker-specific configs should be not listed here / listed but not elaborated here / properly elaborated here, I do not see the right answer. Perhaps the check-specific configuration should live together with the check's documentation (as a sub-heading), but follow the same format: simple example in the list, elaboration on its own page.

For reference, the Tidy convention is that there is a global list of all checks which is just a Table-of-Contents, not even a summary is shown there (which I believe is kind of a drawback of that structure, but let's take it as a given), and each check's documentation has its own elaboration of the options it can take.
Then again, Tidy doesn't really have global options that change a significant part of the behaviour for all, so elaboration per-checker is fine there.

I think we should document all clang analyzer options on this page (which is listed by clang -cc1 -analyzer-config-help"), where the section header should be the analyzer option name (e.g. consider-single-element-arrays-as-flexible-array-members).
I agree with @whisperity, that options, which affect the analysis should be described with code examples highlighting the changes the option causes in the analysis behaviour. (such as in case of consider-single-element-arrays-as-flexible-array-members, or widen-loops, unroll-loops etc.)

These descriptions would enable present and future front-end tool developers (scan-build, CodeChecker etc.) to understand and decide, which options they would like to forward to the users. For example "consider-single-element-arrays-as-flexible-array-members" can be presented to the end-user (who is running the anlaysis through CodeChekcer), but " ctu-invocation-list" may not (as CodeChecker configures up this parameter automatically in CTU analysis mode).

So we can regard all of these parameters as "developer" configuration options, but I think it is much better to document the config options such as "consider-single-element-arrays-as-flexible-array-members" _here_, in clang static analyzer, where it is implemented, instead of in CodeChecker where there is no related code. In CodeChecker docs we would like to give a reference to this config option and the related documentation.

So the primary consumer of this doc are clang front-end tool(scan-build,codechecker etc.) developers and secondarily users that are referred here (from those front-end tools).

I also support the idea of @whisperity of autogenerating the titles of this doc based on the TD file, but that could be done in a later step too.

So I think this patch could go in like this, with the change that the section title should not be "Flexible array member modeling", but "consider-single-element-arrays-as-flexible-array-members" and probably some example of reports that this may change.

How the user finds these pages and false positive related documentation?

When a user suspect a false positive report, (s)he will typically try to look up the doc by the emitting checker name name, such as https://clang.llvm.org/docs/analyzer/checkers.html#deadcode-deadstores-c
where he should find the description of potential false positives and checker options that may alter the behaviour (e.g. deadcode.DeadStores:WarnForDeadNestedAssignments). This how clang tidy does it (e.g. https://clang.llvm.org/extra/clang-tidy/checks/bugprone-signed-char-misuse.html)
Then, we could refer to global analyzer options from here, which may have an effect to the mentioned false positives.
So, I think we should require all checker developers to document all checker options in the (https://clang.llvm.org/docs/analyzer/checkers.html (./clang/docs/analyzer/checkers.rst)).

ASDenysPetrov added inline comments.Apr 14 2022, 3:02 AM

clang/docs/analyzer/user-docs/Configuration.rst
73	we're in no position to call Linus Torvalds a bad programmer But still there are much more programmers that are not so good in programing than Linus, therefore this recommendation applies for them. That's the fact that `int a[1]` and `int a[]` bring different information. Most every book of best practice consider avoiding the code that can be misinterpreted or interpreted ambiguously even if the language syntax allows it.

Herald added a project: Restricted Project. · View Herald TranscriptApr 14 2022, 3:02 AM

Ping

In D113004#3652363, @ASDenysPetrov wrote:

Ping

We agreed with @steakhal that we should move this under the "Developer Docs" section. That way, the consider-single-element-arrays-as-flexible-array-members=true option remains hidden from the average users. But for advanced users/developers we can still point to the relevant options.

@steakhal, any update on this?

Revision Contents

Path

Size

clang/

docs/

analyzer/

user-docs.rst

1 line

user-docs/

Configuration.rst

86 lines

Diff 384425

clang/docs/analyzer/user-docs.rst

	User Docs			User Docs
	=========			=========

	Contents:			Contents:

	.. toctree::			.. toctree::
	:maxdepth: 2			:maxdepth: 2

				user-docs/Configuration
	user-docs/CrossTranslationUnit			user-docs/CrossTranslationUnit

clang/docs/analyzer/user-docs/Configuration.rst

This file was added.

				=============
				Configuration
				=============

				The analyzer can be configured with a wide variety of options and switches.
				One can fine-tune its behavior by passing these options as the
				``-analyzer-config`` command line parameter to the analyzer frontend action.
				For the complete list of the supported option list run this command:

				.. code-block:: bash

				$ clang -cc1 -analyze -analyzer-config-help
				NoQUnsubmitted Not Done Reply Inline Actions Most of these options are not intended for end users. Anything with `-cc1` or `-Xclang` is not intended for end users. If end users need this, we should provide user-friendly options (eg., driver flags). In fact, we already have an `-analyzer-config` flag in `scan-build` and `scan-build --help` already documents one user-facing flag: -analyzer-config <options> Provide options to pass through to the analyzer's -analyzer-config flag. Several options are separated with comma: 'key1=val1,key2=val2' Available options: * stable-report-filename=true or false (default) Switch the page naming to: report-<filename>-<function/method name>-<id>.html instead of report-XXXXXX.html This might be a better place to put this info. If you really want a more generic documentation page, you should still use a user-facing tool such as `scan-build` in your examples. Or anyhow redirect to whatever UI the user is using. NoQ: Most of these options are not intended for end users. Anything with `-cc1` or `-Xclang` is not…
				steakhalAuthorUnsubmitted Done Reply Inline Actions We mostly use CodeChecker, but I don't want to advocate any specific tool here. On the other hand, it would be still really useful for the users to get an idea of which options we encourage them to tinker with to see if it solves their issue before filing a ticket that the analyzer can't do something. I'm not exactly sure how to do this without using the driver or advocating a specific tool. Aside from this, `scan-build` doesn't seem to be the perfect place for this IMO. Should I stick to documenting these in the `scan-build` perl file? @NoQ @Szelethus @martong WDYT steakhal: We mostly use CodeChecker, but I don't want to advocate any specific tool here. On the other…
				SzelethusUnsubmitted Not Done Reply Inline Actions In fact, we already have an -analyzer-config flag in scan-build and scan-build --help already documents one user-facing flag: We could consider adding all this to `-analyzer-config-help`, and maybe filter developer only configs to `-analyzer-config-help`, similar to how we do it for checkers. Most of these options are not intended for end users. Anything with -cc1 or -Xclang is not intended for end users. If end users need this, we should provide user-friendly options (eg., driver flags). I agree with this sentiment, but not at the cost of these things remaining undocumented. I personally support this patch, even if its precedes bumping some flags to the driver level. Mind that checkers can only be enabled through frontend flags, but are documented as user facing. We could however omit those sentences that use `-cc1`. Just have an intro similar to https://clang.llvm.org/docs/analyzer/checkers.html. Maybe another sentence about "only those configs that are listed on this page are meant for end users!". Aside from this, scan-build doesn't seem to be the perfect place for this IMO. Agreed. Szelethus: > In fact, we already have an -analyzer-config flag in scan-build and scan-build --help already…
				martongUnsubmitted Not Done Reply Inline Actions I support @Szelethus 's viewpoint. We could have an `-analyzer-config-help-developer` (and possibly an `-analyzer-config-help-alpha`). Very similarly to what we already have with `-analyzer-checker-option-help-`. I'd expect 3 follow up patches then: Document the existing cli options. Currently there are many options undocumented: ... -analyzer-checker-help-developer -analyzer-checker-help Display the list of analyzer checkers that are available -analyzer-checker-option-help-alpha -analyzer-checker-option-help-developer -analyzer-checker-option-help ... Implement `-analyzer-config-help-developer`, this would include all options that are intended for debugging. Implement `-analyzer-config-help-alpha`, this would include all options that are intended for brave and friendly users. @NoQ What do you think? martong:* I support @Szelethus 's viewpoint. We could have an `-analyzer-config-help-developer` (and…
				martongUnsubmitted Not Done Reply Inline Actions Implement `-analyzer-config-help-developer`, this would include all options that are intended for debugging. Implement `-analyzer-config-help-alpha`, this would include all options that are intended for brave and friendly And once we have this explicit categorization of config options then I think it is perfectly legit to have such a `Configuration.rst` that includes the user facing (non-alpha or alpha) options. martong: > 2) Implement `-analyzer-config-help-developer`, this would include all options that are…

				.. contents::
				:local:

				Flexible array member modeling
				______________________________

				In C struct data types may end with a flexible array member with no
				specificed size:

				.. code-block:: cpp

				struct vectord {
				short len; // there must be at least one other data member
				double arr[]; // the flexible array member must be last
				// The compiler may reserve extra padding space here, like it can between struct members
				};

				// ...

				struct vectord vector = malloc(sizeof(struct vectord) + n sizeof(double));
				vector->len = n;
				for (int i = 0; i < vector->len; i++)
				vector->arr[i] = ...;

				This pattern is widely used to spare an extra allocation and a pointer
				indirection thus gaining better cache locality.
				Source: `Wikipedia <https://en.wikipedia.org/wiki/Flexible_array_member>`_

				----

				There are multiple forms of FAMs in the wild:

				- Form 1 - Incomplete array member:

				.. code-block:: cpp

				struct Foo {
				char field1;
				int my_fam[]; // no size!
				};

				- Form 2 - Constant array member with size 0:

				.. code-block:: cpp

				struct Foo {
				char field1;
				int my_fam[0]; // zero size!
				};

				- Form 3 - Constant array member with size 1:

				.. code-block:: cpp

				struct Foo {
				char field1;
				int my_fam[1]; // single element!
				};

				We recommend not using FAMs at all. If you really need them, stick to using
				NoQUnsubmitted Not Done Reply Inline Actions We recommend not using FAMs at all. The linux kernel uses them all over the place and, let's face it, we're in no position to call Linus Torvalds a bad programmer and tell him to go fix his code before using our tools. NoQ: > We recommend not using FAMs at all. The linux kernel uses them all over the place and, let's…
				steakhalAuthorUnsubmitted Done Reply Inline Actions Sure, I'm going to paraphrase this. steakhal: Sure, I'm going to paraphrase this.
				ASDenysPetrovUnsubmitted Not Done Reply Inline Actions we're in no position to call Linus Torvalds a bad programmer But still there are much more programmers that are not so good in programing than Linus, therefore this recommendation applies for them. That's the fact that `int a[1]` and `int a[]` bring different information. Most every book of best practice consider avoiding the code that can be misinterpreted or interpreted ambiguously even if the language syntax allows it. ASDenysPetrov: > we're in no position to call Linus Torvalds a bad programmer But still there are much more…
				the first form mentioned in the list - using an incomplete array member
				declaration. It will deliver the intent of the programmer that the size
				of the array is indeed unknown.

				Form 2 is also discouraged, since not all compilers might support it.
				Form 3 is considered harmful! It's highly likely that any use of that form
				would result in undefined behavior. The compiler has the right to compile the
				binary under the assumption that the array has exactly one element.

				By default, the Clang Static Analyzer will consider only form 1 and form 2
				as flexible array members. If the project under analysis uses FAMs of
				form 3, then consider enabling the
				``consider-single-element-arrays-as-flexible-array-members=true`` config option.