This is an archive of the discontinued LLVM Phabricator instance.


Differential D152436

[clang][analyzer] Move checker alpha.unix.StdCLibraryFunctions out of alpha.
Accepted, Public

Authored by balazske on Jun 8 2023, 5:26 AM.

Details

Reviewers

Szelethus
NoQ
steakhal
gamesh411

Summary

This checker may be good enough to move out of alpha.
I am not sure about the exact requirements; this review can be a place
for discussion about what should be fixed (if anything).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

balazske created this revision. Jun 8 2023, 5:26 AM

Herald added a reviewer: Szelethus. Jun 8 2023, 5:26 AM

Herald added a reviewer: NoQ.

Herald added a project: Restricted Project.

Herald added subscribers: steakhal, manas, ASDenysPetrov and 10 others.

balazske requested review of this revision. Jun 8 2023, 5:26 AM

Herald added a project: Restricted Project. Jun 8 2023, 5:26 AM

Herald added a subscriber: cfe-commits.

I could test the checker on these projects (CTU analysis was not used):
memcached,tmux,curl,twin,vim,openssl,sqlite,ffmpeg,postgres,tinyxml2,libwebm,xerces,bitcoin,protobuf,qtbase,contour,acid

These are reports that could be improved:
link
In this case function fileno returns -1 because of failure, but this is not indicated in a NoteTag. This is a correct result, only the note is missing. This problem can be solved if a note is displayed on every branch ("case") of the standard C functions. But this leads to many notes at un-interesting places. If the note is displayed only at "interesting" values another difficulty shows up: The note disappears from places where it should be shown because the "interestingness" is not set, for example at conditions of if statement. So the solution may require more work. This case with function fileno occurs 13 times in all the tested projects.
link
The function open is not modeled in StdCLibraryFunctionsChecker, it should not return less than -1 but this information is not included now.
link
This looks wrong, L should not be 0 because len looks > 0 (see the macros that set len). Probably the many bitwise operations cause the problem.
link
socket can not return less than -1 but this function is not modeled currently.
link
fwrite with 0 buffer and 0 size should not be an error, this is not checked now.
link
When file_size is 0 status.ok() is probably false that is not correctly recognized (may work in CTU mode?).
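
To make the fileno case concrete, here is a minimal sketch of the pattern and of the check that avoids it. It is not taken from any of the analyzed projects; the helper name stream_is_terminal is hypothetical, and isatty is only used as an example of a function that expects a valid file descriptor.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (illustration only).
 * Returns 1 if the stream refers to a terminal, 0 if it does not,
 * and -1 if the stream has no usable file descriptor. */
int stream_is_terminal(FILE *fp) {
  int fd = fileno(fp); /* may fail and return -1 */
  if (fd == -1)
    return -1;         /* handle the failure instead of passing -1 on */
  return isatty(fd);   /* only reached with a non-negative descriptor */
}
```

Without the fd == -1 check, the value -1 would be passed on as a file number, which is the situation described in the fileno reports above.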

These results look good:
link
link
link
link
link
link
link
link
In this last case it looks like the previous call to ftell returns -1, and this value is assigned to fileSize. This is again a case for improvement, similar to the case with fileno.

One deficiency is that some filenames of test files contain the old std-c-library-functions-arg name that is not used any more.

Another question is whether the default value of ModelPOSIX can be changed to true.

Harbormaster completed remote builds in B237474: Diff 529568. Jun 8 2023, 5:57 AM

Szelethus added reviewers: steakhal, gamesh411. Jun 9 2023, 1:49 AM

I am not sure about the exact requirements; this review can be a place for discussion about what should be fixed (if anything).

D52984 added the "Making your checker better" section to the dev manual: https://clang-analyzer.llvm.org/checker_dev_manual.html (nobody can be faulted for not finding this: aside from those who witnessed its creation, it has faded from the collective memory of the analyzer developers).

In D152436#4405558, @balazske wrote:

I could test the checker on these projects (CTU analysis was not used):
memcached,tmux,curl,twin,vim,openssl,sqlite,ffmpeg,postgres,tinyxml2,libwebm,xerces,bitcoin,protobuf,qtbase,contour,acid

These are reports that could be improved:
link
In this case function fileno returns -1 because of failure, but this is not indicated in a NoteTag. This is a correct result, only the note is missing. This problem can be solved if a note is displayed on every branch ("case") of the standard C functions. But this leads to many notes at un-interesting places. If the note is displayed only at "interesting" values another difficulty shows up: The note disappears from places where it should be shown because the "interestingness" is not set, for example at conditions of if statement. So the solution may require more work. This case with function fileno occurs 13 times in all the tested projects.

Yeah, this is a tough cookie... is it okay to hide the -1 branch behind an off-by-default checker option for the time being?

link
The function open is not modeled in StdCLibraryFunctionsChecker, it should not return less than -1 but this information is not included now.
link
socket can not return less than -1 but this function is not modeled currently.

These should be a rather painless fix, right?

link
This looks wrong, L should not be 0 because len looks > 0 (see the macros that set len). Probably the many bitwise operations cause the problem.
link
When file_size is 0 status.ok() is probably false that is not correctly recognized (may work in CTU mode?).

Looks like something we can live with.

link
fwrite with 0 buffer and 0 size should not be an error, this is not checked now.

Some discussion for that: D140387#inline-1360054. There is a FIXME in the code for it -- not sure how common this specific issue is, but we did stumble on it in an open source project... how painful would it be to fix this?
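
For illustration, a caller-side way to sidestep the empty-write case could look like the sketch below. This assumes that skipping the library call for a zero-length write is acceptable to the caller; the wrapper name write_bytes is hypothetical, and the sketch says nothing about how the checker itself models fwrite.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical wrapper (illustration only).
 * Writes len bytes from buf to fp and returns the number of bytes written.
 * For an empty write the library call is skipped entirely, so buf is
 * never handed to fwrite when there is nothing to write. */
size_t write_bytes(const void *buf, size_t len, FILE *fp) {
  if (len == 0)
    return 0;
  return fwrite(buf, 1, len, fp);
}
```

With such a wrapper, a call like write_bytes(NULL, 0, fp) never reaches fwrite, so the null buffer cannot trigger a report at that call.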

These results look good:
link
link
link
link
link
link
link
link
In this last case it looks like the previous call to ftell returns -1, and this value is assigned to fileSize. This is again a case for improvement, similar to the case with fileno.

In D152436#4405558, @balazske wrote:

These are reports that could be improved:
link
In this case function fileno returns -1 because of failure, but this is not indicated in a NoteTag. This is a correct result, only the note is missing. This problem can be solved if a note is displayed on every branch ("case") of the standard C functions. But this leads to many notes at un-interesting places. If the note is displayed only at "interesting" values another difficulty shows up: The note disappears from places where it should be shown because the "interestingness" is not set, for example at conditions of if statement. So the solution may require more work. This case with function fileno occurs 13 times in all the tested projects.

Could you elaborate on what you mean by "The note disappears from places where it should be shown because the "interestingness" is not set, for example at conditions of if statement"? A short example would do the job, I think.

I looked at the TPs, and if the violation was introduced by an assumption (instead of an assignment), then it's really hard to spot which assumption is important for the bug.
I wonder if we could add the TrackConstraintBRVisitor to the bugreport to "highlight" that particular assumption/place.

In D152436#4408301, @steakhal wrote:

In D152436#4405558, @balazske wrote:

These are reports that could be improved:
link
In this case function fileno returns -1 because of failure, but this is not indicated in a NoteTag. This is a correct result, only the note is missing. This problem can be solved if a note is displayed on every branch ("case") of the standard C functions. But this leads to many notes at un-interesting places. If the note is displayed only at "interesting" values another difficulty shows up: The note disappears from places where it should be shown because the "interestingness" is not set, for example at conditions of if statement. So the solution may require more work. This case with function fileno occurs 13 times in all the tested projects.

Could you elaborate on what you mean by "The note disappears from places where it should be shown because the "interestingness" is not set, for example at conditions of if statement"? A short example would do the job, I think.

I looked at the TPs, and if the violation was introduced by an assumption (instead of an assignment), then it's really hard to spot which assumption is important for the bug.
I wonder if we could add the TrackConstraintBRVisitor to the bugreport to "highlight" that particular assumption/place.

The first question is whether this problem must be fixed before the checker comes out of alpha state. If yes, I will try to make another patch with this fix. I tried this previously but do not remember exactly what the problem was.

clang/docs/analyzer/checkers.rst
922	This is applicable to C++ too?

In D152436#4408811, @balazske wrote:

In D152436#4408301, @steakhal wrote:

I looked at the TPs, and if the violation was introduced by an assumption (instead of an assignment), then it's really hard to spot which assumption is important for the bug.
I wonder if we could add the TrackConstraintBRVisitor to the bugreport to "highlight" that particular assumption/place.

The first question is whether this problem must be fixed before the checker comes out of alpha state. If yes, I will try to make another patch with this fix. I tried this previously but do not remember exactly what the problem was.

Without an explicit note message there, I don't see how we could advertise this as a "mature" checker.

clang/docs/analyzer/checkers.rst
922	Yes, it's applicable to C++ if they use these C APIs. However, I would prefer not to extend it with C++. IMO that would only raise confusion.

In D152436#4411912, @steakhal wrote:

In D152436#4408811, @balazske wrote:

In D152436#4408301, @steakhal wrote:

I looked at the TPs, and if the violation was introduced by an assumption (instead of an assignment), then it's really hard to spot which assumption is important for the bug.
I wonder if we could add the TrackConstraintBRVisitor to the bugreport to "highlight" that particular assumption/place.

The first question is whether this problem must be fixed before the checker comes out of alpha state. If yes, I will try to make another patch with this fix. I tried this previously but do not remember exactly what the problem was.

Without an explicit note message there, I don't see how we could advertise this as a "mature" checker.

Is it possible to hide functions hindered by this problem behind an off-by-default flag?

It is possible to add note tags to show decisions at standard functions. For example, at fileno a note can show whether it has failed or not. The simplest way is to add the note at all places, which means a note will show up on any bug path at every standard function usage. This is how it already works with the existing notes, as in the following code:

int __test_case_note();

int test_case_note_1(int y) {
  int x1 = __test_case_note(); // expected-note{{Function returns 1}}
  int x = __test_case_note(); // expected-note{{Function returns 0}} \
                              // expected-note{{'x' initialized here}}
  return y / x; // expected-warning{{Division by zero}} \
                // expected-note{{Division by zero}}
}

int test_case_note_2(int y) {
  int x = __test_case_note(); // expected-note{{Function returns 1}}
  return y / (x - 1); // expected-warning{{Division by zero}} \
                      // expected-note{{Division by zero}}
}

Here the first note at line with "x1" is not necessary. This problem can be fixed if the note is only shown when the return value is "interesting":

int __test_case_note();

int test_case_note_1(int y) {
  int x1 = __test_case_note(); // no note
  int x = __test_case_note(); // expected-note{{Function returns 0}} \
                              // expected-note{{'x' initialized here}}
  return y / x; // expected-warning{{Division by zero}} \
                // expected-note{{Division by zero}}
}

int test_case_note_2(int y) {
  int x = __test_case_note(); // no note
  return y / (x - 1); // expected-warning{{Division by zero}} \
                      // expected-note{{Division by zero}}
}

But in this case the note at test_case_note_2 disappears because x-1 is interesting, but not x. Fixing this problem looks more difficult.

From these two solutions, which one is better? (Show many unnecessary notes, or show only necessary ones but lose some of the useful notes too.)

In D152436#4432736, @balazske wrote:

From these two solutions, which one is better? (Show many unnecessary notes, or show only necessary ones but lose some of the useful notes too.)

How likely are the problems with the 2nd case? I know it's a hassle, but can you upload results for the first and the second case? It's hard to tell without seeing how these pan out in the real world.

In D152436#4432736, @balazske wrote:

[...]
From these two solutions, which one is better? (Show many unnecessary notes, or show only necessary ones but lose some of the useful notes too.)

I think we must avoid polluting the environment with unnecessary notes (especially if we'd emit "many" of them), so I strongly suggest option 2.

Later it would be possible to fix the limitations of the interestingness system, e.g. by introducing a "mark all symbols in this expression as interesting" function. Perhaps we should open a separate discussion about this on discourse -- but this review and the de-alpha-ing of StdCLibraryFunctions should not be delayed by this tangentially related engine improvement!

Personally I think it's completely acceptable if the analyzer sometimes emits bug reports that are true positives but lack a message like "expected-note{{Function returns 1}}" as I'd guess that in the majority of these cases it wouldn't be terribly difficult for the user to manually derive this information from the context. (I'd say that even the previously highlighted fileno report is acceptable -- it's surprising at first, but all bug reports come with an implicit "On a certain execution path..." prefix and in this case it's easy to see that "certain" = "when fileno returns -1".)

If these issues produce lots of issues that are very confusing, then we should put the affected functions behind off-by-default flags, finish this review process, and revisit them later when the interestingness system is improved; but based on the available information I don't think that's necessary.

I resign as a reviewer as I'm not deeply connected to this checker, thus I won't block it or accept it.
However, my opinion is that a checker should be "released" if it has clear diagnostics (which includes that it doesn't flood the user with unimportant diagnostics either).
Consequently, if there are features missing to accomplish that, then that thing is a blocker.

TBH I never understood why interestingness is not transitive over the SymExpr dependencies (symbols_begin/end).
This was not the only case when it hindered us. Just think of how taint propagates.
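
As a toy illustration of what transitive interestingness would mean, the sketch below marks an expression and everything it is built from. This is plain C with hypothetical names (ToyExpr, mark_interesting_transitively); it does not use or mirror the actual analyzer classes and is only meant to show the idea on the x - 1 example from earlier in the thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a symbolic expression; NOT the analyzer's SymExpr. */
struct ToyExpr {
  const char *name;
  struct ToyExpr *lhs; /* NULL for leaf symbols */
  struct ToyExpr *rhs; /* NULL for leaf symbols */
  bool interesting;
};

/* Marks the expression and every sub-expression it depends on,
 * so that marking "x - 1" also marks "x". */
void mark_interesting_transitively(struct ToyExpr *e) {
  if (e == NULL)
    return;
  e->interesting = true;
  mark_interesting_transitively(e->lhs);
  mark_interesting_transitively(e->rhs);
}
```

In this toy model, marking the divisor x - 1 would also mark x, so a note attached to the call that produced x would no longer be dropped.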

In D152436#4437692, @donat.nagy wrote:

Personally I think it's completely acceptable if the analyzer sometimes emits bug reports that are true positives but lack a message [...]

I must admit that I'm in the other camp.

If these issues produce lots of issues that are very confusing, then we should put the affected functions behind off-by-default flags, finish this review process, and revisit them later when the interestingness system is improved; but based on the available information I don't think that's necessary.

I can agree with this pragmatic approach.

In D152436#4437822, @steakhal wrote:

In D152436#4437692, @donat.nagy wrote:

Personally I think it's completely acceptable if the analyzer sometimes emits bug reports that are true positives but lack a message [...]

I must admit that I'm in the other camp.

To clarify my position, the "sometimes" in my comment should've been "rarely" or something closer to that. I don't want to release confusing garbage, I'm just trying to optimize sum(helpfulness of checkers) and I feel that the last few steps towards perfect helpfulness are often disproportionately expensive. If some code contains two bugs, then two rough but understandable warnings are more valuable than one well-polished friendly warning + one completely missed bug (because we didn't have time to write a checker for it).

@balazske:

Do I understand it correctly that the "create note that's shown when the return value is marked as interesting" (the option 2 that I suggested) adds / will add clarifying notes to the fileno/fstat error reports and the ftell issue? Did you see any real-life examples where this option 2 does not provide useful notes? If not, then I think we can accept that option 2 is a sufficient solution for these issues.

In addition to this, there are these issues noted by @Szelethus:

In D152436#4408199, @Szelethus wrote:

In D152436#4405558, @balazske wrote:

link
The function open is not modeled in StdCLibraryFunctionsChecker, it should not return less than -1 but this information is not included now.
link
socket can not return less than -1 but this function is not modeled currently.

These should be a rather painless fix, right?

[...]

link
fwrite with 0 buffer and 0 size should not be an error, this is not checked now.

Some discussion for that: D140387#inline-1360054. There is a FIXME in the code for it -- not sure how common this specific issue is, but we did stumble on it in an open source project... how painful would it be to fix this?

If it's not too difficult, then please upload a new version of this patch that implements "option 2" (i.e. produces notes when the return value of the std library function is marked as interesting) + handles the three requests of @Szelethus. I hope that you can do this and then this review can be concluded quickly.

If there are significant obstacles (or I misunderstood the situation), then please reply and discuss the next steps.

Uh-oh, looks like I'm not paying nearly enough attention to this discussion (sorry about that!!)

I'm somewhat skeptical of the decision made in D151225 because the entire reason I originally implemented StdCLibraryFunctions was to deal with false positives I was seeing. It was really valuable even without the bug-finding part. So I really wish we could find some way to keep bug-finding and modeling separate.

I haven't read the entire discussion though, I need to catch up 😓

In this case function fileno returns -1 because of failure, but this is not indicated in a NoteTag. This is a correct result, only the note is missing. This problem can be solved if a note is displayed on every branch ("case") of the standard C functions. But this leads to many notes at un-interesting places. If the note is displayed only at "interesting" values another difficulty shows up: The note disappears from places where it should be shown because the "interestingness" is not set, for example at conditions of if statement. So the solution may require more work. This case with function fileno occurs 13 times in all the tested projects.

Extra notes along the path are fine as long as their isPrunable() flag is correctly set. It's perfectly ok to say "(15) Hey btw we've just run over this statement and here's what we assume it did" in a section of a path that's already displayed to the user. Even if you say that on every line, it's probably ok.

But if the note causes a new nested stack frame to be displayed (which was otherwise pruned from the report), there better be a good reason for this, because this can easily increase the complexity of the bug report by a factor of 100.

It's definitely a requirement for a non-alpha checker to make sure that there are enough non-prunable pieces in the report for the user to understand the report. A lot of our existing on-by-default checkers (and even some core checkers) don't really hold up to this expectation (looking at you null dereference), but almost every time they don't, that's a false positive. If you need to pass -analyzer-config prune-paths=false in order to see a key step in the bug report, that's a false positive even if your path simulation was perfect.

So if any of these notes are essential to understanding any bug report (by this checker or by another checker, eg. like getenv() returning null is essential for null dereference checker), there needs to be a way for the note tag to learn that (eg., the getenv() should be able to ask whether the return value is being tracked by trackExpressionValue()) and if so, the note has to be marked as unprunable.

As a first experiment I have made patch D153612, which adds a NoteTag to "all" standard function calls.

In D152436#4438956, @NoQ wrote:

Uh-oh, looks like I'm not paying nearly enough attention to this discussion (sorry about that!!)

I'm somewhat skeptical of the decision made in D151225 because the entire reason I originally implemented StdCLibraryFunctions was to deal with false positives I was seeing. It was really valuable even without the bug-finding part. So I really wish we could find some way to keep bug-finding and modeling separate.

I haven't read the entire discussion though, I need to catch up 😓

The problem was that modeling and report generation could not be separated correctly. Both are implemented in one class but are differently named checkers that should run in a specific order because of dependency issues; this was not good. The other problem was that if the modeling checker runs first, it will apply state changes for pre and post conditions without generating a bug report even if a bug could be found in the previous state. The old state is then lost and other checkers will not find that bug. For example, a case of a null pointer argument to a function is always removed by the modeling part of the checker, even if this was a case when a bug report should be generated.

In D152436#4443858, @balazske wrote:

In D152436#4438956, @NoQ wrote:

I'm somewhat skeptical of the decision made in D151225 because the entire reason I originally implemented StdCLibraryFunctions was to deal with false positives I was seeing. It was really valuable even without the bug-finding part. So I really wish we could find some way to keep bug-finding and modeling separate.

The problem was that modeling and report generation could not be separated correctly. Both are implemented in one class but are differently named checkers that should run in a specific order because of dependency issues; this was not good.

In my view, it would certainly be possible through enormous efforts to further granularize this checker (or these large ones in general), so that the modeling and reporting portions could be cleanly separated into their own checker objects. That certainly was my belief a couple of years back -- I sank months and months into MallocChecker, yet I'm still not even close to that goal.

So, with the modeling and the reporting being the same entity, we can't express that some more specific checkers should run before it. StreamChecker can construct more specific messages than StdLibraryFunctions for a null stream object, but only if it runs ahead of it. That implies both a weak and a strong dependency on what is essentially the same checker. As things stand, I'm not sure how we could have avoided this if we want these checkers to finally leave the alpha state.

In D152436#4443828, @balazske wrote:

As a first experiment I have made patch D153612, which adds a NoteTag to "all" standard function calls.

Could you post the results for it as you have them please?

Using the latest version of the checker.

Herald added a subscriber: wangpc. Aug 7 2023, 7:15 AM

Harbormaster completed remote builds in B250781: Diff 547777. Aug 7 2023, 9:09 AM

Why is this checker placed in the "unix" group (instead of e.g. "core")? As far as I understand it can check some POSIX-specific functions with a flag (that's off by default) but the core functionality checks platform-independent standard library functions.

Moreover, if you can, please provide links to some test result examples on open source projects (either run some fresh tests or provide links to an earlier test run that was running with the same functionality).

This checker was originally in the "unix" package. I agree that this is not exact and core can be better, the checked functions should be available in any default C library on UNIX, OSX, Windows or other platforms too, even the POSIX ones at least in some cases. This applies even to the other related checkers alpha.unix.Stream and alpha.unix.Errno.

I have checked the results on some projects (memcached,tmux,curl,twin,vim,openssl,sqlite,ffmpeg,postgres,xerces,bitcoin).

These results are more interesting, some look correct, some probably not:
https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=curl_curl-7_66_0_stdclibraryfunctions_alpha&is-unique=on&diff-type=New&checker-name=unix.StdCLibraryFunctions&report-id=2243964&report-hash=d4a4bda38c5a6fdaabe2c1867158b106&report-filepath=%2atftpd.c
https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=ffmpeg_n4.3.1_stdclibraryfunctions_alpha&is-unique=on&diff-type=New&checker-name=unix.StdCLibraryFunctions&report-hash=908f965d980d60292af95db0fa10cd5f&report-id=2252082&report-filepath=%2av4l2_buffers.c
https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=postgres_REL_13_0_stdclibraryfunctions_alpha&is-unique=on&diff-type=New&checker-name=unix.StdCLibraryFunctions&report-hash=914e79646cb0de40dab434ba24c8c23c&report-id=2259781&report-filepath=%2adsm_impl.c
https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=postgres_REL_13_0_stdclibraryfunctions_alpha&is-unique=on&diff-type=New&checker-name=unix.StdCLibraryFunctions&report-hash=58d8278be40f99597b44323d2574c053&report-id=2259789&report-filepath=%2asyslogger.c
https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=postgres_REL_13_0_stdclibraryfunctions_alpha&is-unique=on&diff-type=New&checker-name=unix.StdCLibraryFunctions&report-hash=1928ba718d9742340937d425ec3978c6&report-id=2260011&report-filepath=%2apg_backup_custom.c
https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=bitcoin_v0.20.1_stdclibraryfunctions_alpha&is-unique=on&diff-type=New&checker-name=unix.StdCLibraryFunctions&report-hash=6ad3a20f18f2850293b4cdd867e404e2&report-id=2266103&report-filepath=%2aenv_posix.cc

This is more questionable:
https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=twin_v0.8.1_stdclibraryfunctions_alpha&is-unique=on&diff-type=New&checker-name=unix.StdCLibraryFunctions&report-hash=50a98122502701302b7b75a6a56342e8&report-id=2244071&report-filepath=%2ashm.c

Correct but interesting, the note about failure of ftell is shown:
https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=xerces_v3.2.3_stdclibraryfunctions_alpha&is-unique=on&diff-type=New&checker-name=unix.StdCLibraryFunctions&report-hash=4ab640064066880ac7031727869c92f4&report-id=2260149&report-filepath=%2aThreadTest.cpp

I did not find results that are obvious false positives.
Many results are the case when fileno returns -1 and this value is used without check. The checker generates a note about failure of fileno. For example at these results:
https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=postgres_REL_13_0_stdclibraryfunctions_alpha&is-unique=on&diff-type=New&checker-name=unix.StdCLibraryFunctions
(There are cases when fileno(stderr) is assumed to fail. This case can be eliminated if the StreamChecker is enabled, after an improvement of the checker. But for this StreamChecker must run before StdCLibraryFunctionsChecker?)

Thanks for the sample reports.
I'm fine if we want to make it a non-alpha (released) checker.

An orthogonal question is whether we want to have it under the core package.
I'm not sure there is official guidance for elevating a checker to core, but here are my assumptions.
To me, these are the questions to see how valuable our reports are for the user.

How many issues does it raise? Would we flood the user?
How "interesting" are those issues? Do they have *actual* value for the user? (Not only niche edge cases that are fancy to know about, but mistakes that actual users would genuinely commit.)
How long those bug-paths are in practice? I'd argue, the longer they are, usually the less actionable they are for the user. Less actionable reports are also less valuable, or even harmful.
In general, how understandable these reports are? Do we have all the interesting "notes" or "events" on the path?

Please reflect on these questions and argue why we should have diagnostics of this kind.
Note that I'm (probably) fine with enabling the modeling parts by default, but having diagnostics by default is another thing.

Additionally, it might make sense to first "release" the checker, and after another LLVM release, turn this into a core checker.

It doesn't have to be in core just because it's dealing with "core features" of the language.

The core package is for checks without which path-sensitive analysis becomes so incredibly incorrect that we don't want to support such configuration, we don't want our users to ever use it.

(Ideally core checkers shouldn't emit any warnings at all. It has to be possible to disable every individual warning and still have other warnings work correctly. But unfortunately that's not the reality of the situation.)

(I agree unix doesn't sound right though. But we already have MallocChecker in unix, which is arguably way worse. What we really need is to replace our package system with a hashtag system.)

About the questions:

How many issues does it raise? Would we flood the user?

I did not experience that the checker produces many warnings. Any warning from this checker is connected to a function call of a standard API, and the number of such calls is usually not high. Typically, one problem which the checker reports can occur often in a specific program, for example the fileno case (fileno returns -1 at failure, often this failure is not handled and the value -1 is used as a file number).
This should not be a case of hundreds of warnings.

How "interesting" are those issues? Do they have *actual* value for the user? (Not only niche edge cases that are fancy to know about, but mistakes that actual users would genuinely commit.)

If the coder cares about all edge cases of API calls, these are real and important issues. More often, most of the results are just cases of ignored errors that are very rare; the programmer probably intentionally did not handle these because it is not worth it for such a rare situation. From a security point of view, these cases can be used to find places where it is possible to make an API call (which normally "never" fails) fail intentionally and produce unexpected behavior of the program. So for an average application many results are not very important, while for stability- and security-critical code the results can be more important.

How long are those bug paths in practice? I'd argue that the longer they are, the less actionable they usually are for the user. Less actionable reports are also less valuable, or even harmful.

The bug path can be long; often only the very last part is important, but not always.

In general, how understandable are these reports? Do we have all the interesting "notes" or "events" on the path?

These should be no more difficult to understand than a division by zero, only with a function call instead of a division.
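
The comparison with division by zero can be sketched as follows. Both functions guard an operation that is undefined for some operand values; the names `safe_div` and `safe_isalnum` are illustrative only and do not come from the checker or the patch.

```c
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* Division is undefined for a zero divisor (and for INT_MIN / -1),
 * so those operands are rejected before the operation is performed.
 * Returns 0 and stores the quotient on success, -1 otherwise. */
int safe_div(int num, int den, int *out) {
    if (den == 0 || (num == INT_MIN && den == -1))
        return -1;
    *out = num / den;
    return 0;
}

/* isalnum() is undefined unless its argument is representable as
 * unsigned char or equals EOF, so other values are rejected first.
 * Returns 1 or 0 for a valid argument, -1 for an invalid one. */
int safe_isalnum(int ch) {
    if (ch != EOF && (ch < 0 || ch > UCHAR_MAX))
        return -1;
    return isalnum(ch) != 0;
}
```

In both cases the report, if any, would point at a single operation whose operand falls outside the allowed set; the only difference is that the second operation is a library call.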

I don't use this checker, so I cannot speak from experience.
However, the reasons look solid and I'm fine with moving this checker to unix.StdCLibraryFunctions.
Leave some time for other reviewers to object before landing this. Let's say, one week from now.
@xazax.hun @NoQ?

As a next step, I'd be glad to set the default value of ModelPOSIX to true. I don't see much harm in doing so.
(And maybe get rid of that checker option entirely.)

clang/docs/analyzer/checkers.rst
979–982

This revision is now accepted and ready to land.Aug 11 2023, 9:13 AM

Revision Contents

Path	Size
clang/docs/ReleaseNotes.rst	3 lines
clang/docs/analyzer/checkers.rst	188 lines
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td	43 lines
clang/test/Analysis/PR49642.c	2 lines
clang/test/Analysis/analyzer-config.c	4 lines
clang/test/Analysis/analyzer-enabled-checkers.c	1 line
clang/test/Analysis/conversion.c	4 lines
clang/test/Analysis/errno-stdlibraryfunctions-notes.c	4 lines
clang/test/Analysis/errno-stdlibraryfunctions.c	4 lines
clang/test/Analysis/std-c-library-functions-POSIX-lookup.c	6 lines
clang/test/Analysis/std-c-library-functions-POSIX-socket-sockaddr.cpp	6 lines
clang/test/Analysis/std-c-library-functions-POSIX.c	12 lines
clang/test/Analysis/std-c-library-functions-arg-constraints-note-tags.cpp	4 lines
clang/test/Analysis/std-c-library-functions-arg-constraints-notes.cpp	4 lines
clang/test/Analysis/std-c-library-functions-arg-constraints-tracking-notes.c	2 lines
clang/test/Analysis/std-c-library-functions-arg-constraints.c	8 lines
clang/test/Analysis/std-c-library-functions-arg-constraints.cpp	2 lines
clang/test/Analysis/std-c-library-functions-arg-cstring-dependency.c	4 lines
clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c	10 lines
clang/test/Analysis/std-c-library-functions-arg-weakdeps.c	10 lines
clang/test/Analysis/std-c-library-functions-eof.c	10 lines
clang/test/Analysis/std-c-library-functions-inlined.c	10 lines
clang/test/Analysis/std-c-library-functions-lookup.c	4 lines
clang/test/Analysis/std-c-library-functions-lookup.cpp	4 lines
clang/test/Analysis/std-c-library-functions-path-notes.c	4 lines
clang/test/Analysis/std-c-library-functions-restrict.c	4 lines
clang/test/Analysis/std-c-library-functions-restrict.cpp	4 lines
clang/test/Analysis/std-c-library-functions-vs-stream-checker.c	8 lines
clang/test/Analysis/std-c-library-functions.c	12 lines
clang/test/Analysis/std-c-library-functions.cpp	2 lines
clang/test/Analysis/std-c-library-posix-crash.c	4 lines
clang/test/Analysis/stream-errno-note.c	4 lines
clang/test/Analysis/stream-errno.c	4 lines
clang/test/Analysis/stream-noopen.c	8 lines
clang/test/Analysis/stream-note.c	4 lines
clang/test/Analysis/stream-stdlibraryfunctionargs.c	10 lines
clang/test/Analysis/weak-dependencies.c	2 lines

Diff 547777

clang/docs/ReleaseNotes.rst

	------------			------------

	libclang			libclang
	--------			--------

	Static Analyzer			Static Analyzer
	---------------			---------------

				- Move checker ``alpha.unix.StdCLibraryFunctions`` out of the ``alpha`` package
				to ``unix.StdCLibraryFunctions``.

	.. _release-notes-sanitizers:			.. _release-notes-sanitizers:

	Sanitizers			Sanitizers
	----------			----------

	Python Binding Changes			Python Binding Changes
	----------------------			----------------------

clang/docs/analyzer/checkers.rst

"""""""""""""""""""""""""""""""""""

Check for mismatched deallocators.

.. literalinclude:: checkers/mismatched_deallocator_example.cpp
    :language: c

.. _unix-Vfork:

unix.Vfork (C)

balazske (author), inline comment, marked Done:

Is this applicable to C++ too?

steakhal, inline comment, marked Done:

Yes, it's applicable to C++ if they use these C APIs.
However, I would prefer not to extend it with C++. IMO that would only cause confusion.

""""""""""""""

Check for proper usage of ``vfork``.

.. code-block:: c

  int test(int x) {
    pid_t pid = vfork(); // warn
    if (pid != 0)

Show All 40 Lines

``strlen, strnlen, strcpy, strncpy, strcat, strncat, strcmp, strncmp, strcasecmp, strncasecmp, wcslen, wcsnlen``.

.. code-block:: c

  int test() {
    return strlen(0); // warn
  }

.. _unix-StdCLibraryFunctions:

unix.StdCLibraryFunctions (C)

"""""""""""""""""""""""""""""""""""

steakhal, inline comment, marked Not Done:

return strlen(0); // warn

}

.. _unix-StdCLibraryFunctions:

unix.StdCLibraryFunctions (C)

- """""""""""""""""""""""""""""""""""

+ """""""""""""""""""""""""""""

Check for calls of standard library functions that violate predefined argument
constraints. For example, it is stated in the C standard that for the ``int
isalnum(int ch)`` function the behavior is undefined if the value of ``ch`` is
not representable as unsigned char and is not equal to ``EOF``.

.. code-block:: c

  #define EOF -1
  void test_alnum_concrete(int v) {
    int ret = isalnum(256); // \
    // warning: Function argument outside of allowed range
    (void)ret;
  }

  void buffer_size_violation(FILE *file) {
    enum { BUFFER_SIZE = 1024 };
    wchar_t wbuf[BUFFER_SIZE];

    const size_t size = sizeof(*wbuf);   // 4
    const size_t nitems = sizeof(wbuf);  // 4096

    // Below we receive a warning because the 3rd parameter should be the
    // number of elements to read, not the size in bytes. This case is a known
    // vulnerability described by the ARR38-C SEI-CERT rule.
    fread(wbuf, size, nitems, file);
  }

You can think of this checker as defining restrictions (pre- and postconditions)
on standard library functions. Preconditions are checked, and when they are
violated, a warning is emitted. Post conditions are added to the analysis, e.g.
that the return value must be no greater than 255.

For example if an argument to a function must be in between 0 and 255, but the
value of the argument is unknown, the analyzer will conservatively assume that
it is in this interval. Similarly, if a function mustn't be called with a null
pointer and the null value of the argument can not be proven, the analyzer will
assume that it is non-null.

These are the possible checks on the values passed as function arguments:

- The argument has an allowed range (or multiple ranges) of values. The checker
  can detect if a passed value is outside of the allowed range and show the
  actual and allowed values.
- The argument has pointer type and is not allowed to be null pointer. Many
  (but not all) standard functions can produce undefined behavior if a null
  pointer is passed, these cases can be detected by the checker.
- The argument is a pointer to a memory block and the minimal size of this
  buffer is determined by another argument to the function, or by
  multiplication of two arguments (like at function ``fread``), or is a fixed
  value (for example ``asctime_r`` requires at least a buffer of size 26). The
  checker can detect if the buffer size is too small and in optimal case show
  the size of the buffer and the values of the corresponding arguments.

.. code-block:: c

  int test_alnum_symbolic(int x) {
    int ret = isalnum(x);
    // after the call, ret is assumed to be in the range [-1, 255]

    if (ret > 255)      // impossible (infeasible branch)
      if (x == 0)
        return ret / x; // division by zero is not reported
    return ret;
  }

Additionally to the argument and return value conditions, this checker also adds
state of the value ``errno`` if applicable to the analysis. Many system
functions set the ``errno`` value only if an error occurs (together with a
specific return value of the function), otherwise it becomes undefined. This
checker changes the analysis state to contain such information. This data is
used by other checkers, for example :ref:`alpha-unix-Errno`.

**Limitations**

The checker can not always provide notes about the values of the arguments.
Without this information it is hard to confirm if the constraint is indeed
violated. The argument values are shown if they are known constants or the value
is determined by previous (not too complicated) assumptions.

The checker can produce false positives in cases such as if the program has
invariants not known to the analyzer engine or the bug report path contains
calls to unknown functions. In these cases the analyzer fails to detect the real
range of the argument.

**Parameters**

The checker models functions (and emits diagnostics) from the C standard by
default. The ``ModelPOSIX`` option enables modeling (and emit diagnostics) of
additional functions that are defined in the POSIX standard. This option is
disabled by default.

.. _osx-checkers:

osx

^^^

macOS checkers.

.. _osx-API:

▲ Show 20 Lines • Show All 1,609 Lines • ▼ Show 20 Lines

* The taintedness property is not propagated through function calls which are

unknown (or too complex) to the analyzer, unless there is a specific

propagation rule built-in to the checker or given in the YAML configuration

file. This causes potential true positive findings to be lost.

alpha.unix

^^^^^^^^^^^

.. _alpha-unix-StdCLibraryFunctions:

alpha.unix.StdCLibraryFunctions (C)