This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Upstream BitwiseShiftChecker
ClosedPublic

Authored by donat.nagy on Jul 26 2023, 4:22 AM.

Details

Summary

This commit releases a checker that was developed to a stable level in the Ericsson-internal fork of Clang Static Analyzer.

Note that the functionality of this checker overlaps with core.UndefinedBinaryOperatorResult ("UBOR"), but there are several differences between them:
(1) UBOR is only triggered when the constant folding performed by the Clang Static Analyzer engine determines that the value of a binary
operator expression is undefined; this checker can report issues where the operands are not constants.
(2) UBOR has unrelated checks for handling other binary operators, this checker only examines bitwise shifts.
(3) This checker has a Pedantic flag and by default does not report expressions (e.g. -2 << 2) that're undefined by the standard but consistently supported in practice.
(4) UBOR exhibits buggy behavior in code that involves cast expressions, e.g.

void foo(unsigned short s) {
  if (s == 2) {
    (void) ((unsigned int) s) << 16;
  }
}

Later it would be good to eliminate this overlap (perhaps by deprecating and then eliminating the bitwise shift handling in UBOR), but in my opinion that belongs to separate commits.

Co-authored-by: Endre Fulop <endre.fulop@sigmatechnology.se>

Diff Detail

Event Timeline

donat.nagy created this revision.Jul 26 2023, 4:22 AM
Herald added a project: Restricted Project. · View Herald Transcript
donat.nagy requested review of this revision.Jul 26 2023, 4:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 26 2023, 4:22 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

I tested this commit on several open source projects (in the default mode Pedantic=false), the following table summarizes the results:

vim_v8.2.1920_baseline_vs_vim_v8.2.1920_with_bitwise_shiftNew reportsLost reportsno differences
openssl_openssl-3.0.0-alpha7_baseline_vs_openssl_openssl-3.0.0-alpha7_with_bitwise_shiftNew reportsLost reports+2 FPs from same invalid loop assumption [1]
sqlite_version-3.33.0_baseline_vs_sqlite_version-3.33.0_with_bitwise_shiftNew reportsLost reportsone FP lost for unclear reasons
ffmpeg_n4.3.1_baseline_vs_ffmpeg_n4.3.1_with_bitwise_shiftNew reportsLost reports21 new reports, 14 lost reports, mostly FPs [2]
postgres_REL_13_0_baseline_vs_postgres_REL_13_0_with_bitwise_shiftNew reportsLost reportstwo FPs are "converted" to new checker, one unreadable report lost
libwebm_libwebm-1.0.0.27_baseline_vs_libwebm_libwebm-1.0.0.27_with_bitwise_shiftNew reportsLost reportsno differences
xerces_v3.2.3_baseline_vs_xerces_v3.2.3_with_bitwise_shiftNew reportsLost reportsno differences
bitcoin_v0.20.1_baseline_vs_bitcoin_v0.20.1_with_bitwise_shiftNew reportsLost reports6 new FPs caused by incorrect config of our CI [3], one FP lost
protobuf_v3.13.0_baseline_vs_protobuf_v3.13.0_with_bitwise_shiftNew reportsLost reportsthree reports "converted" to new checker
qtbase_v6.2.0_baseline_vs_qtbase_v6.2.0_with_bitwise_shiftNew reportsLost reports3 new reports, 3 lost reports [4]
contour_v0.2.0.173_baseline_vs_contour_v0.2.0.173_with_bitwise_shiftNew reportsLost reportsno differences
acid_2022-08-02-codechecker-test_baseline_vs_acid_2022-08-02-codechecker-test_with_bitwise_shiftNew reportsLost reportsno differences

More detailed summary:
[1] When the engine encounters a loop, it arbitrarily assumes that (there is a valid path where) the loop body will be executed 0 times. These unfounded assumptions produce false positives in many stable checkers.
[2] It seems that most of these warnings report the same underlying issues in different ways, but I didn't do a through analysis yet.
[3] The build/analysis procedure of bitcoin was misconfigured in our CI and the VERIFY_CHECK() assertions were preprocessed incorrectly.
[4] Logical relationships between these reports:

  • two new false positives (1 and 2) are caused by the same underlying engine misunderstanding which had caused a different false positve in the baseline;
  • one old report is eliminated because core.BitwiseShift is non-pedantic by default;
  • one old false positive is lost for unclear reasons;
  • one new false positive appeared, it's caused by an incorrect assumption that an 'if' and a 'while' are logically independent (in fact, the loop body will never run in cases where the 'if' condition is false).
clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
46–47

In this test file these warnings were irrelevant, so I picked a different arbitrary constant to eliminate them.

(Fix incorrect filename mark in the license header.)

NoQ added a comment.Jul 26 2023, 5:18 PM

(1) UBOR is only triggered when the constant folding performed by the Clang Static Analyzer engine determines that the value of a binary operator expression is undefined

Yes I wholeheartedly agree, these checks are pre-condition checks, they should have never been implemented in checkPostStmt. Instead they should stop the engine from evaluating the statement entirely if they think undefined behavior would occur, just like the division by zero checker. Just for that reason I'm happy to remove the existing bitshift check in favor of this check, as well as remove the ExprEngine/SValBuilder code that produces UndefinedVals in these cases. So, architecturally, this is the step in the right direction and I love it!

(4) UBOR exhibits buggy behavior in code that involves cast expressions

Hmm, what makes your check more resilient to our overall type-incorrectness?

clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp
94–98

Because addTransition() is a very confusing API that's very easy to misuse in subtle ways (causing unexpected state splits), I feel anxious when I see functions like

bool isValidShift();
bool isOvershift();
bool isOperandNegative(OperandSide Side);
bool isLeftShiftOverflow();

that look like boolean getters but in fact call addTransition() under the hood. If we could at least call them checkOvershift() etc., that'd be much better. Ideally I wish there was some safeguard against producing redundant transitions, like make them return an ExplodedNode and chain them, or something like that.

301–302

Can we just append this to the warning? The addNote() is useful for notes that need to be placed in code outside of the execution path, but if it's always next to the warning, it probably doesn't make sense to display it separately.

352–353

Props!! Even if they don't do anything, ultimately it's their responsibility.

(4) UBOR exhibits buggy behavior in code that involves cast expressions

Hmm, what makes your check more resilient to our overall type-incorrectness?

The code of UBOR (UndefResultChecker.cpp) uses the method LHS->countl_zero() (count zeros on the left side of the binary representation) which is very convenient but uses the "cached" bit width that is stored in the APSInt and apparently not updated by the casts. The checker proposed in this commit avoids this pitfall because it calls getActiveBits() on the APSInt and subtracts the result from the bit width of the type of the LHS (which is queried from the AST node).

clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp
94–98

These functions were originally called checkXXX() but I felt that the meaning of their boolean return value was too confusing with that naming scheme.

I'll try to ensure that emitReport or addTransition are only called on the topmost layer (the run() method) and change the return types of internal subroutines to make them return nullable pointers (e.g. std::unique_ptr<PathSensitiveBugReport>) instead of bools. This would (1) eliminate the "what does true mean at this point?" issues (2) clarify that this checker never performs more than one transition. (If the shift is invaild, we transition to a sink node by emitting a bug; if the shift is vaild, we perform a transition to record the state update.)

301–302

I didn't append this to the warning because I felt that the warning text is already too long. (By the way, an earlier internal variant of this checker produced two separate notes next to the warning message.)

I tweaked the messages of this checker before initiating this review, but I feel that there's still some room for improvements. (E.g. in this particular case perhaps we could omit some of the details that are not immediately useful and could be manually calculated from other parts of the message...)

I like this, especially, that the functionality of UBOR can be merged and extended in a much cleaner way (architecturally).

clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp
301–302

Just a thought on simplifying the diagnostics a bit:

Warning: "Right operand is negative in left shift"
Note: "The result of left shift is undefined because the right operand is negative"
Shortened: "Undefined left shift due to negative right operand"

Warning: "Left shift by '34' overflows the capacity of 'int'"
Note: "The result of left shift is undefined because the right operand '34' is not smaller than 32, the capacity of 'int'"
Shortened: "Undefined left shift: '34' exceeds 'int' capacity (32 bits)"

The more complex notes are a bit sketchy, as relevant information can be lost, and the following solution is probably a bit too much, but could prove to be an inspiration:

Warning: "Left shift of '1024' overflows the capacity of 'int'"
CXX Note: "Left shift of '1024' is undefined because 'int' can hold only 32 bits (including the sign bit), so some bits overflow"
CXX Note: "The value '1024' is represented by 11 bits, allowing at most 21 bits for bitshift"
C Note: "Left shift of '1024' is undefined because 'int' can hold only 31 bits (excluding the sign bit), so some bits overflow"
C Note: "The value '1024' is represented by 11 bits, allowing at most 20 bits for bitshift"

Shortened:
CXX Warning: "Undefined left shift: '1024' (11 bits) exceeds 'int' capacity (32 bits, including sign)"
C Warning: "Undefined left shift: '1024' (11 bits) exceeds 'int' capacity (31 bits, excluding sign)"

donat.nagy added inline comments.Aug 1 2023, 1:32 AM
clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp
301–302

Clarification about the Msg/ShortMsg distinction:
I'm intentionally separating the short warning messages and the longer note messages because createBugReport() enforces the convention that it will always emit a warning and a note at the bug location.

According to the comments in the source code, the intention is that the note contains all the relevant information, while the warning is a brief summary that can be displayed in situations where the notes wouldn't fit the UI.

IIUC many checkers ignore this distinction and emit the same (often long and cumbersome) message both as a note and as a warning (createBugReport() has a variant which takes only one message parameter and passes it to both locations), but I tried to follow it because I think it's ugly when the same message is repeated twice and there is some sense in providing both a brief summary and a full description that doesn't use potentially-ambiguous abbreviations to save space.

Of course I could also accept a community decision that this "brief warning / full note" distinction is deprecated and will be eliminated (because e.g. front-end utilities are not prepared to handle it), but in that case I'd strongly suggest a redesign where we eliminate the redundantly printed 'note' message. (There is no reason to say the same thing twice! There is no reason to say the same thing twice!)

On the other hand, in addition to this Msg/ShortMsg distinction, this part of the code also adds the extra LeftNote (as a remnant from an earlier internal version of this checker), and that's that's what I'd like to merge into Msg (following NoQ's suggestion and keeping it distinct from the ShortMsg).

steakhal added inline comments.Aug 3 2023, 5:52 AM
clang/docs/analyzer/checkers.rst
35
44–50

I believe shifting out the "sign bit" is UB because the representation of the signed integer type was not defined.
Prior C++20 (?), it was allowed to represent signed integer types as two's-complement (virtually every platform uses this), one's complement, sign-magnitude. Wiki.
And it turns out, all these three representations behave differently with negative values.
Check out JF's talk from 2018 about this.

I'd need to think about the relevance of C++20 in this context, but here is what I think of now:
My idea is that the observed behavior is not influenced by the "standard", rather by the time we released C++20, they realized that no actual platform uses weird signed integer representations.
Given this observation, I'd argue that we shouldn't have this flag at all.

81

We should exactly elaborate on what "proper" means here.

clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp
142

One does not frequently see the long long type.
Do you have a specific reason why int or unsigned wouldn't be good?

150

How about swapping these branches on the ResultVal.isUndef() predicate?

152–156

Generally, I don't favor mutating member functions.
It makes it harder to track how we ended up with a given State.

157–158

I wonder if we should add a message here that we introduced a new constraint: "X assumed to be non-negative".
To add such a message, you should do something similar to what is done in the StdLibraryFunctionsChecker.
Take the C.getPredecessor() and chain all your NoteTags by calling the Pred = C.addTransition(NewState, Pred, Tag) like this. This way. This is how one "sequence of ExplodedNodes" can be made.

clang/test/Analysis/analyzer-config.c
36

I'm pretty sure I got rid of this one :D
Consider rebasing on top of llvm/main.

donat.nagy marked an inline comment as done.Aug 3 2023, 6:05 AM

@steakhal Thanks for the review!

I answered some of your questions; and I'll handle the rest soon.

clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp
150

The argument of ProgramState::assume() must be a DefinedOrUnknownSVal, so I need to cast the return value of evalBinOp (which may be Undefined) to this type.

157–158

This is already done by BitwiseShiftValidator::createNoteTag(). It's a bit complex because the checker merges all its assumptions into a single note tag (because I didn't want to emit two or three notes directly after each other.

clang/test/Analysis/analyzer-config.c
36

Oops, this might be an error introduced during a rebase... Thanks for catching it!

donat.nagy planned changes to this revision.Aug 9 2023, 4:39 AM
donat.nagy marked 5 inline comments as done.

I'll soon upload a refactored version.

clang/docs/analyzer/checkers.rst
44–50

Thanks for the background information! I knew the rough outlines of the situation, but your links provided interesting details.

I agree that the non-pedantic mode is the "sane mode" that reflects the real behavior of the hardware; but the standards (with the exception of C++20) explicitly declare that signed shift overflow and shifting negative values is UB and there may be secondary requirements like SEI-CERT rule INT32-C that demand compliance to the standards.

I don't think that I'd spend time on implementing pedantic mode because there are more important things to do, but as we already have this feature, I think we should release it as an off-by-default option.

81

What would you suggest? I could try to write a slightly longer suggestion, but I don't want to repeat the same conditions that I listed above the code example.

clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp
142

I inherited this type choice from old code and didn't think much about it. My guess is that LimitVal needs to be a signed integer (because e.g. -1 < 0u would be evaluated to false) and it's a long long because that's guaranteed to represent all unsigned values. (However, the actual values of Limit are very small.)

I'll probably use int for Limit and LimitVal with some comments to clarify the reason for this type choice.

152–156

I also agree that mutating member functions are often problematic and may make the code more difficult to understand.

However, in this particular case I think it's important to ensure that we're always using the most recent state, and that's difficult to guarantee in the "pass everything as arguments and return values" style. Moreover, I do some parallel bookkeeping because I want to summarize the state changes in a single note tag, and that would make the "naive functional" solution even more convoluted.

In a functional language like Haskell I'd use computations in a State monad to implement this logic; my C++ code is a direct translation of that solution. (As a secondary utility, my class also provides read-only access to some context like a Reader monad, because functions that take 5-8 arguments are also very difficult to understand. This way, the relevant arguments are highlighted as actual arguments, while the shared context is described by the const data members.)

301–302

Among notes, my only planned change is that instead of

Warning: "Left shift of '1024' overflows the capacity of 'int'"
CXX Note: "Left shift of '1024' is undefined because 'int' can hold only 32 bits (including the sign bit), so some bits overflow"
CXX Note: "The value '1024' is represented by 11 bits, allowing at most 21 bits for bitshift"
C Note: "Left shift of '1024' is undefined because 'int' can hold only 31 bits (excluding the sign bit), so some bits overflow"
C Note: "The value '1024' is represented by 11 bits, allowing at most 20 bits for bitshift"

I'll implement something like

Warning: "Left shift of '1024' overflows the capacity of 'int'"
CXX Note: "Left shift of '1024' (represented by 11 bits) is undefined because 'int' can hold only 32 bits (including the sign bit), so some bits overflow"
C Note: "Left shift of '1024' (represented by 11 bits) is undefined because 'int' can hold only 31 bits (excluding the sign bit), so some bits overflow"

donat.nagy marked 2 inline comments as done.Aug 9 2023, 6:28 AM
donat.nagy added inline comments.
clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp
301–302

Correction: now I'm leaning towards just discarding the secondary note, because the message examples listed in the previous comment are just the variants where the right operand is unknown. In the more detailed message template "The shift '{0} << {1}' is undefined {2}, so {3} bit{4} overflow{5}" there is no place to insert the "represented by {} bits" message.

donat.nagy updated this revision to Diff 548626.Aug 9 2023, 8:03 AM
donat.nagy marked 6 inline comments as done.Aug 9 2023, 8:08 AM
donat.nagy added inline comments.
clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp
94–98

I eliminated isValidShift() by merging it with run() (which was very short) and renamed the rest of the functions to checkXXX.

Now the exploded graph is only modified in the top-level method run().

142

I changed the type of the Limit argument to unsigned (because that's what this function is called with) and changed the type of LimitVal to IntTy with a comment that describes that it must be signed.

NoQ added inline comments.Aug 9 2023, 4:51 PM
clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp
301–302

There's nothing wrong with long, multi-sentence diagnostic messages!

Unlike the compiler proper, we aren't typically used from the command line, so we aren't trying to fit into 80 columns. So we start our warnings with a capital letter and expect them to be, at the very least, complete sentences.

I haven't gone deep into the implementation, but by only looking at the quality of the code and the structure I assume it implements the right behavior.
I've checked the tests though, and they look good.
I only found irrelevant nits. Awesome work.


Have you considered checking the shift operands for taint?
I wonder if we could warn if we cannot prove the correct use of the shift operator when either of the operands is tainted.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
135

Check for subsequent duplicated words. They are usually typos.

clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp
10–12

Please elaborate what counts as "incorrect use".

30–31

This is used quite often and hinders the readability by indenting the args a lot. Making the spelling shorter, would somewhat mitigate this.

36
43
54

Where does this comment corresponds to?

58–60

How about using the same enum type for these two? It would signify that these are used together.
Also, by only looking at the possible values, it's not apparent that these are bitflag values. A suffix might help the reader.

61–64

Have you considered using std::optional?
Given that the dimension of this variable is "bits", I think it would be great to have that in its name.

82

Check if your member functions can be declared as const.

107–110

I'd prefer BR or Report. Since we know already that we are in the path-sensitive domain, P has less value there.
I'd apply this concept everywhere in the patch.

125–128

To me it feels as if this comment should be at the definition of checkLeftShiftOverflow, and only keep here a short reminder. Similarly to how you did with the previous cases.

149
191–192
219

The isLeftShift() ? "left" : "right" expression appears 4 times in total.
I'd also recommend to not distinguish the capitalized "left" and "right" strings, and just post-process the messages to ensure that the first character is capitalized inside the createBugReport().

Also note that you can reuse the same placeholder, so you would only need to pass the side() only once.

static void capitalizeBeginning(std::string &Str) {
  if (Str.empty())
    return;
  Str[0] = llvm::toUpper(Str[0]);
}

StringRef side() const {
  return isLeftShift() ? "left" : "right";
}
241

This will ensure that the definition of LeftAvailableBitWidth does not wrap.
And, more importantly, assert(LeftBitWidth >= UsedBitsInLeftOperand) also holds, asserted later.

253–255

Okay, so this is an invariant.
How about rephrasing this to better express that:

We should have already reported if the left operand of the shift was negative, thus this cannot be negative here.
264

I'd move this closer to the first "use" place.

316
clang/test/Analysis/bitwise-shift-common.c
109

Hmm, does this "but" actually mean "and" here?
Anyway, both using "but" and "and" there looks awkward.
It might be only me though.

130

It's an unrelated comment, but always bothers me to see 'right' mentioned here, where there is nothing called like that...

149
clang/test/Analysis/bitwise-shift-sanity-checks.c
21–23

How about simplifying these lines to -Wno-shift? Have you considered it?

58–59

Let's test if the shift RHS operand is bigger than 64.

clang/test/Analysis/bitwise-shift-state-update.c
31–36

Have you considered using clang_analyzer_value(left) for inspecting the assumed range of the operands?
IMO that leads to more readable tests.

donat.nagy marked 4 inline comments as done.Aug 10 2023, 4:57 AM

Thanks for the suggestions, I answered those where I had something to say and I'll implement the rest of them.

Have you considered checking the shift operands for taint?
I wonder if we could warn if we cannot prove the correct use of the shift operator when either of the operands is tainted.

It would be straightforward to add taint handling; but I'd prefer to leave it for a follow-up commit. Some experimentation would be needed to see whether it produces useful results (do real-world projects use shifts on tainted numbers? do we encounter false positives?).

clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp
54

The two data members (Ctx and State) that are directly below the comment.

I tried to split the list of data members into three groups (primary mutable state, secondary mutable state, const members), but now I reconsider this I feel that these header comments are actually superfluous.

I'll return to a variant of the earlier layout where I put NonNegFlags/UpperBound to the end and comment that they are only used for note tag creation; but don't emphasize the (immediately visible) constness / non-const-ness of other members with comments.

58–60

Using the enum type is a good idea, however I'd probably put the Flag suffix into the name of the type:

enum SignednessFlags : unsigned { NonNegLeft = 1, NonNegRight = 2 };
SignednessFlags NonNegOperands = 0;
61–64

I considered using std::optional, but using this "old-school" solution ensures that NoUpperBound is comparable to and larger than the "real" upper bound values, which lets me avoid some ugly boolean logic. I'll update the comment and probably also the variable name.

107–110

The P stands for the "Ptr" in the name of the type alias BugReportPtr; but your comment clearly highlights that this variable name is confusing ;).

I'll probably avoid the auto and use BugReportPtr Bug = ... to remind the reader that this is a pointer (that can be null) and if it's non-null, then we found a bug.

191–192

This is an important distinction, I use "not smaller than {2}" as a shorter synonym of "larger than or equal to {2}". I can choose some other wording if you feel that this is not clear enough.

264

Good point, previously it was used by the calculation that is now performed by assumeRequirement.

301–302

Even if the message length is not limited, I think it's better to delete the additional note line (instead of merging it with the "regular" note line) because it's just too much information and the gist of the issue is lost behind the flood of redundant numbers.

However, this is just a subjective decision and I'm open to extending the message if you feel that something important is missing from it.

316

I'd say that the current wording is correct because the the checker "use"s the comparison operators (in its implementation) instead of "accept"ing them from outside; but I can replace the message with e.g. "this function does not accept other comparison operators" if you'd prefer that.

clang/test/Analysis/bitwise-shift-common.c
109

Yes, this sentence fragment is like "not too small but not too large", where "but" is equivalent to "and" and both are a bit awkward.

clang/test/Analysis/bitwise-shift-sanity-checks.c
21–23

Unfortunately there is no such shortcut: when I tried to execute trunk clang with it, I got "warning: unknown warning option '-Wno-shift'".

donat.nagy marked 21 inline comments as done.Aug 10 2023, 10:15 AM
donat.nagy added inline comments.
clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp
30–31

Good idea!

58–60

I rename the member NonNegFlags to NonNegOperands but its type remains a plain unsigned because otherwise using the or-assign operator like NonNegOperands |= NonNegLeft produces the error

Assigning to 'enum SignednessFlags' from incompatible type 'unsigned int' (clang typecheck_convert_incompatible)

(note that both operands were from the same enum type).

219

I introduced a method for isLeftShift() ? "left" : "right" (which is indeed repeated several times), but I didn't define capitalizeBeginning because that would've been used only once.

Your remark that [with the capitalization function] "you can reuse the same placeholder, so you would only need to pass the side() only once" seems to be incorrect, because Side == OperandSide::Left ? "Left" : "Right" (which operand are we talking about?) and isLeftShift() ? "left" : "right" (what kind of shift operator are we talking about?) are two independent values.

donat.nagy marked 2 inline comments as done.

Uploading a new version based on the reviews.

donat.nagy marked 3 inline comments as done.Aug 10 2023, 10:17 AM

Great progress.
Now, moving to the docs.
Have you checked that the docs compile without errors and look as you would expect? Please post how it looks.

Please make a note about this new checker in the clang's release docs, as we usually announce new checkers and major changes there.

clang/docs/analyzer/checkers.rst
58–64

else after return

clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp
54

One way to emphasize this is by choosing names like FoldedState, or anything else that has something to do with "motion" or "change".
Given that if we have ProgramStateRefs in helper classes they usually remain still. Think of BugReportVisitors for example. Consequently,it's important to raise awareness when it's not like that. A different name would achieve this.

58–60

Okay, indeed one would need to define the operator|(SignednessStatus, SignednessStatus) and operato&|(SignednessStatus, SignednessStatus) as well.
It probably doesn't worth it. Such a pity that we can't have nice things in C++.

61–64

What "ugly boolean logic" are you referring to?
I thought one can always use value_or(999) anyway. To me, that means that the option approach is a refinement of yours. But depending on the situation, it might be possible to avoid the extrema value 999 altogether.

107–110

Oh I see. To me, it's fairly obvious now that bugreports are just smart-pointers.
So I implicitly looked for some other reason. But just look at other checker, and call it the same way how they do, for consistency. That's all I wanted.
There are two/three common options I believe: B or BR, and maybe Report.

191–192

No, I have not strong feelings. Feel free to use the one you like more.

219

Makes sense. It was so tempting though.

273

getExtValue(), I believe, asserts that it "fits", thus the concrete int is at most 64 bits wide. Does it work for _BigInts? (or whatever we have like that :D)

300–304

When I see something like this I always think of using a ternary.
Have you also considered?

316

No strong feelings :)

336–340

AHA! Now I got you. You also call this R.
Just use R at the callsite as well. Job done.

356–358

How about eagerly inline initializing this field? And avoiding mutable as well.

368–370
donat.nagy marked 13 inline comments as done.Aug 14 2023, 8:55 AM

I handled the inline comments, I'll check the example code and edit the release notes tomorrow.

clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp
54

My intuition is that the variable name "State" already implies that it will be regularly changed and updated as we are going forward; but you're right that in this project we've a tradition of storing snapshots of "old" states in various helper classes.

(By the way, I'm a bit worried when I read code that stores and passes around random stale snapshots of the state. Of course we cannot avoid the branching nature of the exploded graph, but other than that it'd be good to store the State in language constructs where you cannot accidentally lose information by building onto an older state variable.)

61–64

I'm prejudiced against optional in C++ because I feel that it's clunky and doesn't fit the rest of the language. (Note that I like Maybe in Haskell and don't have strong feelings about "using a variable that may be None" in Python; so this is specific to C++.)

However, in this particular case I decided to switch to std::optional<unsigned>, because it lets me write !UpperBoundBitCount instead of the longer (but perhaps more readable) UpperBoundBitCount != NoUpperBound.

The "ugly boolean logic" that I spoke about is (!UpperBoundBitCount || Limit < UpperBoundBitCount.value()); however I think this is still better than Limit < UpperBoundBitCount.value_or(999) which falls to the ground between two chairs as it introduces both the type optional and the arbitrary placeholder number.

107–110

Ok, I switched to the variable name "BR". I didn't have a strong preference for using "Bug", I just thought that your request is "the current name is bad, use anything else" and not "please use one of the traditional names".

219

Yes, trivial string formatting can introduce a surprising amount of complications...

273

This final check is executed when we already know that 0 <= ConcreteRight < 128 (or the bit width of the largest integer type). I added a comment that mentions this and replaced the type std::int64_t with a plain unsigned.

I also updated the testcase large_right to ensure that later changes don't introduce crashes related to absurdly oversized right arguments.

336–340

This is a code fragment that survived my refactoring efforts without significant changes... until now :)

Now that you highlighted it, I'm renaming R to BR just to be a contrarian :)

356–358

This pattern systematically appears in all checkers that I've seen because the constructor of BugType queries and stores the name of the checker, which is only initialized when the checker (*this) is registered. This dependency means that the BugType cannot be initialized in the constructor of the Checker.

donat.nagy marked 7 inline comments as done.
steakhal accepted this revision.Aug 15 2023, 1:58 AM

The test coverage is also impressive. Thanks!

This revision is now accepted and ready to land.Aug 15 2023, 1:58 AM
donat.nagy edited the summary of this revision. (Show Details)

I verified that the checker handles the examples in the documentation correctly (and added them to the test suite). However, as I was tweaking the examples in the documentation, I accidentally found a situation where the checker produces a very surprising and arguably incorrect error message.

After investigating this issue, I added the testcases signed_aritmetic_{good,bad} which document the current sub-optimal state. The root cause of this problem is a high-level property of the engine (that it assumes that signed overflows are always possible and acceptable) and I don't see a local workaround that would silence or fix these incorrect error messages.

@steakhal @NoQ What do you know about these signed overflow issues (I presume that analogous issues also appear in other checkers)? How should we handle this limitation of this checker?

(The release notes entry is still missing, I'll try to add it tomorrow.)

steakhal accepted this revision.Aug 15 2023, 11:15 AM

After investigating this issue, I added the testcases signed_aritmetic_{good,bad} which document the current sub-optimal state. The root cause of this problem is a high-level property of the engine (that it assumes that signed overflows are always possible and acceptable) and I don't see a local workaround that would silence or fix these incorrect error messages.

@steakhal @NoQ What do you know about these signed overflow issues (I presume that analogous issues also appear in other checkers)? How should we handle this limitation of this checker?

I don't think most checkers depend on value ranges explicitly, except for a handful of checkers maybe. So, I don't think it's such a huge deal, but I agree that it's a problem.
I would bet that the StdLibraryFunctionsChecker would suffer from the same issue, and the OOB checker(s) along with it.
I don't know of heuristics mitigating the damage.

I don't think we should do anything about it unless it's frequent enough.
Try to come up with a heuristic to be able to measure how often this happens, if you really care.
Once you have a semi-working heuristic for determining if that bugpath suffers from this, you can as well use it for marking the bugreport invalid if that's the case to lean towards suppressing those FPs.

I don't think it's completely necessary to fix those FPs to land this. I think of that as an improvement, on top of this one.
I hope I clarified my standpoint.

clang/test/Analysis/bitwise-shift-sanity-checks.c
71–75

Let's be explicit about the actual assumed value range, and use clang_analyzer_value().
I also recommend using an explicit FIXME for calling out what should be there, instead of abusing a non-matching expected pattern. I know I used that in the past, and probably seen it from me. I feel ashamed that I did that. I know I did that to have cleaner diffs, once fixed, but I honestly think it does more harm than good.

I don't think we should do anything about it unless it's frequent enough.
Try to come up with a heuristic to be able to measure how often this happens, if you really care.
Once you have a semi-working heuristic for determining if that bugpath suffers from this, you can as well use it for marking the bugreport invalid if that's the case to lean towards suppressing those FPs.

Minor clarification: these are not FPs, these are true positives with a bad error message. I was annoyed when I found this surprising bug on the almost-ready checker that I was working on; but I wouldn't say that this is an especially severe issue.

I don't think it's completely necessary to fix those FPs to land this. I think of that as an improvement, on top of this one.
I hope I clarified my standpoint.

I agree, I'll create a new revision which mentions this checker in the release notes, and I hope that I'll be able to merge that. Later I'll try to return to this question with either an engine improvement or a heuristic mitigation.

clang/test/Analysis/bitwise-shift-sanity-checks.c
71–75

I already tried calling clang_analyzer_value() at that point when I was investigating, but unfortunately it won't say anything useful because the actual range corresponding to the symbolic expression is only calculated when the evalBinOp() calls examine it.

I'll change the "expected - warning" to a FIXME.

Updated the release notes. This could be the final version of this commit if you agree that it's good enough.

donat.nagy marked 2 inline comments as done.Aug 16 2023, 5:03 AM
donat.nagy added inline comments.
clang/docs/analyzer/checkers.rst
81

(I'm marking this request as "Done" because I couldn't find a better wording. Feel free to reopen the discussion if you wish.)

donat.nagy updated this revision to Diff 550706.EditedAug 16 2023, 5:07 AM

A few minutes ago I accidentally re-uploaded the previous version of the patch (diffs #550318 and #550703 are identical); now I'm fixing this by uploading the actually updated variant.

This revision was landed with ongoing or failed builds.Aug 18 2023, 1:48 AM
This revision was automatically updated to reflect the committed changes.