EnumCastOutOfRangeChecker should not perform enum range checks on LValueToRValue casts, since this type of cast does not actually change the underlying type. Performing the unnecessary check actually triggered an assertion failure deeper in EnumCastOutOfRange for certain input (which is captured in the accompanying test code).
Details
- Reviewers
Szelethus gamesh411 NoQ - Group Reviewers
Restricted Project - Commits
- rG8d4ccfe3689e: Merging r369760: --------------------------------------------------------------…
rL371058: Merging r369760:
rG09ce8ec78a95: [analyzer] Avoid unnecessary enum range check on LValueToRValue casts
rL369760: [analyzer] Avoid unnecessary enum range check on LValueToRValue casts
rC369760: [analyzer] Avoid unnecessary enum range check on LValueToRValue casts
Diff Detail
Event Timeline
+ @gamesh411 as he took over this checker before I commited it on his behalf, +@NoQ because he is far more knowledgeable about this part of the analyzer.
clang/lib/StaticAnalyzer/Checkers/EnumCastOutOfRangeChecker.cpp | ||
---|---|---|
121–122 | So this is where the assertion comes from, and will eventually lead to SValBuilder::evalEQ, which calls SValBuilder::evalBinOp, where this will fire on line 427: assert(op == BO_Add); Seems like this happens because unused's value in your testcase will be retrieved as a Loc, while the values in the enum are (correctly) NonLoc, and SValBuilder::evalBinOp thinks this is some sort of pointer arithmetic (5 + ptr etc). How about instead of checking for LValueToRValue cast, we check whether ValueToCast is Loc, and bail out if so? That sounds like a more general solution, but I didn't sit atop of this for hours. |
clang/lib/StaticAnalyzer/Checkers/EnumCastOutOfRangeChecker.cpp | ||
---|---|---|
94–98 | If you look at the list of cast kinds, you'll be shocked to see ridiculously weird cornercases. Even though lvalue-to-rvalue cast definitely stands out (because it's the only one that has very little to do with actual casting), i'd still be much more comfortable if we *whitelist* the casts that we check, rather than blacklist them. Can you take a look at the list and find which casts are we actually talking about and hardcode them here? | |
clang/test/Analysis/enum-cast-out-of-range.c | ||
27–34 | I guess this covers D33672#1537287! That said, there's one thing about this test that i don't understand. Why does this AST contain an implicit lvalue-to-rvalue cast at all? This looks like a (most likely otherwise harmless) bug in the AST; if i understand correctly, this should be a freestanding DeclRefExpr. |
clang/lib/StaticAnalyzer/Checkers/EnumCastOutOfRangeChecker.cpp | ||
---|---|---|
94–98 | I'm happy to cross-check a list; however, I'm not aware of the list you are referring to. Would you mind providing a bit more detail? | |
121–122 | Is this the only case where ValueToCast will be Loc? I was unsure about the implications of Loc vs. NonLoc in this code, and I didn't want to risk breaking a check that should have been valid. That's why I opted for bailing on an LValueToRValue cast at a higher level-- that seemed much safer to me. I certainly might be missing something, but I couldn't find any evidence that an LValueToRValue cast should ever need to be checked for enum range errors. I will certainly defer to your judgement, but I'd like to have a better understanding of why looking for ValueToCast == Loc would actually be more correct. | |
clang/test/Analysis/enum-cast-out-of-range.c | ||
27–34 | It sure looks like D33672 is the same bug, so yes, I bet it will fix that one. This fix also addresses https://bugs.llvm.org/show_bug.cgi?id=41388. (Forgive me if there's a way to directly link to bugs.llvm.org with this markup.) Not sure about whether the AST should have represented this node as a DeclRefExpr-- that certainly makes some sense, but the LValueToRValue cast isn't really wrong either. At the end of the day, this statement is useless, so I'm not sure it matters (much) how the AST represents it. Representing it as LValueToRValue cast wouldn't cause side effects, would it? (E.g., cause IR to contain explicit dereferencing code?) |
clang/lib/StaticAnalyzer/Checkers/EnumCastOutOfRangeChecker.cpp | ||
---|---|---|
94–98 | ||
121–122 |
In this case it probably doesn't matter too much, but generally i always recommend against this sort of stuff: if possible, consult the AST. A Loc may represent either a glvalue or a pointer-typed prvalue, and there's no way to tell the two apart by only looking at that Loc. In particular, both unary operators * and & are modeled as no-op. I've been a lot of incorrect code that tried to dereference a Loc too many or too few times because the code had misjudged whether it has already obtained the prvalue it was looking for. But it's easy to discriminate between the two when you look at the AST. The Static Analyzer is a converter from ASTs to ExplodedGraphs. When in doubt, consult the AST. |
Oh, btw, thank you for working on this!
clang/lib/StaticAnalyzer/Checkers/EnumCastOutOfRangeChecker.cpp | ||
---|---|---|
121–122 | Notes taken, thanks! :) |
Glad to help!
clang/lib/StaticAnalyzer/Checkers/EnumCastOutOfRangeChecker.cpp | ||
---|---|---|
94–98 | Ah. Okay -- That is a list I've seen before, of course... :) Hmm... regarding whitelisting versus blacklisting... In this case, it seems to me that switching to whitelisting casts that we know we should be checking increases the risk that we would introduce a new bug -- specifically that we'd accidentally leave out cast type that should have been included, which would disable a legitimate check (that some users might already be relying on). By contrast, blacklisting the one cast type we know should *not* be checked adds zero new risk and still fixes this bug. The sense of risk for me comes partly from being fairly new to working on clang AST code -- this is my first change. It strikes me that if I were to feel confident about having the "correct" whitelist, I would have to understand every cast type fairly deeply. Given that I see several cast kinds in that list that I know nothing about, I'm concerned with the level of effort required, and that I still won't be able to fully mitigate the risk. Can you help me understand your rationale for preferring a whitelist in this case? Or, do you feel I've overstated the risk here? I'm not emotionally invested in being proven correct-- just want to learn! :) |
clang/lib/StaticAnalyzer/Checkers/EnumCastOutOfRangeChecker.cpp | ||
---|---|---|
94–98 | Generally, if you're not sure, try to not to warn. False positives are worse than false negatives. It is natural to start with a small set of cases in which you're sure, and then slowly cover more and more cases as you investigate further. Leave a comment if you're unsure if you covered all cases, so that people knew that the code is potentially incomplete and it's a good place to look for potential improvements. When you're actively developing the checker, you should rely on your own experiments with the real-world code that you plan to analyze. When the checker is still in alpha, it's fine for it to be more aggressive than necessary, because "you're still experimenting with it", but this will need to be addressed when the checker is moved out of alpha. In particular, you have the following tricks at your disposal:
|
clang/lib/StaticAnalyzer/Checkers/EnumCastOutOfRangeChecker.cpp | ||
---|---|---|
94–98 | Okay. Very fair points. I've been focused too much on released production code lately-- the mantra there is to be a surgical as possible, and minimize risk of any changes other than the one we are specifically trying to correct. Of course, that doesn't preclude wider-scope change, but it requires a great deal more time and effort to validate. In this case, where the code is alpha, and where missing a valid warning is better than issuing a spurious one, it makes sense to do as you've described. I'll work on that. Thanks for taking the time to explain in more detail. Logistical question now (since this is my first change and review for LLVM/clang code) -- does this review stall until I have updated code (which may not happen quickly due to other work on my plate)? Or do we greenlight this change, and I just follow it up later with the whitelist version in a separate review? |
Thanks!
clang/lib/StaticAnalyzer/Checkers/EnumCastOutOfRangeChecker.cpp | ||
---|---|---|
94–98 | This change is already is improvement, so feel free to add a // FIXME: Turn this into a whitelist. and commit :) Generally, patches don't need to be perfect, but they should be going in the right direction and shouldn't cause glaring regressions. A lot of review feedback is not in order to block the patch, but instead to help you make it even better and share common values and such. | |
clang/test/Analysis/enum-cast-out-of-range.c | ||
27–34 | Aha, i think i made some sense out of it. That's different in C and C++. In particular, if you have a freestanding *x where x is a null pointer, it's UB in C but not in C++ (though &*x is explicitly defined to be the same as x in C, so it's not a UB anymore if you proceed to turn it back into a pointer rvalue). I figure, Clang models these different rules by automatically wrapping any freestanding lvalue into an implicit lvalue-to-rvalue cast in C but not in C++. |
LGTM, thanks! Do you need someone to commit this on your behalf? Also, could you please make the comments capitalized, terminated, and fitting in 80 columns?
clang/test/Analysis/enum-cast-out-of-range.c | ||
---|---|---|
14–24 | Note that you don't need to add the entire warning message here, and since its all the same, we could shorten it. Could you please format these lines like this?: enum unscoped_unspecified_t InvalidAfterRangeEnd = (enum unscoped_unspecified_t)(5); // expected-warning@-1{{not in the valid range of values for the enum}} If by some miracle the entire warning message fits in 80 columns, go for it! :) |
I have updated the comments and line formatting as you recommended. Given that this change is already "Accepted", can I (should I) upload new differential for this change, or should this be delivered as-is, and I'll upload the new diffs as a new/separate change?
And yes, I will need someone to commit on my behalf. I'm too new to have commit privs. :)
Yup, you can upload it and the green checkmark will stay. If it doesn't, I'll accept again.
I think there was one case when I added like 800 extra LOCs to a patch and phabricator automatically marked it as "needs reviews", but I never came across this after that, even when I rewrote the entire thing.
Oh, there is no need for a new differential, you can update this one by clicking on 'update diff' in the right panel.
Follow-up on reviewer feedback. Changed from blacklisting LValueToRValue to whitelisting IntegralCast. This was a good call -- additional testing with different cast kinds showed that the assertion tripped for other casts besides LValueToRValue, e.g., FloatToIntegral. I couldn't see any casts other than Integral where the enum check seemed appropriate. Testing with only IntegralCast enabled gave expected (correct) results.
Also reformatted the new regtest file per reviewer comments.
You seem to have diffed against your latest local commit, rather than against trunk (or master, if you use the monorepo). Phabricator isn't smart enough to put two and two together, and only displays the uploaded diff (though one has to admit, its doing a damn good job at that at least!).
Could you please upload the version diffed against master (git diff master -U999999)?
Kristoff, if you wouldn't mind, since you offered earlier, please go ahead and commit this change as-is, since it was accepted. I ran into some non-technical issues with my follow-up changes and I'm going to be unavailable for several weeks. To mitigate risk and work for my team, I'd like to submit the newer changes separately (and will reference this review in that changeset when I do, of course), after I return to work.
And-- thank you for your help and feedback! I appreciated this upstreaming process (especially for what seemed like a fairly small/simple chnage). I expect there will be many more.
What I meant is that the diff has to be made against the master branch or I won't be able to apply it locally. Say your repo is structured like this:
* dcba2 (HEAD -> my_fix) * dcba1 * abcd4 (master) * abcd3 * abcd2 * abcd1
Then the diff has to be made with git diff master -U99999 > my_fix.diff and uploaded here, or with the use of arcanist (which I still find incredibly inconvenient). If you only upload the latest commit, I'd be missing dcba1.
If you have a lot of changes you want to submit, like this:
* hjkd2 (enable_feature_by_default) * hjkd1 * haha1 (my_feature) * dcba2 (HEAD -> my_fix) * dcba1 * abcd4 (master)
You should create a revision for each branch, diff them against each other (git checkout my_feature && git diff my_fix -U99999 > my_feature.diff), and mark them as dependencies of each other on the right hand panel.
It is kinda inconvenient, I'll admit :^)
I seem to have been able to put this together by fetching the individual diffs and squashing them this time :)
@Szelethus, firstly, thank you for landing this change. I really appreciate it! Secondly, apologies for misspelling your name earlier. And lastly, thank you for your patient explanation of how to get the diffs updated correctly in a Phabricator review. I think my mistake was that I made the change and the updates updates in a private branch, and not directly off master, and then duplicated the changes on master. For one of those sets of changes, I amended the first commit with the second round of changes, and I think I confused myself by doing that. In any case, lesson learned, and thank you again for your coaching!
If you look at the list of cast kinds, you'll be shocked to see ridiculously weird cornercases. Even though lvalue-to-rvalue cast definitely stands out (because it's the only one that has very little to do with actual casting), i'd still be much more comfortable if we *whitelist* the casts that we check, rather than blacklist them.
Can you take a look at the list and find which casts are we actually talking about and hardcode them here?