This is an archive of the discontinued LLVM Phabricator instance.

[attributes] [analyzer] Add an attribute to prevent checkers from modeling a function
Needs ReviewPublic

Authored by xazax.hun on Dec 30 2019, 2:51 PM.

Details

Summary

While the static analyzer is doing a great job modeling functions most of the time, sometimes we have global dynamic invariants that might be infeasible to reason about statically (or in some cases just to complex to worth to implement it).

It would be great to have a way to tell the checkers that certain functions are not worth modeling. In case of fuchsia.HandleCheck, we could achieve the same by simply removing all the annotations. This is not always desired though. The annotations are useful on their own, they document the ownership semantics of the handles. So instead of removing the annotations for these functions, we think it might be better to introduce yet another annotation for this use case.

What do you think? Is this reasonable or do you have any alternative ideas?

For the curious the syscall I have problem with in Fuchsia is the zx_channel_read: https://fuchsia.dev/fuchsia-src/reference/syscalls/channel_read.md

The problem is mainly with the handles parameter. We might not receive handles at all. And we might know that in advance that we will not receive handles, so we do not need to check the actual_handles. So assuming handles contains open handles will end up producing spurious handle leak errors.

Diff Detail

Event Timeline

xazax.hun created this revision.Dec 30 2019, 2:51 PM

What do you think? Is this reasonable or do you have any alternative ideas?

This seems like a very specialized attribute for the static analyzer and I'm not certain how much utility it adds, but that may be because I don't understand the analyzer requirements sufficiently yet. It seems as though this attribute is intended to suppress diagnostics within a certain function for a certain check, much like gsl::suppress does for the C++ Core Guidelines. Is that correct? If so, perhaps we should just reuse that attribute (but with a different spelling)?

This seems like a very specialized attribute for the static analyzer and I'm not certain how much utility it adds, but that may be because I don't understand the analyzer requirements sufficiently yet. It seems as though this attribute is intended to suppress diagnostics within a certain function for a certain check, much like gsl::suppress does for the C++ Core Guidelines. Is that correct? If so, perhaps we should just reuse that attribute (but with a different spelling)?

In some sense they are similar, but not entirely. When I annotate a function with gsl::suppress, I would expect to suppress diagnostics inside the function. When I annotate a function with analyzer_checker_ignore, that would suppress diagnostics in the callers of the function. So while the gsl::suppress is merely a way to suppress diagnostics, this attribute does change how the analysis is carried out.

This seems like a very specialized attribute for the static analyzer and I'm not certain how much utility it adds, but that may be because I don't understand the analyzer requirements sufficiently yet. It seems as though this attribute is intended to suppress diagnostics within a certain function for a certain check, much like gsl::suppress does for the C++ Core Guidelines. Is that correct? If so, perhaps we should just reuse that attribute (but with a different spelling)?

In some sense they are similar, but not entirely. When I annotate a function with gsl::suppress, I would expect to suppress diagnostics inside the function. When I annotate a function with analyzer_checker_ignore, that would suppress diagnostics in the callers of the function. So while the gsl::suppress is merely a way to suppress diagnostics, this attribute does change how the analysis is carried out.

Thank you for the explanation, that makes sense, but I'm still a bit uncomfortable. In this case, it seems like the author of the API needs to 1) know that users will be analyzing the code with clang's static analyzer, 2) and that a particular function will cause problems for a specific check. It seems like the API author won't be in a position to add the attribute to the correct APIs, and what we really need is something for a *user* to write on an existing declaration they may not control or have the ability to modify the declaration of. I am not certain how much I like this idea, but perhaps we could allow the attribute to be written on a redeclaration and apply it to the canonical declaration so that users could add the attribute instead of relying on the library author to do it for them?

Also, does this mean that changing the name of a check in the static analyzer will now have the potential to break code? e.g., if a check move from the alpha group to the core group, the check name changes, and thus code needs to be updated. I presume one of the things this attribute will do is diagnose when an unknown check name is given? This has the potential to introduce diagnostics where a library is written for a newer clang but being compiled within an older clang, so we'd probably need a warning flag or something else to control the diagnostic.

Thank you for the explanation, that makes sense, but I'm still a bit uncomfortable. In this case, it seems like the author of the API needs to 1) know that users will be analyzing the code with clang's static analyzer, 2) and that a particular function will cause problems for a specific check. It seems like the API author won't be in a position to add the attribute to the correct APIs, and what we really need is something for a *user* to write on an existing declaration they may not control or have the ability to modify the declaration of. I am not certain how much I like this idea, but perhaps we could allow the attribute to be written on a redeclaration and apply it to the canonical declaration so that users could add the attribute instead of relying on the library author to do it for them?

This is a valid concern. Currently, this is a common practice in the static analyzer to exclude certain functions from the checks, but until now, we usually hard coded the name/signature of the function rather than relying on an attribute. This did make sense, since it is an implementation detail that belongs to the analyzer. The main reason why we was thinking about making this an annotation, because if there is an annotation, when the API changes, we do not need to update the analyzer, and do not need to make sure that the analyzer version and the API matches.

Also, does this mean that changing the name of a check in the static analyzer will now have the potential to break code? e.g., if a check move from the alpha group to the core group, the check name changes, and thus code needs to be updated. I presume one of the things this attribute will do is diagnose when an unknown check name is given? This has the potential to introduce diagnostics where a library is written for a newer clang but being compiled within an older clang, so we'd probably need a warning flag or something else to control the diagnostic.

This is correct. If we want to diagnose non-existent checker names we definitely would need a diagnostic flag and probably we want to have the check off by default. The reason other reason (apart from backward compatibility) is that, some users might load non-upstreamed checkers as plugins for some builds (but not for others).

I think if there are many problems with this concept, I will fall back to hard code this information in the checker instead of using an annotation.

NoQ added a comment.Jan 3 2020, 3:30 AM

Would changing the literal in the attribute have the same effect? I.e., acquire_handle("Fuchsia_But_Please_Ignore_Me").

In D72018#1802636, @NoQ wrote:

Would changing the literal in the attribute have the same effect? I.e., acquire_handle("Fuchsia_But_Please_Ignore_Me").

It should, but doesn't currently because we don't have any checking that the string literal matches a name in the static analyzer or clang-tidy for that attribute.

I think if there are many problems with this concept, I will fall back to hard code this information in the checker instead of using an annotation.

How much hard coded information would this remove? How often do you find that users want to add to the list of hard coded functions themselves but can't? Maybe the benefits to the attribute outweigh the downside and it is worth exploring ways to resolve some of these concerns -- I don't have a good feeling for the problem space though.

In D72018#1802636, @NoQ wrote:

Would changing the literal in the attribute have the same effect? I.e., acquire_handle("Fuchsia_But_Please_Ignore_Me").

It should, but doesn't currently because we don't have any checking that the string literal matches a name in the static analyzer or clang-tidy for that attribute.

Actually, the static analyzer checker does check for the literal, so this could work.

I think if there are many problems with this concept, I will fall back to hard code this information in the checker instead of using an annotation.

How much hard coded information would this remove? How often do you find that users want to add to the list of hard coded functions themselves but can't? Maybe the benefits to the attribute outweigh the downside and it is worth exploring ways to resolve some of these concerns -- I don't have a good feeling for the problem space though.

I expect not to have too many functions that need such exclusions. It was more about making it easier to do some exclusions without touching the analyzer. I will ask around what are the preferences of the team, maybe Artem's proposal will work for us.

In D72018#1802636, @NoQ wrote:

Would changing the literal in the attribute have the same effect? I.e., acquire_handle("Fuchsia_But_Please_Ignore_Me").

It should, but doesn't currently because we don't have any checking that the string literal matches a name in the static analyzer or clang-tidy for that attribute.

Actually, the static analyzer checker does check for the literal, so this could work.

Huh, that's interesting. I would expect that check to happen in more than just the static analyzer though -- for instance, clang-tidy checks may want to use those attributes, or even a less-heavy CFG analysis than the static analyzer might.

I think if there are many problems with this concept, I will fall back to hard code this information in the checker instead of using an annotation.

How much hard coded information would this remove? How often do you find that users want to add to the list of hard coded functions themselves but can't? Maybe the benefits to the attribute outweigh the downside and it is worth exploring ways to resolve some of these concerns -- I don't have a good feeling for the problem space though.

I expect not to have too many functions that need such exclusions. It was more about making it easier to do some exclusions without touching the analyzer. I will ask around what are the preferences of the team, maybe Artem's proposal will work for us.

Okay, that makes sense to me, thank you!

NoQ added a comment.EditedJan 3 2020, 11:27 AM

I expect not to have too many functions that need such exclusions. It was more about making it easier to do some exclusions without touching the analyzer. I will ask around what are the preferences of the team, maybe Artem's proposal will work for us.

I think it ultimately makes sense to introduce analyzer IPA hints ("please inline this function", "please evaluate this one conservatively", "please don't model effects of this function on checker state", etc.), given that otherwise our set of heuristics on this subject are incomprehensible.

But if you have a chance to hardcode a small list of legacy functions with weird behavior (that cannot be expressed with annotations; maybe add explicit modeling for such functions) and use the analyzer enforce the lack of other exceptional functions (so that everything else was expressible with annotations), that should be the way to go simply because it makes your function surfaces much easier to reason about for the programmer.