This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer] Supporting function attributes in .model files.
AbandonedPublic

Authored by pgousseau on Oct 14 2015, 10:22 AM.

Details

Summary

Dear All,

I would like to propose a patch for specifying function attributes in .model files to be used by the analyzer checkers.

There are a number of checkers (such as nullability checkers) that rely on attributes, I would like to make these checkers available to a wider number of functions by tagging the function definition with appropriate attribute information. I have a situation where we cannot append the function attributes to library header files; instead I propose a patch that allows us to publish attribute information via .model files. The patch modifies the .model file parser to allow for the extraction of function declarations found in the .model file. Attributes are merged by reusing Sema's merging capabilities.
A new option 'faux-attributes' is added to enable this feature. 'faux-attributes' is disabled by default.

Please can you review this for submission

Regards,

Pierre Gousseau
SN-Systems - Sony Computer Entertainment

Diff Detail

Event Timeline

pgousseau updated this revision to Diff 37362.Oct 14 2015, 10:22 AM
pgousseau retitled this revision from to [RFC][Analyzer] Supporting function attributes in .model files..
pgousseau updated this object.
pgousseau added a subscriber: cfe-commits.
pgousseau updated this revision to Diff 37466.Oct 15 2015, 3:14 AM

Fix incorrect caching of model files declarations.

Hi!

I have some high level questions and notes about this patch.

I implemented the function modelling as a Google Summer of Code project and Ted Kremenek was my mentor. I am happy that you found an useful application of the function modeling system, and I think, in general, it is a good idea to be able to store attributes for the modelled functions. However I am a bit surprised, when I saw this patch. The main reason is that, models lack a lot of functionality right now.

Main missing features:

  • Ability to specify multiple model paths similar to how header include paths are specified.
  • Support for methods, namespaces, overloaded functions.

Did you find the feature useful despite those limitations? I am interested to improve the situation in the future, unfortunately I find very little time to work on this area recently, but I do welcome every changes.

I have two comments before I start to review the patch itself. Right now this patch contains two modifications. One for the .model files and one for a checker. I think it would be better to separate these two changes into two separate patches. If the motivation behind the merge of those patches is that, you want to test a feature you implemented in .model files, than I think there are other checker that are using attributes, so I think you should be able to provide a test case with the two changes separated. (For example using the nonnull parameter checker.)

The other comment is that, the .model files are intended to work in a way, that checkers should not be able utilize the information whether some data is coming from a model file or the analyzed translation unit. In fact, this is an abstraction, that makes it possible to develop the checkers and the modelling mechanism independently. With you patch, the checker should observe the model declaration explicitly. I think, a better design would somehow merge the attributes that are coming from the original translation unit with the attributes coming from the model files, and expore that set of attributes. This way the checker could not tell what the source of the information is. Unfortunately I do not know what would be the best way to enfore this, since the checkers can just observe the AST of the original translation unit and bypassing the models.

Hi!

Hi! Thank you for reviewing.

I have some high level questions and notes about this patch.

I implemented the function modelling as a Google Summer of Code project and Ted Kremenek was my mentor. I am happy that you found an useful application of the function modeling system, and I think, in general, it is a good idea to be able to store attributes for the modelled functions. However I am a bit surprised, when I saw this patch. The main reason is that, models lack a lot of functionality right now.

Main missing features:

  • Ability to specify multiple model paths similar to how header include paths are specified.
  • Support for methods, namespaces, overloaded functions.

Did you find the feature useful despite those limitations? I am interested to improve the situation in the future, unfortunately I find very little time to work on this area recently, but I do welcome every changes.

Yes I think this is useful despite the missing features. With the current model file design I am confident those features could be easily added at a later stage?

I have two comments before I start to review the patch itself. Right now this patch contains two modifications. One for the .model files and one for a checker. I think it would be better to separate these two changes into two separate patches. If the motivation behind the merge of those patches is that, you want to test a feature you implemented in .model files, than I think there are other checker that are using attributes, so I think you should be able to provide a test case with the two changes separated. (For example using the nonnull parameter checker.)

I see, yes using the nonnull param checker for testing seems a better fit, I will update the patch.

The other comment is that, the .model files are intended to work in a way, that checkers should not be able utilize the information whether some data is coming from a model file or the analyzed translation unit. In fact, this is an abstraction, that makes it possible to develop the checkers and the modelling mechanism independently. With you patch, the checker should observe the model declaration explicitly. I think, a better design would somehow merge the attributes that are coming from the original translation unit with the attributes coming from the model files, and expore that set of attributes. This way the checker could not tell what the source of the information is. Unfortunately I do not know what would be the best way to enfore this, since the checkers can just observe the AST of the original translation unit and bypassing the models.

I agree, abstracting the attributes' origin is nicer, hopefully we can find a solution that is elegant enough. I will have a look.
Thanks!

Pierre

pgousseau updated this revision to Diff 37765.Oct 19 2015, 9:32 AM
pgousseau edited edge metadata.

Following Gabor's review:

Remove changes to UncheckedReturn checker.
Add a test using the NonNullParamChecker checker.
Abstract the origin of the attributes.

An analyzer option "faux-attributes" is added. This is 'false' by default and meant to prevent unnecessary parsing of model files.

zaks.anna added inline comments.Oct 20 2015, 10:29 PM
lib/Analysis/BodyFarm.h
43

Why do we not have sanity checks? What happens when there is a conflict?

lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp
512

This name is too generic.

571

Should we walk the entire AST here only to get the info from the few functions in the farm?

The AnalysisConsumer visitor tries to make sure the whole AST is not serialized, which is very expensive when dealing with PCH files. (You an find comments about it if you search for "PCH".)

lib/StaticAnalyzer/Frontend/ModelConsumer.cpp
37

Why func lost constness here?

lib/StaticAnalyzer/Frontend/ModelInjector.cpp
49

Does ModelInjector::onBodySynthesis return immediately if the model has been parsed but the Decl is not in the map?

If that is not the case, wouldn't it be very expensive to call onBodySynthesis on every decl, most of which are not modeled? If I understand correctly, we would try to parse the model file for every such decl.

test/Analysis/model-attributes.cpp
3

80 col?

pgousseau added inline comments.Oct 21 2015, 11:13 AM
lib/Analysis/BodyFarm.h
43

I am worried that any custom sanity check I put would end up confusing the "faux-attributes" user ... I have just noticed that Sema has some merging capabilities, I will have a look, see if it can be reused. Thanks!

lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp
512

Will rename it thanks!

571

I see, yes this can be optimized thanks!

lib/StaticAnalyzer/Frontend/ModelConsumer.cpp
37

No reason, will fix thanks!

lib/StaticAnalyzer/Frontend/ModelInjector.cpp
49

Yes, there might be cases where the model file's Decl might not match the model file filename, causing re-parsing. Will fix thanks!

test/Analysis/model-attributes.cpp
3

Will fix!

pgousseau updated this revision to Diff 38438.Oct 26 2015, 11:23 AM

Following Anna's review:

Remove unnecessary AST walk over declarations by reusing already captured declarations.
Add handling of merge conflicts using Sema merge methods.
Add condition at the end of ModelInjector::onBodySynthesis to prevent unnecessary model file parsing.
Fix 80 col in regression tests.
Add regression tests to exercise merge conflicts.

The const qualifier on model's declarations has to be removed as Sema merge methods take non const parameters.

pgousseau retitled this revision from [RFC][Analyzer] Supporting function attributes in .model files. to [Analyzer] Supporting function attributes in .model files..Nov 9 2015, 2:21 AM
pgousseau updated this object.
zaks.anna edited edge metadata.Dec 4 2015, 2:45 PM

Pierre,

Have you seen the post about API Notes?
http://llvm.cc/t/cfe-dev-clang-and-swift/331

I believe using API notes would be a better approach for adding annotations. By the time the static analyzer sees the AST, the annotations would already be there. The API Notes should already seamlessly work for nullability annotations. Does that work for you? (Do we still need some parts of this patch? I am not sure.)

A related topic is deciding what type of attributes should be used by the static analyzer checkers. Currently, there are 2 options, neither of which is ideal:

  1. Extend clang with new attributes for static analyzes. This is not highly desirable because the clang attributes namespace will get polluted and compiler users might start using them. This approach is also not convenient for proprietary checkers.
  2. Annotate attribute (AnnotateAttr) is used in some places; however, this attribute creeps into LLVMIR, which caused problems in the past.

The best approach would be to put all analyzer attributes behind "a fence" so that we could add/remove them without worrying that compiler(not analyzer) users depend on them. The attribute would not be CodeGened.

Here what it might look like:
analyzer_annotate(family, arg…)
family: id
arg: [id=] value
value: number | string | id

Is anyone interested in implementing this?

Pierre,

Have you seen the post about API Notes?
http://llvm.cc/t/cfe-dev-clang-and-swift/331

I believe using API notes would be a better approach for adding annotations. By the time the static analyzer sees the AST, the annotations would already be there. The API Notes should already seamlessly work for nullability annotations. Does that work for you? (Do we still need some parts of this patch? I am not sure.)

Thanks for pointing it out! It is not clear yet if APINotes is suited for our problematic, will need to ask some questions first on the "Clang and Swift" cfe-dev thread.

A related topic is deciding what type of attributes should be used by the static analyzer checkers. Currently, there are 2 options, neither of which is ideal:

  1. Extend clang with new attributes for static analyzes. This is not highly desirable because the clang attributes namespace will get polluted and compiler users might start using them. This approach is also not convenient for proprietary checkers.
  2. Annotate attribute (AnnotateAttr) is used in some places; however, this attribute creeps into LLVMIR, which caused problems in the past.

The best approach would be to put all analyzer attributes behind "a fence" so that we could add/remove them without worrying that compiler(not analyzer) users depend on them. The attribute would not be CodeGened.

Here what it might look like:
analyzer_annotate(family, arg…)
family: id
arg: [id=] value
value: number | string | id

Is anyone interested in implementing this?

The "analyzer_annotate" idea sounds good, I would need to do more evaluation before deciding this is suitable for us though.

pgousseau abandoned this revision.Dec 10 2015, 10:29 AM

Abandoning for now while evaluating Swift's APINotes.