This is an archive of the discontinued LLVM Phabricator instance.

APINotes: add APINotesYAMLCompiler
ClosedPublic

Authored by compnerd on Oct 5 2020, 3:52 PM.

Details

Summary

This adds the skeleton of the YAML Compiler for APINotes. This change
only adds the YAML IO model for the API Notes along with a new testing
tool apinotes-test which can be used to verify that can round trip the
YAML content properly. It provides the basis for the future work which
will add a binary serialization and deserialization format to the data
model.

Diff Detail

Event Timeline

compnerd created this revision.Oct 5 2020, 3:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 5 2020, 3:52 PM
Herald added subscribers: jfb, mgorny. · View Herald Transcript
compnerd requested review of this revision.Oct 5 2020, 3:52 PM

Seems like this patch handles already existing attributes in Clang, so this might not be affected by the concerns brought up here by James:
http://lists.llvm.org/pipermail/cfe-dev/2020-October/066996.html

clang/include/clang/APINotes/APINotesYAMLCompiler.h
17

It would be useful to have more documentation here.

clang/include/clang/APINotes/Types.h
2

So we are mapping existing attributes here, I am missing this to be documented. Why do we support only these attributes?
Would be consistent to put SwiftPrivate here as well, instead of implicitly interpreting that as bool in APINotesYAMLCompiler.cpp, wouldn't it?
I think we should list all processed attributes here, or we should just use Attrs.inc (see below my comment as well).

26

This seems a bit redundant to Attrs.td. I'd prefer to have the structure of an attribute defined only in one place. Why can't we directly reuse the types in the generated Attrs.inc? In this case

class EnumExtensibilityAttr : public InheritableAttr {
public:
  enum Kind {
    Closed,
    Open
  };

could perfectly fit, wouldn't it?

33

Should this be rather SwiftNewTypeKind? Since swift_wrapper is deprecated according to the docs: https://clang.llvm.org/docs/AttributeReference.html#swift-newtype

clang/lib/APINotes/APINotesYAMLCompiler.cpp
241

Why are these classes in a unnamed namespace? I'd expect them to be in the header under the clang::api_notes namespace, so users of the APINotesYAMLCompiler could use these as the structured form of the YAML content. Isn't that the point of the whole stuff?

440

Hmm, why do we need "none"? Can't we interpret the non-existence as "none"?

622

I think the stream (llvm::outs) should not be written in stone. And why not llvm::errs (we dump to errs usually) ? Could it be a parameter?

clang/test/APINotes/Inputs/Frameworks/Simple.framework/Headers/Simple.apinotes
2

I am pretty sure this does not provide a full coverage. What about e.g Functions? I'd like to see them tested as well.

clang/test/APINotes/yaml-roundtrip.test
5

Why do we have this difference? This seems odd and as a superfluous complexity.

clang/tools/apinotes-test/APINotesTest.cpp
23

auto -> std::string

compnerd marked 3 inline comments as done.Oct 16 2020, 8:30 AM

Thanks for the feedback!

clang/include/clang/APINotes/Types.h
2

These are the ones that are currently needed and can be safely merged. I really wouldn't want to extend the support to all the attributes in the initial attempt to merge the functionality as it is something which is actually in production already. However, once the functionality is in a shared state, it is much easier to revise and co-evolve other consumers.

clang/lib/APINotes/APINotesYAMLCompiler.cpp
241

There will be follow up types that provide structured access to the data. These types are strictly for the serialization to and from YAML via YAML::IO.

440

At the very least we need it for compatibility - this is already a shipping feature. However, nullability is also not completely annotated. So, there is some benefit in tracking the explicit none vs missing.

622

Sure, that is reasonable, I should be able to add a llvm::raw_ostream & parameter.

clang/test/APINotes/Inputs/Frameworks/Simple.framework/Headers/Simple.apinotes
2

Correct, it isn't meant to provide full coverage at all. It is merely meant to be enough to ensure that we can load the YAML and process it. The full coverage will come with the follow up patches as they will require filling out more of the infrastructure. I am trying to keep the patch at a reasonable size. I can add additional test cases if you feel strongly that they should be added now, but, I do worry about the size of the patch ballooning.

clang/test/APINotes/yaml-roundtrip.test
5

The difference is needed for compatibility. Richard wanted the fully spelt out names, but that requires backwards compatibility, so we need the difference here.

as it is something which is actually in production already
...
At the very least we need it for compatibility - this is already a shipping feature.

I value that you guys have a prototype on a fork that is being used in production. But, upstreaming and these reviews give us the chance to evolve and to create an implementation that serves well the whole Clang community. I am having a hard time to accept "this is how it is implemented in our fork" as a technical argument. Besides, I am not sure how could the Clang community benefit about being backward compatible with a specialized fork and thus making superfluous complications. This feature could be really valuable, but I'd like to have it landed in a high quality form which serves the whole community (and the static analyzer developers). I think that a simple copy-paste of the fork will not do it.

clang/include/clang/APINotes/Types.h
2

I understand that an initial implementation may not support all attributes. What's more, an evolved implementation may not do that either.

Including Attrs.inc does not put any requirement to support all attributes, but we could reuse the types and enums there. This way we could avoid the need to mirror the types in Attrs.inc.

clang/lib/APINotes/APINotesYAMLCompiler.cpp
241

Sounds good. Could you please describe in a nutshell which are the following steps for the upstreaming? I mean it could make the review easier if we could see a kind of road-map for the upcoming patches.

440

Optional<EnumExtensibilityAttr::Kind> ?

clang/test/APINotes/yaml-roundtrip.test
5

I have concerns about making things more complicated just to be compatible with a downstream fork.

martong added a reviewer: NoQ.Oct 19 2020, 9:53 AM
martong added a subscriber: NoQ.

Adding @NoQ as a reviewer, he might have some valuable comments, as he is very keen on this feature.

compnerd marked 3 inline comments as done.Oct 26 2020, 11:37 AM

I value that you guys have a prototype on a fork that is being used in production. But, upstreaming and these reviews give us the chance to evolve and to create an implementation that serves well the whole Clang community. I am having a hard time to accept "this is how it is implemented in our fork" as a technical argument. Besides, I am not sure how could the Clang community benefit about being backward compatible with a specialized fork and thus making superfluous complications. This feature could be really valuable, but I'd like to have it landed in a high quality form which serves the whole community (and the static analyzer developers). I think that a simple copy-paste of the fork will not do it.

I'm not against evolving the implementation. However, compatibility with production code is important. I don't think that it is possible to ignore that, especially when the impact is non-trivial (it is not a small localized use). The large scale usage is valuable in terms of demonstrating the value of the functionality, so it is valuable to clang community. I am not suggesting just a copy from the Swift implementation, and do care about it being a high quality implementation. But that cannot come at the cost of compatibility with the existing usage.

clang/include/clang/APINotes/Types.h
2

From that description, I think I am not necessarily following what you are suggesting. Could you be a bit more concrete in what you are suggesting in terms of the reuse?

clang/lib/APINotes/APINotesYAMLCompiler.cpp
241

I am still digesting the complete implementation, which is extremely large.

Currently, my thought is to split up it up with this change which only deals with the YAML aspect, then follow that up with the processing that is required:

  • conversion to bitcode
  • conversion from bitcode

And then the work to associate the attributes from the deserialized form to the AST.

I am trying to split this up into smaller chunks so that it can be adequately reviewed rather than just be a dump of a feature and then have to rework it to be something that makes sense in the upstream tree.

I also feel uncomfortable with the current version in Swift being merged as I can't seem to understand the testing aspect well enough, which is why I introduced the apinotes-test-tool so that we can be sure that the pieces can be tested.

440

That representation could work, let me see if I can get YAML::IO to provide something which would be compatible.

clang/test/APINotes/yaml-roundtrip.test
5

The compatibility is extremely important unfortunately. We could go with just the terse version for the time being if you are strongly opposed to the two spellings.

I am having a hard time to accept "this is how it is implemented in our fork" as a technical argument. Besides, I am not sure how could the Clang community benefit about being backward compatible with a specialized fork and thus making superfluous complications.

Being compatible with Apple's SDKs is quite important for some part of the community.

This is similar to Clang implementing GCC and MSVC attributes, for example. Some might call them specialized attributes -- they are non-standard and are certainly not used by every C++ user -- but there exists a sizeable population of downstream consumers that rely on them. Apple's SDKs are in the same class.

This feature could be really valuable, but I'd like to have it landed in a high quality form which serves the whole community (and the static analyzer developers). I think that a simple copy-paste of the fork will not do it.

I completely agree that the feature should be useful for static analysis. Do compatibility aliases hurt these goals?

compnerd marked 4 inline comments as done.Oct 26 2020, 2:34 PM
compnerd added inline comments.
clang/lib/APINotes/APINotesYAMLCompiler.cpp
440

The representation is already Optional<EnumExtensibilityAttr::Kind>, but we need to map the user providing none back to a llvm::None so this will still need to be listed as is.

compnerd updated this revision to Diff 300796.Oct 26 2020, 2:36 PM

Address review comments.

compnerd marked an inline comment as done.Oct 26 2020, 2:56 PM
compnerd updated this revision to Diff 300815.Oct 26 2020, 3:49 PM
compnerd marked an inline comment as done.

Add more testing coverage

compnerd marked an inline comment as done.Oct 26 2020, 3:58 PM
compnerd added inline comments.
clang/include/clang/APINotes/Types.h
26

The none-case here is not the same as missing - it tracks the explicitly audited case. I suppose we can change the internal enum case from None to Audited if you like.

I am having a hard time to accept "this is how it is implemented in our fork" as a technical argument. Besides, I am not sure how could the Clang community benefit about being backward compatible with a specialized fork and thus making superfluous complications.

Being compatible with Apple's SDKs is quite important for some part of the community.

This is similar to Clang implementing GCC and MSVC attributes, for example. Some might call them specialized attributes -- they are non-standard and are certainly not used by every C++ user -- but there exists a sizeable population of downstream consumers that rely on them. Apple's SDKs are in the same class.

Fair enough. But I don't think that Clang developers just copied the implementation from GCC or from MSVC.
I accept to keep the interface of the feature (i.e. the YAML format) as it is in the fork, okay. But, I'd like to have maximum flexibility from the submitter by willing to change the details of the implementation. I don't think that the implementation in the fork is necessarily the best realization, we can do better, after all that's the purpose of this review, isn't it?

This feature could be really valuable, but I'd like to have it landed in a high quality form which serves the whole community (and the static analyzer developers). I think that a simple copy-paste of the fork will not do it.

I completely agree that the feature should be useful for static analysis. Do compatibility aliases hurt these goals?

No. But they make the tests and the whole system more complicated, IMHO.

clang/include/clang/APINotes/Types.h
2

I was thinking about to simply
#include "clang/AST/Attr.h
(which further includes Attrs.inc).

This way the class clang::EnumExtensibilityAttr and many other attribute classes are immediately available for your use. And you don't have to define very similar classes here again (unless I misunderstand something).

26

I am not sure I can follow. Could you please elaborate what is an "explicit y audited" case?
https://clang.llvm.org/docs/AttributeReference.html#enum-extensibility mentions only open/closed ...

clang/test/APINotes/yaml-roundtrip.test
5

I'd rather stick with the terse version if possible (do we have to pursue rsmith about this?) If this is really a hassle then let's forget about it.

Fair enough. But I don't think that Clang developers just copied the implementation from GCC or from MSVC.

No (nor we could copy the implementation for a number of reasons). However, we did have to be bug-for-bug compatible sometimes for compatibility.

compnerd marked an inline comment as done.Oct 28 2020, 9:48 AM

I'd like to have maximum flexibility from the submitter by willing to change the details of the implementation

I don't think that is a fair expectation - there are plenty of times where this is technical disagreement on functionality, and it doesn't get immediately resolved. A high quality implementation doesn't require that everything be changed immediately, nor does it mean that everything must be completely flexible. There are trade-offs, and the goal is to strike a balance between simplicity and the needs of the project.

clang/include/clang/APINotes/Types.h
26

There are three states consider:

enum Unaudited {
};
enum __attribute__((__enum_extensibility__(open))) Open {
};
enum __attribute__((__enum_extensibility__(closed))) Closed {
};

The optionality of the value indicates whether the value is present or not in the APINotes, not the tri-state nature of the attribute.

I'd like to have maximum flexibility from the submitter by willing to change the details of the implementation

I don't think that is a fair expectation - there are plenty of times where this is technical disagreement on functionality, and it doesn't get immediately resolved. A high quality implementation doesn't require that everything be changed immediately, nor does it mean that everything must be completely flexible. There are trade-offs, and the goal is to strike a balance between simplicity and the needs of the project.

Yes, absolutely, flexibility is needed from both sides (submitter and reviewer). It is our common interest to get an extensible implementation upstreamed. I hope I didn't push it too hard, I didn't mean any offence.

clang/include/clang/APINotes/Types.h
26

Ok. Thanks for the explanation.

But how could we describe then flag_enum? As a possible extension in the future (not in this patch).
E.g.:

enum __attribute__((enum_extensibility(open),flag_enum)) OpenFlagEnum {
  D0 = 1 << 0, D1 = 1 << 1
};

Or here

enum __attribute__((flag_enum)) Foo;

should Foo have an Audited or None Kind here?

If we were to add support to flag_enum would you create a new enum class for that here?

Note that I am asking all these questions because I am planning to extend and add more attributes in the future once the whole APINotes stuff is landed.

martong added inline comments.Oct 28 2020, 10:41 AM
clang/lib/APINotes/APINotesYAMLCompiler.cpp
33

Could you please clang-format the code? The Lint: Pre-merge checks are all over the code and makes the review a bit harder.

compnerd added inline comments.Oct 28 2020, 1:17 PM
clang/include/clang/APINotes/Types.h
26

The questions are reasonable - and we should understand what is missing and what can be done and what cannot be done.

I think that the use of the metadata is extremely important. The question is, are there any users who care about recovering that information? IMO, we should try to represent the input as faithfully as possible, even if that means that we need to replicate the enumerations. However, if the consumers do not care about the inferred state vs the written state, I don't think that it is an immediate concern if we cannot recover that from the representation, we can simplify and extend as and when needed, especially when we have the feature merged (I think that it is easier once merged only because this is a large feature, and there is a large user base that is using this, which complicates things. Really, this functionality should have been merged a long time ago, but that is both my own opinion and past), and hopefully in many cases we can do it during review time.

The tricky bit to remember that there is an additional state that is implicitly being added here, and optionality should reference the contents of the APINotes, not the state in the declaration that is being mutated.

Is there a good place to write this down? I am willing to add comments explaining this so that the next person (ha, no, not the next person, but me, because I *will* forget) is not tripped up on this nuance.

compnerd added inline comments.Oct 29 2020, 8:52 AM
clang/lib/APINotes/APINotesYAMLCompiler.cpp
33

Sure, though, I do think that some of the extra whitespace makes it easier to read.

martong added inline comments.Oct 30 2020, 9:28 AM
clang/include/clang/APINotes/Types.h
26

Thanks for clarifying this. It was not clear for me that these types were to represent the YAML file contents. It makes much more sense now.

Is there a good place to write this down?

I think a file comment right after the license could do it.

compnerd updated this revision to Diff 302298.Nov 2 2020, 8:51 AM

rebase, clang-format, add comments

martong accepted this revision.Nov 3 2020, 2:35 AM

Looks good to me now! Thanks for addressing my concerns.

clang/tools/apinotes-test/APINotesTest.cpp
16

We still have: clang-format: please reformat the code. (Mabe you already know, but if you use arc then all these formatting issues are handled automatically and it will upload a linted patch to Phabricator.)

This revision is now accepted and ready to land.Nov 3 2020, 2:35 AM

Thanks for the reviews! I'll reformat the rest of the code before merging. Im going to be working on merging this downstream first before merging this upstream, so it will take a couple of days still.

This revision was landed with ongoing or failed builds.Nov 5 2020, 10:55 AM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Nov 5 2020, 11:58 AM

Looks like the test doesn't pass on Windows: http://45.33.8.238/win/27342/step_7.txt

Please take a look, and revert if it takes a while to fix.

Here's the output of the diff on that Win bot:

thakis@thakis6-w MINGW64 /c/src/llvm-project (merge)
$ diff --strip-trailing-cr out/gn/obj/clang/test/APINotes/Output/yaml-roundtrip.test.tmp.result clang/test/APINotes/Inputs/Frameworks/Simple.framework/Headers/Simple.apinotes
1d0
< ---
8c7
<         Nullability:     Nonnull
---
>         Nullability:     N
14c13
<         Nullability:     Optional
---
>         Nullability:     O
20c19
<         Nullability:     Unspecified
---
>         Nullability:     U
26c25
<         Nullability:     Unspecified
---
>         Nullability:     S
29,30c28
<         Nullability:     Unspecified
< ...
---
>         Nullability:     Scalar

@thakis, thanks for the heads up, I've disabled the test on Windows - I should've disabled it on Windows in the first place.