This is an archive of the discontinued LLVM Phabricator instance.

[clang][extract-api] Add enum support
ClosedPublic

Authored by zixuw on Mar 16 2022, 5:52 PM.

Details

Summary

Add support for enum records

  • Add EnumConstantRecord and EnumRecord to store API information for enums
  • Implement VisitEnumDecl in ExtractAPIVisitor
  • Implement serializatin for enum records and MemberOf relationship
  • Add test case for enum records
  • Few other improvements

Diff Detail

Event Timeline

zixuw created this revision.Mar 16 2022, 5:52 PM
Herald added a project: Restricted Project. · View Herald Transcript
zixuw requested review of this revision.Mar 16 2022, 5:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2022, 5:52 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
dang added inline comments.Mar 17 2022, 11:44 AM
clang/lib/SymbolGraph/ExtractAPIConsumer.cpp
197 ↗(On Diff #416037)

Should this be static or in an anonymous namespace?

clang/lib/SymbolGraph/Serialization.cpp
228 ↗(On Diff #416037)

Nice!

253 ↗(On Diff #416037)

"Enumeration Case"

257 ↗(On Diff #416037)

"Enumeration"

321 ↗(On Diff #416037)

Since we are going to add more cases to this switch, would you mind doing something along the lines of:

const char *KindString = "";
switch(Kind) {
  default:
     llvm_unreachable("Unhandled relationship kind in Symbol Graph!");
     break;
  case RelationshipKind::MemberOf: 
    KindString = "memberOf";
    break;
}

Relationship["kind"] = KindString

or a method in the serializer for getting the string representation of the relationship kind?

343 ↗(On Diff #416037)

Quick design question: Do we want to be silently failing in these situations (especially since this shouldn't be happening)? Let me know what you think.

zixuw added inline comments.Mar 17 2022, 5:09 PM
clang/lib/SymbolGraph/Serialization.cpp
321 ↗(On Diff #416037)

Definitely! That's a good idea. I'm going to have a 'getString' sort of method in the serializer to do this.

343 ↗(On Diff #416037)

I'm using this check to intentionally skip symbols that we do not want to spit out, for example unconditionally unavailable symbols, or after we have support for typedef records, anonymous enum decls that's declared with a typedef so that we don't have duplicate information, etc.
Optional<Object> serializeAPIRecord does this check now, and if we Serializer::shouldSkip it, None will be returned. So it is expected, not really silently failing.

Hey, apologies for missing the initial patch in D119479.

This sounds really interesting!
The lack of docs make it very difficult to understand what this library is for and how to use it if you don't already know.
It's not clear what the SymbolGraph library is (I think there's a doc missing from the directory). The patches are tagged with [extract-api] so it must be related, but if it's something distinct it should be documented and reusable, and if it's the same why is it not called ExtractAPI :-)

The main entry point seems to be API.h, which has only a trivial file comment and no comments on what the meaning of the structs and fields are. Especially if this is part of clang proper, it's going to be read and maintained and debugged by a lot of people who have never heard of SymbolGraph and don't know swift.

It's also not clear from looking at this why this is part of clang itself and not a libTooling-style external tool - it seems similar in scope and applicability to a lot of the things in clang-tools-extra.

I know a lot of these things don't feel like they need much explanation when you're and your reviewer are familiar with it, so apologies that this is tedious, but the cost of undocumented code like this in shared projects is high.

dang added inline comments.Mar 18 2022, 3:31 AM
clang/lib/SymbolGraph/Serialization.cpp
343 ↗(On Diff #416037)

Yeah that makes sense. If we are doing the check in serializeAPIRecord it might be worth not calling serializeEnumRecord at all for the sake of keeping the code handling these kinds of situation in a single place.

Hi Sam! Thanks for your interest and taking a look!

Hey, apologies for missing the initial patch in D119479.

This sounds really interesting!
The lack of docs make it very difficult to understand what this library is for and how to use it if you don't already know.
It's not clear what the SymbolGraph library is (I think there's a doc missing from the directory). The patches are tagged with [extract-api] so it must be related, but if it's something distinct it should be documented and reusable, and if it's the same why is it not called ExtractAPI :-)

My apologies for the confusion caused! Yes I do need to improve the docs for existing changes and put more efforts into the readability of the code in future patches. And I definitely agree that SymbolGraph is not really an appropriate name for the library. Initially the code was directly put in clangFrontend but I had to move them out to resolve a circular dependency problem between frontend and clangIndex. And it seems that I'm just bad at naming things 🙂.
I've put up another patch D122160 to refactor all the ExtractAPI work and also add missing doc comments to make things clearer.

The main entry point seems to be API.h, which has only a trivial file comment and no comments on what the meaning of the structs and fields are. Especially if this is part of clang proper, it's going to be read and maintained and debugged by a lot of people who have never heard of SymbolGraph and don't know swift.

It's also not clear from looking at this why this is part of clang itself and not a libTooling-style external tool - it seems similar in scope and applicability to a lot of the things in clang-tools-extra.

Implementing this with libTooling in clang-tools-extra is definitely something we've considered and evaluated during designing and before the proposal. And we decided that including this in clang core had great benefits:

  • Implementing ExtractAPI in clang itself allows the functionalities to be easily accessible by more downstream tools to use either the API information collected or the serialized output via clang or libclang, for example IDE integrations.
  • We could share parts of the code with Swift down the line.
  • The proposed clang-extract-api tool only need to parse a set of given header files for flexibility and efficiency. Implementing ExtractAPI as a clang driver allows a more flexible control of invoking the tool with various of clang options without the need of a compilation database, and be decoupled from compilations and builds.

I know a lot of these things don't feel like they need much explanation when you're and your reviewer are familiar with it, so apologies that this is tedious, but the cost of undocumented code like this in shared projects is high.

Not at all! There are all valid and great feedbacks and questions. Thank you very much for pointing there out Sam! I've definitely neglected documenting and presenting the code for better readability and community collaboration. It's always welcome and exciting to have these feedbacks to move the tool forward together.

zixuw updated this revision to Diff 417147.Mar 21 2022, 6:08 PM
zixuw marked 2 inline comments as done.

Rebase onto D122160 refactoring

zixuw updated this revision to Diff 417154.Mar 21 2022, 6:40 PM
zixuw marked 4 inline comments as done.
  • Address review issues;
  • Fix memory managment for EnumConstantRecord and EnumRecord
clang/lib/SymbolGraph/ExtractAPIConsumer.cpp
197 ↗(On Diff #416037)

The whole ExtractAPIVisitor class is already in an anonymous namespace right?

dang added inline comments.Mar 22 2022, 4:40 AM
clang/include/clang/ExtractAPI/API.h
33

I still think this belongs in APISet. It belongs there because it defines unique pointers that are specifically tied to the lifetime of APISet. Any other thing that needs a reference to things allocated in the APISet should just get the raw pointer out, ideally they should be storing the pointer returned by the add* methods of APISet.

zixuw updated this revision to Diff 417424.Mar 22 2022, 4:13 PM

Rebase on main with changed and merged refactoring in D122160

dang accepted this revision.Mar 23 2022, 9:41 AM

LGTM!

This revision is now accepted and ready to land.Mar 23 2022, 9:41 AM
This revision was automatically updated to reflect the committed changes.

This (or some closely related commit?) causes a huge amount of warnings when building with GCC:

warning: ‘clang::extractapi::EnumRecord’ has a field ‘clang::extractapi::EnumRecord::Constants’ whose type uses the anonymous namespace [-Wsubobject-linkage]
 167 | struct EnumRecord : APIRecord {
     |        ^~~~~~~~~~
zixuw added a comment.Mar 23 2022, 1:38 PM

This (or some closely related commit?) causes a huge amount of warnings when building with GCC:

warning: ‘clang::extractapi::EnumRecord’ has a field ‘clang::extractapi::EnumRecord::Constants’ whose type uses the anonymous namespace [-Wsubobject-linkage]
 167 | struct EnumRecord : APIRecord {
     |        ^~~~~~~~~~

Hi Martin! Thanks for the update on the warnings on GCC, didn't know about that.
I think the warning is complaining about the use of APIRecordUniquePtr, which should be removed by D122331.

dexonsmith added inline comments.
clang/test/ExtractAPI/enum.c
5

I just noticed these tests use %clang instead of %clang_cc1. Usually it's best to test the driver in clang/test/Driver using -### (does the driver produce the right -cc1 flags?) and then the compilation in isolation here using %clang_cc1 and a -cc1 command-line. Is it possible to update the tests to do that?

6

I think this will *always* be empty, even if there are warnings or errors, since those get written to stderr (not stdout). You could fix that by putting 2>&1 before the pipe, but see below:

13–14

It's usually much easier to test diagnostics using clangs builtin -verify flag instead of FileCheck. -verify is documented in the header: https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Frontend/VerifyDiagnosticConsumer.h (maybe elsewhere too?).

This gives you good diffs between expected and unexpected diagnostics. In this case, you could replace these CHECK-NOTs with:

// expected-no-diagnostics

Usually you'd put that after the RUN line. When diagnostics are expected, you usually put those comments on the same line where you expect the diagnostic (there are tricks to avoid long lines and to handle other locations; see the docs).

dexonsmith added inline comments.Apr 28 2022, 12:58 PM
clang/test/ExtractAPI/enum.c
11

Diffing against reference output is usually not ideal:

  • It makes it hard to evolve the output, since many unrelated tests could be affected and need an update. In practice, people may just regenerate the output without really knowing if the essential feature is still behaving correctly.
  • It makes it hard to evolve the tests, since line numbers are hardcoded.

Is it possible to update to use FileCheck for the output? You can just check the parts of the output that are relevant for this specific test, embed CHECKs directly into the input file, and use @LINE so that the line numbers tied to the CHECK comments instead of hardcoded.

13–14

As an example I just posted https://reviews.llvm.org/D124634 for this file; PTAL, and then it'd be great if you could update the other tests when you get a chance.

136

(I noticed the diff when adding // expected-no-diagnostics as part of https://reviews.llvm.org/D124634 -- I couldn't add it to the top of the input file, since that disrupted all of the line numbers and caused the diff to fail. An easy workaround in this case was just to move it to the end of the file so I wasn't blocked but it'd be better if the test were able to evolve over time I think.)