This is an archive of the discontinued LLVM Phabricator instance.

[clang][extract-api] Add global record support
ClosedPublic

Authored by zixuw on Feb 10 2022, 1:44 PM.

Details

Summary

Add facilities for extract-api:

  • Structs/classes to hold collected API information: APIRecord, API
  • Structs/classes for API information:
    • AvailabilityInfo: aggregated availbility information
    • DeclarationFragments: declaration fragments
      • DeclarationFragmentsBuilder: helper class to build declaration fragments for various types/declarations
    • FunctionSignature: function signature
  • Serialization: Serializer
  • Add output file for ExtractAPIAction
  • Refactor clang::RawComment::getFormattedText to provide an additional getFormattedLines for a more detailed view of comment lines used for the SymbolGraph format

Add support for global records (global variables and functions)

  • Add GlobalRecord based on APIRecord to store global records' information
  • Implement VisitVarDecl and VisitFunctionDecl in ExtractAPIVisitor to collect information
  • Implement serialization for global records
  • Add test case for global records

Diff Detail

Event Timeline

zixuw created this revision.Feb 10 2022, 1:44 PM
zixuw requested review of this revision.Feb 10 2022, 1:44 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 10 2022, 1:44 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
zixuw updated this revision to Diff 407695.Feb 10 2022, 3:13 PM

Apply clang-format changes

zixuw updated this revision to Diff 407716.Feb 10 2022, 4:10 PM

Fix a Windows build failure where designated initializers are not available

zixuw updated this revision to Diff 407967.Feb 11 2022, 11:22 AM

Fix a test failure on Windows where the ':' delimiter for sed was taken by the Windows path convention.

zixuw updated this revision to Diff 408027.Feb 11 2022, 1:16 PM

Fix a test failure on Windows where diff doesn't recognize the "-a" option.

zixuw updated this revision to Diff 408673.Feb 14 2022, 5:15 PM

Convert file path to forward slashes for the URI scheme

zixuw updated this revision to Diff 409783.Feb 17 2022, 1:55 PM

Use %/t for path substitution to normalize path separators for file URI

There is a circular dependency if you use -DBUILD_SHARED_LIBS=on:

CMake Error: The inter-target dependency graph contains the following strongly connected component (cycle):
  "clangFrontend" of type SHARED_LIBRARY
    depends on "clangIndex" (weak)
  "clangIndex" of type SHARED_LIBRARY
    depends on "clangFrontend" (weak)
At least one of these targets is not a STATIC_LIBRARY.  Cyclic dependencies are allowed only among static libraries.
CMake Generate step failed.  Build files cannot be regenerated correctly.
FAILED: build.ninja
zixuw added a comment.Feb 23 2022, 9:51 AM

There is a circular dependency if you use -DBUILD_SHARED_LIBS=on:

CMake Error: The inter-target dependency graph contains the following strongly connected component (cycle):
  "clangFrontend" of type SHARED_LIBRARY
    depends on "clangIndex" (weak)
  "clangIndex" of type SHARED_LIBRARY
    depends on "clangFrontend" (weak)
At least one of these targets is not a STATIC_LIBRARY.  Cyclic dependencies are allowed only among static libraries.
CMake Generate step failed.  Build files cannot be regenerated correctly.
FAILED: build.ninja

Huh I pulled in clangIndex to generate USRs, but didn't try building shared libs. I'll see what I can do.
Good find! Thanks for taking a look!

zixuw updated this revision to Diff 413947.Mar 8 2022, 3:08 PM

Move ExtractAPIConsumer which references clangIndex USR generation methods out of clangFrontend to resolve a cycle between clangFrontend and clangIndex when building shared libs.

Herald added a project: Restricted Project. · View Herald TranscriptMar 8 2022, 3:08 PM
zixuw added a comment.Mar 8 2022, 3:10 PM

I've moved things around to resolve the cycle, but sadly it seems that shared lib builds are already broken and I could not test it:

Linking CXX shared library lib/libLLVMTransformUtils.dylib
FAILED: lib/libLLVMTransformUtils.dylib :
 ....
Undefined symbols for architecture x86_64:
  "(anonymous namespace)::MinCostMaxFlow::MinBaseDistance", referenced from:
      (anonymous namespace)::FlowAdjuster::jumpDistance(llvm::FlowJump*) const in SampleProfileInference.cpp.o
zixuw added a comment.Mar 8 2022, 4:02 PM

Wait that's just a current linking issue on main, I suppose not specific to shared lib builds.

zixuw updated this revision to Diff 413967.Mar 8 2022, 4:11 PM

Fix reference to vtable of ExtractAPIAction from clang/FrontendTools/ExecuteCompilerInstance

zixuw updated this revision to Diff 413992.Mar 8 2022, 6:58 PM

Plug clangSymbolGraph into clangFrontendTools

ributzka added inline comments.Mar 9 2022, 9:16 AM
clang/include/clang/SymbolGraph/AvailabilityInfo.h
32

We also need unconditionally unavailable.

dang added a subscriber: dang.Mar 14 2022, 11:33 AM
dang requested changes to this revision.Mar 15 2022, 6:52 AM

You should fix the test to take into account the serializer feedback I left behind

clang/include/clang/SymbolGraph/API.h
34

Might be worth deleting the default constructor.

45

APIRecord isn't an abstract class so can be instantiated. To be strictly compliant with the LLVM-style RTTI you would need an enum case for a plain record. Otherwise you could introduce a pure virtual function to APIRecord to make abstract. A good candidate for that would be the destructor, if you make that pure virtual but provide an empty implementation, the behaviour should be what you want.

clang/lib/AST/RawCommentList.cpp
366

Consider not using a lambda for this since you only do this once. Also consider rewriting the logic as

auto LastChar = S.find_last_not_of("\n");
S.erase(LastChar+1, S.size())
428

Might want to use a SmallString<124> here as it is likely that the comment line would fit into that and the += operator would only operate on stack storage. In the end of the common case (the line fits in 124 chars) you would only get a single heap allocation when you store emplace it into Results.

clang/lib/SymbolGraph/DeclarationFragments.cpp
55
77

ditto "genericParameter"

clang/lib/SymbolGraph/ExtractAPIConsumer.cpp
174

explicit doesn't do anything here since this constructor takes in 2 arguments?

clang/lib/SymbolGraph/Serialization.cpp
124

Should be "occ"

234

".func"

238

".variable"

239

"Global Variable"

253

Currently the spec (https://github.com/apple/swift-docc-symbolkit/blob/main/openapi.yaml) lists 0.3.0 as the version. Let's use that.

259

something more descriptive would be nice here. I think you want the output off getClangFullRepositoryVersion or getClangFullVersion so we have something meaningful to go off for future bugs.

265

As discussed offline we need to pass clang an extra flag to populate this. I suggest --product-name.

284

This is called "declarationFragments" also this is a function only concept as per the spec https://github.com/apple/swift-docc-symbolkit/blob/224736ddcdfcba87ae92bb5c8d74e8332f218f36/openapi.yaml#L311

This revision now requires changes to proceed.Mar 15 2022, 6:52 AM
dang added inline comments.Mar 15 2022, 7:34 AM
clang/lib/SymbolGraph/Serialization.cpp
238

".var" sorry

zixuw updated this revision to Diff 415658.Mar 15 2022, 6:36 PM
  • Address reviews
    • Add and handle unconditionally unavailable
    • Delete the default constructor of APIRecord
    • Make APIRecord abstract for LLVM RTTI
    • Improve getFormattedText and getFormattedLines
    • Fix wordings in serializer to conform to the SymbolGraph format
    • Remove unnecessary explicit keyword
    • Use clang with full version info for the generator field
    • Adjust test case
  • Make it obvious that the USR parameter will be copied in add* APIs
zixuw marked 12 inline comments as done.Mar 15 2022, 6:38 PM
zixuw marked an inline comment as done.Mar 15 2022, 6:49 PM
dang accepted this revision.Mar 16 2022, 4:03 AM

Looks mostly good now.

clang/include/clang/SymbolGraph/API.h
100

This is somewhat counter-intuitive but to decouple this from the small size template parameter you should pass in SmallVectorImpl<char>. Later when you need to construct a StringRef from it to pass it to copy you can do it by doing StringRef(USR.data(), USR.size()). Also probably worth making it a const reference just to indicate to clients it doesn't get modified in the process.

This revision is now accepted and ready to land.Mar 16 2022, 4:03 AM
zixuw updated this revision to Diff 415974.Mar 16 2022, 2:07 PM

Using SmallString/SmallVectorImpl<char> to pass USR into the add* APIs
still feel weird and unintuitive. And it's still not clear to the caller that
Name won't be copied but USR will.
Trying out another approach: still pass both Name and USR using StringRef
and so that the caller will be responsible to keep the string alive. In the
meanwhile expose a recordUSR method in API that wraps the USR generation,
and puts the USR string in persistent allocator memory. This way it is clear to
the caller that it needs to either use recordUSR to get the StringRef USR
needed for the add* calls, or to be responsible for keeping the string alive
if the caller generates USRs on its own.

zixuw marked 5 inline comments as done.Mar 16 2022, 2:53 PM
zixuw added a comment.Mar 16 2022, 3:08 PM

Landing this in a moment

This revision was automatically updated to reflect the committed changes.
zixuw added a comment.Mar 16 2022, 3:26 PM

Fixing bot failure right now

We are seeing test failures after this patch is landed:

Clang :: SymbolGraph/global_record.c
Script:
--
: 'RUN: at line 1';   rm -rf /b/s/w/ir/x/w/staging/llvm_build/tools/clang/test/SymbolGraph/Output/global_record.c.tmp
: 'RUN: at line 2';   split-file /b/s/w/ir/x/w/llvm-llvm-project/clang/test/SymbolGraph/global_record.c /b/s/w/ir/x/w/staging/llvm_build/tools/clang/test/SymbolGraph/Output/global_record.c.tmp
: 'RUN: at line 3';   sed -e "s@INPUT_DIR@/b/s/w/ir/x/w/staging/llvm_build/tools/clang/test/SymbolGraph/Output/global_record.c.tmp@g" /b/s/w/ir/x/w/staging/llvm_build/tools/clang/test/SymbolGraph/Output/global_record.c.tmp/reference.output.json.in >>  /b/s/w/ir/x/w/staging/llvm_build/tools/clang/test/SymbolGraph/Output/global_record.c.tmp/reference.output.json
: 'RUN: at line 5';   /b/s/w/ir/x/w/staging/llvm_build/bin/clang -extract-api -target arm64-apple-macosx  /b/s/w/ir/x/w/staging/llvm_build/tools/clang/test/SymbolGraph/Output/global_record.c.tmp/input.c -o /b/s/w/ir/x/w/staging/llvm_build/tools/clang/test/SymbolGraph/Output/global_record.c.tmp/output.json | /b/s/w/ir/x/w/staging/llvm_build/bin/FileCheck -allow-empty /b/s/w/ir/x/w/llvm-llvm-project/clang/test/SymbolGraph/global_record.c
: 'RUN: at line 7';   sed -e "s@\"generator\": \"clang.*\"@\"generator\": \"clang\"@g"  /b/s/w/ir/x/w/staging/llvm_build/tools/clang/test/SymbolGraph/Output/global_record.c.tmp/output.json >> /b/s/w/ir/x/w/staging/llvm_build/tools/clang/test/SymbolGraph/Output/global_record.c.tmp/output-normalized.json
: 'RUN: at line 9';   diff /b/s/w/ir/x/w/staging/llvm_build/tools/clang/test/SymbolGraph/Output/global_record.c.tmp/reference.output.json /b/s/w/ir/x/w/staging/llvm_build/tools/clang/test/SymbolGraph/Output/global_record.c.tmp/output-normalized.json
--
Exit Code: 1

Command Output (stdout):
--
8c8
<     "generator": "clang"
---
>     "generator": "Fuchsia clang version 15.0.0 (https://llvm.googlesource.com/a/llvm-project a597d6a780b184539f504392168b004bf392a135)"

--

Failed builder task: https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8819495289632137105/overview . Detailed test outputs: https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8819495289632137105/test-results

Could you fix the test please? If it takes a long time to fix, could you revert the change first please?
I think you probably shouldn't simply use diff at L9 in clang/test/SymbolGraph/global_record.c.

zixuw added a comment.Mar 16 2022, 5:04 PM

We are seeing test failures after this patch is landed:

Clang :: SymbolGraph/global_record.c
Script:
--
: 'RUN: at line 1';   rm -rf /b/s/w/ir/x/w/staging/llvm_build/tools/clang/test/SymbolGraph/Output/global_record.c.tmp
: 'RUN: at line 2';   split-file /b/s/w/ir/x/w/llvm-llvm-project/clang/test/SymbolGraph/global_record.c /b/s/w/ir/x/w/staging/llvm_build/tools/clang/test/SymbolGraph/Output/global_record.c.tmp
: 'RUN: at line 3';   sed -e "s@INPUT_DIR@/b/s/w/ir/x/w/staging/llvm_build/tools/clang/test/SymbolGraph/Output/global_record.c.tmp@g" /b/s/w/ir/x/w/staging/llvm_build/tools/clang/test/SymbolGraph/Output/global_record.c.tmp/reference.output.json.in >>  /b/s/w/ir/x/w/staging/llvm_build/tools/clang/test/SymbolGraph/Output/global_record.c.tmp/reference.output.json
: 'RUN: at line 5';   /b/s/w/ir/x/w/staging/llvm_build/bin/clang -extract-api -target arm64-apple-macosx  /b/s/w/ir/x/w/staging/llvm_build/tools/clang/test/SymbolGraph/Output/global_record.c.tmp/input.c -o /b/s/w/ir/x/w/staging/llvm_build/tools/clang/test/SymbolGraph/Output/global_record.c.tmp/output.json | /b/s/w/ir/x/w/staging/llvm_build/bin/FileCheck -allow-empty /b/s/w/ir/x/w/llvm-llvm-project/clang/test/SymbolGraph/global_record.c
: 'RUN: at line 7';   sed -e "s@\"generator\": \"clang.*\"@\"generator\": \"clang\"@g"  /b/s/w/ir/x/w/staging/llvm_build/tools/clang/test/SymbolGraph/Output/global_record.c.tmp/output.json >> /b/s/w/ir/x/w/staging/llvm_build/tools/clang/test/SymbolGraph/Output/global_record.c.tmp/output-normalized.json
: 'RUN: at line 9';   diff /b/s/w/ir/x/w/staging/llvm_build/tools/clang/test/SymbolGraph/Output/global_record.c.tmp/reference.output.json /b/s/w/ir/x/w/staging/llvm_build/tools/clang/test/SymbolGraph/Output/global_record.c.tmp/output-normalized.json
--
Exit Code: 1

Command Output (stdout):
--
8c8
<     "generator": "clang"
---
>     "generator": "Fuchsia clang version 15.0.0 (https://llvm.googlesource.com/a/llvm-project a597d6a780b184539f504392168b004bf392a135)"

--

Failed builder task: https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8819495289632137105/overview . Detailed test outputs: https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8819495289632137105/test-results

Could you fix the test please? If it takes a long time to fix, could you revert the change first please?
I think you probably shouldn't simply use diff at L9 in clang/test/SymbolGraph/global_record.c.

Ah sorry about that, fixing it right now

zixuw added a comment.Mar 16 2022, 5:10 PM

@haowei I got a quick fix of completely purging the generator output in the test case. Could you take a look to have a sanity check before I land the fix on main?

diff --git a/clang/test/SymbolGraph/global_record.c b/clang/test/SymbolGraph/global_record.c
index ba4bf967e630..1e0294cda153 100644
--- a/clang/test/SymbolGraph/global_record.c
+++ b/clang/test/SymbolGraph/global_record.c
@@ -4,7 +4,9 @@
 // RUN: %t/reference.output.json
 // RUN: %clang -extract-api -target arm64-apple-macosx \
 // RUN: %t/input.c -o %t/output.json | FileCheck -allow-empty %s
-// RUN: sed -e "s@\"generator\": \"clang.*\"@\"generator\": \"clang\"@g" \
+
+// Generator version is not consisten across test runs, normalize it.
+// RUN: sed -e "s@\"generator\": \".*\"@\"generator\": \"?\"@g" \
 // RUN: %t/output.json >> %t/output-normalized.json
 // RUN: diff %t/reference.output.json %t/output-normalized.json
 
@@ -32,7 +34,7 @@ char unavailable __attribute__((unavailable));
       "minor": 5,
       "patch": 3
     },
-    "generator": "clang"
+    "generator": "?"
   },
   "module": {
     "name": "",

@haowei I got a quick fix of completely purging the generator output in the test case. Could you take a look to have a sanity check before I land the fix on main?

diff --git a/clang/test/SymbolGraph/global_record.c b/clang/test/SymbolGraph/global_record.c
index ba4bf967e630..1e0294cda153 100644
--- a/clang/test/SymbolGraph/global_record.c
+++ b/clang/test/SymbolGraph/global_record.c
@@ -4,7 +4,9 @@
 // RUN: %t/reference.output.json
 // RUN: %clang -extract-api -target arm64-apple-macosx \
 // RUN: %t/input.c -o %t/output.json | FileCheck -allow-empty %s
-// RUN: sed -e "s@\"generator\": \"clang.*\"@\"generator\": \"clang\"@g" \
+
+// Generator version is not consisten across test runs, normalize it.
+// RUN: sed -e "s@\"generator\": \".*\"@\"generator\": \"?\"@g" \
 // RUN: %t/output.json >> %t/output-normalized.json
 // RUN: diff %t/reference.output.json %t/output-normalized.json
 
@@ -32,7 +34,7 @@ char unavailable __attribute__((unavailable));
       "minor": 5,
       "patch": 3
     },
-    "generator": "clang"
+    "generator": "?"
   },
   "module": {
     "name": "",

It passed the test locally on my machine.

Hello,

Did you see this sanitizer error when running global_record.c?

https://lab.llvm.org/buildbot/#/builders/5/builds/20765/steps/13/logs/stdio

zixuw added a comment.Mar 17 2022, 9:43 AM

Hello,

Did you see this sanitizer error when running global_record.c?

https://lab.llvm.org/buildbot/#/builders/5/builds/20765/steps/13/logs/stdio

Hi! Taking a look right now

Hello,

Did you see this sanitizer error when running global_record.c?

https://lab.llvm.org/buildbot/#/builders/5/builds/20765/steps/13/logs/stdio

Hi @uabelho, I've disabled the test for now in 54b145d5cac2b008380828e8f67c439038ea370b.

Hello,

Did you see this sanitizer error when running global_record.c?

https://lab.llvm.org/buildbot/#/builders/5/builds/20765/steps/13/logs/stdio

Hi @uabelho, I've disabled the test for now in 54b145d5cac2b008380828e8f67c439038ea370b.

Thanks!

zixuw added a comment.Mar 21 2022, 9:41 AM

Hello,

Did you see this sanitizer error when running global_record.c?

https://lab.llvm.org/buildbot/#/builders/5/builds/20765/steps/13/logs/stdio

@dang fixed this in fc3537697db7724834d8071cfee10cacceb9fc2a and flipped the test back on. Thanks Daniel!

clang/lib/CMakeLists.txt