This is an archive of the discontinued LLVM Phabricator instance.

[clang-doc] Implement a YAML generator
ClosedPublic

Authored by juliehockett on Feb 22 2018, 6:20 PM.

Details

Summary

Implmenting a YAML generator from the emitted bitcode summary of declarations. Emits one YAML file per declaration information.

For a more detailed overview of the tool, see the design document on the mailing list: RFC: clang-doc proposal

Diff Detail

Event Timeline

juliehockett created this revision.Feb 22 2018, 6:20 PM

Please run Clang-format and Clang-tidy modernize over new code.

clang-doc/generators/Generators.h
29

Unnecessary ; after constructor body. Please enable Clang's -Wextra-semi

30

Please use = default;

47

Unnecessary ; after constructor body.

48

Please use = default;

test/clang-doc/namespace-yaml.cpp
11

Unnecessary ; after function body.

I may be mistaken, but Clang source code didn't use llvm namespace by default. Insetad you should include clang/Basic/LLVM.h and use llvm:: for rest of things.

clang-doc/tool/ClangDocMain.cpp
42

There is no need to enclose static definitions in anonymous namespace.

juliehockett marked 6 inline comments as done.

Updating based on parent changes

Updating to for adjustments to the internal representation & cleaning up duplication.

This comment was removed by Athosvk.
clang-doc/generators/YAMLGenerator.cpp
159

You should easily be able to unify these 'empty' checks

I'm a bit late on this, but I'd say that YAML is usually not a 'final' format. What would be the use-cases for this? And if is meant as an alternative intermediate format, why not instead of having one big file, have one file per input file? Just wondering what the particular motivation for that could be

(Don't mind the previous and inline comment, it was old and I never got to submit it. Differential doesn't let me remove it unfortunately)

I'm a bit late on this, but I'd say that YAML is usually not a 'final' format. What would be the use-cases for this? And if is meant as an alternative intermediate format, why not instead of having one big file, have one file per input file? Just wondering what the particular motivation for that could be

The idea is that it's a generally-consumable output format, and so could be interpreted by external software fairly trivially. The bitsream format is compact and good for things that will live entirely in the clang-doc tool, but is harder to deal with outside that scope. YAML bridges that gap.

I haven't thought too much about splitting it up, but it might make sense, given that the one file could get very large. By file might work--let me play with it a bit to see what makes the most sense.

Athosvk added a comment.EditedApr 10 2018, 7:55 AM

I'm a bit late on this, but I'd say that YAML is usually not a 'final' format. What would be the use-cases for this? And if is meant as an alternative intermediate format, why not instead of having one big file, have one file per input file? Just wondering what the particular motivation for that could be

The idea is that it's a generally-consumable output format, and so could be interpreted by external software fairly trivially. The bitsream format is compact and good for things that will live entirely in the clang-doc tool, but is harder to deal with outside that scope. YAML bridges that gap.

That's what I expected :).

The primary advantage of one file per output to us was granularity. That allows you to do incremental builds and distribute them at this stage (locally over multiple cores, but also remotely if need be).

sammccall edited reviewers, added: ioeric; removed: sammccall.May 15 2018, 7:20 AM
sammccall added a subscriber: sammccall.
ioeric added inline comments.May 17 2018, 2:41 AM
clang-doc/Representation.h
238 ↗(On Diff #140021)

There are two alternatives to avoid this:

  1. Pull the info storage into a separate struct e.g. struct InfoSet::Infos, and add a constructor setter setInfos(Infos &&);
  2. friend yaml::MappingTraits<InfoSet>;
clang-doc/generators/CMakeLists.txt
4

Why is this a separate library?

clang-doc/generators/Generators.h
25

Please add documentation.

27

The parameters are not trivial. Could you add documentation?

27

Passing a reference to a unique pointer seems a bit strange. If Generator doesn't own IS, it should be passed by reference; otherwise, this should be passed by value.

29

Have you considered passing a output stream instead of passing in a directory and writing output to a hardcoded name? And I think Root (or a output stream) should be passed in via generateinstead the constructor.

40

Please add documentation.

50

Please add documentation and explain why this is needed.

50

If you plan to plugin in more generators (e.g. in your customized build of clang-doc), consider using llvm::Registry which allows you to link in new generators without having to modify code. See clang::clangd::URISchemeRegistry for sample usage.

clang-doc/generators/YAMLGenerator.cpp
224

YAML mapping for unique_ptr is a bit unusual. I wonder whether this would work all the time e.g. if the unique_ptr has not been allocated.

251

Note that this would only work on real file system. You would not be able to run this in unit tests, and some tool executors run on virtual file system as well, so if you do go with directory, you would also want to pass in a VFS (clang/Basic/VirtualFileSystem.h).

juliehockett marked 11 inline comments as done.
juliehockett edited the summary of this revision. (Show Details)

Updating for better integration with MR framework, the generator now takes in an info and an output stream and emits the YAML to that stream. Each info is emitted to its own YAML file.

clang-doc/generators/Generators.h
50

Oh cool -- didn't know about that. Thanks!

clang-doc/generators/YAMLGenerator.cpp
224

Mmm yes it's a little weird -- that said, is there a better way emit the info given a pointer?

ioeric added inline comments.Jun 4 2018, 1:33 AM
clang-doc/Generators.h
25 ↗(On Diff #149373)

nit: This seems a bit redundant as the virtual method has been set to 0.

49 ↗(On Diff #149373)

I think this should go into the cpp file, and you might need the anchor source/destination trick to force the plugin to be linked. See example:
https://github.com/llvm-mirror/clang/blob/master/lib/Tooling/StandaloneExecution.cpp#L88
https://github.com/llvm-mirror/clang/blob/master/lib/Tooling/Execution.cpp#L111

YAMLGenerator could potentially also live in cpp file if you don't expect it to be used directly.

clang-doc/generators/YAMLGenerator.cpp
224

Not sure what the initial intention was for using unique_ptr here, but one option is to get rid of the unique_ptr and just store CommentInfo in the structure. Another (less ideal) option is to check whether I is nullptr and allocate when it's null (only when deserialization though; I think you could tell this from IO); and you'd probably want to avoid the allocation when the comment info is actually not present in IO (IIRC, mapping would be called even when the info is not present).

clang-doc/tool/ClangDocMain.cpp
112–113

If you want to fail early when Format is invalid, maybe move it a bit more earlier, e.g. before generating MapOutput?

123

nit: I wonder if you could merge creation of InfoOS into getPath so that the helper function would just return llvm::Expected<raw_fd_ostream>

juliehockett marked 4 inline comments as done.

Addressing comments

ioeric accepted this revision.Jun 5 2018, 1:10 AM

lgtm

clang-doc/Generators.h
25 ↗(On Diff #149373)

Oops, sorry, I only meant the first sentence; the second sentence (" ... expected to be implement and exposed ...") is still useful.

clang-doc/Representation.h
198 ↗(On Diff #149856)

nit: maybe just put comment in a separate line.

clang-doc/YAMLGenerator.cpp
265 ↗(On Diff #149856)

nit: add static?

This revision is now accepted and ready to land.Jun 5 2018, 1:10 AM
juliehockett marked 3 inline comments as done.

Fixing comments

juliehockett added inline comments.Jun 5 2018, 12:06 PM
clang-doc/YAMLGenerator.cpp
265 ↗(On Diff #149856)

static declares it as file-local, and so the extern reference to this in Generators.cpp wouldn't be allowed.

clang-doc/generators/YAMLGenerator.cpp
224

Sorry this slipped through before -- it's a pointer because it has to be able to store itself (the other infos just store the CommentInfo directly, but each CommentInfo can have comment children). The pointer only appears there.

clang-doc/tool/ClangDocMain.cpp
123

You can't, because when you return llvm::make_error it invokes a copy constructor, which is deleted in raw_ostreams.

juliehockett closed this revision.Jun 6 2018, 9:18 AM

Closed in r334103 .