This is an archive of the discontinued LLVM Phabricator instance.

Differential D41102

Setup clang-doc frontend framework
ClosedPublic

Authored by juliehockett on Dec 11 2017, 5:36 PM.

Details

Reviewers

klimek
jakehehrlich
sammccall
lebedev.ri

Commits

Summary

Setting up the mapper part of the frontend framework for a clang-doc tool. It creates a series of relevant matchers for declarations, and uses the ToolExecutor to traverse the AST and extract the matching declarations and comments. The mapper serializes the extracted information to individual records for reducing and eventually doc generation.

For a more detailed overview of the tool, see the design document on the mailing list: RFC: clang-doc proposal

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden.

Refactoring bitcode writer

Next, i suggest looking into code self-debugging, see comments.
Also, i have added a few questions; it would be great to know whether my understanding is correct.

I'm sorry that it seems like we are going over and over the same code again,
this is the very base of the tool, i think it is important to get it as close to great as possible.
I *think* these review comments move it in that direction, not in the opposite direction?

clang-doc/BitcodeWriter.cpp
47 ↗	(On Diff #135559)	So in other words this is making an assumption that no file with more than 65535 lines will be analyzed, correct? Can you add that as comment please?
56 ↗	(On Diff #135559)	AbbrevDsc Abbrev = nullptr;
57 ↗	(On Diff #135559)	// Is this 'description' valid?
operator bool() const {
  return Abbrev != nullptr && Name.data() != nullptr && !Name.empty();
}
137 ↗	(On Diff #135559)	So `FUNCTION_MANGLED_NAME` is phased out, and is thus missing, as far as i understand?
148 ↗	(On Diff #135559)	+`assert(RecordIdNameMap[ID] && "Unknown Abbreviation");`
153 ↗	(On Diff #135559)	+`assert(RecordIdNameMap[ID] && "Unknown Abbreviation");`
158 ↗	(On Diff #135559)	Called only once, and that call does nothing. I'd drop it.
175 ↗	(On Diff #135559)	/// \brief Emits a block ID and the block name to the BLOCKINFO block.
void ClangDocBitcodeWriter::emitBlockID(BlockId ID) {
  const auto &BlockIdName = BlockIdNameMap[ID];
  assert(BlockIdName.data() && BlockIdName.size() && "Unknown BlockId!");
  Record.clear();
  Record.push_back(ID);
  Stream.EmitRecord(llvm::bitc::BLOCKINFO_CODE_SETBID, Record);
  Record.clear();
  for (const char C : BlockIdName)
    Record.push_back(C);
  Stream.EmitRecord(llvm::bitc::BLOCKINFO_CODE_BLOCKNAME, Record);
}
187 ↗	(On Diff #135559)	/// \brief Emits a record name to the BLOCKINFO block.
void ClangDocBitcodeWriter::emitRecordID(RecordId ID) {
  assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
  prepRecordData(ID);
(Yes, `prepRecordData()` will have the same code. It should get optimized away.)
194 ↗	(On Diff #135559)	void ClangDocBitcodeWriter::emitAbbrev(RecordId ID, BlockId Block) {
  assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
  auto Abbrev = std::make_shared<BitCodeAbbrev>();
204 ↗	(On Diff #135559)	So remember that in a previous iteration, seemingly useless `AbbrevDsc` stuff was added to the `RecordIdNameMap`? It is going to pay off now:
void ClangDocBitcodeWriter::emitRecord(StringRef Str, RecordId ID) {
  assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
  assert(RecordIdNameMap[ID].Abbrev == &StringAbbrev && "Abbrev type mismatch");
  if (!prepRecordData(ID, !Str.empty()))
    return;
  ...
And if we did not add a `RecordIdNameMap` entry for this `RecordId`, then i believe that will also be detected because `Abbrev` will be a `nullptr`.
205 ↗	(On Diff #135559)	assert(Str.size() < (1U << BitCodeConstants::StringLengthSize));
Record.push_back(Str.size());
210 ↗	(On Diff #135559)	void ClangDocBitcodeWriter::emitRecord(const Location &Loc, RecordId ID) {
  assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
  assert(RecordIdNameMap[ID].Abbrev == &LocationAbbrev && "Abbrev type mismatch");
  if (!prepRecordData(ID, !OmitFilenames))
    return;
  ...
211 ↗	(On Diff #135559)	Call me paranoid, but:
assert(Loc.LineNumber < (1U << BitCodeConstants::LineNumberSize));
Record.push_back(Loc.LineNumber);
assert(Loc.Filename.size() < (1U << BitCodeConstants::StringLengthSize));
Record.push_back(Loc.Filename.size());
217 ↗	(On Diff #135559)	void ClangDocBitcodeWriter::emitRecord(int Val, RecordId ID) {
  assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
  assert(RecordIdNameMap[ID].Abbrev == &IntAbbrev && "Abbrev type mismatch");
  if (!prepRecordData(ID, Val))
    return;
218 ↗	(On Diff #135559)	assert(Val < (1U << BitCodeConstants::IntSize));
Record.push_back(Val);
222 ↗	(On Diff #135559)	bool ClangDocBitcodeWriter::prepRecordData(RecordId ID, bool ShouldEmit) {
  assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
  if (!ShouldEmit)
    return false;
232 ↗	(On Diff #135559)	Since `ClangDocBitcodeWriter` is not re-used, but re-constructed* each time, `Abbrevs.clear();` does nothing. Hmm, i wonder if that will be a bad thing. Benchmarking will tell i guess :/
236 ↗	(On Diff #135559)	https://godbolt.org/g/rD6BWK also suggests it should be `static const`
276 ↗	(On Diff #135559)	Uhm, do you plan on calling `emitBlockInfo()` from anywhere else other than `emitBlockInfoBlock()`? Since it takes `const std::vector<RecordId>` instead of a `const std::initializer_list<RecordId>&`, a memory copy will happen... https://godbolt.org/g/rD6BWK
clang-doc/BitcodeWriter.h
35 ↗	(On Diff #135559)	`LineNumFixedSize` is used for different things. Given such a specific name, i think it may be confusing? Also, looking at http://llvm.org/doxygen/classllvm_1_1BitstreamWriter.html#ae6a40b4a5ea89bb8b5076c26e0d0b638 i guess these all should be `unsigned`. I think this would be better, albeit more verbose:
struct BitCodeConstants {
  static constexpr unsigned SignatureBitSize = 8U;
  static constexpr unsigned SubblockIDSize = 5U;
  static constexpr unsigned IntSize = 16U;
  static constexpr unsigned StringLengthSize = 16U;
  static constexpr unsigned LineNumberSize = 16U;
};
53 ↗	(On Diff #135559)	So what exactly does `BitCodeConstants::SubblockIDSize` mean? static_assert(BI_LAST < (1U << BitCodeConstants::SubblockIDSize), "Too many block id's!"); ?
94 ↗	(On Diff #135559)	So i have a question: if something (`FUNCTION_MANGLED_NAME` in this case) is phased out, does it have to stay in this enum? That will introduce holes in `RecordIdNameMap`. Are the actual numerical id's of enumerators stored in the bitcode, or the string (abbrev, `RecordIdNameMap[].Name`)? Looking at tests, i guess these enums are internal detail, and they can be changed freely, including removing enumerators. Am i wrong? I think that should be explained in a comment before this `enum`.
100 ↗	(On Diff #135559)	If `AbbreviationMap` comment makes sense, i guess that common code should be moved here, i.e. static constexpr unsigned RecordIdCount = RI_LAST - RI_FIRST + 1; and use this new variable in those two places.
163 ↗	(On Diff #135559)	We know we will have at most `RI_LAST - RI_FIRST + 1` abbreviations. Right now that results in just ~40 abbreviations. Would it make sense to AbbreviationMap() : Abbrevs(RI_LAST - RI_FIRST + 1) {} ? (or `llvm::DenseMap<unsigned, unsigned> Abbrevs = llvm::DenseMap<unsigned, unsigned>(RI_LAST - RI_FIRST + 1);` but that looks uglier to me..)
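
The idea of defining the record count once and sizing tables from it can be sketched in isolation. This is only an illustration under assumptions: the enumerator list is shortened, `FUNCTION_USR` and `FUNCTION_NAME` are hypothetical entries, and `AbbreviationTable` uses a plain `std::array` as a stand-in for the pre-sized `llvm::DenseMap` discussed above. Only `RI_FIRST`, `RI_LAST`, and `RecordIdCount` are names from the review.

```cpp
#include <array>
#include <cassert>

// Hypothetical, shortened stand-in for the RecordId enum in BitcodeWriter.h.
enum RecordId {
  VERSION = 1,
  FUNCTION_USR,       // hypothetical enumerator
  FUNCTION_NAME,      // hypothetical enumerator
  FUNCTION_IS_METHOD,
  RI_FIRST = VERSION,
  RI_LAST = FUNCTION_IS_METHOD
};

// Single definition of the count, reused wherever a table is sized.
static constexpr unsigned RecordIdCount = RI_LAST - RI_FIRST + 1;

static_assert(RecordIdCount == 4, "enum and count are out of sync");

// Dense table indexed by (ID - RI_FIRST); a stand-in for the abbreviation map.
struct AbbreviationTable {
  std::array<unsigned, RecordIdCount> Abbrevs{};

  void add(RecordId RID, unsigned AbbrevID) {
    assert(RID >= RI_FIRST && RID <= RI_LAST && "Unknown RecordId");
    Abbrevs[RID - RI_FIRST] = AbbrevID;
  }

  unsigned get(RecordId RID) const {
    assert(RID >= RI_FIRST && RID <= RI_LAST && "Unknown RecordId");
    return Abbrevs[RID - RI_FIRST];
  }
};
```

Because the size is a compile-time constant, removing an enumerator changes `RecordIdCount` everywhere at once, and any table that relied on the old value fails the `static_assert` instead of silently leaving a hole.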

The change to USR seems like quite an improvement already! That being said, I do think that it might be preferable to opt out of the use of strings for linking things together. What we did with our clang-doc is that we directly used pointers to refer to other types. So for example, our class for storing Record/CXX related information has something like:

std::vector<Function*>	mMethods;
std::vector<Variable*>	mVariables;
std::vector<Enum*>	mEnums;
std::vector<Typedef*>	mTypedefs;

Only upon serialization we fetch some kind of USR that would uniquely identify the type. This is especially useful to us for the conversion to HTML and I think the same would go for this backend, as it seems this way you'll have to do string lookups to get to the actual types, which would be inefficient in multiple aspects. It can make the backend a little more of a one-on-one conversion, e.g. with one of our HTML template definitions (note: this is a Jinja2 template in Python):

{%- for enum in inEntry.GetMemberEnums() -%}
	<tr class="separator">
		<td class="memSeparator" colspan="3"></td>
	</tr>
	<tr class="memitem:EAllocatorStrategy">
		<td class="memItemLeft" align="right">{{- Modifiers.RenderAccessModifier(enum.GetAccessModifier()) -}}</td>
		<td class="memItemMiddle" align="left">enum <a href="{{ enum.GetID() }}.html">{{- enum.GetName().GetName()|e -}}</a></td>
		<td class="memItemRight" valign="bottom">{{- Descriptions.RenderDescription(enum.GetBriefDescription()) -}}</td>
	</tr>
{%- endfor -%}

The disadvantage is of course that you add complexity to certain parts of the deserialization (/serialization) for nested types and inheritance, by either having to do so in the correct order or having to defer the process of initializing these pointers. But see this as just some thought sharing. I do think this would improve the interaction in the backend (assuming you use the same representation as currently in the frontend). Also, we didn't apply this to our Type representation (which we use to store the type of a member, parameter etc.), which stores the name of the type rather than a pointer to it (since it can also be a built-in), though it embeds pretty much every possible modifier on said type, like this:

EntryName				mName;
bool					mIsConst = false;
EReferenceType				mReferenceType = EReferenceType::None;
std::vector<bool>			mPointerConstnessMask;
std::vector<std::string>		mArraySizes;
bool					mIsAtomic = false;
std::vector<Attribute>			mAttributes;
bool					mIsExpansion = false;
std::vector<TemplateArgument>		mTemplateArguments;
std::unique_ptr<FunctionTypeProperties>	mFunctionTypeProperties = nullptr;
EntryName				mParentCXXEntry;

The last member refers to the case where a pointer is a pointer to member, though some other fields may require some explaining too. Anyway, this is just to give some insight into how we structured our representation, where we largely omitted string representations where possible.

Have you actually started work already on some backend? Developing backend and frontend in tandem can provide some additional insights as to how things should be structured, especially representation-wise!

clang-doc/Representation.h
113 ↗	(On Diff #135559)	How come these are actually unique ptrs? They can be stored directly in the vector, right? (same for CommentInfo children, FunctionInfo params etc.)

Please run Clang-format and Clang-tidy modernize.

clang-doc/Representation.h
80 ↗	(On Diff #135559)	Please separate constructors from data members with empty line.

Continued refactoring the bitcode writer
Added a USR attribute to infos
Created a Reference struct to replace the string references to other infos

In D41102#1017499, @Athosvk wrote:

The disadvantage is of course that you add complexity to certain parts of the deserialization (/serialization) for nested types and inheritance, by either having to do so in the correct order or having to defer the process of initializing these pointers. But see this as just some thought sharing. I do think this would improve the interaction in the backend (assuming you use the same representation as currently in the frontend).

I agree that the pointer approach would be much more efficient on the backend, but the issue here is that the mapper has no idea where the representation of anything other than the decl it's currently looking at will be, since it sees each decl and serializes it immediately. The reducer, on the other hand, will be able to see everything, and so such pointers could be added as a pass over the final reduced data structure.
So, as an idea (as this diff implements), I updated the string references to be a struct, which holds the USR of the referenced type (for serialization, both here in the mapper and for the dump option in the reducer), as well as a pointer to an Info struct. This pointer is not used at this point, but would be populated by the reducer. Thoughts?
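
To make the proposal concrete, here is a minimal sketch of a reference that carries both a USR and a pointer, plus a resolving pass of the kind the reducer could run. It is not the patch's code: the `Info` layout, the `Members` field, and `resolveReferences` are hypothetical, chosen only to show the two-phase idea (serialize by USR, link by pointer later).

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

struct Info; // forward declaration

// A reference to another Info. The USR is what gets serialized; the pointer
// is left null by the mapper and filled in later.
struct Reference {
  std::string USR;
  Info *Ptr = nullptr;
};

// Hypothetical, minimal Info: an identifier plus outgoing references.
struct Info {
  std::string USR;
  std::string Name;
  std::vector<Reference> Members;
};

// Hypothetical reducer pass: index every Info by USR, then resolve pointers.
// Returns the number of references that could not be resolved.
// The stored pointers stay valid only while Infos is not resized.
inline unsigned resolveReferences(std::vector<Info> &Infos) {
  std::unordered_map<std::string, Info *> ByUSR;
  for (Info &I : Infos)
    ByUSR[I.USR] = &I;

  unsigned Unresolved = 0;
  for (Info &I : Infos) {
    for (Reference &R : I.Members) {
      auto It = ByUSR.find(R.USR);
      if (It == ByUSR.end()) {
        ++Unresolved;
        continue;
      }
      R.Ptr = It->second;
    }
  }
  return Unresolved;
}
```

The string lookup happens once per reference during this pass; a backend that runs afterwards can follow `Ptr` directly, which is the efficiency gain argued for above.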

Have you actually started work already on some backend? Developing backend and frontend in tandem can provide some additional insights as to how things should be structured, especially representation-wise!

I added you as a subscriber on the follow-up patches (the reducer, YAML/MD formats) -- would love to hear your thoughts! As of now, the MD output is very rough, but I'm hoping to keep moving forward on that in the next few days.

clang-doc/BitcodeWriter.h
53 ↗	(On Diff #135559)	It's the current abbrev id width for the block (described here), so it's the max id width for the block's abbrevs.
94 ↗	(On Diff #135559)	Yes, the enum is an implementation detail (`FUNCTION_MANGLED_NAME` should have been removed earlier). I'll put the comment describing how it works!

Fixing CMakeLists formatting

Could you please add a bit more tests? In particular, i'd like to see how blocks-in-blocks work.
I.e. class-in-class, class-in-function, ...

Is there some (internal to BitstreamWriter) logic that would 'assert()' if trying to output some recordid
which is, according to the BLOCKINFO_BLOCK, should not be there?
E.g. outputting VERSION in BI_COMMENT_BLOCK_ID?

clang-doc/BitcodeWriter.cpp
30 ↗	(On Diff #135682)	Ok, these three functions still look off, how about this?
// Yes, not by reference, https://godbolt.org/g/T52Vcj
static void AbbrevGen(std::shared_ptr<llvm::BitCodeAbbrev> &Abbrev,
                      const std::initializer_list<llvm::BitCodeAbbrevOp> Ops) {
  for (const auto &Op : Ops)
    Abbrev->Add(Op);
}
static void IntAbbrev(std::shared_ptr<llvm::BitCodeAbbrev> &Abbrev) {
  AbbrevGen(Abbrev, {
      // 0. Fixed-size integer
      {llvm::BitCodeAbbrevOp::Fixed, BitCodeConstants::IntSize}});
}
static void StringAbbrev(std::shared_ptr<llvm::BitCodeAbbrev> &Abbrev) {
  AbbrevGen(Abbrev, {
      // 0. Fixed-size integer (length of the following string)
      {llvm::BitCodeAbbrevOp::Fixed, BitCodeConstants::StringLengthSize},
      // 1. The string blob
      {llvm::BitCodeAbbrevOp::Blob}});
}
// Assumes that the file will not have more than 65535 lines.
static void LocationAbbrev(std::shared_ptr<llvm::BitCodeAbbrev> &Abbrev) {
  AbbrevGen(Abbrev, {
      // 0. Fixed-size integer (line number)
      {llvm::BitCodeAbbrevOp::Fixed, BitCodeConstants::LineNumberSize},
      // 1. Fixed-size integer (length of the following string (filename))
      {llvm::BitCodeAbbrevOp::Fixed, BitCodeConstants::StringLengthSize},
      // 2. the string blob
      {llvm::BitCodeAbbrevOp::Blob}});
}
Though i bet clang-format will mess-up the formatting again :/
108 ↗	(On Diff #135682)	Some of these `IntAbbrev`'s are actually `bool`s. Would it make sense to already think about being bitcode-size-conservative and introduce `BoolAbbrev` from the get go?
static void BoolAbbrev(std::shared_ptr<llvm::BitCodeAbbrev> &Abbrev) {
  AbbrevGen(Abbrev, {
      // 0. Fixed-size boolean
      {llvm::BitCodeAbbrevOp::Fixed, BitCodeConstants::BoolSize}});
}
where `BitCodeConstants::BoolSize` = `1U` ? Or is there some internal padding that would make that pointless?
156 ↗	(On Diff #135682)	Uh, oh, i'm sorry, all(?) these `"Unknown Abbreviation"` are likely copypaste gone wrong. I'm not sure why i wrote that comment. `"Unknown RecordId"` might make more sense?
240 ↗	(On Diff #135682)	Ok, now that i think about it, it can't be that easy. Maybe
// FIXME: assumes 8 bits per byte
assert(llvm::APInt(8U * sizeof(Val), Val, /*isSigned=*/true).getBitWidth() <= BitCodeConstants::IntSize);
Not sure whether `getBitWidth()` is really the right function to ask though. (Not sure how this all works for negative numbers)
clang-doc/BitcodeWriter.h
53 ↗	(On Diff #135559)	So in other words that `static_assert()` is doing the right thing? Add it after the `enum BlockId{}` then please, will both document things, and ensure that things remain in a sane state.
172 ↗	(On Diff #135682)	Newline after constructor
216 ↗	(On Diff #135682)	`// Emission of appropriate abbreviation type`

Thank you for working on this!
Some more thoughts.

clang-doc/BitcodeWriter.cpp
191 ↗	(On Diff #135682)	Why do we have this indirection? Is there a need to first (inefficiently?) copy to `Record`, and then emit from there? Wouldn't this work just as well?
Record.clear();
Stream.EmitRecord(llvm::bitc::BLOCKINFO_CODE_BLOCKNAME, BlockIdNameMap[ID]);
196 ↗	(On Diff #135682)	Hmm, so i've been staring at this and http://llvm.org/doxygen/classllvm_1_1BitstreamWriter.html and i must say i'm not fond of this indirection. What i don't understand is, in the previous function, we don't store `BlockId`, so why do we want to store `RecordId`? Aren't they both unstable, and an implementation detail? Do we want to store it (`RecordId`)? If yes, please explain it as a new comment in code. If no, i guess this would work too?
assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
Record.clear();
Stream.EmitRecord(llvm::bitc::BLOCKINFO_CODE_SETRECORDNAME, RecordIdNameMap[ID].Name);
And after that you can lower the default size of `SmallVector<> Record` down to, hm, `4`?
clang-doc/BitcodeWriter.h
161 ↗	(On Diff #135682)	This alias is used exactly once, for `Record` member variable in this class. Is there any point in having this alias?
161 ↗	(On Diff #135682)	Also, why is `uint64_t` used? We either push `char`, or `enum`, or `int`. Do we ever need 64-bit?
clang-doc/ClangDoc.h
47 ↗	(On Diff #135682)	Please add space before `{}`, and drop unneeded `;`
clang-doc/Mapper.h
56 ↗	(On Diff #135682)	`ClangDocMapper` class is starting to look like a god-class. I would recommend:
- Rename `ClangDocMapper` to `ClangDocASTVisitor`. It's kind-of conventional to name `RecursiveASTVisitor`-based classes like that.
- Move `ClangDocCommentVisitor` out of the `ClangDocMapper`, into `namespace {}` in `clang-doc/Mapper.cpp`
- Split `ClangDocSerializer` into new .h/.cpp
- Replace `ClangDocSerializer Serializer;` with `ClangDocSerializer& Serializer;`
- Instantiate `ClangDocSerializer` (in `MapperActionFactory`, i think?) before `ClangDocMapper`
- Pass `ClangDocSerializer&` into `ClangDocMapper` ctor.

lebedev.ri mentioned this in D43779: [Tooling] [0/1] Refactor FrontendActionFactory::create() to return std::unique_ptr<>. Feb 26 2018, 12:47 PM

Moved the serialization logic out of the Mapper class and into its own namespace
Updated tests
Addressing comments

In D41102#1017918, @lebedev.ri wrote:

Is there some (internal to BitstreamWriter) logic that would 'assert()' if trying to output some recordid
which is, according to the BLOCKINFO_BLOCK, should not be there?
E.g. outputting VERSION in BI_COMMENT_BLOCK_ID?

Yes -- it will fail an assertion:
Assertion 'V == Op.getLiteralValue() && "Invalid abbrev for record!"' failed.

clang-doc/BitcodeWriter.cpp
191 ↗	(On Diff #135682)	No, since `BlockIdNameMap[ID]` returns a `StringRef`, which can be manipulated into an `std::string` or a `const char*`, but the `Stream` wants an `unsigned char`. So, the copying is to satisfy that. Unless there's a better way to convert a `StringRef` into an array of `unsigned char`?
196 ↗	(On Diff #135682)	I'm not entirely certain what you mean -- in `emitBlockId()`, we are storing both the block id and the block name in separate records (`BLOCKINFO_CODE_SETBID`, `BLOCKINFO_CODE_BLOCKNAME`, respectively). In `emitRecordId()`, we're doing something slightly different, in that we emit one record with both the record id and the record name (in record `BLOCKINFO_CODE_SETRECORDNAME`). Replacing the copy loop here has the same issue as above, namely that there isn't an easy way to convert between a `StringRef` and an array of `unsigned char`.
240 ↗	(On Diff #135682)	That assertion fails :/ I could do something like `static_cast<int64_t>(Val) == Val` but that a) would require IntSize being a power of 2, b) would require updating the assert anytime IntSize is updated, and c) still throws a warning about comparing a signed to an unsigned int...
clang-doc/BitcodeWriter.h
53 ↗	(On Diff #135559)	No...it's the (max) number of the abbrevs relevant to the block itself, which is to say some subset of the RecordIds for any given block (e.g. for a `BI_COMMENT_BLOCK`, the number of abbrevs would be 12 and so on the abbrev width would be 4). To assert for it we could put block start/end markers on the RecordIds and then use that to calculate the bitwidth, if you think the assertion should be there.
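
The arithmetic behind "12 abbrevs, so the abbrev width would be 4" can be sketched as follows. This assumes the LLVM bitstream convention that abbreviation IDs 0 through 3 are reserved and application-defined abbreviations start at 4; `bitsNeeded` and `abbrevWidthFor` are hypothetical helper names, not part of the patch or of `BitstreamWriter`.

```cpp
#include <cassert>

// Abbreviation IDs 0-3 are reserved by the LLVM bitstream format
// (END_BLOCK, ENTER_SUBBLOCK, DEFINE_ABBREV, UNABBREV_RECORD), so
// application-defined abbreviations start at 4.
constexpr unsigned FirstApplicationAbbrev = 4;

// Smallest number of bits that can represent every value in [0, MaxValue].
constexpr unsigned bitsNeeded(unsigned MaxValue) {
  return MaxValue < 2 ? 1 : 1 + bitsNeeded(MaxValue >> 1);
}

// Abbrev ID width needed by a block that defines NumAbbrevs abbreviations.
constexpr unsigned abbrevWidthFor(unsigned NumAbbrevs) {
  return bitsNeeded(FirstApplicationAbbrev + NumAbbrevs - 1);
}

static_assert(abbrevWidthFor(12) == 4, "12 abbrevs use IDs 4..15");
static_assert(abbrevWidthFor(13) == 5, "a 13th abbrev needs ID 16");
```

With per-block start/end markers on the RecordIds, as suggested above, the count passed to such a helper could be derived from the enum rather than maintained by hand.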

Diffusion mentioned this in rC326201: [Tooling] [0/1] Refactor FrontendActionFactory::create() to return std…. Feb 27 2018, 7:22 AM

Diffusion mentioned this in rL326201: [Tooling] [0/1] Refactor FrontendActionFactory::create() to return std….

Tried fixing tooling::FrontendActionFactory::create() in D43779/D43780, but had to revert due to gcc4.8 issues :/

Thank you for working on this, some more review notes.

In D41102#1020107, @juliehockett wrote:

In D41102#1017918, @lebedev.ri wrote:

Is there some (internal to BitstreamWriter) logic that would 'assert()' if trying to output some recordid
which is, according to the BLOCKINFO_BLOCK, should not be there?
E.g. outputting VERSION in BI_COMMENT_BLOCK_ID?

Yes -- it will fail an assertion:
Assertion 'V == Op.getLiteralValue() && "Invalid abbrev for record!"' failed.

Ok, great.
And it will also complain if you try to output a block within block?

clang-doc/BitcodeWriter.cpp
191 ↗	(On Diff #135682)	Aha, i see, did not think of that. But there is a `bytes()` function in `StringRef`, which returns `iterator_range<const unsigned char *>`. Would it help? http://llvm.org/doxygen/classllvm_1_1StringRef.html#a5e8f22c3553e341404b445430a3b075b
240 ↗	(On Diff #135682)	I see. Let's not have this assertion for now, just a `FIXME`.
184 ↗	(On Diff #136010)	That comment seems wrong. If the namespace is indeed supposed to be closed, it should happen after the lambda is called, i.e. assert(RecordIdNameMap.size() == RecordIdCount); return RecordIdNameMap; }(); } // namespace doc // AbbreviationMap
265 ↗	(On Diff #136010)	I think it is as simple as assert(Loc.LineNumber < (1U << BitCodeConstants::LineNumberSize)); ?
367 ↗	(On Diff #136010)	So i guess this should be:
void ClangDocBitcodeWriter::emitBlockInfo(
    BlockId BID, const std::initializer_list<RecordId> &RIDs) {
  assert(RIDs.size() < (1U << BitCodeConstants::SubblockIDSize) &&
         "Too many records in a block!");
  emitBlockID(BID);
  ...
?
clang-doc/BitcodeWriter.h
53 ↗	(On Diff #135559)	Aha, i see, so that should go into `ClangDocBitcodeWriter::emitBlockInfoBlock()`, since that already has that info. (On a related note, it feels like this all should be somehow tablegen-generated, but that is for some later, post-commit cleanup.)

Fixing comments

In D41102#1020808, @lebedev.ri wrote:

Ok, great.
And it will also complain if you try to output a block within block?

Um...no. Since you can have subblocks within blocks.

clang-doc/BitcodeWriter.cpp
191 ↗	(On Diff #135682)	Replaced it with an ArrayRef to the `bytes_begin()` and `bytes_end()`, but that only works for the block id, not the record id, since `emitRecordId()` also has to emit the ID number in addition to the name in the same record.
265 ↗	(On Diff #136010)	`LineNumber` is a signed int, so the compiler complains that we're comparing signed and unsigned ints.
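
One way to keep a range check on a signed value without tripping the signed/unsigned comparison warning is to reject negatives first and compare as unsigned afterwards. The sketch below is a suggestion, not what the patch does; `fitsInUnsignedBits` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if Val is non-negative and representable in NumBits unsigned
// bits. The negative case is rejected first, so the cast to uint64_t is safe
// and the final comparison is unsigned against unsigned.
// Any non-negative int64_t fits in 63 bits, hence the NumBits >= 63 shortcut,
// which also keeps the shift amount in range.
constexpr bool fitsInUnsignedBits(int64_t Val, unsigned NumBits) {
  return Val >= 0 &&
         (NumBits >= 63 ||
          static_cast<uint64_t>(Val) < (uint64_t{1} << NumBits));
}

// With a 16-bit line number field, 65535 is the largest value that fits.
static_assert(fitsInUnsignedBits(65535, 16), "max 16-bit value fits");
static_assert(!fitsInUnsignedBits(65536, 16), "one past the max does not");
```

A call such as `assert(fitsInUnsignedBits(Loc.LineNumber, BitCodeConstants::LineNumberSize));` would then express the intended bound, assuming `LineNumber` converts to `int64_t`.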

lebedev.ri added inline comments. Feb 28 2018, 7:23 AM

clang-doc/BitcodeWriter.h
37 ↗	(On Diff #136161)	Hmm, you build with asserts enabled, right?
I tried testing this, and three tests fail with

clang-doc: /build/llvm/include/llvm/Bitcode/BitstreamWriter.h:122: void llvm::BitstreamWriter::Emit(uint32_t, unsigned int): Assertion `(Val & ~(~0U >> (32-NumBits))) == 0 && "High bits set!"' failed.

Failing Tests (3):
    Clang Tools :: clang-doc/mapper-class-in-function.cpp
    Clang Tools :: clang-doc/mapper-function.cpp
    Clang Tools :: clang-doc/mapper-method.cpp

  Expected Passes    : 6
  Unexpected Failures: 3

At least one failure is because of BoolSize, so i'd suspect the assertion itself is wrong...
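
For readers unfamiliar with the "High bits set!" check, the condition in the quoted assertion can be reproduced on its own. This is a standalone sketch of the same expression, not LLVM code; `highBitsClear` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the condition in the quoted assertion: true when Val has no bits
// set above its low NumBits bits. Valid for 1 <= NumBits <= 32.
inline bool highBitsClear(uint32_t Val, unsigned NumBits) {
  assert(NumBits >= 1 && NumBits <= 32 && "shift amount out of range");
  return (Val & ~(~0U >> (32 - NumBits))) == 0;
}
```

With a field width of 1, only the values 0 and 1 pass, so `highBitsClear(2, 1)` is false. Any value wider than its declared field, for whatever reason, fails in the same way.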

Running clang-format and fixing newlines

clang-doc/BitcodeWriter.h
37 ↗	(On Diff #136161)	I do, and I've definitely seen that one triggered before but it's been because something was off in how the data was being outputted as I was shifting things around. That said, I'm not seeing it in my local build with this diff though -- I'll update it again just to make sure they're in sync.

Thank you for working on this!
Some more review notes.
Please look into adding a bit more tests.

clang-doc/BitcodeWriter.cpp
179 ↗	(On Diff #136303)	Since this is the only string we ever push to `Record`, can we add an assertion to make sure we always have enough room for it? E.g.
for (const auto &Init : Inits) {
  RecordId RID = Init.first;
  RecordIdNameMap[RID] = Init.second;
  assert((1 + RecordIdNameMap[RID].size()) <= Record.size());
  // Since record was just created, it should not have any dynamic size.
  // Or move the small size into a variable and use it when declaring the Record and here.
}
230 ↗	(On Diff #136303)	Sadly, i can not prove it via godbolt (can't add LLVM as library), but i'd expect streamlining this should at least not hurt, i.e. something like Record.append(RecordIdNameMap[ID].Name.begin(), RecordIdNameMap[ID].Name.end()); ?
196 ↗	(On Diff #135682)	Tried locally, and yes, we do need to output record id. What we could actually do, is simply inline that `EmitRecord()`, first emitting the RID, and then the name.
template <typename Container>
void EmitRecord(unsigned Code, int ID, const Container &Vals) {
  // If we don't have an abbrev to use, emit this in its fully unabbreviated
  // form.
  auto Count = static_cast<uint32_t>(makeArrayRef(Vals).size());
  EmitCode(bitc::UNABBREV_RECORD);
  EmitVBR(Code, 6);
  EmitVBR(Count + 1, 6); // Including ID
  EmitVBR64(ID, 6);      // 'Prefix' with ID
  for (unsigned i = 0, e = Count; i != e; ++i)
    EmitVBR64(Vals[i], 6);
}
But that will result in rather ugly code. So given that the record names are quite short, and all the other strings we output directly, maybe leave it as it is for now, until it shows in profiles?
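
The `EmitVBR(..., 6)` calls in the snippet above rely on variable bit rate encoding. As a rough illustration of the chunking, assuming the scheme described in the LLVM bitstream format documentation, the sketch below splits a value into 6-bit chunks. `encodeVBR` is a hypothetical helper that returns the chunks instead of writing bits; it is not a `BitstreamWriter` API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Splits Val into VBR chunks of NumBits bits each. Every chunk carries
// NumBits-1 payload bits (least significant first); the top bit of a chunk
// is set when another chunk follows.
inline std::vector<uint32_t> encodeVBR(uint32_t Val, unsigned NumBits) {
  assert(NumBits >= 2 && NumBits <= 32 && "unsupported chunk width");
  const uint32_t Threshold = 1U << (NumBits - 1);
  std::vector<uint32_t> Chunks;
  while (Val >= Threshold) {
    Chunks.push_back((Val & (Threshold - 1)) | Threshold);
    Val >>= (NumBits - 1);
  }
  Chunks.push_back(Val);
  return Chunks;
}
```

Values below 32 fit in a single 6-bit chunk, which is why short record names and small IDs cost little in unabbreviated form.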
clang-doc/BitcodeWriter.h
226 ↗	(On Diff #136303)	Needs a comment about the choice of static size of Record. I.e. the maximal amount of stuff we expect to push there is recordname string (right now `IsDefinition` is the longest at `13` chars) + 1 integer. And add a newline
// Notes
SmallVector<uint32_t, 16> Record;
llvm::BitstreamWriter &Stream;
...
37 ↗	(On Diff #136161)	I did not retry with updated tree/patch, but i'm quite sure i did hit those asserts. My current build line:
-DCMAKE_BUILD_TYPE:STRING=RelWithDebInfo
-DLLVM_BINUTILS_INCDIR:PATH=/usr/include
-DLLVM_BUILD_TESTS:BOOL=ON
-DLLVM_ENABLE_ASSERTIONS:BOOL=ON
-DLLVM_ENABLE_LLD:BOOL=ON
-DLLVM_ENABLE_PROJECTS:STRING=clang;libcxx;libcxxabi;compiler-rt;lld
-DLLVM_ENABLE_SPHINX:BOOL=ON
-DLLVM_ENABLE_WERROR:BOOL=ON
-DLLVM_PARALLEL_LINK_JOBS:STRING=1
-DLLVM_TARGETS_TO_BUILD:STRING=X86
-DLLVM_USE_SANITIZER:STRING=Address
Additional env variables:
export MALLOC_CHECK_=3
export MALLOC_PERTURB_=$(($RANDOM % 255 + 1))
export ASAN_OPTIONS=abort_on_error=1
export UBSAN_OPTIONS=print_stacktrace=1
clang-doc/Mapper.cpp
28 ↗	(On Diff #136303)	+// If we should ignore this declaration, exit this decl ?
clang-doc/Mapper.h
30 ↗	(On Diff #136303)	I wonder if we could reflect the usage of `RecursiveASTVisitor` in the class name. Though `ClangDocMapperASTVisitor` sounds too long?
clang-doc/Representation.h
27 ↗	(On Diff #136303)	Is there an intentional decision to minimize `sizeof()` of these structs? Many(?) of those could be `SmallString`'s
test/CMakeLists.txt
44 ↗	(On Diff #136303)	There are no tests with `CommentBlock` blocks.
test/clang-doc/mapper-class-in-class.cpp
6 ↗	(On Diff #136161)	Ok, so this actually produced `c:@S@X.bc` and `c:@S@X@S@Y.bc`. Please do something like:
// RUN: llvm-bcanalyzer %t/docs/c:@S@X.bc --dump | FileCheck %s --check-prefix CHECK-X
// RUN: llvm-bcanalyzer %t/docs/c:@S@X@S@Y.bc --dump | FileCheck %s --check-prefix CHECK-X-Y
// CHECK-X: <BLOCKINFO_BLOCK/>
// CHECK-X: <VersionBlock NumWords=1 BlockCodeSize=4>
// CHECK-X: <Version abbrevid=4 op0=1/>
// CHECK-X: </VersionBlock>
// CHECK-X: <RecordBlock NumWords=6 BlockCodeSize=4>
// CHECK-X: <USR abbrevid=4 op0=6/> blob data = 'c:@S@X'
// CHECK-X: <Name abbrevid=5 op0=1/> blob data = 'X'
// CHECK-X: <IsDefinition abbrevid=7 op0=1/>
// CHECK-X: <TagType abbrevid=10 op0=3/>
// CHECK-X: </RecordBlock>
// CHECK-X-Y: <BLOCKINFO_BLOCK/>
// CHECK-X-Y: <VersionBlock NumWords=1 BlockCodeSize=4>
// CHECK-X-Y: <Version abbrevid=4 op0=1/>
// CHECK-X-Y: </VersionBlock>
// CHECK-X-Y: <RecordBlock NumWords=11 BlockCodeSize=4>
// CHECK-X-Y: <USR abbrevid=4 op0=10/> blob data = 'c:@S@X@S@Y'
// CHECK-X-Y: <Name abbrevid=5 op0=1/> blob data = 'Y'
// CHECK-X-Y: <Namespace abbrevid=6 op0=1 op1=6/> blob data = 'c:@S@X'
// CHECK-X-Y: <IsDefinition abbrevid=7 op0=1/>
// CHECK-X-Y: <TagType abbrevid=10 op0=3/>
// CHECK-X-Y: </RecordBlock>
On a related note, is there any way to auto-generate these `CHECK` lines? There is this `llvm/utils/update_test_checks.py`, but i doubt it will work here.
test/clang-doc/mapper-class-in-function.cpp
8 ↗	(On Diff #136161)	Here too, i suppose
test/clang-doc/mapper-enum.cpp
7–8 ↗	(On Diff #136303)	Could you please also add a similar `enum class` test?
17 ↗	(On Diff #136303)	Can `TypeBlock` be on the same depth as `VersionBlock`? Via `using`/`typename`? If yes, please add such a test.
test/clang-doc/mapper-method.cpp
8 ↗	(On Diff #136161)	And here

Fixing comments and adding tests

Thank you for working on this!
Some more nitpicking.

Please consider adding even more tests (ideally, all this code should have 100% test coverage)

clang-doc/BitcodeWriter.cpp
139 ↗	(On Diff #136520)	This change is not covered by tests. (I've actually found that out the hard way, by trying to find why it didn't trigger any assertions, oh well)
325 ↗	(On Diff #136520)	I think it would be cleaner to move it (at least the enterblock, it might make sense to leave the header at the very top) after the static variable
363 ↗	(On Diff #136520)	I.e.
    ... , FUNCTION_IS_METHOD}}};
  Stream.EnterBlockInfoBlock();
  for (const auto &Block : TheBlocks) {
    assert(Block.second.size() < (1U << BitCodeConstants::SubblockIDSize));
    emitBlockInfo(Block.first, Block.second);
  }
  Stream.ExitBlock();
  emitVersion();
}
clang-doc/BitcodeWriter.h
19 ↗	(On Diff #136520)	Please sort includes, clang-tidy complains.
32 ↗	(On Diff #136520)	/build/clang-tools-extra/clang-doc/BitcodeWriter.h:32:23: warning: invalid case style for variable 'VERSION_NUMBER' [readability-identifier-naming]
static const unsigned VERSION_NUMBER = 1;
                      ^~~~~~~~~~~~~~
                      VersionNumber
163 ↗	(On Diff #136520)	The simplest solution would be
#ifndef NDEBUG // Don't want explicit dtor unless needed
~ClangDocBitcodeWriter() {
  // Check that the static size is large-enough.
  assert(Record.capacity() == BitCodeConstants::RecordSize);
}
#endif
228 ↗	(On Diff #136520)	So you want to be really definitive with this. I wanted to avoid that, actually.. Then i'm afraid one more assert is needed, to make sure this is actually true. I'm not seeing any way to make `SmallVector` completely static, so you could either add one more wrapper around it (rather ugly), or check the final size in the `ClangDocBitcodeWriter` destructor (will not pinpoint when the size has 'overflowed')
246 ↗	(On Diff #136520)	Does it ever make sense to output `BlockInfoBlock` anywhere else other than once at the very beginning? I'd think you should drop the boolean param, and unconditionally call the `emitBlockInfoBlock();` from `ClangDocBitcodeWriter::ClangDocBitcodeWriter()` ctor.
248 ↗	(On Diff #136520)	The naming choices confuse me. There is `writeBitstream()` and `emitBlock()`, which is called from `writeBitstream()` to write the actual contents of the block. Why is one `write` and the other `emit`? To match the `BitstreamWriter` naming choices (which uses the `Emit` prefix)? To avoid the confusion of which one outputs the actual content, and which one outputs the whole block? I think it should be:
- void emitBlock(const NamespaceInfo &I);
+ void emitBlockContent(const NamespaceInfo &I);
- void ClangDocBitcodeWriter::writeBitstream(const T &I, bool WriteBlockInfo);
+ void ClangDocBitcodeWriter::emitBlock(const T &I, bool EmitBlockInfo);
This way, i think their names would more clearly state what they do, and won't be weirdly different. What do you think?
clang-doc/Representation.h
18 ↗	(On Diff #136520)	Please sort includes, clang-tidy complains.
clang-doc/Serialize.cpp
88 ↗	(On Diff #136520)	/build/clang-tools-extra/clang-doc/Serialize.cpp:88:17: warning: invalid case style for variable 'i' [readability-identifier-naming] for (unsigned i = 0, e = C->getNumArgs(); i < e; ++i) ^ ~ ~~ I I I /build/clang-tools-extra/clang-doc/Serialize.cpp:88:24: warning: invalid case style for variable 'e' [readability-identifier-naming] for (unsigned i = 0, e = C->getNumArgs(); i < e; ++i) ^ ~~ E E
107 ↗	(On Diff #136520)	/build/clang-tools-extra/clang-doc/Serialize.cpp:107:19: warning: invalid case style for variable 'i' [readability-identifier-naming] for (unsigned i = 0, e = C->getDepth(); i < e; ++i) ^ ~ ~~ I I I /build/clang-tools-extra/clang-doc/Serialize.cpp:107:26: warning: invalid case style for variable 'e' [readability-identifier-naming] for (unsigned i = 0, e = C->getDepth(); i < e; ++i) ^ ~~ E E
clang-doc/Serialize.h
19 ↗	(On Diff #136520)	Please sort includes, clang-tidy complains.
clang-doc/tool/ClangDocMain.cpp
80 ↗	(On Diff #136520)	Why at the beginning though? Couldn't the user pass `-extra-arg=-fno-parse-all-comments`, which could override this?

Adding tests, fixing comments, and removing an (as-of-yet) unused element of the CommentInfo struct.

clang-doc/BitcodeWriter.cpp
139 ↗	(On Diff #136520)	So after some digging, this particular field can't be tested right now as the mapper doesn't look at any `TemplateDecl`s (something that definitely needs to be implemented, but in a follow-on patch). I've removed it for now, until it can be properly used/tested.
196 ↗	(On Diff #135682)	If that makes sense to you, sounds good to me!
clang-doc/BitcodeWriter.h
37 ↗	(On Diff #136161)	Figured it out -- the `Reference` struct didn't have default for the enum, and so if it wasn't initialized it was undefined. Should be fixed now.
test/clang-doc/mapper-enum.cpp
17 ↗	(On Diff #136303)	Not currently -- I'm planning to add that functionality in the future, but right now it ignores typedef or using decls.
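The fix described for `BitcodeWriter.h` line 37 (giving the enum in `Reference` a default) can be illustrated with a default member initializer. The enumerator names and the field name `RefType` below are assumptions made for this sketch; only the idea of a defaulted enum field comes from the comment.

```cpp
#include <cassert>
#include <string>

// Hypothetical enumerators; the real set is not shown in this review.
enum class InfoType { IT_default, IT_namespace, IT_record, IT_function, IT_enum };

struct Reference {
  std::string USR;
  // Without this initializer, a default-constructed Reference would carry an
  // indeterminate enum value, which is the bug described above.
  InfoType RefType = InfoType::IT_default;
};

// Small helper showing the guarantee the initializer provides.
inline bool hasDefaultKind(const Reference &R) {
  return R.RefType == InfoType::IT_default;
}
```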

Could some other people please review this differential, too?
I'm sure i have missed things.

Some more nitpicking.

For this differential as standalone, i've mostly run out of things to nitpick.
Some things can probably be done better (the blockid/recordid stuff could probably be nicer if tablegen-ed, but that is for later).

I'll try to look at the next differential, and at them combined.

clang-doc/BitcodeWriter.cpp
120 ↗	(On Diff #136650)	We don't actually push these strings to the `Record` (but instead output them directly), so this assertion is not really meaningful, i think?
clang-doc/BitcodeWriter.h
21 ↗	(On Diff #136650)	+DenseMap
21 ↗	(On Diff #136650)	+StringRef
197 ↗	(On Diff #136650)	Humm, you could avoid this constant, and conserve a few bits, if you move the init-list out of `emitBlockInfoBlock()` to somewhere e.g. after the `enum RecordId`, and then since the `BlockId ID` is already passed, you could compute it on-the-fly the same way the `BitCodeConstants::SubblockIDSize` is asserted in `emitBlockInfo*()`. Not sure if it's worth doing though. Maybe just add it as a `NOTE` here.
249 ↗	(On Diff #136650)	Stale comment
clang-doc/Representation.h
60 ↗	(On Diff #136650)	`Info *Ref;` isn't used anywhere
117 ↗	(On Diff #136650)	`llvm::Optional<Location> DefLoc;` ?

Addressing comments

lebedev.ri added inline comments. Mar 2 2018, 10:38 AM

clang-doc/Representation.h
117 ↗	(On Diff #136791)	I meant that `IsDefinition` controls whether `DefLoc` will be set/used or not. So with `llvm::Optional<Location> DefLoc`, you don't need the `bool IsDefinition`.

Removing IsDefinition field.

clang-doc/Representation.h
117 ↗	(On Diff #136791)	That...makes so much sense. Oops. Thank you!
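A minimal sketch of the suggestion above, using `std::optional` (C++17) as a stand-in for `llvm::Optional`. The struct name `SymbolInfoSketch`, the `Location` fields, and the helper `isDefinition` are hypothetical; the point is only that an engaged `DefLoc` already answers the question the separate `bool IsDefinition` used to answer.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Illustrative stand-in for the review's Location type.
struct Location {
  int LineNumber = 0;
  std::string Filename;
};

struct SymbolInfoSketch {
  std::optional<Location> DefLoc; // engaged only when a definition was seen
  std::vector<Location> Loc;      // locations of (non-defining) declarations
};

// Replaces the removed flag: the optional's state carries the same information.
inline bool isDefinition(const SymbolInfoSketch &I) {
  return I.DefLoc.has_value();
}
```

Dropping the flag also removes the possibility of the two fields disagreeing, for example `IsDefinition == true` with a location that was never set.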

Eugene.Zelenko added inline comments. Mar 5 2018, 6:15 PM

clang-doc/BitcodeWriter.h
160 ↗	(On Diff #136809)	Looks like Clang-format was applied incorrectly, because this is Google, not LLVM style. Please note that it doesn't modify file, just output formatted code to terminal. Please reformat other files, including those in dependent patches.

My apologies for getting back on this so late!

In D41102#1017683, @juliehockett wrote:

So, as an idea (as this diff implements), I updated the string references to be a struct, which holds the USR of the referenced type (for serialization, both here in the mapper and for the dump option in the reducer), as well as a pointer to an Info struct. This pointer is not used at this point, but would be populated by the reducer. Thoughts?

This seems like quite a decent approach! That being said, I don't see the pointer yet? I assume you mean that you will be adding this? Additionally, a slight disadvantage of doing this generic approach is that you need to do bookkeeping on what it is referencing, but I guess there's no helping that due to the architecture which makes you rely upon the USR? Personally I'd prefer having the explicit types if and where possible. So for now a RecordInfo has a vector of References to its parents, but we know the parents can only be of certain kinds (more than just a RecordType, but you get the point); it won't be an enum, namespace or function.

As I mentioned, we did this the other way around, which also has the slight advantage that I only had to create and save the USR once per info instance (as in, 10 references to a class only add the overhead of 10 pointers, rather than each having the USR as well), but our disadvantage was of course that we had delayed serialization (although we could arguably do both simultaneously). It seems each method has its merits :).

In D41102#1028228, @Athosvk wrote:

This seems like quite a decent approach! That being said, I don't see the pointer yet? I assume you mean that you will be adding this? Additionally, a slight disadvantage of doing this generic approach is that you need to do bookkeeping on what it is referencing, but I guess there's no helping that due to the architecture which makes you rely upon the USR? Personally I'd prefer having the explicit types if and where possible. So for now a RecordInfo has a vector of References to its parents, but we know the parents can only be of certain kinds (more than just a RecordType, but you get the point); it won't be an enum, namespace or function.

If you take a look at the follow-on patch to this (D43341), you'll see that that is where the pointer is added in (since it is irrelevant to the mapper portion, as it cannot be filled out until the information has been reduced). The back references to children and whatnot are also added there.

As I mentioned, we did this the other way around, which also has the slight advantage that I only had to create and save the USR once per info instance (as in, 10 references to a class only add the overhead of 10 pointers, rather than each having the USR as well), but our disadvantage was of course that we had delayed serialization (although we could arguably do both simultaneously). It seems each method has its merits :).

The USRs are kept for serialization purposes -- given the modular nature of the design, the goal is to be able to write out the bitstream and have it be consumable with all necessary information. Since we can't write out pointers (and it would be useless if we did, since they would change as soon as the file was read in), we maintain the USRs to have a means of re-finding the referenced declaration.

That said, I was looking at the Clangd symbol indexing code yesterday, and noticed that they're hashing the USRs (since they get a little lengthy, particularly when you have nested and/or overloaded functions). I'm going to take a look at that today to try to make the USRs more space-efficient here.

Adding hashing to reduce the size of USRs and updating tests.

Nice!
Some further notes based on the SHA1 nature.

clang-doc/BitcodeWriter.cpp
74 ↗	(On Diff #137244)	Those are mixed up. `USRLengthSize` is definitively supposed to be second.
81 ↗	(On Diff #137244)	The sha1 is all-printable, so how about using `BitCodeAbbrevOp::Encoding::Char6` ? Char4 would work best, but it is not there.
149 ↗	(On Diff #137244)	Ha, and all the `*_USR` are actually `StringAbbrev`'s, not confusing at all :)
309 ↗	(On Diff #137244)	Now it would make sense to also assert that this sha1(usr).strlen() == 20
clang-doc/BitcodeWriter.h
46 ↗	(On Diff #137244)	Can definitively lower this to `5U` (2^5 == 32, which is more than the 20 8-bit chars of sha1)
clang-doc/Representation.h
59 ↗	(On Diff #137244)	Now that USR is sha1'd, this is always 20 8-bit characters long.
107 ↗	(On Diff #137244)	`20` Maybe place `using USRString = SmallString<20>; // SHA1 of USR` somewhere and use it everywhere?

In D41102#1028760, @juliehockett wrote:

If you take a look at the follow-on patch to this (D43341), you'll see that that is where the pointer is added in (since it is irrelevant to the mapper portion, as it cannot be filled out until the information has been reduced). The back references to children and whatnot are also added there.

Oops! I'll have a look!

In D41102#1028760, @juliehockett wrote:

The USRs are kept for serialization purposes -- given the modular nature of the design, the goal is to be able to write out the bitstream and have it be consumable with all necessary information. Since we can't write out pointers (and it would be useless if we did, since they would change as soon as the file was read in), we maintain the USRs to have a means of re-finding the referenced declaration.

What I was referring to was the storing of a USR per reference. Of course, serializing pointers wouldn't work, but what I mean is that what we used as a USR was stored in what was pointed to, not in the reference that tells what we are pointing to. To be a little more concise, a RecordInfo has pointers to the FunctionInfo for its member functions. Upon serialization, the RecordInfo queries the USR of those functions. A function that is referenced multiple times still has its USR stored only once. If I understand correctly, you currently save the USR each time an InfoType references another InfoType.

Anyhow, don't pay too much attention to that comment, it's all meant as a minor thing. It sure is looking good so far!

In D41102#1028995, @lebedev.ri wrote:

Some further notes based on the SHA1 nature.

I'm sorry, brainfreeze, i meant 40 chars, not 20.
Updated comments...

clang-doc/BitcodeWriter.cpp
309 ↗	(On Diff #137244)	40 that is
clang-doc/BitcodeWriter.h
46 ↗	(On Diff #137244)	Edit: to 6U (2^6 == 64, which is more than the 40 8-bit chars of sha1)
clang-doc/Representation.h
59 ↗	(On Diff #137244)	40 that is
107 ↗	(On Diff #137244)	40
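The width arithmetic behind the `5U` and `6U` suggestions can be checked with a small helper. `bitsNeeded` is a hypothetical name introduced for this sketch and is not part of the patch; it returns the smallest width N for which the given maximum value is below 2^N.

```cpp
#include <cassert>
#include <cstdint>

// Smallest number of bits N such that MaxValue < 2^N, i.e. the narrowest
// fixed-width field able to hold every value in [0, MaxValue].
constexpr unsigned bitsNeeded(std::uint64_t MaxValue) {
  unsigned Bits = 0;
  while (MaxValue != 0) {
    ++Bits;
    MaxValue >>= 1;
  }
  return Bits;
}

// A length of 20 (raw SHA1 bytes) fits in 5 bits, since 20 < 2^5 == 32.
static_assert(bitsNeeded(20) == 5, "20 needs 5 bits");
// A length of 40 (hex characters) needs 6 bits, since 32 <= 40 < 2^6 == 64.
static_assert(bitsNeeded(40) == 6, "40 needs 6 bits");
```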

Updating bitcode writer for hashed USRs, and re-running clang-format. Also cleaning up a couple of unused fields.

Hmm, i'm missing something about the way we store sha1...

clang-doc/BitcodeWriter.cpp
53 ↗	(On Diff #137457)	This is VBR because USRLengthSize is of such strange size, to conserve the bits?
57 ↗	(On Diff #137457)	Looking at the `NumWords` changes (decrease!) in the tests, and this is bugging me. And now that i have realized what we do with USR: we first compute SHA1, and get 20x uint8_t; store/use it internally; then hex-ify it, getting 40x char (assuming 8-bit char); then convert to char6, winning back two bits, but we still lose 2 bits. Question: why do we store sha1 of USR as a string? Why can't we just store that USRString (aka USRSha1 binary) directly? That would be just 20 bytes, you just couldn't go any lower than that.
clang-doc/Representation.h
29 ↗	(On Diff #137457)	Right, of course, internally this is kept in the binary format, which is just 20 chars. This is not the string (the hex-ified version of sha1), but the raw sha1, the binary. This should somehow convey that. This should be something closer to `USRSha1`.
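To make the size comparison in the comment on line 57 concrete, here is a rough sketch that hex-encodes a 20-byte digest and tallies payload bits for each representation. The names `RawDigest`, `toHex`, and the three constants are illustrative assumptions, and the tally ignores any abbreviation or length-prefix overhead in the bitstream.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

using RawDigest = std::array<std::uint8_t, 20>; // a SHA1 digest is 160 bits

// Hex-encode the digest: every byte becomes two lowercase hex characters.
inline std::string toHex(const RawDigest &D) {
  static const char Digits[] = "0123456789abcdef";
  std::string Out;
  Out.reserve(D.size() * 2);
  for (std::uint8_t Byte : D) {
    Out.push_back(Digits[Byte >> 4]);
    Out.push_back(Digits[Byte & 0x0F]);
  }
  return Out;
}

// Payload bits for each representation discussed in the review.
constexpr std::size_t RawBits = 20 * 8;     // 160: raw binary digest
constexpr std::size_t HexChar8Bits = 40 * 8; // 320: hex string, 8-bit chars
constexpr std::size_t HexChar6Bits = 40 * 6; // 240: hex string, Char6-encoded
```

Each hex character carries only 4 bits of information, so Char6 recovers 2 of the 4 wasted bits per character and the raw form recovers all of them.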

There's a few places where we can trim some of the boilerplate, which I think is important - it's hard to find the "real code" among all the plumbing in places.
Other than that, this seems OK to me.

clang-doc/BitcodeWriter.h
116 ↗	(On Diff #137457)	I think you don't want to declare ID in the unspecialized template, so you get a compile error if you try to use it. (Using traits for this sort of thing seems a bit overboard to me, but YMMV)
154 ↗	(On Diff #137457)	Hmm, you spend a lot of effort plumbing this variable around! Why is it so important? Filesize? (I'm not that familiar with LLVM bitcode, but surely we'll end up with a string table anyway?) If it really is an important option people will want, the command-line arg should probably say why.
241 ↗	(On Diff #137457)	OK, I don't get this at all. We have to declare emitBlockContent(NamespaceInfo) and the specialization of MapFromInfoToBlockId<NamespaceInfo>, and deal with the public interface emitBlock being a template function where you can't tell what's legal to pass, instead of writing: void emitBlock(const NamespaceInfo &I) { SubStreamBlockGuard Block(Stream, BI_NAMESPACE_BLOCK_ID); // <-- this one line ... } This really seems like templates for the sake of templates :(
clang-doc/ClangDoc.h
10 ↗	(On Diff #137457)	This comment doesn't seem accurate - there's no main() in this file. There's a FrontendActionFactory, but nothing in this file uses it.
37 ↗	(On Diff #137457)	nit: seems odd to put all this implementation in the header. (personally I'd just expose a function returning unique_ptr<FrontendActionFactory> from the header, but up to you...)
38 ↗	(On Diff #137457)	for ASTConsumers implemented by ASTVisitors, there seems a fairly strong convention to just make the same class extend both (MapASTVisitor, here). That would eliminate one plumbing class...
clang-doc/Mapper.cpp
33 ↗	(On Diff #137457)	It seems a bit of a poor fit to use a complete bitcode file (header, version, block info) as your value format when you know the format, and know there'll be no version skew. Is it easy just to emit the block we care about?
clang-doc/Representation.h
29 ↗	(On Diff #137457)	I'm not sure that any of the implementation (either USR or SHA) belongs in the type name. In clangd we called this type SymbolID, which seems like a reasonable name here too.
44 ↗	(On Diff #137457)	this is probably the right place to document these fields - what are the legal kinds? what's the name of a comment, direction, etc?

This revision is now accepted and ready to land. Mar 8 2018, 4:51 PM

Closed by commit rL327102: [clang-doc] Setup clang-doc frontend framework (authored by juliehockett). Mar 8 2018, 7:21 PM

This revision was automatically updated to reflect the committed changes.

juliehockett marked 11 inline comments as done.

Herald added a subscriber: llvm-commits. Mar 8 2018, 7:21 PM

Might have been better to not start landing until the all differentials are understood/accepted, but i understand that it is not really up to me to decide.
Let's hope nothing in the next differentials will require changes to this initial code :)

clang-doc/BitcodeWriter.h
241 ↗	(On Diff #137457)	If you want to add a new block, in one case you just need to add one template <> struct MapFromInfoToBlockId<???Info> { static const BlockId ID = BI_???_BLOCK_ID; }; In the other case you need to add whole void ClangDocBitcodeWriter::emitBlock(const ???Info &I) { StreamSubBlockGuard Block(Stream, BI_???_BLOCK_ID); emitBlockContent(I); } (and it was even longer initially) It seems just templating one static variable is shorter than duplicating `emitBlock()` each time, no? Do compare the current diff with the original diff state. I think these templates helped move much of the duplication to simplify the code overall.
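The two positions in this thread can be compared on a stripped-down model. The sketch below follows the trait-based variant and leaves the primary template without an `ID` member, as suggested in the comment on line 116, so that an unmapped type fails to compile. `WriterSketch`, its `Log` vector, and the two toy `Info` structs are hypothetical stand-ins; no real `BitstreamWriter` is involved.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum BlockId { BI_NAMESPACE_BLOCK_ID, BI_RECORD_BLOCK_ID };

struct NamespaceInfo { std::string Name; };
struct RecordInfo { std::string Name; };

// Primary template deliberately declares no ID: using an unmapped type in
// emitBlock() is then a compile error rather than a silent default.
template <typename T> struct MapFromInfoToBlockId {};
template <> struct MapFromInfoToBlockId<NamespaceInfo> {
  static const BlockId ID = BI_NAMESPACE_BLOCK_ID;
};
template <> struct MapFromInfoToBlockId<RecordInfo> {
  static const BlockId ID = BI_RECORD_BLOCK_ID;
};

class WriterSketch {
public:
  // One generic wrapper: enter the block, emit its content, leave the block.
  template <typename T> void emitBlock(const T &I) {
    Log.push_back("enter " +
                  std::to_string(static_cast<int>(MapFromInfoToBlockId<T>::ID)));
    emitBlockContent(I);
    Log.push_back("exit");
  }

  std::vector<std::string> Log; // records what would be written

private:
  // Per-type content still needs one overload each.
  void emitBlockContent(const NamespaceInfo &I) { Log.push_back("namespace " + I.Name); }
  void emitBlockContent(const RecordInfo &I) { Log.push_back("record " + I.Name); }
};
```

Adding a new block kind in this model costs one trait specialization plus one `emitBlockContent` overload, whereas the non-template alternative would repeat the enter/exit wrapper in every `emitBlock` overload.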

Since the commit was reverted, did you mean to either recommit it, or reopen this (with updated diff), so it does not get lost?

In D41102#1034919, @lebedev.ri wrote:

Since the commit was reverted, did you mean to either recommit it, or reopen this (with updated diff), so it does not get lost?

Relanded in r327295.

clang-doc/BitcodeWriter.h
154 ↗	(On Diff #137457)	It was for testing purposes (so that the tests aren't flaky on filenames), but I replaced it with regex.
241 ↗	(On Diff #137457)	You'd still have to add the appropriate `emitBlock()` function for any new block, since it would have different attributes.
clang-doc/Mapper.cpp
33 ↗	(On Diff #137457)	Ideally, yes, but right now in the clang BitstreamWriter there's no way to tell the instance what all the abbreviations are without also emitting the blockinfo to the output stream, though I'm thinking about taking a stab at separating the two. Also, this relies on the llvm-bcanalyzer for testing, which requires both the header and the blockinfo in order to read the data :/

lebedev.ri added inline comments. Mar 14 2018, 1:44 PM

clang-doc/BitcodeWriter.cpp
230 ↗	(On Diff #136303)	And https://github.com/mattgodbolt/compiler-explorer/issues/841 is done, so now we can see that `SmallVector::append()` at least results in less code: https://godbolt.org/g/xJQ59c

So what part is failing, specifically?
The SHA1 blobs of USR's differ in the llvm-bcanalyzer dumps?
The actual filenames %t/docs/bc/<sha1-to-text> differ?
I guess both?

First one you should be able to handle by replacing the actual values with a regex
(i'd guess <USR abbrevid=4 op0=20 op1=11 <...> op19=226 op20=232/> -> <USR abbrevid=4 .*/>, but did not try)
I'm not sure we care about the actual values here, do we?

Second one is interesting.
If we assume that the order in which those are generated is the same, which i think is a safer assumption,
then you could just use result id, not key (sha1-to-text of USR), i.e. %t/docs/bc/00.bc, %t/docs/bc/01.bc and so on.
I.e. something like:

  if (DumpMapperResult) {
+   unsigned id = 0;
    Exec->get()->getToolResults()->forEachResult([&](StringRef Key,
                                                     StringRef Value) {
      SmallString<128> IRRootPath;
      llvm::sys::path::native(OutDirectory, IRRootPath);
      llvm::sys::path::append(IRRootPath, "bc");
      std::error_code DirectoryStatus =
          llvm::sys::fs::create_directories(IRRootPath);
      if (DirectoryStatus != OK) {
        llvm::errs() << "Unable to create documentation directories.\n";
        return;
      }
-     llvm::sys::path::append(IRRootPath, Key + ".bc");
+     llvm::sys::path::append(IRRootPath, std::to_string(id) + ".bc");
      std::error_code OutErrorInfo;
      llvm::raw_fd_ostream OS(IRRootPath, OutErrorInfo, llvm::sys::fs::F_None);
      if (OutErrorInfo != OK) {
        llvm::errs() << "Error opening documentation file.\n";
        return;
      }
      OS << Value;
      OS.close();
+     id++;
    });
  }

Hm, or possibly you could just pass the triple to clang?

I was just thinking of disabling the one test that has an issue (class-in-function) on Windows -- the filename is only used in generating *some* USRs, so all of the other ones are fine. We ran into some issues with that though, since UNSUPPORTED: system-windows didn't seem to disable the test on the machine I have access to. Thoughts?

In D41102#1041773, @juliehockett wrote:

I was just thinking of disabling the one test that has an issue (class-in-function) on Windows -- the filename is only used in generating *some* USRs, so all of the other ones are fine. We ran into some issues with that though, since UNSUPPORTED: system-windows didn't seem to disable the test on the machine I have access to. Thoughts?

UNSUPPORTED: system-windows

Perhaps that is only for msvc?

Have you tried something more broad, like
UNSUPPORTED: mingw32,win32
?

In D41102#1041791, @lebedev.ri wrote:

Have you tried something more broad, like
UNSUPPORTED: mingw32,win32
?

That wasn't working either, confusingly, at least on the local windows machine I have.

Huh, something weird is going on there.
What about the other way around, REQUIRES: linux ?

After much digging, it looks like the lit config is never initialized in clang-tools-extra like it is in the other projects. REQUIRES et al. work properly once that's in there (see D44708). Once that lands I'll reland this and *hopefully* that'll be that!

hintonda removed a subscriber: hintonda. Mar 24 2018, 11:57 AM

Revision Contents

Path

Size

tools/

CMakeLists.txt

1 line

clang-doc/

21 lines

88 lines

90 lines

114 lines

259 lines

tool/

CMakeLists.txt

18 lines

ClangDocMain.cpp

69 lines

Diff 126645

tools/CMakeLists.txt

  create_subdirectory_options(CLANG TOOL)

  add_clang_subdirectory(diagtool)
  add_clang_subdirectory(driver)
  add_clang_subdirectory(clang-diff)
+ add_clang_subdirectory(clang-doc)
  add_clang_subdirectory(clang-format)
  add_clang_subdirectory(clang-format-vs)
  add_clang_subdirectory(clang-fuzzer)
  add_clang_subdirectory(clang-import-test)
  add_clang_subdirectory(clang-offload-bundler)

  add_clang_subdirectory(c-index-test)

  [24 more lines not shown]

tools/clang-doc/CMakeLists.txt

This file was added.

				set(LLVM_LINK_COMPONENTS
				support
				)

				add_clang_library(clangDoc
				ClangDoc.cpp
				ClangDocReporter.cpp

				LINK_LIBS
				clangAnalysis
				clangAST
				clangASTMatchers
				clangBasic
				clangFormat
				clangFrontend
				clangLex
				clangTooling
				clangToolingCore
				)

				add_subdirectory(tool)

tools/clang-doc/ClangDoc.h

This file was added.

				//===-- ClangDoc.cpp - ClangDoc ---------------------------------- C++ --===//
				//
				sammccall (Done): This needs some high-level documentation: what does the clang-doc library do, what's the main user (clang-doc command-line tool), what are the major moving parts. I don't personally have a strong opinion on how this is split between this header / the implementation / a documentation page for the tool itself, but we'll probably need something for each of those. (I think it's OK to defer the user-facing documentation to another patch, but we should do it before the tool becomes widely publicized or included in an llvm release)
				sammccall (Done): This comment is still relevant. `ClangDoc.h` in particular sounds like the API entrypoint, but the only thing that's documented here is an implementation detail. The file comment here should describe at a high level how documentation is extracted, combined, and output. Most of the files have no file comment describing what the file is responsible for, what it interacts with, etc. If I was contributing a patch here, how would I know whether a given header was the right layer for a new function?
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANG_DOC_CLANGDOC_H
				#define LLVM_CLANG_TOOLS_EXTRA_CLANG_DOC_CLANGDOC_H

				#include "ClangDocReporter.h"
				#include "clang/AST/AST.h"
				#include "clang/AST/ASTConsumer.h"
				#include "clang/AST/ASTContext.h"
				#include "clang/AST/RecursiveASTVisitor.h"
				#include "clang/Frontend/ASTConsumers.h"
				#include "clang/Frontend/FrontendActions.h"
				#include "clang/Tooling/Tooling.h"
				#include <string>
				#include <vector>

				namespace clang {
				namespace doc {

				// A Context which contains extra options which are used in ClangMoveTool.
				struct ClangDocContext {
				sammccall (Done): what's clangmovetool?
				sammccall (Done): nit: this sounds more like "options" than a context to me, though there's only one member to go on :-)
				// Which format to emit representation in.
				OutFormat EmitFormat;
				sammccall (Done): Is this the intermediate representation referred to in the design doc, or the final output format? If the former, why two formats rather than picking one? YAML is nice for being usable by out-of-tree tools (though not as nice as JSON). But it seems like providing YAML as a trivial backend format would fit well? Bitcode is presumably more space-efficient - if this is significant in practice it seems like a better choice.
				juliehockett (Author, Done): That's the idea -- for developing purposes, I wrote up the YAML output first for this patch, and there will be a follow-on patch expanding the bitcode/binary output. I've updated the flags to default to the binary, with an option to dump the yaml (rather than the other way around).
				sammccall (Done): What's still not clear to me is: is YAML a) a "real" intermediate format, or b) just a debug representation? I would suggest for orthogonality that there only be one intermediate format, and that any debug version be generated from it. In practice I guess this means: the reporter builds the in-memory representation; you can serialize/deserialize memory representation to the IR (bitcode); you can serialize memory representation to debug representation (YAML) but not parse; maybe the clang-doc core should only know about IR, and YAML should be produced in the same way e.g. HTML would be? This does pose a short-term problem: the canonical IR is bitcode, we need YAML for the lit tests, and we don't have the decoder/transformer part yet. This could be solved either by using YAML as the IR for now and switching later, or by adding a simple decoder now. Either way it points to the reporter not having an output format option, and having to support two formats. WDYT? I might be missing something here.
				juliehockett (Author, Not Done): The mapper now only has the ability to write to bitcode -- I'm working on writing up a simple decoder to use for testing and will update the patch again once that's working. Once that's in place, that will also serve the purpose of being the foundation for how we're going to read the bitcode into the backend to produce actual docs. Does that make sense?
				};
				JonasToth (Done): Is this a string for a discrete set of configurations? If so maybe an `enum class` would be a better fit.

				class ClangDocVisitor : public RecursiveASTVisitor<ClangDocVisitor> {
				public:
				sammccall (Done): This API makes essentially everything public. Is that the intent? It seems like `ClangDocVisitor` is a detail, and the operation you want to expose is "extract doc from this AST into this reporter" or maybe "create an AST consumer that feeds this reporter". It would be useful to have an API to extract documentation from individual AST nodes (e.g. a Decl). But I'd be nervous about trying to use the classes exposed here to do that. If it's efficiently possible, it'd be nice to expose a function. (one use case for this is clangd)
				jakehehrlich (Done): Correct me if I'm wrong but I believe that everything needs to be public in this case because the base class needs to be able to call them. So the visit methods all need to be public.
				juliehockett (Author, Done): Yes to the `Visit*Decl` methods being public because of the base class. That said, I shifted a few things around here and implemented it as a `MatcherFinder` instead of a `RecursiveASTVisitor`. The change will allow us to make most of the methods private, and have the ability to fairly easily implement an API for pulling a specific node (e.g. by name or by decl type). As far as I understand (and please correct me if I'm wrong), the matcher traverses the tree in a similar way. This will also make mapping through individual nodes easier.
				sammccall (Done): Sorry for being vague - yes overridden or "CRTP-overridden" methods may need to be public. I meant that the classes themselves don't need to be exposed, I think. (The header could just expose a function to create the needed ones, that returns `unique_ptr<interface>`.) There are now fewer classes exposed here, but I think most/all of them can still reasonably be hidden.
				juliehockett (Author, Not Done): So I've restructured this again and collapsed all of the tooling things into ExecutionContext. The only thing exposed here now is the callback, which is registered on the matcher. Is there anything else I'm missing?
				explicit ClangDocVisitor(ASTContext *Context, ClangDocReporter &Reporter)
				: Context(Context), Reporter(Reporter) {}

				virtual bool VisitNamedDecl(const NamedDecl *D);

				JonasToth (Done): Not sure if this method should be virtual. `RecursiveASTVisitor` uses the CRTP to not need virtual methods while still behaving the same.
				void ParseUnattachedComments();
				sammccall (Done): `override` where applicable
				juliehockett (Author, Done): I might be wrong, but I don't believe the `Visit*Decl` methods are overrides for RecursiveASTVisitor?
				jakehehrlich (Done): These methods are not virtual methods. It's technically legal to use the override keyword if a subclass shadows a non-virtual method but I don't think it's what we want to do here.
				bool IsNewComment(SourceLocation Loc, SourceManager &Manager) const;
				JDevlieghere (Done): I know it's confusing given the amount of existing code that uses UpperCamelCase for functions, but I think that (as this is new code) we'd want to stay close to the style guide and use lowerCamelCase where we can.

				private:
				ASTContext *Context;
				ClangDocReporter &Reporter;
				};

				class ClangDocConsumer : public clang::ASTConsumer {
				public:
				explicit ClangDocConsumer(ASTContext *Context, ClangDocReporter &Reporter)
				: Visitor(Context, Reporter), Reporter(Reporter) {}

				virtual void HandleTranslationUnit(clang::ASTContext &Context);

				private:
				ClangDocVisitor Visitor;
				ClangDocReporter &Reporter;
				};

				class ClangDocAction : public clang::ASTFrontendAction {
				public:
				ClangDocAction(ClangDocReporter &Reporter) : Reporter(Reporter) {}

				virtual std::unique_ptr<clang::ASTConsumer> CreateASTConsumer(clang::CompilerInstance &Compiler, llvm::StringRef InFile);
				virtual void EndSourceFileAction();

				private:
				ClangDocReporter &Reporter;
				};
				jakehehrlich (Done): This should be moved to the .cpp file. Because there is no key function (https://itanium-cxx-abi.github.io/cxx-abi/abi.html#vague-vtable) this method will be redefined in every translation unit that includes this header.

				class ClangDocActionFactory : public tooling::FrontendActionFactory {
				public:
				ClangDocActionFactory(ClangDocContext &Context, ClangDocReporter &Reporter)
				: Context(Context), Reporter(Reporter) {}

				clang::FrontendAction *create() override {
				return new ClangDocAction(Reporter);
				}

				private:
				ClangDocContext &Context;
				sammccall (Done): This class can definitely be hidden in the C++ file, behind a `newClangDocActionFactory()` func (actually I think `newFrontendActionFactory` in Tooling.h could be extended to cover this, but not 100% sure).
				ClangDocReporter &Reporter;
				};

				} // namespace doc
				} // namespace clang

				#endif // LLVM_CLANG_TOOLS_EXTRA_CLANG_DOC_CLANGDOC_H
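
The CRTP point raised in the comments on `VisitNamedDecl` can be illustrated with a minimal sketch. It uses stand-in types only: `Node`, `MiniVisitor`, `NameCollector`, and `crtpDemo` are hypothetical names for illustration, not clang's `RecursiveASTVisitor` API. The base class calls into the derived class through a `static_cast`, so the derived hook shadows the default one and no `virtual` is involved.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an AST node; not a clang type.
struct Node {
  std::string Name;
  std::vector<Node> Children;
};

// CRTP base: the walk dispatches to Derived statically, so no virtual
// functions are involved.
template <typename Derived> class MiniVisitor {
public:
  // Visits N, then its children in order; stops early if a visit returns
  // false.
  bool traverse(const Node &N) {
    if (!derived().visitNode(N))
      return false;
    for (const Node &Child : N.Children)
      if (!traverse(Child))
        return false;
    return true;
  }

  // Default hook, used when Derived does not declare its own visitNode.
  bool visitNode(const Node &) { return true; }

private:
  Derived &derived() { return *static_cast<Derived *>(this); }
};

// Shadows (does not override) the default hook.
class NameCollector : public MiniVisitor<NameCollector> {
public:
  bool visitNode(const Node &N) {
    Names.push_back(N.Name);
    return true;
  }
  std::vector<std::string> Names;
};

// Small self-check exercising the sketch.
inline void crtpDemo() {
  Node Root{"ns", {Node{"f", {}}, Node{"S", {Node{"m", {}}}}}};
  NameCollector V;
  assert(V.traverse(Root));
  assert((V.Names == std::vector<std::string>{"ns", "f", "S", "m"}));
}
```

Calling `crtpDemo()` walks a three-level tree in pre-order and checks that the derived hook ran for every node.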

tools/clang-doc/ClangDoc.cpp

This file was added.

				//===-- ClangDoc.cpp - ClangDoc ---------------------------------- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#include "ClangDoc.h"
				#include "clang/AST/AST.h"
				#include "clang/AST/ASTConsumer.h"
				#include "clang/AST/ASTContext.h"
				#include "clang/AST/Comment.h"
				#include "clang/AST/RecursiveASTVisitor.h"
				#include "clang/Frontend/ASTConsumers.h"
				#include "clang/Frontend/CompilerInstance.h"
				#include "clang/Frontend/FrontendActions.h"
				#include "clang/Tooling/Tooling.h"

				using namespace clang;
				using namespace clang::tooling;
				using namespace llvm;
				jakehehrlich (Done): Is it possible to use VisitEnumDecl and VisitRecordDecl separately here?

				namespace clang {
				namespace doc {

				// TODO: limit to functions/objects/namespaces/etc?
				bool ClangDocVisitor::VisitNamedDecl(const NamedDecl *D) {
				SourceManager &Manager = Context->getSourceManager();
				if (!IsNewComment(D->getLocation(), Manager))
				JonasToth (Done): Here `Manager` can be `const&` if it is `const&` in the other places too.
				return true;

				DeclInfo DI;
				jakehehrlich (Done): I can't think of a good way to dedup these two methods at the moment. Can you put a TODO here to deduplicate these two specializations?
				DI.D = D;
				DI.QualifiedName = D->getQualifiedNameAsString();
				jakehehrlich (Done): I think you should use `llvm_unreachable` here.
				RawComment *Comment = Context->getRawCommentForDeclNoCache(D);

				// TODO: Move set attached to the initial comment parsing, not here
				if (Comment) {
				Comment->setAttached();
				jakehehrlich (Done): It looks like you're using this pattern a lot. It might be worth factoring this out somehow.
				DI.Comment =
				Reporter.ParseFullComment(Comment->parse(*Context, nullptr, D));
				}
				Reporter.AddDecl(Manager.getFilename(D->getLocation()), DI);
				return true;
				}
				jakehehrlich (Done): Can you separate this into VisitFunctionDecl and VisitCXXMethodDecl?

				void ClangDocVisitor::ParseUnattachedComments() {
				SourceManager &Manager = Context->getSourceManager();
				for (RawComment *Comment : Context->getRawCommentList().getComments()) {
				JonasToth (Done): I think `Manager` can be `const&`. Looks like only read methods were called.
				if (!IsNewComment(Comment->getLocStart(), Manager) || Comment->isAttached())
				continue;
				CommentInfo CI =
				Reporter.ParseFullComment(Comment->parse(*Context, nullptr, nullptr));
				JonasToth (Done): Full sentence. `set attached` == `setAttached`? Removing the not here and using the method name is probably enough already.
				Reporter.AddComment(Manager.getFilename(Comment->getLocStart()), CI);
				}
				}

				bool ClangDocVisitor::IsNewComment(SourceLocation Loc,
				SourceManager &Manager) const {
				jakehehrlich (Done): Can this be a const method?
				juliehockett (Author, Done): Not right now -- it's actually updating the `Attached` attribute of the comment, since it's not actually set in the initial parsing. It should be moved out into the initial comment parsing (see FIXME), but that's a separate patch. I should probably write that. :)
				if (!Loc.isValid())
				JonasToth (Done): `Manager` could be `const&`.
				return false;
				const std::string &Filename = Manager.getFilename(Loc);
				if (!Reporter.HasFile(Filename) || Reporter.HasSeenFile(Filename))
				return false;
				if (Manager.isInSystemHeader(Loc) || Manager.isInExternCSystemHeader(Loc))
				return false;
				Reporter.AddFileInTU(Filename);
				return true;
				}

				void ClangDocConsumer::HandleTranslationUnit(ASTContext &Context) {
				Visitor.TraverseDecl(Context.getTranslationUnitDecl());
				Visitor.ParseUnattachedComments();
				}

				jakehehrlich (Done): I think this method should return a StringRef instead of an std::string because the const char* returned by getFilename should live at least as long as the source manager.
				std::unique_ptr<ASTConsumer> ClangDocAction::CreateASTConsumer(CompilerInstance &Compiler, StringRef InFile) {
				return llvm::make_unique<ClangDocConsumer>(&Compiler.getASTContext(), Reporter);
				}

				void ClangDocAction::EndSourceFileAction() {
				JonasToth (Done): `llvm::make_unique` is cleaner here.
				for (const auto &Filename : Reporter.GetFilesInThisTU()) {
				Reporter.AddFileSeen(Filename);
				}
				Reporter.ClearFilesInThisTU();
				jakehehrlich (Done): I haven't looked enough at the reporter code yet, but it seems to me this should be a unique pointer. You seem to already be aware of that based on a TODO I saw in the reporter code though. Is it possible that "parseFullComment" should just take a plain old pointer instead of a unique_ptr or shared_ptr?
				}

				} // namespace doc
				} // namespace clang
				jakehehrlich (Done): I think you want to return S here so that the move constructor is used instead. str() returns a reference to S which will cause the copy constructor to be called. I think most std::string implementations have a copy on write optimization but it's strictly more ideal to use the move constructor.
				jakehehrlich (Done): I think it's kind of annoying that this can't be a const method because of these mangle calls. I don't really understand why MangleContext works the way that it does but it could be that this is a situation where the "mutable" keyword should be used on MC to allow what should be a const method to actually be const. That might be something to look into.
				jakehehrlich (Done): Can you add a comment documenting what this function does?
				jakehehrlich (Done): Pro Tip: Always explicitly refer to this as "llvm::make_unique" because you'll have to revert this change if you don't. Some of the build bots have C++14 headers instead of C++11 headers. This means that llvm::make_unique and std::make_unique will both be defined. This means that using "make_unique" will cause an error even though only llvm::make_unique can be referred to unqualified. So even if you're inside of the llvm namespace you should explicitly refer to "llvm::make_unique" and never use "make_unique".
				juliehockett (Author, Done): Oh interesting -- thanks for the tip!
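
The `mutable` suggestion in the comment about the mangle calls can be sketched as follows. This is an illustration under stated assumptions, not the patch's code: `ToyMangler`, `ToySerializer`, and `mutableDemo` are hypothetical stand-ins, the "mangling" is a made-up length prefix, and the assumption is that the helper's API is non-const only because of internal bookkeeping.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a helper whose API is non-const because it
// keeps internal scratch state. Not clang's MangleContext.
class ToyMangler {
public:
  std::string mangle(const std::string &Name) {
    ++Calls; // internal bookkeeping forces a non-const method
    return "M" + std::to_string(Name.size()) + Name;
  }
  unsigned calls() const { return Calls; }

private:
  unsigned Calls = 0;
};

class ToySerializer {
public:
  // Logically const: callers see no change in the serializer's state.
  std::string mangledName(const std::string &Name) const {
    return MC.mangle(Name); // compiles only because MC is mutable
  }
  unsigned mangleCalls() const { return MC.calls(); }

private:
  mutable ToyMangler MC;
};

// Small self-check exercising the sketch.
inline void mutableDemo() {
  const ToySerializer S{};
  assert(S.mangledName("foo") == "M3foo");
  assert(S.mangleCalls() == 1);
}
```

The trade-off is that `mutable` hides a write behind a const interface, so it fits best when the mutated state is a cache or scratch buffer that callers cannot observe.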

tools/clang-doc/ClangDocReporter.h

This file was added.

				//===-- Doc.cpp - ClangDoc --------------------------------------- C++ --===//
				//
				sammccall (Done): nit: header is out of sync with the filename. Each header should have a high-level description of what this component is and how it fits into the system.
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANG_DOC_CLANG_DOC_REPORTER_H
				#define LLVM_CLANG_TOOLS_EXTRA_CLANG_DOC_CLANG_DOC_REPORTER_H

				#include "clang/AST/AST.h"
				#include "clang/AST/ASTConsumer.h"
				#include "clang/AST/ASTContext.h"
				#include "clang/AST/CommentVisitor.h"
				#include "clang/AST/RecursiveASTVisitor.h"
				#include "clang/Frontend/ASTConsumers.h"
				#include "clang/Frontend/FrontendActions.h"
				#include "clang/Tooling/Tooling.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/Support/raw_ostream.h"
				#include <set>
				#include <string>
				#include <vector>

				using namespace clang::comments;

				namespace clang {
				namespace doc {

				enum class OutFormat {YAML, LLVM};

				struct StringPair {
				sammccall (Done): document :-) I'm not sure "LLVM" is a suitable name for the bitcode format. It probably makes sense just to name this as being "clang-doc's binary format", maybe commenting that it's related to LLVM bitcode. It's really an implementation detail: these files won't be (I think?) interoperable with any other tools that process bitcode.
				std::string Key;
				std::string Value;
				JDevlieghere (Done): Do you still need this?
				juliehockett (Author, Not Done): Yes, it's used to serialize the map to YAML.
				};

				struct CommentInfo {
				Athosvk (Not Done): Storing the type information seems more suitable than storing just the name and type as a string. In my view, the frontend creates a format suitable for (almost) any backend to use without further parsing. This would for example require me to parse part of the name to get the namespace.
				std::string Kind;
				std::string Text;
				std::string Name;
				Athosvk (Not Done): You might want to separate this out to a FieldType/MemberType or something alike, as only class members will have this set, while you also use this for parameters/return types etc. I know there's AS_NONE, but it seems a little wasteful considering the number of instances that will not have this set.
				std::string Direction;
				std::string ParamName;
				std::string CloseName;
				bool SelfClosing = false;
				JDevlieghere (Done): Would this be a good use case for `llvm::SmallVector`?
				bool Explicit = false;
				JDevlieghere (Done): Why not use `llvm::StringMap` here? I'm guessing you went with a `StringPair` because of the YAML serialization, but that should work with StringMap.
				llvm::StringMap<std::string> Attrs;
				llvm::SmallVector<std::string, 8> Args;
				jakehehrlich (Done): There are a lot of std::string members here; do we know for sure that we need CommentInfo to own all of these? My general strategy is to avoid owning data (e.g. use StringRef and ArrayRef) unless there's a good reason to own the data. This is a general question I have about all FooInfo structs.
				juliehockett (Author, Done): So the issue here is that most of this data is owned by the various Decl things, which will go out of scope before the data is serialized. That said, I think this won't be a problem at all once I refactor the intermediate output for mapping and reducing instead of doing it in memory.
				llvm::SmallVector<int, 8> Position;
				std::vector<CommentInfo> Children;
				};
				JonasToth (Done): A short `children()` method returning `llvm::make_range` here would shorten the code in a later loop and might help future iterations over children.
				juliehockett (Author, Done): Is there a reason you wouldn't be able to just use `for (const CommentInfo &c : CI.Children)`? The later loop I believe you're referencing doesn't loop over this struct, it looks at the children of a `comments::Comment` type.
				JonasToth (Done): Yes of course. But you cannot enforce that the user of `Children` will not modify it. This is a struct with seemingly no constraints on its values, meaning that would be OK. If some values belong together and must be kept consistent, a `const` method returning a `const&` to `Children` might be better. Modifying `Children` can then be done via methods that ensure consistency between the values.

				// TODO: collect declarations of the same object, comment is preferentially:
				// 1) docstring on definition, 2) combined docstring from non-def decls, or
				// 3) comment on definition, 4) no comment.
				struct DeclInfo {
				const Decl *D;
				std::string QualifiedName;
				CommentInfo Comment;
				};

				struct FileRecord {
				std::string Filename;
				std::vector<DeclInfo> Decls;
				std::vector<CommentInfo> UnattachedComments;
				};

				class ClangDocReporter : public ConstCommentVisitor<ClangDocReporter> {
				public:
				ClangDocReporter(const std::vector<std::string> &SourcePathList);

				void AddComment(StringRef Filename, CommentInfo &CI);
				void AddDecl(StringRef Filename, DeclInfo &D);
				void AddFile(StringRef Filename);
				Athosvk (Done): Nitpick, but this should probably be 'IsDefined'.
				void AddFileInTU(StringRef Filename) { FilesInThisTU.insert(Filename); }
				void AddFileSeen(StringRef Filename) { FilesSeen.insert(Filename); }
				void ClearFilesInThisTU() { FilesInThisTU.clear(); }

				void visitTextComment(const TextComment *C);
				void visitInlineCommandComment(const InlineCommandComment *C);
				void visitHTMLStartTagComment(const HTMLStartTagComment *C);
				void visitHTMLEndTagComment(const HTMLEndTagComment *C);
				void visitBlockCommandComment(const BlockCommandComment *C);
				void visitParamCommandComment(const ParamCommandComment *C);
				void visitTParamCommandComment(const TParamCommandComment *C);
				void visitVerbatimBlockComment(const VerbatimBlockComment *C);
				Athosvk (Done): Seems common to almost all Info structs, so you can probably move it to the/some base. Namespaces do seem unrelated, so maybe you can make another struct in between? E.g. something like `struct SymbolInfo : Info` which contains a field for DefinitionFile and Locations (since that may not be used for namespaces either). Additionally, what will you do when you merge this output information from multiple compilation units back together? Only one should have the DefinitionFile set as the other compilation units won't see the definition. What happens if a function stays undefined? Can you generate documentation for it?
				juliehockett (Author, Not Done): Some of this is addressed in the mapper refactoring, but the thought about separating out the namespace with another layer is a good one. Let me know what you think of the update!
				void visitVerbatimBlockLineComment(const VerbatimBlockLineComment *C);
				Athosvk (Done): This should probably be a NamedType like the parameters.
				void visitVerbatimLineComment(const VerbatimLineComment *C);
				Athosvk (Done): Perhaps you could already attach the parameter comments to this? So something like a `struct ParamInfo { NamedType Type; std::string/CommentInfo Description/CommentInfo; }`. Or are you planning to keep this to a later stage? At this point it seems like the backend will have to parse a CommentInfo struct to attach comments to parameters etc. manually.
				juliehockett (Author, Not Done): That's a good point -- it makes sense to attach a comment to any named type, since class members and whatnot can also have them. That said, it can't really be done until the declaration with the documentation comment is seen, so with the new MR framework this might be a task to push off to the reducer (which will be the next patch).

				CommentInfo ParseFullComment(comments::FullComment *Comment);
				JDevlieghere (Done): You probably want to return a const-ref here, rather than a copy.

				const std::set<std::string>& GetFilesInThisTU() const { return FilesInThisTU; }
				Athosvk (Done): Currently info stores the locations of occurrences, but this seems like a hard thing to do when it comes to namespaces. Is it useful to have this particular information?
				bool HasFile(StringRef Filename) const;
				bool HasSeenFile(StringRef Filename) const;
				void Serialize(clang::doc::OutFormat Format, llvm::raw_ostream &OS) const;

				private:
				void parseComment(CommentInfo CI, comments::Comment C);
				void serializeYAML(llvm::raw_ostream &OS) const;
				void serializeLLVM(llvm::raw_ostream &OS) const;
				const char *getCommandName(unsigned CommandID);
				bool isWhitespaceOnly(StringRef S);

				CommentInfo *CurrentCI;
				llvm::StringMap<FileRecord> FileRecords;
				std::set<std::string> FilesInThisTU;
				std::set<std::string> FilesSeen;
				};

				} // namespace doc
				} // namespace clang

				#endif // LLVM_CLANG_TOOLS_EXTRA_CLANG_DOC_CLANG_DOC_REPORTER_H
				jakehehrlich (Done): This seems like a code smell to me. I haven't read through the usage of CurrentCI well enough yet to properly conclude what to do about this but I have an idea. Do you think it's possible to have another class that includes the methods that use CurrentCI? The other alternative might be to pass this just to where it's needed as that's basically what you're doing. This all said, I haven't read most of this code yet so feel free to disregard this for right now.
				juliehockett (Author, Done): I tidied this up a little bit by breaking out the CommentVisitor into its own class and returning unique_ptrs, but it's not ideal still because the class itself uses a raw pointer to actually build the comment. The issue is that the `visit*Comment` methods are all called from the base `ConstCommentVisitor` traverse method, and so the data structure can't be passed around there...
				jakehehrlich (Done): I think you should use explicit template specialization to make these "createFooInfo" methods uniform. This will enable other code that calls these methods to be written in a more uniform fashion. So define something like `template<class T> void createInfo(const T D, const FullComment C, ...);` and then define various specializations of that member function instead of creating a new method for each createFooInfo method.
				jakehehrlich (Done): Sans the BasicInfo one, I think you should use the same specialization trick here. After you do that the main difference between createInfo methods will be what collection they add to. That suggests to me that the collection the info is added to should be made a parameter to a method that does all the actual work.
				sammccall (Done): What's the relationship between ClangDocReporter, the classes in ClangDoc.h and the intermediate format? If the reporter is fixed-function and always writes the intermediate format, then it seems something of a confusing name - I'd have expected "Reporter" to be an abstract set of callbacks e.g. for producing different final output formats. In that case, it seems like even injecting (into ClangDoc.h classes) the thing that builds the intermediate format is overkill - why not just inject a sink for the structs that get produced?
				jakehehrlich (Done): If you add a public "using DeclType = FooDecl;" to each "FooInfo" you can eliminate the second template argument and make the intent of this code more clear. This also formalizes the connection these types have to each other.
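
The explicit-specialization and `using DeclType` suggestions above can be combined in a small sketch. Everything here is an assumption made for illustration: `ToyFunctionDecl` and `ToyEnumDecl` stand in for clang declaration types, the fields of `FunctionInfo` and `EnumInfo` are invented, and the template is keyed on the Info type (rather than on the declaration type, as in the reviewer's one-line signature) so that the `DeclType` alias can supply the parameter type.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for clang declaration types.
struct ToyFunctionDecl { std::string Name; unsigned NumParams; };
struct ToyEnumDecl { std::string Name; std::vector<std::string> Members; };

// Each Info struct names the declaration type it is built from.
struct FunctionInfo {
  using DeclType = ToyFunctionDecl;
  std::string Name;
  unsigned NumParams = 0;
};
struct EnumInfo {
  using DeclType = ToyEnumDecl;
  std::string Name;
  std::vector<std::string> Members;
};

// Primary template: declared only; each Info type gets a specialization.
template <typename InfoT> InfoT createInfo(const typename InfoT::DeclType &D);

template <>
inline FunctionInfo createInfo<FunctionInfo>(const ToyFunctionDecl &D) {
  FunctionInfo I;
  I.Name = D.Name;
  I.NumParams = D.NumParams;
  return I;
}

template <> inline EnumInfo createInfo<EnumInfo>(const ToyEnumDecl &D) {
  EnumInfo I;
  I.Name = D.Name;
  I.Members = D.Members;
  return I;
}

// Shared driver: the destination collection is a parameter, so callers look
// the same for every Info type.
template <typename InfoT>
void addInfo(std::vector<InfoT> &Out, const typename InfoT::DeclType &D) {
  Out.push_back(createInfo<InfoT>(D));
}

// Small self-check exercising the sketch.
inline void createInfoDemo() {
  std::vector<FunctionInfo> Functions;
  std::vector<EnumInfo> Enums;
  addInfo(Functions, ToyFunctionDecl{"f", 2});
  addInfo(Enums, ToyEnumDecl{"Color", {"Red", "Green"}});
  assert(Functions.size() == 1 && Functions[0].NumParams == 2);
  assert(Enums.size() == 1 && Enums[0].Members.size() == 2);
}
```

Because the second parameter of `addInfo` is a non-deduced context, `InfoT` is deduced from the collection alone, which is what removes the need for a second template argument.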

tools/clang-doc/ClangDocReporter.cpp

This file was added.

				//===-- Doc.cpp - ClangDoc --------------------------------------- C++ --===//
				//
				sammccall (Done): It looks like the plan for merging data across sources is to hold all information in one in-memory structure and incrementally add to it as you get information from TUs. (This should be documented somewhere!) This seems somewhat hostile to parallel processing: you're going to need to synchronize access to the structs owned by the ClangDocReporter if you want to gather from multiple TUs at once. Moreover, documenting large codebases using multiple machines in parallel seems very difficult. And obviously it assumes you can fit the generated documentation for the codebase in memory, which would be nice to avoid. Have you considered a mapreduce-like architecture, where the mapper gets AST callbacks and spits out data, and the reducer is responsible for assembling all the data together? We don't have good framework support for mapreduce in open-source clang (we do have something internally at Google, and I'd really like to better support this pattern in libtooling). Still, you can see a simple example of this pattern (albeit with a trivial reducer) in clang/tools/extra/clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp.
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#include "ClangDocReporter.h"
				#include "clang/AST/AST.h"
				#include "clang/AST/ASTConsumer.h"
				#include "clang/AST/ASTContext.h"
				#include "clang/AST/CommentVisitor.h"
				#include "clang/AST/RecursiveASTVisitor.h"
				#include "clang/Frontend/ASTConsumers.h"
				#include "clang/Frontend/CompilerInstance.h"
				#include "clang/Frontend/FrontendActions.h"
				#include "clang/Tooling/Tooling.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/Support/YAMLTraits.h"
				#include "llvm/Support/raw_ostream.h"

				using namespace clang;
				using namespace clang::tooling;
				using namespace llvm;

				LLVM_YAML_IS_SEQUENCE_VECTOR(clang::doc::DeclInfo)
				LLVM_YAML_IS_SEQUENCE_VECTOR(clang::doc::CommentInfo)
				LLVM_YAML_IS_SEQUENCE_VECTOR(clang::doc::StringPair)

				namespace llvm {
				namespace yaml {

				template <> struct MappingTraits<clang::doc::StringPair> {
				static void mapping(IO &IO, clang::doc::StringPair &Pair) {
				IO.mapRequired("Key", Pair.Key);
				IO.mapRequired("Value", Pair.Value);
				}
				};

				jakehehrlich (Done): nit: Can this just do one lookup? `auto F = llvm::make_unique<File>(); F->Filename = Filename; Docs.Files[Filename] = std::move(Filename);`
				template <> struct MappingTraits<clang::doc::FileRecord> {
				static void mapping(IO &IO, clang::doc::FileRecord &Info) {
				IO.mapRequired("Filename", Info.Filename);
				jakehehrlich (Done): Instead of inserting a pair can we just use '[]' syntax?
				IO.mapRequired("Decls", Info.Decls);
				IO.mapRequired("UnattachedComments", Info.UnattachedComments);
				}
				};

				template <> struct MappingTraits<clang::doc::DeclInfo> {
				static void mapping(IO &IO, clang::doc::DeclInfo &Info) {
				IO.mapRequired("Name", Info.QualifiedName);
				IO.mapRequired("Comment", Info.Comment);
				}
				};
				jakehehrlich (Done): There's no need for I here; also use llvm::make_unique.

				jakehehrlich (Done): If you make a "populateNamespaceInfo" method that just calls populateBasicInfo but checks to see that the namespace hasn't already been added you can move this outside of this if statement which will make it more uniform with the other invocations. Also if you then specialize a general form of "populateInfo", include a specialization for NamespaceInfo, and do a few more things in other methods I think most of these methods become identical (the Function stuff is still different).
				template <> struct MappingTraits<clang::doc::CommentInfo> {

				struct NormalizedStringMap {
				NormalizedStringMap(IO &) {}
				NormalizedStringMap(IO &, const llvm::StringMap<std::string> &Map) {
				for (const auto &Entry : Map) {
				clang::doc::StringPair Pair{Entry.getKeyData(), Entry.getValue()};
				VectorMap.push_back(Pair);
				}
				}

				llvm::StringMap<std::string> denormalize(IO &) {
				llvm::StringMap<std::string> Map;
				for (const auto &Pair : VectorMap)
				Map[Pair.Key] = Pair.Value;
				jakehehrlich (Done): ditto
				return Map;
				}

				std::vector<clang::doc::StringPair> VectorMap;
				jakehehrlich (Done): Is this a check that populateTypeInfo could do instead? Or do we sometimes want to call populateTypeInfo on non-definitions?
				};

				static void mapping(IO &IO, clang::doc::CommentInfo &Info) {
				MappingNormalization<NormalizedStringMap, llvm::StringMap<std::string>> keys(IO, Info.Attrs);

				IO.mapRequired("Kind", Info.Kind);
				if (!Info.Text.empty())
				IO.mapOptional("Text", Info.Text);
				if (!Info.Name.empty())
				IO.mapOptional("Name", Info.Name);
				if (!Info.Direction.empty())
				IO.mapOptional("Direction", Info.Direction);
				if (!Info.ParamName.empty())
				jakehehrlich (Done): ditto
				IO.mapOptional("ParamName", Info.ParamName);
				if (!Info.CloseName.empty())
				IO.mapOptional("CloseName", Info.CloseName);
				if (Info.SelfClosing)
				JonasToth (Done): That loop will copy the string in every iteration. Is this intended? Same with the function: It will copy the whole list. I think at least one of both copies is not necessary.
				jakehehrlich (Done): Same comment here as I had on createTypeInfo/populateTypeInfo.
				IO.mapOptional("SelfClosing", Info.SelfClosing);
				if (Info.Explicit)
				IO.mapOptional("Explicit", Info.Explicit);
				if (Info.Args.size() > 0)
				IO.mapOptional("Args", Info.Args);
				JonasToth: That move will move from the reference. That has the effect that the call site cannot use `CI` anymore because it's moved from (same in `AddDecl`). Is this intended? Even if it is, that might be very subtle. Taking by value might be a better option here.
				if (Info.Attrs.size() > 0)
				IO.mapOptional("Attrs", keys->VectorMap);
				if (Info.Position.size() > 0)
				IO.mapOptional("Position", Info.Position);
				if (Info.Children.size() > 0)
				IO.mapOptional("Children", Info.Children);
				}
				};

				} // end namespace yaml
				JonasToth: `std::make_pair(FileName, std::move(FI))` would be shorter.
				JDevlieghere: Also, I don't think there's a point in moving StringRefs as they are intended to be lightweight already.
				} // end namespace llvm

				namespace clang {
				namespace doc {

				ClangDocReporter::ClangDocReporter(const std::vector<std::string> &SourcePathList) {
				for (const std::string &Path : SourcePathList)
				AddFile(Path);
				jakehehrlich: No need for I, and use llvm::make_unique.
				}

				void ClangDocReporter::AddComment(StringRef Filename, CommentInfo &CI) {
				FileRecords[Filename].UnattachedComments.push_back(CI);
				jakehehrlich: Is this something that can go inside populateFunctionInfo?
				}

				void ClangDocReporter::AddDecl(StringRef Filename, DeclInfo &DI) {
				FileRecords[Filename].Decls.push_back(DI);
				}

				void ClangDocReporter::AddFile(StringRef Filename) {
				FileRecord FI;
				FI.Filename = Filename;
				jakehehrlich: As I'm looking through these methods more, I'm thinking you might want to break each of these createInfo methods up into smaller parts. For instance, the addLocation/addComment part is the same in every one of these: they all extract some name from a decl, they all use that string to get an iterator to the needed item, they all check whether that iterator is the end and then add the item to the container, etc. There's a lot more opportunity for deduplication if you break these things up some.
				FileRecords.insert(std::make_pair(Filename, FI));
				}

				CommentInfo ClangDocReporter::ParseFullComment(comments::FullComment *Comment) {
				CommentInfo CI;
				parseComment(&CI, Comment);
				return CI;
				}
				jakehehrlich: nit: could you rewrite this with a single lookup?

				JDevlieghere: Why not `CurrentCI->SelfClosing = C->isSelfClosing()`?
				void ClangDocReporter::visitTextComment(const TextComment *C) {
				if (!isWhitespaceOnly(C->getText()))
				CurrentCI->Text = C->getText();
				}

				void ClangDocReporter::visitInlineCommandComment(
				jakehehrlich: ditto again
				const InlineCommandComment *C) {
				CurrentCI->Name = getCommandName(C->getCommandID());
				for (unsigned i = 0, e = C->getNumArgs(); i != e; ++i)
				CurrentCI->Args.push_back(C->getArgText(i));
				}

				void ClangDocReporter::visitHTMLStartTagComment(const HTMLStartTagComment *C) {
				CurrentCI->Name = C->getTagName();
				CurrentCI->SelfClosing = C->isSelfClosing();
				if (C->getNumAttrs() != 0) {
				for (unsigned i = 0, e = C->getNumAttrs(); i != e; ++i) {
				const HTMLStartTagComment::Attribute &Attr = C->getAttr(i);
				JDevlieghere: Same here.
				CurrentCI->Attrs.insert(std::make_pair(Attr.Name, Attr.Value));
				JonasToth: The condition for this if might trigger unexpected behaviour if `getNumAttrs` returns negative values. To be safe you could use `> 0` instead of `!= 0`.
				jakehehrlich: Do we need this method?
				}
				}
				JonasToth: minor nit: the loop condition is correct! When reading really fast, one might overlook the if above and wonder if `i < e` might be better. But this is opinionated and just a suggestion.
				}

				void ClangDocReporter::visitHTMLEndTagComment(const HTMLEndTagComment *C) {
				CurrentCI->Name = C->getTagName();
				CurrentCI->SelfClosing = true;
				}

				void ClangDocReporter::visitBlockCommandComment(const BlockCommandComment *C) {
				CurrentCI->Name = getCommandName(C->getCommandID());
				for (unsigned i = 0, e = C->getNumArgs(); i != e; ++i)
				CurrentCI->Args.push_back(C->getArgText(i));
				}

				JonasToth: Similar here. If `e` is negative you will execute the loop until `i` wraps around. Not sure if this is a real issue, depending on the postcondition of getNumArgs.
				void ClangDocReporter::visitParamCommandComment(const ParamCommandComment *C) {
				JonasToth: Now I have a question :) The condition `i > e` seems odd to me. `i == 0` in the first iteration and I expect `e > 0`, so this loop should never execute, or did I overlook something? Same below.
				juliehockett: Oops my bad -- you're right. Same above/below.
				CurrentCI->Direction =
				ParamCommandComment::getDirectionAsString(C->getDirection());
				CurrentCI->Explicit = C->isDirectionExplicit();
				if (C->hasParamName() && C->isParamIndexValid())
				CurrentCI->ParamName = C->getParamNameAsWritten();
				}

				jakehehrlich: Can we use emplace_back here instead of copying a NamedType?
				void ClangDocReporter::visitTParamCommandComment(
				const TParamCommandComment *C) {
				if (C->hasParamName() && C->isPositionValid())
				CurrentCI->ParamName = C->getParamNameAsWritten();

				if (C->isPositionValid()) {
				for (unsigned i = 0, e = C->getDepth(); i != e; ++i)
				CurrentCI->Position.push_back(C->getIndex(i));
				}
				}
				JonasToth: Similar thing.

				jakehehrlich: If you use emplace_back here you don't need the explicit std::move.
				void ClangDocReporter::visitVerbatimBlockComment(
				const VerbatimBlockComment *C) {
				CurrentCI->Name = getCommandName(C->getCommandID());
				CurrentCI->CloseName = C->getCloseName();
				}

				void ClangDocReporter::visitVerbatimBlockLineComment(
				const VerbatimBlockLineComment *C) {
				if (!isWhitespaceOnly(C->getText()))
				CurrentCI->Text = C->getText();
				}

				void ClangDocReporter::visitVerbatimLineComment(const VerbatimLineComment *C) {
				if (!isWhitespaceOnly(C->getText()))
				CurrentCI->Text = C->getText();
				}
				JonasToth: `comments::Comment` could get a `childs()` method returning a view to iterate with nice range-based loops.
				JDevlieghere: Yup, you can use `llvm::make_range` for this.

				bool ClangDocReporter::HasFile(StringRef Filename) const {
				return FileRecords.find(Filename) != FileRecords.end();
				}

				bool ClangDocReporter::HasSeenFile(StringRef Filename) const {
				return FilesSeen.find(Filename) != FilesSeen.end();
				}

				void ClangDocReporter::Serialize(clang::doc::OutFormat Format,
				llvm::raw_ostream &OS) const {
				Format == clang::doc::OutFormat::LLVM ? serializeLLVM(OS) : serializeYAML(OS);
				}

				void ClangDocReporter::parseComment(CommentInfo *CI, comments::Comment *C) {
				CurrentCI = CI;
				CI->Kind = C->getCommentKindName();
				ConstCommentVisitor<ClangDocReporter>::visit(C);
				for (comments::Comment *Child : llvm::make_range(C->child_begin(), C->child_end())) {
				CommentInfo ChildCI;
				parseComment(&ChildCI, Child);
				CI->Children.push_back(ChildCI);
				}
				JonasToth: Extract range into utility method of `Comment`.
				juliehockett: See above -- `comments::Comment` is a clang type that stores all the information about a particular piece of a comment -- the `CommentInfo` struct is specific to the clang-doc setup. Is that what you're thinking about?
				JonasToth: I was just thinking that creating this range here is clumsy. But if there is no way to fix that easily you can leave it as is. This means the suggested `Children` thing above is a non-issue, is it? If yes, just ignore my comments then :)
				}

				void ClangDocReporter::serializeYAML(llvm::raw_ostream &OS) const {
				yaml::Output Output(OS);
				for (const auto &F : FileRecords) {
				FileRecord NonConstValue = F.second;
				Output << NonConstValue;
				}
				JonasToth: This will copy `S`. Using a `const&` removes the potential allocation.
				}

				void ClangDocReporter::serializeLLVM(llvm::raw_ostream &OS) const {
				// TODO: Implement.
				OS << "Not yet implemented.\n";
				}

				const char *ClangDocReporter::getCommandName(unsigned CommandID) {
				;
				const CommandInfo *Info = CommandTraits::getBuiltinCommandInfo(CommandID);
				if (Info)
				return Info->Name;
				// TODO: Add parsing for \file command.
				JonasToth: Empty statement.
				return "<not a builtin command>";
				}

				bool ClangDocReporter::isWhitespaceOnly(StringRef S) {
				return S.find_first_not_of(" \t\n\v\f\r") == std::string::npos || S.empty();
				}

				} // namespace doc
				} // namespace clang
				jakehehrlich: So it looks like the reason you need CurrentCI is because the visit methods need it and you need different CIs to be used at different visit calls, but the visit methods can't take any more parameters. I think you should put the visit methods in another class that takes a pointer to a CommentInfo as an argument to the constructor. I think that should clean up this code smell and help mitigate the use of shared_ptr everywhere.
				jakehehrlich: Can these be const auto&?
				jakehehrlich: ditto on these
				jakehehrlich: Do we need to explicitly pass these types? I think template argument deduction should fill this in for us.
				jakehehrlich: ditto on explicit template argument.
				jakehehrlich: Can you factor this out into a function and dedup the code below?
				jakehehrlich: Can you report an error to the user that says something along the lines of "not implemented yet" (leave the TODO as well)?
				jakehehrlich: I think it would be better if instead of returning a string, you just fail and print a message to the user (well, first print the message and then fail).
				jakehehrlich: I think my personal preference (not a universally followed preference) is to use a StringRef because at some point you might very well need a StringRef.
				jakehehrlich: Rather than rolling this on your own you should use S.find_if_not(std::isspace). Also, if S is empty then find_* functions should always return npos, right? If that's the case then you don't need S.empty().
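The last comment above makes two points about `isWhitespaceOnly`: a predicate-based search can replace the hand-written character list, and the extra `S.empty()` test is redundant. A minimal sketch of both, assuming `std::string` in place of `llvm::StringRef` so that it compiles without LLVM headers (the function names here are hypothetical stand-ins, not the code in the diff):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical stand-in for ClangDocReporter::isWhitespaceOnly, written
// against std::string instead of llvm::StringRef.
bool isWhitespaceOnly(const std::string &S) {
  // std::all_of returns true for an empty range, so an empty string counts
  // as whitespace-only without a separate S.empty() check.
  return std::all_of(S.begin(), S.end(), [](unsigned char C) {
    // Taking the character as unsigned char avoids undefined behaviour in
    // std::isspace for negative char values.
    return std::isspace(C) != 0;
  });
}

// The character-list formulation from the diff, minus the redundant test:
// find_first_not_of already returns npos for an empty string.
bool isWhitespaceOnlyCharList(const std::string &S) {
  return S.find_first_not_of(" \t\n\v\f\r") == std::string::npos;
}
```

Both versions agree on the six whitespace characters listed in the diff when the default "C" locale is in effect.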
				jakehehrlich: You use the same basic code 3 times for different file names. Can you factor that out into a function? Also, in this block you output the OutErrorInfo message but in the blocks below you don't. You should always output that message.
				jakehehrlich: Instead of assigning a CI like this, could you construct a new ClangDocCommentVisitor on the stack? The idea would be that you would still have a "CI" member variable that would be set in the ClangDocCommentVisitor's constructor. That way it never has to change and each visitor is just responsible for constructing one CommentInfo.
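The suggestion to bind each visitor to a single `CommentInfo` at construction time, instead of reassigning a shared `CurrentCI` member, can be sketched roughly as follows. This is an illustration only: `Comment`, `CommentInfo`, `CommentVisitor`, and `parseFullComment` are simplified, hypothetical stand-ins for the clang and clang-doc types, and the sketch binds by reference where the comment proposes a pointer.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for clang's comments::Comment.
struct Comment {
  std::string Kind;
  std::string Text;
  std::vector<Comment> Children;
};

// Hypothetical, simplified stand-in for clang-doc's CommentInfo.
struct CommentInfo {
  std::string Kind;
  std::string Text;
  std::vector<CommentInfo> Children;
};

// Each visitor fills in exactly one CommentInfo, fixed at construction.
class CommentVisitor {
public:
  explicit CommentVisitor(CommentInfo &CI) : CI(CI) {}

  void parse(const Comment &C) {
    CI.Kind = C.Kind;
    CI.Text = C.Text;
    for (const Comment &Child : C.Children) {
      // Create the child record in place, then hand it to a fresh visitor
      // that lives on the stack for the duration of the recursive call.
      CI.Children.emplace_back();
      CommentVisitor ChildVisitor(CI.Children.back());
      ChildVisitor.parse(Child);
    }
  }

private:
  CommentInfo &CI; // Never rebound, so there is no shared mutable state.
};

// Entry point analogous in spirit to ParseFullComment in the diff.
CommentInfo parseFullComment(const Comment &Root) {
  CommentInfo CI;
  CommentVisitor Visitor(CI);
  Visitor.parse(Root);
  return CI;
}
```

Because the target record never changes during a visitor's lifetime, the recursion needs neither a mutable "current" pointer nor shared ownership of the records.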

tools/clang-doc/tool/CMakeLists.txt

This file was added.

				include_directories(${CMAKE_CURRENT_SOURCE_DIR}/..)

				add_clang_executable(clang-doc
				ClangDocMain.cpp
				)

				target_link_libraries(clang-doc
				clangAST
				clangASTMatchers
				clangBasic
				clangFormat
				clangFrontend
				clangDoc
				clangRewrite
				clangTooling
				clangToolingCore
				)

				No newline at end of file

tools/clang-doc/tool/ClangDocMain.cpp

This file was added.

				//===-- ClangDocMain.cpp - Clangdoc -----------------------------*- C++ -*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#include "ClangDoc.h"
				#include "clang/Driver/Options.h"
				#include "clang/Frontend/FrontendActions.h"
				#include "clang/Tooling/CommonOptionsParser.h"
				#include "clang/Tooling/Tooling.h"
				#include "llvm/Support/Process.h"
				#include "llvm/Support/Signals.h"
				#include <string>

				using namespace clang;
				using namespace llvm;

				namespace {

				cl::OptionCategory ClangDocCategory("clang-doc options");

				cl::opt<bool>
				EmitLLVM("emit-llvm",
				cl::desc("Output in LLVM bitstream format (default is YAML)."),
				cl::init(false), cl::cat(ClangDocCategory));

				cl::opt<bool>
				DoxygenOnly("doxygen",
				cl::desc("Use only doxygen-style comments to generate docs."),
				cl::init(false), cl::cat(ClangDocCategory));

				} // namespace

				int main(int argc, const char **argv) {
				llvm::sys::PrintStackTraceOnErrorSignal(argv[0]);
				tooling::CommonOptionsParser OptionsParser(argc, argv, ClangDocCategory);

				clang::doc::OutFormat EmitFormat;
				EmitLLVM ? EmitFormat = clang::doc::OutFormat::LLVM :
				JonasToth: The two lines could be merged when initializing `EmitFormat` directly.
				EmitFormat = clang::doc::OutFormat::YAML;
				JDevlieghere: I'm curious if there's a particular reason that you seem to prefer `EmitLLVM ? EmitFormat = clang::doc::OutFormat::LLVM : EmitFormat = clang::doc::OutFormat::YAML;` over `EmitFormat = EmitLLVM ? clang::doc::OutFormat::LLVM : clang::doc::OutFormat::YAML;`

				// TODO: Update the source path list to only consider changed files for
				// incremental doc updates
				doc::ClangDocReporter Reporter(OptionsParser.getSourcePathList());
				JonasToth: Missing full stop. Comments are supposed to be full sentences by convention.
				doc::ClangDocContext Context{EmitFormat};

				tooling::ClangTool Tool(OptionsParser.getCompilations(),
				OptionsParser.getSourcePathList());

				if (!DoxygenOnly)
				Tool.appendArgumentsAdjuster(tooling::getInsertArgumentAdjuster(
				"-fparse-all-comments", tooling::ArgumentInsertPosition::BEGIN));

				doc::ClangDocActionFactory Factory(Context, Reporter);

				llvm::outs() << "Parsing codebase...\n";
				int Status = Tool.run(&Factory);
				if (Status)
				return Status;

				llvm::outs() << "Writing docs...\n";
				Reporter.Serialize(EmitFormat, llvm::outs());

				return 0;
				}
				JonasToth: Final `return 0;` for main is missing.
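The two review comments on `EmitFormat` in `main` above both point toward initializing the variable directly from the conditional expression rather than assigning inside each branch. A small sketch of that form, using a locally defined `OutFormat` enum and a hypothetical `selectFormat` helper in place of the `clang::doc` declarations and the `EmitLLVM` option:

```cpp
// Local stand-in for clang::doc::OutFormat (an assumption for this sketch).
enum class OutFormat { YAML, LLVM };

// Hypothetical helper mirroring the format choice made in main().
OutFormat selectFormat(bool EmitLLVM) {
  // Declaration and initialization happen in one statement, so the variable
  // is never observable in an uninitialized state and can be const.
  const OutFormat EmitFormat = EmitLLVM ? OutFormat::LLVM : OutFormat::YAML;
  return EmitFormat;
}
```

This keeps the conditional operator as a value-producing expression, which is its usual role, instead of using it as a substitute for an if/else statement.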
				jakehehrlich: Can you convert this error_code to a message and display that to the user?
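For the request to turn an `error_code` into a user-facing message, `std::error_code::message()` returns a human-readable description of the error. The code that produced the error is not visible in this version of the diff, so the following is only a sketch under assumptions: `describeError` and its `Path` parameter are hypothetical names, and the message wording is illustrative.

```cpp
#include <string>
#include <system_error>

// Hypothetical helper: builds a message suitable for printing to the user
// before the tool exits with a non-zero status.
std::string describeError(const std::string &Path, const std::error_code &EC) {
  // EC.message() yields a description supplied by the error's category.
  return "Error opening '" + Path + "': " + EC.message();
}
```

In the tool itself the resulting string would presumably be written to `llvm::errs()` before returning a failure status; that call site is an assumption and is not shown here.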

Revision Contents

Diff 126645
