This is an archive of the discontinued LLVM Phabricator instance.

Differential D41102

Setup clang-doc frontend framework
ClosedPublic

Authored by juliehockett on Dec 11 2017, 5:36 PM.

Details

Reviewers

klimek
jakehehrlich
sammccall
lebedev.ri

Summary

Setting up the mapper part of the frontend framework for a clang-doc tool. It creates a series of relevant matchers for declarations, and uses the ToolExecutor to traverse the AST and extract the matching declarations and comments. The mapper serializes the extracted information to individual records for reducing and eventually doc generation.

For a more detailed overview of the tool, see the design document on the mailing list: RFC: clang-doc proposal

Event Timeline

Refactoring bitcode writer

Next, i suggest to look into code self-debugging, see comments.
Also, i have added a few questions, it would be great to know that my understanding is correct?

I'm sorry that it seems like we are going over and over the same code again,
this is the very base of the tool, i think it is important to get it as close to great as possible.
I *think* these review comments move it in that direction, not in the opposite direction?

clang-doc/BitcodeWriter.cpp
47 ↗	(On Diff #135559)	So in other words this is making an assumption that no file with more than 65535 lines will be analyzed, correct? Can you add that as comment please?
56 ↗	(On Diff #135559)	AbbrevDsc Abbrev = nullptr;
57 ↗	(On Diff #135559)
    // Is this 'description' valid?
    operator bool() const {
      return Abbrev != nullptr && Name.data() != nullptr && !Name.empty();
    }
137 ↗	(On Diff #135559)	So `FUNCTION_MANGLED_NAME` is phased out, and is thus missing, as far as i understand?
148 ↗	(On Diff #135559)	+`assert(RecordIdNameMap[ID] && "Unknown Abbreviation");`
153 ↗	(On Diff #135559)	+`assert(RecordIdNameMap[ID] && "Unknown Abbreviation");`
158 ↗	(On Diff #135559)	Called only once, and that call does nothing. I'd drop it.
175 ↗	(On Diff #135559)
    /// \brief Emits a block ID and the block name to the BLOCKINFO block.
    void ClangDocBitcodeWriter::emitBlockID(BlockId ID) {
      const auto &BlockIdName = BlockIdNameMap[ID];
      assert(BlockIdName.data() && BlockIdName.size() && "Unknown BlockId!");
      Record.clear();
      Record.push_back(ID);
      Stream.EmitRecord(llvm::bitc::BLOCKINFO_CODE_SETBID, Record);
      Record.clear();
      for (const char C : BlockIdName)
        Record.push_back(C);
      Stream.EmitRecord(llvm::bitc::BLOCKINFO_CODE_BLOCKNAME, Record);
    }
187 ↗	(On Diff #135559)
    /// \brief Emits a record name to the BLOCKINFO block.
    void ClangDocBitcodeWriter::emitRecordID(RecordId ID) {
      assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
      prepRecordData(ID);
(Yes, `prepRecordData()` will have the same code. It should get optimized away.)
194 ↗	(On Diff #135559)
    void ClangDocBitcodeWriter::emitAbbrev(RecordId ID, BlockId Block) {
      assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
      auto Abbrev = std::make_shared<BitCodeAbbrev>();
204 ↗	(On Diff #135559)	So remember that in a previous iteration, seemingly useless `AbbrevDsc` stuff was added to the `RecordIdNameMap`? It is going to pay off now:
    void ClangDocBitcodeWriter::emitRecord(StringRef Str, RecordId ID) {
      assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
      assert(RecordIdNameMap[ID].Abbrev == &StringAbbrev && "Abbrev type mismatch");
      if (!prepRecordData(ID, !Str.empty()))
        return;
      ...
And if we did not add a `RecordIdNameMap` entry for this `RecordId`, then i believe that will also be detected because `Abbrev` will be a `nullptr`.
205 ↗	(On Diff #135559)
    assert(Str.size() < (1U << BitCodeConstants::StringLengthSize));
    Record.push_back(Str.size());
210 ↗	(On Diff #135559)
    void ClangDocBitcodeWriter::emitRecord(const Location &Loc, RecordId ID) {
      assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
      assert(RecordIdNameMap[ID].Abbrev == &LocationAbbrev && "Abbrev type mismatch");
      if (!prepRecordData(ID, !OmitFilenames))
        return;
      ...
211 ↗	(On Diff #135559)	Call me paranoid, but:
    assert(Loc.LineNumber < (1U << BitCodeConstants::LineNumberSize));
    Record.push_back(Loc.LineNumber);
    assert(Loc.Filename.size() < (1U << BitCodeConstants::StringLengthSize));
    Record.push_back(Loc.Filename.size());
217 ↗	(On Diff #135559)
    void ClangDocBitcodeWriter::emitRecord(int Val, RecordId ID) {
      assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
      assert(RecordIdNameMap[ID].Abbrev == &IntAbbrev && "Abbrev type mismatch");
      if (!prepRecordData(ID, Val))
        return;
218 ↗	(On Diff #135559)
    assert(Val < (1U << BitCodeConstants::IntSize));
    Record.push_back(Val);
222 ↗	(On Diff #135559)
    bool ClangDocBitcodeWriter::prepRecordData(RecordId ID, bool ShouldEmit) {
      assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
      if (!ShouldEmit)
        return false;
232 ↗	(On Diff #135559)	Since `ClangDocBitcodeWriter` is not re-used, but re-constructed each time, `Abbrevs.clear();` does nothing. Hmm, i wonder if that will be a bad thing. Benchmarking will tell i guess :/
236 ↗	(On Diff #135559)	https://godbolt.org/g/rD6BWK also suggests it should be `static const`
276 ↗	(On Diff #135559)	Uhm, do you plan on calling `emitBlockInfo()` from anywhere else other than `emitBlockInfoBlock()`? Since it takes `const std::vector<RecordId>` instead of a `const std::initializer_list<RecordId>&`, a memory copy will happen... https://godbolt.org/g/rD6BWK
clang-doc/BitcodeWriter.h
35 ↗	(On Diff #135559)	`LineNumFixedSize` is used for several different things. Given such a specific name, i think it may be confusing? Also, looking at http://llvm.org/doxygen/classllvm_1_1BitstreamWriter.html#ae6a40b4a5ea89bb8b5076c26e0d0b638 i guess these all should be `unsigned`. I think this would be better, albeit more verbose:
    struct BitCodeConstants {
      static constexpr unsigned SignatureBitSize = 8U;
      static constexpr unsigned SubblockIDSize = 5U;
      static constexpr unsigned IntSize = 16U;
      static constexpr unsigned StringLengthSize = 16U;
      static constexpr unsigned LineNumberSize = 16U;
    };
53 ↗	(On Diff #135559)	So what exactly does `BitCodeConstants::SubblockIDSize` mean? static_assert(BI_LAST < (1U << BitCodeConstants::SubblockIDSize), "Too many block id's!"); ?
94 ↗	(On Diff #135559)	So i have a question: if something (`FUNCTION_MANGLED_NAME` in this case) is phased out, does it have to stay in this enum? That will introduce holes in `RecordIdNameMap`. Are the actual numerical id's of enumerators stored in the bitcode, or the string (abbrev, `RecordIdNameMap[].Name`)? Looking at tests, i guess these enums are internal detail, and they can be changed freely, including removing enumerators. Am i wrong? I think that should be explained in a comment before this `enum`.
100 ↗	(On Diff #135559)	If `AbbreviationMap` comment makes sense, i guess that common code should be moved here, i.e. static constexpr unsigned RecordIdCount = RI_LAST - RI_FIRST + 1; and use this new variable in those two places.
163 ↗	(On Diff #135559)	We know we will have at most `RI_LAST - RI_FIRST + 1` abbreviations. Right now that results in just ~40 abbreviations. Would it make sense to AbbreviationMap() : Abbrevs(RI_LAST - RI_FIRST + 1) {} ? (or `llvm::DenseMap<unsigned, unsigned> Abbrevs = llvm::DenseMap<unsigned, unsigned>(RI_LAST - RI_FIRST + 1);` but that looks uglier to me..)
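
The validity check discussed in the comments above (an `operator bool` on each `RecordIdNameMap` entry, plus a table sized by `RecordIdCount`) can be sketched without LLVM. This is only an illustration: the shortened `RecordId` enum, the `AbbrevKind` stand-in for the `AbbrevDsc` function pointer, and the `RecordIdDsc` name are assumptions made for the example, not the patch's actual definitions.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, heavily shortened record-ID enum (the real one is longer).
enum RecordId {
  RI_FIRST = 1,
  VERSION = RI_FIRST,
  COMMENT_KIND,
  COMMENT_TEXT,
  RI_LAST = COMMENT_TEXT
};

// Shared constant, as suggested in the review, instead of repeating the math.
static constexpr unsigned RecordIdCount = RI_LAST - RI_FIRST + 1;

// Stand-in for the AbbrevDsc function pointer; None plays the role of nullptr.
enum class AbbrevKind { None, Int, String };

struct RecordIdDsc {
  RecordIdDsc() = default;
  RecordIdDsc(std::string Name, AbbrevKind Abbrev)
      : Name(std::move(Name)), Abbrev(Abbrev) {}

  // Is this 'description' valid? Unset table slots report false.
  explicit operator bool() const {
    return Abbrev != AbbrevKind::None && !Name.empty();
  }

  std::string Name;
  AbbrevKind Abbrev = AbbrevKind::None;
};

class RecordIdNameMap {
public:
  RecordIdNameMap() {
    set(VERSION, {"Version", AbbrevKind::Int});
    set(COMMENT_TEXT, {"Text", AbbrevKind::String});
    // COMMENT_KIND is deliberately left unset to show how a hole is detected.
  }

  const RecordIdDsc &operator[](RecordId ID) const {
    return Table[ID - RI_FIRST];
  }

private:
  void set(RecordId ID, RecordIdDsc D) { Table[ID - RI_FIRST] = std::move(D); }

  std::array<RecordIdDsc, RecordIdCount> Table;
};
```

With such a table, an `assert(Map[ID] && "Unknown RecordId")` at the top of each emit function would catch both a missing entry and an entry registered without an abbreviation.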

The change to USR seems like quite an improvement already! That being said, I do think that it might be preferable to opt out of the use of strings for linking things together. What we did with our clang-doc is that we directly used pointers to refer to other types. So for example, our class for storing Record/CXX related information has something like:

std::vector<Function*>	mMethods;
std::vector<Variable*>	mVariables;
std::vector<Enum*>	mEnums;
std::vector<Typedef*>	mTypedefs;

Only upon serialization we fetch some kind of USR that would uniquely identify the type. This is especially useful to us for the conversion to HTML and I think the same would go for this backend, as it seems this way you'll have to do string lookups to get to the actual types, which would be inefficient in multiple aspects. It can make the backend a little more of a one-on-one conversion, e.g. with one of our HTML template definitions (note: this is a Jinja2 template in Python):

{%- for enum in inEntry.GetMemberEnums() -%}
	<tr class="separator">
		<td class="memSeparator" colspan="3"></td>
	</tr>
	<tr class="memitem:EAllocatorStrategy">
		<td class="memItemLeft" align="right">{{- Modifiers.RenderAccessModifier(enum.GetAccessModifier()) -}}</td>
		<td class="memItemMiddle" align="left">enum <a href="{{ enum.GetID() }}.html">{{- enum.GetName().GetName()|e -}}</a></td>
		<td class="memItemRight" valign="bottom">{{- Descriptions.RenderDescription(enum.GetBriefDescription()) -}}</td>
	</tr>
{%- endfor -%}

The disadvantage is of course that you add complexity to certain parts of the deserialization (/serialization) for nested types and inheritance, by either having to do so in the correct order or having to defer the process of initializing these pointers. But see this as just some shared thoughts. I do think this would improve the interaction in the backend (assuming you use the same representation as currently in the frontend). Also, we didn't apply this to our Type representation (which we use to store the type of a member, parameter etc.), which stores the name of the type rather than a pointer to it (since it can also be a built-in), though it embeds pretty much every possible modifier on said type, like this:

EntryName			mName;									
bool				mIsConst = false;						
EReferenceType			mReferenceType = EReferenceType::None;	
std::vector<bool>		mPointerConstnessMask;					
std::vector<std::string>	mArraySizes;							
bool				mIsAtomic = false;						
std::vector<Attribute>		mAttributes;							
bool				mIsExpansion = false;					
std::vector<TemplateArgument>	mTemplateArguments;						
std::unique_ptr<FunctionTypeProperties>     mFunctionTypeProperties = nullptr;		
EntryName			mParentCXXEntry;

The last member refers to the case where a pointer is a pointer to member, though some other fields may require some explaining too. Anyway, this is just to give some insight into how we structured our representation, where we largely omitted string representations where possible.

Have you actually started work already on some backend? Developing backend and frontend in tandem can provide some additional insights as to how things should be structured, especially representation-wise!

clang-doc/Representation.h
113 ↗	(On Diff #135559)	How come these are actually unique ptrs? They can be stored directly in the vector, right? (same for CommentInfo children, FunctionInfo params etc.)

Please run Clang-format and Clang-tidy modernize.

clang-doc/Representation.h
80 ↗	(On Diff #135559)	Please separate constructors from data members with empty line.

Continued refactoring the bitcode writer
Added a USR attribute to infos
Created a Reference struct to replace the string references to other infos

In D41102#1017499, @Athosvk wrote:

The disadvantage is of course that you add complexity to certain parts of the deserialization (/serialization) for nested types and inheritance, by either having to do so in the correct order or having to defer the process of initializing these pointers. But see this as just some shared thoughts. I do think this would improve the interaction in the backend (assuming you use the same representation as currently in the frontend).

I agree that the pointer approach would be much more efficient on the backend, but the issue here is that the mapper has no idea where the representation of anything other than the decl it's currently looking at will be, since it sees each decl and serializes it immediately. The reducer, on the other hand, will be able to see everything, and so such pointers could be added as a pass over the final reduced data structure.
So, as an idea (as this diff implements), I updated the string references to be a struct, which holds the USR of the referenced type (for serialization, both here in the mapper and for the dump option in the reducer), as well as a pointer to an Info struct. This pointer is not used at this point, but would be populated by the reducer. Thoughts?
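
To make the proposal concrete, here is a minimal sketch of a reference that carries a USR for serialization and a pointer that stays null until a later pass fills it in. The `Info` and `Reference` layouts and the `resolveReferences()` helper are assumptions for illustration; they are not taken from the patch.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, minimal stand-in for an Info struct.
struct Info {
  std::string USR;
  std::string Name;
};

struct Reference {
  explicit Reference(std::string USR) : USR(std::move(USR)) {}

  std::string USR;           // Always set; this is what gets serialized.
  const Info *Ptr = nullptr; // Unset in the mapper; filled in by the reducer.
};

// Reducer-style pass: once every Info is known, resolve each reference whose
// USR is present. Returns the number of references that were resolved.
inline unsigned resolveReferences(std::vector<Reference> &Refs,
                                  const std::map<std::string, Info> &Infos) {
  unsigned Resolved = 0;
  for (Reference &R : Refs) {
    auto It = Infos.find(R.USR);
    if (It == Infos.end())
      continue; // Unknown USR: leave the pointer null.
    R.Ptr = &It->second;
    ++Resolved;
  }
  return Resolved;
}
```

The mapper only ever touches `USR`, so it can keep serializing each decl immediately; a backend that runs after the resolving pass can follow `Ptr` instead of doing string lookups.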

Have you actually started work already on some backend? Developing backend and frontend in tandem can provide some additional insights as to how things should be structured, especially representation-wise!

I added you as a subscriber on the follow-up patches (the reducer, YAML/MD formats) -- would love to hear your thoughts! As of now, the MD output is very rough, but I'm hoping to keep moving forward on that in the next few days.

clang-doc/BitcodeWriter.h
53 ↗	(On Diff #135559)	It's the current abbrev id width for the block (described here), so it's the max id width for the block's abbrevs.
94 ↗	(On Diff #135559)	Yes, the enum is an implementation detail (`FUNCTION_MANGLED_NAME` should have been removed earlier). I'll put the comment describing how it works!

Fixing CMakeLists formatting

Could you please add a bit more tests? In particular, i'd like to see how blocks-in-blocks work.
I.e. class-in-class, class-in-function, ...

Is there some logic (internal to BitstreamWriter) that would 'assert()' when trying to output a record id
that, according to the BLOCKINFO_BLOCK, should not be there?
E.g. outputting VERSION in BI_COMMENT_BLOCK_ID?

clang-doc/BitcodeWriter.cpp
30 ↗	(On Diff #135682)	Ok, these three functions still look off, how about this?
    // Yes, not by reference, https://godbolt.org/g/T52Vcj
    static void AbbrevGen(std::shared_ptr<llvm::BitCodeAbbrev> &Abbrev,
                          const std::initializer_list<llvm::BitCodeAbbrevOp> Ops) {
      for (const auto &Op : Ops)
        Abbrev->Add(Op);
    }
    static void IntAbbrev(std::shared_ptr<llvm::BitCodeAbbrev> &Abbrev) {
      AbbrevGen(Abbrev,
                {// 0. Fixed-size integer
                 {llvm::BitCodeAbbrevOp::Fixed, BitCodeConstants::IntSize}});
    }
    static void StringAbbrev(std::shared_ptr<llvm::BitCodeAbbrev> &Abbrev) {
      AbbrevGen(Abbrev,
                {// 0. Fixed-size integer (length of the following string)
                 {llvm::BitCodeAbbrevOp::Fixed, BitCodeConstants::StringLengthSize},
                 // 1. The string blob
                 {llvm::BitCodeAbbrevOp::Blob}});
    }
    // Assumes that the file will not have more than 65535 lines.
    static void LocationAbbrev(std::shared_ptr<llvm::BitCodeAbbrev> &Abbrev) {
      AbbrevGen(Abbrev,
                {// 0. Fixed-size integer (line number)
                 {llvm::BitCodeAbbrevOp::Fixed, BitCodeConstants::LineNumberSize},
                 // 1. Fixed-size integer (length of the following string (filename))
                 {llvm::BitCodeAbbrevOp::Fixed, BitCodeConstants::StringLengthSize},
                 // 2. The string blob
                 {llvm::BitCodeAbbrevOp::Blob}});
    }
Though i bet clang-format will mess up the formatting again :/
108 ↗	(On Diff #135682)	Some of these `IntAbbrev`'s are actually `bool`s. Would it make sense to already think about being bitcode-size-conservative and introduce `BoolAbbrev` from the get go?
    static void BoolAbbrev(std::shared_ptr<llvm::BitCodeAbbrev> &Abbrev) {
      AbbrevGen(Abbrev,
                {// 0. Fixed-size boolean
                 {llvm::BitCodeAbbrevOp::Fixed, BitCodeConstants::BoolSize}});
    }
where `BitCodeConstants::BoolSize` = `1U`? Or is there some internal padding that would make that pointless?
156 ↗	(On Diff #135682)	Uh, oh, i'm sorry, all(?) these `"Unknown Abbreviation"` are likely copypaste gone wrong. I'm not sure why i wrote that comment. `"Unknown RecordId"` might make more sense?
240 ↗	(On Diff #135682)	Ok, now that i think about it, it can't be that easy. Maybe
    // FIXME: assumes 8 bits per byte
    assert(llvm::APInt(8U * sizeof(Val), Val, /*isSigned=*/true).getBitWidth() <= BitCodeConstants::IntSize);
Not sure whether `getBitWidth()` is really the right function to ask though. (Not sure how this all works for negative numbers)
clang-doc/BitcodeWriter.h
172 ↗	(On Diff #135682)	Newline after constructor
216 ↗	(On Diff #135682)	`// Emission of appropriate abbreviation type`
53 ↗	(On Diff #135559)	So in other words that `static_assert()` is doing the right thing? Add it after the `enum BlockId{}` then please, will both document things, and ensure that things remain in a sane state.
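
The `AbbrevGen` pattern above (one helper that appends a list of operands, and one small function per abbreviation shape) can be tried out in isolation. The sketch below uses a hypothetical `AbbrevOp` struct in place of `llvm::BitCodeAbbrevOp` and a plain vector in place of `llvm::BitCodeAbbrev`, so it says nothing about how LLVM pads or packs fields; it only shows how a 1-bit `boolAbbrev` would sit next to the 16-bit ones.

```cpp
#include <cassert>
#include <initializer_list>
#include <vector>

// Hypothetical stand-in for llvm::BitCodeAbbrevOp: an encoding plus a width.
struct AbbrevOp {
  enum Encoding { Fixed, Blob };
  Encoding Enc;
  unsigned Width; // Only meaningful for Fixed operands.
};

// Hypothetical stand-in for llvm::BitCodeAbbrev: just the operand list.
using Abbrev = std::vector<AbbrevOp>;

// Appends every operand in Ops to the abbreviation, in order.
inline void abbrevGen(Abbrev &A, std::initializer_list<AbbrevOp> Ops) {
  for (const AbbrevOp &Op : Ops)
    A.push_back(Op);
}

// 0. Fixed-size boolean (assumed width of 1 bit).
inline void boolAbbrev(Abbrev &A) { abbrevGen(A, {{AbbrevOp::Fixed, 1U}}); }

// 0. Fixed-size length (assumed 16 bits), 1. the string blob.
inline void stringAbbrev(Abbrev &A) {
  abbrevGen(A, {{AbbrevOp::Fixed, 16U}, {AbbrevOp::Blob, 0U}});
}

// Total number of bits taken by the fixed-width operands of an abbreviation.
inline unsigned fixedBits(const Abbrev &A) {
  unsigned Bits = 0;
  for (const AbbrevOp &Op : A)
    if (Op.Enc == AbbrevOp::Fixed)
      Bits += Op.Width;
  return Bits;
}
```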

Thank you for working on this!
Some more thoughts.

clang-doc/BitcodeWriter.cpp
191 ↗	(On Diff #135682)	Why do we have this indirection? Is there a need to first (inefficiently?) copy to `Record`, and then emit from there? Wouldn't this work just as well?
    Record.clear();
    Stream.EmitRecord(llvm::bitc::BLOCKINFO_CODE_BLOCKNAME, BlockIdNameMap[ID]);
196 ↗	(On Diff #135682)	Hmm, so i've been staring at this and http://llvm.org/doxygen/classllvm_1_1BitstreamWriter.html and i must say i'm not fond of this indirection. What i don't understand is, in the previous function, we don't store `BlockId`, why do we want to store `RecordId`? Aren't they both unstable, and implementation details? Do we want to store it (`RecordId`)? If yes, please explain it as a new comment in code. If no, i guess this would work too?
    assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
    Record.clear();
    Stream.EmitRecord(llvm::bitc::BLOCKINFO_CODE_SETRECORDNAME, RecordIdNameMap[ID].Name);
And after that you can lower the default size of `SmallVector<> Record` down to, hm, `4`?
clang-doc/BitcodeWriter.h
161 ↗	(On Diff #135682)	This alias is used exactly once, for `Record` member variable in this class. Is there any point in having this alias?
161 ↗	(On Diff #135682)	Also, why is `uint64_t` used? We either push `char`, or `enum`, or `int`. Do we ever need 64-bit?
clang-doc/ClangDoc.h
48	Please add space before `{}`, and drop unneeded `;`
clang-doc/Mapper.h
56 ↗	(On Diff #135682)	`ClangDocMapper` class is starting to look like a god-class. I would recommend:
    1. Rename `ClangDocMapper` to `ClangDocASTVisitor`. It's kind-of conventional to name `RecursiveASTVisitor`-based classes like that.
    2. Move `ClangDocCommentVisitor` out of the `ClangDocMapper`, into `namespace {}` in `clang-doc/Mapper.cpp`
    3. Split `ClangDocSerializer` into new .h/.cpp
    4. Replace `ClangDocSerializer Serializer;` with `ClangDocSerializer& Serializer;`
    5. Instantiate `ClangDocSerializer` (in `MapperActionFactory`, i think?) before `ClangDocMapper`
    6. Pass `ClangDocSerializer&` into `ClangDocMapper` ctor.
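
The last three recommendations amount to injecting the serializer by reference rather than owning it. A rough sketch of that shape, with made-up `Serializer` and `ASTVisitor` classes that only record names (nothing here is the real clang-doc interface):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical serializer: just remembers what it was asked to emit.
class Serializer {
public:
  void emit(const std::string &Name) { Emitted.push_back(Name); }
  const std::vector<std::string> &emitted() const { return Emitted; }

private:
  std::vector<std::string> Emitted;
};

// Hypothetical visitor: it borrows a serializer instead of owning one, so the
// serializer must be created first and must outlive the visitor.
class ASTVisitor {
public:
  explicit ASTVisitor(Serializer &Out) : Out(Out) {}

  bool visitDecl(const std::string &Name) {
    Out.emit(Name);
    return true;
  }

private:
  Serializer &Out;
};
```

Under this arrangement several visitors can feed one serializer, and the visitor class no longer needs to know how the serializer is constructed.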

lebedev.ri mentioned this in D43779: [Tooling] [0/1] Refactor FrontendActionFactory::create() to return std::unique_ptr<>. Feb 26 2018, 12:47 PM

Moved the serialization logic out of the Mapper class and into its own namespace
Updated tests
Addressing comments

In D41102#1017918, @lebedev.ri wrote:

Is there some logic (internal to BitstreamWriter) that would 'assert()' when trying to output a record id
that, according to the BLOCKINFO_BLOCK, should not be there?
E.g. outputting VERSION in BI_COMMENT_BLOCK_ID?

Yes -- it will fail an assertion:
Assertion 'V == Op.getLiteralValue() && "Invalid abbrev for record!"' failed.

clang-doc/BitcodeWriter.cpp
191 ↗	(On Diff #135682)	No, since `BlockIdNameMap[ID]` returns a `StringRef`, which can be manipulated into an `std::string` or a `const char*`, but the `Stream` wants an `unsigned char`. So, the copying is to satisfy that. Unless there's a better way to convert a `StringRef` into an array of `unsigned char`?
196 ↗	(On Diff #135682)	I'm not entirely certain what you mean -- in `emitBlockId()`, we are storing both the block id and the block name in separate records (`BLOCKINFO_CODE_SETBID`, `BLOCKINFO_CODE_BLOCKNAME`, respectively). In `emitRecordId()`, we're doing something slightly different, in that we emit one record with both the record id and the record name (in record `BLOCKINFO_CODE_SETRECORDNAME`). Replacing the copy loop here has the same issue as above, namely that there isn't an easy way to convert between a `StringRef` and an array of `unsigned char`.
240 ↗	(On Diff #135682)	That assertion fails :/ I could do something like `static_cast<int64_t>(Val) == Val` but that would a) require IntSize to be a power of 2, b) require updating the assert anytime IntSize is updated, and c) still throw a warning about comparing a signed to an unsigned int...
clang-doc/BitcodeWriter.h
53 ↗	(On Diff #135559)	No...it's the (max) number of the abbrevs relevant to the block itself, which is to say some subset of the RecordIds for any given block (e.g. for a `BI_COMMENT_BLOCK`, the number of abbrevs would be 12 and so the abbrev width would be 4). To assert for it we could put block start/end markers on the RecordIds and then use that to calculate the bitwidth, if you think the assertion should be there.
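
The arithmetic behind "12 abbrevs, so a width of 4" can be written down as a small helper. This is a sketch under one assumption that goes beyond the thread: the LLVM bitstream format reserves abbreviation IDs 0 to 3 for built-in codes, so application-defined abbreviations start at 4 (`llvm::bitc::FIRST_APPLICATION_ABBREV`). The helper names are made up.

```cpp
#include <cassert>

// Abbreviation IDs 0-3 are reserved by the bitstream format (END_BLOCK,
// ENTER_SUBBLOCK, DEFINE_ABBREV, UNABBREV_RECORD); user abbrevs start at 4.
constexpr unsigned FirstApplicationAbbrev = 4;

// Number of bits needed to represent MaxValue (0 needs 0 bits here).
inline unsigned bitsNeeded(unsigned MaxValue) {
  unsigned Bits = 0;
  while (MaxValue != 0) {
    ++Bits;
    MaxValue >>= 1;
  }
  return Bits;
}

// Abbrev ID width a block needs if it defines NumAbbrevs abbreviations.
inline unsigned abbrevWidthFor(unsigned NumAbbrevs) {
  // The largest ID in use is the last application abbrev, or 3 if none.
  unsigned MaxId = FirstApplicationAbbrev + NumAbbrevs - 1;
  return bitsNeeded(MaxId);
}
```

With 12 abbreviations the largest ID is 15, which fits in 4 bits; a 13th abbreviation would push the largest ID to 16 and require 5 bits.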

Diffusion mentioned this in rC326201: [Tooling] [0/1] Refactor FrontendActionFactory::create() to return std…. Feb 27 2018, 7:22 AM

Diffusion mentioned this in rL326201: [Tooling] [0/1] Refactor FrontendActionFactory::create() to return std….

Tried fixing tooling::FrontendActionFactory::create() in D43779/D43780, but had to revert due to gcc4.8 issues :/

Thank you for working on this, some more review notes.

In D41102#1020107, @juliehockett wrote:

In D41102#1017918, @lebedev.ri wrote:

Is there some logic (internal to BitstreamWriter) that would 'assert()' when trying to output a record id
that, according to the BLOCKINFO_BLOCK, should not be there?
E.g. outputting VERSION in BI_COMMENT_BLOCK_ID?

Yes -- it will fail an assertion:
Assertion 'V == Op.getLiteralValue() && "Invalid abbrev for record!"' failed.

Ok, great.
And it will also complain if you try to output a block within block?

clang-doc/BitcodeWriter.cpp
191 ↗	(On Diff #135682)	Aha, i see, did not think of that. But there is a `bytes()` function in `StringRef`, which returns `iterator_range<const unsigned char *>`. Would it help? http://llvm.org/doxygen/classllvm_1_1StringRef.html#a5e8f22c3553e341404b445430a3b075b
240 ↗	(On Diff #135682)	I see. Let's not have this assertion for now, just a `FIXME`.
184 ↗	(On Diff #136010)	That comment seems wrong. If the namespace is indeed supposed to be closed, it should happen after the lambda is called, i.e. assert(RecordIdNameMap.size() == RecordIdCount); return RecordIdNameMap; }(); } // namespace doc // AbbreviationMap
265 ↗	(On Diff #136010)	I think it is as simple as assert(Loc.LineNumber < (1U << BitCodeConstants::LineNumberSize)); ?
367 ↗	(On Diff #136010)	So i guess this should be:
    void ClangDocBitcodeWriter::emitBlockInfo(
        BlockId BID, const std::initializer_list<RecordId> &RIDs) {
      assert(RIDs.size() < (1U << BitCodeConstants::SubblockIDSize) &&
             "Too many records in a block!");
      emitBlockID(BID);
      ...
?
clang-doc/BitcodeWriter.h
53 ↗	(On Diff #135559)	Aha, i see, so that should go into `ClangDocBitcodeWriter::emitBlockInfoBlock()`, since that already has that info. (On a related note, it feels like this all should be somehow tablegen-generated, but that is for some later, post-commit cleanup.)

Fixing comments

In D41102#1020808, @lebedev.ri wrote:

Ok, great.
And it will also complain if you try to output a block within block?

Um...no. Since you can have subblocks within blocks.

clang-doc/BitcodeWriter.cpp
191 ↗	(On Diff #135682)	Replaced it with an ArrayRef to the `bytes_begin()` and `bytes_end()`, but that only works for the block id, not the record id, since `emitRecordId()` also has to emit the ID number in addition to the name in the same record.
265 ↗	(On Diff #136010)	`LineNumber` is a signed int, so the compiler complains that we're comparing signed and unsigned ints.
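
One way around the signed/unsigned warning for `LineNumber` is to reject negative values first and only then compare in an unsigned type. The helper below is a sketch of that idea; the name `fitsInUnsignedField` is made up, and whether the patch should treat a negative line number as an error is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// True if Val is non-negative and representable in a fixed-width field of
// Bits bits. Bits is assumed to be smaller than 64.
inline bool fitsInUnsignedField(std::int64_t Val, unsigned Bits) {
  if (Val < 0)
    return false; // A negative value can never be stored in the field.
  // Both sides are now unsigned, so the comparison raises no sign warning.
  return static_cast<std::uint64_t>(Val) < (std::uint64_t{1} << Bits);
}
```

An `int` line number converts implicitly to `std::int64_t`, so a call such as `assert(fitsInUnsignedField(Loc.LineNumber, BitCodeConstants::LineNumberSize))` would be the intended use.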

lebedev.ri added inline comments. Feb 28 2018, 7:23 AM

clang-doc/BitcodeWriter.h
37 ↗	(On Diff #136161)	Hmm, you build with asserts enabled, right?
I tried testing this, and three tests fail with

clang-doc: /build/llvm/include/llvm/Bitcode/BitstreamWriter.h:122: void llvm::BitstreamWriter::Emit(uint32_t, unsigned int): Assertion `(Val & ~(~0U >> (32-NumBits))) == 0 && "High bits set!"' failed.

Failing Tests (3):
    Clang Tools :: clang-doc/mapper-class-in-function.cpp
    Clang Tools :: clang-doc/mapper-function.cpp
    Clang Tools :: clang-doc/mapper-method.cpp

  Expected Passes    : 6
  Unexpected Failures: 3

At least one failure is because of BoolSize, so i'd suspect the assertion itself is wrong...

Running clang-format and fixing newlines

clang-doc/BitcodeWriter.h
37 ↗	(On Diff #136161)	I do, and I've definitely seen that one triggered before but it's been because something was off in how the data was being outputted as I was shifting things around. That said, I'm not seeing it in my local build with this diff though -- I'll update it again just to make sure they're in sync.

Thank you for working on this!
Some more review notes.
Please look into adding a bit more tests.

clang-doc/BitcodeWriter.cpp
196 ↗	(On Diff #135682)	Tried locally, and yes, we do need to output record id. What we could actually do, is simply inline that `EmitRecord()`, first emitting the RID, and then the name.
    template <typename Container>
    void EmitRecord(unsigned Code, int ID, const Container &Vals) {
      // If we don't have an abbrev to use, emit this in its fully unabbreviated
      // form.
      auto Count = static_cast<uint32_t>(makeArrayRef(Vals).size());
      EmitCode(bitc::UNABBREV_RECORD);
      EmitVBR(Code, 6);
      EmitVBR(Count + 1, 6); // Including ID
      EmitVBR64(ID, 6);      // 'Prefix' with ID
      for (unsigned i = 0, e = Count; i != e; ++i)
        EmitVBR64(Vals[i], 6);
    }
But that will result in rather ugly code. So given that the record names are quite short, and all the other strings we output directly, maybe leave it as it is for now, until it shows in profiles?
179 ↗	(On Diff #136303)	Since this is the only string we ever push to `Record`, can we add an assertion to make sure we always have enough room for it? E.g.
    for (const auto &Init : Inits) {
      RecordId RID = Init.first;
      RecordIdNameMap[RID] = Init.second;
      assert((1 + RecordIdNameMap[RID].size()) <= Record.size());
      // Since record was just created, it should not have any dynamic size.
      // Or move the small size into a variable and use it when declaring the Record and here.
    }
230 ↗	(On Diff #136303)	Sadly, i can not prove it via godbolt (can't add LLVM as library), but i'd expect streamlining this should at least not hurt, i.e. something like Record.append(RecordIdNameMap[ID].Name.begin(), RecordIdNameMap[ID].Name.end()); ?
clang-doc/BitcodeWriter.h
37 ↗	(On Diff #136161)	I did not retry with updated tree/patch, but i'm quite sure i did hit those asserts. My current build line:
    -DCMAKE_BUILD_TYPE:STRING=RelWithDebInfo
    -DLLVM_BINUTILS_INCDIR:PATH=/usr/include
    -DLLVM_BUILD_TESTS:BOOL=ON
    -DLLVM_ENABLE_ASSERTIONS:BOOL=ON
    -DLLVM_ENABLE_LLD:BOOL=ON
    -DLLVM_ENABLE_PROJECTS:STRING=clang;libcxx;libcxxabi;compiler-rt;lld
    -DLLVM_ENABLE_SPHINX:BOOL=ON
    -DLLVM_ENABLE_WERROR:BOOL=ON
    -DLLVM_PARALLEL_LINK_JOBS:STRING=1
    -DLLVM_TARGETS_TO_BUILD:STRING=X86
    -DLLVM_USE_SANITIZER:STRING=Address
Additional env variables:
    export MALLOC_CHECK_=3
    export MALLOC_PERTURB_=$(($RANDOM % 255 + 1))
    export ASAN_OPTIONS=abort_on_error=1
    export UBSAN_OPTIONS=print_stacktrace=1
226 ↗	(On Diff #136303)	Needs a comment about the choice of static size of Record. I.e. the maximal amount of stuff we expect to push there is recordname string (right now `IsDefinition` is the longest at `13` chars) + 1 integer. And add a newline // Notes SmallVector<uint32_t, 16> Record; llvm::BitstreamWriter &Stream; ...
clang-doc/Mapper.cpp
28 ↗	(On Diff #136303)	+// If we should ignore this declaration, exit this decl ?
clang-doc/Mapper.h
30 ↗	(On Diff #136303)	I wonder if we could reflect the usage of `RecursiveASTVisitor` in the class name. Though `ClangDocMapperASTVisitor` sounds too long?
clang-doc/Representation.h
27 ↗	(On Diff #136303)	Is there an intentional decision to minimize `sizeof()` of these structs? Many(?) of those could be `SmallString`'s
test/CMakeLists.txt
44	There are no tests with `CommentBlock` blocks.
test/clang-doc/mapper-class-in-class.cpp
6 ↗	(On Diff #136161)	Ok, so this actually produced `c:@S@X.bc` and `c:@S@X@S@Y.bc`. Please do something like:
    // RUN: llvm-bcanalyzer %t/docs/c:@S@X.bc --dump | FileCheck %s --check-prefix CHECK-X
    // RUN: llvm-bcanalyzer %t/docs/c:@S@X@S@Y.bc --dump | FileCheck %s --check-prefix CHECK-X-Y
    // CHECK-X: <BLOCKINFO_BLOCK/>
    // CHECK-X: <VersionBlock NumWords=1 BlockCodeSize=4>
    // CHECK-X: <Version abbrevid=4 op0=1/>
    // CHECK-X: </VersionBlock>
    // CHECK-X: <RecordBlock NumWords=6 BlockCodeSize=4>
    // CHECK-X: <USR abbrevid=4 op0=6/> blob data = 'c:@S@X'
    // CHECK-X: <Name abbrevid=5 op0=1/> blob data = 'X'
    // CHECK-X: <IsDefinition abbrevid=7 op0=1/>
    // CHECK-X: <TagType abbrevid=10 op0=3/>
    // CHECK-X: </RecordBlock>
    // CHECK-X-Y: <BLOCKINFO_BLOCK/>
    // CHECK-X-Y: <VersionBlock NumWords=1 BlockCodeSize=4>
    // CHECK-X-Y: <Version abbrevid=4 op0=1/>
    // CHECK-X-Y: </VersionBlock>
    // CHECK-X-Y: <RecordBlock NumWords=11 BlockCodeSize=4>
    // CHECK-X-Y: <USR abbrevid=4 op0=10/> blob data = 'c:@S@X@S@Y'
    // CHECK-X-Y: <Name abbrevid=5 op0=1/> blob data = 'Y'
    // CHECK-X-Y: <Namespace abbrevid=6 op0=1 op1=6/> blob data = 'c:@S@X'
    // CHECK-X-Y: <IsDefinition abbrevid=7 op0=1/>
    // CHECK-X-Y: <TagType abbrevid=10 op0=3/>
    // CHECK-X-Y: </RecordBlock>
On a related note, is there any way to auto-generate these `CHECK` lines? There is this `llvm/utils/update_test_checks.py`, but i doubt it will work here.
test/clang-doc/mapper-class-in-function.cpp
8 ↗	(On Diff #136161)	Here too, i suppose
test/clang-doc/mapper-enum.cpp
7–8 ↗	(On Diff #136303)	Could you please also add a similar `enum class` test?
17 ↗	(On Diff #136303)	Can `TypeBlock` be on the same depth as `VersionBlock`? Via `using`/`typename`? If yes, please add such a test.
test/clang-doc/mapper-method.cpp
8 ↗	(On Diff #136161)	And here
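
Going back to the `emitRecordID()` comments in this round (the record ID has to travel in the same record as the name, and `Record` gets a static size), the layout and the size bound can be sketched with standard containers. The function names are hypothetical, and `std::vector` stands in for the `SmallVector` used in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Builds the payload {ID, c0, c1, ...}: the record ID followed by the
// characters of the record name, one value per character.
inline std::vector<std::uint32_t> makeRecordNameRecord(unsigned ID,
                                                       const std::string &Name) {
  std::vector<std::uint32_t> Record;
  Record.reserve(1 + Name.size());
  Record.push_back(ID);
  for (unsigned char C : Name)
    Record.push_back(C);
  return Record;
}

// Largest payload produced for a set of names: the longest name plus one slot
// for the ID. This is the bound a statically sized buffer would have to meet.
inline std::size_t maxRecordSize(const std::vector<std::string> &Names) {
  std::size_t Longest = 0;
  for (const std::string &N : Names)
    Longest = std::max(Longest, N.size());
  return 1 + Longest;
}
```

Computing the bound from the name table, rather than hard-coding it, is one way to keep the static size and the comment that justifies it from drifting apart.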

Fixing comments and adding tests

Thank you for working on this!
Some more nitpicking.

Please consider adding even more tests (ideally, all this code should have 100% test coverage)

clang-doc/BitcodeWriter.cpp
139 ↗	(On Diff #136520)	This change is not covered by tests. (I've actually found that out the hard way, by trying to find why it didn't trigger any assertions, oh well)
325 ↗	(On Diff #136520)	I think it would be cleaner to move it (at least the enterblock, it might make sense to leave the header at the very top) after the static variable
363 ↗	(On Diff #136520)	I.e.
      ...
      , FUNCTION_IS_METHOD}}};
      Stream.EnterBlockInfoBlock();
      for (const auto &Block : TheBlocks) {
        assert(Block.second.size() < (1U << BitCodeConstants::SubblockIDSize));
        emitBlockInfo(Block.first, Block.second);
      }
      Stream.ExitBlock();
      emitVersion();
    }
clang-doc/BitcodeWriter.h
19 ↗	(On Diff #136520)	Please sort includes, clang-tidy complains.
32 ↗	(On Diff #136520)
    /build/clang-tools-extra/clang-doc/BitcodeWriter.h:32:23: warning: invalid case style for variable 'VERSION_NUMBER' [readability-identifier-naming]
    static const unsigned VERSION_NUMBER = 1;
                          ^~~~~~~~~~~~~~
                          VersionNumber
163 ↗	(On Diff #136520)	The simplest solution would be
    #ifndef NDEBUG // Don't want explicit dtor unless needed
    ~ClangDocBitcodeWriter() {
      // Check that the static size is large-enough.
      assert(Record.capacity() == BitCodeConstants::RecordSize);
    }
    #endif
228 ↗	(On Diff #136520)	So you want to be really definitive with this. I wanted to avoid that, actually.. Then i'm afraid one more assert is needed, to make sure this is actually true. I'm not seeing any way to make `SmallVector` completely static, so you could either add one more wrapper around it (rather ugly), or check the final size in the `ClangDocBitcodeWriter` destructor (will not pinpoint when the size has 'overflowed')
246 ↗	(On Diff #136520)	Does it ever make sense to output `BlockInfoBlock` anywhere else other than once at the very beginning? I'd think you should drop the boolean param, and unconditionally call `emitBlockInfoBlock();` from the `ClangDocBitcodeWriter::ClangDocBitcodeWriter()` ctor.
248 ↗	(On Diff #136520)	The naming choices confuse me. There is `writeBitstream()` and `emitBlock()`, which is called from `writeBitstream()` to write the actual contents of the block. Why is one `write` and the other `emit`? To match the `BitstreamWriter` naming choices (which uses the `Emit` prefix)? To avoid the confusion of which one outputs the actual content, and which one outputs the whole block? I think it should be:
    - void emitBlock(const NamespaceInfo &I);
    + void emitBlockContent(const NamespaceInfo &I);
    - void ClangDocBitcodeWriter::writeBitstream(const T &I, bool WriteBlockInfo);
    + void ClangDocBitcodeWriter::emitBlock(const T &I, bool EmitBlockInfo);
This way, i think their names would state more clearly what they do, and won't be weirdly different. What do you think?
clang-doc/Representation.h
18 ↗	(On Diff #136520)	Please sort includes, clang-tidy complains.
clang-doc/Serialize.cpp
88 ↗	(On Diff #136520)	/build/clang-tools-extra/clang-doc/Serialize.cpp:88:17: warning: invalid case style for variable 'i' [readability-identifier-naming]
  for (unsigned i = 0, e = C->getNumArgs(); i < e; ++i)
  suggested name: I
/build/clang-tools-extra/clang-doc/Serialize.cpp:88:24: warning: invalid case style for variable 'e' [readability-identifier-naming]
  for (unsigned i = 0, e = C->getNumArgs(); i < e; ++i)
  suggested name: E
107 ↗	(On Diff #136520)	/build/clang-tools-extra/clang-doc/Serialize.cpp:107:19: warning: invalid case style for variable 'i' [readability-identifier-naming]
  for (unsigned i = 0, e = C->getDepth(); i < e; ++i)
  suggested name: I
/build/clang-tools-extra/clang-doc/Serialize.cpp:107:26: warning: invalid case style for variable 'e' [readability-identifier-naming]
  for (unsigned i = 0, e = C->getDepth(); i < e; ++i)
  suggested name: E
clang-doc/Serialize.h
19 ↗	(On Diff #136520)	Please sort includes, clang-tidy complains.
clang-doc/tool/ClangDocMain.cpp
81	Why at the beginning though? Couldn't the user pass `-extra-arg=-fno-parse-all-comments`, which could override this?

Adding tests, fixing comments, and removing an (as-of-yet) unused element of the CommentInfo struct.

clang-doc/BitcodeWriter.cpp
139 ↗	(On Diff #136520)	So after some digging, this particular field can't be tested right now as the mapper doesn't look at any `TemplateDecl`s (something that definitely needs to be implemented, but in a follow-on patch). I've removed it for now, until it can be properly used/tested.
196 ↗	(On Diff #135682)	If that makes sense to you, sounds good to me!
clang-doc/BitcodeWriter.h
37 ↗	(On Diff #136161)	Figured it out -- the `Reference` struct didn't have default for the enum, and so if it wasn't initialized it was undefined. Should be fixed now.
test/clang-doc/mapper-enum.cpp
17 ↗	(On Diff #136303)	Not currently -- I'm planning to add that functionality in the future, but right now it ignores typedef or using decls.

Could some other people please review this differential, too?
I'm sure i have missed things.

Some more nitpicking.

For this differential as standalone, I've mostly run out of things to nitpick.
Some things can probably be done better (the blockid/recordid stuff could probably be nicer if tablegen-ed, but that is for later).

I'll try to look at the next differential, and at them combined.

clang-doc/BitcodeWriter.cpp
120 ↗	(On Diff #136650)	We don't actually push these strings to the `Record` (but instead output them directly), so this assertion is not really meaningful, i think?
clang-doc/BitcodeWriter.h
21 ↗	(On Diff #136650)	+DenseMap
21 ↗	(On Diff #136650)	+StringRef
197 ↗	(On Diff #136650)	Humm, you could avoid this constant, and conserve a few bits, if you move the init-list out of `emitBlockInfoBlock()` to somewhere e.g. after the `enum RecordId`, and then since the `BlockId ID` is already passed, you could compute it on-the-fly the same way the `BitCodeConstants::SubblockIDSize` is asserted in `emitBlockInfo*()`. Not sure if it's worth doing though. Maybe just add it as a `NOTE` here.
249 ↗	(On Diff #136650)	Stale comment
clang-doc/Representation.h
60 ↗	(On Diff #136650)	`Info *Ref;` isn't used anywhere
117 ↗	(On Diff #136650)	`llvm::Optional<Location> DefLoc;` ?

Addressing comments

lebedev.ri added inline comments. Mar 2 2018, 10:38 AM

clang-doc/Representation.h
117 ↗	(On Diff #136791)	I meant that `IsDefinition` controls whether `DefLoc` will be set/used or not. So with `llvm::Optional<Location> DefLoc`, you don't need the `bool IsDefinition`.
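A minimal sketch of the idea in this comment, assuming C++17 `std::optional` as a stand-in for `llvm::Optional`. The `Location` and `SymbolInfo` types and the `setDefinition` helper are hypothetical simplifications, not the types from the patch; the point is only that an empty optional already encodes "no definition seen", so a separate flag is redundant.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for clang-doc's Location (line number plus file name).
struct Location {
  int LineNumber;
  std::string Filename;
};

// Hypothetical stand-in for an Info that may or may not have a definition.
struct SymbolInfo {
  // Empty while only declarations have been seen; no separate bool is needed.
  std::optional<Location> DefLoc;

  bool isDefinition() const { return DefLoc.has_value(); }
};

// Records the definition site; afterwards isDefinition() returns true.
inline void setDefinition(SymbolInfo &I, int Line, std::string File) {
  assert(Line > 0 && "line numbers are expected to be 1-based");
  I.DefLoc = Location{Line, std::move(File)};
}
```

With this shape, the "is this a definition?" state cannot drift out of sync with the stored location, which is the failure mode a separate `bool` would allow.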

Removing IsDefinition field.

clang-doc/Representation.h
117 ↗	(On Diff #136791)	That...makes so much sense. Oops. Thank you!

Eugene.Zelenko added inline comments. Mar 5 2018, 6:15 PM

clang-doc/BitcodeWriter.h
160 ↗	(On Diff #136809)	Looks like clang-format was applied incorrectly, because this is Google, not LLVM, style. Please note that it doesn't modify the file; it just outputs the formatted code to the terminal. Please reformat the other files, including those in dependent patches.

My apologies for getting back on this so late!

In D41102#1017683, @juliehockett wrote:

So, as an idea (as this diff implements), I updated the string references to be a struct, which holds the USR of the referenced type (for serialization, both here in the mapper and for the dump option in the reducer), as well as a pointer to an Info struct. This pointer is not used at this point, but would be populated by the reducer. Thoughts?

This seems like quite a decent approach! That being said, I don't see the pointer yet? I assume you mean that you will be adding this? Additionally, a slight disadvantage of doing this generic approach is that you need to do bookkeeping on what it is referencing, but I guess there's no helping that due to the architecture which makes you rely upon the USR? Personally I'd prefer having the explicit types if and where possible. So for now a RecordInfo has a vector of References to its parents, but we know the parents can only be of certain kinds (more than just a RecordType, but you get the point); it won't be an enum, namespace or function.

As I mentioned, we did this the other way around, which also has the slight advantage that I only had to create and save the USR once per info instance (as in, 10 references to a class only add the overhead of 10 pointers, rather than each having the USR as well), but our disadvantage was of course that we had delayed serialization (although we could arguably do both simultaneously). It seems each method has its merits :).

In D41102#1028228, @Athosvk wrote:

This seems like quite a decent approach! That being said, I don't see the pointer yet? I assume you mean that you will be adding this? Additionally, a slight disadvantage of doing this generic approach is that you need to do bookkeeping on what it is referencing, but I guess there's no helping that due to the architecture which makes you rely upon the USR? Personally I'd prefer having the explicit types if and where possible. So for now a RecordInfo has a vector of References to its parents, but we know the parents can only be of certain kinds (more than just a RecordType, but you get the point); it won't be an enum, namespace or function.

If you take a look at the follow-on patch to this (D43341), you'll see that that is where the pointer is added in (since it is irrelevant to the mapper portion, as it cannot be filled out until the information has been reduced). The back references to children and whatnot are also added there.

As I mentioned, we did this the other way around, which also has the slight advantage that I only had to create and save the USR once per info instance (as in, 10 references to a class only add the overhead of 10 pointers, rather than each having the USR as well), but our disadvantage was of course that we had delayed serialization (although we could arguably do both simultaneously). It seems each method has its merits :).

The USRs are kept for serialization purposes -- given the modular nature of the design, the goal is to be able to write out the bitstream and have it be consumable with all necessary information. Since we can't write out pointers (and it would be useless if we did, since they would change as soon as the file was read in), we maintain the USRs to have a means of re-finding the referenced declaration.
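A rough sketch of the scheme described here, under the assumption that each reference carries a serializable USR plus a pointer that is only filled in after reading. The `Info` and `Reference` structs and the `resolveReferences` helper are hypothetical simplifications for illustration, not the types from this patch or from D43341.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a clang-doc Info.
struct Info {
  std::string USR;  // stable identifier; this is what gets serialized
  std::string Name;
};

// Hypothetical, simplified stand-in for a reference to another Info.
struct Reference {
  std::string USR;           // written to the bitstream
  const Info *Ref = nullptr; // never serialized; filled in after reading
};

// Resolves every reference whose USR names a known Info and returns how many
// were resolved. References to unknown USRs keep Ref == nullptr.
inline unsigned resolveReferences(const std::map<std::string, Info> &Infos,
                                  std::vector<Reference> &Refs) {
  unsigned Resolved = 0;
  for (Reference &R : Refs) {
    assert(!R.USR.empty() && "every reference is expected to carry a USR");
    auto It = Infos.find(R.USR);
    if (It == Infos.end())
      continue;
    R.Ref = &It->second;
    ++Resolved;
  }
  return Resolved;
}
```

The pointer is a convenience for in-memory traversal only; the USR remains the single source of truth that survives a round trip through the file.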

That said, I was looking at the Clangd symbol indexing code yesterday, and noticed that they're hashing the USRs (since they get a little lengthy, particularly when you have nested and/or overloaded functions). I'm going to take a look at that today to try to make the USRs more space-efficient here.

Adding hashing to reduce the size of USRs and updating tests.

Nice!
Some further notes based on the SHA1 nature.

clang-doc/BitcodeWriter.cpp
74 ↗	(On Diff #137244)	Those are mixed up. `USRLengthSize` is definitively supposed to be second.
81 ↗	(On Diff #137244)	The sha1 is all-printable, so how about using `BitCodeAbbrevOp::Encoding::Char6` ? Char4 would work best, but it is not there.
149 ↗	(On Diff #137244)	Ha, and all the `*_USR` are actually `StringAbbrev`'s, not confusing at all :)
309 ↗	(On Diff #137244)	Now it would make sense to also assert that this sha1(usr).strlen() == 20
clang-doc/BitcodeWriter.h
46 ↗	(On Diff #137244)	Can definitively lower this to `5U` (2^5 == 32, which is more than the 20 8-bit chars of sha1)
clang-doc/Representation.h
59 ↗	(On Diff #137244)	Now that USR is sha1'd, this is always 20 8-bit characters long.
107 ↗	(On Diff #137244)	`20` Maybe place `using USRString = SmallString<20>; // SHA1 of USR` somewhere and use it everywhere?

In D41102#1028760, @juliehockett wrote:

If you take a look at the follow-on patch to this (D43341), you'll see that that is where the pointer is added in (since it is irrelevant to the mapper portion, as it cannot be filled out until the information has been reduced). The back references to children and whatnot are also added there.

Oops! I'll have a look!

In D41102#1028760, @juliehockett wrote:

The USRs are kept for serialization purposes -- given the modular nature of the design, the goal is to be able to write out the bitstream and have it be consumable with all necessary information. Since we can't write out pointers (and it would be useless if we did, since they would change as soon as the file was read in), we maintain the USRs to have a means of re-finding the referenced declaration.

What I was referring to was the storing of a USR per reference. Of course, serializing pointers wouldn't work, but what I mean is that what we used as a USR was stored in what was pointed to, not in the reference that tells what we are pointing to. To be a little more concise, a RecordInfo has pointers to the FunctionInfo for its member functions. Upon serialization, the RecordInfo queries the USR of those functions. A function that is referenced multiple times still has its USR stored only once. If I understand correctly, you currently save the USR each time an InfoType references another InfoType.

Anyhow, don't pay too much attention to that comment, it's all meant as a minor thing. It sure is looking good so far!

In D41102#1028995, @lebedev.ri wrote:

Some further notes based on the SHA1 nature.

I'm sorry, brainfreeze, i meant 40 chars, not 20.
Updated comments...

clang-doc/BitcodeWriter.cpp
309 ↗	(On Diff #137244)	40 that is
clang-doc/BitcodeWriter.h
46 ↗	(On Diff #137244)	Edit: to 6U (2^6 == 64, which is more than the 40 8-bit chars of sha1)
clang-doc/Representation.h
59 ↗	(On Diff #137244)	40 that is
107 ↗	(On Diff #137244)	40
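The width arithmetic behind the `5U` versus `6U` correction can be checked mechanically. This is a small sketch, not code from the patch; `bitsNeeded` and `checkedLength` are hypothetical helper names.

```cpp
#include <cassert>

// Smallest number of bits that can represent every value in [0, MaxValue].
constexpr unsigned bitsNeeded(unsigned MaxValue) {
  return MaxValue < 2 ? 1 : 1 + bitsNeeded(MaxValue / 2);
}

constexpr unsigned RawSha1Bytes = 20; // binary digest
constexpr unsigned HexSha1Chars = 40; // hex-encoded digest

static_assert(bitsNeeded(RawSha1Bytes) == 5, "20 fits in 5 bits (2^5 == 32)");
static_assert(bitsNeeded(HexSha1Chars) == 6, "40 needs 6 bits (2^6 == 64)");

// Checks (in debug builds) that Length fits in a field that is Bits wide,
// then returns it unchanged.
inline unsigned checkedLength(unsigned Length, unsigned Bits) {
  assert(bitsNeeded(Length) <= Bits && "length does not fit in the field");
  return Length;
}
```

A length field sized for the 20-byte binary digest would be one bit too narrow for the 40-character hex form, which is why the suggested constant changed.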

Updating bitcode writer for hashed USRs, and re-running clang-format. Also cleaning up a couple of unused fields.

Hmm, i'm missing something about the way we store sha1...

clang-doc/BitcodeWriter.cpp
53 ↗	(On Diff #137457)	This is VBR because USRLengthSize is of such strange size, to conserve the bits?
57 ↗	(On Diff #137457)	Looking at the `NumWords` changes (decrease!) in the tests, and this is bugging me. Now i have realized what we do with the USR: we first compute the SHA1, getting 20x uint8_t, and store/use it internally; then we hex-ify it, getting 40x char (assuming 8-bit char); then we convert to char6, winning back two bits per char, but we still lose 2 bits per char. Question: why do we store the sha1 of the USR as a string? Why can't we just store that USRString (aka USRSha1 binary) directly? That would be just 20 bytes; you just couldn't go any lower than that.
clang-doc/Representation.h
29 ↗	(On Diff #137457)	Right, of course, internally this is kept in the binary format, which is just 20 chars. This is not the string (the hex-ified version of sha1), but the raw sha1, the binary. This should somehow convey that. This should be something closer to `USRSha1`.
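To put numbers on the size question raised above, here is a sketch comparing the three representations: the raw digest, the hex string with 8-bit chars, and the hex string with 6-bit chars. `RawDigest`, `toHex` and `payloadBits` are hypothetical names used only for this illustration, and the figures count payload bits only, ignoring any length prefix or abbreviation overhead.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

using RawDigest = std::array<std::uint8_t, 20>; // binary SHA1

// Hex-encodes the digest: two lowercase hex characters per byte.
inline std::string toHex(const RawDigest &D) {
  static const char Digits[] = "0123456789abcdef";
  std::string Out;
  Out.reserve(D.size() * 2);
  for (std::uint8_t Byte : D) {
    Out.push_back(Digits[Byte >> 4]);
    Out.push_back(Digits[Byte & 0x0F]);
  }
  assert(Out.size() == 40 && "hex form is twice as long as the raw digest");
  return Out;
}

// Payload size in bits for Count elements of BitsPerElement bits each.
constexpr unsigned payloadBits(unsigned Count, unsigned BitsPerElement) {
  return Count * BitsPerElement;
}

static_assert(payloadBits(20, 8) == 160, "raw digest");
static_assert(payloadBits(40, 8) == 320, "hex string, 8-bit chars");
static_assert(payloadBits(40, 6) == 240, "hex string, 6-bit chars");
```

Each hex character carries only 4 bits of information, so even a 6-bit encoding spends 80 bits more than storing the 20 raw bytes directly.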

There are a few places where we can trim some of the boilerplate, which I think is important - it's hard to find the "real code" among all the plumbing in places.
Other than that, this seems OK to me.

clang-doc/BitcodeWriter.h
116 ↗	(On Diff #137457)	I think you don't want to declare ID in the unspecialized template, so you get a compile error if you try to use it. (Using traits for this sort of thing seems a bit overboard to me, but YMMV)
154 ↗	(On Diff #137457)	Hmm, you spend a lot of effort plumbing this variable around! Why is it so important? Filesize? (I'm not that familiar with LLVM bitcode, but surely we'll end up with a string table anyway?) If it really is an important option people will want, the command-line arg should probably say why.
241 ↗	(On Diff #137457)	OK, I don't get this at all. We have to declare emitBlockContent(NamespaceInfo) and the specialization of MapFromInfoToBlockId<NamespaceInfo>, and deal with the public interface emitBlock being a template function where you can't tell what's legal to pass, instead of writing:
  void emitBlock(const NamespaceInfo &I) {
    SubStreamBlockGuard Block(Stream, BI_NAMESPACE_BLOCK_ID); // <-- this one line
    ...
  }
This really seems like templates for the sake of templates :(
clang-doc/ClangDoc.h
11	This comment doesn't seem accurate - there's no main() in this file. There's a FrontendActionFactory, but nothing in this file uses it.
38	nit: seems odd to put all this implementation in the header. (personally I'd just expose a function returning unique_ptr<FrontendActionFactory> from the header, but up to you...)
39	for ASTConsumers implemented by ASTVisitors, there seems a fairly strong convention to just make the same class extend both (MapASTVisitor, here). That would eliminate one plumbing class...
clang-doc/Mapper.cpp
33 ↗	(On Diff #137457)	It seems a bit of a poor fit to use a complete bitcode file (header, version, block info) as your value format when you know the format, and know there'll be no version skew. Is it easy just to emit the block we care about?
clang-doc/Representation.h
29 ↗	(On Diff #137457)	I'm not sure that any of the implementation (either USR or SHA) belongs in the type name. In clangd we called this type SymbolID, which seems like a reasonable name here too.
44 ↗	(On Diff #137457)	this is probably the right place to document these fields - what are the legal kinds? what's the name of a comment, direction, etc?

This revision is now accepted and ready to land. Mar 8 2018, 4:51 PM

Closed by commit rL327102: [clang-doc] Setup clang-doc frontend framework (authored by juliehockett). · Mar 8 2018, 7:21 PM

This revision was automatically updated to reflect the committed changes.

juliehockett marked 11 inline comments as done.

Herald added a subscriber: llvm-commits. · Mar 8 2018, 7:21 PM

Might have been better to not start landing until all the differentials are understood/accepted, but i understand that it is not really up to me to decide.
Let's hope nothing in the next differentials will require changes to this initial code :)

clang-doc/BitcodeWriter.h
241 ↗	(On Diff #137457)	If you want to add a new block, in one case you just need to add one
  template <> struct MapFromInfoToBlockId<???Info> {
    static const BlockId ID = BI_???_BLOCK_ID;
  };
In the other case you need to add the whole
  void ClangDocBitcodeWriter::emitBlock(const ???Info &I) {
    StreamSubBlockGuard Block(Stream, BI_???_BLOCK_ID);
    emitBlockContent(I);
  }
(and it was even longer initially). It seems just templating one static variable is shorter than duplicating `emitBlock()` each time, no? Do compare the current diff with the original diff state. I think these templates helped move much of the duplication to simplify the code overall.
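For readers following the template discussion, here is a self-contained sketch of the trait pattern being defended. `ToyWriter`, the enum values and the empty Info structs are hypothetical stand-ins, and the writer only records which block it would enter instead of emitting bits. Leaving the primary template undefined, as suggested earlier in the review, turns an unmapped type into a compile error.

```cpp
#include <cassert>

// Hypothetical stand-ins for the block IDs and Info types in the patch.
enum BlockId { BI_NAMESPACE_BLOCK_ID = 8, BI_ENUM_BLOCK_ID };
struct NamespaceInfo {};
struct EnumInfo {};

// Declared but not defined: using an unmapped type fails to compile.
template <typename T> struct MapFromInfoToBlockId;

template <> struct MapFromInfoToBlockId<NamespaceInfo> {
  static const BlockId ID = BI_NAMESPACE_BLOCK_ID;
};
template <> struct MapFromInfoToBlockId<EnumInfo> {
  static const BlockId ID = BI_ENUM_BLOCK_ID;
};

// Toy writer that records which block was entered instead of emitting bits.
struct ToyWriter {
  BlockId LastBlock = BI_NAMESPACE_BLOCK_ID;
  unsigned BlocksEmitted = 0;

  void emitBlockContent(const NamespaceInfo &) {}
  void emitBlockContent(const EnumInfo &) {}

  // One generic wrapper; only emitBlockContent() differs per Info type.
  template <typename T> void emitBlock(const T &I) {
    LastBlock = MapFromInfoToBlockId<T>::ID;
    emitBlockContent(I);
    ++BlocksEmitted;
  }
};

// Usage: the block ID is chosen by the argument type alone.
inline void demoEmitBlock() {
  ToyWriter W;
  W.emitBlock(EnumInfo{});
  assert(W.LastBlock == BI_ENUM_BLOCK_ID);
}
```

The trade-off debated in the thread is visible here: adding a block type costs one trait specialization plus one `emitBlockContent` overload, at the price of a templated public entry point whose legal arguments are not obvious from its signature.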

Since the commit was reverted, did you mean to either recommit it, or reopen this (with updated diff), so it does not get lost?

In D41102#1034919, @lebedev.ri wrote:

Since the commit was reverted, did you mean to either recommit it, or reopen this (with updated diff), so it does not get lost?

Relanded in r327295.

clang-doc/BitcodeWriter.h
154 ↗	(On Diff #137457)	It was for testing purposes (so that the tests aren't flaky on filenames), but I replaced it with regex.
241 ↗	(On Diff #137457)	You'd still have to add the appropriate `emitBlock()` function for any new block, since it would have different attributes.
clang-doc/Mapper.cpp
33 ↗	(On Diff #137457)	Ideally, yes, but right now in the clang BitstreamWriter there's no way to tell the instance what all the abbreviations are without also emitting the blockinfo to the output stream, though I'm thinking about taking a stab at separating the two. Also, this relies on the llvm-bcanalyzer for testing, which requires both the header and the blockinfo in order to read the data :/

lebedev.ri added inline comments. Mar 14 2018, 1:44 PM

clang-doc/BitcodeWriter.cpp
230 ↗	(On Diff #136303)	And https://github.com/mattgodbolt/compiler-explorer/issues/841 is done, so now we can see that `SmallVector::append()` at least results in less code: https://godbolt.org/g/xJQ59c

So what part is failing, specifically?
The SHA1 blobs of USR's differ in the llvm-bcanalyzer dumps?
The actual filenames %t/docs/bc/<sha1-to-text> differ?
I guess both?

First one you should be able to handle by replacing the actual values with a regex
(i'd guess <USR abbrevid=4 op0=20 op1=11 <...> op19=226 op20=232/> -> <USR abbrevid=4 .*/>, but did not try)
I'm not sure we care about the actual values here, do we?
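The masking idea can be illustrated outside the test suite. The sketch below uses `std::regex` purely to show the pattern; in the actual lit tests this would presumably be expressed as a FileCheck regex in the expected output rather than as C++ code, and `maskUSRRecords` is a hypothetical helper name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Replaces the operand list of every <USR .../> record with a placeholder so
// that dumps can be compared without depending on the hash bytes.
inline std::string maskUSRRecords(const std::string &Dump) {
  static const std::regex USRRecord("<USR abbrevid=([0-9]+)[^>]*/>");
  return std::regex_replace(Dump, USRRecord, "<USR abbrevid=$1 .../>");
}

// Usage: the abbreviation ID is kept, the per-platform hash operands are not.
inline void demoMaskUSRRecords() {
  const std::string In = "<USR abbrevid=4 op0=20 op1=11 op20=232/>";
  assert(maskUSRRecords(In) == "<USR abbrevid=4 .../>");
}
```

Only the record kind and abbreviation ID are compared after masking, so a USR that hashes differently on another platform no longer fails the check.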

Second one is interesting.
If we assume that the order in which those are generated is the same, which i think is a safer assumption,
then you could just use result id, not key (sha1-to-text of USR), i.e. %t/docs/bc/00.bc, %t/docs/bc/01.bc and so on.
I.e. something like:

  if (DumpMapperResult) {
+   unsigned id = 0;
    Exec->get()->getToolResults()->forEachResult([&](StringRef Key,
                                                     StringRef Value) {
      SmallString<128> IRRootPath;
      llvm::sys::path::native(OutDirectory, IRRootPath);
      llvm::sys::path::append(IRRootPath, "bc");
      std::error_code DirectoryStatus =
          llvm::sys::fs::create_directories(IRRootPath);
      if (DirectoryStatus != OK) {
        llvm::errs() << "Unable to create documentation directories.\n";
        return;
      }
-     llvm::sys::path::append(IRRootPath, Key + ".bc");
+     llvm::sys::path::append(IRRootPath, std::to_string(id) + ".bc");
      std::error_code OutErrorInfo;
      llvm::raw_fd_ostream OS(IRRootPath, OutErrorInfo, llvm::sys::fs::F_None);
      if (OutErrorInfo != OK) {
        llvm::errs() << "Error opening documentation file.\n";
        return;
      }
      OS << Value;
      OS.close();
+     id++;
    });
  }

Hm, or possibly you could just pass the triple to clang?

I was just thinking of disabling the one test that has an issue (class-in-function) on Windows -- the filename is only used in generating *some* USRs, so all of the other ones are fine. We ran into some issues with that though, since UNSUPPORTED: system-windows didn't seem to disable the test on the machine I have access to. Thoughts?

In D41102#1041773, @juliehockett wrote:

I was just thinking of disabling the one test that has an issue (class-in-function) on Windows -- the filename is only used in generating *some* USRs, so all of the other ones are fine. We ran into some issues with that though, since UNSUPPORTED: system-windows didn't seem to disable the test on the machine I have access to. Thoughts?

UNSUPPORTED: system-windows

Perhaps that is only for msvc?

Have you tried something more broad, like
UNSUPPORTED: mingw32,win32
?

In D41102#1041791, @lebedev.ri wrote:

Have you tried something more broad, like
UNSUPPORTED: mingw32,win32
?

That wasn't working either, confusingly, at least on the local windows machine I have.

Huh, something weird is going on there.
What about the other way around, REQUIRES: linux ?

After much digging, it looks like the lit config is never initialized in clang-tools-extra like it is in the other projects. REQUIRES et al. work properly once that's in there (see D44708). Once that lands I'll reland this and *hopefully* that'll be that!

hintonda removed a subscriber: hintonda. Mar 24 2018, 11:57 AM

Revision Contents

Path	Size
CMakeLists.txt	1 line
clang-doc/
  CMakeLists.txt	22 lines
  ClangDoc.h	60 lines
  ClangDoc.cpp	86 lines
  ClangDocBinary.h	90 lines
  ClangDocBinary.cpp	559 lines
  ClangDocMapper.h	95 lines
  ClangDocMapper.cpp	283 lines
  ClangDocRepresentation.h	99 lines
  tool/
    CMakeLists.txt	18 lines
    ClangDocMain.cpp	100 lines
docs/
  clang-doc.rst	62 lines
test/
  CMakeLists.txt	1 line
  clang-doc/
    mapper-namespace.cpp	70 lines
    mapper-type.cpp	137 lines

Diff 133726

CMakeLists.txt

  add_subdirectory(clang-apply-replacements)
  add_subdirectory(clang-reorder-fields)
  add_subdirectory(modularize)
  if(CLANG_ENABLE_STATIC_ANALYZER)
  add_subdirectory(clang-tidy)
  add_subdirectory(clang-tidy-vs)
  endif()

  add_subdirectory(change-namespace)
+ add_subdirectory(clang-doc)
  add_subdirectory(clang-query)
  add_subdirectory(clang-move)
  add_subdirectory(clangd)
  add_subdirectory(include-fixer)
  add_subdirectory(pp-trace)
  add_subdirectory(tool-template)

  # Add the common testsuite after all the tools.
  ... (11 more lines not shown)

clang-doc/CMakeLists.txt

This file was added.

				set(LLVM_LINK_COMPONENTS
				  support
				  )

				add_clang_library(clangDoc
				  ClangDoc.cpp
				  ClangDocMapper.cpp
				  ClangDocBinary.cpp

				  LINK_LIBS
				  clangAnalysis
				  clangAST
				  clangASTMatchers
				  clangBasic
				  clangFormat
				  clangFrontend
				  clangLex
				  clangTooling
				  clangToolingCore
				  )

				add_subdirectory(tool)

clang-doc/ClangDoc.h

This file was added.

				//===-- ClangDoc.h - ClangDoc ------------------------------------ C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANG_DOC_CLANGDOC_H
				#define LLVM_CLANG_TOOLS_EXTRA_CLANG_DOC_CLANGDOC_H
				sammccall (Done): This comment doesn't seem accurate - there's no main() in this file. There's a FrontendActionFactory, but nothing in this file uses it.

				#include <string>
				#include <vector>
				#include "ClangDocBinary.h"
				#include "ClangDocMapper.h"
				#include "clang/AST/AST.h"
				#include "clang/AST/ASTConsumer.h"
				#include "clang/AST/ASTContext.h"
				#include "clang/AST/Comment.h"
				#include "clang/AST/RecursiveASTVisitor.h"
				#include "clang/ASTMatchers/ASTMatchFinder.h"
				#include "clang/ASTMatchers/ASTMatchersInternal.h"
				#include "clang/Frontend/ASTConsumers.h"
				#include "clang/Frontend/CompilerInstance.h"
				#include "clang/Frontend/FrontendActions.h"
				#include "clang/Tooling/Tooling.h"

				using namespace clang::ast_matchers;

				namespace clang {
				namespace doc {

				/// Callback class for matchers.
				/// Parses each match and sends it along to the reporter for serialization.
				class ClangDocCallback : public MatchFinder::MatchCallback {
				public:
				sammccall (Done): Having `ClangDocMain` responsible for building the MatchFinder, but `ClangDoc` responsible for implementing the callbacks seems like an odd choice for layering: (1) there's a deep implicit contract between the matchers and the callbacks; they are going to end up being tightly coupled, so the split doesn't gain much; (2) using MatchCallback as the interface exposes a detail you're quite likely to want to change. Some heavy users of ASTMatchers end up moving to explicit AST traversal for efficiency reasons. It would seem cleaner to have the MatchFinder and collection of callbacks all owned by one class in `ClangDoc.cpp`, and just have `ClangDoc.h` expose a function that creates the `FrontendActionFactory` from it. This gives you a narrower interface with fewer implicit contracts, where ASTMatchers is an implementation detail of this TU.
				  ClangDocCallback(StringRef BoundName, ExecutionContext &ECtx,
				sammccall (Done): nit: seems odd to put all this implementation in the header. (personally I'd just expose a function returning unique_ptr<FrontendActionFactory> from the header, but up to you...)
				                   ClangDocBinaryWriter &Writer)
				sammccall (Done): Something seems slightly off here: we register a separate ClangDocCallback for each type of decl, but then each one detects what node it actually got... There are a few ways to reduce this duplication: (most reduction) use RecursiveASTVisitor, which naturally couples type and handling code (the matchers seem trivial, which makes this feasible); use separate callbacks for each type (a ClangDocCallback<T>?); (least reduction) create one callback and add it a bunch of times, or once with an anyOf() matcher.
				sammccall (Done): for ASTConsumers implemented by ASTVisitors, there seems a fairly strong convention to just make the same class extend both (MapASTVisitor, here). That would eliminate one plumbing class...
				      : ECtx(ECtx), Mapper(Writer), BoundName(BoundName) {}
				  virtual void run(const MatchFinder::MatchResult &Result) final;

				private:
				lebedev.ri (Done): Eww, `tooling::FrontendActionFactory::create()` is supposed to return owning pointer, not `std::unique_ptr<>` :/
				  template <class T>
				  void processMatchedDecl(const T *D);
				  int getLine(const NamedDecl *D) const;
				  StringRef getFile(const NamedDecl *D) const;
				lebedev.ri (Done): I would add an empty line: private: template <class T> void processMatchedDecl(const T *D); int getLine(const NamedDecl *D) const; StringRef getFile(const NamedDecl *D) const; comments::FullComment *getComment(const NamedDecl *D) const; std::string getName(const NamedDecl *D) const;
				  comments::FullComment *getComment(const NamedDecl *D) const;
				lebedev.ri (Done): Please add space before `{}`, and drop unneeded `;`
				  std::string getName(const NamedDecl *D) const;
				lebedev.ri (Done): s/virtual/override/ ?

				  ASTContext *Context;
				  ExecutionContext &ECtx;
				  ClangDocMapper Mapper;
				  StringRef BoundName;
				};

				} // namespace doc
				} // namespace clang

				#endif // LLVM_CLANG_TOOLS_EXTRA_CLANG_DOC_CLANGDOC_H
				lebedev.ri (Done): s/virtual/override/ ?

clang-doc/ClangDoc.cpp

This file was added.

				//===-- ClangDoc.cpp - ClangDoc ---------------------------------- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#include "ClangDoc.h"
				#include "clang/AST/AST.h"
				#include "clang/AST/Comment.h"
				#include "clang/AST/Mangle.h"
				#include "clang/ASTMatchers/ASTMatchFinder.h"
				#include "clang/ASTMatchers/ASTMatchersInternal.h"
				#include "llvm/Bitcode/BitstreamWriter.h"

				using namespace clang;
				using namespace clang::ast_matchers;
				using namespace clang::tooling;
				using namespace llvm;

				namespace clang {
				namespace doc {

				template <typename T>
				void ClangDocCallback::processMatchedDecl(const T *D) {
				  if (!Context->getSourceManager().isWrittenInMainFile(D->getLocation()))
				    return;
				  std::string Name = getName(D);
				  ECtx.reportResult(
				      Name, Mapper.emitInfo(D, getComment(D), Name, getLine(D), getFile(D)));
				}
				lebedev.ri (Not Done): I wonder if `Name` should be `std::move()`'d ? Or not, `reportResult()` seems to take `StringRef`... (in general, it might be a good idea to run clang-tidy on the code)
				juliehockett (Author, Not Done): So the `ExecutionContext` can implement different ways to do this -- in this case, the default container created is the `InMemoryToolResults`, which technically takes in `StringRef`s, but copies their data to its in-memory representation: void InMemoryToolResults::addResult(StringRef Key, StringRef Value) { KVResults.push_back({Key.str(), Value.str()}); } A different implementation of it (i.e. a results container not in memory) would likely have to be backed by a file, so the data would be written out there anyways.

				void ClangDocCallback::run(const MatchFinder::MatchResult &Result) {
				  Context = Result.Context;
				  if (const auto *M = Result.Nodes.getNodeAs<NamespaceDecl>(BoundName))
				    processMatchedDecl(M);
				sammccall (Done): if you're just going to call processMatchedDecl anyway, why pass in `BoundName` and allow it to vary rather than using a fixed string?
				  else if (const auto *M = Result.Nodes.getNodeAs<RecordDecl>(BoundName))
				    processMatchedDecl(M);
				  else if (const auto *M = Result.Nodes.getNodeAs<EnumDecl>(BoundName))
				    processMatchedDecl(M);
				  else if (const auto *M = Result.Nodes.getNodeAs<CXXMethodDecl>(BoundName))
				    processMatchedDecl(M);
				  else if (const auto *M = Result.Nodes.getNodeAs<FunctionDecl>(BoundName))
				    processMatchedDecl(M);
				}

				jakehehrlichUnsubmitted Done Reply Inline Actions I don't see any reason this can't be a const method. If I recall a previous version you said that it can be it can't be const because it modifies the Comment but that shouldn't violate the this being a const method. No modifications are being made to any members of this object and no non-const references/pointers to any of the members are accessed or needed. jakehehrlich: I don't see any reason this can't be a const method. If I recall a previous version you said…
				comments::FullComment *ClangDocCallback::getComment(const NamedDecl *D) const {
				RawComment *Comment = Context->getRawCommentForDeclNoCache(D);
				// FIXME: Move setAttached to the initial comment parsing.
				if (Comment) {
				Comment->setAttached();
				return Comment->parse(*Context, nullptr, D);
				}
				return nullptr;
				}
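The const-correctness argument in the comment above `getComment` can be illustrated with a small standalone sketch. The types `CommentSketch` and `CallbackSketch` are hypothetical stand-ins, not the real clang classes. The point is that a const member function may modify the object a pointer member refers to, because only the pointer itself is part of the object's state.

```cpp
#include <cassert>

// Hypothetical stand-ins; not the real clang::RawComment or callback class.
struct CommentSketch {
  bool Attached = false;
  void setAttached() { Attached = true; }
};

class CallbackSketch {
public:
  explicit CallbackSketch(CommentSketch *C) : Comment(C) {}

  // const: the pointer member itself is not modified, only the object it
  // points to, which is not part of this object's own state.
  bool attach() const {
    if (!Comment)
      return false;
    Comment->setAttached();
    return true;
  }

private:
  CommentSketch *Comment;
};

// Calls attach() through a const object and reports whether the pointee
// was updated.
inline bool attachThroughConst() {
  CommentSketch C;
  const CallbackSketch CB(&C);
  return CB.attach() && C.Attached;
}
```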

				int ClangDocCallback::getLine(const NamedDecl *D) const {
				return Context->getSourceManager().getPresumedLoc(D->getLocStart()).getLine();
				}

				StringRef ClangDocCallback::getFile(const NamedDecl *D) const {
				return Context->getSourceManager()
				.getPresumedLoc(D->getLocStart())
				.getFilename();
				}

				std::string ClangDocCallback::getName(const NamedDecl *D) const {
				if (const auto *F = dyn_cast<FunctionDecl>(D)) {
				sammccall (Not Done): This needs a comment describing what/why it's doing. In particular, you're usually using qnames, but sometimes mangled names? Downstream consumers are going to have a very hard time doing something meaningful with that. Where you need a stable, machine-readable identifier that distinguishes between overloads, I'd suggest USR (see USRGeneration). Where you need something human readable, you should think carefully about the representation you want, and try to avoid depending on it being unique.
				MangleContext *MC = Context->createMangleContext();
				std::string S;
				llvm::raw_string_ostream MangledName(S);
				if (const auto *Ctor = dyn_cast<CXXConstructorDecl>(F))
				MC->mangleCXXCtor(Ctor, CXXCtorType::Ctor_Complete, MangledName);
				else if (const auto *Dtor = dyn_cast<CXXDestructorDecl>(F))
				MC->mangleCXXDtor(Dtor, CXXDtorType::Dtor_Complete, MangledName);
				else
				MC->mangleName(F, MangledName);
				return MangledName.str();
				}
				return D->getQualifiedNameAsString();
				}

				} // namespace doc
				} // namespace clang

clang-doc/ClangDocBinary.h

This file was added.

				//===-- ClangDocBinary.h - ClangDoc Binary ---------------------*- C++ -*-===//
				//
				sammccall (Done): (As well as a file comment, this two-word description is pretty confusing - binary used as a noun seems like it would refer to the compiled clang-doc tool itself)
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANG_DOC_CLANG_DOC_BINARY_H
				#define LLVM_CLANG_TOOLS_EXTRA_CLANG_DOC_CLANG_DOC_BINARY_H

				#include <vector>
				#include "ClangDocRepresentation.h"
				#include "clang/AST/AST.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/Bitcode/BitstreamReader.h"
				#include "llvm/Bitcode/BitstreamWriter.h"

				using namespace llvm;

				namespace clang {
				namespace doc {

				class AbbreviationMap {
				llvm::DenseMap<unsigned, unsigned> Abbrevs;

				sammccall (Done): nit: llvm style is e.g. `BI_NamespaceBlockID` with prefix or `BlockID::NamespaceBlockID` (using enum class)
				public:
				AbbreviationMap() {}
				void set(unsigned recordID, unsigned abbrevID);
				unsigned get(unsigned recordID);
				void clear();
				};

				class ClangDocBinaryWriter {
				public:
				ClangDocBinaryWriter(bool OmitFilenames = false)
				: OmitFilenames(OmitFilenames){};

				using RecordData = SmallVector<uint64_t, 128>;

				template <typename T>
				void writeBitstream(const T &I, BitstreamWriter &Stream);

				private:
				void emitBlockInfoBlock(BitstreamWriter &Stream);
				void emitHeader(BitstreamWriter &Stream);
				void emitStringRecord(StringRef Str, unsigned RecordId,
				BitstreamWriter &Stream);
				void emitLocationRecord(int LineNumber, StringRef File, unsigned RecordId,
				BitstreamWriter &Stream);
				void emitIntRecord(int Value, unsigned RecordId, BitstreamWriter &Stream);
				void emitNamedTypeBlock(const NamedType &N, StringRef ID,
				BitstreamWriter &Stream);
				void emitCommentBlock(const CommentInfo *I, BitstreamWriter &Stream);

				void emitRecordID(unsigned ID, const char *Name, BitstreamWriter &Stream);
				void emitBlockID(unsigned ID, const char *Name, BitstreamWriter &Stream);
				lebedev.ri (Done): Should these take `StringRef` instead of `const char *`?
				lebedev.ri (Done): Also, isn't the first param always a `BlockIds`? Why not pass enumerators, and make it more obvious?
				void emitStringAbbrev(unsigned D, unsigned Block, BitstreamWriter &Stream);
				void emitLocationAbbrev(unsigned D, unsigned Block, BitstreamWriter &Stream);
				void emitIntAbbrev(unsigned D, unsigned Block, BitstreamWriter &Stream);

				RecordData Record;
				lebedev.ri (Done): ^ I think all these `unsigned Block` are actually a `BlockIds Block`? And `unsigned D` is actually `DataTypes D`?
				bool OmitFilenames;
				AbbreviationMap Abbrevs;
				};
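The suggestion in the inline comments above, passing enumerators instead of `unsigned`, might look roughly like the sketch below. All names and values here (`BlockId`, `RecordId`, `AbbrevRequest`, `makeAbbrevRequest`, `toRaw`) are hypothetical and chosen only for illustration; the patch's own enums are defined in ClangDocBinary.cpp.

```cpp
#include <cassert>

// Hypothetical enumerators for illustration; the real IDs live in the patch.
enum class BlockId : unsigned { Namespace = 8, Comment };
enum class RecordId : unsigned { CommentKind = 1, CommentText };

struct AbbrevRequest {
  RecordId Record;
  BlockId Block;
};

// With distinct enum class types, swapping the two arguments fails to compile
// instead of silently emitting the wrong IDs.
inline AbbrevRequest makeAbbrevRequest(RecordId ID, BlockId Block) {
  return {ID, Block};
}

// The conversion to the raw value happens in one place, at the boundary to an
// API that only understands unsigned integers.
inline unsigned toRaw(BlockId B) { return static_cast<unsigned>(B); }
inline unsigned toRaw(RecordId R) { return static_cast<unsigned>(R); }
```

A call such as `makeAbbrevRequest(BlockId::Comment, RecordId::CommentText)` is rejected by the compiler, which is the extra safety the reviewer is asking for.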

				class ClangDocBinaryReader {
				public:
				ClangDocBinaryReader(raw_ostream &OS) : OS(OS) {}

				using RecordData = SmallVector<uint64_t, 128>;

				bool readBitstream(SmallString<2048> Bits);

				private:
				enum class Cursor { BadBlock = 1, Record, BlockEnd, BlockBegin };
				bool readBlock(llvm::BitstreamCursor &Stream, unsigned ID);
				Cursor skipUntilRecordOrBlock(llvm::BitstreamCursor &Stream,
				unsigned &BlockOrRecordID);

				Optional<llvm::BitstreamBlockInfo> BlockInfo;
				std::map<unsigned, StringRef> RecordNames;
				lebedev.ri (Done): Nice! Some thoughts:
				  1. I agree it makes sense to keep it close to the enum definition, in header...
				  2. This will result in global constructor. Generally they are frowned upon in LLVM. But since this is a standalone binary, it may be ok?
				  3. Have you tried using `StringRef` here, instead of `std::string`?
				  4. `std::map` is in general a bad idea. Since the `enum`'s enumerators are all small and consecutive, maybe try `llvm::IndexedMap`?
				lebedev.ri (Done): Also, this should be `static const`, since the underlying enum won't change on the fly. `#llvm` suggests to use TableGen here, I'm not sure how that would work. As I have now noticed, there isn't an init-list constructor, so I think something like this might work:
				  static const llvm::IndexedMap<BlockId> BlockIdNameMap = []() {
				    llvm::IndexedMap<BlockId> map;
				    map.reserve(BI_LAST);
				    // There is no init-list constructor for the IndexedMap, so have to improvise
				    static const std::initializer_list<std::pair<BlockId, const char* const>> inits = {
				      {NAMESPACE_BLOCK_ID, "NamespaceBlock"},
				      ...
				    };
				    for(const auto& init : inits)
				      map[init.first] = init.second;
				  }();
				Also, even though `llvm::IndexedMap<>` is using `llvm::SmallVector<>` internally, it does not expose the initial size as template parameter, unfortunately, but hardcodes it to `0`. I think it would be great to add one more template parameter to `llvm::IndexedMap<>`, which would default to `0`, but would allow us here to avoid all memory allocation altogether. What do you think? If you do agree that using `IndexedMap` seems like the right choice, but don't want to write the patch for template parameter, I might look into it.
				juliehockett (Author, Done): Had to play with it a bit, but it's working now. For the template parameter, I'm happy to take a look! Avoiding allocation here would be great.
				raw_ostream &OS;
				};
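The lookup-table idea from the review thread above (a `static const` table from block ID to name, filled once by an immediately invoked lambda) can be sketched with the standard library alone. This is an assumption-laden illustration: `std::array` stands in for `llvm::IndexedMap`, and the enum, `BlockIdNameMap`, and `blockName` are hypothetical and not the code that landed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string_view>
#include <utility>

// Hypothetical, small and consecutive IDs, as in the review discussion.
enum BlockId : unsigned { BI_NAMESPACE = 0, BI_RECORD, BI_COMMENT, BI_COUNT };

// Built once by an immediately invoked lambda. std::array is an assumption
// standing in for llvm::IndexedMap so the sketch needs no LLVM headers.
static const std::array<std::string_view, BI_COUNT> BlockIdNameMap = [] {
  std::array<std::string_view, BI_COUNT> Map{};
  const std::initializer_list<std::pair<BlockId, std::string_view>> Inits = {
      {BI_NAMESPACE, "NamespaceBlock"},
      {BI_RECORD, "RecordBlock"},
      {BI_COMMENT, "CommentBlock"},
  };
  for (const auto &Init : Inits)
    Map[Init.first] = Init.second;
  return Map; // The lambda has to return the table it built.
}();

// Looks up the textual name for a block ID.
inline std::string_view blockName(BlockId ID) {
  assert(ID < BI_COUNT && "Unknown block ID.");
  return BlockIdNameMap[ID];
}
```

A fixed-size array avoids heap allocation entirely, which is the property the thread hopes to get from an extra `IndexedMap` template parameter.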

				} // namespace doc
				} // namespace clang

				#endif // LLVM_CLANG_TOOLS_EXTRA_CLANG_DOC_CLANG_DOC_BINARY_H
				lebedev.ri (Done): Same as in the previous note ^
				lebedev.ri (Done): I'd add an empty line before `void emitStringAbbrev(RecordId ID, BlockId Block, BitstreamWriter &Stream);`
				sammccall (Done): Again, this template really seems like it's a set of overloads.
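The last comment above, that the `writeBitstream` template is really a set of overloads, could be read as suggesting something like the following. The info structs and `WriterSketch` are hypothetical simplifications; the sketch returns strings instead of writing a bitstream so that it stays self-contained.

```cpp
#include <cassert>
#include <string>

// Hypothetical info types; the real ones are in ClangDocRepresentation.h.
struct NamespaceInfoSketch {
  std::string Name;
};
struct EnumInfoSketch {
  std::string Name;
  bool Scoped = false;
};

class WriterSketch {
public:
  // Plain overloads: overload resolution picks the right one, and passing an
  // unsupported type is a compile error at the call site rather than a
  // missing-specialization error at link time.
  std::string write(const NamespaceInfoSketch &I) const {
    return "namespace:" + I.Name;
  }
  std::string write(const EnumInfoSketch &I) const {
    return std::string(I.Scoped ? "enum class:" : "enum:") + I.Name;
  }
};
```

Since every supported type needs its own body anyway, overloads express the same thing as explicit specializations with less machinery.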

clang-doc/ClangDocBinary.cpp

This file was added.

				//===-- ClangDocBinary.cpp - ClangDoc Binary -------------------*- C++ -*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#include "ClangDocBinary.h"

				using namespace llvm;

				namespace clang {
				namespace doc {

				enum BlockIds {
				NAMESPACE_BLOCK_ID = bitc::FIRST_APPLICATION_BLOCKID,
				lebedev.ri (Done): I wonder if you could add a map from `BlockIds` enumerator to the textual representation, e.g. `NAMESPACE_BLOCK_ID` -> "NamespaceBlock" ... `RECORD_BLOCK_ID` -> "RecordBlock" ... This would allow to only pass the `BlockId`, and avoid passing a hardcoded string each time.
				NONDEF_BLOCK_ID,
				ENUM_BLOCK_ID,
				NAMED_TYPE_BLOCK_ID,
				RECORD_BLOCK_ID,
				FUNCTION_BLOCK_ID,
				COMMENT_BLOCK_ID,
				BI_FIRST = NAMESPACE_BLOCK_ID,
				BI_LAST = COMMENT_BLOCK_ID
				};

				#define INFODATATYPES(X) X##_FULLY_QUALIFIED_NAME, X##_NAME, X##_NAMESPACE,

				enum DataTypes {
				lebedev.ri (Done): Maybe `"Unknown Abbreviation."`?
				COMMENT_KIND = 1,
				COMMENT_TEXT,
				COMMENT_NAME,
				COMMENT_POSITION,
				COMMENT_DIRECTION,
				COMMENT_PARAMNAME,
				COMMENT_CLOSENAME,
				COMMENT_SELFCLOSING,
				COMMENT_EXPLICIT,
				COMMENT_ATTRKEY,
				COMMENT_ATTRVAL,
				COMMENT_ARG,
				NAMED_TYPE_ID,
				NAMED_TYPE_TYPE,
				NAMED_TYPE_NAME,
				NAMED_TYPE_ACCESS,
				INFODATATYPES(NAMESPACE)
				INFODATATYPES(NONDEF)
				NONDEF_LOCATION,
				INFODATATYPES(ENUM)
				ENUM_LOCATION,
				ENUM_SCOPED,
				ENUM_NAMED_TYPE,
				ENUM_MEMBER,
				INFODATATYPES(RECORD)
				RECORD_LOCATION,
				RECORD_TAG_TYPE,
				RECORD_MEMBER,
				RECORD_PARENT,
				RECORD_VPARENT,
				INFODATATYPES(FUNCTION)
				FUNCTION_LOCATION,
				FUNCTION_MANGLED_NAME,
				lebedev.ri (Done): So, these `emitStringAbbrev()`, `emitLocationAbbrev()` and `emitIntAbbrev()` are quite similar. How about something like:
				  template <typename Lambda>
				  void ClangDocBinaryWriter::emitAbbrev(RecordId ID, BlockId Block, Lambda &&L, BitstreamWriter &Stream) {
				    auto Abbrev = std::make_shared<BitCodeAbbrev>();
				    Abbrev->Add(BitCodeAbbrevOp(ID));
				    L(Abbrev);
				    Abbrevs.add(ID, Stream.EmitBlockInfoAbbrev(Block, std::move(Abbrev)));
				  }
				  void ClangDocBinaryWriter::emitStringAbbrev(RecordId ID, BlockId Block, BitstreamWriter &Stream) {
				    auto EmitString = [](std::shared_ptr<BitCodeAbbrev> &Abbrev) {
				      Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, BitCodeConstants::LineNumFixedSize)); // String size
				      Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // String
				    };
				    emitAbbrev(ID, Block, EmitString, Stream);
				  }
				  ...
				?
				FUNCTION_PARENT,
				FUNCTION_ACCESS,
				DT_FIRST = COMMENT_KIND,
				DT_LAST = FUNCTION_ACCESS
				};
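The refactoring proposed in the inline comment inside the enum above (one shared `emitAbbrev` skeleton that takes a callable for the type-specific operands) can be sketched without LLVM. Everything here is hypothetical: `AbbrevSketch` records operand descriptions as strings instead of building a real `BitCodeAbbrev`, and the `make*Abbrev` helpers only mirror the shape of the suggestion.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an abbreviation: an ordered list of operand kinds.
struct AbbrevSketch {
  std::vector<std::string> Ops;
  void add(std::string Op) { Ops.push_back(std::move(Op)); }
};

// Shared skeleton: create the abbreviation, add the record ID operand, let the
// caller-supplied callable add the type-specific operands, then hand it over.
template <typename Fn>
std::shared_ptr<AbbrevSketch> makeAbbrev(unsigned ID, Fn &&AddOperands) {
  auto Abbrev = std::make_shared<AbbrevSketch>();
  Abbrev->add("literal:" + std::to_string(ID));
  AddOperands(*Abbrev);
  return Abbrev;
}

inline std::shared_ptr<AbbrevSketch> makeStringAbbrev(unsigned ID) {
  return makeAbbrev(ID, [](AbbrevSketch &A) {
    A.add("fixed:16"); // string size
    A.add("blob");     // string contents
  });
}

inline std::shared_ptr<AbbrevSketch> makeIntAbbrev(unsigned ID) {
  return makeAbbrev(ID, [](AbbrevSketch &A) { A.add("fixed:16"); });
}
```

Each concrete helper now states only what differs, which is the duplication the reviewer wanted removed.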

				lebedev.ri (Done): And that de-magicking made these strings longer than `80` columns, boo :(
				#undef INFODATATYPES

				lebedev.ri (Done): I think you should `std::move()` the `Abbrev`, it's not used afterwards in this function anyways, and it seems to generate nicer code https://godbolt.org/g/ow58JV
				void AbbreviationMap::set(unsigned recordID, unsigned abbrevID) {
				lebedev.ri (Done): So it does not set the abbreviation, since it is not supposed to be called if the abbreviation is already set, but it adds a unique abbreviation. I think it should be called `void AbbreviationMap::add(unsigned recordID, unsigned abbrevID)` then.
				lebedev.ri (Done): This is marked as done, but the name is still the same, and no counter-comment was added, as far as I can see.
				juliehockett (Author, Done): It was changed to AbbreviationMap::add from AbbreviationMap::set, as you suggested, unless I missed something in your comment?
				lebedev.ri (Done): Oh right, sorry, I was thinking about some other code it seems.
				assert(Abbrevs.find(recordID) == Abbrevs.end() &&
				"Abbreviation already set.");
				Abbrevs[recordID] = abbrevID;
				}

				unsigned AbbreviationMap::get(unsigned recordID) {
				assert(Abbrevs.find(recordID) != Abbrevs.end() && "Abbreviation not set.");
				return Abbrevs[recordID];
				}

				void AbbreviationMap::clear() { Abbrevs.clear(); }
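The `add` versus `set` naming discussed in the thread above comes down to an add-once contract. A minimal sketch of that contract follows, using `std::unordered_map` as an assumed replacement for `llvm::DenseMap`; the class name `AbbreviationMapSketch` is hypothetical.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-in using std::unordered_map instead of llvm::DenseMap.
class AbbreviationMapSketch {
  std::unordered_map<unsigned, unsigned> Abbrevs;

public:
  // "add" rather than "set": each record ID may be registered only once.
  void add(unsigned RecordID, unsigned AbbrevID) {
    assert(Abbrevs.find(RecordID) == Abbrevs.end() &&
           "Abbreviation already set.");
    Abbrevs.emplace(RecordID, AbbrevID);
  }

  // const lookup via find(); operator[] would insert on a miss.
  unsigned get(unsigned RecordID) const {
    auto It = Abbrevs.find(RecordID);
    assert(It != Abbrevs.end() && "Abbreviation not set.");
    return It->second;
  }

  void clear() { Abbrevs.clear(); }
};
```

Looking up through `find()` also lets `get` be a const member, which the version using `operator[]` cannot be.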

				void ClangDocBinaryWriter::emitHeader(BitstreamWriter &Stream) {
				// Emit the file header.
				Stream.Emit((unsigned)'D', 8);
				Stream.Emit((unsigned)'O', 8);
				lebedev.ri (Done): General comment: shouldn't the bitcode be versioned?
				juliehockett (Author, Done): Possibly? My understanding of the versioning (which could be incorrect) was that it was for the LLVM IR and how it is written in the given file. I'm not writing to LLVM IR here, just using it as a data storage format, and so didn't think it was necessary. Happy to add it in though, but which version number should I use?
				lebedev.ri (Done): The question I'm asking is: what will happen if two different (documenting different attributes, with non-identical `enum {something}Id`, etc) clang-doc's were used to generate two different parts of the docs (two different TU's)? When merging two parts, if the older clang-doc is used, will it only accept the part of the bitcode it understands? Or fail altogether? And, does it make sense to allow to generate such mixed-up documentation?
				juliehockett (Author, Not Done): After some thought, I think it will depend on how the bitcode changes in the future. The reader can be implemented to simply ignore anything it doesn't recognize (with a default switch case), so that route is possible, but if the representation shifts in a major way it should probably just bail if the version is too early. I think this is a good question to consider in implementing the reader and reducer portions of the tool. For now, I've added the version number to the writer, so it can be checked in that part.
				Stream.Emit((unsigned)'C', 8);
				Stream.Emit((unsigned)'S', 8);
				}
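The versioning question raised in the thread inside `emitHeader` can be made concrete with a small sketch. Note the assumptions: the thread does not fix a version scheme, so the single version byte after the magic, the value of `CurrentVersion`, and the policy of rejecting only newer versions are all hypothetical choices made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical constants; the review thread does not fix a version scheme.
constexpr std::uint8_t Magic[4] = {'D', 'O', 'C', 'S'};
constexpr std::uint8_t CurrentVersion = 1;

// Writes the four magic bytes followed by one version byte.
inline std::vector<std::uint8_t> makeHeader(std::uint8_t Version) {
  std::vector<std::uint8_t> Out(Magic, Magic + 4);
  Out.push_back(Version);
  return Out;
}

enum class HeaderCheck { Ok, BadHeader, UnsupportedVersion };

// Rejects truncated input or a wrong magic, and bails on versions newer than
// this reader understands; older versions are accepted in this sketch.
inline HeaderCheck checkHeader(const std::vector<std::uint8_t> &In) {
  if (In.size() < 5)
    return HeaderCheck::BadHeader;
  for (std::size_t I = 0; I < 4; ++I)
    if (In[I] != Magic[I])
      return HeaderCheck::BadHeader;
  return In[4] > CurrentVersion ? HeaderCheck::UnsupportedVersion
                                : HeaderCheck::Ok;
}
```

A reducer that merges output from several translation units could run a check like this on each input before deciding whether to skip unknown records or reject the file.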

				/// \brief Emits a block ID in the BLOCKINFO block.
				void ClangDocBinaryWriter::emitBlockID(unsigned ID, const char *Name,
				BitstreamWriter &Stream) {
				Record.clear();
				Record.push_back(ID);
				Stream.EmitRecord(llvm::bitc::BLOCKINFO_CODE_SETBID, Record);

				lebedev.ri (Done): So you are actually checking that there is either no string, or the string is of zero length here. Is this function ever going to be called with a null `Name`? All calls in this Differential always pass a static C string here. Also see my comments about passing enumerator, and having a map that would avoid passing string altogether.
				// Emit the block name if present.
				if (!Name || Name[0] == 0) return;
				Record.clear();
				while (*Name) Record.push_back(*Name++);
				Stream.EmitRecord(llvm::bitc::BLOCKINFO_CODE_BLOCKNAME, Record);
				}
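The reviewers' suggestion to take a string view instead of `const char *` removes both the null check and the manual pointer walk in `emitBlockID`. A rough standalone sketch, with `std::string_view` standing in for `StringRef` and the hypothetical names `RecordDataSketch` and `appendName`:

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>
#include <vector>

using RecordDataSketch = std::vector<std::uint64_t>;

// Appends each character of Name to Record. With a string view there is no
// null pointer to check and no manual pointer walk; an empty name is a no-op.
inline bool appendName(std::string_view Name, RecordDataSketch &Record) {
  if (Name.empty())
    return false;
  for (unsigned char C : Name)
    Record.push_back(C);
  return true;
}
```

The boolean result tells the caller whether there is a name record worth emitting.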

				/// \brief Emits a record ID in the BLOCKINFO block.
				void ClangDocBinaryWriter::emitRecordID(unsigned ID, const char *Name,
				BitstreamWriter &Stream) {
				Record.clear();
				Record.push_back(ID);
				while (*Name) Record.push_back(*Name++);
				Stream.EmitRecord(llvm::bitc::BLOCKINFO_CODE_SETRECORDNAME, Record);
				}

				// Common Abbreviations

				void ClangDocBinaryWriter::emitStringAbbrev(unsigned D, unsigned Block,
				BitstreamWriter &Stream) {
				auto Abbrev = std::make_shared<BitCodeAbbrev>();
				lebedev.ri (Done): These constants are somewhat vague. Maybe consolidate them somewhere somehow, e.g.:
				  clang-doc/ClangDocBinary.cpp:
				  namespace {
				  struct BitCodeConstants {
				    static constexpr int LineNumFixedSize = 16;
				    ...
				  }
				  }
				Abbrev->Add(BitCodeAbbrevOp(D));
				Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 16)); // String size
				Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // String
				Abbrevs.set(D, Stream.EmitBlockInfoAbbrev(Block, Abbrev));
				}

				void ClangDocBinaryWriter::emitLocationAbbrev(unsigned D, unsigned Block,
				BitstreamWriter &Stream) {
				auto Abbrev = std::make_shared<BitCodeAbbrev>();
				Abbrev->Add(BitCodeAbbrevOp(D));
				Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 16)); // Line number
				Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 16)); // Filename size
				lebedev.ri (Done): There is a common pattern here; I'd try to do something like:
				  class StreamSubBlock {
				    BitstreamWriter &Stream;
				  public:
				    StreamSubBlock(BitstreamWriter &Stream_, BlockId ID) : Stream(Stream_) {
				      Stream.EnterSubblock(ID, BitCodeConstants::SubblockIDSize);
				    }
				    // Optionally, also delete all the other constructors / copy/move operators.
				    ~StreamSubBlock() { Stream.ExitBlock(); }
				  };
				  void ClangDocBinaryWriter::emitNamedTypeBlock(const NamedType &N, NamedType::FieldName ID, BitstreamWriter &Stream) {
				    StreamSubBlock Block(Stream, NAMED_TYPE_BLOCK_ID);
				    emitIntRecord(ID, NAMED_TYPE_ID, Stream);
				    emitStringRecord(N.Type, NAMED_TYPE_TYPE, Stream);
				    emitStringRecord(N.Name, NAMED_TYPE_NAME, Stream);
				    emitIntRecord(N.Access, NAMED_TYPE_ACCESS, Stream);
				  }
				I.e. use RAII.
				Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // Filename
				Abbrevs.set(D, Stream.EmitBlockInfoAbbrev(Block, Abbrev));
				}
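The RAII suggestion in the inline comment above pairs every `EnterSubblock` with an `ExitBlock` automatically. The sketch below shows the mechanism with a mock stream that only logs events. `StreamSketch`, `SubBlockGuard`, `emitNamedBlock`, and the block ID `11` are hypothetical and are not the LLVM `BitstreamWriter` API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a bitstream writer that just logs events.
struct StreamSketch {
  std::vector<std::string> Log;
  void enterSubblock(unsigned ID) {
    Log.push_back("enter " + std::to_string(ID));
  }
  void exitBlock() { Log.push_back("exit"); }
  void emit(const std::string &S) { Log.push_back(S); }
};

// Enters the block on construction and exits on destruction, so every return
// path (including early returns) closes the block.
class SubBlockGuard {
  StreamSketch &Stream;

public:
  SubBlockGuard(StreamSketch &S, unsigned ID) : Stream(S) {
    Stream.enterSubblock(ID);
  }
  SubBlockGuard(const SubBlockGuard &) = delete;
  SubBlockGuard &operator=(const SubBlockGuard &) = delete;
  ~SubBlockGuard() { Stream.exitBlock(); }
};

inline void emitNamedBlock(StreamSketch &Stream, const std::string &Name) {
  SubBlockGuard Block(Stream, 11);
  if (Name.empty())
    return; // exitBlock() still runs here.
  Stream.emit("name " + Name);
}
```

With the guard in place, adding an early return to an emit function can no longer leave a block open.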

				void ClangDocBinaryWriter::emitIntAbbrev(unsigned D, unsigned Block,
				BitstreamWriter &Stream) {
				auto Abbrev = std::make_shared<BitCodeAbbrev>();
				Abbrev->Add(BitCodeAbbrevOp(D));
				Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 16)); // Integer
				Abbrevs.set(D, Stream.EmitBlockInfoAbbrev(Block, Abbrev));
				}

				// Common Records

				void ClangDocBinaryWriter::emitStringRecord(StringRef Str, unsigned RecordId,
				BitstreamWriter &Stream) {
				if (Str.empty()) return;
				Record.clear();
				Record.push_back(RecordId);
				Record.push_back(Str.size());
				Stream.EmitRecordWithBlob(Abbrevs.get(RecordId), Record, Str);
				}

				void ClangDocBinaryWriter::emitLocationRecord(int LineNumber, StringRef File,
				unsigned RecordId,
				BitstreamWriter &Stream) {
				if (OmitFilenames) return;
				Record.clear();
				Record.push_back(RecordId);
				Record.push_back(LineNumber);
				Record.push_back(File.size());
				Stream.EmitRecordWithBlob(Abbrevs.get(RecordId), Record, File);
				}

				void ClangDocBinaryWriter::emitIntRecord(int Value, unsigned RecordId,
				BitstreamWriter &Stream) {
				if (!Value) return;
				Record.clear();
				Record.push_back(RecordId);
				Record.push_back(Value);
				Stream.EmitRecordWithAbbrev(Abbrevs.get(RecordId), Record);
				}

				// Common Blocks

				void ClangDocBinaryWriter::emitNamedTypeBlock(const NamedType &N, StringRef ID,
				lebedev.ri (Done): From the docs I can see that `5` is `CodeLen`, but how is this decided, etc? Seems like a magical constant, maybe consolidate them somewhere, like in the previous note?
				BitstreamWriter &Stream) {
				Stream.EnterSubblock(NAMED_TYPE_BLOCK_ID, 5);
				emitStringRecord(ID, NAMED_TYPE_ID, Stream);
				emitStringRecord(N.Type, NAMED_TYPE_TYPE, Stream);
				emitStringRecord(N.Name, NAMED_TYPE_NAME, Stream);
				emitIntRecord(N.Access, NAMED_TYPE_ACCESS, Stream);
				Stream.ExitBlock();
				}

				void ClangDocBinaryWriter::emitCommentBlock(const CommentInfo *I,
				BitstreamWriter &Stream) {
				Stream.EnterSubblock(COMMENT_BLOCK_ID, 5);
				emitStringRecord(I->Kind, COMMENT_KIND, Stream);
				emitStringRecord(I->Text, COMMENT_TEXT, Stream);
				emitStringRecord(I->Name, COMMENT_NAME, Stream);
				emitStringRecord(I->Direction, COMMENT_DIRECTION, Stream);
				emitStringRecord(I->ParamName, COMMENT_PARAMNAME, Stream);
				emitStringRecord(I->CloseName, COMMENT_CLOSENAME, Stream);
				emitIntRecord(I->SelfClosing, COMMENT_SELFCLOSING, Stream);
				emitIntRecord(I->Explicit, COMMENT_EXPLICIT, Stream);
				for (const auto &A : I->AttrKeys)
				emitStringRecord(A, COMMENT_ATTRKEY, Stream);
				for (const auto &A : I->AttrValues)
				emitStringRecord(A, COMMENT_ATTRVAL, Stream);
				for (const auto &A : I->Args) emitStringRecord(A, COMMENT_ARG, Stream);
				for (const auto &P : I->Position)
				emitStringRecord(P, COMMENT_POSITION, Stream);
				for (const auto &C : I->Children) emitCommentBlock(C.get(), Stream);
				Stream.ExitBlock();
				}

				void ClangDocBinaryWriter::emitBlockInfoBlock(BitstreamWriter &Stream) {
				Abbrevs.clear();
				emitHeader(Stream);
				Stream.EnterBlockInfoBlock();

				// Comment Block
				emitBlockID(COMMENT_BLOCK_ID, "CommentBlock", Stream);
				emitRecordID(COMMENT_KIND, "Kind", Stream);
				emitRecordID(COMMENT_TEXT, "Text", Stream);
				emitRecordID(COMMENT_NAME, "Name", Stream);
				emitRecordID(COMMENT_DIRECTION, "Direction", Stream);
				emitRecordID(COMMENT_PARAMNAME, "ParamName", Stream);
				emitRecordID(COMMENT_CLOSENAME, "CloseName", Stream);
				emitRecordID(COMMENT_SELFCLOSING, "SelfClosing", Stream);
				emitRecordID(COMMENT_EXPLICIT, "Explicit", Stream);
				emitRecordID(COMMENT_ATTRKEY, "AtrrKey", Stream);
				emitRecordID(COMMENT_ATTRVAL, "AttrVal", Stream);
				emitRecordID(COMMENT_ARG, "Arg", Stream);
				emitRecordID(COMMENT_POSITION, "Position", Stream);
				emitStringAbbrev(COMMENT_KIND, COMMENT_BLOCK_ID, Stream);
				emitStringAbbrev(COMMENT_TEXT, COMMENT_BLOCK_ID, Stream);
				emitStringAbbrev(COMMENT_NAME, COMMENT_BLOCK_ID, Stream);
				emitStringAbbrev(COMMENT_POSITION, COMMENT_BLOCK_ID, Stream);
				emitStringAbbrev(COMMENT_DIRECTION, COMMENT_BLOCK_ID, Stream);
				emitStringAbbrev(COMMENT_PARAMNAME, COMMENT_BLOCK_ID, Stream);
				emitStringAbbrev(COMMENT_CLOSENAME, COMMENT_BLOCK_ID, Stream);
				emitIntAbbrev(COMMENT_SELFCLOSING, COMMENT_BLOCK_ID, Stream);
				emitIntAbbrev(COMMENT_EXPLICIT, COMMENT_BLOCK_ID, Stream);
				emitStringAbbrev(COMMENT_ATTRKEY, COMMENT_BLOCK_ID, Stream);
				emitStringAbbrev(COMMENT_ATTRVAL, COMMENT_BLOCK_ID, Stream);
				emitStringAbbrev(COMMENT_ARG, COMMENT_BLOCK_ID, Stream);

				// NamedType Block
				emitBlockID(NAMED_TYPE_BLOCK_ID, "NamedTypeBlock", Stream);
				emitRecordID(NAMED_TYPE_ID, "ID", Stream);
				emitRecordID(NAMED_TYPE_TYPE, "Type", Stream);
				emitRecordID(NAMED_TYPE_NAME, "Name", Stream);
				emitRecordID(NAMED_TYPE_ACCESS, "Access", Stream);
				emitStringAbbrev(NAMED_TYPE_ID, NAMED_TYPE_BLOCK_ID, Stream);
				emitStringAbbrev(NAMED_TYPE_TYPE, NAMED_TYPE_BLOCK_ID, Stream);
				emitStringAbbrev(NAMED_TYPE_NAME, NAMED_TYPE_BLOCK_ID, Stream);
				emitIntAbbrev(NAMED_TYPE_ACCESS, NAMED_TYPE_BLOCK_ID, Stream);

				#define INFORECORD(X) \
				emitRecordID(X##_FULLY_QUALIFIED_NAME, "FullyQualifiedName", Stream); \
				emitRecordID(X##_NAME, "Name", Stream); \
				emitRecordID(X##_NAMESPACE, "Namespace", Stream);

				#define INFOABBREV(X) \
				emitStringAbbrev(X##_FULLY_QUALIFIED_NAME, X##_BLOCK_ID, Stream); \
				emitStringAbbrev(X##_NAME, X##_BLOCK_ID, Stream); \
				emitStringAbbrev(X##_NAMESPACE, X##_BLOCK_ID, Stream);

				// Namespace Block
				emitBlockID(NAMESPACE_BLOCK_ID, "NamespaceBlock", Stream);
				INFORECORD(NAMESPACE)
				INFOABBREV(NAMESPACE)

				// NonDef Block
				emitBlockID(NONDEF_BLOCK_ID, "NonDefBlock", Stream);
				INFORECORD(NONDEF)
				emitRecordID(NONDEF_LOCATION, "Location", Stream);
				INFOABBREV(NONDEF)
				emitLocationAbbrev(NONDEF_LOCATION, NONDEF_BLOCK_ID, Stream);

				// Enum Block
				emitBlockID(ENUM_BLOCK_ID, "EnumBlock", Stream);
				INFORECORD(ENUM)
				emitRecordID(ENUM_LOCATION, "Location", Stream);
				emitRecordID(ENUM_SCOPED, "Scoped", Stream);
				INFOABBREV(ENUM)
				emitLocationAbbrev(ENUM_LOCATION, ENUM_BLOCK_ID, Stream);
				emitIntAbbrev(ENUM_SCOPED, ENUM_BLOCK_ID, Stream);

				// Record Block
				emitBlockID(RECORD_BLOCK_ID, "RecordBlock", Stream);
				INFORECORD(RECORD)
				emitRecordID(RECORD_LOCATION, "Location", Stream);
				emitRecordID(RECORD_TAG_TYPE, "TagType", Stream);
				emitRecordID(RECORD_PARENT, "Parent", Stream);
				emitRecordID(RECORD_VPARENT, "VParent", Stream);
				INFOABBREV(RECORD)
				emitLocationAbbrev(RECORD_LOCATION, RECORD_BLOCK_ID, Stream);
				emitIntAbbrev(RECORD_TAG_TYPE, RECORD_BLOCK_ID, Stream);
				emitStringAbbrev(RECORD_PARENT, RECORD_BLOCK_ID, Stream);
				emitStringAbbrev(RECORD_VPARENT, RECORD_BLOCK_ID, Stream);

				// Function Block
				emitBlockID(FUNCTION_BLOCK_ID, "FunctionBlock", Stream);
				INFORECORD(FUNCTION)
				emitRecordID(FUNCTION_LOCATION, "Location", Stream);
				emitRecordID(FUNCTION_MANGLED_NAME, "MangledName", Stream);
				emitRecordID(FUNCTION_PARENT, "Parent", Stream);
				emitRecordID(FUNCTION_ACCESS, "Access", Stream);
				INFOABBREV(FUNCTION)
				emitLocationAbbrev(FUNCTION_LOCATION, FUNCTION_BLOCK_ID, Stream);
				emitStringAbbrev(FUNCTION_MANGLED_NAME, FUNCTION_BLOCK_ID, Stream);
				emitStringAbbrev(FUNCTION_PARENT, FUNCTION_BLOCK_ID, Stream);
				emitIntAbbrev(FUNCTION_ACCESS, FUNCTION_BLOCK_ID, Stream);

				#undef INFORECORDS
				#undef INFOABBREV

				Stream.ExitBlock();
				}

				// Info emission

				#define EMITINFO(X) \
				emitStringRecord(I.FullyQualifiedName, X##_FULLY_QUALIFIED_NAME, Stream); \
				emitStringRecord(I.SimpleName, X##_NAME, Stream); \
				emitStringRecord(I.Namespace, X##_NAMESPACE, Stream); \
				emitCommentBlock(&I.Description, Stream);

				template <>
				void ClangDocBinaryWriter::writeBitstream(const NamespaceInfo &I,
				BitstreamWriter &Stream) {
				emitBlockInfoBlock(Stream);
				Stream.EnterSubblock(NAMESPACE_BLOCK_ID, 5);
				EMITINFO(NAMESPACE)
				Stream.ExitBlock();
				}

				template <>
				void ClangDocBinaryWriter::writeBitstream(const NonDefInfo &I,
				BitstreamWriter &Stream) {
				emitBlockInfoBlock(Stream);
				Stream.EnterSubblock(NONDEF_BLOCK_ID, 5);
				EMITINFO(NONDEF)
				emitLocationRecord(I.LineNumber, I.Filename, NONDEF_LOCATION, Stream);
				Stream.ExitBlock();
				}

				template <>
				void ClangDocBinaryWriter::writeBitstream(const EnumInfo &I,
				BitstreamWriter &Stream) {
				emitBlockInfoBlock(Stream);
				Stream.EnterSubblock(ENUM_BLOCK_ID, 5);
				EMITINFO(ENUM)
				emitLocationRecord(I.LineNumber, I.Filename, ENUM_LOCATION, Stream);
				emitIntRecord(I.Scoped, ENUM_SCOPED, Stream);
				for (const auto &N : I.Members) emitNamedTypeBlock(N, "Member", Stream);
				Stream.ExitBlock();
				}

				template <>
				void ClangDocBinaryWriter::writeBitstream(const RecordInfo &I,
				BitstreamWriter &Stream) {
				emitBlockInfoBlock(Stream);
				Stream.EnterSubblock(RECORD_BLOCK_ID, 5);
				EMITINFO(RECORD)
				emitLocationRecord(I.LineNumber, I.Filename, RECORD_LOCATION, Stream);
				emitIntRecord(I.TagType, RECORD_TAG_TYPE, Stream);
				for (const auto &N : I.Members) emitNamedTypeBlock(N, "Member", Stream);
				for (const auto &P : I.Parents) emitStringRecord(P, RECORD_PARENT, Stream);
				for (const auto &P : I.VirtualParents)
				emitStringRecord(P, RECORD_VPARENT, Stream);
				Stream.ExitBlock();
				}

				template <>
				void ClangDocBinaryWriter::writeBitstream(const FunctionInfo &I,
				BitstreamWriter &Stream) {
				emitBlockInfoBlock(Stream);
				Stream.EnterSubblock(FUNCTION_BLOCK_ID, 5);
				EMITINFO(FUNCTION)
				emitLocationRecord(I.LineNumber, I.Filename, FUNCTION_LOCATION, Stream);
				emitStringRecord(I.MangledName, FUNCTION_MANGLED_NAME, Stream);
				emitStringRecord(I.Parent, FUNCTION_PARENT, Stream);
				emitNamedTypeBlock(I.ReturnType, "Return", Stream);
				for (const auto &N : I.Params) emitNamedTypeBlock(N, "Param", Stream);
				emitIntRecord(I.Access, FUNCTION_ACCESS, Stream);
				Stream.ExitBlock();
				}

				#undef EMITINFO

				// Reader

				bool ClangDocBinaryReader::readBitstream(SmallString<2048> Bits) {
				BitstreamCursor Stream(Bits);

				if (Stream.AtEndOfStream()) return false;

				// Sniff for the signature.
				if (Stream.Read(8) != 'D' || Stream.Read(8) != 'O' || Stream.Read(8) != 'C' ||
				Stream.Read(8) != 'S')
				return false;

				// Read the top level blocks.
				while (!Stream.AtEndOfStream()) {
				unsigned Code = Stream.ReadCode();
				if (Code != bitc::ENTER_SUBBLOCK) return false;

				switch (Stream.ReadSubBlockID()) {
				case llvm::bitc::BLOCKINFO_BLOCK_ID: {
				BlockInfo = Stream.ReadBlockInfoBlock(/*ReadBlockInfoNames=*/true);
				if (!BlockInfo) return false;
				Stream.setBlockInfo(&*BlockInfo);
				// Extract the record names associated with each field
				for (unsigned i = BI_FIRST; i <= BI_LAST; ++i) {
				for (const auto &N : (*BlockInfo).getBlockInfo(i)->RecordNames)
				RecordNames[N.first] = N.second;
				}
				continue;
				}
				case NAMESPACE_BLOCK_ID:
				if (readBlock(Stream, NAMESPACE_BLOCK_ID)) return true;
				continue;
				case NONDEF_BLOCK_ID:
				if (readBlock(Stream, NONDEF_BLOCK_ID)) return true;
				continue;
				case NAMED_TYPE_BLOCK_ID:
				if (readBlock(Stream, NAMED_TYPE_BLOCK_ID)) return true;
				continue;
				case COMMENT_BLOCK_ID:
				if (readBlock(Stream, COMMENT_BLOCK_ID)) return true;
				continue;
				case RECORD_BLOCK_ID:
				if (readBlock(Stream, RECORD_BLOCK_ID)) return true;
				continue;
				case ENUM_BLOCK_ID:
				if (readBlock(Stream, ENUM_BLOCK_ID)) return true;
				continue;
				case FUNCTION_BLOCK_ID:
				if (readBlock(Stream, FUNCTION_BLOCK_ID)) return true;
				continue;
				default:
				if (!Stream.SkipBlock()) return false;
				continue;
				}
				}
				return true;
				}

				bool ClangDocBinaryReader::readBlock(llvm::BitstreamCursor &Stream,
				unsigned ID) {
				if (Stream.EnterSubBlock(ID)) return false;

				SmallVector<uint64_t, 1024> Record;
				while (true) {
				unsigned BlockOrCode = 0;
				Cursor Res = skipUntilRecordOrBlock(Stream, BlockOrCode);

				switch (Res) {
				case Cursor::BadBlock:
				return false;
				case Cursor::BlockEnd:
				return true;
				case Cursor::BlockBegin:
				if (readBlock(Stream, BlockOrCode)) continue;
				if (!Stream.SkipBlock()) return false;
				continue;
				case Cursor::Record:
				break;
				}

				// Read the record.
				Record.clear();
				StringRef Blob;
				unsigned RecID = Stream.readRecord(BlockOrCode, Record, &Blob);
				if (RecID < DT_FIRST || RecID > DT_LAST) continue;

				#define INFOCASES(X) \
				case X##_FULLY_QUALIFIED_NAME: \
				case X##_NAME: \
				case X##_NAMESPACE:

				switch ((DataTypes)RecID) {
				// Locations
				case ENUM_LOCATION:
				case RECORD_LOCATION:
				case FUNCTION_LOCATION:
				case NONDEF_LOCATION:
				OS << RecordNames[RecID] << ": " << Blob << ":" << Record[0] << "\n";
				continue;

				// Strings
				INFOCASES(NAMESPACE)
				INFOCASES(NONDEF)
				INFOCASES(ENUM)
				INFOCASES(RECORD)
				INFOCASES(FUNCTION)
				case NAMED_TYPE_ID:
				case NAMED_TYPE_TYPE:
				case NAMED_TYPE_NAME:
				case RECORD_PARENT:
				case RECORD_VPARENT:
				case FUNCTION_PARENT:
				case FUNCTION_MANGLED_NAME:
				case COMMENT_KIND:
				case COMMENT_TEXT:
				case COMMENT_NAME:
				case COMMENT_DIRECTION:
				case COMMENT_PARAMNAME:
				case COMMENT_CLOSENAME:
				case COMMENT_ATTRKEY:
				case COMMENT_ATTRVAL:
				case COMMENT_ARG:
				case COMMENT_POSITION:
				OS << RecordNames[RecID] << ": " << Blob << "\n";
				continue;

				// Ints
				case ENUM_SCOPED:
				case RECORD_TAG_TYPE:
				case NAMED_TYPE_ACCESS:
				case FUNCTION_ACCESS:
				case COMMENT_SELFCLOSING:
				case COMMENT_EXPLICIT:
				OS << RecordNames[RecID] << ": " << Record[0] << "\n";
				default:
				continue;
				}
				}
				}

				#undef INFOCASES

				ClangDocBinaryReader::Cursor ClangDocBinaryReader::skipUntilRecordOrBlock(
				llvm::BitstreamCursor &Stream, unsigned &BlockOrRecordID) {
				BlockOrRecordID = 0;

				while (!Stream.AtEndOfStream()) {
				unsigned Code = Stream.ReadCode();

				switch ((llvm::bitc::FixedAbbrevIDs)Code) {
				case llvm::bitc::ENTER_SUBBLOCK:
				BlockOrRecordID = Stream.ReadSubBlockID();
				return Cursor::BlockBegin;
				case llvm::bitc::END_BLOCK:
				if (Stream.ReadBlockEnd()) return Cursor::BadBlock;
				return Cursor::BlockEnd;
				case llvm::bitc::DEFINE_ABBREV:
				Stream.ReadAbbrevRecord();
				continue;
				case llvm::bitc::UNABBREV_RECORD:
				return Cursor::BadBlock;
				default:
				// We found a record.
				BlockOrRecordID = Code;
				return Cursor::Record;
				}
				}
				llvm_unreachable("Premature stream end.");
				}

				} // namespace doc
				} // namespace clang
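The signature check at the top of `readBitstream` can be sketched outside of LLVM as a plain byte comparison. This is only an illustration under stated assumptions: `hasDocsSignature` is a hypothetical helper that is not part of the patch, and it works on a `std::string` rather than an `llvm::BitstreamCursor`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the reader's signature check; it inspects a plain
// byte string instead of reading 8 bits at a time from a bitstream cursor.
inline bool hasDocsSignature(const std::string &Bits) {
  static const char Magic[] = {'D', 'O', 'C', 'S'};
  if (Bits.size() < sizeof(Magic))
    return false; // Too short to contain the signature.
  for (std::size_t I = 0; I < sizeof(Magic); ++I)
    if (Bits[I] != Magic[I])
      return false;
  return true;
}
```

As in the patch, a buffer that does not start with the four signature bytes is rejected before any block is read.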

clang-doc/ClangDocMapper.h

This file was added.

				//===-- ClangDocMapper.h - ClangDocMapper -----------------------*- C++ -*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANG_DOC_CLANG_DOC_MAPPER_H
				#define LLVM_CLANG_TOOLS_EXTRA_CLANG_DOC_CLANG_DOC_MAPPER_H

				#include <memory>
				#include <string>
				#include <vector>
				#include "ClangDocBinary.h"
				#include "ClangDocRepresentation.h"
				#include "clang/AST/AST.h"
				#include "clang/AST/ASTConsumer.h"
				#include "clang/AST/ASTContext.h"
				#include "clang/AST/CommentVisitor.h"
				#include "clang/AST/RecursiveASTVisitor.h"
				#include "clang/Frontend/ASTConsumers.h"
				#include "clang/Frontend/FrontendActions.h"
				#include "clang/Tooling/Execution.h"
				#include "clang/Tooling/Tooling.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/Support/raw_ostream.h"

				using namespace clang::comments;
				using namespace clang::tooling;

				namespace clang {
				namespace doc {

				class ClangDocCommentVisitor
				: public ConstCommentVisitor<ClangDocCommentVisitor> {
				sammccall: why is this exposed? (and what does it do?)
				juliehockett (author): Moved it into the mapper class, but it traverses a comment and extracts its information into the `CommentInfo` struct
				public:
				ClangDocCommentVisitor(CommentInfo &CI) : CurrentCI(CI) {}

				void parseComment(const comments::Comment *C);

				void visitTextComment(const TextComment *C);
				void visitInlineCommandComment(const InlineCommandComment *C);
				void visitHTMLStartTagComment(const HTMLStartTagComment *C);
				void visitHTMLEndTagComment(const HTMLEndTagComment *C);
				void visitBlockCommandComment(const BlockCommandComment *C);
				void visitParamCommandComment(const ParamCommandComment *C);
				void visitTParamCommandComment(const TParamCommandComment *C);
				void visitVerbatimBlockComment(const VerbatimBlockComment *C);
				void visitVerbatimBlockLineComment(const VerbatimBlockLineComment *C);
				void visitVerbatimLineComment(const VerbatimLineComment *C);

				private:
				StringRef getCommandName(unsigned CommandID) const;
				bool isWhitespaceOnly(StringRef S) const;

				CommentInfo &CurrentCI;
				};

				class ClangDocMapper {
				public:
				sammccall: Naming: as things stand, `ClangDoc` looks like the mapper, and this is some sort of serializer helper: ClangDoc consumes the input, decides what to do with it, and writes the output.
				ClangDocMapper(ClangDocBinaryWriter &Writer) : Writer(Writer) {}

				template <class C>
				StringRef emitInfo(const C *D, const FullComment *FC, StringRef Key,
				sammccall: why is this a template method rather than a set of overloads? I think if you pass in the wrong type, you'll get (at best) a linker error instead of a useful compile error.
				int LineNumber, StringRef File);
				sammccall: when returning a stringref, it might pay to be explicit about who owns the data, so the caller knows the safe lifetime. (This isn't always spelled out in llvm, but should probably be done more often!)
				juliehockett (author): Definitely reasonable, particularly since this one was left over from a past diff where the string buffer was created externally, and so now it shouldn't be returning the ref (as you noticed below).

				private:
				template <typename T>
				SmallString<2048> serialize(T &I) const;
				void populateSymbolInfo(SymbolInfo &I, StringRef Name, StringRef SimpleName,
				StringRef Namespace, const FullComment *C,
				int LineNumber, StringRef File);
				void populateFunctionInfo(FunctionInfo &I, const FunctionDecl *D,
				StringRef Name, const FullComment *C,
				int LineNumber, StringRef File);
				template <class C>
				StringRef serializeNonDefInfo(const C *D, StringRef Name,
				const FullComment *FC, int LineNumber,
				StringRef File);
				void parseFields(RecordInfo &I, const RecordDecl *D) const;
				void parseEnumerators(EnumInfo &I, const EnumDecl *D) const;
				void parseBases(RecordInfo &I, const CXXRecordDecl *D) const;
				void parseParameters(FunctionInfo &I, const FunctionDecl *D) const;
				void parseFullComment(const FullComment *C, CommentInfo &CI);
				std::string getParentNamespace(const DeclContext *D) const;

				ClangDocBinaryWriter &Writer;
				};

				} // namespace doc
				} // namespace clang

				#endif // LLVM_CLANG_TOOLS_EXTRA_CLANG_DOC_CLANG_DOC_MAPPER_H
				jakehehrlich: This should be a CommentInfo& to avoid copy constructing. It also then lets you view ClangDocCommentVisitor as a kind of CommentInfo builder
				jakehehrlich: This shouldn't return CommentInfo if you make CurrentCI a reference
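The overloads-versus-template question raised on `emitInfo` can be illustrated with stand-in types. This is a sketch under assumptions, not the patch's code: `NamespaceDeclStub`, `RecordDeclStub`, `TypedefDeclStub` and `emitKind` are all hypothetical names. With a plain overload set, a call with an unsupported type is rejected by the compiler at the call site. With a template that is only declared in the header and specialized in a `.cpp` file, the same call would typically compile and then fail at link time.

```cpp
#include <cassert>
#include <string>

// Stand-in declaration types; the patch itself uses clang::NamespaceDecl,
// clang::RecordDecl and so on.
struct NamespaceDeclStub {};
struct RecordDeclStub {};
struct TypedefDeclStub {}; // Deliberately has no emitKind overload.

// One overload per supported declaration type. Passing a TypedefDeclStub
// pointer does not compile, because no overload accepts it.
inline std::string emitKind(const NamespaceDeclStub *) { return "namespace"; }
inline std::string emitKind(const RecordDeclStub *) { return "record"; }
```

Uncommenting a call such as `emitKind(static_cast<const TypedefDeclStub *>(nullptr))` should produce a "no matching function" diagnostic rather than an undefined symbol at link time.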

clang-doc/ClangDocMapper.cpp

This file was added.

				//===-- ClangDocMapper.cpp - ClangDoc Mapper ----------------*- C++ -*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#include "ClangDocMapper.h"
				#include "llvm/Support/Path.h"
				#include "llvm/Support/YAMLTraits.h"
				#include "llvm/Support/raw_ostream.h"

				using namespace llvm;
				using clang::comments::FullComment;

				namespace clang {
				namespace doc {

				// ClangDocMapper

				template <>
				StringRef ClangDocMapper::emitInfo(const NamespaceDecl *D,
				const FullComment *FC, StringRef Name,
				int LineNumber, StringRef File) {
				NamespaceInfo I;
				I.FullyQualifiedName = Name;
				I.SimpleName = D->getNameAsString();
				I.Namespace = getParentNamespace(D);
				if (FC) parseFullComment(FC, I.Description);
				return serialize(I);
				}

				template <>
				StringRef ClangDocMapper::emitInfo(const RecordDecl *D, const FullComment *FC,
				StringRef Name, int LineNumber,
				StringRef File) {
				if (!D->isThisDeclarationADefinition())
				return serializeNonDefInfo(D, Name, FC, LineNumber, File);
				RecordInfo I;
				populateSymbolInfo(I, Name, D->getNameAsString(), getParentNamespace(D), FC,
				LineNumber, File);
				I.TagType = D->getTagKind();
				if (const auto *CXXR = dyn_cast<CXXRecordDecl>(D)) parseBases(I, CXXR);
				parseFields(I, D);
				return serialize(I);
				}

				template <>
				StringRef ClangDocMapper::emitInfo(const FunctionDecl *D, const FullComment *FC,
				StringRef Name, int LineNumber,
				StringRef File) {
				if (!D->isThisDeclarationADefinition())
				return serializeNonDefInfo(D, Name, FC, LineNumber, File);
				FunctionInfo I;
				populateFunctionInfo(I, D, Name, FC, LineNumber, File);
				I.Access = clang::AccessSpecifier::AS_none;
				return serialize(I);
				}

				template <>
				StringRef ClangDocMapper::emitInfo(const CXXMethodDecl *D,
				const FullComment *FC, StringRef Name,
				int LineNumber, StringRef File) {
				if (!D->isThisDeclarationADefinition())
				return serializeNonDefInfo(D, Name, FC, LineNumber, File);
				FunctionInfo I;
				populateFunctionInfo(I, D, Name, FC, LineNumber, File);
				I.Parent = D->getParent()->getQualifiedNameAsString();
				I.Access = D->getAccess();
				return serialize(I);
				}

				template <>
				StringRef ClangDocMapper::emitInfo(const EnumDecl *D, const FullComment *FC,
				StringRef Name, int LineNumber,
				StringRef File) {
				if (!D->isThisDeclarationADefinition())
				return serializeNonDefInfo(D, Name, FC, LineNumber, File);
				EnumInfo I;
				populateSymbolInfo(I, Name, D->getNameAsString(), getParentNamespace(D), FC,
				LineNumber, File);
				I.Scoped = D->isScoped();
				parseEnumerators(I, D);
				return serialize(I);
				}

				template <typename T>
				SmallString<2048> ClangDocMapper::serialize(T &I) const {
				SmallString<2048> Buffer;
				llvm::BitstreamWriter Stream(Buffer);
				jakehehrlich: If/When you make CurrentCI a reference you should return CI here instead.
				Writer.writeBitstream(I, Stream);
				return Buffer;
				}

				void ClangDocMapper::parseFullComment(const FullComment *C, CommentInfo &CI) {
				ClangDocCommentVisitor Visitor(CI);
				Visitor.parseComment(C);
				}

				void ClangDocMapper::populateSymbolInfo(SymbolInfo &I, StringRef Name,
				StringRef SimpleName,
				StringRef Namespace,
				const FullComment *C, int LineNumber,
				StringRef File) {
				I.FullyQualifiedName = Name;
				I.SimpleName = SimpleName;
				I.Namespace = Namespace;
				I.LineNumber = LineNumber;
				I.Filename = File;
				if (C) parseFullComment(C, I.Description);
				}

				void ClangDocMapper::populateFunctionInfo(FunctionInfo &I,
				const FunctionDecl *D, StringRef Name,
				const FullComment *C, int LineNumber,
				StringRef File) {
				populateSymbolInfo(I, D->getQualifiedNameAsString(), D->getNameAsString(),
				getParentNamespace(D), C, LineNumber, File);
				I.MangledName = Name;
				NamedType N;
				N.Type = D->getReturnType().getAsString();
				I.ReturnType = N;
				parseParameters(I, D);
				}

				template <class C>
				StringRef ClangDocMapper::serializeNonDefInfo(const C *D, StringRef Name,
				const FullComment *FC,
				int LineNumber, StringRef File) {
				NonDefInfo I;
				I.FullyQualifiedName = Name;
				I.SimpleName = D->getNameAsString();
				I.Namespace = getParentNamespace(D);
				I.LineNumber = LineNumber;
				I.Filename = File;
				if (FC) parseFullComment(FC, I.Description);
				return serialize(I);
				}

				void ClangDocMapper::parseFields(RecordInfo &I, const RecordDecl *D) const {
				for (const FieldDecl *F : D->fields()) {
				NamedType N;
				N.Type = F->getTypeSourceInfo()->getType().getAsString();
				N.Name = F->getQualifiedNameAsString();
				// FIXME: Set Access to the appropriate value.
				I.Members.emplace_back(N);
				}
				sammccall: If I'm reading correctly, serialize() returns a SmallString by value, and now you're returning a (dangling) stringref to that temporary.
				}
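The lifetime concern about `serialize()` and the `StringRef` return type of `emitInfo` can be sketched with standard-library types. The names `serializeStub` and `emitInfoStub` are hypothetical, and `std::string` stands in for `SmallString<2048>`; this shows one possible safe shape, not the change the author eventually made.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for serialize(): builds the bytes in a local buffer.
inline std::string serializeStub(const std::string &Name) {
  std::string Buffer = "DOCS";
  Buffer += Name;
  return Buffer; // Returned by value; the caller owns the bytes.
}

// Safe: the owning string is handed to the caller. Declaring the return type
// as a non-owning view of the temporary returned by serializeStub would leave
// the caller with a view into storage that has already been destroyed.
inline std::string emitInfoStub(const std::string &Name) {
  return serializeStub(Name);
}
```

Returning the owning buffer keeps the ownership question explicit, which is also what the earlier review note about documenting the lifetime of a returned `StringRef` asks for.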

				void ClangDocMapper::parseEnumerators(EnumInfo &I, const EnumDecl *D) const {
				for (const EnumConstantDecl *E : D->enumerators()) {
				NamedType N;
				N.Type = E->getQualifiedNameAsString();
				// FIXME: Set Access to the appropriate value.
				I.Members.emplace_back(N);
				}
				}

				void ClangDocMapper::parseParameters(FunctionInfo &I,
				const FunctionDecl *D) const {
				for (const ParmVarDecl *P : D->parameters()) {
				NamedType N;
				N.Type = P->getOriginalType().getAsString();
				N.Name = P->getQualifiedNameAsString();
				// FIXME: Set Access to the appropriate value.
				I.Params.emplace_back(N);
				}
				}

				void ClangDocMapper::parseBases(RecordInfo &I, const CXXRecordDecl *D) const {
				for (const CXXBaseSpecifier &B : D->bases()) {
				if (!B.isVirtual()) I.Parents.push_back(B.getType().getAsString());
				}
				for (const CXXBaseSpecifier &B : D->vbases())
				I.VirtualParents.push_back(B.getType().getAsString());
				}

				std::string ClangDocMapper::getParentNamespace(const DeclContext *D) const {
				if (const auto *N = dyn_cast<NamedDecl>(D->getParent())) {
				return N->getQualifiedNameAsString();
				jakehehrlich: Assuming you make the reference changes above, this should be rewritten to something like the following:
				CurrentCI.Children.emplace_back();
				ClangDocCommentVisitor Visitor(CurrentCI.Children.back());
				Visitor.parseComment(Child);
				}
				return "";
				}

				// ClangDocCommentVisitor

				void ClangDocCommentVisitor::parseComment(const comments::Comment *C) {
				CurrentCI.Kind = C->getCommentKindName();
				ConstCommentVisitor<ClangDocCommentVisitor>::visit(C);
				for (comments::Comment *Child :
				make_range(C->child_begin(), C->child_end())) {
				CurrentCI.Children.emplace_back(std::make_shared<CommentInfo>());
				ClangDocCommentVisitor Visitor(*CurrentCI.Children.back());
				Visitor.parseComment(Child);
				}
				}
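The "visitor as a builder that fills a reference" idea from the review can be sketched without the Clang comment AST. Everything here is a stand-in: `SourceComment`, `CommentInfoStub` and `parseCommentStub` are hypothetical, and children are stored by value as in the reviewer's suggestion rather than behind `std::shared_ptr` as in this revision of the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a parsed source comment node (the patch walks
// clang::comments::Comment instead).
struct SourceComment {
  std::string Kind;
  std::vector<SourceComment> Children;
};

// Stand-in for CommentInfo, with children stored by value.
struct CommentInfoStub {
  std::string Kind;
  std::vector<CommentInfoStub> Children;
};

// Fills Out in place, so no CommentInfoStub is copied on the way back up.
inline void parseCommentStub(const SourceComment &C, CommentInfoStub &Out) {
  Out.Kind = C.Kind;
  for (const SourceComment &Child : C.Children) {
    Out.Children.emplace_back();
    // The reference stays valid during the recursive call: the recursion only
    // appends to the child's own Children vector, never to Out.Children.
    parseCommentStub(Child, Out.Children.back());
  }
}
```

Each recursive call writes directly into the slot that was just appended to its parent, which is the copy-avoiding behaviour the review comments describe.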

				void ClangDocCommentVisitor::visitTextComment(const TextComment *C) {
				if (!isWhitespaceOnly(C->getText())) CurrentCI.Text = C->getText();
				}

				lebedev.ri: It would be nice if you could (as a new Differential) add a `children()` function to that class that will do that automatically.
				juliehockett (author): Will do :) (and same for the below)
				void ClangDocCommentVisitor::visitInlineCommandComment(
				const InlineCommandComment *C) {
				CurrentCI.Name = getCommandName(C->getCommandID());
				for (unsigned i = 0, e = C->getNumArgs(); i != e; ++i)
				CurrentCI.Args.push_back(C->getArgText(i));
				}

				void ClangDocCommentVisitor::visitHTMLStartTagComment(
				const HTMLStartTagComment *C) {
				CurrentCI.Name = C->getTagName();
				CurrentCI.SelfClosing = C->isSelfClosing();
				for (unsigned i = 0, e = C->getNumAttrs(); i < e; ++i) {
				const HTMLStartTagComment::Attribute &Attr = C->getAttr(i);
				CurrentCI.AttrKeys.push_back(Attr.Name);
				lebedev.ri: It would be awesome if you could (as a new Differential) add a `args_begin()`/`args_end()`/`args()` functions for range-based `for`.
				CurrentCI.AttrValues.push_back(Attr.Value);
				}
				}

				void ClangDocCommentVisitor::visitHTMLEndTagComment(
				const HTMLEndTagComment *C) {
				CurrentCI.Name = C->getTagName();
				CurrentCI.SelfClosing = true;
				lebedev.ri: It would be awesome if you could (as a new Differential) add a `attrs_begin()`/`attrs_end()`/`attrs()` functions for range-based `for`.
				}

				void ClangDocCommentVisitor::visitBlockCommandComment(
				const BlockCommandComment *C) {
				CurrentCI.Name = getCommandName(C->getCommandID());
				for (unsigned i = 0, e = C->getNumArgs(); i < e; ++i)
				CurrentCI.Args.push_back(C->getArgText(i));
				}

				void ClangDocCommentVisitor::visitParamCommandComment(
				const ParamCommandComment *C) {
				CurrentCI.Direction =
				ParamCommandComment::getDirectionAsString(C->getDirection());
				CurrentCI.Explicit = C->isDirectionExplicit();
				if (C->hasParamName() && C->isParamIndexValid())
				CurrentCI.ParamName = C->getParamNameAsWritten();
				lebedev.ri: It would be awesome if you could (as a new Differential) add a `args_begin()`/`args_end()`/`args()` functions for range-based `for`.
				}

				void ClangDocCommentVisitor::visitTParamCommandComment(
				const TParamCommandComment *C) {
				if (C->hasParamName() && C->isPositionValid())
				CurrentCI.ParamName = C->getParamNameAsWritten();

				if (C->isPositionValid()) {
				for (unsigned i = 0, e = C->getDepth(); i < e; ++i)
				CurrentCI.Position.push_back(std::to_string(C->getIndex(i)));
				}
				}

				void ClangDocCommentVisitor::visitVerbatimBlockComment(
				const VerbatimBlockComment *C) {
				CurrentCI.Name = getCommandName(C->getCommandID());
				CurrentCI.CloseName = C->getCloseName();
				}

				lebedev.ri: It would be awesome if you could (as a new Differential) add a `positions_begin()`/`positions_end()`/`positions()` functions for range-based `for`.
				void ClangDocCommentVisitor::visitVerbatimBlockLineComment(
				const VerbatimBlockLineComment *C) {
				if (!isWhitespaceOnly(C->getText())) CurrentCI.Text = C->getText();
				}

				void ClangDocCommentVisitor::visitVerbatimLineComment(
				const VerbatimLineComment *C) {
				jakehehrlich: Can you use isspace here instead of keeping a list of characters that are considered spaces?
				if (!isWhitespaceOnly(C->getText())) CurrentCI.Text = C->getText();
				}

				StringRef ClangDocCommentVisitor::getCommandName(unsigned CommandID) const {
				const CommandInfo *Info = CommandTraits::getBuiltinCommandInfo(CommandID);
				if (Info) return Info->Name;
				// TODO: Add parsing for \file command.
				return "<not a builtin command>";
				}

				bool ClangDocCommentVisitor::isWhitespaceOnly(StringRef S) const {
				return std::all_of(S.begin(), S.end(), isspace);
				}

				} // namespace doc
				} // namespace clang
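One detail about the `isspace`-based `isWhitespaceOnly` above: `std::isspace` expects a value that is representable as `unsigned char` (or `EOF`), so passing a plain `char` that may be negative is not safe, and naming the function directly as a predicate can also be ambiguous when `<locale>` is visible. A minimal sketch of a defensive variant, where `isWhitespaceOnlyStub` is a hypothetical free function operating on `std::string` instead of `StringRef`:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical free-function version of isWhitespaceOnly.
inline bool isWhitespaceOnlyStub(const std::string &S) {
  return std::all_of(S.begin(), S.end(), [](char C) {
    // Convert to unsigned char first so bytes above 0x7F are passed to
    // std::isspace as valid, non-negative values.
    return std::isspace(static_cast<unsigned char>(C)) != 0;
  });
}
```

An empty string counts as whitespace-only here, which matches the behaviour of `std::all_of` over an empty range in the patch.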

clang-doc/ClangDocRepresentation.h

This file was added.

				///===-- ClangDocRepresentation.h - ClangDocRepresenation -------*- C++ -*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANG_DOC_CLANG_DOC_REPRESENTATION_H
				#define LLVM_CLANG_TOOLS_EXTRA_CLANG_DOC_CLANG_DOC_REPRESENTATION_H

				#include <string>
				#include "clang/AST/Type.h"
				#include "clang/Basic/Specifiers.h"
				#include "llvm/ADT/SmallVector.h"

				using namespace llvm;

				namespace clang {
				namespace doc {

				// A representation of a parsed comment.
				struct CommentInfo {
				StringRef Kind;
				StringRef Text;
				StringRef Name;
				StringRef Direction;
				StringRef ParamName;
				StringRef CloseName;
				bool SelfClosing = false;
				bool Explicit = false;
				llvm::SmallVector<StringRef, 4> AttrKeys;
				llvm::SmallVector<StringRef, 4> AttrValues;
				llvm::SmallVector<StringRef, 4> Args;
				llvm::SmallVector<StringRef, 4> Position;
				std::vector<std::shared_ptr<CommentInfo>> Children;
				};

				// TODO: Pull the CommentInfo for a parameter or member out of the record or
				Athosvk: I might be missing something, but can't this be a unique ptr? Shouldn't children of comments only have one parent?
				// function's CommentInfo.
				// Info for named types (parameters, members).
				struct NamedType {
				std::string Type;
				std::string Name;
				AccessSpecifier Access = clang::AccessSpecifier::AS_none;
				CommentInfo Description;
				Athosvk: Perhaps use an enum class instead? Same goes for the other enums
				};

				/// A base struct for Infos.
				struct Info {
				std::string FullyQualifiedName;
				std::string SimpleName;
				std::string Namespace;
				CommentInfo Description;
				};

				struct NamespaceInfo : public Info {};

				struct SymbolInfo : public Info {
				int LineNumber;
				StringRef Filename;
				};

				Athosvk: It's not too important for now, but you probably want to at least store the namespace identifier for each nested namespace at some point. So instead you store a vector of namespaces, which in the final markdown generation stage allows you to link to each namespace individually (assuming you'll have some kind of namespace overview pages)
				struct NonDefInfo : public SymbolInfo {};

				// TODO: Expand to allow for documenting templating.
				// Info for functions.
				struct FunctionInfo : public SymbolInfo {
				std::string MangledName;
				std::string Parent;
				NamedType ReturnType;
				llvm::SmallVector<NamedType, 4> Params;
				AccessSpecifier Access;
				};

				// TODO: Expand to allow for documenting templating, inheritance access,
				// friend classes
				// Info for types.
				struct RecordInfo : public SymbolInfo {
				TagTypeKind TagType;
				llvm::SmallVector<NamedType, 4> Members;
				llvm::SmallVector<std::string, 4> Parents;
				llvm::SmallVector<std::string, 4> VirtualParents;
				};

				// TODO: Expand to allow for documenting templating.
				// Info for types.
				struct EnumInfo : public SymbolInfo {
				bool Scoped;
				llvm::SmallVector<NamedType, 4> Members;
				};

				// TODO: Add functionality to include separate markdown pages.

				} // namespace doc
				} // namespace clang

				#endif // LLVM_CLANG_TOOLS_EXTRA_CLANG_DOC_CLANG_DOC_REPRESENTATION_H
				No newline at end of file
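The suggestion to store one entry per enclosing namespace, instead of a single pre-joined `Namespace` string, could look roughly like the following. This is a sketch under assumptions: `InfoStub`, `qualifiedName` and `namespacePrefix` are hypothetical names and are not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical variant of Info that keeps each enclosing namespace separately,
// outermost first, instead of one pre-joined string.
struct InfoStub {
  std::string SimpleName;
  std::vector<std::string> Namespaces;
};

// Rebuilds the fully qualified name when a generator needs it.
inline std::string qualifiedName(const InfoStub &I) {
  std::string Result;
  for (const std::string &NS : I.Namespaces)
    Result += NS + "::";
  return Result + I.SimpleName;
}

// Returns the qualified name of the namespace at the given depth, which a
// generator could use as a per-namespace link target.
inline std::string namespacePrefix(const InfoStub &I, std::size_t Depth) {
  assert(Depth < I.Namespaces.size() && "depth out of range");
  std::string Result = I.Namespaces[0];
  for (std::size_t N = 1; N <= Depth; ++N)
    Result += "::" + I.Namespaces[N];
  return Result;
}
```

Keeping the components separate lets a later generation stage link `clang` and `clang::doc` individually, while the joined form remains cheap to recompute.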

clang-doc/tool/CMakeLists.txt

This file was added.

				include_directories(${CMAKE_CURRENT_SOURCE_DIR}/..)

				add_clang_executable(clang-doc
				ClangDocMain.cpp
				)

				target_link_libraries(clang-doc
				PRIVATE
				clangAST
				clangASTMatchers
				clangBasic
				clangFormat
				clangFrontend
				clangDoc
				clangRewrite
				clangTooling
				clangToolingCore
				)

clang-doc/tool/ClangDocMain.cpp

This file was added.

				//===-- ClangDocMain.cpp - Clangdoc -----------------------------*- C++ -*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#include <string>
				#include "ClangDoc.h"
				#include "ClangDocBinary.h"
				#include "clang/AST/AST.h"
				#include "clang/AST/Decl.h"
				#include "clang/ASTMatchers/ASTMatchFinder.h"
				#include "clang/ASTMatchers/ASTMatchersInternal.h"
				#include "clang/Driver/Options.h"
				#include "clang/Frontend/FrontendActions.h"
				#include "clang/Tooling/CommonOptionsParser.h"
				#include "clang/Tooling/Execution.h"
				#include "clang/Tooling/StandaloneExecution.h"
				#include "clang/Tooling/Tooling.h"
				#include "llvm/ADT/APFloat.h"
				#include "llvm/Support/Path.h"
				#include "llvm/Support/Process.h"
				#include "llvm/Support/Signals.h"
				#include "llvm/Support/raw_ostream.h"

				using namespace clang::ast_matchers;
				using namespace clang::tooling;
				using namespace clang;
				using namespace llvm;

				namespace {

				static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage);
				static cl::OptionCategory ClangDocCategory("clang-doc options");

				static cl::opt<bool> DumpResult("dump", cl::desc("Dump results to stdout."),
				cl::init(false), cl::cat(ClangDocCategory));

				static cl::opt<bool> OmitFilenames("omit-filenames",
				lebedev.ri: Hmm, are you sure about `docs` being the param to specify where to output the docs? I'd expect to see `-o / --output` or a positional argument. Or is that impossible due to some parent LLVM/clang implicit requirements?
				cl::desc("Omit filenames in output."),
				cl::init(false), cl::cat(ClangDocCategory));

				static cl::opt<bool> DoxygenOnly(
				lebedev.ri: `options are: md` Though this appears to be a dead code right now
				"doxygen", cl::desc("Use only doxygen-style comments to generate docs."),
				cl::init(false), cl::cat(ClangDocCategory));

				} // namespace

				int main(int argc, const char **argv) {
				sys::PrintStackTraceOnErrorSignal(argv[0]);

				auto Exec = clang::tooling::createExecutorFromCommandLineArgs(
				argc, argv, ClangDocCategory);

				if (!Exec) {
				errs() << toString(Exec.takeError()) << "\n";
				return 1;
				}

				MatchFinder Finder;
				ExecutionContext *ECtx = Exec->get()->getExecutionContext();
				doc::ClangDocBinaryWriter Writer(OmitFilenames);

				doc::ClangDocCallback NCallback("namespace", *ECtx, Writer);
				Finder.addMatcher(namespaceDecl().bind("namespace"), &NCallback);
				doc::ClangDocCallback RCallback("record", *ECtx, Writer);
				Finder.addMatcher(recordDecl().bind("record"), &RCallback);
				doc::ClangDocCallback ECallback("enum", *ECtx, Writer);
				Finder.addMatcher(enumDecl().bind("enum"), &ECallback);
				doc::ClangDocCallback MCallback("method", *ECtx, Writer);
				Finder.addMatcher(cxxMethodDecl(isUserProvided()).bind("method"), &MCallback);
				doc::ClangDocCallback FCallback("function", *ECtx, Writer);
				Finder.addMatcher(functionDecl(unless(cxxMethodDecl())).bind("function"),
				&FCallback);

				ArgumentsAdjuster ArgAdjuster;
				if (!DoxygenOnly)
				ArgAdjuster = combineAdjusters(
				lebedev.ri: Why at the beginning though? Couldn't the user pass `-extra-arg=-fno-parse-all-comments`, which could override this?
				getInsertArgumentAdjuster("-fparse-all-comments",
				tooling::ArgumentInsertPosition::BEGIN),
				ArgAdjuster);
				auto Err =
				Exec->get()->execute(newFrontendActionFactory(&Finder), ArgAdjuster);
				if (Err) errs() << toString(std::move(Err)) << "\n";

				if (DumpResult) {
				doc::ClangDocBinaryReader Reader(outs());
				Exec->get()->getToolResults()->forEachResult(
				[&Reader](StringRef Key, SmallString<2048> Value) {
				outs() << "---\n"
				<< "KEY: " << Key.str() << "\n";
				Reader.readBitstream(Value);
				});
				}

				lebedev.ri: This does not seem to be a erroneous situation to be in
				return 0;
				}
				lebedev.ri: I'm having trouble following. `DumpResult` description says `Dump results to stdout.` Why does it need `OutDirectory`?
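The question about inserting `-fparse-all-comments` at the beginning of the command line comes down to argument order. The following is a simplified model, not the LibTooling implementation: `insertArg` and `parseAllCommentsEnabled` are hypothetical, the flag spellings are taken from the review comment, and the "last occurrence wins" rule is an assumption about how the positive and negative forms interact.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class InsertPos { Begin, End };

// Simplified stand-in for an insert-argument adjuster. Begin places the flag
// right after the program name (Args[0]); End appends it.
inline std::vector<std::string> insertArg(std::vector<std::string> Args,
                                          const std::string &Flag,
                                          InsertPos Pos) {
  assert(!Args.empty() && "expected at least the program name");
  if (Pos == InsertPos::Begin)
    Args.insert(Args.begin() + 1, Flag);
  else
    Args.push_back(Flag);
  return Args;
}

// Model assumption: when both the positive and negative spelling of the flag
// appear, the one that occurs last wins.
inline bool parseAllCommentsEnabled(const std::vector<std::string> &Args) {
  bool Enabled = false;
  for (const std::string &A : Args) {
    if (A == "-fparse-all-comments")
      Enabled = true;
    else if (A == "-fno-parse-all-comments")
      Enabled = false;
  }
  return Enabled;
}
```

Under this model, a flag inserted at the beginning can be overridden by a user-supplied negative form later in the command line, while a flag appended at the end cannot. Whether that override is desirable is the design question the reviewer raises.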

docs/clang-doc.rst

This file was added.

				===================
				Clang-Doc
				===================

				.. contents::

				:program:`clang-doc` is a tool for generating C and C++ documentation from
				source code and comments.

				The tool is in a very early development stage, so you might encounter bugs and
				crashes. Submitting reports with information about how to reproduce the issue
				to `the LLVM bugtracker <https://llvm.org/bugs>`_ will definitely help the
				project. If you have any ideas or suggestions, please file a feature request
				there.

				Use
				=====

				:program:`clang-doc` is a `LibTooling
				<http://clang.llvm.org/docs/LibTooling.html>`_-based tool, and so requires a
				compile command database for your project (for an example of how to do this
				see `How To Setup Tooling For LLVM
				<http://clang.llvm.org/docs/HowToSetupToolingForLLVM.html>`_).

				The tool can be used on a single file or multiple files as defined in
				the compile commands database:

				.. code-block:: console

				$ clang-doc file.cpp -p /path/to/compile/commands

				This generates an intermediate representation of the declarations and their
				associated information in the specified TUs, serialized to LLVM
				bitcode.

				As currently implemented, the tool is only able to parse TUs that can be
				stored in-memory. Future additions will extend the current framewwork to use
				map-reduce frameworks to allow for use with large codebases.
				lebedev.ri: framework

				:program:`clang-doc` offers the following options:

				.. code-block:: console

				$ clang-doc --help
				USAGE: clang-doc [options] <source0> [... <sourceN>]
				lebedev.ri: This does not seem to talk about the path where to store the generated docs.

				OPTIONS:

				Generic Options:

				-help - Display available options (-help-hidden for more)
				-help-list - Display list of available options (-help-list-hidden for more)
				-version - Display the version of this program

				clang-doc options:

				-doxygen - Use only doxygen-style comments to generate docs.
				-dump - Dump results to stdout.
				lebedev.ri: Please refresh the doc to actually show the `clang-doc` current help output.
				-extra-arg=<string> - Additional argument to append to the compiler command line
				-extra-arg-before=<string> - Additional argument to prepend to the compiler command line
				-omit-filenames - Omit filenames in output.
				-p=<string> - Build path

test/CMakeLists.txt

	set(CLANG_TOOLS_TEST_DEPS
	# For the clang-apply-replacements test that uses clang-rename.
	clang-rename

	# Individual tools we test.
	clang-apply-replacements
	clang-change-namespace
	clangd
				clang-doc
				lebedev.ri: There are no tests with `CommentBlock` blocks.
	clang-include-fixer
	clang-move
	clang-query
	clang-reorder-fields
	find-all-symbols
	modularize
	pp-trace

test/clang-doc/mapper-namespace.cpp

This file was added.

				// RUN: rm -rf %t
				// RUN: mkdir %t
				// RUN: echo "" > %t/compile_flags.txt
				// RUN: cp "%s" "%t/test.cpp"
				// RUN: clang-doc --dump --omit-filenames -doxygen -p %t %t/test.cpp | FileCheck %s

				namespace A {
				// CHECK: ---
				// CHECK: KEY: A
				// CHECK: FullyQualifiedName: A
				// CHECK: Name: A

				void f() {};
				// CHECK: ---
				// CHECK: KEY: _ZN1A1fEv
				// CHECK: FullyQualifiedName: A::f
				// CHECK: Name: f
				// CHECK: Namespace: A
				// CHECK: MangledName: _ZN1A1fEv
				// CHECK: ID: Return
				// CHECK: Type: void
				// CHECK: Access: 3
				// CHECK: Access: 3

				} // A

				namespace A {
				// CHECK: ---
				// CHECK: KEY: A
				// CHECK: FullyQualifiedName: A
				// CHECK: Name: A

				namespace B {
				// CHECK: ---
				// CHECK: KEY: A::B
				// CHECK: FullyQualifiedName: A::B
				// CHECK: Name: B
				// CHECK: Namespace: A

				enum E { X };
				// CHECK: ---
				// CHECK: KEY: A::B::E
				// CHECK: FullyQualifiedName: A::B::E
				// CHECK: Name: E
				// CHECK: Namespace: A::B
				// CHECK: ID: Member
				// CHECK: Type: A::B::X
				// CHECK: Access: 3

				E func(int i) {
				return X;
				}

				// CHECK: ---
				// CHECK: KEY: _ZN1A1B4funcEi
				// CHECK: FullyQualifiedName: A::B::func
				// CHECK: Name: func
				// CHECK: Namespace: A::B
				// CHECK: MangledName: _ZN1A1B4funcEi
				// CHECK: ID: Return
				// CHECK: Type: enum A::B::E
				// CHECK: Access: 3
				// CHECK: ID: Param
				// CHECK: Type: int
				// CHECK: Name: i
				// CHECK: Access: 3
				// CHECK: Access: 3

				} // B
				} // A
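
Reading aid for the `KEY:` and `MangledName:` values in the test above: they look like Itanium C++ ABI mangled names. The sketch below is not part of the patch. It assumes a GCC or Clang toolchain that provides `abi::__cxa_demangle` in `<cxxabi.h>`, and the helper name `demangle` is hypothetical.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// Hypothetical helper (not from the patch): returns the demangled form of an
// Itanium-ABI symbol, or the input unchanged if demangling fails.
std::string demangle(const std::string &Mangled) {
  int Status = 0;
  // __cxa_demangle returns a malloc'd buffer, so it is released with free().
  std::unique_ptr<char, void (*)(void *)> Buffer(
      abi::__cxa_demangle(Mangled.c_str(), nullptr, nullptr, &Status),
      std::free);
  if (Status != 0 || !Buffer)
    return Mangled;
  return std::string(Buffer.get());
}
```

For instance, `demangle("_ZN1A1B4funcEi")` yields `A::B::func(int)`, which lines up with the `FullyQualifiedName: A::B::func` record and its single `int` parameter.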

test/clang-doc/mapper-type.cpp

This file was added.

				// RUN: rm -rf %t
				// RUN: mkdir %t
				// RUN: echo "" > %t/compile_flags.txt
				// RUN: cp "%s" "%t/test.cpp"
				// RUN: clang-doc --dump --omit-filenames -doxygen -p %t %t/test.cpp | FileCheck %s

				union A { int X; int Y; };
				// CHECK: ---
				// CHECK: KEY: A
				// CHECK: FullyQualifiedName: A
				// CHECK: Name: A
				// CHECK: TagType: 2
				// CHECK: ID: Member
				// CHECK: Type: int
				// CHECK: Name: A::X
				// CHECK: Access: 3
				// CHECK: ID: Member
				// CHECK: Type: int
				// CHECK: Name: A::Y
				// CHECK: Access: 3
				// CHECK: ---
				// CHECK: KEY: A::A
				// CHECK: FullyQualifiedName: A::A
				// CHECK: Name: A
				// CHECK: Namespace: A

				enum B { X, Y };
				// CHECK: ---
				// CHECK: KEY: B
				// CHECK: FullyQualifiedName: B
				// CHECK: Name: B
				// CHECK: ID: Member
				// CHECK: Type: X
				// CHECK: Access: 3
				// CHECK: ID: Member
				// CHECK: Type: Y
				// CHECK: Access: 3

				struct C { int i; };
				// CHECK: ---
				// CHECK: KEY: C
				// CHECK: FullyQualifiedName: C
				// CHECK: Name: C
				// CHECK: ID: Member
				// CHECK: Type: int
				// CHECK: Name: C::i
				// CHECK: Access: 3
				// CHECK: ---
				// CHECK: KEY: C::C
				// CHECK: FullyQualifiedName: C::C
				// CHECK: Name: C
				// CHECK: Namespace: C

				class D {};
				// CHECK: ---
				// CHECK: KEY: D
				// CHECK: FullyQualifiedName: D
				// CHECK: Name: D
				// CHECK: TagType: 3
				// CHECK: ---
				// CHECK: KEY: D::D
				// CHECK: FullyQualifiedName: D::D
				// CHECK: Name: D
				// CHECK: Namespace: D

				class E {
				// CHECK: ---
				// CHECK: KEY: E
				// CHECK: FullyQualifiedName: E
				// CHECK: Name: E
				// CHECK: TagType: 3
				// CHECK: ---
				// CHECK: KEY: E::E
				// CHECK: FullyQualifiedName: E::E
				// CHECK: Name: E
				// CHECK: Namespace: E

				public:
				E() {}
				// CHECK: ---
				// CHECK: KEY: _ZN1EC1Ev
				// CHECK: FullyQualifiedName: E::E
				// CHECK: Name: E
				// CHECK: Namespace: E
				// CHECK: MangledName: _ZN1EC1Ev
				// CHECK: Parent: E
				// CHECK: ID: Return
				// CHECK: Type: void
				// CHECK: Access: 3

				~E() {}
				// CHECK: ---
				// CHECK: KEY: _ZN1ED1Ev
				// CHECK: FullyQualifiedName: E::~E
				// CHECK: Name: ~E
				// CHECK: Namespace: E
				// CHECK: MangledName: _ZN1ED1Ev
				// CHECK: Parent: E
				// CHECK: ID: Return
				// CHECK: Type: void
				// CHECK: Access: 3

				protected:
				void ProtectedMethod();
				// CHECK: ---
				// CHECK: KEY: _ZN1E15ProtectedMethodEv
				// CHECK: FullyQualifiedName: _ZN1E15ProtectedMethodEv
				// CHECK: Name: ProtectedMethod
				// CHECK: Namespace: E
				};

				void E::ProtectedMethod() {}
				// CHECK: ---
				// CHECK: KEY: _ZN1E15ProtectedMethodEv
				// CHECK: FullyQualifiedName: E::ProtectedMethod
				// CHECK: Name: ProtectedMethod
				// CHECK: Namespace: E
				// CHECK: MangledName: _ZN1E15ProtectedMethodEv
				// CHECK: Parent: E
				// CHECK: ID: Return
				// CHECK: Type: void
				// CHECK: Access: 3
				// CHECK: Access: 1

				class F : virtual private D, public E {};
				// CHECK: ---
				// CHECK: KEY: F
				// CHECK: FullyQualifiedName: F
				// CHECK: Name: F
				// CHECK: TagType: 3
				// CHECK: Parent: class E
				// CHECK: VParent: class D
				// CHECK: ---
				// CHECK: KEY: F::F
				// CHECK: FullyQualifiedName: F::F
				// CHECK: Name: F
				// CHECK: Namespace: F
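
Reading aid for the numeric `Access:` and `TagType:` values in the test above. The sketch is not part of the patch and rests on an assumption: that the dump prints the raw values of clang's `AccessSpecifier` (`AS_public`, `AS_protected`, `AS_private`, `AS_none`) and `TagTypeKind` (`TTK_Struct`, `TTK_Interface`, `TTK_Union`, `TTK_Class`, `TTK_Enum`) enums as they were ordered at the time of this review. That reading is consistent with the checks (`union A` gives `TagType: 2`, `class D` gives `TagType: 3`, the protected method gives `Access: 1`), but the patch does not state it. The function names are hypothetical.

```cpp
#include <string>

// Hypothetical decoder: maps a dumped "Access:" integer to a readable name,
// assuming the values follow clang::AccessSpecifier's declaration order.
std::string accessName(int Value) {
  switch (Value) {
  case 0: return "public";
  case 1: return "protected";
  case 2: return "private";
  case 3: return "none"; // No access specifier applies (e.g. free functions).
  default: return "unknown";
  }
}

// Hypothetical decoder: maps a dumped "TagType:" integer to a readable name,
// assuming the values follow clang::TagTypeKind's declaration order.
std::string tagTypeName(int Value) {
  switch (Value) {
  case 0: return "struct";
  case 1: return "interface";
  case 2: return "union";
  case 3: return "class";
  case 4: return "enum";
  default: return "unknown";
  }
}
```

Under that assumption, the trailing `Access: 1` on `E::ProtectedMethod` reads as `protected`, and the `Access: 3` lines read as "no access specifier".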
