diff --git a/clang-tools-extra/clang-doc/README.txt b/clang-tools-extra/clang-doc/README.txt
new file mode 100644
--- /dev/null
+++ b/clang-tools-extra/clang-doc/README.txt
@@ -0,0 +1,60 @@
+==============================
+How clang-doc works internally
+==============================
+
+Clang-doc uses the "tooling" library which can run the compiler. It can take the files directly or
+it can extract the file list and the commands corresponding to each from a compile_commands.json
+file.
+
+The tooling library spins up threads and parses the compilation units in parallel. Clang-doc
+registers a callback to run on the AST of each unit.
+
+When the AST is known, the MapASTVisitor in Mapper.cpp is run on the AST. It has callbacks for the
+main AST nodes that clang-doc cares about. This is a very simple object that mostly calls into
+Serialize.cpp to generate the "representation" of the code. These are the various *Info structures
+like FunctionInfo and NamespaceInfo defined in Representation.h that correspond to each element of
+the code that might be documented.
+
+The representation from each execution thread is serialized to bitcode using BitcodeWriter.cpp. This
+is a custom bitcode defined in BitcodeWriter.h and is NOT regular LLVM bitcode IR. Watch out: the
+"Serialize.cpp" file is not related to this step though the name may imply it.
+
+These bitcode representations are then passed back to the main thread and deserialized back to a
+new copy of the representation in BitcodeReader.cpp. This round-trip through bitcode is used only to
+get the data out of the AST visitor (which is expected to return a byte stream) and is never saved
+or used for any other purpose. This round-trip adds significant complexity and we should consider
+passing the Representation object hierarchy back to the main thread out-of-band without
+serialization and deleting the bitcode representation.
+
+After deserialization, the various representation objects from each thread are merged/reduced into a
+single structure. This is necessary both to collect everything and to merge the declarations of
+things (of which there may be many) with the actual implementation. Many fields on a record can't be
+merged in the abstract (for example, a boolean field on two structures with two different values has
+no clear merged result). But this question can be resolved by thinking about the items as
+definitions and declarations and picking the value which would be present on the definition (usually
+the more specific or non-default value).
+
+After merging, the final representation is passed to the output generator which writes the final
+files.
+
+==================
+To add a new field
+==================
+
+To add a new bit of information to the documentation that you extract from the AST:
+
+ 1. Add it to the appropriate *Info structure in Representation.h.
+
+ 2. Extract the information from the AST by adding to the code in Serialize.cpp and possibly
+    Mapper.cpp. Save it to the Info structure used above.
+
+ 3. Write bitcode serialization code in BitcodeWriter.cpp for the new field.
+
+ 4. Write bitcode deserialization code in BitcodeReader.cpp.
+
+ 5. Add merging code in the merge() function for your Info structure (see above for advice).
+
+ 6. Update the backend(s) to use the new data.
+
+Note! Steps 3, 4, and especially 5 are easy to forget. Without all of these, the code may look like
+it works but you will only get default values out at the end.
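
The merge rule described in the README above (step 5 of the checklist) can be sketched in
isolation. The following is a minimal illustration under stated assumptions, not clang-doc's
actual code: ExampleInfo, its fields, and mergeExample are hypothetical names invented for this
sketch. It shows one way a merge could keep the value that would be present on the definition, by
preferring the non-default value of each field.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for one of the *Info structures. This is NOT the
// real clang-doc type; it only has enough fields to show the merge rule.
struct ExampleInfo {
  std::string Name;
  std::string DefinedIn;  // Empty when the record came from a declaration.
  bool IsVirtual = false; // The hypothetical "new field"; false is its default.
};

// Merge Other into This, keeping the non-default value of each field, which
// is the value a definition would normally carry.
void mergeExample(ExampleInfo &This, ExampleInfo &&Other) {
  assert(This.Name == Other.Name && "can only merge records for one entity");
  if (This.DefinedIn.empty())
    This.DefinedIn = std::move(Other.DefinedIn);
  // Merging code for the new field. A boolean has no neutral merged value,
  // so the sketch keeps "true" if either side has it.
  This.IsVirtual = This.IsVirtual || Other.IsVirtual;
}
```

If the last statement of mergeExample is left out, merging a declaration record (IsVirtual ==
false) with a definition record (IsVirtual == true), in that order, leaves the default false in the
result. That is the "only default values at the end" failure the note in the README warns about.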