This is an archive of the discontinued LLVM Phabricator instance.

[clang-doc] Clean up *Info constructors.
ClosedPublic

Authored by brettw on Sep 19 2022, 4:09 PM.

Details

Summary

The *Info object (for the copy of the AST") constructors had many duplicated variants. Many of the variants seemed to be in an attempt to avoid default arguments. But default arguments are not prohibited and using them allows most of the variants to be removed which improves readability.

Remove the IsInGlobalNamespace flag on a Reference. This is set when the path is empty, and only read once in the HTML generator with the identical condition. The constructor cleanup exposed a problem where this was set to false when the constructor with no path was used, but true when the path was set to empty.

There should be no observable change with the exception that IsInGlobalNamespace is no longer emitted in YAML.

Diff Detail

Event Timeline

brettw created this revision.Sep 19 2022, 4:09 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 19 2022, 4:09 PM
brettw requested review of this revision.Sep 19 2022, 4:09 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 19 2022, 4:09 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

I am a bit puzzled by the intent of cleaning up the constructors in Info object. I didn't see it as an issue to have multiple constructors, which is a common practice. After this clean up It doesn't make it easier to use these objects, in fact, in Serialize.cpp, you have to explicitly specify a new SymbolID() when creating a new Reference object. While this change won't affect (except for the global namespace) the output from clang-docs, it is not semantically equivalent before the cleanup.

Would you mind explaining the reasons for removing these constructors?

clang-tools-extra/clang-doc/Representation.h
165

The clean up here prevents setting a customized SymbolID for TypeInfo, which becomes a problem in the HTMLGeneratorTest, in which TypeInfo is constructed with an constant EmptySID. While the outcome is the same, it is not semantically equivalent.

219

This is already set in the constructor. Does it still need to be set explicitly here?

brettw marked an inline comment as done.Sep 21 2022, 10:55 AM

First, for the mechanical part of the change, this code implement multiple constructors:

  Info() = default;
  Info(InfoType IT) : IT(IT) {}
  Info(InfoType IT, SymbolID USR) : USR(USR), IT(IT) {}
  Info(InfoType IT, SymbolID USR, StringRef Name)
      : USR(USR), IT(IT), Name(Name) {}
  Info(InfoType IT, SymbolID USR, StringRef Name, StringRef Path)

is not common practice. This is an extremely verbose and error-prone way of implementing default arguments. It seems obvious to me that the one constructor that implements the exact same behavior is much more clear. Most of this patch is doing this mechanical change for these objects.

There are a few cases in this patch where I removed some constructors that took some "other" subset of parameters.

  Reference() = default;
  Reference(llvm::StringRef Name) : Name(Name) {}
  Reference(llvm::StringRef Name, StringRef Path)
      : Name(Name), Path(Path), IsInGlobalNamespace(Path.empty()) {}
  Reference(SymbolID USR, StringRef Name, InfoType IT)
      : USR(USR), Name(Name), RefType(IT) {}
  Reference(SymbolID USR, StringRef Name, InfoType IT, StringRef Path)
      : USR(USR), Name(Name), RefType(IT), Path(Path),
        IsInGlobalNamespace(Path.empty()) {}

Working on this code, I found the number of constructors with different parameters to make things more difficult to follow. It's harder to figure out what exactly a line of code is doing when there are so many variants you have to match against. It's also more difficult to figure out how to create one of these objects when you have to read so many variants to see what to call. This mental overhead may not be obvious when just looking at a diff, especially the test code where you may see an extra empty SID parameter inserted here and there. This new one tries to have a single constructor with default arguments from the right, with the exception of the "TypeInfo" that creates a single named type with no SymbolID because that's by far the most common variant in tests.

With this Reference constructor in particular, all of these variants hide a bug. If I didn't call that bug out in the change description, would you have seen it? Having a standard constructor makes this much easier to follow. Changing several callers to manually specify an empty SymbolID() seems like a small price to pay (I think this is the only case where any calls get "more complicated."

I tried to standardize on a single way to construct these objects. As an example, the TypeInfo is just a wrapper around a Reference. Reference has many constructors, not all of which are a subset of the others. Does this mean TypeInfo should get these same 5 constructor variants? The author said no, they picked 3 of the variants and implemented TypeInfo constructors, and used different names for some of the parameters which makes it even harder to follow. Continuing down the chain, FieldTypeInfo derives from TypeInfo, and MemberTypeInfo derives from FieldTypeInfo. Does this mean they should get the same 5 variants, plus initializers for their own values? I would say no. If these derived classes needed to specify the Reference information, they should take a Reference as the first parameter (they don't all need to do this today).

There's a problem with the current object model. The FieldTypeInfo and MemberTypeInfo derive from TypeInfo. But I think that this is wrong. The FieldTypeInfo isn't the type information for a field, it's really a "Field" and a MemberTypeInfo isn't the type info for a member, it's the "Member" itself. Conceptually, FieldTypeInfo should be a FieldInfo and have a struct TypeInfo Type; rather than using an "is a" relationship and deriving from a type. Given that, this object naming and constructor "should" be:

  MemberInfo(TypeInfo Type, StringRef Name, AccessSpecifier Access = Public)

This patch gets us closer to this, where we still have the weird "is a" derivation but the constructors match what I think they should be:

  MemberTypeInfo() = default;
  MemberTypeInfo(const TypeInfo &TI, StringRef Name, AccessSpecifier Access)

The parameters that define the implementation details of how the type is represented don't belong on this class: if the caller wants to create a type, they should create it using one of the appropriate TypeInfo constructors rather than each variant being duplicated here.

clang-tools-extra/clang-doc/Representation.h
165

It does not prevent setting a custom SymbolID. The full constructor is the one above that takes a Reference. That is the one that's normally used in the production code; I added it in my previous patch. This patch is a followup from that one because I wanted to remove some now-redundant constructors but thought that I would separate it out to make it easier to review. The minority of cases in HTMLGeneratorTest that needed to pass more than the simple name/path have been updated to take a Reference.

The only code that changed in a way that you described is on line 373. But I'm not sure what you're arguing about "semantically equivalent". In this case, the caller wanted a "void" TypeInfo but tryped a bunch of unnecessary stuff. I think it's likely that the author of this code was confused about the constructor situation for TypeInfo and used a more complex variant than was required. It illustrates exactly why this cleanup is useful.

219

It does not, but I have learned to always supply default initializers for POD types when there's a trivial initial value. It's easy for somebody to later come by and add a constructor in such a way that the value is not uninitialized (you seem to be arguing above that you would prefer some of the old constructors, and making those additions could easily cause this problem). This has no runtime overhead and provides some insurance against hard-to-diagnose uninitialized data bugs in the future.

brettw marked an inline comment as done.Sep 22 2022, 12:12 PM

If you have more questions, I can schedule a meeting.

@brettw Thanks for the clarification. While I'm more familiar w/ these set of patches, I have to say that that I was also unsure/surprised by some of these changes. If you have concrete plans for cleanups letting us know about those plans in the initial review will help things proceed more smoothly. For instance, I wasn't aware that you had plans to refactor the various TypeInfos until now, and that context will make reviewing new patches more straightforward on my end.

Anyway, thanks for taking the time to provide the full context for us. I found it very helpful.

clang-tools-extra/clang-doc/Representation.h
165

It does not prevent setting a custom SymbolID. The full constructor is the one above that takes a Reference. That is the one that's normally used in the production code; I added it in my previous patch.

I think the more salient point is that the SymbolID is only used in TypeInfo to instantiate a Reference, so I don't think we're losing too much flexibility by forcing users of this API to construct the Reference directly.

If we want to leave support for setting the SymbolD in place, we should mark it with a FIXME and an issue on github to revisit. Otherwise, I think we can move on.

This patch is a followup from that one because I wanted to remove some now-redundant constructors but thought that I would separate it out to make it easier to review.

Thank you. Small incremental patches are always appreciated.

The minority of cases in HTMLGeneratorTest that needed to pass more than the simple name/path have been updated to take a Reference.

The only code that changed in a way that you described is on line 373. But I'm not sure what you're arguing about "semantically equivalent".

I agree w/ @haowei that these are not semantically equivalent, despite the fact that they result in the same output. EmptySID is const and is a specific instance of SymbolID that has a unique meaning. While it is just default constructed, using another instance is semantically different, regardless if the resulting output will be the same or not.

In this case, the caller wanted a "void" TypeInfo but tryped a bunch of unnecessary stuff. I think it's likely that the author of this code was confused about the constructor situation for TypeInfo and used a more complex variant than was required. It illustrates exactly why this cleanup is useful.

Let's leave speculation about the original authors out of this review. we all agree that clang-doc is in need of maintenance, so thank you for your patch.

202

do we still want to use std::move here now that its a StringRef? I don't think we would do that w/ const char*, which is what StringRef boils down to. It should be passed by value everywhere.

see
https://llvm.org/docs/ProgrammersManual.html#passing-strings-the-stringref-and-twine-classes,
https://llvm.org/docs/ProgrammersManual.html#the-stringref-class
https://llvm.org/docs/ProgrammersManual.html#string-like-containers

brettw updated this revision to Diff 462281.Sep 22 2022, 1:36 PM
brettw added inline comments.Sep 23 2022, 9:57 AM
clang-tools-extra/clang-doc/Representation.h
202

The move is unnecessary and I removed it.

@brettw Thanks for the clarification. I was oncall for buildgardener last week and got quite busy due to breakages.

Thanks for your clarification, that is really helpful to understand the your intent. While I still don't see the way these constructors are implemented is an issue. I don't have a strong opinions on this and OK with your refactors.

"semantically equivalent" means these test code should have same input/output, performance same routines of task and use same amount of memory. Before your change, these typeinfo's symbolID in the test are always pointing to a constant EmptySID, after your change, some of these typeinfo points to an empty symbol ID in allocated in memory when typeinfo is created. While the output is the same, they are not equivalent. For refactor changes, I would prefer to avoid introducing such changes.

I have one question about deleting the IsInGlobalNamespace. I understand it is not well implemented and has a bug in its constructor, but I don't think we should simply delete it just because it is buggy and not well used. It is part of C++ and commonly used. Do you have any plan to add it back after all the refactor work finishes?

Please avoid directly replying to the email as they won't be logged in the
phabricator.

This boolean is logged into YAML output, while I don't have data, I could
bet there will be users depending on it even if it looks so unlikely. We
don't own the tool in the first place, in that case, I felt if there is a
bug in its constructor, we should fix the bug instead of abruptly deleting
it, even if it looks so easy to be derived from other values.

I can guarantee that nobody uses this for anything useful because it doesn't work well enough. For example, before I started changing it, requesting HTML output crashed on startup and many of the most basic syntactic things were broken. My previous patch changed enums from being a list of dictionaries which is much more backwards incompatible than this. I don't think we need to worry about that.

haowei added inline comments.Sep 26 2022, 5:17 PM
clang-tools-extra/clang-doc/BitcodeReader.cpp
336

This block ID came from L582, which read from a bitcode file when BI_REFERENCE_BLOCK_ID is encountered. I am not familiar with the content of an arbitrary bitcode file. But after your patch, if this block ID is encountered, it will result in an error (a behavior change).

It looks like REFERENCE_IS_IN_GLOBAL_NAMESPACE is only defined in in clang-doc and not part of LLVM's bitcode header. So it looks like it is not a standardized ID to me. Could you confirm it?

If REFERENCE_IS_IN_GLOBAL_NAMESPACE only presents in bitcode file generated from BitcodeWriter from clang-doc and we are not expecting to provide compatibility to an arbitrary bitcode generated from elsewhere, I am OK with deleting this field (I still don't think this is the right approach to do but I don't have a strong opinion on this either).

brettw added inline comments.Sep 26 2022, 7:02 PM
clang-tools-extra/clang-doc/BitcodeReader.cpp
336

Correct, this whole bitcode system is used only to communicate within clang-doc between threads (!) (I think it should be entirely removed but I don't have the time for that). It is never serialized to a file and is not shared with any other component.

haowei accepted this revision.Sep 26 2022, 7:05 PM

LGTM
@paulkirth Do you have more comments?

This revision is now accepted and ready to land.Sep 26 2022, 7:05 PM
paulkirth accepted this revision.Sep 26 2022, 7:06 PM

I'm good w/ this. LGTM

This revision was automatically updated to reflect the committed changes.