This is an archive of the discontinued LLVM Phabricator instance.

[clang-doc] Add support for explicitly typed enums
ClosedPublic

Authored by brettw on Sep 16 2022, 10:14 AM.

Details

Summary

Add support for explicitly typed enums:

enum Foo : unsigned { ... };

to the internal representation and to the YAML output.

Add support for getting the value of an enum constant, as well as accessing the original expression that produced it. This changes the YAML output of enums from an array of strings for the enum members to an array of dictionaries. These dictionaries now report the name, value, and original expression.

The markdown and HTML outputs are unchanged, they still output the name from the new enhanced internal schema.

Diff Detail

Event Timeline

brettw created this revision.Sep 16 2022, 10:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 16 2022, 10:14 AM
brettw requested review of this revision.Sep 16 2022, 10:14 AM
brettw edited the summary of this revision. (Show Details)Sep 16 2022, 10:14 AM
brettw edited the summary of this revision. (Show Details)

Can you make the first line of the summary the commit title? It's a bit more descriptive.

clang-tools-extra/clang-doc/BitcodeReader.cpp
59

do you need to do all of this? APSInt already supports to/from string methods, as well as converting to/from integers. can you use that here and in the writer to avoid some complexity?

221

Don't you still need this? if it's now unused, we should probably remove it from the RecordId enum too.

clang-tools-extra/clang-doc/Serialize.cpp
185

We normally try to avoid returning small string by value. see https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallstring-h

Normally in the case that we need to return a string from a method, we just use std::string. In other cases, it may be more appropriate to use an output parameter instead of a return.

316

why was 128 chosen? Aren't we storing it into a SmallString<16> in EnumValueInfo? is there some external reason that we expect this to be the right size?

Do you have an idea for how long these are likely to be? if we expect them to be large or completely unpredictable, it may make more sense to use a std::string and avoid wasting stack space.

brettw updated this revision to Diff 460882.Sep 16 2022, 1:33 PM
brettw marked 2 inline comments as done.
brettw retitled this revision from [clang-doc] Export more enum information to [clang-doc] Export enum type and value information..
brettw edited the summary of this revision. (Show Details)
brettw added inline comments.
clang-tools-extra/clang-doc/BitcodeReader.cpp
59

I don't think converting to an integer is a good idea because people sometimes use min/max values for enum values and since this could be signed or unsigned, it gets kind of complicated.

Serializing a number as a string to bitcode also seemed wrong to me.

The simplest thing would be to store the value as a string in the EnumValueInfo and then always treat this as a string from then on. If you want it simplified, I think that's the thing to do. But I thought you would want the numeric value stored in clang-doc's "ast" because some backends may want to treat this as a number, check its signed/unsignedness, etc. I happy to change this if you want.

221

I think this is only used for properties and not the child members, so it isn't used. I removed the enum.

clang-tools-extra/clang-doc/Serialize.cpp
185

I was just copying some other code in this file, I changed this to std::string.

316

I changed this and the EnumValueInfo struct to std::string. I think the usage of SmallString in these records is over the top but I was trying to copy the existing style.

I think maybe you made the title the first line of the summary instead of the other way around. I was looking for this as the title: [clang-doc] Add support for explicitly typed enums

clang-tools-extra/clang-doc/BitcodeReader.cpp
59

Those are fair points, and I think I misread/misunderstood a bit of what's going on here.

As for encoding/decoding integers, BitcodeWriter already has integer support, so if you need to convert to/from those, it should already work, right?

Regardless, you may want to look at the bitcode reader/writer in llvm to see how they serialize/deserialize APInt, as their implementation seems a bit more straightforward IMO.

https://github.com/llvm/llvm-project/blob/9050a59c6678789831b7286b8b68d18b966c4694/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp#L1636
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp#L2520
https://github.com/llvm/llvm-project/blob/9050a59c6678789831b7286b8b68d18b966c4694/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L2843
https://github.com/llvm/llvm-project/blob/9050a59c6678789831b7286b8b68d18b966c4694/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L2743

71

You can avoid the resize w/ SmallVector(Size) constructor, right?

clang-tools-extra/clang-doc/Representation.h
425

Sorry to nitpick, but SmallString was the correct choice to use in this type. We avoid its use as a return value, because it tends to be brittle, and stops us from assigning into arbitrary sized SmallStrings. In the struct, it's a reasonable choice, especially if you expect most uses to be small. for these expressions, I expect most to either be the number itself, or some shift operation, so SmallString<16> was probably more than sufficient.

clang-tools-extra/clang-doc/Serialize.cpp
316

small buffer optimizations significantly improve performance in the compiler and related tools. switching to a default string implementation that can use those optimizations reduced the total number of heap allocations in a normal run of clang by almost half.

That's why LLVM's programmer's manual and contribution guidelines suggest using those whenever possible.

brettw retitled this revision from [clang-doc] Export enum type and value information. to [clang-doc] Add support for explicitly typed enums.Sep 16 2022, 4:03 PM
brettw edited the summary of this revision. (Show Details)
brettw updated this revision to Diff 461259.Sep 19 2022, 10:18 AM
brettw marked an inline comment as done.
brettw added inline comments.
clang-tools-extra/clang-doc/BitcodeReader.cpp
59

I just converted everything to a string which allowed most of this code to be deleted. I don't think any theoretical benefit of keeping the APSInt is worth the complexity.

71

This is now deleted.

clang-tools-extra/clang-doc/Representation.h
425

In the common case there will be will be empty, so std::string actually seems more efficient. Also, I personally think the current quantity of SmallString in this code is over-optimization.

This revision is now accepted and ready to land.Sep 19 2022, 10:28 AM

Sorry, it looks like a pre-submit formatting check failed. can you git clang-format HEAD~ and re-upload? I can land your change after that.

brettw updated this revision to Diff 461366.Sep 19 2022, 2:33 PM

Clang-formatted

This revision was landed with ongoing or failed builds.Sep 19 2022, 2:54 PM
This revision was automatically updated to reflect the committed changes.