Now CVType and CVSymbol are effectively type-safe wrappers around
ArrayRef<uint8_t>. Make the kind() accessor load it from the
RecordPrefix, which is the same for types and symbols.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
- Build Status
Buildable 29978 Build 29977: arc lint + arc unit
Event Timeline
Nice, does this actually have any performance impact? Since the arrays are smaller and have better cache locality.
llvm/unittests/DebugInfo/CodeView/RandomAccessVisitorTest.cpp | ||
---|---|---|
370–373 | Minor nitpick, but this looks kinda gross to me. How about: std::vector<CVType> TypeArray = { {Builder.records()[0]}, {Builder.records()[1]} }; I don't feel strongly though, just a suggestion. |
I like when removing code makes a class actually less complex :) (while keeping the same feature set)
llvm/include/llvm/DebugInfo/CodeView/CVRecord.h | ||
---|---|---|
36–39 | return RecordData.size() >= sizeof(RecordPrefix) Do we expect an invalid (zero) .RecordKind here? | |
45 | return {}; | |
llvm/include/llvm/DebugInfo/CodeView/SymbolSerializer.h | ||
54–55 | This seems like a recurrent pattern, why not add a constructor such as: Which reduces all this code to: RecordPrefix R{ulittle16_t(2), ulittle16_t((uint16_t)Sym.Kind)}; CVSymbol Result(&R); Which could be further improved if RecordPrefix had a RecordPrefix(uint16_t, uint16_t) constructor. | |
56 | Should be 2. | |
llvm/include/llvm/DebugInfo/CodeView/TypeDeserializer.h | ||
115 | Remove makeFakeFieldList and make this: RecordPrefix R{ulittle16_t(2), ulittle16_t((uint16_t)TypeLeafKind::LF_FIELDLIST)}; CVSymbol Result(&R); | |
llvm/lib/DebugInfo/CodeView/ContinuationRecordBuilder.cpp | ||
70 | See comment above. | |
174–175 | Same here. |
I haven't done any measurements, actually. I think it will affect the public stream, though, which maintains this vector:
std::vector<CVSymbol> Records;
I got rid of the per-module vector of symbols, though, so I'm not sure how big it will be. I guess I'll just measure heap usage on clang.pdb before I commit.
I'm actually wondering if we should remove the size as well, since it's technically redundant with the RecordLen. Originally, while working on the dumper, we used ArrayRef<uint8_t> so that we had a way to make sure we didn't overrun our parent buffer, like an .debug$S subsection. But, I could imagine we might want to validate our inputs up front, and then pass around plain (or wrapped) record pointers in the linker for performance reasons. Hard to say.
Anyway, I agree, this change stands on its own as a good simplification. :)
It looks like there's a bunch of new CV code in LLDB that I have to go refactor, so I'll go do that, and then reupload.
llvm/include/llvm/DebugInfo/CodeView/CVRecord.h | ||
---|---|---|
36–39 | I don't think so. | |
llvm/include/llvm/DebugInfo/CodeView/SymbolSerializer.h | ||
54–55 | In addition to those constructors, I was actually thinking maybe CVRecord should just point to a RecordPrefix* instead of using the ArrayRef wrapper, and then convert back to uint8_t* in .data() and .content(). I tried your suggestion, but I screwed up the field order of RecordPrefix, so I think I'll leave it like this, since it's more self-documenting: RecordPrefix Prefix; Prefix.RecordLen = 2; Prefix.RecordKind = uint16_t(Sym.Kind); CVSymbol Result(&Prefix, sizeof(Prefix)); I want to pass a size along with the prefix pointer because the whole idea of using ArrayRef in the first place is to maintain the buffer size separately from the input data so we don't end up trusting it. | |
56 | I could say 2, but this is the seralizer, so really it just gets overwritten. Do you think I should leave it uninitialized, 2, -1, 0? |
llvm/include/llvm/DebugInfo/CodeView/CVRecord.h | ||
---|---|---|
29 | To add to what you said in a comment above, do you think that if we could add assert(Data.size() == ((RecordPrefix *)RecordData.data())->RecordPrefix + 2) at relevant places below; and then after a while we could simply switch to RecordPrefix *, once issues are ironed out? | |
llvm/include/llvm/DebugInfo/CodeView/SymbolSerializer.h | ||
56 | I'd say "2" because you want as much as possible to keep data coherent in time. Regardless of what comes after. I think the code should be correct before being optimal. If someone looks for RecordPrefix and then tries to understand how it works, it'd be nice if the code displays the same, correct, behavior everywhere. But then, you can argue that RecordPrefix.RecordLen is simply a constrait, tied to the amount data that comes after. And maybe the calculation logic for that .RecordLen field should be hidden inside the RecordPrefix class. In that case, to avoid screwing the init order, ideally you could simply say: RecordPrefix Prefix{uint16_t(Sym.Kind)}; CVSymbol Result(&Prefix); ...and let RecordPrefix (or CVSymbol) handle sizing? | |
llvm/include/llvm/DebugInfo/CodeView/TypeDeserializer.h | ||
125 | Applying what I said above would give: RecordPrefix Pre{uint16_t(TypeLeafKind::LF_FIELDLIST)}; CVType FieldList(&Pre); | |
llvm/lib/DebugInfo/CodeView/SimpleTypeSerializer.cpp | ||
35 | Here it is the same thing: writeRecordPrefix() writes a .RecordLen = 0, but it could very well write 2 to be coherent, then you would be able to trust the RecordPrefix *. | |
llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp | ||
40 | I suppose you've chosen 1 because that's a invalid record size? Comment maybe? |
llvm/include/llvm/DebugInfo/CodeView/CVRecord.h | ||
---|---|---|
29 | I didn't mean now. |
- one more change
llvm/include/llvm/DebugInfo/CodeView/CVRecord.h | ||
---|---|---|
29 | Eventually, yes, I think it's possible. | |
llvm/include/llvm/DebugInfo/CodeView/SymbolSerializer.h | ||
56 | Makes sense. I'll try to standardize on 2, and I like the logic that the data becomes trustworthy. | |
llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp | ||
40 | Actually, the reason I did this was to avoid ambiguity between the T* begin, T* end ctor and the T* base, size_t count ctor. If I use zero, it's ambiguous. Implicit null strikes again. :) |
LGTM. With a few minor comments:
llvm/include/llvm/DebugInfo/CodeView/TypeDeserializer.h | ||
---|---|---|
122–125 | RecordPrefix Pre(TypeLeafKind::LF_FIELDLIST); like the other places? And above. | |
llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp | ||
40 | Oh I see. Why not static CVSymbol Tombstone(DenseMapInfo<ArrayRef<uint8_t>>::getTombstoneKey()); then? |
To add to what you said in a comment above, do you think that if we could add assert(Data.size() == ((RecordPrefix *)RecordData.data())->RecordPrefix + 2) at relevant places below; and then after a while we could simply switch to RecordPrefix *, once issues are ironed out?