This is an archive of the discontinued LLVM Phabricator instance.

[codeview] Remove Type member from CVRecord
ClosedPublic

Authored by rnk on Mar 29 2019, 3:18 PM.

Details

Summary

Now CVType and CVSymbol are effectively type-safe wrappers around
ArrayRef<uint8_t>. Make the kind() accessor load it from the
RecordPrefix, which is the same for types and symbols.

Event Timeline

rnk created this revision.Mar 29 2019, 3:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 29 2019, 3:18 PM
Herald added a subscriber: hiraditya. · View Herald Transcript

Nice, does this actually have any performance impact? Since the arrays are smaller and have better cache locality.

llvm/unittests/DebugInfo/CodeView/RandomAccessVisitorTest.cpp
370–373

Minor nitpick, but this looks kinda gross to me. How about:

std::vector<CVType> TypeArray = {
  {Builder.records()[0]},
  {Builder.records()[1]}
};

I don't feel strongly though, just a suggestion.

aganea added a comment.EditedMar 30 2019, 1:09 PM

I like when removing code makes a class actually less complex :) (while keeping the same feature set)

llvm/include/llvm/DebugInfo/CodeView/CVRecord.h
36–39

return RecordData.size() >= sizeof(RecordPrefix)

Do we expect an invalid (zero) .RecordKind here?

45

return {};

llvm/include/llvm/DebugInfo/CodeView/SymbolSerializer.h
54–55

This seems like a recurrent pattern, why not add a constructor such as:
CVRecord(RecordPrefix* R) : RecordData((uint8_t*)R, R->RecordLen + 2) {}

Which reduces all this code to:

RecordPrefix R{ulittle16_t(2), ulittle16_t((uint16_t)Sym.Kind)};
CVSymbol Result(&R);

Which could be further improved if RecordPrefix had a RecordPrefix(uint16_t, uint16_t) constructor.

56

Should be 2.

llvm/include/llvm/DebugInfo/CodeView/TypeDeserializer.h
115

Remove makeFakeFieldList and make this:

RecordPrefix R{ulittle16_t(2), ulittle16_t((uint16_t)TypeLeafKind::LF_FIELDLIST)};
CVSymbol Result(&R);
llvm/lib/DebugInfo/CodeView/ContinuationRecordBuilder.cpp
70

See comment above.

174–175

Same here.

rnk planned changes to this revision.Apr 1 2019, 2:04 PM
rnk marked 9 inline comments as done.

Nice, does this actually have any performance impact? Since the arrays are smaller and have better cache locality.

I haven't done any measurements, actually. I think it will affect the public stream, though, which maintains this vector:

std::vector<CVSymbol> Records;

I got rid of the per-module vector of symbols, though, so I'm not sure how big it will be. I guess I'll just measure heap usage on clang.pdb before I commit.

I like when removing code makes a class actually less complex :) (while keeping the same feature set)

I'm actually wondering if we should remove the size as well, since it's technically redundant with the RecordLen. Originally, while working on the dumper, we used ArrayRef<uint8_t> so that we had a way to make sure we didn't overrun our parent buffer, like an .debug$S subsection. But, I could imagine we might want to validate our inputs up front, and then pass around plain (or wrapped) record pointers in the linker for performance reasons. Hard to say.

Anyway, I agree, this change stands on its own as a good simplification. :)


It looks like there's a bunch of new CV code in LLDB that I have to go refactor, so I'll go do that, and then reupload.

llvm/include/llvm/DebugInfo/CodeView/CVRecord.h
36–39

I don't think so.

llvm/include/llvm/DebugInfo/CodeView/SymbolSerializer.h
54–55

In addition to those constructors, I was actually thinking maybe CVRecord should just point to a RecordPrefix* instead of using the ArrayRef wrapper, and then convert back to uint8_t* in .data() and .content().

I tried your suggestion, but I screwed up the field order of RecordPrefix, so I think I'll leave it like this, since it's more self-documenting:

RecordPrefix Prefix;
Prefix.RecordLen = 2;
Prefix.RecordKind = uint16_t(Sym.Kind);
CVSymbol Result(&Prefix, sizeof(Prefix));

I want to pass a size along with the prefix pointer because the whole idea of using ArrayRef in the first place is to maintain the buffer size separately from the input data so we don't end up trusting it.

56

I could say 2, but this is the seralizer, so really it just gets overwritten. Do you think I should leave it uninitialized, 2, -1, 0?

rnk updated this revision to Diff 193191.Apr 1 2019, 4:00 PM
rnk marked 2 inline comments as done.
  • fix one lldb usage
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptApr 1 2019, 4:00 PM
aganea added inline comments.Apr 1 2019, 6:48 PM
llvm/include/llvm/DebugInfo/CodeView/CVRecord.h
29

To add to what you said in a comment above, do you think that if we could add assert(Data.size() == ((RecordPrefix *)RecordData.data())->RecordPrefix + 2) at relevant places below; and then after a while we could simply switch to RecordPrefix *, once issues are ironed out?

llvm/include/llvm/DebugInfo/CodeView/SymbolSerializer.h
56

I'd say "2" because you want as much as possible to keep data coherent in time. Regardless of what comes after. I think the code should be correct before being optimal. If someone looks for RecordPrefix and then tries to understand how it works, it'd be nice if the code displays the same, correct, behavior everywhere.

But then, you can argue that RecordPrefix.RecordLen is simply a constrait, tied to the amount data that comes after. And maybe the calculation logic for that .RecordLen field should be hidden inside the RecordPrefix class. In that case, to avoid screwing the init order, ideally you could simply say:

RecordPrefix Prefix{uint16_t(Sym.Kind)};
CVSymbol Result(&Prefix);

...and let RecordPrefix (or CVSymbol) handle sizing?

llvm/include/llvm/DebugInfo/CodeView/TypeDeserializer.h
125

Applying what I said above would give:

RecordPrefix Pre{uint16_t(TypeLeafKind::LF_FIELDLIST)};
CVType FieldList(&Pre);
llvm/lib/DebugInfo/CodeView/SimpleTypeSerializer.cpp
35

Here it is the same thing: writeRecordPrefix() writes a .RecordLen = 0, but it could very well write 2 to be coherent, then you would be able to trust the RecordPrefix *.

llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp
40

I suppose you've chosen 1 because that's a invalid record size? Comment maybe?

aganea marked an inline comment as done.Apr 1 2019, 6:51 PM
aganea added inline comments.
llvm/include/llvm/DebugInfo/CodeView/CVRecord.h
29

I didn't mean now.

rnk updated this revision to Diff 193388.Apr 2 2019, 4:24 PM
rnk marked 4 inline comments as done.
  • Add RecordPrefix ctor, remove all dummy RecordLen assignments
rnk updated this revision to Diff 193390.Apr 2 2019, 4:26 PM
rnk marked an inline comment as done.
  • one more change
llvm/include/llvm/DebugInfo/CodeView/CVRecord.h
29

Eventually, yes, I think it's possible.

llvm/include/llvm/DebugInfo/CodeView/SymbolSerializer.h
56

Makes sense. I'll try to standardize on 2, and I like the logic that the data becomes trustworthy.

llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp
40

Actually, the reason I did this was to avoid ambiguity between the T* begin, T* end ctor and the T* base, size_t count ctor. If I use zero, it's ambiguous. Implicit null strikes again. :)

aganea accepted this revision.Apr 2 2019, 4:49 PM

LGTM. With a few minor comments:

llvm/include/llvm/DebugInfo/CodeView/TypeDeserializer.h
122–125

RecordPrefix Pre(TypeLeafKind::LF_FIELDLIST); like the other places? And above.

llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp
40

Oh I see. Why not static CVSymbol Tombstone(DenseMapInfo<ArrayRef<uint8_t>>::getTombstoneKey()); then?

This revision is now accepted and ready to land.Apr 2 2019, 4:49 PM
rnk marked 2 inline comments as done.Apr 3 2019, 9:41 AM

Thanks!

rnk updated this revision to Diff 193526.Apr 3 2019, 9:41 AM
  • final updates
This revision was automatically updated to reflect the committed changes.