This is an archive of the discontinued LLVM Phabricator instance.

[CodeView] Hook CodeViewRecordIO up to the type serialization path.
ClosedPublic

Authored by zturner on Nov 2 2016, 11:41 AM.

Details

Summary

This patch does a lot. The following classes are all deleted

  1. TypeTableBuilder
  2. ListRecordBuilder
  3. MemoryTypeTableBuilder
  4. FieldListRecordBuilder
  5. TypeRecordBuilder
  6. TypeSerializationVisitor

In their place, they are replaced with two new classes:

  1. TypeSerializer
  2. TypeTableBuilder

#2 has the same name as a previous class, but it is completely re-written.
#1 is a replacement for the previous TypeSerializationVisitor.

The two new classes make use of the previously introduced CodeViewRecordIO to accomplish everything that used to be accomplished by the 6 deleted classes. The advantages of this new approach have been discussed previously, but I will repeat them here for posterity and for anyone reading this unfamiliar with previous discussions.

  1. A single codepath for both reading and writing. Now TypeDeserializer and TypeSerializer both use the same mappings to define their record layout.
  2. Better handling of field list record splitting and size fields. Previously the injection of the Record length was separated from the splitting logic, even though each one needs to know certain details about the other. This led to some hackish code where certain classes would make assumptions about what happened in the others. In some cases there were latent bugs (such as it being possible to write a record that was 2 bytes too large due to not taking into consideration the record length when splitting).
  3. Being able to use the same machinery for symbol record. This patch will bring full support for serializing and deserializing type records. We have additional code still in LLVM that deserializes symbol records. To support serialization, we would need to go re-implement a parallel codepath. With this, we will be able to achieve serialization and deserialization without adding this parallel codepath.

Diff Detail

Repository
rL LLVM

Event Timeline

zturner updated this revision to Diff 76753.Nov 2 2016, 11:41 AM
zturner retitled this revision from to [CodeView] Hook CodeViewRecordIO up to the type serialization path..
zturner updated this object.
zturner added reviewers: amccarth, inglorion, rnk, ruiu.
zturner added a subscriber: llvm-commits.
zturner updated this object.Nov 2 2016, 11:42 AM
inglorion edited edge metadata.Nov 2 2016, 2:16 PM

Overall looks good. I have a couple of questions I have posted inline. Once those are resolved, I'll accept.

include/llvm/DebugInfo/CodeView/CodeViewRecordIO.h
26 ↗(On Diff #76753)

Out of curiosity, why move this include? And, while no the subject, why not use cstdint?

lib/DebugInfo/CodeView/CodeViewRecordIO.cpp
148 ↗(On Diff #76753)

Do we need to perform this check when reading, too? If not, could you move it after the assert(Guid.size() ...)?

Also, I would make a symbolic constant, particularly since it's repeated a number of times. Believe it or not, it wasn't immediately clear to me where the 16 came from (yes, I'm having my caffeine now).

lib/DebugInfo/CodeView/TypeRecordMapping.cpp
119 ↗(On Diff #76753)

s/spawns/spans/ ?

zturner added inline comments.Nov 2 2016, 2:52 PM
include/llvm/DebugInfo/CodeView/CodeViewRecordIO.h
26 ↗(On Diff #76753)

I tend to include files in order from most dependencies to fewest dependencies. This way you don't have hidden transitive dependencies picked up through previous include, helping to enforce include-what-you-use. C++ headers can depend on C headers, but not the other way around, therefore I tend to prefer including C++ system headers before C system headers.

lib/DebugInfo/CodeView/CodeViewRecordIO.cpp
148 ↗(On Diff #76753)

I theory we don't need to use these checks when reading, but it would be kind of interesting if one ever fired. Currently we believe they should never fire, but if it were to fire on a non-Clang-generated PDB, then it might mean we don't know as much as we thought we did about the format, because it means we have a record larger than the size we thought was the maximum. On the other hand, if it fired on a clang-generated PDB, then it would indicate we did something wrong (which in theory should have been caught during writing, but who knows).

I think it will be useful at some point or another because once we get libfuzzer stood up on Windows, we may want to start fuzzing PDBs, and at that point all bets are off.

amccarth edited edge metadata.Nov 2 2016, 3:22 PM

Looks pretty good.

include/llvm/DebugInfo/CodeView/TypeRecordMapping.h
22 ↗(On Diff #76753)

These forward declarations seem unnecessary. The TypeRecordMapping constructors are defined in this header, so I believe you need the complete definitions of msf::StreamReader and msf::StreamWriter, and you have the complete definitions, because CodeViewRecordIO.h includes them transitively (because it needs the complete types).

include/llvm/DebugInfo/CodeView/TypeSerializer.h
22 ↗(On Diff #76753)

I guess clang-format puts includes in ASCII order rather than alphabetical order since 'S' < 'i'.

25 ↗(On Diff #76753)

You probably should #include <string.h> for ::memcpy and ::memmove, which are used several times in this header.

(or <cstring> for std::memcpy and std::memmove)

include/llvm/DebugInfo/CodeView/TypeTableBuilder.h
68 ↗(On Diff #76753)

Should Record be a reference or are you intentionally applying Func to copies of the records? Oh, I see. These aren't actually records but ArrayRefs to records, so the copy is just a couple pointers. Some days I really miss Hungarian notation.

lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
1579 ↗(On Diff #76753)

I think you meant FLRB rather than FLBR.

It's too bad the FieldListRecordBuilder c'tor and d'tor don't do the begin and end calls for you.

1786 ↗(On Diff #76753)

FLRB

zturner added inline comments.Nov 2 2016, 4:00 PM
include/llvm/DebugInfo/CodeView/TypeRecordMapping.h
22 ↗(On Diff #76753)

The constructors are defined, but they only accept references. So a full definition is not needed for these constructors. While it's true that CodeViewRecordIO.h includes them transitively, we should not have one header be dependent on implementation details of another header. If the code in CodeViewRecordIO.h moved into a CPP file, then this code could subsequently break.

amccarth added inline comments.Nov 2 2016, 5:00 PM
include/llvm/DebugInfo/CodeView/TypeRecordMapping.h
22 ↗(On Diff #76753)

Yes, these constructors take a reference, but the member initializers take the parameters by value, so I'm pretty sure you need complete definitions here.

zturner updated this revision to Diff 77045.Nov 7 2016, 9:08 AM
zturner edited edge metadata.

Updated based on all suggestions.

I'm happy with this. @amccarth ?

amccarth accepted this revision.Nov 8 2016, 12:22 PM
amccarth edited edge metadata.

LGTM, pending the little fixes already mentioned.

This revision is now accepted and ready to land.Nov 8 2016, 12:22 PM
This revision was automatically updated to reflect the committed changes.
llvm/trunk/lib/DebugInfo/CodeView/TypeStreamMerger.cpp