This is an archive of the discontinued LLVM Phabricator instance.

[CodeView] Fix cycles in debug info when merging Types with global hashes
ClosedPublic

Authored by aganea on Feb 5 2019, 2:32 PM.

Details

Summary

When type streams with forward references were merged using GHashes, cycles were introduced in the debug info. This was caused by GlobalTypeTableBuilder::insertRecordAs() not inserting the record on the second pass, thus yielding an empty ArrayRef at that record slot. Later on, upon PDB emission, TpiStreamBuilder::commit() would skip that empty record, thus offseting all indices that came after in the stream.

The solution comes in two steps:

  1. Fix the hash calculation, by doing a multiple-step resolution, iff there are forward references in the input stream.
  2. Fix merge by resolving with multiple passes, therefore moving records with forward references at the end of the stream.

I had to make some esthetic changes, such as to support GHashes in llvm-readoj - I could commit that separately.
There's also a bugfix for dumpCodeViewMergedTypes() which previously could reference deleted memory (see comment in llvm-readobj.cpp).

Fixes PR40221

Diff Detail

Repository
rL LLVM

Event Timeline

aganea created this revision.Feb 5 2019, 2:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 5 2019, 2:32 PM
rnk added inline comments.Feb 5 2019, 2:50 PM
include/llvm/DebugInfo/CodeView/TypeCollection.h
33 ↗(On Diff #185405)

I guess this is a question for @zturner.

include/llvm/DebugInfo/CodeView/TypeHashing.h
119 ↗(On Diff #185405)

I think this would have to go to fix point, not just two passes. Consider the case where each record in the type stream refers to the next, i.e. a pointer to a pointer to a pointer ... to void, and everything is backwards. =(

aganea marked an inline comment as done.Feb 5 2019, 2:54 PM
aganea added inline comments.
include/llvm/DebugInfo/CodeView/TypeHashing.h
119 ↗(On Diff #185405)

Good point. I can make a test case, and modify to handle that.

zturner added inline comments.Feb 5 2019, 4:05 PM
include/llvm/DebugInfo/CodeView/TypeHashing.h
119 ↗(On Diff #185405)

I think we can do slightly better than go to fixed point. First go forwards, then go backwards over the remaining records. That's verrrrrry likely to get you to the end in 2 passes. If nothing else, definitely a better heuristic then always going forward.

zturner added inline comments.Feb 5 2019, 4:15 PM
include/llvm/DebugInfo/CodeView/TypeCollection.h
33 ↗(On Diff #185405)

I'm not sold on this method. It doesn't make much sense in the base class here, because you could theoretically have implementations that aren't backed by an array. It sounds like you even ran into this in LazyRandomTypeCollection. However, I don't actually think you need this anyway. More on why later in a later comment.

tools/llvm-readobj/COFFDumper.cpp
1929 ↗(On Diff #185405)

Instead of constructing a TypeTableCollection here and passing it CVTypes.records(), I think you can just use the CVTypes directly. Specifically, something like this should work:

{
  ListScope S(Writer, "MergedTypeStream");
  TypeDumpVisitor TDV(CVTypes, &Writer, opts::CodeViewSubsectionBytes);
  error(codeview::visitTypeStream(TpiTypes, TDV);
  Writer.flush();
}

and similarly for the code below.

aganea updated this revision to Diff 185568.Feb 6 2019, 8:51 AM
aganea marked 5 inline comments as done.

Simplified, following comments.

I made the test a bit more complicated, how I'm not sure such things happen with real life OBJs. The whole test resolves in 1+3 passes (1 initial and 3 extra). Evidently if we could walk backwards it could resolve in 1+2 passes, but I think it's not worth it. This whole bug only happens with very small MASM OBJs.

aganea marked an inline comment as done.Feb 6 2019, 8:54 AM
aganea added inline comments.
include/llvm/DebugInfo/CodeView/TypeHashing.h
119 ↗(On Diff #185405)

Tried that, but given we're dealing with CVTypeArray as a Range type, we can't iterate back; unless we use something like LazyRandomTypeCollection for caching after the first run. Given that forward references seem to happen on MASM OBJs only, using a LazyRandomTypeCollection for every OBJ sounds a bit overkill. So instead I opted for several passes, like TypeStreamMerger.

tools/llvm-readobj/llvm-readobj.cpp
647 ↗(On Diff #185568)

I had to move this here, because otherwise in the old place the TPI stream would reference deleted memory (because BinaryOrErr is deleted above when dumpInput() exits)

aganea edited the summary of this revision. (Show Details)Feb 6 2019, 9:05 AM
rnk added inline comments.Feb 6 2019, 11:49 AM
include/llvm/DebugInfo/CodeView/TypeHashing.h
120 ↗(On Diff #185568)

I think there should be a comment before this loop explaining why it's needed, and that it's not performance critical because most objs don't have forward references.

tools/llvm-readobj/llvm-readobj.cpp
647 ↗(On Diff #185568)

Hm, but won't this print the merged stream once per obj? And won't we still reference deleted memory on the second obj input? I guess the proper fix is to std::move the OwningBinary into a vector in the global ReadObjTypeTableBuilder object.

aganea updated this revision to Diff 185622.Feb 6 2019, 1:00 PM
aganea marked 2 inline comments as done.

Removed changes that aren't critical for this fix.

rnk accepted this revision.Feb 6 2019, 2:46 PM

lgtm!

test/tools/llvm-readobj/codeview-merging-ghash.test
107 ↗(On Diff #185622)

Nice test!

This revision is now accepted and ready to land.Feb 6 2019, 2:46 PM
This revision was automatically updated to reflect the committed changes.