[PDB] Enable NativeSession to create symbols for built-in types on demand
ClosedPublic

Authored by amccarth on Jul 7 2017, 5:01 PM.

Details

Summary

There is a reserved range of type indexes for built-in types (like integers).
This will create a symbol for a built-in type if the caller askes for one by
type index. This is also plumbing for being able to recall symbols by type
index in general, but user-defined types will come in subsequent patches.

Diff Detail

Repository
rL LLVM
amccarth created this revision.Jul 7 2017, 5:01 PM
rnk added inline comments.Jul 11 2017, 9:57 AM
llvm/include/llvm/DebugInfo/PDB/Native/NativeSession.h
42 ↗(On Diff #105726)

Any reason not to use codeview::TypeIndex for this parameter? Just trying to avoid exposing internal libraries out through this interface?

86 ↗(On Diff #105726)

Generally, LLVM uses bare uint32_t rather than std::uint32_t. You could argue it's C legacy, but that's currently the norm, so I'd rather stick with it.

llvm/lib/DebugInfo/PDB/Native/NativeSession.cpp
44 ↗(On Diff #105726)

It seems unlikely that a builtin type will grow beyond 4GB.

zturner added inline comments.Jul 11 2017, 10:05 AM
llvm/include/llvm/DebugInfo/PDB/Native/NativeSession.h
13 ↗(On Diff #105726)

Can you use llvm::DenseMap instead (and delete this #include)

42 ↗(On Diff #105726)

+1

llvm/lib/DebugInfo/PDB/Native/NativeSession.cpp
107–108 ↗(On Diff #105726)

I can think of a couple of issues with this function.

  1. I might be misunderstanding the purpose of findSymbolByTypeIndex, but doesn't TypeIndex imply that this function *only* works for types? What about symbols, which don't have a type index? Would those get a different function?
  1. What about id types (e.g. types from the IPI string instead of the TPI stream)? Would we eventually add a findSymbolByIdIndex too?
  1. Why even have a map? Why not just have a std::vector<std::unique_ptr<NativeRawSymbol>> Types; and then in the constructor you resize it to Tpi.getNumTypeRecords(), and in this function if it's simple you create one on the fly, otherwise you access Types[TI.toArrayIndex()], and if it's nullptr you create a new one otherwise you return the existing one? This would be O(1) instead of doing the hash computation.
zturner added inline comments.Jul 11 2017, 10:07 AM
llvm/lib/DebugInfo/PDB/Native/NativeSession.cpp
107–108 ↗(On Diff #105726)

Alternatively, maybe we don't even need a cache? LazyRandomTypeCollection already has O(1) amortized random access, as it is effectively already the cache. So maybe you can create everything on the fly, and not even have a cache?

Ok I think I get it. There's a distinction between "sym index id" (DIA terminology) and "type index" (codeview terminology) that I think is getting lost here. Can we make a typedef for SymIndexId (call it something else if you need to, since that would conflict with DIA typedefs. Like PDB_SymId. Then have this function return a PDB_SymId and accept a codeview::TypeIndex?

llvm/lib/DebugInfo/PDB/Native/NativeSession.cpp
114–115 ↗(On Diff #105726)

Can you add a fixme to handle this? Simple Types include things like pointers to other simple types, which we'll need to handle.

amccarth marked 2 inline comments as done.Jul 11 2017, 12:02 PM

I've got one more suggestion from Zach to take care of, then I'll upload an updated patch.

llvm/include/llvm/DebugInfo/PDB/Native/NativeSession.h
42 ↗(On Diff #105726)

I tried TypeIndex originally, but there were two issues:

  1. Users of the API generally get the type indexes from symbol methods like getTypeId, which (like DIA), return a uint32_t. Using a TypeIndex seemed to introduce conversions without much value
  2. There's no standard hash function for a TypeIndex, and defining one seemed gratuitous.

I've gone ahead and switched back to TypeIndex.

llvm/lib/DebugInfo/PDB/Native/NativeSession.cpp
44 ↗(On Diff #105726)

Sure, but, again, this was a matter of conforming to the types used by the already-defined API. The getLength() method for a symbol returns a uint64_t, so I was trying to avoid unnecessary conversions. I've switched to uint32_t.

107–108 ↗(On Diff #105726)
  1. No, this is specifically for looking up a type symbol by its type index, not the unique symbol index that the API hands out. The motivation here is, for example, a record for an enum has a method called getUnderlyingType which returns a TypeIndex, not a unique symbol index, so we need to be able to find a type symbol by its type index.
  1. I'm still fuzzy on what the IPI stream contains versus the TPI. What is an id type?
  1. The symbol cache, which is indexed by symbol IDs is just a vector.
107–108 ↗(On Diff #105726)

Re: LazyRandomTypeCollection. Yes, I think that makes sense to do this for the user-defined types, so that seems like it would replace the need for caching those. But findSymbolByTypeIndex needs to work for both user-defined types and built-in types.

The only trick is that the cached objects are stamped with the symbol's unique ID, so somewhere I'm going to have to keep track of those so that re-visiting a type gets the original ID.

amccarth updated this revision to Diff 106089.Jul 11 2017, 1:31 PM

Addressed feedback from zturner and rnk.

llvm/include/llvm/DebugInfo/PDB/Native/NativeSession.h
13 ↗(On Diff #105726)

Sure, but the TypeIndexToSymbolId map won't be dense in most cases. There will likely be a relatively small number of entries with scattered key values. What (where?) are the guidelines for when to use DenseMap?

amccarth updated this revision to Diff 106108.Jul 11 2017, 3:48 PM

Switched from std::unordered_map to llvm::DenseMap per zturner's suggestion.

rnk accepted this revision.Jul 12 2017, 11:30 AM

lgtm

This revision is now accepted and ready to land.Jul 12 2017, 11:30 AM
This revision was automatically updated to reflect the committed changes.