Enable llvm-pdbutil to list enumerations using native PDB reader
ClosedPublic

Authored by amccarth on Jul 21 2017, 3:00 PM.

Details

Summary

This extends the native reader to enable llvm-pdbutil to list the enums in a
PDB and it includes a simple test. It does not yet list the values in the
enumerations, which requires an actual implementation of
NativeEnumSymbol::FindChildren.

To exercise this code, use a command like:

llvm-pdbutil pretty -native -enums foo.pdb

Diff Detail

Repository
rL LLVM
amccarth created this revision.Jul 21 2017, 3:00 PM

Is it that much extra work to implement children? It would be nice if this could just be feature complete. It seems like all you have to do is read the FieldList member of the CodeView record, load that item, and then create an enumerator that iterates each item and returns a NativeConstantSymbol or whatever we're calling it

rnk added inline comments.Jul 24 2017, 10:02 AM
llvm/include/llvm/DebugInfo/PDB/Native/NativeEnumTypes.h
22 ↗(On Diff #107724)

I'm wondering if we can get away with implementing these findChildren methods by building a vector of PDBSymbols up front, and then this class becomes a simple adapter from vector to IPDBEnumChildren. It seems to me if the caller calls findChildren, they're going to walk them all, so we can do O(n) work without slowing things down asymptotically.

llvm/lib/DebugInfo/PDB/Native/NativeExeSymbol.cpp
48–49 ↗(On Diff #107724)

This check will never fail. The optional HadError parameter is essentially an awkward way to thread error handling out of a range-based-for. The idea is that if the iterator encounters an error, it halts iteration and sets *HadError to true. The caller can check for failure outside the loop. We don't want to eagerly check for errors because that would require iterating over all the data.

Does this method get called often or early? I'm wondering if it would be better to iterate and filter the type stream up front and make a vector.

Is it that much extra work to implement children? It would be nice if this could just be feature complete. It seems like all you have to do is read the FieldList member of the CodeView record, load that item, and then create an enumerator that iterates each item and returns a NativeConstantSymbol or whatever we're calling it

I think I see how to do this, but I'd feel better doing it as a subsequent patch. I'd need to make the NativeSession be the owner of the Tpi stream.

llvm/include/llvm/DebugInfo/PDB/Native/NativeEnumTypes.h
22 ↗(On Diff #107724)

Am I missing something? I think this is still O(n), as calls to getNext continue through the types to the next one that matches. It may be a small speedup in cases where the caller is planning to iterate over the entire range. But it's overkill if the caller isn't going to iterate the entire list. And I don't think it changes the complexity.

And a vector would require more memory, which may or may not be a big deal.

llvm/lib/DebugInfo/PDB/Native/NativeExeSymbol.cpp
48–49 ↗(On Diff #107724)

OK, I get what you've said about the error, and I can remove the unnecessary check.

In a dumper, this method is called once for the enums, so, no, it's not called frequently. In a random-access scenario, I'd expect we'd be following TypeIndexes rather than asking the Exe scope for all of the enums it contains.

zturner added inline comments.Jul 27 2017, 6:02 PM
llvm/include/llvm/DebugInfo/PDB/Native/NativeEnumSymbol.h
39–64 ↗(On Diff #107724)

Can we get rid of the ifdefs?

llvm/lib/DebugInfo/PDB/Native/NativeEnumSymbol.cpp
23–24 ↗(On Diff #107724)

How about cantFail(visitTypeRecord(CV, *this));

61 ↗(On Diff #107724)

Remove ifdefs?

llvm/lib/DebugInfo/PDB/Native/NativeEnumTypes.cpp
27–29 ↗(On Diff #107724)

This is incorrect. getChildCount() is supposed to return the number of values that the enumerator is going to enumerate. The value being returned here doesn't account for the fact that not everything is an enum. Also, I'm pretty sure we use this method in llvm-pdbutil pretty. In any case, it seems pretty important to me.

34 ↗(On Diff #107724)

Not crazy about leaving this unsupported, but if we're going to do it, at the very least it needs a llvm_unreachable.

llvm/lib/DebugInfo/PDB/Native/NativeExeSymbol.cpp
47 ↗(On Diff #107724)

If you call Tpi->typeCollection() instead of Tpi->types(), then you can store a LazyRandomTypeCollection in the NativeEnumTypes, which should be more efficient. This also solves the problem of making the session own the TpiStream, and should enable you to easily create an enumerator for the values of the enum.

In fact, I'd like to eventually delete the types() method now that we have the random access type collection.

llvm/lib/DebugInfo/PDB/Native/NativeSession.cpp
109 ↗(On Diff #107724)

Should we be asserting that I.Kind == LF_ENUM?

amccarth marked an inline comment as done.Jul 28 2017, 11:16 AM
amccarth added inline comments.
llvm/lib/DebugInfo/PDB/Native/NativeEnumTypes.cpp
27–29 ↗(On Diff #107724)

Yeah, the "doesn't really return what we want" in the comment was intended to capture that this isn't right.

You and I had a discussion a while back about removing this functionality because it requires enumerating all the types an extra time just to count them, for rather minimal value. But we didn't really reach a conclusion.

As far as I can tell, pretty is the only user, and it seems like it adds only minimal value. So my preference would be to remove the getChildCount method from the enumerators altogether.

34 ↗(On Diff #107724)

This is related to the previous comment.

On the one hand, the interface defined in IPDBEnumChildren suggests it's a simple forward iterator (getNext, reset). On the other hand, it's an indexed collection (getChildCount, getChildAtIndex). It kind of violates the single responsibility principle.

We could enumerate everything once, essentially building a container, which would let us implement all the methods efficiently at the cost of more memory and moving a bunch of the cost to object creation time.

It isn't a big deal for a dumper that's going through everything once, but it's not hard to imagine other users that would prefer the slim iterator idea.

amccarth marked an inline comment as done.Jul 28 2017, 4:54 PM

I'm reworking NativeEnumTypes per the discussion. I'll have an update on Monday.

amccarth updated this revision to Diff 109244.Aug 1 2017, 3:50 PM

Enable llvm-pdbutil to list enumerations using native PDB reader

Per suggestions, changed NativeEnumTypes to be both a container and an
iterator, which allows correct implementation of counting, etc.

Implemented most of the NativeEnumSymbol methods that had been #if'ed out. I
stopped overriding the rest of them, as the default handling of the
NativeRawSymbol base class is appropriate.

Still planning to list the actual members of the numeration in a separate
patch.

amccarth marked 4 inline comments as done.Aug 2 2017, 9:59 AM

I'll ping when it's ready for another look.

llvm/lib/DebugInfo/PDB/Native/NativeExeSymbol.cpp
47 ↗(On Diff #107724)

Working on this.

amccarth updated this revision to Diff 109767.Aug 4 2017, 10:01 AM

Simplified NativeEnumTypes by moving work to NativeSession which already had
access to Tpi and thus LazyRandomTypeCollection. NativeEnumTypes can now
exactly answer how many objects is knows about because it scans the types
when its constructed and maintains a vector of the relevant TypeIndexes.

zturner added inline comments.Aug 4 2017, 12:58 PM
llvm/lib/DebugInfo/PDB/Native/NativeEnumSymbol.cpp
17–18 ↗(On Diff #109767)

Can you do

using namespace llvm;
using namespace llvm::pdb;

instead of putting the namespace inline? That's the preferred approach for LLVM style.

21 ↗(On Diff #109767)

Can you choose a better name than fooCV? :)

24 ↗(On Diff #109767)

should we assert that fooCV.type() == TypeLeafKind::LV_ENUM? Ideally it would be nice if this constructor were private and there was a single method that would create a Native symbol from a CVType that would create the right concrete type. But we can save that for later.

llvm/lib/DebugInfo/PDB/Native/NativeSession.cpp
113 ↗(On Diff #109767)

early return?

118 ↗(On Diff #109767)

It looks like this will create a new instance every time someone calls this method? I thought it was going to do a scan up front and cache everything, and then subsequent calls would just return directly from the cache?

amccarth marked 6 inline comments as done.Aug 4 2017, 1:16 PM
amccarth added inline comments.
llvm/lib/DebugInfo/PDB/Native/NativeEnumSymbol.cpp
17–18 ↗(On Diff #109767)

Um. OK, but that seems an odd style in general, since it would be prone to name collisions.

24 ↗(On Diff #109767)

Sure.

Now they are constructed only by NativeSession, which has such an assertion. I guess all these NativeXXXSymbol classes should have private constructors, and the NativeSession could be a friend to them all.

llvm/lib/DebugInfo/PDB/Native/NativeSession.cpp
118 ↗(On Diff #109767)

The NativeEnumTypes "iterator" caches just the TypeIndexes and lazily calls this method to create the actual symbols when they're needed.

I think you're right that there's a hole in the logic here that could cause it to create the same symbol multiple times, which is a problem. I'll fix that by moving this to findSymbolByTypeIndex. (See the TODO below.)

zturner accepted this revision.Aug 4 2017, 1:20 PM
This revision is now accepted and ready to land.Aug 4 2017, 1:20 PM
amccarth marked 4 inline comments as done.Aug 4 2017, 1:44 PM
amccarth updated this revision to Diff 109817.Aug 4 2017, 1:46 PM

Addressed additional feedback.

Also made createEnumSymbol go through findSymbolByTypeIndex to avoid making
duplicate symbols.

This revision was automatically updated to reflect the committed changes.