Page MenuHomePhabricator

Teach `llvm-pdbutil pretty -native` about `-injected-sources`
ClosedPublic

Authored by thakis on Jul 9 2019, 10:36 AM.

Details

Summary

pretty -native -injected-sources -injected-source-content works with this patch, and produces identical output to the dia version

  • I'd like to get feedback on the high-level design here, since I'm not familiar with how things are supposed to be done in this code. In particular, does the split between NativeEnumInjectedSources and InjectedSourceStream make sense? Are these names consistent with the rest of the code?
  • The injected source _writing_ is done in PDBFileBuilder. Now that InjectedSourceStream exists, should there be a InjectedSourceStreamBuilder that does the building part? (If so, probably in a follow-up)

(I wish I had had this when working on PR41626, and when I was looking at natvis stuff again the other day I realized I had almost forgotten all this stuff again. So I figured I'd try to implement this while I still remember it.)

Diff Detail

Repository
rL LLVM

Event Timeline

thakis created this revision.Jul 9 2019, 10:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 9 2019, 10:36 AM
thakis marked an inline comment as done.Jul 9 2019, 10:37 AM
thakis added inline comments.
llvm/tools/llvm-pdbutil/llvm-pdbutil.cpp
937 ↗(On Diff #208735)

(This is needed because NativeSession returns nullptr if there are no injected sources. DIASession also has a return nullptr in one case, but I guess it's never hit in practice?)

thakis edited the summary of this revision. (Show Details)

-injected-source-content works

thakis edited the summary of this revision. (Show Details)
thakis added a reviewer: rnk.

rebase, done

Probably time to call ETIMEOUT on zturner. rnk, can you look at this one too given you already have seen all the prerequisite CLs? I'm pretty happy with this by now.

remove a dead function

rnk added a comment.Jul 15 2019, 3:45 PM

I'd like to get feedback on the high-level design here, since I'm not familiar with how things are supposed to be done in this code. In particular, does the split between NativeEnumInjectedSources and InjectedSourceStream make sense? Are these names consistent with the rest of the code?

Looks consistent with the existing DIA interface implementations to me, so I'd go with it.

The injected source _writing_ is done in PDBFileBuilder. Now that InjectedSourceStream exists, should there be a InjectedSourceStreamBuilder that does the building part? (If so, probably in a follow-up)

Sounds like a good follow-up.

llvm/lib/DebugInfo/PDB/Native/NativeEnumInjectedSources.cpp
40 ↗(On Diff #209930)

I think this class only uses I->second throughout, so would it be simpler to take a reference to the hash table value? I think it's SrcHeaderBlockEntry, right? I think that would be more readable, since we aren't iterating, we shouldn't need an iterator, we're just looking at one source file entry.

50 ↗(On Diff #209930)

Hm. I guess perhaps we should've allowed error propagation out of these APIs. I guess that's not fixable without an interface change.

thakis marked 2 inline comments as done.

give NativeInjectedSource a SrcHeaderBlockEntry instead of a const_iterator

llvm/lib/DebugInfo/PDB/Native/NativeEnumInjectedSources.cpp
40 ↗(On Diff #209930)

Good suggestion, done.

zturner added inline comments.Jul 16 2019, 9:05 AM
llvm/lib/DebugInfo/PDB/Native/NativeEnumInjectedSources.cpp
50 ↗(On Diff #209930)

The thinking at the time was that anything that might generate an error should happen during load, not on lazy evaluation of the properties. That's why most of the classes have a initialize or reload() static method which returns an Expected<T>, so that all validation can be done up front. I'm open to the possibility that that wasn't the best decision though, but it kept the code simple and handled most of the cases we would expect to see in practice.

thakis marked an inline comment as done.Jul 16 2019, 9:30 AM
thakis added inline comments.
llvm/lib/DebugInfo/PDB/Native/NativeEnumInjectedSources.cpp
50 ↗(On Diff #209930)

Thanks for chiming in! That makes sense to me.

I changed the code to verify that all referenced strings exist in reload() and removed the error handling here. I didn't (yet) do this for getCode() since that'd duplicate quite a bit of code. From your comment, it kind of sounds like the intended design is that InjectedSourceStream should grab _all_ data from disk and store it in memory, and the NativeEnumInjectedSources should refer to that. Is that accurate? If so, InjectedSourceStream should probably have a

struct InjectedSource {
  uint32_t CRC32;
  StringRef Name;
  StringRef ObjName;
  StringRef VName;
  PDB_SourceCompression Compression;
  llvm::MemoryBuffer Contents;
};

and store a name->InjectedSource map, and check that the CRC32 matches the CRC32 of Contents at load time too. Does that sound right?

If so, is there an existing method to load the contents of a stream into a MemoryBuffer somewhere?

half-address zturner comment

thakis marked an inline comment as done.Jul 16 2019, 10:04 AM
thakis added inline comments.
llvm/lib/DebugInfo/PDB/Native/NativeEnumInjectedSources.cpp
50 ↗(On Diff #209930)

Looking at this some more: For just natvis files, loading everything into memory seems fine, but the docs for IDiaInjectedSource kind of suggest that it's possible to store the source code of all cc files compiled to obj files in the pdb, and loading that eagerly would be quite a bit of data. OTOH, it's only loaded eagerly if PDBFile::getInjectedSourceStream() is called, and the only caller of it currently walks all entries anyhow.

rnk accepted this revision.Jul 16 2019, 10:34 AM

lgtm, up to you if you want to wait for @zturner. I like the new error handling approach. :)

llvm/lib/DebugInfo/PDB/Native/NativeEnumInjectedSources.cpp
50 ↗(On Diff #209930)

If so, is there an existing method to load the contents of a stream into a MemoryBuffer somewhere?

I don't think so, we moved data into and out of the stream APIs with ArrayRef<uint8_t>, typically stored in BumpPtrAllocators.

This revision is now accepted and ready to land.Jul 16 2019, 10:34 AM
thakis marked an inline comment as done.Jul 16 2019, 10:54 AM
thakis added inline comments.
llvm/lib/DebugInfo/PDB/Native/NativeEnumInjectedSources.cpp
50 ↗(On Diff #209930)

Thanks, I'll land this for now as it makes llvm-pdbutil more useful, and then I can iterate from there.

I found out how to get source files into /src/headerblock, it's a C# thing (but it still doesn't set obj):

C:\src\llvm-mono>type Hello.cs
using System;
namespace HelloWorld {
class Hello {
  static void Main() { Console.WriteLine("Hello World!"); }
}
}

C:\src\llvm-mono>csc Hello.cs /debug
Microsoft (R) Visual C# Compiler version 2.10.0.0 (b9fb1610)
Copyright (C) Microsoft Corporation. All rights reserved.


C:\src\llvm-mono>csc Hello.cs /debug /nologo

C:\src\llvm-mono>out\gn\bin\llvm-pdbutil.exe pretty -injected-sources Hello.pdb
Summary for C:\src\llvm-mono\Hello.pdb
  Size: 11776 bytes
  Guid: {9F3B98FA-6D65-D34E-8CF9-86994C5C9569}
  Age: 1
  Attributes: HasPrivateSymbols
---INJECTED SOURCES---
  C:\src\llvm-mono\Hello.cs (92 bytes): obj=<null>, vname=c:\src\llvm-mono\hello.cs, crc=3766985073, compression=Unknown (101)

(llvm-mono just means "monorepo"; in the c# context it could lead to confusion I suppose :P)

This revision was automatically updated to reflect the committed changes.
zturner added inline comments.Jul 16 2019, 1:32 PM
llvm/lib/DebugInfo/PDB/Native/NativeEnumInjectedSources.cpp
50 ↗(On Diff #209930)

I don't think you need a MemoryBuffer, because a MemoryBuffer has to be contiguous, which would force a copy out of the mapped PDB.

That's what the BinaryStreamRef stuff is for, it's essentially a wrapper over a PDB file, and logical offset+length in the PDB file. You can read from it as if it were a memory buffer, but it handles the case where the buffer isn't contiguous in the backing file for you and only copies when necessary.

So your idea is basically correct, except I'd store a BinaryStreamRef instead of a MemoryBuffer. I think BinaryStreamReader has a method called readBinaryStreamRef (or something similar), that takes a length and just returns you a binary stream ref that is safe to store (at least until you close the master PDBFile).

DbiStream::initialize() is a good example, because it does stuff like this already.

50 ↗(On Diff #209930)

Yea it's fine to land this now and iterate on it. FWIW, I think it's probably *more* work to get everything into a MemoryBuffer than it is to just save off a BinaryStreamRef, and then it'll just work no matter what or how much stuff you store in there. But maybe I'm biased :)

And yes, it's possible to store arbitrary file content into the injected sources. I had the idea that we could do this for generated code (for example the output of protoc), this way you could debug into code that is generated as part of build step and won't ever be archived on source server. In theory Injected Sources are the way to do this, but the question is really whether the debugger supports it. It probably does for C#, but not so sure about C++. If it does, it would be really cool though.