This is an archive of the discontinued LLVM Phabricator instance.

[PDB] Start emitting source file and line information
ClosedPublic

Authored by rnk on Jun 15 2017, 4:11 PM.

Details

Summary

This is a first step towards getting line info to show up in VS and
windbg. So far, only llvm-pdbutil can parse the PDBs that we produce.
cvdump doesn't like something about our file checksum tables. I'll have
to dig into that next.

This patch adds a new DebugSubsectionRecordBuilder which takes bytes
directly from some other producer, such as a linker, and sticks it into
the PDB. Line tables only need to be relocated. No data needs to be
rewritten.

File checksums and string tables, on the other hand, need to be re-done.

Diff Detail

Repository
rL LLVM

Event Timeline

rnk created this revision.Jun 15 2017, 4:11 PM
zturner edited edge metadata.Jun 15 2017, 4:32 PM

why not use the fancy new yaml2obj stuff instead of checking in binary .obj files for the test?

lld/COFF/PDB.cpp
129 ↗(On Diff #102744)

I know I wrote it and I feel bad finding all these reasons *not* to use it, but maybe we just shouldn't use the visitor here. The only sections that we have to rewrite stuff from a .debug$S section are the symbols, string table, and checksums. Just have a loop that iterates over each record and handle the sections we care about using internal knowledge of the records format to deserialize as little as possible (e.g. in file checksums, you only care about the first 4 bytes)

188 ↗(On Diff #102744)

Yea, if you do what I suggested above, then you can in fact just copy this whole buffer. Because the "name offset" here is a name offset into the checksums buffer, which doesn't get merged with anything so the offset should stay the same.

ruiu added inline comments.Jun 15 2017, 4:33 PM
lld/COFF/Chunks.h
67–68 ↗(On Diff #102744)

You can just make the member variable public because the functions doesn't hide anything now.

lld/COFF/PDB.cpp
139 ↗(On Diff #102744)

Are these virtual functions actually override?

186 ↗(On Diff #102744)

It'd be nice to add a function comment.

195 ↗(On Diff #102744)

Do you actually need to do static_cast twice?

256 ↗(On Diff #102744)

Can you factor out the new code (or part of it) as a new function so that this function doesn't become too long?

268–269 ↗(On Diff #102744)

We usually reverse the condition and do continue.

321 ↗(On Diff #102744)

It is probably more straightforward to define Buffer as std::vector<uint8_t> Buffer(DebugChunk->getSize()).

rnk edited the summary of this revision. (Show Details)Jun 15 2017, 6:15 PM
rnk marked 4 inline comments as done.
rnk added inline comments.
lld/COFF/PDB.cpp
129 ↗(On Diff #102744)

File checksums are actually tricky because you have to extract the size, so I think we should leave that alone, and just be clever about line tables.

139 ↗(On Diff #102744)

Right, they were, but this is gone now. I think our clang warnings are not set correctly for this on Windows.

186 ↗(On Diff #102744)

This is all gone now.

195 ↗(On Diff #102744)

Gone now. I think I printed this at one point and operator<< doesn't work on ulittle16_t, so I needed it.

321 ↗(On Diff #102744)

If I implement Zach's suggestion, I will have to pass ownership or use a longer-lived BumpPtrAllocator (probably the linker's).

rnk updated this revision to Diff 102767.Jun 15 2017, 6:16 PM
  • Put line info directly into the PDB
ruiu added inline comments.Jun 15 2017, 6:27 PM
lld/COFF/Chunks.h
102–104 ↗(On Diff #102767)

Move above protected.

lld/COFF/PDB.cpp
123 ↗(On Diff #102767)

Rename SChunk Chunk?

124–125 ↗(On Diff #102767)

In LLD we usually use shorter names than those, such as just Ret and C.

134 ↗(On Diff #102767)

Do you have to return the last one? If not, you can return here (and return a nullptr at end.)

225–236 ↗(On Diff #102767)

It seems you can factor this out as a function which takes a Chunk and returns a uint8_t*.

zturner added inline comments.Jun 15 2017, 6:31 PM
lld/test/COFF/Inputs/pdb_lines_1.yaml
6 ↗(On Diff #102767)

If we're just testing lines, can we delete everything but the .debug$S section with the appropriate chunk? Seems like you could nuke this entire file down to a single .debug$S section with nothing but a lines section and a checksums section.

205 ↗(On Diff #102767)

See earlier comment about deleting unnecessary subsections, but that said, don't specify both SectionData and Subsections. One or the other. (I should probably find a way to make the yaml error out if you have both)

302 ↗(On Diff #102767)

Delete all this stuff too?

lld/test/COFF/Inputs/pdb_lines_2.yaml
6 ↗(On Diff #102767)

Same deleting comments apply to this file.

I was able to get the first yaml file down to this:

--- !COFF
header:          
  Machine:         IMAGE_FILE_MACHINE_AMD64
  Characteristics: [  ]
sections:        
  - Name:            '.debug$S'
    Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_DISCARDABLE, IMAGE_SCN_MEM_READ ]
    Alignment:       1
    Subsections:     
      - !Lines
        CodeSize:        19
        Flags:           [  ]
        RelocOffset:     0
        RelocSegment:    0
        Blocks:          
          - FileName:        'c:\src\llvm-project\build\pdb_lines_1.c'
            Lines:           
              - Offset:          0
                LineStart:       2
                IsStatement:     true
                EndDelta:        0
              - Offset:          4
                LineStart:       3
                IsStatement:     true
                EndDelta:        0
              - Offset:          9
                LineStart:       4
                IsStatement:     true
                EndDelta:        0
              - Offset:          14
                LineStart:       5
                IsStatement:     true
                EndDelta:        0
            Columns:         
      - !FileChecksums
        Checksums:       
          - FileName:        'c:\src\llvm-project\build\pdb_lines_1.c'
            Kind:            MD5
            Checksum:        4EB19DCD86C3BA2238A255C718572E7B
          - FileName:        'c:\src\llvm-project\build\foo.h'
            Kind:            MD5
            Checksum:        061EB73ABB642532857A4F1D9CBAC323
      - !StringTable
        Strings:         
          - 'c:\src\llvm-project\build\pdb_lines_1.c'
          - 'c:\src\llvm-project\build\foo.h'
  - Name:            '.debug$S'
    Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_LNK_COMDAT, IMAGE_SCN_MEM_DISCARDABLE, IMAGE_SCN_MEM_READ ]
    Alignment:       1
    Subsections:     
      - !Lines
        CodeSize:        14
        Flags:           [  ]
        RelocOffset:     0
        RelocSegment:    0
        Blocks:          
          - FileName:        'c:\src\llvm-project\build\foo.h'
            Lines:           
              - Offset:          0
                LineStart:       2
                IsStatement:     true
                EndDelta:        0
              - Offset:          4
                LineStart:       3
                IsStatement:     true
                EndDelta:        0
              - Offset:          9
                LineStart:       4
                IsStatement:     true
                EndDelta:        0
            Columns:         
symbols:         
  - Name:            '.debug$S'
    Value:           0
    SectionNumber:   2
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_STATIC
    SectionDefinition: 
      Length:          432
      NumberOfRelocations: 4
      NumberOfLinenumbers: 0
      CheckSum:        0
      Number:          0
  - Name:            '.debug$S'
    Value:           0
    SectionNumber:   6
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_STATIC
    SectionDefinition: 
      Length:          148
      NumberOfRelocations: 4
      NumberOfLinenumbers: 0
      CheckSum:        0
      Number:          5
      Selection:       IMAGE_COMDAT_SELECT_ASSOCIATIVE
...

Can we do something like this? Seems to produce a valid object file and round-trips back. Will it link? I don't mind having a separate test that has a more real-world example, but in that case we shouldn't check in the yaml file with instructions on how to generate it, we should just actually compile something as part of the test. Just seems like we should prefer keeping tests minimal whenever possible so that someone coming along and reading it can understand what all the moving parts are without having to get distracted by irrelevant stuff.

rnk marked 2 inline comments as done.Jun 16 2017, 9:01 AM

I was able to get the first yaml file down to this:
...
Can we do something like this? Seems to produce a valid object file and round-trips back. Will it link? I don't mind having a separate test that has a more real-world example, but in that case we shouldn't check in the yaml file with instructions on how to generate it, we should just actually compile something as part of the test. Just seems like we should prefer keeping tests minimal whenever possible so that someone coming along and reading it can understand what all the moving parts are without having to get distracted by irrelevant stuff.

You've removed the relocations list and the .text section, so this seems overly reduced. This object file would probably cause these CHECKs to fail in the test:
CHECK-NEXT: RelocOffset: 32
CHECK-NEXT: RelocSegment: 2

I don't like hacking down object file yaml because you can end up with artificial, unreproducible test cases like this that can't easily be edited. For example, suppose we want to test an #include mid-function to test filename blocks in line tables. We'd have to do all this reduction again, and it's not automatable. See also icf-associative.test, which I just had to modify because it was similarly reduced to contain invalid .debug$S contents, which LLD now rejects.

LLD currently does not depend on clang, so its test suite cannot depend on clang. That said, compiler-rt does not depend on clang and it uses clang as part of its test suite, so there is definitely prior art, but it will add more dependencies to check-lld.

lld/COFF/Chunks.h
102–104 ↗(On Diff #102767)

I didn't do that because it would introduce padding bytes. Looking now, there's more we can do to get a compact layout.

lld/COFF/PDB.cpp
123 ↗(On Diff #102767)

I want to be clear that this is the .debug$S chunk, not the .debug$T chunk.

134 ↗(On Diff #102767)

I want to warn if there are two, so I can return after the warning and use the first one, but I don't want to return early on finding the main chunk. I'm also not sure this early searching is necessary. It's possible we can defer all this work to the end now that we're passing line tables into the PDB directly.

lld/test/COFF/Inputs/pdb_lines_1.yaml
6 ↗(On Diff #102767)

There are actually two .debug$S sections, and it's interesting to test that because it exercises the subsections loop.

205 ↗(On Diff #102767)

Oops, missed this one, there were two sections.

302 ↗(On Diff #102767)

I got rid of the @ symbols, but if I remove the unwind info the test starts to break.

lld/test/COFF/Inputs/pdb_lines_2.yaml
6 ↗(On Diff #102767)

Removing .drective causes error: broken object file from the linker.

rnk updated this revision to Diff 102836.Jun 16 2017, 9:06 AM
rnk marked an inline comment as done.
  • comments
ruiu added inline comments.Jun 16 2017, 9:29 AM
lld/COFF/Chunks.h
102–104 ↗(On Diff #102767)

Then how about moving this up and moving ChunkKind down?

lld/COFF/PDB.cpp
134 ↗(On Diff #102767)

Is it really a real scenario that an object file containing more htan one .debug$T is fed to the linker? There are a lot of different ways for an object file to be broken, and we in general don't try to make our linker bullet-proof. I'd do the simplest thing we can do at least initially.

113 ↗(On Diff #102836)

You can omit llvm::.

157 ↗(On Diff #102836)

Remove llvm::.

242 ↗(On Diff #102836)

Remove llvm::.

rnk added inline comments.Jun 16 2017, 2:09 PM
lld/COFF/Chunks.h
102–104 ↗(On Diff #102767)

We will still need a new public / protected section if you want to keep the fields grouped together without the constructor in the middle. I like doing that because it makes it easy to analyze the memory usage of the type. It's also Google C++ style, I think.

lld/COFF/PDB.cpp
113 ↗(On Diff #102836)

I went ahead and removed the rest of the unnecessary namespaces in this file.

rnk updated this revision to Diff 102873.Jun 16 2017, 2:10 PM
  • namespaces
rnk updated this revision to Diff 103059.Jun 19 2017, 9:18 AM
  • remove code to find the main .debug$S section
rnk added a comment.Jun 19 2017, 9:20 AM

I removed all the code to find the "main" .debug$S section, since that's where all the remaining complexity is. I think as long as file checksum offsets don't change, we don't need to treat the "main" section specially. That may change in the future, but for now this keeps the code simple.

ruiu accepted this revision.Jun 19 2017, 10:09 AM

LGTM

lld/COFF/PDB.cpp
81–83 ↗(On Diff #103059)

Nice!

This revision is now accepted and ready to land.Jun 19 2017, 10:09 AM
This revision was automatically updated to reflect the committed changes.