This is an archive of the discontinued LLVM Phabricator instance.

[PDB] Copy inlinee lines records into the PDB
ClosedPublic

Authored by rnk on May 30 2019, 1:52 PM.

Details

Summary
  • Fixes inline call frame line table display in windbg.
  • Improve llvm-pdbutil to dump extra file ids.
  • Warn on unknown subsections so we don't have this kind of bug in the future.

Event Timeline

rnk created this revision.May 30 2019, 1:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 30 2019, 1:52 PM
Herald added a subscriber: eraman. · View Herald Transcript

I've noticed btw that obj2yaml/yaml2obj doesn't preserve the BinaryAnnotations if SectionData values are skimmed out of the .yaml (to reduce size). There's also something missing for S_FRAMEPROC in that regard:

lld/test/COFF/pdb-inlinees-extrafiles.s
6

Why not:

$ cl /Z7 /c /O2 t.c
$ obj2yaml t.obj >t.yaml

Which changes the test to:

# REQUIRES: x86
# RUN: yaml2obj %s -o=%t.obj
# RUN: lld-link -entry:main -nodefaultlib %t.obj -out:%t.exe -pdb:%t.pdb -debug
# RUN: llvm-pdbutil dump -il %t.pdb | FileCheck %s

I think it should be made clear/easy to re-generate the tests, if we want to change it, or if someone wants to duplicate it.

24

t1.h

lld/test/COFF/pdb-unknown-subsection.s
13

/s/verison/version (and the same below)

rnk added a comment.May 31 2019, 11:07 AM

I've noticed btw that obj2yaml/yaml2obj doesn't preserve the BinaryAnnotations if SectionData values are skimmed out of the .yaml (to reduce size). There's also something missing for S_FRAMEPROC in that regard:

Hm. Going forward, I kind of feel like we should move towards writing tests in assembly and port some of the YAML ones to assembly. I think if we structure the assembly right, it can be more readable and more compact than YAML. It follows how things were done for ELF lld testing. I have an intern who is working on making LLVM generate more human-readable codeview record assembly, which is basically what I'm hacking on here.

rnk marked 2 inline comments as done.May 31 2019, 1:00 PM
rnk added inline comments.
lld/test/COFF/pdb-inlinees-extrafiles.s
6

Basically it boils down to, which do we think is a more useful test format, YAML or assembly? My preference is for assembly, and I'd like to replace a lot of the .test YAML inputs with .s inputs. Maybe that's unique to me, but it mirrors the direction the ELF linker took, where they moved away from YAML object input tests to assembly tests.

I can't generate assembly from MSVC, so I started with clang assembly output, and modified it to exercise the corner case in question.

aganea added inline comments.May 31 2019, 1:35 PM
lld/test/COFF/pdb-inlinees-extrafiles.s
6

Would there be a way to better represent "structured data" in the assembly below? Can you show a mock of what your intern is doing?

If you choose to go the ELF way for coherence, that is justified. Was the move in ELF because of file size concerns?

aganea added inline comments.May 31 2019, 1:52 PM
lld/test/COFF/pdb-inlinees-extrafiles.s
6

Oh I see, you want to represent both code and debug info in the same textual formal, and YAML doesn't offer that (only through raw SectionDatas. If the assembly below could offer structured representations (and named fields), in essence YAML readability, that would be a great!

rnk marked an inline comment as done.May 31 2019, 1:58 PM
rnk added inline comments.
lld/test/COFF/pdb-inlinees-extrafiles.s
6

Exactly, YAML doesn't do a very good job of representing machine code or relocations in a very human readable way. The code is just in hex SectionData, and the relocations are stored separately in an array, so it's not easily modifiable.

Here's an excerpt of the project proposal I wrote:


Currently, type info is emitted using CodeViewRecordIO as bytes, and then dumped as textual comments in the assembler output. It currently looks like this:

	# Struct (0x1003) {
	#   TypeLeafKind: LF_STRUCTURE (0x1505)
	#   MemberCount: 2
	#   Properties [ (0x200)
	#     HasUniqueName (0x200)
	#   ]
	#   FieldList: <field list> (0x1002)
	#   DerivedFrom: 0x0
	#   VShape: 0x0
	#   SizeOf: 16
	#   Name: Foo
	#   LinkageName: .?AUFoo@@
	# }
	.byte	0x22, 0x00, 0x05, 0x15
	.byte	0x02, 0x00, 0x00, 0x02
	.byte	0x02, 0x10, 0x00, 0x00
	.byte	0x00, 0x00, 0x00, 0x00
	.byte	0x00, 0x00, 0x00, 0x00
	.byte	0x10, 0x00, 0x46, 0x6f
	.byte	0x6f, 0x00, 0x2e, 0x3f
	.byte	0x41, 0x55, 0x46, 0x6f
	.byte	0x6f, 0x40, 0x40, 0x00

However, this is needlessly hard to read. It would be much nicer if we emitted assembly that looked like this:

.short 34       # RecordLen
.short 0x1505   # LF_STRUCTURE
.long 0x200     # Properties
.long 0x1002    # FieldList
.long 0x0       # DerivedFrom
.long 0x0       # VShape
.short 16       # SizeOf
.asciz "Foo"    # Name
.asciz ".?AUFoo@@"      # LinkageName

Nilanjana is still getting started and doesn't have a Phabricator account yet, but this is kind of the direction that I want to go. I figured we'd start with type records, then try to make symbol records more structured, and then move on to .cv_def_range, which is pretty gross right now.

aganea accepted this revision.Jun 1 2019, 7:23 AM

LGTM! I'm less sure about the yaml->assembly change, but we could continue that offline :)
Maybe in a subsequent patch it'd be nice to fix obj2yaml IL part to avoid relying on SectionDatas.

lld/test/COFF/pdb-inlinees-extrafiles.s
6

What about a new .codeview directive which would include more structured data, and also solve .cv_def_range? You would get the best of both worlds, assembly code and readability for debug infos.

.codeview "
      - Kind:            LF_STRUCTURE
        Class:
          MemberCount:     0
          Options:         [ None, ForwardReference, HasUniqueName ]
          FieldList:       0
          Name:            Foo
          UniqueName:      '.?AUFoo@@'
          DerivationList:  0
          VTableShape:     0
          Size:            0
"
This revision is now accepted and ready to land.Jun 1 2019, 7:23 AM
aganea added inline comments.Jun 1 2019, 7:27 AM
lld/test/COFF/pdb-inlinees-extrafiles.s
6

Nothing prevents from writing partial or entire sections with this:

.codeview "
sections:        
  - Name:            '.debug$S'
    Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_DISCARDABLE, IMAGE_SCN_MEM_READ ]
    Alignment:       1
    Subsections:     
      - !Symbols
        Records:         
          - Kind:            S_OBJNAME
            ObjNameSym:      
              Signature:       0
              ObjectName:      'F:\svn\test\t.obj'
          - Kind:            S_COMPILE3
            Compile3Sym:     
              Flags:           [ SecurityChecks, HotPatch ]
              Machine:         X64
              FrontendMajor:   19
              FrontendMinor:   16
              FrontendBuild:   27031
              FrontendQFE:     1
              BackendMajor:    19
              BackendMinor:    16
              BackendBuild:    27031
              BackendQFE:      1
              Version:         'Microsoft (R) Optimizing Compiler'
This revision was automatically updated to reflect the committed changes.
rnk marked an inline comment as done.Jun 3 2019, 11:58 AM
rnk added inline comments.
lld/test/COFF/pdb-inlinees-extrafiles.s
6

That's an interesting idea. Up until now, we haven't allowed llc, clang, or MC to have dependencies on ObjectYAML because we believed it was too large. We'd have to check that belief, and maybe optimize the code size of this feature, which will be seldomly used. But, it's not like we've been policing the size of the clang binary that much, and I'm sure there are other ways we could optimize its size.

One concern I have is that I want to be able to test feeding corrupt records to LLD. I think we can allow mixing YAML symbol records with today's assembler declared codeview records. If we do that, we won't lose any testing capabilities.

I'll have to think about it.

One other thing is, how do we get labels and relocations into and out of YAML? Today a lot of those things are only supported as offsets into sections and section indices. That's sort of the strong suit of assembly, which is really all about laying out code and finalizing offsets.