This is an archive of the discontinued LLVM Phabricator instance.

[CodeView] Emit proper debug info for ref-qualified member functions
ClosedPublic

Authored by zturner on Nov 17 2018, 8:57 AM.

Details

Summary

When you have a member function with a ref-qualifier, for example:

struct Foo {
  void Func() &;
  void Func2() &&;
};

clang-cl was not emitting this information. Doing so is a bit awkward, because it's not a property of the LF_MFUNCTION type, which is what you'd expect. Instead, it's a property of the this pointer which is actually an LF_POINTER. This record has an attributes bitmask on it, and our handling of this bitmask was all wrong. We had some parts of the bitmask defined incorrectly, but importantly for this bug, we didn't know about these extra 2 bits that represent the ref qualifier at all. We can see this in the reference implementation here:

https://github.com/Microsoft/microsoft-pdb/blob/master/include/cvinfo.h#L1524-L1525

And we can also observe that clang-cl and cl emit different records by compiling a simple program and using llvm-pdbutil to dump the results of each one.

// cat foo.cpp
struct A {
  int NoRefQual();

  int RefQual() &;
  int RefQual() &&;

  int LValueRef() &;

  int RValueRef() &&;
};

int main(int argc, char **argv) {
  A *GenericPtr = nullptr;
  A a;
  return 0;
}

With clang:

$ clang-cl.exe /Z7 /GS- /GR- foo.cpp /o foo-clang.exe

$ llvm-pdbutil.exe dump -types -type-index=0x1009 -dependents -type-data foo-clang.pdb


                     Types (TPI Stream)
============================================================
  Showing 1 records and their dependents (7 records total)
   0x1003 | LF_STRUCTURE [size = 32] `A`
            unique name: `.?AUA@@`
            vtable: <no type>, base list: <no type>, field list: <no type>
            options: forward ref (-> 0x1009) | has unique name, sizeof 0
           Bytes (
             0000: 1E000515 00008002 00000000 00000000 00000000 00004100 2E3F4155 41404000  |......................A..?AUA@@.|
           )
   0x1004 | LF_POINTER [size = 12]
            referent = 0x1003, mode = pointer, opts = None, kind = ptr64
           Bytes (
             0000: 0A000210 03100000 0C000100                                               |............|
           )
   0x1005 | LF_ARGLIST [size = 8]
           Bytes (
             0000: 06000112 00000000                                                        |........|
           )
   0x1006 | LF_MFUNCTION [size = 28]
            return type = 0x0074 (int), # args = 0, param list = 0x1005
            class type = 0x1003, this type = 0x1004, this adjust = 0
            calling conv = cdecl, options = None
           Bytes (
             0000: 1A000910 74000000 03100000 04100000 00000000 05100000 00000000           |....t.......................|
           )
   0x1007 | LF_METHODLIST [size = 20]
            - Method [type = 0x1006, vftable offset = -1, attrs = public]
            - Method [type = 0x1006, vftable offset = -1, attrs = public]
           Bytes (
             0000: 12000612 03000000 06100000 03000000 06100000                             |....................|
           )
   0x1008 | LF_FIELDLIST [size = 80]
            - LF_ONEMETHOD [name = `NoRefQual`]
              type = 0x1006, vftable offset = -1, attrs = public
              Bytes (
                0000: 03000610 00004E6F 52656651 75616C00                                      |......NoRefQual.|
              )
            - LF_METHOD [name = `RefQual`, # overloads = 2, overload list = 0x1007]
              Bytes (
                0000: 02000710 00005265 66517561 6C00                                          |......RefQual.|
              )
            - LF_ONEMETHOD [name = `LValueRef`]
              type = 0x1006, vftable offset = -1, attrs = public
              Bytes (
                0000: 03000610 00004C56 616C7565 52656600                                      |......LValueRef.|
              )
            - LF_ONEMETHOD [name = `RValueRef`]
              type = 0x1006, vftable offset = -1, attrs = public
              Bytes (
                0000: 03000610 00005256 616C7565 52656600                                      |......RValueRef.|
              )
           Bytes (
             0000: 4E000312 11150300 06100000 4E6F5265 66517561 6C00F2F1 0F150200 07100000  |N...........NoRefQual...........|
             0020: 52656651 75616C00 11150300 06100000 4C56616C 75655265 6600F2F1 11150300  |RefQual.........LValueRef.......|
             0040: 06100000 5256616C 75655265 6600F2F1                                      |....RValueRef...|
           )
   0x1009 | LF_STRUCTURE [size = 32] `A`
            unique name: `.?AUA@@`
            vtable: <no type>, base list: <no type>, field list: 0x1008
            options: has unique name, sizeof 1
           Bytes (
             0000: 1E000515 05000002 08100000 00000000 00000000 01004100 2E3F4155 41404000  |......................A..?AUA@@.|
           )

If we follow the type indices here, we can see that RValueRef, LValueRef, both overloads of RefQual and also NoRefQual all have the same LF_MFUNCTION type, which is 0x1006. Since they have the same LF_MFUNCTION, they also have the same LF_POINTER for the this pointer since it is part of that record. For the sake of comparison later, the dump of this pointer record is:

referent = 0x1003, mode = pointer, opts = None, kind = ptr64  0000: 0A000210 03100000 0C000100

With MSVC

$ cl.exe /Z7 /GS- /GR- foo.cpp /o foo-cl.exe

$ llvm-pdbutil.exe dump -types -type-index=0x100E -dependents -type-data foo-cl.pdb


                     Types (TPI Stream)
============================================================
  Showing 1 records and their dependents (11 records total)
   0x1003 | LF_STRUCTURE [size = 32] `A`
            unique name: `.?AUA@@`
            vtable: <no type>, base list: <no type>, field list: <no type>
            options: forward ref (-> 0x100E) | has unique name, sizeof 0
           Bytes (
             0000: 1E000515 00008002 00000000 00000000 00000000 00004100 2E3F4155 41404000  |......................A..?AUA@@.|
           )
   0x1005 | LF_POINTER [size = 12]
            referent = 0x1003, mode = pointer, opts = const, kind = ptr64
           Bytes (
             0000: 0A000210 03100000 0C040100                                               |............|
           )
   0x1006 | LF_ARGLIST [size = 8]
           Bytes (
             0000: 06000112 00000000                                                        |........|
           )
   0x1007 | LF_MFUNCTION [size = 28]
            return type = 0x0074 (int), # args = 0, param list = 0x1006
            class type = 0x1003, this type = 0x1005, this adjust = 0
            calling conv = cdecl, options = None
           Bytes (
             0000: 1A000910 74000000 03100000 05100000 00000000 06100000 00000000           |....t.......................|
           )
   0x1008 | LF_POINTER [size = 12]
            referent = 0x1003, mode = pointer, opts = const, kind = ptr64
           Bytes (
             0000: 0A000210 03100000 0C042100                                               |..........!.|
           )
   0x1009 | LF_MFUNCTION [size = 28]
            return type = 0x0074 (int), # args = 0, param list = 0x1006
            class type = 0x1003, this type = 0x1008, this adjust = 0
            calling conv = cdecl, options = None
           Bytes (
             0000: 1A000910 74000000 03100000 08100000 00000000 06100000 00000000           |....t.......................|
           )
   0x100A | LF_POINTER [size = 12]
            referent = 0x1003, mode = pointer, opts = const, kind = ptr64
           Bytes (
             0000: 0A000210 03100000 0C041100                                               |............|
           )
   0x100B | LF_MFUNCTION [size = 28]
            return type = 0x0074 (int), # args = 0, param list = 0x1006
            class type = 0x1003, this type = 0x100A, this adjust = 0
            calling conv = cdecl, options = None
           Bytes (
             0000: 1A000910 74000000 03100000 0A100000 00000000 06100000 00000000           |....t.......................|
           )
   0x100C | LF_METHODLIST [size = 20]
            - Method [type = 0x1009, vftable offset = -1, attrs = public]
            - Method [type = 0x100B, vftable offset = -1, attrs = public]
           Bytes (
             0000: 12000612 03000000 09100000 03000000 0B100000                             |....................|
           )
   0x100D | LF_FIELDLIST [size = 80]
            - LF_ONEMETHOD [name = `NoRefQual`]
              type = 0x1007, vftable offset = -1, attrs = public
              Bytes (
                0000: 03000710 00004E6F 52656651 75616C00                                      |......NoRefQual.|
              )
            - LF_METHOD [name = `RefQual`, # overloads = 2, overload list = 0x100C]
              Bytes (
                0000: 02000C10 00005265 66517561 6C00                                          |......RefQual.|
              )
            - LF_ONEMETHOD [name = `LValueRef`]
              type = 0x100B, vftable offset = -1, attrs = public
              Bytes (
                0000: 03000B10 00004C56 616C7565 52656600                                      |......LValueRef.|
              )
            - LF_ONEMETHOD [name = `RValueRef`]
              type = 0x1009, vftable offset = -1, attrs = public
              Bytes (
                0000: 03000910 00005256 616C7565 52656600                                      |......RValueRef.|
              )
           Bytes (
             0000: 4E000312 11150300 07100000 4E6F5265 66517561 6C00F2F1 0F150200 0C100000  |N...........NoRefQual...........|
             0020: 52656651 75616C00 11150300 0B100000 4C56616C 75655265 6600F2F1 11150300  |RefQual.........LValueRef.......|
             0040: 09100000 5256616C 75655265 6600F2F1                                      |....RValueRef...|
           )
   0x100E | LF_STRUCTURE [size = 32] `A`
            unique name: `.?AUA@@`
            vtable: <no type>, base list: <no type>, field list: 0x100D
            options: has unique name, sizeof 1
           Bytes (
             0000: 1E000515 05000002 0D100000 00000000 00000000 01004100 2E3F4155 41404000  |......................A..?AUA@@.|
           )

Here we see several different LF_MFUNCTION records in use.

0x1009: Used by Foo::RValueRef and one overload of Foo::RefQual
0x100B: Used by Foo::LValueRef and one overload of Foo::RefQual
0x1007: Used by Foo::NoRefQual

And the only difference between any of them is the "this type" field. So for this pointer types, we have:

0x1008: Used by Foo::RValueRef and one overload of Foo::RefQual
0x100A: Used by Foo::LValueRef and one overload of Foo::RefQual
0x1005: Used by Foo::NoRefQual

And if we then go look at the record data of each one, we have:

0x1008: referent = 0x1003, mode = pointer, opts = const, kind = ptr64  0000: 0A000210 03100000 0C042100
0x100A: referent = 0x1003, mode = pointer, opts = const, kind = ptr64  0000: 0A000210 03100000 0C041100
0x1005: referent = 0x1003, mode = pointer, opts = const, kind = ptr64  0000: 0A000210 03100000 0C040100

The data that llvm-pdbutil looks identical for each one, but if we look closer at the record bytes we can see that there is 1 bit difference in each of them.

After this patch, clang-cl emits compatible debug info for this case.

0x1009: referent = 0x1003, mode = pointer, opts = None, kind = ptr64  0000: 0A000210 03100000 0C002100
0x1007: referent = 0x1003, mode = pointer, opts = None, kind = ptr64  0000: 0A000210 03100000 0C001100
0x1004: referent = 0x1003, mode = pointer, opts = None, kind = ptr64  0000: 0A000210 03100000 0C000100

Diff Detail

Event Timeline

zturner created this revision.Nov 17 2018, 8:57 AM
zturner updated this revision to Diff 174509.Nov 17 2018, 9:55 AM

In my original output, you could still see one bit different on our this pointer records and MSVC's. Specifically, they mark them as const. I went ahead and did the same thing in this update.

zturner updated this revision to Diff 174510.Nov 17 2018, 9:57 AM

Uploaded the wrong patch last time. This time I actually have the change for marking the pointers const.

rnk added inline comments.Nov 19 2018, 12:03 PM
llvm/include/llvm/DebugInfo/CodeView/CodeView.h
362–364

Why should we separate these? I don't think the WinRT smart pointer bit really relates to ref qualifier overloading, does it? Since these are just more flags, grouping them together reduces the number of parameters we need to make pointer records.

llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
1717

This constructor is getting pretty big at this point, for example.

llvm/lib/DebugInfo/CodeView/TypeDumpVisitor.cpp
364

Most of the diff is updating tests to cope with this. You should be able to commit this one line change separately and fix up all the tests by deleting lines in test containing PointerAttributes.

zturner updated this revision to Diff 174815.Nov 20 2018, 11:48 AM

I moved the patch for marking them const and removing the PointerAttributes field from the dump output to separate patches, so this is now much smaller.

rnk accepted this revision.Nov 20 2018, 2:00 PM

lgtm, thanks!

llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
1901

Nice.

2465–2467

Every time we have to add a new method that routes around getTypeIndex and has to do its own TypeIndices.find check, I shed a tear. So far as I can tell, though, it's necessary.

This revision is now accepted and ready to land.Nov 20 2018, 2:00 PM
zturner added inline comments.Nov 20 2018, 2:09 PM
llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
2465–2467

Yes, I was sad about this. One way we could have avoided it is if the DI Flags were on the pointer as well as (or instead of) being on the subroutine. Then we could have handled it inside of lowerTypePointer the same way I did for the constness.

This revision was automatically updated to reflect the committed changes.

I think this patch broke Zig's tests:

zig: /store/dev/llvm-project/llvm/include/llvm/Support/Casting.h:255: typename llvm::cast_retty<X, Y*>::ret_type llvm::cast(Y*) [with X = llvm::DIDerivedType; Y = const llvm::DIType; typename llvm::cast_retty<X, Y*>::ret_type = const llvm::DIDerivedType*]: Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed.
#0  0x00007ffff65d3264 in raise () from /nix/store/2kcrj1ksd2a14bm5sky182fv2xwfhfap-glibc-2.26-131/lib/libc.so.6
#1  0x00007ffff65d4665 in abort () from /nix/store/2kcrj1ksd2a14bm5sky182fv2xwfhfap-glibc-2.26-131/lib/libc.so.6
#2  0x00007ffff65cbce7 in __assert_fail_base () from /nix/store/2kcrj1ksd2a14bm5sky182fv2xwfhfap-glibc-2.26-131/lib/libc.so.6
#3  0x00007ffff65cbd92 in __assert_fail () from /nix/store/2kcrj1ksd2a14bm5sky182fv2xwfhfap-glibc-2.26-131/lib/libc.so.6
#4  0x0000000003ad313e in llvm::cast<llvm::DIDerivedType, llvm::DIType const> (Val=0xa9f98a8)
    at /store/dev/llvm-project/llvm/include/llvm/Support/Casting.h:255
#5  0x00000000049c498a in llvm::CodeViewDebug::getTypeIndexForThisPtr (this=0x1c865190, TypeRef=..., SubroutineTy=0xb640010)
    at /store/dev/llvm-project/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp:2419
#6  0x00000000049c1f58 in llvm::CodeViewDebug::lowerTypeMemberFunction (this=0x1c865190, Ty=0xb640010, ClassTy=0x176a1b38, ThisAdjustment=0, 
    IsStaticMethod=false, FO=llvm::codeview::FunctionOptions::None) at /store/dev/llvm-project/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp:1843
#7  0x00000000049ba3a8 in llvm::CodeViewDebug::getMemberFunctionType (this=0x1c865190, SP=0x1cc572c0, Class=0x176a1b38)
    at /store/dev/llvm-project/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp:419
#8  0x00000000049b9e70 in llvm::CodeViewDebug::getFuncIdForSubprogram (this=0x1c865190, SP=0x1cc572c0)
    at /store/dev/llvm-project/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp:355
#9  0x00000000049bddc1 in llvm::CodeViewDebug::emitDebugInfoForFunction (this=0x1c865190, GV=0x1c95d228, FI=...)
    at /store/dev/llvm-project/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp:1011
#10 0x00000000049bac5f in llvm::CodeViewDebug::endModule (this=0x1c865190)
    at /store/dev/llvm-project/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp:542
#11 0x0000000004922eb5 in llvm::AsmPrinter::doFinalization (this=0x16056300, M=...)
    at /store/dev/llvm-project/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:1450
#12 0x0000000005edd4cb in llvm::FPPassManager::doFinalization (this=0x18ed1ac0, M=...)
    at /store/dev/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1697
#13 0x0000000005edd9fd in (anonymous namespace)::MPPassManager::runOnModule (this=0x162b8150, M=...)
    at /store/dev/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1773
#14 0x0000000005eddf34 in llvm::legacy::PassManagerImpl::run (this=0x1a07c230, M=...)
    at /store/dev/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1857
#15 0x0000000005ede163 in llvm::legacy::PassManager::run (this=0x7fffffffaf60, M=...)
    at /store/dev/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1888
#16 0x0000000001d9b889 in ZigLLVMTargetMachineEmitToFile (targ_machine_ref=0xa9f1fe0, module_ref=0xa9f0180,
#5  0x00000000049c498a in llvm::CodeViewDebug::getTypeIndexForThisPtr (this=0x1c865190, TypeRef=..., SubroutineTy=0xb640010)
    at /store/dev/llvm-project/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp:2419
2419      TypeIndex TI = lowerTypePointer(cast<DIDerivedType>(Ty), Options);
(gdb) p Ty->dump()
<0xa9f98a8> = !DIBasicType(name: "i32", size: 32, encoding: DW_ATE_signed)
$3 = void
#8  0x00000000049b9e70 in llvm::CodeViewDebug::getFuncIdForSubprogram (this=0x1c865190, SP=0x1cc572c0)
    at /store/dev/llvm-project/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp:355
355                                    DisplayName);
(gdb) p DisplayName
$4 = {static npos = 18446744073709551615, Data = 0x1c985700 "StructWithNoFields_add", Length = 22}
#9  0x00000000049bddc1 in llvm::CodeViewDebug::emitDebugInfoForFunction (this=0x1c865190, GV=0x1c95d228, FI=...)
    at /store/dev/llvm-project/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp:1011
1011        OS.EmitIntValue(getFuncIdForSubprogram(GV->getSubprogram()).getIndex(), 4);
(gdb) p GV->getSubprogram()
$5 = (llvm::DISubprogram *) 0x1cc572c0
(gdb) p GV->getSubprogram()->dump()
<0x1cc572c0> = distinct !DISubprogram(name: "StructWithNoFields_add", scope: <0x176a1b38>, file: <0xb1bf220>, line: 7, type: <0xb640010>, scopeLine: 7, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition, unit: <0xa9b7098>, retainedNodes: <0x1c87f3b0>)
$6 = void

Relevant excerpt from the module IR:

; Function Attrs: nobuiltin nounwind uwtable
define internal fastcc i32 @StructWithNoFields_add(i32, i32) unnamed_addr #0 !dbg !12316 {
Entry:
  %a = alloca i32, align 4
  %b = alloca i32, align 4
  store i32 %0, i32* %a, align 4
  call void @llvm.dbg.declare(metadata i32* %a, metadata !12319, metadata !DIExpression()), !dbg !12322
  store i32 %1, i32* %b, align 4
  call void @llvm.dbg.declare(metadata i32* %b, metadata !12320, metadata !DIExpression()), !dbg !12323
  %2 = load i32, i32* %a, align 4, !dbg !12324
  %3 = load i32, i32* %b, align 4, !dbg !12327
  %4 = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %2, i32 %3), !dbg !12328
  %5 = extractvalue { i32, i1 } %4, 0, !dbg !12328
  %6 = extractvalue { i32, i1 } %4, 1, !dbg !12328
  br i1 %6, label %OverflowFail, label %OverflowOk, !dbg !12328

OverflowFail:                                     ; preds = %Entry
  tail call fastcc void @panic(%"[]u8"* @1141, %StackTrace* null), !dbg !12328
  unreachable, !dbg !12328

OverflowOk:                                       ; preds = %Entry
  ret i32 %5, !dbg !12329
}

attributes #0 = { nobuiltin nounwind uwtable "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" }


!1089 = !DIBasicType(name: "i32", size: 32, encoding: DW_ATE_signed)
!12316 = distinct !DISubprogram(name: "StructWithNoFields_add", scope: !12317, file: !621, line: 7, type: !4560, scopeLine: 7, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition, unit: !1, retainedNodes: !12318)
!12317 = !DICompositeType(tag: DW_TAG_structure_type, name: "StructWithNoFields", scope: !621, file: !621, line: 6, elements: !1259)
!12318 = !{!12319, !12320}
!12319 = !DILocalVariable(name: "a", arg: 1, scope: !12316, file: !621, line: 7, type: !1089)
!12320 = !DILocalVariable(name: "b", arg: 2, scope: !12321, file: !621, line: 7, type: !1089)

The same debug info, when using DWARF instead of CodeView, builds successfully.

Any clues as to something I'm doing wrong, or is this a regression in LLVM?

rnk added a comment.Dec 26 2018, 1:29 PM

We can change the code to avoid the crash, but what debug info did you want clang to emit? It seems like the best way to describe StructWithNoFields_add is as a static method of the class StructWithNoFields, since it doesn't appear to have a 'this' parameter.

rnk added a comment.Dec 26 2018, 1:56 PM

This shouldn't crash anymore after rL350073.

andrewrk added a comment.EditedDec 26 2018, 5:55 PM
In D54667#1340985, @rnk wrote:

We can change the code to avoid the crash, but what debug info did you want clang to emit? It seems like the best way to describe StructWithNoFields_add is as a static method of the class StructWithNoFields, since it doesn't appear to have a 'this' parameter.

Thanks for the fix!

I think that is a good way to emit debug info. Here's the code from the frontend:

const StructWithNoFields = struct {
    fn add(a: i32, b: i32) i32 {
        return a + b;
    }
};

test "call struct static method" {
    const result = StructWithNoFields.add(3, 4);
    assert(result == 7);
}

In Zig, a struct acts as a namespace full of static functions, so what you said works perfectly.

I'm also open for suggestions if there was a better way to represent this in LLVM's debug info.

rnk added a comment.Dec 27 2018, 10:40 AM

In Zig, a struct acts as a namespace full of static functions, so what you said works perfectly.

I'm also open for suggestions if there was a better way to represent this in LLVM's debug info.

In that case, you might want to add the DINode::FlagStaticMember flag to all your methods. If a DISubprogram lists a DICompositeType as its scope or is used as an element of the composite type, it gets treated as a C++-style method where the first argument is assumed to be 'this'. That's a very C++-centric way to do things, so it might be nicer if we explicitly marked C++ instance methods, but that's the way things are today and changing it would require thinking about backwards compatibility. >_>

The benefit of explicitly marking everything as a "static" method for zig is that if the first argument happens to be a pointer type it won't end up looking like an instance method.

In D54667#1341372, @rnk wrote:

In that case, you might want to add the DINode::FlagStaticMember flag to all your methods. If a DISubprogram lists a DICompositeType as its scope or is used as an element of the composite type, it gets treated as a C++-style method where the first argument is assumed to be 'this'. That's a very C++-centric way to do things, so it might be nicer if we explicitly marked C++ instance methods, but that's the way things are today and changing it would require thinking about backwards compatibility. >_>

The benefit of explicitly marking everything as a "static" method for zig is that if the first argument happens to be a pointer type it won't end up looking like an instance method.

Thank you for the advice! We made this change downstream.