This is an archive of the discontinued LLVM Phabricator instance.

[PDB] Fix flaky `variables-locations.test` after PR38857
AbandonedPublic

Authored by aleksandr.urakov on Oct 10 2018, 8:25 AM.

Details

Summary

This patch fixes the flaky test variables-locations.test. The test began flaking after the fix of the PR38857 issue. It have happened because in PDBLocationToDWARFExpression we treat a VFRAME register as a LLDB_REGNUM_GENERIC_FP, which leads to EBP on Windows x86, but in this case we can't read locals relative to EBP. Moreover, it seems that we have no alternative base, so I just have changed double with float to avoid alignment.

By the way, what do you think, how can we make LLDB support aligned stacks? As far as I know, similar alignment problems are reproducible on non-Windows too.

Diff Detail

Event Timeline

Could you document that in the test?

This revision is now accepted and ready to land.Oct 10 2018, 10:25 AM

By the way, what do you think, how can we make LLDB support aligned stacks? As far as I know, similar alignment problems are reproducible on non-Windows too.

When you see VFRAME, you need to look in FPO data. As you might have guessed, VFRAME only occurs in X86.

I compiled your sample program with a few modifications, and I'll show some output of llvm-pdbutil to illustrate how analyzing FPO data works (which would also give you some insight into how this will eventually be implemented without DIA)

Here is the source code I compiled:

int g_var = 2222;

void __fastcall with_double(short arg_0, float arg_1) { char loc_0 = 'x'; double dvar = 0.5678; }

void __fastcall with_float(short arg_0, float arg_1) { char loc_0 = 'x'; float fvar = 0.5678f; }

int main(int argc, char *argv[]) {
  bool loc_0 = true;
  int loc_1 = 3333;

  with_double(1111, 0.1234);
  with_float(1111, 0.1234);

  return 0;
}

Then I ran this command.

$ llvm-pdbutil.exe dump -symbols -modi=0 vlt.pdb


                          Symbols
============================================================
  Mod 0000 | `D:\src\llvmbuild\cl\Debug\x64\vlt.obj`:
      <snip>
      52 | S_GPROC32 [size = 52] `with_double`
           parent = 0, end = 268, addr = 0001:0000, code size = 50
           type = `0x1001 (void (short, float))`, debug start = 0, debug end = 0, flags = none
     104 | S_FRAMEPROC [size = 32]
           size = 28, padding size = 0, offset to padding = 0
           bytes of callee saved registers = 0, exception handler addr = 0000:0000
           local fp reg = VFRAME, param fp reg = EBP
           flags =
     <snip>
     236 | S_LOCAL [size = 16] `dvar`
           type=0x0041 (double), flags = none
     252 | S_DEFRANGE_FRAMEPOINTER_REL [size = 16]
           offset = -16, range = [0001:0027,+23)
           gaps = 2
     268 | S_END [size = 4]
     272 | S_GPROC32 [size = 52] `with_float`
           parent = 0, end = 484, addr = 0001:0064, code size = 44
           type = `0x1001 (void (short, float))`, debug start = 0, debug end = 0, flags = none
     324 | S_FRAMEPROC [size = 32]
           size = 16, padding size = 0, offset to padding = 0
           bytes of callee saved registers = 0, exception handler addr = 0000:0000
           local fp reg = EBP, param fp reg = EBP
           flags =
     <snip>
     452 | S_LOCAL [size = 16] `fvar`
           type=0x0040 (float), flags = none
     468 | S_DEFRANGE_FRAMEPOINTER_REL [size = 16]
           offset = -8, range = [0001:0087,+21)
           gaps = 2
     484 | S_END [size = 4]

the S_GPROC32 and S_END records form a pair, so all relevant data for this function is inside of the matching pair.

Both dvar and fvar are of type S_DEFRANGE_FRAMEPOINTER_REL, which means they're relative to the framepointer. So we need to search for the S_FRAMEPROC record inside of this function. It's immediately after the S_GPROC32 record in both cases. In the case of with_float we find that it says "local fp reg = EBP". This means it's easy, nothing special to do which is why it fixed the issue for you changing to float. On the other hand, as you noticed the other one says VFRAME.

This means we need to go to the FPO data. But first we need to find the address of this function. The S_GPROC32 of with_double says it's at address 0001:0000. So we check the section headers to find out what is section 1.

$ llvm-pdbutil.exe dump -section-headers vlt.pdb


                      Section Headers
============================================================

  SECTION HEADER #1
     .text name
     3E3FD virtual size
      1000 virtual address
     3E400 size of raw data
       600 file pointer to raw data
         0 file pointer to relocation table
         0 file pointer to line numbers
         0 number of relocations
         0 number of line numbers
  60000020 flags
           IMAGE_SCN_CNT_CODE
           IMAGE_SCN_MEM_EXECUTE
           IMAGE_SCN_MEM_READ

So now we know section 1 starts at virtual address 0x1000, and this particular function is at offset 0000, so it is also at virtual address 0x1000. The function has size 50, so we are looking for an FPO record in the range of [0x1000,0x1032)

Now let's look at the FPO data in the PDB.

                        Old FPO Data
============================================================
  RVA    | Code | Locals | Params | Prolog | Saved Regs | Use BP | Has SEH | Frame Type
0000131A |   20 |      0 |      0 |      0 |          0 |  false |   false |       FPO
00001483 |   19 |      0 |      0 |      0 |          0 |  false |   false |       FPO
<snip>

                        New FPO Data
============================================================
  RVA    | Code | Locals | Params | Stack | Prolog | Saved Regs | Has SEH | Has C++EH | Start | Program
00001000 |   50 |      0 |      4 |     0 |      9 |          0 |   false |     false |  true | $T0 .raSearch = $eip $T0 ^ = $esp $T0 4 + =
00001001 |   49 |      0 |      4 |     0 |      8 |          4 |   false |     false | false | $T0 .raSearch = $eip $T0 ^ = $esp $T0 4 + = $ebp $T0 4 - ^ =
00001003 |   47 |      0 |      4 |     0 |      6 |          4 |   false |     false | false | $T0 $ebp 4 + = $eip $T0 ^ = $esp $T0 4 + = $ebp $T0 4 - ^ =
00001006 |   44 |      0 |      4 |     0 |      3 |          4 |   false |     false | false | $T1 $ebp 4 + = $T0 $T1 4 - 8 @ = $eip $T1 ^ = $esp $T1 4 + = $ebp $T1 4 - ^ =
00001040 |   44 |      0 |      4 |     0 |      6 |          0 |   false |     false |  true | $T0 .raSearch = $eip $T0 ^ = $esp $T0 4 + =

Generally speaking you can ignore anything in Old FPO data, it's only used by MASM and very old compilers. And we can see that there are 4 FPO entries for this function.

How do we know which one to choose? Well, the S_DEFRANGE_FRAMEPOINTER_REL told us that it is at range [0001:0027,+23). Confusingly, it's decimal this time, so this is actually the virtual address range [0x101B,0x1032)

That corresponds to this FPO entry.

00001006 |   44 |      0 |      4 |     0 |      3 |          4 |   false |     false | false | $T1 $ebp 4 + = $T0 $T1 4 - 8 @ = $eip $T1 ^ = $esp $T1 4 + = $ebp $T1 4 - ^ =

The important thing here is this magical string at the end. These strings are little miniature "programs" which have to be "executed". They are separated by line on equals sign, so the program becomes:

$T1 $ebp 4 + 
$T0 $T1 4 - 8 @ 
$eip $T1 ^ 
$esp $T1 4 + 
$ebp $T1 4 - ^

this is using RPN notation (e.g. 4+8 is written "4 8 +"), where + means plus, - means minus, @ means align, ^ means dereference. So we can interpret this as:

$T1 = $ebp + 4
$T0 = alignTo($T1 - 4, 8)
$eip = *($T1)
$esp = $T1 + 4
$ebp = *($T1 - 4)

$T0 is the special "vframe" register. It is always T0. So if you want to find out what address VFRAME means, it is alignTo($ebp + 4 - 4, 8) = alignTo($ebp, 8). And there's your answer.

How to access this via DIA-like API? That part I do not know. I know you can access FPO data, for example through this interface

It seems the IDiaFrameData interface provides some similar functionality here.

You didn't include it here, but I notice the test file just writes clang-cl /Zi. Should we be passing -m32 or -m64? Currently, the test just runs with whatever architecture happens to be set via the VS command prompt. The behavior here is different on x86 and x64 so perhaps it requires separate tests.

Thanks a lot for so detailed answer, it helps!

So we need to parse a FPO program and to convert it in a DWARF expression too. The problem here (in the DIA case) is that I don't know how to retrieve the required FPO range (we have a symbol context when creating a variable, but it seems that it doesn't contain enough information). As for the raw PDB case, can the same S_LOCAL have several S_DEFRANGE_FRAMEPOINTER_REL records for several ranges? If so, then the problem exists for the raw PDB case too, but there's a solution: we can combine processing of all S_DEFRANGE_FRAMEPOINTER_REL records in the single DWARF expression (and call the required FPO program depending on current eip).

The interesting moment here is that MSVC doesn't emit correct information for locals. For the program aaa.cpp:

void bar(char a, short b, int c) { }

void __fastcall foo(short arg_0, float arg_1) {
  char loc_0 = 'x';
  double __declspec(align(8)) loc_1 = 0.5678;
  bar(1, 2, 3);
}

int main(int argc, char *argv[]) {
  foo(1111, 0.1234);
  return 0;
}

compiled with cl /Zi /Oy aaa.cpp we have the next disassembly of foo:

push    ebp
mov     ebp, esp
and     esp, 0FFFFFFF8h
sub     esp, 10h
mov     [esp+4], cx     ; arg_0
mov     byte ptr [esp+3], 'x' ; loc_0
movsd   xmm0, ds:__real@3fe22b6ae7d566cf
movsd   qword ptr [esp+8], xmm0 ; loc_1
push    3               ; c
push    2               ; b
push    1               ; a
call    j_?bar@@YAXDFH@Z ; bar(char,short,int)
add     esp, 0Ch
mov     esp, ebp
pop     ebp
retn    4

but MSVC emits the next info about locals:

296 | S_GPROC32 [size = 44] `foo`
      parent = 0, end = 452, addr = 0001:23616, code size = 53
      type = `0x1003 (void (short, float))`, debug start = 14, debug end = 47, flags = has fp
340 | S_FRAMEPROC [size = 32]
      size = 16, padding size = 0, offset to padding = 0
      bytes of callee saved registers = 0, exception handler addr = 0000:0000
      local fp reg = VFRAME, param fp reg = EBP
      flags = has async eh | opt speed
372 | S_REGREL32 [size = 20] `arg_0`
      type = 0x0011 (short), register = ESP, offset = 4294967284
392 | S_REGREL32 [size = 20] `arg_1`
      type = 0x0040 (float), register = EBP, offset = 8
412 | S_REGREL32 [size = 20] `loc_1`
      type = 0x0041 (double), register = ESP, offset = 4294967288
432 | S_REGREL32 [size = 20] `loc_0`
      type = 0x0070 (char), register = ESP, offset = 4294967283
452 | S_END [size = 4]

so neither LLDB nor Visual Studio show valid values (except for arg_1).

You didn't include it here, but I notice the test file just writes clang-cl /Zi. Should we be passing -m32 or -m64? Currently, the test just runs with whatever architecture happens to be set via the VS command prompt. The behavior here is different on x86 and x64 so perhaps it requires separate tests.

The problem is that x64 LLDB can't debug x86 applications on Windows (and vice versa), so I rely here on the fact that tests are run in the same VS command prompt as LLVM was built. But yes, it's still possible to run tests for x86 LLVM under the x64 VS command prompt, so this one will fail. I think we can somehow guard this when I'll implement compiler_config you have mentioned in D52618.

Could you document that in the test?

Sure, thanks!

As for aligned stack cross-platform problems, I mean also problems with stack unwinding. They seem to appear on non-Windows too. It's because x86AssemblyInspectionEngine doesn't support stack alignment now. I've made some changes locally to fix it, but they are still raw to publish. The main idea is to save one more frame address (along with CFA) for every row of an unwind plan (I've called this AFA - aligned frame address), and add an analysis for and esp, ... etc. to x86AssemblyInspectionEngine. What do you think about a such approach?

Thanks a lot for so detailed answer, it helps!

So we need to parse a FPO program and to convert it in a DWARF expression too. The problem here (in the DIA case) is that I don't know how to retrieve the required FPO range (we have a symbol context when creating a variable, but it seems that it doesn't contain enough information).

I think the SymbolContext should have enough information. As long as you can find the PDBSymbolFunction for the current frame, that gives you the range of the function, and the IDiaSymbol for the variable should give you the important info like register, offset, etc.

As for the raw PDB case, can the same S_LOCAL have several S_DEFRANGE_FRAMEPOINTER_REL records for several ranges? If so, then the problem exists for the raw PDB case too, but there's a solution: we can combine processing of all S_DEFRANGE_FRAMEPOINTER_REL records in the single DWARF expression (and call the required FPO program depending on current eip).

Not as far as I know. There should be only 1. But note that is not the only type of record that can appear, there are several others. But basically there will be S_LOCAL followed by 1 record describing the location.

The interesting moment here is that MSVC doesn't emit correct information for locals. For the program aaa.cpp:

void bar(char a, short b, int c) { }

void __fastcall foo(short arg_0, float arg_1) {
  char loc_0 = 'x';
  double __declspec(align(8)) loc_1 = 0.5678;
  bar(1, 2, 3);
}

int main(int argc, char *argv[]) {
  foo(1111, 0.1234);
  return 0;
}

compiled with cl /Zi /Oy aaa.cpp we have the next disassembly of foo:

push    ebp
mov     ebp, esp
and     esp, 0FFFFFFF8h
sub     esp, 10h
mov     [esp+4], cx     ; arg_0
mov     byte ptr [esp+3], 'x' ; loc_0
movsd   xmm0, ds:__real@3fe22b6ae7d566cf
movsd   qword ptr [esp+8], xmm0 ; loc_1
push    3               ; c
push    2               ; b
push    1               ; a
call    j_?bar@@YAXDFH@Z ; bar(char,short,int)
add     esp, 0Ch
mov     esp, ebp
pop     ebp
retn    4

but MSVC emits the next info about locals:

296 | S_GPROC32 [size = 44] `foo`
      parent = 0, end = 452, addr = 0001:23616, code size = 53
      type = `0x1003 (void (short, float))`, debug start = 14, debug end = 47, flags = has fp
340 | S_FRAMEPROC [size = 32]
      size = 16, padding size = 0, offset to padding = 0
      bytes of callee saved registers = 0, exception handler addr = 0000:0000
      local fp reg = VFRAME, param fp reg = EBP
      flags = has async eh | opt speed
372 | S_REGREL32 [size = 20] `arg_0`
      type = 0x0011 (short), register = ESP, offset = 4294967284
392 | S_REGREL32 [size = 20] `arg_1`
      type = 0x0040 (float), register = EBP, offset = 8
412 | S_REGREL32 [size = 20] `loc_1`
      type = 0x0041 (double), register = ESP, offset = 4294967288
432 | S_REGREL32 [size = 20] `loc_0`
      type = 0x0070 (char), register = ESP, offset = 4294967283
452 | S_END [size = 4]

so neither LLDB nor Visual Studio show valid values (except for arg_1).

Generally speaking, if you're using /Oy all bets are off and good luck :)

Interestingly, clang actually gets this right but we use a totally different CodeView record.

$ llvm-pdbutil.exe dump -symbols -modi=0 foo-clang.pdb | grep -A 4 loc_1
     392 | S_LOCAL [size = 16] `loc_1`
           type=0x0041 (double), flags = none
     408 | S_DEFRANGE_FRAMEPOINTER_REL [size = 16]
           offset = -16, range = [0001:0075,+51)
           gaps = 2

D:\src\llvmbuild\ninja-release-x64>

If you try to debug the same program in VS built with clang with the exact same command line, it will work.

I need to study the disassembly and FPO program of the cl.exe version some more to decide if the bug is in the compiler or the debugger, but I think the bug is in the compiler.

It's funny because our optimized debug info is not very good, so I'm surprised there's a case where we're better than MSVC here :)

Thanks a lot for so detailed answer, it helps!

So we need to parse a FPO program and to convert it in a DWARF expression too. The problem here (in the DIA case) is that I don't know how to retrieve the required FPO range (we have a symbol context when creating a variable, but it seems that it doesn't contain enough information).

I think the SymbolContext should have enough information. As long as you can find the PDBSymbolFunction for the current frame, that gives you the range of the function, and the IDiaSymbol for the variable should give you the important info like register, offset, etc.

Do we have access to the current instruction pointer? That's what you need to find the correct FPO record.

Do we have access to the current instruction pointer? That's what you need to find the correct FPO record.

No, it seems that we haven't. But if there's the only one S_DEFRANGE_FRAMEPOINTER_REL for variable, then there must be the only one definition of VFRAME for the variable's range, and then I think we can safely search the required FPO data by intersection with the variable's scope range. I'll try to implement this, thanks!

As for aligned stack cross-platform problems, I mean also problems with stack unwinding. They seem to appear on non-Windows too. It's because x86AssemblyInspectionEngine doesn't support stack alignment now. I've made some changes locally to fix it, but they are still raw to publish. The main idea is to save one more frame address (along with CFA) for every row of an unwind plan (I've called this AFA - aligned frame address), and add an analysis for and esp, ... etc. to x86AssemblyInspectionEngine. What do you think about a such approach?

I am not sure I fully understand the discussion here (I got lost in the windows-specific jargon), but are we talking about the situation where a function re-aligns it's stack pointer on entry via some sequence like:

mov %esp, %ebp
and %-8, %esp
...
mov %ebp, %esp
ret

?

If so, then I don't see why the instruction emulator should have a problem with this sequence, because after mov %esp, %ebp it will conclude that the frame of this function is ebp-based, and use that for unwinding (I know there were some issues here in the past, but I hope I have fixed those already). Or are you saying that your compiler manages to align the stack without producing a frame pointer? I think that would be very tricky, as the function itself needs to restore the original %esp value somehow so it can return properly. Can you show me the disassembly of the function in question?

aleksandr.urakov added a comment.EditedOct 16 2018, 4:30 AM

Yes, I mean exactly the same case. For sequences like you've written yes, the unwind should work, but there must be some problems with saved registers. x86AssemblyInspectionEngine doesn't handle instructions like and %-8, %esp, so the register save would work only if this instruction hadn't changed the esp value (e.g. esp was already aligned). Otherwise, if esp was changed, the offset to CFA in RegisterLocation of some register will be invalid, because it will not take the alignment into account.

Moreover, it is impossible to specify a location for some saved register on a such stack with the CFA + offset restore type, because we can't know how esp will be changed after and %-8, %esp. So I suggest to introduce one more frame address (along with CFA), and make it point to esp right after and ..., %esp. So any saved register would have AFA + offset restore type (I've called for now this frame address as AFA - aligned frame address).

As for MSVC-compiled sources, the things are even more interesting. Consider the following program:

struct __declspec(align(256)) OverAligned {
  char c;
};

void foo(int foo_arg) {
    OverAligned oa_foo = { 1 };
    auto aaa_foo = 1234;
}

void bar(int bar_arg) {
    OverAligned oa_bar = { 2 };
    auto aaa_bar = 5678;
    foo(1111);
}

int main() {
  bar(2222);
  return 0;
}

We will have the next disassembly for bar:

push    ebx
mov     ebx, esp
sub     esp, 8
and     esp, -100h
add     esp, 4
push    ebp
mov     ebp, [ebx+4]
mov     [esp+4], ebp
mov     ebp, esp
sub     esp, 200h
mov     byte ptr [ebp-200h], 2
mov     dword ptr [ebp-4], 5678
push    1111            ; foo_arg
call    j_?foo@@YAXH@Z  ; foo(int)
add     esp, 4
mov     esp, ebp
pop     ebp
mov     esp, ebx
pop     ebx
retn

so here ebx is used to retrieve arguments, while ebp to work with locals. x86AssemblyInspectionEngine doesn't support such a scheme now, so even the unwind doesn't work. So I suggest to analyze mov ebx, esp etc. too, so in this case CFA will be tied to ebx, while AFA (see above) will be tied to ebp.

I've implemented some sketch of this, and it seems that it works, but I want to make sure that nobody has objections to this approach.

That's an interesting code snippet. Thanks for sharing that.

The thing that's not clear to me is: are you specifically interested in unwinding from these kinds of functions without debug info? Because it sounds to me like the info Zachary provided is enough to unwind from these functions, assuming debug info is present. And in this case there shouldn't be any need for instruction emulation.

The thing that's not clear to me is: are you specifically interested in unwinding from these kinds of functions without debug info? Because it sounds to me like the info Zachary provided is enough to unwind from these functions, assuming debug info is present. And in this case there shouldn't be any need for instruction emulation.

Yes, it seems that the info is enough to restore some of unwind plan rows, but not all of them. Here is an FPO table for the code above (compiled with cl /Zi /GS- /c a.cpp, linked with link /nodefaultlib /debug:full /entry:main a.obj):

                        New FPO Data                        
============================================================
  RVA    | Code | Locals | Params | Stack | Prolog | Saved Regs | Has SEH | Has C++EH | Start | Program
00001030 |   52 |    512 |      4 |     0 |     31 |          0 |   false |     false |  true | $T0 $ebp = $T1 $ebx = $eip $T1 4 + ^ = $ebx $T1 ^ = $esp $T1 8 + = $ebp $ebp ^ = 
00001070 |   65 |    512 |      4 |     0 |     31 |          0 |   false |     false |  true | $T0 $ebp = $T1 $ebx = $eip $T1 4 + ^ = $ebx $T1 ^ = $esp $T1 8 + = $ebp $ebp ^ = 
000010C0 |   20 |      0 |      0 |     0 |      3 |          0 |   false |     false |  true | $T0 $ebp = $eip $T0 4 + ^ = $ebp $T0 ^ = $esp $T0 8 + =

1030 is the RVA of foo, 1070 - bar, 10C0 - main. So we can restore unwind rows for each function except for prologues and epilogues (however, I think that such restore is not so trivial too, we need to parse and convert an FPO program into a DWARF expression for each register). For prologues and epilogues we need to emulate instructions. And what about the problem with saved registers I've mentioned above? It seems exist on non-Windows too, and (correct me if I'm wrong, please) the unwind plan is restored from instruction emulation there for such cases? So we still need to support this in x86AssemblyInspectionEngine?

Yes it sounds like the situation is very much similar here as it is in dwarf land -- the compilers there also don't tend to emit unwind info for prologues and epilogues.

The thing to realize about the instruction emulation is that it is always going to be a best effort enterprise. We can try to pattern match the sequences generated by the compilers, but it's always going to be playing catch up, and prone to false positives. That's why it's important to use the debug info provided by the compiler, as it is in a much better position to generate the unwind info.

Even the emulator itself has two modes of operation: in the first one it synthesizes the unwind info out-of-the-blue. In the second one it takes the existing unwind info and "augments" it with additional rows. Obviously, the second one is much more easier to implement and reliable.

The conversion of the FPO programs into DWARF does not sound like a particularly fun enterprise, though I'm not sure if that's the only way to do this kind of thing. In any case, it sounds like the process should be easily unit-testable. As for the saved registers, won't the compiler's unwind info contain that information? I would expect that is necessary for exception unwinding..

tl;dr; I am not against improving the instruction emulator. I just think it's not the most important priority right now. Getting the compiler-generated info working puts the emulator out of the critical path, and leaves it just responsible for the less frequent cases when someone is stopped in the middle of the prologue, or needs to access a variable that resides in a spilled register. But if you want to work on the emulator first, then go ahead..

Thanks for explanations!

I completely agree with you that it were better (and simpler) to implement it with the help of debug info. And if I will improve it later, I'll also choose the way you have described. But the problem is that this sketch was made a couple of months ago, and it seems work, so may be it is not a so bad idea to use it (and not redo it now to work with a debug info)? Besides, in ideal implementation we would still need something like this to cover that rare cases with prologues and epilogues.

I've been stashing a lot of changes for past few months, and now the stash is big enough, so I'm trying to send it all on review. Now I understand that it's not a good strategy :) The such approach makes inconvenient situations like this, so I think it is better to send reviews as soon as possible. Sorry for that.

I've brushed the sketch and have sent it for review last week, here it is: D53435

aleksandr.urakov abandoned this revision.Dec 4 2018, 1:58 AM

Abandon due to D55122.