This is an archive of the discontinued LLVM Phabricator instance.

Add debuginfo-tests that use cdb on Windows
ClosedPublic

Authored by rnk on Nov 6 2018, 3:33 PM.

Details

Summary

This is an initial prototype of how we can run debugger integration tests on
Windows. cdb and windbg share a command language and debugger engine. Visual
Studio has its own, but we should at least be able to use cdb as the basis for
optimized debug info integration tests.

There's a lot of work to do here still. For example:

  • Make fewer assumptions about the SDK location
  • Don't assume x64 (important, I need x86 testing)
  • More environment isolation, have lit setup vcvars instead of passing LIB and INCLUDE down.
  • Write a .py file to replace the grep+sed RUN line

But, this seemed like a good enough concept to commit as is, since it's useful
to me already.

Diff Detail

Repository
rL LLVM

Event Timeline

rnk created this revision.Nov 6 2018, 3:33 PM

It would be nice if instead of having a set of windows-only tests, we could wrap cdb similar to llgdb.py wraps LLDB, so these tests run on all platforms. If the Windows debugger is just too different for this to make sense, let me know.

rnk added a comment.Nov 6 2018, 3:56 PM

It would be nice if instead of having a set of windows-only tests, we could wrap cdb similar to llgdb.py wraps LLDB, so these tests run on all platforms. If the Windows debugger is just too different for this to make sense, let me know.

In the long run, I want to have both cross-debugger integration tests and debugger specific ones.

We're already having issues writing CHECK lines that match with both gdb and lldb. Take a look at asan.c, for example:

// DEBUGGER: r
// DEBUGGER: p s
// CHECK: a =
// DEBUGGER: p s.a[0]
// CHECK: = 0
// DEBUGGER: p s.a[1]
// CHECK: = 1
// DEBUGGER: p s.a[7]

It's written that way to avoid relying on format differences, and it's pretty limiting. See also the way that cdb prints decimal integers as "0n42". The wrapper would have to reformat cdb output or do... something, it's not clear what yet. Even simple commands like "break 13" would need a lot of work to translate into cdb commands. At least, I can never set breakpoints by line number manually in windbg. =P I always give up and do it by symbol name or by jamming __debugbreak() into the source as I've done here.

We've been talking about setting up debugger integration tests for years, and we have never done it, so I really just wanted to float the idea and do the absolute minimum amount of work to get it set up and running.

I think the only way to realistically make this work for all platforms would be to separate the source file from the input/output. The source file would be the test case, and if you wanted to support a different debugger you would need to supply a different input/output file..

That said, as Reid mentioned, we've been talking about getting this going for years, so at this point, and there's a steep initial overhead in making something that abstracts over multiple debuggers. So at least until we have something that works on Windows with some interesting tests, I think we shouldn't try to create the abstraction just yet.

I think the only way to realistically make this work for all platforms would be to separate the source file from the input/output. The source file would be the test case, and if you wanted to support a different debugger you would need to supply a different input/output file..

I don't necessarily agree with that statement. The debuginfo-tests use only a very small subset of debugger functionality that includes

  • setting breakpoints
  • printing the value of integer variables
  • continuing to the next breakpoint

I'm genuinely curious what is so different about the Windows debugger that it couldn't be wrapped in a translation layer like llgdb.py that abstracts these three operations. This would cover a large set of the existing tests. I'm fine with having a few extra tests that test something that only works on a specific platform here and there, but I really don't want us to grow separate platform-specific testsuites. Inevitably, someone working on platform A will fix a general bug in LLVM and then only add a test for platform A and that's bad for everyone. What I'm trying to avoid is a situation like the debug info tests in LLVM that are almost entirely x86-specific, even if they are testing stuff that happens at the IR level.

+gbedwell

Just to throw the idea out there, why not abandon debuginfo-tests entirely and try using Dexter for this. Dexter's design center is debug-info quality, not correctness, but it already knows how to drive several different debuggers on multiple platforms.
Dexter would have to become an LLVM project tool, and I'm not sure how Sony management feels about that, but I think this would be an awesome use-case.

zturner added a comment.EditedNov 7 2018, 8:54 AM

I think the only way to realistically make this work for all platforms would be to separate the source file from the input/output. The source file would be the test case, and if you wanted to support a different debugger you would need to supply a different input/output file..

I don't necessarily agree with that statement. The debuginfo-tests use only a very small subset of debugger functionality that includes

  • setting breakpoints
  • printing the value of integer variables
  • continuing to the next breakpoint

I'm genuinely curious what is so different about the Windows debugger that it couldn't be wrapped in a translation layer like llgdb.py that abstracts these three operations. This would cover a large set of the existing tests. I'm fine with having a few extra tests that test something that only works on a specific platform here and there, but I really don't want us to grow separate platform-specific testsuites. Inevitably, someone working on platform A will fix a general bug in LLVM and then only add a test for platform A and that's bad for everyone. What I'm trying to avoid is a situation like the debug info tests in LLVM that are almost entirely x86-specific, even if they are testing stuff that happens at the IR level.

I don't think we want tests that are limited to that amount of simplicity. We don't just want to print integers, we'd like to also print callstacks. And objects. And other things that aren't integers. A cdb call stack looks like this:

000000c4`aa35f730 00007ffc`8944bc72 : 00000294`3b6e0ae8 00000294`3b6e0a98 cccccccc`cccccccc cccccccc`cccccccc : MSVCP140D!_Cnd_wait+0x20 [f:\dd\vctools\crt\crtw32\stdcpp\thr\cond.c @ 106] 
08 000000c4`aa35f760 00007ffc`89450a54 : 00000294`3b6e0ae8 00000294`3b6e0a98 cccccccc`cccccccc cccccccc`cccccccc : liblldb!std::_Cnd_waitX+0x32 [c:\program files (x86)\microsoft visual studio\2017\professional\vc\tools\msvc\14.15.26726\include\thr\xthread @ 97] 
09 000000c4`aa35f790 00007ffc`8944122d : 00000294`3b6e0ae8 000000c4`aa35f828 00000000`00000000 00000000`00000000 : liblldb!std::condition_variable::wait+0x54 [c:\program files (x86)\microsoft visual studio\2017\professional\vc\tools\msvc\14.15.26726\include\mutex @ 714] 
0a 000000c4`aa35f7d0 00007ffc`89440897 : 00000294`3b6e09e0 000000c4`aa35fb78 00000000`00000000 00000000`00000000 : liblldb!lldb_private::Listener::GetEventInternal+0x20d [d:\src\llvm-mono\lldb\source\core\listener.cpp @ 367] 
0b 000000c4`aa35f8e0 00007ffc`8939b70e : 00000294`3b6e09e0 000000c4`aa35fa78 000000c4`aa35fb78 cccccccc`cccccccc : liblldb!lldb_private::Listener::GetEvent+0x57 [d:\src\llvm-mono\lldb\source\core\listener.cpp @ 404] 
0c 000000c4`aa35f930 00007ffc`8939b118 : 00000294`3b6de700 cccccccc`cccccccc cccccccc`cccccccc cccccccc`cccccccc : liblldb!lldb_private::Debugger::DefaultEventHandler+0x27e [d:\src\llvm-mono\lldb\source\core\debugger.cpp @ 1546] 
0d 000000c4`aa35fc30 00007ffc`8960cf62 : 00000294`3b6de700 00000294`00000001 cccccccc`cccccccc cccccccc`cccccccc : liblldb!lldb_private::Debugger::EventHandlerThread+0x28 [d:\src\llvm-mono\lldb\source\core\debugger.cpp @ 1600]

Then you need a way to abstract over the command lines needed to generate the executables (clang-cl and lld-link use different flag syntax than clang++ etc). Then there's the issue of when a test is using Microsoft specific extensions. At the end of all of this, it's going to take a lot of effort to implement this layer of abstraction that is ultimately going to be subverted for a large number of tests anyway when something doesn't fit cleanly into the abstraction. I think there is also the issue of how much actual overlap there is between the set of interesting test cases for DWARF / Itanium ABI and PDB / Microsoft ABI. Many issues that we would want to add tests for would be surrounding fixes specific to the way we emit the PDB or that are constructed due to knowledge of how we emit CodeView in certain situations. And none of those test cases will be interesting to abstract over.

Finally, by strictly limiting the amount of output we're checking against, we can be too permissive. For example, we have a command up there that checks against the line g. But this will match any line that has the letter g in it, which is extremely permissive. It would be more useful if the line said DEBUGGER: 0:000> g

The line that says CHECK: a = 0n2 will also match if the variable happens to be 23 or any number of other values. So we'd really like to be able to be more precise here.

So I'm not convinced there is a lot of added value, or at the very least, I don't believe it to be worth the cost. Especially since as far as I can tell, nobody has even run debuginfo-tests since late August, because it was actually broken by r341135 on August 30 (fixed in r346060 yesterday)

+gbedwell

Just to throw the idea out there, why not abandon debuginfo-tests entirely and try using Dexter for this. Dexter's design center is debug-info quality, not correctness, but it already knows how to drive several different debuggers on multiple platforms.
Dexter would have to become an LLVM project tool, and I'm not sure how Sony management feels about that, but I think this would be an awesome use-case.

Depends where we draw the distinction between quality and correctness. We specifically want a way to test that when we fix a correctness bug, it's actually fixed against Microsoft debuggers.

Especially since as far as I can tell, nobody has even run debuginfo-tests since late August, because it was actually broken by r341135 on August 30 (fixed in r346060 yesterday)

Can you please refrain from making such general statements? They distract from the discussion.

http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-expensive/ runs the debuginfo-tests continuously. There is a configuration issue that prevents it from running the ASAN subset of the tests at the moment.

+gbedwell

Just to throw the idea out there, why not abandon debuginfo-tests entirely and try using Dexter for this. Dexter's design center is debug-info quality, not correctness, but it already knows how to drive several different debuggers on multiple platforms.
Dexter would have to become an LLVM project tool, and I'm not sure how Sony management feels about that, but I think this would be an awesome use-case.

Depends where we draw the distinction between quality and correctness. We specifically want a way to test that when we fix a correctness bug, it's actually fixed against Microsoft debuggers.

Dexter knows how to drive Visual Studio tools already, as well as gdb and (maybe) lldb. I have never looked inside it but I'd expect Greg to have made it straightforward to add new tools.

+gbedwell

Just to throw the idea out there, why not abandon debuginfo-tests entirely and try using Dexter for this. Dexter's design center is debug-info quality, not correctness, but it already knows how to drive several different debuggers on multiple platforms.
Dexter would have to become an LLVM project tool, and I'm not sure how Sony management feels about that, but I think this would be an awesome use-case.

Depends where we draw the distinction between quality and correctness. We specifically want a way to test that when we fix a correctness bug, it's actually fixed against Microsoft debuggers.

Dexter knows how to drive Visual Studio tools already, as well as gdb and (maybe) lldb. I have never looked inside it but I'd expect Greg to have made it straightforward to add new tools.

If it fits our use-cases, that sounds like a great idea. Are there some examples of dexter input/outputs we could take a look at to see how good of a fit it would be?

Especially since as far as I can tell, nobody has even run debuginfo-tests since late August, because it was actually broken by r341135 on August 30 (fixed in r346060 yesterday)

Can you please refrain from making such general statements? They distract from the discussion.

http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-expensive/ runs the debuginfo-tests continuously. There is a configuration issue that prevents it from running the ASAN subset of the tests at the moment.

The issue introduced by r341135 doesn't seem related to ASAN unless I'm misunderstanding the issue. Were debuginfo-tests passing on that bot even before r346060 landed?

Especially since as far as I can tell, nobody has even run debuginfo-tests since late August, because it was actually broken by r341135 on August 30 (fixed in r346060 yesterday)

Can you please refrain from making such general statements? They distract from the discussion.

http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-expensive/ runs the debuginfo-tests continuously. There is a configuration issue that prevents it from running the ASAN subset of the tests at the moment.

The issue introduced by r341135 doesn't seem related to ASAN unless I'm misunderstanding the issue. Were debuginfo-tests passing on that bot even before r346060 landed?

r341135 is a fix to lit.site.cfg.py.in my best guess is that because clang-stage1-cmake-RA-expensive runs the debuginfo-tests inside the llvm/tools/clang/test/debuginfo-tests subdirectory that doesn't get used in that configuration?

+gbedwell

Just to throw the idea out there, why not abandon debuginfo-tests entirely and try using Dexter for this. Dexter's design center is debug-info quality, not correctness, but it already knows how to drive several different debuggers on multiple platforms.

I initially looked at the possibility of expanding debuginfo-tests rather than reinventing the wheel with dexter, but in the end decided that the aims were different enough that it would be messy to shoehorn one into the other. I'm open to it though. I think there's a case to be made for potentially sharing some of the low-level "this is my generalized abstraction layer for how I can talk to lots of different debuggers" code as a library with different client tools, but I wouldn't want to bog down this discussion with that.

I'm wary of going too far off-topic for this review as this is more related to the 'in the long run' aside rather than the review itself, but as I was asked for a few more details, for reference, dexter drives LLDB via the Python API ( https://lldb.llvm.org/python-reference.html ) and Visual Studio via the DTE interface ( https://docs.microsoft.com/en-us/dotnet/api/envdte.dte?view=visualstudiosdk-2017 ).

For something like the hello.c example in this review, and equivalent dexter test would look like:

#include <stdio.h>
int main() {
  printf("hello world\n");
  int x = 42;  // DexWatch('x')
}

// DexExpectWatchValue('x', '42')

It will step through every line of the program it can, collecting info about each step until it reaches the program exit. It won't currently produce a pass/fail, but rather a score. That is, if it observes that 'x' is visible and has the value 42 it'll get a perfect score. If it's not visible it'll get a worse score. If it's visible, but tells me that the value is 43, it'll get a worse score still (on the assumption that actively lying to the user is a much worse sin than shrugging and saying "I don't know").
The best place to look for real examples is probably Jeremy and Tom's test corpus which they've been using to raise bugs against LLVM: https://github.com/SNSystems/dexter/pull/4/commits/825c2c9aaccc0fb38ede2c8732152d6500c22ac5
(see also http://llvm.org/devmtg/2018-04/slides/Bedwell-Measuring_the_User_Debugging_Experience_poster.png )

rnk added a comment.Nov 7 2018, 10:48 AM

I hadn't realized that Dexter knew how to drive VS tools. I'll have to go read more and get back to you all.

I think it would be more promising than attempting to come up with a new llgdb.py-like abstraction for cdb. Specifically, abstracting over setting breakpoints and reformatting output is what makes that difficult. Everything can of course be done with enough effort, but especially in testing, it often makes sense to trade off duplication between test cases for ease of use.

(Dexter) will step through every line of the program it can, collecting info about each step until it reaches the program exit. It won't currently produce a pass/fail, but rather a score. That is, if it observes that 'x' is visible and has the value 42 it'll get a perfect score.

Seems like it would easy enough to base pass/fail on getting a perfect score, for the purpose of debuginfo-tests.

In D54187#1290432, @rnk wrote:

I hadn't realized that Dexter knew how to drive VS tools. I'll have to go read more and get back to you all.

I think it would be more promising than attempting to come up with a new llgdb.py-like abstraction for cdb. Specifically, abstracting over setting breakpoints and reformatting output is what makes that difficult. Everything can of course be done with enough effort, but especially in testing, it often makes sense to trade off duplication between test cases for ease of use.

Note that WinDbg (specifically) is an important use case, and uses a different debug engine than VS. So Dexter would (at the very least) need to be extended to support WinDbg (which has the same debugging engine as cdb). But I agree it's worth trying out and seeing what kind of test cases we can and can't fit into it.

hans added a subscriber: hans.Dec 18 2018, 1:35 AM

This seems to be stuck. Is there some way we can get incremental progress on this?

I'm not very familiar with debug info, but I think it would be a huge win (and somewhat overdue) just to get some very basic tests like rnk's hello.c automated and running continuously.

I understand that it would be even better if the whole test suite could be made to somehow run under multiple debuggers, but in the meantime the proposed patch here seems like a good step to get things started. If that's accepted and we get this running on a buildbot, it would be a great improvement over where we are today.

thakis added a subscriber: thakis.Apr 9 2019, 10:32 AM

aprantl: Ping on Hans's question from Dec 18.

Did anyone take the time to look at dexter and whether it would fit the bill here? It would be great to avoid reimplementing its functionality just to have something that then only works on Windows.

My position is that for the kind of very basic end-to-end testing that we do in this repository (set breakpoints+print variables), it should be possible to have one wrapper that supports multiple debuggers. I'm also fine with adding platform-specific tests into a subdirectory, but that should be the exception, not the rule. What I would like to avoid is growing a large parallel body of basic tests that only work on one platform.

It sounded like dexter would be the missing glue layer that supports windows debuggers and unix debuggers alike. If that is feasible I would prefer to extend dexter to support WinDbg and migrating debuginfo-tests to use dexter over adding new code and more complexity to debuginfo-tests.

thakis added a comment.EditedApr 9 2019, 11:15 AM

Did anyone take the time to look at dexter and whether it would fit the bill here? It would be great to avoid reimplementing its functionality just to have something that then only works on Windows.

My position is that for the kind of very basic end-to-end testing that we do in this repository (set breakpoints+print variables), it should be possible to have one wrapper that supports multiple debuggers. I'm also fine with adding platform-specific tests into a subdirectory, but that should be the exception, not the rule. What I would like to avoid is growing a large parallel body of basic tests that only work on one platform.

It sounded like dexter would be the missing glue layer that supports windows debuggers and unix debuggers alike. If that is feasible I would prefer to extend dexter to support WinDbg and migrating debuginfo-tests to use dexter over adding new code and more complexity to debuginfo-tests.

I just took a quick look. dexter is 7.3kLoC of python while this here is 50 lines. Dexter has code to talk to visual studio over COM (dex/debugger/visualstudio/windows/ComInterface.py etc) but doesn't have a way to call cdb. Visual Studio is a way bigger dependency than cdb.

$ cd ~/src
$ git clone https://github.com/SNSystems/dexter.git
$ cd dexter
$ find dex -name '*.py' -type f | xargs wc -l | grep total
  7382 total
$ find dex -name '*.py' -type f | xargs grep cdb
# nothing

As zturner said above, we want explicit smoke testing for cdb. There's no desire to do all or even most of our debug info testing this way, but we'd like to have some integration tests for the debug engine we care about. dexter doesn't seem to help with this use case.

(Also, dexter has seen 7 commits over its lifetime, so I'm not sure that's something we'd want to depend on.)

aprantl accepted this revision.Apr 9 2019, 9:09 PM

Okay, so it looks like dexter might be a replacement for the lldgb.py wrapper and would support gdb, lldb, and the visual studio debugger. I think it would be nice to migrate the bulk of the debuginfo-tests repository to that. Until someone implements a cdb driver in dexter, however, I suppose there is no harm in adding a few files that directly talk to cdb. And I'm in no position to estimate how much work a cdb driver would be since I'm neither familiar with dexter or cdb :-)

This revision is now accepted and ready to land.Apr 9 2019, 9:09 PM
rnk added a comment.May 28 2019, 3:58 PM

Thanks! I don't think this suite is going to get too far out of control. I think for most debug info features, checking the info itself gives enough confidence that things work, but there are these cases where I also want an integration test.

I'm picking this up again because I really want to have an integration test for this inline function line table bug that I'm working on: https://crbug.com/965670. LLVM thinks it's generating the right information, but the debugger doesn't interpret it the way we expect. I'll try to get these tests running on http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/.

rnk updated this revision to Diff 201779.May 28 2019, 3:59 PM
  • rebase, fixes
Herald added a project: Restricted Project. · View Herald TranscriptMay 28 2019, 3:59 PM
This revision was automatically updated to reflect the committed changes.