This is an archive of the discontinued LLVM Phabricator instance.

[RFC][debuginfo-tests][dexter] Add a test generation tool
AbandonedPublic

Authored by Pierre-vh on Feb 28 2020, 4:48 AM.

Details

Summary

We have been working on Dexter for the past few weeks, and we tried to find ways to improve Dexter to make it even better.
One of the ideas we had was to add a tool that can automatically generate Dexter tests. This is the result of our work.

This is a working implementation of a test generation tool. Here's a list of the currently supported features:

  • Seamless integration with Dexter: the tool can simply be invoked like any other tool, using python3 dexter.py gen //...
  • LLDB debugger support (note: this is currently the only debugger supported)
  • Supports generation of DexExpectWatchValue and DexExpectStepKind commands
    • Generation of DexExpectStepKind commands can be disabled using --no-expect-steps
  • Generation of Dexter commands for each visible function argument, local variable or global variable in the top stack frame of each debugger step
    • By default, it generates command for everything, but you can also restrict the set of commands that'll be generated by using --expect-values-of (ALL | ARG | LOCAL | GLOBAL) // default is ALL
      • e.g. If you use --expect-values-of ARG, the tool will only generate DexExpectWatchValue commands for function arguments visible in the top stack frame of each debugger step

Here is an example output from this tool.

void set(int &dest, int val) {
    for(int k = 0; k < 5; ++k)
        dest += val;
}

int main()
{
	int x = 0;
    set(x, 5);
    return x;
}
//===--- AUTO-GENERATED DEXTER COMMANDS ---===//
//DexExpectWatchValue('x', '0',  on_line=9, require_in_order=False)
//DexExpectWatchValue('dest', '0', '5', '10', '15', '20', '25',  on_line=2, require_in_order=False)
//DexExpectWatchValue('val', '5', '5', '5', '5', '5', '5',  on_line=2, require_in_order=False)
//DexExpectWatchValue('dest', '0', '5', '10', '15', '20',  on_line=3, require_in_order=False)
//DexExpectWatchValue('val', '5', '5', '5', '5', '5',  on_line=3, require_in_order=False)
//DexExpectWatchValue('k', '0', '1', '2', '3', '4',  on_line=3, require_in_order=False)
//DexExpectWatchValue('dest', '25',  on_line=4, require_in_order=False)
//DexExpectWatchValue('val', '5',  on_line=4, require_in_order=False)
//DexExpectWatchValue('x', '25',  on_line=10, require_in_order=False)
//DexExpectWatchValue('x', '25',  on_line=11, require_in_order=False)
//DexExpectStepKind('FUNC', 2)
//DexExpectStepKind('VERTICAL_FORWARD', 9)
//DexExpectStepKind('VERTICAL_BACKWARD', 5)
//===--------------------------------------===//

Of course, this implementation is still in a "prototype" stage. Some things are not perfect and it still needs some work before being fully ready for upstream.

  • For instance, the tool always generates commands using require_in_order=False, as there is a bug that prevents us from generating the values in the correct order.
  • The current implementation derives from TestToolBase, which is probably not the best thing to do
  • The current implementation also collects the set of visible variables for every stack frame of every step, every single time whether we are running in test or gen mode. This is not ideal performance-wise.

That said, this implementation is good enough to to run some tests, so we ran some experiments using this tool (using a bash script); Here are some graphs!
(We generated tests using some configuration, and we ran them against GCC/Clang in every optimization mode)

  • Test generated using GCC -Og:
  • Test generated using GCC -O2:
  • Test generated using Clang O0 that just tests for the value of function arguments and does not expect debugger steps:

Here is the .tar containing every graph (70 total). Keep in mind that those were generated relatively early in the development of this tool so the actual results could be slightly different (but still within 10% I believe):

And here's what we found (our conclusions):

  • This tool is great at generating very precise tests, but those tests can be too precise and that can negatively influence the score when running it under some configurations (e.g. a test generated using GCC -Og will not make a perfect score in Clang O0 due to some GCC-isms)
  • We found that Dexter isn't good at detecting cases where the debug experience is actually better than what the test expects. The graph above (test generated using GCC -O2) is a perfect example of this
    • In short, tests should always be generated under the compiler configuration that provides the best debug experience, else the results can't be trusted (e.g. GCC Og or Clang O0)
  • The results of tests generated using the this tool under some compiler (e.g. GCC) shouldn't be trusted if ran under a different compiler (e.g. Clang)
    • This is due to some differences in how clang and GCC generate debug info. For instance, GCC generates an extra step for the closing } of a function while Clang does not. In short, every compiler is different, and tests generated using this tool will be inevitably biased towards the compiler used the generate the test.

We'd really like to get some feedback before investing too much work into this tool. So the question is: What do you think? Is there an upstream interest for this?

Diff Detail

Event Timeline

Pierre-vh created this revision.Feb 28 2020, 4:48 AM
Pierre-vh created this object with visibility "No One".
Herald added a project: Restricted Project. · View Herald TranscriptFeb 28 2020, 4:48 AM
Pierre-vh changed the visibility from "No One" to "Public (No Login Required)".

Adding Stephen and Tom who have also worked on dexter.

I like the idea of this and think it looks useful, and personally I would say
please do continue to work on it after giving people a chance to reply to this
comment.

I'd expect generated tests to still get attention from a human though. Debug
info doesn't all have the same value to a (compiler) user. E.g. loop induction
variables are likely very important. Because of this we like to use dexter to
target specific pain points while ignoring less important debug info. So a test
which looks at _all_ the debug info isn't necessarily indicative of user
expectations (and therefore may not be valuable).

The current implementation also collects the set of visible variables for
every stack frame of every step, every single time whether we are running in
test or gen mode. This is not ideal performance-wise

@TWeaver is working on a feature which requires conditional breakpoints but he is
out of office at the moment.

We found that Dexter isn't good at detecting cases where the debug experience
is actually better than what the test expects

We once had a similar tool, and we also always used O0 as a baseline.

The results of tests generated using the this tool under some compiler
(e.g. GCC) shouldn't be trusted if ran under a different compiler (e.g. Clang)

See second paragraph in this comment.

Pending comments from Tom and no objections I think you should go for it.

Does anyone else have any comments or concerns about this?

Pierre-vh updated this revision to Diff 247817.Mar 3 2020, 1:10 AM

Here is the updated revision with _verify_options removed.

TWeaver added a comment.EditedMar 3 2020, 5:47 AM

Hi Pierre,

I really like this idea and think there could be some useful applications.

The work I'm doing atm involves implementing a new type of stepping behaviour in DebuggerBase. I'm currently at the prototyping stage but I was thinking about adding support for new and interesting stepping behaviours (such as yours) that don't follow the default use case.

My code is very similar to yours in that it's written within the existing code base and modifies the existing stepping behaviour. I don't think this is the optimal way to go about supporting different stepping behaviours.

Perhaps, if you've the time, we could put our heads together and come up with a design that can support the original use case, our new use cases and any future use cases that may arise.

Do you think there's scope to generalise your work so that it can be implemented in the other debugger/s too?

Kindest Regard
Tom W

Pierre-vh added a comment.EditedMar 3 2020, 6:24 AM

Hi Pierre,

I really like this idea and think there could be some useful applications.

The work I'm doing atm involves implementing a new type of stepping behaviour in DebuggerBase. I'm currently at the prototyping stage but I was thinking about adding support for new and interesting stepping behaviours (such as yours) that don't follow the default use case.

My code is very similar to yours in that it's written within the existing code base and modifies the existing stepping behaviour. I don't think this is the optimal way to go about supporting different stepping behaviours.

Perhaps, if you've the time, we could put our heads together and come up with a design that can support the original use case, our new use cases and any future use cases that may arise.

Do you think there's scope to generalise your work so that it can be implemented in the other debugger/s too?

Kindest Regard
Tom W

Hello,

I haven't attempted to generalize my work as I entirely rely on the LLDB library for this (and I work on Linux so I can't work with any other debugger supported by Dexter)
Maybe we can try to find something together? Where do you wish to discuss this?

Kind regards,
Pierre van Houtryve

jmorse added a comment.Mar 3 2020, 6:52 AM

(NB, I haven't looked at the code as it seems you're requesting feedback for now,)

We'd really like to get some feedback before investing too much work into this tool. So the question is: What do you think? Is there an upstream interest for this?

I agree with Orlando and Tom, and I think there's definitely a place for this kind of tool to assist with writing tests. Determining what the debug experience _should_ be is hard (as Orlando points out) and probably requires a human to verify, but it'd be helpful to automate the test writing process to avoid boilerplate writing. At the very least, when writing tests we could auto-generate expectations and then customise them to the ones that we're interested in. It's easier to delete the uninteresting stuff than to write the interesting stuff from scratch.

More generally, it'd be great to have additional tooling for comparing debug behaviours between optimisation levels. A while back I had a script that:

  • Generated a random test with csmith
  • Ran Dexter over it and interpreted some of its error-output to determine what values variables contained at -O0
  • Did the same again at -O2
  • Examined whether there were any variable values present at -O2 that didn't appear at -O0

Alas it didn't find anything interesting, but applied to a different test suite it could well have. Any development in this kind of direction is helpful IMO.

TWeaver added inline comments.Mar 3 2020, 8:37 AM
debuginfo-tests/dexter/dex/debugger/Debuggers.py
166

I understand this is at the prototype stage, but... 😉

This extra else step here I believe is to setup an empty dict that can be filled with commands generated by the test run for the annotated test file generation (correct me if I'm wrong).

I'm not completely against having an else statement here but I feel there should at least be a comment describing why we'd want differing behaviour.

Also, whilst I'm not a huge fan of the 'big-ball-of-context' approach taken in the past, we could reduce the parameter list to get_debugger_steps by interrogating the context object.

debuginfo-tests/dexter/dex/tools/gen/Tool.py
103

does this mean the generation tool runs inside the same directory the test is within?

if so, where does it store the newly generated/annotated test file?

thanks
Tom W

Pierre-vh marked 2 inline comments as done.Mar 3 2020, 8:55 AM
Pierre-vh added inline comments.
debuginfo-tests/dexter/dex/debugger/Debuggers.py
166

These are really good suggestions, thank you

The else statement is needed because it crashes without it

Traceback (most recent call last):
  File "/llvm-project/debuginfo-tests/dexter/dex/../dexter.py", line 15, in <module>
    return_code = main()
  File "/llvm-project/debuginfo-tests/dexter/dex/tools/Main.py", line 193, in main
    return tool_main(context, module.Tool(context), args)
  File "/llvm-project/debuginfo-tests/dexter/dex/tools/Main.py", line 162, in tool_main
    return_code = tool.go()
  File "/llvm-project/debuginfo-tests/dexter/dex/tools/run_debugger_internal_/Tool.py", line 68, in go
    debugger.start()
  File "/llvm-project/debuginfo-tests/dexter/dex/debugger/DebuggerBase.py", line 125, in start
    for command_obj in chain.from_iterable(self.steps.commands.values()):
AttributeError: 'NoneType' object has no attribute 'values'

I'll try to find a better solution to this problem, and if I can't find one, I'll just add a comment explaining why this else is there

debuginfo-tests/dexter/dex/tools/gen/Tool.py
103

Currently, it runs in the test directory and writes the command directly to the test source
So, in order to use this tool, you need to create a folder, put your .cpp file inside it alongside a test.cfg, like you would do for a normal test

I know this isn't ideal, but I did that because it allowed me to re-use most of the existing logic (because this is a prototype).
A final version should probably work on the .cpp directly and have options to write the commands to another file/stdout/whatever.

I haven't attempted to generalize my work as I entirely rely on the LLDB library for this (and I work on Linux so I can't work with any other debugger supported by Dexter)
Maybe we can try to find something together? Where do you wish to discuss this?

Here is a fine place to discuss imo. I'm currently cooking up a basic outline of the architecture I'd like to propose. I'll need a little bit of time to consider the cases but hopefully I'll have something to propose very soon.

cheers
Tom W

Hi Pierre,

I'm working out the kinks on this proposal but I'd to get your input too,

how do you feel about changing the way _get_step_info works? I'd like to make getting the step info as generic as possible so that we can compose different levels of information at the step getting point. I'd like to add abstract methods to debuggerbase for getting frame ir, loc ir, watch irs, function irs and all the other individual parts that make up a step, including your 'visible variables' .

by being able to compose what information we want to gather at the 'getting step info' point we can choose how hard we want to the debugger to work. We could implement several different step info getters, ones that get a small portion of the information, gather all the info or some place in the middle.

By adding abstract methods to debugger base we could implement the individual methods that get the IR in the different debuggers too.

what do you think?

I'm also working on stripping out the 'start' method within debugger base into a separate controller class so that we can separate debugger operating behaviour from the functions/features of the debugger itself. This may help with your 'step_collection.commands' issue as you could implement your own 'start' method in a different controller class

It's still rather vague and early days but I welcome your feedback on these ideas.

cheers
Tom W

Pierre-vh added a comment.EditedMar 5 2020, 1:11 AM

Hi Pierre,

I'm working out the kinks on this proposal but I'd to get your input too,

how do you feel about changing the way _get_step_info works? I'd like to make getting the step info as generic as possible so that we can compose different levels of information at the step getting point. I'd like to add abstract methods to debuggerbase for getting frame ir, loc ir, watch irs, function irs and all the other individual parts that make up a step, including your 'visible variables' .

by being able to compose what information we want to gather at the 'getting step info' point we can choose how hard we want to the debugger to work. We could implement several different step info getters, ones that get a small portion of the information, gather all the info or some place in the middle.

By adding abstract methods to debugger base we could implement the individual methods that get the IR in the different debuggers too.

what do you think?

I'm also working on stripping out the 'start' method within debugger base into a separate controller class so that we can separate debugger operating behaviour from the functions/features of the debugger itself. This may help with your 'step_collection.commands' issue as you could implement your own 'start' method in a different controller class

It's still rather vague and early days but I welcome your feedback on these ideas.

cheers
Tom W

  • Breaking up _get_step_info into smaller pieces is definitely a good idea: In my opinion, the debugger classes should just act as "translators" between Dexter and the underlying debugger, nothing more, nothing less. Deciding what information to collect should be up to the caller.
    • Ideally, the API should allow the caller to choose what to collect per frame. This would be very useful for the generation tool, as it could ask the debugger to just return the set of visible variables in the top stack frame and then stop. A good alternative would to have a maximum_depth parameter (0 = no limit, 1 = collect info for just one frame, 2 = etc.)
  • I believe that the step, go and launch methods should be renamed. Something like step_in, continue and start_debugging_session seem more appropriate to me. Their current name is a bit cryptic

That's all I can think of for now, I don't have many objections, your plan sounds good to me and would definitely be a step in the right direction.

Hi @Pierre-vh ,

I hope you're keeping safe in these unprecedented times!

first of all, sorry for the delay, I've actually gone and implemented a bunch of stuff in Dexter that is currently going through the final stages of internal review, I'll be posting something up for you to take a look at today.

secondly, in regards to compositing step information from the debugger, I've not done anything about this just yet but I have implemented a new class that allows you run the debugger in any way you wish. This is going to causes changes in your patch. Setting up of a 'test' run is now done in the tool before the calling of _get_debugger_steps. this does mean that there's some interface and argument passing changes that'll need to take place in your patch.

where we go from here will require some discussion but I look forward to potentially collaborating with you in the future.

I'll link the patch here and mark you as a reviewer so you can take a good look at it. if you've any further questions please let me know.

Cheers
Tom W

Hi @Pierre-vh ,

I hope you're keeping safe in these unprecedented times!

first of all, sorry for the delay, I've actually gone and implemented a bunch of stuff in Dexter that is currently going through the final stages of internal review, I'll be posting something up for you to take a look at today.

secondly, in regards to compositing step information from the debugger, I've not done anything about this just yet but I have implemented a new class that allows you run the debugger in any way you wish. This is going to causes changes in your patch. Setting up of a 'test' run is now done in the tool before the calling of _get_debugger_steps. this does mean that there's some interface and argument passing changes that'll need to take place in your patch.

where we go from here will require some discussion but I look forward to potentially collaborating with you in the future.

I'll link the patch here and mark you as a reviewer so you can take a good look at it. if you've any further questions please let me know.

Cheers
Tom W

Hello!

I'm looking forward to seeing your changes, and I'll try my best to give you some useful feedback on it!

However, I'm not sure I'll be able to continue work on this gen tool patch in the future.
The generation tool was done as a prototype for my internship (to run a few experiments with Dexter), but I've since moved to other things (as I'm only here for a limited amount of time), and I don't know if I'll have enough time to come back to this and finish it.

Kind regards,
Pierre van Houtryve

The debugger controller patch is now live and awaiting review at

https://reviews.llvm.org/D76926

TWeaver added a comment.EditedMar 27 2020, 10:47 AM

@Pierre-vh and not a problem buddy, I wish you all the best in your future endeavours, don't worry about feedback on the controller patch if it's not related to your current work! :) you get on with what you need to.

Pierre-vh abandoned this revision.Apr 14 2020, 1:15 AM

Closing for now as I'm working on other things, I'll reopen when/if I can come back to this.