
[lldb-test/WIP] Allow a way to test autocompletion
Needs RevisionPublic

Authored by davide on Feb 7 2018, 2:59 PM.

Details

Summary

This is an experiment to improve our lldb testing capabilities and make them more similar to the ones used in LLVM.

Example:

davide@Davidinos-Mac-Pro ~/w/l/b/bin> ./lldb-test autocomplete "target cr"
create
davide@Davidinos-Mac-Pro ~/w/l/b/bin> ./lldb-test autocomplete "target "
create
delete
list
modules
select
stop-hook
symbols
variable

This allows the output to be FileCheck'ed, and has the advantage that it doesn't depend on Python to run. It also removes a bunch of boilerplate from the current autocompletion tests.
Any feedback on this is appreciated before I write the actual FileCheck tests.

Diff Detail

Event Timeline

davide created this revision.Feb 7 2018, 2:59 PM
vsk added a comment.Feb 7 2018, 3:01 PM

The general direction lgtm, I'd be happy if we could replace interactive autocomplete tests with this.

davide added a comment.Feb 7 2018, 3:02 PM

You can take a look at test/testcases/functionalities/completion/TestCompletion.py for the Python equivalent. I find the potential FileCheck'ed version much easier to read/write/understand.
I'm possibly biased, having worked many years on LLVM, hence I'm asking for general feedback.

In the future, we could add options to the autocomplete subcommand to allow specifying a target, and things like the cursor position, to maximize testability.

In general though, I like the approach. It's not hard to imagine 50+ tests being written just for autocomplete this way.

By the way, I'd suggest printing indices in front of each match and including those in the FileCheck tests. Otherwise we could miss completions that sneak in.
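That suggestion could be rendered roughly like this (a plain Python sketch of the proposed output format; `render_matches` is an invented helper, not part of lldb-test):

```python
def render_matches(matches):
    """Prefix each completion match with a 1-based index and append a
    trailing count line, so a FileCheck test notices both completions
    that sneak in and completions that disappear."""
    lines = ["%d: %s" % (i, m) for i, m in enumerate(matches, start=1)]
    lines.append("%d matches" % len(matches))
    return "\n".join(lines)

print(render_matches(["create", "delete", "list"]))
# prints:
# 1: create
# 2: delete
# 3: list
# 3 matches
```

A FileCheck test could then pin both the indices and the count with CHECK-NEXT lines.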

The current auto-completer tests aren't interactive - they do exactly the same thing your command does, but from Python. It's fine if you want to add tests but please don't remove the current tests since they directly test what the SB clients use.

This will only allow you to test some of the auto-completers, for instance you don't have symbols so you can't test the symbol completer. But since the symbol commands in this lldb-test have some way to feed symbols in maybe you can crib that. I think you'll also need to make a target and some other bits as well. As you start adding these you might find that this becomes onerous, but that will be an interesting experiment.

You didn't get the HandleCompletion API quite right. That's my fault this isn't well documented. The way it works is that if all the potential matches share a common substring, then the 0th element of the result contains the common substring. If there is no common substring, then the possible matches will be stored from element 1 on in the Results list. If you want examples take a closer look at how the Python test does it.
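The layout described here can be modeled in a few lines of plain Python (a sketch of the convention only, not the actual SBCommandInterpreter::HandleCompletion API):

```python
def split_completion_results(num_matches, results):
    """HandleCompletion returns N (the number of matches) and fills a
    list with N + 1 strings: results[0] is the substring shared by all
    matches (possibly empty), and results[1:] are the matches."""
    assert len(results) == num_matches + 1, "result list must hold N + 1 entries"
    return results[0], results[1:]

# Completing "st": three matches sharing the common substring "st".
common, matches = split_completion_results(3, ["st", "step", "stop", "stats"])
```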

davide added a comment.Feb 7 2018, 3:25 PM

The current auto-completer tests aren't interactive - they do exactly the same thing your command does, but from Python. It's fine if you want to add tests but please don't remove the current tests since they directly test what the SB clients use.

This is a drop-in replacement for the other test. I'm not sure why we need to keep the other test, but I'll let others comment.

This will only allow you to test some of the auto-completers, for instance you don't have symbols so you can't test the symbol completer. But since the symbol commands in this lldb-test have some way to feed symbols in maybe you can crib that. I think you'll also need to make a target and some other bits as well. As you start adding these you might find that this becomes onerous, but that will be an interesting experiment.

I deliberately left that as a follow-up.

You didn't get the HandleCompletion API quite right. That's my fault this isn't well documented. The way it works is that if all the potential matches share a common substring, then the 0th element of the result contains the common substring. If there is no common substring, then the possible matches will be stored from element 1 on in the Results list. If you want examples take a closer look at how the Python test does it.

The API is a little confusing.
There are multiple issues IMHO:

  1. We shouldn't discriminate based on the position of elements in the list, as it's easy to get them wrong, as I did. Instead, the API could return, say, a pair whose first element is the common substring and whose second element is the list of matches.
  2. We pass strings for the second and third arguments (cursor/end) when we should just pass offsets.
  3. The return value is N while the list contains N + 1 values. This is very error-prone.

So, I think this might call for a better API?
Also, please note that I read the API and was aware of the oddities; I just decided to defer the discussion to another day. I think my usage of the API is correct, as I don't necessarily care about the common substring, if any.
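For illustration, the pair-style API proposed above could look like this (invented names and a toy command table; not an interface lldb actually exposes):

```python
import os
from typing import List, NamedTuple

class CompletionResult(NamedTuple):
    """Common substring and matches travel in separate fields, and the
    cursor is an integer offset, so there is no N vs. N + 1 indexing
    convention to get wrong."""
    common_prefix: str
    matches: List[str]

def complete(line: str, cursor: int) -> CompletionResult:
    # Toy completer over a fixed subcommand table, just to show the shape.
    commands = ["create", "delete", "list"]
    word = line[:cursor].rsplit(" ", 1)[-1]
    matches = [c for c in commands if c.startswith(word)]
    return CompletionResult(os.path.commonprefix(matches), matches)

print(complete("target cr", 9))
```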

davide added a comment.Feb 7 2018, 3:28 PM

Also, there's a bit of overengineering in the API. As far as I can tell, max_return_elements only supports -1, so that argument is bogus (at least, that's what the comment says, though comments and code can drift out of sync).

davide added a comment.Feb 7 2018, 3:40 PM

By the way, I'd suggest printing indices in front of each match and including those in the FileCheck tests. Otherwise we could miss completions that sneak in.

Instead, or in addition, we might dump the number of matches so that if it changes we notice?

(and use CHECK-NEXT). Anyway, your comment about adding indices is fine; I'll probably just add them.

Number of matches might be sufficient, but it might be nice to know if the ordering of matches changes for some reason. Unless there's a specific reason we want to allow an unstable order, enforcing a stable order seems desirable just on principle.

On the issue of keeping the other test: when an SB API method is basically a pass-through to a private method, having a test of the SB API method that verifies "did the correct native method get called" is useful, if for no other reason than to verify the correctness of the SWIG binding generation. If you've tested that the API method invokes the correct native method, and you've tested the native method directly with a large number of inputs, then that's sufficient overall coverage IMO.

jingham requested changes to this revision.Feb 7 2018, 5:28 PM

You do care about the common match string. When the lldb driver handles completion, if the common match string is not null, we append that to the line at the cursor position, then present the matches if there is more than one. So the common match string also has to be tested.
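That driver behavior can be sketched as follows (a plain Python model; the real logic lives in lldb's driver and is more involved):

```python
def apply_completion(line, cursor, common, matches):
    """Splice the common match string into the line at the cursor and
    advance the cursor past it; only display the match list when there
    is more than one candidate, as the lldb driver does."""
    new_line = line[:cursor] + common + line[cursor:]
    new_cursor = cursor + len(common)
    to_display = matches if len(matches) > 1 else []
    return new_line, new_cursor, to_display
```

With input "target cr", cursor 9, and common string "eate", this yields "target create" and nothing to display, matching the single-match case.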

The ability to page the completion requests in the API would be useful, for instance, in symbol completion, where you can get lots of matches but, if you only plan to display the first page, you'd rather not pay the cost of finding them all. I put that in the SB APIs because I didn't want to have to add another one when I got around to implementing this. When I get around to it I'll fix the docs... But you could remove that from the lldb private version if you're so motivated. I'll still remember I intended to extend it this way, even if nobody else will see that.

We can't return a std::pair across the SB APIs, but we could make the common match another parameter. There was some reason this seemed logical to me at the time, but I must admit I can't remember why now. It is in practice easy to use, however: you append element 0 at the cursor position, then print the rest of the completions if num_matches is > 1. Again, feel free to change the lldb_private API if it bugs you.

An additional test in the Python testsuite is:

def test_target_create_dash_co(self):
    """Test that 'target create --co' completes to 'target create --core '."""
    self.complete_from_to('target create --co', 'target create --core ')

So I still don't see why the file check method is preferable. But to each his own, I guess.

This revision now requires changes to proceed.Feb 7 2018, 5:28 PM

If a dotest test fails, you go to the Failure-whatever.log, take the last line, add a -d to it, and run it. It suspends itself, then you attach to the Python instance with the debugger.

labath added a comment.Feb 8 2018, 3:58 AM

The change seems fine to me, and I don't really have anything to add to the things that were said already.

Testing the completion of things that require a target/module/etc. will be a bit tricky, but that's something we should figure out anyway to have more targeted completion tests.

(Btw, if you're looking for things to FileCheck-ify, I think the stuff under lldb/unittests/UnwindAssembly is a prime candidate and has a much higher bang/buck ratio.)

(Btw, if you're looking for things to FileCheck-ify, I think the stuff under lldb/unittests/UnwindAssembly is a prime candidate and has a much higher bang/buck ratio.)

If you have ideas on how to FileCheck'ify unwind testing, I'm all for it. This recently came up in a discussion where the unwinder was pointed out as hard to test.
Feel free to ping me offline (or just send me a mail) and we can start thinking about something [and then propose it upstream].

No, the unwind unittests that exist today should stay written as unit tests. These are testing the conversion of native unwind formats -- for instance, eh_frame, compact unwind, or instruction analysis -- into the intermediate UnwindPlan representation in lldb. They are runtime invariant, unit tests are the best approach to these. If there were anything to say about these, it would be that we need more testing here - armv7 (AArch32) into UnwindPlan is not tested. eh_frame and compact_unwind into UnwindPlan is not tested.

The part of unwind that is difficult to test is the runtime unwind behavior, and FileCheck style tests don't make that easier in any way. We need a live register context, stack memory, symbols and UnwindPlans to test this correctly -- we either need a full ProcessMock with SymbolVendorMock etc, or we need corefiles, or we need tests with hand-written assembly code.

I'm leaning towards starting to write C test cases with hand-written assembly routines that set up their stacks in unusual ways to test the unwinder. I've written unwind testsuites this way in the past and it works well, but it does mean that you need to be on the arch (and possibly even the OS) that the test is written for -- bots will show the failures for everyone after a commit goes in, at least. But these will be written as SB API tests.

No, the unwind unittests that exist today should stay written as unit tests. These are testing the conversion of native unwind formats -- for instance, eh_frame, compact unwind, or instruction analysis -- into the intermediate UnwindPlan representation in lldb. They are runtime invariant, unit tests are the best approach to these. If there were anything to say about these, it would be that we need more testing here - armv7 (AArch32) into UnwindPlan is not tested. eh_frame and compact_unwind into UnwindPlan is not tested.

That's exactly the type of thing that FileCheck tests work best for. I'm not sure why you're saying that unittests are better than FileCheck tests for this scenario.

The part of unwind that is difficult to test is the runtime unwind behavior, and FileCheck style tests don't make that easier in any way. We need a live register context, stack memory, symbols and UnwindPlans to test this correctly -- we either need a full ProcessMock with SymbolVendorMock etc, or we need corefiles, or we need tests with hand-written assembly code.

I don't think we'd necessarily need a live register context or stack memory. A mock register context and stack memory should be sufficient, with an emulator that understands only a handful of instructions.
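As a toy illustration of that idea (all names, layouts, and the unwind rule are invented for this sketch; they do not correspond to lldb's RegisterContext or UnwindPlan classes):

```python
import struct

def unwind_one_frame(regs, stack, stack_base):
    """Recover the caller's registers from a mock register file and a
    bytearray of fake stack memory, using a hand-written rule in the
    spirit of an UnwindPlan row: CFA = sp + 16, caller pc saved at
    [CFA - 8], caller fp saved at [CFA - 16]."""
    cfa = regs["sp"] + 16

    def read64(addr):
        return struct.unpack_from("<Q", stack, addr - stack_base)[0]

    return {"pc": read64(cfa - 8), "fp": read64(cfa - 16), "sp": cfa}

# Fake frame: saved fp at [sp], saved return pc at [sp + 8].
stack_base = 0x7FFF0000
stack = bytearray(32)
struct.pack_into("<Q", stack, 0, 0x7FFF00F0)  # caller's fp
struct.pack_into("<Q", stack, 8, 0x400123)    # caller's return pc
regs = {"pc": 0x400200, "fp": 0x7FFF0000, "sp": 0x7FFF0000}
caller = unwind_one_frame(regs, stack, stack_base)
```

A table of such rules, plus a handful of emulated instructions, would exercise a lot of unwinder logic without a live process.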

labath resigned from this revision.Fri, Aug 9, 2:07 AM