This is an archive of the discontinued LLVM Phabricator instance.

[testsuite] Remove flakey test relying on `pexpect`
Abandoned · Public

Authored by davide on Jan 29 2018, 12:26 PM.

Details

Summary

This passes half of the time on my machine and fails the other half.
What this tests doesn't seem extremely valuable, and the fact that it relies on pexpect makes it slightly worse.
As I'm going to enable treating UNEXPECTED SUCCESSES as failures as soon as possible, I propose to remove this test (and the other tests we can't control).

Diff Detail

Event Timeline

davide created this revision. Jan 29 2018, 12:26 PM
davide added a subscriber: lldb-commits.
vsk added a comment. Jan 29 2018, 12:37 PM

What's the failure mode? Have we had any issues with this on the bots?

Generally I'm all for removing flaky tests, but I'd like to understand what makes this one flaky so we can avoid whatever it is in the future. In this case, we should be able to test this the way we test clang code completion, a la c-index-test.

jingham requested changes to this revision. Edited Jan 29 2018, 12:39 PM

There are a whole bunch of other tests that test completion in this file that use the exact same mechanism but don't seem to be flakey. Why is this one test flakey?

If for instance it's because "Fo" ends up being ambiguous because we chose too common a start string, then you could trivially fix the test by choosing a more uncommon name to complete on. But I'd want to know why this test is flakey first.

I also don't see why you say this test doesn't test something important. The ability to auto-complete symbol names is pretty important to command-line lldb users. If anything we should have more tests of the symbol completer...

This revision now requires changes to proceed. Jan 29 2018, 12:39 PM

There are a whole bunch of other tests that test completion in this file that use the exact same mechanism but don't seem to be flakey. Why is this one test flakey?

If for instance it's because "Fo" ends up being ambiguous because we chose too common a start string, then you could trivially fix the test by choosing a more uncommon name to complete on. But I'd want to know why this test is flakey first.

OK, I'll take a look at this further.

There are a whole bunch of other tests that test completion in this file that use the exact same mechanism but don't seem to be flakey. Why is this one test flakey?

If for instance it's because "Fo" ends up being ambiguous because we chose too common a start string, then you could trivially fix the test by choosing a more uncommon name to complete on. But I'd want to know why this test is flakey first.

I also don't see why you say this test doesn't test something important. The ability to auto-complete symbol names is pretty important to command-line lldb users. If anything we should have more tests of the symbol completer...

Apologies, I was thinking of another test (happens when there are multiple UNEXPECTED SUCCESSES ;)

This seems to pass pretty reliably locally. I'd go for un-XFAILing this unless there are objections.

davide added a comment. Edited Jan 29 2018, 3:06 PM

There are a whole bunch of other tests that test completion in this file that use the exact same mechanism but don't seem to be flakey. Why is this one test flakey?

So, I took a look at this to answer your question. They're all flakey.
Most of them fail non-deterministically depending on the load of the system. I wrote a driver that randomly spawns threads spinning in loops, checked out multiple copies of lldb, and ran the test suites in parallel.
Many of the completion tests fail randomly.

As an example, here is a test that just failed on my machine because of a timeout (non-deterministic; it happens roughly 1 in 50 runs on my machine):

TIMEOUT: test_step_over_3_times_dsym (python_api/thread/TestThreadAPI.py)
[TestThreadAPI.py FAILED] (TIMEOUT)
Command invoked: /System/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python /Users/dcci/work/llvm/llvm/tools/lldb/test/dotest.py -q --arch=x86_64 --executable /Users/dcci/work/llvm/build/bin/lldb -s /Users/dcci/work/llvm/build/lldb-test-traces -S nm -u CXXFLAGS -u CFLAGS -C /Users/dcci/work/llvm/build/bin/clang --codesign-identity lldb_codesign --server /Users/dcci/work/llvm/build/bin/debugserver --results-port 52049 -S nm --inferior -p TestThreadAPI.py /Users/dcci/work/llvm/llvm/tools/lldb/packages/Python/lldbsuite/test --event-add-entries worker_index=6:int

I'll probably go ahead and either unXFAIL or SKIP the tests that I find to be unreliable because of pexpect. Having tests that pass most of the time under load, but not always, is quite unfortunate, in particular now that we want to enable continuous integration for downstream projects using lldb.

Best,

Davide

If we just need to test completion, write a lit-style test that uses lldb-test that looks like this:

RUN: lldb-test complete --target=%T/foo --complete_str=MyPrefix | FileCheck %s

CHECK: Foo::Bar
CHECK: Foo::Baz
etc

Simple and not flaky

So that sounds like a pexpect problem. We have seen that pexpect-based tests tend to be flakey, and in particular will time out on loaded systems, as you are seeing. But I really don't think there are very many cases where we need to use pexpect.

For instance, all the completion tests could use SBCommandInterpreter::HandleCompletion. It is a pretty direct route into the code that the CommandInterpreter uses to handle tab completion, so you wouldn't change what you are testing, and the tests would be much more stable because you're just calling an API rather than scraping stdout from a sub-launched lldb.
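
A minimal sketch of what such an in-process test could look like (the class name, the "pl" prefix, and the expected "platform" match are illustrative assumptions, not part of this patch or the existing test suite):

import lldb
from lldbsuite.test.lldbtest import TestBase

class CompletionViaSBAPITestCase(TestBase):
    mydir = TestBase.compute_mydir(__file__)

    def test_command_completion_via_sb_api(self):
        """Ask the command interpreter for completions in-process, no pexpect."""
        interp = self.dbg.GetCommandInterpreter()
        matches = lldb.SBStringList()
        current_line = "pl"  # hypothetical prefix; "platform" should be among the matches
        # HandleCompletion(current_line, cursor_pos, match_start_point,
        #                  max_return_elements, matches) returns the number of matches;
        # the first element of matches is the common prefix, the individual matches follow.
        num_matches = interp.HandleCompletion(current_line, len(current_line), 0, -1, matches)
        self.assertTrue(num_matches > 0)
        completions = [matches.GetStringAtIndex(i) for i in range(matches.GetSize())]
        self.assertTrue(any("platform" in c for c in completions))

Because everything happens in-process, there is no sub-launched lldb to time out under system load.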

We do need to add more command-line parsing tests as well, but it would be straightforward to write a little unit test harness that feeds in command lines and checks the parsed version of the command (before it gets passed to DoExecute) without actually executing the command.

I would rather not just xfail all these pexpect tests; I worry they will just stay that way. It would be better if we took the time to rewrite them using more stable testing methods.

There are SB APIs that call into the completion mechanism, so you could also just change the TestCompletion.complete_from_to test method to call the SB completion API. Given how the test is written, it looks like all you would have to do is reimplement that method, and all the tests would be on a good footing.
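
A rough sketch of how that method could be reimplemented on top of the SB API (this is illustrative only: it assumes it lives on the existing TestCompletion class, where lldb is already imported and self.dbg is the test's debugger; the real method's signature may differ):

    def complete_from_to(self, str_input, patterns):
        """Check completions via SBCommandInterpreter instead of a pexpect'ed lldb."""
        interp = self.dbg.GetCommandInterpreter()
        matches = lldb.SBStringList()
        interp.HandleCompletion(str_input, len(str_input), 0, -1, matches)
        completions = [matches.GetStringAtIndex(i) for i in range(matches.GetSize())]
        if not isinstance(patterns, list):
            patterns = [patterns]
        for pattern in patterns:
            self.assertTrue(any(pattern in c for c in completions),
                            "'%s' not found in completions %s" % (pattern, completions))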

Spinning up a process just to test that auto-completion works seems a little unnecessary, though, and introduces the possibility that flakiness and bugs in the process spawn mechanism (if any exist) will manifest as test failures. It would be nice, if and when this test fails, to have some confidence that the failure is related to auto-completing symbol names.

If we just need to test completion, write a lit-style test that uses lldb-test that looks like this:

RUN: lldb-test complete --target=%T/foo --complete_str=MyPrefix | FileCheck %s

CHECK: Foo::Bar
CHECK: Foo::Baz
etc

Simple and not flaky

That sounds like a great idea. I'll work on something like that in case this turns out to be unreliable (or, independently, when I get a chance :)

In the meantime, I re-enabled the tests in rL323707 to see whether they fail on the bots.
FWIW, I agree that spinning up an instance just to test autocompletion is a bit of a sledgehammer.

Best,

Davide

There would be no spinning up an instance; it would just call the API in Python. No extra process, everything done in-process.

Yes. We do need to have symbols to do symbol completion, which does require a binary, but you don't need to run it. Most of the other tests in there (e.g. simple command completion) should be able to work without even a binary. It seems weird to add a potentially fallible lldb-test layer on top of SBCommandInterpreter.HandleCompletion just so you can send text input through lldb-test rather than directly sending text input to SBCommandInterpreter.HandleCommand.

jingham added a comment. Edited Jan 29 2018, 5:28 PM

lldb testcases are known not to be flakey if they don't use pexpect, which these wouldn't. The setup machinery for running a dotest based test is pretty well tested at this point.

And the lldb-test test would not just magically come into being by writing the lit-form text you suggested. You would have to write an lldb-test function that marshals the input (both --complete-string and a cursor position, which you will also need in order to test completion inside a line). That's new and untested code, whereas the dotest test relies on the API it is directly testing and trusts that the basic machinery of dotest will continue to function.

davide abandoned this revision. Apr 2 2018, 9:36 AM

I think this is obsolete by now.