Page MenuHomePhabricator

Allow for tests to be disabled at runtime
ClosedPublic

Authored by fjricci on Sep 15 2016, 2:03 PM.

Details

Summary

The current implementation of the test suite allows the user to run
a certain subset of tests using '-p', but does not allow the inverse,
where a user wants to run all but some number of known failing tests.
Implement this functionality.

Diff Detail

Repository
rL LLVM

Event Timeline

fjricci updated this revision to Diff 71551.Sep 15 2016, 2:03 PM
fjricci retitled this revision from to Allow for tests to be disabled at runtime.
fjricci updated this object.
fjricci added reviewers: zturner, labath, tfiala.
fjricci added subscribers: lldb-commits, sas.
zturner edited edge metadata.Sep 15 2016, 2:04 PM

If a set of tests is failing, wouldn't you just want to xfail them?

The issue is that you can only commit a patch to xfail a test that fails when you run the test suite on master with no local changes.

The problem is that if you run into test failures on other branches or in unconventional configurations, there is no good way to disable failing tests, other than carrying local patches to xfail the tests which fail. Carrying these sorts of local patches is tedious, prone to breakages, and requires many manual changes whenever test suite sources changes.

I'm particular, we run into this with ds2, since it fails some tests passed by lldb-server (and passes some tests xfail-ed by lldb-server).

I also find that I fail different tests on master (with lldb-server) between Ubuntu and CentOS, for example, and I'm not sure that it makes sense to xfail in those cases.

labath edited edge metadata.Sep 16 2016, 3:49 AM

I don't think this is a totally bad idea. In fact we already had something like this (nobody used it though), before it was removed in rL255040. If it goes in, we might start using it actually -- e.g., currently we have watchpoint tests which fail on some devices which do not support watchpoints. There is no reasonable thing we can base the expectation as the exact same device with a different cpu revision could support watchpoints just fine, so we could just define the list of these tests externally (in this case, I would probably annotate them with the watchpoint category and then do the skips based on categories instead).

That said, I do have slightly mixed feelings about it, as it is increasing the complexity of an already complex system, and there are other possible ways to solve the watchpoint problem (have the tests detect whether the device supports watchpoints, and self-skip when appropriate).

packages/Python/lldbsuite/test/dotest.py
803 ↗(On Diff #71551)

We should just import re at top level. A lot of tests already do that, so it's not likely it will break anyone.

fjricci updated this revision to Diff 71651.Sep 16 2016, 8:10 AM
fjricci edited edge metadata.

Refactor re

I do understand the complexity problem, and it was one of my concerns with this as well. For my cases, the complexity here is significantly less than the alternatives, but I also do understand if you don't think that's generally true.

It probably comes down to how often we think that people are running the test suite in cases where this sort of functionality would be useful. I don't really have a good sense for how other people tend to use the test suite, so I'm personally not sure. For our case, it's a big deal, but if we're the only people who this patch helps, I know it doesn't make sense to merge it.

tfiala edited edge metadata.Sep 23 2016, 9:26 AM

There is no reasonable thing we can base the expectation as the exact same device with a different cpu revision could support watchpoints just fine, so we could just define the list of these tests externally (in this case, I would probably annotate them with the watchpoint category and then do the skips based on categories instead).

Tangential: most chips I've worked on that had hardware watchpoint support had an instruction that could be called to find out if such a feature exists. I think ARM does this. I would think we could expose an API that says whether watchpoints are supported or not, and use that info in LLDB and the test suite to enable or disable them.

I'll look at the rest of the change here. I'm not opposed to the general idea, although if it encourages people to skip running tests, then check in code that breaks those tests, "unbeknownst to them" (* only because they were intentionally not running them), then I'd say that's bad news.

tfiala accepted this revision.Sep 23 2016, 9:33 AM
tfiala edited edge metadata.

I am accepting this with one strong reservation which I will explicitly call out here:

  • If somebody checks in changes that are broken, and claims they missed it because they have an xfail exclusion file and didn't catch it, I will rip this out. If the xfails are hard to setup, it is likely that this is a code smell for needing better decorators to more precisely home in on the cases that are failing. Often times version checks are helpful.

I do get the utility this would afford for bring-up of different scenarios, though. Hence I see that being useful enough to have it as an escape hatch.

packages/Python/lldbsuite/test/configuration.py
107–108 ↗(On Diff #71651)

The skip seems okay. The xfail seems *very* dangerous. Nobody else is going to get these xfails. We're setting ourselves up for having people check in tests that are broken. It allows for a workflow where the user "thinks they're done", when they're not.

This revision is now accepted and ready to land.Sep 23 2016, 9:33 AM
fjricci added a comment.EditedSep 23 2016, 9:36 AM

There is no reasonable thing we can base the expectation as the exact same device with a different cpu revision could support watchpoints just fine, so we could just define the list of these tests externally (in this case, I would probably annotate them with the watchpoint category and then do the skips based on categories instead).

Tangential: most chips I've worked on that had hardware watchpoint support had an instruction that could be called to find out if such a feature exists. I think ARM does this. I would think we could expose an API that says whether watchpoints are supported or not, and use that info in LLDB and the test suite to enable or disable them.

I believe that PTRACE_GETHBPREGS with a value of 0 returns the hardware stoppoint info on arm, and the byte representing the number of available hardware watchpoints will be 0 if they aren't supported. Not sure if there's a simpler way.

Ok. Barring objections from anyone else, I'll merge this later on today then, with the understanding that if it causes issues like the ones you describe, it should be reverted.

As long as the only way you can specify the black-list is explicitly on the command line, I think this is fine. There should never be implicit searches for a backlist file. You must have to supply it each time you run the testsuite. That way somebody would have to willfully decide not to run the full testsuite on their patch, and that's a human not a tech problem, since they could just as well check it in with failures they are ignoring, and not need this fancy mechanism...

This revision was automatically updated to reflect the committed changes.

There is no reasonable thing we can base the expectation as the exact same device with a different cpu revision could support watchpoints just fine, so we could just define the list of these tests externally (in this case, I would probably annotate them with the watchpoint category and then do the skips based on categories instead).

Tangential: most chips I've worked on that had hardware watchpoint support had an instruction that could be called to find out if such a feature exists. I think ARM does this. I would think we could expose an API that says whether watchpoints are supported or not, and use that info in LLDB and the test suite to enable or disable them.

I believe that PTRACE_GETHBPREGS with a value of 0 returns the hardware stoppoint info on arm, and the byte representing the number of available hardware watchpoints will be 0 if they aren't supported. Not sure if there's a simpler way.

It's a bit trickier than that. In some cases that call will still return non-zero as the number of supported watchpoints, but the "watchpoint size" field will be zero, and it will still mean that watchpoints don't work. This is probably a kernel bug, though it is pretty easy to work around. The more boring part would be plumbing that information all the way to the test suite - Nothing that can't be done, it's just a bit laborious, so I haven't done that yet.