This is an archive of the discontinued LLVM Phabricator instance.

[lit] Add --xfail and --filter-out (inverse of --filter)
ClosedPublic

Authored by davezarzycki on Feb 14 2021, 4:19 AM.

Details

Summary

In semi-automated environments, XFAILing or filtering out known regressions without actually committing changes or temporarily modifying the test suite can be quite useful.

Diff Detail

Event Timeline

davezarzycki created this revision.Feb 14 2021, 4:19 AM
davezarzycki requested review of this revision.Feb 14 2021, 4:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 14 2021, 4:19 AM
davezarzycki retitled this revision from ' to [lit] Add --skip (inverse of --filter).
davezarzycki edited the summary of this revision. (Show Details)

Add docs not included in the first diff.

This definitely seems like something we could use. I haven't looked at the implementation yet, but just thinking about the feature - would it be better to have a flag that marks the test as XFAIL instead? This would mean that once the issue is fixed, the continued use of the switch would cause an XPASS, indicating the switch could be dropped. What do you think?

This definitely seems like something we could use. I haven't looked at the implementation yet, but just thinking about the feature - would it be better to have a flag that marks the test as XFAIL instead? This would mean that once the issue is fixed, the continued use of the switch would cause an XPASS, indicating the switch could be dropped. What do you think?

On the other hand, someone might use LIT_SKIP to skip tests that take too long time to run only on slow machine (others would have no LIT_SKIP). Marking the tests as XFAIL would prevent to skip passing tests.

This definitely seems like something we could use. I haven't looked at the implementation yet, but just thinking about the feature - would it be better to have a flag that marks the test as XFAIL instead? This would mean that once the issue is fixed, the continued use of the switch would cause an XPASS, indicating the switch could be dropped. What do you think?

On the other hand, someone might use LIT_SKIP to skip tests that take too long time to run only on slow machine (others would have no LIT_SKIP). Marking the tests as XFAIL would prevent to skip passing tests.

The specified use-case in the description is about regressions, which I assume mean test failures, not slower tests.

That being said, if there's a requested use-case for actually skipping tests, due to resource limitations of the machine, maybe a complement to my suggestion would be to have an "UNSUPPORTED" equivalent.

I agree that both --skip (a.k.a. LIT_SKIP) and --xfail (a.k.a. LIT_XFAIL) are independently useful so I've updated the patch.

davezarzycki retitled this revision from [lit] Add --skip (inverse of --filter) to [lit] Add --skip (inverse of --filter) and `--xfail`.Feb 15 2021, 6:20 AM
davezarzycki edited the summary of this revision. (Show Details)
jdenny added inline comments.Feb 15 2021, 6:45 AM
llvm/docs/CommandGuide/lit.rst
220

Why do we need more environment variables? Isn't LIT_OPTS sufficient? (LIT_FILTER, etc. predate LIT_OPTS.)

davezarzycki added inline comments.Feb 15 2021, 8:54 AM
llvm/docs/CommandGuide/lit.rst
220

I'm generally biased against designs that require thinking about shell escaping (or at least more than necessary).

If I might critique myself, I'm not sure that regular expression semantics are best for --xfail. One really wants and should use a precise list of some kind. Any thoughts on what that might look like? I'd lean toward a semicolon separated list.

After some thought, I think a list is better than a regex for --xfail. In contrast, having --skip take a regex makes sense because it is the opposite of --filter (like grep -v and grep respectively).

jdenny added inline comments.Feb 16 2021, 1:03 PM
llvm/docs/CommandGuide/lit.rst
220

I've been using --filter within LIT_OPTS for a while because I use LIT_OPTS for other purposes anyway. I don't find the escaping to be challenging. However, maybe quoting is harder in some shells, or maybe I've not imagined some use cases. OK, fair enough.

yln added a comment.Feb 16 2021, 2:44 PM

I can get behind --skip (patch LGTM), but would a little help to understand when --xfail=path/to/test1;path/to/test2 would be useful (which workflow?). Thanks!

In D96662#2566903, @yln wrote:

I can get behind --skip (patch LGTM), but would a little help to understand when --xfail=path/to/test1;path/to/test2 would be useful (which workflow?). Thanks!

One example where it would be useful - in our downstream merge process, we may identify a test that has started failing due to an upstream bug. We could modify the test explicitly by adding XFAIL to the start of it, or we could add it to the build script that is used to build and test the changes before they are merged in. The --xfail option allows the latter to happen in such a way that when the issue is fixed by a later upstream change, we'll notice (due to the test starting to pass) and remove the option from the script, whereas with --skip, we won't notice (and thus lose the testing downstream the test gave us). I could imagine other cases along these lines too, where --xfail is superior to --skip.

If I might critique myself, I'm not sure that regular expression semantics are best for --xfail. One really wants and should use a precise list of some kind. Any thoughts on what that might look like? I'd lean toward a semicolon separated list.

A precise list would make sense. I'd use commas rather than semicolons, because semicolons are more likely to require shell escaping, I think (commas are less likely to).

In D96662#2566903, @yln wrote:

I can get behind --skip (patch LGTM), but would a little help to understand when --xfail=path/to/test1;path/to/test2 would be useful (which workflow?). Thanks!

One example where it would be useful - in our downstream merge process, we may identify a test that has started failing due to an upstream bug. We could modify the test explicitly by adding XFAIL to the start of it, or we could add it to the build script that is used to build and test the changes before they are merged in. The --xfail option allows the latter to happen in such a way that when the issue is fixed by a later upstream change, we'll notice (due to the test starting to pass) and remove the option from the script, whereas with --skip, we won't notice (and thus lose the testing downstream the test gave us). I could imagine other cases along these lines too, where --xfail is superior to --skip.

I was thinking this could be useful to us for the same reason. Some tests fail because they make assumptions about default target or host. We send patch upstream for those but while they are in review and until we merge again, we tend to add xfail in the test. Sometimes the failure is due to the target being different in which case we just mark it unsupported. --skip and --xfail would be quite useful.

In D96662#2566903, @yln wrote:

I can get behind --skip (patch LGTM), but would a little help to understand when --xfail=path/to/test1;path/to/test2 would be useful (which workflow?). Thanks!

One example where it would be useful - in our downstream merge process, we may identify a test that has started failing due to an upstream bug. We could modify the test explicitly by adding XFAIL to the start of it, or we could add it to the build script that is used to build and test the changes before they are merged in. The --xfail option allows the latter to happen in such a way that when the issue is fixed by a later upstream change, we'll notice (due to the test starting to pass) and remove the option from the script, whereas with --skip, we won't notice (and thus lose the testing downstream the test gave us). I could imagine other cases along these lines too, where --xfail is superior to --skip.

Indeed, this is the problem I'm trying to solve.

If I might critique myself, I'm not sure that regular expression semantics are best for --xfail. One really wants and should use a precise list of some kind. Any thoughts on what that might look like? I'd lean toward a semicolon separated list.

A precise list would make sense. I'd use commas rather than semicolons, because semicolons are more likely to require shell escaping, I think (commas are less likely to).

In my experience, quotes solve most "escaping" problems, so what is used as a separator mostly doesn't matter as long as the character isn't used for variable expansion. Using the shell's statement separator is a good choice because shell users/programmers naturally avoid files with that character embedded in the name.

If I might critique myself, I'm not sure that regular expression semantics are best for --xfail. One really wants and should use a precise list of some kind. Any thoughts on what that might look like? I'd lean toward a semicolon separated list.

A precise list would make sense. I'd use commas rather than semicolons, because semicolons are more likely to require shell escaping, I think (commas are less likely to).

In my experience, quotes solve most "escaping" problems, so what is used as a separator mostly doesn't matter as long as the character isn't used for variable expansion. Using the shell's statement separator is a good choice because shell users/programmers naturally avoid files with that character embedded in the name.

Actually, maybe we should just use multiple --xfail options, rather than a single one that takes a list. That way, the individual file names people use isn't relevant.

If I might critique myself, I'm not sure that regular expression semantics are best for --xfail. One really wants and should use a precise list of some kind. Any thoughts on what that might look like? I'd lean toward a semicolon separated list.

A precise list would make sense. I'd use commas rather than semicolons, because semicolons are more likely to require shell escaping, I think (commas are less likely to).

In my experience, quotes solve most "escaping" problems, so what is used as a separator mostly doesn't matter as long as the character isn't used for variable expansion. Using the shell's statement separator is a good choice because shell users/programmers naturally avoid files with that character embedded in the name.

Actually, maybe we should just use multiple --xfail options, rather than a single one that takes a list. That way, the individual file names people use isn't relevant.

We can do that, but that won't work for LIT_XFAIL, so one would be forced to use LIT_OPTS and now we've come full circle back to the shell escaping problem.

If I might critique myself, I'm not sure that regular expression semantics are best for --xfail. One really wants and should use a precise list of some kind. Any thoughts on what that might look like? I'd lean toward a semicolon separated list.

A precise list would make sense. I'd use commas rather than semicolons, because semicolons are more likely to require shell escaping, I think (commas are less likely to).

In my experience, quotes solve most "escaping" problems, so what is used as a separator mostly doesn't matter as long as the character isn't used for variable expansion. Using the shell's statement separator is a good choice because shell users/programmers naturally avoid files with that character embedded in the name.

Actually, maybe we should just use multiple --xfail options, rather than a single one that takes a list. That way, the individual file names people use isn't relevant.

We can do that, but that won't work for LIT_XFAIL, so one would be forced to use LIT_OPTS and now we've come full circle back to the shell escaping problem.

Of course, after I sent this, I thought of a better way to articulate this. The problem with shell escapes is nesting and getting expansions to happen at the right level. This is why I don't care for LIT_OPTS because that's already one layer of escaping, and if a lit command line option needs further escaping, that's really annoying (for people that know better) and a common source of errors (for people that don't know better, which is to say, most people). This typically manifests when one tries to reason about white space in file names, but other escaping problems can cause nesting challenges.

Ping. Is anybody willing to sign off on this? There is clearly interest by those that maintain downstream repos to have something like --xfail and/or LIT_XFAIL. (I'm partial to the latter due to how my CI works.)

yln added a comment.Feb 18 2021, 11:30 AM

--skip: is it okay if we rename this to --filter-not? It's an uglier name but

  • more "self explanatory", i..e, "opposite of --filter". Anyone who knows --filter will know what it does just by looking the arg name.
  • Discoverable: in the usage and help output, it will appear next to --filter.
  • In the summary, we have a category of Skipped tests: tests that were not run because the user hit [Ctrl+C] or because of overall timeout. --skipped tests would appear in the Excluded category (e.g., --filter and sharding feature). Not using the word skip may help prevent some confusion.

Can we also add the summary counts in the test, so that the test shows that --filter-not excluded tests are counted in the Excluded category? Thanks!
Otherwise, LGTM, will approve today if you split this part out.

--xfail: Thanks everyone for explaining. As someone who also has to deal with downstream test failures, you convinced me that this is useful to have!
Summarizing my understanding: this is about the desire to (temporarily) mark tests as XFAIL in a centralized location (instead of editing individual test files), especially useful for a downstream integration workflows.
@davezarzycki @jhenderson @thopre Is this accurate?

How about --xfail-file which names a file for which each line means one XFAILed test. I imagine that:

  • Working with a file seems better (echo "test/path" >> xfail.txt, sed '/test/d' ./file), than fiddling with the lit invocation itself. Count how many XFAILed tests we have in upstream without even opening the file, etc.
  • The file would "make sense" in version control: line-wise addition/removals. "Didn't this test fail before? What was the issue previously? git log -- xfail.txt and see commit for fix and line removal.

What do you think?

In D96662#2572321, @yln wrote:

--skip: is it okay if we rename this to --filter-not? It's an uglier name but

I like this suggestion for all the reasons @yln gave. However, I'd prefer --filter-out. It has the same advantages @yln gave, such as being obvious if you know what --filter does. A small disadvantage is lack of grammatical symmetry. My issue is that --filter is not a clear name whereas --filter-in would've been clear. Without that context, --filter-not is worse, in my opinion. It sounds like "don't filter these tests", but isn't that the meaning of --filter? --filter-out is clear in any context, in my opinion.

In D96662#2572321, @yln wrote:

--skip: is it okay if we rename this to --filter-not? It's an uglier name but

I like this suggestion for all the reasons @yln gave. However, I'd prefer --filter-out. It has the same advantages @yln gave, such as being obvious if you know what --filter does. A small disadvantage is lack of grammatical symmetry. My issue is that --filter is not a clear name whereas --filter-in would've been clear. Without that context, --filter-not is worse, in my opinion. It sounds like "don't filter these tests", but isn't that the meaning of --filter? --filter-out is clear in any context, in my opinion.

I like --filter-out. Could we maybe introduce --filter-in and make --filter an alias for --filter-in?

In D96662#2572321, @yln wrote:

--skip: is it okay if we rename this to --filter-not? It's an uglier name but

  • more "self explanatory", i..e, "opposite of --filter". Anyone who knows --filter will know what it does just by looking the arg name.
  • Discoverable: in the usage and help output, it will appear next to --filter.
  • In the summary, we have a category of Skipped tests: tests that were not run because the user hit [Ctrl+C] or because of overall timeout. --skipped tests would appear in the Excluded category (e.g., --filter and sharding feature). Not using the word skip may help prevent some confusion.

Can we also add the summary counts in the test, so that the test shows that --filter-not excluded tests are counted in the Excluded category? Thanks!
Otherwise, LGTM, will approve today if you split this part out.

--xfail: Thanks everyone for explaining. As someone who also has to deal with downstream test failures, you convinced me that this is useful to have!
Summarizing my understanding: this is about the desire to (temporarily) mark tests as XFAIL in a centralized location (instead of editing individual test files), especially useful for a downstream integration workflows.
@davezarzycki @jhenderson @thopre Is this accurate?

Yes. Besides the centralisation, it also avoids changing upstream source (i.e. it allows to separate modifications to test as opposed to disabled test).

How about --xfail-file which names a file for which each line means one XFAILed test. I imagine that:

  • Working with a file seems better (echo "test/path" >> xfail.txt, sed '/test/d' ./file), than fiddling with the lit invocation itself. Count how many XFAILed tests we have in upstream without even opening the file, etc.
  • The file would "make sense" in version control: line-wise addition/removals. "Didn't this test fail before? What was the issue previously? git log -- xfail.txt and see commit for fix and line removal.

What do you think?

I think a file can be more convenient when lots of tests need to be disabled. I don't think that's ever been our case and I would personally be wary of disabling too many cases but it could be useful. By the way, would the syntax allow to skip entire directories?

davezarzycki retitled this revision from [lit] Add --skip (inverse of --filter) and `--xfail` to [lit] Add --xfail and --filter-out (inverse of --filter).
davezarzycki edited the summary of this revision. (Show Details)

I've renamed --skip to --filter-out. I prefer "out" to "not" because that matches the documentation for the command.

I'd like to be done with this patch proposal.

If people want to add --xfail-file later (or rename --filter), go for it. I agree that --xfail-file will be useful but I don't need it yet and I think designing that feature is slightly complicated. Personally, if one is going to have a file, why limit it to just XFAIL? Why not support other lit features like REQUIRES or UNSUPPORTED or ALLOW_RETRIES? All of these might be useful to a downstream maintainer that needs to temporarily tweak a test until something is fixed.

I like --filter-out. Could we maybe introduce --filter-in and make --filter an alias for --filter-in?

Possibly. We'd have to decide whether there should be a LIT_FILTER_IN and how it would override or otherwise combine with LIT_FILTER. The aliasing probably should be a separate patch.

I've renamed --skip to --filter-out. I prefer "out" to "not" because that matches the documentation for the command.

Thanks.

I agree that --xfail-file will be useful but I don't need it yet and I think designing that feature is slightly complicated. Personally, if one is going to have a file, why limit it to just XFAIL? Why not support other lit features like REQUIRES or UNSUPPORTED or ALLOW_RETRIES? All of these might be useful to a downstream maintainer that needs to temporarily tweak a test until something is fixed.

I do see some appeal to having a general config file (for test suite users as opposed to test suite authors) instead of an xfail-specific file, but I've formed no strong opinions on the way --xfail is specified.

Other than some inline comments I just added, I'm fine with this patch as is, but I'm going to leave acceptance to reviewers who described their use cases.

llvm/utils/lit/tests/Inputs/xfail-cl/lit.cfg
21

Most of this config appears irrelevant to this test.

llvm/utils/lit/tests/xfail-cl.py
8

Shouldn't these say XFAIL?

yln accepted this revision.Feb 19 2021, 10:00 AM

I am also happy with --filter-out.

I agree that --xfail-file will be useful but I don't need it yet and I think designing that feature is slightly complicated. Personally, if one is going to have a file, why limit it to just XFAIL? Why not support other lit features like REQUIRES or UNSUPPORTED or ALLOW_RETRIES? All of these might be useful to a downstream maintainer that needs to temporarily tweak a test until something is fixed.

Point taken. Thanks for your patience and for this whole effort! :)

LGTM, with Joel's nits plus one small one from myself.

llvm/utils/lit/lit/main.py
77

Can we do

selected_tests = [t for t in discovered_tests if opts.filter.search(t.getFullName()) and not opts.filter_out.search(t.getFullName()]

and have a function mark_xfailing() just above mark_excluded() below?

Thanks!

99–101

Here

This revision is now accepted and ready to land.Feb 19 2021, 10:00 AM

I believe I have addressed all of the feedback to date.

This revision was landed with ongoing or failed builds.Feb 20 2021, 2:44 AM
This revision was automatically updated to reflect the committed changes.
jdenny added inline comments.Feb 20 2021, 6:22 AM
llvm/utils/lit/tests/xfail-cl.py
6

It's END. not END:. Then you shouldn't need those {{ and }} around the XFAIL.

If a lit invocation runs multiple test suites that contain identically named test files, is there any way to make an --xfail entry apply to only one of them?

If a lit invocation runs multiple test suites that contain identically named test files, is there any way to make an --xfail entry apply to only one of them?

As the code exists today, no. That'd need to be a followup patch.