This is an archive of the discontinued LLVM Phabricator instance.

[lit] Add builtin support for flaky tests in lit
ClosedPublic

Authored by ldionne on Mar 17 2020, 8:13 AM.

Details

Summary

This commit adds a new keyword in lit called ALLOW_FAILURES. This keyword
takes a single integer as an argument, and it allows the test to fail that
number of times before it first succeeds.

This work attempts to make the existing test_retry_attempts more flexible
by allowing by-test customization, as well as eliminate libc++'s FLAKY_TEST
custom logic.

Diff Detail

Event Timeline

ldionne created this revision.Mar 17 2020, 8:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 17 2020, 8:13 AM

Note that an alternative design would be to drop the integer from ALLOW_FAILURES: and make it a tag keyword only. Then, config.test_retry_attempts could be applied only on the tests that are marked with that keyword. I actually prefer this approach, however it is a behavior change for existing users of config.test_retry_attempts.

Also, I'm open to changing ALLOW_FAILURES to something else, like ALLOW_RETRIES. Suggestions welcome.

Finally, note that this doesn't handle XFAIL tests that would succeed repeatedly. This is actually an existing condition of config.test_retry_attempts, and the root cause is that we translate test results higher up in the call chain, near _execute in worker.py. Unfortunately, handling ALLOW_FAILURES at that point involves parsing the test several times, which messes up things like test.xfails. Since this is a pre-existing condition, I did not try to fix it cause it would have required a probably-breaking refactoring.

ldionne added a subscriber: EricWF.Mar 17 2020, 8:19 AM
rnk added a comment.Mar 17 2020, 9:39 AM

Note that an alternative design would be to drop the integer from ALLOW_FAILURES: and make it a tag keyword only. Then, config.test_retry_attempts could be applied only on the tests that are marked with that keyword. I actually prefer this approach, however it is a behavior change for existing users of config.test_retry_attempts.

I guess I prefer this approach for now. So far, the big users of lit (clang and llvm) have few flaky tests, so lit has had no support for retries or deflaking. With your keyword approach, hopefully flaky tests will not proliferate too quickly.

Also, I'm open to changing ALLOW_FAILURES to something else, like ALLOW_RETRIES. Suggestions welcome.

I guess I like ALLOW_RETRIES: N the best. My other idea was RETRY_LIMIT.

Finally, note that this doesn't handle XFAIL tests that would succeed repeatedly. This is actually an existing condition of config.test_retry_attempts, and the root cause is that we translate test results higher up in the call chain, near _execute in worker.py. Unfortunately, handling ALLOW_FAILURES at that point involves parsing the test several times, which messes up things like test.xfails. Since this is a pre-existing condition, I did not try to fix it cause it would have required a probably-breaking refactoring.

I don't think we need to put any effort into making this work. Flaky failing tests should be disabled instead.

Basically, looks good, but I'd suggest a different tag. Thanks!

In D76288#1926868, @rnk wrote:

[...]

I don't think we need to put any effort into making this work. Flaky failing tests should be disabled instead.

Basically, looks good, but I'd suggest a different tag. Thanks!

Great, thanks! I'll change the keyword. Two questions:

  1. Is this set of reviewers fine for lit? I'm not familiar with that component.
  2. Is there documentation for the ShTest format? I could add something to https://llvm.org/docs/TestingGuide.html, or maybe we're better off keeping this feature on the down low?
ldionne updated this revision to Diff 250861.Mar 17 2020, 12:29 PM

Rename the keyword to ALLOW_RETRIES

rnk accepted this revision.Mar 17 2020, 1:22 PM
rnk added a subscriber: yln.

lgtm

As far as code ownership goes, I think @yln has been actively working on lit recently, and might be a good reviewer. I think @delcypher has moved on to other projects, but I can't speak for them.

llvm/utils/lit/lit/TestRunner.py
1508

I'm curious, do you find this logic of having one try and then adding N retries reasonable or confusing? I wasn't sure about it when I wrote it. Should the term "retry" be exclusive from the first try?

llvm/utils/lit/tests/allow-retries.py
3 ↗(On Diff #250861)

I see, cleaning up the counter file is taken care of at a higher level.

This revision is now accepted and ready to land.Mar 17 2020, 1:22 PM
ldionne added a reviewer: yln.Mar 17 2020, 1:33 PM
ldionne marked 2 inline comments as done.
ldionne added inline comments.
llvm/utils/lit/lit/TestRunner.py
1508

I think it is reasonable. For me, a *retry* is something you do after the first try. So the total number of attempts is retries + 1 -- it makes sense that way in my head.

llvm/utils/lit/tests/allow-retries.py
3 ↗(On Diff #250861)

I'm cleaning up *before* I run the test to make sure the counter file does not already exist, which is the case when running the tests over and over again.

yln accepted this revision.Mar 17 2020, 2:51 PM

This looks like a reasonable way to enable per-test retries. (I feel that test_retry_attempts was tacked on as an afterthought.)
Thanks for adding tests, and also thanks to Reid for giving good feedback.

LGTM, with one quick ask to add 2 more tests to document the behavior/intereaction with config.test_retry_attempts:

  • test_retry_attempts (without ALLOW_RETRIES) should still be respected (this test should have been there in the first place).
  • ALLOW_RETRIES overwrites test_retry_attempts, when both are specified.
ldionne updated this revision to Diff 251183.Mar 18 2020, 3:02 PM

Add tests as requested

In D76288#1927738, @yln wrote:

LGTM, with one quick ask to add 2 more tests to document the behavior/intereaction with config.test_retry_attempts:

  • test_retry_attempts (without ALLOW_RETRIES) should still be respected (this test should have been there in the first place).
  • ALLOW_RETRIES overwrites test_retry_attempts, when both are specified.

Thanks for the feedback, I implemented the tests. I'll commit now.

This revision was automatically updated to reflect the committed changes.