This is an archive of the discontinued LLVM Phabricator instance.

[runtimes][CI] Add a 10 minutes individual test time out
ClosedPublic

Authored by ldionne on Dec 1 2021, 11:29 AM.

Details

Reviewers
None
Group Reviewers
Restricted Project
Commits
rG8b5e4c038ed7: [runtimes][CI] Add a 20 minutes individual test time out
Summary

If a single test has been running for more than 10 minutes on a CI node,
something is wrong and it should time-out instead of running until the
node potentially times out itself.

Diff Detail

Event Timeline

ldionne created this revision.Dec 1 2021, 11:29 AM
ldionne requested review of this revision.Dec 1 2021, 11:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 1 2021, 11:29 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript

Could this be put into the lit config itself, so it applies to even local/developer invocations rather than only CI invocations?

Could this be put into the lit config itself, so it applies to even local/developer invocations rather than only CI invocations?

Sorry for dropping this on the floor -- I've actually been wondering about what's the best answer to this. I believe I've convinced myself that no, the timeout should really only be on the CI nodes. Indeed, the test suite could be run on e.g. a simulator device that is extremely slow, and we wouldn't, in that case, have such a timeout. So I think the correct thing to do is tie this timeout to our CI fleet only, where we know we must have reasonable timing guarantees.

Herald added a project: Restricted Project. · View Herald TranscriptMar 17 2022, 2:15 PM
ldionne updated this revision to Diff 416316.Mar 17 2022, 2:16 PM

Rebase onto main. I suspect this will still fail due to missing psutil on our CI nodes -- I'll add it separately.

ldionne updated this revision to Diff 419512.Mar 31 2022, 10:34 AM
ldionne retitled this revision from [runtimes][CI] Add a 5 minutes individual test time out to [runtimes][CI] Add a 10 minutes individual test time out.
ldionne edited the summary of this revision. (Show Details)

Increase timeout to 10 minutes to account for long TSan tests. We should also try to reduce tests that take too long to run in the test suite as a separate effort.

ldionne updated this revision to Diff 421622.Apr 8 2022, 1:03 PM

15 minutes. Geez how long is that test?

ldionne updated this revision to Diff 421977.Apr 11 2022, 11:15 AM

Bump to 20 minutes. We'll really want to investigate the deque test that takes that long.

ldionne accepted this revision as: Restricted Project.Apr 11 2022, 2:46 PM
This revision was not accepted when it landed; it landed in state Needs Review.Apr 11 2022, 2:47 PM
This revision was automatically updated to reflect the committed changes.

Generally if something's sent for review it shouldn't be committed until it's approved. If you're looking to run the CI without review, you can use arc's --draft argument to produce a phab 'review' that isn't sent for review, this makes it clear whether something's intended for human review or not.

Generally if something's sent for review it shouldn't be committed until it's approved. If you're looking to run the CI without review, you can use arc's --draft argument to produce a phab 'review' that isn't sent for review, this makes it clear whether something's intended for human review or not.

I think I disagree with you here. When people submit things for review even though they plan to just land it when CI passed it is a lot easier to see what people are up to and what changes (now matter how big) are made to the code base. libc++ is relatively small, so the changes easily affect other people's patches.

Generally if something's sent for review it shouldn't be committed until it's approved. If you're looking to run the CI without review, you can use arc's --draft argument to produce a phab 'review' that isn't sent for review, this makes it clear whether something's intended for human review or not.

I think I disagree with you here. When people submit things for review even though they plan to just land it when CI passed it is a lot easier to see what people are up to and what changes (now matter how big) are made to the code base. libc++ is relatively small, so the changes easily affect other people's patches.

The issue is that if someone sends something for review, then commits it without approval - it's possible they felt it needed review, but then committed without it because they weren't willing to wait for review or other reasons. Basically - sending it for review is generally a statement of "this needs a second set of eyes" & then committing without approval seems like walking that back.

If folks want to (& they should, for sure) keep an eye on changes in their area - the place/tools for that are the commits list (filters/searches if you need to pare it down a bit, understandably) and Phabricator's Herald rules ( https://secure.phabricator.com/book/phabricator/article/herald/ )

vitalybuka added inline comments.
libcxx/utils/ci/run-buildbot
89

Tests which runs >1000 are problematic for buildbot
It timeouts if no output for 1200sec https://lab.llvm.org/buildbot/#/builders/237/builds/135

I understand that this shell script is not for buildbot, but existence of such tests is a problem.