If a single test has been running for more than 10 minutes on a CI node,
something is wrong and it should time-out instead of running until the
node potentially times out itself.
Details
- Reviewers
- None
- Group Reviewers
Restricted Project - Commits
- rG8b5e4c038ed7: [runtimes][CI] Add a 20 minutes individual test time out
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
Could this be put into the lit config itself, so it applies to even local/developer invocations rather than only CI invocations?
Sorry for dropping this on the floor -- I've actually been wondering about what's the best answer to this. I believe I've convinced myself that no, the timeout should really only be on the CI nodes. Indeed, the test suite could be run on e.g. a simulator device that is extremely slow, and we wouldn't, in that case, have such a timeout. So I think the correct thing to do is tie this timeout to our CI fleet only, where we know we must have reasonable timing guarantees.
Rebase onto main. I suspect this will still fail due to missing psutil on our CI nodes -- I'll add it separately.
Increase timeout to 10 minutes to account for long TSan tests. We should also try to reduce tests that take too long to run in the test suite as a separate effort.
Bump to 20 minutes. We'll really want to investigate the deque test that takes that long.
Generally if something's sent for review it shouldn't be committed until it's approved. If you're looking to run the CI without review, you can use arc's --draft argument to produce a phab 'review' that isn't sent for review, this makes it clear whether something's intended for human review or not.
I think I disagree with you here. When people submit things for review even though they plan to just land it when CI passed it is a lot easier to see what people are up to and what changes (now matter how big) are made to the code base. libc++ is relatively small, so the changes easily affect other people's patches.
The issue is that if someone sends something for review, then commits it without approval - it's possible they felt it needed review, but then committed without it because they weren't willing to wait for review or other reasons. Basically - sending it for review is generally a statement of "this needs a second set of eyes" & then committing without approval seems like walking that back.
If folks want to (& they should, for sure) keep an eye on changes in their area - the place/tools for that are the commits list (filters/searches if you need to pare it down a bit, understandably) and Phabricator's Herald rules ( https://secure.phabricator.com/book/phabricator/article/herald/ )
libcxx/utils/ci/run-buildbot | ||
---|---|---|
87 | Tests which runs >1000 are problematic for buildbot I understand that this shell script is not for buildbot, but existence of such tests is a problem. |
Tests which runs >1000 are problematic for buildbot
It timeouts if no output for 1200sec https://lab.llvm.org/buildbot/#/builders/237/builds/135
I understand that this shell script is not for buildbot, but existence of such tests is a problem.