This is an archive of the discontinued LLVM Phabricator instance.


Differential D56064

More tolerance for flaky tests in libc++ on NetBSD
Closed, Public

Authored by krytarowski on Dec 24 2018, 6:05 AM.


Details

Reviewers

mgorny
EricWF
serge-sans-paille

Commits

rG2803bcf5b0db: More tolerance for flaky tests in libc++ on NetBSD
rL350170: More tolerance for flaky tests in libc++ on NetBSD
rCXX350170: More tolerance for flaky tests in libc++ on NetBSD

Summary

Tests marked with the flaky attribute ("FLAKY_TEST.")
can still report false positives (spurious failures),
both in local test runs and on the NetBSD buildbot.

Additionally, a number of tests that are not marked
with the flaky attribute (probably all of the threaded
ones) are flaky on NetBSD.

The ideal solution on the libcxx side would be to raise
the maximum number of retries for NetBSD and mark each
failing test with the flaky flag; however, this adds
maintenance burden and requires constant monitoring of
flaky tests.

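To reduce that work, treat tests marked as flaky as
even more flaky on NetBSD and tolerate flakiness in all
other tests on NetBSD: every retry budget is tripled
there, giving 9 attempts to tests marked flaky and 3 to
all other tests.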
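A minimal Python sketch of the resulting retry budget, mirroring the arithmetic in the format.py hunk of this diff. The helper max_retries and its system argument are hypothetical; format.py computes max_retry inline in _evaluate_pass_test.

```python
import platform


def max_retries(is_flaky, system=None):
    """Attempts a libc++ pass test gets under this patch (hypothetical helper).

    Mirrors the inline logic in format.py: tests marked flaky get 3 attempts,
    all other tests get 1, and both budgets are tripled on NetBSD.
    """
    if system is None:
        system = platform.system()  # e.g. 'Linux', 'Darwin', 'NetBSD'
    max_retry = 3 if is_flaky else 1
    if system in ['NetBSD']:
        max_retry = max_retry * 3
    return max_retry


if __name__ == '__main__':
    for system in ['Linux', 'NetBSD']:
        for is_flaky in [False, True]:
            label = 'flaky' if is_flaky else 'unmarked'
            print(system, label, max_retries(is_flaky, system))
```

Run as a script, it prints the four combinations: on NetBSD, tests marked flaky get 9 attempts and unmarked tests get 3, while other systems keep 3 and 1.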

Diff Detail

Repository: rCXX libc++

Event Timeline

krytarowski created this revision. · Dec 24 2018, 6:05 AM

Herald added a reviewer: EricWF. · Dec 24 2018, 6:05 AM

Herald added subscribers: llvm-commits, christof.

http://lab.llvm.org:8011/builders/lldb-amd64-ninja-netbsd8/builds/17863 Still valid.

Ping? The bot keeps reporting crashes.

Let's do it.

This revision is now accepted and ready to land. · Dec 30 2018, 12:41 AM

Closed by commit rCXX350170: More tolerance for flaky tests in libc++ on NetBSD (authored by kamil). · Dec 30 2018, 3:08 PM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: ldionne. · Dec 30 2018, 3:08 PM

I think the tests pointed out by http://lab.llvm.org:8011/builders/lldb-amd64-ninja-netbsd8/builds/17863 are always flaky, not only on FreeBSD. Also, this commit will affect all tests that are marked as flaky, not only the ones that are failing in lldb-amd64-ninja-netbsd8. Instead, I would suggest either taking the time to fix the flaky tests, or always making the max retry equal to 9 for flaky tests. I don't see a reason for tests marked as flaky not to have a higher retry count (speed shouldn't be a problem since it only affects flaky tests).
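To make the speed argument concrete, here is a toy model in Python. It assumes each attempt of a flaky test fails independently with a fixed probability p, which is an assumption (real flakes are often correlated with machine load). Because the retry loop stops at the first passing attempt, extra retries only cost time when attempts actually fail. The function names are hypothetical.

```python
def spurious_failure_rate(p, max_retry):
    """Probability that all max_retry attempts fail (independent attempts)."""
    return p ** max_retry


def expected_runs(p, max_retry):
    """Expected attempts: attempt i+1 runs only if the first i attempts failed."""
    return sum(p ** i for i in range(max_retry))


if __name__ == '__main__':
    p = 0.2  # assumed per-attempt failure probability of a flaky test
    for max_retry in (1, 3, 9):
        print(max_retry, spurious_failure_rate(p, max_retry),
              round(expected_runs(p, max_retry), 4))
```

With p = 0.2, three attempts leave a spurious failure rate of 0.008 at about 1.24 runs per test, and nine attempts lower it to about 5e-7 at about 1.25 runs.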

In D56064#1343210, @ldionne wrote:

I think the tests pointed out by http://lab.llvm.org:8011/builders/lldb-amd64-ninja-netbsd8/builds/17863 are always flaky, not only on FreeBSD.

NetBSD, not FreeBSD.

Also, this commit will affect all tests that are marked as flaky, not only the ones that are failing in lldb-amd64-ninja-netbsd8. Instead, I would suggest either taking the time to fix the flaky tests, or always making the max retry equal to 9 for flaky tests. I don't see a reason for tests marked as flaky not to have a higher retry count (speed shouldn't be a problem since it only affects flaky tests).

I've been testing the patch locally, and after a few tries I've identified a dozen flaky tests that are not marked as flaky; some of the marked ones still crash within 3 attempts.

I cannot spend time tracking the buildbot results 24/7 and marking each new flaky test with this flag. I needed a big hammer here to work around the problem and move on.

The *3 multiplier is intentional, but it is debatable whether the limit should always be 9. The current state is good enough.

I would rather revert this, and fix the flaky tests. And if there are flaky tests that aren't marked flaky then we need to do that.

Herald added a reviewer: serge-sans-paille. · Jan 5 2019, 11:34 AM

Please revert this change.

In D56064#1347394, @EricWF wrote:

I would rather revert this, and fix the flaky tests. And if there are flaky tests that aren't marked flaky then we need to do that.

Do you volunteer to monitor the buildbot for flaky reports?

I'm going to revert it soon and see what happens. Please keep an eye on http://lab.llvm.org:8011/builders/lldb-amd64-ninja-netbsd8

I was receiving complaints from other developers about flakiness.

Diffusion mentioned this in rL350477: Revert "D56064: More tolerance for flaky tests in libc++ on NetBSD". · Jan 5 2019, 12:15 PM

Reverted in r350477.

Diffusion mentioned this in rCXX350477: Revert "D56064: More tolerance for flaky tests in libc++ on NetBSD". · Jan 5 2019, 12:19 PM

In D56064#1347397, @krytarowski wrote:

In D56064#1347394, @EricWF wrote:

I would rather revert this, and fix the flaky tests. And if there are flaky tests that aren't marked flaky then we need to do that.

Do you volunteer to monitor the buildbot for flaky reports?

I'm always watching the buildbots when I commit and when I see flaky tests I attempt to fix or improve them.

I want each flaky test documented with // FLAKY_TEST and we should always attempt to fix a flaky test before we adjust the test suite to tolerate it.
This patch makes it much easier to write flaky tests without anyone knowing about or documenting them.
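For reference, the keyword the format looks up is 'FLAKY_TEST.' (with the trailing period, per the diff), written as a comment such as // FLAKY_TEST. in the test source. The sketch below is a simplified, hypothetical stand-in for lit's keyword parser that only recognizes a comment line consisting of that tag; it could be used to list which tests in a tree are documented as flaky.

```python
import os

FLAKY_TAG = 'FLAKY_TEST.'


def is_marked_flaky(source_text):
    """True if some line is a '//' comment consisting of the FLAKY_TEST. tag.

    Simplified, hypothetical check; lit's real keyword parser is more general.
    """
    for line in source_text.splitlines():
        stripped = line.strip()
        if stripped.startswith('//') and stripped[2:].strip() == FLAKY_TAG:
            return True
    return False


def find_flaky_tests(root):
    """Yield paths of *.pass.cpp files under root that carry the tag."""
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith('.pass.cpp'):
                path = os.path.join(dirpath, name)
                with open(path, encoding='utf-8', errors='replace') as f:
                    if is_marked_flaky(f.read()):
                        yield path
```

Note that a bare // FLAKY_TEST without the period is not recognized by this check.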

PS. If you're running into constant flakes in the threading tests, try passing --shuffle to prevent all the thread hungry tests from being run together.
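A rough Python sketch of why --shuffle (lit's option for running tests in random order) can help, assuming the default order keeps tests from the same directory adjacent, as the comment implies. It counts the largest number of thread-hungry tests among any window of jobs consecutive tests, a crude stand-in for what runs concurrently with -j jobs; lit's actual scheduling is not modelled, and the test names and helpers are hypothetical.

```python
import random


def peak_heavy_in_window(order, is_heavy, jobs):
    """Largest count of heavy tests among any `jobs` consecutive tests.

    Rough proxy (an assumption, not how lit schedules work) for how many
    thread-hungry tests run at once when `jobs` workers pull tests in order.
    """
    flags = [1 if is_heavy(name) else 0 for name in order]
    window = sum(flags[:jobs])
    peak = window
    for i in range(jobs, len(flags)):
        # Slide the window by one test: add the new one, drop the oldest.
        window += flags[i] - flags[i - jobs]
        peak = max(peak, window)
    return peak


def is_thread_test(name):
    return name.startswith('std/thread/')


if __name__ == '__main__':
    # Hypothetical test names; sorting by path keeps the thread tests adjacent.
    tests = sorted(
        ['std/algorithms/t%02d.pass.cpp' % i for i in range(16)]
        + ['std/thread/t%02d.pass.cpp' % i for i in range(8)]
        + ['std/utilities/t%02d.pass.cpp' % i for i in range(16)])
    shuffled = list(tests)
    random.Random(0).shuffle(shuffled)  # fixed seed for a repeatable demo
    print('in path order:', peak_heavy_in_window(tests, is_thread_test, 4))
    print('shuffled:', peak_heavy_in_window(shuffled, is_thread_test, 4))
```

In path order the eight std/thread tests are adjacent, so the first result equals the job count (4); after shuffling, the thread tests are usually spread out and the peak is typically lower, though a particular shuffle is not guaranteed to improve it.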

Also, please report any flaky failures you see in llvm.org/PR40235 and I will fix them ASAP.

In D56064#1347410, @krytarowski wrote:

I'm going to revert it soon and see what happens. Please keep an eye on http://lab.llvm.org:8011/builders/lldb-amd64-ninja-netbsd8

I was receiving complaints from other developers about flakiness.

Were the complaints about the lldb-amd64-ninja-netbsd8 bot specifically? Because if the tests are flaky on all NetBSD systems, regardless of hardware and load, then we should find out why.
If it's just the one bot under load, we should target our fix to that.
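One way to target the fix to a single loaded bot, sketched under assumptions: lit forwards --param NAME=VALUE options to the test format through lit_config.params, so a bot's configuration could opt in to a larger retry budget instead of keying on platform.system(). The parameter name flaky_retry_multiplier and the helper below are hypothetical; nothing like them exists in this patch.

```python
def max_retries_for_bot(is_flaky, params):
    """Retry budget with an opt-in multiplier (hypothetical parameter).

    params is a dict like lit_config.params; a loaded bot would pass
    --param flaky_retry_multiplier=3, every other configuration keeps 1.
    """
    max_retry = 3 if is_flaky else 1
    multiplier = int(params.get('flaky_retry_multiplier', '1'))
    if multiplier < 1:
        raise ValueError('flaky_retry_multiplier must be at least 1')
    return max_retry * multiplier


print(max_retries_for_bot(True, {}))                               # 3
print(max_retries_for_bot(True, {'flaky_retry_multiplier': '3'}))  # 9
```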

In D56064#1347535, @EricWF wrote:

In D56064#1347410, @krytarowski wrote:

I'm going to revert it soon and see what happens. Please keep an eye on http://lab.llvm.org:8011/builders/lldb-amd64-ninja-netbsd8

I was receiving complaints from other developers about flakiness.

Were the complaints about the lldb-amd64-ninja-netbsd8 bot specifically? Because if the tests are flaky on all NetBSD systems, regardless of hardware and load, then we should find out why.
If it's just the one bot under load, we should target our fix to that.

I got complaints through private messages.

I will continue in Bugzilla.

Diffusion mentioned this in rGd34e9c63c443: Merge libcxx: Revert "D56064: More tolerance for flaky tests in libc++ on…. · Dec 12 2019, 12:35 PM

Revision Contents

Path                            Size
utils/libcxx/test/format.py     7 lines

Diff 179743

utils/libcxx/test/format.py

Lines prefixed with "+" are added by this revision; all other lines are unchanged context.

 #===----------------------------------------------------------------------===##
 #
 # The LLVM Compiler Infrastructure
 #
 # This file is dual licensed under the MIT and the University of Illinois Open
 # Source Licenses. See LICENSE.TXT for details.
 #
 #===----------------------------------------------------------------------===##

 import copy
 import errno
 import os
 import time
 import random
+import platform

 import lit.Test # pylint: disable=import-error
 import lit.TestRunner # pylint: disable=import-error
 from lit.TestRunner import ParserKind, IntegratedTestKeywordParser \
     # pylint: disable=import-error

 from libcxx.test.executor import LocalExecutor as LocalExecutor
 import libcxx.util
[... 174 unchanged lines not shown; the next hunk is inside def _evaluate_pass_test(self, test, tmpBase, lit_config, ...]
             # TODO: Only list actually needed files in file_deps.
             # Right now we just mark all of the .dat files in the same
             # directory as dependencies, but it's likely less than that. We
             # should add a `// FILE-DEP: foo.dat` to each test to track this.
             data_files = [os.path.join(local_cwd, f)
                           for f in os.listdir(local_cwd) if f.endswith('.dat')]
             is_flaky = self._get_parser('FLAKY_TEST.', parsers).getValue()
             max_retry = 3 if is_flaky else 1
+
+            # LIBC++ tests tend to be more flaky on NetBSD, so add more retries.
+            # We don't do this on other platforms because it's slower.
+            if platform.system() in ['NetBSD']:
+                max_retry = max_retry * 3
+
             for retry_count in range(max_retry):
                 cmd, out, err, rc = self.executor.run(exec_path, [exec_path],
                                                       local_cwd, data_files,
                                                       env)
                 report = "Compiled With: '%s'\n" % ' '.join(compile_cmd)
                 report += libcxx.util.makeReport(cmd, out, err, rc)
                 if rc == 0:
                     res = lit.Test.PASS if retry_count == 0 else lit.Test.FLAKYPASS
[... 52 unchanged lines not shown ...]
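The retry loop in the hunk above can be exercised in isolation with a simplified Python model. run_once stands in for self.executor.run and returns only an exit code; the helpers are hypothetical, and reporting FAIL once every attempt has failed is an assumption about the code elided after the hunk.

```python
PASS, FLAKYPASS, FAIL = 'PASS', 'FLAKYPASS', 'FAIL'


def run_with_retries(run_once, max_retry):
    """Simplified model of the retry loop in _evaluate_pass_test.

    run_once stands in for self.executor.run and returns an exit code.
    Returning FAIL once every attempt has failed is an assumption about
    the code elided after the hunk.
    """
    for retry_count in range(max_retry):
        rc = run_once()
        if rc == 0:
            # A pass on a later attempt is reported as a flaky pass.
            return PASS if retry_count == 0 else FLAKYPASS
    return FAIL


def scripted(exit_codes):
    """Fake runner that replays a fixed sequence of exit codes."""
    codes = iter(exit_codes)
    return lambda: next(codes)


if __name__ == '__main__':
    # A test that crashes twice and then passes.
    for max_retry in (1, 3, 9):
        print(max_retry, run_with_retries(scripted([1, 1, 0]), max_retry))
```

A test that crashes twice and then passes is reported as FAIL with a single attempt and as FLAKYPASS with 3 or 9 attempts, which is how the NetBSD multiplier turns such crashes into flaky passes.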