This is an archive of the discontinued LLVM Phabricator instance.

[TSAN] Parameterize the hard-coded threshold of deflake in tsan test
ClosedPublic

Authored by anhtuyen on Jan 30 2020, 6:40 AM.

Details

Summary

A number of testcases in this bucket such as ThreadSanitizer-powerpc64le :: atomic_free3.cpp are designed to deal with intermittent problems. The expected problem does not exist in all executions of the program being tested. The authors of the tests are aware of the issue. They use a script called deflake.bash which runs the executable up to 10 times to deal with the intermittent nature of the tests. By performing more executions, they increase the chances of having at least one of the executions expose the targeted issue.

On test machines with heavier load, the hard-coded threshold of 10 is not sufficient. The intermittent issue is more likely to be hit when the process is on a quiet machine because its threads gets cycles more often. Thus, on a machine with much more active processes, the condition being checked for is less likely to happen on any given execution.

The purpose of this patch is to parameterize the hard-coded threshold used by the compiler-rt/test/tsan/deflake.bash script. The changes to CMakeLists and LIT configuration files are to enable the users to configure the deflake threshold by specifying a positive integer value via a CMake variable. This will be used in the case that a deflake threshold other than 10 is more desirable for their test environment.

--cmake_variables=-D'TSAN_TEST_DEFLAKE_THRESHOLD=<integerValue>'

When the cmake-variable is not defined, the existing value of 10 will be used.

Diff Detail

Event Timeline

anhtuyen created this revision.Jan 30 2020, 6:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 30 2020, 6:40 AM
Herald added subscribers: Restricted Project, steven.zhang, jfb, mgorny. · View Herald Transcript
anhtuyen edited the summary of this revision. (Show Details)Jan 30 2020, 6:45 AM
anhtuyen edited the summary of this revision. (Show Details)Jan 30 2020, 6:49 AM
rnk added a comment.Jan 30 2020, 10:52 AM

More than 10 retries seems excessive. For this particular issue, I think it would be reasonable to mark the test ; UNSUPPORTED: powerpc64. Adding more retries will make the powerpc64 bots slower and less useful.

In D73707#1850128, @rnk wrote:

More than 10 retries seems excessive. For this particular issue, I think it would be reasonable to mark the test ; UNSUPPORTED: powerpc64. Adding more retries will make the powerpc64 bots slower and less useful.

Thanks for your opinion Reid @rnk, but I disagree with your comment based on the following points.
(1) This is to make the value configurable in order to enable users who see the needs to adjust the deflake threshold to a value more suitable for their environment. The existing threshold of 10 is the default and thus there should be no impact to other bots.
(2) I have run the entire LIT (check-all) more than 10 times on the same dedicated machine, and the results are not much slower at all. On average, the time for the entire LIT will increase by 0.26 second if the threshold is 100, and 0.48s for 200.

Threshold of 200Threshold of 100Threshold of 10 (Existing Value)
1st124.04s123.65s122.64s
2nd122.93s123.12s122.49s
3rd123.33s123.26s123s
4th123.21s122.24s121.93s
5th122.6s123.32s122.48s
6th123.29s122.42s123.02s
7th123.01s122.89s122.37s
8th122.57s122.42s122.63s
9th122.61s122.85s122.6s
10th123.34s122.19s122.95s
11th122.98s123.09s122.47s
Average (in seconds)123.08s122.86s122.6s
Increase in seconds vs baseline0.48s0.26s(baseline)
rnk added a subscriber: vitalybuka.Jan 30 2020, 5:18 PM

I think what you are doing is reasonable and practical. I only felt obligated to make the philosophical point that flaky tests are bad, and nobody other than the test owner should have to spend time on them.

In the meantime, I think this is pretty reasonable. Maybe @vitalybuka can help review it.

In D73707#1850870, @rnk wrote:

I think what you are doing is reasonable and practical. I only felt obligated to make the philosophical point that flaky tests are bad, and nobody other than the test owner should have to spend time on them.

In the meantime, I think this is pretty reasonable. Maybe @vitalybuka can help review it.

Thank you very much for your nice word, Reid @rnk . I am very new to LLVM and I hope I can learn more from experts like you.

hubert.reinterpretcast added inline comments.
compiler-rt/test/tsan/deflake.bash
13

Please add quotes around variable expansion here and for further expansions of THRESHOLD.

16

Minor nit: It seems that the usual style uses a space after #?

17

Suggest to use || instead of using ! with &&.

19

This can be

while (( THRESHOLD-- )); do
anhtuyen updated this revision to Diff 241764.Jan 31 2020, 10:13 AM

Thank you very much Hubert @hubert.reinterpretcast for your comments. I have changed the script accordingly.

vitalybuka accepted this revision.Feb 3 2020, 6:07 PM

LGTM

compiler-rt/test/tsan/deflake.bash
17

why do you need this?

I'd expect

for i in $(seq 1 ${THRESHOLD}); do

already exits with error

This revision is now accepted and ready to land.Feb 3 2020, 6:07 PM
anhtuyen marked an inline comment as done.Feb 3 2020, 6:30 PM
anhtuyen added inline comments.
compiler-rt/test/tsan/deflake.bash
17

Hi Vitaly @vitalybuka
Thank you very much for your review and acceptance.
The condition

[[  "${THRESHOLD}" =~ ^[0-9]+$ ]] || exit 1

is to exit early for cases such as that of a negative value.

This revision was automatically updated to reflect the committed changes.
delcypher added inline comments.
compiler-rt/test/tsan/lit.cfg.py
78

@anhtuyen This is breaking our tests. There's a missing space at the end of the substitution which means the command produced is something like

deflake.bash 10/path/to/binary

when it should be

deflake.bash 10 /path/to/binary
anhtuyen added inline comments.Feb 19 2020, 1:04 PM
compiler-rt/test/tsan/lit.cfg.py
78

Hello @delcypher
This does not happen on our machines, but sorry for breaking your test. The fix is so simple that I can commit it in a few minutes. However, can you give me some info about your machine so that I can test it (if I have a similar one) before pushing the fix.

delcypher added inline comments.Feb 19 2020, 1:26 PM
compiler-rt/test/tsan/lit.cfg.py
78

I've landed a quick fix in ddd2257f48a83843aa35e7cbd337c3fe0c103598

anhtuyen marked an inline comment as done.Feb 19 2020, 1:41 PM
anhtuyen added inline comments.
compiler-rt/test/tsan/lit.cfg.py
78

Yes, mine collided it with yours above! Thank you!