Page MenuHomePhabricator

[OpenMP] [test] Fix intermittent omp_get_wtime.c test failures

Authored by jlpeyton on Aug 20 2021, 2:37 PM.



The omp_get_wtime.c test fails intermittently if the recorded times are off by too much which can happen when many tests are run in parallel.
Instead of failing if one timing is a little off, take average of 100 timings minus the 10 worst. Put the threshold at a generous 33%.

Brought to attention by Ron Lieberman (Thanks!)

Diff Detail

Event Timeline

jlpeyton created this revision.Aug 20 2021, 2:37 PM
jlpeyton requested review of this revision.Aug 20 2021, 2:37 PM
ronlieb accepted this revision.Aug 20 2021, 3:05 PM

i applied the patch , compiled and ran it 200+ invocations.
no fails, thanks for fixing this

This revision is now accepted and ready to land.Aug 20 2021, 3:05 PM
JonChesterfield accepted this revision.Aug 20 2021, 3:07 PM

Complicated, but robust wall clock timing was always going to be complicated. This looks reasonable and more robust that the current one.

Timing usually involves a fixed size integer that occasionally overflows so a little care is needed around end - start. I don't know how that works out if the return type is a double, maybe it abruptly goes back to zero, but it's hard to guess at what point. However wtime() is specified to return a floating point value so here we are.

If this test still has issues after the latest patch changing to 100%, basically avg() < 2* expected, we could change to median() < 2*expected. This would effectively ignore the larger half of values.

We might also consider dropping this test, it's starting to feel quite a lot more complicated than the implementation

Flang does a more basic test for the time function without risk of failing because of OS scheduling jitter: rG4498137bd7857c07921b4cd6313baac62ead24e2

In my local test execution I just got:

error: average of 100 runs (0.029358) is of by 193.581486%

I think, changing the test as suggested by @Meinersbur sounds reasonable.

simpler approach like flang per MKruse does seem more attractive for this test. agree.