This is an archive of the discontinued LLVM Phabricator instance.

[Zorg] Add timeouts to OpenMP tests.
Accepted, Public

Authored by Meinersbur on Jul 30 2021, 2:16 PM.

Details

Summary

This patch sets an individual test timeout of 10 seconds. Due to bug llvm.org/PR51235, some tests sometimes take 20 minutes to execute. The timeout is meant to reduce the overall build time and allow detection of tests that take an abnormally long time to execute.

The timeout of 10s was suggested by @tianshilei1992 and currently causes 7 tests to fail (see http://meinersbur.de:8011/#/builders/1). I don't know whether all of them are due to llvm.org/PR51235 or whether some regularly take longer than 10 seconds.

Event Timeline

Meinersbur created this revision. Jul 30 2021, 2:16 PM
Meinersbur requested review of this revision. Jul 30 2021, 2:16 PM

My development machine has an NVIDIA GTX2080 with 8GB of GPU memory. My extra arguments are -j 4 --timeout=10. I don't know whether you already set the number of parallel jobs; some test cases really need a large amount of GPU memory, and 4 in my case makes sure that there are no false failures because of OOM. If you haven't set that, I suggest doing so.
As for the timeout, 10 seconds is long enough to run any existing test case on my machine.
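
As a rough illustration of the arguments mentioned above, the sketch below assembles an llvm-lit command line with a job limit and a per-test timeout. The helper name build_lit_command and the test directory are hypothetical and not taken from the patch; only the -j and --timeout options come from the comment.

```python
import shlex


def build_lit_command(test_dir, jobs=4, timeout_s=10, lit="llvm-lit"):
    """Return an llvm-lit argv list limiting parallel jobs and per-test time.

    Hypothetical helper for illustration; it is not part of Zorg or lit.
    """
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    if timeout_s < 0:
        raise ValueError("timeout_s must be non-negative")
    return [lit, "-j", str(jobs), f"--timeout={timeout_s}", test_dir]


# The test directory below is a placeholder, not a path from the patch.
cmd = build_lit_command("openmp/runtime/test")
print(shlex.join(cmd))
```

Keeping the job count as a parameter makes it easy to lower it on machines where GPU memory, rather than CPU cores, is the limiting resource.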

Meinersbur retitled this revision from [Zorg] Add timputs to OpenMP tests. to [Zorg] Add timeouts to OpenMP tests. Aug 2 2021, 12:55 PM

My development machine has an NVIDIA GTX2080 with 8GB of GPU memory. My extra arguments are -j 4 --timeout=10. I don't know whether you already set the number of parallel jobs; some test cases really need a large amount of GPU memory, and 4 in my case makes sure that there are no false failures because of OOM.

The system has "only" 6 cores and no SMT/hyperthreads. That is, llvm-lit by default launches 6 threads. I haven't observed OOM errors yet.

This revision is now accepted and ready to land. Aug 2 2021, 4:10 PM
Meinersbur updated this revision to Diff 364604. Aug 5 2021, 1:43 PM

Changed timeout to 90s

Meinersbur added a comment. Edited Aug 5 2021, 1:50 PM

After llvm.org/PR51235 was fixed, I measured the individual test execution times on the worker using --time-tests -j1. These are all the tests that run for longer than 5s:

63.08s: libomp :: tasking/omp_taskyield.c
60.10s: libomp :: tasking/omp_task_final.c
50.09s: libomp :: tasking/omp_taskwait.c
50.09s: libomp :: tasking/omp_task.c
50.07s: libomp :: api/omp_get_wtime.c
10.09s: libomp :: tasking/omp_task_if.c
10.06s: libomp :: flush/omp_flush.c
6.37s: libomp :: barrier/omp_barrier.c
5.10s: libomptarget :: nvptx64-nvidia-cuda :: offloading/bug49334.cpp
5.05s: libomp :: tasking/taskdep_if0_2.c
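
A list like the one above can be extracted from a saved timing report with a few lines of Python. This is only a sketch: the function name slow_tests is hypothetical, the regular expression assumes lines shaped exactly like those quoted here, and the last sample line is invented to show the filtering.

```python
import re

# Matches lines such as "63.08s: libomp :: tasking/omp_taskyield.c".
# Assumption: the report uses exactly this "<seconds>s: <test name>" layout.
TIMING_LINE = re.compile(r"^\s*(?P<seconds>\d+(?:\.\d+)?)s:\s+(?P<name>.+?)\s*$")


def slow_tests(report, threshold_s=5.0):
    """Return (seconds, test name) pairs above threshold_s, slowest first."""
    found = []
    for line in report.splitlines():
        match = TIMING_LINE.match(line)
        if match is None:
            continue  # skip blank lines and anything that is not a timing line
        seconds = float(match.group("seconds"))
        if seconds > threshold_s:
            found.append((seconds, match.group("name")))
    return sorted(found, reverse=True)


sample = """\
63.08s: libomp :: tasking/omp_taskyield.c
10.06s: libomp :: flush/omp_flush.c
5.05s: libomp :: tasking/taskdep_if0_2.c
0.42s: libomp :: hypothetical/fast_test.c
"""

for seconds, name in slow_tests(sample):
    print(f"{seconds:6.2f}s  {name}")
```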

Times with -j6 were slightly longer (maybe 20%) for tests running < 10s. To account for this and for jitter, I changed the timeout to 90s. (http://meinersbur.de:8011 is currently running fine with a timeout of 70s.)
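
One way to sanity-check the 90s figure is the back-of-the-envelope calculation below. It is not necessarily how the value was chosen: it assumes that the roughly 20% slowdown observed for short tests also applies to the slowest test, and the 30s rounding step and the name suggest_timeout are hypothetical.

```python
import math

slowest_serial_s = 63.08  # tasking/omp_taskyield.c, measured with -j1
parallel_overhead = 0.20  # assumption: the ~20% seen for short tests applies here too


def suggest_timeout(slowest_s, overhead, step_s=30):
    """Scale the slowest time by the overhead, then round up to a multiple of step_s."""
    estimate = slowest_s * (1.0 + overhead)
    return estimate, math.ceil(estimate / step_s) * step_s


estimate, timeout = suggest_timeout(slowest_serial_s, parallel_overhead)
print(f"estimated worst case: {estimate:.1f}s, suggested timeout: {timeout}s")
```

Under these assumptions the estimate is about 75.7s, which rounds up to 90s. That a 70s timeout is reported as working may mean the 20% figure overstates the slowdown for the longest tests.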

Waiting for @gkistanova to approve D107193 before I can reconnect my worker.