
Mark lld/test/ELF as flaky.
Abandoned, Public

Authored by chapuni on Oct 17 2017, 10:07 PM.

Details

Reviewers
ruiu
davide
Summary

They rarely fail (IIRC, exit code 139, i.e. SIGSEGV, without a stack trace) when load is high. At the moment, I'd like to mark them flaky.

Is there any better idea?
FYI, these are the default values in Ubuntu 14.04:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1029228
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1029228
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Diff Detail

Repository
rL LLVM

Event Timeline

chapuni created this revision. Oct 17 2017, 10:07 PM
davide requested changes to this revision. Oct 17 2017, 10:35 PM
davide added a subscriber: davide.

I really don't like this solution, in particular as a global one. Can we try to reproduce the tests that are failing?

This revision now requires changes to proceed. Oct 17 2017, 10:35 PM

@davide, this is why I didn't silently commit such a trivial change. I would be happy if it could be reproduced easily.

grimar added a subscriber: grimar. Oct 18 2017, 1:20 AM
ruiu edited edge metadata. Oct 18 2017, 11:52 AM

I've never seen that before either, so I'm interested in knowing how to reproduce the issue locally.

For example: http://bb9.pgr.jp/#/builders/20/builds/144

I haven't reproduced it in my local tree.
I am happy and sad to hear you guys haven't hit it.

I think I saw that recently (6 Oct 2017).
It was probably this one:

The Buildbot has detected a new failure on builder lld-x86_64-freebsd while building lld.
Full details are available at:
 http://lab.llvm.org:8011/builders/lld-x86_64-freebsd/builds/11107

Buildbot URL: http://lab.llvm.org:8011/

Buildslave for this Build: as-bldslv5

Build Reason: scheduler
Build Source Stamp: [branch trunk] 315054
Blamelist: grimar

BUILD FAILED: failed test_lld

sincerely,
 -The Buildbot

The report is no longer available, so I am not sure :(
I remember the bot blamed me because of a segfault, and I think it was the same test case (ELF/sysroot.s).
I had to force the bot to rebuild the same revision to stop it blaming me, and that worked.

Reproduced with RelWithDebInfo.
Crashing in pthread_detach.

#0  pthread_detach (th=140643334551296) at pthread_detach.c:50
#1  0x00007fea272cf875 in __gthread_detach (__threadid=<optimized out>)
    at /build/buildd/gcc-4.8-4.8.4/build/x86_64-linux-gnu/libstdc++-v3/include/x86_64-linux-gnu/bits/gthr-default.h:674
#2  std::thread::detach (this=this@entry=0x7fea21648e30)
    at ../../../../../src/libstdc++-v3/src/c++11/thread.cc:121
#3  0x00007fea278defe8 in operator() (__closure=0x9fbbe8)
    at /home/tnakamura/llvm/llvm-project/llvm/lib/Support/Parallel.cpp:80
#4  _M_invoke<> (this=0x9fbbe8) at /usr/include/c++/4.8/functional:1732
#5  operator() (this=0x9fbbe8) at /usr/include/c++/4.8/functional:1720
#6  std::thread::_Impl<std::_Bind_simple<(anonymous namespace)::ThreadPoolExecutor::ThreadPoolExecutor(unsigned int)::__lambda5()> >::_M_run(void) (
    this=0x9fbbd0) at /usr/include/c++/4.8/thread:115
#7  0x00007fea272cfa40 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>)
    at ../../../../../src/libstdc++-v3/src/c++11/thread.cc:84
#8  0x00007fea26a3b184 in start_thread (arg=0x7fea21649700)
    at pthread_create.c:312
#9  0x00007fea26f52ffd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

I will investigate later what happened.

I guess the constructor std::thread(F) fails, and then its detach() method crashes.
If we have to live with -fno-exceptions, I think we could rewrite Parallel.cpp without C++11 std::thread, using pthreads directly, so that a failure in pthread_create can be caught.
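
A minimal sketch of the pthreads idea above, not actual LLVM code. The helper names spawnWorker and spawnDetached are hypothetical, and the sketch only assumes that thread creation is what fails under load. It shows how pthread_create reports failure through its return value, so the caller can react without exceptions and never calls pthread_detach on an invalid handle.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <pthread.h>

// Hypothetical helper, not LLVM code: create a worker thread and report
// failure via the return value, which also works under -fno-exceptions.
// Returns 0 on success, otherwise the error number from pthread_create
// (EAGAIN typically means a thread or resource limit was hit).
int spawnWorker(pthread_t *Tid, void *(*Fn)(void *), void *Arg) {
  assert(Tid && Fn && "output handle and entry point are required");
  int Err = pthread_create(Tid, nullptr, Fn, Arg);
  if (Err != 0)
    std::fprintf(stderr, "pthread_create failed: %s\n", std::strerror(Err));
  return Err;
}

// Detach only when creation succeeded, so pthread_detach never sees an
// uninitialized thread handle.
bool spawnDetached(void *(*Fn)(void *), void *Arg) {
  pthread_t Tid;
  if (spawnWorker(&Tid, Fn, Arg) != 0)
    return false;
  return pthread_detach(Tid) == 0;
}
```

With this shape, a caller could fall back to running the task on the current thread when spawnDetached returns false, instead of crashing.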

Or, could I increase any resource limits in ulimit or the kernel? I have tried the ideas I knew.
I will try building with -fexceptions to find out which error is raised.

FYI, I am using 36-core (72 logical processors) host.
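
To compare the limits above with what a process actually sees, here is a small diagnostic sketch. The helper names readLimit, printLimit, and reportThreadLimits are hypothetical. On Linux, threads count toward RLIMIT_NPROC, but whether that limit is the one being hit here is only an assumption. Note that RLIMIT_STACK is reported in bytes, while `ulimit -s` prints kbytes.

```cpp
#include <cassert>
#include <cstdio>
#include <sys/resource.h>
#include <thread>

// Hypothetical diagnostic helper: read one soft/hard limit pair.
bool readLimit(int Resource, struct rlimit *Out) {
  assert(Out && "caller must supply storage for the limit");
  return getrlimit(Resource, Out) == 0;
}

// Print the soft limit; RLIM_INFINITY is shown as "unlimited".
void printLimit(const char *Name, int Resource) {
  struct rlimit L;
  if (!readLimit(Resource, &L)) {
    std::printf("%s: <unavailable>\n", Name);
    return;
  }
  if (L.rlim_cur == RLIM_INFINITY)
    std::printf("%s: unlimited\n", Name);
  else
    std::printf("%s: %llu\n", Name,
                static_cast<unsigned long long>(L.rlim_cur));
}

// Report the limits most likely to matter for thread creation, plus the
// number of hardware threads the C++ runtime detects.
void reportThreadLimits() {
  printLimit("max user processes (RLIMIT_NPROC)", RLIMIT_NPROC);
  printLimit("stack size in bytes (RLIMIT_STACK)", RLIMIT_STACK);
  std::printf("hardware threads: %u\n", std::thread::hardware_concurrency());
}
```

Calling reportThreadLimits() from a test driver on the affected builder would show whether the soft limits match the `ulimit -a` output in the summary.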

chapuni abandoned this revision. Oct 27 2017, 5:57 AM

I will apply this patch locally for as long as the issue persists.