The target-select-so-path test might hang on
some platforms. The reason of that behavior
was in incorrect usage of Filecheck and lldb-mi
processes. Instead of redirecting lldb-mi's output
to Filecheck, we should run lldb-mi session,
finish the session, collect its output and then pass
it to Filecheck.
Also, this patch adds a timer to the test to prevent
it from hanging in the future.
Details
Diff Detail
Event Timeline
It looks like it would be easy to forget to add -gdb-exit to every test. Wouldn't it be better to have the lldb-mi automatically produce a -gdb-exit command when it reads EOF from stdin?
I found out that the reason of hanging of the test suite was in incorrect usage of Filecheck and lldb-mi processes in target-select-so-path.test. Current patch has been tested on CentOS 7.
Can you provide some logs? For example, you might be able to run this test (llvm-lit -a -vv .../target-select-so-path.test) and kill lldb-mi and filecheck processes if they hang. Also, information about which processes hang would be useful too (lldb-mi or filecheck or maybe both).
Alexander, please add a timeout to be sure that the test is not hanging even if something goes wrong. A possible solution for Python 2:
Thanks to @tatyana-krasnukha for the idea about a timer. Added a timer to target-select-so-path test.
I'm accepting this as seems to improve the situation, even though it's not a complete fix.
(Not sure about the 2-minute timeout value though, but we'll see how the bots react).
Sure, here is the output: After some debugging it seems that our select wrapper code is failing. We probably could fix this by terminating the process like in the other error cases in the same method (grep for the "failed to write to the unnamed pipe" error to see the code).
******************** TEST 'lldb :: tools/lldb-mi/target/target-select-so-path.test' FAILED ******************** Script: -- : 'RUN: at line 3'; /home/teemperor/.llvm/rel-build/./bin/clang -o /home/teemperor/.llvm/rel-build/tools/lldb/lit/tools/lldb-mi/target/Output/target-select-so-path.test.tmp /home/teemperor/.llvm/llvm/tools/lldb/lit/tools/lldb-mi/target/inputs/main.c -g : 'RUN: at line 4'; python /home/teemperor/.llvm/llvm/tools/lldb/lit/tools/lldb-mi/target/inputs/target-select-so-path.py "/home/teemperor/.llvm/rel-build/bin/lldb-server gdbserver" "/home/teemperor/.llvm/rel-build/bin/lldb-mi --synchronous /home/teemperor/.llvm/rel-build/tools/lldb/lit/tools/lldb-mi/target/Output/target-select-so-path.test.tmp" /home/teemperor/.llvm/llvm/tools/lldb/lit/tools/lldb-mi/target/target-select-so-path.test -- Exit Code: 143 Command Output (stderr): -- + : 'RUN: at line 3' + /home/teemperor/.llvm/rel-build/./bin/clang -o /home/teemperor/.llvm/rel-build/tools/lldb/lit/tools/lldb-mi/target/Output/target-select-so-path.test.tmp /home/teemperor/.llvm/llvm/tools/lldb/lit/tools/lldb-mi/target/inputs/main.c -g + : 'RUN: at line 4' + python /home/teemperor/.llvm/llvm/tools/lldb/lit/tools/lldb-mi/target/inputs/target-select-so-path.py '/home/teemperor/.llvm/rel-build/bin/lldb-server gdbserver' '/home/teemperor/.llvm/rel-build/bin/lldb-mi --synchronous /home/teemperor/.llvm/rel-build/tools/lldb/lit/tools/lldb-mi/target/Output/target-select-so-path.test.tmp' /home/teemperor/.llvm/llvm/tools/lldb/lit/tools/lldb-mi/target/target-select-so-path.test failed to write to the unnamed pipe: timed out /home/teemperor/.llvm/rel-build/tools/lldb/lit/tools/lldb-mi/target/Output/target-select-so-path.test.script: line 2: 7284 Terminated python /home/teemperor/.llvm/llvm/tools/lldb/lit/tools/lldb-mi/target/inputs/target-select-so-path.py "/home/teemperor/.llvm/rel-build/bin/lldb-server gdbserver" "/home/teemperor/.llvm/rel-build/bin/lldb-mi --synchronous /home/teemperor/.llvm/rel-build/tools/lldb/lit/tools/lldb-mi/target/Output/target-select-so-path.test.tmp" /home/teemperor/.llvm/llvm/tools/lldb/lit/tools/lldb-mi/target/target-select-so-path.test --
AFAIR, adding an exit(...) to ConnectToRemote won't solve this problem. The test will still be failing on Arch.
Posting my mail here for the record:
I was more hoping it would at least turn the deadlock into a fail,
this way I could at least run the test suit.
Anyway, the actual issue is related Python 3: Arch Linux (and probably
some other distributions that "already" upgraded to Python 3 as
default) will launch Python 3 when the test script calls python ....
And for some reason in Python 3.7, we will not inherit our FD from our
pipe to the subprocess, which causes this strange behavior when write
to the unrelated FD number in ConnectToRemote. When I explicitly call
Python 2.7 from the test, everything runs as expected.
The python docs don't see to mention this change (and it seems like a
bug to me), so I'm open for ideas how to fix this properly. In any
case this problem doesn't block this review.
AFAIR, adding an exit(...) to ConnectToRemote won't solve this problem. The test will still be failing on Arch.
I was more hoping it would at least turn the deadlock into a fail,
this way I could at least run the test suit.
Anyway, the actual issue is related Python 3: Arch Linux (and probably
some other distributions that "already" upgraded to Python 3 as
default) will launch Python 3 when the test script calls python ....
And for some reason in Python 3.7, we will not inherit our FD from our
pipe to the subprocess, which causes this strange behavior when write
to the unrelated FD number in ConnectToRemote. When I explicitly call
Python 2.7 from the test, everything runs as expected.
The python docs don't see to mention this change (and it seems like a
bug to me), so I'm open for ideas how to fix this properly. In any
case this problem doesn't block this review.
(I hope this comment shows up in Phabricator, as it seems
reviews.llvm.org is currently having some internal errors).
Am So., 23. Sep. 2018 um 01:03 Uhr schrieb Alexander Polyakov via
Phabricator <reviews@reviews.llvm.org>:
apolyakov added a comment.
AFAIR, adding an exit(...) to ConnectToRemote won't solve this problem. The test will still be failing on Arch.
If so, we can try to run the script with python2.x. @teemperor can you try to modify target-select-so-path.test this way:
change # RUN: python %p/inputs/target-select-so-path.py "%debugserver" "%lldbmi %t" %s to
# RUN: python2 %p/inputs/target-select-so-path.py "%debugserver" "%lldbmi %t" %s
If this works for you on Arch, I'll update the revision.
Yeah, explicitly typing python2 is what I did to fix it. Not sure if that breaks other OSs though (e.g. if a system has no python2 binary, but only python).
Also, that fix should be its own revision. It's not connected to the idea behind this revision from what I can see.
A lot of tests are failing with Python 3, at least on CentOS. So, I agree the problem doesn't block this review.
It's more like python 2 had a bug where it always inherited the file descriptor, and then python3 fixed that and introduced a special popen argument to control the inheriting behavior.
Back when we were designing this test, I demonstrated the necessary incantations for this to work on both python2 and 3 https://reviews.llvm.org/D49739?id=157310#inline-438290. It seems that did not make it into the final version..