⚙ D52139 [lldb-mi] Fix hanging of target-select-so-path.test

apolyakov created this revision. Sep 15 2018, 12:53 PM

Herald added subscribers: abidh, ki.stfu. Sep 15 2018, 12:53 PM

It looks like it would be easy to forget to add -gdb-exit to every test. Wouldn't it be better to have lldb-mi automatically produce a -gdb-exit command when it reads EOF from stdin?
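To make the suggestion concrete, here is a minimal Python sketch of a command reader that behaves as if -gdb-exit had been typed once its input runs out. It only illustrates the idea: `read_commands` is a hypothetical helper, and lldb-mi's real input loop is C++ and is not shown in this review.

```python
import io


def read_commands(stream):
    """Yield MI commands from `stream`, synthesizing -gdb-exit at EOF.

    Hypothetical helper that mirrors the suggestion above; it is not
    lldb-mi's actual implementation.
    """
    saw_exit = False
    for raw in stream:
        command = raw.strip()
        if not command:
            continue  # skip blank lines
        yield command
        if command == "-gdb-exit":
            saw_exit = True
            break  # nothing after an explicit exit is processed
    if not saw_exit:
        # Input ended without an explicit exit: act as if one was typed.
        yield "-gdb-exit"


if __name__ == "__main__":
    script = io.StringIO(u"-file-exec-and-symbols a.out\n-exec-run\n")
    for cmd in read_commands(script):
        print(cmd)
```

With a reader like this, a test script that forgets its trailing -gdb-exit would still end the session instead of leaving the process waiting on stdin.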

It seems reasonable, I'll look at this.

I found out that the test suite was hanging because target-select-so-path.test used the FileCheck and lldb-mi processes incorrectly. The current patch has been tested on CentOS 7.

These changes fixed the issue for me.

This revision is now accepted and ready to land. Sep 20 2018, 9:04 AM

tatyana-krasnukha added a parent revision: D49739: Add new API to SBTarget class. Sep 20 2018, 9:21 AM

FYI, for me the test is still hanging on Arch Linux.

Can you provide some logs? For example, you might be able to run this test (llvm-lit -a -vv .../target-select-so-path.test) and kill lldb-mi and filecheck processes if they hang. Also, information about which processes hang would be useful too (lldb-mi or filecheck or maybe both).

Alexander, please add a timeout to be sure that the test is not hanging even if something goes wrong. A possible solution for Python 2:

(Attachment: target-select-so-path.py.patch, 1 KB)

This revision now requires changes to proceed. Sep 20 2018, 10:30 AM

Thanks to @tatyana-krasnukha for the idea about a timer. Added a timer to target-select-so-path test.
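For readers without access to the attachment, the timer idea can be sketched roughly as follows. This is not the attached patch or the committed script: `run_with_timeout` is a hypothetical name, and the sketch only assumes `subprocess.Popen` and `threading.Timer`, both of which exist in Python 2 and 3.

```python
import subprocess
import sys
import threading


def run_with_timeout(argv, timeout_sec):
    """Run argv and kill it if it is still alive after timeout_sec seconds.

    Returns (returncode, timed_out). Hypothetical helper for illustration.
    """
    proc = subprocess.Popen(argv)
    timed_out = []

    def on_timeout():
        # Only kill (and report a timeout) if the child is still running.
        if proc.poll() is None:
            timed_out.append(True)
            try:
                proc.kill()
            except OSError:
                pass  # the child exited between poll() and kill()

    timer = threading.Timer(timeout_sec, on_timeout)
    timer.start()
    try:
        proc.wait()
    finally:
        timer.cancel()  # no-op if the timer has already fired
    return proc.returncode, bool(timed_out)


if __name__ == "__main__":
    sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_timeout(sleeper, 1.0))
```

The point of the design is that a hung child turns into a failed test after a bounded wait instead of blocking the whole suite.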

I'm accepting this as it seems to improve the situation, even though it's not a complete fix.

(Not sure about the 2-minute timeout value though, but we'll see how the bots react).

teemperor accepted this revision. Sep 20 2018, 12:51 PM

In D52139#1240835, @apolyakov wrote:

Can you provide some logs? For example, you might be able to run this test (llvm-lit -a -vv .../target-select-so-path.test) and kill lldb-mi and filecheck processes if they hang. Also, information about which processes hang would be useful too (lldb-mi or filecheck or maybe both).

@teemperor can you do this? It'll help me a lot.

Sure, here is the output: After some debugging it seems that our select wrapper code is failing. We probably could fix this by terminating the process like in the other error cases in the same method (grep for the "failed to write to the unnamed pipe" error to see the code).

******************** TEST 'lldb :: tools/lldb-mi/target/target-select-so-path.test' FAILED ********************
Script:
--
: 'RUN: at line 3';   /home/teemperor/.llvm/rel-build/./bin/clang -o /home/teemperor/.llvm/rel-build/tools/lldb/lit/tools/lldb-mi/target/Output/target-select-so-path.test.tmp /home/teemperor/.llvm/llvm/tools/lldb/lit/tools/lldb-mi/target/inputs/main.c -g
: 'RUN: at line 4';   python /home/teemperor/.llvm/llvm/tools/lldb/lit/tools/lldb-mi/target/inputs/target-select-so-path.py "/home/teemperor/.llvm/rel-build/bin/lldb-server gdbserver" "/home/teemperor/.llvm/rel-build/bin/lldb-mi --synchronous /home/teemperor/.llvm/rel-build/tools/lldb/lit/tools/lldb-mi/target/Output/target-select-so-path.test.tmp" /home/teemperor/.llvm/llvm/tools/lldb/lit/tools/lldb-mi/target/target-select-so-path.test
--
Exit Code: 143

Command Output (stderr):
--
+ : 'RUN: at line 3'
+ /home/teemperor/.llvm/rel-build/./bin/clang -o /home/teemperor/.llvm/rel-build/tools/lldb/lit/tools/lldb-mi/target/Output/target-select-so-path.test.tmp /home/teemperor/.llvm/llvm/tools/lldb/lit/tools/lldb-mi/target/inputs/main.c -g
+ : 'RUN: at line 4'
+ python /home/teemperor/.llvm/llvm/tools/lldb/lit/tools/lldb-mi/target/inputs/target-select-so-path.py '/home/teemperor/.llvm/rel-build/bin/lldb-server gdbserver' '/home/teemperor/.llvm/rel-build/bin/lldb-mi --synchronous /home/teemperor/.llvm/rel-build/tools/lldb/lit/tools/lldb-mi/target/Output/target-select-so-path.test.tmp' /home/teemperor/.llvm/llvm/tools/lldb/lit/tools/lldb-mi/target/target-select-so-path.test
failed to write to the unnamed pipe: timed out
/home/teemperor/.llvm/rel-build/tools/lldb/lit/tools/lldb-mi/target/Output/target-select-so-path.test.script: line 2:  7284 Terminated              python /home/teemperor/.llvm/llvm/tools/lldb/lit/tools/lldb-mi/target/inputs/target-select-so-path.py "/home/teemperor/.llvm/rel-build/bin/lldb-server gdbserver" "/home/teemperor/.llvm/rel-build/bin/lldb-mi --synchronous /home/teemperor/.llvm/rel-build/tools/lldb/lit/tools/lldb-mi/target/Output/target-select-so-path.test.tmp" /home/teemperor/.llvm/llvm/tools/lldb/lit/tools/lldb-mi/target/target-select-so-path.test

--

Do you mean ConnectToRemote method from lldb/tools/lldb-server/lldb-gdbserver.cpp?

Yes, the writeSocketIdToPipe(unnamed_pipe_fd, socket_id); fails in this method/file.

AFAIR, adding an exit(...) to ConnectToRemote won't solve this problem. The test will still be failing on Arch.

teemperor added a comment. Sep 23 2018, 12:01 PM

This comment was removed by teemperor.

Posting my mail here for the record:

I was more hoping it would at least turn the deadlock into a failure,
so that I could at least run the test suite.

Anyway, the actual issue is related to Python 3: Arch Linux (and probably
some other distributions that "already" upgraded to Python 3 as
default) will launch Python 3 when the test script calls python ....
And for some reason in Python 3.7, the subprocess does not inherit the FD
of our pipe, which causes this strange behavior when we write
to the unrelated FD number in ConnectToRemote. When I explicitly call
Python 2.7 from the test, everything runs as expected.

The Python docs don't seem to mention this change (and it seems like a
bug to me), so I'm open to ideas on how to fix this properly. In any
case this problem doesn't block this review.

(I hope this comment shows up in Phabricator, as it seems
reviews.llvm.org is currently having some internal errors).
On Sun, Sep 23, 2018 at 01:03, Alexander Polyakov via
Phabricator <reviews@reviews.llvm.org> wrote:

apolyakov added a comment.

AFAIR, adding an exit(...) to ConnectToRemote won't solve this problem. The test will still be failing on Arch.

https://reviews.llvm.org/D52139

If so, we can try to run the script with python2.x. @teemperor can you try to modify target-select-so-path.test this way:
change # RUN: python %p/inputs/target-select-so-path.py "%debugserver" "%lldbmi %t" %s to
# RUN: python2 %p/inputs/target-select-so-path.py "%debugserver" "%lldbmi %t" %s

If this works for you on Arch, I'll update the revision.

Yeah, explicitly typing python2 is what I did to fix it. Not sure if that breaks other OSs though (e.g. if a system has no python2 binary, but only python).

Also, that fix should be its own revision. It's not connected to the idea behind this revision from what I can see.

Reduced timer from 120 to 30 seconds.

A lot of tests are failing with Python 3, at least on CentOS. So, I agree the problem doesn't block this review.

This revision is now accepted and ready to land. Sep 24 2018, 11:34 AM

apolyakov removed a parent revision: D49739: Add new API to SBTarget class. Sep 24 2018, 12:08 PM

Closed by commit rL342915: [lldb-mi] Fix hanging of target-select-so-path.test (authored by apolyakov). Sep 24 2018, 12:14 PM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: llvm-commits. Sep 24 2018, 12:14 PM

In D52139#1243170, @teemperor wrote:

Anyway, the actual issue is related to Python 3: Arch Linux (and probably
some other distributions that "already" upgraded to Python 3 as
default) will launch Python 3 when the test script calls python ....
And for some reason in Python 3.7, the subprocess does not inherit the FD
of our pipe, which causes this strange behavior when we write
to the unrelated FD number in ConnectToRemote. When I explicitly call
Python 2.7 from the test, everything runs as expected.

The Python docs don't seem to mention this change (and it seems like a
bug to me), so I'm open to ideas on how to fix this properly. In any
case this problem doesn't block this review.

It's more like Python 2 had a bug where it always inherited the file descriptor, and then Python 3 fixed that and introduced a special Popen argument to control the inheriting behavior.

Back when we were designing this test, I demonstrated the necessary incantations for this to work on both Python 2 and 3: https://reviews.llvm.org/D49739?id=157310#inline-438290. It seems that did not make it into the final version.
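As a rough illustration of the kind of incantation meant here (not the snippet from the linked inline comment, which is not reproduced in this thread), the sketch below keeps one pipe descriptor open in a child on both Python versions. It assumes a POSIX system; `spawn_with_fd` is a hypothetical name. The standard-library facts it relies on are that `subprocess.Popen` accepts `pass_fds` since Python 3.2, and that Python 2 defaults to `close_fds=False`.

```python
import os
import subprocess
import sys


def spawn_with_fd(argv, fd):
    """Start argv so that `fd` stays open in the child (POSIX only).

    Hypothetical helper for illustration.
    """
    if sys.version_info >= (3, 2):
        # Python 3 closes unrelated descriptors in the child by default;
        # pass_fds lists the ones that must stay open and inheritable.
        return subprocess.Popen(argv, pass_fds=(fd,))
    # Python 2: close_fds defaults to False, so the child inherits fd.
    return subprocess.Popen(argv, close_fds=False)


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    # The child writes to the descriptor number it receives on argv.
    child_code = "import os, sys; os.write(int(sys.argv[1]), b'ready')"
    child = spawn_with_fd(
        [sys.executable, "-c", child_code, str(write_fd)], write_fd)
    os.close(write_fd)  # the parent keeps only the read end
    child.wait()
    print(os.read(read_fd, 16))
    os.close(read_fd)
```

Without `pass_fds`, the child on Python 3 would receive a descriptor number that is not open in its own process, which matches the "unrelated FD number" symptom described above.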

Thanks Pavel, I fixed it here https://reviews.llvm.org/D52498.

This is an archive of the discontinued LLVM Phabricator instance.

Closed, Public


Diff 166714

lit/tools/lldb-mi/target/inputs/target-select-so-path.py