The function ProcessGDBRemote::UpdateThreadIDsFromStopReplyThreadsValue(), which is being called after the thread PCs are updated, is clearing the thread PC list and that is wrong.
Details
Diff Detail
Event Timeline
Could you elaborate on how/why is this wrong? E.g. in which situations does it matter whether we clear the PC list?
I could imagine there being a bug in this ProcessGDBRemote::UpdateThreadIDList -> ProcessGDBRemote::UpdateThreadIDsFromStopReplyThreadsValue sequence when talking to a gdb-remote stub that does not implement the lldb-enhancement jThreadsInfo. If jThreadsInfo isn't implemented, we fall back to looking for the thread-pcs: and threads: key-value pairs in the stop packet that we received. We clear m_thread_pcs, then look if thread-pcs: is present (that's one bug) to add all the thread pc values. After that, we look for the threads: list and if it is present, we call UpdateThreadIDsFromStopReplyThreadsValue which clears m_thread_pcs again (bug #2) and then adds any threads listed to the m_thread_ids.
I could believe there are problems here if we're talking to a gdb-remote that doesn't support jThreadsInfo but does provide threads: or thread-pcs: maybe? It'd be interesting to see what the packet log for a stop packet is showing, and what exactly the gdb-remote stub is.
(just to be clear, the m_thread_pcs.clear() in ProcessGDBRemote::UpdateThreadIDList that I called a bug -- today we only have two ways of populating that, jThreadsInfo or the thread-pcs: value in the stop packet. So clearing it unconditionally here, and then populating it from thread_pcs: seems reasonable.)
Yes, we shouldn't be clearing the m_thread_pcs in UpdateThreadIDsFromStopReplyThreadsValue() otherwise we would loose the thread PCs populated from stop reply thread-pcs: list just before the UpdateThreadIDsFromStopReplyThreadsValue() call.
The other m_thread_pcs.clear() in UpdateThreadIDList() is redundant as we are anyway clearing m_thread_pcs in UpdateThreadPCsFromStopReplyThreadsValue().
I think the m_thread_pcs.clear() in UpdateThreadIDList() is to clear out any old thread pc values that we might have populated in the past. The only way I can see this happening is if the remote stub SOMETIMES returned the thread-pcs: list in the stop packet. UpdateThreadPCsFromStopReplyThreadsValue() which we call if we did get a thread-pcs: in this stop reply packet, clears m_thread_pcs so the only reason to have this also in UpdateThreadIDList() is if we had a stub that was inconsistent.
I agree that the m_thread_pcs.clear() in UpdateThreadIDsFromStopReplyThreadsValue() is wrong, and your patch here is good.
I don't have strong feelings about removing the m_thread_pcs() in UpdateThreadIDList(), but I think I would prefer leaving it in to changing it unless we know it's causing problems. Does this cause problems in your environment?
Sorry, I wasn't clear in my previous comment.
No, it did not cause any problems in our environment.
But it is redundant as UpdateThreadPCsFromStopReplyThreadsValue(), which gets called immediately after clearing m_thread_pcs in UpdateThreadIDList(), will anyway clear the old m_thread_pcs (line 1545 without my changes) before populating them from the thread-pcs: received in the stop reply packet.
For now, leaving the m_thread_pcs.clear() in UpdateThreadIDList() unchanged as it is not causing any problems.
Could you also add a test case for this?
I think it should be possible to test this via the gdb-client (test/testcases/functionalities/gdb_remote_client/) test suite. If I understood the previous comments correctly, you'll need to mock a server that sends a thread-pcs field, but does not implement a jThreadsInfo packet.
That's a good point Pavel. I tried to write one (below) but I never saw what the original failure mode was. Venkata, can you help to make a test case that fails before the patch and works after? Or explain what bug was being fixed exactly? I could see that the code was wrong from reading it, but I never understood how you got to this.
Index: packages/Python/lldbsuite/test/functionalities/gdb_remote_client/TestStopPCs.py
- packages/Python/lldbsuite/test/functionalities/gdb_remote_client/TestStopPCs.py (nonexistent)
+++ packages/Python/lldbsuite/test/functionalities/gdb_remote_client/TestStopPCs.py (working copy)
@@ -0,0 +1,45 @@
+from future import print_function
+import lldb
+from lldbsuite.test.lldbtest import *
+from lldbsuite.test.decorators import *
+from gdbclientutils import *
+
+
+class TestThreadSelectionBug(GDBRemoteTestBase):
+ def test(self):
+ class MyResponder(MockGDBServerResponder):
+ def haltReason(self):
+ return "T02thread:1ff0d;threads:1ff0d,2ff0d;thread-pcs:10001bc00,10002bc00;"
+
+ def threadStopInfo(self, threadnum):
+ if threadnum == 0x1ff0d:
+ return "T02thread:1ff0d;threads:1ff0d,2ff0d;thread-pcs:10001bc00,10002bc00;0:0,1:00bc010001;"
+ if threadnum == 0x2ff0d:
+ return "T00thread:2ff0d;threads:1ff0d,2ff0d;thread-pcs:10001bc00,10002bc00;0:0,1:00bc020001;"
+
+ def qXferRead(self, obj, annex, offset, length):
+ if annex == "target.xml":
+ return """<?xml version="1.0"?>
+ <target version="1.0">
+ <architecture>i386:x86-64</architecture>
+ <feature name="org.gnu.gdb.i386.core">
+ <reg name="rax" bitsize="64" regnum="0" type="int" group="general"/>
+ <reg name="rip" bitsize="64" regnum="1" type="code_ptr" group="general"/>
+ </feature>
+ </target>""", False
+ else:
+ return None, False
+
+ self.server.responder = MyResponder()
+ target = self.dbg.CreateTarget('')
+ if self.TraceOn():
+ self.runCmd("log enable gdb-remote packets")
+ process = self.connect(target)
+
+ self.assertEqual(process.GetNumThreads(), 2)
+ th0 = process.GetThreadAtIndex(0)
+ th1 = process.GetThreadAtIndex(1)
+ self.assertEqual(th0.GetThreadID(), 0x1ff0d)
+ self.assertEqual(th1.GetThreadID(), 0x2ff0d)
+ self.assertEqual(th0.GetFrameAtIndex(0).GetPC(), 0x10001bc00)
+ self.assertEqual(th1.GetFrameAtIndex(0).GetPC(), 0x10002bc00)
Index: packages/Python/lldbsuite/test/functionalities/gdb_remote_client/gdbclientutils.py
- packages/Python/lldbsuite/test/functionalities/gdb_remote_client/gdbclientutils.py (revision 337215)
+++ packages/Python/lldbsuite/test/functionalities/gdb_remote_client/gdbclientutils.py (working copy)
@@ -130,6 +130,8 @@
return self.QEnableErrorStrings() if packet == "?": return self.haltReason()
+ if packet == "s":
+ return self.haltReason()
if packet[0] == "H": return self.selectThread(packet[1], int(packet[2:], 16)) if packet[0:6] == "qXfer:":
@@ -144,6 +146,9 @@
return self.vAttach(int(pid, 16)) if packet[0] == "Z": return self.setBreakpoint(packet)
+ if packet.startswith("qThreadStopInfo"):
+ threadnum = int (packet[15:], 16)
+ return self.threadStopInfo(threadnum)
return self.other(packet) def interrupt(self):
@@ -204,6 +209,9 @@
def setBreakpoint(self, packet): raise self.UnexpectedPacketException()
+ def threadStopInfo(self, threadnum):
+ return ""
+
def other(self, packet): # empty string means unsupported return ""
Sorry, I am not helpful to you in providing a unit test case for this patch. I am still learning about test suite framework and/or trying to get the lldb test suite up and running.
Regarding how I got to this, it is not that I have run into some issue due to the original code. For my private port, I had to create a thread ID list similar to m_thread_ids and during code browsing for that I happened to see the code in concern and observed that clearing m_thread_pcs is wrong there.
After the process gets stopped, while trying to set the thread stop info, we also update the thread list (ProcessGDBRemote::SetThreadStopInfo() ---> Process::UpdateThreadListIfNeeded() ---> ProcessGDBRemote::UpdateThreadList()) and here we can reach UpdateThreadIDList() if m_thread_ids.empty() is true i.e. if we somehow failed to populate m_thread_ids while parsing the stop info packet and need to populate the m_thread_ids now.