This is an archive of the discontinued LLVM Phabricator instance.

Add Read Thread to GDBRemoteCommunication.
ClosedPublic

Authored by EwanCrawford on May 28 2015, 4:15 AM.

Details

Summary

In order to support asynchronous notifications for non-stop mode this patch adds a packet read thread. This is done by implementing AppendBytesToCache() from the communications class, which continually reads packets into a packet queue. To initialize this thread StartReadThread() must be called by the client, so since llgs and platform tools use the GBDRemoteCommunicatos code they must also call this function as well as ProcessGDBRemote.

When the read thread detects an async notify packet it broadcasts this event, where the matching listener will be added in the next non-stop patch.

Packets are now accessed by calling ReadPacket() which pops a packet from the queue, instead of using WaitForPacketWithTimeoutMicroSecondsNoLock()

Diff Detail

Repository
rL LLVM

Event Timeline

EwanCrawford retitled this revision from to Add Read Thread to GDBRemoteCommunication. .
EwanCrawford updated this object.
EwanCrawford edited the test plan for this revision. (Show Details)
EwanCrawford added a reviewer: clayborg.
EwanCrawford set the repository for this revision to rL LLVM.
EwanCrawford added subscribers: Unknown Object (MLST), deepak2427, domipheus and 2 others.

Hello,

the patch seems fine to me in general, but I think we should implement the ReadPacket function in a more robust way.

include/lldb/Core/Communication.h
96

If this is used in GDBRemoteCommunication, shouldn't it be defined there also?
As I understand this, this is precisely the purpose why Communication class reserves the kLoUserBroadcastBit--kHiUserBroadcastBit range.

source/Plugins/Process/gdb-remote/GDBRemoteCommunication.cpp
331

Will this function be used anywhere after your patch? Will it be even possible/correct to use it? Should we remove it?

1008

s/and/an/ ?

1090

If we reach this, then we will basically do a busy loop, persistently locking the mutex, checking the first packet and unlocking it. That sounds bad.

1104

Could we do this properly, like waiting on a condition variable, which will be signaled when a packet appears, or something like that? I'm quite opposed to using sleep for synchronization.

source/Plugins/Process/gdb-remote/GDBRemoteCommunication.h
322

s/trys/tries/

Thanks for taking a look Pavel, I'll add more concrete synchronisation to ReadPacket().

include/lldb/Core/Communication.h
96

Missed that cheers.

source/Plugins/Process/gdb-remote/GDBRemoteCommunication.cpp
1090

Definitely a possiblity, will rework

Hi Ewan,

For a change of this magnitude, I usually someone on my team to run dosep 100x before and after the patch to see if there is any change in flakyness.

Can you do that (or at least 50x) and post the results to this thread?

Thanks,

Vince

clayborg requested changes to this revision.May 28 2015, 9:59 AM
clayborg edited edge metadata.

Separate read thread kills packet throughput performance (at least on MacOSX it does) which is very very important in keeping our debug sessions almost as fast as a native debugger plug-in.

When debugging with ProcessGDBRemote, run the "process plugin packet speed-test" command to test out your speed with and without your patch.

So you can't enable this by default, but you can enable the read thread only for non-stop mode.

Below is a Release build on MacOSX with debugserver (which doesn't use threading either), so no threading on host and no threading in debugserver looks like:

(lldb) process plugin packet speed-test
Testing sending 1000 packets of various sizes:
qSpeedTest(send=0      , recv=0      ) in 0.043407000 sec for  23037.76 packets/sec (  0.043407 ms per packet) with standard deviation of   0.003040 ms
qSpeedTest(send=0      , recv=4      ) in 0.043373000 sec for  23055.82 packets/sec (  0.043373 ms per packet) with standard deviation of   0.002833 ms
qSpeedTest(send=0      , recv=8      ) in 0.043265000 sec for  23113.37 packets/sec (  0.043265 ms per packet) with standard deviation of   0.002885 ms
qSpeedTest(send=0      , recv=16     ) in 0.044877000 sec for  22283.13 packets/sec (  0.044877 ms per packet) with standard deviation of   0.002629 ms
qSpeedTest(send=0      , recv=32     ) in 0.045456000 sec for  21999.30 packets/sec (  0.045456 ms per packet) with standard deviation of   0.003335 ms
qSpeedTest(send=0      , recv=64     ) in 0.045788000 sec for  21839.78 packets/sec (  0.045788 ms per packet) with standard deviation of   0.001968 ms
qSpeedTest(send=0      , recv=128    ) in 0.046402000 sec for  21550.79 packets/sec (  0.046402 ms per packet) with standard deviation of   0.002933 ms
qSpeedTest(send=0      , recv=256    ) in 0.047365000 sec for  21112.63 packets/sec (  0.047365 ms per packet) with standard deviation of   0.003117 ms
qSpeedTest(send=0      , recv=512    ) in 0.058481000 sec for  17099.57 packets/sec (  0.058481 ms per packet) with standard deviation of   0.005901 ms
qSpeedTest(send=0      , recv=1024   ) in 0.075536000 sec for  13238.72 packets/sec (  0.075536 ms per packet) with standard deviation of   0.005299 ms
qSpeedTest(send=4      , recv=0      ) in 0.043505000 sec for  22985.86 packets/sec (  0.043505 ms per packet) with standard deviation of   0.003556 ms
qSpeedTest(send=4      , recv=4      ) in 0.043088000 sec for  23208.32 packets/sec (  0.043088 ms per packet) with standard deviation of   0.001953 ms
qSpeedTest(send=4      , recv=8      ) in 0.043401000 sec for  23040.94 packets/sec (  0.043401 ms per packet) with standard deviation of   0.003155 ms
qSpeedTest(send=4      , recv=16     ) in 0.044738000 sec for  22352.36 packets/sec (  0.044738 ms per packet) with standard deviation of   0.001959 ms
qSpeedTest(send=4      , recv=32     ) in 0.044960000 sec for  22241.99 packets/sec (  0.044960 ms per packet) with standard deviation of   0.002635 ms
qSpeedTest(send=4      , recv=64     ) in 0.045800000 sec for  21834.06 packets/sec (  0.045800 ms per packet) with standard deviation of   0.002267 ms
qSpeedTest(send=4      , recv=128    ) in 0.046912000 sec for  21316.51 packets/sec (  0.046912 ms per packet) with standard deviation of   0.002648 ms
qSpeedTest(send=4      , recv=256    ) in 0.048323000 sec for  20694.08 packets/sec (  0.048323 ms per packet) with standard deviation of   0.002908 ms
qSpeedTest(send=4      , recv=512    ) in 0.058692000 sec for  17038.10 packets/sec (  0.058692 ms per packet) with standard deviation of   0.005240 ms
qSpeedTest(send=4      , recv=1024   ) in 0.075491000 sec for  13246.61 packets/sec (  0.075491 ms per packet) with standard deviation of   0.005945 ms
qSpeedTest(send=8      , recv=0      ) in 0.043619000 sec for  22925.79 packets/sec (  0.043619 ms per packet) with standard deviation of   0.003328 ms
qSpeedTest(send=8      , recv=4      ) in 0.043329000 sec for  23079.23 packets/sec (  0.043329 ms per packet) with standard deviation of   0.002049 ms
qSpeedTest(send=8      , recv=8      ) in 0.044033000 sec for  22710.24 packets/sec (  0.044033 ms per packet) with standard deviation of   0.004740 ms
qSpeedTest(send=8      , recv=16     ) in 0.045068000 sec for  22188.69 packets/sec (  0.045068 ms per packet) with standard deviation of   0.002431 ms
qSpeedTest(send=8      , recv=32     ) in 0.045400000 sec for  22026.43 packets/sec (  0.045400 ms per packet) with standard deviation of   0.003316 ms
qSpeedTest(send=8      , recv=64     ) in 0.046099000 sec for  21692.44 packets/sec (  0.046099 ms per packet) with standard deviation of   0.002411 ms
qSpeedTest(send=8      , recv=128    ) in 0.045800000 sec for  21834.06 packets/sec (  0.045800 ms per packet) with standard deviation of   0.002543 ms
qSpeedTest(send=8      , recv=256    ) in 0.047650000 sec for  20986.36 packets/sec (  0.047650 ms per packet) with standard deviation of   0.003360 ms
qSpeedTest(send=8      , recv=512    ) in 0.059011000 sec for  16945.99 packets/sec (  0.059011 ms per packet) with standard deviation of   0.005562 ms
qSpeedTest(send=8      , recv=1024   ) in 0.076940000 sec for  12997.14 packets/sec (  0.076940 ms per packet) with standard deviation of   0.008166 ms
qSpeedTest(send=16     , recv=0      ) in 0.043593000 sec for  22939.46 packets/sec (  0.043593 ms per packet) with standard deviation of   0.003135 ms
qSpeedTest(send=16     , recv=4      ) in 0.043572000 sec for  22950.52 packets/sec (  0.043572 ms per packet) with standard deviation of   0.003450 ms
qSpeedTest(send=16     , recv=8      ) in 0.043752000 sec for  22856.10 packets/sec (  0.043752 ms per packet) with standard deviation of   0.002907 ms
qSpeedTest(send=16     , recv=16     ) in 0.044829000 sec for  22306.99 packets/sec (  0.044829 ms per packet) with standard deviation of   0.003527 ms
qSpeedTest(send=16     , recv=32     ) in 0.044844000 sec for  22299.53 packets/sec (  0.044844 ms per packet) with standard deviation of   0.002583 ms
qSpeedTest(send=16     , recv=64     ) in 0.046094000 sec for  21694.80 packets/sec (  0.046094 ms per packet) with standard deviation of   0.003188 ms
qSpeedTest(send=16     , recv=128    ) in 0.046679000 sec for  21422.91 packets/sec (  0.046679 ms per packet) with standard deviation of   0.002531 ms
qSpeedTest(send=16     , recv=256    ) in 0.048688000 sec for  20538.94 packets/sec (  0.048688 ms per packet) with standard deviation of   0.003375 ms
qSpeedTest(send=16     , recv=512    ) in 0.058661000 sec for  17047.10 packets/sec (  0.058661 ms per packet) with standard deviation of   0.005435 ms
qSpeedTest(send=16     , recv=1024   ) in 0.076160000 sec for  13130.25 packets/sec (  0.076160 ms per packet) with standard deviation of   0.007130 ms
qSpeedTest(send=32     , recv=0      ) in 0.043395000 sec for  23044.13 packets/sec (  0.043395 ms per packet) with standard deviation of   0.002834 ms
qSpeedTest(send=32     , recv=4      ) in 0.043258000 sec for  23117.11 packets/sec (  0.043258 ms per packet) with standard deviation of   0.003069 ms
qSpeedTest(send=32     , recv=8      ) in 0.043024000 sec for  23242.84 packets/sec (  0.043024 ms per packet) with standard deviation of   0.002030 ms
qSpeedTest(send=32     , recv=16     ) in 0.045055000 sec for  22195.09 packets/sec (  0.045055 ms per packet) with standard deviation of   0.003128 ms
qSpeedTest(send=32     , recv=32     ) in 0.045248000 sec for  22100.42 packets/sec (  0.045248 ms per packet) with standard deviation of   0.002482 ms
qSpeedTest(send=32     , recv=64     ) in 0.046054000 sec for  21713.64 packets/sec (  0.046054 ms per packet) with standard deviation of   0.002869 ms
qSpeedTest(send=32     , recv=128    ) in 0.046790000 sec for  21372.09 packets/sec (  0.046790 ms per packet) with standard deviation of   0.002940 ms
qSpeedTest(send=32     , recv=256    ) in 0.048270000 sec for  20716.80 packets/sec (  0.048270 ms per packet) with standard deviation of   0.003290 ms
qSpeedTest(send=32     , recv=512    ) in 0.058397000 sec for  17124.17 packets/sec (  0.058397 ms per packet) with standard deviation of   0.004785 ms
qSpeedTest(send=32     , recv=1024   ) in 0.075872000 sec for  13180.09 packets/sec (  0.075872 ms per packet) with standard deviation of   0.006399 ms
qSpeedTest(send=64     , recv=0      ) in 0.044656000 sec for  22393.41 packets/sec (  0.044656 ms per packet) with standard deviation of   0.003807 ms
qSpeedTest(send=64     , recv=4      ) in 0.043947000 sec for  22754.68 packets/sec (  0.043947 ms per packet) with standard deviation of   0.003668 ms
qSpeedTest(send=64     , recv=8      ) in 0.044312000 sec for  22567.25 packets/sec (  0.044312 ms per packet) with standard deviation of   0.003122 ms
qSpeedTest(send=64     , recv=16     ) in 0.045796000 sec for  21835.97 packets/sec (  0.045796 ms per packet) with standard deviation of   0.003789 ms
qSpeedTest(send=64     , recv=32     ) in 0.045721000 sec for  21871.79 packets/sec (  0.045721 ms per packet) with standard deviation of   0.002440 ms
qSpeedTest(send=64     , recv=64     ) in 0.046076000 sec for  21703.27 packets/sec (  0.046076 ms per packet) with standard deviation of   0.003583 ms
qSpeedTest(send=64     , recv=128    ) in 0.046691000 sec for  21417.40 packets/sec (  0.046691 ms per packet) with standard deviation of   0.003030 ms
qSpeedTest(send=64     , recv=256    ) in 0.048857000 sec for  20467.90 packets/sec (  0.048857 ms per packet) with standard deviation of   0.003644 ms
qSpeedTest(send=64     , recv=512    ) in 0.058494000 sec for  17095.77 packets/sec (  0.058494 ms per packet) with standard deviation of   0.003907 ms
qSpeedTest(send=64     , recv=1024   ) in 0.075563000 sec for  13233.99 packets/sec (  0.075563 ms per packet) with standard deviation of   0.004260 ms
qSpeedTest(send=128    , recv=0      ) in 0.045052000 sec for  22196.57 packets/sec (  0.045052 ms per packet) with standard deviation of   0.005080 ms
qSpeedTest(send=128    , recv=4      ) in 0.044427000 sec for  22508.84 packets/sec (  0.044427 ms per packet) with standard deviation of   0.003283 ms
qSpeedTest(send=128    , recv=8      ) in 0.044749000 sec for  22346.87 packets/sec (  0.044749 ms per packet) with standard deviation of   0.003924 ms
qSpeedTest(send=128    , recv=16     ) in 0.045931000 sec for  21771.79 packets/sec (  0.045931 ms per packet) with standard deviation of   0.003398 ms
qSpeedTest(send=128    , recv=32     ) in 0.046270000 sec for  21612.28 packets/sec (  0.046270 ms per packet) with standard deviation of   0.003937 ms
qSpeedTest(send=128    , recv=64     ) in 0.046336000 sec for  21581.49 packets/sec (  0.046336 ms per packet) with standard deviation of   0.002467 ms
qSpeedTest(send=128    , recv=128    ) in 0.046792000 sec for  21371.17 packets/sec (  0.046792 ms per packet) with standard deviation of   0.003646 ms
qSpeedTest(send=128    , recv=256    ) in 0.049746000 sec for  20102.12 packets/sec (  0.049746 ms per packet) with standard deviation of   0.005391 ms
qSpeedTest(send=128    , recv=512    ) in 0.059848000 sec for  16709.00 packets/sec (  0.059848 ms per packet) with standard deviation of   0.006661 ms
qSpeedTest(send=128    , recv=1024   ) in 0.076694000 sec for  13038.83 packets/sec (  0.076694 ms per packet) with standard deviation of   0.006645 ms
qSpeedTest(send=256    , recv=0      ) in 0.045154000 sec for  22146.43 packets/sec (  0.045154 ms per packet) with standard deviation of   0.003923 ms
qSpeedTest(send=256    , recv=4      ) in 0.044918000 sec for  22262.79 packets/sec (  0.044918 ms per packet) with standard deviation of   0.002861 ms
qSpeedTest(send=256    , recv=8      ) in 0.044520000 sec for  22461.81 packets/sec (  0.044520 ms per packet) with standard deviation of   0.002970 ms
qSpeedTest(send=256    , recv=16     ) in 0.045413000 sec for  22020.13 packets/sec (  0.045413 ms per packet) with standard deviation of   0.002399 ms
qSpeedTest(send=256    , recv=32     ) in 0.045886000 sec for  21793.14 packets/sec (  0.045886 ms per packet) with standard deviation of   0.003699 ms
qSpeedTest(send=256    , recv=64     ) in 0.046287000 sec for  21604.34 packets/sec (  0.046287 ms per packet) with standard deviation of   0.002430 ms
qSpeedTest(send=256    , recv=128    ) in 0.047735000 sec for  20948.99 packets/sec (  0.047735 ms per packet) with standard deviation of   0.004090 ms
qSpeedTest(send=256    , recv=256    ) in 0.049604000 sec for  20159.66 packets/sec (  0.049604 ms per packet) with standard deviation of   0.003202 ms
qSpeedTest(send=256    , recv=512    ) in 0.058780000 sec for  17012.59 packets/sec (  0.058780 ms per packet) with standard deviation of   0.004809 ms
qSpeedTest(send=256    , recv=1024   ) in 0.077135000 sec for  12964.28 packets/sec (  0.077135 ms per packet) with standard deviation of   0.007074 ms
qSpeedTest(send=512    , recv=0      ) in 0.045352000 sec for  22049.74 packets/sec (  0.045352 ms per packet) with standard deviation of   0.004284 ms
qSpeedTest(send=512    , recv=4      ) in 0.044826000 sec for  22308.48 packets/sec (  0.044826 ms per packet) with standard deviation of   0.003033 ms
qSpeedTest(send=512    , recv=8      ) in 0.044908000 sec for  22267.75 packets/sec (  0.044908 ms per packet) with standard deviation of   0.004014 ms
qSpeedTest(send=512    , recv=16     ) in 0.046137000 sec for  21674.58 packets/sec (  0.046137 ms per packet) with standard deviation of   0.003168 ms
qSpeedTest(send=512    , recv=32     ) in 0.046169000 sec for  21659.55 packets/sec (  0.046169 ms per packet) with standard deviation of   0.003637 ms
qSpeedTest(send=512    , recv=64     ) in 0.046608000 sec for  21455.54 packets/sec (  0.046608 ms per packet) with standard deviation of   0.003272 ms
qSpeedTest(send=512    , recv=128    ) in 0.048330000 sec for  20691.08 packets/sec (  0.048330 ms per packet) with standard deviation of   0.004049 ms
qSpeedTest(send=512    , recv=256    ) in 0.049025000 sec for  20397.76 packets/sec (  0.049025 ms per packet) with standard deviation of   0.003430 ms
qSpeedTest(send=512    , recv=512    ) in 0.060135000 sec for  16629.25 packets/sec (  0.060135 ms per packet) with standard deviation of   0.008172 ms
qSpeedTest(send=512    , recv=1024   ) in 0.076191000 sec for  13124.91 packets/sec (  0.076191 ms per packet) with standard deviation of   0.005539 ms
qSpeedTest(send=1024   , recv=0      ) in 0.047727000 sec for  20952.50 packets/sec (  0.047727 ms per packet) with standard deviation of   0.004504 ms
qSpeedTest(send=1024   , recv=4      ) in 0.047079000 sec for  21240.89 packets/sec (  0.047079 ms per packet) with standard deviation of   0.003641 ms
qSpeedTest(send=1024   , recv=8      ) in 0.047646000 sec for  20988.12 packets/sec (  0.047646 ms per packet) with standard deviation of   0.003186 ms
qSpeedTest(send=1024   , recv=16     ) in 0.048328000 sec for  20691.94 packets/sec (  0.048328 ms per packet) with standard deviation of   0.004481 ms
qSpeedTest(send=1024   , recv=32     ) in 0.048736000 sec for  20518.71 packets/sec (  0.048736 ms per packet) with standard deviation of   0.003320 ms
qSpeedTest(send=1024   , recv=64     ) in 0.050781000 sec for  19692.40 packets/sec (  0.050781 ms per packet) with standard deviation of   0.006328 ms
qSpeedTest(send=1024   , recv=128    ) in 0.052327000 sec for  19110.59 packets/sec (  0.052327 ms per packet) with standard deviation of   0.006097 ms
qSpeedTest(send=1024   , recv=256    ) in 0.054200000 sec for  18450.18 packets/sec (  0.054200 ms per packet) with standard deviation of   0.006549 ms
qSpeedTest(send=1024   , recv=512    ) in 0.061518000 sec for  16255.40 packets/sec (  0.061518 ms per packet) with standard deviation of   0.005501 ms
qSpeedTest(send=1024   , recv=1024   ) in 0.078723000 sec for  12702.77 packets/sec (  0.078723 ms per packet) with standard deviation of   0.006038 ms
Testing receiving 4.0MB of data using varying receive packet sizes:
qSpeedTest(send=0      , recv=32     ) 131072 packets needed to receive 4.0MB in 5.871427000 sec for 0.681265 MB/sec for  22323.70 packets/sec (  0.044795 ms per packet)
qSpeedTest(send=0      , recv=64     )  65536 packets needed to receive 4.0MB in 2.985487000 sec for 1.339815 MB/sec for  21951.53 packets/sec (  0.045555 ms per packet)
qSpeedTest(send=0      , recv=128    )  32768 packets needed to receive 4.0MB in 1.523513000 sec for 2.625511 MB/sec for  21508.19 packets/sec (  0.046494 ms per packet)
qSpeedTest(send=0      , recv=256    )  16384 packets needed to receive 4.0MB in 0.778666000 sec for 5.136991 MB/sec for  21041.12 packets/sec (  0.047526 ms per packet)
qSpeedTest(send=0      , recv=512    )   8192 packets needed to receive 4.0MB in 0.475075000 sec for 8.419723 MB/sec for  17243.59 packets/sec (  0.057993 ms per packet)
qSpeedTest(send=0      , recv=1024   )   4096 packets needed to receive 4.0MB in 0.302972000 sec for 13.202540 MB/sec for  13519.40 packets/sec (  0.073968 ms per packet)
This revision now requires changes to proceed.May 28 2015, 9:59 AM

So the important metric for us the the packets/sec part of the speed test output.

I appreciate the feedback guys. I'll enable the read thread only for non-stop and come back with speed-test and test suite results.

vharron requested changes to this revision.May 31 2015, 11:06 AM
vharron added a reviewer: vharron.

Hi Ewan,

In order to support asynchronous notifications for non-stop mode this patch adds a packet read thread.

Can you help me understand the scenario in which the current code paths don't work?

This is done by implementing AppendBytesToCache() from the communications class, which continually reads packets into a packet queue.

If the host is piping a ton of stdin into a inferior with its stdin read thread stopped, will this just allocate memory on the debug server until it explodes? =)

Most importantly, we just went through a lot of effort to eliminate 2 threads (and thus a lot of race conditions) in lldb-server. I really don't want to introduce another thread if there is any other way.

source/Plugins/Process/gdb-remote/GDBRemoteCommunication.cpp
1053

packet

1090

This just concerns me in general. In general, what is the strategy for preventing thread A from processing thread B's messages? This seems fragile.

1104

+1, this is likely the source of the performance issues

EwanCrawford edited edge metadata.

Here's an updated patch only enabling the thread for non-stop mode. This has the advantage of making the patch simpler as we don't have to worry about the read thread on llgs, since we only recieve notification packets on host. But having two different ways of handling packets doesn't seem very maintainable, so I'm not sure if this is the best solution.

In the current code path packets are usually sent with SendPacketAndWaitForResponse(), where by calling CheckForPacket() we parse the received packet.
However notification packets can be sent asynchronously at any time, not just when we are waiting on a response. So right now when we send a continue packet it gets responded to with an 'OK' in non-stop mode, then the stop async notification arrives some-time later which we never read since it's not a response to any packet.

By having a read thread we can continually call CheckForPacket() to identify the notification packets when they arrive. I'm not sure of a good alternative to having a separate thread for regularly checking for async packets, but am open to suggestions.

I'll post the test-suite/packet-speed comparisons of when the read thread is enabled.

labath added a comment.Jun 1 2015, 5:02 AM

Thanks for adding proper synchronisation, this looks much better now. I also like the fact that we can avoid an additional thread on the server side.

I agree that having two distinct ways of reading a packet does not sound like a good idea. One way I can think of to avoid is to minimize the work the async notification thread does. I.e., instead of having the thread actually read the packet and place it in a queue, we could merely do a select() on the connection descriptor and then send a notification once data becomes available. Then it would be the responsibility of the receiver to read the data at some point and do something with it (separate async notifications from replies and process them).

All this work could be done in WaitForPacketWithTimeoutMicroSecondsNoLock so it would enable better code sharing (I hope) and I like this pattern more since it ensures a more serial processing of the server packets (whereas if you separate async messages from replies it's easy for their processing to become wildly desynchronised). However, this may require implementing a number of things differently, so I'm not sure if it is worth it. What do you think?

source/Plugins/Process/gdb-remote/GDBRemoteCommunication.cpp
1034

tondition->condition

My main concern with any threading is pairing the correct response to a the person that sent the packet. Are async packets marked so they absolutely can't be confused with any other response packets? I really don't like the fact that there are no sequence numbers in the sent/received packets because this makes it hard to know when you get the right response if a previous packet times out. I recently added "qEcho" to support making sure we stay on track and would like to ensure that any of the code added above won't let any packets receive the wrong reply packet.

Ewan, has your non-stop mode implementation been tested against gdbserver?

I agree that having two distinct ways of reading a packet does not sound like a good idea. One way I can think of to avoid is to minimize the work the async notification thread does. I.e., instead of having the thread actually read the packet and place it in a queue, we could merely do a select() on the connection descriptor and then send a notification once data becomes available. Then it would be the responsibility of the receiver to read the data at some point and do something with it (separate async notifications from replies and process them).

All this work could be done in WaitForPacketWithTimeoutMicroSecondsNoLock so it would enable better code sharing (I hope) and I like this pattern more since it ensures a more serial processing of the server packets (whereas if you separate async messages from replies it's easy for their processing to become wildly desynchronised). However, this may require implementing a number of things differently, so I'm not sure if it is worth it. What do you think?

I could probably find a way for the notification to be sent when data is ready by overriding Communication::ReadThread() somehow, might be messy though.
More of a problem is where to call WaitForPacketWithTimeoutMicroSecondsNoLock when we have a notification for data but no packets to send. Somewhere in ProcessGDBRemote::AsyncThread?

My main concern with any threading is pairing the correct response to a the person that sent the packet. Are async packets marked so they absolutely can't be confused with any other response packets? I really don't like the fact that there are no sequence numbers in the sent/received packets because this makes it hard to know when you get the right response if a previous packet times out. I recently added "qEcho" to support making sure we stay on track and would like to ensure that any of the code added above won't let any packets receive the wrong reply packet.

Async packets begin with a % instead of the $ for regular sync packets, so there is a clear differentiation.
Regular synchronous communication also shouldn't become offset by an async packet. As in AppendBytesToCache async packets aren't added to the packet queue, instead they are processed as an event in a broadcast.

Ewan, has your non-stop mode implementation been tested against gdbserver?

I've tested it against gdbserver on x86, without any automated testing however so not as rigorously as would be preferred.
Packet handling seems to work correctly, as does the basics of controlling a single thread while others are executing using the 'thread step/next/c' commands.
Although there are definitely still bugs due to thread plan and process state assumptions.

EwanCrawford edited edge metadata.

I've attempted a different approach where we don't need to add another thread. This patch is a rough attempt for illustrating the idea, but essentially the AsyncThread thread is changed to poll for notification packets. So although this doesn't add a new thread, the changes require an existing thread to be more active.

During polling the async thread checks the cached m_bytes for any notification packets, before attempting to read bytes from the connection. This polling could also be changed so that it's only enabled when we are using non-stop mode.

There's also the possibility that the async packets could be received in the middle of a synchronous send-response sequence, which we need to catch in WaitForPacketWithTimeoutMicroSecondsNoLock() and insert the notify packet into a broadcast.

Would this approach address the concerns about adding a new thread?

labath added a comment.Jun 5 2015, 3:06 PM

I personally think that this is a step in the wrong direction. In my opinion, polling is much worse than an extra thread and it usually increases the latency of everything.

source/Plugins/Process/gdb-remote/GDBRemoteCommunication.cpp
337

I would expect a "peek" function to return instantly and not after a 10ms (or any other) delay.

source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
3242

This feels like a hack. I presume you are doing this to trigger some side effect of SetPrivateState(Stopped). Wouldn't it be better to extract that functionality to a separate function? What is the meaning of the "state" of the whole process in non-stop mode anyway? (e.g. what should the state be if we have 3 threads running and 5 threads stopped?)

I still prefer the previous version as well. Anyway, here are some numbers with the read thread from the packet speed test on different platforms.
There is an impact on performance with mac as suspected, but if this only takes place when non-stop is enabled I don't know how significant the difference is.

Linux - WaitForPacketWithTimeoutMicroSecondsNoLock

Testing receiving 4.0MB of data using varying receive packet sizes:
qSpeedTest(send=0      , recv=32     ) 131072 packets needed to receive 4.0MB in 4.489400000 sec for 0.890988 MB/sec for  29195.88 packets/sec (  0.034251 ms per packet)
qSpeedTest(send=0      , recv=64     )  65536 packets needed to receive 4.0MB in 2.503353000 sec for 1.597857 MB/sec for  26179.29 packets/sec (  0.038198 ms per packet)
qSpeedTest(send=0      , recv=128    )  32768 packets needed to receive 4.0MB in 1.438591000 sec for 2.780499 MB/sec for  22777.84 packets/sec (  0.043902 ms per packet)
qSpeedTest(send=0      , recv=256    )  16384 packets needed to receive 4.0MB in 0.860445000 sec for 4.648757 MB/sec for  19041.31 packets/sec (  0.052517 ms per packet)
qSpeedTest(send=0      , recv=512    )   8192 packets needed to receive 4.0MB in 0.538240000 sec for 7.431629 MB/sec for  15219.98 packets/sec (  0.065703 ms per packet)
qSpeedTest(send=0      , recv=1024   )   4096 packets needed to receive 4.0MB in 0.400955000 sec for 9.976182 MB/sec for  10215.61 packets/sec (  0.097889 ms per packet)

Linux - ReadThread

Testing receiving 4.0MB of data using varying receive packet sizes:
qSpeedTest(send=0      , recv=32     ) 131072 packets needed to receive 4.0MB in 4.453370000 sec for 0.898196 MB/sec for  29432.09 packets/sec (  0.033977 ms per packet)
qSpeedTest(send=0      , recv=64     )  65536 packets needed to receive 4.0MB in 2.476258000 sec for 1.615340 MB/sec for  26465.74 packets/sec (  0.037785 ms per packet)
qSpeedTest(send=0      , recv=128    )  32768 packets needed to receive 4.0MB in 1.408105000 sec for 2.840697 MB/sec for  23270.99 packets/sec (  0.042972 ms per packet)
qSpeedTest(send=0      , recv=256    )  16384 packets needed to receive 4.0MB in 0.840672000 sec for 4.758098 MB/sec for  19489.17 packets/sec (  0.051311 ms per packet)
qSpeedTest(send=0      , recv=512    )   8192 packets needed to receive 4.0MB in 0.573062000 sec for 6.980047 MB/sec for  14295.14 packets/sec (  0.069954 ms per packet)
qSpeedTest(send=0      , recv=1024   )   4096 packets needed to receive 4.0MB in 0.427141000 sec for 9.364589 MB/sec for   9589.34 packets/sec (  0.104282 ms per packet)

Mac Mini(Debug) - WaitForPacketWithTimeoutMicroSecondsNoLock

Testing receiving 4.0MB of data using varying receive packet sizes:
qSpeedTest(send=0      , recv=32     ) 131072 packets needed to receive 4.0MB in 9.911248000 sec for 0.403582 MB/sec for  13224.57 packets/sec (  0.075617 ms per packet)
qSpeedTest(send=0      , recv=64     )  65536 packets needed to receive 4.0MB in 4.906384000 sec for 0.815264 MB/sec for  13357.29 packets/sec (  0.074865 ms per packet)
qSpeedTest(send=0      , recv=128    )  32768 packets needed to receive 4.0MB in 2.666435000 sec for 1.500130 MB/sec for  12289.07 packets/sec (  0.081373 ms per packet)
qSpeedTest(send=0      , recv=256    )  16384 packets needed to receive 4.0MB in 1.465932000 sec for 2.728640 MB/sec for  11176.51 packets/sec (  0.089473 ms per packet)
qSpeedTest(send=0      , recv=512    )   8192 packets needed to receive 4.0MB in 0.922116000 sec for 4.337849 MB/sec for   8883.92 packets/sec (  0.112563 ms per packet)
qSpeedTest(send=0      , recv=1024   )   4096 packets needed to receive 4.0MB in 0.802231000 sec for 4.986095 MB/sec for   5105.76 packets/sec (  0.195857 ms per packet)

Mac Mini(Debug) - ReadThread

Testing receiving 4.0MB of data using varying receive packet sizes:
qSpeedTest(send=0      , recv=32     ) 131072 packets needed to receive 4.0MB in 10.168392000 sec for 0.393376 MB/sec for  12890.14 packets/sec (  0.077579 ms per packet)
qSpeedTest(send=0      , recv=64     )  65536 packets needed to receive 4.0MB in 5.112452000 sec for 0.782403 MB/sec for  12818.90 packets/sec (  0.078010 ms per packet)
qSpeedTest(send=0      , recv=128    )  32768 packets needed to receive 4.0MB in 2.962825000 sec for 1.350063 MB/sec for  11059.71 packets/sec (  0.090418 ms per packet)
qSpeedTest(send=0      , recv=256    )  16384 packets needed to receive 4.0MB in 1.576047000 sec for 2.537995 MB/sec for  10395.63 packets/sec (  0.096194 ms per packet)
qSpeedTest(send=0      , recv=512    )   8192 packets needed to receive 4.0MB in 0.998601000 sec for 4.005604 MB/sec for   8203.48 packets/sec (  0.121900 ms per packet)
qSpeedTest(send=0      , recv=1024   )   4096 packets needed to receive 4.0MB in 0.891776000 sec for 4.485431 MB/sec for   4593.08 packets/sec (  0.217719 ms per packet)

Mac Mini(Release) - WaitForPacketWithTimeoutMicroSecondsNoLock

Testing receiving 4.0MB of data using varying receive packet sizes:
qSpeedTest(send=0      , recv=32     ) 131072 packets needed to receive 4.0MB in 8.621728000 sec for 0.463944 MB/sec for  15202.52 packets/sec (  0.065779 ms per packet)
qSpeedTest(send=0      , recv=64     )  65536 packets needed to receive 4.0MB in 4.492886000 sec for 0.890296 MB/sec for  14586.62 packets/sec (  0.068556 ms per packet)
qSpeedTest(send=0      , recv=128    )  32768 packets needed to receive 4.0MB in 2.295398000 sec for 1.742617 MB/sec for  14275.52 packets/sec (  0.070050 ms per packet)
qSpeedTest(send=0      , recv=256    )  16384 packets needed to receive 4.0MB in 1.204236000 sec for 3.321608 MB/sec for  13605.31 packets/sec (  0.073501 ms per packet)
qSpeedTest(send=0      , recv=512    )   8192 packets needed to receive 4.0MB in 0.641858000 sec for 6.231908 MB/sec for  12762.95 packets/sec (  0.078352 ms per packet)
qSpeedTest(send=0      , recv=1024   )   4096 packets needed to receive 4.0MB in 0.423977000 sec for 9.434475 MB/sec for   9660.90 packets/sec (  0.103510 ms per packet)

Mac Mini(Release) - ReadThread

Testing receiving 4.0MB of data using varying receive packet sizes:
qSpeedTest(send=0      , recv=32     ) 131072 packets needed to receive 4.0MB in 9.397579000 sec for 0.425642 MB/sec for  13947.42 packets/sec (  0.071698 ms per packet)
qSpeedTest(send=0      , recv=64     )  65536 packets needed to receive 4.0MB in 4.745158000 sec for 0.842965 MB/sec for  13811.13 packets/sec (  0.072405 ms per packet)
qSpeedTest(send=0      , recv=128    )  32768 packets needed to receive 4.0MB in 2.494341000 sec for 1.603630 MB/sec for  13136.94 packets/sec (  0.076121 ms per packet)
qSpeedTest(send=0      , recv=256    )  16384 packets needed to receive 4.0MB in 1.289940000 sec for 3.100919 MB/sec for  12701.37 packets/sec (  0.078732 ms per packet)
qSpeedTest(send=0      , recv=512    )   8192 packets needed to receive 4.0MB in 0.741130000 sec for 5.397164 MB/sec for  11053.39 packets/sec (  0.090470 ms per packet)
qSpeedTest(send=0      , recv=1024   )   4096 packets needed to receive 4.0MB in 0.511958000 sec for 7.813140 MB/sec for   8000.66 packets/sec (  0.124990 ms per packet)

I also ran dosep.py 50 times on linux, doesn't seem to be a regression but there's a lot of noise.

Clean repo before patch.

50 x FAIL: LLDB (suite) :: TestDataFormatterLibccIterator.py (Linux ewan-VirtualBox 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64) 
50 x FAIL: LLDB (suite) :: TestDataFormatterLibccMap.py (Linux ewan-VirtualBox 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64)
50 x FAIL: LLDB (suite) :: TestDataFormatterLibccMultiMap.py (Linux ewan-VirtualBox 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64)
50 x FAIL: LLDB (suite) :: TestDataFormatterLibcxxList.py (Linux ewan-VirtualBox 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64)
50 x FAIL: LLDB (suite) :: TestDataFormatterLibcxxMultiSet.py (Linux ewan-VirtualBox 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64)
50 x FAIL: LLDB (suite) :: TestDataFormatterLibcxxSet.py (Linux ewan-VirtualBox 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64)   
50 x FAIL: LLDB (suite) :: TestDataFormatterLibcxxString.py (Linux ewan-VirtualBox 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64)
50 x FAIL: LLDB (suite) :: TestDataFormatterLibcxxVBool.py (Linux ewan-VirtualBox 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64)   
50 x FAIL: LLDB (suite) :: TestDataFormatterLibcxxVector.py (Linux ewan-VirtualBox 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64)    
50 x FAIL: LLDB (suite) :: TestDataFormatterUnordered.py (Linux ewan-VirtualBox 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64)       
30 x FAIL: LLDB (suite) :: TestLldbGdbServer.py (Linux ewan-VirtualBox 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64)                      
50 x TIMEOUT: LLDB (suite) :: TestThreadStates.py (Linux ewan-VirtualBox 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64)

With non-stop mode enabled after patch, using read thread.

50 x FAIL: LLDB (suite) :: TestDataFormatterLibccIterator.py (Linux ewan-VirtualBox 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64)
50 x FAIL: LLDB (suite) :: TestDataFormatterLibccMap.py (Linux ewan-VirtualBox 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64)     
50 x FAIL: LLDB (suite) :: TestDataFormatterLibccMultiMap.py (Linux ewan-VirtualBox 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64) 
50 x FAIL: LLDB (suite) :: TestDataFormatterLibcxxList.py (Linux ewan-VirtualBox 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64)      
50 x FAIL: LLDB (suite) :: TestDataFormatterLibcxxMultiSet.py (Linux ewan-VirtualBox 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64)  
50 x FAIL: LLDB (suite) :: TestDataFormatterLibcxxSet.py (Linux ewan-VirtualBox 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64)    
50 x FAIL: LLDB (suite) :: TestDataFormatterLibcxxString.py (Linux ewan-VirtualBox 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64)   
50 x FAIL: LLDB (suite) :: TestDataFormatterLibcxxVBool.py (Linux ewan-VirtualBox 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64)   
50 x FAIL: LLDB (suite) :: TestDataFormatterLibcxxVector.py (Linux ewan-VirtualBox 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64)    
50 x FAIL: LLDB (suite) :: TestDataFormatterUnordered.py (Linux ewan-VirtualBox 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64)  
30 x FAIL: LLDB (suite) :: TestLldbGdbServer.py (Linux ewan-VirtualBox 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64)              
44 x TIMEOUT: LLDB (suite) :: TestThreadStates.py (Linux ewan-VirtualBox 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64)
source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
3242

The side effect of SetPrivateState is that if there is a change in state it triggers a broadcast.

Process state a lot trickier since we haven't added an enum entry for a non-stop state type. eStateStopped is what seems to make the most sense, since there are a lot of commands lldb doesn't let you perform if the state is running. However having a stopped state also means that lldb assumes it's okay to things like read register values from the running threads. Having a complete solution for that will take more extensive planning, we've hacked around it a bit for now.

I looked into how GDB processes async packets with only one thread. Essentially GDB has a high level event loop which listens for events from multiple sources. Events are queued and only processed later by calling the appropriate handler, allowing the event loop to continue listening for more events. There is an event for async notification packets which is generated when the notify packet is received, but only responded later so that it doesn't interfere with regular synchronous communication. The event itself is generated from either polling the serial connection or looking for a synchronous response after a send.

For our purposes polling isn't the best solution since we can spawn a thread from Communication easily. However delaying the notify processing does address a race condition I didn't previously cover.

(a) ---> m
(b) <--- %notify
(c) [broadcast notify packet received]
(d) ---> vStopped [broadcast is handled by listener in AysncThread before reply to (a) is received]
(e) <--- memory reply to (a), unexpected response to vStopped.

This patch fixes this by getting SendPacketAndWaitForResponse() to Hijack the notify broadcast, then rebroadcasting after the synchronous response has been seen. Avoiding interference with regular synchronous communication.

clayborg accepted this revision.Jun 15 2015, 3:51 PM
clayborg edited edge metadata.

Looks fine as long as all tests still pass on MacOSX and there is no regression is packet speed test.

This revision was automatically updated to reflect the committed changes.