gdb-remote: Make the sequence mutex non-recursive
ClosedPublic

Authored by labath on Aug 23 2016, 6:31 AM.

Details

Summary

This is a preparatory commit for D22914, where I'd like to replace this mutex by an R/W lock
(which is also not recursive). This required a couple of changes:

  • Read/WriteRegister expect the caller to obtain the mutex and are renamed to reflect that. The only caller of these functions is the GDBRemoteRegisterContext class, which already acquired that mutex (for the most part), so the changes there are minimal. (It is not clear to me whether this function actually needs to take the lock. The thing which this lock seems to protect is the state of the register context if it is being accessed from multiple threads, which, if true, sounds like it could be done by a separate mutex.)
  • GetThreadSuffixSupported() was being called from locked and unlocked contexts (including contexts where the process was running, and the call would fail if it did not have the result cached). I have split this into two functions, one which computes the thread suffix support and caches it (this one always takes the lock), and another, which returns the cached value (and never needs to take the lock). This feels quite natural as ProcessGdbRemote was already pre-caching this value at the start.
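
The split described in the second bullet can be sketched roughly like this (a minimal single-threaded illustration with invented names, not the actual LLDB code):

```cpp
#include <cassert>
#include <mutex>
#include <optional>

// Hypothetical sketch: the "compute" step talks to the remote stub and must
// hold the sequence mutex; the getter only returns the cached value and so
// never needs the lock.
class GDBClientSketch {
public:
  // Called once, early, while the process is stopped. Takes the lock.
  void ComputeThreadSuffixSupport() {
    std::lock_guard<std::mutex> guard(m_sequence_mutex);
    // Imagine querying the remote stub here and caching the answer.
    m_thread_suffix_supported = true;
  }

  // Safe to call from any context; never takes the lock.
  bool GetThreadSuffixSupported() const {
    // Falls back to 'false' if the value was never computed.
    return m_thread_suffix_supported.value_or(false);
  }

private:
  std::mutex m_sequence_mutex;
  std::optional<bool> m_thread_suffix_supported;
};
```

Since ProcessGdbRemote already pre-caches the value at startup, callers in running-process contexts only ever hit the lock-free getter.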

Diff Detail

Repository
rL LLVM

Event Timeline

labath updated this revision to Diff 68985.Aug 23 2016, 6:31 AM
labath retitled this revision from to gdb-remote: Make the sequence mutex non-recursive.
labath updated this object.
labath added a reviewer: clayborg.
labath added a subscriber: lldb-commits.
clayborg edited edge metadata.Aug 23 2016, 9:18 AM

The old mutex was there so that you could send a packet and get a result without worrying about other threads getting your response. It was recursive for the case where we needed to send two packets as one (set thread ID, then send packet), and for reading all registers (as you have found), so your patch does work, but now only for these special cases. There is no longer an easy way to take the mutex and send 5 different packets without going and adding new NoLock variants of each call that you might want to send as a single stream of packets. Can we think of a slightly cleaner way to do this? Before, you could take the sequence mutex in your main function that wanted to make N calls, and each call would then recursively lock the mutex and everything just worked. Now you will deadlock if you do things incorrectly (granted, you will need to do a "GDBRemoteClientBase::Lock lock(gdb_comm, false);" followed by a "m_gdb_comm.SendSomePacket()").

I see what you mean. I have considered a couple of other options, but none of them struck me as obviously best. Alternatives I see are:

  • have all functions (or just those that we need right now) take an optional boolean lock parameter (probably defaulting to true). If it is set, the function takes the lock; if not, it doesn't. The main downside I see here is that the locking behaviour would then depend on a run-time parameter, which is not completely ideal. OTOH, it would sit nicely alongside the "send_async" parameter, which also controls behavior in a similar way.
  • just make all functions NoLock and require the user to lock manually. The tricky part here is that, unlike a normal lock, locking the packet lock can fail, so you end up with code like Lock lock; if (!lock) { if (log) log->Printf("getting registers failed"); } else { ... }, which doesn't look nice, but we could make it better. If we move the logging part into the Lock object, this would end up as: Lock lock(gdb_comm, false, "getting registers"); if (!lock) return; ... which I don't think is too bad.
  • like before, but only do it for functions which are used from locked and unlocked contexts. Then the policy would be that if you need a NoLock version of a function, instead of creating a new one, you rename the current one and change all callers (I don't expect that there will be so many of them) to lock explicitly. This does not require any major rewrite, but it is a bit messy because you have two kinds of functions. (This is not really an alternative, just an extension of the current approach.)
  • write our own implementation of a recursive read/write lock. This may not even be so difficult (although I can see how some edge cases might be tricky -- e.g. we hold a read lock, so we can share the connection with other threads, but then we need to send a packet which wants exclusive access), and it would nicely compartmentalize all the ugliness. However, I am somewhat against inventing locking primitives (the Lock class is such a thing too, but there I think it was justified).
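
The Lock-with-logging idea from the second bullet might look something like this (a single-threaded sketch; the constructor signature matches the discussion, but the internals are invented):

```cpp
#include <cstdio>
#include <mutex>

// Illustrative sketch of option 2: a Lock object that can fail to acquire
// (e.g. because the process is running) and logs the failed operation itself,
// so callers only need to test it and bail out.
class Lock {
public:
  Lock(std::mutex &m, bool interrupt, const char *operation) : m_mutex(m) {
    (void)interrupt; // the real class would optionally interrupt the target
    m_acquired = m_mutex.try_lock();
    if (!m_acquired)
      std::printf("%s failed: could not get sequence mutex\n", operation);
  }
  ~Lock() {
    if (m_acquired)
      m_mutex.unlock();
  }
  explicit operator bool() const { return m_acquired; }

private:
  std::mutex &m_mutex;
  bool m_acquired = false;
};
```

A caller would then be the two-liner from the comment: Lock lock(gdb_mutex, false, "getting registers"); if (!lock) return;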

What do you think?

clayborg requested changes to this revision.Aug 23 2016, 11:43 AM
clayborg edited edge metadata.

Maybe instead of a "NoLock" suffix on the functions, we make those functions take an extra argument? Then someone can't accidentally run one of them without locking. We need to somehow enforce that locking must occur. Maybe passing an extra:

const GDBRemoteClientBase::Lock &lock

Making it const can stop people from locking it in the NoLock functions, and we can maybe assert:

bool
GDBRemoteCommunicationClient::SetCurrentThread(uint64_t tid, const GDBRemoteClientBase::Lock &lock)
{
    assert(lock.IsAcquired());
    // ... send the packet; the lock argument guarantees the mutex is held ...
}

Then IsAcquired() can be const, but this at least would enforce that someone needs to lock before sending the packet. We probably need to document these no locking commands and let users know they must lock and pass in the lock for verification.
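
As a self-contained sketch of this enforcement pattern (simplified names, not the real GDBRemoteClientBase API): the packet-sending function takes a const reference to an already-constructed lock, so the compiler forces callers to create one first, and the assert catches callers that pass an unacquired lock.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Simplified stand-in for GDBRemoteClientBase::Lock.
class Lock {
public:
  explicit Lock(std::mutex &m) : m_mutex(m), m_acquired(m.try_lock()) {}
  ~Lock() {
    if (m_acquired)
      m_mutex.unlock();
  }
  // const, so it can be queried through a 'const Lock &' parameter.
  explicit operator bool() const { return m_acquired; }

private:
  std::mutex &m_mutex;
  bool m_acquired;
};

// Cannot be called without a Lock in hand; asserts it was really acquired.
bool SetCurrentThread(uint64_t tid, const Lock &lock) {
  assert(lock);
  (void)tid; // ... send the thread-selection packet here ...
  return true;
}
```

Because the parameter is const, the callee can check the lock but cannot lock or unlock it, which is exactly the division of responsibility being proposed.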

This revision now requires changes to proceed.Aug 23 2016, 11:43 AM
labath updated this revision to Diff 69088.Aug 24 2016, 2:07 AM
labath edited edge metadata.

Sounds good. I like that this enables us to get rid of the NoLock suffix.

Maybe eventually we could even replace the runtime assert with some sort of static lock
discipline checking framework, where you annotate functions with the locks they expect (I have
no experience with it, but I understand clang supports something like that).

The IsAcquired function is already present in the form of the explicit operator bool (I just needed to make it const).
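
The constness point can be shown with a minimal sketch (invented names): without the const qualifier on the conversion operator, querying it through a const Lock & parameter would not compile.

```cpp
// Minimal illustration of 'explicit operator bool() const'. Dropping the
// 'const' here would make CheckLock below fail to compile, since it only
// holds a const reference.
class Lock {
public:
  explicit Lock(bool acquired) : m_acquired(acquired) {}
  explicit operator bool() const { return m_acquired; }

private:
  bool m_acquired;
};

bool CheckLock(const Lock &lock) { return static_cast<bool>(lock); }
```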

clayborg accepted this revision.Aug 24 2016, 8:41 AM
clayborg edited edge metadata.

Looks good.

This revision is now accepted and ready to land.Aug 24 2016, 8:41 AM
This revision was automatically updated to reflect the committed changes.
nitesh.jain added inline comments.Aug 30 2016, 3:10 AM
lldb/trunk/source/Plugins/Process/gdb-remote/GDBRemoteRegisterContext.cpp
131

Hi labath,

This patch causes a deadlock when we try to run "reg read f0". GDBRemoteRegisterContext::ReadRegister is called twice, which results in a deadlock, since the lock acquired by the first call has not been released. The two calls are:

  1. To evaluate a DWARF expression so that the floating-point register size can be determined at run time. This adds the overhead of reading one more register, which acquires the lock and does not release it at the end.
  2. To read register f0. This time it tries to acquire the lock, which has not been released, resulting in a deadlock.

-NJ
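
The failure mode described above can be reduced to a tiny single-threaded model (all names are invented; the try-lock simulation stands in for the real blocking mutex so the demo reports failure instead of hanging):

```cpp
// A deliberately non-recursive "mutex" so the re-entrancy problem can be
// demonstrated in one thread without actually deadlocking.
struct NonRecursiveMutex {
  bool locked = false;
  bool try_lock() {
    if (locked)
      return false;
    locked = true;
    return true;
  }
  void unlock() { locked = false; }
};

// Stand-in for GDBRemoteRegisterContext::ReadRegister: when the register's
// size must be computed via a DWARF expression, it re-enters itself to read
// another register while still holding the sequence mutex.
struct RegisterContextModel {
  NonRecursiveMutex sequence_mutex;

  bool ReadRegister(bool needs_size_evaluation) {
    if (!sequence_mutex.try_lock())
      return false; // with a real blocking mutex this would deadlock
    bool ok = true;
    if (needs_size_evaluation)
      ok = ReadRegister(false); // nested read, as in the "reg read f0" case
    sequence_mutex.unlock();
    return ok;
  }
};
```

With the old recursive mutex the nested acquisition simply succeeded on the same thread, which is why this path only broke once the mutex became non-recursive.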

Forgot to mention that this case has been observed on the MIPS architecture, since for MIPS the floating-point register size is calculated at runtime.

-NJ

Thanks for the heads up. I'll need to give this some thought. I've pulled the change out until I can figure out what to do.

BTW, have you guys considered putting up a mips buildbot to detect these kinds of things?

Thanks. Right now we are trying to fix big-endian failures, and then we will set up a buildbot for both.

ted added a subscriber: ted.Oct 12 2016, 10:33 AM

I'm seeing the same issue as Nitesh, except in a different spot. The Hexagon Simulator doesn't support QSaveRegisterState, so lldb calls GDBRemoteRegisterContext::ReadAllRegisterValues. It gets a lock on the communication client, then calls ReadAllRegisters, which tries to get a lock on the communication client, and fails (hangs on Linux, crashes on Windows).

The revert fixed the problem.