This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Fix data race in NativeFile
ClosedPublic

Authored by JDevlieghere on Aug 7 2023, 3:49 PM.

Details

Summary

TSan reports the following data race:

Write of size 4 at 0x000109e0b160 by thread T2 (mutexes: write M0, write M1):
  #0 lldb_private::NativeFile::Close() File.cpp:329 (liblldb.18.0.0git.dylib:arm64+0x58dac0)
  #1 lldb_private::ConnectionFileDescriptor::Disconnect(lldb_private::Status*) ConnectionFileDescriptorPosix.cpp:232 (liblldb.18.0.0git.dylib:arm64+0x5bad6c)
  #2 lldb_private::Communication::Disconnect(lldb_private::Status*) Communication.cpp:61 (liblldb.18.0.0git.dylib:arm64+0x451f04)
  #3 lldb_private::process_gdb_remote::ProcessGDBRemote::DidExit() ProcessGDBRemote.cpp:1164 (liblldb.18.0.0git.dylib:arm64+0xadfd80)
  #4 lldb_private::Process::SetExitStatus(int, char const*) Process.cpp:1097 (liblldb.18.0.0git.dylib:arm64+0x6ed338)
  #5 lldb_private::process_gdb_remote::ProcessGDBRemote::MonitorDebugserverProcess(std::__1::weak_ptr<lldb_private::process_gdb_remote::ProcessGDBRemote>, unsigned long long, int, int) ProcessGDBRemote.cpp:3387 (liblldb.18.0.0git.dylib:arm64+0xae9464)

Previous read of size 4 at 0x000109e0b160 by main thread (mutexes: write M2):
  #0 lldb_private::NativeFile::IsValid() const File.h:393 (liblldb.18.0.0git.dylib:arm64+0x590830)
  #1 lldb_private::ConnectionFileDescriptor::IsConnected() const ConnectionFileDescriptorPosix.cpp:121 (liblldb.18.0.0git.dylib:arm64+0x5b8ab4)
  #2 lldb_private::Communication::IsConnected() const Communication.cpp:79 (liblldb.18.0.0git.dylib:arm64+0x451fcc)
  #3 lldb_private::process_gdb_remote::GDBRemoteCommunication::WaitForPacketNoLock(StringExtractorGDBRemote&, lldb_private::Timeout<std::__1::ratio<1l, 1000000l>>, bool) GDBRemoteCommunication.cpp:256 (liblldb.18.0.0git.dylib:arm64+0xaac060)
  #4 lldb_private::process_gdb_remote::GDBRemoteCommunication::WaitForPacketNoLock(StringExtractorGDBRemote&, lldb_private::Timeout<std::__1::ratio<1l, 1000000l>>, bool) GDBRemoteCommunication.cpp:244 (liblldb.18.0.0git.dylib:arm64+0xaabe78)
  #5 lldb_private::process_gdb_remote::GDBRemoteClientBase::SendPacketAndWaitForResponseNoLock(llvm::StringRef, StringExtractorGDBRemote&) GDBRemoteClientBase.cpp:246 (liblldb.18.0.0git.dylib:arm64+0xaaa184)

I tried fixing the problem at the ConnectionFileDescriptor level, but the underlying problem is that NativeFile isn't thread safe. NativeFile can hold a file descriptor and/or a file stream. Throughout its implementation, it would check if the descriptor or stream is valid and than do something based on that. While that works in a single threaded environment, nothing prevents another thread from modifying the descriptor or stream between the IsValid check and when it's actually being used. This patch prevents such issues by returning a ValueGuard RAII object. As long as the object is in scope, the value is guaranteed by a lock.

Diff Detail

Event Timeline

JDevlieghere created this revision.Aug 7 2023, 3:49 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 7 2023, 3:49 PM
JDevlieghere requested review of this revision.Aug 7 2023, 3:49 PM
This revision is now accepted and ready to land.Aug 7 2023, 4:25 PM
bulbazord accepted this revision.Aug 7 2023, 4:26 PM

Makes sense to me.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 7 2023, 6:38 PM
JDevlieghere reopened this revision.Aug 7 2023, 9:33 PM
This revision is now accepted and ready to land.Aug 7 2023, 9:33 PM
JDevlieghere requested review of this revision.Aug 7 2023, 9:43 PM
JDevlieghere updated this revision to Diff 548056.
JDevlieghere retitled this revision from [lldb] Fix data race in ConnectionFileDescriptor to [lldb] Fix data race in NativeFile.
JDevlieghere edited the summary of this revision. (Show Details)
JDevlieghere added a reviewer: jingham.
bulbazord added inline comments.Aug 8 2023, 2:27 PM
lldb/include/lldb/Host/File.h
434–442

Hmm, this should be okay since URVO is mandatory as of C++17 (meaning this will not perform a copy and potentially screw up the locking mechanism).

lldb/source/Host/common/File.cpp
223

nit: delete this stray semicolon.

328–329

I think you need to grab both mutexes before you can modify either one of them in Close. As an example:
Thread 1 calls Close.
Thread 2 calls GetStream.
Thread 1 grabs the stream_guard and does all the relevant work with it.
Thread 2 grabs the stream_guard and subsequently grabs the descriptor_guard. It opens the stream again and sets the descriptor to unowned.
Thread 1 resumes, grabs the descriptor_guard, but does nothing with it because it's unowned. However, it does reset the state of the other member variables.

In this case, the work that Thread 1 did to close the stream was undone by Thread 2. The File is not closed and is left is a strange half-initialized state.

Reading through this and thinking about it more, I wonder if Close is the right abstraction for a File that can be touched by multiple threads. If one thread closes it and the other is still trying to work with it, you could end up in a weird state no matter what else is going on. Maybe it would make more sense to have a File have a reference count? Users wouldn't call File::Close directly but just let it go out of scope and allow the destructor to handle resource management. That might be out of the scope of this patch though.

JDevlieghere marked an inline comment as done.

Address @bulbazord's code review feedback

JDevlieghere marked 2 inline comments as done.Aug 9 2023, 12:07 PM
bulbazord accepted this revision.Aug 9 2023, 12:12 PM

Looks right to me now. Thanks!

This revision is now accepted and ready to land.Aug 9 2023, 12:12 PM
augusto2112 accepted this revision.Aug 9 2023, 5:39 PM

I have one nit that I'll leave up to you whether it should be addressed now or not.

lldb/source/Host/common/File.cpp
560–563

I believe descriptor_guard will still be in scope in the else branch (weird scoping rules), but

624–627

One thing that's not super obvious for people reading this is that descriptor_guard is still in scope in the else branch (due to C++'s weird scoping rules when it comes to declaring variables inside if statements), and will keep the descriptor mutex locked which is probably not what you intended. I don't think this affects the correctness of this implementation at the moment, but could be dangerous with further changes on top of this one.

bulbazord requested changes to this revision.Aug 9 2023, 5:44 PM
bulbazord added inline comments.
lldb/source/Host/common/File.cpp
624–627

If that's true, then this change needs further adjustment. GetStream acquires the ValueGuards in the opposite order, meaning Thread 1 can call this function and acquire descriptor_guard while Thread 2 can call GetStream and acquire stream_guard at the same time. Then they're both waiting on the other lock (deadlock).

This revision now requires changes to proceed.Aug 9 2023, 5:44 PM
augusto2112 added inline comments.Aug 9 2023, 5:46 PM
lldb/source/Host/common/File.cpp
624–627

You're right, nice catch.

JDevlieghere marked 4 inline comments as done.

Limit scope of ValueGuard by avoiding else if.

lldb/source/Host/common/File.cpp
624–627

Thanks, I didn't know about that and that was certainly not the intention.

augusto2112 accepted this revision.Aug 9 2023, 8:22 PM
This revision is now accepted and ready to land.Aug 10 2023, 10:24 AM
This revision was automatically updated to reflect the committed changes.