This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Protect OptionValue accesses from data races
ClosedPublic

Authored by augusto2112 on Aug 3 2023, 2:48 PM.

Details

Summary

Thread sanitizer is catching data races in OptionValue, protect accesses
to OptionValue with a mutex.

Diff Detail

Event Timeline

augusto2112 created this revision.Aug 3 2023, 2:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2023, 2:48 PM
augusto2112 requested review of this revision.Aug 3 2023, 2:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2023, 2:48 PM

A few questions:

  • Do all of these need to be protected with a mutex? In your description you're saying TSan is detecting data races. What piece of data are you observing the data race on?
  • Do we need a recursive mutex? I assume that these operations might call into each other, but if not it would be nice to just have it be a std::mutex.

I think your implementation is already correct, these questions are more geared towards performance (i.e. not locking more than we need to, using a more efficient mutex type, etc.)

Do all of these need to be protected with a mutex? In your description you're saying TSan is detecting data races. What piece of data are you observing the data race on?

Only SetAsBoolean/GetAsBoolean are being caught when running our test suite at the moment.

Do we need a recursive mutex? I assume that these operations might call into each other, but if not it would be nice to just have it be a std::mutex.

If we want to protect all accesses then yes, since these functions call each other. It we decide to only lock against what's being caught by tsan then we can get by with a regular mutex.

kastiglione added inline comments.
lldb/source/Interpreter/OptionValue.cpp
40

this one doesn't need a lock

353–354

this one doesn't have a lock, should it?

Do all of these need to be protected with a mutex? In your description you're saying TSan is detecting data races. What piece of data are you observing the data race on?

Only SetAsBoolean/GetAsBoolean are being caught when running our test suite at the moment.

Do we need a recursive mutex? I assume that these operations might call into each other, but if not it would be nice to just have it be a std::mutex.

If we want to protect all accesses then yes, since these functions call each other. It we decide to only lock against what's being caught by tsan then we can get by with a regular mutex.

Only SetAsBoolean and GetAsBoolean are being caught right now, I see. The implementation of those methods really isn't any different (AFAICT) from the other methods so I would assume we would need to protect all the other Get and Set methods as well then, makes sense. I'd probably keep the recursive mutex for now then.

kastiglione added inline comments.Aug 3 2023, 4:25 PM
lldb/source/Interpreter/OptionValue.cpp
46

do these GetAsX functions need to lock?

augusto2112 marked 2 inline comments as done.Aug 3 2023, 5:16 PM
augusto2112 added inline comments.
lldb/source/Interpreter/OptionValue.cpp
46

Hmm, I think we can get away with just locking GetXValue and SetXValue, and replace the lock with a non recursive one.

aprantl added inline comments.Aug 4 2023, 9:11 AM
lldb/source/Interpreter/OptionValue.cpp
52

If you are following @kastiglione 's suggestion from above (I'd be fine with choosing symmetry over performance) then this method (and the ones below) also doesn't need any locks, since it just calls other thread-safe methods.

JDevlieghere added a comment.EditedAug 4 2023, 10:43 AM

Despite their prevalence in LLDB, I still consider recursive_mutex a potential code smell. So +1 on avoiding the need for a recursive mutex.

Addressed comments

JDevlieghere accepted this revision.Aug 4 2023, 11:55 AM

LGTM. If it's straightforward, I think would be nice to have a unit test with two threads modifying the same OptionValue. That way a TSan run would catch this issue. If that's more work than expected then this is fine as is.

This revision is now accepted and ready to land.Aug 4 2023, 11:55 AM

Remove lock from DeepCopy

If it's straightforward, I think would be nice to have a unit test with two threads modifying the same OptionValue. That way a TSan run would catch this issue. If that's more work than expected then this is fine as is.

We might just want to set up a bot that runs the entire test suite under TSan, assuming that we can get it clean?

If it's straightforward, I think would be nice to have a unit test with two threads modifying the same OptionValue. That way a TSan run would catch this issue. If that's more work than expected then this is fine as is.

We might just want to set up a bot that runs the entire test suite under TSan, assuming that we can get it clean?

Definitely, but last I checked there was still quite some work to be done :-)

This revision was automatically updated to reflect the committed changes.