Page MenuHomePhabricator

Thread safety analysis: Require exclusive lock for passing by non-const reference
Changes PlannedPublic

Authored by aaronpuchert on Sep 22 2018, 11:29 AM.

Details

Summary

When passing by reference, we check if the reference is const-qualified
and if it isn't, we demand an exclusive lock. Unlike checking const
qualifiers on member functions, there are probably not many false
positives here: if a function takes a non-const reference, it will
in almost all cases modify the object that we passed it.

Diff Detail

Event Timeline

aaronpuchert created this revision.Sep 22 2018, 11:29 AM

Unlike checking const qualifiers on member functions, there are probably not many false positives here: if a function takes a non-const reference, it will in almost all cases modify the object that we passed it.

I'm not certain I agree with the predicate here. We can make that inference when there *is* a const qualifier on the parameter, but I don't I think we can make any assumptions about whether it will or won't modify the object passed in in the absence of a const qualifier. This has come up in the past for things like C(C&) being a valid copy constructor despite the parameter being non-const. We might need some data to back this assertion up.

lib/Analysis/ThreadSafety.cpp
2030–2031

This isn't specific to your changes, but this gives me the impression we don't distinguish between rvalue references and lvalue references, but that may be something of interest in here.

While most people probably just use ordinary mutexes, hence won't be affected, those that use read/write locks need to know when to use a shared and when to use an exclusive lock. What makes things hard in C++ is that through passing by non-const reference, an object can actually be modified although it looks like it's only passed as a parameter. That makes it really hard to decide whether a shared lock is sufficient just by reading the code, which was a major pain point in our project. We had to do very careful reviews to make sure we had the right kind of lock in every function. I'd like to offload some of that work to the compiler.

That's the main reason for this change: it's pretty easy to find out where a variable is used, but it's hard to say which uses are reads and which are writes in modern C++, if it weren't for const.

Unlike checking const qualifiers on member functions, there are probably not many false positives here: if a function takes a non-const reference, it will in almost all cases modify the object that we passed it.

I'm not certain I agree with the predicate here. We can make that inference when there *is* a const qualifier on the parameter, but I don't I think we can make any assumptions about whether it will or won't modify the object passed in in the absence of a const qualifier. This has come up in the past for things like C(C&) being a valid copy constructor despite the parameter being non-const.

For const qualifiers on member functions, the concern is that I might have a shared lock on a data structure like a std::vector and then I read stuff from that vector. But unless I'm reading the vector from a const member function, or through a const reference, I'm going to use the non-const operator[]. That operator doesn't actually modify anything, but it returns a reference that allows me to modify stuff. However, I'm not doing any of that, so why should there be a warning. I think that's a valid concern.

Effectively we have two overloads that are both not actually modifying anything, but the non-const variant returns a reference that does allow us to modify something. My thinking was that this pattern is not so common with function parameters. The function(s) will return a reference to a part of an object, and such references will typically come from member functions for which we don't enforce this. So even if I use a non-modifying <algorithm> like std::find, there'll be no warning because the iterators come from container members begin and end. (By the way, that would be easy to fix by using cbegin and cend.)

The way I see it, there are basically three cases of false positives:

  • The function doesn't actually modify the object, but takes it as non-const reference nevertheless. That can be easily fixed though and I think it's a fix that doesn't hurt.
  • The function modifies the object sometimes, but here the user is very certain that it won't. That might be indication of a poor design, in any event it's very fragile. As a developer, I would feel very anxious about this, especially in a larger code base.
  • We have two overloads of a function, both don't modify the object, but the non-const variant returns a non-const reference to some part of the object. This isn't backed up by data, but I think this doesn't happen nearly as often as with member functions. There are several fixes available as well: one could make sure that we only have a const reference there, or make the function a const member, or if all else fails, by taking a const reference to the object we don't want to modify and then call the function with that as argument. That's an additional line of code like const auto &cref = obj; which I think will rarely be needed.

So basically my argument is that this can catch really nasty bugs, and false positives should be rare. If they occur, they can be fixed by making the code more const-correct. Such fixes should be easy to sell, because everybody loves const-correctness, and it doesn't require any annotations.

We might need some data to back this assertion up.

Makes sense to me, but I don't have access to the largest code base in terms of thread safety analysis. Maybe @delesley can help here? Or I need to apply for a job at Google. On our own code, we're still in the very early stages of annotating, so I can't provide reliable data from there. I've seen thread safety analysis in flutter and Chromium, but both don't seem to use read/write locks.

lib/Analysis/ThreadSafety.cpp
2030–2031

You mean & vs &&? I don't that makes a difference here, the important property is whether it's const. So const& is a read, and & and && are writes, and const&& doesn't really make sense. (It's effectively the same as const& for all I know.)

But there is one error here: I definitely have to look at the canonical type, i.e. it should be &*Qt.getCanonicalType() instead of &*Qt.

I have mixed feelings about this patch. When you pass an object to a function, either by pointer or by reference, no actual load from memory has yet occurred. Thus, there is a real risk of false positives; what you are saying is that the function *might* read or write from protected memory, not that it actually will. In fact, the false positive rate is high enough that this particular warning is sequestered under -Wthread-safety-reference, and is not used with warnings-as-errors at Google.

In theory, the correct way to deal with references is to make GUARDED_BY an attribute of the type, rather than an attribute of the data member; that way taking the address of a member, or passing it by reference, wouldn't cast away the annotation. But that would require a properly parametric dependent type system for C++, which is pretty much impossible. So you're left with two bad options: either issue a warning eagerly, and deal with false positives, or suppress the warning, and deal with false negatives. At Google, false positives usually break the build, which has forced me to prefer false negatives in most cases; my strategy has been to gradually tighten the analysis bit by bit. Thankfully, that's less of a concern here.

Adding to the difficulty is the fact that the correct use of "const" in real-world C++ code is spotty at best. There is LOTS of code that arguably should be using const, but doesn't for various reasons. E.g. if program A calls library B, and library B forgot to provide const-qualified versions of a few methods, than A has to make a choice: either drop the const qualifier, or insert ugly const-casts, neither of which is pleasant. In other-words, non-constness has a tendency to propagate through the code base, and you can't depend on "const" being accurate.

I have two recommendations. First think the default behavior should be to require only a read-only lock, as it currently does. That's a compromise between the "false-positive" and "false-negative" camps. It catches a lot of errors, because you have to hold the lock in some way, but doesn't require accurate const-ness. For people who want more accuracy, there should be a different variant of the warning -- maybe -Wthread-safety-reference-precise? Between beta/precise/reference etc. there are a now a lot of analysis options, and I would love to consolidate them in some fashion. However, different code bases have different needs, and I think "one size fits all" is not really appropriate.

Second, there needs to be a thread-safety-variant of "const_cast" -- i.e. a way to pass a reference to a function without triggering the warning. That may already be possible via clever use of our current annotations, or you may have to fiddle with something, but it needs to appear in the test cases.

With respect to data, I really think these patches should be tested against Google's code base, because otherwise you're going to start seeing angry rollbacks. However, I don't work on the C++ team any more, and I don't have time to do it. When I was actively developing the analysis, I spent about 90% of my time just running tests and fixing the code base. Each incremental improvement in the analysis itself was a hard upward slog. If you're going to be adding lots of improvements, we need to have someone at Google running point.

With respect to data, I really think these patches should be tested against Google's code base, because otherwise you're going to start seeing angry rollbacks. However, I don't work on the C++ team any more, and I don't have time to do it. When I was actively developing the analysis, I spent about 90% of my time just running tests and fixing the code base. Each incremental improvement in the analysis itself was a hard upward slog. If you're going to be adding lots of improvements, we need to have someone at Google running point.

Do you have a recommendation for who this point person could be? What do we do if we cannot find such a person? (I assume we won't halt progress on thread safety analysis and that propose->review->accept->commit->revert is not an acceptable workflow.)

I've thought about your concerns and the important thing is: I didn't actually test this on a larger code base. The change makes sense to me, but maybe I'm missing something. So maybe instead of pushing through further restrictions, I should focus on rolling out the analysis on our own messy code base. Maybe I'm going to see the false positives right there.

However, as messy as our code may be, it might be messy in different ways than Google's code. So I might not see the same issues. Then on the other hand, there might be (probably smaller) code bases which don't have a problem with stricter rules. So I think additional flags are probably not the worst idea, but we need to keep their number small.

Second, there needs to be a thread-safety-variant of "const_cast" -- i.e. a way to pass a reference to a function without triggering the warning. That may already be possible via clever use of our current annotations, or you may have to fiddle with something, but it needs to appear in the test cases.

There is such a mechanism in the test suite. It uses that we don't check the arguments for functions annotated with no_thread_safety_analysis, introduced by r246806:

// Takes a reference to a guarded data member, and returns an unguarded
// reference.
template <class T>
inline const T& ts_unchecked_read(const T& v) NO_THREAD_SAFETY_ANALYSIS {
  return v;
}

template <class T>
inline T& ts_unchecked_read(T& v) NO_THREAD_SAFETY_ANALYSIS {
  return v;
}

For future patches, please add Richard Trieu (rtrieu@google.com) as a subscriber. I will continue to try and do code reviews, but Richard is on the team that actually rolls out new compiler changes. Thanks!

BTW, the issue is not just that changes may introduce false positives. Even if it's a true positive, you risk breaking the build.

Rebase on top of D52443. We also check the move constructor argument for write access, as suggested in a review.

This isn't intended to be merged (yet?), it should be seen as an RFC.

aaronpuchert planned changes to this revision.May 22 2019, 10:01 AM

I don't actually want to change anything, but remove this from your review queues until I have an idea how often the warning fires and how many false positives we have.

Herald added a project: Restricted Project. · View Herald TranscriptMay 22 2019, 10:01 AM