This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] implement concurrency-mt-unsafe
ClosedPublic

Authored by segoon on Nov 6 2020, 7:05 AM.

Details

Summary

Checks for some thread-unsafe functions against a black list of known-to-be-unsafe functions. Usually they access static variables without synchronization (e.g. gmtime(3)) or utilize signals in a racy way (e.g. sleep(3)).

The patch adds a check instead of auto-fix as thread-safe alternatives usually have API with an additional argument (e.g. gmtime(3) v.s. gmtime_r(3)) or have a different semantics (e.g. exit(3) v.s. __exit(3)), so it is a rather tricky or non-expected fix.

An option specifies which functions in libc should be considered thread-safe, possible values are posix, glibc, or any (the most strict check). It defaults to 'any' as it is unknown what target libc type is - clang-tidy may be run on linux but check sources compiled for other *NIX.

The check is used in Yandex Taxi backend and has caught many unpleasant bugs. A similar patch for coroutine-unsafe API is coming next.

Diff Detail

Event Timeline

segoon created this revision.Nov 6 2020, 7:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 6 2020, 7:05 AM
segoon requested review of this revision.Nov 6 2020, 7:05 AM
lebedev.ri added inline comments.
clang-tools-extra/test/clang-tidy/checkers/misc-mt-unsafe.cpp
3–4

Tests should be hermetic, they can not use headers from system

segoon updated this revision to Diff 303499.Nov 6 2020, 10:37 AM

don't include any system headers in tests

segoon marked an inline comment as done.Nov 6 2020, 10:38 AM
segoon added inline comments.
clang-tools-extra/test/clang-tidy/checkers/misc-mt-unsafe.cpp
3–4

fixed

lebedev.ri set the repository for this revision to rG LLVM Github Monorepo.
lebedev.ri added a project: Restricted Project.

It appears to not check for signs that the code is running in a multi threaded manner, This will result in many false positives in code that is known to be single threaded.

clang-tools-extra/clang-tidy/misc/MtUnsafeCheck.cpp
20

Don't use an anonymous namespace, just make the decls contained static.

200

I don't think this range needs appending onto the diagnostic.

clang-tools-extra/docs/clang-tidy/checks/list.rst
33

Can you undo this change and all other unrelated changes to this file.

segoon marked an inline comment as done.Nov 6 2020, 12:09 PM
segoon added inline comments.
clang-tools-extra/docs/clang-tidy/checks/list.rst
33

These changes are added by ./add_new_check.py. I guess the file was altered by hands w/o using the script the last time(s).

njames93 added inline comments.Nov 6 2020, 12:19 PM
clang-tools-extra/docs/clang-tidy/checks/list.rst
33

I'm aware of that. but these changes are unrelated and shouldn't be a part of the PR.

segoon added a comment.EditedNov 6 2020, 12:54 PM

It appears to not check for signs that the code is running in a multi threaded manner, This will result in many false positives in code that is known to be single threaded.

I'm not sure there is a trustworthy check whether a source is going to be used in MT environment. A program can be linked with threads libraries but still use a single thread and use MT-unsafe API with no problem. I'd rather disable the check by default and let the user explicitly enable the check. I'm not sure what's clang-tidy policy for such case - should I create a new module instead of "misc" (given that llvm itself and probably many other users manually enable "misc-*")?

segoon updated this revision to Diff 303532.Nov 6 2020, 1:11 PM
  • use static instead of namespace {}
  • don't use SourceRange()
  • revert unrelated changes to .rst

While i share concern about false-positives, it is literally impossible to avoid them here,
and this should be viewed more as an enforcement tool (don't use thread-unsafe fns),
not bug-detection check.

What i would however like to be improved, is better docs.
I'm guessing mt-unsafe is MT-Unsafe from https://www.gnu.org/software/libc/manual/html_node/POSIX-Safety-Concepts.html ?
Does this fully cover all glibc functions? POSIX? Etc?

segoon updated this revision to Diff 304189.Nov 10 2020, 7:56 AM
segoon edited the summary of this revision. (Show Details)
  • add Libc option
  • improve docs
segoon updated this revision to Diff 305460.Nov 16 2020, 3:48 AM
  • minor changes to docs
segoon added a comment.EditedNov 16 2020, 3:50 AM

What i would however like to be improved, is better docs.

I hope I've addressed your questions in documentation. Please tell me whether you still have any unanswered questions.

Just a general drive by point, is it wise to add a new tidy module to clang-tidy called threads (or similar).
We have got a few other checks related to multi-threaded code in the pipeline (D77493, D75229) that would fit perfectly for the same module.
Not to mention it would make it a lot easier to find checks specifically designed for multi-threaded workloads.

Libc option name doesn't really make sense to me, maybe FunctionSet would fit better.

clang-tools-extra/clang-tidy/misc/MtUnsafeCheck.cpp
33–34

I would expect this can be outside of all the namespaces right after using namespace clang::ast_matchers;

288–289

Likewise, i would expect this can be outside of namespaces

296

return anyOf(hasAnyName(posixFunctions), hasAnyName(glibcFunctions));

310–311

Is there any way to invert the direction of this matcher,
instead of checking each call that it's callee isn't one of the bad ones,
look through all function decls, and for all the bad ones, diagnose all calls to them?

clang-tools-extra/docs/ReleaseNotes.rst
111

<...>, Currently knows about POSIX and GLIBC.

or something along those lines

segoon added inline comments.Nov 17 2020, 4:55 AM
clang-tools-extra/clang-tidy/misc/MtUnsafeCheck.cpp
296

It would be a bit slower than now - the current code removes duplicates. If/when other FunctionSets are added, duplicate list will be much bigger.

310–311

I'm not sure it is possible without additional costs - it would require somehow marking bad FunctionDecl, it requires an addition of state.

I'm not an expert in LLVM AST, maybe it is very cheap and easy to do, but I don't see how, do you?

Also can this be ran through clang-tidy, feeling a few naming violations are in here. If you use arc to upload your patches you'll get lint warnings about clang-tidy and clang-format.

clang-tools-extra/clang-tidy/misc/MtUnsafeCheck.cpp
280–283

Would it not be faster to get rid of the set. Instead insert all the items in to a vector, then sort and unique it

lebedev.ri added inline comments.Nov 17 2020, 5:29 AM
clang-tools-extra/clang-tidy/misc/MtUnsafeCheck.cpp
280–283

I can't imagine this will have measurable performance difference.
The main problem is algorithmic, https://reviews.llvm.org/D90944#inline-853246

But my main point is https://llvm.org/docs/CodingStandards.html#do-not-use-static-constructors

segoon updated this revision to Diff 305764.Nov 17 2020, 6:23 AM
segoon retitled this revision from [clang-tidy] implement misc-mt-unsafe to [clang-tidy] implement concurrent-mt-unsafe.
  • move plugin to concurrent group
  • naming fixes
  • use anyOf
  • Libc -> FunctionSet
  • move plugin to concurrent group

While this is definitely a step in the right direction, it should be a separate patch to introduce the new module. After making that, set this to be a child of it.

segoon updated this revision to Diff 306020.Nov 18 2020, 2:45 AM
segoon retitled this revision from [clang-tidy] implement concurrent-mt-unsafe to [clang-tidy] implement concurrency-mt-unsafe.
  • concurrent -> concurrency
  • split the patch apart

Looks good to me, but i'm still uneasy about doing the checking in inverted sense.

clang-tools-extra/clang-tidy/concurrency/MtUnsafeCheck.cpp
31–32 ↗(On Diff #306020)

I commented, but the comment was lost; do these arrays really have to be in namespace?
I'd think they can be right afterwards using namespace clang::ast_matchers;.

clang-tools-extra/clang-tidy/concurrency/MtUnsafeCheck.h
9 ↗(On Diff #306020)

s/CONCURRENT/CONCURRENCY/, maybe elsewhere too

segoon updated this revision to Diff 307884.Nov 26 2020, 8:48 AM
  • move static vectors out of namespace
  • mark mt-unsafe decls and check for marks in exprs
  • move static vectors out of namespace

Nice.

  • mark mt-unsafe decls and check for marks in exprs

Eeeh.
I was thinking of either some smart matcher "match any function declaration
with name from these lists, and then match every call to said decl".
But the current implementation, i'm not sure this approach is even legal for checks.
So i guess let's stick with the original solution..

  • mark mt-unsafe decls and check for marks in exprs

Eeeh.
I was thinking of either some smart matcher "match any function declaration
with name from these lists, and then match every call to said decl".

I tried to utilize bind()+equalsBoundNode(), but it seems it's impossible to mark and use the mark in a single matcher.

But the current implementation, i'm not sure this approach is even legal for checks.

The trick is stolen from abseil/UpgradeDurationConversionsCheck.h. If it's invalid here, then abseil should be fixed too.

segoon updated this revision to Diff 307903.Nov 26 2020, 9:56 AM
  • revert decls marking
lebedev.ri accepted this revision.Nov 26 2020, 10:25 PM

Please upload all patches with full context (-U9999)
I'm guessing you'll need help committing this, in which case please specify Author <e@ma.il> to be used for git commit --author="<>"

Thanks, looks good to me then.
Does anyone else have any further thoughts here?

  • mark mt-unsafe decls and check for marks in exprs

Eeeh.
I was thinking of either some smart matcher "match any function declaration
with name from these lists, and then match every call to said decl".

I tried to utilize bind()+equalsBoundNode(), but it seems it's impossible to mark and use the mark in a single matcher.

But the current implementation, i'm not sure this approach is even legal for checks.

The trick is stolen from abseil/UpgradeDurationConversionsCheck.h. If it's invalid here, then abseil should be fixed too.

The problem with that approach isn't so much that it stores state in check (i guess that is fine in general - see preprocessor handling callbacks),
but that

  1. is it guaranteed that we'll match all the decl before first callinst to that decl?
  2. and what's worse, this approach wouldn't actually solve the problem, because it still matches all the callinsts, it just caches "is bad callee" check somewhat..
This revision is now accepted and ready to land.Nov 26 2020, 10:25 PM
segoon updated this revision to Diff 307981.Nov 27 2020, 12:55 AM
  • git diff -U9999

Please upload all patches with full context (-U9999)

Done.

I'm guessing you'll need help committing this, in which case please specify Author <e@ma.il> to be used for git commit --author="<>"

Vasily Kulikov <segoon@yandex-team.ru>

This revision was automatically updated to reflect the committed changes.