This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Restrict CallDescription fuzzy builtin matching
ClosedPublic

Authored by steakhal on Jan 27 2022, 9:13 AM.

Details

Summary

CallDescriptions for builtin functions relaxes the match rules
somewhat, so that the CallDescription will match for calls that have
some prefix or suffix. This was achieved by doing a StringRef::contains().
However, this is somewhat problematic for builtins that are substrings
of each other.

Consider the following:

CallDescription{ builtin, "memcpy"} will match for
__builtin_wmemcpy() calls, which is unfortunate.

This patch addresses/works around the issue by checking if the
characters around the function's name are not part of the 'name'
semantically. In other words, to accept a match for "memcpy" the call
should not have alphanumeric ([a-zA-Z]) characters around the 'match'.

So, CallDescription{ builtin, "memcpy"} will not match on:

  • __builtin_wmemcpy: there is a w alphanumeric character before the match.
  • __builtin_memcpyFOoBar_inline: there is a F character after the match.
  • __builtin_memcpyX_inline: there is an X character after the match.

But it will still match for:

  • memcpy: exact match
  • __builtin_memcpy: there is an _ before the match
  • __builtin_memcpy_inline: there is an _ after the match
  • memcpy_inline_builtinFooBar: there is an _ after the match

Diff Detail

Event Timeline

steakhal created this revision.Jan 27 2022, 9:13 AM
steakhal requested review of this revision.Jan 27 2022, 9:13 AM
steakhal added inline comments.Jan 28 2022, 1:21 AM
clang/lib/StaticAnalyzer/Core/CheckerContext.cpp
58

std::size_t

NoQ added inline comments.Jan 28 2022, 11:35 AM
clang/lib/StaticAnalyzer/Core/CheckerContext.cpp
107–111

Do these need fixing as well?

Though honestly I'd remove isCLibraryFunction() as an API entirely, people should just use CallDescription for everything.

NoQ accepted this revision.Jan 28 2022, 11:35 AM

Looks great overall!

This revision is now accepted and ready to land.Jan 28 2022, 11:35 AM

Looks great overall!

Thanks!

clang/lib/StaticAnalyzer/Core/CheckerContext.cpp
107–111

Probably yes. I dont know how i missed this. I probably tunnel visioned too much.

Btw CDs are calling back to this API. So unless we settle on the exact semantics for builtins we should keep it. Do you have anything specific in mind?

martong added inline comments.Feb 1 2022, 9:31 AM
clang/lib/StaticAnalyzer/Core/CheckerContext.cpp
107–111

Good catch! I am suggesting to move your new logic that replaces contains into a member function (or lambda?) and then use that in the remaining two cases.

This revision was landed with ongoing or failed builds.Feb 11 2022, 1:45 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 11 2022, 1:45 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

I decided to land it as-is.
I tried to refactor the rest of the isCLibraryFunction() to reuse the new detection logic, but I run into issues by doing that.
BTW I'm not sure if we should support anything besides the builtins defined in clang/include/clang/Basic/Builtins*.def files.
Given that the parser should recognize these and use the appropriate Builtin for the FunctionDecl, getBuiltinID() should return the given ID.
Anyway. I decided to take it slowly and fix only the issue causing the crashes.