This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Tighten up known library function signature tests (PR #56463)

Authored by msebor on Jul 15 2022, 7:03 PM.



The discussion in PR #56463 points out that LLVM might abort when trying to simplify a call to a known library function when it's declared with an unexpected/incompatible signature. The root cause of the problems is inadequate compatibility checking by TargetLibraryInfoImpl::isValidProtoForLibFunc where only some arguments are subject to having their types checked, and the checks for the numbers of arguments are overly permissive, allowing for calls with either fewer or more arguments. This is a recurring problem with reports of similar crashes in PR #50836, #50180, #50850, #50885, 50846, and possibly others.

This change tightens up the checking for a number of library functions, although more still remain.

Diff Detail

Unit TestsFailed

Event Timeline

msebor created this revision.Jul 15 2022, 7:03 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 15 2022, 7:03 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
msebor requested review of this revision.Jul 15 2022, 7:03 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 15 2022, 7:03 PM
nikic added a subscriber: nikic.Jul 18 2022, 7:52 AM
nikic added inline comments.

& -> && (and below)


One of those is probably supposed to be getParamType(0)?


You probably want --match-full-lines to make sure there are no trailing attributes.


-opaque-pointers flag is not needed



msebor marked 3 inline comments as done.Jul 18 2022, 8:33 AM

Thanks for the comments. I actually wasn't quite sure the patch was ready for review because there are more functions where the checking should be tightened up. But since you've already gone through it let me ask a few questions:

  1. Should I submit one patch with all of them or two or more smaller patches?
  2. The functions are not in any particular order, which tends to lead to repetitive checks in different places, and is also more error-prone. I've already moved some of them together (grouped by the header they're declared in) but I wonder: should I do it for all of them? (I.e., have groups by header with no repetitive checks in each group.)
  3. The argument checking is also exceedingly verbose with the differences often being just the argument number or type. These differences would be easier to see (and harder to miss when adding new checks) if we cut down on the amount of superfluous text (e.g., avoided calling getParamType(N) for each argument and instead provided an accessor, like operator[]). Would you support this change?

That's to make sure pointer type differences don't affect the test results.

msebor added inline comments.Jul 18 2022, 8:36 AM

Right. (And I wasn't even trying to make the point about the repetitive text obscuring the important difference.)

msebor edited the summary of this revision. (Show Details)Jul 18 2022, 9:07 AM
msebor added a reviewer: nikic.
msebor edited the summary of this revision. (Show Details)
msebor added a reviewer: spatel.

Thanks for cleaning this up. As noted, we get a steady drip of (mostly fuzzer, I think) bug reports from lax/wrong prototype checking.

If you can split this up into chunks, it will be easier to review and less likely to break something. Last time I made a change here, I think it wasn't clear if the fix was correct because there was some ambiguity between the C spec vs. LLVM types.

msebor updated this revision to Diff 447364.Jul 25 2022, 9:58 AM

Revision 2 of the patch modifies TargetLibraryInfoImpl::isValidProtoForLibFunc to use the same consistent approach to validate the vast majority of known library functions. It replaces the repetitive (verbose and error-prone) argument checking used for groups of functions with a single "table-driven" algorithm. For better organization it also rearranges the functions by the header they're declared in. (A further improvement here, to avoid duplication, might be also to arrange them within each header by their number of arguments.) Finally, this revision is parameterized on the result of getIntSize() to correctly validate int arguments in 16-bit environments rather than assuming the type is 32 bits everywhere.

My preferred choice of a data structure would have been a constant array to represent the signatures with LibFunc as an index to retrieve each in constant time, but this would have required keeping the signatures sorted which feels like too much of a maintenance burden. With the (hopefully) intuitive notation for the signatures, the large switch statement seems like enough of an improvement to readability to avoid the common problems.

I realize this is a large change unlike what @spatel requested. I was too far along into implementing it when I read the comment. Although it would have been possible (and still is) to split it up into chunks by progressively moving subsets of cases into a new, ever-growing function, it seems like a lot of intermediate work to get to the same final result. Based on the test suite failures I ran into while making the changes the coverage here seems pretty good, although there is room for improvement here as well.

A number of existing tests that rely on the uniqueness of distinct pointer types failed with this change because it intentionally doesn't preserve this aspect of the checking. I've adjusted them to pass, although I'm pretty sure the changes don't exercise the same failure modes as before. If this is a problem it's easy to preserve it/them until opaque types become the default.

There are a few outstanding gaps that I think should be plugged in a follow-up:

  • functions that expect an argument of type long successfully match any integer arguments (I couldn't find a getLongSize() equivalent of getIntSize()),
  • functions that expect long long assume it's 64 bits on all targets (this may true in practice but it would be nice to parameterize the code nonetheless)
  • functions that expect an argument of type long double match any floating argument (I didn't want to complicate the logic of matching the corresponding type on each target)
  • the special cases for the cabs and the sincospi families of functions could be generalized and made available to others (such as the div functions proposed in D129396).
nikic added a comment.Jul 25 2022, 1:51 PM

I like the general approach, but haven't looked through the details.

My preferred choice of a data structure would have been a constant array to represent the signatures with LibFunc as an index to retrieve each in constant time, but this would have required keeping the signatures sorted which feels like too much of a maintenance burden. With the (hopefully) intuitive notation for the signatures, the large switch statement seems like enough of an improvement to readability to avoid the common problems.

Just to throw it out there, one possible way to do this would be to include this information in TargetLibraryInfo.def. It already lists the signatures of all TLI functions, so it would make sense to also include them in machine-understandable form. For example, by adding a third macro:

/// size_t strlen(const char *s);

The likely best solution would be to switch TargetLibraryInfo.def to use TableGen (like we do for intrinsics), which would allow generating this information with a compact encoding. That would take additional effort though.


Right, but just using ptr is sufficient for that, the -opaque-pointers flag isn't needed (it's enabled by default).

msebor updated this revision to Diff 448697.Jul 29 2022, 12:17 PM

I like the idea of defining the signatures using the TLI_ macros in TargetLibraryInfo.def. It keeps most of the related bits together in the same place.

Revision 3 with the patch with the following changes:

  • provide definitions of known library function signatures in TargetLibraryInfo.def by means of two new macros: TLI_DEFINE_SIG and TLI_DEFINE_SIG_INTERNAL, analogous to the function names and the corresponding enumerated constants
  • replace a large switch statement in TargetLibraryInfoImpl::isValidProtoForLibFuncmacros with indexing into the array
  • tighten up type checking a bit further to avoid integers smaller than int matching it or larger
nikic added a comment.Aug 1 2022, 1:33 PM

Looks great, just some style nits.


Does the precessor allow writing something like defined(TLI_DEFINE_ENUM) + defined(TLI_DEFINE_STRING) + defined(TLI_DEFINE_SIG) == 1? Might be a clearer way to express this...


I'd suggest making this and similar functions TLI_DEFINE_SIG_INTERNAL(/* Manually checked */) or so, as the signature specified here will just get ignored...


I believe the standard formatting has a space before :.


I think it would be better to use a separate zero sentinel value (rather than reusing Void for the purpose). It's not like we need to save on enum elements here.


static bool matchType


Type *


I don't think separating one of the conditions out as an early return is really adding something here and would write this as return Ty->getNumElements() == 2 && T->getElementType() == ParamTy; Don't feel strongly about this though.


Stray space before comma


const auto & (or just drop the var)


Should be on same line.

nikic added inline comments.Aug 1 2022, 1:38 PM

It looks like mode_t is actually 16 bits on Darwin. Though I see you already use IntX, so that should still work fine. This test update is probably not needed?

msebor updated this revision to Diff 449444.Aug 2 2022, 2:37 PM
msebor marked 3 inline comments as done.

Revision 4 of the patch addressing style nits.


It sure does. Good suggestion!


Sounds good. Something like that was probably why I had /* TODO */ for cabs below and forgot to get back to it.


Void is set to zero in order to terminate the argument list, as in:


I don't mind making it a const reference but the element is an 8-byte struct so making a copy of it should have no overhead, if that's what you're concerned about.


It's not needed but it's wrong for all the other targets. I initially had mode_t, gid_t, uid_t, and pid_t all as Int and relaxed it only after I remembered that some kernels use integers of different sizes for some of them. I'd hope to see the sizes of all these types, as well as long, encoded for each target somewhere in TargetLibraryInfo.cpp and tested separately for each. Until that happens I've reverted this and added a comment explaining why i16 is being used here.

nikic accepted this revision.Aug 10 2022, 3:36 AM



These should be marked as manually checked as well.


I get that, my point here is that we should do something like End = 0, Void, rather than having Void both indicate a void return and the end of the list.


Function name should start with a lowercase letter (historically the convention was different, that's why you sometimes still see uppercase function names).


Missed this one.

This revision is now accepted and ready to land.Aug 10 2022, 3:36 AM
msebor marked 3 inline comments as done.Aug 10 2022, 1:44 PM
msebor added inline comments.

Void or Ellip must always be last (if there's room) so I'm not sure I see an advantage of adding another enumerator to mean the same thing.


I think there is a problem with this patch.
I made a comment here (as I didn't find this review at first :( :

I short: The commit not only tightens up the lib function lib tests, it also relaxes them! So in some cases we now identify calls as lib calls that LibCallSimplifier isn't prepared for.

Ka-Ka added a subscriber: Ka-Ka.Aug 11 2022, 4:46 AM
msebor closed this revision.Aug 15 2022, 9:07 AM

Patch committed in rG0dcfe7aa35cd.