This is an archive of the discontinued LLVM Phabricator instance.

[LibCalls] Add argument extension attributes to more functions.
ClosedPublic

Authored by jonpa on Apr 6 2022, 2:43 AM.

Details

Summary

This continues the work of having inferLibFuncAttributes() add the right extension attribute to i32 arguments (as started with https://reviews.llvm.org/D123030).

As far as I can tell this covers all the remaining functions handled in inferLibFuncAttributes() that is present on SystemZ (there may however still be more library functions not yet handled there per the comment in the bottom.).

Notes:

@under_IO_putc(i32, %opaque*) is not present on SystemZ and not handled.

@memset_chk(i8*, i32, i64, i64) is also not present/handled, but @__memset_chk() is, which I am not sure is correct....

Diff Detail

Unit TestsFailed

Event Timeline

jonpa created this revision.Apr 6 2022, 2:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 6 2022, 2:43 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
jonpa requested review of this revision.Apr 6 2022, 2:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 6 2022, 2:43 AM
efriedma added inline comments.Apr 6 2022, 12:06 PM
llvm/lib/Transforms/Utils/BuildLibCalls.cpp
607

Are uid_t/gid_t unsigned on all targets?

(I wouldn't mind just dropping LibFunc recognition for chown etc. if necessary...)

@under_IO_putc(i32, %opaque*) is not present on SystemZ and not handled.

This has the same signature as putc; please fix while you're here.

@memset_chk(i8*, i32, i64, i64) is also not present/handled, but @__memset_chk() is, which I am not sure is correct....

LibFunc_memset_chk refers to __memset_chk; see TargetLibraryInfo.def. I don't think memset_chk is a thing.

jonpa updated this revision to Diff 421704.Apr 9 2022, 2:09 AM

Patch updated per review (see inline commenting).

@under_IO_putc(i32, %opaque*) is not present on SystemZ and not handled.

This has the same signature as putc; please fix while you're here.

OK - it is now under the same switch case as putc (although still untested in annotate.ll as before). I also moved under_IO_getc into the getc handling as they are also identical.

llvm/lib/Transforms/Utils/BuildLibCalls.cpp
607

I see your point: since uid_t/gid_t (as far as I could see) are either unsigned or signed we cannot have a generic BuildLibCall utility for them with the current implementation.

If this is needed it seems to me we would need to add a target based flag similar to ShouldExtI32Param for this to properly determine the types of uid_t and gid_t.

All tests are however now passing after removing chown and lchown, so I did this instead now per your suggestion.

xbolva00 added a subscriber: xbolva00.EditedApr 9 2022, 2:13 AM

Previously, the SExt attribute was always added to the i32 ldexp* argument as it was expected to be ignored by targets not needing it. This patch now changes this so that it is only added for the targets that need it in the first place.

So required? required for correctness? Then is this right place to add them? With -O0, you will not get these attributes, no?

xbolva00 requested changes to this revision.Apr 9 2022, 2:19 AM
xbolva00 added inline comments.
llvm/lib/Transforms/Utils/BuildLibCalls.cpp
607

Why do you need to drop it all? You added nothing so how this blocks you now?

llvm/test/Transforms/InferFunctionAttrs/annotate.ll
318–322

Do not remove test coverage.

This revision now requires changes to proceed.Apr 9 2022, 2:19 AM
jonpa added inline comments.Apr 9 2022, 2:32 AM
llvm/lib/Transforms/Utils/BuildLibCalls.cpp
607

It doesn't seem right to keep incorrect prototypes around if they are not fixed.

llvm/test/Transforms/InferFunctionAttrs/annotate.ll
318–322

see above

xbolva00 added inline comments.Apr 9 2022, 2:38 AM
llvm/lib/Transforms/Utils/BuildLibCalls.cpp
607

So it would be better to fix it properly.

Somebody in the future may reintroduce handling of chown…

Previously, the SExt attribute was always added to the i32 ldexp* argument as it was expected to be ignored by targets not needing it. This patch now changes this so that it is only added for the targets that need it in the first place.

So required? required for correctness? Then is this right place to add them? With -O0, you will not get these attributes, no?

^

jonpa added a comment.Apr 9 2022, 2:56 AM

Previously, the SExt attribute was always added to the i32 ldexp* argument as it was expected to be ignored by targets not needing it. This patch now changes this so that it is only added for the targets that need it in the first place.

So required? required for correctness? Then is this right place to add them? With -O0, you will not get these attributes, no?

Yes, required for correctness. The SystemZ ABI says that any integer argument less than 64 bits should be extended as appropriate (sext/zext) by the caller. If this is not done, wrong-code may result, and it has nothing to do with optimization levels.
See https://reviews.llvm.org/D112942 with (link to) discussion.

jonpa added inline comments.Apr 9 2022, 3:02 AM
llvm/lib/Transforms/Utils/BuildLibCalls.cpp
607

Eli wrote he thought we could remove them which made sense to me given the issue with uid_t/gid_t...

Do you see a need for them here to motivate adding a recogniziton of uid_t/gid_t types signedness per target?

Since no tests fail if they are removed (other than the immediate annotate.ll test), I thought this could be done later if needed...

@efriedma : I now actually wonder if you meant that these functions be removed also in TargetLibraryInfo.def and other places..?

xbolva00 added a comment.EditedApr 9 2022, 3:59 AM

Previously, the SExt attribute was always added to the i32 ldexp* argument as it was expected to be ignored by targets not needing it. This patch now changes this so that it is only added for the targets that need it in the first place.

So required? required for correctness? Then is this right place to add them? With -O0, you will not get these attributes, no?

Yes, required for correctness. The SystemZ ABI says that any integer argument less than 64 bits should be extended as appropriate (sext/zext) by the caller. If this is not done, wrong-code may result, and it has nothing to do with optimization levels.
See https://reviews.llvm.org/D112942 with (link to) discussion.

BuildLibCalls does not run under -O0 afaik .. so you wont get these attrs. So I dont think this is a correct place if my assumption holds.

And I also think that things will break for you if somebody uses eg. -fno-builtin-memccpy .. you will not get this ext attribute.

jonpa added a comment.Apr 11 2022, 1:30 AM

BuildLibCalls does not run under -O0 afaik .. so you wont get these attrs. So I dont think this is a correct place if my assumption holds.

I am quite sure there are more places to handle, but fixing BuildLibCalls seems like a step in the right direction, wouldn't you agree? This is an existing problem we are trying to fix by carefully finding and handling all call sites throughout the code base.

And I also think that things will break for you if somebody uses eg. -fno-builtin-memccpy .. you will not get this ext attribute.

I would be very grateful if you helped with this work by filing bug reports with test cases, and maybe even patches...

BuildLibCalls does not run under -O0 afaik .. so you wont get these attrs. So I dont think this is a correct place if my assumption holds.

I am quite sure there are more places to handle, but fixing BuildLibCalls seems like a step in the right direction, wouldn't you agree? This is an existing problem we are trying to fix by carefully finding and handling all call sites throughout the code base.

I dont stiĺ think this is a correct place.

If this attribute is required for correctness, you should write simple pass which adds this atrribute very soon in pipeline and runs even under -O0. Or @efriedma may propose something…

And I also think that things will break for you if somebody uses eg. -fno-builtin-memccpy .. you will not get this ext attribute.

I would be very grateful if you helped with this work by filing bug reports with test cases, and maybe even patches...

This is by design - if -ifno-builtin-xxx is used, a libcall is not annotated with any attribute here. This works fine and you are misusing BuildLibCalls and the solution is not to change when and how BuildLibCalls runs.

nikic added a subscriber: nikic.Apr 11 2022, 2:24 AM

I think there is a bit of confusion here, because this code (inferLibFuncAttributes) is used for two purposes: To add attributes to existing libcalls, and to add attributes to newly created libcalls. Adding signext/zeroext attributes is not necessary for existing libcalls, because this is a frontend responsibility -- if these attributes are absent (but are required by ABI) then that is a bug in the frontend, and there is no requirement for LLVM to "fix" that problem (though it also doesn't hurt if it does). On the other hand, if LLVM introduces new libcalls, then it is LLVM's responsibility to add necessary signext/zeroext attributes.

If we wanted to make that clearer, we could separate out this part to only add these attributes when creating new libcalls. That should also make it obvious that it's fine that this code doesn't run under O0, because we only care about libcalls introduced by optimizations.

jonpa added a comment.Apr 11 2022, 2:27 AM

I dont stiĺ think this is a correct place.

If this attribute is required for correctness, you should write simple pass which adds this atrribute very soon in pipeline and runs even under -O0. Or @efriedma may propose something…

It is not a question if it is required for correctness or not (even though not all targets have this in their ABIs). The SystemZ Clang Front End does this already, so there should not be any need for an early pass to do this. Basically all front ends emitting code for SystemZ are obliged to annotate call parameters with the proper extension attribute, but that is not what this patch is handling.

This is by design - if -ifno-builtin-xxx is used, a libcall is not annotated with any attribute here. This works fine and you are misusing BuildLibCalls and the solution is not to change when and how BuildLibCalls runs.

I would hope that with -fno-builtin the original call would be already be correctly emitted by Clang (if not, that's the place to fix it probably).

I don't understand your objections to this patch. I am not changing when or how BuildLibCalls is used, merely making sure it adds the needed parameter attributes for the targets that need it. If your concern is about chown() and lchown(), let's see what Eli has to say about that...

BuildLibCalls does not run under -O0 afaik .. so you wont get these attrs. So I dont think this is a correct place if my assumption holds.

I am quite sure there are more places to handle, but fixing BuildLibCalls seems like a step in the right direction, wouldn't you agree? This is an existing problem we are trying to fix by carefully finding and handling all call sites throughout the code base.

I dont stiĺ think this is a correct place.

If this attribute is required for correctness, you should write simple pass which adds this atrribute very soon in pipeline and runs even under -O0. Or @efriedma may propose something…

And I also think that things will break for you if somebody uses eg. -fno-builtin-memccpy .. you will not get this ext attribute.

I would be very grateful if you helped with this work by filing bug reports with test cases, and maybe even patches...

This is by design - if -ifno-builtin-xxx is used, a libcall is not annotated with any attribute here. This works fine and you are misusing BuildLibCalls and the solution is not to change when and how BuildLibCalls runs.

I dont stiĺ think this is a correct place.

If this attribute is required for correctness, you should write simple pass which adds this atrribute very soon in pipeline and runs even under -O0. Or @efriedma may propose something…

It is not a question if it is required for correctness or not (even though not all targets have this in their ABIs). The SystemZ Clang Front End does this already, so there should not be any need for an early pass to do this. Basically all front ends emitting code for SystemZ are obliged to annotate call parameters with the proper extension attribute, but that is not what this patch is handling.

This is by design - if -ifno-builtin-xxx is used, a libcall is not annotated with any attribute here. This works fine and you are misusing BuildLibCalls and the solution is not to change when and how BuildLibCalls runs.

I would hope that with -fno-builtin the original call would be already be correctly emitted by Clang (if not, that's the place to fix it probably).

I don't understand your objections to this patch. I am not changing when or how BuildLibCalls is used, merely making sure it adds the needed parameter attributes for the targets that need it. If your concern is about chown() and lchown(), let's see what Eli has to say about that...

Thanks, it makes more sense now. But info like this info should be in patch description to better understand the whole problem.

I think there is a bit of confusion here, because this code (inferLibFuncAttributes) is used for two purposes: To add attributes to existing libcalls, and to add attributes to newly created libcalls. Adding signext/zeroext attributes is not necessary for existing libcalls, because this is a frontend responsibility -- if these attributes are absent (but are required by ABI) then that is a bug in the frontend, and there is no requirement for LLVM to "fix" that problem (though it also doesn't hurt if it does). On the other hand, if LLVM introduces new libcalls, then it is LLVM's responsibility to add necessary signext/zeroext attributes.

If we wanted to make that clearer, we could separate out this part to only add these attributes when creating new libcalls. That should also make it obvious that it's fine that this code doesn't run under O0, because we only care about libcalls introduced by optimizations.

Yeah!

xbolva00 resigned from this revision.Apr 11 2022, 2:57 AM

Removing my “block”

This revision now requires review to proceed.Apr 11 2022, 2:57 AM

If we wanted to make that clearer, we could separate out this part to only add these attributes when creating new libcalls. That should also make it obvious that it's fine that this code doesn't run under O0, because we only care about libcalls introduced by optimizations.

This might make sense... I guess we could introduce a separate routine to compute the "mandatory" attributes. Maybe make it a combined routine, actually; have it call getOrInsertFunction, then compute the mandatory attributes for that function. That would let us be more strict, also: we could assert() that we don't build new libcalls that don't have the code for computing mandatory attributes.

Then we can say inferLibFuncAttributes is just an optimization (and the emit* utilities call it just for the sake of avoiding pass ordering issues).

llvm/lib/Transforms/Utils/BuildLibCalls.cpp
607

I don't think LibFunc_chown/LibFunc_lchown are used anywhere else... so dropping them from TargetLibraryInfo wouldn't have any practical effect.

That said, like @nikic noted, we only really need to add sign/zeroext attributes to new libcalls; existing ones should already have the right attributes. So we could get away with still recognizing chown/lchown as long as we don't try to construct calls to them.

jonpa updated this revision to Diff 423089.Apr 15 2022, 7:12 AM

Thanks for review so far everyone! Since all of you seemed to like the idea of separating these mandatory arguments from the other ones ("inferred"), and I also thought it made sense, I decided to give it a try per your suggestions.

This might make sense... I guess we could introduce a separate routine to compute the "mandatory" attributes. Maybe make it a combined routine, actually; have it call getOrInsertFunction, then compute the mandatory attributes for that function. That would let us be more strict, also: we could assert() that we don't build new libcalls that don't have the code for computing mandatory attributes.

I called this new function getOrInsertLibFunc() which is basically mirrors/wraps the getOrInsertFunction() definitions, with added handling of mandatory argument attributes. I also changed the name of inferLibFuncAttributes() to inferNonMandatoryLibFuncAttrs() to make this point clear.

The idea is now that getOrInsertLibFunc() is called instead of getOrInsertFunction() anytime a libcall is being created, for instance in LoopIdiomRecognize.cpp. It would have been nice to assert in getOrInsertFunction() that a libcall is not being created "on the side", but that does not seem practical as it would require checking with *TLI.

I left inferNonMandatoryLibFuncAttrs() outside of getOrInsertLibFunc() as it is not always called, for instance in emitFPutC().

I started out with demanding all functions in getOrInsertLibFunc() to be "seen", and listed also those without any i32 argument explicitly (with no action in the 'default' case). It seemed however more reasonable to instead do the check afterwards so that e.g. a new libfunction without any i32 argument would not all of a sudden trigger an assert. This is not "perfect", but at least it can report if there is not any type of extension at all being done on such an argument when the target requires it.

There are many more functions (found in annotate.ll) that are not currently being emitted in calls by any optimizer (or at least not tested). I added a test for one of them (LibFunc_ldexpl) as it actually can be generated by SimplifyLibCalls, but I am not sure what to do with the others. They could either be added now without any test, or they could simply be left out in the hopes that they would be handled/discovered if they ever where to be added later (what if someone makes an optimization but only tests it on their target and then clang is built in release-mode on SystemZ..? It would probably be missed then, but on the other hand there could always be new libfunctions added that we don't know about now...). The functions I have omitted for now are:

case LibFunc_setitimer:
 case LibFunc_read:
 case LibFunc_write:
 case LibFunc_fdopen:
 case LibFunc_fputc:
 case LibFunc_fputc_unlocked:
 case LibFunc_fstat:
 case LibFunc_fstatvfs:
 case LibFunc_getitimer:
 case LibFunc_ungetc:
 case LibFunc_putc:
 case LibFunc_putc_unlocked:
 case LibFunc_under_IO_putc:
 case LibFunc_pread:
 case LibFunc_pwrite:
 case LibFunc_htonl:
 case LibFunc_htons:
 case LibFunc_ntohl:
 case LibFunc_ntohs:
 case LibFunc_fstat64:
 case LibFunc_fstatvfs64:
 case LibFunc_abs:
 case LibFunc_ffs:
 case LibFunc_isascii:
 case LibFunc_isdigit:
 case LibFunc_toascii:
 case LibFunc_putchar_unlocked:
   Changed |= setArgExtAttr(F, 0, TLI);

 case LibFunc_strchr:
 case LibFunc_strrchr:
 case LibFunc_memchr:
 case LibFunc_memrchr:
 case LibFunc_access:
 case LibFunc_fgets:
 case LibFunc_fgets_unlocked:
 case LibFunc_open:
 case LibFunc_open64:
 case LibFunc_memset_pattern4:
 case LibFunc_memset_pattern8:
 case LibFunc_memset_pattern16:
 case LibFunc_memset:
 case LibFunc_memset_chk:
 case LibFunc_memrchr:
 case LibFunc_strrchr:
   Changed |= setArgExtAttr(F, 1, TLI);

 case LibFunc_strtol:
 case LibFunc_strtoll:
 case LibFunc_strtoul:
 case LibFunc_strtoull:
 case LibFunc_setvbuf:
 case LibFunc_memccpy:
 case LibFunc_fseek:
 case LibFunc_fseeko:
 case LibFunc_fseeko64:
   Changed |= setArgExtAttr(F, 2, TLI);

Should any of these be added at this point?

BuildLibCalls:
Changed getFloatFnName() to getFloatFn() and made it also return the LibFunc via reference. This is needed as it may happen that the original LibFunc is first mapped to a name that is specific to a target. Then it happened that this name was not mapped back to the LibFunc, so I had to pass the LibFunc argument (in addition to the name) to getOrInsertLibFunc(). This mapping from name to LibFunc however seems to work fine in emitUnaryFloatFnCall() and emitBinaryFloatFnCall() ...

Since TLI is now needed in getOrInsertLibFunc() I added it where it was previously missing (/optional) to versions of emitUnaryFloatFnCall() and emitBinaryFloatFnCall(), as well as in SimplifyLibCalls.cpp as needed.

Tests:
Removed the previous changes in annotate.ll and instead tested the functions handled especially on SystemZ in some files. In two cases I had to remove the datalayout string so that there would not be i32 arguments on SystemZ (for instance for a pointer or size_t), but I think those test changes should be ok.

jonpa added inline comments.Apr 15 2022, 7:19 AM
llvm/test/Transforms/InstCombine/strchr-1.ll
3 ↗(On Diff #423089)

Will this test pass on all targets or should the first line here have a target triple to match the output (uses of i64)...?
Or perhaps running it just once on SystemZ would work?

(fputs-1.ll same)

efriedma added inline comments.Apr 15 2022, 12:48 PM
llvm/include/llvm/Transforms/Utils/BuildLibCalls.h
38 ↗(On Diff #423089)

Does getOrInsertLibFunc really need a "Name" argument? We have TargetLibraryInfo; we can just look up the correct name.

llvm/lib/Transforms/Utils/BuildLibCalls.cpp
1274

I'd prefer to make the assertion target-independent if possible; if someone is adding a new libcall optimization, they might not remember to write a test specifically for SystemZ.

Can we just add a check in the "default" case that the call doesn't pass/return any integer values, or something like that? (I guess in that case, you'd have to list out all the calls we build that have a size_t argument or return value, but that should be a much smaller list than every function supported by inferNonMandatoryLibFuncAttrs.)

llvm/test/Transforms/InstCombine/strchr-1.ll
3 ↗(On Diff #423089)

In general, if you're going to manually hack at tests with an "Assertions have been autogenerated" marking, please either regenerate them or drop the marking.

These tests were written with the assumption the generated IR wouldn't be target-specific. Since it turns out the input and result are in fact target-specific, we should just pick a target, and shove them into the relevant target-specific subdirectory (llvm/test/Transforms/InstCombine/X86/ etc.). And maybe add a few tests to llvm/test/Transforms/InstCombine/SystemZ/ to confirm the whole signext marking mechanism is working.

Alternatively, you could try to hack at the tests to try make them agnostic to whether there's a signext marking, but that seems like extra work for not much benefit.

jonpa updated this revision to Diff 423240.Apr 16 2022, 8:57 AM

Updated per review (see inline comments)

llvm/include/llvm/Transforms/Utils/BuildLibCalls.h
38 ↗(On Diff #423089)

I removed it and it seems that it is nearly identical, but for one test case:

double-float-shrink-1.ll

-; MS64-NEXT:     [[LOGBF:%.*]] = call fast float @logbf(float [[F:%.*]])
+; MS64-NEXT:     [[LOGBF:%.*]] = call fast float @_logbf(float [[F:%.*]])

On this target the name gets a different name with 'TLI.setAvailableWithName(LibFunc_logbf, "_logbf");'.

I am not sure what the difference here means, but I am hoping that it would work to make that call to the preferred version directly, so I kept this change and updated the test.

llvm/lib/Transforms/Utils/BuildLibCalls.cpp
1274

Ah, that seems to work: if we can assume that size_t never needs extension (which I "hope") we get only a small list of functions here to exclude and can have the check enabled for all targets...

llvm/test/Transforms/InstCombine/strchr-1.ll
3 ↗(On Diff #423089)

ok - I backed out of those tests and instead made a new SystemZ test file.

I think those tests don't have CHECKs for any target-specific result, so leaving them as is under /InstCombine.

efriedma added inline comments.Apr 16 2022, 2:20 PM
llvm/include/llvm/Transforms/Utils/BuildLibCalls.h
38 ↗(On Diff #423089)

That's what we want this code to be doing, yes.

(Looking at the latest Microsoft documentation, I'm not sure the TargetLibraryInfo description is actually doing what we want, but that's not really related to this patch.)

llvm/lib/Transforms/Utils/BuildLibCalls.cpp
1274

I think it's true at the moment that size_t never needs extension. If some future target does need extension, we have a list of functions we'd need to mark up. (I can imagine some 64-bit target with an alternate 32-bit ABI.)

1286

Do you need to check the bitwidth here? There should be very few libfuncs which take a 64-bit integer as an argument, and I'd like to avoid tripping someone up if they, for example, add a call to a libfunc with a size_t argument and forget to test it on a 32-bit target.

jonpa updated this revision to Diff 423430.Apr 18 2022, 11:13 AM

Updated per review (see inline comment)

llvm/lib/Transforms/Utils/BuildLibCalls.cpp
1286

I tried checking for any integer argument here which succeeded on tests after adding some more functions.

We by this now also see two vararg functions here (snprintf, vsnprintf). They have a size_t argument which we consider ok, but they also have varargs, which could *theoritically* be emitted by the compiler as well, right? It seems that currently this is done only by SimplifyLibCalls which removes a check and simply passes on the varargs to the new function, so they really should be ok as they are, I think.

I can imagine we either:

  1. Accept these varargs as "safe" and assume they will never ever be emitted by an optimizer on its own and so ignores adding any mandatory attributes on them. We could add a comment somewhere documenting this.
  1. We check the varargs here with an assert that any i32 arg is extended if target demands it.
  1. We rewrite the emission of these libcalls so that it is part of the code design that the valist is being reused (as emitted by the front end). (Maybe this could if desirable be done at some later point?)

#2 and #3 would include all of them and not just the two with a size_t arg...

efriedma added inline comments.Apr 18 2022, 11:31 AM
llvm/lib/Transforms/Utils/BuildLibCalls.cpp
1286

vsprintf/vsnprintf aren't varargs functions, strictly speaking. (The fact that they have a va_list argument isn't really relevant here.)

emitSPrintf/emitSNPrintf are used to lower calls to __sprintf_chk etc. If we do perform this lowering, we need to copy the zext/sext flags from the corresponding call, I think. But I think we can leave that for a followup.

efriedma added inline comments.Apr 18 2022, 11:47 AM
llvm/lib/Transforms/Utils/BuildLibCalls.cpp
1286

Also, I don't think that varargs handling would belong in this function, in any case. For a varargs call, any attributes on varargs arguments need to be attached to the specific call, not the function.

jonpa added inline comments.Apr 19 2022, 6:45 AM
llvm/lib/Transforms/Utils/BuildLibCalls.cpp
1286

makes sense to me...

This revision is now accepted and ready to land.Apr 19 2022, 10:21 AM
This revision was landed with ongoing or failed builds.Apr 19 2022, 12:24 PM
This revision was automatically updated to reflect the committed changes.

This breaks ubsan on the bot:

FAIL: LLVM :: Transforms/InstCombine/pr39177.ll (51198 of 64150)
******************** TEST 'LLVM :: Transforms/InstCombine/pr39177.ll' FAILED ********************
Script:
--
: 'RUN: at line 2';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/InstCombine/pr39177.ll -passes=instcombine -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/InstCombine/pr39177.ll
--
Exit Code: 2

Command Output (stderr):
--
/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Utils/BuildLibCalls.cpp:1217:17: runtime error: reference binding to null pointer of type 'llvm::Function'
    #0 0x55892440c92e in llvm::getOrInsertLibFunc(llvm::Module*, llvm::TargetLibraryInfo const&, llvm::LibFunc, llvm::FunctionType*, llvm::AttributeList) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Utils/BuildLibCalls.cpp:1217:13
    #1 0x55892441213d in llvm::FunctionCallee llvm::getOrInsertLibFunc<llvm::PointerType*, llvm::IntegerType*, llvm::IntegerType*, llvm::Type*>(llvm::Module*, llvm::TargetLibraryInfo const&, llvm::LibFunc, llvm::AttributeList, llvm::Type*, llvm::PointerType*, llvm::IntegerType*, llvm::IntegerType*, llvm::Type*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Transforms/Utils/BuildLibCalls.h:47:12
    #2 0x55892441124e in getOrInsertLibFunc<llvm::PointerType *, llvm::IntegerType *, llvm::IntegerType *, llvm::Type *> /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Transforms/Utils/BuildLibCalls.h:55:12
    #3 0x55892441124e in llvm::emitFWrite(llvm::Value*, llvm::Value*, llvm::Value*, llvm::IRBuilderBase&, llvm::DataLayout const&, llvm::TargetLibraryInfo const*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Utils/BuildLibCalls.cpp:1725:22
    #4 0x5589245c6ee9 in llvm::LibCallSimplifier::optimizeFPrintFString(llvm::CallInst*, llvm::IRBuilderBase&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp:2705:14
    #5 0x5589245c7168 in llvm::LibCallSimplifier::optimizeFPrintF(llvm::CallInst*, llvm::IRBuilderBase&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp:2738:18
    #6 0x5589245c8ce1 in llvm::LibCallSimplifier::optimizeCall(llvm::CallInst*, llvm::IRBuilderBase&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp:3159:14
    #7 0x558923d02e5d in llvm::InstCombinerImpl::tryOptimizeCall(llvm::CallInst*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp:2726:32
    #8 0x558923cfc23b in llvm::InstCombinerImpl::visitCallBase(llvm::CallBase&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp:2997:22
    #9 0x558923cf0b00 in llvm::InstCombinerImpl::visitCallInst(llvm::CallInst&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
    #10 0x558923c8ac69 in llvm::InstCombinerImpl::run() /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp:4120:31
    #11 0x558923c8cfcc in combineInstructionsOverFunction(llvm::Function&, llvm::InstructionWorklist&, llvm::AAResults*, llvm::AssumptionCache&, llvm::TargetLibraryInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::OptimizationRemarkEmitter&, llvm::BlockFrequencyInfo*, llvm::ProfileSummaryInfo*, unsigned int, llvm::LoopInfo*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp:4408:13
    #12 0x558923c8cc02 in llvm::InstCombinePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp:4439:8
    #13 0x5589247a1281 in llvm::detail::PassModel<llvm::Function, llvm::InstCombinePass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:88:17
    #14 0x55892394d939 in llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManager.h:522:21
    #15 0x558921bdc0f1 in llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:88:17
    #16 0x5589239520fb in llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/PassManager.cpp:127:22
    #17 0x558921bdbe61 in llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:88:17
    #18 0x55892394cb49 in llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManager.h:522:21
    #19 0x5589216d6db9 in llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::StringRef>, llvm::ArrayRef<llvm::PassPlugin>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/NewPMDriver.cpp:496:7
    #20 0x5589216f2d84 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/opt.cpp:830:12

https://lab.llvm.org/buildbot/#/builders/5/builds/22427

MaskRay reopened this revision.Apr 19 2022, 10:42 PM
MaskRay added a subscriber: MaskRay.
MaskRay added inline comments.
llvm/include/llvm/Transforms/Utils/BuildLibCalls.h
31 ↗(On Diff #423705)

While updating function signatures, it may be worth checking whether M is non-null and may use Module &M

This revision is now accepted and ready to land.Apr 19 2022, 10:42 PM

Review: Eli Friedman, Nikita Popov, Dávid Bolvanský

Please use Reviewed by: <who-has-accepted-the-revision> instead of Review: <all-people-on-the-review-line>

MaskRay requested changes to this revision.Apr 19 2022, 10:43 PM
This revision now requires changes to proceed.Apr 19 2022, 10:43 PM
MaskRay added inline comments.Apr 19 2022, 10:45 PM
llvm/include/llvm/Transforms/Utils/BuildLibCalls.h
46 ↗(On Diff #423705)

Consider using a plain array.

It seems that the failing test case (Transforms/InstCombine/pr39177.ll) is a bit peculiar in that it contains an fwrite *alias*:

fwrite = alias i64 (i8*, i64, i64, %struct._IO_FILE*), i64 (i8*, i64, i64, %struct._IO_FILE*)* @__fwrite_alias

It defines the @__fwrite_alias() function, and then has a foo() function containing a call to @fprintf() which SimplifyLibCalls is trying to replace with a call to fwrite(). This is when it crashes because of a nullptr as discovered by the sanitizer.

I am not sure what the idea is here with having an fwrite alias, but it is clear that BuildLibCalls is getting confused:

 FunctionCallee llvm::getOrInsertLibFunc(Module *M, const TargetLibraryInfo &TLI,
                                        LibFunc TheLibFunc, FunctionType *T,
                                        AttributeList AttributeList) {
  assert(TLI.has(TheLibFunc) &&
         "Creating call to non-existing library function.");
  StringRef Name = TLI.getName(TheLibFunc);
=>FunctionCallee C = M->getOrInsertFunction(Name, T, AttributeList);

at this point M returns the alias:

(gdb) p C.Callee->dump()
@fwrite = alias i64 (i8*, i64, i64, %struct._IO_FILE*), i64 (i8*, i64, i64, %struct._IO_FILE*)* @__fwrite_alias

This is a GlobalAlias and not a Function, which is why getFunction() returned nullptr:

Function &F = *M->getFunction(Name);

Looking at the test case, it seems to be valid to write your own fwrite and expect the compiler use it as a replacement for the standard fwrite call..? I guess this means that getOrInsertLibFunc() should also be able to handle a GlobalAlias?

llvm/include/llvm/Transforms/Utils/BuildLibCalls.h
46 ↗(On Diff #423705)

Do you have anything special in mind? This was copied from getOrInsertFunction().

I guess there are two possibilities in that case:

  1. Abandon the transform, as if we don't have an fwrite function.
  2. Return the necessary attributes from getOrInsertLibFunc, and make the caller apply them to the call.
llvm/include/llvm/Transforms/Utils/BuildLibCalls.h
46 ↗(On Diff #423705)

I assume something like Type *ArgTys[] = {Args...}; should work.

I guess there are two possibilities in that case:

  1. Abandon the transform, as if we don't have an fwrite function.
  2. Return the necessary attributes from getOrInsertLibFunc, and make the caller apply them to the call.

If the user wrote a function either with the same identical name as a library function, or it ends up being aliased with such a name, it seems we should here as well trust that the front end has added the mandatory attributes, or?

Maybe we should change getOrInsertLibFunc() to first check for any existing Function or GlobalAlias matching Name. If there is already one of those, we could then assume that we have either already created it in getOrInsertLibFunc() earlier or that it is an alias created by the front end. So only when we are actually inserting a library function into the Module the first time do we need to add the attributes, right?

The user could either write a function named e.g. fwrite(), or an alias called fwrite. It seems that an optimizer emitting fwrite() therefore unknowingly either uses the library function, or the user provided function, which is somewhat strange to me: I would have guessed that the libcall emissions only emit libcalls and nothing else. I wonder even if that could lead to problems - what if the user wrote a function that did something entirely different and just happened to give it the same name as a libfunction? Is there something preventing unintended "libcall" emissions in such a case? This seems broken to me unless the user really is supposed to be able to write his own library function and expect it to be used entirely the same as the standard library function would be, as a stand-in. That is a separate topic from this patch, though...

If the user wrote a function either with the same identical name as a library function, or it ends up being aliased with such a name, it seems we should here as well trust that the front end has added the mandatory attributes, or?

If we find an alias, I wouldn't trust that the alias points to a function with a real signature.

Maybe we should change getOrInsertLibFunc() to first check for any existing Function or GlobalAlias matching Name. If there is already one of those, we could then assume that we have either already created it in getOrInsertLibFunc() earlier or that it is an alias created by the front end. So only when we are actually inserting a library function into the Module the first time do we need to add the attributes, right?

We could, sure. If we want to be conservative, I guess we could mess with isValidProtoForLibFunc to verify the attributes.

The user could either write a function named e.g. fwrite(), or an alias called fwrite. It seems that an optimizer emitting fwrite() therefore unknowingly either uses the library function, or the user provided function, which is somewhat strange to me: I would have guessed that the libcall emissions only emit libcalls and nothing else. I wonder even if that could lead to problems - what if the user wrote a function that did something entirely different and just happened to give it the same name as a libfunction? Is there something preventing unintended "libcall" emissions in such a case? This seems broken to me unless the user really is supposed to be able to write his own library function and expect it to be used entirely the same as the standard library function would be, as a stand-in. That is a separate topic from this patch, though...

If the user is defining functions that overlap with core libc function names, we expect them to use "-fno-builtin-fwrite" etc. If the user doesn't specify the right no-builtin attributes, we just sort of best-effort avoid crashing.

If we find an alias, I wouldn't trust that the alias points to a function with a real signature.

I am not sure how that could be: per the LLVM LangRef an alias must have

@<Name> = alias <AliaseeTy>, <AliaseeTy>* @<Aliasee>

, which to me looks like there must be a function defined or declared, provided as @Aliasee, with a function type..?

Would both the alias and Aliasee always come from the front end (and so be assumed to have the required attributes)...?

If we find an alias, I wouldn't trust that the alias points to a function with a real signature.

, which to me looks like there must be a function defined or declared, provided as @Aliasee, with a function type..?

An alias has to point to something, but it could point to a variable.

Would both the alias and Aliasee always come from the front end (and so be assumed to have the required attributes)...?

Even if there is a function, I wouldn't trust it has the right signature. In general, we don't require that function declarations have the right signature. (For example, you can write struct A; struct A f(void), (*g)(void) = f; in C.)

When I compile the following valid test case:

#include <string.h>
int main(){
   char a[]="123";
   char b[5];

  memset(b, 0, 5);
    __builtin___stpncpy_chk (b, a, 2, 5);
}

An Assert in BuildLibCalls.cpp fails:

llvm::FunctionCallee llvm::getOrInsertLibFunc(llvm::Module *, const llvm::TargetLibraryInfo &, llvm::LibFunc, llvm::FunctionType *, llvm::AttributeList): Assertion `!isa<IntegerType>(T->getParamType(i)) && "Unhandled integer argument."' failed.
jonpa added a comment.Apr 26 2022, 9:49 AM

... An Assert in BuildLibCalls.cpp fails ...

Thanks for reporting this. This is because InstCombiner is replacing the @__stpncpy_chk() call with a call to @stpncpy(), which has a sizeof parameter (see earlier discussion). This went undetected as there was no regression test covering this. I suppose there must be more of these...

I guess I should make a list of all built libcalls with a sizeof parameter and include them as accepted in getOrInsertLibFunc()?

Even if there is a function, I wouldn't trust it has the right signature. In general, we don't require that function declarations have the right signature. (For example, you can write struct A; struct A f(void), (*g)(void) = f; in C.)

OK - now I understand better: the Function Type may actually be incomplete at this point.

IIUC, this is basically the users responsibility: If user decides to write e.g. a function named fwrite (or a function alias of that name), either

-fno-builtin-fwrite is passed in which case fwrite will not be available to generate so getOrInsertLibFunc() will never be reached.
-fno-builtin-fwrite is *not* passed (as in pr39177.ll) and any problem is then basically the users fault and getOrInsertLibFunc() does not have to care about this.

In getOrInsertLibFunc() we could either

  1. Insert into Module a Function with the given Name
  2. Get back a previusly inserted Function a. a proper libcall b. a user defined function
  3. Get back a GlobalAlias matching Name.

I think we really only need to care about #1. We could check with isValidProtoForLibFunc() and add mandatory arguments also in cases of #2b and #3 where the prototype matches, but maybe that is not worth the effort since this is not really supposed to happen..? (And again, the front end should have emitted those I think).

Would it makes sense then to simply add to the patch to skip adding mandatory attributes in case #3 (when we get a function alias)?

When we see an alias, there are two options: we fail the transform (getOrInsertLibFunc returns null or something like that), or we return the attributes the caller needs to attach.

We can't just skip adding the attributes: we end up generating a broken call that won't perform the necessary sign-extension. So it breaks even if the alias points to a proper fwrite implementation. (Even if we don't technically promise anything in this case, generating a call that's provably broken is bad.)

jonpa added a comment.EditedApr 27 2022, 4:48 AM

We can't just skip adding the attributes: we end up generating a broken call that won't perform the necessary sign-extension. So it breaks even if the alias points to a proper fwrite implementation. (Even if we don't technically promise anything in this case, generating a call that's provably broken is bad.)

OK - we don't have the responsibility here but it wouldn't hurt to help if we could, makes sense.

I started look into the difference in having a function alias replacing a libcall, and was surprised to find that the final assembly output is actually identical, which I hadn't expected. I was assuming that in the following program @__ldexp_alias() would end up being called:

@ldexp = alias double (double, i32), double (double, i32)* @__ldexp_alias
define double @__ldexp_alias(double %X, i32 %Exp) #0 {
 ret double 0.000000e+00
}
;declare double @ldexp(double, i32) 

define double @fake_ldexp() #0 {
  %ldexp = call double @ldexp(double 1.000000e+00, i32 2)
  ret double %ldexp
}

It however has no difference as to when the "declare @ldexp" is used instead... In both cases SelectionDAGBuilder creates a node like

i64 = GlobalAddress<double (double, i32)* @ldexp> 0

Resulting in

brasl   %r14, ldexp@PLT

The function with the name of the alias is called even though that is not supposed to exist on its own (same thing with non library like name)... Not sure what to think of this... Maybe I am using the alias incorrectly somehow?

If you dig through the assembly in your example, you'll find a directive .set ldexp, __ldexp_alias. The assembler will resolve the alias.

Yes, computing the attributes in getOrInsertLibFunc() seems correct.

jonpa updated this revision to Diff 426015.Apr 29 2022, 4:37 AM

Thanks for review.

I painstakingly went through SimplifyLibCalls and made a list of all emitted libcalls with integer arguments, and found that indeed the emission of stpncpy() was missing (but no others (!). The emission of stpncpy() never had a test, but I added one in addition to the needed recognition in getOrInsertLibFunc().

When we see an alias, there are two options: we fail the transform (getOrInsertLibFunc returns null or something like that), or we return the attributes the caller needs to attach.

It seems most reasonable to not have the optimizers emit a call to a library function that is in fact not available. So I added a new function isLibCallEmittable() that can be used instead of TLI->has(): it also checks for an existing alias of the function. I also added an assert in getOrInsertLibFunc() that getOrInsertFunction() actually returns a Function.

I have tried to make all callers of getOrInsertLibFunc() first call isLibFuncEmittable() before emitting a libcall, which is required.

Also replaced getOrInsertFunction() with getOrInsertLibFunc() in few more places in SimplifyLibCalls.cpp

In a way this is still kind of a "mixed policy": We don't stop a libcall emission in the presence of a user defined function, but an alias however disables it. A function coming from the front end should have mandatory attributes already, so it should hopefully be ok. I was hoping the same should be true for a function alias, but IIUC the previous review that is not the case (due to incomplete function prototypes?). As this is after all the users responsibility, I don't think there has to be a full consistency and also that this is a separate topic that could perhaps be improved upon in the future if desired. There could for instance be a check that the Function seen in getOrInsertLibFunc() is a declaration with the proper libfunc arguments, and not a definition (unless the opposite handling would be done to instead try to do a best effort to add mandatory attributes to any user definitions and proceed with the libcall emission).

What do you think?

efriedma added inline comments.Apr 29 2022, 11:08 AM
llvm/lib/Transforms/Utils/BuildLibCalls.cpp
1253

Instead of calling getFunction(), should we just cast<Function>(C.getCallee())?

Actually, also, should we check that the type of the Function is actually "T"?

I mostly want to make sure what we're doing here is sufficient to avoid crashing/verifier errors in the presence of weird user inputs. I don't want to keep expanding the scope of this.

jonpa updated this revision to Diff 426235.Apr 30 2022, 9:58 AM

Instead of calling getFunction(), should we just cast<Function>(C.getCallee())?
Actually, also, should we check that the type of the Function is actually "T"?

Patch updated per review.

jonpa marked an inline comment as done.Apr 30 2022, 9:59 AM
efriedma added inline comments.Apr 30 2022, 5:47 PM
llvm/lib/Transforms/Utils/BuildLibCalls.cpp
1313

Not sure !M->getNamedAlias(FuncName) is quite sufficient. What happens if we find a GlobalVariable? Or a Function with the wrong type?

jonpa updated this revision to Diff 426300.May 1 2022, 9:43 AM

Not sure !M->getNamedAlias(FuncName) is quite sufficient. What happens if we find a GlobalVariable? Or a Function with the wrong type?

Yes, I guess the more full treatment would be to check for other things as well, which makes sense.

I think I managed to improve the check from just checking for an Alias to checking for any GlobalValue and if found then making sure it's a Function.

As for the FunctionType, it doesn't seem feasible (at least not with the current patch), to have the caller to isLibFuncEmittable() provide the FunctionType. It however seems to work to use isValidProtoForLibFunc() for this purpose. I don't see that function used anywhere, but I remember you mentioned it earlier.

efriedma accepted this revision.May 2 2022, 9:33 AM

LGTM

This revision was not accepted when it landed; it landed in state Needs Review.May 2 2022, 10:39 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.