This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyLibCalls] Pointers passed to libcalls must point to valid objects
Needs RevisionPublic

Authored by xbolva00 on Apr 28 2022, 12:39 PM.

Details

Summary

Discussed some time ago here: https://reviews.llvm.org/D53342 , the conclusion was to stay conservative - assume nothing about pointers if size could be zero - and according to the previous decision, memcpy(nullptr, nullptr, 0) is fine, indeed. But is is not:

c99: 7.21.2.1 "The memcpy function copies n characters from the object pointed to by s2 into the
object pointed to by s1. If copying takes place between objects that overlap, the behavior
is undefined."

null is not a pointer to an object

So even when size is zero, pointer(s) must be valid.

Somebody may say that even if this is UB, it works in practise. Not at all:

urnathan: (For amusement value, I do know MIPS' memcpy routines segfault on memcpy (dst, nullptr, 0))

Maybe we can reevaluate previous decision now?

If this rule is broken, UBSan catches this UB without any problems, as recently shown in: https://reviews.llvm.org/D124524 / https://godbolt.org/z/qx4sK43f6

Diff Detail

Event Timeline

xbolva00 created this revision.Apr 28 2022, 12:39 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2022, 12:39 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
xbolva00 requested review of this revision.Apr 28 2022, 12:39 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2022, 12:39 PM
xbolva00 edited the summary of this revision. (Show Details)Apr 28 2022, 12:40 PM
xbolva00 edited the summary of this revision. (Show Details)Apr 28 2022, 12:42 PM
nikic added a comment.Apr 28 2022, 1:00 PM

Looking back over https://reviews.llvm.org/D53342, the current implementation was chosen intentionally, though possibly that stance can be reevaluated now.

Last time we discussed this, we specifically avoided doing this, yes. I'm still not convinced this is worth changing; the performance improvement is probably negligible. Probably worth adding a comment briefly explaining the discussion, in any case.

We shouldn't be marking dereferenceable(1). The pointer could point to the end of an array.

xbolva00 edited the summary of this revision. (Show Details)Apr 28 2022, 1:48 PM
xbolva00 edited the summary of this revision. (Show Details)Apr 28 2022, 1:50 PM

Probably worth adding a comment briefly explaining the discussion, in any case.

Yeah, added link to that patch and some more info about previous decision to the description.

We shouldn't be marking dereferenceable(1). The pointer could point to the end of an array.

Right, I will update it.

xbolva00 updated this revision to Diff 425907.Apr 28 2022, 2:30 PM
xbolva00 edited the summary of this revision. (Show Details)Apr 28 2022, 2:32 PM

The C constraint that a pointer argument to a library function must point to an object extends beyond memcpy etc. to all standard library functions (unless otherwise specified), including for instance memchr or strncmp, regardless of the size argument. (I don't have enough background here to judge if this is something worth exploiting for optimization.)

The C constraint that a pointer argument to a library function must point to an object extends beyond memcpy etc. to all standard library functions (unless otherwise specified), including for instance memchr or strncmp, regardless of the size argument. (I don't have enough background here to judge if this is something worth exploiting for optimization.)

Right, libcalls with size argument (many tests are updated), memcpy was just example.
IIRC, gcc has knowledge about this fact already and assumes nonnull as well for the purpose of futher optimizations like elimination of redundant null checks after inlining.

xbolva00 planned changes to this revision.Apr 28 2022, 3:48 PM

But yeah, currently some libcalls are missed.

wip

xbolva00 updated this revision to Diff 425992.Apr 29 2022, 2:15 AM

Handle more libcalls

I don't have anything to add to review, but thanks for picking this up.

Just to confirm your understanding of what GCC does: it relies on attribute nonnull on function (and function pointer) declarations to eliminate redundant null pointer tests subsequent to calls to such functions (with -fdelete-null-pointer-checks).

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
155–157

It seems that for a bool the default ought to true rather than 1 (ditto for the actual argument at the call sites); a better name for the argument would also reflect the fact it's a toggle rather than a count.

But I wonder if changing the last argument to something like Value *MinBytes = nullptr and computing the Boolean result from it here would be a way to simplify the code and avoid having to do that at the call site.

msebor added inline comments.May 5 2022, 8:53 AM
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
155–157

It occurs to me that since the function is also called when folding wcslen where the annotation refers to the number of wide characters, using the word byte for the argument isn't entirely accurate. It should probably be element instead (ditto in annotateDereferenceableBytes).

xbolva00 added inline comments.May 7 2022, 2:00 PM
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
155–157

It occurs to me that since the function is also called when folding wcslen

how? I see no annotateNonNullNoUndefBasedOnAccess in optimizeWcslen.

xbolva00 updated this revision to Diff 427915.May 8 2022, 2:09 AM

Addressed review feedback

xbolva00 updated this revision to Diff 427917.May 8 2022, 2:15 AM

Updated comments in tests

msebor accepted this revision.May 9 2022, 9:27 AM

This revision looks good to me. I suggest reviewing the rest of the folders for missing annotations and adding those in a follow up patch.

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
155–157

You're right, I must have been looking at the narrow character version. That seems like an oversight. I'd expect optimizeWcslen to annotate the pointer argument the same way most of the narrow functions do, wouldn't you?

Some narrow character folders like optimizeStrPBrk and optimizeStrSpn among others are also missing a call to annotateNonNullNoUndefBasedOnAccess, so they might all need updating. If so, that could be done in a followup change.

This revision is now accepted and ready to land.May 9 2022, 9:27 AM

This revision looks good to me. I suggest reviewing the rest of the folders for missing annotations and adding those in a follow up patch.

I agree, we should not mix new annotations for new libcalls in this patch and do it as follow up.

efriedma requested changes to this revision.May 9 2022, 11:58 AM

I'd like to see a Discourse thread discussing this, given it was contentious the last time it came up.

This revision now requires changes to proceed.May 9 2022, 11:58 AM