This is an archive of the discontinued LLVM Phabricator instance.

[Libcalls, Attrs] Annotate libcalls with noundef
ClosedPublic

Authored by xbolva00 on Jan 21 2021, 2:47 AM.

Details

Summary

I think we can use here same logic as for nonnull.

strlen(X) - X must be noundef => valid pointer.

for libcalls with size arg, we add noundef only if size is known and greater than 0 - so pointers must be noundef (valid ones)

Diff Detail

Event Timeline

xbolva00 created this revision.Jan 21 2021, 2:47 AM
xbolva00 requested review of this revision.Jan 21 2021, 2:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 21 2021, 2:47 AM

Only some tests are updated, I will update remaining ones if this direction is fine.

xbolva00 edited the summary of this revision. (Show Details)
xbolva00 edited the summary of this revision. (Show Details)

LGTM. FWIW, nonnull should imply noundef because if it was undef we could pick null.

jdoerfert accepted this revision.Jan 21 2021, 7:07 AM
This revision is now accepted and ready to land.Jan 21 2021, 7:07 AM
xbolva00 updated this revision to Diff 318202.Jan 21 2021, 7:34 AM

Updated remaining tests

LGTM. FWIW, nonnull should imply noundef because if it was undef we could pick null.

Thanks

nikic requested changes to this revision.Jan 21 2021, 7:35 AM
nikic added a subscriber: nikic.

I think this goes against the rules established in D87994, which allows memory accesses based on partially undefined values. Now, I think the change in D87994 was a very bad one and we should undo it, but until then I don't think we can annotate with noundef here.

This revision now requires changes to proceed.Jan 21 2021, 7:35 AM

LGTM. FWIW, nonnull should imply noundef because if it was undef we could pick null.

I did not look into the changes in this patch yet, but does it mean that from

f(i8* nonnull %p)

it is allowed to attach noundef?

f(i8* nonnull noundef %p)

I think this shouldn't be allowed, because the former one isn't UB when %p was null but the latter one raises UB.

I think this goes against the rules established in D87994, which allows memory accesses based on partially undefined values. Now, I think the change in D87994 was a very bad one and we should undo it, but until then I don't think we can annotate with noundef here.

I remember that this was to allow free code motion of load/stores if the offset is within bounds.
e.g

%p = alloca [8 x i32]
%q = gep %p, %i ; %i may be partially undef
load %q ; If 0 <= %i < 8, %q is dereferenceable even if %q is partially undefined
xbolva00 updated this revision to Diff 318209.Jan 21 2021, 7:49 AM

LGTM. FWIW, nonnull should imply noundef because if it was undef we could pick null.

I did not look into the changes in this patch yet, but does it mean that from

f(i8* nonnull %p)

it is allowed to attach noundef?

f(i8* nonnull noundef %p)

I think this shouldn't be allowed, because the former one isn't UB when %p was null but the latter one raises UB.

I think this goes against the rules established in D87994, which allows memory accesses based on partially undefined values. Now, I think the change in D87994 was a very bad one and we should undo it, but until then I don't think we can annotate with noundef here.

I remember that this was to allow free code motion of load/stores if the offset is within bounds.
e.g

%p = alloca [8 x i32]
%q = gep %p, %i ; %i may be partially undef
load %q ; If 0 <= %i < 8, %q is dereferenceable even if %q is partially undefined

but we are talking about specific libcalls, so eg. strlen(p), where p is null, is UB.

Okay, I think the situation is a bit complex. :/
Conceptually, for library functions that can be introduced by optimizations, noundef cannot be attached unless it is proven to be noundef; it can make the program more undefined due to the existence of partially undefined pointers.
A problem is that sometimes it isn't clear whether a library function is introduced by optimizations or not. ExpandMemCmp can introduce memcmp and MemCpyOptimizer can introduce memset/memcpy, so they shouldn't have noundef attached in general. But, for other functions (except fns like printf that have a side-effect), I don't think it is clear.

I think a safer way is to make clang frontend attach noundef to the pointer argument of memcmp/memset/memcpy/.... calls when emitting them. Does it have the same optimization power in practice, or still SimplifyLibCalls allows more optimizations?

but we are talking about specific libcalls, so eg. strlen(p), where p is null, is UB.

The problem is that in theory p can be partially undef which is still dereferenceable. Imagine a pointer that points into somewhere within bounds of alloca [16 x i8] but the lowest 4 bits of the offset is undefined.
If strlen's first argument has noundef, this will also be UB, which is more undefined than before.

LGTM. FWIW, nonnull should imply noundef because if it was undef we could pick null.

I did not look into the changes in this patch yet, but does it mean that from

f(i8* nonnull %p)

it is allowed to attach noundef?

f(i8* nonnull noundef %p)

I think this shouldn't be allowed, because the former one isn't UB when %p was null but the latter one raises UB.

My comment should be read in the context of this patch. We only add nonnull when we know there is an access, so the poison of passing null would be accessed causing UB.

I think this goes against the rules established in D87994, which allows memory accesses based on partially undefined values. Now, I think the change in D87994 was a very bad one and we should undo it, but until then I don't think we can annotate with noundef here.

Agreed. It seems to make noundef unusable.

Or just revert D87994 if it blocks more optimizations than it allows..

We could patch clang, but llvm provides valuetracking’s “isKnownNonZero” which is useful and makes this more powerful.

aqjune added a comment.EditedJan 21 2021, 8:54 AM

Since this issue only happens when the undef value is involved, another solution is to revert D87994 as suggested & make load/store's pointer noundef & rely on the future that the undef value will be finally removed.
I don't think this will cause a problem in practice as well even if undef persists; memory sanitizer is already catching the case.

Since this issue only happens when the undef value is involved, another solution is to revert D87994 as suggested & make load/store's pointer noundef & rely on the future that the undef value will be finally removed.
I don't think this will cause a problem in practice as well even if undef persists; memory sanitizer is already catching the case.

+1

Since this issue only happens when the undef value is involved, another solution is to revert D87994 as suggested & make load/store's pointer noundef & rely on the future that the undef value will be finally removed.
I don't think this will cause a problem in practice as well even if undef persists; memory sanitizer is already catching the case.

Since this issue only happens when the undef value is involved, another solution is to revert D87994 as suggested & make load/store's pointer noundef & rely on the future that the undef value will be finally removed.
I don't think this will cause a problem in practice as well even if undef persists; memory sanitizer is already catching the case.

+1

I don't understand what this has to do with the undef value, or maybe I misinterpret what that means. However, if we revert that patch, I think we do ourselves a favor.

Since this issue only happens when the undef value is involved, another solution is to revert D87994 as suggested & make load/store's pointer noundef & rely on the future that the undef value will be finally removed.
I don't think this will cause a problem in practice as well even if undef persists; memory sanitizer is already catching the case.

Since this issue only happens when the undef value is involved, another solution is to revert D87994 as suggested & make load/store's pointer noundef & rely on the future that the undef value will be finally removed.
I don't think this will cause a problem in practice as well even if undef persists; memory sanitizer is already catching the case.

+1

I don't understand what this has to do with the undef value, or maybe I misinterpret what that means. However, if we revert that patch, I think we do ourselves a favor.

Proposal for revert:: https://reviews.llvm.org/D95174

I think this goes against the rules established in D87994, which allows memory accesses based on partially undefined values. Now, I think the change in D87994 was a very bad one and we should undo it, but until then I don't think we can annotate with noundef here.

LangRef was changed/reverted.

Slightly orthogonal to this patch but somewhat related: do we want to make malloc's size noundef as well (as well as many allocator functions' size args)?

%size = load i64, i64* %my_ptr_size ; If malloc's size is noundef, !noundef can be attached to this load
%ptr = call malloc(i64 %size)

If we're moving towards allowing well-defined pointers and access sizes only, this also makes sense.
If okay, I'll write a patch that updates BuildLibCalls.cpp .

aqjune accepted this revision.Feb 17 2021, 9:34 PM

LGTM

xbolva00 updated this revision to Diff 325134.Feb 19 2021, 7:08 PM
This revision was not accepted when it landed; it landed in state Needs Review.Feb 19 2021, 7:10 PM
This revision was automatically updated to reflect the committed changes.