This is an archive of the discontinued LLVM Phabricator instance.

[LangRef] Update memory access ops to raise UB if ptrs are not well defined
ClosedPublic

Authored by aqjune on Jan 22 2021, 7:29 AM.

Details

Summary

In the past, it was stated in D87994 that it is allowed to dereference a pointer that is partially undefined
if all of its possible representations fit into a dereferenceable range.
The motivation of the direction was to make a range analysis helpful for assuring dereferenceability.
Even if a range analysis concludes that its offset is within bounds, the offset could still be partially undefined; to utilize the range analysis, this relaxation was necessary.
https://groups.google.com/g/llvm-dev/c/2Qk4fOHUoAE/m/KcvYMEgOAgAJ has more context about this.

However, this is currently blocking another optimization, which is annotating the noundef attribute for library functions' arguments. D95122 is the patch.
Currently, there are quite a few library functions which cannot have noundef attached to its pointer argument because it can be transformed from load/store.
For example, MemCpyOpt can convert stores into memset:

store p, i32 0
store (p+1), i32 0 // Since currently it is allowed for store to have partially undefined pointer..
->
memset(p, 0, 8)    // memset cannot guarantee that its ptr argument is noundef.

A bigger problem is that this makes unclear which library functions are allowed to have 'noundef' and which functions aren't (e.g., strlen).
This makes annotating noundef almost impossible for this kind of functions.

This patch proposes that all memory operations should have well-defined pointers.
For memset/memcpy, it is semantically equivalent to running a loop until the size is met (and branching on undef is UB), so the size is also updated to be well-defined.

Strictly speaking, this again violates the implication of dereferenceability from range analysis result.
However, I think this is okay for the following reasons:

  1. It seems the existing analyses in the LLVM main repo does not have conflicting implementation with the new proposal.

isDereferenceableAndAlignedPointer works only when the GEP offset is constant, and isDereferenceableAndAlignedInLoop is also fine.

  1. A possible miscompilation happens only when the source has a pointer with a *partially* undefined offset (it's okay with poison because there is no 'partially poison' value).

But, at least I'm not aware of a language using LLVM as backend that has a well-defined program while allowing partially undefined pointers.
There might be such a language that I'm not aware of, but improving the performance of the mainstream languages like C and Rust is more important IMHO.

Diff Detail

Event Timeline

aqjune created this revision.Jan 22 2021, 7:29 AM
aqjune requested review of this revision.Jan 22 2021, 7:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 22 2021, 7:29 AM

+1 on the motivation. When we landed the other patch we didn't consider all the implications and, so far, they clearly outweigh the benefits of having partial undef offsets "work".

LGTM. Someone else?

llvm/docs/LangRef.rst
12725–12726

Partially related: I never understood the alignment thing. undef and poison are not aligned. Violating align should result in a poison value, which we had in the first place. We should check why this is her.

nikic added a comment.Jan 22 2021, 1:18 PM

I believe we would also have to specify that dereferenceable requires a well-defined value in order to be self-consistent.

aqjune updated this revision to Diff 318884.Jan 24 2021, 5:55 PM

Address comments

aqjune added inline comments.Jan 24 2021, 6:06 PM
llvm/docs/LangRef.rst
12725–12726

Initially, the constraint of alignment was introduced by D57600 after the discussion at https://bugs.llvm.org/show_bug.cgi?id=38583 .

Then, the semantics of passing undef/poison pointers to memcpy/memset was clarified by D86643.
At that point, violation of align attribute was still undefined behavior.

Then, D90529 updated the semantics of align to return poison instead.
Now, align attribute has no effect if the size is zero (unless noundef is there too).

I fixed the sentences here to be modular w.r.t. the semantics of attributes.

aqjune added a comment.Feb 8 2021, 9:43 PM

ping
Does this change look okay?
If it's okay, I'll make a follow-up patch that updates ValueTracking to analyze pointers used by these instructions as noundef and make InstCombine propagate noundef to function call arguments (as it propagates nonnull).

This revision is now accepted and ready to land.Feb 10 2021, 9:24 PM
This revision was landed with ongoing or failed builds.Feb 12 2021, 9:14 PM
This revision was automatically updated to reflect the committed changes.