Page MenuHomePhabricator

[LangRef] Clarify semantics of load metadata.
ClosedPublic

Authored by efriedma on Jun 6 2018, 4:45 PM.

Details

Summary

We need to explicitly state what happens when an invariant promised by load metadata is violated at runtime, since it's come up repeatedly.

It's possible we want to specify that the result of the load is poison in some cases, rather than undefined behavior, if the constraint is violated. That would allow preserving the metadata when the load is hoisted, but doesn't allow propagating metadata based on control flow. We currently do transforms based on control-flow for nonnull metadata (in PromoteMemToReg).

The dereferenceable metadata is a little weird; not sure I'm capturing the intended semantics, but it matches the logic in Value::getPointerDereferenceableBytes.

Diff Detail

Repository
rL LLVM

Event Timeline

efriedma created this revision.Jun 6 2018, 4:45 PM
nlopes added a comment.Jun 7 2018, 7:00 AM

I just got a scary ah-ah moment..
Take this code:

int f(int &p, int n, int a) {
    int r = 0;
    for (int i = 0; i < n; ++i) {
      if (a) {
        r = r * p + 3;
      } else {
        ++a;
      }
    }
    return r;
}

This code doesn't load p if f is called with a = false and n = 1 (i.e., only 1 iteration of the loop and goes to the else branch).

clang -O1 (not -O2 to avoid hard-to-read-unrolling):

define i32 @f(i32* dereferenceable(4), i32, i32) {
  %4 = icmp sgt i32 %1, 0
  br i1 %4, label %5, label %7

; <label>:5:
  %6 = load i32, i32* %0, align 4   ; <---
  br label %9

; <label>:7:
  %8 = phi i32 [ 0, %3 ], [ %17, %9 ]
  ret i32 %8

; <label>:9:
  %10 = phi i32 [ 0, %5 ], [ %18, %9 ]
  %11 = phi i32 [ 0, %5 ], [ %17, %9 ]
  %12 = phi i32 [ %2, %5 ], [ %16, %9 ]
  %13 = icmp eq i32 %12, 0
  %14 = mul nsw i32 %6, %11
  %15 = add nsw i32 %14, 3
  %16 = select i1 %13, i32 1, i32 %12
  %17 = select i1 %13, i32 %11, i32 %15
  %18 = add nuw nsw i32 %10, 1
  %19 = icmp eq i32 %18, %1
  br i1 %19, label %7, label %9
}

https://godbolt.org/g/SQh8EJ

The load was moved to the loop preheader, but it's executed irrespective of a.
Hence if p is not 4-byte aligned, then LLVM introduce undefined behavior. Being dereferenceable doesn't imply it's so with a 4-byte alignment load.

This implies we need to remove the !align attribute (as well as all the other) when hoisting loads. Lowering 1-byte loads is not particularly efficient..
We can't really define load !align as returning poison or anything other than UB since some chips trap with unaligned memory accesses. So this is a real bug we need to fix for those folks using SPARC & friends.

I just got a scary ah-ah moment..
Take this code:

int f(int &p, int n, int a) {
    int r = 0;
    for (int i = 0; i < n; ++i) {
      if (a) {
        r = r * p + 3;
      } else {
        ++a;
      }
    }
    return r;
}

This code doesn't load p if f is called with a = false and n = 1 (i.e., only 1 iteration of the loop and goes to the else branch).

clang -O1 (not -O2 to avoid hard-to-read-unrolling):

define i32 @f(i32* dereferenceable(4), i32, i32) {
  %4 = icmp sgt i32 %1, 0
  br i1 %4, label %5, label %7

; <label>:5:
  %6 = load i32, i32* %0, align 4   ; <---
  br label %9

; <label>:7:
  %8 = phi i32 [ 0, %3 ], [ %17, %9 ]
  ret i32 %8

; <label>:9:
  %10 = phi i32 [ 0, %5 ], [ %18, %9 ]
  %11 = phi i32 [ 0, %5 ], [ %17, %9 ]
  %12 = phi i32 [ %2, %5 ], [ %16, %9 ]
  %13 = icmp eq i32 %12, 0
  %14 = mul nsw i32 %6, %11
  %15 = add nsw i32 %14, 3
  %16 = select i1 %13, i32 1, i32 %12
  %17 = select i1 %13, i32 %11, i32 %15
  %18 = add nuw nsw i32 %10, 1
  %19 = icmp eq i32 %18, %1
  br i1 %19, label %7, label %9
}

https://godbolt.org/g/SQh8EJ

The load was moved to the loop preheader, but it's executed irrespective of a.
Hence if p is not 4-byte aligned, then LLVM introduce undefined behavior. Being dereferenceable doesn't imply it's so with a 4-byte alignment load.

This implies we need to remove the !align attribute (as well as all the other) when hoisting loads. Lowering 1-byte loads is not particularly efficient..
We can't really define load !align as returning poison or anything other than UB since some chips trap with unaligned memory accesses. So this is a real bug we need to fix for those folks using SPARC & friends.

Yes, although hopefully align(4) would have also been specified on the function argument. As the reference must have been bound to a valid object, from C++ semantics, we should know that it's properly aligned. Do we do that now?

nlopes added a comment.Jun 7 2018, 7:18 AM

Yes, although hopefully align(4) would have also been specified on the function argument. As the reference must have been bound to a valid object, from C++ semantics, we should know that it's properly aligned. Do we do that now?

Good point. It's not done right now by clang, no. But I guess it should so that we can fix this hoisting bug.

aqjune added a subscriber: aqjune.Jun 8 2018, 7:22 PM
aqjune added inline comments.
docs/LangRef.rst
7941 ↗(On Diff #150221)

Hello, I have a minor question - is undefined behavior raised at the point when the load is executed? I wonder whether load with !invariant.load can be hoisted out of a loop (as in Nuno's example), even if the loaded value is actually not invariant.

nlopes added inline comments.Jun 11 2018, 3:19 AM
docs/LangRef.rst
7941 ↗(On Diff #150221)

The way the sentence is written implies that !invariant.load needs to be stripped on hoisting. I grepped around and I saw very few cases where it is used. But I guess we cannot change the semantics to be "it has the same value always" since that would mean that memory region is constant instead, and that's not what we want.
So, I guess UB is fine, but we need to strip it on hoisting.

hfinkel added inline comments.Jun 11 2018, 3:37 AM
docs/LangRef.rst
7941 ↗(On Diff #150221)

But I guess we cannot change the semantics to be "it has the same value always" since that would mean that memory region is constant instead, and that's not what we want.

Are you sure? I'm under the impression that this is used to indicate things like loads from constant memories on GPUs. I think that we should support hoisting without stripping for this. Granted, however, it means that there can't be any control dependencies on the validity of the metadata. If we have both kinds of use cases, then we'll need to introduce some way to distinguish them.

nlopes added inline comments.Jun 11 2018, 3:51 AM
docs/LangRef.rst
7941 ↗(On Diff #150221)

The uses in clang:
lib/CodeGen/CGObjC.cpp: CGM.getModule().getMDKindID("invariant.load"),
lib/CodeGen/CGObjCMac.cpp: Annotate the load as an invariant load iff inside an instance method
lib/CodeGen/CGObjCMac.cpp: ->setMetadata(CGM.getModule().getMDKindID("invariant.load"),
lib/CodeGen/CGObjCMac.cpp: LI->setMetadata(CGM.getModule().getMDKindID("invariant.load"),
lib/CodeGen/ItaniumCXXABI.cpp:
Add !invariant.load md to virtual function load to indicate that
lib/CodeGen/ItaniumCXXABI.cpp: llvm::LLVMContext::MD_invariant_load,

So there's definitely more than GPUs :) (I didn't investigate these)

The way it's written right now says that the memory is only constant after the load is executed, but makes no promises before the load is executed. Which makes sense: if the instruction is never executed, how could it constrain the execution?

hfinkel added inline comments.Jun 11 2018, 6:32 AM
docs/LangRef.rst
7941 ↗(On Diff #150221)

I don't know much about the Objective C bits, but for the vtbl loads, those are in constant memory. No?

The way it's written right now says that the memory is only constant after the load is executed, but makes no promises before the load is executed. Which makes sense: if the instruction is never executed, how could it constrain the execution?

I think that, for loads that we know are into constant memory, we actually to have the guarantee. It means that, by the very act of forming the pointer that might potentially feed the load, we know that the pointer is to constant memory.

efriedma added inline comments.Jun 11 2018, 1:15 PM
docs/LangRef.rst
7941 ↗(On Diff #150221)

The current language is "at all points in the program where the memory location is known to be dereferenceable". I think that includes points before the load, if the optimizer can prove the memory is dereferenceable some other way.

That said, the assertion still only applies along codepaths where the load is guaranteed to be executed, so we still have to strip it for some kinds of hoisting.

nlopes added inline comments.Jun 12 2018, 1:39 AM
docs/LangRef.rst
7941 ↗(On Diff #150221)

Let me be pedantic here. Right now it says:
"If a load instruction tagged with the `!invariant.load` metadata is executed, the optimizer may assume the memory location referenced by the load contains the same value at all points in the program (…)"

There's the requisite that the function must be executed before the memory region is known to be constant.

This forbids this optimization, for example:

%v = load %p
if (foo) {
   %v2 = load %p, !invariant.load
   use(%v2)
}
use(%v)

  =>

%v = load %p, !invariant.load
if (foo) {
   use(%v)
}
use(%v)

Unless foo=true, the program is not claiming that the memory at %p is constant.
If you really want to mark a region as being constant for the whole program, it sounds more like a property of the pointer (constant) rather than a local property of an instruction that may or may not be executed.

hfinkel added inline comments.Jun 12 2018, 2:47 AM
docs/LangRef.rst
7941 ↗(On Diff #150221)

I agree with you, as currently worded we'd need to strip it. My impression, however, is that it's actually generated for loads to pointers we know are only in constant memory.

Nevertheless, I've convinced myself that we need to strip it anyway. We'll need to indicate these constant memory loads in some other way for best optimization. For example, if I have something like this:

void *x = ...;
if (foo()) {
  auto *y = (class_w_vtbl *) x;
  y->do_something();
} else {
  long *y = *(long **) x;
  long z = y[some_offset];
}

the vtbl load in the first branch is the same as the first load in the second branch, and then the second load in the second branch could easily be the same load as the "invariant" load from the first branch. While language semantics will prevent us from generating loads to vtbl pointers for objects known to have them, this is not true in the face of arbitrary type casts.

Any other comments on this? The !dereferenceable changes might be a bit controversial.

efriedma updated this revision to Diff 151558.Jun 15 2018, 2:00 PM

Actually, move the dereferenceable changes out of this patch; it should all go together, separately.

hfinkel accepted this revision.Jul 9 2018, 3:41 PM

ping

LGTM.

This revision is now accepted and ready to land.Jul 9 2018, 3:41 PM

I just realized isSafeToSpeculativelyExecute currently doesn't agree with this LangRef language. :( I'll look into a fix when I have time.

Err, nevermind, we already handle that: in places which actually perform hoisting, we strip the metadata. Maybe inconvenient in other situations, though (e.g. D49144).

fhahn added a subscriber: fhahn.Jul 13 2018, 7:37 AM

Thanks for clarifying this Eli! Treating loading a null value of a nonnull load as UB should allow us to not drop nonnull from loads in some cases (D47339)

This revision was automatically updated to reflect the committed changes.