This is an archive of the discontinued LLVM Phabricator instance.

Add new !unconditionally_dereferenceable load instruction metadata
AbandonedPublic

Authored by whitequark on Apr 3 2016, 3:38 AM.

Details

Summary

!unconditionally_dereferenceable is similar to !dereferenceable,
but is intended for frontends for memory-safe languages where
loads can be freely moved around without impacting their
dereferenceability.

!dereferenceable used to have the same semantics as the newly
introduced !unconditionally_dereferenceable before LLVM 3.8;
it was made more conservative in r252604.

Notes:

  • The dereferenceable argument attribute is close to !unconditionally_dereferenceable in semantics than !dereferenceable, should we update LangRef?
  • There's no !unconditionally_dereferenceable_or_null; I'm not sure if there should be. It's not useful for my frontend and I'm not sure if it's useful for LICM either. Also, this combinatory explosion worries me.

Diff Detail

Repository
rL LLVM

Event Timeline

whitequark updated this revision to Diff 52494.Apr 3 2016, 3:38 AM
whitequark retitled this revision from to Add new !unconditionally_dereferenceable load instruction metadata.
whitequark updated this object.
whitequark set the repository for this revision to rL LLVM.
whitequark added a subscriber: llvm-commits.
sanjoy edited edge metadata.Apr 3 2016, 11:47 AM

I haven't done a full review, but one aspect of this change worries me
at a theoretical level -- after this change it is possible to cause
miscompiles by introducing dynamically dead code.

E.g. if we have

void @foo() {
  %t = alloca i32*
}

and say we change it to

void @foo() {
  %t = alloca i32*
  if (false) {
    %ptr = load i32*, i32** %t, !unconditionally_dereferenceable
    %val = load i32, i32* %ptr
  }
}

In theory the second program should be equivalent to the first, since
only dynamically dead code was added (that would never execute at
runtime). But, given the semantics of the
!unconditionally_dereferenceable attribute, I can further transform
the program to

void @foo() {
  %t = alloca i32*
  %ptr = load i32*, i32** %t, !unconditionally_dereferenceable
      ;; allocas are always dereferenceable
  %val = load i32, i32* %ptr
      ;; load from unconditionally dereferenceable value
  if (false) {
  }
}

which would has undefined behavior.

Is there a way you can change the semantics of this attribute so that
one of the above transforms isn't possible?

whitequark added a comment.EditedApr 3 2016, 12:55 PM

I acknowledge that your analysis is correct, and I see your point about the transformations (assuming for sake of translation of your argument that by if(false) you mean "a loop with zero trip count"). The core of the issue as I see it is that, when applied to self-contradictory IR, otherwise valid optimizations can bring undefined behavior "out of thin air".

However, this is something that already happens in LLVM. Consider this testcase:

target datalayout = "E-m:e-p:32:32-i8:8:8-i16:16:16-i64:32:32-f64:32:32-v64:32:32-v128:32:32-a0:0:32-n32"

%a = type { %b* }
%b = type { i32 }

define void @test() {
entry:
  %arg = alloca %a
  %ptr.f = getelementptr inbounds %a, %a* %arg, i32 0, i32 0
  %val.f = load %b*, %b** %ptr.f, !invariant.load !0, !dereferenceable !1
  br label %for.head

for.head:
  %IND = phi i32 [ 0, %entry ], [ %IND.new, %for.body ]
  %CMP = icmp slt i32 %IND, 0
  br i1 %CMP, label %for.body, label %exit

for.body:
  %ptr.x = getelementptr inbounds %b, %b* %val.f, i32 0, i32 0
  %val.x = load i32, i32* %ptr.x, !invariant.load !0
  call void @consume(i32 %val.x)
  %IND.new = add i32 %IND, 1
  br label %for.head

exit:
  ret void
}

declare void @consume(i32)

!0 = !{}
!1 = !{ i64 4 }

The loop trip count is zero, and the second load is clearly undefined, but ToT LICM nevertheless hoists it.

Personally, I think that:

  1. It is the responsibility of the frontend to not construct self-contradictory IR. There are far too many ways to do it that are not obviously wrong (and thus not easily rejected by the verifier), e.g. it is easy to do so with TBAA.
  2. It is the responsibility of the middle-end to not construct self-contradictory IR when given non-self-contradictory IR. I.e. in your example the transformation introducing the code inside if (false) { ... } is clearly at fault, because there are some (dynamically dead) CFG paths that violate the invariants of the !unconditionally_dereferenceable marker and there were none before it.
sanjoy added a comment.EditedApr 3 2016, 1:22 PM

However, this is something that already happens in LLVM. Consider this testcase:

target datalayout = "E-m:e-p:32:32-i8:8:8-i16:16:16-i64:32:32-f64:32:32-v64:32:32-v128:32:32-a0:0:32-n32"

%a = type { %b* }
%b = type { i32 }

define void @test() {
entry:
  %arg = alloca %a
  %ptr.f = getelementptr inbounds %a, %a* %arg, i32 0, i32 0
  %val.f = load %b*, %b** %ptr.f, !invariant.load !0, !dereferenceable !1
  br label %for.head

for.head:
  %IND = phi i32 [ 0, %entry ], [ %IND.new, %for.body ]
  %CMP = icmp slt i32 %IND, 0
  br i1 %CMP, label %for.body, label %exit

for.body:
  %ptr.x = getelementptr inbounds %b, %b* %val.f, i32 0, i32 0
  %val.x = load i32, i32* %ptr.x, !invariant.load !0
  call void @consume(i32 %val.x)
  %IND.new = add i32 %IND, 1
  br label %for.head

exit:
  ret void
}

declare void @consume(i32)

!0 = !{}
!1 = !{ i64 4 }

I may be missing the point here, but isn't the program undefined to
begin with? %val.f is marked as !dereferenceable yet it results
in a non-dereferenceable value, and that breaks the frontend's
contract with the optimizer. If I remove !dereferenceable from
%val.f, then LICM does not hoist the load out of the loop.

  1. It is the responsibility of the frontend to not construct

self-contradictory IR. There are far too many ways to do it that are
not obviously wrong (and thus not easily rejected by the verifier),
e.g. it is easy to do so with TBAA.

Agreed, if the frontend generates bogus IR then all bets are off. The
IR verifier is only a best-effort sanity check, and cannot catch every
issue.

  1. It is the responsibility of the middle-end to not construct

self-contradictory IR when given non-self-contradictory IR. I.e. in
your example the transformation introducing the code inside `if
(false) { ... }` is clearly at fault, because there are some
(dynamically dead) CFG paths that violate the invariants of the
!unconditionally_dereferenceable marker and there were none before
it.

But with !unconditionally_dereferenceable, this ("don't introduce
dynamically dead paths") is a new burden to the optimizer that
wasn't there before (as far as I can tell). That is what I'm worried
about.

And there are cases where we'd realistically want to introduce
dynamically dead paths (indirectly). E.g. imagine a sophisticated
"cold function merging" optimization, like:

int* foo1(int** ptr) {
  int* val = **ptr, unconditionally_dereferenceable
  val;
}

int* foo2(int** ptr) {
  val = *ptr
  val;
}

void bar(int** ptr) {
  int* k1 = foo1(ptr);
  int* k2 = foo2(alloca(...));
}

Non-existent (at present) function commoning pass =>

int* foo_common(int** ptr, bool unconditional) {
  int *val;
  if (unconditional) {
    val = *ptr, unconditionally_dereferenceable
  } else {
    val = *ptr
  }
  return val;
}

void bar(int* ptr) {
  int* k1 = foo_common(ptr, true);
  int* k2 = foo_common(alloca(...), false);
}

Inlining into bar =>

(Edited: the inlined code earlier was incorrect)

void bar(int* ptr) {
  int* k1 = foo1(ptr, true);
  // inlined int* k2 = foo_common(alloca(...), false);
  t = alloca(...)
  int* val;
  if (false) {
    val = *t, unconditionally_dereferenceable
  } else {
    val = *ptr
  }
}
sanjoy added a comment.EditedApr 3 2016, 1:31 PM

[Edit: added back some words that I had eaten up in the previous
version]

In our JIT, we do face a problem with the current notion of
!dereferenceable and LLVM dropping these too often, but we have a
different solution for that. When we hoist !dereferenceable loads
out of control flow we re-do a language specific analysis over the IR
and "heal" back the !dereferenceable metadata. We do this for other
kinds of metadata as well, like !range. Will something like that
work for you?

reames edited edge metadata.Apr 18 2016, 6:03 PM

Can you give a bit of context on what the source level rule which implies global dereferenceability is?

The patch is presented looks mostly okay - I haven't gone through it carefully yet - but the motivation bothers me. Having a condition load from something which isn't obviously dereferenceable (i.e. global, etc..) seems to imply we might be missing something. i.e. why was it conditional at the source level to start with?

reames requested changes to this revision.Apr 18 2016, 6:04 PM
reames edited edge metadata.

Marking as request changes to get it out of my queue. Please change when responding.

This revision now requires changes to proceed.Apr 18 2016, 6:04 PM
whitequark requested a review of this revision.EditedNov 10 2016, 7:23 AM
whitequark edited edge metadata.

The source level rule is as follows: I have a memory-safe language with region-based memory management. Once an object is constructed, pointers to it are guaranteed by the frontend to live no longer than the object. Thus, most* pointers constructed, even in dead code, can be dereferenced at any time when it is possible at all to construct it.

\* The language is fairly simple and has two major object types at runtime, arrays and fields. Loads of pointers from fields always result in dereferenceable pointers because objects never contain trap representations in fields except during construction, and hoisting a load across a store used during construction is not permitted because of aliasing. Loads of pointers from arrays can result in non-dereferenceable pointers if the load is hoisted above a bounds check.

Do you feel this is too niche?

hfinkel edited edge metadata.Nov 10 2016, 11:50 AM

However, this is something that already happens in LLVM. Consider this testcase:

...

Inlining into bar =>

(Edited: the inlined code earlier was incorrect)

void bar(int* ptr) {
  int* k1 = foo1(ptr, true);
  // inlined int* k2 = foo_common(alloca(...), false);
  t = alloca(...)
  int* val;
  if (false) {
    val = *t, unconditionally_dereferenceable
  } else {
    val = *ptr
  }
}

The assumption is placed on the return value, and val would not unconditionally have the assumption, and so never will really, so there's not a problem here (AFAIKT).

void bar(int* ptr) {
  int* k1 = foo1(ptr, true);
  // inlined int* k2 = foo_common(alloca(...), false);
  t = alloca(...)
  int* val;
  if (false) {
    val = *t, unconditionally_dereferenceable
  } else {
    val = *ptr
  }
}

The assumption is placed on the return value, and val would not unconditionally have the assumption, and so never will really, so there's not a problem here (AFAIKT).

Sure, but isn't that the same as !dereferenceable then?

I thought the idea here was to be able to hoist the load of val out of the never-taken branch and still preserve the !unconditionally_dereferenceable attribute. That would mean if the example was a bit more complicated:

void bar(int* ptr) {
  int* k1 = foo1(ptr, true);
  // inlined int* k2 = foo_common(alloca(...), false);
  t = alloca(...)
  int* val;
  if (false) {
    val = *t, unconditionally_dereferenceable
    int k = *val;
    print(k)
  } else {
    val = *ptr
  }
}

(hoisting)

void bar(int* ptr) {
  int* k1 = foo1(ptr, true);
  // inlined int* k2 = foo_common(alloca(...), false);
  t = alloca(...)
  int* val;
  val = *t, unconditionally_dereferenceable
  if (false) {
    int k = *val;
    print(k)
  } else {
    val = *ptr
  }
}

(hoisting, since k is a load from a known dereferenceable pointer)

void bar(int* ptr) {
  int* k1 = foo1(ptr, true);
  // inlined int* k2 = foo_common(alloca(...), false);
  t = alloca(...)
  int* val;
  val = *t, unconditionally_dereferenceable
  int k = *val; // FAULT / UB
  if (false) {
    print(k)
  } else {
    val = *ptr
  }
}

@sanjoy Your example is being transformed as expected, from my POV. The idea is that an unconditionally dereferenceable load cannot be moved across a store that may-alias the pointer, and this lets one initialize it safely. Other than immediately after allocation, a pointer which is ever loaded like that must always be dereferenceable.

@sanjoy Your example is being transformed as expected, from my POV. The idea is that an unconditionally dereferenceable load cannot be moved across a store that may-alias the pointer, and this lets one initialize it safely. Other than immediately after allocation, a pointer which is ever loaded like that must always be dereferenceable.

(Sorry for the wall of text, but I've repeated things here to consolidate some of the previous arguments and present something coherent.)

Maybe we are talking about slightly different things, but I'm trying to avoid adding things to the IR that affect the IR semantics without executing. That is, the optimizer should be able to add if (false) { /* Whatever it wants as long as it syntactically correct. */ } and have it not affect what the program's behavior. I don't think there are any constructs in the IR today that allow this, and if there are, we should try to fix them, and not add more.

The way dead code can affect behavior using this construct is:

(snippet 1)

if (false) {
  %t0 = alloca i32*
  %t1 = load i32*, i32** %t0, !uncond_deref
  %t2 = load i32, i32* %t1
}

given the rules you're suggesting (IIUC) can be transformed to

%t0 = alloca i32*
%t1 = load i32*, i32** %t0, !uncond_deref
%t2 = load i32, i32* %t1
if (false) {
}

which will introduce a fault in the program.

I want to avoid the "dead-code-affects-semantics" problem because I think it will make the IR difficult to reason about and optimize. For instance, say we want to do some form of "checked devirtualization" (you compare the vtable of the receiver with some constant, and if that comparison succeeds, you do a direct call which can then be inlined alter, else you do a normal virtual call). That is:

func_ptr = rcvr->vtable[5];
func_ptr(40);

to

t = rcvr->vtable;
if (t == __STRING__) {
  foo(40);
} else {
  func_ptr = t[5];
  func_ptr(40);
}

The justification for the optimization above is that in case t is not == __STRING__ then the call to foo(40) is dead and won't affect the behavior of the program, and if t is == __STRING__ then you'd have called foo anyway. However, if the body of foo is (snippet 1) then the optimizer could end up introducing a fault in the program that wasn't there before after inlining foo.

Another example is here: https://reviews.llvm.org/D18738#390736

void bar(int* ptr) {
  int* k1 = foo1(ptr, true);
  // inlined int* k2 = foo_common(alloca(...), false);
  t = alloca(...)
  int* val;
  if (false) {
    val = *t, unconditionally_dereferenceable
  } else {
    val = *ptr
  }
}

The assumption is placed on the return value, and val would not unconditionally have the assumption, and so never will really, so there's not a problem here (AFAIKT).

Sure, but isn't that the same as !dereferenceable then?

I thought the idea here was to be able to hoist the load of val out of the never-taken branch and still preserve the !unconditionally_dereferenceable attribute. That would mean if the example was a bit more complicated:

void bar(int* ptr) {
  int* k1 = foo1(ptr, true);
  // inlined int* k2 = foo_common(alloca(...), false);
  t = alloca(...)
  int* val;
  if (false) {
    val = *t, unconditionally_dereferenceable
    int k = *val;
    print(k)
  } else {
    val = *ptr
  }
}

(hoisting)

void bar(int* ptr) {
  int* k1 = foo1(ptr, true);
  // inlined int* k2 = foo_common(alloca(...), false);
  t = alloca(...)
  int* val;
  val = *t, unconditionally_dereferenceable
  if (false) {
    int k = *val;
    print(k)
  } else {
    val = *ptr
  }
}

(hoisting, since k is a load from a known dereferenceable pointer)

void bar(int* ptr) {
  int* k1 = foo1(ptr, true);
  // inlined int* k2 = foo_common(alloca(...), false);
  t = alloca(...)
  int* val;
  val = *t, unconditionally_dereferenceable
  int k = *val; // FAULT / UB
  if (false) {
    print(k)
  } else {
    val = *ptr
  }
}

Good point. I think that the merging in your example needs to be invalid. I think the relevant restrictions here seems very much like the restrictions on what can be done for the convergent attribute. The difference being that, for convergent, it only matters if we introduce new control dependencies that might be non-constant (or non-uniform more specifically). Here, we can't introduce any new control dependencies, even if they're trivial.

@sanjoy OK, I understand now. In a nutshell, our disagreement was in whether dead code in IR can affect semantics of live code. This doesn't strike me as particularly bad (even after looking at your examples--clearly they shouldn't use !unconditionally_dereferenceable, but that doesn't mean it's not useful elsewhere), but you clearly have more experience here, so I won't argue that my approach is viable for upstream.

Do you have any suggestions for implementing this functionality in a cleaner way?

Hi @whitequark,

@sanjoy OK, I understand now. In a nutshell, our disagreement was in whether dead code in IR can affect semantics of live code. This doesn't strike me as particularly bad (even after looking at your examples--clearly they shouldn't use !unconditionally_dereferenceable, but that doesn't mean it's not useful elsewhere), but you clearly have more experience here, so I won't argue that my approach is viable for upstream.

Do you have any suggestions for implementing this functionality in a cleaner way?

Sorry for the late reply!

The way I'd go about this is to mark the pointers being loaded from as being dereferenceable up to a certain size at the source of the pointer itself. That is, instead of:

i8* ptr = x->field;
val = load i8, i8* (ptr + 32), !unconditionally_dereferenceable

do

i8* ptr = x->field, !dereferenceable(32 + 1);
val = load i8, i8* ptr

For return values and incoming arguments, you can use the dereferenceable attribute etc.

I'm a bit late to the party here, but I'm facing similar problems, so I'm interested in a clean solution. I wonder though if what we want to express isn't some sort of "type-based dereferencability annotations". For example the semantics I care about are essentially, "if you know you have a defereferencable pointer, you can go and dereference any other valid (managed) pointers the pointed to data references (recursively)". This has to be true for me, because the GC walks all pointers recursively that way. Of course the problem with this is that the compiler doesn't know which part of the referenced data are valid pointers for this purpose (and it can't just be based on the struct type, because we do store pointers to unmanaged data). So if we had a way to express to the compiler "here are the pointers in this struct that you may assume dereferencable", that would work very well for me. Would that solve your problem as well?

Prazek added a subscriber: Prazek.Apr 7 2017, 8:29 AM

I think we already have the same type of optimization and it is fine. Consider:

i8 foo(i8* dereferenceable(8) %p) {
  %v = load i8, i8* %p
  ret i8 %v
}

Now if you have code like:

void main(i8* %p) {

}

you can argue that you could transform it into:

void main(i8* %p) {
  if (false) {
     call foo(i8* %p)
  }
}

resulting in possibility of hoisting the load of %p from foo based on dereferenceable(8) attribute.
I would argue that this transformation is invalid, because you are introducing new information about the %p that wasn't there.
The same thing applies to the metadata.

Beside the discussion, my comment about the patch would be to instead of adding new metadata, to store that information inside of it.
The problem I see is that I would like to have unconditional !invariant.load, invariant.group and !dereferenceable to mark it on vtable loads and virtual function loads,
which are known to be dereferenceable and invariant.laod (for vfunction loads) unconditionally.

I was imaging it as adding string to metadata like:
!0 = !{i64 8, "Unconditionally"} or "NonLocalProperty" or "GlobalProperty" or whathever.
This way it would scale better for any metadata (someone would also like to mark nonnull and other), and we could just drop only conditional metadata.
You can check out the discussion I started on mailing list http://lists.llvm.org/pipermail/llvm-dev/2017-April/111684.html

I think we already have the same type of optimization and it is fine. Consider:

i8 foo(i8* dereferenceable(8) %p) {
  %v = load i8, i8* %p
  ret i8 %v
}

Now if you have code like:

void main(i8* %p) {

}

you can argue that you could transform it into:

void main(i8* %p) {
  if (false) {
     call foo(i8* %p)
  }
}

Yes, that needs to be legal.

resulting in possibility of hoisting the load of %p from foo based on dereferenceable(8) attribute.

How would you justify that? %p is dereferenceable(8) iff foo(%p) is executed. In the above program foo(%p) is not executed, so %p is not dereferneceable (outside the if (false) block, inside the if (false) block anything is "valid" since it is dead code).

I would argue that this transformation is invalid, because you are introducing new information about the %p that wasn't there.

I don't think you've introduced any new information that is valid outside the if (false) block.

I think we already have the same type of optimization and it is fine. Consider:

i8 foo(i8* dereferenceable(8) %p) {
  %v = load i8, i8* %p
  ret i8 %v
}

Now if you have code like:

void main(i8* %p) {

}

you can argue that you could transform it into:

void main(i8* %p) {
  if (false) {
     call foo(i8* %p)
  }
}

Yes, that needs to be legal.

resulting in possibility of hoisting the load of %p from foo based on dereferenceable(8) attribute.

How would you justify that? %p is dereferenceable(8) iff foo(%p) is executed. In the above program foo(%p) is not executed, so %p is not dereferneceable (outside the if (false) block, inside the if (false) block anything is "valid" since it is dead code).

I would argue that this transformation is invalid, because you are introducing new information about the %p that wasn't there.

I don't think you've introduced any new information that is valid outside the if (false) block.

You are right, the dereferenceable does not propagate outside of BB it is being executed.

I think I see your point now. There is probably no way of specifying this metadata so it would not affect the program behavior if it is introduced in dead block.

I would consider transformation like:

int main() {
}

to

int main() {
  if (false) { something with global dereferenceable }
}

Because we are lying that something is dereferenceable everywhere, but it is actually only in this BB.

The problem is that this property could be introduced with a function call like

int main() {
  if (false) call();
}

and because the attribute is not transitive in any way, we don't know if such transformation is legal or not.

On the other hand, thinking about what could go wrong if we would add this to vtable loads etc, I can't find anything that would caue our program to behave in different way, because the
property will be always valid (unless LLVM would introduce the code with !global_dereferenceable from thin air) - even if we would not mark every vtable load with it,
doing the transformations that you mentioned would still be valid, like:

int main() {
  if (false) { // introduced by a speculative call or something
    load %p, !global_dereferenceable
  }
  else {
    load %p
  }
}

Figuring out that the second load might have !global_dereferenceable is legit

So in the summary, if this feature would be used to model some higher language feature, that is valid everywhere in the program like:

  • the fact that loaded vtable is dereferenceable for all slots, or
  • the fact that vtable is constant

then if llvm will do sane transformation (as long as it does not add this metadata from the air), then even inlining function all inside the dead block should be valid.

Hi Piotr,

This can show in the course of "normal" optimization as well. Consider a function like:

void f(void* ptr, bool is_virt) {
  if (is_virt) {
    auto* vb = (VirtBaseClass*)ptr;
    vb->v_func_call();
  }
}

void main() {
  long x;
  f(&x, false);
}

After inlining, this will be:

void f(void* ptr, bool is_virt) {
  if (is_virt) {
    auto* vb = (VirtBaseClass*)ptr;
    vb->v_func_call();
  }
}

void main() {
  long x;
  if (false) {
    auto* vb = (VirtBaseClass*)&x;
    vb->v_func_call();
  }
}

The VPTR load due to the virtual call will be hoistable because it is loading from an alloca of size 8 (which is known dereferenceable). If you had a global !dereferenceable on the virtual table load then the hoisted vtable load will still have the !dereferenceable, meaning the dependent function pointer load will also be hoisted. But that would likely introduce a fault since you just loaded from undef.

I hope I made sense.

Hi Piotr,

This can show in the course of "normal" optimization as well. Consider a function like:

void f(void* ptr, bool is_virt) {
  if (is_virt) {
    auto* vb = (VirtBaseClass*)ptr;
    vb->v_func_call();
  }
}

void main() {
  long x;
  f(&x, false);
}

After inlining, this will be:

void f(void* ptr, bool is_virt) {
  if (is_virt) {
    auto* vb = (VirtBaseClass*)ptr;
    vb->v_func_call();
  }
}

void main() {
  long x;
  if (false) {
    auto* vb = (VirtBaseClass*)&x;
    vb->v_func_call();
  }
}

The VPTR load due to the virtual call will be hoistable because it is loading from an alloca of size 8 (which is known dereferenceable). If you had a global !dereferenceable on the virtual table load then the hoisted vtable load will still have the !dereferenceable, meaning the dependent function pointer load will also be hoisted. But that would likely introduce a fault since you just loaded from undef.

I hope I made sense.

Yep, this is a very good example. I will think about some solutions

@sanjoy Since D20116 is in, is there any reason to avoid having a !speculatable on load instructions? It can be emulated anyway by defining a class of @load.x functions marked speculatable and their return value dereferenceable, so there is no loss of soundness.

This revision now requires changes to proceed.May 11 2017, 6:50 AM

@sanjoy Since D20116 is in, is there any reason to avoid having a !speculatable on load instructions? It can be emulated anyway by defining a class of @load.x functions marked speculatable and their return value dereferenceable, so there is no loss of soundness.

Isn't returning the pointer as dereferenceable, that is not actually dereferenceable considered immediate UB? If that is the case then unfortunatelly we it is not that simple, because the speculable function can't have UB.

@sanjoy Since D20116 is in, is there any reason to avoid having a !speculatable on load instructions? It can be emulated anyway by defining a class of @load.x functions marked speculatable and their return value dereferenceable, so there is no loss of soundness.

I'd be okay (even happy! :) ) if you add a @llvm.safe.load.<ty> intrinsic that never has UB, and returns undef if the address passed to it is not dereferenceable. That intrinsic could then be marked speculatable. If needed, we could even implement the intrinsic by trying to read from the address passed in, and by catching the SIGSEGV or SIGBUS, if any.

However, I don't think we agreed allowing a per-site speculatable attribute, which is analogous to what you're suggesting IIUC -- a per-load !speculatable tag.

@sanjoy

I'd be okay (even happy! :) ) if you add a @llvm.safe.load.<ty> intrinsic that never has UB, and returns undef if the address passed to it is not dereferenceable. That intrinsic could then be marked speculatable. If needed, we could even implement the intrinsic by trying to read from the address passed in, and by catching the SIGSEGV or SIGBUS, if any.

First, it is not realistically possible to implement on most platforms (SEH *might* be fine but even then I'm not sure). Second, every pass that looks at loads will have to be amended in an invasive way (a quick look at LICM alone tells me this will be a nightmare as it passes bare LoadInst* everywhere...). Third, it would be crippled compared to real loads, as it won't support some attributes loads do (e.g. !invariant.load) and adding support for that will, AFAICT, require adding a new return value attribute, similar to how nonnull and dereferenceable are currently implemented there. Fourth, I don't think it will be easy to plug into the current AA architecture.

Even if all the rest was fixed, the lack of !invariant.load alone makes it completely useless for our use case so I suggest not discussing this proposal further.

Let's back away a bit. My current issue is that my frontend generates lots of deeply nested loads in inner loops. This happens because it is translating Python, and you can easily end up with something like:

for bs in as:
  for b in bs:
    c += self.core.dds0.ftw

The frontend guarantees that all these pointers are always dereferenceable. In fact every single SSA value of pointer type in the entire emitted IR is dereferenceable and nonnull. The frontend also knows that most of these loads are ultimately constant (in the case above, a simplified extract of real-world code, self.core.dds0 will never change for the entire program lifetime). The frontend is not able to hoist the loads into preheader itself because it does not perform inlining and so does not have enough visibility.

How can I tell LLVM that these loads may be safely hoisted?

@sanjoy

I'd be okay (even happy! :) ) if you add a @llvm.safe.load.<ty> intrinsic that never has UB, and returns undef if the address passed to it is not dereferenceable. That intrinsic could then be marked speculatable. If needed, we could even implement the intrinsic by trying to read from the address passed in, and by catching the SIGSEGV or SIGBUS, if any.

First, it is not realistically possible to implement on most platforms (SEH *might* be fine but even then I'm not sure). Second, every pass that looks at loads will have to be amended in an invasive way (a quick look at LICM alone tells me this will be a nightmare as it passes bare LoadInst* everywhere...). Third, it would be crippled compared to real loads, as it won't support some attributes loads do (e.g. !invariant.load) and adding support for that will, AFAICT, require adding a new return value attribute, similar to how nonnull and dereferenceable are currently implemented there. Fourth, I don't think it will be easy to plug into the current AA architecture.

Even if all the rest was fixed, the lack of !invariant.load alone makes it completely useless for our use case so I suggest not discussing this proposal further.

Let's back away a bit. My current issue is that my frontend generates lots of deeply nested loads in inner loops. This happens because it is translating Python, and you can easily end up with something like:

for bs in as:
  for b in bs:
    c += self.core.dds0.ftw

The frontend guarantees that all these pointers are always dereferenceable. In fact every single SSA value of pointer type in the entire emitted IR is dereferenceable and nonnull. The frontend also knows that most of these loads are ultimately constant (in the case above, a simplified extract of real-world code, self.core.dds0 will never change for the entire program lifetime). The frontend is not able to hoist the loads into preheader itself because it does not perform inlining and so does not have enough visibility.

How can I tell LLVM that these loads may be safely hoisted?

Would it make more sense to have a way to mark the pointer SSA value as being dereferenceable (instead of trying to make the access itself). This seems more in line with how we handle known-dereferenceable function arguments, and might be semantically cleaner.

@hfinkel Oh, my bad--I now remember that this came up long ago...

@sanjoy Can you confirm that a dereferenceable attribute on getelementptr would be an acceptable IR extension?

@hfinkel Oh, my bad--I now remember that this came up long ago...

Firstly, going back one step to the @llvm.safe.load.<ty> intrinsic -- I think it would be fine to add a bit to load instructions that marks it as "safe" (i.e. it returns undef on being passed a non-dereferenceable pointer). Metadata won't do here since stripping the hypothetical !is_safe metadata from a load instruction won't be behavior preserving. This solves the issues you mentioned around integrating a new kind of load with the rest of LLVM, but you'll still have to implement safe loads, which can be tricky as you said.

@sanjoy Can you confirm that a dereferenceable attribute on getelementptr would be an acceptable IR extension?

A GEP that always produces a dereferenceable value may be tricky to implement since we'll have to remember to strip said attribute whenever we hoist GEPs; and LLVM likes to hoist GEP's without thinking too much. But I believe this is going in the right direction -- we should not have soundness problems as long as the safety of some operation is guaranteed by some other preceding operation.

JFYI, we solved this problem for Java by modeling the Java type system in LLVM IR. We have a way of communicating Java type layouts (which contains dereferenceability and invariance information) from our JVM frontend to LLVM[0], and LLVM uses this functionality to compute type layouts for values whose types it can infer. While we don't have any near term plans of upstreaming said infrastructure, if you wanted to take on the task of building something like this upstream, we may be able to use it. +CC @apilipenko -- he gave a talk on this in EuroLLVM 2017.

[0]: LLVM upstream merged with some local non-upstreamed changes

@hfinkel Oh, my bad--I now remember that this came up long ago...

...

@sanjoy Can you confirm that a dereferenceable attribute on getelementptr would be an acceptable IR extension?

A GEP that always produces a dereferenceable value may be tricky to implement since we'll have to remember to strip said attribute whenever we hoist GEPs; and LLVM likes to hoist GEP's without thinking too much. But I believe this is going in the right direction -- we should not have soundness problems as long as the safety of some operation is guaranteed by some other preceding operation.

In your usage model, would you need the metadata on the GEP, or would putting it on whatever generates the base pointer be sufficient?

whitequark abandoned this revision.Oct 15 2017, 5:34 AM