This is an archive of the discontinued LLVM Phabricator instance.

Add speculatable function attribute
ClosedPublic

Authored by arsenm on May 10 2016, 10:42 AM.

Details

Summary

This attribute tells the optimizer that the function may be speculated.

Diff Detail

Event Timeline


Rename the attribute to speculatable. This simplifies the patch a lot since
we are no longer trying to solve the problem where specifying one of
the memory properties for intrinsics implies that it has no side effects
(this should probably still be addressed in another patch).

I wasn't sure exactly how to define the speculatable attribute
in the language ref, so I copied the definition from the
isSafeToSpeculativelyExecute() function.

Makes sense to me. Is the idea to have isSafeToSpeculativelyExecute() return true on functions with this attribute?

Yes, this is the idea.
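To make the intended hookup concrete, here is a minimal C++ sketch of how such a query could consult the new attribute; this is illustrative only, not the code from this patch, and the Attribute::Speculatable enumerator name is an assumption:

#include "llvm/IR/Attributes.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"

// Sketch: treat a direct call as safe to speculate when the callee carries
// the speculatable attribute (the real isSafeToSpeculativelyExecute() also
// handles many other instruction kinds).
static bool calleeIsSpeculatable(const llvm::CallInst &CI) {
  const llvm::Function *Callee = CI.getCalledFunction();
  return Callee && Callee->hasFnAttribute(llvm::Attribute::Speculatable);
}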

hfinkel added inline comments.Jul 14 2016, 7:34 PM
docs/LangRef.rst
1494–1496

We should say something to indicate that speculatable does not imply CSE-able. Unless I am mistaken, it is possible for a function to be speculatable but return different results given the same parameters.

True; only a readnone speculatable function can be CSE'd. It might be readonly, but then you can't CSE it unless you know more about the memory it might access.
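A small hypothetical illustration of that distinction (not from the patch): a readonly function can be perfectly safe to execute speculatively and still not be mergeable, because the memory it reads may change between calls.

// Hypothetical example: imagine read_counter is readonly and speculatable.
// Executing it early or redundantly is harmless, but two calls may return
// different values, so they must not be CSE'd into one.
int counter;                       // written elsewhere in the program

int read_counter() { return counter; }

int use() {
  int a = read_counter();
  counter = 42;                    // clobbers the memory read_counter reads
  int b = read_counter();          // not equivalent to the first call
  return a + b;
}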

Rename the attribute to speculatable. This simplifies the patch a lot since
we are no longer trying to solve the problem where specifying one of
the memory properties for intrinsics implies that it has no side effects
(this should probably still be addressed in another patch).

I wasn't sure exactly how to define the speculatable attribute
in the language ref, so I copied the definition from the
isSafeToSpeculativelyExecute() function.

Makes sense to me. Is the idea to have isSafeToSpeculativelyExecute() return true on functions with this attribute? I'd find it strange if this were not the plan. Do we plan to have FuncAttrs infer the attribute? I'm wondering if we should say something about cost and how that will be handled. We don't want to speculate expensive-to-execute functions even if it is legal.

The SpeculativeExecution pass already considers the cost

Rename the attribute to speculatable. This simplifies the patch a lot since
we are no longer trying to solve the problem where specifying one of
the memory properties for intrinsics implies that it has no side effects
(this should probably still be addressed in another patch).

I wasn't sure exactly how to define the speculatable attribute
in the language ref, so I copied the definition from the
isSafeToSpeculativelyExecute() function.

Makes sense to me. Is the idea to have isSafeToSpeculativelyExecute() return true on functions with this attribute? I'd find it strange if this were not the plan. Do we plan to have FuncAttrs infer the attribute? I'm wondering if we should say something about cost and how that will be handled. We don't want to speculate expensive-to-execute functions even if it is legal.

I don't think we should gate inferring speculatable on a cost model, in the same way that we don't slap noinline on huge functions.
IMO, speculating function calls should likely be done via an IPO pass which is driven by some cost model.

We should probably have another, symmetric, attribute: sinkable. But that shouldn't be done with this patch.

We sort of already have this: convergent. Owen was talking about splitting convergent so that it really means do not sink, and then adding a speculatable attribute. The current convergent semantics would be the combination of no-sink and no-speculate.

-Matt

tstellarAMD edited edge metadata.

Fix wording in language ref and add a note about speculatable not implying
the function can be CSE'd.

tstellarAMD marked 2 inline comments as done.Jul 15 2016, 6:37 AM
hfinkel added inline comments.Jul 15 2016, 10:11 AM
docs/LangRef.rst
1497

I don't think that we should use "CSE'd" here as a term. We should say something about this attribute not being enough to conclude that the number of calls executed along any particular execution path will not be externally observable, or something along those lines. Akin, perhaps, to what we say for volatile.

Replace the term CSE'd in the language ref with a more precise definition.

I reworded Hal's suggested description slightly to try to emphasize the
fact that it's talking about the *number* of calls being externally
observable, not just that the call itself is externally observable.

hfinkel added inline comments.Jul 15 2016, 4:55 PM
docs/LangRef.rst
1498

Do you mean "will not be"?

Fix typo in language ref.

tstellarAMD marked an inline comment as done.Jul 18 2016, 3:57 AM
tstellarAMD added inline comments.
docs/LangRef.rst
1498

Yes, I did. This is fixed now.

mehdi_amini added inline comments.Jul 18 2016, 3:30 PM
docs/LangRef.rst
1499

"does not have any effects besides calculating its result" and "speculatable is not enough to conclude that [...] the number of calls to this function will not be externally observable." seem contradictory to me.

(Also you have a typo with exection instead of execution)

tstellarAMD marked an inline comment as done.Nov 15 2016, 5:08 AM
tstellarAMD added inline comments.
docs/LangRef.rst
1499

I would like to try to revive this discussion. We've gone back and forth a lot on the attribute description. The intention is that speculatable allows something to be speculatively executed, but is not enough by itself to determine whether or not the function can be CSE'd.

Would it make sense to replace:
'does not have any effects besides calculating its result'
with
'does not have any effects other than possibly reading/writing memory and calculating its result'

hfinkel added inline comments.Nov 15 2016, 6:59 AM
docs/LangRef.rst
1499

It also can't have any undefined behavior.

mehdi_amini added inline comments.Nov 15 2016, 8:56 AM
docs/LangRef.rst
1499

Isn't it enough to say: "This function attribute indicates that the function does not have undefined behavior, for any possible combination of arguments or global memory state"?

hfinkel added inline comments.Nov 15 2016, 2:10 PM
docs/LangRef.rst
1499

Yes, I think that sounds right (the comma is not necessary).
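As a concrete, hypothetical illustration of that wording: a function that is defined for every input and memory state qualifies, while one that can trap on some inputs does not.

// Hypothetical illustration only. ok() has no undefined behavior for any
// argument or memory state, so the proposed definition admits it; bad()
// divides by its argument and has UB when b == 0 (or INT_MIN / -1), so it
// could not be marked speculatable.
int ok(unsigned x) {
  int n = 0;
  while (x) { n += x & 1; x >>= 1; }   // population count, always defined
  return n;
}

int bad(int a, int b) { return a / b; }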

Update definition in LangRef.

mehdi_amini accepted this revision.Feb 1 2017, 5:16 PM
mehdi_amini added reviewers: sanjoy, chandlerc.

LGTM. But let's wait a little to give a chance to @sanjoy or @chandlerc to comment if they feel the need.

I repeat here one of the important earlier comments from @tstellarAMD, since it is not in the description and is easy to miss:

I added two new Intrinsic attributes IntrNoSideEffects and IntrHasSideEffects,
which make it possible to specify all the possible memory interaction / side effect
combinations. With these properties in place, it should be possible in the future
to drop the 'no side effect' portion of the intrinsic memory properties once targets
have been updated to use these new properties.
This revision is now accepted and ready to land.Feb 1 2017, 5:18 PM

LGTM. But let's wait a little to give a chance to @sanjoy or @chandlerc to comment if they feel the need.

I repeat here one of the important earlier comments from @tstellarAMD, since it is not in the description and is easy to miss:

I added two new Intrinsic attributes IntrNoSideEffects and IntrHasSideEffects,
which make it possible to specify all the possible memory interaction / side effect
combinations. With these properties in place, it should be possible in the future
to drop the 'no side effect' portion of the intrinsic memory properties once targets
have been updated to use these new properties.

This comment is actually from an earlier version of the patch. I've dropped this part and wrote this patch instead: https://reviews.llvm.org/D22459

sanjoy edited edge metadata.Mar 21 2017, 9:52 AM

Sorry, I didn't know that this was blocked on me. I'll review this by end of day today.

I'm okay with this as long as this is allowed only on specific intrinsics (i.e. it cannot be used by external clients, but we know that certain intrinsics are speculatable). I don't think we can allow this as a generic function attribute, since that allows dead code to affect program behavior. E.g.:

if (false)
  puts("hi") speculatable;

which can get transformed to

puts("hi") speculatable;
if (false)
  ;

The speculatable attribute on puts is incorrect, but we need to allow such "bogus" IR down dead paths. For instance, the original program could have been:

void do_call(fnptr f, bool is_speculatable) {
  if (is_speculatable)
    f("hi") speculatable;
  else
    f("hi");
}

// Later
do_call(puts, false);

I'm okay with this as long as this is allowed only on specific intrinsics (i.e. it cannot be used by external clients, but we know that certain intrinsics are speculatable). I don't think we can allow this as a generic function attribute, since that allows dead code to affect program behavior. E.g.:

if (false)
  puts("hi") speculatable;

which can get transformed to

puts("hi") speculatable;
if (false)
  ;

The speculatable attribute on puts is incorrect, but we need to allow such "bogus" IR down dead paths. For instance, the original program could have been:

void do_call(fnptr f, bool is_speculatable) {
  if (is_speculatable)
    f("hi") speculatable;
  else
    f("hi");
}

// Later
do_call(puts, false);

Then the program is broken to begin with? Why shouldn't speculatable apply to functions that only call other speculatable instructions, similar to how FunctionAttrs can infer this for readnone? Do you mean not expose a user visible attribute in the frontend?

I'm okay with this as long as this is allowed only on specific intrinsics (i.e. it cannot be used by external clients, but we know that certain intrinsics are speculatable). I don't think we can allow this as a generic function attribute, since that allows dead code to affect program behavior. E.g.:

if (false)
  puts("hi") speculatable;

which can get transformed to

puts("hi") speculatable;
if (false)
  ;

The speculatable attribute on puts is incorrect, but we need to allow such "bogus" IR down dead paths. For instance, the original program could have been:

void do_call(fnptr f, bool is_speculatable) {
  if (is_speculatable)
    f("hi") speculatable;
  else
    f("hi");
}

// Later
do_call(puts, false);

Then the program is broken to begin with? Why shouldn't speculatable apply to functions that only call other speculatable instructions, similar to how FunctionAttrs can infer this for readnone? Do you mean not expose a user visible attribute in the frontend?

readnone etc. are different from speculatable, in that once you mark a call site as speculatable you've marked that call site as speculatable throughout the lifetime of the program (since, by definition, it can be arbitrarily speculated). readnone, readonly etc. do not have that property.

But thinking about it a bit, I concede that speculatable as a generic (i.e. both intrinsic and non-intrinsic) function attribute is fine. However, it doesn't make sense as a call site attribute: being speculatable only down a control flow path is basically the antithesis of speculatable.

readnone etc. are different from speculatable, in that once you mark a call site as speculatable you've marked that call site as speculatable throughout the lifetime of the program (since, by definition, it can be arbitrarily speculated). readnone, readonly etc. do not have that property.

I don't follow why readnone and readonly based transformations don't need the same guarantee?

I'm okay with this as long as this is allowed only on specific intrinsics (i.e. it cannot be used by external clients, but we know that certain intrinsics are speculatable). I don't think we can allow this as a generic function attribute, since that allows dead code to affect program behavior. E.g.:

if (false)
  puts("hi") speculatable;

which can get transformed to

puts("hi") speculatable;
if (false)
  ;

The speculatable attribute on puts is incorrect, but we need to allow such "bogus" IR down dead paths. For instance, the original program could have been:

void do_call(fnptr f, bool is_speculatable) {
  if (is_speculatable)
    f("hi") speculatable;
  else
    f("hi");
}

// Later
do_call(puts, false);

Then the program is broken to begin with? Why shouldn't speculatable apply to functions that only call other speculatable instructions, similar to how FunctionAttrs can infer this for readnone? Do you mean not expose a user visible attribute in the frontend?

readnone etc. are different from speculatable, in that once you mark a call site as speculatable you've marked that call site as speculatable throughout the lifetime of the program (since, by definition, it can be arbitrarily speculated). readnone, readonly etc. do not have that property.

But thinking about it a bit, I concede that speculatable as a generic (i.e. both intrinsic and non-intrinsic) function attribute is fine. However, it doesn't make sense as a call site attribute: being speculatable only down a control flow path is basically the antithesis of speculatable.

True, but it can still be a callsite attribute. It can represent a data dependency:

int div(int a, int b) {
  return a/b;
}

div(q, 1); /* speculatable */

readnone etc. are different from speculatable, in that once you mark a call site as speculatable you've marked that call site as speculatable throughout the lifetime of the program (since, by definition, it can be arbitrarily speculated). readnone, readonly etc. do not have that property.

But thinking about it a bit, I concede that speculatable as a generic (i.e. both intrinsic and non-intrinsic) function attribute is fine. However, it doesn't make sense as a call site attribute: being speculatable only down a control flow path is basically the antithesis of speculatable.

True, but it can still be a callsite attribute. It can represent a data dependency:

int div(int a, int b) {
  return a/b;
}

div(q, 1); /* speculatable */

But div is not well defined for any possible combination of arguments or global memory state. I think I understand what you were getting at (that for all values of q the expression div(q, 1) is well defined), but the LangRef definition mentioned in this change does not phrase things that way.

However, this should be fine IMO:

int div_specialized(int a) speculatable {
  return div(a, 1);
}

readnone etc. are different from speculatable, in that once you mark a call site as speculatable you've marked that call site as speculatable throughout the lifetime of the program (since, by definition, it can be arbitrarily speculated). readnone, readonly etc. do not have that property.

I don't follow why readnone and readonly based transformations don't need the same guarantee?

I'm talking about cases like this:

void f(bool to_write, int* ptr) {
  if (to_write) *ptr = 20;
}

void g() {
  int a;
  f(false, &a) readnone;
}

Now f is not readnone generally (there are argument values for which it writes to memory), but in that specific call site we know that it does not.

What I was trying to say before is that, because of how we've defined speculatable, "speculatable for this specific argument / control dependence" does not make sense -- if you expand it out, the statement would be "for this specific set of arguments, global memory state and control dependence, this function does not have undefined behavior for any possible combination of arguments or global memory state", which contradicts itself.

Of course, due to this attribute, we will start being able to hoist calls and invokes. If those are marked readnone etc. we will have to strip those attributes if we hoisted said call or invoke out of control flow. But that's a different story.

readnone etc. are different from speculatable, in that once you mark a call site as speculatable you've marked that call site as speculatable throughout the lifetime of the program (since, by definition, it can be arbitrarily speculated). readnone, readonly etc. do not have that property.

But thinking about it a bit, I concede that speculatable as a generic (i.e. both intrinsic and non-intrinsic) function attribute is fine. However, it doesn't make sense as a call site attribute: being speculatable only down a control flow path is basically the antithesis of speculatable.

True, but it can still be a callsite attribute. It can represent a data dependency:

int div(int a, int b) {
  return a/b;
}

div(q, 1); /* speculatable */

But div is not well defined for any possible combination of arguments or global memory state. I think I understand what you were getting at (that for all values of q the expression div(q, 1) is well defined), but the LangRef definition mentioned in this change does not phrase things that way.

We might want to adjust the wording. When we say "any possible", for a call site attribute, we mean the values that can possibly present themselves at *that* call site (which may be constrained to be a subset of the values allowed for the input types by the semantics of the program). We mean the same thing for other call site attributes (e.g. readnone).

However, this should be fine IMO:

int div_specialized(int a) speculatable {
  return div(a, 1);
}

readnone etc. are different from speculatable, in that once you mark a call site as speculatable you've marked that call site as speculatable throughout the lifetime of the program (since, by definition, it can be arbitrarily speculated). readnone, readonly etc. do not have that property.

I don't follow why readnone and readonly based transformations don't need the same guarantee?

I'm talking about cases like this:

void f(bool to_write, int* ptr) {
  if (to_write) *ptr = 20;
}

void g() {
  int a;
  f(false, &a) readnone;
}

Now f is not readnone generally (there are argument values for which it writes to memory), but in that specific call site we know that it does not.

What I was trying to say before is that, because of how we've defined speculatable, "speculatable for this specific argument / control dependence" does not make sense -- if you expand it out, the statement would be "for this specific set of arguments, global memory state and control dependence, this function does not have undefined behavior for any possible combination of arguments or global memory state", which contradicts itself.

Of course, due to this attribute, we will start being able to hoist calls and invokes. If those are marked readnone etc. we will have to strip those attributes if we hoisted said call or invoke out of control flow. But that's a different story.

But I think this is exactly the point. We can consider value constraints due to data dependencies, but not from control dependencies. The same infects the other attributes as well (if it is tagged readnone and speculatable, the readnone can also not be control dependent - attributes are not metadata; we shouldn't be stripping them).

I can phrase my concerns as a question -- what is the intended behavior of the following program if condition is false?

int f(int a, int b) { return a / b; }

void main() {
  if (condition)
    f(5, 0) speculatable;
}

As far as I can tell, there are two possibilities:

  1. It is well defined -- main is a no-op.
  2. It has undefined behavior, due to the incorrect speculatable attribute.

If we go with (1), then we cannot hoist the call to f above the if -- we'd introduce UB if we do that. This is the interpretation I'm leaning towards, but this interpretation makes the speculatable attribute useless for general functions, since we're prevented from doing what it was supposed to enable in the first place.

If we go with (2), then we've admitted that "dead code" (that is, code that is not run) can influence program behavior, which I find troubling. In particular, I think foo1 and foo2 should be equivalent:

void foo1() {
}

void foo2() {
  if (false) {
    // Arbitrary syntactically valid stuff
  }
}

In other words, adding dead code should not change program behavior.

Even

int f(int a, int b) speculatable { return a / b; }

void main() {
  if (condition)
    f(5, 0);
}

has the same problem, so I take back my earlier concession on function annotations.

Now it is fine for us to have a speculatable analysis which would infer speculatable in limited cases, like int f() { return 10; }. But that's a module analysis, and not an attribute.
Having them on intrinsics is fine too, since we axiomatically know that certain intrinsics are speculatable, just like we know the add instruction is speculatable.

  1. It has undefined behavior, due to the incorrect speculatable attribute.

I don't see it as being anything other than this. Marking it as speculatable is saying it has no undefined behavior that a condition could possibly be avoiding. If you are lying about this, you get what you get.

  1. It has undefined behavior, due to the incorrect speculatable attribute.

I don't see it as being anything other than this. Marking it as speculatable is saying it has no undefined behavior that a condition could possibly be avoiding. If you are lying about this, you get what you get.

I agree.

If we go with (2), then we've admitted that "dead code" (that is, code that is not run) can influence program behavior, which I find troubling.

That troubles (and worries) me as well.

If we go with (2), then we've admitted that "dead code" (that is, code that is not run) can influence program behavior, which I find troubling.

That troubles (and worries) me as well.

Why? That's part of the promise we make when we tag the code as speculatable. We promise that doing the operation early, even in cases where it would not otherwise have been performed, is fine. Furthermore, we're promising that any properties the call has (e.g. promising that a certain argument is nonnull) must not be control dependent. As a result, looking at it as dead code affecting live code is suboptimal; any other properties are just a kind of global assertion.

That troubles (and worries) me as well.

Why?

Irrational fear? ;-)
It seems unusual, and I'm cautious about introducing unusual properties in the compiler in general; it makes it harder to reason about "stuff" when there aren't "simple" rules to guide the logic.
Are there other existing cases of UB induced by unreachable/dead code?

That troubles (and worries) me as well.

Why?

Irrational fear? ;-)

;-)

It seems unusual, and I'm cautious about introducing unusual properties in the compiler in general; it makes it harder to reason about "stuff" when there aren't "simple" rules to guide the logic.
Are there other existing cases of UB induced by unreachable/dead code?

Not as far as I know, but this is because we normally need to conservatively assume there might be control dependencies on just about everything we can't completely understand (i.e. a call to a function). This is novel because we're explicitly saying that there aren't such dependencies. This makes validity assertions more "global" (this is how I look at it), meaning that it matters less where in the CFG you put them (even in dead regions). The point is that, in some cases, this is exactly what we want and mean.

That troubles (and worries) me as well.

Why?

Irrational fear? ;-)
It seems unusual, and I'm cautious about introducing unusual properties in the compiler in general; it makes it harder to reason about "stuff" when there aren't "simple" rules to guide the logic.
Are there other existing cases of UB induced by unreachable/dead code?

I suppose it's the same as speculating a load from a pointer marked as dereferenceable that isn't really dereferenceable, which is already done.

That troubles (and worries) me as well.

Why?

Irrational fear? ;-)
It seems unusual, and I'm cautious about introducing unusual properties in the compiler in general; it makes it harder to reason about "stuff" when there aren't "simple" rules to guide the logic.
Are there other existing cases of UB induced by unreachable/dead code?

I suppose it's the same as speculating a load from a pointer marked as dereferenceable that isn't really dereferenceable, which is already done.

My understanding of what gives people pause is that:

if (p != nullptr)
  something();

if (false)
  foo(p /*nonnull*/) /* speculatable */

depending on the pass ordering, we might end up hoisting the call to foo and then using the nonnull assumption to simplify the if condition (even though that call is dynamically dead). We might not, however (and probably won't because we pretty eagerly remove dead code).

The thing to realize about speculatable is that it promotes all argument restrictions to properties of the argument values themselves. This might certainly seem surprising.
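Spelled out, the transformation sequence being worried about looks roughly like this (a hypothetical restatement of the snippet above, with declarations added only to make the shape concrete):

// After hoisting the dead speculatable call, its nonnull annotation on p can
// be (mis)used to fold the live null check away.
extern void foo(int *p);     // imagine: speculatable, p annotated nonnull
extern void something();

void example(int *p) {
  foo(p);                    // hoisted above both branches
  something();               // the p != nullptr test was folded to true
  if (false)
    ;                        // the original dead call site is now empty
}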

I suppose it's the same as speculating a load from a pointer marked as dereferenceable that isn't really dereferenceable, which is already done.

As far as I remember, dereferenceable and !dereferenceable are carefully designed to avoid this UB-from-dead-code situation.

If we go with (2), then we've admitted that "dead code" (that is, code that is not run) can influence program behavior, which I find troubling.

That troubles (and worries) me as well.

Why? That's part of the promise we make when we tag the code as speculatable. We promise that doing the operation early, even in cases where it would not otherwise have been performed, is fine. Furthermore, we're promising that any properties the call has (e.g. promising that a certain argument is nonnull) must not be control dependent. As a result, looking at it as dead code affecting live code is suboptimal; any other properties are just a kind of global assertion.

Making the behavior of a program depend even on instructions that are never executed seems like a fundamentally new thing, and I'm not yet convinced that that's safe. It may be possible to come up with a consistent model of this new thing, but I think the model will still be tricky to work with.

For instance, certain kinds of devirtualization optimizations are wrong in that model. Say we had:

void f() { }
void g() { 1 / 0; } // unused

void main() {
  fnptr p = f;
  (*p)() speculatable;
}

Now you're saying you can't specialize the call via function pointer like this:

void f() { }
void g() { 1 / 0; } // unused

void main() {
  fnptr p = f;
  if (p == g)
    g() speculatable;  // incorrect
  else
    (*p)() speculatable;
}

which seems odd.

There are also (somewhat more complex) cases like this that do not involve indirect speculatable calls:

struct Base {
  int k;
  Base(int k) : k(k) {}
};

struct S : public Base {
  S(bool c) : Base(c ? 10 : 20) {}
  virtual void f() {
    div(1, k) speculatable;  // k is either 10 or 20
  }
};

struct T : public Base {
  T() : Base(0) {}
  virtual void f() { }
};

void bug(Base* b) {
  b->f();
}

We have a problem in bug if we ever devirtualize, inline and hoist a load:

void bug(Base *b) {
  int k = b->k;
  if (b->type == S) {
    div(1, k) speculatable;
  } else if (b->type == T) {
  }
}

It also breaks "code compression" type optimizations:

if (a == 1 || a == 2) {
  switch (a) {
  case 1:
    div(m, 1) speculatable;
    break;
  case 2:
    div(m, 2) speculatable;
    break;
  }
}

to

if (a == 1 || a == 2) {
  div(m, a) speculatable;
}

to

if (a == 1 || a == 2) {
  if (a == 0)
    div(m, 0) speculatable;
  else
    div(m, a) speculatable;
}

It may be possible to come up with a consistent model of this new thing, but I think the model will still be tricky to work with.

This ^ ended up sounding more FUD'dy than I intended. What I meant to say is that I'm being cautious because this isn't obviously okay, not because this is obviously not okay.

If we go with (2), then we've admitted that "dead code" (that is, code that is not run) can influence program behavior, which I find troubling.

That troubles (and worries) me as well.

Why? That's part of the promise we make when we tag the code as speculatable. We promise that doing the operation early, even in cases where it would not otherwise have been performed, is fine. Furthermore, we're promising that any properties the call has (e.g. promising that a certain argument is nonnull) must not be control dependent. As a result, looking at it as dead code affecting live code is suboptimal; any other properties are just a kind of global assertion.

Making the behavior of a program depend even on instructions that are never executed seems like a fundamentally new thing, and I'm not yet convinced that that's safe. It may be possible to come up with a consistent model of this new thing, but I think the model will still be tricky to work with.

For instance, certain kinds of devirtualization optimizations are wrong in that model. Say we had:

void f() { }
void g() { 1 / 0; } // unused

void main() {
  fnptr p = f;
  (*p)() speculatable;
}

Now you're saying you can't specialize the call via function pointer like this:

void f() { }
void g() { 1 / 0; } // unused

void main() {
  fnptr p = f;
  if (p == g)
    g() speculatable;  // incorrect
  else
    (*p)() speculatable;
}

which seems odd.

There are also (somewhat more complex) cases like this that do not involve indirect speculatable calls:

struct Base {
  int k;
  Base(int k) : k(k) {}
};

struct S : public Base {
  S(bool c) : Base(c ? 10 : 20) {}
  virtual void f() {
    div(1, k) speculatable;  // k is either 10 or 20
  }
};

struct T : public Base {
  T() : Base(0) {}
  virtual void f() { }
};

void bug(Base* b) {
  b->f();
}

We have a problem in bug if we ever devirtualize, inline and hoist a load:

void bug(Base *b) {
  int k = b->k;
  if (b->type == S) {
    div(1, k) speculatable;
  } else if (b->type == T) {
  }
}

It also breaks "code compression" type optimizations:

if (a == 1 || a == 2) {
  switch (a) {
  case 1:
    div(m, 1) speculatable;
    break;
  case 2:
    div(m, 2) speculatable;
    break;
  }
}

to

if (a == 1 || a == 2) {
  div(m, a) speculatable;
}

to

if (a == 1 || a == 2) {
  if (a == 0)
    div(m, 0) speculatable;

Your "code compression" optimization just introduced dead code ;)

else
  div(m, a) speculatable;

}

I think that all of this is right, you can't apply some of these optimizations to call sites with the speculatable attribute. I agree, however, that we need to think carefully about how to define what speculatable means on an individual call site. Perhaps they're like convergent functions in this regard: you can't introduce new control dependencies (at least not in general).

It also breaks "code compression" type optimizations:

if (a == 1 || a == 2) {
  switch (a) {
  case 1:
    div(m, 1) speculatable;
    break;
  case 2:
    div(m, 2) speculatable;
    break;
  }
}

to

if (a == 1 || a == 2) {
  div(m, a) speculatable;
}

to

if (a == 1 || a == 2) {
  if (a == 0)
    div(m, 0) speculatable;

Your "code compression" optimization just introduced dead code ;)

Yea, I don't even know why I called it "code compression". :)

else
  div(m, a) speculatable;

}

I think that all of this is right, you can't apply some of these optimizations to call sites with the speculatable attribute. I agree, however, that we need to think carefully about how to define what speculatable means on an individual call site. Perhaps they're like convergent functions in this regard: you can't introduce new control dependencies (at least not in general).

I'd say that, as an initial step, support for intrinsics that we _know_ are speculatable (like we _know_ that add is speculatable) can land without any further discussion.

As for a generic speculatable attribute -- I need some time to think about this. Perhaps if done with sufficient care, it is possible.

I'd also like to ping @whitequark for comments. I had blocked D18738 some time back on similar grounds, but if we can reach a conclusion on speculatable, perhaps some of that can be transferred to !unconditionally_dereferenceable as well.

It also breaks "code compression" type optimizations:

if (a == 1 || a == 2) {
  switch (a) {
  case 1:
    div(m, 1) speculatable;
    break;
  case 2:
    div(m, 2) speculatable;
    break;
  }
}

to

if (a == 1 || a == 2) {
  div(m, a) speculatable;
}

to

if (a == 1 || a == 2) {
  if (a == 0)
    div(m, 0) speculatable;

Your "code compression" optimization just introduced dead code ;)

Yea, I don't even know why I called it "code compression". :)

else
  div(m, a) speculatable;

}

I think that all of this is right, you can't apply some of these optimizations to call sites with the speculatable attribute. I agree, however, that we need to think carefully about how to define what speculatable means on an individual call site. Perhaps they're like convergent functions in this regard: you can't introduce new control dependencies (at least not in general).

I'd say that, as an initial step, support for intrinsics that we _know_ are speculatable (like we _know_ that add is speculatable) can land without any further discussion.

I'm fine with restricting speculatable to only appear on a function declaration/definition unless/until we can figure out semantics for it on a call site in general. I don't want it restricted to intrinsics specifically, but I don't think that's the problem.

As for a generic speculatable attribute -- I need some time to think about this. Perhaps if done with sufficient care, it is possible.

I'd also like to ping @whitequark for comments. I had blocked D18738 some time back on similar grounds, but if we can reach a conclusion on speculatable, perhaps some of that can be transferred to !unconditionally_dereferenceable as well.

I'm fine with restricting speculatable to only appear on a function declaration/definition unless/until we can figure out semantics for it on a call site in general. I don't want it restricted to intrinsics specifically, but I don't think that's the problem.

Only function-level speculatable (and no call-site-specific speculatable) seems less problematic. It would mean that having a function declaration or definition incorrectly marked as speculatable, even if it is never called, is UB; but I can live with that as long as that is properly documented.

nlopes added a subscriber: nlopes.Mar 26 2017, 8:17 AM

Let me maybe zoom out and give a different perspective:
Right now call site and function attributes are an AND of predicates that are always guaranteed to hold for that specific call site or for all call sites, respectively.
Predicates include things like doesn't write to memory, only writes to memory that is not observable, etc. Using attributes we can state that several of these predicates hold.
In the ideal world, predicates wouldn't overlap, although since we can only state ANDs of predicates and not ORs, some overlap may be needed in practice.

Then we have an orthogonal concern, which is what precondition is sufficient to justify an optimization. For example, whether a function call can be executed speculatively is one such precondition. It can be derived by looking at the function attributes. For example, for speculative execution we probably need to know that the function doesn't write to memory and that it terminates.

So I feel that this speculatable attribute is not the right answer. It should be a helper function that derives its result from a set of function attributes, but shouldn't be an attribute on its own.
The only reason I could see to have it as an attribute would be to carry cost information. Just because a function can be executed speculatively, it doesn't mean it should be. We have infrastructure to record cost of intra-procedural edges, but not across functions AFAIK. That may need a more general solution for LTO anyway.

I'm just slightly concerned that the set of function attributes is growing pretty quickly without a more thorough discussion of the whole set of attributes rather than adding a new attribute every time someone wants to fix a problem. Some attributes are not well supported across the pipeline. It really feels like we need to stop for a moment and refactor function attributes.
Probably what we are missing is a "halts"/"terminating" attribute (stating that the function always returns). It has been discussed multiple times, and now we have another use case.

Then we have an orthogonal concern, which is what precondition is sufficient to justify an optimization. For example, whether a function call can be executed speculatively is one such precondition. It can be derived by looking at the function attributes. For example, for speculative execution we probably need to know that the function doesn't write to memory and that it terminates.

It is not enough (for example division by zero).

So I feel that this speculatable attribute is not the right answer. It should be a helper function that derives its result from a set of function attributes, but shouldn't be an attribute on its own.

Feel free to propose an alternative solution; right now there is no combination of attributes that is enough.

The only reason I could see to have it as an attribute would be to carry cost information.

I'm not convinced that attributes should carry "cost" information; is there a precedent for this?

dberlin added a subscriber: dberlin.EditedMar 26 2017, 2:53 PM

I think that all of this is right, you can't apply some of these optimizations to call sites with the speculatable attribute.

A lot of these (but not all of these) amount to "you cannot clone speculatable", i.e. if you clone the call, you must remove the attribute.

I believe the set of conditions under which you could clone the attribute are:

  1. The new call is CDEQ the original call (i.e. the set of conditions under which it executes is identical). If you are cloning from one function to another, it must be CDEQ using the interprocedural control dependence.
  2. The arguments are identical.
  3. The function called is identical or marked speculatable

Note this is not entirely shocking.
The "no cloning" is true of other attributes (you can't clone and apply readonly like is done in the devirt example above); it's just that, being an attribute about control dependence, the effects relate to control dependence.

Note: a lot of this is premature anyway. There is no possible way you could ever apply any of these optimizations correctly to speculatable, at all, until we fix post-dom to not be broken.

I agree, however, that we need to think carefully about how to define what speculatable means on an individual call site. Perhaps they're like convergent functions in this regard: you can't introduce new control dependencies (at least not in general).

Definitely true.

Either the CDEQ set of the call must not change, or you must be able to prove that changes cannot impact the function (i.e. you don't make it any less dead, etc.).

const int foo = bar = fred = 0;
if (foo)
  if (bar)
    if (fred)
      call baz() speculatable

You can prove hoisting into if (foo) cannot make it any less dead.
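The hoist being described would look roughly like this (a hypothetical restatement of the snippet above):

// baz() moves up to the if (foo) level, where it is provably still never
// executed because foo is a constant zero.
const int foo = 0, bar = 0, fred = 0;
extern void baz();           // imagine: marked speculatable

void example() {
  if (foo)
    baz();                   // hoisted past the bar and fred tests
}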

Then we have an orthogonal concern, which is what precondition is sufficient to justify an optimization. For example, whether a function call can be executed speculatively is one such precondition. It can be derived by looking at the function attributes. For example, for speculative execution we probably need to know that the function doesn't write to memory and that it terminates.

It is not enough (for example division by zero).

Sure; I didn't mean to enumerate a complete list.

So I feel that this speculatable attribute is not the right answer. It should be a helper function that derives its result from a set of function attributes, but shouldn't be an attribute on its own.

Feel free to propose an alternative solution; right now there is no combination of attributes that is enough.

My point is that attributes should be about the function behavior: they should be about the *how*, and not what kind of transformations you can do with respective call sites.
In the same way, today we don't have attributes like "call can be removed if result unused", "call can be removed if multiple calls with same argument", "call can be sunk into a loop", etc.
Instead we have things like "doesn't write to memory", "only writes to memory that is not observable by the caller", etc.
From our set of attributes we can infer if a transformation is valid or not. This approach scales much better: it's one attribute per behavior we want to capture instead of one attribute per transformation we want to do, and the number of transformations is surely much higher.

The other point is that this approach separates concerns: we have an analysis pass that decorates functions with attributes about how they behave, and then we have transformations that consume this information in some way. To me it seems useful to have this separation: it makes the analysis part much easier to reason about and to implement correctly.

My biggest concern with speculatable is that there's no intuitive semantics for it. If people already have different opinions of what "readonly" means (and it was supposed to have trivial semantics), then something as complex as speculatable seems like a can of worms.
And a month from now people will want more and more speculatable attributes. For example, "can be speculated across stores", "can be speculated across stores and function calls", etc. Doesn't seem to scale.

The only reason I could see to have it as an attribute would be to carry cost information.

I'm not convinced that attributes should carry "cost" information; is there a precedent for this?

No, and I didn't propose we do that. But at some point for applications like ThinLTO and PGO it seems that an inter-proc cost/information framework will be needed. ThinLTO needs to summarize what functions do. Imagine extending ThinLTO summaries to include information like "simplifies a lot if 2nd argument is false", "returns a number in range [0, 4]", etc. So it feels that eventually we will need an additional set of information to be attached to functions that we can probably not cover with the attribute framework we have today.
Anyway, that's a separate discussion.

Then we have an orthogonal concern, which is what precondition is sufficient to justify an optimization. For example, whether a function call can be executed speculatively is one such precondition. It can be derived by looking at the function attributes. For example, for speculative execution we probably need to know that the function doesn't write to memory and that it terminates.

It is not enough (for example division by zero).

Sure; I didn't mean to enumerate a complete list.

Still: my point was you *can't* provide a complete list because we're not capturing everything with the existing attributes.

So I feel that this speculatable attribute is not the right answer. It should be a helper function that derives its result from a set of function attributes, but shouldn't be an attribute on its own.

Feel free to propose an alternative solution; right now there is no combination of attributes that is enough.

My point is that attributes should be about the function behavior: they should be about the *how*, and not what kind of transformations you can do with respective call sites.
In the same way, today we don't have attributes like "call can be removed if result unused", "call can be removed if multiple calls with same argument", "call can be sunk into a loop", etc.
Instead we have things like "doesn't write to memory", "only writes to memory that is not observable by the caller", etc.
From our set of attributes we can infer if a transformation is valid or not. This approach scales much better: it's one attribute per behavior we want to capture instead of one attribute per transformation we want to do, and the number of transformations is surely much higher.

I get that, but I'm still not sure what you're suggesting *in this case*.
Is it just the name that bothers you?
Looking at the description of the attribute, it fits your definition somehow: "This function attribute indicates that the function does not have undefined behavior for any possible combination of arguments or global memory state."

The other point is that this approach separates concerns: we have an analysis pass that decorates functions with attributes about how they behave, and then we have transformations that consume this information in some way. To me it seems useful to have this separation: it makes the analysis part much easier to reason about and to implement correctly.

I don't see how this is related to this attribute in any way.

My biggest concern with speculatable is that there's no intuitive semantics for it. If people already have different opinions of what "readonly" means (and it was supposed to have trivial semantics), then something as complex as speculatable seems like a can of worms.
And a month from now people will want more and more speculatable attributes. For example, "can be speculated across stores", "can be speculated across stores and function calls", etc. Doesn't seem to scale.

I don't have this impression. If you feel some parts of the definition need to be more carefully specified, that's fine. But the way you're putting it forward above seems like FUD to me.

The only reason I could see to have it as an attribute would be to carry cost information.

I'm not convinced that attributes should carry "cost" information; is there a precedent for this?

No, and I didn't propose we do that. But at some point for applications like ThinLTO and PGO it seems that an inter-proc cost/information framework will be needed. ThinLTO needs to summarize what functions do. Imagine extending ThinLTO summaries to include information like "simplifies a lot if 2nd argument is false", "returns a number in range [0, 4]", etc. So it feels that eventually we will need an additional set of information to be attached to functions that we can probably not cover with the attribute framework we have today.

The way this is structured in ThinLTO is using in-memory analyses that don't attach their results to the IR. This is then part of the summaries directly. These are just a serialization of the in-memory analyses; they don't need any IR construct like attributes or metadata.

Let me maybe zoom out and give a different perspective:
Right now call site and function attributes are an AND of predicates that are always guaranteed to hold for that specific call site or for all call sites, respectively.
Predicates include things like doesn't write to memory, only writes to memory that is not observable, etc. Using attributes we can state that several of these predicates hold.
In the ideal world, predicates wouldn't overlap, although since we can only state ANDs of predicates and not ORs, some overlap may be needed in practice.

It behaves as an OR right now? Last time I checked the implementation, a call site just scans its own attributes and then checks the set on the declaration.

arsenm commandeered this revision.Mar 27 2017, 9:21 AM
arsenm updated this revision to Diff 93142.
arsenm added a reviewer: tstellarAMD.

Disallow on call sites

sanjoy added a comment.EditedMar 27 2017, 3:56 PM

I think that all of this is right, you can't apply some of these optimizations to call sites with the speculatable attribute.

A lot of these (but not all of these) amount to "you cannot clone speculatable", i.e. if you clone the call, you must remove the attribute.

I believe the set of conditions under which you could clone the attribute are:

  1. The new call is CDEQ the original call (i.e. the set of conditions under which it executes is identical). If you are cloning from one function to another, it must be CDEQ using the interprocedural control dependence.
  2. The arguments are identical.
  3. The function called is identical or marked speculatable

Note this is not entirely shocking.

(Caveat: I think we're arguing some subtle points here, so apologies if I misunderstood your intent.)

I think (1) is somewhat shocking. I'd say normally we follow a weaker constraint: "The new call is executed only if the old call was executed", not "The new call is executed if and only [edit: previously this incorrectly said "only if and only if"] if the old call was executed".

For instance, if we unroll loops (say by a factor of 2):

for (i = 0; i < N; i++)
  f(i) readonly;      // X

it is reasonable that the result be:

for (i = 0; i < N / 2; i += 2) {
  f(i) readonly;       // A
  f(i + 1) readonly;   // B
}

if (i < N)
  f(i++) readonly;     // C

Assuming I correctly understood what you meant by "the set of conditions under which it executes is identical", we won't be able to keep any of the readonly attributes.

  • X is executed for all i < N.
  • A is executed for all even i if N > 1
  • B is executed for all odd i if N > 1
  • C is executed if N is 1.

Of course, in the program trace the instances in which f was executed stay the same in the pre-unroll and post-unroll program; and that's related to what I was trying to say earlier: with the speculatable attribute, the behavior of a program is no longer a property of its trace -- there could be a "hidden" speculatable call somewhere that influences the behavior of the program without actually being executed (in the initial program), by getting hoisted into the executable bits of the program. In order to mark these programs as "buggy", we will have to rule such "hidden" speculatable calls (that nevertheless have side effects) as tainting the program with UB, despite not being present in the program trace.

the "no cloning" is true of other attributes (you can't clone and apply readonly like is done in the devirt example above)

Why can't you apply readonly to the devirtualization example?

it's just that, being an attribute about control dependence, the effects relate to control dependence.

Yes -- IMO that's the key problem with speculatable. Since it says "there is no control dependence", we cannot apply it in a control dependent manner.

I agree, however, that we need to think carefully about how to define what speculatable means on an individual call site. Perhaps they're like convergent functions in this regard: you can't introduce new control dependencies (at least not in general).

Definitely true.

Either the CDEQ set of the call must not change, or you must be able to prove that changes cannot impact the function (i.e. you don't make it any less dead, etc.).

What exactly do you mean by "the CDEQ set"? I could not find anything easily on Google. [edit: by CDEQ did you mean the cdequiv (control dependence equivalent) set? If so I think I know what you mean.]

Btw, is "make it any less dead" == "speculatively execute it", or something more subtle?

const int foo = bar = fred = 0;
if (foo)
  if (bar)
    if (fred)
      call baz() speculatable

You can prove hoisting into if (foo) cannot make it any less dead.

In the ideal world, predicates wouldn't overlap, although since we can only state ANDs of predicates and not ORs, some overlap may be needed in practice.

It behaves as an OR right now? Last time I checked the implementation, a call site just scans its own attributes and then checks the set on the declaration.

That behavior precisely means it is an AND -- a call site marked as readonly nounwind is both readonly AND nounwind.

One concrete example illustrating Nuno's point is the dereferenceable_or_null attribute. If we had an is_null attribute (or emulated it via !range), then we would not need a new attribute to denote that a value is either dereferenceable or null.

In the ideal world, predicates wouldn't overlap, although since we can only state ANDs of predicates and not ORs, some overlap may be needed in practice.

It behaves as an OR right now? Last time I checked the implementation, a call site just scans its own attributes and then checks the set on the declaration.

That behavior precisely means it is an AND -- a call site marked as readonly nounwind is both readonly AND nounwind.

One concrete example illustrating Nuno's point is the dereferenceable_or_null attribute. If we had an is_null attribute (or emulated it via !range), then we would not need a new attribute to denote that a value is either dereferenceable or null.

OK, I was thinking about it like: Is this call readnone? It is if the call site OR the declaration is readnone.
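A rough C++ sketch of that reading of the query (not the actual LLVM implementation; the helper name is invented, and in current LLVM CallInst::hasFnAttr already consults the callee as well, so the second check here is only spelling out the logic):

#include "llvm/IR/Attributes.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"

// Sketch of the "OR" view: a call counts as readnone if either the call-site
// attributes or the callee declaration say so.
static bool callIsReadNone(const llvm::CallInst &CI) {
  if (CI.hasFnAttr(llvm::Attribute::ReadNone))
    return true;
  const llvm::Function *F = CI.getCalledFunction();
  return F && F->hasFnAttribute(llvm::Attribute::ReadNone);
}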

I think that all of this is right, you can't apply some of these optimizations to call sites with the speculatable attribute.

A lot of these (but not all of these) amount to "you cannot clone speculatable", i.e. if you clone the call, you must remove the attribute.

I believe the set of conditions under which you could clone the attribute are:

  1. The new call is CDEQ the original call (i.e. the set of conditions under which it executes is identical). If you are cloning from one function to another, it must be CDEQ using the interprocedural control dependence.
  2. The arguments are identical.
  3. The function called is identical or marked speculatable

Note this is not entirely shocking.

(Caveat: I think we're arguing some subtle points here, so apologies if I misunderstood your intent.)

I think (1) is somewhat shocking. I'd say normally we follow a weaker constraint: "The new call is executed only if the old call was executed", not "The new call is executed if and only [edit: previously this incorrectly said "only if and only if"] if the old call was executed".

For instance, if we unroll loops (say by a factor of 2):

for (i = 0; i < N; i++)
  f(i) readonly;      // X

it is reasonable that the result be:

for (i = 0; i < N / 2; i += 2) {
  f(i) readonly;       // A
  f(i + 1) readonly;   // B
}

I'm presuming you meant this loop to run half as many iterations but still cover the same values in the same order in calls to f?
(It doesn't.)

If so, I agree we allow it in practice, but that's just in practice.
Here is an implementation of f that is, IMHO, valid for a readonly call-site attribute; it's not legally doable in C, but we're talking about LLVM IR anyway.

void f(int a)
{
  // go scrobbling through instruction stream to see what the increment is
  if (increment == 2)
    write some memory
  otherwise
    do nothing
}

Now, obviously, I'm cheating, but the point is, f can read whatever state it wants here, detect what you've done, and not be readonly in that case.
Again, if I'm understanding things right (the langref really is somewhat confusing to me here), I assume it's legal for f to not be readonly except for that call. That's what it seems like we are talking about here.

Given they can also unwind and read the state of other functions, I'm pretty positive you can come up with easier implementations than what I just did that detect and write memory differently in the two cases, without resorting to this trickery.

If you meant to change the loop iteration space, I disagree even harder that it's allowed :)

Assuming I correctly understood what you meant by "the set of conditions under which it executes is identical", we won't be able to keep any of the readonly attributes.

  • X is executed for all i < N.
  • A is executed for all even i if N > 1
  • B is executed for all odd i if N > 1
  • C is executed if N is 1.

Of course, in the program trace the instances in which f was executed stay the same in the pre-unroll and post-unroll program; and that's related to what I was trying to say earlier: with the speculatable attribute, the behavior of a program is no longer a property of its trace -- there could be a "hidden" speculatable call somewhere that influences the behavior of the program without actually being executed (in the initial program), by getting hoisted into the executable bits of the program. In order to mark these programs as "buggy", we will have to rule such "hidden" speculatable calls (that nevertheless have side effects) as tainting the program with UB, despite not being present in the program trace.

Again, this is literally what it means to play with control dependence, so I *don't* find it shocking.

the "no cloning" is true of other attributes (you can't clone and apply readonly like is done in the devirt example above)

Why can't you apply readonly to the devirtualization example?

Sorry, I meant to delete it. You can because it's dead code; otherwise, you can't.

it's just that, being an attribute about control dependence, the effects relate to control dependence.

Yes -- IMO that's the key problem with speculatable. Since it says "there is no control dependence", we cannot apply it in a control dependent manner.

Sure, I'd agree with that, but you really cannot work around that.

I agree, however, that we need to think carefully about how to define what speculatable means on an individual call site. Perhaps they're like convergent functions in this regard: you can't introduce new control dependencies (at least not in general).

Definitely true.

Either the CDEQ set of the call must not change, or you must be able to prove that changes cannot impact the function (i.e., you don't make it any less dead, etc.).

What exactly do you mean by "the CDEQ set"? I could not find anything easily on Google. [edit: by CDEQ did you mean the cdequiv (control dependence equivalent) set? If so I think I know what you mean.]

Yes, cdequiv, sorry. It's called CDEQ by some papers, and cdequiv by others.

It's the equivalence classes of the control dependence graph, basically. The set of blocks/statements/etc (depends on what level you look) that execute only under the same conditions.
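As a minimal LLVM IR sketch of that notion (the callees @A, @B, and @C are hypothetical):

define void @example(i1 %c) {
entry:
  br i1 %c, label %then, label %join

then:
  call void @A()        ; @A and @B are CDEQ: each executes exactly when %c is true
  call void @B()
  br label %join

join:
  call void @C()        ; @C is not CDEQ with @A/@B: it executes unconditionally
  ret void
}

declare void @A()
declare void @B()
declare void @C()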

Btw, is "make it any less dead" == "speculatively execute it", or something more subtle?

yes, that's what we'd call it. but it's not executed in any case, so i guess it'd be "speculatively hoist it".
:)

const int foo = 0, bar = 0, fred = 0;
if (foo)
  if (bar)
    if (fred)
      call baz() speculatable

You can prove hoisting into if (foo) cannot make it any less dead.

I think that all of this is right, you can't apply some of these optimizations to call sites with the speculatable attribute.

A lot of these (but not all of these) amount to "you cannot clone speculatable", ie if you clone the call, you must remove the attribute.

I believe the set of conditions under which you could clone the attribute are:

  1. The new call is CDEQ with the original call (i.e., the set of conditions under which it executes is identical). If you are cloning from one function to another, it must be CDEQ using the interprocedural control dependence.
  2. The arguments are identical.
  3. The function called is identical or marked speculatable

Note this is not entirely shocking.

(Caveat: I think we're arguing some subtle points here, so apologies if I misunderstood your intent.)

I think (1) is somewhat shocking. I'd say normally we follow a weaker constraint: "The new call is executed only if the old call was executed", not "The new call is executed if and only [edit: previously this incorrectly said "only if and only if"] if the old call was executed".

For instance, if we unroll loops (say by a factor of 2):

for (i = 0; i < N; i++)
  f(i) readonly;      // X

it is reasonable that the result be:

for (i = 0; i < N / 2; i += 2) {
  f(i) readonly;       // A
  f(i + 1) readonly;   // B
}

I'm presuming you meant this loop to go half as much but still cover the same values in the same order in calls to f?
(it doesn't)

D'oh. I did a split-brain there. :)

If so, I agree we allow it in practice, but that's in practice.
Here is an implementation of f that is, IMHO, valid under a callsite readonly attribute; it's not legally doable in C, but we're talking about LLVM IR anyway.

void f(int a)
{
  // go scrobbling through instruction stream to see what the increment is
  if (increment == 2)
    write some memory
  otherwise
    do nothing
}

Now, obviously, i'm cheating, but the point is, f can read whatever state it wants here, and detect what you've done, and not be readonly in that case.

I don't think we can reasonably allow functions to base their behavior on the instruction stream of their callers (or anywhere else, for that matter), in IR or in C.

For instance, if we allowed things like that, even basic optimizations like this:

void caller() {
  int a = 2, b = 3;
  f(a + b);
}

to

void caller() {
  f(5);
}

would not be valid, since an implementation of f could be:

void f(int x) {
  cond = does the caller's instruction stream have an add instruction?
  if (cond)
    print("hi");
}

and we'd change observable behavior by optimizing out the add instruction.

Again, if I'm understanding things right (the langref really is somewhat confusing to me here), I assume it's legal for f to not be readonly except at that call site; that's what it seems like we are talking about here.

I think code like this is fine:

void f(bool do_store, int* ptr) {
  if (do_store)
    *ptr = 42;
}

...
f(false, ptr) readnone
...

The example you gave above seems odd to me because the callee has different behavior based on the instruction stream of the caller, but the conditionally-readonly aspect of it is fine.

Given they can also unwind, and read state of other functions, i'm pretty positive you can come up with implementations easier than what i just did to detect and write memory in the two cases differently, without resorting to this trickery.

If there was a legal way to detect a difference between the two cases above, then loop unrolling would be illegal. So even if there is some way to detect the difference, I'd consider that a bug in the LLVM IR semantics since it disallows an important optimization.

If you meant to make the loop iteration space change, i disagree with you that it's allowed even harder :)

Yes, we can't transform the iteration space, since if, instead of executing f(0) first, we execute f(N - 1) first, we'd break an f that was implemented as:

void f(int i) {
  // I don't think this is possible to do without writing memory in C++,
  // but it may be possible in other languages.
  throw i;
}

in which case the pre-transformed program would throw 0, but the post-transform program would throw N - 1.

Assuming I correctly understood what you meant by "the set of conditions under which it executes is identical", we won't be able to keep any of the readonly attributes.

  • X is executed for all i < N.
  • A is executed for all even i if N > 1
  • B is executed for all odd i if N > 1
  • C (a remainder call, not shown in the snippet above) is executed if N is 1.

Of course, in the program trace the instances in which f was executed stay the same in the pre-unroll and post-unroll program; and that's related to what I was trying to say earlier: with the speculatable attribute, the behavior of a program is no longer a property of its trace -- there could be a "hidden" speculatable call somewhere that influences the behavior of the program without actually being executed (in the initial program), by getting hoisted into the executable bits of the program. In order to mark these programs as "buggy", we will have to rule such "hidden" speculatable calls (that nevertheless have side effects) as tainting the program with UB, despite not being present in the program trace.

Again, this is literally what it means to play with control dependence, so i *don't* find it shocking.

I'm not entirely sure what you mean here -- did you mean that it is okay to have stuff outside the program trace affect program definedness? Or did you mean the converse?

the "no cloning" is true of other attributes (you can't clone and apply readonly like is done in the devirt example above)

Why can't you apply readonly to the devirtualization example?

Sorry, I meant to delete it. You can because it's dead code; otherwise, you can't.

it's just that, being an attribute about control dependence, the effects relate to control dependence.

Yes -- IMO that's the key problem with speculatable. Since it says "there is no control dependence", we cannot apply it in a control dependent manner.

Sure, i'd agree with that, but you really cannot work around that.

Yes!

I think that all of this is right, you can't apply some of these optimizations to call sites with the speculatable attribute.

A lot of these (but not all of these) amount to "you cannot clone speculatable", ie if you clone the call, you must remove the attribute.

I believe the set of conditions under which you could clone the attribute are:

  1. The new call is CDEQ with the original call (i.e., the set of conditions under which it executes is identical). If you are cloning from one function to another, it must be CDEQ using the interprocedural control dependence.
  2. The arguments are identical.
  3. The function called is identical or marked speculatable

Note this is not entirely shocking.

(Caveat: I think we're arguing some subtle points here, so apologies if I misunderstood your intent.)

I think (1) is somewhat shocking. I'd say normally we follow a weaker constraint: "The new call is executed only if the old call was executed", not "The new call is executed if and only [edit: previously this incorrectly said "only if and only if"] if the old call was executed".

For instance, if we unroll loops (say by a factor of 2):

for (i = 0; i < N; i++)
  f(i) readonly;      // X

it is reasonable that the result be:

for (i = 0; i < N / 2; i += 2) {
  f(i) readonly;       // A
  f(i + 1) readonly;   // B
}

I'm presuming you meant this loop to go half as much but still cover the same values in the same order in calls to f?
(it doesn't)

D'oh. I did a split-brain there. :)

If so, I agree we allow it in practice, but that's in practice.
Here is an implementation of f that is, IMHO, valid under a callsite readonly attribute; it's not legally doable in C, but we're talking about LLVM IR anyway.

void f(int a)
{
  // go scrobbling through instruction stream to see what the increment is
  if (increment == 2)
    write some memory
  otherwise
    do nothing
}

Now, obviously, i'm cheating, but the point is, f can read whatever state it wants here, and detect what you've done, and not be readonly in that case.

I don't think we can reasonably allow functions to base their behavior on the instruction stream of their callers (or anywhere else, for that matter), in IR or in C.

As mentioned, i don't think that's strictly necessary to screw up what you did. But i'm willing to admit those are likely semantic bugs.
:)
In any case, I was just half-pointing out that I don't think the semantics of the other attributes are so cut and dried in their control dependence, etc., when applied to callsites, that speculatable on callsites is that bad.

For instance, if we allowed things like that, even basic optimizations like this:

void caller() {
  int a = 2, b = 3;
  f(a + b);
}

to

void caller() {
  f(5);
}

would not be valid, since an implementation of f could be:

void f(int x) {
  cond = does the caller's instruction stream have an add instruction?
  if (cond)
    print("hi");
}

and we'd change observable behavior by optimizing out the add instruction.

Right, my point was more on the side of "we are pretending we've got the semantics of these other instructions down really well and they are easy to understand", when, honestly, there is nothing in the langref that outlaws what i wrote :)

The closest it comes is "when called with the same set of arguments and global state"
But you aren't calling it with the same global state depending on the definition of global state, which .. we don't define anywhere.

The example you gave above seems odd to me because the callee has different behavior based on the instruction stream of the caller, but the conditionally-readonly aspect of it is fine.

As mentioned, I'm pretty positive I could construct an example just as bad given the limitations expressed in the langref :)

Given they can also unwind, and read state of other functions, i'm pretty positive you can come up with implementations easier than what i just did to detect and write memory in the two cases differently, without resorting to this trickery.

If there was a legal way to detect a difference between the two cases above, then loop unrolling would be illegal.

FWIW, I'm honestly too lazy to go construct one, but I'm basically positive you can. At least, legal given the semantics and restrictions on LLVM IR as it exists today, because LLVM IR is not *that* well defined.
I would not believe it well-defined in C, but only because unwinders, etc are usually not well defined.
This is basically an exercise in the moral equivalent of Gödel numbering.
I'm going to go with "yes"
Is it possible to define our IR well enough to outlaw that: Maybe?
I'm not going to tug at this thread any longer though :P

So even if there is some way to detect the difference, I'd consider that a bug in the LLVM IR semantics since it disallows an important optimization.

I guess my basic point was exactly this - the semantics of our other attributes, compared to speculatable, are not so easy or non-shocking that I think artificially limiting speculatable to non-callsites only makes sense. Particularly because those attributes may not be so well defined that their behavior makes complete and total sense when applied to callsites.
Instead, when we discover they are wrong we fix them. So why not just mark speculatable on callsites as experimental, see what happens, and go from there?

Bottom line: IMHO, We are unlikely to ever figure out the specific set of issues we will hit on callsites with speculatable if we don't allow it there. Yeah, we'll figure out if it's completely broken if we don't, but we all seem to agree that it's probably not?

I'm going to drop the rest of this, fwiw, because i don't think it's worth pushing the readonly example any further.
I'm positive i can come up with a program that meets whatever requirements you throw at it and still breaks in your example, precisely because we don't define our IR well enough to prevent it (I'm on vacation, so i'm not going to think about whether i could prove that you can't make the IR well defined enough to prevent it and still have it express useful programs :P)

Right, my point was more on the side of "we are pretending we've got the semantics of these other instructions down really well and they are easy to understand", when, honestly, there is nothing in the langref that outlaws what i wrote :)

The closest it comes is "when called with the same set of arguments and global state"
But you aren't calling it with the same global state depending on the definition of global state, which .. we don't define anywhere.

Of course. I'm not claiming that the LangRef is a mathematically precise document (though parts of it could be, if we incorporated Vellvm). However, the converse of "it is mathematically precise" is not "anything goes". :)

In other words, I think despite the semantics of LLVM IR being only informally specified, we can still reasonably draw *some* boundaries on what the semantics of various constructs *can* be, based on the optimizations we want to be correct.

So even if there is some way to detect the difference, I'd consider that a bug in the LLVM IR semantics since it disallows an important optimization.

I guess my basic point was exactly this - the semantics of our other attributes, compared to speculatable, are not so easy or non-shocking that i think artificially limiting speculatable to non-callsites only makes sense.

I'm not sure if I agree with the "artificially" characterization. My objection was based around (what I think are) concrete problems that we'll have if we allow this.

Particularly because those attributes may not be so well defined that their behavior makes complete and total sense when applied to callsites.

I agree we have room to improve today. However, I don't see how two wrongs make a right here.

Instead, when we discover they are wrong we fix them. So why not just mark speculatable on callsites as experimental, see what happens, and go from there?
Bottom line: IMHO, We are unlikely to ever figure out the specific set of issues we will hit on callsites with speculatable if we don't allow it there. Yeah, we'll figure out if it's completely broken if we don't, but we all seem to agree that it's probably not?

As I said, my objection was based on my opinion that they're already discovered to be wrong. Of course, my (counter-)examples may not stand up to scrutiny, in which case my objection is moot.

Having said all this: while I won't exactly be happy with it, I would be fine with allowing speculatable on normal call sites if that helps some really compelling use case. But I want to make it clear that we're making a tradeoff here.

I'm going to drop the rest of this, fwiw, because i don't think it's worth pushing the readonly example any further.
I'm positive i can come up with a program that meets whatever requirements you throw at it and still breaks in your example, precisely because we don't define our IR well enough to prevent it (I'm on vacation, so i'm not going to think about whether i could prove that you can't make the IR well defined enough to prevent it and still have it express useful programs :P)

So even if there is some way to detect the difference, I'd consider that a bug in the LLVM IR semantics since it disallows an important optimization.

I guess my basic point was exactly this - the semantics of our other attributes, compared to speculatable, are not so easy or non-shocking that i think artificially limiting speculatable to non-callsites only makes sense.

I'm not sure if I agree with the "artificially" characterization. My objection was based around (what I think are) concrete problems that we'll have if we allow this.

Sorry, let me try to explain:
You have identified concrete problems.
IMHO, your concrete problems are not with the definition of the attribute, but instead are "things wanting to optimize callsites and keep this attribute have to understand control dependence".
But this is pretty much a truism to me given *any* sane definition of the attribute on callsites.
Additionally, IMHO, there can be *no* sensible way to define this attribute such that the problems go away.
So from my perspective, the concrete problems you've identified are going to get solved in exactly the same way whether we add them for callsites today, tomorrow, or five years from now.
Hence I see the limitation as fairly artificial. We aren't going to solve these problems other than by auditing all the callsite optimizations and either making them drop this attribute, or deeply understand control dependence enough to do a correct thing.

As I said, my objection was based on my opinion that they're already discovered to be wrong.

I guess this is where we disagree.
I do not believe you have pointed out that there is something wrong with the semantic, at least in a way that you could solve.
I believe you have pointed out it will interact badly if we don't audit our code.

I'd be fine if our answer was "hey, start a patch with all the fixes necessary to make this work properly on callsites".
But otherwise, what's the path forward?
IE what do you see as a definition of these attributes on callsites that is sane but doesn't have the issues you foresee?

So from my perspective, the concrete problems you've identified are going to get solved in exactly the same way whether we add them for callsites today, tomorrow, or five years from now.

I agree.

Hence I see the limitation as fairly artificial. We aren't going to solve these problems other than by auditing all the callsite optimizations and either making them drop this attribute, or deeply understand control dependence enough to do a correct thing.

I guess my English isn't strong enough to quite grok that use of "artificial". :) I partially agree with "We aren't going to solve these problems other ... enough to do a correct thing.", but please see below.

As I said, my objection was based on my opinion that they're already discovered to be wrong.

I guess this is where we disagree.
I do not believe you have pointed out that there is something wrong with the semantic, at least in a way that you could solve.
I believe you have pointed out it will interact badly if we don't audit our code.

What I was *trying* to show was that the context-sensitive-speculatable semantic introduces a fundamentally new behavior -- that dead code can (now) affect the semantics of a program. I don't think we have this in LLVM today (today the behavior of an LLVM program can be deduced solely from the *trace* of the instructions that were actually executed), and the impact of changing LLVM to allow dead code to have this "action at a distance" is unknown to me.

That ^ is really the core of my objection. Almost everything else I said can be traced back to the above.

And making dead code affect behavior sets off all sorts of alarms in my head. I've mentioned some of the more concrete problems earlier, but the fundamental thing that is bugging me is that if (false) { X } will not be a NOP for some values of X.

I guess what you're saying is that there is nothing fundamental about the dead-code-influencing-behavior problem, and that it is just a matter of fixing passes to do the right thing?

If so, I do not have a better answer to that than a vague sense of unease. :)

I'd be fine if our answer was "hey, start a patch with all the fixes necessary to make this work properly on callsites".
But otherwise, what's the path forward?
IE what do you see as a definition of these attributes on callsites that is sane but doesn't have the issues you foresee?

If I was able to sell you on semantically-relevant dead code being a bad thing, then there is no path forward with a call-site-specific speculatable that is as strong as is implied in this patch. We can probably do something weaker though (say, allow it only on calls with all-constant arguments).

If you're okay with semantically-relevant-dead-code and its consequences, then it is a matter of fixing all of the passes that assume arbitrary dead code is "okay".

I'm leaning towards the former, but I'd understand if people wanted to do the latter.
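For illustration, a rough LLVM IR sketch of the weaker, constant-arguments-only variant mentioned above (this is only a hypothetical rule; @baz and the callsite marking are made up):

declare i32 @baz(i32)

define void @caller(i32 %n, i1 %c) {
entry:
  br i1 %c, label %then, label %exit

then:
  %a = call i32 @baz(i32 42) speculatable  ; all-constant arguments: the weaker rule would allow this
  %b = call i32 @baz(i32 %n)               ; non-constant argument: no callsite attribute
  br label %exit

exit:
  ret void
}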

FWIW, I don't see much of a practical need for speculatable to apply to individual call sites. There are a few cases where I think it might be useful on specific intrinsic call sites, but they aren't really interesting enough to worry about.

arsenm updated this revision to Diff 97117.Apr 28 2017, 10:13 AM

Only allow for intrinsics

Only allow for intrinsics

Why only for intrinsics? I thought we had concluded that we'd only allow it for declarations and not on call sites (which may technically mean on call sites, but only matching the declaration). I think it is important that we can apply it to regular functions.

Only allow for intrinsics

Why only for intrinsics? I thought we had concluded that we'd only allow it for declarations and not on call sites (which may technically mean on call sites, but only matching the declaration). I think it is important that we can apply it to regular functions.

I think so too, but @sanjoy said to restrict it to intrinsics for now. Intrinsics are the important part. I also ran into one minor issue with the call site restriction in D32655 where speculatable intrinsic calls are sometimes replaced with non-speculatable libcalls.

Why only for intrinsics? I thought we had concluded that we'd only allow it for declarations and not on call sites (which may technically mean on call sites, but only matching the declaration). I think it is important that we can apply it to regular functions.

I think a general speculatable attribute that is allowed only on function decls is *less problematic*[0] than a context-sensitive one, but I think speculatable intrinsics are clearly okay. Therefore my opinion (which I expressed on IRC to Matt) is to first land the intrinsic variant of this, since that's what he's blocked on; and then we can go ahead with more aggressive variants in subsequent patches.

[0] https://reviews.llvm.org/D20116#709352

Only allow for intrinsics

Why only for intrinsics? I thought we had concluded that we'd only allow it for declarations and not on call sites (which may technically mean on call sites, but only matching the declaration). I think it is important that we can apply it to regular functions.

I think so too, but @sanjoy said to restrict it to intrinsics for now.

He suggested that, and I said that I did not want that restriction, and he said that he was fine with that:

I'm fine with restricting speculatable to only appear where it appears on a function declaration/definition unless/until we can figure out semantics for it on a call site in general. I don't want it restricted to intrinsics specifically, but I don't think that's the problem.

Only function-level speculatable (and no call-site-specific speculatable) seems less problematic. It would mean that having a function declaration or definition incorrectly marked as speculatable, even if it is never called, is UB; but I can live with that as long as it is properly documented.
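For reference, a sketch of the declaration-only form being discussed (the function name is hypothetical; only the placement of the attribute matters):

; Marking the declaration asserts the callee is safe to execute speculatively on
; any arguments at any program point; if that is false, the module is broken (UB)
; even if no call to it is ever executed.
declare float @pure_table_lookup(i32) #0

attributes #0 = { nounwind readnone speculatable }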

Intrinsics are the important part.

Not to me ;)

I also ran into one minor issue with the call site restriction in D32655 where speculatable intrinsic calls are sometimes replaced with non-speculatable libcalls.

This seems like it is an unfortunate information loss that we should fix, but why is that a problem?

Why only for intrinsics? I thought we had concluded that we'd only allow it for declarations and not on call sites (which may technically mean on call sites, but only matching the declaration). I think it is important that we can apply it to regular functions.

I think a general speculatable attribute that is allowed only on function decls is *less problematic*[0] than a context-sensitive one, but I think speculatable intrinsics are clearly okay. Therefore my opinion (which I expressed on IRC to Matt) is to first land the intrinsic variant of this, since that's what he's blocked on; and then we can go ahead with more aggressive variants in subsequent patches.

[0] https://reviews.llvm.org/D20116#709352

Okay, unfortunately, this is only useful to me if we allow it on function declarations, and I don't see how kicking this can down the road helps in this regard. If he adds this for intrinsics and I immediately turn around and propose a patch to remove the restriction, that's a waste of everyone's time. I thought that we had agreed that allowing it on function declarations was okay so long as we documented the fact that this introduces potential UB just by declaring such a function, so let's do that.

Okay, unfortunately, this is only useful to me if we allow it on function declarations

I had somehow missed this bit ^ and I was under the impression that the main motivation for a general attribute was more completeness than anything else.

I thought that we had agreed that allowing it on function declarations was okay so long as we documented the fact that this introduces potential UB just by declaring such a function, so let's do that.

I had not phrased my concession clearly. :)

Just to be clear, I don't think they're okay, but I can live with them in the spirit of being pragmatic.

So yes, if this attribute will be useless to you without the generalization to non-intrinsics, then I won't object to checking in the previous version of this patch.

Okay, unfortunately, this is only useful to me if we allow it on function declarations

I had somehow missed this bit ^ and I was under the impression that the main motivation for a general attribute was more completeness than anything else.

No problem.

I thought that we had agreed that allowing it on function declarations was okay so long as we documented the fact that this introduces potential UB just by declaring such a function, so let's do that.

I had not phrased my concession clearly. :)

Just to be clear, I don't think they're okay, but I can live with them in the spirit of being pragmatic.

I understand. In a theoretical sense, I see adding them on function declarations as the same as adding them to intrinsics. Obviously there are practical differences; however, I'm not sure that in practice you're more likely to introduce a call to an arbitrary function, just because it happens to have been declared as speculatable, than you are to an intrinsic, just because it happens to be similarly available. I can't imagine any general transformation doing anything for each declared function just on the basis of it being declared. You'd need to be operating in a very restricted environment for that to make sense, and in such an environment, you should reasonably have the power not to mark functions as speculatable in a problematic way.

So yes, if this attribute will be useless to you without the generalization to non-intrinsics, then I won't object to checking in the previous version of this patch.

Thanks!

arsenm closed this revision.Apr 28 2017, 1:38 PM

r301680