Page MenuHomePhabricator

[LangRef] Document forward-progress requirement
AbandonedPublic

Authored by nikic on Aug 4 2019, 9:05 AM.

Details

Summary

Followup on a recent discussion on D59978: It seems that the current consensus is that LLVM has a forward-progress requirement and also intends to keep it in the future. llvm.sideeffect is not a temporary workaround, but the final solution. While this is pretty disappointing for non-C++ frontends, let's make sure this is at least documented in LangRef.

I haven't found a great place to put this, so I created a separate section

Diff Detail

Event Timeline

nikic created this revision.Aug 4 2019, 9:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 4 2019, 9:05 AM

I was under impression that the exact opposite was the status,
even if it wasn't consistently followed everywhere.

nikic added a comment.Aug 4 2019, 10:26 AM

I was under impression that the exact opposite was the status,
even if it wasn't consistently followed everywhere.

That was also my own impression as well, but @jdoerfert and @Meinersbur argued that this is the the current consensus. I would be very happy if this isn't the case. We should clarify this one way or another in LangRef though, to avoid future confusion on this point.

For what it's worth, actually using llvm.sideeffect for this purpose results in pretty catastrophic regressions right now. I think things wouldn't be quite as bad if it was only needed to control loops, but the need to also account for recursion means that llvm.sideeffect ends up literally everywhere. If this is indeed the way forward, we should probably add a sideeffect coalescing pass and see if that makes this at least somewhat viable.

I was under impression that the exact opposite was the status,
even if it wasn't consistently followed everywhere.

That was also my own impression as well, but @jdoerfert and @Meinersbur argued that this is the the current consensus. I would be very happy if this isn't the case. We should clarify this one way or another in LangRef though, to avoid future confusion on this point.

My argument comes from the current behavior:
We delete non-side-effect loops and recursion.

For what it's worth, actually using llvm.sideeffect for this purpose results in pretty catastrophic regressions right now. I think things wouldn't be quite as bad if it was only needed to control loops, but the need to also account for recursion means that llvm.sideeffect ends up literally everywhere. If this is indeed the way forward, we should probably add a sideeffect coalescing pass and see if that makes this at least somewhat viable.

We could probably improve the handling of llvm.sideeffect in certain situations. Especially once we propagate information about accessed (inaccessible) memory locations (D60077 needs rebase though), we should see better results.
I hope we will then have interactions only with other "inaccessible" intrinsics which should not be too bad.

rkruppe added a subscriber: rnk.Aug 4 2019, 11:41 AM

Clearly the current semantics of LLVM IR have a forward progress guarantee and I agree that should be documented.

The question of how things should be is harder, and I don't know if there really is any community consensus about it. But while the topic is alive again, I want to take the opportunity to point out that we have other options than just not doing the optimizations or baking forward progress into LLVM IR. The [[ https://lists.llvm.org/pipermail/llvm-dev/2017-October/118558.html | more invasive proposal discussed alongside llvm.sideeffect ]] was:

  • define LLVM IR to not have a forward progress guarantee as baseline, but add a function attribute to add it back in
  • let frontends for languages with forward progress guarantee use this attribute to opt into the optimizations in question
  • use a combination of that attribute and llvm.sideeffect where more fine-grained control is needed (such as C, where forward progress is the default but some loops are exempt)

Other than side-stepping most concerns about the optimization impact of the intrinsic, this also avoids overly privileging C's particulars in the IR to the (potential) demerit of every other language. As @rnk said in the comment that originally proposed this:

The forward progress required by C++ is the odd duck here. We should ask frontends to put a function attribute on functions that they want to receive this kind of optimization.

uenoku added a subscriber: uenoku.Aug 4 2019, 5:51 PM
reames accepted this revision.Aug 5 2019, 3:07 PM

LGTM w/suggested minor changes.

You might want to revise the infinite loop example used in the phi instruction section to add a call to llvm.side_effect for clarity and seachability.

llvm/docs/LangRef.rst
2501

I'd suggest a tweak in wording here. Rather than explicitly advising llvm.side_effect, I'd suggest phrasing this as requiring the insertion of some synthetic side effect, and then mention llvm.side_effect existing specifically for this purpose.

The hair I'm splitting is that if you already need another synthetic side effect - say a GC or deoptimization safepoint opportunity - then you don't actually need the llvm.side_effect intrinsic *too*.

This revision is now accepted and ready to land.Aug 5 2019, 3:07 PM
Meinersbur accepted this revision.Aug 5 2019, 3:27 PM
Meinersbur added inline comments.
llvm/docs/LangRef.rst
2496

[typo] accesses to or accessing ...

"inaccessable memory" is a strange example, it results in an access violation and most often termination of the program, but I guess it can be called a side-effect/progress. However, the compiler may optimize away unused accesses to non-volatile memory such that the side-effect is not guaranteed to happen. Maybe include system calls instead?

jdoerfert added inline comments.Aug 5 2019, 3:42 PM
llvm/docs/LangRef.rst
2496

"inaccessable memory" is not "invalid memory" but just memory not accessible through direct means from the module. (llvm.side_effect is modeled as an access to inaccessible memory only)

nikic updated this revision to Diff 213895.Aug 7 2019, 8:16 AM
nikic marked an inline comment as done.

Clarify that any synthetic side-effect is sufficient. Add llvm.sideeffect to infinite loop in phi example.

jfb added inline comments.Aug 7 2019, 9:29 AM
llvm/docs/LangRef.rst
2496

It would be better to clarify what "inaccessible memory" means here, or refer to a definition.

2497

A relaxed atomic load is a sufficient side-effect?

jfb added a subscriber: __simt__.Aug 7 2019, 9:29 AM
efriedma added inline comments.Aug 7 2019, 10:49 AM
llvm/docs/LangRef.rst
2497

We want to consider a busy-wait loop like "while (foo){}", where "foo" is an atomic variable, as making forward progress; otherwise, there isn't any way to write such a loop. We also want to allow erasing dead atomic loads.

We might need to modify the definition of a side-effect to make this work.

jfb added inline comments.Aug 7 2019, 1:05 PM
llvm/docs/LangRef.rst
2497

Right, I'm trying to understand what guarantees LLVM offers, and if they're the same or more than C++.
Where it makes sense I'd like this documented, with references.

http://eel.is/c++draft/intro.progress
http://eel.is/c++draft/intro.multithread#intro.races-3

Somewhat related:
I was hoping that the following deduction will be valid:

If all instructions in the loop do not synchronize (see `nosync`) then the loop cannot be infinite.

The reason is that any "progress" need to be observable and for that it needs to be "synchronized".
Now, the nosync definition might not be up to this (I hope it is though), but that is a different issue.
Having the above reasoning will (I think) allow us to deduce willreturn in the presence of various "common" loops.

llvm/docs/LangRef.rst
10080

Add a link to the forward progress requirement section please.

reames added a comment.Aug 9 2019, 8:34 PM

Just a note of caution: I see discussion starting on cornercases. I would strongly suggest *not* blocking this patch on getting a fully consistent definition. Having something documented which is "mostly right" is better than no documentation at all. (Adding a explicit marker of subtleties under discussion is good.) It's really tempting as reviewers to keep looking for the subtle problems, but remember, the confusion which keeps arising is over the basic need for a side effect, not the exact definition thereof!

jfb added a comment.EditedAug 9 2019, 8:56 PM

Just a note of caution: I see discussion starting on cornercases. I would strongly suggest *not* blocking this patch on getting a fully consistent definition. Having something documented which is "mostly right" is better than no documentation at all. (Adding a explicit marker of subtleties under discussion is good.) It's really tempting as reviewers to keep looking for the subtle problems, but remember, the confusion which keeps arising is over the basic need for a side effect, not the exact definition thereof!

I have to disagree strongly. C++ has a forward progress definition and it sounds like we’re codifying something derived from it for llvm IR.

Now say I have code with:

bool ready;
// ...
while (!ready)
  __atomic_fence(RELAXED); // However my languages does this.

Does this need a side effect? Per the current definition, no. Per C++, yes.

So codifying something “mostly ok” is a terrible idea because any further change will affect semantics we’re promised people.

Right now we don’t promise anything. It’s bad! Let’s fix it. But let’s fix it for real, not a band-aid.

Does this need a side effect? Per the current definition, no. Per C++, yes.

What does "need a side-effect" mean here? Do you mean whether llvm.sideffect has to be added to avoid UB?
Per my reading of C++, an atomic fence is considered a side-effect for the purpose of forward progress (looks like an atomic operation to me, even if it is relaxed), so this loop is not UB and may not be optimized away.

use a combination of that attribute and llvm.sideeffect where more fine-grained control is needed (such as C, where forward progress is the default but some loops are exempt)

AFAIK, in C forward progress is the default for loops (except, as you said, if their condition is a constant). But I cannot see anything in the C standard that allows a forward progress assumption for recursion or goto or any other form of infinite execution (if they exist).

The question of how things should be is harder

Is it worth saying in the docs that this is subject to discussion, or so?
Basically I hope we can use this patch as the start of a discussion, not as a way of cementing the current semantics (which are a bad fit for every language except for C++).

RalfJung added inline comments.Aug 12 2019, 2:15 AM
llvm/docs/LangRef.rst
2497

A loop while (/* relaxed load */ == 0) {} followed by an acquire-fence should definitely not be optimized away. That is a perfectly reasonable pattern to establish synchronization though fences and relaxed accesses.

The C++ standard specifically says "atomic operations" in loops are good enough, and while one can argue about whether a relaxed load synchronizes (it sometimes does, as in the above), I think we can agree that it is an atomic operation?

Somewhat related:
I was hoping that the following deduction will be valid:

If all instructions in the loop do not synchronize (see `nosync`) then the loop cannot be infinite.

If we say relaxed atomics do sync (as opposed to nosync), this should work with the concerns raised (and with recursion).
Which would allow us to say, if it does not sync it must be finite if forward progress is enforced^.

^ So this is orthogonal to the discussion if we should have an attribute enforcing forward progress (which I think makes sense).

jfb added a comment.Aug 12 2019, 10:04 AM

Does this need a side effect? Per the current definition, no. Per C++, yes.

What does "need a side-effect" mean here? Do you mean whether llvm.sideffect has to be added to avoid UB?

I mean: if my language wants to generate a loop with a fence in it, and a non-atomic load, how do I get LLVM to generate that code without eliminating the code? Do I need llvm.sideeffect?

Per my reading of C++, an atomic fence is considered a side-effect for the purpose of forward progress (looks like an atomic operation to me, even if it is relaxed), so this loop is not UB and may not be optimized away.

A fence is not an atomic operation on its own in C++. See http://eel.is/c++draft/atomics.fences where it says "is a synchronization operation *if* [...]".
Relaxed atomics aren't synchronization operations in C++.

The question of how things should be is harder

Is it worth saying in the docs that this is subject to discussion, or so?
Basically I hope we can use this patch as the start of a discussion, not as a way of cementing the current semantics (which are a bad fit for every language except for C++).

I see two approaches:

  • Document what LLVM does today, get people to agree that it's actually what is done, leave a note saying "we'd like to change this", and propose changes from current practice in the RFC.
  • Write down what you think the rules should be, including any changes to LLVM's builtins and optimizations, and send and RFC proposing that what you think is right be adopted.

What I don't want is a hand wavy description of what could be. Being imprecise is strictly worst than what we have today (undocumented) because it gives the impression of being correct, and of being something we won't change in the future.

llvm/docs/LangRef.rst
2497

I asked about a loop with a relaxed atomic load, no acquire fence.

atomic<int> working;
// ...
while (working.load(memory_order_relaxed) != 0) {
  // Wait some more...
}

Can this be optimized out? Per C++... kinda?
What about the direct lowering to LLVM IR, does it need a side effect to avoid being optimized out?

For reference: wg21.link/n4455 and http://wg21.link/p0062

nikic planned changes to this revision.Aug 12 2019, 11:00 AM
nikic marked an inline comment as done.

This documentation should probably be combined with the introduction of the requires_forward_progress attribute. The current codebase in not consistent in this regard (e.g. we have code in LoopDeletion specifically assuming no forward-progress requirement). It would be better to condition that kind of code on an attribute rather than drop it entirely and add it back later.

llvm/docs/LangRef.rst
2497

The [intro.progress] link you provided considers "perform[ing] a synchronization operation or an atomic operation" (emphasis mine) as forward progress. Whether the relaxed atomic load synchronizes is besides the point: It is still an atomic operation and as such sufficient to constitute forward progress. This is also consistent with the definition given here. Am I misunderstanding your concern?

A fence is not an atomic operation on its own in C++. See http://eel.is/c++draft/atomics.fences where it says "is a synchronization operation *if* [...]".

Sure -- there is no operation that is guaranteed to synchronize in all executions, they all only "synchronize if".
But you are using "atomic operation" and "synchronization operation" as if they were the same term here. I do not think that is the case.
http://eel.is/c++draft/atomics.fences describes fences as part of the "atomic operations library", so I'd say we can safely assume that fences are an atomic operation.

Relaxed atomics aren't synchronization operations in C++.

Indeed relaxed atomics don't sync on their own. Just like fences. Relaxed atomics can, however, be crucial part of the synchronization between two fences. And is there any doubt that relaxed atomics are *atomic*? Because that's all it takes to count as forward progress (http://eel.is/c++draft/intro.progress):

perform a synchronization operation or an atomic operation.

So, as long as we agree that all fences and relaxed atomic accesses are all "atomic operations", whether they synchronize does not matter for the purposes of forward progress.

RalfJung added inline comments.Aug 15 2019, 11:18 PM
llvm/docs/LangRef.rst
2497

You asked about a loop with a fence here. That's what I responded to. Sorry, I am still very confused by the UI here, so I probably used the wrong button for that.

That loop, which is the same as what I drafted up there, certainly should not be optimized away. If the standard allowed this, it would be horribly broken. But as nikic pointed out, there is an atomic operation inside the loop, so we are good. It does not matter whether it is synchronizing.

The pointer of the "or synchronizing" in the standard is likely for per-thread compiler fences or so? I am not sure.

sassa.nf added a subscriber: sassa.nf.EditedAug 7 2020, 10:53 PM

I am unable to go through all the examples of infinite loops, but please consider that it is not so much about forward progress in many cases.

Pure computation case:

fn loop_it(n) {
   while(n != 0) n++; // modifying n is optional, but let's say it is modified
   return 0
}
  1. Abstract away a loop performing a computation into a function
fn get_n(n) {
   while (n != 0) n++;
   return n
}
  1. Call the function everywhere where the variable is accessed

Basically, allow to move the computation into the future, but so that it still happens-before the first use of the result of the computation.

If there are no accesses to the variable, there are no calls of the function. Throwing away the loop is valid, analysis of whether it is finite is not required.

Bad synchronization case:

A static variable reflects readiness, an incorrectly constructed loop polls.

bool ready;

fn bad_sync() {
   while(!ready) {}
   return true
}

This should be optimizable into (as in: allowed, but not required, to be optimized into)):

fn bad_sync() {
   if (!ready) {
      for(;;) {} // do not elide this loop for the purpose of this discussion
   }
   return true
}

Observe that badly synchronized code should expect that the changes to ready may never be observed, so will potentially hang.

Good synchronization case:

bool ready

fn good_sync() {
   while(!ready) {
      read_acquire_fence();
   }
   return true;
}

This should not be optimizeable: there is a requirement that either ready is observed to be true immediately, or the return is placed after some release fence executed by some other thread.

If you can prove there are no release fences executed by any thread some time after they change the value of ready, it should be optimizeable into the bad synchronization case.


I am not saying whether LLVM is not doing this. I am just stating that not all the examples may be sound programs, and not all the explanations rely on forward progress guarantee for a good reason.

ychen added a subscriber: ychen.Aug 7 2020, 11:18 PM

If there are no accesses to the variable, there are no calls of the function. Throwing away the loop is valid, analysis of whether it is finite is not required.

But that's exactly what the progress guarantee is about? In general, it is *not* valid for a compiler to remove an infinite loop. This could make previously dead code live, which is clearly wrong. It is only because the compiler may assume taht every program eventually makes progress, that this transformation is permitted.

sassa.nf added a comment.EditedAug 8 2020, 12:04 PM

In general, it is *not* valid for a compiler to remove an infinite loop.

I don't know what this statement is based on. It is up to the computational semantics of the expression. On the contrary, I can say that in general the compiler is free to move the computation around, as long as the observability of the computed value produces a partial order of reads/writes/synchronization actions/IO that is consistent with the program and the language semantics.

Besides, in order to know whether the loop is finite or not, one needs to perform a termination check, which cannot be done in general. So we can either speak about loops in general, or not at all.

But that's exactly what the progress guarantee is about?

No. It is about proof-relevance. Computing n is not relevant to computing what the effects (side-effects + return value) of the loop_it function are.

Also, I've seen an argument about time. Time is not a side-effect of the infinite loop. There is no language specification that says that infinite loops (or any loops for that matter) consume time. (ok, if there is, then it would be great to quote it)

When you want time to be a side-effect, you use micropause() (which can be a no-op).

This could make previously dead code live, which is clearly wrong.

To my mind, it needs to be dealt with on a case-by-case basis.

In particular, the examples I've seen could be saved in a very simple way: deliberately computing an infinite loop (like, constructing a bottom) - just make the computed value used in some way. Like, store it in a infitine_loop_finished global. Now the loop no longer can be eliminated, because now there is a happens-before edge that precludes moving the computation into the future beyond the return - and the loop is no longer eliminated. (I've tried this in Rust)

sassa.nf added a comment.EditedAug 9 2020, 3:24 AM

I am going to add that the optimization is on the same lines as the more common case:

fn some_fn(y) {
   x = y % 42;
   if y < 0 return 0;
...
}

Modern CPUs will attempt to divide, compare and prefetch the code on the predicted success branch in parallel, as and when the corresponding processing units become available. That is, it is possible that for negative y the return of 0 will be observed by the caller before the division is completed. The simple consequence is that the only way you can detect that the division has not started, is to write the code that will test the division outcome.

Since you can't write the code that expects the division to have finished without checking the division outcome, you can't also write the code that expects the division to have started. This means the compiler is free to reorder the computation:

fn some_fn(y) {
   if y < 0 return 0;
   x = y % 42;
...
}

The loop is no different: if the return value of the function does not depend on the result of the loop, it means it is not really waiting for the loop to complete, and cannot detect whether the loop has started.


This is not to say what LLVM compiler, Rust or other languages should do. This is only to say that there are other reasons than forward progress guarantee that can make this optimization happen.

In fact, there are other languages without forward progress guarantee, but which do speak about the observability of effects, and where this sort of optimization is possible, and in fact is happening. The statement about observability of the effect is the necessary condition. The solution in those languages is to make the entering of the loop observable, so one may observe when the loop has not been entered.

Example in Rust:

fn is_odd(x: mut u32, y: &mut u32) -> bool {
   while x != 0 {
      x += 2;
   }
   *y = x;
   return false
}

Now x escapes as the witness that the loop has finished. Passing some global reference bottom_has_been_computed with volatile semantics as y will ensure the infinite loop cannot be optimized away - because there are observable effects of not having it executed.

RalfJung added a comment.EditedAug 9 2020, 3:31 AM

I don't know what this statement is based on.

It is based on the very definition of correct compilation: the compiled program must only exhibit behaviors that were possible behaviors of the source program to begin with.
If the source program never terminates, then the compiled program must not ever terminate either (except if the source program has UB, in which case there is no restriction on what the compiled program may do -- but then there needs to be a clearly-defined safety condition that this program violates).

Yes, C++ makes infinite loops UB (and C, under some conditions). This PR is about making LLVM support languages like JavaScript or Rust that do not have such UB.

See https://bugs.llvm.org/show_bug.cgi?id=965 for some background.

fhahn added a subscriber: fhahn.Aug 9 2020, 5:09 AM
sassa.nf added a comment.EditedAug 9 2020, 6:49 AM

the compiled program must only exhibit behaviors that were possible behaviors of the source program

That's good. Now define what the possible behaviours of the source program are. We can skip this question all the way to what causes the infinite loop to be executed before the statement following it, and not after. There needs to be some form of happens-before rule.

The simplest form of a happens-before rule is: everything must be done in exactly the evaluation order specified by the language expression. However, this means no optimization whatsoever is possible.

So you'll end up with rules that talk about the data dependencies - the expression must be evaluated in such an order that conforms to the language semantics. However, this usually specifies only the order of expressions that relate to the same variables. It is very difficult (if at all possible) to rigidly specify the order of things that are not referring to the same variables.

The order of execution of:

while n != 0 {n += 2;}
return n

is easy to define - can't return such an n that does not correspond to the execution of a loop. Execution time is not specified. For all we know, the compiler is allowed to precompute the result of the function at specific invocation sites and plug the constant there.

The order of execution of:

while n != 0 {n += 2;}
return 0

is harder to define, because return 0 does not depend on the computation of n. What happens-before rule would you add that should require this specific loop to be executed before this specific return of a constant, and yet not forbid other execution reorderings that the optimizer should be allowed to do?


The "possible behaviours" is a very difficult thing to define. Is a program that crashes with stack overflow allowed to "do" anything other than a panic due to the stack overflow? The answer decides the availability of tail call optimization for such a program.

nikic abandoned this revision.Aug 9 2020, 7:05 AM

Abandoning this revision, as I expect this to get superseded by the LangRef patch for D85393 or a variation thereon, which will have to define both what constitutes forward progress and how the attributes either opts into or out of it.

No happens-before rule is needed for sequential programs. The compiled program must exhibit all observable events in the same order as the source program.

Excellent! Observability is key. If the value of n computed by the loop is not observable, the loop can be thrown out. If you want the computation in the loop to be observable, observe the n after the loop is done (or a bunch of other things that will work - like, call some "IO" or "OS" function).

aqjune added a subscriber: aqjune.Aug 9 2020, 7:18 AM

the compiled program must only exhibit behaviors that were possible behaviors of the source program

That's good. Now define what the possible behaviours of the source program are. We can skip this question all the way to what causes the infinite loop to be executed before the statement following it, and not after. There needs to be some form of happens-before rule.

...
The order of execution of:

while n != 0 {n += 2;}
return 0

is harder to define, because return 0 does not depend on the computation of n. What happens-before rule would you add that should require this specific loop to be executed before this specific return of a constant, and yet not forbid other execution reorderings that the optimizer should be allowed to do?

I think the happens-before rule used in general is simply the syntactic order of instructions; regardless of whether there is a data dependency or not, no out-of-order execution is assumed. So, whenever the last statement is return 0 or return n, it has to be executed after the while statement.
This is used by e.g. CompCert, which is a formally verified C compiler. It defines the behavior of a program (using Coq) that way, and proves (using Coq) that the behavior of target refines that of source after a translation is done.

Since it isn't good to discourage 'what-if' questions simply because a well-known software isn't doing that... What would be a main benefit for allowing out-of-order execution between instructions having no data dependency in a sequential program? I believe a rule of thumb of a theory is that it has to be simple - more complex it is, harder it is to use for human's reasoning.
For example, how can we prove this optimization?

while (x != 1) { x := f(x); }
use(x);
  =>
while (x != 1) { x := f(x); }
use(1);

What would be a main benefit for allowing out-of-order execution between instructions having no data dependency in a sequential program?

The same as elimination of dead code. The example that you give has f(x) in it, which is an important difference.

My original question was about whether "forward progress guarantee" is a necessary condition.

But actually the language I was thinking of defines program order as the total order, no partial order with data dependencies involved. So I can see now where the "forward progress guarantee" comes in into this optimization: without this guarantee, and without the partial order you can't ignore non-termination, even if you are not observing the loop's outcome (i.e. even if you do not have a data dependency).