Page MenuHomePhabricator

[LangRef] Document forward-progress requirement
Changes PlannedPublic

Authored by nikic on Sun, Aug 4, 9:05 AM.

Details

Summary

Followup on a recent discussion on D59978: It seems that the current consensus is that LLVM has a forward-progress requirement and also intends to keep it in the future. llvm.sideeffect is not a temporary workaround, but the final solution. While this is pretty disappointing for non-C++ frontends, let's make sure this is at least documented in LangRef.

I haven't found a great place to put this, so I created a separate section

Diff Detail

Event Timeline

nikic created this revision.Sun, Aug 4, 9:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptSun, Aug 4, 9:05 AM

I was under impression that the exact opposite was the status,
even if it wasn't consistently followed everywhere.

nikic added a comment.Sun, Aug 4, 10:26 AM

I was under impression that the exact opposite was the status,
even if it wasn't consistently followed everywhere.

That was also my own impression as well, but @jdoerfert and @Meinersbur argued that this is the the current consensus. I would be very happy if this isn't the case. We should clarify this one way or another in LangRef though, to avoid future confusion on this point.

For what it's worth, actually using llvm.sideeffect for this purpose results in pretty catastrophic regressions right now. I think things wouldn't be quite as bad if it was only needed to control loops, but the need to also account for recursion means that llvm.sideeffect ends up literally everywhere. If this is indeed the way forward, we should probably add a sideeffect coalescing pass and see if that makes this at least somewhat viable.

I was under impression that the exact opposite was the status,
even if it wasn't consistently followed everywhere.

That was also my own impression as well, but @jdoerfert and @Meinersbur argued that this is the the current consensus. I would be very happy if this isn't the case. We should clarify this one way or another in LangRef though, to avoid future confusion on this point.

My argument comes from the current behavior:
We delete non-side-effect loops and recursion.

For what it's worth, actually using llvm.sideeffect for this purpose results in pretty catastrophic regressions right now. I think things wouldn't be quite as bad if it was only needed to control loops, but the need to also account for recursion means that llvm.sideeffect ends up literally everywhere. If this is indeed the way forward, we should probably add a sideeffect coalescing pass and see if that makes this at least somewhat viable.

We could probably improve the handling of llvm.sideeffect in certain situations. Especially once we propagate information about accessed (inaccessible) memory locations (D60077 needs rebase though), we should see better results.
I hope we will then have interactions only with other "inaccessible" intrinsics which should not be too bad.

rkruppe added a subscriber: rnk.Sun, Aug 4, 11:41 AM

Clearly the current semantics of LLVM IR have a forward progress guarantee and I agree that should be documented.

The question of how things should be is harder, and I don't know if there really is any community consensus about it. But while the topic is alive again, I want to take the opportunity to point out that we have other options than just not doing the optimizations or baking forward progress into LLVM IR. The more invasive proposal discussed alongside llvm.sideeffect was:

  • define LLVM IR to not have a forward progress guarantee as baseline, but add a function attribute to add it back in
  • let frontends for languages with forward progress guarantee use this attribute to opt into the optimizations in question
  • use a combination of that attribute and llvm.sideeffect where more fine-grained control is needed (such as C, where forward progress is the default but some loops are exempt)

Other than side-stepping most concerns about the optimization impact of the intrinsic, this also avoids overly privileging C's particulars in the IR to the (potential) demerit of every other language. As @rnk said in the comment that originally proposed this:

The forward progress required by C++ is the odd duck here. We should ask frontends to put a function attribute on functions that they want to receive this kind of optimization.

uenoku added a subscriber: uenoku.Sun, Aug 4, 5:51 PM
reames accepted this revision.Mon, Aug 5, 3:07 PM

LGTM w/suggested minor changes.

You might want to revise the infinite loop example used in the phi instruction section to add a call to llvm.side_effect for clarity and seachability.

llvm/docs/LangRef.rst
2501

I'd suggest a tweak in wording here. Rather than explicitly advising llvm.side_effect, I'd suggest phrasing this as requiring the insertion of some synthetic side effect, and then mention llvm.side_effect existing specifically for this purpose.

The hair I'm splitting is that if you already need another synthetic side effect - say a GC or deoptimization safepoint opportunity - then you don't actually need the llvm.side_effect intrinsic *too*.

This revision is now accepted and ready to land.Mon, Aug 5, 3:07 PM
Meinersbur accepted this revision.Mon, Aug 5, 3:27 PM
Meinersbur added inline comments.
llvm/docs/LangRef.rst
2496

[typo] accesses to or accessing ...

"inaccessable memory" is a strange example, it results in an access violation and most often termination of the program, but I guess it can be called a side-effect/progress. However, the compiler may optimize away unused accesses to non-volatile memory such that the side-effect is not guaranteed to happen. Maybe include system calls instead?

jdoerfert added inline comments.Mon, Aug 5, 3:42 PM
llvm/docs/LangRef.rst
2496

"inaccessable memory" is not "invalid memory" but just memory not accessible through direct means from the module. (llvm.side_effect is modeled as an access to inaccessible memory only)

nikic updated this revision to Diff 213895.Wed, Aug 7, 8:16 AM
nikic marked an inline comment as done.

Clarify that any synthetic side-effect is sufficient. Add llvm.sideeffect to infinite loop in phi example.

jfb added inline comments.Wed, Aug 7, 9:29 AM
llvm/docs/LangRef.rst
2496

It would be better to clarify what "inaccessible memory" means here, or refer to a definition.

2497

A relaxed atomic load is a sufficient side-effect?

jfb added a subscriber: __simt__.Wed, Aug 7, 9:29 AM
efriedma added inline comments.Wed, Aug 7, 10:49 AM
llvm/docs/LangRef.rst
2497

We want to consider a busy-wait loop like "while (foo){}", where "foo" is an atomic variable, as making forward progress; otherwise, there isn't any way to write such a loop. We also want to allow erasing dead atomic loads.

We might need to modify the definition of a side-effect to make this work.

jfb added inline comments.Wed, Aug 7, 1:05 PM
llvm/docs/LangRef.rst
2497

Right, I'm trying to understand what guarantees LLVM offers, and if they're the same or more than C++.
Where it makes sense I'd like this documented, with references.

http://eel.is/c++draft/intro.progress
http://eel.is/c++draft/intro.multithread#intro.races-3

Somewhat related:
I was hoping that the following deduction will be valid:

If all instructions in the loop do not synchronize (see `nosync`) then the loop cannot be infinite.

The reason is that any "progress" need to be observable and for that it needs to be "synchronized".
Now, the nosync definition might not be up to this (I hope it is though), but that is a different issue.
Having the above reasoning will (I think) allow us to deduce willreturn in the presence of various "common" loops.

llvm/docs/LangRef.rst
10080

Add a link to the forward progress requirement section please.

reames added a comment.Fri, Aug 9, 8:34 PM

Just a note of caution: I see discussion starting on cornercases. I would strongly suggest *not* blocking this patch on getting a fully consistent definition. Having something documented which is "mostly right" is better than no documentation at all. (Adding a explicit marker of subtleties under discussion is good.) It's really tempting as reviewers to keep looking for the subtle problems, but remember, the confusion which keeps arising is over the basic need for a side effect, not the exact definition thereof!

jfb added a comment.EditedFri, Aug 9, 8:56 PM

Just a note of caution: I see discussion starting on cornercases. I would strongly suggest *not* blocking this patch on getting a fully consistent definition. Having something documented which is "mostly right" is better than no documentation at all. (Adding a explicit marker of subtleties under discussion is good.) It's really tempting as reviewers to keep looking for the subtle problems, but remember, the confusion which keeps arising is over the basic need for a side effect, not the exact definition thereof!

I have to disagree strongly. C++ has a forward progress definition and it sounds like we’re codifying something derived from it for llvm IR.

Now say I have code with:

bool ready;
// ...
while (!ready)
  __atomic_fence(RELAXED); // However my languages does this.

Does this need a side effect? Per the current definition, no. Per C++, yes.

So codifying something “mostly ok” is a terrible idea because any further change will affect semantics we’re promised people.

Right now we don’t promise anything. It’s bad! Let’s fix it. But let’s fix it for real, not a band-aid.

Does this need a side effect? Per the current definition, no. Per C++, yes.

What does "need a side-effect" mean here? Do you mean whether llvm.sideffect has to be added to avoid UB?
Per my reading of C++, an atomic fence is considered a side-effect for the purpose of forward progress (looks like an atomic operation to me, even if it is relaxed), so this loop is not UB and may not be optimized away.

use a combination of that attribute and llvm.sideeffect where more fine-grained control is needed (such as C, where forward progress is the default but some loops are exempt)

AFAIK, in C forward progress is the default for loops (except, as you said, if their condition is a constant). But I cannot see anything in the C standard that allows a forward progress assumption for recursion or goto or any other form of infinite execution (if they exist).

The question of how things should be is harder

Is it worth saying in the docs that this is subject to discussion, or so?
Basically I hope we can use this patch as the start of a discussion, not as a way of cementing the current semantics (which are a bad fit for every language except for C++).

RalfJung added inline comments.Mon, Aug 12, 2:15 AM
llvm/docs/LangRef.rst
2497

A loop while (/* relaxed load */ == 0) {} followed by an acquire-fence should definitely not be optimized away. That is a perfectly reasonable pattern to establish synchronization though fences and relaxed accesses.

The C++ standard specifically says "atomic operations" in loops are good enough, and while one can argue about whether a relaxed load synchronizes (it sometimes does, as in the above), I think we can agree that it is an atomic operation?

Somewhat related:
I was hoping that the following deduction will be valid:

If all instructions in the loop do not synchronize (see `nosync`) then the loop cannot be infinite.

If we say relaxed atomics do sync (as opposed to nosync), this should work with the concerns raised (and with recursion).
Which would allow us to say, if it does not sync it must be finite if forward progress is enforced^.

^ So this is orthogonal to the discussion if we should have an attribute enforcing forward progress (which I think makes sense).

jfb added a comment.Mon, Aug 12, 10:04 AM

Does this need a side effect? Per the current definition, no. Per C++, yes.

What does "need a side-effect" mean here? Do you mean whether llvm.sideffect has to be added to avoid UB?

I mean: if my language wants to generate a loop with a fence in it, and a non-atomic load, how do I get LLVM to generate that code without eliminating the code? Do I need llvm.sideeffect?

Per my reading of C++, an atomic fence is considered a side-effect for the purpose of forward progress (looks like an atomic operation to me, even if it is relaxed), so this loop is not UB and may not be optimized away.

A fence is not an atomic operation on its own in C++. See http://eel.is/c++draft/atomics.fences where it says "is a synchronization operation *if* [...]".
Relaxed atomics aren't synchronization operations in C++.

The question of how things should be is harder

Is it worth saying in the docs that this is subject to discussion, or so?
Basically I hope we can use this patch as the start of a discussion, not as a way of cementing the current semantics (which are a bad fit for every language except for C++).

I see two approaches:

  • Document what LLVM does today, get people to agree that it's actually what is done, leave a note saying "we'd like to change this", and propose changes from current practice in the RFC.
  • Write down what you think the rules should be, including any changes to LLVM's builtins and optimizations, and send and RFC proposing that what you think is right be adopted.

What I don't want is a hand wavy description of what could be. Being imprecise is strictly worst than what we have today (undocumented) because it gives the impression of being correct, and of being something we won't change in the future.

llvm/docs/LangRef.rst
2497

I asked about a loop with a relaxed atomic load, no acquire fence.

atomic<int> working;
// ...
while (working.load(memory_order_relaxed) != 0) {
  // Wait some more...
}

Can this be optimized out? Per C++... kinda?
What about the direct lowering to LLVM IR, does it need a side effect to avoid being optimized out?

For reference: wg21.link/n4455 and http://wg21.link/p0062

nikic planned changes to this revision.Mon, Aug 12, 11:00 AM
nikic marked an inline comment as done.

This documentation should probably be combined with the introduction of the requires_forward_progress attribute. The current codebase in not consistent in this regard (e.g. we have code in LoopDeletion specifically assuming no forward-progress requirement). It would be better to condition that kind of code on an attribute rather than drop it entirely and add it back later.

llvm/docs/LangRef.rst
2497

The [intro.progress] link you provided considers "perform[ing] a synchronization operation or an atomic operation" (emphasis mine) as forward progress. Whether the relaxed atomic load synchronizes is besides the point: It is still an atomic operation and as such sufficient to constitute forward progress. This is also consistent with the definition given here. Am I misunderstanding your concern?

A fence is not an atomic operation on its own in C++. See http://eel.is/c++draft/atomics.fences where it says "is a synchronization operation *if* [...]".

Sure -- there is no operation that is guaranteed to synchronize in all executions, they all only "synchronize if".
But you are using "atomic operation" and "synchronization operation" as if they were the same term here. I do not think that is the case.
http://eel.is/c++draft/atomics.fences describes fences as part of the "atomic operations library", so I'd say we can safely assume that fences are an atomic operation.

Relaxed atomics aren't synchronization operations in C++.

Indeed relaxed atomics don't sync on their own. Just like fences. Relaxed atomics can, however, be crucial part of the synchronization between two fences. And is there any doubt that relaxed atomics are *atomic*? Because that's all it takes to count as forward progress (http://eel.is/c++draft/intro.progress):

perform a synchronization operation or an atomic operation.

So, as long as we agree that all fences and relaxed atomic accesses are all "atomic operations", whether they synchronize does not matter for the purposes of forward progress.

RalfJung added inline comments.Thu, Aug 15, 11:18 PM
llvm/docs/LangRef.rst
2497

You asked about a loop with a fence here. That's what I responded to. Sorry, I am still very confused by the UI here, so I probably used the wrong button for that.

That loop, which is the same as what I drafted up there, certainly should not be optimized away. If the standard allowed this, it would be horribly broken. But as nikic pointed out, there is an atomic operation inside the loop, so we are good. It does not matter whether it is synchronizing.

The pointer of the "or synchronizing" in the standard is likely for per-thread compiler fences or so? I am not sure.