This is an archive of the discontinued LLVM Phabricator instance.

[LICM] Make store promotion work in the face of unordered atomics
ClosedPublic

Authored by reames on Dec 16 2015, 3:08 PM.

Details

Summary

Extend our store promotion code to deal with unordered atomic accesses. Ordered atomics continue to be unhandled.

Most of the change is straight-forward, the only complicated bit is in the reasoning around mixing of atomic and non-atomic memory access. Rather than trying to reason about the complex semantics in these cases, I simply disallowed promotion when both atomic and non-atomic accesses are present. This is conservatively correct.

It seems really tempting to just promote all access to atomics, but the original accesses might have been conditional. Since we can't lower an arbitrary atomic type, it might not be safe to promote all access to atomic. Consider a loop like the following:
while(b) {

load i128 ...
if (can lower i128 atomic)
  store atomic i128 ...
else
  store i128

}

It could be there's no race on the location and thus the code is perfectly well defined even if we can't lower a i128 atomically. Promoting the non-atomic accesses to atomic would be incorrect.

Diff Detail

Event Timeline

reames updated this revision to Diff 43070.Dec 16 2015, 3:08 PM
reames retitled this revision from to [LICM] Make store promotion work in the face of unordered atomics.
reames updated this object.
reames added reviewers: jfb, hfinkel, majnemer, sanjoy.
reames added a subscriber: llvm-commits.
jfb edited edge metadata.Dec 16 2015, 5:54 PM

Can you add tests which validates that:

  • Atomics in a loop aren't moved past a signal fence.
  • Volatile atomics aren't affected.

Also, parts of the C++ standards committee thinks that the standard should "discourage non-normatively aggressive optimizations, e.g across large or unbounded loops" in the context of p0062r0. There isn't full consensus on this yet, but I'd be cautious about doing anything too aggressive to relaxed (a.k.a. monotonic) accesses for now. It would be good to also add a test for the example in p0062r0 and make sure it doesn't get optimized for now. I'll ping Hans to see what he thinks.

lib/Transforms/Scalar/LICM.cpp
962–964

I'm being a bit paranoid here (since this shouldn't happen because you prevent mixing of atomic / non-atomic), but it would be good to assert((Alignment == 0 ? !store->isAtomic() : true) && "atomic alignment can't be 0"); just to make sure that future edits don't introduce bugs.

test/Transforms/LICM/atomics.ll
69

Could you add a similar test where the loop does a monotonic load followed by a monotonic store?

77

CHECK-NOT: store

103

Could you also add a test which has:

loop:
  store i32 5, i32* %x
  %vala = load atomic i32, i32* %y monotonic, align 4
  store atomic i32 %vala, i32* %z unordered, align 4
  %exitcond = icmp ne i32 %vala, 0
  br i1 %exitcond, label %end, label %loop

Note the unordered store is now to %z which should also be noalias.

119

Also check that the unordered atomic is still there.

jfb added a comment.Dec 17 2015, 2:02 PM

Hans pointed out another issue worth adding a test for: you can't remove all atomic accesses from loops which aren't proven to terminate (or otherwise have I/O, volatile, synchronization):

[intro.multithread] 24:
The implementation may assume that any thread will eventually do one of 
the following:
— terminate,
— make a call to a library I/O function,
— access or modify a volatile object, or
— perform a synchronization operation or an atomic operation.
[ Note: This is intended to allow compiler transformations such as 
removal of empty loops, even when
termination cannot be proven. —end note ]

Regardless of what the standard says, it's desirable to keep atomic stores inside infinite loops, though IIUC that would be OK for "unordered" (a.k.a. Java) atomics, but not "monotonic" (C++ relaxed)? It all boils down to what LLVM's memory model is compared to C++'s, and I'm not clear on this (at the minimum it should be as strict as the language it's honoring).

He also points at this article which he believes would be more profitable profitable to implement (in separate patches). The problem in the past was that LLVM didn't have enough alias information to make this work in common cases, but offers to put us in contact with the author off-list.

reames added a subscriber: reames.Dec 18 2015, 9:55 PM

JF,

I'm going to respond to your points in more detail at a future point,
but I did want to quickly say that I do not feel it's appropriate to
keep adding tests for every possible combination of ordered atomics
optimizations under the sun. I've gone along so far because the amount
of work has been less than that required to argue the case, but given
I'm working on supporting only the unordered variety, I really don't see
how the ordered ones are relevant beyond a couple of simple tests to
make sure I don't accidentally include ordered where only unordered was
intended.

Philip

hfinkel edited edge metadata.Feb 2 2016, 2:56 PM

Promoting the non-atomic accesses to atomic would be incorrect.

Why?

Hans pointed out another issue worth adding a test for: you can't remove all atomic accesses from loops which aren't proven to terminate (or otherwise have I/O, volatile, synchronization):

...

Also, parts of the C++ standards committee thinks that the standard should "discourage non-normatively aggressive optimizations, e.g across large or unbounded loops" in the context of p0062r0.

Philip, can you comment on the original use cases here? Are there cases where indefinitely postponing the write (because, say, the loop ended up not terminating) would be acceptable?

reames added a comment.Feb 3 2016, 3:42 PM

Promoting the non-atomic accesses to atomic would be incorrect.

Why?

Per the example I gave, we might not be able to lower a 128 bit atomic, and the code might be conditional on that fact. Inserting an unconditional i128 atomic which can't be lowered which wasn't in the original code in that case seems "questionable". I didn't have a need to do this and the semantics seemed a bit unclear, so it seemed better to avoid.

Philip, can you comment on the original use cases here? Are there cases where indefinitely postponing the write (because, say, the loop ended up not terminating) would be acceptable?

As I said previous, I consider most of JFs comments to be wildly off topic. Particularly, the "unordered" ordering is not related to C++ at all. As specifically documented in the LangRef: "This is intended to provide a guarantee strong enough to model Java’s non-volatile shared variables. " This is exactly the purpose I'm using it for. To the best of my knowledge (which admittedly isn't that extensive here), there is no way to generate an "unordered" instruction from C++ code.

On the topic of infinite loops, I could go either way. In practice, any infinite loop will have a safepoint or deoptimization point within it, so the loop promotion couldn't trigger anyways. Given that, I don't see any reason why we'd need to treat potentially infinite loops specially here. (Again, C++'s rules are not relevant.)

(Hoping to get back to this line of work soon. Distracted with higher priority tasks at the moment.)

Promoting the non-atomic accesses to atomic would be incorrect.

Why?

Per the example I gave, we might not be able to lower a 128 bit atomic, and the code might be conditional on that fact. Inserting an unconditional i128 atomic which can't be lowered which wasn't in the original code in that case seems "questionable". I didn't have a need to do this and the semantics seemed a bit unclear, so it seemed better to avoid.

Fair enough, and I'm fine with not doing this. I don't understand your example, however, because if we can't lower the atomic, then the compilation will fail regardless of whether or not the atomic is conditionally executed. We still need to generate code for it if it is present.

Philip, can you comment on the original use cases here? Are there cases where indefinitely postponing the write (because, say, the loop ended up not terminating) would be acceptable?

As I said previous, I consider most of JFs comments to be wildly off topic. Particularly, the "unordered" ordering is not related to C++ at all. As specifically documented in the LangRef: "This is intended to provide a guarantee strong enough to model Java’s non-volatile shared variables. " This is exactly the purpose I'm using it for. To the best of my knowledge (which admittedly isn't that extensive here), there is no way to generate an "unordered" instruction from C++ code.

I agree, the C++ rules here are not strictly relevant.

On the topic of infinite loops, I could go either way. In practice, any infinite loop will have a safepoint or deoptimization point within it, so the loop promotion couldn't trigger anyways. Given that, I don't see any reason why we'd need to treat potentially infinite loops specially here. (Again, C++'s rules are not relevant.)

I understand, and the fact that this can't come up in your environment is interesting. However, from LLVM's perspective, we still need to decide on the semantics here.

(Hoping to get back to this line of work soon. Distracted with higher priority tasks at the moment.)

reames added a comment.Feb 3 2016, 5:19 PM

Promoting the non-atomic accesses to atomic would be incorrect.

Why?

Per the example I gave, we might not be able to lower a 128 bit atomic, and the code might be conditional on that fact. Inserting an unconditional i128 atomic which can't be lowered which wasn't in the original code in that case seems "questionable". I didn't have a need to do this and the semantics seemed a bit unclear, so it seemed better to avoid.

Fair enough, and I'm fine with not doing this. I don't understand your example, however, because if we can't lower the atomic, then the compilation will fail regardless of whether or not the atomic is conditionally executed. We still need to generate code for it if it is present.

My reasoning was that the very next thing the optimizer might do is constant fold the "can lower i128 atomically" branch to false. The original program would then not contain a atomic i128 (i.e. codegen failure) while a program we promoted to atomic i128 would. I'm not sure this is strictly speaking legal since the program is relying on optimizer behaviour for correctness.

On the topic of infinite loops, I could go either way. In practice, any infinite loop will have a safepoint or deoptimization point within it, so the loop promotion couldn't trigger anyways. Given that, I don't see any reason why we'd need to treat potentially infinite loops specially here. (Again, C++'s rules are not relevant.)

I understand, and the fact that this can't come up in your environment is interesting. However, from LLVM's perspective, we still need to decide on the semantics here.

I think the answer should be that "unordered" atomics are treated like normal non-atomic instructions in all ways *except* that the actual memory operation is atomic. Thus, there are no special requirement about infinite loops.

I draw a mental distinction between atomicity, ordering, and visibility. All of our "atomics" are atomic, but not all of them are ordered. The visibility rules are essentially unspecified, but in practice, we have to give the "ordered" forms eventual visibility guarantees since that's what the C++ spec requires.

To say this differently, I really don't think we want the optimizer to be in a position where it has to prove a loop is not dynamically infinite before doing store promotion. That seems like a major complication. If we later find we need a unordered, atomic, but eventually visible store, I'd rather add that explicitly to the IR. Having both forms seems potentially useful.

For a code clarity perspective, should we add a predicate like mustBeEventuallyVisible(AtomicOrdering)? This would give a place to describe the result of this discussion - whatever we eventually settle on - in a way future uses might find. :)

p.s. Lest it be confusing, our needs on this changed in the not too distant past. I was at one point arguing that LLVM should allow infinite loops without side effects. I'm no longer of that belief. So long as every infinite loop has a side effect (for us, a potential safepoint, deoptimization point, or call), then we do not need any special handling for the unordered atomic stores.

jfb added a subscriber: jfb.Feb 4 2016, 12:59 AM

On the topic of infinite loops, I could go either way. In practice,

any infinite loop will have a safepoint or deoptimization point within it,
so the loop promotion couldn't trigger anyways. Given that, I don't see
any reason why we'd need to treat potentially infinite loops specially
here. (Again, C++'s rules are not relevant.)

I understand, and the fact that this can't come up in your environment

is interesting. However, from LLVM's perspective, we still need to decide
on the semantics here.

I think the answer should be that "unordered" atomics are treated like
normal non-atomic instructions in all ways *except* that the actual memory
operation is atomic. Thus, there are no special requirement about infinite
loops.

That's totally fine by me. My concern is that C++ relaxed atomics
inadvertently gain the same semantics. Maybe not in your patch, but through
further refactoring. That's why I ask for negative tests on relaxed
atomics: to catch if they ever start doing the wrong thing.

reames updated this revision to Diff 50193.Mar 9 2016, 2:02 PM
reames edited edge metadata.

Add requested tests, address comments, and rebase on ToT.

sanjoy resigned from this revision.Jun 16 2016, 6:32 PM
sanjoy removed a reviewer: sanjoy.

Last update was 3 months ago. Feel free to add me back if you resurrect this review.

Is there anything left to do on this? If not, you should commit this :)

sanjoy accepted this revision.Feb 13 2017, 12:36 PM
sanjoy added a subscriber: sanjoy.

This LGTM!

lib/Transforms/Scalar/LICM.cpp
803

clang-format?

This revision is now accepted and ready to land.Feb 13 2017, 12:36 PM
This revision was automatically updated to reflect the committed changes.