This is an archive of the discontinued LLVM Phabricator instance.

Disallow sinking of unordered atomic loads into loops
ClosedPublic

Authored by DaniilSuchkov on Sep 29 2017, 1:57 AM.

Details

Summary

Sinking of unordered atomic load into loop must be disallowed because it turns a single load into multiple loads.
The relevant section of the documentation is: http://llvm.org/docs/Atomics.html#unordered, specifically the Notes for Optimizers section.
Here is the full text of this section:

Notes for optimizers
In terms of the optimizer, this prohibits any transformation that transforms a single load into multiple loads, transforms a store into multiple stores, narrows a store, or stores a value which would not be stored otherwise. Some examples of unsafe optimizations are narrowing an assignment into a bitfield, rematerializing a load, and turning loads and stores into a memcpy call. Reordering unordered operations is safe, though, and optimizers should take advantage of that because unordered operations are common in languages that need them.

Diff Detail

Repository
rL LLVM

Event Timeline

DaniilSuchkov created this revision.Sep 29 2017, 1:57 AM

Reviewers: Note that there's a lot of context in previous discussion about the semantics of various types of nonatomic, unordered, and ordered loads in https://reviews.llvm.org/D37463. Comment https://reviews.llvm.org/D37463#882487 is particularly relevant for this review. It's the direct motivation for this patch.

reames requested changes to this revision.Sep 29 2017, 9:01 AM
reames added inline comments.
lib/Transforms/Scalar/LICM.cpp
582 ↗(On Diff #117100)

Deriving this from whether an optional parameter is present is very fragile. Please just add another parameter.

Reads further: Oh, you're not introducing this, just hoisting it in the function. Mind doing a separate NFC patch to expose the parameter?

This could be either before this patch (no further review needed) or after, I don't care.

588 ↗(On Diff #117100)

The placement of this check is wrong. Duplicating a load from an invariant/constant memory location is fine. Move this down and add a respective test case.

This revision now requires changes to proceed.Sep 29 2017, 9:01 AM
danielcdh edited edge metadata.Sep 29 2017, 9:04 AM

Is it legal to hoist an atomic load out of the loop?

Is it legal to hoist an atomic load out of the loop?

Yes, that strictly reduces the number of loads. You can think of hoisting as reusing the load from the first iteration across future iterations and just moving where in the first iteration that load runs. As long as that repositioning is legal, the rest of the transform is just basic FRE.

How about the following case:

while(c1) {
if (c2)

atomic_load;

}

Looks like isSafeToExecuteUnconditionally will not prevent the atomic_load from hoisted to preheader. So in real execution, it may introduce extra atomic loads?

How about the following case:

while(c1) {
if (c2)

atomic_load;

}

Looks like isSafeToExecuteUnconditionally will not prevent the atomic_load from hoisted to preheader. So in real execution, it may introduce extra atomic loads?

From an aliasing/ordering perspective nothing has changed in this example. We have an additional safety requirement (we can't introduce faults), but we haven't changed anything about the memory model legality of the reordering.

In the following example:

while(true) {

a1 = foo();
bar(a1);
while(true) {
  if (baz())
    return atomic_load(a1);
}

}

The atomic_load(a1) will be executed once without hoisting. But with LICM, it may be called multiple times.

In the following example:

while(true) {

a1 = foo();
bar(a1);
while(true) {
  if (baz())
    return atomic_load(a1);
}

}

The atomic_load(a1) will be executed once without hoisting. But with LICM, it may be called multiple times.

I can't make out your example enough to assess what you're intent is. As a general comment, keep in mind that *loading* multiple times is not a problem if only one of them is *used*.

I see your point now. My concern is performance: if we allow hoisting of atomic load, but not allow sinking it, we may end up with bad performance as we may have too much redundant atomic loads in the preheader. Any suggestions on how to solve that?

I see your point now. My concern is performance: if we allow hoisting of atomic load, but not allow sinking it, we may end up with bad performance as we may have too much redundant atomic loads in the preheader. Any suggestions on how to solve that?

Not really. I can say that we've been running performance tests for months in this configuration (LICM hoisting unordered loads, no LoopSink) without noticing any problems. I'm not immediately concerned.

I see your point now. My concern is performance: if we allow hoisting of atomic load, but not allow sinking it, we may end up with bad performance as we may have too much redundant atomic loads in the preheader. Any suggestions on how to solve that?

Not really. I can say that we've been running performance tests for months in this configuration (LICM hoisting unordered loads, no LoopSink) without noticing any problems. I'm not immediately concerned.

Sorry, not sure if I follow the logic, could you explain why you don't think this is a performance concern?

I see your point now. My concern is performance: if we allow hoisting of atomic load, but not allow sinking it, we may end up with bad performance as we may have too much redundant atomic loads in the preheader. Any suggestions on how to solve that?

Not really. I can say that we've been running performance tests for months in this configuration (LICM hoisting unordered loads, no LoopSink) without noticing any problems. I'm not immediately concerned.

Sorry, not sure if I follow the logic, could you explain why you don't think this is a performance concern?

I acknowledge it's a potential issue. I have no good ideas for a solution. I don't believe it's a current problem and evidence to believe that it's relatively minor. (i.e. we haven't seen it)

DaniilSuchkov edited edge metadata.
DaniilSuchkov marked an inline comment as done.

Now sinking of unordered atomic loads from constant/invariant memory is allowed, corresponding test case added.
I am going to expose SafetyInfo in follow up patch.

DaniilSuchkov added inline comments.Oct 5 2017, 10:06 PM
lib/Transforms/Scalar/LICM.cpp
582 ↗(On Diff #117100)

Here is declaration of canSinkOrHoistInst (LoopUtils.h):

/// Returns true if the hoister and sinker can handle this instruction.
/// If SafetyInfo is null, we are checking for sinking instructions from
/// preheader to loop body (no speculation).
/// If SafetyInfo is not null, we are checking for hoisting/sinking
/// instructions from loop body to preheader/exit. Check if the instruction
/// can execute speculatively.
/// If \p ORE is set use it to emit optimization remarks.
bool canSinkOrHoistInst(Instruction &I, AAResults *AA, DominatorTree *DT,
                        Loop *CurLoop, AliasSetTracker *CurAST,
                        LoopSafetyInfo *SafetyInfo,
                        OptimizationRemarkEmitter *ORE = nullptr);

As you can see it is safe to use SafetyInfo for such check. Actually this line is the only use of SafetyInfo in the function, so I am going to replace this parameter with bool in a separate NFC after this patch.

reames accepted this revision.Oct 10 2017, 12:00 PM

LGTM

This revision is now accepted and ready to land.Oct 10 2017, 12:00 PM
This revision was automatically updated to reflect the committed changes.