This is an archive of the discontinued LLVM Phabricator instance.

Fix miscompile in LoopSink pass
AbandonedPublic

Authored by DaniilSuchkov on Sep 5 2017, 3:38 AM.

Details

Summary

It was allowed by llvm::canSinkOrHoistInst to sink non-invariant loads into loops but it is illegal because it can introduce new data races. For example:

b = *p;
for (int i = 0; i < N; i++)
  a[i] = b;

Assuming a is a thread local value, it should always contain a single value. If it were to contain different values at different indices, that would be a miscompile.

Diff Detail

Repository
rL LLVM

Event Timeline

DaniilSuchkov created this revision.Sep 5 2017, 3:38 AM
danielcdh edited edge metadata.Sep 5 2017, 9:20 AM

Thanks for raising the issue.

But I'm not sure if this the right fix

  • this disables sinking of all load instructions unless it's a constant load, or has invariant load metadata
  • not sure why SafetyInfo would help identify the potential data race here

Additionally, about the original testcase, the data race is there even without loopsink optimization as the load outside of the loop is not guarded by any locks. I'm not a C++ standard expert and am not sure if data race like this is considered as undefined behavior (if yes, then it's legal for compiler to sink the load down to the loop to break the "everything in a[] should be the same" assumption)

Put it another way, is it legal to hoist the load for the following load with potential racy condition?

for (i = 0; i < N; ++i)

a[i] = *p;

In C/C++, your example has undefined behavior. From the C++ standard: "The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior."

DaniilSuchkov added a comment.EditedSep 5 2017, 9:31 PM
  • not sure why SafetyInfo would help identify the potential data race here

From comment to canSinkOrHoistInst:

/// If SafetyInfo is null, we are checking for sinking instructions from
/// preheader to loop body (no speculation).

So I'm just checking if we are going to sink load into loop.

Put it another way, is it legal to hoist the load for the following load with potential racy condition?

for (i = 0; i < N; ++i)

a[i] = *p;

Yes. "No race" is one of possible cases of data race.

mkazantsev edited edge metadata.Sep 5 2017, 10:52 PM

I need to read what C++ specification says about this particular issue, but basically LLVM is not only used to compile C++. This situation can be illegal in other languages (again, need to dig more through specifications). My proposal is to add an option that prohibits this transform and set it to false by default, with abitily to turn it off for languages where it is prohibited.

DaniilSuchkov abandoned this revision.Sep 6 2017, 3:11 AM

Actually the answer is here: https://llvm.org/docs/Atomics.html#notatomic

NotAtomic is the obvious, a load or store which is not atomic. (This isn’t really a level of atomicity, but is listed here for comparison.) This is essentially a regular load or store.
...
Notes for optimizers:
Introducing loads to shared variables along a codepath where they would not otherwise exist is allowed; introducing stores to shared variables is not.

So, this behavior of LoopSink is not a bug.

Actually, there might be a bug here in the handling of "unordered" atomic loads... I haven't verified, but we might need to special-case them in LoopSink.

LoopSink uses canSinkOrHoistInst from LICM to check whether instruction can be sunk or not. This check rejects all loads with isUnordered() == true (see http://llvm.org/docs/Atomics.html#atomics-and-ir-optimization). It looks like "unordered" atomics are handled in a right way: they are never sunk into loops.

I need to read what C++ specification says about this particular issue, but basically LLVM is not only used to compile C++. This situation can be illegal in other languages (again, need to dig more through specifications). My proposal is to add an option that prohibits this transform and set it to false by default, with abitily to turn it off for languages where it is prohibited.

LLVM has a memory model.
https://llvm.org/docs/LangRef.html#memory-model-for-concurrent-operations

If this violates LLVM's memory model, it's a bug.
It it doesn't, it should not be turned off, and an other-language frontend needs to make sure the code it generates is correct in LLVM's memory model.
(or of course, start a discussion about changing the memory model)

reames edited edge metadata.Sep 11 2017, 6:36 PM

I need to read what C++ specification says about this particular issue, but basically LLVM is not only used to compile C++. This situation can be illegal in other languages (again, need to dig more through specifications). My proposal is to add an option that prohibits this transform and set it to false by default, with abitily to turn it off for languages where it is prohibited.

LLVM has a memory model.
https://llvm.org/docs/LangRef.html#memory-model-for-concurrent-operations

If this violates LLVM's memory model, it's a bug.
It it doesn't, it should not be turned off, and an other-language frontend needs to make sure the code it generates is correct in LLVM's memory model.
(or of course, start a discussion about changing the memory model)

Strongly agreed with this. Neither the C++ memory model or the Java memory model are directly relevant to this discussion. We should focus on the LLVM memory model exclusively when defining LLVM bugs and revise the LLVM memory model if needed to support one of our frontend languages.

Reading through the discussion, it sounds like we have correctly concluded there was no bug here (at least for standard non-atomic loads and stores). Reading through the code, it also looks like ordered atomics are handled conservatively and not sunk. I think there might be some further investigation needed for unordered atomics, but I will take that offline with Daniil before returning to this thread.

I think there might be some further investigation needed for unordered atomics, but I will take that offline with Daniil before returning to this thread.

Returning to this point, the conclusion I would reach reviewing our Atomics documentation is that sinking an unordered atomic into a loop must be disallowed, because doing so duplicates the load which is disallowed.

The relevant section of the documentation is: http://llvm.org/docs/Atomics.html#unordered, specifically the Notes for Optimizers section. Here's the full text of the section under discussion:

Notes for optimizers

In terms of the optimizer, this *prohibits any transformation that transforms a single load into multiple loads*, transforms a store into multiple stores, narrows a store, or stores a value which would not be stored otherwise. Some examples of unsafe optimizations are narrowing an assignment into a bitfield, *rematerializing a load*, and turning loads and stores into a memcpy call. Reordering unordered operations is safe, though, and optimizers should take advantage of that because unordered operations are common in languages that need them.

The important bits are in *bold*. The first highlighted clause could be interpreted to be a weaker constraint that it first seems. In particular, it could be read as only disallowing *splitting* a load into multiple pieces, and not disallowing *duplicating* an load. Internally, several folks read it that way. I believe that is an incorrect interpretation and that duplication must also be disallowed by the text. My reasoning is as follows:

Sinking a load into the header of a loop with a trip count > 1 increases the number of times the load is run dynamically.

Rematerialization (which is explicitly allowed as a disallowed transformation) is the insertion of loads by the register allocator at use sites. LoopSinking and rematerialization end up producing *exactly the same code* on the following example:

L = load atomic %p
while (foo()) {

call_which_clobbers_all_regs();
use L

}

Both transforms would want to produce:

while (foo()) {

call_which_clobbers_all_regs();
L = load atomic %p
use L

}

If this were legal for LoopSink, then it would also be legal for rematerialization. And it's definitely not, as explicitly stated in the text. As such, it must be the case that LoopSink is disallowed for this case as well, which requires the broader interpretation of the initial sentence.

Given this, I must conclude that LoopSink is currently buggy as we do sink unordered atomic loads into loops.

I think there might be some further investigation needed for unordered atomics, but I will take that offline with Daniil before returning to this thread.

Returning to this point, the conclusion I would reach reviewing our Atomics documentation is that sinking an unordered atomic into a loop must be disallowed, because doing so duplicates the load which is disallowed.

The relevant section of the documentation is: http://llvm.org/docs/Atomics.html#unordered, specifically the Notes for Optimizers section. Here's the full text of the section under discussion:

Notes for optimizers

In terms of the optimizer, this *prohibits any transformation that transforms a single load into multiple loads*, transforms a store into multiple stores, narrows a store, or stores a value which would not be stored otherwise. Some examples of unsafe optimizations are narrowing an assignment into a bitfield, *rematerializing a load*, and turning loads and stores into a memcpy call. Reordering unordered operations is safe, though, and optimizers should take advantage of that because unordered operations are common in languages that need them.

The important bits are in *bold*. The first highlighted clause could be interpreted to be a weaker constraint that it first seems. In particular, it could be read as only disallowing *splitting* a load into multiple pieces, and not disallowing *duplicating* an load. Internally, several folks read it that way. I believe that is an incorrect interpretation and that duplication must also be disallowed by the text. My reasoning is as follows:

Sinking a load into the header of a loop with a trip count > 1 increases the number of times the load is run dynamically.

Rematerialization (which is explicitly allowed as a disallowed transformation) is the insertion of loads by the register allocator at use sites. LoopSinking and rematerialization end up producing *exactly the same code* on the following example:

L = load atomic %p
while (foo()) {

call_which_clobbers_all_regs();
use L

}

Both transforms would want to produce:

while (foo()) {

call_which_clobbers_all_regs();
L = load atomic %p
use L

}

If this were legal for LoopSink, then it would also be legal for rematerialization. And it's definitely not, as explicitly stated in the text. As such, it must be the case that LoopSink is disallowed for this case as well, which requires the broader interpretation of the initial sentence.

Given this, I must conclude that LoopSink is currently buggy as we do sink unordered atomic loads into loops.

I agree. Sinking the load into the loop is equivalent to rematerialization.