This is an archive of the discontinued LLVM Phabricator instance.

[Trivial Dead] Consider any non volatile load as trivially dead independent on ordering
Needs ReviewPublic

Authored by skatkov on Apr 22 2022, 4:23 AM.

Details

Summary

None volatile load can be considered as trivial dead independent on atomic
semantic. This is based on the fact that release-acquire synchronization
happens only if load reads the value written by store-release operation.
As soone as load instruction does not have uses, so no one will check what
actually value has been read. So optimizer may suggest that load reads the
value was before store release happened and so no synchronization happened.
This allows us simply to remove this load.

Diff Detail

Event Timeline

skatkov created this revision.Apr 22 2022, 4:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 22 2022, 4:23 AM
skatkov requested review of this revision.Apr 22 2022, 4:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 22 2022, 4:23 AM
lkail added a subscriber: lkail.Apr 22 2022, 5:27 AM

This isn't trivially correct.

For example, say you have a sequence "%load1 = load seq_cst %a; %load2 = load relaxed %a". I think the later load can sort of "inherit" the sequential consistency of the earlier load; the values it can contain are restricted. (I mean, maybe I'm missing something, but it's not as simple as "no one will check what actually value has been read".)

This isn't trivially correct.

For example, say you have a sequence "%load1 = load seq_cst %a; %load2 = load relaxed %a". I think the later load can sort of "inherit" the sequential consistency of the earlier load; the values it can contain are restricted. (I mean, maybe I'm missing something, but it's not as simple as "no one will check what actually value has been read".)

I'm sorry I do not follow your example. First of all read the same memory as atomic and as not an atomic generally is not a good idea, however...

If in real execution we have an order load1, store-release in other thread, load2 then load2 does not introduce any release-acquire synchronization. So it would be incorrect to say about any inheritance.
May be I miss anything as well....

I think I wrote it backwards. Say you have something like the following:

// thread 1
store seq_cst 1, %a

// thread 2
%load1 = load monotonic %a
%load2 = load seq_cst %a
if (%load1 == 1) {
  // If %load1 produces 1, %load2 must also produce 1.
  // So %load2 acts as an acquire barrier.
}

In this example, I think it isn't legal to erase %load2.

Hi Eli, nice example. Do I understand correctly that here is the problem with exactly using of monotonic? If we used unordered (aka relaxed) then load2 can be safely removed?

The bad thing here is the no analysis will help me, so for the function:
f(int x) {

load acq a
if (x == 42) {
  sync may happend due to incoming parameter x might be result of load a monotonic.
}

}
So generally we should check all callers for this pattern which is likely not a good idea.

However if we could say that our language/runtime does not use monotonic atomic instructions (or do not use them for pointers from some addrspace or say for gc pointers) we are safe to remove such load?
Or you have some other example in your pocket?:)

Do I understand correctly that here is the problem with exactly using of monotonic? If we used unordered (aka relaxed) then load2 can be safely removed?

In that example, yes. If the first load isn't atomic, you'd have a race, which has undefined behavior. If the first load is seq_cst, the second load is redundant, so it can be removed.

So generally we should check all callers for this pattern which is likely not a good idea.

Right; it's very hard to detect cases where we can actually prove the safety.

However if we could say that our language/runtime does not use monotonic atomic instructions (or do not use them for pointers from some addrspace or say for gc pointers) we are safe to remove such load?

If all memory ops in the program are seq_cst, it ends up being okay: every operation in the program is globally sequenced anyway, so a "dead" atomic op can't impose any additional ordering restrictions. If you mix in non-atomic ops, it's more difficult to reason about, but I think you end up with a non-deterministic race in any other case where the "dead" atomic might matter.

I don't want to think about what happens if you mix in monotonic ops in other address-spaces.

Thanks, for you comment. Let me add details about my reasoning.
Let's we have load atomic %a with acquire or seq_cst semantic and this load has no uses.
According to specification this load has a ordering semantic only if it observes the value stored with release semantic to the same memory location.
If there is no uses then explicit check for is impossible. But it can be a implicit check like in example you mentioned.
So to do this implicit check we should load a value before our load and check its value. If our load must have the same value as previous load the previously loaded value may be used for such check.
Side note, let's consider there is only one store release to avoid mentioning "observe release store or later store" all the time.

So what this previous load may be:

  1. Non atomic load - not interesting due to specification declares that atomic release store and non-atomic load is undefined behavior.
  2. Atomic seq_cst or acquire load - if the previous such load already observes the store release our load is not required due to synchronization already happened and our load can be simply removed. If the previous load does not observe store release then it does not help to in implicit check.
  3. Atomic monotonic load - according to your example definitely a problem. So if there is a monotonic load before our load, its check implicitly checks our load and we cannot eliminate our load.
  4. Atomic unordered load - generally our acquire load does not prohibit moving other unordered load after our load (if it would be other memory location for sure and for the same location I guess also) and so it does not help in implicit check of our load.

So it looks like only usage of monotonic load before our load can be used for implicit check.
If it is true then we can probably add the following check in compiler:

If the monotonic atomic load is not possible to be used for accessing the memory location used in our load then we can remove our load.

It can be done on different levels:

  1. Add global flag that this complier does not use monotonic atomics.
  2. Say for this addrspace compiler does not use monotonic atomics.
  3. Say that for the current GC, gc pointers cannot be accessed with monotonic atomics.

In all cases verifier should be updated to ensure that monotonic loads are really not used for this type of pointers.

This is an idea. Do I miss anything?

kpn added a subscriber: kpn.Apr 26 2022, 6:41 AM

The part of that analysis I'm most unsure about is the handling of "acquire" loads. Overall sequential consistency goes beyond just sequencing between "release" stores and "acquire" loads.

Generally, I'd rather not go significantly outside the transforms that have been formally proven; I don't trust my intuition here.

Sequential consistency adds total order of all seq_cst operations to release-acquire semantic. However If load is not used its total order does not matter, so only acquire semantic plays role here.

But I understand your point.
All this stuff related to synchronization is complicated. The worst thing is that in this area it is easier to reject something with an example than to prove that such example is not possible.

Don't you think that adding the implementation above under the flag is more or less safe if it is off by default and we can consider it switching on some day in future. At least we can try to play with it...
Meaning I'd like to add it only if there is no visible failure in my analysis.

And thanks for the discussion anyway - the topic is pretty complex and discuss it with someone is really helpful.

Sequential consistency adds total order of all seq_cst operations to release-acquire semantic. However If load is not used its total order does not matter, so only acquire semantic plays role here.

I'm not sure this is right, but I don't really have time to dig into it. (Maybe you could end up in a situation where the result of an "acquire" load contradicts the total order, even if the seq_cst load is not used.

Don't you think that adding the implementation above under the flag is more or less safe if it is off by default and we can consider it switching on some day in future. At least we can try to play with it...

Under your proposal, we'd need some way to specify that some particular set of loads is safe to optimize more aggressively. Given that existing IR wouldn't have those markings, there's not really any point to turning off the feature by default?

More generally, if C++11 atomics aren't suitable for your use-case, it would make sense to look at other possibilities. If you're interested in pursuing this, please start a thread on Discourse.