This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Optimize `atomicrmw <op>, 0` into `load atomic` when possible
ClosedPublic

Authored by qcolombet on Feb 6 2019, 3:07 PM.

Details

Summary

This commit teaches InstCombine how to replace an atomicrmw operation
into a simple load atomic.
For a given atomicrmw <op>, this is possible when:

  1. The ordering of that operation is compatible with a load (i.e., anything that doesn't have a release semantic).
  2. <op> does not modify the value being stored

Diff Detail

Repository
rL LLVM

Event Timeline

qcolombet created this revision.Feb 6 2019, 3:07 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 6 2019, 3:07 PM
Herald added subscribers: jfb, mgorny. · View Herald Transcript
jfb added inline comments.Feb 6 2019, 3:14 PM
lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp
28

I don't think you should modify volatile this way.

33

This doesn't capture seq_cst. I think you want to list the valid ones: NotAtomic / Unordered / Monotonic / Acquire.

37

This drops SyncScope.

qcolombet marked 3 inline comments as done.Feb 6 2019, 4:20 PM
qcolombet added inline comments.
lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp
28

Good point, I'm killing the volatile store in the process, which is invalid.

33

Ok

37

What do you mean?

jfb added inline comments.Feb 6 2019, 4:21 PM
lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp
37

Oops ignore me on this one! I misread.

jfb added a comment.Feb 6 2019, 4:23 PM

FWIW you might find wg21.link/n4455 relevant :)

qcolombet updated this revision to Diff 185675.Feb 6 2019, 4:31 PM

Update:

  • Switch to listing valid ordering instead of invalid ones
  • Don't apply the transformation on volatile atomicrmw
qcolombet marked 3 inline comments as done.Feb 6 2019, 4:33 PM

Thanks for the link @jfb.

jfb accepted this revision.Feb 6 2019, 4:46 PM
jfb added a subscriber: reames.

One fix then this LGTM

lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp
39

Agreed NotAtomic is bonkers (AtomicRMWInst::Init disallows it), but Unordered might be sensible, @reames would know.

This revision is now accepted and ready to land.Feb 6 2019, 4:46 PM
qcolombet marked an inline comment as done.Feb 7 2019, 10:00 AM
qcolombet added inline comments.
lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp
39

From the language reference (http://llvm.org/docs/LangRef.html#atomic-memory-ordering-constraints):
Unordered
[...]
This ordering cannot be specified for read-modify-write operations; it is not strong enough to make them atomic in any interesting way.

And if I try to write a test case with unordered I get:
error: atomicrmw cannot be unordered

So I think we are good :).

jfb added inline comments.Feb 7 2019, 10:31 AM
lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp
39

I did not know this! Ignore me then.

This revision was automatically updated to reflect the committed changes.
reames added inline comments.Feb 8 2019, 10:05 PM
llvm/trunk/lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp
21 ↗(On Diff #185860)

Any plans to handle the other obvious cases such as xor, and, etc?

On the inverse side, we can do the same thing converting to a blind store. e.g. and w/zero, or max w/INT_MAX.

Just curious as to what plans here are.

qcolombet marked an inline comment as done.Feb 12 2019, 9:52 AM
qcolombet added inline comments.
llvm/trunk/lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp
21 ↗(On Diff #185860)

That would certainly make sense.
I didn't see motivating examples yet, but we could just write test cases that exercises those.
I'll take a look soon-ish unless you beat me at it :).

Herald added a project: Restricted Project. · View Herald TranscriptJan 7 2023, 2:53 PM

I believe that replacing atomicrmw <op> LHS, 0 with load atomic LHS. is not sound. There's an extensive discussion in a github issue https://github.com/llvm/llvm-project/issues/56450 which roughly concludes that this patch should be amended or reverted, but it doesn't seem to have tracked across to here.

In particular, anyone programming exclusively using thread fences, relaxed loads and stores, and relaxed atomicrmw, is going to have a bad time when this optimisation kicks in. That's roughly the model that nvptx steers one towards. I can work around this miscompilation by remembering to mark rmw operations as acquire-release instead of relaxed but I'd rather we fix it in trunk instead.

Hi @JonChesterfield ,

Reading through that issue, I agree.
Reverting it in https://reviews.llvm.org/D141277.

Cheers,
-Quentin