This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Select atomic loads and stores
ClosedPublic

Authored by Hahnfeld on Aug 7 2018, 9:11 AM.

Details

Summary

According to the PTX ISA, .volatile has the same memory synchronization
semantics as .relaxed.sys, so it can be used to implement monotonic
atomic loads and stores. This is important for OpenMP's atomic
construct where

  • 'read's and 'write's are lowered to atomic loads and stores, and
  • updates of float or double types are lowered into a cmpxchg loop.

(Note that PTX could do better because it has atom.add.f{32,64} but
LLVM's atomicrmw instruction only allows integer types.)
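
For readers unfamiliar with the pattern, a cmpxchg loop for a floating-point update can be sketched in portable C++ as below. This is only an illustration of the general technique, not the code Clang emits for OpenMP; the function name atomic_add_float is made up for this example, and relaxed ordering is chosen to mirror the monotonic case discussed in this patch.

```cpp
#include <atomic>

// Illustrative sketch: atomically performs `target += value` with a
// compare-exchange loop. `atomic_add_float` is a hypothetical helper name.
// Returns the value that was stored in `target` before the update.
float atomic_add_float(std::atomic<float> &target, float value) {
  // Start from a relaxed (LLVM: monotonic) load of the current value.
  float expected = target.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak writes the freshly observed value
  // into `expected`, so each retry recomputes `expected + value`.
  while (!target.compare_exchange_weak(expected, expected + value,
                                       std::memory_order_relaxed,
                                       std::memory_order_relaxed)) {
    // Retry until no other thread changed `target` in between.
  }
  return expected;
}
```

The loop only needs an atomic load and a compare-exchange, which is why selecting monotonic atomic loads matters for this lowering.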

Higher levels of atomicity (like acquire and release) need additional
synchronization properties which were added with PTX ISA 6.0 / sm_70.
So atomic loads and stores with these stronger orderings still result in an error.

Event Timeline

Hahnfeld created this revision. Aug 7 2018, 9:11 AM
jlebar removed a reviewer: jlebar. Aug 7 2018, 10:03 AM
jlebar added a subscriber: jlebar.

I defer to the others here.

tra accepted this revision. Aug 8 2018, 10:41 AM

In general, .relaxed.sys semantics do appear to match the guarantees provided by LLVM's monotonic ordering, so the patch overall looks like the right thing to do.

lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
853

I'd be more explicit -- in order to lower atomic loads with stronger guarantees we would need to use .release/.acquire, which are only available in ....

Same for the tryStore() below.

Maybe add a TODO to check whether we *are* compiling for sm_70 and, if so, use ld/st with .release/.acquire qualifiers.

This revision is now accepted and ready to land. Aug 8 2018, 10:41 AM
Hahnfeld updated this revision to Diff 159776. Aug 8 2018, 1:36 PM
Hahnfeld marked an inline comment as done.

Update comments to be more explicit about strong ordering guarantees.

lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
853

I've updated the patch and added a note that we could also use fence instructions, which were also added in sm_70. I'm not really sure whether we could achieve the same effect using membar, but the PTX documentation doesn't mention it explicitly in the section about the "Memory Consistency Model"...

This revision was automatically updated to reflect the committed changes.