Page MenuHomePhabricator

[RFC] Enhance synchscope representation
ClosedPublic

Authored by kzhuravl on Jun 24 2016, 11:19 PM.

Details

Summary

As described in this proposal: https://groups.google.com/forum/#!topic/llvm-dev/GtWfCc5j-4U

Motivation:
In OpenCL 2.x, two atomic operations on the same atomic object need to have the same scope to prevent a data race. This derives from the definition of "inclusive scope" in OpenCL 2.x. Encoding OpenCL 2.x scope as metadata in LLVM IR would be a problem because there cannot be a "safe default value" to be used when the metadata is dropped. If the "largest" scope is used as the default, then the optimizer must guarantee that the metadata is dropped from every atomic operation in the whole program, or not dropped at all.

We cannot use the metadata approach since this metadata can be dropped during the processing of one module but not dropped in the processing of a second module, potentially resulting in inconsistent scopes for synchronizing operations leading to data races and subsequently leading to correctness issues.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
t-tye added inline comments.May 5 2017, 6:34 PM
lib/CodeGen/MIRParser/MIParser.cpp
2082 ↗(On Diff #97997)

To be consistent with the LLVM IR syntax should this use a quoted string rather than an identified? Would need to add a MIToken::dquote and then consumeIfPresent(MIToken::dquote) before/after.

lib/CodeGen/MIRPrinter.cpp
1032 ↗(On Diff #97997)

To be consistent with the LLVM IR syntax should this be a quoted string?

lib/IR/Core.cpp
2747 ↗(On Diff #97997)

Should this and the other atomic instruction support building with a syncscope?

t-tye added inline comments.May 6 2017, 10:27 AM
lib/Bitcode/Reader/BitcodeReader.cpp
1755 ↗(On Diff #97997)

"Invalid multiple sync scope names blocks"

4600 ↗(On Diff #97997)

Should this assert(Val < SSIDs.size()) to ensure that the bitcode is not using a value that is not in the SYNC_SCOPE_NAMES_BLOCK_ID block?

t-tye added inline comments.May 6 2017, 10:38 AM
lib/Bitcode/Reader/BitcodeReader.cpp
4600 ↗(On Diff #97997)

Actually, since it is the bit code reader I guess it should use error().

if (Val > SSIDs.size())
  return error("Invalid sync scope");
return SSIDs[Val];
t-tye added inline comments.May 6 2017, 10:45 AM
lib/Bitcode/Reader/BitcodeReader.cpp
1784 ↗(On Diff #97997)

If the sync scope block is present, should be required to have at least one entry? That would be consistent with the assert for checking for duplicate blocks above (since it relies on SSIDs not being empty), and the writer seems to ensure that it never generates empty blocks. (In practice there will always be 2 entries for the singlethread and empty system scopes, but that could change in the future.)

if (SSIDs.empty())
  return error("Invalid empty sync scope names block");
kzhuravl updated this revision to Diff 98223.May 8 2017, 3:29 PM
kzhuravl marked 8 inline comments as done.

Address review comments.

lib/Bitcode/Reader/BitcodeReader.cpp
4600 ↗(On Diff #97997)

Mapped unknown to system as it was done before.

lib/IR/Core.cpp
2747 ↗(On Diff #97997)

I am not sure. Added a TODO.

t-tye added inline comments.May 8 2017, 9:01 PM
lib/Bitcode/Reader/BitcodeReader.cpp
1784 ↗(On Diff #97997)

I did not see that SSIDs.empty() being reported as an error after the loop has processed the block's entries..

lib/CodeGen/MIRParser/MIParser.cpp
2447–2459 ↗(On Diff #98223)

This does not match the syntax of a string in the LLVM IR parser (see LLLexer::ReadString()). It parses up to the next double quote, and then unescapes // and /nn.

Seems MIR should support the same values as the LLVM IR otherwise the MIR will give errors parsing sync scope names that are legal in the LLVM IR.

Similarly the printing of MIR syncscope should match any escaping () done.

Alternatively LLVM IR should enforce that the sync scope name strings are just identifiers in double quotes, and the documentation for sync scope updated to state what sync scope name strings are allowed.

kzhuravl added inline comments.May 8 2017, 9:10 PM
lib/Bitcode/Reader/BitcodeReader.cpp
1784 ↗(On Diff #97997)

lines 1766-1767.

lib/CodeGen/MIRParser/MIParser.cpp
2447–2459 ↗(On Diff #98223)

ok, I can take a look.

kzhuravl updated this revision to Diff 98352.May 9 2017, 2:48 PM
kzhuravl marked 4 inline comments as done.

Address review feedback.

pcc edited edge metadata.May 9 2017, 3:13 PM

The bitcode parts seem fine to me.

lib/IR/LLVMContextImpl.cpp
218 ↗(On Diff #98223)

Unused function

kzhuravl added inline comments.May 9 2017, 3:23 PM
lib/IR/LLVMContextImpl.cpp
218 ↗(On Diff #98223)

I was planning to start using it in the follow up patch, which implements AMDGPU's memory model. Would that be ok to leave this function in this patch? Or should it be moved to the follow up patch?

pcc added inline comments.May 9 2017, 3:30 PM
lib/IR/LLVMContextImpl.cpp
218 ↗(On Diff #98223)

Please move it to the follow-up so that we can see how it is used in context.

t-tye accepted this revision.May 9 2017, 3:40 PM

LGTM

kzhuravl updated this revision to Diff 98364.May 9 2017, 4:05 PM
kzhuravl marked 3 inline comments as done.

Address review feedback: remove unused function.

Is there a MIR test?

lib/CodeGen/MIRPrinter.cpp
709 ↗(On Diff #98696)

I suspect Using MF would be shorter here.

test/Bitcode/atomic-no-syncscope.ll
1 ↗(On Diff #98696)

Can you add a one line overview of what this test is doing?

Is there a MIR test?

Will test/CodeGen/MIR/AArch64/atomic-memoperands.mir count as a MIR test? Or should there be an additional test?

I will take care of other comments shortly.

Thanks.

I think so, but what would be nice to have a test that round-trip a a few non-standard strings (i.e. not "singlethread" or "system") through MIR and through bitcode.

I think so, but what would be nice to have a test that round-trip a a few non-standard strings (i.e. not "singlethread" or "system") through MIR and through bitcode.

Sounds good, I will add that one.

kzhuravl updated this revision to Diff 99335.May 17 2017, 12:07 PM
kzhuravl marked 2 inline comments as done.

Address review feedback.

mehdi_amini added inline comments.May 22 2017, 7:51 PM
docs/LangRef.rst
2173 ↗(On Diff #71403)

So labelling an atomic operation with a syncscope named "workgroup" would not be correct according to the definition above:

``If an atomic operation is marked ``syncscope("<a>")``, where ``<a>`` is target specific synchronization scope, then it *synchronizes with*, and participates in the seq\_cst total orderings of, other atomic operations marked ``syncscope("<a>")``

That seems like an issue to me.

t-tye added inline comments.May 22 2017, 9:07 PM
docs/LangRef.rst
2173 ↗(On Diff #71403)

A suggested improvement would be:

If an atomic operation is marked ``syncscope("<a>")``, where ``<a>`` is target
specific synchronization scope, then it *synchronizes with*, and participates in
the seq\_cst total orderings of, other atomic operations marked
``syncscope("<a>")`` that a members of the same instance of scope ``<a>``.

The thing that is missing from the text is the notion of scope instances. When a thread is created, it is implicitly defined which scope instances it is a member. For example, when an OpenCL dispatch is executed, it specifies a device that it will be executed on. All threads for the dispatch are considered members of the agent scope instance for that device. Other dispatches executing on a different device will be members of the agent scope instance for that other device. Atomic operations marked `syncscope("agent") will only *synchronizes with*, and participates in the seq\_cst total orderings of, other atomic operations that are marked syncscope("agent")` and are members of the same agent scope instance.

Furthermore, the threads of a dispatch are partitioned into separate work-group instances. Atomic operations marked `syncscope("workgroup") will only *synchronizes with*, and participates in the seq\_cst total orderings of, other atomic operations that are marked syncscope("workgroup")` and are members of the same work-group scope instance.

Furthermore, the threads of a work-group are partitioned into separate subgroup instances. Atomic operations marked `syncscope("wave") will only *synchronizes with*, and participates in the seq\_cst total orderings of, other atomic operations that are marked syncscope("wave")` and are members of the same subgroup scope instance.

The OpenCL memory model currently requires the scope specified by atomic memory operations must exactly match for them to be considered compatible scope instances. However, other memory models exist that have scopes, and allow scope inclusion where threads are considered in compatible scope instances if they are members of the same scope instance specified in the operation, or some smaller scope in the scope hierarchy.

Since scopes are target specific one would need to ask the target if two atomic operations "may" synchronize.

mehdi_amini added inline comments.May 22 2017, 10:41 PM
docs/LangRef.rst
2173 ↗(On Diff #71403)

The thing that is missing from the text is the notion of scope instances.

Yes and it seems to me that it is a significant issue with the design of syncscope here: we can't just have a yes/no answer when considering the synchronization of two operations, but we need consider the "may" as well (so the query answer would be {must, may, no}).
It also means that a generic analysis without target support can't infer anything based on the syncscopes, they are purely opaque.

I think at least @sanjoy or @reames should re-review this (and the text needs to be fixed to reflect the "may").

kzhuravl updated this revision to Diff 100013.May 23 2017, 3:36 PM

Update documentation based on Tony's suggestion.

kzhuravl marked an inline comment as done.

@sanjoy is already a review. I have added @reames as a reviewer.

docs/LangRef.rst
2173 ↗(On Diff #71403)

Text has been updated to include scope instances.

Once syncscopes are available, the next step could be to introduce tti queries that optimizations may need such as if two atomic operations "may" synchronize. Are there any current optimizations that need such queries now? If not, that could be done as follow up work when we know what is needed.

mehdi_amini added inline comments.May 23 2017, 4:22 PM
docs/LangRef.rst
2193 ↗(On Diff #100013)

scope instance aren't defined.
And I don't even see why bringing this here. Unless there is a generic and relevant IR definition of scope instance, what matters is just that the target can say yes/no/maybe.

kzhuravl updated this revision to Diff 100024.May 23 2017, 5:11 PM

Reword to drop scope instance.

kzhuravl marked an inline comment as done.May 23 2017, 5:11 PM
svenvh added a subscriber: svenvh.May 30 2017, 6:05 AM
t-tye accepted this revision.May 31 2017, 12:13 PM

LGTM

mehdi_amini requested changes to this revision.May 31 2017, 12:30 PM

LGTM

still waiting for input from @sanjoy or @reames (or @chandler maybe?)

This revision now requires changes to proceed.May 31 2017, 12:30 PM

To expand: my current concern is that the issue spotted with "instance" of scope makes the whole mechanism entirely opaque now, I'm not longer sure what is the point of specifying it at the IR level in a target-independent way.
@t-tye you were the one mentioning some generic understanding of the code as one of the original motivation IIRC?

t-tye added a comment.May 31 2017, 1:08 PM

To expand: my current concern is that the issue spotted with "instance" of scope makes the whole mechanism entirely opaque now, I'm not longer sure what is the point of specifying it at the IR level in a target-independent way.
@t-tye you were the one mentioning some generic understanding of the code as one of the original motivation IIRC?

I think an important motivation for this change is to be able to express the memory models used in GPU languages such as OpenCL within the LLVM atomics framework. This simplifies how GPU code generators implement support for atomics (for example, D24623 which is based on the earlier version of this patch but can easily be updated for the current version).

It also allows frontends to use the same approach to generate LLVM IR for atomics directly as is currently done for C++. This would allow GPU languages such as OpenCL to do the same (see D28691 which is similarly based on the earlier version but can be updated).

Using a common approach to representing atomics also would make it easier to add support for them to the memory sanitize and optimizations. To do this, suitable TTI queries can be added to provide the information needed as is done for other target specific features.

There have been others (@raunintc, @PFerreira, @sheredom and @rampitec) that have expressed a desire for LLVM to provide memory scope support for GPUs. The proposed approach has no impact on targets that do not require any hardware support for memory scopes, and is backwards compatible with previous LLVM bitcode. The changes suggested by @mehdi_amini have made it much easier to work with at the LLVM textual level.

sanjoy added inline comments.Jun 4 2017, 9:57 PM
docs/LangRef.rst
2192 ↗(On Diff #100024)

This middle paragraph was not clear to me. Are you trying to say that given two atomic operations op0 marked with syncscope("scope-0") and op1 marked with syncscope("scope-1"), whether they synchronize or not is a target specific function of "scope-0" and "scope-1"? If so, why not say just that?

t-tye added inline comments.Jun 5 2017, 12:21 AM
docs/LangRef.rst
2192 ↗(On Diff #100024)

It is actually also target dependent even if scope-0 is the same as scope-1. The intend of the text was to convey that it may be possible to determine if two atomic operations definitely do synchronize, definitely do not, or may synchronize as suggested by @mehdi_amini in an earlier comment.

However, would the following be better?

If an atomic operation is marked ``syncscope("<target-scope>")``, where
``<target-scope>`` is a target specific synchronization scope, then it is target
dependent if it *synchronizes with*, and participates in the seq\_cst total orderings of, other atomic operations.
sanjoy added inline comments.Jun 5 2017, 1:04 PM
docs/LangRef.rst
2192 ↗(On Diff #100024)

That sounds better to me.

I'd also like you to please clarify what exactly you mean by "synchronizes with" -- is this the "synchronizes with" relation used in the C++ memory model or something different?

t-tye added inline comments.Jun 6 2017, 10:55 AM
docs/LangRef.rst
2192 ↗(On Diff #100024)

Yes it is the same term as used in the C++ memory model and defined in Memory Model for Concurrent Operations. The term is also being used elsewhere in this section to describe acquire and release etc.

kzhuravl updated this revision to Diff 101603.Jun 6 2017, 12:43 PM
kzhuravl edited edge metadata.
kzhuravl marked 2 inline comments as done.

Update docs as suggested.

(Clarification the "code" part LGTM, so as soon as Sanjoy is OK with LangRef, that's fine with me)

Anastasia added inline comments.Jun 8 2017, 2:53 PM
docs/LangRef.rst
2192 ↗(On Diff #101603)

Why is it only seq_cst? I am guessing the other ordering types (e.g. acquire/release) be used with the synchronization scopes too? I think this is a bit confusing here...

t-tye added inline comments.Jun 8 2017, 3:18 PM
docs/LangRef.rst
2192 ↗(On Diff #101603)

This text was based on the style of writing used in the description for singlethread above. My interpretation of that text was that it is providing restrictions on the regular C++ memory model. So the manner in which the atomic interacts with the global sequential consistency ordering still holds (in other words an acquire or release does not interact with it, but a seq_cst does). Given the level of detail of this text it seemed reasonable as a user really needs to go read the formal definition of the memory models anyway to truly understand the semantics.

kzhuravl updated this revision to Diff 102592.Jun 14 2017, 12:33 PM

Bring up to date with trunk.

Anastasia accepted this revision.Jun 21 2017, 4:07 AM

LGTM from the OpenCL programming support side! Thanks!

docs/LangRef.rst
2192 ↗(On Diff #101603)

Yes, I though it was copied from the previous text. Perhaps over-complicating it with the ordering types of C++11 isn't a good idea indeed... although we could mention it briefly and add a reference.

Thanks for review @mehdi_amini and @Anastasia. @sanjoy would you also be able to take a look?

Thanks,
Konstantin

kzhuravl updated this revision to Diff 105496.Jul 6 2017, 11:44 AM

Bring up to date with trunk (resolve a conflict).

Ping. Hi @sanjoy, can you take a look?

Thanks.

sanjoy edited edge metadata.Jul 10 2017, 10:30 PM

langref changes lgtm

docs/LangRef.rst
2221 ↗(On Diff #105496)

Minor, but should this be "orderings of other atomic operations"? Same for later.

mehdi_amini accepted this revision.Jul 11 2017, 10:19 AM

With @sanjoy approval of langref, LGTM.

(feel free to commit with @sanjoy minor comment).

This revision is now accepted and ready to land.Jul 11 2017, 10:19 AM
This revision was automatically updated to reflect the committed changes.
kzhuravl marked an inline comment as done.