This is an archive of the discontinued LLVM Phabricator instance.

[RFC] Enhance synchscope representation
ClosedPublic

Authored by kzhuravl on Jun 24 2016, 11:19 PM.

Details

Summary

As described in this proposal: https://groups.google.com/forum/#!topic/llvm-dev/GtWfCc5j-4U

Motivation:
In OpenCL 2.x, two atomic operations on the same atomic object need to have the same scope to prevent a data race. This derives from the definition of "inclusive scope" in OpenCL 2.x. Encoding OpenCL 2.x scope as metadata in LLVM IR would be a problem because there cannot be a "safe default value" to be used when the metadata is dropped. If the "largest" scope is used as the default, then the optimizer must guarantee that the metadata is dropped from every atomic operation in the whole program, or not dropped at all.

We cannot use the metadata approach since this metadata can be dropped during the processing of one module but not dropped in the processing of a second module, potentially resulting in inconsistent scopes for synchronizing operations leading to data races and subsequently leading to correctness issues.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
kzhuravl updated this revision to Diff 69264.Aug 25 2016, 10:16 AM

Update LangRef

kzhuravl added a subscriber: alex-t.
mehdi_amini added inline comments.
docs/LangRef.rst
2164 ↗(On Diff #69264)

I feel the "of" should be before the comma?

2175 ↗(On Diff #69264)

It is not clear to me why we need to specify this level of details? It seems opaque to optimizer anyway.
I'll defer to Sanjoy and/or Hal for this.

kzhuravl updated this revision to Diff 69297.Aug 25 2016, 2:54 PM
kzhuravl marked an inline comment as done.

Partially address review feedback + update number of subclass data bits used in MemSDNode

kzhuravl updated this revision to Diff 69315.Aug 25 2016, 7:50 PM
kzhuravl marked an inline comment as done.

Reduce level of details in docs

sanjoy edited edge metadata.Aug 30 2016, 5:58 PM

FYI: I've replied on the llvm-dev thread

I stumbled onto this as I was going through the mailing list.
We (Imagination) have a metadata solution in place and while we didn't have any problems describing the memory scopes all the way through to our codegen (the metadata didn't get lost as far as we can tell), the solution presented here does look more promising. We'd be interested in seeing this mainstreamed.

This is something we (Codeplay) would like to see upstream, it is a much cleaner solution than the metadata workarounds everyone (including us) have been using to fix this.

I minorly dislike synchscope as the term for it - syncscope reads better to me (I read the former as 'Singe Scope', the latter as 'Sink Scope').

include/llvm/IR/Instructions.h
40 ↗(On Diff #69315)

Prededined -> Predefined

Have you considered a named syncscope of System or SoC? This would be similar to the OpenCL 2.0 all_devices scope, but generic enough that I think it could be useful in other domains. Even as is, I would take advantage of these changes.

@raunintc, the bitmask approach that @reames suggested may be a better approach. Could this be tackled in a separate patch?

kzhuravl updated this revision to Diff 71377.Sep 14 2016, 10:07 AM
kzhuravl edited edge metadata.

Fix minor typos

kzhuravl updated this revision to Diff 71378.Sep 14 2016, 10:08 AM

Ignore previous diff update, I have updated the wrong review by accident

jlebar added a subscriber: jlebar.Sep 14 2016, 10:30 AM

I am also planning to update the diff with fixes and full context

kzhuravl updated this revision to Diff 71403.Sep 14 2016, 11:30 AM
kzhuravl marked an inline comment as done.
  • rebase
  • synchscope -> syncscope
  • minor typos in docs
kzhuravl added a comment.EditedSep 14 2016, 11:33 AM

I minorly dislike synchscope as the term for it - syncscope reads better to me (I read the former as 'Singe Scope', the latter as 'Sink Scope').

Agreed, I have changed the keyword to syncscope and did relevant updates to LangRef. I would also like to rename functions and variables in the same manner (synchscope -> syncscope) to be consistent, but, if there are no objections, I would prefer to do it in a separate patch?

mehdi_amini added inline comments.Sep 14 2016, 11:48 AM
docs/LangRef.rst
2173 ↗(On Diff #71403)

I'm not convinced by the wording it is target defined how <n> affects which other operations, it can be read that even with the same scope it is still target define, which is not the intent right?

Straw man alternative:

If an atomic operation is marked ``syncscope(<n>)``, then it *synchronizes with*,
and participates in the seq\_cst total orderings of, other operations
running in the same scope. It is target defined how it interacts with operations
in a different scope.
2178 ↗(On Diff #71403)

The same target define behavior applies here with respect to other scope.

kzhuravl updated this revision to Diff 71524.Sep 15 2016, 10:55 AM

Address review feedback

kzhuravl added inline comments.Sep 15 2016, 10:56 AM
docs/LangRef.rst
2173 ↗(On Diff #71403)

Yes, you are right. I have updated the wording.

2178 ↗(On Diff #71403)

I tried to cover all cases in the paragraph above this one. Would that be ok?

mehdi_amini added inline comments.Sep 15 2016, 10:57 AM
docs/LangRef.rst
2178 ↗(On Diff #71524)

Ping on this comment?

2178 ↗(On Diff #71524)

(The email crossed)

mehdi_amini added inline comments.Sep 15 2016, 10:59 AM
docs/LangRef.rst
2178 ↗(On Diff #71524)

Looks enough indeed, thanks.

WRT the langref, Mehdi and I had a conversation on IRC, that I want to summarize here.

AIUI we're trying to accomplish two things here:

  1. Give the front-end a way to cause LLVM to generate atomic machine instructions with the "right" scope. We can't safely drop the scope.
  1. Let LLVM optimize these instructions, safely.

If all we cared about were (1), we could just use target-specific intrinsics. Moreover, if all we cared about were target-specific optimizations on these instructions, again (1) would suffice.

So being able to optimize sequences of these instructions at the IR level is important to this proposal. Otherwise, why bother? There would be no advantage over using target-specific intrinsics.

Thus, I want to dig into how optimizations would work under this proposal, in order to try to figure out if it's sufficient for the things we want to do today and also if it's not painting us into a corner for the things we think we'll want to do in the future.

There are two classes of target-agnostic IR optimizations we might want to perform:

  • Target-agnostic IR optimizations that are agnostic as to the meaning of the target's syncscopes.
  • Target-agnostic IR optimizations that are not agnostic to the meaning of the syncscopes.

Without extra information provided by the target, we can only perform syncscope-meaning-agnostic optimizations from target-agnostic IR passes. This means that, without extra information provided by the target, we can only optimize a set of atomics when, in a block of code we're considering, all of the atomic operations have the same syncscope.

In other words, without extra information provided by the target about the meaning of the syncscopes, a given block of code with non-syncscope'd atomics will never have more optimization opportunities after adding syncscopes. Without extra information provided by the target, syncscopes can only limit target-agnostic IR optimization opportunities.

This leads me to my question:

  • Will we want, within the foreseeable future, to have target-agnostic IR passes that will make use of the meaning of the syncscopes?

    I sort of think, maybe yes, it seems like a natural extension to this work. But maybe I'm wrong.

Then the next question:

  • Assuming we will want to build target-agnostic IR passes that make use of the meaning of the syncscopes, is this the right design?

    Specific questions related to this:
    • Is specifying the syncscope as a number (as opposed to, I guess, a string, or referencing a metadata node) the right way to go? On the one hand, using numbers matches what we do for address spaces, which also have target-specific meanings. On the other hand, anyone know what NVPTX address space 4 means off the top of their heads?
    • What are some examples of optimizations that you might do if you understood what the syncscopes meant? What information would we need the target to give to our target-independent IR pass in order for it to perform these optimizations?
    • Should the information above be encoded into the module itself, or should it be a property of the target?
docs/LangRef.rst
2174 ↗(On Diff #71524)

How about "marked `syncscope(<m>) with m != n`" or something like that?

kzhuravl updated this revision to Diff 72008.Sep 20 2016, 11:30 PM
kzhuravl marked an inline comment as done.
  • Address review feedback
  • Switch syncscope from 32 bits to 8 bits
    • 8 bits should be enough for OpenCL in the current approach and in the bitmask approach
    • Plays along nice with packing in MachineMemOperand (D24577)
    • Would there be any objections?
docs/LangRef.rst
2174 ↗(On Diff #71524)

Sounds better, thanks

tony-tye edited edge metadata.Sep 21 2016, 9:30 AM

I would argue that even without (2) there is a case for using standard LLVM atomic instructions as it makes it possible to generate the same IR for a given language regardless of whether the target supports scopes. It seems CLANG is moving towards generating LLVM IR atomics directly rather than calls to built-ins so this approach would support that. Currently LLVM already supports two scopes. It also makes code generation much simpler as the existing machine instructions can be used and so avoid creating a large number of pseudo instructions. The patches D24577 and D24623 that are under review are pretty simple and do that. There would be a lot more code if intrinsics had to be used.

But I agree that a major motivation for this is so that optimizations can be written that will benefit all languages including those that have scopes. For example, currently it appears that atomic machine instructions have to be marked as volatile to ensure that the machine code instruction scheduler will insert the necessary dependencies. This is conservatively correct, but if the memory machine instructions had the memory ordering information available it would be possible to only add dependencies in the direction required in a target neutral way. This does not require knowledge of the memory scope.

Optimizations that merge a finite series of atomics to the same location would require some knowledge of the memory scopes when they do not all match. Such optimizations would not be legal for volatile, but are allowed for atomics. It seems considering this information as target specific is not unreasonable. This is how address spaces appear to be handled currently. If alias analysis was to use address space information, then it would seem it would need to query the target to determine if two address spaces may alias (for example, for OpenCL the generic address space may alias other address spaces, but other address spaces do not alias each other).

Currently the concept of memory scope already exists in LLVM as the SynchonizationScope enumeration, and is encoded in bit code as a 32 bit value. Continuing to use an enumeration does not affect backwards compatibility and is adequate for code generation. Are there existing atomics optimizations that need to be updated to honor memory scope? Currently there are no uses of SynchonizationScope in any optimizations, even though there are two scopes currently defined. Are there plans for future atomic optimizations?

If in writing future optimizations a better representation is devised then it could always be changed as a separate patch at that time.

Tony, thanks for your comments.

I would argue that even without (2) there is a case for using standard LLVM atomic instructions as it makes it possible to generate the same IR for a given language regardless of whether the target supports scopes.

OK. Personally I don't find this particularly compelling, but I agree it's something. And it sounds like you agree below that it's not the most important reason.

Currently it appears that atomic machine instructions have to be marked as volatile to ensure that the machine code instruction scheduler will insert the necessary dependencies. This is conservatively correct, but if the memory machine instructions had the memory ordering information available it would be possible to only add dependencies in the direction required in a target neutral way. This does not require knowledge of the memory scope.

Sure, although this could also be solved by adding some sort of metadata to our intrinsic functions so that their operands are annotated correctly. It seems to me that this would be desirable even if we were to add this support for scoped atomics, because it'd be useful for other intrinsics?

For example, LLVM IR doesn't appear to have a compare-and-swap instruction, and until one is added, we're stuck using target-specific intrinsics for those (right?). But whatever volatility stuff you want to remove from atomic intrinsics, presumably you'd also want to remove from a CAS intrinsic?

If the above is right I don't find it a particularly compelling reason to go with this approach. (That is, I come back to "letting us write generic IR optimizations that take advantage of target-specific information about their atomics.")

If alias analysis was to use address space information, then it would seem it would need to query the target to determine if two address spaces may alias (for example, for OpenCL the generic address space may alias other address spaces, but other address spaces do not alias each other).

Target-specific AA may be instructive. See D24441 and D12414, my attempt at adding an NVPTX-specific AA. We could formulate this as "address spaces X and Y cannot alias", but it would be much less powerful than what I have in that patch.

Currently the concept of memory scope already exists in LLVM as the SynchonizationScope enumeration, and is encoded in bit code as a 32 bit value.

If in writing future optimizations a better representation is devised then it could always be changed as a separate patch at that time.

I guess I feel like this kind of sidesteps the question, which is, would one of the representations I mooted (or some other representation) be better? I acknowledge that the current SycnchScope enum is like address spaces, but as I wrote above, I'm not convinced the way we do target-specific address spaces is good, because it makes our IR very hard to read -- you as the human have to keep this mapping in your head. It'll be worse in some way with atomics because they're uncommon, so you're less likely to have the mapping memorized.

Currently there are only two possible values for the synchscope, so maybe it's not so painful at the moment, but we're proposing adding a lot more. As you say, there isn't a lot of code today that uses these values, so I guess I'm saying that changing now may be better than changing after we write a bunch of code that relies on the opaque numeric representation. Or at least, I'd like us to think about it.

Currently there are no uses of SynchonizationScope in any optimizations, even though there are two scopes currently defined. Are there plans for future atomic optimizations?

I think this gets to the heart of the matter; we should understand the use-cases for this metadata.

Tony, thanks for your comments.

I would argue that even without (2) there is a case for using standard LLVM atomic instructions as it makes it possible to generate the same IR for a given language regardless of whether the target supports scopes.

OK. Personally I don't find this particularly compelling, but I agree it's something. And it sounds like you agree below that it's not the most important reason.

Currently it appears that atomic machine instructions have to be marked as volatile to ensure that the machine code instruction scheduler will insert the necessary dependencies. This is conservatively correct, but if the memory machine instructions had the memory ordering information available it would be possible to only add dependencies in the direction required in a target neutral way. This does not require knowledge of the memory scope.

Sure, although this could also be solved by adding some sort of metadata to our intrinsic functions so that their operands are annotated correctly. It seems to me that this would be desirable even if we were to add this support for scoped atomics, because it'd be useful for other intrinsics?

For example, LLVM IR doesn't appear to have a compare-and-swap instruction, and until one is added, we're stuck using target-specific intrinsics for those (right?). But whatever volatility stuff you want to remove from atomic intrinsics, presumably you'd also want to remove from a CAS intrinsic?

I think we were originally talking about machine instruction scheduler? Ordering information is available in MachineMemoryOperand since rL284312. As long as you lower your target-specific intrinsics to a machine instruction which has MachineMemoryOperand, machine instruction scheduler should be able to use ordering information as described by @tony-tye. Also, there is compare-and-swap instruction in LLVM IR: http://llvm.org/docs/LangRef.html#cmpxchg-instruction

If the above is right I don't find it a particularly compelling reason to go with this approach. (That is, I come back to "letting us write generic IR optimizations that take advantage of target-specific information about their atomics.")

If alias analysis was to use address space information, then it would seem it would need to query the target to determine if two address spaces may alias (for example, for OpenCL the generic address space may alias other address spaces, but other address spaces do not alias each other).

Target-specific AA may be instructive. See D24441 and D12414, my attempt at adding an NVPTX-specific AA. We could formulate this as "address spaces X and Y cannot alias", but it would be much less powerful than what I have in that patch.

Currently the concept of memory scope already exists in LLVM as the SynchonizationScope enumeration, and is encoded in bit code as a 32 bit value.

If in writing future optimizations a better representation is devised then it could always be changed as a separate patch at that time.

I guess I feel like this kind of sidesteps the question, which is, would one of the representations I mooted (or some other representation) be better? I acknowledge that the current SycnchScope enum is like address spaces, but as I wrote above, I'm not convinced the way we do target-specific address spaces is good, because it makes our IR very hard to read -- you as the human have to keep this mapping in your head. It'll be worse in some way with atomics because they're uncommon, so you're less likely to have the mapping memorized.

Currently there are only two possible values for the synchscope, so maybe it's not so painful at the moment, but we're proposing adding a lot more. As you say, there isn't a lot of code today that uses these values, so I guess I'm saying that changing now may be better than changing after we write a bunch of code that relies on the opaque numeric representation. Or at least, I'd like us to think about it.

Currently there are no uses of SynchonizationScope in any optimizations, even though there are two scopes currently defined. Are there plans for future atomic optimizations?

I think this gets to the heart of the matter; we should understand the use-cases for this metadata.

Currently there are no generic atomic optimizations, so there are no syncscope opaque numbers used in generic code. When generic atomic optimizations are introduced, we could work together on either improving the syncscope representation or ways to query targets about the syncscope relationships as a separate patch.

There is a CLANG patch (D28691), that takes advantage of this to generate atomic LLVM IR instructions directly from generic builtins.

arsenm added inline comments.Mar 13 2017, 9:00 AM
lib/Target/AMDGPU/AMDGPU.h
14 ↗(On Diff #72008)

Including this here looks weird. Can we move the dependent enum somewhere else?

rampitec accepted this revision.Mar 13 2017, 11:07 AM

I'm not the code owner, this is more a vote. However, we have been struggling with the synchronization scope for few years and two backends now. Neither metadata nor intrinsics work well and this is the only comprehensive solution I have seen so far. The issue first arose with OpenCL 2.0 implementation and it looks like lack of support is going to impact us more and more. It also looks like not only we need this, there are also other parties interested. All of that makes me think IR extension is well justified.

We need an entry in the bitcode compatibility test as well.

include/llvm/Bitcode/LLVMBitCodes.h
381 ↗(On Diff #72008)

if the plan is to be able to add new scopes (cf previous question), how is it supposed to be translated on the encoding side?

include/llvm/IR/Instructions.h
49 ↗(On Diff #72008)

Is the plan to be able to add new scopes?

kzhuravl planned changes to this revision.Mar 13 2017, 12:47 PM

Need to address review feedback and answer questions.

mehdi_amini added inline comments.Mar 13 2017, 12:55 PM
include/llvm/Bitcode/LLVMBitCodes.h
381 ↗(On Diff #72008)

I think I mentioned in the past using target specific strings instead of integer in textual and bitcode serialization, why was this ruled out? This was intended to addressed such issues (and make the textual IR more readable)

arsenm added inline comments.Mar 13 2017, 1:19 PM
include/llvm/Bitcode/LLVMBitCodes.h
381 ↗(On Diff #72008)

This was done and then reverted to this plan. I think using a string is overkill and would make the textual IR less readable. Other places in the IR that use string-like inputs refer to them through metadata nodes, which requires indirection to see what it actually is. As it is this isn't really different from target defined address space IDs today.

mehdi_amini added inline comments.Mar 13 2017, 1:59 PM
include/llvm/Bitcode/LLVMBitCodes.h
381 ↗(On Diff #72008)

Can you explain how this:

atomicrmw volatile xchg i32* %x, i32 10 syncscope(5) monotonic

is more readable than this:

atomicrmw volatile xchg i32* %x, i32 10 syncscope("image") monotonic

?

arsenm added inline comments.Mar 13 2017, 2:04 PM
include/llvm/Bitcode/LLVMBitCodes.h
381 ↗(On Diff #72008)

I was thinking it would end up looking like

atomicrmw volatile xchg i32* %x, i32 10 syncscope(!5) monotonic

....
!5 = !{"image"}

mehdi_amini added inline comments.Mar 13 2017, 2:31 PM
include/llvm/Bitcode/LLVMBitCodes.h
381 ↗(On Diff #72008)

AFAIK the reason we use metadata to ember string attached to instruction is usually either 1) use the generic MD attachment framework or 2) intrinsic arguments (calls arguments can be metadata).

That said I would try to have a textual syntax for intrinsic calls that takes MDString: call llvm.something(!"some_string") if I was designing such intrinsic (I think that's off-topic here anyway).

I think I mentioned in the past using target specific strings instead of integer in textual and bitcode serialization, why was this ruled out? This was intended to addressed such issues (and make the textual IR more readable)

I do not recall original reasons we did not go with target specific strings, but just thinking from the top of my head:

  • Existing synchronization scope field is represented as 32 bits unsigned field in the bitcode
    • Introducing new synchronization scopes is naturally simple - minimal impact across the board
  • I think changing the synchronization scope representation to strings will impose unnecessary complexity
    • On maintaining backwards compatibility for the bitcode (in addition, strings might be of arbitrary length)
    • On using existing synchronization scopes (CrossThread, SingleThread) within the llvm. Do we want to represent those as strings as well?
    • Few size-sensitive structures (e.g. MMO) have synchronization scope as a member, so changing syncscope to string will increase their size
    • +additional burden for carrying through strings throughout the compilation (compared to ints)

I think I mentioned in the past using target specific strings instead of integer in textual and bitcode serialization, why was this ruled out? This was intended to addressed such issues (and make the textual IR more readable)

I do not recall original reasons we did not go with target specific strings, but just thinking from the top of my head:

  • Existing synchronization scope field is represented as 32 bits unsigned field in the bitcode

Not a problem. Just redirect through a table.

  • Introducing new synchronization scopes is naturally simple - minimal impact across the board

Can be made as easy with strings.

  • I think changing the synchronization scope representation to strings will impose unnecessary complexity
    • On maintaining backwards compatibility for the bitcode (in addition, strings might be of arbitrary length)

I don't understand the problem: add a mapping string <-> int in the bitcode once, keep an int to encode the syncscope, but use the mapping on read when the syncscope id is > current_max_number.

  • On using existing synchronization scopes (CrossThread, SingleThread) within the llvm. Do we want to represent those as strings as well?

Just in the textual format.

  • Few size-sensitive structures (e.g. MMO) have synchronization scope as a member, so changing syncscope to string will increase their size

I don't see why you're think that the internals of LLVM have to match the textual or bitcode representation. We routinely pool strings in the context.

  • +additional burden for carrying through strings throughout the compilation (compared to ints)

It seems to me that you're raising concerns that would only apply to a straight naive translation of strings in the IR to strings everywhere in the internals of LLVM. I didn't see anything that can't be addressed through a string pool in the context (and in the bitcode FWIW).

Just a wild thought:-) Since address spaces and memory scopes are both target defined concepts, it seems best for the target to define the enumeration of the ones it supports. Would it make sense for the data layout to not only give the properties of the address spaces, but also give their textual names, together with the textual names of the memory scopes supported by the target? The memory scopes would include the memory scope to use for singlethread and crossthread.

This way the target gets to enumerate the address spaces and memory scopes it supports, and provides the names that can be used in the textual dumps. The LLVM IR and bit code can use the target specific numeric values defined in the data layout. The target will then be able to define the numeric values it uses and not require any indirect mappings. Since the exact values are enumerated in the data layout a target can ensure that it supports the data layout. This also avoids using metadata, which is allowed to be deleted rendering it impossible for a target to know what the address spaces mean.

If not this then maybe something similar?

tony-tye edited reviewers, added: t-tye; removed: tony-tye.Mar 22 2017, 6:10 PM
kzhuravl updated this revision to Diff 94752.Apr 10 2017, 3:57 PM
  • Emit target specific tokens instead of integers in textual representation of llvm ir (similar to calling conventions)
  • Add bitcode test

Mehdi, did you have something like this in mind?

Thanks,

  • Konstantin
t-tye added inline comments.Apr 17 2017, 9:30 PM
lib/AsmParser/LLLexer.cpp
548 ↗(On Diff #94752)

Since there are now tokens for each syncscope value it seems it would be better to eliminate the "syncscope(ss)" syntax and just use the "ss" token in the same way "singlethread" and call convention abi tokens are used.

  • Emit target specific tokens instead of integers in textual representation of llvm ir (similar to calling conventions)
  • Add bitcode test

Mehdi, did you have something like this in mind?

Somehow: you're missing the bitcode storage using a string pool, as well as in memory in the LLVMContext, and getting rid of the AMDGPU_* enums in favor of dynamic lookup implemented in the target.

kzhuravl planned changes to this revision.Apr 21 2017, 9:22 AM
Anastasia added inline comments.Apr 25 2017, 8:41 AM
lib/IR/AsmWriter.cpp
2216 ↗(On Diff #94752)

Perhaps, I am missing something... I thought the original plan was to have a generic syncscope(num) for the targets to be able to interpret num is a specific way by different backends. Now, for outside of the main tree implementation it seems we will be forcing to extend the enum and thus maintaining the downstream changes. Would it be possible to allow arbitry numbers too without known meaning next to the known/identifiable scopes? I guess they could just be printed as integers similarly to the addrspace attribute in IR...

kzhuravl updated this revision to Diff 97997.May 5 2017, 12:18 PM
kzhuravl edited edge metadata.
  • Populate and lookup synchronization scopes through LLVMContext
  • CrossThread -> System
  • singlethread -> syncscope("singlethread")

+ Store syncscope names in the bitcode

kzhuravl added inline comments.May 5 2017, 12:23 PM
lib/IR/AsmWriter.cpp
2216 ↗(On Diff #94752)

I thought the original plan was to have a generic syncscope(num) for the targets to be able to interpret num is a specific way by different backends.

The updated change does exactly this, but with strings.

@pcc do you mind looking at the bitcode part? (and more if you feel like it).

t-tye added inline comments.May 5 2017, 6:34 PM
docs/LangRef.rst
2173 ↗(On Diff #71403)

In OpenCL, memory scopes define scope "levels" which correspond to how threads are grouped for execution. For example, a set of threads that belong to the same work-group, and specify the "workgroup" scope, only synchronize with threads in the same work-group instance. Threads in a different work-group instance do not synchronize even if they specify the "workgroup" scope.

So in general, if the scopes are target specific, it is also target specific how the the threads are grouped, and how scopes are related (for example inclusive).

I was also curious what the definition of singlethread is. My understanding was that synchronization only happens between atomic operations. So does both the signal handler and the non-signal handler code have to have atomic operations using singlethread? If used in a fence, what atomic operations pair with the fence to create the synchronizes-with relation? Generally non-atomic operations would not pair. The wording seems to imply a singlethread atomic is synchronizing with any operations in the same thread, and is presumably using thread to mean OS thread, rather than memory model notion of thread of execution which would have the signal handler and non signal handler code execution in different threads.

Since this text is fairly informal, how correct does it need to be?

7295 ↗(On Diff #97997)

Suggest [syncscope("<scope>")] to be consistent with other arguments. Same comment for other uses below.

t-tye added inline comments.May 5 2017, 6:34 PM
lib/CodeGen/MIRParser/MIParser.cpp
2082 ↗(On Diff #97997)

To be consistent with the LLVM IR syntax should this use a quoted string rather than an identified? Would need to add a MIToken::dquote and then consumeIfPresent(MIToken::dquote) before/after.

lib/CodeGen/MIRPrinter.cpp
1032 ↗(On Diff #97997)

To be consistent with the LLVM IR syntax should this be a quoted string?

lib/IR/Core.cpp
2747 ↗(On Diff #97997)

Should this and the other atomic instruction support building with a syncscope?

t-tye added inline comments.May 6 2017, 10:27 AM
lib/Bitcode/Reader/BitcodeReader.cpp
1755 ↗(On Diff #97997)

"Invalid multiple sync scope names blocks"

4600 ↗(On Diff #97997)

Should this assert(Val < SSIDs.size()) to ensure that the bitcode is not using a value that is not in the SYNC_SCOPE_NAMES_BLOCK_ID block?

t-tye added inline comments.May 6 2017, 10:38 AM
lib/Bitcode/Reader/BitcodeReader.cpp
4600 ↗(On Diff #97997)

Actually, since it is the bit code reader I guess it should use error().

if (Val > SSIDs.size())
  return error("Invalid sync scope");
return SSIDs[Val];
t-tye added inline comments.May 6 2017, 10:45 AM
lib/Bitcode/Reader/BitcodeReader.cpp
1784 ↗(On Diff #97997)

If the sync scope block is present, should be required to have at least one entry? That would be consistent with the assert for checking for duplicate blocks above (since it relies on SSIDs not being empty), and the writer seems to ensure that it never generates empty blocks. (In practice there will always be 2 entries for the singlethread and empty system scopes, but that could change in the future.)

if (SSIDs.empty())
  return error("Invalid empty sync scope names block");
kzhuravl updated this revision to Diff 98223.May 8 2017, 3:29 PM
kzhuravl marked 8 inline comments as done.

Address review comments.

lib/Bitcode/Reader/BitcodeReader.cpp
4600 ↗(On Diff #97997)

Mapped unknown to system as it was done before.

lib/IR/Core.cpp
2747 ↗(On Diff #97997)

I am not sure. Added a TODO.

t-tye added inline comments.May 8 2017, 9:01 PM
lib/Bitcode/Reader/BitcodeReader.cpp
1784 ↗(On Diff #97997)

I did not see that SSIDs.empty() being reported as an error after the loop has processed the block's entries..

lib/CodeGen/MIRParser/MIParser.cpp
2447–2459 ↗(On Diff #98223)

This does not match the syntax of a string in the LLVM IR parser (see LLLexer::ReadString()). It parses up to the next double quote, and then unescapes // and /nn.

Seems MIR should support the same values as the LLVM IR otherwise the MIR will give errors parsing sync scope names that are legal in the LLVM IR.

Similarly the printing of MIR syncscope should match any escaping () done.

Alternatively LLVM IR should enforce that the sync scope name strings are just identifiers in double quotes, and the documentation for sync scope updated to state what sync scope name strings are allowed.

kzhuravl added inline comments.May 8 2017, 9:10 PM
lib/Bitcode/Reader/BitcodeReader.cpp
1784 ↗(On Diff #97997)

lines 1766-1767.

lib/CodeGen/MIRParser/MIParser.cpp
2447–2459 ↗(On Diff #98223)

ok, I can take a look.

kzhuravl updated this revision to Diff 98352.May 9 2017, 2:48 PM
kzhuravl marked 4 inline comments as done.

Address review feedback.

pcc edited edge metadata.May 9 2017, 3:13 PM

The bitcode parts seem fine to me.

lib/IR/LLVMContextImpl.cpp
218 ↗(On Diff #98223)

Unused function

kzhuravl added inline comments.May 9 2017, 3:23 PM
lib/IR/LLVMContextImpl.cpp
218 ↗(On Diff #98223)

I was planning to start using it in the follow up patch, which implements AMDGPU's memory model. Would that be ok to leave this function in this patch? Or should it be moved to the follow up patch?

pcc added inline comments.May 9 2017, 3:30 PM
lib/IR/LLVMContextImpl.cpp
218 ↗(On Diff #98223)

Please move it to the follow-up so that we can see how it is used in context.

t-tye accepted this revision.May 9 2017, 3:40 PM

LGTM

kzhuravl updated this revision to Diff 98364.May 9 2017, 4:05 PM
kzhuravl marked 3 inline comments as done.

Address review feedback: remove unused function.

Is there a MIR test?

lib/CodeGen/MIRPrinter.cpp
709 ↗(On Diff #98696)

I suspect Using MF would be shorter here.

test/Bitcode/atomic-no-syncscope.ll
1 ↗(On Diff #98696)

Can you add a one line overview of what this test is doing?

Is there a MIR test?

Will test/CodeGen/MIR/AArch64/atomic-memoperands.mir count as a MIR test? Or should there be an additional test?

I will take care of other comments shortly.

Thanks.

I think so, but what would be nice to have a test that round-trip a a few non-standard strings (i.e. not "singlethread" or "system") through MIR and through bitcode.

I think so, but what would be nice to have a test that round-trip a a few non-standard strings (i.e. not "singlethread" or "system") through MIR and through bitcode.

Sounds good, I will add that one.

kzhuravl updated this revision to Diff 99335.May 17 2017, 12:07 PM
kzhuravl marked 2 inline comments as done.

Address review feedback.

mehdi_amini added inline comments.May 22 2017, 7:51 PM
docs/LangRef.rst
2173 ↗(On Diff #71403)

So labelling an atomic operation with a syncscope named "workgroup" would not be correct according to the definition above:

``If an atomic operation is marked ``syncscope("<a>")``, where ``<a>`` is target specific synchronization scope, then it *synchronizes with*, and participates in the seq\_cst total orderings of, other atomic operations marked ``syncscope("<a>")``

That seems like an issue to me.

t-tye added inline comments.May 22 2017, 9:07 PM
docs/LangRef.rst
2173 ↗(On Diff #71403)

A suggested improvement would be:

If an atomic operation is marked ``syncscope("<a>")``, where ``<a>`` is target
specific synchronization scope, then it *synchronizes with*, and participates in
the seq\_cst total orderings of, other atomic operations marked
``syncscope("<a>")`` that a members of the same instance of scope ``<a>``.

The thing that is missing from the text is the notion of scope instances. When a thread is created, it is implicitly defined which scope instances it is a member. For example, when an OpenCL dispatch is executed, it specifies a device that it will be executed on. All threads for the dispatch are considered members of the agent scope instance for that device. Other dispatches executing on a different device will be members of the agent scope instance for that other device. Atomic operations marked `syncscope("agent") will only *synchronizes with*, and participates in the seq\_cst total orderings of, other atomic operations that are marked syncscope("agent")` and are members of the same agent scope instance.

Furthermore, the threads of a dispatch are partitioned into separate work-group instances. Atomic operations marked `syncscope("workgroup") will only *synchronizes with*, and participates in the seq\_cst total orderings of, other atomic operations that are marked syncscope("workgroup")` and are members of the same work-group scope instance.

Furthermore, the threads of a work-group are partitioned into separate subgroup instances. Atomic operations marked `syncscope("wave") will only *synchronizes with*, and participates in the seq\_cst total orderings of, other atomic operations that are marked syncscope("wave")` and are members of the same subgroup scope instance.

The OpenCL memory model currently requires the scope specified by atomic memory operations must exactly match for them to be considered compatible scope instances. However, other memory models exist that have scopes, and allow scope inclusion where threads are considered in compatible scope instances if they are members of the same scope instance specified in the operation, or some smaller scope in the scope hierarchy.

Since scopes are target specific one would need to ask the target if two atomic operations "may" synchronize.

mehdi_amini added inline comments.May 22 2017, 10:41 PM
docs/LangRef.rst
2173 ↗(On Diff #71403)

The thing that is missing from the text is the notion of scope instances.

Yes and it seems to me that it is a significant issue with the design of syncscope here: we can't just have a yes/no answer when considering the synchronization of two operations, but we need consider the "may" as well (so the query answer would be {must, may, no}).
It also means that a generic analysis without target support can't infer anything based on the syncscopes, they are purely opaque.

I think at least @sanjoy or @reames should re-review this (and the text needs to be fixed to reflect the "may").

kzhuravl updated this revision to Diff 100013.May 23 2017, 3:36 PM

Update documentation based on Tony's suggestion.

kzhuravl marked an inline comment as done.

@sanjoy is already a review. I have added @reames as a reviewer.

docs/LangRef.rst
2173 ↗(On Diff #71403)

Text has been updated to include scope instances.

Once syncscopes are available, the next step could be to introduce tti queries that optimizations may need such as if two atomic operations "may" synchronize. Are there any current optimizations that need such queries now? If not, that could be done as follow up work when we know what is needed.

mehdi_amini added inline comments.May 23 2017, 4:22 PM
docs/LangRef.rst
2193 ↗(On Diff #100013)

scope instance aren't defined.
And I don't even see why bringing this here. Unless there is a generic and relevant IR definition of scope instance, what matters is just that the target can say yes/no/maybe.

kzhuravl updated this revision to Diff 100024.May 23 2017, 5:11 PM

Reword to drop scope instance.

kzhuravl marked an inline comment as done.May 23 2017, 5:11 PM
svenvh added a subscriber: svenvh.May 30 2017, 6:05 AM
t-tye accepted this revision.May 31 2017, 12:13 PM

LGTM

mehdi_amini requested changes to this revision.May 31 2017, 12:30 PM

LGTM

still waiting for input from @sanjoy or @reames (or @chandler maybe?)

This revision now requires changes to proceed.May 31 2017, 12:30 PM

To expand: my current concern is that the issue spotted with "instance" of scope makes the whole mechanism entirely opaque now, I'm not longer sure what is the point of specifying it at the IR level in a target-independent way.
@t-tye you were the one mentioning some generic understanding of the code as one of the original motivation IIRC?

t-tye added a comment.May 31 2017, 1:08 PM

To expand: my current concern is that the issue spotted with "instance" of scope makes the whole mechanism entirely opaque now, I'm not longer sure what is the point of specifying it at the IR level in a target-independent way.
@t-tye you were the one mentioning some generic understanding of the code as one of the original motivation IIRC?

I think an important motivation for this change is to be able to express the memory models used in GPU languages such as OpenCL within the LLVM atomics framework. This simplifies how GPU code generators implement support for atomics (for example, D24623 which is based on the earlier version of this patch but can easily be updated for the current version).

It also allows frontends to use the same approach to generate LLVM IR for atomics directly as is currently done for C++. This would allow GPU languages such as OpenCL to do the same (see D28691 which is similarly based on the earlier version but can be updated).

Using a common approach to representing atomics also would make it easier to add support for them to the memory sanitize and optimizations. To do this, suitable TTI queries can be added to provide the information needed as is done for other target specific features.

There have been others (@raunintc, @PFerreira, @sheredom and @rampitec) that have expressed a desire for LLVM to provide memory scope support for GPUs. The proposed approach has no impact on targets that do not require any hardware support for memory scopes, and is backwards compatible with previous LLVM bitcode. The changes suggested by @mehdi_amini have made it much easier to work with at the LLVM textual level.

sanjoy added inline comments.Jun 4 2017, 9:57 PM
docs/LangRef.rst
2192 ↗(On Diff #100024)

This middle paragraph was not clear to me. Are you trying to say that given two atomic operations op0 marked with syncscope("scope-0") and op1 marked with syncscope("scope-1"), whether they synchronize or not is a target specific function of "scope-0" and "scope-1"? If so, why not say just that?

t-tye added inline comments.Jun 5 2017, 12:21 AM
docs/LangRef.rst
2192 ↗(On Diff #100024)

It is actually also target dependent even if scope-0 is the same as scope-1. The intend of the text was to convey that it may be possible to determine if two atomic operations definitely do synchronize, definitely do not, or may synchronize as suggested by @mehdi_amini in an earlier comment.

However, would the following be better?

If an atomic operation is marked ``syncscope("<target-scope>")``, where
``<target-scope>`` is a target specific synchronization scope, then it is target
dependent if it *synchronizes with*, and participates in the seq\_cst total orderings of, other atomic operations.
sanjoy added inline comments.Jun 5 2017, 1:04 PM
docs/LangRef.rst
2192 ↗(On Diff #100024)

That sounds better to me.

I'd also like you to please clarify what exactly you mean by "synchronizes with" -- is this the "synchronizes with" relation used in the C++ memory model or something different?

t-tye added inline comments.Jun 6 2017, 10:55 AM
docs/LangRef.rst
2192 ↗(On Diff #100024)

Yes it is the same term as used in the C++ memory model and defined in Memory Model for Concurrent Operations. The term is also being used elsewhere in this section to describe acquire and release etc.

kzhuravl updated this revision to Diff 101603.Jun 6 2017, 12:43 PM
kzhuravl edited edge metadata.
kzhuravl marked 2 inline comments as done.

Update docs as suggested.

(Clarification the "code" part LGTM, so as soon as Sanjoy is OK with LangRef, that's fine with me)

Anastasia added inline comments.Jun 8 2017, 2:53 PM
docs/LangRef.rst
2192 ↗(On Diff #101603)

Why is it only seq_cst? I am guessing the other ordering types (e.g. acquire/release) be used with the synchronization scopes too? I think this is a bit confusing here...

t-tye added inline comments.Jun 8 2017, 3:18 PM
docs/LangRef.rst
2192 ↗(On Diff #101603)

This text was based on the style of writing used in the description for singlethread above. My interpretation of that text was that it is providing restrictions on the regular C++ memory model. So the manner in which the atomic interacts with the global sequential consistency ordering still holds (in other words an acquire or release does not interact with it, but a seq_cst does). Given the level of detail of this text it seemed reasonable as a user really needs to go read the formal definition of the memory models anyway to truly understand the semantics.

kzhuravl updated this revision to Diff 102592.Jun 14 2017, 12:33 PM

Bring up to date with trunk.

Anastasia accepted this revision.Jun 21 2017, 4:07 AM

LGTM from the OpenCL programming support side! Thanks!

docs/LangRef.rst
2192 ↗(On Diff #101603)

Yes, I though it was copied from the previous text. Perhaps over-complicating it with the ordering types of C++11 isn't a good idea indeed... although we could mention it briefly and add a reference.

Thanks for review @mehdi_amini and @Anastasia. @sanjoy would you also be able to take a look?

Thanks,
Konstantin

kzhuravl updated this revision to Diff 105496.Jul 6 2017, 11:44 AM

Bring up to date with trunk (resolve a conflict).

Ping. Hi @sanjoy, can you take a look?

Thanks.

sanjoy edited edge metadata.Jul 10 2017, 10:30 PM

langref changes lgtm

docs/LangRef.rst
2221 ↗(On Diff #105496)

Minor, but should this be "orderings of other atomic operations"? Same for later.

mehdi_amini accepted this revision.Jul 11 2017, 10:19 AM

With @sanjoy approval of langref, LGTM.

(feel free to commit with @sanjoy minor comment).

This revision is now accepted and ready to land.Jul 11 2017, 10:19 AM
This revision was automatically updated to reflect the committed changes.
kzhuravl marked an inline comment as done.