This is an archive of the discontinued LLVM Phabricator instance.

Separate {Min,Max}AtomicLockFreeWidth from MaxAtomicInlineWidth
Abandoned, Public

Authored by sunfish on Mar 30 2017, 4:27 PM.

Details

Reviewers
efriedma
Summary

Currently, clang uses MaxAtomicInlineWidth to determine both whether atomics up to a certain size can be "inlined" (i.e., not lowered to a libcall) and whether those atomics are guaranteed to always be lock-free. This implicitly assumes that if, for example, a 32-bit atomic is guaranteed lock-free, then 16-bit and 8-bit atomics are guaranteed lock-free too, because they are also below the maximum.

WebAssembly's proposed design for atomics has a guarantee that 4-byte atomics are always lock-free, but not 2-byte or 1-byte atomics. One reason is that on MIPS at least, it's not clear from available documentation that doing a 32-bit ll+sc sequence to implement an 8-bit or 16-bit atomic is safe if there's a possibility that a non-atomic access could concurrently modify any of the other 24 or 16 bits, respectively.

Clang on MIPS uses ll+sc to implement 8-bit and 16-bit atomics, and declares that they are lock-free (as they are within the MaxAtomicInlineWidth). It appears to do similar things on other targets.

Does clang know that this is always safe to do on MIPS and other targets? If there's documentation guaranteeing this, it would be desirable to feed this information back to the WebAssembly community so that the atomics proposal can be updated to reflect this.

Alternatively, if it's not safe, the attached patch may be useful. It introduces MinAtomicLockFreeWidth and MaxAtomicLockFreeWidth and uses them in the WebAssembly target to restrict guaranteed lock-free atomics to 32-bit only. It may be appropriate to make MIPS and other targets use these too; however, I have not yet made such changes.
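As a rough illustration of the separation proposed in the summary, the sketch below contrasts a single-threshold check with a min/max range check. It is not the patch itself and not clang's actual TargetInfo code: the struct, the function names, and the 64-bit inline width chosen for the WebAssembly-like target are all assumptions made for the example.

```cpp
#include <cassert>

// Hypothetical stand-in for a target description (NOT clang's TargetInfo).
struct AtomicWidths {
    unsigned maxInlineWidth;    // bits; wider atomics become libcalls
    unsigned minLockFreeWidth;  // bits; narrowest width guaranteed lock-free
    unsigned maxLockFreeWidth;  // bits; widest width guaranteed lock-free
};

// True if an atomic of `bits` bits can be expanded inline (no libcall).
bool isInline(const AtomicWidths &t, unsigned bits) {
    return bits <= t.maxInlineWidth;
}

// Single-threshold behavior described in the summary: anything that is
// inlined is also reported as always lock-free.
bool legacyIsAlwaysLockFree(const AtomicWidths &t, unsigned bits) {
    return bits <= t.maxInlineWidth;
}

// Separated behavior: the lock-free guarantee is its own [min, max] range.
bool isAlwaysLockFree(const AtomicWidths &t, unsigned bits) {
    return bits >= t.minLockFreeWidth && bits <= t.maxLockFreeWidth;
}

// Illustrative WebAssembly-like target: only 32-bit is guaranteed lock-free.
// The inline width of 64 is an assumption for this example.
AtomicWidths wasmLikeWidths() {
    return AtomicWidths{64, 32, 32};
}
```

With these illustrative values, an 8-bit or 16-bit atomic is still inlined but is no longer reported as always lock-free, whereas the single-threshold check would report it as lock-free.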

Diff Detail

Repository
rL LLVM

Event Timeline

sunfish created this revision. Mar 30 2017, 4:27 PM

Not sure what MIPS reference you're looking at; the ones I can find say something like this: "If either of the following events occurs between the execution of LL and SC, the SC fails: A coherent store is completed by another processor or coherent I/O module into the block of synchronizable physical memory containing the word. The size and alignment of the block is implementation-dependent, but it is at least one word and at most the minimum page size. [...]"

The ARM architecture reference has a detailed description of semantics of the local and global monitors; the global monitor protects a block of memory of at least 8 bytes.

The perspective from the folks designing the WebAssembly memory model is that:

  • WebAssembly has greater backwards-compatibility requirements than LLVM does, so it needs to be more conservative, in general.
  • 1-byte and 2-byte atomic operations aren't very important.

Consequently, WebAssembly intends to retain the property that 4-byte atomics are guaranteed lock-free, and 2-byte and 1-byte are not guaranteed. To implement this in clang, we'll need something like the patch here.

joerg added a subscriber: joerg. May 16 2017, 7:05 AM

I don't get this. If you have a lock-free 32bit CAS, you can implement all the 16bit operations on top of that and they are still lock-free.

There could conceivably be CPUs where 32-bit CAS is not the most efficient way to implement an 8-bit or 16-bit atomic operation. The other assumption is that 8-bit and 16-bit are much less important here, so the benefit of making the guarantee isn't seen to be worth the risks at this time. With backwards compatibility, one can always add guarantees that turn out to be useful in the future, but not remove guarantees that turn out to be harmful.

For further questions, I encourage you to file an issue at the threads proposal repository. The proposed lock-free guarantee is currently stated here.

joerg added a comment. May 17 2017, 2:00 AM

Lock-free operations provide two major advantages. You don't need to worry about signal safety and they can be mixed with certain non-atomic operations without creating havoc. All I said is that the presence of a 32bit CAS is enough to ensure lock-free atomic operations for (appropriately aligned) 8bit and 16bit values can be implemented. That will not be worse than any locked atomic operation on any non-brain-dead platform. E.g. it might not be true on SPARC v7 and v8 or the VAX, but then they don't have a 32bit CAS. It doesn't mean that you have to implement 16bit atomics using a 32bit CAS. I would hope the webassembly backend exposes atomics as precise operations, but if it only provides a 32bit CAS, that's still good enough for the needs of any frontend language.
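For readers unfamiliar with the construction mentioned in the comment above, here is a minimal sketch of a 16-bit fetch-add built only from a 32-bit compare-exchange. It is an illustration, not code from the patch or from any backend: the function name and the "lane" parameter are hypothetical, and the 16-bit value is modeled as half of an explicit `std::atomic<std::uint32_t>` rather than as an address inside an aligned word, to keep the example free of type punning.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Atomically adds `delta` to one 16-bit lane of a 32-bit word, using only a
// 32-bit compare-exchange. `lane` is 0 (low half) or 1 (high half).
// Returns the lane's previous value. Hypothetical helper for illustration.
std::uint16_t fetchAdd16(std::atomic<std::uint32_t> &word, unsigned lane,
                         std::uint16_t delta) {
    const unsigned shift = lane * 16;
    const std::uint32_t mask = std::uint32_t{0xFFFF} << shift;
    std::uint32_t oldWord = word.load(std::memory_order_relaxed);
    for (;;) {
        const std::uint16_t oldLane =
            static_cast<std::uint16_t>((oldWord & mask) >> shift);
        // The cast wraps the sum modulo 2^16, so no carry leaks into the
        // neighboring lane.
        const std::uint16_t newLane =
            static_cast<std::uint16_t>(oldLane + delta);
        const std::uint32_t newWord =
            (oldWord & ~mask) | (std::uint32_t{newLane} << shift);
        // On failure, oldWord is refreshed with the current value; retry.
        if (word.compare_exchange_weak(oldWord, newWord,
                                       std::memory_order_seq_cst,
                                       std::memory_order_relaxed))
            return oldLane;
    }
}
```

The loop retries whenever any bit of the containing word changes, including bits in the other lane, so a concurrent update to the neighboring 16 bits costs a retry but is not overwritten. Whether the equivalent holds for an ll+sc sequence on a given architecture is the question the summary raises.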

If you believe the WebAssembly design is unreasonable, please do file an issue. The memory model is just a proposal right now and can be changed.

Otherwise, this is how the platform works. It has 8-bit and 16-bit atomics, and they are not guaranteed lock-free. LLVM's main options seem to be: accept this patch, use 32-bit CAS for all 8-bit and 16-bit atomics in wasm and be inefficient on hardware that does have real 8-bit and 16-bit atomics, or be buggy.

sunfish abandoned this revision. Jun 2 2017, 4:50 PM

From the discussion at the recent WebAssembly CG meeting, it seems likely that the shared memory proposal will be modified such that this patch will no longer be needed. Closing.