This is an archive of the discontinued LLVM Phabricator instance.

Use Address for CGBuilder's CreateAtomicRMW and CreateAtomicCmpXchg.
AcceptedPublic

Authored by jyknight on Feb 22 2021, 1:29 PM.

Details

Summary

Following the LLVM change to add an alignment argument to the
IRBuilder calls, switch Clang's CGBuilder variants to take an Address
type. Then, update all callers to pass through the Address.

There is one interesting exception: Microsoft's
InterlockedCompareExchange128 family of functions are documented to
require (and assume) 16-byte alignment, despite the argument type
being only long long*.

Diff Detail

Event Timeline

jyknight requested review of this revision.Feb 22 2021, 1:29 PM
jyknight created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 22 2021, 1:29 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Do we really consider the existing atomic intrinsics to not impose added alignment restrictions? I'm somewhat concerned that we're going to produce IR here that's either really suboptimal or, worse, unimplementable, just because we over-interpreted some cue about alignment. I guess it would only be a significant problem on a target where types are frequently under-aligned for what atomics need, which is not typical, or when the user is doing atomics on a field of something like a packed struct.

clang/lib/CodeGen/CGBuiltin.cpp
362–366

Since you're changing this code anyway, please make this do CreateElementBitCast(DestAddr, Int128Ty) so that it's address-space-correct.

There are a lot of other lines in the patch that would benefit from the same thing.

3819

This should be using EmitPointerWithAlignment instead of assuming an alignment of 1.

jyknight updated this revision to Diff 328219.Mar 4 2021, 9:53 AM
jyknight marked 2 inline comments as done.

Address review comments.

Do we really consider the existing atomic intrinsics to not impose added alignment restrictions? I'm somewhat concerned that we're going to produce IR here that's either really suboptimal or, worse, unimplementable, just because we over-interpreted some cue about alignment. I guess it would only be a significant problem on a target where types are frequently under-aligned for what atomics need, which is not typical, or when the user is doing atomics on a field of something like a packed struct.

For the __atomic_* intrinsics, we don't consider those as imposing additional alignment restrictions -- currently, we avoid generating the LLVM IR instruction if it's below natural alignment, since we couldn't specify alignment on the IR instruction. (Now that we have alignment on the LLVM IR operations, I'd like to eventually get rid of that logic from Clang, since it's also handled by LLVM.)

For other intrinsics (e.g. MSVCIntrin::_InterlockedAnd, Builtin::BIsync_fetch_and_add_4, NVPTX::BInvvm_atom_add_gen_i, and the others in those 3 function families), we currently entirely ignore the alignment, and simply assume the argument is naturally-aligned. So, yes, this change could potentially affect the behavior for underaligned types.

So, I could change these to explicitly increase the assumed alignment of their arguments, like I did for InterlockedCompareExchange128. My inclination is not to do so, however...it doesn't seem like a good idea in general to ignore type alignment. But, I'd not be opposed to doing that, if there's a good reason.

Do we really consider the existing atomic intrinsics to not impose added alignment restrictions? I'm somewhat concerned that we're going to produce IR here that's either really suboptimal or, worse, unimplementable, just because we over-interpreted some cue about alignment. I guess it would only be a significant problem on a target where types are frequently under-aligned for what atomics need, which is not typical, or when the user is doing atomics on a field of something like a packed struct.

For the __atomic_* intrinsics, we don't consider those as imposing additional alignment restrictions -- currently, we avoid generating the LLVM IR instruction if it's below natural alignment, since we couldn't specify alignment on the IR instruction. (Now that we have alignment on the LLVM IR operations, I'd like to eventually get rid of that logic from Clang, since it's also handled by LLVM.)

Frontends ultimately have the responsibility of making sure they ultimately emit code that follows the platform ABI for atomics. In most other parts of the ABI, we usually find that it is possible (even necessary) to delegate *part* of that ABI responsibility down to LLVM — e.g. to emit inline atomic sequences, which I suppose frontends could do with inline assembly, but there are obvious reasons to prefer a more semantic IR — but that at least in some cases, there is information that we cannot pass down and so must handle in the frontend. I am somewhat skeptical that atomics are going to prove an exception here. At the very least, there will always be *some* operations that we have to expand into compare-exchange loops in the frontend, for want of a sufficiently powerful instruction/intrinsic. That said, if you find that you can actually free Clang from having to make certain decisions here, that's great; I just want you to understand that usually we find that there are limits to what we can usefully delegate to LLVM, and ultimately the responsibility is ours.

For other intrinsics (e.g. MSVCIntrin::_InterlockedAnd, Builtin::BIsync_fetch_and_add_4, NVPTX::BInvvm_atom_add_gen_i, and the others in those 3 function families), we currently entirely ignore the alignment, and simply assume the argument is naturally-aligned. So, yes, this change could potentially affect the behavior for underaligned types.

So, I could change these to explicitly increase the assumed alignment of their arguments, like I did for InterlockedCompareExchange128. My inclination is not to do so, however...it doesn't seem like a good idea in general to ignore type alignment. But, I'd not be opposed to doing that, if there's a good reason.

The idea here is not to "ignore type alignment". EmitPointerWithAlignment will sometimes return an alignment for a pointer that's less than the alignment of the pointee type, e.g. because you're taking the address of a packed struct field. The critical question is whether the atomic builtins ought to make an effort to honor that reduced alignment, even if it leads to terrible code, or if we should treat the use of the builtin as a user promise that the pointer is actually more aligned than you might think from the information statically available. That has to be resolved by the semantics of the builtin, and unfortunately intrinsic documentation is often spotty on this point. For example, the MSDN documentation for InterlockedIncrement says that it requires 32-bit alignment, but the documentation for InterlockedAdd doesn't. It seems extremely unlikely that that is meant to be read as a statement that InterlockedAdd is actually more permissive. I would say that all of the Interlocked APIs ought to be read as guaranteeing the natural, full-width alignment for their operation. I'm less certain about what to do with the __atomic_* builtins, because maybe we should take this as an opportunity to try to "do the right thing" with under-aligned atomics; on the other hand, that assumes that we always *can* "do the right thing", and I don't want Clang to start miscompiling or refusing to compile code because we're trying to do something above and beyond the normal language semantics. It might be more reasonable to always use the type alignment as a minimum.

The idea here is not to "ignore type alignment". EmitPointerWithAlignment will sometimes return an alignment for a pointer that's less than the alignment of the pointee type, e.g. because you're taking the address of a packed struct field. The critical question is whether the atomic builtins ought to make an effort to honor that reduced alignment, even if it leads to terrible code, or if we should treat the use of the builtin as a user promise that the pointer is actually more aligned than you might think from the information statically available.

Agreed -- that is the question. In general, I'd like to default to basing decisions only on the statically-known alignments, because I think that'll typically be the best choice for users. Where there's something like a packed struct, it's likely that the values will end up under-aligned in fact, not just in the compiler's understanding.

For example, the MSDN documentation for InterlockedIncrement says that it requires 32-bit alignment [...]. I would say that all of the Interlocked APIs ought to be read as guaranteeing the natural, full-width alignment for their operation.

I had missed that it was documented in some of the other functions (beyond InterlockedCompareExchange128). I'll change the remainder of the _Interlocked APIs to assume at least natural alignment.

I'm less certain about what to do with the __atomic_* builtins

The __atomic builtins have already been supporting under-aligned pointers all along -- and that behavior is unchanged by this patch.

jyknight updated this revision to Diff 330088.Mar 11 2021, 3:45 PM

Use natural alignment for _Interlocked* intrinsics.

I'm less certain about what to do with the __atomic_* builtins

The __atomic builtins have already been supporting under-aligned pointers all along -- and that behavior is unchanged by this patch.

How so? Clang hasn't been passing down an alignment, which means that it's been building atomicrmw instructions with the natural alignment for the IR type, which means we've been assuming that all pointers have a least that alignment. The LLVM documentation even says that atomicrmw doesn't allow under-alignment.

I'm less certain about what to do with the __atomic_* builtins

The __atomic builtins have already been supporting under-aligned pointers all along -- and that behavior is unchanged by this patch.

How so? Clang hasn't been passing down an alignment, which means that it's been building atomicrmw instructions with the natural alignment for the IR type, which means we've been assuming that all pointers have a least that alignment. The LLVM documentation even says that atomicrmw doesn't allow under-alignment.

We construct a libcall to __atomic_* routines in the frontend upon seeing an underaligned argument, instead of letting the backend handle it -- there's a bunch of code at https://github.com/llvm/llvm-project/blob/bc4a5bdce4af82a5522869d8a154e9e15cf809df/clang/lib/CodeGen/CGAtomic.cpp#L790 to handle that. I'd like to rip most of that out in the future, and just let the backend handle it in more cases.

E.g.

typedef int __attribute__((aligned(1))) unaligned_int;
bool cmpxchg_u(unaligned_int *x) {
    int expected = 2;
    return __atomic_compare_exchange_n(x, &expected, 2, false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

generates a libcall to __atomic_compare_exchange_4 (in IR, generated in the Clang code), instead of creating a cmpxchg IR instruction, due to the under-alignment. (Although, sigh, I've only just noticed we actually have a problem here -- the __atomic_*_SIZE libcalls are supposed to require an aligned argument -- so we should be using __atomic_compare_exchange (without size suffix) instead. Gah.)

Ping. I think this is correct, and would like to commit.

rjmccall accepted this revision.Jun 23 2021, 12:54 PM

Alright, well, this does look cleaner.

This revision is now accepted and ready to land.Jun 23 2021, 12:54 PM