
Add __atomic_* lowering to AtomicExpandPass.
ClosedPublic

Authored by jyknight on Mar 15 2016, 3:38 PM.

Details

Summary

AtomicExpandPass can now lower atomic load, atomic store, atomicrmw, and
cmpxchg instructions to __atomic_* library calls, when the target
doesn't support atomics of a given size.

This is the first step towards moving all atomic lowering from clang
into llvm. When all is done, the behavior of __sync_* builtins,
__atomic_* builtins, and C11 atomics will be unified.

Previously LLVM would pass everything through to the ISelLowering code,
where unsupported atomic instructions would turn into __sync_* library
functions. Because of that, Clang avoids emitting atomic instructions
when this will happen, and emits __atomic_* library functions itself in
the frontend.

It is advantageous to do the lowering to atomic libcalls before ISel
time, because it's important that all atomic instructions for a given
size either lower to __atomic_* libcalls, or don't. No mixing and
matching.

At the moment, this code is enabled only for SPARC, as a
demonstration. The next commit will expand support to all of the other
targets.

There are also a few other minor changes:

  • getInsertFencesForAtomic() is replaced with shouldInsertFencesForAtomic(), so that the decision can be made per-instruction. (This will be used in the next patch)
  • emitLeadingFence/emitTrailingFence are no longer called when shouldInsertFencesForAtomic is false, so don't need to check that condition themselves.

Diff Detail

Repository
rL LLVM

Event Timeline

jyknight updated this revision to Diff 50779.Mar 15 2016, 3:38 PM
jyknight retitled this revision from to Add __atomic_* lowering to AtomicExpandPass..
jyknight updated this object.
jyknight added subscribers: theraven, rnk, hfinkel and 4 others.

Very nice! The eventual Clang simplification should be a huge relief and more than make up for this code (not to mention other backends). In general the code looks fine (most of my suggestions are nit-picking over comments).

Tim.

docs/Atomics.rst
483 ↗(On Diff #50779)

Case typo on LIbrary?

553–554 ↗(On Diff #50779)

I probably wouldn't mention ARM here, since I think ldrex/strex is always available in Thumb mode if it is in ARM. We won't feel neglected, honestly!

include/llvm/Target/TargetLowering.h
1055 ↗(On Diff #50779)

"atomic lowering" instead of "DAG builder", since we're changing it around anyway?

lib/CodeGen/AtomicExpandPass.cpp
1098–1099 ↗(On Diff #50779)

"So, expand to a CAS libcall instead, via a CAS loop"? I was panicking there for a while until I saw the expandAtomicCASToLibcall.

1152–1153 ↗(On Diff #50779)

Perhaps RTLibType = Libcalls[Log2_32(Size) + 1]?

1228 ↗(On Diff #50779)

The last 2 args are the defaults aren't they?

I would strongly prefer to see the two NFC changes separated and submitted separately. (Assuming you're reasonably sure they are NFC, that doesn't need further review.) Having everything in the same patch makes it harder to spot semantic changes in the diff.

Mostly minor comments follow.

I have to admit, I don't really understand the distinction between __atomic_*, __sync_*, and the builtins even after reading the documentation. Do you have any suggestions for further reading?

include/llvm/CodeGen/RuntimeLibcalls.h
500 ↗(On Diff #50779)

Separable NFC change?

lib/CodeGen/AtomicExpandPass.cpp
115 ↗(On Diff #50779)

Alignment can't be zero for atomic instructions. Make this an assert.

123 ↗(On Diff #50779)

Same as previous

129 ↗(On Diff #50779)

This seems like an inconsistency we should fix. Could you file a bug for this so that we don't forget?

Also, this function probably makes sense to promote to getAlignment on the AtomicRMWInst class itself.

941 ↗(On Diff #50779)

Use an enum for the memory_order_* values. Also, don't we have that somewhere already?

jyknight added inline comments.Mar 16 2016, 11:26 AM
docs/Atomics.rst
483 ↗(On Diff #50779)

The strange case actually is the correct URL. Probably a Wiki link thing.

553–554 ↗(On Diff #50779)

I believe that's not true: ArmV6 had ldrex/strex, but its "Thumb1" mode did not. It seems to have been introduced with "Thumb2" in ARMv6T2 and ARMv7 and later.

The dependent CL adds a new function in ARMSubtarget.h:

bool hasLdrex() const {
  return HasV6Ops && (!InThumbMode || HasV8MBaselineOps);
}

(effectively the same logic was previously in ARMSubtarget::enableAtomicExpand)

lib/CodeGen/AtomicExpandPass.cpp
1152–1153 ↗(On Diff #50779)

I dunno, that seems harder to understand to me. Really I think it'd be best as a switch, just with 1/3rd the number of lines in it:

switch (Size) {
case 1: RTLibType = Libcalls[1]; break;
case 2: RTLibType = Libcalls[2]; break;
case 4: RTLibType = Libcalls[3]; break;
case 8: RTLibType = Libcalls[4]; break;
case 16: RTLibType = Libcalls[5]; break;
}

But, llvm's clang-format style doesn't use
AllowShortCaseLabelsOnASingleLine: true

Maybe it should.

1228 ↗(On Diff #50779)

Indeed; fixed all such instances.

jyknight added inline comments.Mar 16 2016, 11:37 AM
lib/CodeGen/AtomicExpandPass.cpp
115 ↗(On Diff #50779)

I want to change that, to allow specifying atomic IR instructions with any alignment. They'll be expanded to a libcall which uses a mutex if unaligned, of course.

From original plan email:

A4) In LLVM, add "align" attributes to cmpxchg and atomicrmw, and allow specifying "align" values for "load atomic" and "store atomic". LLVM will lower them to the generic library calls. In clang, start lowering misaligned atomics to these llvm instructions as well.

129 ↗(On Diff #50779)

I agree it needs to be fixed. See quote from plan above. I guess I can file a bug for it if that makes it more likely someone else will do the work and I don't end up having to...

This can't really be made a getAlignment() on the class, because that would make it even more inconsistent with load and store's getAlignment functions: those return only the user-specified alignment attribute, and don't depend on DataLayout.

941 ↗(On Diff #50779)

Sure, I can put it in a local enum. We wouldn't have it anywhere in LLVM, because only clang deals with these values at the moment.

I mean, I suppose I could use stdatomic.h's enum memory_order...but it seemed better not to.

t.p.northover added inline comments.Mar 16 2016, 11:39 AM
docs/Atomics.rst
553–554 ↗(On Diff #50779)

Oops. Sorry.

lib/CodeGen/AtomicExpandPass.cpp
1152–1153 ↗(On Diff #50779)

I'd be happy with that.

reames added inline comments.Mar 16 2016, 11:43 AM
lib/CodeGen/AtomicExpandPass.cpp
115 ↗(On Diff #50779)

This sounds like a reasonable future direction, but it is not the state as of this patch. As a result, you'd be introducing completely dead code if you landed as is. Please do not do this. Introduce the assert, then change in a future commit which actually introduces the new language feature.

I would strongly prefer to see the two NFC changes separated and submitted separately. (Assuming you're reasonably sure they are NFC, that doesn't need further review.) Having everything in the same patch makes it harder to spot semantic changes in the diff.

Okay. I'll submit those, and then upload a new version of this patch without those changes, and with the modifications requested by reviewers.

I have to admit, I don't really understand the distinction between __atomic_*, __sync_*, and the builtins even after reading the documentation. Do you have any suggestions for further reading?

Yeah, it's pretty tricky...

So, the builtins are described in GCC's documentation:
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html#g_t_005f_005fsync-Builtins
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

One thing to note there is that the __sync_* builtins are now almost identical to the similarly-named __atomic_* builtins, without the last ordering parameter.

The `__atomic_*` library functions are described here:
https://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary

The `__sync_*` library functions are not really described anywhere, as far as I know; they are just an implementation detail between the compiler and the compiler-support library. They were originally just the default lowering of the `__sync_*` builtins when there was no pattern for the target.

It's all especially confusing because the state of all this evolved a bunch between earlier versions of GCC and the 4.7/4.8 timeframe when it all sort of came together.

lib/CodeGen/AtomicExpandPass.cpp
115 ↗(On Diff #50779)

OK, will do.

reames added inline comments.Mar 16 2016, 12:44 PM
lib/CodeGen/AtomicExpandPass.cpp
1177 ↗(On Diff #50779)

If I'm reading the documentation correctly, the sized versions are purely an optimization over the generic versions, correct? The compiler is free to emit a mix of sized and generic calls on the same object?

If so, it would be nice to have the code reflect that. In particular, it might make sense to extract a helper function which specializes generic calls with known sizes to the specialized form, and use that in a way that manually written generic calls get specialized as well. I think that would also give you a logically distinct chunk which could be factored out and submitted separately.

jyknight updated this revision to Diff 50902.Mar 16 2016, 6:30 PM

Tweaks for review

jyknight added inline comments.Mar 16 2016, 6:34 PM
lib/CodeGen/AtomicExpandPass.cpp
1177 ↗(On Diff #50779)

That is correct. Except that there's no generic version of some of the functions (the integer math ones).

Also, I'm not sure I followed what you're suggesting. Are you saying you'd like the code to emit a call to a non-size-specialized function, and then run a later optimization pass to turn that into another call, to a size-specialized function?

If so, I think that'd be strictly worse. Going through an intermediate step would provide no value. There should be no manually written calls to these library functions. Users call the clang builtins (or, even better, the C11/C++11 standard functions), which will get lowered to "load atomic", "store atomic", "atomicrmw", or "cmpxchg" llvm IR instructions. From those, we'll end up here.

And if someone is crazy and calls the library functions directly, LLVM shouldn't try to identify those calls and do anything to them.

reames requested changes to this revision.Mar 17 2016, 6:20 PM
reames added a reviewer: reames.

More comments. This is a ways from submission just on code quality and test coverage alone.

lib/CodeGen/AtomicExpandPass.cpp
68 ↗(On Diff #50902)

This function seems overly and falsely generic. If I'm reading this correctly, the CASExpected and Ordering2 arguments are only used for CAS operations. I suggest just specializing this based on whether this is a CAS or not.

172 ↗(On Diff #50902)

a) move this filter into the previous loop
b) separate and land this refactoring without further review

181 ↗(On Diff #50902)

The current structure of this code will break floating point and pointer atomic canonicalization. The previous code fell through, the new code does not. I suspect you need to swap the order of the two steps.

970 ↗(On Diff #50902)

This should be the default in the switch.

980 ↗(On Diff #50902)

Wait, what? Why do we need this? We can simply test whether we have a particular sized libcall. The enable logic should be strictly contained there.

This strongly hints to me we should split the libcall part out into its own change. This would also make it more natural to write tests for the lowering. (Which tests are missing from the current patch and required.)

1000 ↗(On Diff #50902)

Use an assert.

1013 ↗(On Diff #50902)

assert

1110 ↗(On Diff #50902)

This would be much more naturally expressed at the callsite by having this function simply return false when the expansion doesn't work, and then applying the fallback along with the second call.

1137 ↗(On Diff #50902)

Please use an ArrayRef for the last argument.

1178 ↗(On Diff #50902)

What I was suggesting was that you first found the generic libcall, and then given the size information, selected the sized version if one was available. I find the mixing of sized and unsized functions in your current array structure very confusing and hard to follow.

I hadn't realized there were no generic versions for some of them. That seems unfortunately inconsistent.

Also, I strongly disagree with your position that we should optimize manually written generic calls into specific ones. If it truly is an optimization, we should perform it. If nothing else, it removes a lot of complexity from the potential frontends.

1243 ↗(On Diff #50902)

The difference between generic (allocas) and specific call signatures is large enough that trying to share the code is actively confusing. Please just split this method based on the UseSizedLibcall variable or extract helper functions.

This revision now requires changes to proceed.Mar 17 2016, 6:20 PM
davidxl added inline comments.
test/Transforms/AtomicExpand/SPARC/libcalls.ll
10 ↗(On Diff #50902)

Perhaps move the option to the command line?

48 ↗(On Diff #50902)

Are these casting from i16 * -> i8 * necessary?

Also, should __atomic_compare_exchange_2's 4th argument be a bool value selecting the weak/strong form of compare and exchange?

77 ↗(On Diff #50902)

are the checks for alloca, bitcast necessary for the test?

jyknight added inline comments.Mar 18 2016, 11:46 AM
lib/CodeGen/AtomicExpandPass.cpp
181 ↗(On Diff #50902)

I don't believe so. This code handles those types, via appropriate calls to Builder.CreateBitOrPointerCast.

Also see the test cases starting with test_load_double. I'm missing a test for pointers, there, but it does handle them too.

970 ↗(On Diff #50902)

No it shouldn't; written this way, if someone adds new cases it will emit a compiler warning. That's not terribly important here, but it's a good idiom.

980 ↗(On Diff #50902)

We can simply test whether we have a particular sized libcall.

Simply test how?? What do you mean?

This strongly hints to me we should split the libcall part out into its own change.

What? This entire patch is about generating libcalls. What do you mean split it into its own change? Separate from what other part?

(Which tests are missing from the current patch and required.)

Which tests would you like to see? There are some tests already; I can add more, but I don't know what you actually mean by that, either.

1000 ↗(On Diff #50902)

You mean like:

bool expanded = expandAtomicOpToLibcall(...);
assert(expanded);

Not sure why that's better, but sure.

1110 ↗(On Diff #50902)

I don't know what you mean by that.

However, I do see that I can remove the 3 Builder.* calls below by simply calling createCmpXchgInstFun, which is the same thing.

1178 ↗(On Diff #50902)

If you could please be more specific about what you found confusing maybe I can try to address that.

However, regarding the "optimization" of generic libcalls into sized libcalls: no, that just makes no sense to do. It will not do anything but obfuscate things. I'll reiterate: nobody should ever be writing calls to these libcalls themselves. Even doing so will take some effort if you're trying to do that from C.

It also cannot possibly reduce any frontend complexity, because the frontend builtins which users call are not the same as these libcalls. The frontend recognizes the builtins, and lowers to LLVM atomic IR instructions. Sometimes, now, clang also lowers the builtins to these libcalls at its layer, but the plan is to remove that, once the support in LLVM itself is all in place.

For example, compare (from https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html):

Built-in Function: bool __atomic_compare_exchange_n (type *ptr, type *expected, type desired, bool weak, int success_memorder, int failure_memorder)
Built-in Function: bool __atomic_compare_exchange (type *ptr, type *expected, type *desired, bool weak, int success_memorder, int failure_memorder)

(Note the "n" in "__atomic_compare_exchange_n" is literal letter n, not a stand-in for a number.)

with the libcall documented below (and in https://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary):

bool  __atomic_compare_exchange_N(iN *ptr, iN *expected, iN desired,
                                  int success_order, int failure_order)
bool  __atomic_compare_exchange(size_t size, void *ptr, void *expected,
                                void *desired, int success_order,
                                int failure_order)

(with N = 1, 2, 4, 8, 16.)

They are just completely different.

1243 ↗(On Diff #50902)

I actually had it split along that axis in a former version of the patch, but I didn't like it, because there were two copies of basically the same function, with just minor variations. I found I was flipping back and forth trying to see what the differences were. So, I merged the two, and like it better this way.

test/Transforms/AtomicExpand/SPARC/libcalls.ll
48 ↗(On Diff #50902)

The casts aren't strictly necessary, in that all pointers are equivalent. But, the code needed to choose something to be the signature of the libcall, and it was easier to just use i8* everywhere (which, btw, is the same thing clang is doing today). When the typeless pointers work is done, that'll go away of course.

No, __atomic_compare_exchange_* don't have a weak/strong argument in the libcall. Only in the frontend builtin.

77 ↗(On Diff #50902)

You mean: don't check the whole function output, just some particularly representative lines? I think it's pretty important to check all of the instructions, since each one has some distinct code to emit it.

davidxl added inline comments.Mar 18 2016, 12:53 PM
test/Transforms/AtomicExpand/SPARC/libcalls.ll
48 ↗(On Diff #50902)

From the documentation, the signature of __atomic_compare_exchange_N takes iN* as argument type and it is also how the lib function is implemented.

77 ↗(On Diff #50902)

The problem is that it exposes implementation details which may change in the future. For instance

  1. the bitcast may go away
  2. even though I understand the need for insertvalue/extractvalue (see the fetchadd impl test below using a loop) in current implementation, strictly speaking, they can in theory be collapsed if the lowering sees a slightly larger scope

These are minor issues that do not need to be fixed IMO -- just a comment.

jfb added inline comments.Mar 24 2016, 2:48 PM
docs/Atomics.rst
424 ↗(On Diff #50902)

Code-quote `cmpxchg` here and below.

553 ↗(On Diff #50902)

"which is has"

559 ↗(On Diff #50902)
include/llvm/Target/TargetLowering.h
1053 ↗(On Diff #50902)

Size usually means bytes in LLVM, and SizeInBits is used in the name when we expect bits. I think you should rename the getter / setter and data member appropriately.

lib/CodeGen/AtomicExpandPass.cpp
131 ↗(On Diff #50902)

Reference the bug number.

970 ↗(On Diff #50902)

Agree with @jyknight on this one. You still need unreachable for the compilers that aren't smart enough to see all paths return.

1094 ↗(On Diff #50902)

Change default to FIRST_BINOP, LAST_BINOP, BAD_BINOP and put the unreachable after the switch.

1144 ↗(On Diff #50902)

Why 16? Could use DL->StackNaturalAlign?

jyknight updated this revision to Diff 52275.Mar 31 2016, 1:21 PM
jyknight edited edge metadata.
jyknight marked 7 inline comments as done.

Updates for review comments.

I believe I've addressed as many of the comments as I can. The ones I replied to saying I don't understand remain unresolved.

lib/CodeGen/AtomicExpandPass.cpp
1110 ↗(On Diff #50902)

When attempting that change, I remembered why I didn't in the first place: I need the "AtomicCmpXchgInst *Pair" to pass to expandAtomicCASToLibcall, and it's not readily accessible if I just call that other fn. So I've not made any change here.

1144 ↗(On Diff #50902)

Yeah that seems wrong actually.

The point of setting the alignment is to ensure that the value is aligned sufficiently to be able to cast to an integer of the proper width and load that, as the libcall function may do that. "16" was because that's the maximum specialized size...

But actually, I think it should be using DL.getPrefTypeAlignment(SizedIntTy), so switched to that.

test/Transforms/AtomicExpand/SPARC/libcalls.ll
10 ↗(On Diff #50902)

I'm not sure that's very useful. It can't really do multiple architectures in one test anyhow, since there's no way to say "REQUIRES:" separately for different run lines, is there?

48 ↗(On Diff #50902)

Whether the pointee types match doesn't really matter, though. Especially as the plan is to remove typed pointers entirely, at which point it'll just be "ptr" or something, it seemed best to just use the same kind always.

jfb added inline comments.Apr 4 2016, 1:37 PM
include/llvm/CodeGen/RuntimeLibcalls.h
401 ↗(On Diff #52275)

"new" is a comment that won't age well. Could you instead comment on SYNC_ saying ATOMIC_ are the newer approach, and refer to docs?

jyknight updated this revision to Diff 52724.Apr 5 2016, 12:35 PM
jyknight marked 15 inline comments as done.
jyknight edited edge metadata.

3rd round review fixes.

jfb added a comment.Apr 5 2016, 2:59 PM

Looks good overall, I assume @reames wants to do another round though.

jfb added inline comments.Apr 7 2016, 3:49 PM
lib/CodeGen/AtomicExpandPass.cpp
940 ↗(On Diff #52724)

See this patch for a fix: http://reviews.llvm.org/D18875

Ping. I'd really like to get this in, so I can move on to the rest of the atomics work.

@reames, especially, since you have outstanding concerns...

jfb accepted this revision.Apr 11 2016, 8:29 AM
jfb added a reviewer: jfb.

Forgot to say: I talked to @reames in person last week and he said he was OK moving forward with this for now.

This revision was automatically updated to reflect the committed changes.
jfb added a comment.Apr 11 2016, 3:41 PM

Looks like this broke the build:

/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp: In member function ‘void
{anonymous}::AtomicExpand::expandAtomicLoadToLibcall(llvm::LoadInst*)’:
/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp:1000:8: warning: unused
variable ‘expanded’ [-Wunused-variable]

bool expanded = expandAtomicOpToLibcall(
     ^

/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp: In member function ‘void
{anonymous}::AtomicExpand::expandAtomicStoreToLibcall(llvm::StoreInst*)’:
/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp:1013:8: warning: unused
variable ‘expanded’ [-Wunused-variable]

bool expanded = expandAtomicOpToLibcall(
     ^

/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp: In member function ‘void
{anonymous}::AtomicExpand::expandAtomicCASToLibcall(llvm::AtomicCmpXchgInst*)’:
/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp:1027:8: warning: unused
variable ‘expanded’ [-Wunused-variable]

bool expanded = expandAtomicOpToLibcall(
     ^

/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp: In function
‘llvm::ArrayRef<llvm::RTLIB::Libcall>
GetRMWLibcall(llvm::AtomicRMWInst::BinOp)’:
/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp:1068:12: error: could not
convert ‘(const llvm::RTLIB::Libcall*)(& LibcallsXchg)’ from ‘const
llvm::RTLIB::Libcall*’ to ‘llvm::ArrayRef<llvm::RTLIB::Libcall>’

return LibcallsXchg;
       ^

/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp:1070:12: error: could not
convert ‘(const llvm::RTLIB::Libcall*)(& LibcallsAdd)’ from ‘const
llvm::RTLIB::Libcall*’ to ‘llvm::ArrayRef<llvm::RTLIB::Libcall>’

return LibcallsAdd;
       ^

/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp:1072:12: error: could not
convert ‘(const llvm::RTLIB::Libcall*)(& LibcallsSub)’ from ‘const
llvm::RTLIB::Libcall*’ to ‘llvm::ArrayRef<llvm::RTLIB::Libcall>’

return LibcallsSub;
       ^

/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp:1074:12: error: could not
convert ‘(const llvm::RTLIB::Libcall*)(& LibcallsAnd)’ from ‘const
llvm::RTLIB::Libcall*’ to ‘llvm::ArrayRef<llvm::RTLIB::Libcall>’

return LibcallsAnd;
       ^

/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp:1076:12: error: could not
convert ‘(const llvm::RTLIB::Libcall*)(& LibcallsOr)’ from ‘const
llvm::RTLIB::Libcall*’ to ‘llvm::ArrayRef<llvm::RTLIB::Libcall>’

return LibcallsOr;
       ^

/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp:1078:12: error: could not
convert ‘(const llvm::RTLIB::Libcall*)(& LibcallsXor)’ from ‘const
llvm::RTLIB::Libcall*’ to ‘llvm::ArrayRef<llvm::RTLIB::Libcall>’

return LibcallsXor;
       ^

/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp:1080:12: error: could not
convert ‘(const llvm::RTLIB::Libcall*)(& LibcallsNand)’ from ‘const
llvm::RTLIB::Libcall*’ to ‘llvm::ArrayRef<llvm::RTLIB::Libcall>’

return LibcallsNand;
       ^