
Add __atomic_* lowering to AtomicExpandPass.
ClosedPublic

Authored by jyknight on Mar 15 2016, 3:38 PM.

Details

Summary

AtomicExpandPass can now lower atomic load, atomic store, atomicrmw, and
cmpxchg instructions to __atomic_* library calls, when the target
doesn't support atomics of a given size.

This is the first step towards moving all atomic lowering from clang
into llvm. When all is done, the behavior of __sync_* builtins,
__atomic_* builtins, and C11 atomics will be unified.

Previously LLVM would pass everything through to the ISelLowering code,
where unsupported atomic instructions would turn into __sync_* library
functions. Because of that, Clang avoids emitting atomic instructions
when this will happen, and emits __atomic_* library functions itself in
the frontend.

It is advantageous to do the lowering to atomic libcalls before ISel
time, because it's important that all atomic instructions for a given
size either lower to __atomic_* libcalls, or don't. No mixing and
matching.

At the moment, this code is enabled only for SPARC, as a
demonstration. The next commit will expand support to all of the other
targets.

There are also a few other minor changes:

  • getInsertFencesForAtomic() is replaced with shouldInsertFencesForAtomic(), so that the decision can be made per-instruction. (This will be used in the next patch)
  • emitLeadingFence/emitTrailingFence are no longer called when shouldInsertFencesForAtomic is false, so don't need to check that condition themselves.

Diff Detail

Repository
rL LLVM

Event Timeline

jyknight updated this revision to Diff 50779.Mar 15 2016, 3:38 PM
jyknight retitled this revision from to Add __atomic_* lowering to AtomicExpandPass..
jyknight updated this object.
jyknight added subscribers: theraven, rnk, hfinkel and 4 others.

Very nice! The eventual Clang simplification should be a huge relief and more than make up for this code (not to mention other backends). In general the code looks fine (most of my suggestions are nit-picking over comments).

Tim.

docs/Atomics.rst
483 ↗(On Diff #50779)

Case typo on LIbrary?

553–554 ↗(On Diff #50779)

I probably wouldn't mention ARM here, since I think ldrex/strex is always available in Thumb mode if it is in ARM. We won't feel neglected, honestly!

include/llvm/Target/TargetLowering.h
1055 ↗(On Diff #50779)

"atomic lowering" instead of "DAG builder", since we're changing it around anyway?

lib/CodeGen/AtomicExpandPass.cpp
1098–1099 ↗(On Diff #50779)

"So, expand to a CAS libcall instead, via a CAS loop"? I was panicking there for a while until I saw the expandAtomicCASToLibcall.

1152–1153 ↗(On Diff #50779)

Perhaps RTLibType = Libcalls[Log2_32(Size) + 1]?

1228 ↗(On Diff #50779)

The last 2 args are the defaults aren't they?

I would strongly prefer to see the two NFC changes separated and submitted separately. (Assuming you're reasonably sure they are NFC, that doesn't need further review.) Having everything in the same patch makes it harder to spot semantic changes in the diff.

Mostly minor comments follow.

I have to admit, I don't really understand the distinction between __atomic_*, __sync_*, and the builtins even after reading the documentation. Do you have any suggestions for further reading?

include/llvm/CodeGen/RuntimeLibcalls.h
500 ↗(On Diff #50779)

Separable NFC change?

lib/CodeGen/AtomicExpandPass.cpp
115 ↗(On Diff #50779)

Alignment can't be zero for atomic instructions. Make this an assert.

123 ↗(On Diff #50779)

Same as previous

129 ↗(On Diff #50779)

This seems like an inconsistency we should fix. Could you file a bug for this so that we don't forget?

Also, this function probably makes sense to promote to getAlignment on the AtomicRMWInst class itself.

941 ↗(On Diff #50779)

Use an enum for the memory_order_* values. Also, don't we have that somewhere already?

jyknight added inline comments.Mar 16 2016, 11:26 AM
docs/Atomics.rst
483 ↗(On Diff #50779)

The strange case actually is the correct URL. Probably a Wiki link thing.

553–554 ↗(On Diff #50779)

I believe that's not true: ArmV6 had ldrex/strex, but its "Thumb1" mode did not. It seems to have been introduced with "Thumb2" in ARMv6T2 and ARMv7 and later.

The dependent CL adds a new function in ARMSubtarget.h:

bool hasLdrex() const {
  return HasV6Ops && (!InThumbMode || HasV8MBaselineOps);
}

(effectively the same logic was previously in ARMSubtarget::enableAtomicExpand)

lib/CodeGen/AtomicExpandPass.cpp
1152–1153 ↗(On Diff #50779)

I dunno, that seems harder to understand to me. Really I think it'd be best as a switch, just with 1/3rd the number of lines in it:

switch (Size) {
case 1: RTLibType = Libcalls[1]; break;
case 2: RTLibType = Libcalls[2]; break;
case 4: RTLibType = Libcalls[3]; break;
case 8: RTLibType = Libcalls[4]; break;
case 16: RTLibType = Libcalls[5]; break;
}

But, llvm's clang-format style doesn't use
AllowShortCaseLabelsOnASingleLine: true

Maybe it should.

1228 ↗(On Diff #50779)

Indeed; fixed all such instances.

jyknight added inline comments.Mar 16 2016, 11:37 AM
lib/CodeGen/AtomicExpandPass.cpp
115 ↗(On Diff #50779)

I want to change that, to allow specifying atomic IR instructions with any alignment. They'll be expanded to a libcall which uses a mutex if unaligned, of course.

From original plan email:

A4) In LLVM, add "align" attributes to cmpxchg and atomicrmw, and allow specifying "align" values for "load atomic" and "store atomic". LLVM will lower them to the generic library calls. In clang, start lowering misaligned atomics to these llvm instructions as well.

129 ↗(On Diff #50779)

I agree it needs to be fixed. See quote from plan above. I guess I can file a bug for it if that makes it more likely someone else will do the work and I don't end up having to...

This can't really be made a getAlignment() on the class, because that would make it even more inconsistent with load and store's getAlignment functions: those return only the user-specified alignment attribute, and don't depend on DataLayout.

941 ↗(On Diff #50779)

Sure, I can put it in a local enum. We wouldn't have it anywhere in LLVM, because only clang deals with these values at the moment.

I mean, I suppose I could use stdatomic.h's enum memory_order...but it seemed better not to.

t.p.northover added inline comments.Mar 16 2016, 11:39 AM
docs/Atomics.rst
553–554 ↗(On Diff #50779)

Oops. Sorry.

lib/CodeGen/AtomicExpandPass.cpp
1152–1153 ↗(On Diff #50779)

I'd be happy with that.

reames added inline comments.Mar 16 2016, 11:43 AM
lib/CodeGen/AtomicExpandPass.cpp
115 ↗(On Diff #50779)

This sounds like a reasonable future direction, but it is not the state as of this patch. As a result, you'd be introducing completely dead code if you landed as is. Please do not do this. Introduce the assert, then change in a future commit which actually introduces the new language feature.

I would strongly prefer to see the two NFC changes separated and submitted separately. (Assuming you're reasonably sure they are NFC, that doesn't need further review.) Having everything in the same patch makes it harder to spot semantic changes in the diff.

Okay. I'll submit those, and then upload a new version of this patch without those changes, and with the modifications requested by reviewers.

I have to admit, I don't really understand the distinction between __atomic_*, __sync_*, and the builtins even after reading the documentation. Do you have any suggestions for further reading?

Yeah, it's pretty tricky...

So, the builtins are described in GCC's documentation:
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html#g_t_005f_005fsync-Builtins
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

One thing to note there is that the __sync_* builtins are now almost identical to the similarly-named __atomic_* builtins, without the last ordering parameter.

The `__atomic_*` library functions are described here:
https://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary

The `__sync_*` library functions are not really described anywhere, as far as I know; they are just an implementation detail between the compiler and the compiler-support library. They were originally just the default lowering of the `__sync_*` builtins when there was no pattern for the target.

It's all especially confusing because the state of all this evolved a bunch between earlier versions of GCC and the 4.7/4.8 timeframe when it all sort of came together.

lib/CodeGen/AtomicExpandPass.cpp
115 ↗(On Diff #50779)

OK, will do.

reames added inline comments.Mar 16 2016, 12:44 PM
lib/CodeGen/AtomicExpandPass.cpp
1177 ↗(On Diff #50779)

If I'm reading the documentation correctly, the sized versions are purely an optimization over the generic versions, correct? The compiler is free to emit a mix of sized and generic calls on the same object?

If so, it would be nice to have the code reflect that. In particular, it might make sense to extract a helper function which specializes generic calls with known sizes to the specialized form, and use that in a way that manually written generic calls get specialized as well. I think that would also give you a logically distinct chunk which could be factored out and submitted separately.

jyknight updated this revision to Diff 50902.Mar 16 2016, 6:30 PM

Tweaks for review

jyknight added inline comments.Mar 16 2016, 6:34 PM
lib/CodeGen/AtomicExpandPass.cpp
1177 ↗(On Diff #50779)

That is correct. Except that there's no generic version of some of the functions (the integer math ones).

Also, I'm not sure I followed what you're suggesting. Are you saying you'd like the code to emit a call to a non-size-specialized function, and then run a later optimization pass to turn that into another call, to a size-specialized function?

If so, I think that'd be strictly worse. Going through an intermediate step would provide no value. There should be no manually written calls to these library functions. Users call the clang builtins (or, even better, the C11/C++11 standard functions), which will get lowered to "load atomic", "store atomic", "atomicrmw", or "cmpxchg" llvm IR instructions. From those, we'll end up here.

And if someone is crazy and calls the library functions directly, LLVM shouldn't try to identify those calls and do anything to them.

reames requested changes to this revision.Mar 17 2016, 6:20 PM
reames added a reviewer: reames.

More comments. This is a ways from submission just on code quality and test coverage alone.

lib/CodeGen/AtomicExpandPass.cpp
68 ↗(On Diff #50902)

This function seems overly and falsely generic. If I'm reading this correctly, the CASExpected and Ordering2 arguments are only used for CAS operations. I suggest just specializing this based on whether this is a CAS or not.

172 ↗(On Diff #50902)

a) move this filter into the previous loop
b) separate and land this refactoring without further review

181 ↗(On Diff #50902)

The current structure of this code will break floating point and pointer atomic canonicalization. The previous code fell through, the new code does not. I suspect you need to swap the order of the two steps.

970 ↗(On Diff #50902)

This should be the default in the switch.

980 ↗(On Diff #50902)

Wait, what? Why do we need this? We can simply test whether we have a particular sized libcall. The enable logic should be strictly contained there.

This strongly hints to me we should split the libcall part out into its own change. This would also make it more natural to write tests for the lowering. (Which tests are missing from the current patch and required.)

1000 ↗(On Diff #50902)

Use an assert.

1013 ↗(On Diff #50902)

assert

1110 ↗(On Diff #50902)

This would be much more naturally expressed at the callsite by having this function simply return false when the expansion doesn't work, and then applying the fallback along with the second call.

1137 ↗(On Diff #50902)

Please use an ArrayRef for the last argument.

1178 ↗(On Diff #50902)

What I was suggesting was that you first found the generic libcall, and then given the size information, selected the sized version if one was available. I find the mixing of sized and unsized functions in your current array structure very confusing and hard to follow.

I hadn't realized there were no generic versions for some of them. That seems unfortunately inconsistent.

Also, I strongly disagree with your position that we should optimize manually written generic calls into specific ones. If it truly is an optimization, we should perform it. If nothing else, it removes a lot of complexity from the potential frontends.

1243 ↗(On Diff #50902)

The difference between generic (allocas) and specific call signatures is large enough that trying to share the code is actively confusing. Please just split this method based on the UseSizedLibcall variable or extract helper functions.

This revision now requires changes to proceed.Mar 17 2016, 6:20 PM
davidxl added inline comments.
test/Transforms/AtomicExpand/SPARC/libcalls.ll
10 ↗(On Diff #50902)

Perhaps move the option to the command line?

48 ↗(On Diff #50902)

Are these casting from i16 * -> i8 * necessary?

Also, should __atomic_compare_exchange_2's 4th argument be a bool value selecting the weak/strong form of compare and exchange?

77 ↗(On Diff #50902)

are the checks for alloca, bitcast necessary for the test?

jyknight added inline comments.Mar 18 2016, 11:46 AM
lib/CodeGen/AtomicExpandPass.cpp
181 ↗(On Diff #50902)

I don't believe so. This code handles those types, via appropriate calls to Builder.CreateBitOrPointerCast.

Also see the test cases starting with test_load_double. I'm missing a test for pointers, there, but it does handle them too.

970 ↗(On Diff #50902)

No it shouldn't; written this way, if someone adds new cases it will emit a compiler warning. That's not terribly important here, but it's a good idiom.

980 ↗(On Diff #50902)

We can simply test whether we have a particular sized libcall.

Simply test how?? What do you mean?

This strongly hints to me we should split the libcall part out into its own change.

What? This entire patch is about generating libcalls. What do you mean split it into its own change? Separate from what other part?

(Which tests are missing from the current patch and required.)

Which tests would you like to see? There are some tests already; I can add more, but I don't know what you actually mean by that, either.

1000 ↗(On Diff #50902)

You mean like:

bool expanded = expandAtomicOpToLibcall(...);
assert(expanded);

Not sure why that's better, but sure.

1110 ↗(On Diff #50902)

I don't know what you mean by that.

However, I do see that I can remove the 3 Builder.* calls below by simply calling createCmpXchgInstFun, which is the same thing.

1178 ↗(On Diff #50902)

If you could please be more specific about what you found confusing maybe I can try to address that.

However, regarding the "optimization" of generic libcalls into sized libcalls: no, that just makes no sense to do. It will not do anything but obfuscate things. I'll reiterate: nobody should ever be writing calls to these libcalls themselves. Even doing so will take some effort if you're trying to do that from C.

It also cannot possibly reduce any frontend complexity, because the frontend builtins which users call are not the same as these libcalls. The frontend recognizes the builtins, and lowers to LLVM atomic IR instructions. Sometimes, now, clang also lowers the builtins to these libcalls at its layer, but the plan is to remove that, once the support in LLVM itself is all in place.

For example, compare (from https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html):

Built-in Function: bool __atomic_compare_exchange_n (type *ptr, type *expected, type desired, bool weak, int success_memorder, int failure_memorder)
Built-in Function: bool __atomic_compare_exchange (type *ptr, type *expected, type *desired, bool weak, int success_memorder, int failure_memorder)

(Note the "n" in "__atomic_compare_exchange_n" is literal letter n, not a stand-in for a number.)

with the libcall documented below (and in https://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary):

bool  __atomic_compare_exchange_N(iN *ptr, iN *expected, iN desired,
                                  int success_order, int failure_order)
bool  __atomic_compare_exchange(size_t size, void *ptr, void *expected,
                                void *desired, int success_order,
                                int failure_order)

(with N = 1, 2, 4, 8, 16.)

They are just completely different.

1243 ↗(On Diff #50902)

I actually had it split along that axis in a former version of the patch, but I didn't like it, because there were two copies of basically the same function, with just minor variations. I found I was flipping back and forth trying to see what the differences were. So, I merged the two, and like it better this way.

test/Transforms/AtomicExpand/SPARC/libcalls.ll
48 ↗(On Diff #50902)

The casts aren't strictly necessary, in that all pointers are equivalent. But, the code needed to choose something to be the signature of the libcall, and it was easier to just use i8* everywhere (which, btw, is the same thing clang is doing today). When the typeless pointers work is done, that'll go away of course.

No, __atomic_compare_exchange_* don't have a weak/strong argument in the libcall. Only in the frontend builtin.

77 ↗(On Diff #50902)

You mean: don't check the whole function output, just some particularly representative lines? I think it's pretty important to check all of the instructions, since each one has some distinct code to emit it.

davidxl added inline comments.Mar 18 2016, 12:53 PM
test/Transforms/AtomicExpand/SPARC/libcalls.ll
48 ↗(On Diff #50902)

From the documentation, the signature of __atomic_compare_exchange_N takes iN* as argument type and it is also how the lib function is implemented.

77 ↗(On Diff #50902)

The problem is that it exposes implementation details which may change in the future. For instance

  1. the bitcast may go away
  2. even though I understand the need for insertvalue/extractvalue (see the fetchadd impl test below using a loop) in current implementation, strictly speaking, they can in theory be collapsed if the lowering sees a slightly larger scope

These are minor issues that do not need to be fixed IMO -- just a comment.

jfb added inline comments.Mar 24 2016, 2:48 PM
docs/Atomics.rst
424 ↗(On Diff #50902)

Code-quote `cmpxchg` here and below.

553 ↗(On Diff #50902)

"which is has"

559 ↗(On Diff #50902)
include/llvm/Target/TargetLowering.h
1053 ↗(On Diff #50902)

Size usually means bytes in LLVM, and SizeInBits is used in the name when we expect bits. I think you should rename the getter / setter and data member appropriately.

lib/CodeGen/AtomicExpandPass.cpp
131 ↗(On Diff #50902)

Reference the bug number.

970 ↗(On Diff #50902)

Agree with @jyknight on this one. You still need unreachable for the compilers that aren't smart enough to see all paths return.

1094 ↗(On Diff #50902)

Change default to FIRST_BINOP, LAST_BINOP, BAD_BINOP and put the unreachable after the switch.

1144 ↗(On Diff #50902)

Why 16? Could use DL->StackNaturalAlign?

jyknight updated this revision to Diff 52275.Mar 31 2016, 1:21 PM
jyknight edited edge metadata.
jyknight marked 7 inline comments as done.

Updates for review comments.

I believe I've addressed as many of the comments as I can. The ones I replied to saying I don't understand remain unresolved.

lib/CodeGen/AtomicExpandPass.cpp
1110 ↗(On Diff #50902)

When attempting that change, I remembered why I didn't in the first place: I need the "AtomicCmpXchgInst *Pair" to pass to expandAtomicCASToLibcall, and it's not readily accessible if I just call that other fn. So I've not made any change here.

1144 ↗(On Diff #50902)

Yeah that seems wrong actually.

The point of setting the alignment is to ensure that the value is aligned sufficiently to be able to cast to an integer of the proper width and load that, as the libcall function may do that. "16" was because that's the maximum specialized size...

But actually, I think it should be using DL.getPrefTypeAlignment(SizedIntTy), so switched to that.

test/Transforms/AtomicExpand/SPARC/libcalls.ll
10 ↗(On Diff #50902)

I'm not sure that's very useful. It can't really do multiple architectures in one test anyhow, since there's no way to say "REQUIRES:" separately for different run lines, is there?

48 ↗(On Diff #50902)

Whether the pointee types match doesn't really matter, though. Especially as the plan is to remove typed pointers entirely, at which point it'll just be "ptr" or something, it seemed best to just use the same kind always.

jfb added inline comments.Apr 4 2016, 1:37 PM
include/llvm/CodeGen/RuntimeLibcalls.h
401 ↗(On Diff #52275)

"new" is a comment that won't age well. Could you instead comment on SYNC_ saying ATOMIC_ are the newer approach, and refer to docs?

jyknight updated this revision to Diff 52724.Apr 5 2016, 12:35 PM
jyknight marked 15 inline comments as done.
jyknight edited edge metadata.

3rd round review fixes.

jfb added a comment.Apr 5 2016, 2:59 PM

Looks good overall, I assume @reames wants to do another round though.

jfb added inline comments.Apr 7 2016, 3:49 PM
lib/CodeGen/AtomicExpandPass.cpp
940 ↗(On Diff #52724)

See this patch for a fix: http://reviews.llvm.org/D18875

Ping. I'd really like to get this in, so I can move on to the rest of the atomics work.

@reames, especially, since you have outstanding concerns...

jfb accepted this revision.Apr 11 2016, 8:29 AM
jfb added a reviewer: jfb.

Forgot to say: I talked to @reames in person last week and he said he was OK moving forward with this for now.

This revision was automatically updated to reflect the committed changes.
jfb added a comment.Apr 11 2016, 3:41 PM

Looks like this broke the build:

/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp: In member function ‘void
{anonymous}::AtomicExpand::expandAtomicLoadToLibcall(llvm::LoadInst*)’:
/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp:1000:8: warning: unused
variable ‘expanded’ [-Wunused-variable]

bool expanded = expandAtomicOpToLibcall(
     ^

/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp: In member function ‘void
{anonymous}::AtomicExpand::expandAtomicStoreToLibcall(llvm::StoreInst*)’:
/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp:1013:8: warning: unused
variable ‘expanded’ [-Wunused-variable]

bool expanded = expandAtomicOpToLibcall(
     ^

/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp: In member function ‘void
{anonymous}::AtomicExpand::expandAtomicCASToLibcall(llvm::AtomicCmpXchgInst*)’:
/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp:1027:8: warning: unused
variable ‘expanded’ [-Wunused-variable]

bool expanded = expandAtomicOpToLibcall(
     ^

/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp: In function
‘llvm::ArrayRef<llvm::RTLIB::Libcall>
GetRMWLibcall(llvm::AtomicRMWInst::BinOp)’:
/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp:1068:12: error: could not
convert ‘(const llvm::RTLIB::Libcall*)(& LibcallsXchg)’ from ‘const
llvm::RTLIB::Libcall*’ to ‘llvm::ArrayRef<llvm::RTLIB::Libcall>’

return LibcallsXchg;
       ^

/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp:1070:12: error: could not
convert ‘(const llvm::RTLIB::Libcall*)(& LibcallsAdd)’ from ‘const
llvm::RTLIB::Libcall*’ to ‘llvm::ArrayRef<llvm::RTLIB::Libcall>’

return LibcallsAdd;
       ^

/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp:1072:12: error: could not
convert ‘(const llvm::RTLIB::Libcall*)(& LibcallsSub)’ from ‘const
llvm::RTLIB::Libcall*’ to ‘llvm::ArrayRef<llvm::RTLIB::Libcall>’

return LibcallsSub;
       ^

/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp:1074:12: error: could not
convert ‘(const llvm::RTLIB::Libcall*)(& LibcallsAnd)’ from ‘const
llvm::RTLIB::Libcall*’ to ‘llvm::ArrayRef<llvm::RTLIB::Libcall>’

return LibcallsAnd;
       ^

/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp:1076:12: error: could not
convert ‘(const llvm::RTLIB::Libcall*)(& LibcallsOr)’ from ‘const
llvm::RTLIB::Libcall*’ to ‘llvm::ArrayRef<llvm::RTLIB::Libcall>’

return LibcallsOr;
       ^

/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp:1078:12: error: could not
convert ‘(const llvm::RTLIB::Libcall*)(& LibcallsXor)’ from ‘const
llvm::RTLIB::Libcall*’ to ‘llvm::ArrayRef<llvm::RTLIB::Libcall>’

return LibcallsXor;
       ^

/s/llvm/llvm/lib/CodeGen/AtomicExpandPass.cpp:1080:12: error: could not
convert ‘(const llvm::RTLIB::Libcall*)(& LibcallsNand)’ from ‘const
llvm::RTLIB::Libcall*’ to ‘llvm::ArrayRef<llvm::RTLIB::Libcall>’

return LibcallsNand;
       ^