This is an archive of the discontinued LLVM Phabricator instance.

[OPENMP] CodeGen for "omp atomic read [seq_cst]" directive.
ClosedPublic

Authored by ABataev on Nov 27 2014, 12:54 AM.


Details

Reviewers

rjmccall
• fraggamuffin
• ejstotzer

Commits

rC226788: [OPENMP] CodeGen for "omp atomic read [seq_cst]" directive.
rC226786: [OPENMP] CodeGen for "omp atomic read [seq_cst]" directive.
rC226784: [OPENMP] CodeGen for "omp atomic read [seq_cst]" directive.
rL226788: [OPENMP] CodeGen for "omp atomic read [seq_cst]" directive.
rL226786: [OPENMP] CodeGen for "omp atomic read [seq_cst]" directive.
rL226784: [OPENMP] CodeGen for "omp atomic read [seq_cst]" directive.

Summary

"omp atomic read [seq_cst]" accepts expressions "v=x;". In this patch we perform an atomic load of "x" (using builtin atomic loading instructions or a call to "atomic_load()" for simple lvalues and "kmpc_atomic_start();load <x>;kmpc_atomic_end();" for other lvalues), convert the result of loading to type of "v" (using EmitScalarConversion() for simple types and EmitComplexToScalarConversion() for conversions from complex to scalar) and then store the result in "v".

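As a rough illustration of the load, convert, store sequence the summary describes (this is not the IR the patch emits), here is a minimal C++ sketch. It assumes the GCC/Clang `__atomic_load_n` builtin is available; the function name `atomic_read_as_double` is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper modelling `#pragma omp atomic read seq_cst` applied to
// `v = x;` where x is an int and v is a double.
double atomic_read_as_double(int *x) {
  // Step 1: atomically load x with sequentially consistent ordering.
  int loaded = __atomic_load_n(x, __ATOMIC_SEQ_CST);
  // Step 2: convert the loaded value to the type of v.
  // Step 3, the plain (non-atomic) store into v, is left to the caller.
  return static_cast<double>(loaded);
}
```

Only the load of `x` is atomic here; the store into `v` is an ordinary assignment performed by the caller.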
Diff Detail

Repository: rL LLVM

Event Timeline

ABataev updated this revision to Diff 16670. Nov 27 2014, 12:54 AM

ABataev retitled this revision to [OPENMP] CodeGen for "omp atomic read [seq_cst]" directive.

ABataev updated this object.

ABataev edited the test plan for this revision.

ABataev added reviewers: rjmccall, • ejstotzer, • fraggamuffin.

ABataev added a subscriber: Unknown Object (MLST).

rjmccall added inline comments. Dec 1 2014, 12:11 AM

lib/CodeGen/CGAtomic.cpp
44 ↗	(On Diff #16670)	Test once with getAs, please.
72 ↗	(On Diff #16670)	A more general thing to do here is to ask the TargetInfo if there are lockless atomics at a particular bit-width and alignment. You can implement that function this way, but let it make the power-of-two assumption, rather than scattered places in IRGen.
979 ↗	(On Diff #16670)	I think it'd be cleaner to just have all the success cases return instead of re-assigning result.
lib/CodeGen/CGStmtOpenMP.cpp
563 ↗	(On Diff #16670)	Somewhere in this code, you should document the ABI requirements here, which appear to basically be: (1) bitfield and vector element lvalues always go through the OpenMP locking path; (2) all other lvalues use the target's rules for atomics of the given size. Are you planning to use the global lock for anything else besides non-simple lvalues?
567 ↗	(On Diff #16670)	All of the interesting cases here should be separate functions (as you add them).
591 ↗	(On Diff #16670)	I think you'll want this functionality for some of the other atomic ops anyway, so you might as well make this a separate function that you can call like this: llvm::Value *ScalarVal = convertToScalarValue(CGF, Res, X->getType(), V->getType()); It's kind of a shame that you have to redo all of this instead of just relying on the implicit casts that Sema already created when it analyzed the expression, but maybe this is better than messing around with OpaqueValueExprs, and it works because you don't have a requirement to handle custom operators.
608 ↗	(On Diff #16670)	Same thing: go ahead and make a separate function for this that goes from an RValue to a ComplexPairTy. Also, the r-value will tell you whether it's a scalar/complex/aggregate, you don't need to (somewhat expensively) recompute that.
652 ↗	(On Diff #16670)	You don't need a break after llvm_unreachable.
679 ↗	(On Diff #16670)	Same.
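The suggestion about asking the target whether an access of a given bit-width and alignment can be lock-free could look roughly like the sketch below. `ToyTarget` and `hasLockFreeAtomic` are hypothetical names used for illustration only; this is not Clang's TargetInfo interface, and the exact conditions are an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a target description; not Clang's TargetInfo.
struct ToyTarget {
  uint64_t charWidthBits;        // bits per byte, typically 8
  uint64_t maxInlineAtomicBits;  // widest access handled without a lock

  // True if an atomic access of this size and alignment can be lock-free.
  // The power-of-two assumption lives here, in one place.
  bool hasLockFreeAtomic(uint64_t sizeBits, uint64_t alignBits) const {
    if (sizeBits == 0 || sizeBits > maxInlineAtomicBits) return false;
    if (sizeBits > alignBits) return false;           // must be naturally aligned
    if (sizeBits % charWidthBits != 0) return false;  // whole bytes only
    uint64_t bytes = sizeBits / charWidthBits;
    return (bytes & (bytes - 1)) == 0;                // power-of-two byte count
  }
};
```

Keeping the size and alignment checks in a single predicate means callers in IRGen do not each have to repeat the power-of-two reasoning.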

Hi John,
Thanks a lot for the review right after holidays!!! :) See my comments, especially about global locks.

lib/CodeGen/CGAtomic.cpp
44 ↗	(On Diff #16670)	Ok, agree
72 ↗	(On Diff #16670)	Yes, I'll add the function to TargetInfo.
979 ↗	(On Diff #16670)	Ok, reworked
lib/CodeGen/CGStmtOpenMP.cpp
563 ↗	(On Diff #16670)	Ok, added. Yes, it may be used for some complex atomic operations, which cannot be implemented using some trivial operations (like user-defined reductions).
567 ↗	(On Diff #16670)	Agree
591 ↗	(On Diff #16670)	Agree, I'll rework this. Yes, I tried to implement it using some hooks/hacks in the AST, but there were a lot of problems with atomic ops. We will have to support some custom operators in the future (OpenMP 4.0 introduces user-defined reductions, which must be called as atomic ops, and I plan to use global locks for them).
608 ↗	(On Diff #16670)	Ok
652 ↗	(On Diff #16670)	My bad, removed
679 ↗	(On Diff #16670)	Removed

Update after review

rjmccall added inline comments. Dec 1 2014, 10:44 AM

include/clang/Basic/TargetInfo.h
379 ↗	(On Diff #16759)	That's just getCharWidth() but more expensive.
lib/CodeGen/CGStmtOpenMP.cpp
603 ↗	(On Diff #16759)	I think you missed this comment from my review — please make the OMPC_read case its own function.
591 ↗	(On Diff #16670)	You can't mix locking and non-locking atomics on the same object: they'll be atomic with respect to themselves, but not with respect to each other. That is, assuming that atomic_start and atomic_end aren't implemented by halting all other OpenMP threads. For example, imagine that you have a *= that's implemented with a compare-and-swap loop and a custom reduction that you've implemented with the global lock. The global lock doesn't keep the compare-and-swap from firing during the execution of the custom reduction, so (1) different parts of the custom reduction might see different values for the original l-value and (2) the custom reduction will completely overwrite the *= rather than appearing to execute before or after. In fact, you have a similar problem with aggregates, where an atomic access to an aggregate (e.g. a std::pair<float,float>) cannot be made atomic with respect to an atomic access to a subobject unless the aggregate will be accessed locklessly. (You could do some operations locklessly if you wrote them very carefully, e.g. reads and writes, but I don't know of any way to do an aggregate compare-and-swap that's atomic with a subobject compare-and-swap when the first requires a lock and the second doesn't.) That's mostly defined away by the current requirement that the l-value have scalar type, except that you do support _Complex l-values right now, and 32-bit platforms generally can't make _Complex double lockless. The reverse problem applies to vectors: a 16-byte vector will generally be sufficiently aligned that a 64-bit platform can access it locklessly, but you're implementing some accesses to subobjects with locks: specifically, vector projections that create compound l-values.
This is really a specification problem: OpenMP wants simple operations to be implementable with native atomics, but that really restricts the set of operations you can do unless you can assume at a high level that there are never partially overlapping atomic operations. The easiest language solution is to say that, for the purposes of OpenMP's atomics, _Complex and vector types don't count as scalars. But I don't know if that would fly — it might be a goal to loosen the restrictions about what types you can use in atomic operations.
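To make the compare-and-swap case in this comment concrete, here is a minimal sketch of a lock-free `*=` written with `std::atomic`. The function name `atomic_multiply` is hypothetical, and this is not the code the patch generates.

```cpp
#include <atomic>
#include <cassert>

// Lock-free `x *= factor` built from a compare-and-swap loop.
int atomic_multiply(std::atomic<int> &x, int factor) {
  int expected = x.load();
  // On failure, compare_exchange_weak writes the current value of x into
  // `expected`, so the product is recomputed from fresh data on each retry.
  while (!x.compare_exchange_weak(expected, expected * factor)) {
  }
  return expected * factor;
}
```

Nothing in this loop takes a lock, so a second operation that protects its own read-modify-write of the same object with a global lock would not exclude it. That is the mixing hazard described above.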

ABataev added inline comments. Dec 1 2014, 11:34 PM

include/clang/Basic/TargetInfo.h
379 ↗	(On Diff #16759)	Fixed
lib/CodeGen/CGStmtOpenMP.cpp
603 ↗	(On Diff #16759)	Yes, I missed this. Ok, I'll do.
591 ↗	(On Diff #16670)	I agree with you; currently the code is not quite correct. I think we can resolve this problem by emitting OpenMP-specific locks for target-supported lock-free atomic operations. But in that case we don't need target atomic ops at all; we can rely on the runtime library interface only.

rjmccall added inline comments. Dec 2 2014, 9:24 AM

lib/CodeGen/CGStmtOpenMP.cpp
591 ↗	(On Diff #16670)	You mean, always grabbing a lock instead of ever using lock-free operations? I agree that that's the most general solution, but it's really heavy-handed. It's possible that you can still do user-defined reductions with a compare-and-swap loop, depending on how they're defined. (Specifically: you have to be allowed to try the reduction multiple times, and the reduction's only allowed to access a single atomic value. The existing specification about the #pragma tries to enforce those conditions even for the simple cases allowed there, so I'm optimistic that whoever is writing your spec is at least thinking about these problems.) What you really need here is a statement about subobjects. The C11/C++11 atomic types are very nice because they tell you exactly at what level something is atomic: you can have a std::atomic<std::pair<int,int>>, but that type doesn't provide accessors for atomically accessing the individual members of the pair; you can't just project out a std::atomic<int>& from the first element. That's really important, because it tells you locally that all atomic accesses are going to reference the full, 64-bit aggregate, which means you can agree very easily on a protocol for that access. OpenMP doesn't tie into the type system that way, but you can still impose that rule at a high level by saying that it's undefined behavior for atomic accesses to an aggregate to race with atomic accesses to a subobject. For example, you can have an atomic operation on a _Complex float, and you can have a simultaneous atomic operation on a float, but the fact that they're simultaneous means that we can assume they don't alias. That still lets us decide on the atomic access pattern based purely on the type of the atomic l-value.
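The point about C++11 atomics fixing the level of atomicity can be sketched as follows. `IntPair` is a hypothetical stand-in for `std::pair<int,int>`, used because `std::atomic` requires a trivially copyable type and a plain struct of two ints is guaranteed to be one; `read_first` is likewise an illustrative name.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for std::pair<int,int>.
struct IntPair {
  int first;
  int second;
};

// Every access goes through the whole object: there is no way to obtain a
// std::atomic<int>& that refers to just `first`.
int read_first(const std::atomic<IntPair> &p) {
  IntPair snapshot = p.load();  // atomic load of the full aggregate
  return snapshot.first;        // ordinary member access on the private copy
}
```

Because the only operations on offer are whole-object loads, stores, and exchanges, every user of the object necessarily agrees on the access width.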

ABataev added inline comments. Dec 15 2014, 1:32 AM

lib/CodeGen/CGStmtOpenMP.cpp
591 ↗	(On Diff #16670)	John, I decided to use lock-free operations on simple lvalues, but for other lvalues I'm going to use the global lock provided by the runtime. I think in this case we can avoid conflicts. What do you think about this solution?

rjmccall added inline comments. Dec 15 2014, 10:16 AM

lib/CodeGen/CGStmtOpenMP.cpp
591 ↗	(On Diff #16670)	That isn't good enough. Locking solutions do not allow lock-free access to scalar subobjects, and you can't reliably tell from a scalar access whether it's a subobject of something. You need a language rule that says that you can never have simultaneous atomic accesses to both an aggregate (including vectors and complex values) and its subobjects. Once you have that language rule, then using a global lock vs. lock-free patterns for non-simple vs. simple l-values is fine, as long as all of the user-defined reductions you'll need to implement later are implementable with lock-free compare-and-swap.

Yes, agree. Then it is probably better just to disable support for
some lvalues.

Best regards,

Alexey Bataev

Software Engineer
Intel Compiler Team

15.12.2014 21:16, John McCall wrote:

Comment at: lib/CodeGen/CGStmtOpenMP.cpp:591
@@ +590,3 @@
+ llvm::Value *ScalarVal;
+ switch (CGF.getEvaluationKind(X->getType())) {

+ case TEK_Scalar:


http://reviews.llvm.org/D6431


Okay. So where does that leave us with this patch? The minimal thing is to just wait for direction from the OpenMP language committee, or we can optimistically assume that we get that rule and then build the implementation around that assumption.

Note also that you need to be compatible with whatever GCC is doing here, assuming you're trying to guarantee compiler interop.

I have already discussed this problem with the author of atomics in OpenMP. I
hope to get an answer from him tomorrow.
PS. I'll try to investigate this problem a little bit more; maybe I'll
come up with a suitable solution. I have an idea in mind, but I need to
play with it a little bit.
PPS. gcc uses a global lock for all atomic operations and that's all. icc
has a somewhat more complex solution. I want to use the
existing infrastructure of clang/LLVM.

Best regards,

Alexey Bataev

If GCC always surrounds accesses in a global lock from the OpenMP runtime, then you will need to do the same in clang for compatibility unless you've decided not to care about GCC compatibility. I guess you could provide some sort of -fno-openmp-gcc-compatibility flag if you want.

I don't think we should follow GCC rules, because currently we're using
the libiomp5 runtime interface, not gomp. A gomp-compatible interface can be
implemented later.

Best regards,

Alexey Bataev

Okay, so if it's not GCC, who exactly is already using libiomp5? On a previous patch, I was told that we had to maintain compatibility with older runtimes, and it doesn't make sense to me that we have to support interoperation with old runtimes but not with generated code for any particular compiler that uses it. So you really need to look to see how those compilers emit atomics. That is, unless you're willing to break ABI with them as well, in which case we're effectively designing a new ABI, so why do we care about old runtimes at all?

I agree that it would be awful to be stuck using a global lock for all atomic operations, so if you *are* willing to break ABI, that's great.

Fixed codegen for atomic load for bitfield, vector element and ext vector element lvalues: the __atomic_load() builtin is used to load the whole storage containing the lvalue, and then the regular processing of the bitfield or vector element is applied to this loaded value. Lvalues for global registers are still loaded using the global lock.
The same scheme is supposed to be used for all other atomic operations.
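The scheme described here, loading the enclosing storage atomically and then extracting the field from the loaded value, can be sketched as below. This is an illustration under stated assumptions, not the patch's code: the storage unit is assumed to be 32 bits wide, and `atomic_read_bits` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Reads `width` bits starting at bit `offset` from a 32-bit storage unit.
// Assumes offset < 32 and offset + width <= 32.
uint32_t atomic_read_bits(const std::atomic<uint32_t> &storage,
                          unsigned offset, unsigned width) {
  // The only atomic step: load the whole storage unit.
  uint32_t whole = storage.load(std::memory_order_seq_cst);
  // Ordinary, non-atomic extraction from the private copy.
  uint32_t mask = width >= 32 ? ~0u : ((1u << width) - 1u);
  return (whole >> offset) & mask;
}
```

The extraction works on a local copy, so it needs no synchronization. What matters for correctness is that every atomic access to the field agrees on the width of the storage unit being loaded.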

How are you planning to implement stores for any of the non-simple l-value cases? Compare-and-swap loops?

Bitfields are interesting because IRGen actually uses larger-than-strictly-necessary accesses: if you have a struct containing 12 bytes of adjacent bitfields, we will join them all into one large i96 access. You need to use narrower bounds than that because you need something that's guaranteed stable: you can't have one version of the compiler trying to access the bitfield with a 12-byte atomic access and another accessing it with a 4-byte access, because the atomic runtime functions don't promise that such accesses will actually be atomic w.r.t. each other. You'll need to invent a rule here that you're willing to stick to forever.

Also, both bitfields and vector elements can often be accessed more efficiently than just a libcall, depending on how much space they need.

test/OpenMP/atomic_read_codegen.c
28 ↗	(On Diff #17532)	This ends up being an inadvertently confusing variable name, since it ends in "six".

Hi John, thanks for the review.

How are you planning to implement stores for any of the non-simple l-value cases? Compare-and-swap loops?

Yes, that's the plan. Except for global registers: I did not find
compare-and-swap op in LLVM IR for them, so I decided to use global
locks for them.

... You need to use narrower bounds than that because you need something that's guaranteed stable: ...

Hmm, I did not catch why there can be trouble with bitfields. Why would one
compiler use a 12-byte atomic access, while another one produces a
4-byte access? I think all atomic accesses will be the same. According
to the OpenMP spec we cannot perform an atomic operation on the whole bitfield
structure, only on its particular bitfields (atomic ops are allowed
only for scalar values). So I expect that all atomic operations on
bitfields will be performed on the same bounds. Or do you mean something else?

Also, both bitfields and vector elements can often be accessed more efficiently than just a libcall, depending on how much space they need.

I thought about it. I agree, but also it may significantly complicate
the code itself. That's why I decided to use only libcalls, taking into
account that atomic operations on bitfields/vector elements are very
rarely used (if any, actually I did not see any, but it is good to have
a working solution for all kinds of lvalues).

This ends up being an inadvertently confusing variable name, since it ends in "six".

Ok, I'll try to improve it after our holidays.

Best regards,

Alexey Bataev

05.01.2015 11:08, John McCall wrote:


Comment at: test/OpenMP/atomic_read_codegen.c:28
@@ +27,3 @@
+typedef int v4si __attribute__((vector_size(16)));
+v4si v4six;

+

This ends up being an inadvertently confusing variable name, since it ends in "six".


In D6431#105585, @ABataev wrote:

Hi John, thanks for the review.

How are you planning to implement stores for any of the non-simple l-value cases? Compare-and-swap loops?

Yes, that's the plan. Except for global registers: I did not find
compare-and-swap op in LLVM IR for them, so I decided to use global
locks for them.

I think global registers are generally thread-local, aren't they?

... You need to use narrower bounds than that because you need something that's guaranteed stable: ...

Hmm, I did not catch why there can be troubles with bitfields. Why one
compiler may use 12-byte atomic access, while another one will produce a
4-byte access? I think all atomic accesses will be the same.

Given this structure:

struct S { int x: 32; int y: 32; };

Clang will currently emit an access to x by masking a 64-bit load. This is an essentially arbitrary implementation choice in IRGen; we've changed it before, and we may change it again in the future. You're generating code that depends on this arbitrary choice, because it ends up being the pointee type of the address stored in the bitfield LValue.

Also, both bitfields and vector elements can often be accessed more efficiently than just a libcall, depending on how much space they need.

I thought about it. I agree, but also it may significantly complicate
the code itself. That's why I decided to use only libcalls, taking into
account that atomic operations on bitfields/vector elements are very
rarely used (if any, actually I did not see any, but it is good to have
a working solution for all kinds of lvalues).

It only seems more convenient because you're doing this logic in a deep place within the atomics code. If you had a high-level routine that reasoned about the kind of LValue it was working with *before* committing to an evaluation strategy, and then just called lower-level atomic routines as if it was doing an atomic operation on a char/short/int (for a bitfield) or the entire vector (for a vector element), this would fall out more naturally.

John.
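The structure suggested here, classifying the lvalue first and only then calling lower-level atomic routines on a plain integer or a whole vector, might be sketched like this. All names (`LValueKind`, `AtomicPlan`, `planAtomicRead`) are hypothetical and do not correspond to Clang's classes.

```cpp
#include <cassert>

// Hypothetical classification; these are not Clang's LValue kinds.
enum class LValueKind { Simple, BitField, VectorElt };

// Hypothetical description of the access the low-level routines would perform.
struct AtomicPlan {
  unsigned accessBits;    // width of the atomic access itself
  bool extractAfterLoad;  // true if a sub-value is extracted afterwards
};

// Decide the strategy up front, before any code for the access is generated.
AtomicPlan planAtomicRead(LValueKind kind, unsigned valueBits,
                          unsigned containerBits) {
  switch (kind) {
  case LValueKind::Simple:
    return {valueBits, false};     // load the scalar directly
  case LValueKind::BitField:
  case LValueKind::VectorElt:
    return {containerBits, true};  // load the enclosing unit, then extract
  }
  return {0, false};  // not reached for valid enumerators
}
```

With the decision made at this level, the lower-level atomic code only ever sees ordinary integer or whole-vector accesses.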

Hi John,

I think global registers are generally thread-local, aren't they?

Oh, yes, you're right, I missed it. I'll fix it.

Given this structure:
struct S { int x: 32; int y: 32; };
Clang will currently emit an access to x by masking a 64-bit load. This is an essentially arbitrary implementation choice in IRGen; we've changed it before, and we may change it again in the future. You're generating code that depends on this arbitrary choice, because it ends up being the pointee type of the address stored in the bitfield LValue.

Ahh, you're talking about compatibility between different versions of clang/LLVM compilers? I see then. Ok, I'll try to fix it somehow.

It only seems more convenient because you're doing this logic in a deep place within the atomics code. If you had a high-level routine that reasoned about the kind of LValue it was working with *before* committing to an evaluation strategy, and then just called lower-level atomic routines as if it was doing an atomic operation on a char/short/int (for a bitfield) or the entire vector (for a vector element), this would fall out more naturally.

Agree, I'll try to improve the code.

Best regards,

Alexey Bataev

Update after review

rjmccall added inline comments. Jan 16 2015, 2:01 AM

lib/CodeGen/CGAtomic.cpp
77 ↗	(On Diff #18142)	Hmm. I feel like you should make sure that your rule uses an aligned atomic access if it's possible to do so. That is, if the bitfield does fall within a single aligned unit, you should definitely access it with an atomic of that size. I think you need to consider the actual bitfield width for this, but maybe I'm missing something and that's done implicitly elsewhere.
86 ↗	(On Diff #18142)	This is dangerous; I think you can end up with an access that goes beyond the end of the structure with this. Consider a 1-bit bitfield of type "unsigned" that's at the end of the struct, e.g.
struct {
  unsigned : 31;
  unsigned x : 1; // <- access is to this
};
You need to make sure this is only accessed with an 8-bit operation.
88 ↗	(On Diff #18142)	This alignment needs to be adjusted.
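One way to reason about the out-of-bounds concern is to compute the narrowest run of whole bytes that covers a bitfield. The sketch below is illustrative only: `BitFieldAccess` and `narrowestByteAccess` are hypothetical names, bytes are assumed to be 8 bits, and a real rule would additionally have to pick a size and alignment the target supports.

```cpp
#include <cassert>

// Hypothetical result type: which bytes to access and where the field
// starts inside them.
struct BitFieldAccess {
  unsigned byteOffset;         // first byte touched, from the struct start
  unsigned sizeInBytes;        // number of bytes covering the field
  unsigned bitOffsetInAccess;  // field's bit offset within the first byte
};

// Narrowest byte-granular access covering `bitWidth` bits at `bitOffset`.
// Assumes bitWidth >= 1 and 8-bit bytes.
BitFieldAccess narrowestByteAccess(unsigned bitOffset, unsigned bitWidth) {
  unsigned firstByte = bitOffset / 8;
  unsigned lastByte = (bitOffset + bitWidth - 1) / 8;
  return {firstByte, lastByte - firstByte + 1, bitOffset % 8};
}
```

For the 1-bit field at bit offset 31 in the example struct, this gives a single byte at offset 3, which stays inside the 4-byte struct.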

Fixed codegen for bitfields + formatting after review.

John, thanks for the review. I prepared a new patch, please look at this.

Best regards,

Alexey Bataev

16.01.2015 13:01, John McCall wrote:

Comment at: lib/CodeGen/CGAtomic.cpp:77
@@ +76,3 @@
+ auto &OrigBFI = lvalue.getBitFieldInfo();
+ auto OffsetInChars = C.toCharUnitsFromBits(OrigBFI.Offset);

+ auto VoidPtrAddr = CGF.EmitCastToVoidPtr(lvalue.getBitFieldAddr());

Hmm. I feel like you should make sure that your rule uses an aligned atomic access if it's possible to do so. That is, if the bitfield does fall within a single aligned unit, you should definitely access it with an atomic of that size. I think you need to consider the actual bitfield width for this, but maybe I'm missing something and that's done implicitly elsewhere.

Comment at: lib/CodeGen/CGAtomic.cpp:86
@@ +85,3 @@
+ BFI.Offset %= C.getCharWidth();
+ BFI.StorageSize = AtomicSizeInBits;

+ LVal = LValue::MakeBitfield(Addr, BFI, lvalue.getType(),

This is dangerous; I think you can end up with an access that goes beyond the end of the structure with this. Consider a 1-bit bitfield of type "unsigned" that's at the end of the struct, e.g.

struct {
  unsigned : 31;
  unsigned x : 1; // <- access is to this
};

You need to make sure this is only accessed with an 8-bit operation.

Comment at: lib/CodeGen/CGAtomic.cpp:88
@@ +87,3 @@
+ LVal = LValue::MakeBitfield(Addr, BFI, lvalue.getType(),
+ lvalue.getAlignment());

+ } else if (lvalue.isVectorElt()) {

This alignment needs to be adjusted.

http://reviews.llvm.org/D6431


Cute, yes, I think that works.
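For readers following the bitfield discussion, the arithmetic in the bitfield branch of the updated AtomicInfo constructor (Diff 18582) can be approximated outside of Clang. The sketch below is only an illustration: the struct, the function name, and the parameter names are hypothetical, chars are assumed to be 8 bits wide, and AlignInBytes stands in for lvalue.getAlignment().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for illustration; not part of Clang.
struct BitfieldAccess {
  uint64_t ByteOffset;   // start of the accessed unit, in bytes from the base
  uint64_t OffsetInUnit; // bit offset of the field inside that unit
  uint64_t SizeInBits;   // width of the atomic access
};

// Approximates the offset/size computation for a bitfield l-value,
// assuming 8-bit chars.
inline BitfieldAccess computeBitfieldAccess(uint64_t OffsetInBits,
                                            uint64_t WidthInBits,
                                            uint64_t AlignInBytes) {
  assert(AlignInBytes != 0 && "alignment must be non-zero");
  const uint64_t CharWidth = 8;
  const uint64_t AlignInBits = AlignInBytes * CharWidth;

  // Bit offset of the field relative to the enclosing aligned unit.
  uint64_t OffsetInUnit = OffsetInBits % AlignInBits;

  // Bytes needed to cover the field, rounded up to the alignment.
  uint64_t Bytes = (OffsetInUnit + WidthInBits + CharWidth - 1) / CharWidth;
  uint64_t RoundedBytes =
      (Bytes + AlignInBytes - 1) / AlignInBytes * AlignInBytes;

  // Byte offset of the aligned unit that contains the field.
  uint64_t ByteOffset =
      (OffsetInBits / CharWidth) / AlignInBytes * AlignInBytes;

  return {ByteOffset, OffsetInUnit, RoundedBytes * CharWidth};
}
```

For the struct from the review (a 1-bit field at bit offset 31), a 4-byte alignment gives a single 32-bit access at byte 0, while a 1-byte alignment gives an 8-bit access at byte 3, so the access stays inside the structure in both cases.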

Closed by commit rL226784: [OPENMP] CodeGen for "omp atomic read [seq_cst]" directive. (authored by ABataev). Jan 21 2015, 9:30 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
cfe/trunk/lib/CodeGen/CGAtomic.cpp	222 lines
cfe/trunk/lib/CodeGen/CGStmtOpenMP.cpp	121 lines
cfe/trunk/lib/Sema/SemaType.cpp	2 lines
cfe/trunk/test/OpenMP/atomic_read_codegen.c	333 lines

Diff 18582

cfe/trunk/lib/CodeGen/CGAtomic.cpp

//===--- CGAtomic.cpp - Emit LLVM IR for atomic operations ----------------===//		//===--- CGAtomic.cpp - Emit LLVM IR for atomic operations ----------------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file contains the code for emitting atomic operations.		// This file contains the code for emitting atomic operations.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "CodeGenFunction.h"		#include "CodeGenFunction.h"
#include "CGCall.h"		#include "CGCall.h"
		#include "CGRecordLayout.h"
#include "CodeGenModule.h"		#include "CodeGenModule.h"
#include "clang/AST/ASTContext.h"		#include "clang/AST/ASTContext.h"
#include "clang/CodeGen/CGFunctionInfo.h"		#include "clang/CodeGen/CGFunctionInfo.h"
#include "llvm/ADT/StringExtras.h"		#include "llvm/ADT/StringExtras.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Intrinsics.h"		#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/Operator.h"		#include "llvm/IR/Operator.h"

using namespace clang;		using namespace clang;
using namespace CodeGen;		using namespace CodeGen;

namespace {		namespace {
class AtomicInfo {		class AtomicInfo {
CodeGenFunction &CGF;		CodeGenFunction &CGF;
QualType AtomicTy;		QualType AtomicTy;
QualType ValueTy;		QualType ValueTy;
uint64_t AtomicSizeInBits;		uint64_t AtomicSizeInBits;
uint64_t ValueSizeInBits;		uint64_t ValueSizeInBits;
CharUnits AtomicAlign;		CharUnits AtomicAlign;
CharUnits ValueAlign;		CharUnits ValueAlign;
CharUnits LValueAlign;		CharUnits LValueAlign;
TypeEvaluationKind EvaluationKind;		TypeEvaluationKind EvaluationKind;
bool UseLibcall;		bool UseLibcall;
		LValue LVal;
		CGBitFieldInfo BFI;
public:		public:
AtomicInfo(CodeGenFunction &CGF, LValue &lvalue) : CGF(CGF) {		AtomicInfo(CodeGenFunction &CGF, LValue &lvalue)
assert(lvalue.isSimple());		: CGF(CGF), AtomicSizeInBits(0), ValueSizeInBits(0), UseLibcall(true) {
		assert(!lvalue.isGlobalReg());
		ASTContext &C = CGF.getContext();
		if (lvalue.isSimple()) {
AtomicTy = lvalue.getType();		AtomicTy = lvalue.getType();
ValueTy = AtomicTy->castAs<AtomicType>()->getValueType();		if (auto *ATy = AtomicTy->getAs<AtomicType>())
		ValueTy = ATy->getValueType();
		else
		ValueTy = AtomicTy;
EvaluationKind = CGF.getEvaluationKind(ValueTy);		EvaluationKind = CGF.getEvaluationKind(ValueTy);

ASTContext &C = CGF.getContext();

uint64_t ValueAlignInBits;		uint64_t ValueAlignInBits;
uint64_t AtomicAlignInBits;		uint64_t AtomicAlignInBits;
TypeInfo ValueTI = C.getTypeInfo(ValueTy);		TypeInfo ValueTI = C.getTypeInfo(ValueTy);
ValueSizeInBits = ValueTI.Width;		ValueSizeInBits = ValueTI.Width;
ValueAlignInBits = ValueTI.Align;		ValueAlignInBits = ValueTI.Align;

TypeInfo AtomicTI = C.getTypeInfo(AtomicTy);		TypeInfo AtomicTI = C.getTypeInfo(AtomicTy);
AtomicSizeInBits = AtomicTI.Width;		AtomicSizeInBits = AtomicTI.Width;
AtomicAlignInBits = AtomicTI.Align;		AtomicAlignInBits = AtomicTI.Align;

assert(ValueSizeInBits <= AtomicSizeInBits);		assert(ValueSizeInBits <= AtomicSizeInBits);
assert(ValueAlignInBits <= AtomicAlignInBits);		assert(ValueAlignInBits <= AtomicAlignInBits);

AtomicAlign = C.toCharUnitsFromBits(AtomicAlignInBits);		AtomicAlign = C.toCharUnitsFromBits(AtomicAlignInBits);
ValueAlign = C.toCharUnitsFromBits(ValueAlignInBits);		ValueAlign = C.toCharUnitsFromBits(ValueAlignInBits);
if (lvalue.getAlignment().isZero())		if (lvalue.getAlignment().isZero())
lvalue.setAlignment(AtomicAlign);		lvalue.setAlignment(AtomicAlign);

		LVal = lvalue;
		} else if (lvalue.isBitField()) {
		auto &OrigBFI = lvalue.getBitFieldInfo();
		auto Offset = OrigBFI.Offset % C.toBits(lvalue.getAlignment());
		AtomicSizeInBits = C.toBits(
		C.toCharUnitsFromBits(Offset + OrigBFI.Size + C.getCharWidth() - 1)
		.RoundUpToAlignment(lvalue.getAlignment()));
		auto VoidPtrAddr = CGF.EmitCastToVoidPtr(lvalue.getBitFieldAddr());
		auto OffsetInChars =
		(C.toCharUnitsFromBits(OrigBFI.Offset) / lvalue.getAlignment()) *
		lvalue.getAlignment();
		VoidPtrAddr = CGF.Builder.CreateConstGEP1_64(
		VoidPtrAddr, OffsetInChars.getQuantity());
		auto Addr = CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(
		VoidPtrAddr,
		CGF.Builder.getIntNTy(AtomicSizeInBits)->getPointerTo(),
		"atomic_bitfield_base");
		BFI = OrigBFI;
		BFI.Offset = Offset;
		BFI.StorageSize = AtomicSizeInBits;
		LVal = LValue::MakeBitfield(Addr, BFI, lvalue.getType(),
		lvalue.getAlignment());
		} else if (lvalue.isVectorElt()) {
		AtomicSizeInBits = C.getTypeSize(lvalue.getType());
		LVal = lvalue;
		} else {
		assert(lvalue.isExtVectorElt());
		AtomicSizeInBits = C.getTypeSize(lvalue.getType());
		LVal = lvalue;
		}
UseLibcall = !C.getTargetInfo().hasBuiltinAtomic(		UseLibcall = !C.getTargetInfo().hasBuiltinAtomic(
AtomicSizeInBits, C.toBits(lvalue.getAlignment()));		AtomicSizeInBits, C.toBits(lvalue.getAlignment()));
}		}

QualType getAtomicType() const { return AtomicTy; }		QualType getAtomicType() const { return AtomicTy; }
QualType getValueType() const { return ValueTy; }		QualType getValueType() const { return ValueTy; }
CharUnits getAtomicAlignment() const { return AtomicAlign; }		CharUnits getAtomicAlignment() const { return AtomicAlign; }
CharUnits getValueAlignment() const { return ValueAlign; }		CharUnits getValueAlignment() const { return ValueAlign; }
uint64_t getAtomicSizeInBits() const { return AtomicSizeInBits; }		uint64_t getAtomicSizeInBits() const { return AtomicSizeInBits; }
uint64_t getValueSizeInBits() const { return ValueSizeInBits; }		uint64_t getValueSizeInBits() const { return ValueSizeInBits; }
TypeEvaluationKind getEvaluationKind() const { return EvaluationKind; }		TypeEvaluationKind getEvaluationKind() const { return EvaluationKind; }
bool shouldUseLibcall() const { return UseLibcall; }		bool shouldUseLibcall() const { return UseLibcall; }
		const LValue &getAtomicLValue() const { return LVal; }

/// Is the atomic size larger than the underlying value type?		/// Is the atomic size larger than the underlying value type?
///		///
/// Note that the absence of padding does not mean that atomic		/// Note that the absence of padding does not mean that atomic
/// objects are completely interchangeable with non-atomic		/// objects are completely interchangeable with non-atomic
/// objects: we might have promoted the alignment of a type		/// objects: we might have promoted the alignment of a type
/// without making it bigger.		/// without making it bigger.
bool hasPadding() const {		bool hasPadding() const {
return (ValueSizeInBits != AtomicSizeInBits);		return (ValueSizeInBits != AtomicSizeInBits);
}		}

bool emitMemSetZeroIfNecessary(LValue dest) const;		bool emitMemSetZeroIfNecessary() const;

llvm::Value *getAtomicSizeValue() const {		llvm::Value *getAtomicSizeValue() const {
CharUnits size = CGF.getContext().toCharUnitsFromBits(AtomicSizeInBits);		CharUnits size = CGF.getContext().toCharUnitsFromBits(AtomicSizeInBits);
return CGF.CGM.getSize(size);		return CGF.CGM.getSize(size);
}		}

/// Cast the given pointer to an integer pointer suitable for		/// Cast the given pointer to an integer pointer suitable for
/// atomic operations.		/// atomic operations.
llvm::Value *emitCastToAtomicIntPointer(llvm::Value *addr) const;		llvm::Value *emitCastToAtomicIntPointer(llvm::Value *addr) const;

/// Turn an atomic-layout object into an r-value.		/// Turn an atomic-layout object into an r-value.
RValue convertTempToRValue(llvm::Value *addr,		RValue convertTempToRValue(llvm::Value *addr,
AggValueSlot resultSlot,		AggValueSlot resultSlot,
SourceLocation loc) const;		SourceLocation loc) const;

/// \brief Converts a rvalue to integer value.		/// \brief Converts a rvalue to integer value.
llvm::Value *convertRValueToInt(RValue RVal) const;		llvm::Value *convertRValueToInt(RValue RVal) const;

RValue convertIntToValue(llvm::Value *IntVal, AggValueSlot ResultSlot,		RValue convertIntToValue(llvm::Value *IntVal, AggValueSlot ResultSlot,
SourceLocation Loc) const;		SourceLocation Loc) const;

/// Copy an atomic r-value into atomic-layout memory.		/// Copy an atomic r-value into atomic-layout memory.
void emitCopyIntoMemory(RValue rvalue, LValue lvalue) const;		void emitCopyIntoMemory(RValue rvalue) const;

/// Project an l-value down to the value field.		/// Project an l-value down to the value field.
LValue projectValue(LValue lvalue) const {		LValue projectValue() const {
llvm::Value *addr = lvalue.getAddress();		assert(LVal.isSimple());
		llvm::Value *addr = LVal.getAddress();
if (hasPadding())		if (hasPadding())
addr = CGF.Builder.CreateStructGEP(addr, 0);		addr = CGF.Builder.CreateStructGEP(addr, 0);

return LValue::MakeAddr(addr, getValueType(), lvalue.getAlignment(),		return LValue::MakeAddr(addr, getValueType(), LVal.getAlignment(),
CGF.getContext(), lvalue.getTBAAInfo());		CGF.getContext(), LVal.getTBAAInfo());
}		}

/// Materialize an atomic r-value in atomic-layout memory.		/// Materialize an atomic r-value in atomic-layout memory.
llvm::Value *materializeRValue(RValue rvalue) const;		llvm::Value *materializeRValue(RValue rvalue) const;

private:		private:
bool requiresMemSetZero(llvm::Type *type) const;		bool requiresMemSetZero(llvm::Type *type) const;
};		};
(36 lines not shown)	bool AtomicInfo::requiresMemSetZero(llvm::Type *type) const {

// Padding in structs has an undefined bit pattern. User beware.		// Padding in structs has an undefined bit pattern. User beware.
case TEK_Aggregate:		case TEK_Aggregate:
return false;		return false;
}		}
llvm_unreachable("bad evaluation kind");		llvm_unreachable("bad evaluation kind");
}		}

bool AtomicInfo::emitMemSetZeroIfNecessary(LValue dest) const {		bool AtomicInfo::emitMemSetZeroIfNecessary() const {
llvm::Value *addr = dest.getAddress();		assert(LVal.isSimple());
		llvm::Value *addr = LVal.getAddress();
if (!requiresMemSetZero(addr->getType()->getPointerElementType()))		if (!requiresMemSetZero(addr->getType()->getPointerElementType()))
return false;		return false;

CGF.Builder.CreateMemSet(addr, llvm::ConstantInt::get(CGF.Int8Ty, 0),		CGF.Builder.CreateMemSet(addr, llvm::ConstantInt::get(CGF.Int8Ty, 0),
AtomicSizeInBits / 8,		AtomicSizeInBits / 8,
dest.getAlignment().getQuantity());		LVal.getAlignment().getQuantity());
return true;		return true;
}		}

static void emitAtomicCmpXchg(CodeGenFunction &CGF, AtomicExpr *E, bool IsWeak,		static void emitAtomicCmpXchg(CodeGenFunction &CGF, AtomicExpr *E, bool IsWeak,
llvm::Value *Dest, llvm::Value *Ptr,		llvm::Value *Dest, llvm::Value *Ptr,
llvm::Value *Val1, llvm::Value *Val2,		llvm::Value *Val1, llvm::Value *Val2,
uint64_t Size, unsigned Align,		uint64_t Size, unsigned Align,
llvm::AtomicOrdering SuccessOrder,		llvm::AtomicOrdering SuccessOrder,
(706 lines not shown)	llvm::Value *AtomicInfo::emitCastToAtomicIntPointer(llvm::Value *addr) const {
llvm::IntegerType *ty =		llvm::IntegerType *ty =
llvm::IntegerType::get(CGF.getLLVMContext(), AtomicSizeInBits);		llvm::IntegerType::get(CGF.getLLVMContext(), AtomicSizeInBits);
return CGF.Builder.CreateBitCast(addr, ty->getPointerTo(addrspace));		return CGF.Builder.CreateBitCast(addr, ty->getPointerTo(addrspace));
}		}

RValue AtomicInfo::convertTempToRValue(llvm::Value *addr,		RValue AtomicInfo::convertTempToRValue(llvm::Value *addr,
AggValueSlot resultSlot,		AggValueSlot resultSlot,
SourceLocation loc) const {		SourceLocation loc) const {
		if (LVal.isSimple()) {
if (EvaluationKind == TEK_Aggregate)		if (EvaluationKind == TEK_Aggregate)
return resultSlot.asRValue();		return resultSlot.asRValue();

// Drill into the padding structure if we have one.		// Drill into the padding structure if we have one.
if (hasPadding())		if (hasPadding())
addr = CGF.Builder.CreateStructGEP(addr, 0);		addr = CGF.Builder.CreateStructGEP(addr, 0);

// Otherwise, just convert the temporary to an r-value using the		// Otherwise, just convert the temporary to an r-value using the
// normal conversion routine.		// normal conversion routine.
return CGF.convertTempToRValue(addr, getValueType(), loc);		return CGF.convertTempToRValue(addr, getValueType(), loc);
		} else if (LVal.isBitField())
		return CGF.EmitLoadOfBitfieldLValue(LValue::MakeBitfield(
		addr, LVal.getBitFieldInfo(), LVal.getType(), LVal.getAlignment()));
		else if (LVal.isVectorElt())
		return CGF.EmitLoadOfLValue(LValue::MakeVectorElt(addr, LVal.getVectorIdx(),
		LVal.getType(),
		LVal.getAlignment()),
		loc);
		assert(LVal.isExtVectorElt());
		return CGF.EmitLoadOfExtVectorElementLValue(LValue::MakeExtVectorElt(
		addr, LVal.getExtVectorElts(), LVal.getType(), LVal.getAlignment()));
}		}

RValue AtomicInfo::convertIntToValue(llvm::Value *IntVal,		RValue AtomicInfo::convertIntToValue(llvm::Value *IntVal,
AggValueSlot ResultSlot,		AggValueSlot ResultSlot,
SourceLocation Loc) const {		SourceLocation Loc) const {
		assert(LVal.isSimple());
// Try not to in some easy cases.		// Try not to in some easy cases.
assert(IntVal->getType()->isIntegerTy() && "Expected integer value");		assert(IntVal->getType()->isIntegerTy() && "Expected integer value");
if (getEvaluationKind() == TEK_Scalar && !hasPadding()) {		if (getEvaluationKind() == TEK_Scalar && !hasPadding()) {
auto *ValTy = CGF.ConvertTypeForMem(ValueTy);		auto *ValTy = CGF.ConvertTypeForMem(ValueTy);
if (ValTy->isIntegerTy()) {		if (ValTy->isIntegerTy()) {
assert(IntVal->getType() == ValTy && "Different integer types.");		assert(IntVal->getType() == ValTy && "Different integer types.");
return RValue::get(IntVal);		return RValue::get(IntVal);
} else if (ValTy->isPointerTy())		} else if (ValTy->isPointerTy())
(25 lines not shown)	RValue AtomicInfo::convertIntToValue(llvm::Value *IntVal,
return convertTempToRValue(Temp, ResultSlot, Loc);		return convertTempToRValue(Temp, ResultSlot, Loc);
}		}

/// Emit a load from an l-value of atomic type. Note that the r-value		/// Emit a load from an l-value of atomic type. Note that the r-value
/// we produce is an r-value of the atomic value type.		/// we produce is an r-value of the atomic value type.
RValue CodeGenFunction::EmitAtomicLoad(LValue src, SourceLocation loc,		RValue CodeGenFunction::EmitAtomicLoad(LValue src, SourceLocation loc,
AggValueSlot resultSlot) {		AggValueSlot resultSlot) {
AtomicInfo atomics(*this, src);		AtomicInfo atomics(*this, src);
		LValue LVal = atomics.getAtomicLValue();
		llvm::Value *SrcAddr = nullptr;
		llvm::AllocaInst *NonSimpleTempAlloca = nullptr;
		if (LVal.isSimple())
		SrcAddr = LVal.getAddress();
		else {
		if (LVal.isBitField())
		SrcAddr = LVal.getBitFieldAddr();
		else if (LVal.isVectorElt())
		SrcAddr = LVal.getVectorAddr();
		else {
		assert(LVal.isExtVectorElt());
		SrcAddr = LVal.getExtVectorAddr();
		}
		NonSimpleTempAlloca = CreateTempAlloca(
		SrcAddr->getType()->getPointerElementType(), "atomic-load-temp");
		NonSimpleTempAlloca->setAlignment(getContext().toBits(src.getAlignment()));
		}

// Check whether we should use a library call.		// Check whether we should use a library call.
if (atomics.shouldUseLibcall()) {		if (atomics.shouldUseLibcall()) {
llvm::Value *tempAddr;		llvm::Value *tempAddr;
		if (LVal.isSimple()) {
if (!resultSlot.isIgnored()) {		if (!resultSlot.isIgnored()) {
assert(atomics.getEvaluationKind() == TEK_Aggregate);		assert(atomics.getEvaluationKind() == TEK_Aggregate);
tempAddr = resultSlot.getAddr();		tempAddr = resultSlot.getAddr();
} else {		} else
tempAddr = CreateMemTemp(atomics.getAtomicType(), "atomic-load-temp");		tempAddr = CreateMemTemp(atomics.getAtomicType(), "atomic-load-temp");
}		} else
		tempAddr = NonSimpleTempAlloca;

// void __atomic_load(size_t size, void *mem, void *return, int order);		// void __atomic_load(size_t size, void *mem, void *return, int order);
CallArgList args;		CallArgList args;
args.add(RValue::get(atomics.getAtomicSizeValue()),		args.add(RValue::get(atomics.getAtomicSizeValue()),
getContext().getSizeType());		getContext().getSizeType());
args.add(RValue::get(EmitCastToVoidPtr(src.getAddress())),		args.add(RValue::get(EmitCastToVoidPtr(SrcAddr)), getContext().VoidPtrTy);
getContext().VoidPtrTy);		args.add(RValue::get(EmitCastToVoidPtr(tempAddr)), getContext().VoidPtrTy);
args.add(RValue::get(EmitCastToVoidPtr(tempAddr)),
getContext().VoidPtrTy);
args.add(RValue::get(llvm::ConstantInt::get(		args.add(RValue::get(llvm::ConstantInt::get(
IntTy, AtomicExpr::AO_ABI_memory_order_seq_cst)),		IntTy, AtomicExpr::AO_ABI_memory_order_seq_cst)),
getContext().IntTy);		getContext().IntTy);
emitAtomicLibcall(*this, "__atomic_load", getContext().VoidTy, args);		emitAtomicLibcall(*this, "__atomic_load", getContext().VoidTy, args);

// Produce the r-value.		// Produce the r-value.
return atomics.convertTempToRValue(tempAddr, resultSlot, loc);		return atomics.convertTempToRValue(tempAddr, resultSlot, loc);
}		}

// Okay, we're doing this natively.		// Okay, we're doing this natively.
llvm::Value *addr = atomics.emitCastToAtomicIntPointer(src.getAddress());		llvm::Value *addr = atomics.emitCastToAtomicIntPointer(SrcAddr);
llvm::LoadInst *load = Builder.CreateLoad(addr, "atomic-load");		llvm::LoadInst *load = Builder.CreateLoad(addr, "atomic-load");
load->setAtomic(llvm::SequentiallyConsistent);		load->setAtomic(llvm::SequentiallyConsistent);

// Other decoration.		// Other decoration.
load->setAlignment(src.getAlignment().getQuantity());		load->setAlignment(src.getAlignment().getQuantity());
if (src.isVolatileQualified())		if (src.isVolatileQualified())
load->setVolatile(true);		load->setVolatile(true);
if (src.getTBAAInfo())		if (src.getTBAAInfo())
CGM.DecorateInstruction(load, src.getTBAAInfo());		CGM.DecorateInstruction(load, src.getTBAAInfo());

// If we're ignoring an aggregate return, don't do anything.		// If we're ignoring an aggregate return, don't do anything.
if (atomics.getEvaluationKind() == TEK_Aggregate && resultSlot.isIgnored())		if (atomics.getEvaluationKind() == TEK_Aggregate && resultSlot.isIgnored())
return RValue::getAggregate(nullptr, false);		return RValue::getAggregate(nullptr, false);

// Okay, turn that back into the original value type.		// Okay, turn that back into the original value type.
		if (src.isSimple())
return atomics.convertIntToValue(load, resultSlot, loc);		return atomics.convertIntToValue(load, resultSlot, loc);

		auto *IntAddr = atomics.emitCastToAtomicIntPointer(NonSimpleTempAlloca);
		Builder.CreateAlignedStore(load, IntAddr, src.getAlignment().getQuantity());
		return atomics.convertTempToRValue(NonSimpleTempAlloca, resultSlot, loc);
}		}



/// Copy an r-value into memory as part of storing to an atomic type.		/// Copy an r-value into memory as part of storing to an atomic type.
/// This needs to create a bit-pattern suitable for atomic operations.		/// This needs to create a bit-pattern suitable for atomic operations.
void AtomicInfo::emitCopyIntoMemory(RValue rvalue, LValue dest) const {		void AtomicInfo::emitCopyIntoMemory(RValue rvalue) const {
		assert(LVal.isSimple());
// If we have an r-value, the rvalue should be of the atomic type,		// If we have an r-value, the rvalue should be of the atomic type,
// which means that the caller is responsible for having zeroed		// which means that the caller is responsible for having zeroed
// any padding. Just do an aggregate copy of that type.		// any padding. Just do an aggregate copy of that type.
if (rvalue.isAggregate()) {		if (rvalue.isAggregate()) {
CGF.EmitAggregateCopy(dest.getAddress(),		CGF.EmitAggregateCopy(LVal.getAddress(),
rvalue.getAggregateAddr(),		rvalue.getAggregateAddr(),
getAtomicType(),		getAtomicType(),
(rvalue.isVolatileQualified()		(rvalue.isVolatileQualified()
|| dest.isVolatileQualified()),		|| LVal.isVolatileQualified()),
dest.getAlignment());		LVal.getAlignment());
return;		return;
}		}

// Okay, otherwise we're copying stuff.		// Okay, otherwise we're copying stuff.

// Zero out the buffer if necessary.		// Zero out the buffer if necessary.
emitMemSetZeroIfNecessary(dest);		emitMemSetZeroIfNecessary();

// Drill past the padding if present.		// Drill past the padding if present.
dest = projectValue(dest);		LValue TempLVal = projectValue();

// Okay, store the rvalue in.		// Okay, store the rvalue in.
if (rvalue.isScalar()) {		if (rvalue.isScalar()) {
CGF.EmitStoreOfScalar(rvalue.getScalarVal(), dest, /*init*/ true);		CGF.EmitStoreOfScalar(rvalue.getScalarVal(), TempLVal, /*init*/ true);
} else {		} else {
CGF.EmitStoreOfComplex(rvalue.getComplexVal(), dest, /*init*/ true);		CGF.EmitStoreOfComplex(rvalue.getComplexVal(), TempLVal, /*init*/ true);
}		}
}		}


/// Materialize an r-value into memory for the purposes of storing it		/// Materialize an r-value into memory for the purposes of storing it
/// to an atomic type.		/// to an atomic type.
llvm::Value *AtomicInfo::materializeRValue(RValue rvalue) const {		llvm::Value *AtomicInfo::materializeRValue(RValue rvalue) const {
// Aggregate r-values are already in memory, and EmitAtomicStore		// Aggregate r-values are already in memory, and EmitAtomicStore
// requires them to be values of the atomic type.		// requires them to be values of the atomic type.
if (rvalue.isAggregate())		if (rvalue.isAggregate())
return rvalue.getAggregateAddr();		return rvalue.getAggregateAddr();

// Otherwise, make a temporary and materialize into it.		// Otherwise, make a temporary and materialize into it.
llvm::Value *temp = CGF.CreateMemTemp(getAtomicType(), "atomic-store-temp");		llvm::Value *temp = CGF.CreateMemTemp(getAtomicType(), "atomic-store-temp");
LValue tempLV = CGF.MakeAddrLValue(temp, getAtomicType(), getAtomicAlignment());		LValue tempLV =
emitCopyIntoMemory(rvalue, tempLV);		CGF.MakeAddrLValue(temp, getAtomicType(), getAtomicAlignment());
		AtomicInfo Atomics(CGF, tempLV);
		Atomics.emitCopyIntoMemory(rvalue);
return temp;		return temp;
}		}

llvm::Value *AtomicInfo::convertRValueToInt(RValue RVal) const {		llvm::Value *AtomicInfo::convertRValueToInt(RValue RVal) const {
// If we've got a scalar value of the right size, try to avoid going		// If we've got a scalar value of the right size, try to avoid going
// through memory.		// through memory.
if (RVal.isScalar() && !hasPadding()) {		if (RVal.isScalar() && !hasPadding()) {
llvm::Value *Value = RVal.getScalarVal();		llvm::Value *Value = RVal.getScalarVal();
(29 lines not shown)	void CodeGenFunction::EmitAtomicStore(RValue rvalue, LValue dest, bool isInit) {
assert(!rvalue.isAggregate() ||		assert(!rvalue.isAggregate() ||
rvalue.getAggregateAddr()->getType()->getPointerElementType()		rvalue.getAggregateAddr()->getType()->getPointerElementType()
== dest.getAddress()->getType()->getPointerElementType());		== dest.getAddress()->getType()->getPointerElementType());

AtomicInfo atomics(*this, dest);		AtomicInfo atomics(*this, dest);

// If this is an initialization, just put the value there normally.		// If this is an initialization, just put the value there normally.
if (isInit) {		if (isInit) {
atomics.emitCopyIntoMemory(rvalue, dest);		atomics.emitCopyIntoMemory(rvalue);
return;		return;
}		}

// Check whether we should use a library call.		// Check whether we should use a library call.
if (atomics.shouldUseLibcall()) {		if (atomics.shouldUseLibcall()) {
// Produce a source address.		// Produce a source address.
llvm::Value *srcAddr = atomics.materializeRValue(rvalue);		llvm::Value *srcAddr = atomics.materializeRValue(rvalue);

(99 lines not shown)
}		}

void CodeGenFunction::EmitAtomicInit(Expr *init, LValue dest) {		void CodeGenFunction::EmitAtomicInit(Expr *init, LValue dest) {
AtomicInfo atomics(*this, dest);		AtomicInfo atomics(*this, dest);

switch (atomics.getEvaluationKind()) {		switch (atomics.getEvaluationKind()) {
case TEK_Scalar: {		case TEK_Scalar: {
llvm::Value *value = EmitScalarExpr(init);		llvm::Value *value = EmitScalarExpr(init);
atomics.emitCopyIntoMemory(RValue::get(value), dest);		atomics.emitCopyIntoMemory(RValue::get(value));
return;		return;
}		}

case TEK_Complex: {		case TEK_Complex: {
ComplexPairTy value = EmitComplexExpr(init);		ComplexPairTy value = EmitComplexExpr(init);
atomics.emitCopyIntoMemory(RValue::getComplex(value), dest);		atomics.emitCopyIntoMemory(RValue::getComplex(value));
return;		return;
}		}

case TEK_Aggregate: {		case TEK_Aggregate: {
// Fix up the destination if the initializer isn't an expression		// Fix up the destination if the initializer isn't an expression
// of atomic type.		// of atomic type.
bool Zeroed = false;		bool Zeroed = false;
if (!init->getType()->isAtomicType()) {		if (!init->getType()->isAtomicType()) {
Zeroed = atomics.emitMemSetZeroIfNecessary(dest);		Zeroed = atomics.emitMemSetZeroIfNecessary();
dest = atomics.projectValue(dest);		dest = atomics.projectValue();
}		}

// Evaluate the expression directly into the destination.		// Evaluate the expression directly into the destination.
AggValueSlot slot = AggValueSlot::forLValue(dest,		AggValueSlot slot = AggValueSlot::forLValue(dest,
AggValueSlot::IsNotDestructed,		AggValueSlot::IsNotDestructed,
AggValueSlot::DoesNotNeedGCBarriers,		AggValueSlot::DoesNotNeedGCBarriers,
AggValueSlot::IsNotAliased,		AggValueSlot::IsNotAliased,
Zeroed ? AggValueSlot::IsZeroed :		Zeroed ? AggValueSlot::IsZeroed :
AggValueSlot::IsNotZeroed);		AggValueSlot::IsNotZeroed);

EmitAggExpr(init, slot);		EmitAggExpr(init, slot);
return;		return;
}		}
}		}
llvm_unreachable("bad evaluation kind");		llvm_unreachable("bad evaluation kind");
}		}
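The constructor above chooses between a native atomic instruction and a library call by asking the target whether it has a builtin atomic for a given size and alignment. As a rough standalone approximation of that kind of predicate, one might write the sketch below. The function names and the MaxInlineWidthInBits parameter are hypothetical, chars are assumed to be 8 bits wide, and the real check in Clang's TargetInfo may differ in detail.

```cpp
#include <cstdint>

// True if V is a non-zero power of two.
inline bool isPowerOfTwo(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Hypothetical stand-in for a target query: an access can be done inline
// when it is no wider than its alignment, no wider than the widest inline
// atomic the target supports, and a power-of-two number of bytes.
inline bool canUseInlineAtomic(uint64_t SizeInBits, uint64_t AlignInBits,
                               uint64_t MaxInlineWidthInBits) {
  const uint64_t CharWidth = 8; // assumption: 8-bit chars
  return SizeInBits <= AlignInBits && SizeInBits <= MaxInlineWidthInBits &&
         (SizeInBits <= CharWidth || isPowerOfTwo(SizeInBits / CharWidth));
}
```

Under these assumptions, a 32-bit access with 32-bit alignment qualifies on a target with a 64-bit inline limit, whereas a 24-bit access or an under-aligned 64-bit access would fall back to the libcall path.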

cfe/trunk/lib/CodeGen/CGStmtOpenMP.cpp

(685 lines not shown)	CGM.getOpenMPRuntime().EmitOMPFlush(
}(),		}(),
S.getLocStart());		S.getLocStart());
}		}

void CodeGenFunction::EmitOMPOrderedDirective(const OMPOrderedDirective &) {		void CodeGenFunction::EmitOMPOrderedDirective(const OMPOrderedDirective &) {
llvm_unreachable("CodeGen for 'omp ordered' is not supported yet.");		llvm_unreachable("CodeGen for 'omp ordered' is not supported yet.");
}		}

void CodeGenFunction::EmitOMPAtomicDirective(const OMPAtomicDirective &) {		static llvm::Value *convertToScalarValue(CodeGenFunction &CGF, RValue Val,
llvm_unreachable("CodeGen for 'omp atomic' is not supported yet.");		QualType SrcType, QualType DestType) {
		assert(CGF.hasScalarEvaluationKind(DestType) &&
		"DestType must have scalar evaluation kind.");
		assert(!Val.isAggregate() && "Must be a scalar or complex.");
		return Val.isScalar()
		? CGF.EmitScalarConversion(Val.getScalarVal(), SrcType, DestType)
		: CGF.EmitComplexToScalarConversion(Val.getComplexVal(), SrcType,
		DestType);
		}

		static CodeGenFunction::ComplexPairTy
		convertToComplexValue(CodeGenFunction &CGF, RValue Val, QualType SrcType,
		QualType DestType) {
		assert(CGF.getEvaluationKind(DestType) == TEK_Complex &&
		"DestType must have complex evaluation kind.");
		CodeGenFunction::ComplexPairTy ComplexVal;
		if (Val.isScalar()) {
		// Convert the input element to the element type of the complex.
		auto DestElementType = DestType->castAs<ComplexType>()->getElementType();
		auto ScalarVal =
		CGF.EmitScalarConversion(Val.getScalarVal(), SrcType, DestElementType);
		ComplexVal = CodeGenFunction::ComplexPairTy(
		ScalarVal, llvm::Constant::getNullValue(ScalarVal->getType()));
		} else {
		assert(Val.isComplex() && "Must be a scalar or complex.");
		auto SrcElementType = SrcType->castAs<ComplexType>()->getElementType();
		auto DestElementType = DestType->castAs<ComplexType>()->getElementType();
		ComplexVal.first = CGF.EmitScalarConversion(
		Val.getComplexVal().first, SrcElementType, DestElementType);
		ComplexVal.second = CGF.EmitScalarConversion(
		Val.getComplexVal().second, SrcElementType, DestElementType);
		}
		return ComplexVal;
		}

		static void EmitOMPAtomicReadExpr(CodeGenFunction &CGF, bool IsSeqCst,
		const Expr *X, const Expr *V,
		SourceLocation Loc) {
		// v = x;
		assert(V->isLValue() && "V of 'omp atomic read' is not lvalue");
		assert(X->isLValue() && "X of 'omp atomic read' is not lvalue");
		LValue XLValue = CGF.EmitLValue(X);
		LValue VLValue = CGF.EmitLValue(V);
		RValue Res = XLValue.isGlobalReg() ? CGF.EmitLoadOfLValue(XLValue, Loc)
		: CGF.EmitAtomicLoad(XLValue, Loc);
		// OpenMP, 2.12.6, atomic Construct
		// Any atomic construct with a seq_cst clause forces the atomically
		// performed operation to include an implicit flush operation without a
		// list.
		if (IsSeqCst)
		CGF.CGM.getOpenMPRuntime().EmitOMPFlush(CGF, llvm::None, Loc);
		switch (CGF.getEvaluationKind(V->getType())) {
		case TEK_Scalar:
		CGF.EmitStoreOfScalar(
		convertToScalarValue(CGF, Res, X->getType(), V->getType()), VLValue);
		break;
		case TEK_Complex:
		CGF.EmitStoreOfComplex(
		convertToComplexValue(CGF, Res, X->getType(), V->getType()), VLValue,
		/*isInit=*/false);
		break;
		case TEK_Aggregate:
		llvm_unreachable("Must be a scalar or complex.");
		}
		}

		static void EmitOMPAtomicExpr(CodeGenFunction &CGF, OpenMPClauseKind Kind,
		bool IsSeqCst, const Expr *X, const Expr *V,
		const Expr *, SourceLocation Loc) {
		switch (Kind) {
		case OMPC_read:
		EmitOMPAtomicReadExpr(CGF, IsSeqCst, X, V, Loc);
		break;
		case OMPC_write:
		case OMPC_update:
		case OMPC_capture:
		llvm_unreachable("CodeGen for 'omp atomic clause' is not supported yet.");
		case OMPC_if:
		case OMPC_final:
		case OMPC_num_threads:
		case OMPC_private:
		case OMPC_firstprivate:
		case OMPC_lastprivate:
		case OMPC_reduction:
		case OMPC_safelen:
		case OMPC_collapse:
		case OMPC_default:
		case OMPC_seq_cst:
		case OMPC_shared:
		case OMPC_linear:
		case OMPC_aligned:
		case OMPC_copyin:
		case OMPC_copyprivate:
		case OMPC_flush:
		case OMPC_proc_bind:
		case OMPC_schedule:
		case OMPC_ordered:
		case OMPC_nowait:
		case OMPC_untied:
		case OMPC_threadprivate:
		case OMPC_mergeable:
		case OMPC_unknown:
		llvm_unreachable("Clause is not allowed in 'omp atomic'.");
		}
		}

		void CodeGenFunction::EmitOMPAtomicDirective(const OMPAtomicDirective &S) {
		bool IsSeqCst = S.getSingleClause(/*K=*/OMPC_seq_cst);
		OpenMPClauseKind Kind = OMPC_unknown;
		for (auto *C : S.clauses()) {
		// Find first clause (skip seq_cst clause, if it is first).
		if (C->getClauseKind() != OMPC_seq_cst) {
		Kind = C->getClauseKind();
		break;
		}
		}
		EmitOMPAtomicExpr(*this, Kind, IsSeqCst, S.getX(), S.getV(), S.getExpr(),
		S.getLocStart());
}		}

void CodeGenFunction::EmitOMPTargetDirective(const OMPTargetDirective &) {		void CodeGenFunction::EmitOMPTargetDirective(const OMPTargetDirective &) {
llvm_unreachable("CodeGen for 'omp target' is not supported yet.");		llvm_unreachable("CodeGen for 'omp target' is not supported yet.");
}		}

void CodeGenFunction::EmitOMPTeamsDirective(const OMPTeamsDirective &) {		void CodeGenFunction::EmitOMPTeamsDirective(const OMPTeamsDirective &) {
llvm_unreachable("CodeGen for 'omp teams' is not supported yet.");		llvm_unreachable("CodeGen for 'omp teams' is not supported yet.");
}		}

cfe/trunk/lib/Sema/SemaType.cpp

		case DeclaratorChunk::Function: {

// Check for auto functions and trailing return type and adjust the		// Check for auto functions and trailing return type and adjust the
// return type accordingly.		// return type accordingly.
if (!D.isInvalidType()) {		if (!D.isInvalidType()) {
// trailing-return-type is only required if we're declaring a function,		// trailing-return-type is only required if we're declaring a function,
// and not, for instance, a pointer to a function.		// and not, for instance, a pointer to a function.
if (D.getDeclSpec().containsPlaceholderType() &&		if (D.getDeclSpec().containsPlaceholderType() &&
!FTI.hasTrailingReturnType() && chunkIndex == 0 &&		!FTI.hasTrailingReturnType() && chunkIndex == 0 &&
!S.getLangOpts().CPlusPlus14) {		!S.getLangOpts().CPlusPlus14 && !S.getLangOpts().MSVCCompat) {
S.Diag(D.getDeclSpec().getTypeSpecTypeLoc(),		S.Diag(D.getDeclSpec().getTypeSpecTypeLoc(),
D.getDeclSpec().getTypeSpecType() == DeclSpec::TST_auto		D.getDeclSpec().getTypeSpecType() == DeclSpec::TST_auto
? diag::err_auto_missing_trailing_return		? diag::err_auto_missing_trailing_return
: diag::err_deduced_return_type);		: diag::err_deduced_return_type);
T = Context.IntTy;		T = Context.IntTy;
D.setInvalidType(true);		D.setInvalidType(true);
} else if (FTI.hasTrailingReturnType()) {		} else if (FTI.hasTrailingReturnType()) {
// T must be exactly 'auto' at this point. See CWG issue 681.		// T must be exactly 'auto' at this point. See CWG issue 681.

cfe/trunk/test/OpenMP/atomic_read_codegen.c

Property	Old Value	New Value
svn:eol-style	null	native
svn:keywords	null	Author Date Id Rev URL
svn:mime-type	null	text/plain

				// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -fopenmp=libiomp5 -x c -emit-llvm %s -o - | FileCheck %s
				// RUN: %clang_cc1 -fopenmp=libiomp5 -x c -triple x86_64-apple-darwin10 -emit-pch -o %t %s
				// RUN: %clang_cc1 -fopenmp=libiomp5 -x c -triple x86_64-apple-darwin10 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s
				// expected-no-diagnostics

				#ifndef HEADER
				#define HEADER

				_Bool bv, bx;
				char cv, cx;
				unsigned char ucv, ucx;
				short sv, sx;
				unsigned short usv, usx;
				int iv, ix;
				unsigned int uiv, uix;
				long lv, lx;
				unsigned long ulv, ulx;
				long long llv, llx;
				unsigned long long ullv, ullx;
				float fv, fx;
				double dv, dx;
				long double ldv, ldx;
				_Complex int civ, cix;
				_Complex float cfv, cfx;
				_Complex double cdv, cdx;

				typedef int int4 __attribute__((__vector_size__(16)));
				int4 int4x;

				struct BitFields {
				int : 32;
				int a : 31;
				} bfx;

				struct BitFields_packed {
				int : 32;
				int a : 31;
				} __attribute__ ((__packed__)) bfx_packed;

				struct BitFields2 {
				int : 31;
				int a : 1;
				} bfx2;

				struct BitFields2_packed {
				int : 31;
				int a : 1;
				} __attribute__ ((__packed__)) bfx2_packed;

				struct BitFields3 {
				int : 11;
				int a : 14;
				} bfx3;

				struct BitFields3_packed {
				int : 11;
				int a : 14;
				} __attribute__ ((__packed__)) bfx3_packed;

				struct BitFields4 {
				short : 16;
				int a: 1;
				long b : 7;
				} bfx4;

				struct BitFields4_packed {
				short : 16;
				int a: 1;
				long b : 7;
				} __attribute__ ((__packed__)) bfx4_packed;

				typedef float float2 __attribute__((ext_vector_type(2)));
				float2 float2x;

				register int rix __asm__("0");

				int main() {
				// CHECK: load atomic i8*
				// CHECK: store i8
				#pragma omp atomic read
				bv = bx;
				// CHECK: load atomic i8*
				// CHECK: store i8
				#pragma omp atomic read
				cv = cx;
				// CHECK: load atomic i8*
				// CHECK: store i8
				#pragma omp atomic read
				ucv = ucx;
				// CHECK: load atomic i16*
				// CHECK: store i16
				#pragma omp atomic read
				sv = sx;
				// CHECK: load atomic i16*
				// CHECK: store i16
				#pragma omp atomic read
				usv = usx;
				// CHECK: load atomic i32*
				// CHECK: store i32
				#pragma omp atomic read
				iv = ix;
				// CHECK: load atomic i32*
				// CHECK: store i32
				#pragma omp atomic read
				uiv = uix;
				// CHECK: load atomic i64*
				// CHECK: store i64
				#pragma omp atomic read
				lv = lx;
				// CHECK: load atomic i64*
				// CHECK: store i64
				#pragma omp atomic read
				ulv = ulx;
				// CHECK: load atomic i64*
				// CHECK: store i64
				#pragma omp atomic read
				llv = llx;
				// CHECK: load atomic i64*
				// CHECK: store i64
				#pragma omp atomic read
				ullv = ullx;
				// CHECK: load atomic i32* bitcast (float*
				// CHECK: bitcast i32 {{.*}} to float
				// CHECK: store float
				#pragma omp atomic read
				fv = fx;
				// CHECK: load atomic i64* bitcast (double*
				// CHECK: bitcast i64 {{.*}} to double
				// CHECK: store double
				#pragma omp atomic read
				dv = dx;
				// CHECK: [[LD:%.+]] = load atomic i128* bitcast (x86_fp80*
				// CHECK: [[BITCAST:%.+]] = bitcast x86_fp80* [[LDTEMP:%.*]] to i128*
				// CHECK: store i128 [[LD]], i128* [[BITCAST]]
				// CHECK: [[LD:%.+]] = load x86_fp80* [[LDTEMP]]
				// CHECK: store x86_fp80 [[LD]]
				#pragma omp atomic read
				ldv = ldx;
				// CHECK: call{{.*}} void @__atomic_load(i64 8,
				// CHECK: store i32
				// CHECK: store i32
				#pragma omp atomic read
				civ = cix;
				// CHECK: call{{.*}} void @__atomic_load(i64 8,
				// CHECK: store float
				// CHECK: store float
				#pragma omp atomic read
				cfv = cfx;
				// CHECK: call{{.*}} void @__atomic_load(i64 16,
				// CHECK: call{{.*}} @__kmpc_flush(
				// CHECK: store double
				// CHECK: store double
				#pragma omp atomic seq_cst read
				cdv = cdx;
				// CHECK: load atomic i64*
				// CHECK: store i8
				#pragma omp atomic read
				bv = ulx;
				// CHECK: load atomic i8*
				// CHECK: store i8
				#pragma omp atomic read
				cv = bx;
				// CHECK: load atomic i8*
				// CHECK: call{{.*}} @__kmpc_flush(
				// CHECK: store i8
				#pragma omp atomic read, seq_cst
				ucv = cx;
				// CHECK: load atomic i64*
				// CHECK: store i16
				#pragma omp atomic read
				sv = ulx;
				// CHECK: load atomic i64*
				// CHECK: store i16
				#pragma omp atomic read
				usv = lx;
				// CHECK: load atomic i32*
				// CHECK: call{{.*}} @__kmpc_flush(
				// CHECK: store i32
				#pragma omp atomic seq_cst, read
				iv = uix;
				// CHECK: load atomic i32*
				// CHECK: store i32
				#pragma omp atomic read
				uiv = ix;
				// CHECK: call{{.*}} void @__atomic_load(i64 8,
				// CHECK: store i64
				#pragma omp atomic read
				lv = cix;
				// CHECK: load atomic i32*
				// CHECK: store i64
				#pragma omp atomic read
				ulv = fx;
				// CHECK: load atomic i64*
				// CHECK: store i64
				#pragma omp atomic read
				llv = dx;
				// CHECK: load atomic i128*
				// CHECK: store i64
				#pragma omp atomic read
				ullv = ldx;
				// CHECK: call{{.*}} void @__atomic_load(i64 8,
				// CHECK: store float
				#pragma omp atomic read
				fv = cix;
				// CHECK: load atomic i16*
				// CHECK: store double
				#pragma omp atomic read
				dv = sx;
				// CHECK: load atomic i8*
				// CHECK: store x86_fp80
				#pragma omp atomic read
				ldv = bx;
				// CHECK: load atomic i8*
				// CHECK: store i32
				// CHECK: store i32
				#pragma omp atomic read
				civ = bx;
				// CHECK: load atomic i16*
				// CHECK: store float
				// CHECK: store float
				#pragma omp atomic read
				cfv = usx;
				// CHECK: load atomic i64*
				// CHECK: store double
				// CHECK: store double
				#pragma omp atomic read
				cdv = llx;
				// CHECK: [[I128VAL:%.+]] = load atomic i128* bitcast (<4 x i32>* @{{.+}} to i128*) seq_cst
				// CHECK: [[I128PTR:%.+]] = bitcast <4 x i32>* [[LDTEMP:%.+]] to i128*
				// CHECK: store i128 [[I128VAL]], i128* [[I128PTR]]
				// CHECK: [[LD:%.+]] = load <4 x i32>* [[LDTEMP]]
				// CHECK: extractelement <4 x i32> [[LD]]
				// CHECK: store i8
				#pragma omp atomic read
				bv = int4x[0];
				// CHECK: [[LD:%.+]] = load atomic i32* bitcast (i8* getelementptr (i8* bitcast (%{{.+}}* @{{.+}} to i8*), i64 4) to i32*) seq_cst
				// CHECK: store i32 [[LD]], i32* [[LDTEMP:%.+]]
				// CHECK: [[LD:%.+]] = load i32* [[LDTEMP]]
				// CHECK: [[SHL:%.+]] = shl i32 [[LD]], 1
				// CHECK: ashr i32 [[SHL]], 1
				// CHECK: store x86_fp80
				#pragma omp atomic read
				ldv = bfx.a;
				// CHECK: [[LDTEMP_VOID_PTR:%.+]] = bitcast i32* [[LDTEMP:%.+]] to i8*
				// CHECK: call void @__atomic_load(i64 4, i8* getelementptr (i8* bitcast (%struct.BitFields_packed* @bfx_packed to i8*), i64 4), i8* [[LDTEMP_VOID_PTR]], i32 5)
				// CHECK: [[LD:%.+]] = load i32* [[LDTEMP]]
				// CHECK: [[SHL:%.+]] = shl i32 [[LD]], 1
				// CHECK: ashr i32 [[SHL]], 1
				// CHECK: store x86_fp80
				#pragma omp atomic read
				ldv = bfx_packed.a;
				// CHECK: [[LD:%.+]] = load atomic i32* getelementptr inbounds (%struct.BitFields2* @bfx2, i32 0, i32 0) seq_cst
				// CHECK: store i32 [[LD]], i32* [[LDTEMP:%.+]]
				// CHECK: [[LD:%.+]] = load i32* [[LDTEMP]]
				// CHECK: ashr i32 [[LD]], 31
				// CHECK: store x86_fp80
				#pragma omp atomic read
				ldv = bfx2.a;
				// CHECK: [[LD:%.+]] = load atomic i8* getelementptr (i8* bitcast (%struct.BitFields2_packed* @bfx2_packed to i8*), i64 3) seq_cst
				// CHECK: store i8 [[LD]], i8* [[LDTEMP:%.+]]
				// CHECK: [[LD:%.+]] = load i8* [[LDTEMP]]
				// CHECK: ashr i8 [[LD]], 7
				// CHECK: store x86_fp80
				#pragma omp atomic read
				ldv = bfx2_packed.a;
				// CHECK: [[LD:%.+]] = load atomic i32* getelementptr inbounds (%struct.BitFields3* @bfx3, i32 0, i32 0) seq_cst
				// CHECK: store i32 [[LD]], i32* [[LDTEMP:%.+]]
				// CHECK: [[LD:%.+]] = load i32* [[LDTEMP]]
				// CHECK: [[SHL:%.+]] = shl i32 [[LD]], 7
				// CHECK: ashr i32 [[SHL]], 18
				// CHECK: store x86_fp80
				#pragma omp atomic read
				ldv = bfx3.a;
				// CHECK: [[LDTEMP_VOID_PTR:%.+]] = bitcast i24* [[LDTEMP:%.+]] to i8*
				// CHECK: call void @__atomic_load(i64 3, i8* getelementptr (i8* bitcast (%struct.BitFields3_packed* @bfx3_packed to i8*), i64 1), i8* [[LDTEMP_VOID_PTR]], i32 5)
				// CHECK: [[LD:%.+]] = load i24* [[LDTEMP]]
				// CHECK: [[SHL:%.+]] = shl i24 [[LD]], 7
				// CHECK: [[ASHR:%.+]] = ashr i24 [[SHL]], 10
				// CHECK: sext i24 [[ASHR]] to i32
				// CHECK: store x86_fp80
				#pragma omp atomic read
				ldv = bfx3_packed.a;
				// CHECK: [[LD:%.+]] = load atomic i64* bitcast (%struct.BitFields4* @bfx4 to i64*) seq_cst
				// CHECK: store i64 [[LD]], i64* [[LDTEMP:%.+]]
				// CHECK: [[LD:%.+]] = load i64* [[LDTEMP]]
				// CHECK: [[SHL:%.+]] = shl i64 [[LD]], 47
				// CHECK: [[ASHR:%.+]] = ashr i64 [[SHL]], 63
				// CHECK: trunc i64 [[ASHR]] to i32
				// CHECK: store x86_fp80
				#pragma omp atomic read
				ldv = bfx4.a;
				// CHECK: [[LD:%.+]] = load atomic i8* getelementptr inbounds (%struct.BitFields4_packed* @bfx4_packed, i32 0, i32 0, i64 2) seq_cst
				// CHECK: store i8 [[LD]], i8* [[LDTEMP:%.+]]
				// CHECK: [[LD:%.+]] = load i8* [[LDTEMP]]
				// CHECK: [[SHL:%.+]] = shl i8 [[LD]], 7
				// CHECK: [[ASHR:%.+]] = ashr i8 [[SHL]], 7
				// CHECK: sext i8 [[ASHR]] to i32
				// CHECK: store x86_fp80
				#pragma omp atomic read
				ldv = bfx4_packed.a;
				// CHECK: [[LD:%.+]] = load atomic i64* bitcast (%struct.BitFields4* @bfx4 to i64*) seq_cst
				// CHECK: store i64 [[LD]], i64* [[LDTEMP:%.+]]
				// CHECK: [[LD:%.+]] = load i64* [[LDTEMP]]
				// CHECK: [[SHL:%.+]] = shl i64 [[LD]], 40
				// CHECK: [[ASHR:%.+]] = ashr i64 [[SHL]], 57
				// CHECK: store x86_fp80
				#pragma omp atomic read
				ldv = bfx4.b;
				// CHECK: [[LD:%.+]] = load atomic i8* getelementptr inbounds (%struct.BitFields4_packed* @bfx4_packed, i32 0, i32 0, i64 2) seq_cst
				// CHECK: store i8 [[LD]], i8* [[LDTEMP:%.+]]
				// CHECK: [[LD:%.+]] = load i8* [[LDTEMP]]
				// CHECK: [[ASHR:%.+]] = ashr i8 [[LD]], 1
				// CHECK: sext i8 [[ASHR]] to i64
				// CHECK: store x86_fp80
				#pragma omp atomic read
				ldv = bfx4_packed.b;
				// CHECK: [[LD:%.+]] = load atomic i32* bitcast (<2 x float>* @{{.+}} to i32*) seq_cst
				// CHECK: [[BITCAST:%.+]] = bitcast <2 x float>* [[LDTEMP:%.+]] to i32*
				// CHECK: store i32 [[LD]], i32* [[BITCAST]]
				// CHECK: [[LD:%.+]] = load <2 x float>* [[LDTEMP]]
				// CHECK: extractelement <2 x float> [[LD]]
				// CHECK: store i64
				#pragma omp atomic read
				ulv = float2x.x;
				// CHECK: call{{.*}} i{{[0-9]+}} @llvm.read_register
				// CHECK: call{{.*}} @__kmpc_flush(
				// CHECK: store double
				#pragma omp atomic read seq_cst
				dv = rix;
				return 0;
				}

				#endif

Revision Contents

Diff 18582

cfe/trunk/lib/CodeGen/CGAtomic.cpp

cfe/trunk/lib/CodeGen/CGStmtOpenMP.cpp

cfe/trunk/lib/Sema/SemaType.cpp

cfe/trunk/test/OpenMP/atomic_read_codegen.c
