Page MenuHomePhabricator

[OPENMP] CodeGen for "omp atomic read [seq_cst]" directive.
ClosedPublic

Authored by ABataev on Nov 27 2014, 12:54 AM.

Details

Summary

"omp atomic read [seq_cst]" accepts expressions "v=x;". In this patch we perform an atomic load of "x" (using builtin atomic loading instructions or a call to "atomic_load()" for simple lvalues and "kmpc_atomic_start();load <x>;kmpc_atomic_end();" for other lvalues), convert the result of loading to type of "v" (using EmitScalarConversion() for simple types and EmitComplexToScalarConversion() for conversions from complex to scalar) and then store the result in "v".

Diff Detail

Repository
rL LLVM

Event Timeline

ABataev updated this revision to Diff 16670.Nov 27 2014, 12:54 AM
ABataev retitled this revision from to [OPENMP] CodeGen for "omp atomic read [seq_cst]" directive..
ABataev updated this object.
ABataev edited the test plan for this revision. (Show Details)
ABataev added a subscriber: Unknown Object (MLST).
rjmccall added inline comments.Dec 1 2014, 12:11 AM
lib/CodeGen/CGAtomic.cpp
44 ↗(On Diff #16670)

Test once with getAs, please.

72 ↗(On Diff #16670)

A more general thing to do here is to ask the TargetInfo if there are lockless atomics at a particular bit-width and alignment. You can implement that function this way, but let it make the power-of-two assumption, rather than scattered places in IRGen.

979 ↗(On Diff #16670)

I think it'd be cleaner to just have all the success cases return instead of re-assigning result.

lib/CodeGen/CGStmtOpenMP.cpp
563 ↗(On Diff #16670)

Somewhere in this code, you should document the ABI requirements here, which appear to basically be:

  • bitfield and vector element lvalues always go through the OpenMP locking path
  • all other lvalues use the target's rules for atomics of the given size

Are you planning to use the global lock for anything else besides non-simple lvalues?

567 ↗(On Diff #16670)

All of the interesting cases here should be separate functions (as you add them).

591 ↗(On Diff #16670)

I think you'll want this functionality for some of the other atomic ops anyway, so you might as well make this a separate function that you can call like this:

llvm::Value *ScalarVal = convertToScalarValue(CGF, Res, X->getType(), V->getType());

It's kindof a shame that you have to redo all of this instead of just relying on the implicit casts that Sema already created when it analyzed the expression, but maybe this is better than messing around with OpaqueValueExprs, and it works because you don't have a requirement to handle custom operators.

608 ↗(On Diff #16670)

Same thing: go ahead and make a separate function for this that goes from an RValue to a ComplexPairTy.

Also, the r-value will tell you whether it's a scalar/complex/aggregate, you don't need to (somewhat expensively) recompute that.

652 ↗(On Diff #16670)

You don't need a break after llvm_unreachable.

679 ↗(On Diff #16670)

Same.

Hi John,
Thanks a lot for the review right after holidays!!! :) See my comments, especially about global locks.

lib/CodeGen/CGAtomic.cpp
44 ↗(On Diff #16670)

Ok, agree

72 ↗(On Diff #16670)

Yes, I'll add the function to TargetInfo.

979 ↗(On Diff #16670)

Ok, reworked

lib/CodeGen/CGStmtOpenMP.cpp
563 ↗(On Diff #16670)
  1. Ok, added.
  2. Yes, it may be used for some complex atomic operations, which cannot be implemented using some trivial operations (like user-defined reductions).
567 ↗(On Diff #16670)

Agree

591 ↗(On Diff #16670)
  1. Agree, I'll rework this.
  2. Yes, I tried to implement it using some hooks/hacks in AST, but there was a lot of troubles with atomic ops.
  3. We will have to support some custom operators in future (in OpenMP 4.0 reductions introduces user-defined reductions, which must be called as atomic ops and I plan to use global locks for them).
608 ↗(On Diff #16670)

Ok

652 ↗(On Diff #16670)

My bad, removed

679 ↗(On Diff #16670)

Removed

ABataev updated this revision to Diff 16759.Dec 1 2014, 4:02 AM

Update after review

rjmccall added inline comments.Dec 1 2014, 10:44 AM
include/clang/Basic/TargetInfo.h
379 ↗(On Diff #16759)

That's just getCharWidth() but more expensive.

lib/CodeGen/CGStmtOpenMP.cpp
603 ↗(On Diff #16759)

I think you missed this comment from my review — please make the OMPC_read case its own function.

591 ↗(On Diff #16670)

You can't mix locking and non-locking atomics on the same object: they'll be atomic with respect to themselves, but not with respect to each other. That is, assuming that atomic_start and atomic_end aren't implemented by halting all other OpenMP threads.

e.g. imagine that you have a *= that's implemented with a compare-and-swap loop and a custom reduction that you've implemented with the global lock. The global lock doesn't keep the compare-and-swap from firing during the execution of the custom reduction, so (1) different parts of the custom reduction might see different values for the original l-value and (2) the custom reduction will completely overwrite the *= rather than appearing to execute before or after.

In fact, you have a similar problem with aggregates, where an atomic access to an aggregate (e.g. a std::pair<float,float>) cannot be made atomic with respect to an atomic access to a subobject unless the aggregate will be accessed locklessly. (You could do *some* operations locklessly if you wrote them very carefully, e.g. reads and writes, but I don't know of any way to do an aggregate compare-and-swap that's atomic with a subobject compare-and-swap when the first requires a lock and the second doesn't.) That's mostly defined away by the current requirement that the l-value have scalar type, except that you do support _Complex l-values right now, and 32-bit platforms generally can't make _Complex double lockless. The reverse problem applies to vectors: a 16-byte vector will generally be sufficiently aligned that a 64-bit platform can access it locklessly, but you're implementing some accesses to subobjects with locks: specifically, vector projections that create compound l-values.

This is really a specification problem: OpenMP wants simple operations to be implementable with native atomics, but that really restricts the set of operations you can do unless you can assume at a high level that there are never partially overlapping atomic operations. The easiest language solution is to say that, for the purposes of OpenMP's atomics, _Complex and vector types don't count as scalars. But I don't know if that would fly — it might be a goal to loosen the restrictions about what types you can use in atomic operations.

ABataev added inline comments.Dec 1 2014, 11:34 PM
include/clang/Basic/TargetInfo.h
379 ↗(On Diff #16759)

Fixed

lib/CodeGen/CGStmtOpenMP.cpp
603 ↗(On Diff #16759)

Yes, I missed this. Ok, I'll do.

591 ↗(On Diff #16670)

I agree with you, currently the code is not quite correct. I think we can resolve this problem by emitting OpenMP specific locks for target supported lockfree atomic operations. But in this case we don't need target atomic ops at all, we can rely on runtime library interface only.

rjmccall added inline comments.Dec 2 2014, 9:24 AM
lib/CodeGen/CGStmtOpenMP.cpp
591 ↗(On Diff #16670)

You mean, always grabbing a lock instead of ever using lock-free operations? I agree that that's the most general solution, but it's really heavy-handed.

It's possible that you can still do user-defined reductions with a compare-and-swap loop, depending on how they're defined. (Specifically: you have to be allowed to try the reduction multiple times, and the reduction's only allowed to access a single atomic value. The existing specification about the #pragma tries to enforce those conditions even for the simple cases allowed there, so I'm optimistic that whoever is writing your spec is at least thinking about these problems.)

What you really need here is a statement about subobjects. The C11/C++11 atomic types are very nice because they tell you exactly at what level something is atomic: you can have an std::atomic<std::pair<int,int>>, but that type doesn't provide accessors for atomically accessing the individual members of the pair; you can't just project out a std::atomic<int>& from the first element. That's really important, because it tells you locally that all atomic accesses are going to reference the full, 64-byte aggregate, which means you can agree very easily on a protocol for that access.

OpenMP doesn't tie into the type system that way, but you can still impose that rule at a high level by saying that it's undefined behavior for atomic accesses to an aggregate to race with atomic accesses to a subobject. For example, you can have an atomic operation on a _Complex float, and you can have a simultaneous atomic operation on a float, but the fact that they're simultaneous means that we can assume they don't alias. That still lets us decide on the atomic access pattern based purely on the type of the atomic l-value.

ABataev added inline comments.Dec 15 2014, 1:32 AM
lib/CodeGen/CGStmtOpenMP.cpp
591 ↗(On Diff #16670)

John, I decided to use lock free operations on simple lvalues, but for other lvalues I'm going to use global lock provided by the runtime. I think in this case we can avoid conflicts conflicts. What do you think about this solution?

rjmccall added inline comments.Dec 15 2014, 10:16 AM
lib/CodeGen/CGStmtOpenMP.cpp
591 ↗(On Diff #16670)

That isn't good enough. Locking solutions do not allow lock-free access to scalar subobjects, and you can't reliably tell from a scalar access whether it's a subobject of something. You need a language rule that says that you can never have simultaneous atomic accesses to both an aggregate (including vectors and complex values) and its subobjects.

Once you have that language rule, then using a global lock vs. a lock-free patterns for non-simple vs. simple l-values is fine, as long as all of the user-defined reductions you'll need to implement later are implementable with lock-free compare-and-swap.

Yes, agree. Then, probably, it is better just to disable support for
some of lvalues.

Best regards,

Alexey Bataev

Software Engineer
Intel Compiler Team

15.12.2014 21:16, John McCall пишет:

================
Comment at: lib/CodeGen/CGStmtOpenMP.cpp:591
@@ +590,3 @@
+ llvm::Value *ScalarVal;
+ switch (CGF.getEvaluationKind(X->getType())) {
+ case TEK_Scalar:


ABataev wrote:

rjmccall wrote:

ABataev wrote:

rjmccall wrote:

ABataev wrote:

rjmccall wrote:

I think you'll want this functionality for some of the other atomic ops anyway, so you might as well make this a separate function that you can call like this:

llvm::Value *ScalarVal = convertToScalarValue(CGF, Res, X->getType(), V->getType());

It's kindof a shame that you have to redo all of this instead of just relying on the implicit casts that Sema already created when it analyzed the expression, but maybe this is better than messing around with OpaqueValueExprs, and it works because you don't have a requirement to handle custom operators.

  1. Agree, I'll rework this.
  2. Yes, I tried to implement it using some hooks/hacks in AST, but there was a lot of troubles with atomic ops.
  3. We will have to support some custom operators in future (in OpenMP 4.0 reductions introduces user-defined reductions, which must be called as atomic ops and I plan to use global locks for them).

You can't mix locking and non-locking atomics on the same object: they'll be atomic with respect to themselves, but not with respect to each other. That is, assuming that atomic_start and atomic_end aren't implemented by halting all other OpenMP threads.

e.g. imagine that you have a *= that's implemented with a compare-and-swap loop and a custom reduction that you've implemented with the global lock. The global lock doesn't keep the compare-and-swap from firing during the execution of the custom reduction, so (1) different parts of the custom reduction might see different values for the original l-value and (2) the custom reduction will completely overwrite the *= rather than appearing to execute before or after.

In fact, you have a similar problem with aggregates, where an atomic access to an aggregate (e.g. a std::pair<float,float>) cannot be made atomic with respect to an atomic access to a subobject unless the aggregate will be accessed locklessly. (You could do *some* operations locklessly if you wrote them very carefully, e.g. reads and writes, but I don't know of any way to do an aggregate compare-and-swap that's atomic with a subobject compare-and-swap when the first requires a lock and the second doesn't.) That's mostly defined away by the current requirement that the l-value have scalar type, except that you do support _Complex l-values right now, and 32-bit platforms generally can't make _Complex double lockless. The reverse problem applies to vectors: a 16-byte vector will generally be sufficiently aligned that a 64-bit platform can access it locklessly, but you're implementing some accesses to subobjects with locks: specifically, vector projections that create compound l-values.

This is really a specification problem: OpenMP wants simple operations to be implementable with native atomics, but that really restricts the set of operations you can do unless you can assume at a high level that there are never partially overlapping atomic operations. The easiest language solution is to say that, for the purposes of OpenMP's atomics, _Complex and vector types don't count as scalars. But I don't know if that would fly — it might be a goal to loosen the restrictions about what types you can use in atomic operations.

I agree with you, currently the code is not quite correct. I think we can resolve this problem by emitting OpenMP specific locks for target supported lockfree atomic operations. But in this case we don't need target atomic ops at all, we can rely on runtime library interface only.

You mean, always grabbing a lock instead of ever using lock-free operations? I agree that that's the most general solution, but it's really heavy-handed.

It's possible that you can still do user-defined reductions with a compare-and-swap loop, depending on how they're defined. (Specifically: you have to be allowed to try the reduction multiple times, and the reduction's only allowed to access a single atomic value. The existing specification about the #pragma tries to enforce those conditions even for the simple cases allowed there, so I'm optimistic that whoever is writing your spec is at least thinking about these problems.)

What you really need here is a statement about subobjects. The C11/C++11 atomic types are very nice because they tell you exactly at what level something is atomic: you can have an std::atomic<std::pair<int,int>>, but that type doesn't provide accessors for atomically accessing the individual members of the pair; you can't just project out a std::atomic<int>& from the first element. That's really important, because it tells you locally that all atomic accesses are going to reference the full, 64-byte aggregate, which means you can agree very easily on a protocol for that access.

OpenMP doesn't tie into the type system that way, but you can still impose that rule at a high level by saying that it's undefined behavior for atomic accesses to an aggregate to race with atomic accesses to a subobject. For example, you can have an atomic operation on a _Complex float, and you can have a simultaneous atomic operation on a float, but the fact that they're simultaneous means that we can assume they don't alias. That still lets us decide on the atomic access pattern based purely on the type of the atomic l-value.

John, I decided to use lock free operations on simple lvalues, but for other lvalues I'm going to use global lock provided by the runtime. I think in this case we can avoid conflicts conflicts. What do you think about this solution?

That isn't good enough. Locking solutions do not allow lock-free access to scalar subobjects, and you can't reliably tell from a scalar access whether it's a subobject of something. You need a language rule that says that you can never have simultaneous atomic accesses to both an aggregate (including vectors and complex values) and its subobjects.

Once you have that language rule, then using a global lock vs. a lock-free patterns for non-simple vs. simple l-values is fine, as long as all of the user-defined reductions you'll need to implement later are implementable with lock-free compare-and-swap.

http://reviews.llvm.org/D6431

EMAIL PREFERENCES

http://reviews.llvm.org/settings/panel/emailpreferences/
rjmccall edited edge metadata.Dec 16 2014, 10:23 AM

Okay. So where does that leave us with this patch? The minimal thing is to just wait for direction from the OpenMP language committee, or we can optimistically assume that we get that rule and then build the implementation around that assumption.

Note also that you need to be compatible with whatever GCC is doing here, assuming you're trying to guarantee compiler interop.

I discussed this problem with the author of atomics in OpenMP already. I
hope to get the answer from him tomorrow.
PS. I'll try to investigate this problem a little bit more, maybe I'll
come into some suitable solution. I have an idea in mind, but I need to
play with it a little bit.
PPS. gcc uses global lock for all atomic operations and that's all. In
icc there is a little bit more complex solution. I want to use an
existing infrastructure of clang/LLVM.

Best regards,

Alexey Bataev

Software Engineer
Intel Compiler Team

16.12.2014 21:23, John McCall пишет:

Okay. So where does that leave us with this patch? The minimal thing is to just wait for direction from the OpenMP language committee, or we can optimistically assume that we get that rule and then build the implementation around that assumption.

Note also that you need to be compatible with whatever GCC is doing here, assuming you're trying to guarantee compiler interop.

http://reviews.llvm.org/D6431

EMAIL PREFERENCES

http://reviews.llvm.org/settings/panel/emailpreferences/

If GCC always surrounds accesses in a global lock from the OpenMP runtime, then you will need to do the same in clang for compatibility unless you've decided not to care about GCC compatibility. I guess you could provide some sort of -fno-openmp-gcc-compatibility flag if you want.

I don't think we should follow GCC rules, because currently we're using
libiomp5 runtime interface, not gomp. Gomp compatible interface can be
implemented later

Best regards,

Alexey Bataev

Software Engineer
Intel Compiler Team

17.12.2014 6:44, John McCall пишет:

If GCC always surrounds accesses in a global lock from the OpenMP runtime, then you will need to do the same in clang for compatibility unless you've decided not to care about GCC compatibility. I guess you could provide some sort of -fno-openmp-gcc-compatibility flag if you want.

http://reviews.llvm.org/D6431

EMAIL PREFERENCES

http://reviews.llvm.org/settings/panel/emailpreferences/

Okay, so if it's not GCC, who exactly is already using libiomp5? On a previous patch, I was told that we had to maintain compatibility with older runtimes, and it doesn't make sense to me that we have to support interoperation with old runtimes but not with generated code for any particular compiler that uses it. So you really need to look to see how those compilers emit atomics. That is, unless you're willing to break ABI with them as well, in which case we're effectively designing a new ABI, so why do we care about old runtimes at all?

I agree that it would be awful to be stuck using a global lock for all atomic operations, so if you *are* willing to break ABI, that's great.

ABataev updated this revision to Diff 17532.Dec 20 2014, 11:49 PM
ABataev edited edge metadata.

Fixed codegen for atomic load for bitfield, vector element and ext vector element lvalues: __atomic_load() builtin is used for loading of the whole part of lvalue and then regular processing of bitfield or vector element is applied to this loaded value. Lvalues for global registers are still loaded using global lock.
The same scheme is supposed to be used for all other atomic operations

How are you planning to implement stores for any of the non-simple l-value cases? Compare-and-swap loops?

Bitfields are interesting because IRGen actually uses larger-than-strictly-necessary accesses: if you have a struct containing 12 bytes of adjacent bitfields, we will join them all into one large i96 access. You need to use narrower bounds than that because you need something that's guaranteed stable: you can't have one version of the compiler trying to access the bitfield with a 12-byte atomic access and another accessing it with a 4-byte access, because the atomic runtime functions don't promise that such accesses will actually be atomic w.r.t. each other. You'll need to invent a rule here that you're willing to stick to forever.

Also, both bitfields and vector elements can often be accessed more efficiently than just a libcall, depending on how much space they need.

test/OpenMP/atomic_read_codegen.c
28 ↗(On Diff #17532)

This ends up being an inadvertently confusing variable name, since it ends in "six".

Hi John, thanks for the review.

How are you planning to implement stores for any of the non-simple l-value cases? Compare-and-swap loops?

Yes, that's the plan. Except for global registers: I did not find
compare-and-swap op in LLVM IR for them, so I decided to use global
locks for them.

... You need to use narrower bounds than that because you need something that's guaranteed stable: ...

Hmm, I did not catch why there can be troubles with bitfileds. Why one
compiler may use 12-byte atomic access, while another one will produce a
4-byte access? I think all atomic accesses will be the same. According
to OpenMP spec we cannot perform atomic operation on the whole bitfield
structure, only on their particular bitfields (atomic ops are allowed
only for scalar values). So I expect that all atomic operations on
bitfields will be performed on the same bounds. Or you mean something else?

Also, both bitfields and vector elements can often be accessed more efficiently than just a libcall, depending on how much space they need.

I thought about it. I agree, but also it may significantly complicate
the code itself. That's why I decided to use only libcalls, taking into
account that atomic operations on bitfields/vector elements are very
rarely used (if any, actually I did not see any, but it is good to have
a working solution for all kinds of lvalues).

This ends up being an inadvertently confusing variable name, since it ends in "six".

Ok, I'll try to improve it after our holidays.

Best regards,

Alexey Bataev

Software Engineer
Intel Compiler Team

05.01.2015 11:08, John McCall пишет:

How are you planning to implement stores for any of the non-simple l-value cases? Compare-and-swap loops?

Bitfields are interesting because IRGen actually uses larger-than-strictly-necessary accesses: if you have a struct containing 12 bytes of adjacent bitfields, we will join them all into one large i96 access. You need to use narrower bounds than that because you need something that's guaranteed stable: you can't have one version of the compiler trying to access the bitfield with a 12-byte atomic access and another accessing it with a 4-byte access, because the atomic runtime functions don't promise that such accesses will actually be atomic w.r.t. each other. You'll need to invent a rule here that you're willing to stick to forever.

Also, both bitfields and vector elements can often be accessed more efficiently than just a libcall, depending on how much space they need.

================
Comment at: test/OpenMP/atomic_read_codegen.c:28
@@ +27,3 @@
+typedef int v4si attribute((vector_size(16)));
+v4si v4six;
+


This ends up being an inadvertently confusing variable name, since it ends in "six".

http://reviews.llvm.org/D6431

EMAIL PREFERENCES

http://reviews.llvm.org/settings/panel/emailpreferences/

Hi John, thanks for the review.

How are you planning to implement stores for any of the non-simple l-value cases? Compare-and-swap loops?

Yes, that's the plan. Except for global registers: I did not find
compare-and-swap op in LLVM IR for them, so I decided to use global
locks for them.

I think global registers are generally thread-local, aren't they?

... You need to use narrower bounds than that because you need something that's guaranteed stable: ...

Hmm, I did not catch why there can be troubles with bitfileds. Why one
compiler may use 12-byte atomic access, while another one will produce a
4-byte access? I think all atomic accesses will be the same.

Given this structure:

struct S { int x: 32; int y: 32; };

Clang will currently emit an access to x by masking a 64-bit load. This is an essentially arbitrary implementation choice in IRGen; we've changed it before, and we may change it again in the future. You're generating code that depends on this arbitrary choice, because it ends up being the pointee type of the address stored in the bitfield LValue.

Also, both bitfields and vector elements can often be accessed more efficiently than just a libcall, depending on how much space they need.

I thought about it. I agree, but also it may significantly complicate
the code itself. That's why I decided to use only libcalls, taking into
account that atomic operations on bitfields/vector elements are very
rarely used (if any, actually I did not see any, but it is good to have
a working solution for all kinds of lvalues).

It only seems more convenient because you're doing this logic in a deep place within the atomics code. If you had a high-level routine that reasoned about the kind of LValue it was working with *before* committing to an evaluation strategy, and then just called lower-level atomic routines as if it was doing an atomic operation on a char/short/int (for a bitfield) or the entire vector (for a vector element), this would fall out more naturally.

John.

Hi John,

I think global registers are generally thread-local, aren't they?

Oh, yes, you're right, I missed it. I'll fix it.

Given this structure:

struct S { int x: 32; int y: 32; };

Clang will currently emit an access to x by masking a 64-bit load. This is an essentially arbitrary implementation choice in IRGen; we've changed it before, and we may change it again in the future. You're generating code that depends on this arbitrary choice, because it ends up being the pointee type of the address stored in the bitfield LValue.

Ahh, you're talking about compatibility between different versions of clang/LLVM compilers? I see then. Ok, I'll try to fix it somehow.

It only seems more convenient because you're doing this logic in a deep place within the atomics code. If you had a high-level routine that reasoned about the kind of LValue it was working with *before* committing to an evaluation strategy, and then just called lower-level atomic routines as if it was doing an atomic operation on a char/short/int (for a bitfield) or the entire vector (for a vector element), this would fall out more naturally.

Agree, I'll try to improve the code.

Best regards,

Alexey Bataev

Software Engineer
Intel Compiler Team

06.01.2015 4:44, John McCall пишет:

In http://reviews.llvm.org/D6431#105585, @ABataev wrote:

Hi John, thanks for the review.

How are you planning to implement stores for any of the non-simple l-value cases? Compare-and-swap loops?

Yes, that's the plan. Except for global registers: I did not find

compare-and-swap op in LLVM IR for them, so I decided to use global
locks for them.

I think global registers are generally thread-local, aren't they?

... You need to use narrower bounds than that because you need something that's guaranteed stable: ...

Hmm, I did not catch why there can be troubles with bitfileds. Why one

compiler may use 12-byte atomic access, while another one will produce a
4-byte access? I think all atomic accesses will be the same.

Given this structure:

struct S { int x: 32; int y: 32; };

Clang will currently emit an access to x by masking a 64-bit load. This is an essentially arbitrary implementation choice in IRGen; we've changed it before, and we may change it again in the future. You're generating code that depends on this arbitrary choice, because it ends up being the pointee type of the address stored in the bitfield LValue.

Also, both bitfields and vector elements can often be accessed more efficiently than just a libcall, depending on how much space they need.

I thought about it. I agree, but also it may significantly complicate

the code itself.  That's why I decided to use only libcalls, taking into
account that atomic operations on bitfields/vector elements are very
rarely used (if any, actually I did not see any, but it is good to have
a working solution for all kinds of lvalues).

It only seems more convenient because you're doing this logic in a deep place within the atomics code. If you had a high-level routine that reasoned about the kind of LValue it was working with *before* committing to an evaluation strategy, and then just called lower-level atomic routines as if it was doing an atomic operation on a char/short/int (for a bitfield) or the entire vector (for a vector element), this would fall out more naturally.

John.

http://reviews.llvm.org/D6431

EMAIL PREFERENCES

http://reviews.llvm.org/settings/panel/emailpreferences/
ABataev updated this revision to Diff 18142.Jan 14 2015, 2:17 AM

Update after review

rjmccall added inline comments.Jan 16 2015, 2:01 AM
lib/CodeGen/CGAtomic.cpp
77 ↗(On Diff #18142)

Hmm. I feel like you should make sure that your rule uses an aligned atomic access if it's possible to do so. That is, if the bitfield does fall within a single aligned unit, you should definitely access it with an atomic of that size. I think you need to consider the actual bitfield width for this, but maybe I'm missing something and that's done implicitly elsewhere.

86 ↗(On Diff #18142)

This is dangerous; I think you can end up with an access that goes beyond the end of the structure with this. Consider a 1-bit bitfield of type "unsigned" that's at the end of the struct, e.g.

struct {

unsigned : 31
unsigned x : 1; // <- access is to this

};

You need to make sure this is only accessed with an 8-bit operation.

88 ↗(On Diff #18142)

This alignment needs to be adjusted.

ABataev updated this revision to Diff 18487.Jan 21 2015, 12:39 AM

Fixed codegen for bitfields + foramtting after review.

John, thanks for the review. I prepared a new patch, please look at this.

Best regards,

Alexey Bataev

Software Engineer
Intel Compiler Team

16.01.2015 13:01, John McCall пишет:

================
Comment at: lib/CodeGen/CGAtomic.cpp:77
@@ +76,3 @@
+ auto &OrigBFI = lvalue.getBitFieldInfo();
+ auto OffsetInChars = C.toCharUnitsFromBits(OrigBFI.Offset);
+ auto VoidPtrAddr = CGF.EmitCastToVoidPtr(lvalue.getBitFieldAddr());


Hmm. I feel like you should make sure that your rule uses an aligned atomic access if it's possible to do so. That is, if the bitfield does fall within a single aligned unit, you should definitely access it with an atomic of that size. I think you need to consider the actual bitfield width for this, but maybe I'm missing something and that's done implicitly elsewhere.

================
Comment at: lib/CodeGen/CGAtomic.cpp:86
@@ +85,3 @@
+ BFI.Offset %= C.getCharWidth();
+ BFI.StorageSize = AtomicSizeInBits;
+ LVal = LValue::MakeBitfield(Addr, BFI, lvalue.getType(),


This is dangerous; I think you can end up with an access that goes beyond the end of the structure with this. Consider a 1-bit bitfield of type "unsigned" that's at the end of the struct, e.g.

struct {

unsigned : 31
unsigned x : 1; // <- access is to this

};

You need to make sure this is only accessed with an 8-bit operation.

================
Comment at: lib/CodeGen/CGAtomic.cpp:88
@@ +87,3 @@
+ LVal = LValue::MakeBitfield(Addr, BFI, lvalue.getType(),
+ lvalue.getAlignment());
+ } else if (lvalue.isVectorElt()) {


This alignment needs to be adjusted.

http://reviews.llvm.org/D6431

EMAIL PREFERENCES

http://reviews.llvm.org/settings/panel/emailpreferences/

Cute, yes, I think that works.

This revision was automatically updated to reflect the committed changes.