Page MenuHomePhabricator

[IR] Add support for floating pointer atomic loads and stores
ClosedPublic

Authored by reames on Dec 11 2015, 4:15 PM.

Details

Summary

This patch allows atomic loads and stores of floating point to be specified in the IR and adds an adapter to allow them to be lowered via existing backend support for bitcast-to-equivalent-integer idiom.

Previously, the only way to specify a atomic float operation was to bitcast the pointer to a i32, load the value as an i32, then bitcast to a float. At it's most basic, this patch simply moves this expansion step to the point we start lowering to the backend.

This patch does not add canonicalization rules to convert the bitcast idioms to the appropriate atomic loads. I plan to do that in the future, but for now, let's simply add the support. I'd like to get instruction selection working through at least one backend (x86-64) without the bitcast conversion before canonicalizing into this form.

Similarly, I haven't yet added the target hooks to opt out of the lowering step I added to AtomicExpand. I figured it would more sense to add those once at least one backend (x86) was ready to actually opt out.

As you can see from the included tests, the generated code quality is not great. I plan on submitting some patches to fix this, but help from others along that line would be very welcome. I'm not super familiar with the backend and my ramp up time may be material.

Diff Detail

Repository
rL LLVM

Event Timeline

reames updated this revision to Diff 42599.Dec 11 2015, 4:15 PM
reames retitled this revision from to [IR] Add support for floating pointer and vector atomic loads and stores.
reames updated this object.
reames added reviewers: jfb, hfinkel, jyknight.
reames added a subscriber: llvm-commits.
sanjoy added a subscriber: sanjoy.Dec 11 2015, 4:44 PM

Minor drop by comments inline.

docs/LangRef.rst
6994 ↗(On Diff #42599)

Minor & optional language suggestion: right now this can be read as "is >= 2^8". Also, I'm not sure if the "greater than or equal to a target specific limit" is better than "greater than a target specific limit".

Might be clearer to phrase this as:

... be a type whose bit width is a power of two, is not less than 8, and is greater than a target specific limit ...

lib/CodeGen/AtomicExpandPass.cpp
199 ↗(On Diff #42599)

Why not have IntegerType * as the return type?

209 ↗(On Diff #42599)

F is a llvm::Module, so please call it M. Also, there is an Instruction::getModule.

218 ↗(On Diff #42599)

Very minor: spacing before =.

292 ↗(On Diff #42599)

Again, I think auto *M = SI->getModule() is better.

reames updated this revision to Diff 42613.Dec 11 2015, 6:02 PM

Address Sanjoy's review.

jfb edited edge metadata.Dec 12 2015, 1:42 PM

Does this have any implications for alias analysis, especially type-based? The C++ model means memory locations are always atomic, so this isn't un-aliasing non-atomics further, but does LLVM's AAs understand that atomics can be non-integral now?

As explained in one of my comments, I think this patch should only do FP and not vectors.

docs/LangRef.rst
6995 ↗(On Diff #42613)

This isn't correct because it can't be any type: it has to be an integer, FP or vector. I'm not sure I'm really enthused by the addition of vector here because it has odd ramifications:

  • What if the vector's size isn't a power of 2?
  • What about vector atomicity and alignment? Some usecases may be OK with element-wise atomicity only (with undefined ordering).
  • What if the vector contains pointers? Right now you can't have atomic load/store of pointers, it seems odd to allow vectors of pointers.

I'd drop vectors from this patch, and clarify the documentation to mention integer and floating-point.

lib/CodeGen/AtomicExpandPass.cpp
142 ↗(On Diff #42613)

Add a string: assert(foo && "bar");

153 ↗(On Diff #42613)

Ditto.

202 ↗(On Diff #42613)

Shouldn't this be getStoreSizeInBits()? If getStoreSizeInBits() != getSizeInBits() then we'll have problems because the alignment may be wrong. I'd assert that they're the same, as well as checking the number is a power of 2 (because lulz fp80).

In D15471#309318, @jfb wrote:

Does this have any implications for alias analysis, especially type-based? The C++ model means memory locations are always atomic, so this isn't un-aliasing non-atomics further, but does LLVM's AAs understand that atomics can be non-integral now?

I'm not sure what you're trying to ask here specifically, but I have no reason to believe this influences AA in any way. Any AA which is using the *llvm type* to prove no alias is wrong and should be fixed. TBAA is entirely orthogonal and uneffected.

As explained in one of my comments, I think this patch should only do FP and not vectors.

I'm okay with this for moment. Will upload a simplified patch shortly.

reames added inline comments.Dec 14 2015, 1:05 PM
docs/LangRef.rst
6995 ↗(On Diff #42613)

Let's move this to the llvm-dev thread.

For the record, your point 3 is based on a wrong assumption. We do support atomic loads and stores of pointers.

lib/CodeGen/AtomicExpandPass.cpp
142 ↗(On Diff #42613)

Will do. For the record, I feel the message adds absolutely nothing here given the code context, but I don't care enough to argue the point.

202 ↗(On Diff #42613)

I'll switch methods and add the first assert since it's slightly non-obvious. The power of two is enforced by the verifier.

reames updated this revision to Diff 42759.Dec 14 2015, 1:08 PM
reames retitled this revision from [IR] Add support for floating pointer and vector atomic loads and stores to [IR] Add support for floating pointer atomic loads and stores.
reames updated this object.
reames edited edge metadata.

Address JF's comments and remove the vector support for the moment.

reames updated this revision to Diff 42760.Dec 14 2015, 1:12 PM

Forgot the new LangRef changes in the last update.

bcraig added a subscriber: bcraig.Dec 14 2015, 1:25 PM
bcraig added inline comments.
test/CodeGen/X86/atomic-non-integer.ll
49 ↗(On Diff #42760)

All of your llc tests are currently testing unordered accesses. The interesting code gen on X86 is with seq_cst stores. I recommend adding tests for those, and ensuring that you get the appropriate [lock] xchg operations.

reames updated this revision to Diff 42788.Dec 14 2015, 3:10 PM

Add a couple of seq_cst test per Ben's request.

LGTM... but that doesn't mean much. Thanks for adding the extra tests.

jfb added a comment.Dec 15 2015, 9:58 AM

For x86, can you add a negative test for fp80 to make sure it doesn't work? Or will that test only hit an assert? I'm assuming that it should fail validation.

Also, add fp128. For now I'm assuming we'll do lock cmpxchg16b on processors which support it, and a call to the runtime lock from compiler-rt otherwise (__atomic_load_16 and __atomic_store_16 from lib/builtins/atomic.c which I'm not sure work?).

lib/CodeGen/AtomicExpandPass.cpp
207 ↗(On Diff #42788)

"integral"

284 ↗(On Diff #42788)

"integral"

test/CodeGen/X86/atomic-non-integer.ll
1 ↗(On Diff #42788)

Add a reference to:
Also add a reference to: https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html

6 ↗(On Diff #42788)

Why does this involve an f2h conversion? Is it a calling convention thing where half is passed as its integer equivalent? It would be worth documenting, I find it surprising.

32 ↗(On Diff #42788)

Same.

54 ↗(On Diff #42788)

For each of the xchg below, can you also CHECK-NOT: lock since the x86 manual states that the lock prefix is implicit.

test/Transforms/AtomicExpand/X86/expand-atomic-non-integer.ll
5 ↗(On Diff #42788)

Also test volatile and addressspace combinations.

In D15471#311041, @jfb wrote:

For x86, can you add a negative test for fp80 to make sure it doesn't work? Or will that test only hit an assert? I'm assuming that it should fail validation.

This will be caught by the verifier. Will add a test.

Also, add fp128. For now I'm assuming we'll do lock cmpxchg16b on processors which support it, and a call to the runtime lock from compiler-rt otherwise (__atomic_load_16 and __atomic_store_16 from lib/builtins/atomic.c which I'm not sure work?).

I haven't checked, but the details of the lowering (including correctness!) are beyond the scope of this change. The lowering will be whatever you would have gotten with the previous bitcast idiom. *All* this change is doing is removing the need for that idiom. If we need to revise lowering, that should and will be a separate change. I will add a test to check that we lower to *something*, but that's all that should be part of this change.

I'll make the testing changes requested. With the assumption those will be done before submission, can I get an official LGTM?

I'm noticing a trend for this patch to keep snowballing to include more and more lowering questions; I specifically want to cut that off. This change *shouldn't* be changing lowering in any way from what frontends get today. That's the entire point of the patch.

lib/CodeGen/AtomicExpandPass.cpp
284 ↗(On Diff #42788)

Will fix.

test/CodeGen/X86/atomic-non-integer.ll
1 ↗(On Diff #42788)

This is, or should, be documented in the code. Since I didn't write the lowering code for this and don't know what reasoning let to this particular emission, I'd rather not falsely site something due to the later confusion it might cause.

6 ↗(On Diff #42788)

Frankly, I have no clue. We emit the same conversion when doing a non-atomic store, so it's not related to the changes in this patch.

54 ↗(On Diff #42788)

This is an implementation detail that is irrelevant to this functionality. This is a) tested elsewhere, and b) irrelevant to the correctness of the code genation here.

jfb added a comment.Dec 15 2015, 2:34 PM
In D15471#311041, @jfb wrote:

For x86, can you add a negative test for fp80 to make sure it doesn't work? Or will that test only hit an assert? I'm assuming that it should fail validation.

This will be caught by the verifier. Will add a test.

OK ty.

Also, add fp128. For now I'm assuming we'll do lock cmpxchg16b on processors which support it, and a call to the runtime lock from compiler-rt otherwise (__atomic_load_16 and __atomic_store_16 from lib/builtins/atomic.c which I'm not sure work?).

I haven't checked, but the details of the lowering (including correctness!) are beyond the scope of this change. The lowering will be whatever you would have gotten with the previous bitcast idiom. *All* this change is doing is removing the need for that idiom. If we need to revise lowering, that should and will be a separate change. I will add a test to check that we lower to *something*, but that's all that should be part of this change.

Oh yeah I just want to know that we generate something. If it's wrong the fix should definitely be separate.

I'll make the testing changes requested. With the assumption those will be done before submission, can I get an official LGTM?

Yes.

I'm noticing a trend for this patch to keep snowballing to include more and more lowering questions; I specifically want to cut that off. This change *shouldn't* be changing lowering in any way from what frontends get today. That's the entire point of the patch.

I want to improve our test coverage and understand what the tests are doing. I agree that lowering improvements are separate.

test/CodeGen/X86/atomic-non-integer.ll
1 ↗(On Diff #42788)

Could you document this at the top of the test? It's not clear from the name of the test that you're not checking correctness of the generated code.

6 ↗(On Diff #42788)

I'd rather know if the test is correct, and fix elsewhere if not: otherwise it's hard to reason about this test when fixing bugs.

I can't find anything about half or fp16 in the calling convention, but the signature (short __gnu_f2h_ieee(float)) leads me to believe that the ABI passes things as float and then converts to short as a proxy for half.

I think a cleaner test wouldn't pass a half in registers but would instead pass two pointers. That makes the test way easier to understand IMO.

54 ↗(On Diff #42788)

Oh you're right, test/CodeGen/X86/atomic_mi.ll tests this and git blame shows a familiar name on that!

reames updated this revision to Diff 42919.Dec 15 2015, 2:51 PM

add various requested tests

reames updated this revision to Diff 42920.Dec 15 2015, 2:56 PM

address JF's comments with regards to halfs.

jfb accepted this revision.Dec 15 2015, 4:15 PM
jfb edited edge metadata.

lgtm after the two "integral" typo fixes.

test/Verifier/atomics.ll
4 ↗(On Diff #42920)

Ha, that's not the best error message since x86_mmx is a floating-point type. I'll rework it in D15512 if that's OK with you.

This revision is now accepted and ready to land.Dec 15 2015, 4:15 PM
reames added inline comments.Dec 15 2015, 4:48 PM
test/CodeGen/X86/atomic-non-integer.ll
2 ↗(On Diff #42920)

I added a note to the top of the file. Let me know if you want something more.

7 ↗(On Diff #42920)

This comment doesn't really parse for me. If you want to suggest a change here, please raise it on the mailing lists so that someone more knowledgeable than I can comment.

test/Verifier/atomics.ll
4 ↗(On Diff #42920)

Not according to LLVM's isFloatingPointTy it's not. Surprised me too. Fixing that might end up being a much larger change though.

This revision was automatically updated to reflect the committed changes.