This is an archive of the discontinued LLVM Phabricator instance.

[X86] Allow atomic operations using immediates to avoid using a register
ClosedPublic

Authored by morisset on Aug 5 2014, 2:36 PM.

Details

Reviewers
jfb
morisset
Summary

The only valid lowering of atomic stores in the X86 backend was mov from
register to memory. As a result, storing an immediate required a useless copy
of the immediate in a register. Now these can be compiled as a simple mov.

Similarily, adding/and-ing/or-ing/xor-ing an
immediate to an atomic location (but through an atomic_store/atomic_load,
not a fetch_whatever intrinsic) can now make use of an 'add $imm, x(%rip)'
instead of using a register. And the same applies to inc/dec.

This second point matches the first issue identified in

http://llvm.org/bugs/show_bug.cgi?id=17281

Diff Detail

Event Timeline

morisset updated this revision to Diff 12209.Aug 5 2014, 2:36 PM
morisset retitled this revision from to [X86] Allow atomic operations using immediates to avoid using a register.
morisset updated this object.
morisset edited the test plan for this revision. (Show Details)
morisset added a reviewer: jfb.
morisset added subscribers: kcc, dvyukov, nadav, Unknown Object (MLST).
reames added a subscriber: reames.Aug 6 2014, 5:06 PM

I don't see an obvious errors in this change, but am not familiar with the areas you're touching in detail. I believe the memory model aspects to be correct, but can not speak for the x86 backend details.

It's worth noting that these instructions are specifically ordered *operations* not ordered *fences*. There's no need to issue a StoreLoad barrier following the release since a Load from the same address is already ordered and Loads from other locations are unordered with respect to a release *operation*.

I mention this only because I got confused reviewing the patch and it might be good to call it out in a comment. It might also be worth adding a CHECK-NOT to ensure lack of lock prefix in case someone in the future gets this wrong.

morisset updated this revision to Diff 12264.Aug 6 2014, 6:30 PM
  • Add 'CHECK-NOT: lock' (and comment) to atomic_mi.ll, based on Philip Reames suggestion
  • Add three missing tests to atomic_mi.ll

I have visibly misunderstood how arc diff works, as this time I have lost the modifications to the backend in this update to the patch.
I will fix that as soon as possible.

morisset updated this revision to Diff 12285.Aug 7 2014, 10:34 AM

Trying to fix my previous mistake with arcanist and make the diff reappear in Phabricator

morisset updated this revision to Diff 12286.Aug 7 2014, 11:29 AM

Eliminating the modification to X86CodeEmitter as it was part of the JIT that just got nuked upstream

reames added a comment.Aug 7 2014, 2:35 PM

This version LGTM pending review by someone on the backend elements.

Note: I think your refinement of the tests may have pointed out a related bug w.r.t. the missing lock prefix on sequentially consistent accesses. If so, I'm fine with addressing that separately.

test/CodeGen/X86/atomic_mi.ll
224

Shouldn't there be a lock prefix on this one to enforce the StoreLoad barrier required by cst? cst is w.r.t. all addresses (unlike every other ordering mode...)

294

Same here.

358

Again, lock prefix?

Actually, on X86 the xchg instruction is a bit special in that it automatically behaves as if it had a LOCK prefix, even if it is not explicit (source: Intel manual, 2B 4-71)
So I don't think it is a bug that it is not added.
Do you think I should add a comment somewhere to that effect? And if so where?

Thanks again for your reviews!

reames added a comment.Aug 7 2014, 3:43 PM
In D4796#14, @morisset wrote:

Actually, on X86 the xchg instruction is a bit special in that it automatically behaves as if it had a LOCK prefix, even if it is not explicit (source: Intel manual, 2B 4-71)
So I don't think it is a bug that it is not added.

You're right. I had forgotten this. Sorry for the noise.

Do you think I should add a comment somewhere to that effect? And if so where?

A note in the test might be warranted. It's valid to either have the lock prefix or not in the places I pointed out before.

Thanks again for your reviews!

Glad to help where I can. I really really really don't want to be debugging these type of issues later. :)

morisset updated this revision to Diff 12297.Aug 7 2014, 4:12 PM
Add back the modification to X86CodeEmitter as the Jit resurrected

+ an extra comment on the implicitness of the lock prefix before xchg.

morisset accepted this revision.Sep 2 2014, 3:36 PM
morisset added a reviewer: morisset.

Got a lgtm from Nadav Rotem on the mailing list.

This revision is now accepted and ready to land.Sep 2 2014, 3:36 PM
morisset closed this revision.Sep 2 2014, 3:37 PM

Commited r216980