This is an archive of the discontinued LLVM Phabricator instance.

[Power] Improve the expansion of atomic loads/stores
ClosedPublic

Authored by morisset on Oct 2 2014, 1:32 PM.

Details

Summary

Atomic loads and store of up to the native size (32 bits, or 64 for PPC64)
can be lowered to a simple load or store instruction (as the synchronization
is already handled by AtomicExpand, and the atomicity is guaranteed thanks to
the alignment requirements of atomic accesses). This is exactly what this patch
does. Previously, these were implemented by complex
load-linked/store-conditional loops.. an obvious performance problem.

For example, this patch turns

define void @store_i8_unordered(i8* %mem) {
  store atomic i8 42, i8* %mem unordered, align 1
  ret void
}

from

_store_i8_unordered:                    ; @store_i8_unordered
; BB#0:
    rlwinm r2, r3, 3, 27, 28
    li r4, 42
    xori r5, r2, 24
    rlwinm r2, r3, 0, 0, 29
    li r3, 255
    slw r4, r4, r5
    slw r3, r3, r5
    and r4, r4, r3
LBB4_1:                                 ; =>This Inner Loop Header: Depth=1
    lwarx r5, 0, r2
    andc r5, r5, r3
    or r5, r4, r5
    stwcx. r5, 0, r2
    bne cr0, LBB4_1
; BB#2:
    blr

into

_store_i8_unordered:                    ; @store_i8_unordered
; BB#0:
    li r2, 42
    stb r2, 0(r3)
    blr

which looks like a pretty clear win to me.

Diff Detail

Event Timeline

morisset updated this revision to Diff 14344.Oct 2 2014, 1:32 PM
morisset retitled this revision from to [Power] Improve the expansion of atomic loads/stores.
morisset updated this object.
morisset edited the test plan for this revision. (Show Details)
morisset added reviewers: jfb, wschmidt, hfinkel.
morisset added a subscriber: Unknown Object (MLST).
hfinkel edited edge metadata.Oct 2 2014, 1:50 PM

Is this something guaranteed by the ISA, or just something we know to be true for common implementations? I'm somewhat afraid of doing this for the generic targets in case this might not be true for some embedded implementation (if it is not an ISA guarantee).

wschmidt edited edge metadata.Oct 2 2014, 1:56 PM

I agree that this is legitimate according to the alignment definitions of atomic loads and stores and the atomicity requirements of the PowerPC ISA. A most excellent improvement indeed.

From the ISA:

Vector storage accesses are not guaranteed to be atomic. The following other types of single-register accesses are always atomic:

  • byte accesses (all bytes are aligned on byte boundaries);
  • halfword accesses aligned on halfword boundaries;
  • word accesses aligned on word boundaries;
  • doubleword accesses aligned on doubleword boundaries.

The language in the LLVM IR reference indicates that if this is not the case for load atomic or store atomic, the result is undefined. So this is safe.

The easiest way to force a use of the indexed forms of load and store is to use an offset that is out of range of the hardware load/store-immediate instructions. So try an array reference that is more than 65535 bytes away from its base.

(BTW, the doubleword accesses only apply to 64-bit -- I got lazy and didn't provide the parenthetical about that.)

Thank you ! This paragraph of the documentation was indeed what I was
thinking about, when saying that atomicity is not a problem, I should have
made it more explicit. I will try to generate tests for the indexed
versions and add them in.

hfinkel accepted this revision.Oct 2 2014, 2:52 PM
hfinkel edited edge metadata.

LGTM.

Regarding the indexed loads/stores, as bill said, you should be able to generate them using a large offset (that won't fit in the immediate), or just using a variable offset.

This revision is now accepted and ready to land.Oct 2 2014, 2:52 PM
morisset updated this revision to Diff 14353.Oct 2 2014, 3:33 PM
morisset edited edge metadata.

Add tests for indexed loads/stores

Thanks! Commit whenever you're ready.

test/CodeGen/PowerPC/atomics-indexed.ll
2 ↗(On Diff #14353)

If at some point you could look at fixing this, that would be great.

morisset closed this revision.Oct 2 2014, 3:37 PM
morisset updated this revision to Diff 14354.

Closed by commit rL218922 (authored by @morisset).