This is an archive of the discontinued LLVM Phabricator instance.

[Power] Improve the expansion of atomic loads/stores
Closed, Public

Authored by morisset on Oct 2 2014, 1:32 PM.

Details

Reviewers

wschmidt
jfb
hfinkel

Commits

rGe1ca44bd4c1d: [Power] Improve the expansion of atomic loads/stores
rL218922: [Power] Improve the expansion of atomic loads/stores

Summary

Atomic loads and stores of up to the native size (32 bits, or 64 bits on PPC64)
can be lowered to a simple load or store instruction, because the synchronization
is already handled by AtomicExpand and atomicity is guaranteed by the alignment
requirements of atomic accesses. This is exactly what this patch does. Previously,
these accesses were implemented with complex load-linked/store-conditional loops,
an obvious performance problem.

For example, this patch turns

define void @store_i8_unordered(i8* %mem) {
  store atomic i8 42, i8* %mem unordered, align 1
  ret void
}

from

_store_i8_unordered:                    ; @store_i8_unordered
; BB#0:
    rlwinm r2, r3, 3, 27, 28
    li r4, 42
    xori r5, r2, 24
    rlwinm r2, r3, 0, 0, 29
    li r3, 255
    slw r4, r4, r5
    slw r3, r3, r5
    and r4, r4, r3
LBB4_1:                                 ; =>This Inner Loop Header: Depth=1
    lwarx r5, 0, r2
    andc r5, r5, r3
    or r5, r4, r5
    stwcx. r5, 0, r2
    bne cr0, LBB4_1
; BB#2:
    blr

into

_store_i8_unordered:                    ; @store_i8_unordered
; BB#0:
    li r2, 42
    stb r2, 0(r3)
    blr

which looks like a pretty clear win to me.
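
For readers coming from the source-language side, here is a minimal C++ sketch of the kind of code that reaches this lowering. The helper names are hypothetical. As far as I know, C++ `memory_order_relaxed` maps to LLVM's `monotonic` ordering rather than the `unordered` used in the IR above, but neither ordering needs a barrier, so the expectation is the same: a plain byte store instead of a lwarx/stwcx. loop.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: a relaxed one-byte atomic store, roughly the
// source-level counterpart of @store_i8_unordered in the summary.
// Assumption: relaxed ordering needs no barrier on PowerPC, so after this
// patch the store would be expected to compile to a single stb.
inline void store_i8_relaxed(std::atomic<std::uint8_t> &mem) {
  mem.store(42, std::memory_order_relaxed);
}

// Hypothetical helper: the matching relaxed load (expected to become lbz).
inline std::uint8_t load_i8_relaxed(const std::atomic<std::uint8_t> &mem) {
  return mem.load(std::memory_order_relaxed);
}
```

The observable behaviour is identical before and after the patch; only the generated instruction sequence changes.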

Diff Detail

Repository: rL LLVM

Event Timeline

morisset updated this revision to Diff 14344. Oct 2 2014, 1:32 PM

morisset retitled this revision to [Power] Improve the expansion of atomic loads/stores.

morisset added reviewers: jfb, wschmidt, hfinkel.

Is this something guaranteed by the ISA, or just something we know to be true for common implementations? I'm somewhat afraid of doing this for the generic targets in case this might not be true for some embedded implementation (if it is not an ISA guarantee).

I agree that this is legitimate according to the alignment definitions of atomic loads and stores and the atomicity requirements of the PowerPC ISA. A most excellent improvement indeed.

From the ISA:

Vector storage accesses are not guaranteed to be atomic. The following other types of single-register accesses are always atomic:

byte accesses (all bytes are aligned on byte boundaries);
halfword accesses aligned on halfword boundaries;
word accesses aligned on word boundaries;
doubleword accesses aligned on doubleword boundaries.

The language in the LLVM IR reference indicates that if this is not the case for load atomic or store atomic, the result is undefined. So this is safe.
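
The rule quoted from the ISA can be sketched as a small predicate. This is only an illustration of the argument in this thread, not code from the patch; the function name and parameters are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper, not part of LLVM: returns true when a single-register
// access of `sizeBytes` at `address` falls under the "always atomic" list
// quoted above. Doubleword accesses only count on 64-bit implementations,
// and vector accesses are never covered.
constexpr bool isSingleCopyAtomic(unsigned sizeBytes, std::uint64_t address,
                                  bool is64Bit) {
  return (sizeBytes == 1 || sizeBytes == 2 || sizeBytes == 4 ||
          (sizeBytes == 8 && is64Bit)) &&
         (address % sizeBytes == 0); // naturally aligned
}

// A byte access is atomic at any address.
static_assert(isSingleCopyAtomic(1, 0x1003, false), "byte access");
// A misaligned word access is not covered by the guarantee.
static_assert(!isSingleCopyAtomic(4, 0x1002, true), "misaligned word");
```

Because the IR reference makes insufficiently aligned atomic accesses undefined, the backend may assume the alignment part of this predicate already holds.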

The easiest way to force a use of the indexed forms of load and store is to use an offset that is out of range of the hardware load/store-immediate instructions. So try an array reference that is more than 65535 bytes away from its base.

(BTW, the doubleword accesses only apply to 64-bit -- I got lazy and didn't provide the parenthetical about that.)

Thank you! This paragraph of the documentation is indeed what I had in mind
when I said that atomicity is not a problem; I should have made that more
explicit. I will try to generate tests for the indexed versions and add
them.

LGTM.

Regarding the indexed loads/stores, as Bill said, you should be able to generate them using a large offset (one that won't fit in the immediate), or just using a variable offset.
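
A rough way to see which offsets are "large" in this sense: the displacement field of the register+immediate load/store forms is a signed 16-bit value, so anything outside -32768..32767 cannot be encoded directly (the 65535-byte figure mentioned earlier is a safely conservative bound). The sketch below uses hypothetical helper names and ignores the extra multiple-of-4 requirement on the ld/std displacement.

```cpp
#include <cstdint>

// Hypothetical helper: does a byte offset fit the signed 16-bit displacement
// of the register+immediate load/store forms (lbz, lhz, lwz, stb, ...)?
constexpr bool fitsSigned16(std::int64_t offset) {
  return offset >= -32768 && offset <= 32767;
}

// Hypothetical helper: true when indexing element `elementIndex` of an array
// with `elementSize`-byte elements yields an offset outside that range.
constexpr bool exceedsImmediateRange(std::int64_t elementIndex,
                                     std::int64_t elementSize) {
  return !fitsSigned16(elementIndex * elementSize);
}

// Element 90000 of an i8 array is already out of range.
static_assert(exceedsImmediateRange(90000, 1), "90000 does not fit");
```

Whether the backend then picks the indexed instruction or splits the offset some other way is up to instruction selection; the tests in this revision check that the indexed forms are what comes out.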

This revision is now accepted and ready to land. Oct 2 2014, 2:52 PM

Add tests for indexed loads/stores

Thanks! Commit whenever you're ready.

Inline comment on test/CodeGen/PowerPC/atomics-indexed.ll, line 2 (on Diff #14353): If at some point you could look at fixing this, that would be great.

Closed by commit rL218922 (authored by @morisset).

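Revision Contents

Path                                                   Size
llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp      8 lines
llvm/trunk/lib/Target/PowerPC/PPCInstr64Bit.td         6 lines
llvm/trunk/lib/Target/PowerPC/PPCInstrInfo.td          16 lines
llvm/trunk/test/CodeGen/PowerPC/atomic-2.ll            11 lines
llvm/trunk/test/CodeGen/PowerPC/atomics-indexed.ll     81 lines
llvm/trunk/test/CodeGen/PowerPC/atomics.ll             12 lines
llvm/trunk/test/CodeGen/PowerPC/pr15630.ll             3 lines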

Diff 14354

llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp

... (607 lines not shown; enclosing context: if (Subtarget.hasAltivec()) {) ...
     }
   }

   if (Subtarget.has64BitSupport()) {
     setOperationAction(ISD::PREFETCH, MVT::Other, Legal);
     setOperationAction(ISD::READCYCLECOUNTER, MVT::i64, Legal);
   }

-  setOperationAction(ISD::ATOMIC_LOAD, MVT::i32, Expand);
-  setOperationAction(ISD::ATOMIC_STORE, MVT::i32, Expand);
-  setOperationAction(ISD::ATOMIC_LOAD, MVT::i64, Expand);
-  setOperationAction(ISD::ATOMIC_STORE, MVT::i64, Expand);
+  if (!isPPC64) {
+    setOperationAction(ISD::ATOMIC_LOAD, MVT::i64, Expand);
+    setOperationAction(ISD::ATOMIC_STORE, MVT::i64, Expand);
+  }

   setBooleanContents(ZeroOrOneBooleanContent);
   // Altivec instructions set fields to all zeros or all ones.
   setBooleanVectorContents(ZeroOrNegativeOneBooleanContent);

   if (!isPPC64) {
     // These libcalls are not available in 32-bit.
     setLibcallName(RTLIB::SHL_I128, nullptr);
... (8,634 lines not shown) ...

llvm/trunk/lib/Target/PowerPC/PPCInstr64Bit.td

... (1,129 lines not shown) ...
 // addresses without at least 4-byte alignment.
 def : Pat<(i64 (unaligned4sextloadi32 xoaddr:$src)),
           (LWAX xoaddr:$src)>;
 def : Pat<(i64 (unaligned4load xoaddr:$src)),
           (LDX xoaddr:$src)>;
 def : Pat<(unaligned4store i64:$rS, xoaddr:$dst),
           (STDX $rS, xoaddr:$dst)>;

+// 64-bits atomic loads and stores
+def : Pat<(atomic_load_64 ixaddr:$src), (LD memrix:$src)>;
+def : Pat<(atomic_load_64 xaddr:$src), (LDX memrr:$src)>;

+def : Pat<(atomic_store_64 ixaddr:$ptr, i64:$val), (STD g8rc:$val, memrix:$ptr)>;
+def : Pat<(atomic_store_64 xaddr:$ptr, i64:$val), (STDX g8rc:$val, memrr:$ptr)>;

llvm/trunk/lib/Target/PowerPC/PPCInstrInfo.td

... (3,689 lines not shown) ...
 defm : TrapExtendedMnemonic<"ng", 20>;
 defm : TrapExtendedMnemonic<"llt", 2>;
 defm : TrapExtendedMnemonic<"lle", 6>;
 defm : TrapExtendedMnemonic<"lge", 5>;
 defm : TrapExtendedMnemonic<"lgt", 1>;
 defm : TrapExtendedMnemonic<"lnl", 5>;
 defm : TrapExtendedMnemonic<"lng", 6>;
 defm : TrapExtendedMnemonic<"u", 31>;
+
+// Atomic loads
+def : Pat<(atomic_load_8 iaddr:$src), (LBZ memri:$src)>;
+def : Pat<(atomic_load_16 iaddr:$src), (LHZ memri:$src)>;
+def : Pat<(atomic_load_32 iaddr:$src), (LWZ memri:$src)>;
+def : Pat<(atomic_load_8 xaddr:$src), (LBZX memrr:$src)>;
+def : Pat<(atomic_load_16 xaddr:$src), (LHZX memrr:$src)>;
+def : Pat<(atomic_load_32 xaddr:$src), (LWZX memrr:$src)>;
+
+// Atomic stores
+def : Pat<(atomic_store_8 iaddr:$ptr, i32:$val), (STB gprc:$val, memri:$ptr)>;
+def : Pat<(atomic_store_16 iaddr:$ptr, i32:$val), (STH gprc:$val, memri:$ptr)>;
+def : Pat<(atomic_store_32 iaddr:$ptr, i32:$val), (STW gprc:$val, memri:$ptr)>;
+def : Pat<(atomic_store_8 xaddr:$ptr, i32:$val), (STBX gprc:$val, memrr:$ptr)>;
+def : Pat<(atomic_store_16 xaddr:$ptr, i32:$val), (STHX gprc:$val, memrr:$ptr)>;
+def : Pat<(atomic_store_32 xaddr:$ptr, i32:$val), (STWX gprc:$val, memrr:$ptr)>;
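
Taken together with the 64-bit patterns in PPCInstr64Bit.td and the Expand actions kept for i64 on 32-bit targets, the selection these patterns describe can be summarised as a lookup from (width, addressing mode, target) to a mnemonic. The C++ below is only a reading aid with hypothetical function names; it is not how the backend implements selection, which is driven by the TableGen patterns themselves.

```cpp
#include <string>

// Hypothetical summary of the patterns above: the load instruction an atomic
// load of `bits` width is expected to select. `indexed` picks the
// register+register (X-form) variant. Returns "" when no single instruction
// is used, e.g. i64 on 32-bit PowerPC, where the tests expect a __sync_ call.
inline std::string atomicLoadMnemonic(unsigned bits, bool indexed, bool isPPC64) {
  switch (bits) {
  case 8:  return indexed ? "lbzx" : "lbz";
  case 16: return indexed ? "lhzx" : "lhz";
  case 32: return indexed ? "lwzx" : "lwz";
  case 64: return isPPC64 ? (indexed ? "ldx" : "ld") : "";
  default: return "";
  }
}

// Same idea for atomic stores.
inline std::string atomicStoreMnemonic(unsigned bits, bool indexed, bool isPPC64) {
  switch (bits) {
  case 8:  return indexed ? "stbx" : "stb";
  case 16: return indexed ? "sthx" : "sth";
  case 32: return indexed ? "stwx" : "stw";
  case 64: return isPPC64 ? (indexed ? "stdx" : "std") : "";
  default: return "";
  }
}
```

Any barriers required by the ordering (the sync instructions checked in the tests) are emitted separately and are not modelled here.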

llvm/trunk/test/CodeGen/PowerPC/atomic-2.ll

... (24 lines not shown) ...
 ; CHECK: stdcx.
 ret i64 %tmp
 }

 define void @atomic_store(i64* %mem, i64 %val) nounwind {
 entry:
 ; CHECK: @atomic_store
 store atomic i64 %val, i64* %mem release, align 64
-; CHECK: ldarx
-; CHECK: stdcx.
+; CHECK: sync 1
+; CHECK-NOT: stdcx
+; CHECK: std
 ret void
 }

 define i64 @atomic_load(i64* %mem) nounwind {
 entry:
 ; CHECK: @atomic_load
 %tmp = load atomic i64* %mem acquire, align 64
-; CHECK: ldarx
-; CHECK: stdcx.
-; CHECK: stdcx.
+; CHECK-NOT: ldarx
+; CHECK: ld
+; CHECK: sync 1
 ret i64 %tmp
 }

llvm/trunk/test/CodeGen/PowerPC/atomics-indexed.ll

				; RUN: llc < %s -mtriple=powerpc-apple-darwin -march=ppc32 -verify-machineinstrs | FileCheck %s --check-prefix=CHECK --check-prefix=PPC32
				; FIXME: -verify-machineinstrs currently fail on ppc64 (mismatched register/instruction).
				; This is already checked for in Atomics-64.ll
				; RUN: llc < %s -mtriple=powerpc-apple-darwin -march=ppc64 | FileCheck %s --check-prefix=CHECK --check-prefix=PPC64

				; In this file, we check that atomic load/store can make use of the indexed
				; versions of the instructions.

				; Indexed version of loads
				define i8 @load_x_i8_seq_cst([100000 x i8]* %mem) {
				; CHECK-LABEL: load_x_i8_seq_cst
				; CHECK: sync 0
				; CHECK: lbzx
				; CHECK: sync 1
				%ptr = getelementptr inbounds [100000 x i8]* %mem, i64 0, i64 90000
				%val = load atomic i8* %ptr seq_cst, align 1
				ret i8 %val
				}
				define i16 @load_x_i16_acquire([100000 x i16]* %mem) {
				; CHECK-LABEL: load_x_i16_acquire
				; CHECK: lhzx
				; CHECK: sync 1
				%ptr = getelementptr inbounds [100000 x i16]* %mem, i64 0, i64 90000
				%val = load atomic i16* %ptr acquire, align 2
				ret i16 %val
				}
				define i32 @load_x_i32_monotonic([100000 x i32]* %mem) {
				; CHECK-LABEL: load_x_i32_monotonic
				; CHECK: lwzx
				; CHECK-NOT: sync
				%ptr = getelementptr inbounds [100000 x i32]* %mem, i64 0, i64 90000
				%val = load atomic i32* %ptr monotonic, align 4
				ret i32 %val
				}
				define i64 @load_x_i64_unordered([100000 x i64]* %mem) {
				; CHECK-LABEL: load_x_i64_unordered
				; PPC32: __sync_
				; PPC64-NOT: __sync_
				; PPC64: ldx
				; CHECK-NOT: sync
				%ptr = getelementptr inbounds [100000 x i64]* %mem, i64 0, i64 90000
				%val = load atomic i64* %ptr unordered, align 8
				ret i64 %val
				}

				; Indexed version of stores
				define void @store_x_i8_seq_cst([100000 x i8]* %mem) {
				; CHECK-LABEL: store_x_i8_seq_cst
				; CHECK: sync 0
				; CHECK: stbx
				%ptr = getelementptr inbounds [100000 x i8]* %mem, i64 0, i64 90000
				store atomic i8 42, i8* %ptr seq_cst, align 1
				ret void
				}
				define void @store_x_i16_release([100000 x i16]* %mem) {
				; CHECK-LABEL: store_x_i16_release
				; CHECK: sync 1
				; CHECK: sthx
				%ptr = getelementptr inbounds [100000 x i16]* %mem, i64 0, i64 90000
				store atomic i16 42, i16* %ptr release, align 2
				ret void
				}
				define void @store_x_i32_monotonic([100000 x i32]* %mem) {
				; CHECK-LABEL: store_x_i32_monotonic
				; CHECK-NOT: sync
				; CHECK: stwx
				%ptr = getelementptr inbounds [100000 x i32]* %mem, i64 0, i64 90000
				store atomic i32 42, i32* %ptr monotonic, align 4
				ret void
				}
				define void @store_x_i64_unordered([100000 x i64]* %mem) {
				; CHECK-LABEL: store_x_i64_unordered
				; CHECK-NOT: sync 0
				; CHECK-NOT: sync 1
				; PPC32: __sync_
				; PPC64-NOT: __sync_
				; PPC64: stdx
				%ptr = getelementptr inbounds [100000 x i64]* %mem, i64 0, i64 90000
				store atomic i64 42, i64* %ptr unordered, align 8
				ret void
				}

llvm/trunk/test/CodeGen/PowerPC/atomics.ll

 ; RUN: llc < %s -mtriple=powerpc-apple-darwin -march=ppc32 -verify-machineinstrs | FileCheck %s --check-prefix=CHECK --check-prefix=PPC32
 ; FIXME: -verify-machineinstrs currently fail on ppc64 (mismatched register/instruction).
 ; This is already checked for in Atomics-64.ll
 ; RUN: llc < %s -mtriple=powerpc-apple-darwin -march=ppc64 | FileCheck %s --check-prefix=CHECK --check-prefix=PPC64

 ; FIXME: we don't currently check for the operations themselves with CHECK-NEXT,
 ; because they are implemented in a very messy way with lwarx/stwcx.
 ; It should be fixed soon in another patch.

 ; We first check loads, for all sizes from i8 to i64.
 ; We also vary orderings to check for barriers.
 define i8 @load_i8_unordered(i8* %mem) {
 ; CHECK-LABEL: load_i8_unordered
+; CHECK: lbz
 ; CHECK-NOT: sync
 %val = load atomic i8* %mem unordered, align 1
 ret i8 %val
 }
 define i16 @load_i16_monotonic(i16* %mem) {
 ; CHECK-LABEL: load_i16_monotonic
+; CHECK: lhz
 ; CHECK-NOT: sync
 %val = load atomic i16* %mem monotonic, align 2
 ret i16 %val
 }
 define i32 @load_i32_acquire(i32* %mem) {
 ; CHECK-LABEL: load_i32_acquire
+; CHECK: lwz
 %val = load atomic i32* %mem acquire, align 4
 ; CHECK: sync 1
 ret i32 %val
 }
 define i64 @load_i64_seq_cst(i64* %mem) {
 ; CHECK-LABEL: load_i64_seq_cst
 ; CHECK: sync 0
+; PPC32: __sync_
+; PPC64-NOT: __sync_
+; PPC64: ld
 %val = load atomic i64* %mem seq_cst, align 8
 ; CHECK: sync 1
 ret i64 %val
 }

 ; Stores
 define void @store_i8_unordered(i8* %mem) {
 ; CHECK-LABEL: store_i8_unordered
 ; CHECK-NOT: sync
+; CHECK: stb
 store atomic i8 42, i8* %mem unordered, align 1
 ret void
 }
 define void @store_i16_monotonic(i16* %mem) {
 ; CHECK-LABEL: store_i16_monotonic
 ; CHECK-NOT: sync
+; CHECK: sth
 store atomic i16 42, i16* %mem monotonic, align 2
 ret void
 }
 define void @store_i32_release(i32* %mem) {
 ; CHECK-LABEL: store_i32_release
 ; CHECK: sync 1
+; CHECK: stw
 store atomic i32 42, i32* %mem release, align 4
 ret void
 }
 define void @store_i64_seq_cst(i64* %mem) {
 ; CHECK-LABEL: store_i64_seq_cst
 ; CHECK: sync 0
+; PPC32: __sync_
+; PPC64-NOT: __sync_
+; PPC64: std
 store atomic i64 42, i64* %mem seq_cst, align 8
 ret void
 }

 ; Atomic CmpXchg
 define i8 @cas_strong_i8_sc_sc(i8* %mem) {
 ; CHECK-LABEL: cas_strong_i8_sc_sc
 ; CHECK: sync 0
... (58 lines not shown) ...

llvm/trunk/test/CodeGen/PowerPC/pr15630.ll

 ; RUN: llc -mcpu=pwr7 -O0 < %s | FileCheck %s

 target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v128:128:128-n32:64"
 target triple = "powerpc64-unknown-linux-gnu"

 define weak_odr void @_D4core6atomic49__T11atomicStoreVE4core6atomic11MemoryOrder3ThThZ11atomicStoreFNaNbKOhhZv(i8* %val_arg, i8 zeroext %newval_arg) {
 entry:
 %newval = alloca i8
 %ordering = alloca i32, align 4
 store i8 %newval_arg, i8* %newval
 %tmp = load i8* %newval
 store atomic volatile i8 %tmp, i8* %val_arg seq_cst, align 1
 ret void
 }

-; CHECK: stwcx.
+; CHECK: sync
+; CHECK: stb