
Commit e2de06b

Author: Robin Morisset
Committed: Oct 16, 2014
Erase fence insertion from SelectionDAGBuilder.cpp (NFC)
Summary:
Backends can use setInsertFencesForAtomic to signal to the middle-end that monotonic is the only memory ordering they can accept for stores/loads/rmws/cmpxchg. The code lowering those accesses with a stronger ordering to fences + monotonic accesses currently lives in SelectionDAGBuilder.cpp. In this patch I propose moving this logic out of it, for several reasons:
- There is lots of redundancy to avoid: extremely similar logic already exists in AtomicExpand.
- The current code in SelectionDAGBuilder does not use any target hooks; it does the same transformation for every backend that requires it.
- As a result it is plain *unsound*, as it was apparently designed for ARM. It happens to mostly work for the other targets because they are extremely conservative, but Power for example had to switch to AtomicExpand to be able to use lwsync safely (see r218331).
- Because it produces IR-level fences, it cannot be made sound! This is noted in the C++11 standard (section 29.3, page 1140):
```
Fences cannot, in general, be used to restore sequential consistency for atomic
operations with weaker ordering semantics.
```
It can also be seen in the following example (called IRIW in the literature):
```
atomic<int> x = y = 0;
int r1, r2, r3, r4;
Thread 0:
  x.store(1);
Thread 1:
  y.store(1);
Thread 2:
  r1 = x.load();
  r2 = y.load();
Thread 3:
  r3 = y.load();
  r4 = x.load();
```
r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all seq_cst. But if they are lowered to monotonic accesses, no amount of fences can prevent it.

This patch does three things (I could cut it into parts, but then some of them would not be tested/testable; please tell me if you would prefer that):
- It provides a default implementation of emitLeadingFence/emitTrailingFence in terms of IR-level fences that mimics the original logic of SelectionDAGBuilder. As we saw above, this is unsound, but it is the best that can be done without knowing the targets well (and there is a comment warning about this risk).
- It then switches Mips/Sparc/XCore to use AtomicExpand, relying on this default implementation (which exactly replicates the logic of SelectionDAGBuilder, so there is no functional change).
- It finally erases this logic from SelectionDAGBuilder, as it is now dead code.

Ideally, each target would define its own override for emitLeading/TrailingFence using target-specific fences, but I do not know the Sparc/Mips/XCore memory models well enough to do this, and they appear to be dealing fine with the ARM-inspired default expansion for now (probably because they are overly conservative, as Power was). If anyone wants to compile fences more aggressively on these platforms, the long comment should make it clear why they should first override emitLeading/TrailingFence.

Test Plan: make check-all, no functional change

Reviewers: jfb, t.p.northover

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D5474

llvm-svn: 219957
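As an aside for readers of the summary: the IRIW litmus test quoted above can be written as real C++11. The following sketch is illustrative only — it is not part of the patch, and it assumes nothing beyond the standard <atomic> and <thread> headers:

```cpp
#include <atomic>
#include <thread>

std::atomic<int> x(0), y(0);  // seq_cst by default
int r1, r2, r3, r4;

int main() {
  std::thread t0([] { x.store(1); });
  std::thread t1([] { y.store(1); });
  std::thread t2([] { r1 = x.load(); r2 = y.load(); });
  std::thread t3([] { r3 = y.load(); r4 = x.load(); });
  t0.join(); t1.join(); t2.join(); t3.join();
  // Under seq_cst there is a single total order over the two stores, so
  // threads 2 and 3 cannot disagree about which store happened first:
  // r1 == r3 == 1 && r2 == r4 == 0 is a forbidden outcome. Demoting the
  // accesses to monotonic (relaxed) and inserting fences around them does
  // not restore this guarantee on weakly-ordered hardware such as Power.
  return 0;
}
```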
1 parent 70c8217

File tree: 6 files changed (+75, -83)
 

llvm/include/llvm/Target/TargetLowering.h (+38, -13)
```diff
@@ -964,29 +964,54 @@ class TargetLoweringBase {
   /// It is called by AtomicExpandPass before expanding an
   /// AtomicRMW/AtomicCmpXchg/AtomicStore/AtomicLoad.
   /// RMW and CmpXchg set both IsStore and IsLoad to true.
-  /// Backends with !getInsertFencesForAtomic() should keep a no-op here.
   /// This function should either return a nullptr, or a pointer to an IR-level
   /// Instruction*. Even complex fence sequences can be represented by a
   /// single Instruction* through an intrinsic to be lowered later.
+  /// Backends with !getInsertFencesForAtomic() should keep a no-op here.
+  /// Backends should override this method to produce target-specific intrinsic
+  /// for their fences.
+  /// FIXME: Please note that the default implementation here in terms of
+  /// IR-level fences exists for historical/compatibility reasons and is
+  /// *unsound* ! Fences cannot, in general, be used to restore sequential
+  /// consistency. For example, consider the following example:
+  /// atomic<int> x = y = 0;
+  /// int r1, r2, r3, r4;
+  /// Thread 0:
+  ///   x.store(1);
+  /// Thread 1:
+  ///   y.store(1);
+  /// Thread 2:
+  ///   r1 = x.load();
+  ///   r2 = y.load();
+  /// Thread 3:
+  ///   r3 = y.load();
+  ///   r4 = x.load();
+  /// r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all
+  /// seq_cst. But if they are lowered to monotonic accesses, no amount of
+  /// IR-level fences can prevent it.
+  /// @{
   virtual Instruction* emitLeadingFence(IRBuilder<> &Builder, AtomicOrdering Ord,
                                         bool IsStore, bool IsLoad) const {
-    assert(!getInsertFencesForAtomic());
-    return nullptr;
+    if (!getInsertFencesForAtomic())
+      return nullptr;
+
+    if (isAtLeastRelease(Ord) && IsStore)
+      return Builder.CreateFence(Ord);
+    else
+      return nullptr;
   }

-  /// Inserts in the IR a target-specific intrinsic specifying a fence.
-  /// It is called by AtomicExpandPass after expanding an
-  /// AtomicRMW/AtomicCmpXchg/AtomicStore/AtomicLoad.
-  /// RMW and CmpXchg set both IsStore and IsLoad to true.
-  /// Backends with !getInsertFencesForAtomic() should keep a no-op here.
-  /// This function should either return a nullptr, or a pointer to an IR-level
-  /// Instruction*. Even complex fence sequences can be represented by a
-  /// single Instruction* through an intrinsic to be lowered later.
   virtual Instruction* emitTrailingFence(IRBuilder<> &Builder, AtomicOrdering Ord,
                                          bool IsStore, bool IsLoad) const {
-    assert(!getInsertFencesForAtomic());
-    return nullptr;
+    if (!getInsertFencesForAtomic())
+      return nullptr;
+
+    if (isAtLeastAcquire(Ord))
+      return Builder.CreateFence(Ord);
+    else
+      return nullptr;
   }
+  /// @}

   /// Returns true if the given (atomic) store should be expanded by the
   /// IR-level AtomicExpand pass into an "atomic xchg" which ignores its input.
```
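To make the new comment's "Backends should override this method" guidance concrete, here is a hedged sketch of what such an override could look like. `MyTargetLowering` and `Intrinsic::mytarget_fence` are invented names for illustration only; the shape mirrors what a real backend such as ARM does when it emits its dmb intrinsic:

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"
using namespace llvm;

// Hypothetical override for an invented target; MyTargetLowering is assumed
// to derive from TargetLowering. Not part of this commit.
Instruction *MyTargetLowering::emitLeadingFence(IRBuilder<> &Builder,
                                                AtomicOrdering Ord,
                                                bool IsStore,
                                                bool IsLoad) const {
  if (!getInsertFencesForAtomic())
    return nullptr;
  if (isAtLeastRelease(Ord) && IsStore) {
    // Emit a target-specific fence intrinsic rather than a generic IR-level
    // fence, so exactly the barrier this target's memory model requires is
    // produced instead of the conservative ARM-inspired default.
    Module *M = Builder.GetInsertBlock()->getParent()->getParent();
    Function *Fence = Intrinsic::getDeclaration(M, Intrinsic::mytarget_fence);
    return Builder.CreateCall(Fence);
  }
  return nullptr;
}
```

The emitTrailingFence override would be symmetric, keying on isAtLeastAcquire(Ord) instead.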

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+20, -67)
```diff
@@ -3604,30 +3604,6 @@ void SelectionDAGBuilder::visitStore(const StoreInst &I) {
   DAG.setRoot(StoreNode);
 }

-static SDValue InsertFenceForAtomic(SDValue Chain, AtomicOrdering Order,
-                                    SynchronizationScope Scope,
-                                    bool Before, SDLoc dl,
-                                    SelectionDAG &DAG,
-                                    const TargetLowering &TLI) {
-  // Fence, if necessary
-  if (Before) {
-    if (Order == AcquireRelease || Order == SequentiallyConsistent)
-      Order = Release;
-    else if (Order == Acquire || Order == Monotonic || Order == Unordered)
-      return Chain;
-  } else {
-    if (Order == AcquireRelease)
-      Order = Acquire;
-    else if (Order == Release || Order == Monotonic || Order == Unordered)
-      return Chain;
-  }
-  SDValue Ops[3];
-  Ops[0] = Chain;
-  Ops[1] = DAG.getConstant(Order, TLI.getPointerTy());
-  Ops[2] = DAG.getConstant(Scope, TLI.getPointerTy());
-  return DAG.getNode(ISD::ATOMIC_FENCE, dl, MVT::Other, Ops);
-}
-
 void SelectionDAGBuilder::visitAtomicCmpXchg(const AtomicCmpXchgInst &I) {
   SDLoc dl = getCurSDLoc();
   AtomicOrdering SuccessOrder = I.getSuccessOrdering();
@@ -3636,27 +3612,16 @@ void SelectionDAGBuilder::visitAtomicCmpXchg(const AtomicCmpXchgInst &I) {

   SDValue InChain = getRoot();

-  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
-  if (TLI.getInsertFencesForAtomic())
-    InChain =
-        InsertFenceForAtomic(InChain, SuccessOrder, Scope, true, dl, DAG, TLI);
-
   MVT MemVT = getValue(I.getCompareOperand()).getSimpleValueType();
   SDVTList VTs = DAG.getVTList(MemVT, MVT::i1, MVT::Other);
   SDValue L = DAG.getAtomicCmpSwap(
       ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS, dl, MemVT, VTs, InChain,
       getValue(I.getPointerOperand()), getValue(I.getCompareOperand()),
       getValue(I.getNewValOperand()), MachinePointerInfo(I.getPointerOperand()),
-      0 /* Alignment */,
-      TLI.getInsertFencesForAtomic() ? Monotonic : SuccessOrder,
-      TLI.getInsertFencesForAtomic() ? Monotonic : FailureOrder, Scope);
+      /*Alignment=*/ 0, SuccessOrder, FailureOrder, Scope);

   SDValue OutChain = L.getValue(2);

-  if (TLI.getInsertFencesForAtomic())
-    OutChain = InsertFenceForAtomic(OutChain, SuccessOrder, Scope, false, dl,
-                                    DAG, TLI);
-
   setValue(&I, L);
   DAG.setRoot(OutChain);
 }
@@ -3683,22 +3648,17 @@ void SelectionDAGBuilder::visitAtomicRMW(const AtomicRMWInst &I) {

   SDValue InChain = getRoot();

-  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
-  if (TLI.getInsertFencesForAtomic())
-    InChain = InsertFenceForAtomic(InChain, Order, Scope, true, dl, DAG, TLI);
-
-  SDValue L = DAG.getAtomic(
-      NT, dl, getValue(I.getValOperand()).getSimpleValueType(), InChain,
-      getValue(I.getPointerOperand()), getValue(I.getValOperand()),
-      I.getPointerOperand(), 0 /* Alignment */,
-      TLI.getInsertFencesForAtomic() ? Monotonic : Order, Scope);
+  SDValue L =
+    DAG.getAtomic(NT, dl,
+                  getValue(I.getValOperand()).getSimpleValueType(),
+                  InChain,
+                  getValue(I.getPointerOperand()),
+                  getValue(I.getValOperand()),
+                  I.getPointerOperand(),
+                  /* Alignment=*/ 0, Order, Scope);

   SDValue OutChain = L.getValue(1);

-  if (TLI.getInsertFencesForAtomic())
-    OutChain =
-        InsertFenceForAtomic(OutChain, Order, Scope, false, dl, DAG, TLI);
-
   setValue(&I, L);
   DAG.setRoot(OutChain);
 }
@@ -3736,16 +3696,13 @@ void SelectionDAGBuilder::visitAtomicLoad(const LoadInst &I) {
                           DAG.getEVTAlignment(VT));

   InChain = TLI.prepareVolatileOrAtomicLoad(InChain, dl, DAG);
-  SDValue L = DAG.getAtomic(
-      ISD::ATOMIC_LOAD, dl, VT, VT, InChain, getValue(I.getPointerOperand()),
-      MMO, TLI.getInsertFencesForAtomic() ? Monotonic : Order, Scope);
+  SDValue L =
+    DAG.getAtomic(ISD::ATOMIC_LOAD, dl, VT, VT, InChain,
+                  getValue(I.getPointerOperand()), MMO,
+                  Order, Scope);

   SDValue OutChain = L.getValue(1);

-  if (TLI.getInsertFencesForAtomic())
-    OutChain = InsertFenceForAtomic(OutChain, Order, Scope, false, dl,
-                                    DAG, TLI);
-
   setValue(&I, L);
   DAG.setRoot(OutChain);
 }
@@ -3764,17 +3721,13 @@ void SelectionDAGBuilder::visitAtomicStore(const StoreInst &I) {
   if (I.getAlignment() < VT.getSizeInBits() / 8)
     report_fatal_error("Cannot generate unaligned atomic store");

-  if (TLI.getInsertFencesForAtomic())
-    InChain = InsertFenceForAtomic(InChain, Order, Scope, true, dl, DAG, TLI);
-
-  SDValue OutChain = DAG.getAtomic(
-      ISD::ATOMIC_STORE, dl, VT, InChain, getValue(I.getPointerOperand()),
-      getValue(I.getValueOperand()), I.getPointerOperand(), I.getAlignment(),
-      TLI.getInsertFencesForAtomic() ? Monotonic : Order, Scope);
-
-  if (TLI.getInsertFencesForAtomic())
-    OutChain =
-        InsertFenceForAtomic(OutChain, Order, Scope, false, dl, DAG, TLI);
+  SDValue OutChain =
+    DAG.getAtomic(ISD::ATOMIC_STORE, dl, VT,
+                  InChain,
+                  getValue(I.getPointerOperand()),
+                  getValue(I.getValueOperand()),
+                  I.getPointerOperand(), I.getAlignment(),
+                  Order, Scope);

   DAG.setRoot(OutChain);
 }
```

llvm/lib/Target/Mips/MipsTargetMachine.cpp (+1)
```diff
@@ -178,6 +178,7 @@ TargetPassConfig *MipsTargetMachine::createPassConfig(PassManagerBase &PM) {

 void MipsPassConfig::addIRPasses() {
   TargetPassConfig::addIRPasses();
+  addPass(createAtomicExpandPass(&getMipsTargetMachine()));
   if (getMipsSubtarget().os16())
     addPass(createMipsOs16(getMipsTargetMachine()));
   if (getMipsSubtarget().inMips16HardFloat())
```
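For context, createAtomicExpandPass wires in the IR pass that now performs the fence bracketing SelectionDAGBuilder used to do (the Sparc and XCore hunks below do the same). The following is a simplified, hedged sketch of that bracketing; the helper name is invented, and the real pass additionally rewrites the access itself to monotonic ordering (see AtomicExpandPass.cpp for the actual implementation):

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/Target/TargetLowering.h"
using namespace llvm;

// Sketch only: bracket an atomic instruction with the target's fences using
// the emitLeadingFence/emitTrailingFence hooks shown earlier in this commit.
static void bracketWithFences(Instruction *I, AtomicOrdering Order,
                              bool IsStore, bool IsLoad,
                              const TargetLowering *TLI) {
  // Leading fence goes before the access (e.g. a release fence for a
  // seq_cst store).
  IRBuilder<> Before(I);
  TLI->emitLeadingFence(Before, Order, IsStore, IsLoad);
  // Trailing fence goes after the access (e.g. an acquire fence for a
  // seq_cst load); this sketch assumes I is not the block terminator.
  IRBuilder<> After(I->getNextNode());
  TLI->emitTrailingFence(After, Order, IsStore, IsLoad);
}
```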

llvm/lib/Target/Sparc/SparcTargetMachine.cpp (+7)
```diff
@@ -47,6 +47,7 @@ class SparcPassConfig : public TargetPassConfig {
     return getTM<SparcTargetMachine>();
   }

+  void addIRPasses() override;
   bool addInstSelector() override;
   bool addPreEmitPass() override;
 };
@@ -56,6 +57,12 @@ TargetPassConfig *SparcTargetMachine::createPassConfig(PassManagerBase &PM) {
   return new SparcPassConfig(this, PM);
 }

+void SparcPassConfig::addIRPasses() {
+  addPass(createAtomicExpandPass(&getSparcTargetMachine()));
+
+  TargetPassConfig::addIRPasses();
+}
+
 bool SparcPassConfig::addInstSelector() {
   addPass(createSparcISelDag(getSparcTargetMachine()));
   return false;
```

llvm/lib/Target/XCore/XCoreTargetMachine.cpp (+7)
```diff
@@ -41,6 +41,7 @@ class XCorePassConfig : public TargetPassConfig {
     return getTM<XCoreTargetMachine>();
   }

+  void addIRPasses() override;
   bool addPreISel() override;
   bool addInstSelector() override;
   bool addPreEmitPass() override;
@@ -51,6 +52,12 @@ TargetPassConfig *XCoreTargetMachine::createPassConfig(PassManagerBase &PM) {
   return new XCorePassConfig(this, PM);
 }

+void XCorePassConfig::addIRPasses() {
+  addPass(createAtomicExpandPass(&getXCoreTargetMachine()));
+
+  TargetPassConfig::addIRPasses();
+}
+
 bool XCorePassConfig::addPreISel() {
   addPass(createXCoreLowerThreadLocalPass());
   return false;
```

llvm/test/CodeGen/XCore/atomic.ll (+2, -3)
```diff
@@ -22,11 +22,10 @@ entry:
 ; CHECK-LABEL: atomicloadstore

 ; CHECK: ldw r[[R0:[0-9]+]], dp[pool]
-; CHECK-NEXT: #MEMBARRIER
-%0 = load atomic i32* bitcast (i64* @pool to i32*) acquire, align 4
-
 ; CHECK-NEXT: ldaw r[[R1:[0-9]+]], dp[pool]
+; CHECK-NEXT: #MEMBARRIER
 ; CHECK-NEXT: ldc r[[R2:[0-9]+]], 0
+%0 = load atomic i32* bitcast (i64* @pool to i32*) acquire, align 4

 ; CHECK-NEXT: ld16s r3, r[[R1]][r[[R2]]]
 ; CHECK-NEXT: #MEMBARRIER
```
