This is an archive of the discontinued LLVM Phabricator instance.

Disallow sinking of unordered atomic loads into loops
ClosedPublic

Authored by DaniilSuchkov on Sep 29 2017, 1:57 AM.

Details

Reviewers

danielcdh
hfinkel
efriedma
reames

Commits

rG0c8dd052b828: [LICM] Disallow sinking of unordered atomic loads into loops
rL315438: [LICM] Disallow sinking of unordered atomic loads into loops

Summary

Sinking an unordered atomic load into a loop must be disallowed because it turns a single load into multiple loads.
The relevant section of the documentation is: http://llvm.org/docs/Atomics.html#unordered, specifically the Notes for Optimizers section.
Here is the full text of this section:

Notes for optimizers
In terms of the optimizer, this prohibits any transformation that transforms a single load into multiple loads, transforms a store into multiple stores, narrows a store, or stores a value which would not be stored otherwise. Some examples of unsafe optimizations are narrowing an assignment into a bitfield, rematerializing a load, and turning loads and stores into a memcpy call. Reordering unordered operations is safe, though, and optimizers should take advantage of that because unordered operations are common in languages that need them.
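To make the "single load into multiple loads" hazard concrete, here is a minimal C++ sketch. It is not LLVM code: `Location`, `makeChangingLocation`, and the two `usesWith...` helpers are hypothetical names, and a value that changes on every read is used as a deterministic stand-in for a concurrent writer.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for a memory location that another thread may write
// between any two reads. Each call models one load instruction.
using Location = std::function<int()>;

// Deterministic substitute for a concurrent writer: every read sees a new value.
Location makeChangingLocation() {
  return [next = 0]() mutable { return next++; };
}

// Source program: one load, two uses of the loaded value.
std::pair<int, int> usesWithSingleLoad(const Location &load) {
  int v = load();
  return {v, v};
}

// After rematerialization: each use re-executes the load.
std::pair<int, int> usesWithRematerializedLoad(const Location &load) {
  int first = load();
  int second = load();
  return {first, second};
}

// Usage sketch: call from main() to check both behaviours.
void demoRematerialization() {
  auto single = usesWithSingleLoad(makeChangingLocation());
  assert(single.first == single.second); // both uses agree

  auto remat = usesWithRematerializedLoad(makeChangingLocation());
  assert(remat.first != remat.second); // the two uses disagree
}
```

With a single load the two uses can never disagree. Once the load is duplicated they can, which is an outcome the source program did not allow.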

Diff Detail

Repository: rL LLVM

Event Timeline

DaniilSuchkov created this revision. Sep 29 2017, 1:57 AM

Reviewers: Note that there's a lot of context in previous discussion about the semantics of various types of nonatomic, unordered, and ordered loads in https://reviews.llvm.org/D37463. Comment https://reviews.llvm.org/D37463#882487 is particularly relevant for this review. It's the direct motivation for this patch.

reames requested changes to this revision. Sep 29 2017, 9:01 AM

reames added inline comments.

lib/Transforms/Scalar/LICM.cpp
582 ↗	(On Diff #117100)	Deriving this from whether an optional parameter is present is very fragile. Please just add another parameter. Reads further: Oh, you're not introducing this, just hoisting it in the function. Mind doing a separate NFC patch to expose the parameter? This could be either before this patch (no further review needed) or after, I don't care.
588 ↗	(On Diff #117100)	The placement of this check is wrong. Duplicating a load from an invariant/constant memory location is fine. Move this down and add a respective test case.

This revision now requires changes to proceed. Sep 29 2017, 9:01 AM

Is it legal to hoist an atomic load out of the loop?

In D38392#884553, @danielcdh wrote:

Is it legal to hoist an atomic load out of the loop?

Yes, that strictly reduces the number of loads. You can think of hoisting as reusing the load from the first iteration across future iterations and just moving where in the first iteration that load runs. As long as that repositioning is legal, the rest of the transform is just basic FRE.
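A rough way to see the asymmetry between hoisting and sinking is to count loads. The sketch below is plain C++, not LLVM code; `CountedLocation` and both `sumWith...` functions are hypothetical, and the stored value is assumed not to change so that only the load count differs.

```cpp
#include <cassert>

// Hypothetical memory location that records how many loads were executed.
struct CountedLocation {
  int value = 42;
  int loads = 0;
  int load() {
    ++loads;
    return value;
  }
};

// Load in the loop body: one load per iteration.
long sumWithLoadInLoop(CountedLocation &loc, int iterations) {
  long sum = 0;
  for (int i = 0; i < iterations; ++i)
    sum += loc.load();
  return sum;
}

// Load hoisted to the preheader: one load reused by every iteration.
long sumWithLoadHoisted(CountedLocation &loc, int iterations) {
  long sum = 0;
  if (iterations > 0) { // the preheader runs only if the loop is entered
    int v = loc.load();
    for (int i = 0; i < iterations; ++i)
      sum += v;
  }
  return sum;
}

// Usage sketch: call from main() to compare the load counts.
void demoHoistVersusSink() {
  CountedLocation inLoop, hoisted;
  long a = sumWithLoadInLoop(inLoop, 5);
  long b = sumWithLoadHoisted(hoisted, 5);
  assert(a == b);             // same result in this model
  assert(inLoop.loads == 5);  // one load per iteration
  assert(hoisted.loads == 1); // a single load
}
```

Going from the first function to the second is hoisting (5 loads become 1). Going from the second to the first is the sinking this patch forbids for unordered atomic loads (1 load becomes 5).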

How about the following case:

while(c1) {
  if (c2)
    atomic_load;
}

Looks like isSafeToExecuteUnconditionally will not prevent the atomic_load from being hoisted to the preheader. So in real execution, it may introduce extra atomic loads?

In D38392#884577, @danielcdh wrote:
How about the following case:

while(c1) {
if (c2)
atomic_load;
}

Looks like isSafeToExecuteUnconditionally will not prevent the atomic_load from being hoisted to the preheader. So in real execution, it may introduce extra atomic loads?

From an aliasing/ordering perspective nothing has changed in this example. We have an additional safety requirement (we can't introduce faults), but we haven't changed anything about the memory model legality of the reordering.
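The conditional case can be modelled the same way. This is only a sketch under assumptions: `CountedLocation`, `Outcome`, and the two `run...` functions are hypothetical, and whether LICM would actually hoist such a load depends on the separate safety check mentioned above (no new faults may be introduced).

```cpp
#include <cassert>
#include <vector>

// Hypothetical memory location that records how many loads were executed.
struct CountedLocation {
  int value = 42;
  int loads = 0;
  int load() {
    ++loads;
    return value;
  }
};

struct Outcome {
  int loads; // loads executed
  int uses;  // times a loaded value was actually consumed
};

// Original shape: while (c1) { if (c2) use(atomic_load); }
Outcome runConditionalLoad(CountedLocation &loc,
                           const std::vector<bool> &c2PerIteration) {
  int uses = 0;
  for (bool c2 : c2PerIteration) {
    if (c2) {
      (void)loc.load();
      ++uses;
    }
  }
  return {loc.loads, uses};
}

// Hoisted shape: the load runs once in the preheader, even if c2 is never true.
Outcome runHoistedLoad(CountedLocation &loc,
                       const std::vector<bool> &c2PerIteration) {
  int uses = 0;
  if (!c2PerIteration.empty()) {
    int v = loc.load();
    for (bool c2 : c2PerIteration) {
      if (c2) {
        (void)v;
        ++uses;
      }
    }
  }
  return {loc.loads, uses};
}

// Usage sketch: call from main() to compare both shapes.
void demoConditionalHoist() {
  CountedLocation a, b;
  Outcome original = runConditionalLoad(a, {false, false, false});
  Outcome hoisted = runHoistedLoad(b, {false, false, false});
  assert(original.loads == 0 && original.uses == 0);
  assert(hoisted.loads == 1 && hoisted.uses == 0); // extra load, never used
}
```

When c2 is never true the hoisted form executes one load that the original did not, but its value is never used. No single source-level load has been split into several observable ones.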

In the following example:

while(true) {
  a1 = foo();
  bar(a1);
  while(true) {
    if (baz())
      return atomic_load(a1);
  }
}

The atomic_load(a1) will be executed once without hoisting. But with LICM, it may be called multiple times.

In D38392#884816, @danielcdh wrote:
In the following example:

while(true) {
a1 = foo();
bar(a1);
while(true) {
  if (baz())
    return atomic_load(a1);
}
}

The atomic_load(a1) will be executed once without hoisting. But with LICM, it may be called multiple times.

I can't make out your example enough to assess what your intent is. As a general comment, keep in mind that *loading* multiple times is not a problem if only one of them is *used*.

I see your point now. My concern is performance: if we allow hoisting of atomic loads but do not allow sinking them, we may end up with bad performance, as we may have too many redundant atomic loads in the preheader. Any suggestions on how to solve that?

In D38392#885108, @danielcdh wrote:

I see your point now. My concern is performance: if we allow hoisting of atomic loads but do not allow sinking them, we may end up with bad performance, as we may have too many redundant atomic loads in the preheader. Any suggestions on how to solve that?

Not really. I can say that we've been running performance tests for months in this configuration (LICM hoisting unordered loads, no LoopSink) without noticing any problems. I'm not immediately concerned.

In D38392#885134, @reames wrote:

Not really. I can say that we've been running performance tests for months in this configuration (LICM hoisting unordered loads, no LoopSink) without noticing any problems. I'm not immediately concerned.

Sorry, not sure if I follow the logic, could you explain why you don't think this is a performance concern?

In D38392#885147, @danielcdh wrote:

Sorry, not sure if I follow the logic, could you explain why you don't think this is a performance concern?

I acknowledge it's a potential issue. I have no good ideas for a solution. I don't believe it's a current problem, and I have evidence that it's relatively minor (i.e., we haven't seen it).

Sinking of unordered atomic loads from constant/invariant memory is now allowed, and a corresponding test case has been added.
I am going to expose SafetyInfo in a follow-up patch.
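The order of the checks on loads after this update can be summarized in a small standalone model. This is a sketch, not the LLVM implementation: `LoadFacts`, `Ordering`, `Decision`, and `classifyLoad` are hypothetical names, and the alias and invariant.start checks that follow in LICM.cpp are collapsed into a single `NeedsFurtherChecks` result.

```cpp
#include <cassert>

// Hypothetical summary of the properties the load path inspects.
enum class Ordering { NotAtomic, Unordered, Stronger };

struct LoadFacts {
  Ordering Order = Ordering::NotAtomic;
  bool IsVolatile = false;
  bool PointsToConstantMemory = false;
  bool HasInvariantLoadMetadata = false;
};

enum class Decision { Reject, Accept, NeedsFurtherChecks };

Decision classifyLoad(const LoadFacts &L, bool SinkingToLoopBody) {
  // Volatile loads and atomics stronger than unordered are never moved.
  const bool IsUnordered = !L.IsVolatile && L.Order != Ordering::Stronger;
  if (!IsUnordered)
    return Decision::Reject;

  // Duplicating a load from constant/invariant memory is harmless, so this
  // check comes before the atomic-sinking restriction.
  if (L.PointsToConstantMemory || L.HasInvariantLoadMetadata)
    return Decision::Accept;

  // Sinking into the loop body would turn one atomic load into many.
  const bool IsAtomic = L.Order != Ordering::NotAtomic;
  if (IsAtomic && SinkingToLoopBody)
    return Decision::Reject;

  // Remaining checks (invariant.start, may-aliased stores) are not modelled.
  return Decision::NeedsFurtherChecks;
}

// Usage sketch: call from main(); the cases mirror the two new tests.
void demoClassifyLoad() {
  LoadFacts G;
  G.Order = Ordering::Unordered;
  assert(classifyLoad(G, true) == Decision::Reject); // like @g in t6
  assert(classifyLoad(G, false) == Decision::NeedsFurtherChecks);

  LoadFacts GConst = G;
  GConst.PointsToConstantMemory = true;
  assert(classifyLoad(GConst, true) == Decision::Accept); // like @g_const in t7
}
```

Placing the constant/invariant check first is what the inline comment on line 588 asked for: only loads that can observe a changing value are kept out of the loop body.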

DaniilSuchkov added inline comments. Oct 5 2017, 10:06 PM

lib/Transforms/Scalar/LICM.cpp

582 ↗	(On Diff #117100)	Here is the declaration of canSinkOrHoistInst (LoopUtils.h):

/// Returns true if the hoister and sinker can handle this instruction.
/// If SafetyInfo is null, we are checking for sinking instructions from
/// preheader to loop body (no speculation).
/// If SafetyInfo is not null, we are checking for hoisting/sinking
/// instructions from loop body to preheader/exit. Check if the instruction
/// can execute speculatively.
/// If \p ORE is set use it to emit optimization remarks.
bool canSinkOrHoistInst(Instruction &I, AAResults *AA, DominatorTree *DT,
                        Loop *CurLoop, AliasSetTracker *CurAST,
                        LoopSafetyInfo *SafetyInfo,
                        OptimizationRemarkEmitter *ORE = nullptr);

As you can see, it is safe to use SafetyInfo for this check. This line is actually the only use of SafetyInfo in the function, so I am going to replace this parameter with a bool in a separate NFC patch after this one.

Ping

LGTM

This revision is now accepted and ready to land. Oct 10 2017, 12:00 PM

Closed by commit rL315438: [LICM] Disallow sinking of unordered atomic loads into loops (authored by mkazantsev). Oct 11 2017, 12:27 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Transforms/Scalar/LICM.cpp	14 lines
llvm/trunk/test/Transforms/LICM/loopsink.ll	95 lines

Diff 118549

llvm/trunk/lib/Transforms/Scalar/LICM.cpp

[571 lines not shown; context: static bool isLoadInvariantInLoop(LoadInst *LI, DominatorTree *DT,]

   return false;
 }

 bool llvm::canSinkOrHoistInst(Instruction &I, AAResults *AA, DominatorTree *DT,
                               Loop *CurLoop, AliasSetTracker *CurAST,
                               LoopSafetyInfo *SafetyInfo,
                               OptimizationRemarkEmitter *ORE) {
+  // SafetyInfo is nullptr if we are checking for sinking from preheader to
+  // loop body.
+  const bool SinkingToLoopBody = !SafetyInfo;
   // Loads have extra constraints we have to verify before we can hoist them.
   if (LoadInst *LI = dyn_cast<LoadInst>(&I)) {
     if (!LI->isUnordered())
-      return false; // Don't hoist volatile/atomic loads!
+      return false; // Don't sink/hoist volatile or ordered atomic loads!

     // Loads from constant memory are always safe to move, even if they end up
     // in the same alias set as something that ends up being modified.
     if (AA->pointsToConstantMemory(LI->getOperand(0)))
       return true;
     if (LI->getMetadata(LLVMContext::MD_invariant_load))
       return true;

+    if (LI->isAtomic() && SinkingToLoopBody)
+      return false; // Don't sink unordered atomic loads to loop body.
+
     // This checks for an invariant.start dominating the load.
     if (isLoadInvariantInLoop(LI, DT, CurLoop))
       return true;

     // Don't hoist loads which have may-aliased stores in loop.
     uint64_t Size = 0;
     if (LI->getType()->isSized())
       Size = I.getModule()->getDataLayout().getTypeStoreSize(LI->getType());

[59 lines not shown; context: bool llvm::canSinkOrHoistInst(Instruction &I, AAResults *AA, DominatorTree *DT,]

   // Only these instructions are hoistable/sinkable.
   if (!isa<BinaryOperator>(I) && !isa<CastInst>(I) && !isa<SelectInst>(I) &&
       !isa<GetElementPtrInst>(I) && !isa<CmpInst>(I) &&
       !isa<InsertElementInst>(I) && !isa<ExtractElementInst>(I) &&
       !isa<ShuffleVectorInst>(I) && !isa<ExtractValueInst>(I) &&
       !isa<InsertValueInst>(I))
     return false;

-  // SafetyInfo is nullptr if we are checking for sinking from preheader to
-  // loop body. It will be always safe as there is no speculative execution.
-  if (!SafetyInfo)
+  // If we are checking for sinking from preheader to loop body it will be
+  // always safe as there is no speculative execution.
+  if (SinkingToLoopBody)
     return true;

   // TODO: Plumb the context instruction through to make hoisting and sinking
   // more powerful. Hoisting of loads already works due to the special casing
   // above.
   return isSafeToExecuteUnconditionally(I, DT, CurLoop, SafetyInfo, nullptr);
 }

[719 lines not shown]

llvm/trunk/test/Transforms/LICM/loopsink.ll

 ; RUN: opt -S -loop-sink < %s | FileCheck %s
-; RUN: opt -S -passes=loop-sink < %s | FileCheck %s
+; RUN: opt -S -aa-pipeline=basic-aa -passes=loop-sink < %s | FileCheck %s

 @g = global i32 0, align 4

 ;     b1
 ;    /  \
 ;   b2  b6
 ;  /  \  |
 ; b3  b4 |

[264 lines not shown; context: .b7:]

   %t7 = add nuw nsw i32 %iv, 1
   %c7 = icmp eq i32 %t7, %p7
   br i1 %c7, label %.b1, label %.exit, !prof !3

 .exit:
   ret i32 10
 }

+;     b1
+;    /  \
+;   b2  b3
+;    \  /
+;     b4
+; preheader: 1000
+; b2: 15
+; b3: 7
+; Do not sink unordered atomic load to b2
+; CHECK: t6
+; CHECK: .preheader:
+; CHECK: load atomic i32, i32* @g unordered, align 4
+; CHECK: .b2:
+; CHECK-NOT: load atomic i32, i32* @g unordered, align 4
+define i32 @t6(i32, i32) #0 !prof !0 {
+  %3 = icmp eq i32 %1, 0
+  br i1 %3, label %.exit, label %.preheader
+
+.preheader:
+  %invariant = load atomic i32, i32* @g unordered, align 4
+  br label %.b1
+
+.b1:
+  %iv = phi i32 [ %t3, %.b4 ], [ 0, %.preheader ]
+  %c1 = icmp sgt i32 %iv, %0
+  br i1 %c1, label %.b2, label %.b3, !prof !1
+
+.b2:
+  %t1 = add nsw i32 %invariant, %iv
+  br label %.b4
+
+.b3:
+  %t2 = add nsw i32 %iv, 100
+  br label %.b4
+
+.b4:
+  %p1 = phi i32 [ %t2, %.b3 ], [ %t1, %.b2 ]
+  %t3 = add nuw nsw i32 %iv, 1
+  %c2 = icmp eq i32 %t3, %p1
+  br i1 %c2, label %.b1, label %.exit, !prof !3
+
+.exit:
+  ret i32 10
+}
+
+@g_const = constant i32 0, align 4
+
+;     b1
+;    /  \
+;   b2  b3
+;    \  /
+;     b4
+; preheader: 1000
+; b2: 0.5
+; b3: 999.5
+; Sink unordered atomic load to b2. It is allowed to sink into loop unordered
+; load from constant.
+; CHECK: t7
+; CHECK: .preheader:
+; CHECK-NOT: load atomic i32, i32* @g_const unordered, align 4
+; CHECK: .b2:
+; CHECK: load atomic i32, i32* @g_const unordered, align 4
+define i32 @t7(i32, i32) #0 !prof !0 {
+  %3 = icmp eq i32 %1, 0
+  br i1 %3, label %.exit, label %.preheader
+
+.preheader:
+  %invariant = load atomic i32, i32* @g_const unordered, align 4
+  br label %.b1
+
+.b1:
+  %iv = phi i32 [ %t3, %.b4 ], [ 0, %.preheader ]
+  %c1 = icmp sgt i32 %iv, %0
+  br i1 %c1, label %.b2, label %.b3, !prof !1
+
+.b2:
+  %t1 = add nsw i32 %invariant, %iv
+  br label %.b4
+
+.b3:
+  %t2 = add nsw i32 %iv, 100
+  br label %.b4
+
+.b4:
+  %p1 = phi i32 [ %t2, %.b3 ], [ %t1, %.b2 ]
+  %t3 = add nuw nsw i32 %iv, 1
+  %c2 = icmp eq i32 %t3, %p1
+  br i1 %c2, label %.b1, label %.exit, !prof !3
+
+.exit:
+  ret i32 10
+}
+
 declare i32 @foo()

 !0 = !{!"function_entry_count", i64 1}
 !1 = !{!"branch_weights", i32 1, i32 2000}
 !2 = !{!"branch_weights", i32 2000, i32 1}
 !3 = !{!"branch_weights", i32 100, i32 1}