This is an archive of the discontinued LLVM Phabricator instance.

Differential D120330

[MachineSink] Fix CFG walk in clobber check (PR53990)
Abandoned · Public

Authored by nikic on Feb 22 2022, 8:09 AM.

Details

Reviewers

shchenz
qcolombet

Summary

MachineSinking::hasStoreBetween() determines that the code is "straight-line" by checking that From dominates To and To post-dominates From. Then it does a DFS walk starting at From, but only considers MBBs post-dominated by To.

The critical part this misses is that while the code is "straight-line" in a certain sense of the word, it may still be part of a loop. To give a simple example:

From ->  BB2
   \    /  A
    v  v   |
     To -> BB3

Here From dominates To, and To post-dominates From, and the current code would only check for a clobber in BB2, but not in BB3. (Unfortunately, I was not able to get MachineSink to trigger with a CFG that simple.)

What we actually want to determine is that no clobber is reverse-reachable from To (without going through From), so implement that, without the post-dominance check.
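As a rough illustration of the reverse-reachability walk described above, here is a minimal sketch on a toy CFG. It does not use LLVM types: `PredMap`, `blocksToScan`, and `exampleCFG` are hypothetical names, and the example graph is my reading of the diagram above (From -> BB2, From -> To, BB2 -> To, To -> BB3, BB3 -> BB2).

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy CFG: block name -> names of its predecessors (not MachineBasicBlock).
using PredMap = std::map<std::string, std::vector<std::string>>;

// Collect the blocks that would need to be scanned for clobbers: everything
// reverse-reachable from To without walking past From.
std::set<std::string> blocksToScan(const PredMap &Preds,
                                   const std::string &From,
                                   const std::string &To) {
  std::set<std::string> Visited, Scan;
  std::vector<std::string> Worklist;
  auto PushPreds = [&](const std::string &BB) {
    auto It = Preds.find(BB);
    if (It != Preds.end())
      Worklist.insert(Worklist.end(), It->second.begin(), It->second.end());
  };

  PushPreds(To);
  while (!Worklist.empty()) {
    std::string BB = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(BB).second)
      continue; // Already handled.
    if (BB == From)
      continue; // Don't walk past the original position of the instruction.
    Scan.insert(BB);
    PushPreds(BB);
  }
  return Scan;
}

// The example CFG from the summary, expressed as predecessor lists.
PredMap exampleCFG() {
  return {{"BB2", {"From", "BB3"}}, {"To", {"From", "BB2"}}, {"BB3", {"To"}}};
}
```

Calling `blocksToScan(exampleCFG(), "From", "To")` on this toy graph yields BB2, BB3, and To itself (reached again through the cycle), so a clobber in BB3 is no longer missed.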

Fixes https://github.com/llvm/llvm-project/issues/53990.

Diff Detail

Unit Tests: Failed

Time        Test
60,090 ms   x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vloxseg.c
60,290 ms   x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vluxseg.c
60,290 ms   x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-overloaded::vloxseg.c
60,310 ms   x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-overloaded::vluxseg.c
60,510 ms   x64 debian > Clang.Driver::aarch64-cpus.c

Full test results: 6 failed.

Event Timeline

nikic created this revision. · Feb 22 2022, 8:09 AM

Herald added subscribers: pengfei, hiraditya. · Feb 22 2022, 8:09 AM

nikic requested review of this revision. · Feb 22 2022, 8:09 AM

Herald added a project: Restricted Project. · Feb 22 2022, 8:09 AM

Herald added a subscriber: llvm-commits.

jmorse added a subscriber: jmorse. · Feb 22 2022, 8:17 AM

cuviper added a subscriber: cuviper. · Feb 22 2022, 8:51 AM

Harbormaster completed remote builds in B150868: Diff 410538. · Feb 22 2022, 9:25 AM

shchenz added a comment.

Can we check why the instruction is sunk from a shallower block From to a deeper block To? MachineSinking::isProfitableToSinkTo() should not allow this?

nikic added a comment.

In D120330#3339261, @shchenz wrote:

Can we check why the instruction is sunk from a shallower block From to a deeper block To? MachineSinking::isProfitableToSinkTo() should not allow this?

This happens because the loop is irreducible. The loop depth check is based on MachineLoopInfo, which only handles natural loops. So sinking into irreducible cycles is not prevented by the profitability check.

shchenz added a comment.

In D120330#3339665, @nikic wrote:

This happens because the loop is irreducible. The loop depth check is based on MachineLoopInfo, which only handles natural loops. So sinking into irreducible cycles is not prevented by the profitability check.

Is it possible to exclude this irreducible loop case in isProfitableToSinkTo and add some assertions/checks in hasStoreBetween? The posted patch may significantly increase compile time, as more blocks are now checked. A compile-time issue came up before in https://reviews.llvm.org/D86864#2469289

nikic added a comment.

In D120330#3339787, @shchenz wrote:

Is it possible to exclude this irreducible loop case in isProfitableToSinkTo and add some assertions/checks in hasStoreBetween?

It should be possible to do this by switching MachineSink from using MachineLoopInfo to MachineCycleInfo, which supports irreducible cycles. I think this allows a better profitability decision, but I'm not entirely sure that it would be a sufficient correctness condition, as irreducible cycles are DFS-order dependent. In any case, MachineCycleInfo is a new addition that is not actually used anywhere yet, so I don't think this would be appropriate for an LLVM 14 backport.

The posted patch may significantly increase compile time, as more blocks are now checked. A compile-time issue came up before in https://reviews.llvm.org/D86864#2469289

This is still subject to the existing limits on the walks. If there are compile-time issues, then those limits are too large (which, looking at them now, may well be the case: the limits allow walking up to 20 * 2000 instructions, which does not seem like a sensible limit.)
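To make the arithmetic behind the "20 * 2000" remark concrete, here is a small sketch. The two constant names come from the diff below, but pairing them with the values 20 and 2000 is an assumption based on this comment, and `exceedsWalkLimits` and `worstCaseInstructionsWalked` are hypothetical helpers, not functions in MachineSink.cpp.

```cpp
#include <cstddef>

// Limits as quoted in the comment above ("20 * 2000"). Pairing the numbers
// with these two names from the diff is an assumption.
constexpr std::size_t SinkLoadBlocksThreshold = 20;
constexpr std::size_t SinkLoadInstsPerBlockThreshold = 2000;

// Mirrors the bail-out condition in the diff: give up once a block is too
// large or too many blocks have been handled.
constexpr bool exceedsWalkLimits(std::size_t BlockSize,
                                 std::size_t HandledBlocks) {
  return BlockSize > SinkLoadInstsPerBlockThreshold ||
         HandledBlocks > SinkLoadBlocksThreshold;
}

// Rough upper bound on the instructions one query may inspect under these
// limits: every handled block is at its maximum allowed size.
constexpr std::size_t worstCaseInstructionsWalked() {
  return SinkLoadBlocksThreshold * SinkLoadInstsPerBlockThreshold;
}

static_assert(worstCaseInstructionsWalked() == 40000, "20 * 2000");
```

Under these assumed values a single hasStoreBetween() query could inspect on the order of 40,000 instructions before giving up.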

shchenz added a comment.

@nikic wrote:

It should be possible to do this by switching MachineSink from using MachineLoopInfo to MachineCycleInfo, which supports irreducible cycles. I think this allows a better profitability decision, but I'm not entirely sure that it would be a sufficient correctness condition, as irreducible cycles are DFS-order dependent. In any case, MachineCycleInfo is a new addition that is not actually used anywhere yet, so I don't think this would be appropriate for an LLVM 14 backport.

Yes, I agree changing MachineLoopInfo to MachineCycleInfo would be a little risky for a backport. Could we simply mark the sink as not profitable in isProfitableToSinkTo if MBB or SuccToSinkTo is in a loop that contains irreducible CFG?

ShrinkWrap.cpp has an example check:

ReversePostOrderTraversal<MachineBasicBlock *> RPOT(&*MF.begin());
if (containsIrreducibleCFG<MachineBasicBlock *>(RPOT, *MLI)) {

For this case, IMO, not allowing the sink should be the right choice, as block To is in a deeper loop. Apart from the load/store sink, we may sink some other instructions into a hot loop. I'm also not sure whether the other MachineLoopInfo queries in this pass behave correctly for an irreducible loop.

nikic added a comment.

In D120330#3342357, @shchenz wrote:

Yes, I agree changing MachineLoopInfo to MachineCycleInfo would be a little risky for a backport. Could we simply mark the sink as not profitable in isProfitableToSinkTo if MBB or SuccToSinkTo is in a loop that contains irreducible CFG?

ShrinkWrap.cpp has an example check:

ReversePostOrderTraversal<MachineBasicBlock *> RPOT(&*MF.begin());
if (containsIrreducibleCFG<MachineBasicBlock *>(RPOT, *MLI)) {

For this case, IMO, not allowing the sink should be the right choice, as block To is in a deeper loop. Apart from the load/store sink, we may sink some other instructions into a hot loop. I'm also not sure whether the other MachineLoopInfo queries in this pass behave correctly for an irreducible loop.

This is possible, but without MachineCycleInfo, this would be a very crude check: We would completely disable sinking for any functions that contain irreducible control flow (I don't think containsIrreducibleCFG is legal to use on a sub-graph). This seems potentially problematic to me, because irreducible control-flow is more common in the machine back-end than the middle-end. Here's a quick test: https://gist.github.com/nikic/e45e179e9a84c18de86c041158d3067e

I'm generally okay with doing that though, excluding irreducible control-flow entirely is certainly the most conservative fix.
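For readers unfamiliar with what a whole-function irreducibility check looks for, the following is a toy sketch on a plain adjacency-list graph. It is not LLVM's containsIrreducibleCFG: `Graph`, `computeDominators`, and `hasIrreducibleControlFlow` are hypothetical names, block 0 is assumed to be the entry, and all blocks are assumed reachable. It uses the textbook criterion that a CFG is irreducible if a DFS finds a retreating edge whose target does not dominate its source.

```cpp
#include <functional>
#include <set>
#include <vector>

// Toy CFG: Succs[i] lists the successors of block i; block 0 is the entry.
using Graph = std::vector<std::vector<int>>;

// Dom[b] is the set of blocks dominating b. Simple iterative data-flow,
// adequate for tiny graphs; assumes every block is reachable from block 0.
std::vector<std::set<int>> computeDominators(const Graph &Succs) {
  const int N = static_cast<int>(Succs.size());
  if (N == 0)
    return {};
  std::vector<std::vector<int>> Preds(N);
  for (int B = 0; B < N; ++B)
    for (int S : Succs[B])
      Preds[S].push_back(B);

  std::set<int> All;
  for (int B = 0; B < N; ++B)
    All.insert(B);
  std::vector<std::set<int>> Dom(N, All);
  Dom[0] = {0};

  for (bool Changed = true; Changed;) {
    Changed = false;
    for (int B = 1; B < N; ++B) {
      // Intersect the dominator sets of all predecessors, then add B itself.
      std::set<int> New = All;
      for (int P : Preds[B]) {
        std::set<int> Tmp;
        for (int D : New)
          if (Dom[P].count(D))
            Tmp.insert(D);
        New = Tmp;
      }
      New.insert(B);
      if (New != Dom[B]) {
        Dom[B] = New;
        Changed = true;
      }
    }
  }
  return Dom;
}

// True if some retreating DFS edge targets a block that does not dominate
// the edge's source, i.e. a cycle with more than one entry exists.
bool hasIrreducibleControlFlow(const Graph &Succs) {
  if (Succs.empty())
    return false;
  const std::vector<std::set<int>> Dom = computeDominators(Succs);
  enum State { Unvisited, OnStack, Done };
  std::vector<State> St(Succs.size(), Unvisited);
  bool Irreducible = false;
  std::function<void(int)> DFS = [&](int B) {
    St[B] = OnStack;
    for (int S : Succs[B]) {
      if (St[S] == Unvisited)
        DFS(S);
      else if (St[S] == OnStack && !Dom[B].count(S))
        Irreducible = true; // Retreating edge that is not a back edge.
    }
    St[B] = Done;
  };
  DFS(0);
  return Irreducible;
}
```

For instance, the graph with edges 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 1 has a two-entry cycle and is reported as irreducible, while a natural loop such as 0 -> 1 -> 2 -> 1 with exit 2 -> 3 is not. A check of this kind answers only "does the function contain any irreducible control flow", which is why using it as a guard disables sinking for the whole function.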

nikic added a comment.

I've opened D120800 for the variant that disables sinking in the presence of irreducible control flow entirely.

Herald added a project: Restricted Project. · Mar 2 2022, 1:27 AM

nikic mentioned this in rG6fde04395125: [MachineSink] Disable if there are any irreducible cycles. · Mar 2 2022, 7:57 AM

nikic abandoned this revision.

Abandoning this in favor of D120800.

Revision Contents

Path                                                       Size
llvm/lib/CodeGen/MachineSink.cpp                           81 lines
llvm/test/CodeGen/X86/pr53990-incorrect-machine-sink.ll    9 lines

Diff 410538

llvm/lib/CodeGen/MachineSink.cpp

[1,136 lines not shown]
@@ bool MachineSinking::hasStoreBetween(MachineBasicBlock *From,

   if (StoreInstrCache.find(BlockPair) != StoreInstrCache.end())
     return llvm::any_of(StoreInstrCache[BlockPair], [&](MachineInstr *I) {
       return I->mayAlias(AA, MI, false);
     });

   bool SawStore = false;
   bool HasAliasedStore = false;
-  DenseSet<MachineBasicBlock *> HandledBlocks;
-  DenseSet<MachineBasicBlock *> HandledDomBlocks;
-  // Go through all reachable blocks from From.
-  for (MachineBasicBlock *BB : depth_first(From)) {
-    // We insert the instruction at the start of block To, so no need to worry
-    // about stores inside To.
-    // Store in block From should be already considered when just enter function
-    // SinkInstruction.
-    if (BB == To || BB == From)
-      continue;
-
-    // We already handle this BB in previous iteration.
-    if (HandledBlocks.count(BB))
-      continue;
-
-    HandledBlocks.insert(BB);
-    // To post dominates BB, it must be a path from block From.
-    if (PDT->dominates(To, BB)) {
-      if (!HandledDomBlocks.count(BB))
-        HandledDomBlocks.insert(BB);
-
+  SmallPtrSet<MachineBasicBlock *, 16> HandledDomBlocks;
+  // Go through all block reverse-reachable from To.
+  SmallVector<MachineBasicBlock *, 16> Worklist;
+  append_range(Worklist, To->predecessors());
+  while (!Worklist.empty()) {
+    MachineBasicBlock *BB = Worklist.pop_back_val();
+    if (!HandledDomBlocks.insert(BB).second)
+      continue;
+
+    // Don't walk past the original position of the instruction.
+    if (BB == From)
+      continue;
+
     // If this BB is too big or the block number in straight line between From
     // and To is too big, stop searching to save compiling time.
     if (BB->size() > SinkLoadInstsPerBlockThreshold ||
         HandledDomBlocks.size() > SinkLoadBlocksThreshold) {
       for (auto *DomBB : HandledDomBlocks) {
         if (DomBB != BB && DT->dominates(DomBB, BB))
           HasStoreCache[std::make_pair(DomBB, To)] = true;
         else if(DomBB != BB && DT->dominates(BB, DomBB))
           HasStoreCache[std::make_pair(From, DomBB)] = true;
       }
       HasStoreCache[BlockPair] = true;
       return true;
     }

     for (MachineInstr &I : *BB) {
       // Treat as alias conservatively for a call or an ordered memory
       // operation.
       if (I.isCall() || I.hasOrderedMemoryRef()) {
         for (auto *DomBB : HandledDomBlocks) {
           if (DomBB != BB && DT->dominates(DomBB, BB))
             HasStoreCache[std::make_pair(DomBB, To)] = true;
           else if(DomBB != BB && DT->dominates(BB, DomBB))
             HasStoreCache[std::make_pair(From, DomBB)] = true;
         }
         HasStoreCache[BlockPair] = true;
         return true;
       }

       if (I.mayStore()) {
         SawStore = true;
         // We still have chance to sink MI if all stores between are not
         // aliased to MI.
         // Cache all store instructions, so that we don't need to go through
         // all From reachable blocks for next load instruction.
         if (I.mayAlias(AA, MI, false))
           HasAliasedStore = true;
         StoreInstrCache[BlockPair].push_back(&I);
       }
     }
-    }
+
+    append_range(Worklist, BB->predecessors());
   }
   // If there is no store at all, cache the result.
   if (!SawStore)
     HasStoreCache[BlockPair] = false;
   return HasAliasedStore;
 }

 /// Sink instructions into loops if profitable. This especially tries to prevent

Lint (pre-merge checks, clang-format): please reformat the code
- else if(DomBB != BB && DT->dominates(BB, DomBB))
+ else if (DomBB != BB && DT->dominates(BB, DomBB))

[611 more lines not shown]

llvm/test/CodeGen/X86/pr53990-incorrect-machine-sink.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=x86_64-- < %s | FileCheck %s

 declare void @clobber()

 define void @test(i1 %c, i64* %p, i64* noalias %p2) nounwind {
 ; CHECK-LABEL: test:
 ; CHECK: # %bb.0: # %entry
 ; CHECK-NEXT: pushq %rbp
-; CHECK-NEXT: pushq %r15
 ; CHECK-NEXT: pushq %r14
 ; CHECK-NEXT: pushq %rbx
-; CHECK-NEXT: pushq %rax
 ; CHECK-NEXT: movq %rdx, %rbx
-; CHECK-NEXT: movq %rsi, %r14
-; CHECK-NEXT: movl %edi, %r15d
+; CHECK-NEXT: movl %edi, %r14d
+; CHECK-NEXT: movq (%rsi), %rbp
 ; CHECK-NEXT: xorl %eax, %eax
 ; CHECK-NEXT: jmpq *.LJTI0_0(,%rax,8)
 ; CHECK-NEXT: .LBB0_1: # %split.3
-; CHECK-NEXT: movq (%r14), %rbp
-; CHECK-NEXT: testb $1, %r15b
+; CHECK-NEXT: testb $1, %r14b
 ; CHECK-NEXT: je .LBB0_3
 ; CHECK-NEXT: # %bb.2: # %clobber
 ; CHECK-NEXT: callq clobber@PLT
 ; CHECK-NEXT: .LBB0_3: # %sink
 ; CHECK-NEXT: movq %rbp, (%rbx)
 ; CHECK-NEXT: .LBB0_4: # %latch
 ; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT: xorl %eax, %eax

[38 more lines not shown]