This is an archive of the discontinued LLVM Phabricator instance.

Fold constant & invariant loads into uses over barrier instructions
Abandoned · Public

Authored by reames on Mar 14 2019, 12:09 PM.

Details

Reviewers

spatel
craig.topper
baldrick
lhames

Summary

PeepholeOpt knows how to fold loads into using instructions, but if we encounter an instruction w/store semantics, we discard all candidates. This patch relaxes that slightly for memory accesses which are known to be invariant. This is one step towards a more general selective clearing based on aliasing, but doing the full aliasing scheme efficiently would be challenging.
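To make the intended behaviour change concrete, here is a small standalone C++ model of the candidate tracking. It is only a sketch: `Kind`, `Inst`, and `foldableUses` are hypothetical names invented for illustration, and the model ignores everything else the real pass checks (single use, target foldability, and so on). It shows only that a barrier drops ordinary load candidates, while candidates defined by invariant loads survive when the relaxation is enabled.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, heavily simplified stand-in for a machine instruction.
enum class Kind { Load, InvariantLoad, Barrier, Use };

struct Inst {
  Kind kind;
  unsigned reg; // Load/InvariantLoad: register defined; Use: register read.
};

// Returns the registers whose defining load could still be folded into a use.
// keepInvariant == false models the old behaviour (a barrier discards every
// candidate); keepInvariant == true models the relaxation described above.
std::vector<unsigned> foldableUses(const std::vector<Inst> &block,
                                   bool keepInvariant) {
  std::set<unsigned> candidates;    // loads that may still be folded
  std::set<unsigned> invariantDefs; // subset defined by invariant loads
  std::vector<unsigned> folded;

  for (const Inst &I : block) {
    switch (I.kind) {
    case Kind::Load:
    case Kind::InvariantLoad:
      assert(candidates.count(I.reg) == 0 && "registers are defined once");
      candidates.insert(I.reg);
      if (I.kind == Kind::InvariantLoad)
        invariantDefs.insert(I.reg);
      break;
    case Kind::Use:
      // A candidate is consumed by the first use that folds it.
      if (candidates.erase(I.reg))
        folded.push_back(I.reg);
      break;
    case Kind::Barrier: {
      if (!keepInvariant) {
        candidates.clear();
        break;
      }
      // Keep only the candidates defined by invariant loads.
      std::set<unsigned> kept;
      for (unsigned r : candidates)
        if (invariantDefs.count(r))
          kept.insert(r);
      candidates = kept;
      break;
    }
    }
  }
  return folded;
}
```

For a block of the form "invariant load r1; load r2; barrier; use r1; use r2", the model folds nothing with `keepInvariant == false` and folds only r1 with `keepInvariant == true`.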

There are two design questions which come up where I need input from reviewers:

1. Currently, loads from constant global variables are not treated as invariant when queried from PeepholeOpt. This is because PeepholeOpt doesn't have access to AA, and the query function uses pointsToConstantMemory. I can tackle this in three ways: (a) add direct GV checking in the query routine, (b) set the associated flag in SelectionDAG, or (c) pipe through AA. (c) would be complicated, so I'm tentatively rejecting that. Out of (a) and (b), which is more consistent with the overall design? I've included both in the patch so that you can see what they look like.
2. The patch as implemented waits until a fold barrier is encountered, and then selectively filters the candidate set. Unfortunately, this makes the operation O(n^2) in the worst case. Other options would be to maintain a separate InvariantFoldSet - O(n) - or to simply query each operand at the using instruction - O(operands), but provides cross-block functionality. What would folks prefer to see?
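The separate-set alternative from the second question could look roughly like the following. This is a sketch under assumptions, not code from the patch: `FoldCandidates` and its members are hypothetical names, and invariance is assumed to be decided once, when the load is first recorded, rather than queried at each barrier.

```cpp
#include <cassert>
#include <set>

// Hypothetical tracker that keeps invariant-load candidates in their own set,
// so that a barrier never has to inspect individual candidates.
class FoldCandidates {
  std::set<unsigned> Clobberable; // discarded wholesale at a barrier
  std::set<unsigned> Invariant;   // survives barriers

public:
  // Record a foldable load; the caller classifies it once, at definition.
  void addLoad(unsigned reg, bool isInvariant) {
    assert(!isCandidate(reg) && "each virtual register is defined once");
    (isInvariant ? Invariant : Clobberable).insert(reg);
  }

  // A barrier drops only the ordinary candidates.
  void onBarrier() { Clobberable.clear(); }

  bool isCandidate(unsigned reg) const {
    return Clobberable.count(reg) != 0 || Invariant.count(reg) != 0;
  }

  // Consume a candidate when its load is folded into a use.
  bool take(unsigned reg) {
    return Clobberable.erase(reg) != 0 || Invariant.erase(reg) != 0;
  }
};
```

In this shape each ordinary candidate is inserted once and removed at most once, so the total barrier work should stay linear in the number of candidates (ignoring the container's own lookup cost), instead of re-scanning the surviving candidates at every barrier.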

p.s. The current tests are all atomic loads, but there's nothing atomic specific about this. I'll add new tests, and rebase once the design questions are mostly settled.

Event Timeline

reames created this revision. Mar 14 2019, 12:09 PM

Herald added subscribers: jdoerfert, jfb, bollu, mcrosier. Mar 14 2019, 12:09 PM

Ready for review.

Herald added a project: Restricted Project. Mar 20 2019, 10:50 AM

ping?

jfb added inline comments. Apr 19 2019, 12:17 PM

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4555	?
4574	?

reames marked an inline comment as done. Apr 22 2019, 10:13 AM

reames added inline comments.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4555	See the patch comment (part 1)

Abandoning an old review I'm not going to return to any time soon.

Herald added a subscriber: pengfei. Oct 15 2021, 11:58 AM

Revision Contents

Path	Size
lib/CodeGen/MachineInstr.cpp	5 lines
lib/CodeGen/PeepholeOptimizer.cpp	9 lines
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp	12 lines
test/CodeGen/X86/atomic-unordered.ll	16 lines

Diff 191532

lib/CodeGen/MachineInstr.cpp

[Earlier lines collapsed. Context: if (MMO->isInvariant() && MMO->isDereferenceable())]
       continue;

     // A load from a constant PseudoSourceValue is invariant.
     if (const PseudoSourceValue *PSV = MMO->getPseudoValue())
       if (PSV->isConstant(&MFI))
         continue;

     if (const Value *V = MMO->getValue()) {
+      // In case we don't have AA, handle a few common cases
+      if (auto *GV = dyn_cast<GlobalVariable>(V))
+        if (GV->isConstant() &&
+            MMO->getSize() <= GV->getParent()->getDataLayout().getTypeStoreSize(GV->getType()))
+          continue;
       // If we have an AliasAnalysis, ask it whether the memory is constant.
       if (AA &&
           AA->pointsToConstantMemory(
               MemoryLocation(V, MMO->getSize(), MMO->getAAInfo())))
         continue;
     }

     // Otherwise assume conservatively.
[Remaining lines collapsed.]

lib/CodeGen/PeepholeOptimizer.cpp

[Earlier lines collapsed. Context: for (MachineBasicBlock::iterator MII = MBB.begin(), MIE = MBB.end();]
         }
       }

       // If we run into an instruction we can't fold across, discard
       // the load candidates. Note: We might be able to fold into this
       // instruction, so this needs to be after the folding logic.
       if (MI->isLoadFoldBarrier()) {
         LLVM_DEBUG(dbgs() << "Encountered load fold barrier on " << *MI);
-        FoldAsLoadDefCandidates.clear();
+        SmallSet<unsigned, 16> InvariantDefs;
+        for (unsigned VReg : FoldAsLoadDefCandidates) {
+          auto *DefMI = MRI->getVRegDef(VReg);
+          if (!DefMI->isDereferenceableInvariantLoad(nullptr))
+            continue;
+          InvariantDefs.insert(VReg);
+        }
+        FoldAsLoadDefCandidates = InvariantDefs;
       }
     }
   }

   return Changed;
 }

 ValueTrackerResult ValueTracker::getNextSourceFromCopy() {
[Remaining lines collapsed.]

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

[Earlier lines collapsed. Context: void SelectionDAGBuilder::visitFence(const FenceInst &I) {]
   DAG.setRoot(DAG.getNode(ISD::ATOMIC_FENCE, dl, MVT::Other, Ops));
 }

 void SelectionDAGBuilder::visitAtomicLoad(const LoadInst &I) {
   SDLoc dl = getCurSDLoc();
   AtomicOrdering Order = I.getOrdering();
   SyncScope::ID SSID = I.getSyncScopeID();

+#if 0
    [Inline comment on line 4555] jfb: ?
    [Inline comment on line 4555] reames: See the patch comment (part 1)
+  AAMDNodes AAInfo;
+  I.getAAMetadata(AAInfo);
+#endif
+
   SDValue InChain = getRoot();

   const TargetLowering &TLI = DAG.getTargetLoweringInfo();
   EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType());

   if (!TLI.supportsUnalignedAtomics() &&
       I.getAlignment() < VT.getStoreSize())
     report_fatal_error("Cannot generate unaligned atomic load");

   auto Flags = MachineMemOperand::MOLoad;
   if (I.isVolatile())
     Flags |= MachineMemOperand::MOVolatile;
   if (I.getMetadata(LLVMContext::MD_invariant_load) != nullptr)
     Flags |= MachineMemOperand::MOInvariant;
+#if 0
    [Inline comment on line 4574] jfb: ?
+  if (AA &&
+      AA->pointsToConstantMemory(MemoryLocation(I.getPointerOperand(),
+          LocationSize::precise(DAG.getDataLayout().getTypeStoreSize(I.getType())),
+          AAInfo)))
+    Flags |= MachineMemOperand::MOInvariant;
+#endif
   if (isDereferenceablePointer(I.getPointerOperand(), DAG.getDataLayout()))
     Flags |= MachineMemOperand::MODereferenceable;

   Flags |= TLI.getMMOFlags(I);

   MachineMemOperand *MMO =
     DAG.getMachineFunction().
     getMachineMemOperand(MachinePointerInfo(I.getPointerOperand()),
[Remaining lines collapsed.]

test/CodeGen/X86/atomic-unordered.ll

[Earlier lines collapsed.]
 ; CHECK-O0: # %bb.0:
 ; CHECK-O0-NEXT: movq {{.*}}(%rip), %rax
 ; CHECK-O0-NEXT: movq $5, (%rdi)
 ; CHECK-O0-NEXT: addq %rsi, %rax
 ; CHECK-O0-NEXT: retq
 ;
 ; CHECK-O3-LABEL: fold_constant_clobber:
 ; CHECK-O3: # %bb.0:
-; CHECK-O3-NEXT: movq {{.*}}(%rip), %rax
+; CHECK-O3-NEXT: movq %rsi, %rax
 ; CHECK-O3-NEXT: movq $5, (%rdi)
-; CHECK-O3-NEXT: addq %rsi, %rax
+; CHECK-O3-NEXT: addq {{.*}}(%rip), %rax
 ; CHECK-O3-NEXT: retq
   %v = load atomic i64, i64* @Constant unordered, align 8
   store i64 5, i64* %p
   %ret = add i64 %v, %arg
   ret i64 %ret
 }

 define i64 @fold_constant_fence(i64 %arg) {
 ; CHECK-O0-LABEL: fold_constant_fence:
 ; CHECK-O0: # %bb.0:
 ; CHECK-O0-NEXT: movq {{.*}}(%rip), %rax
 ; CHECK-O0-NEXT: mfence
 ; CHECK-O0-NEXT: addq %rdi, %rax
 ; CHECK-O0-NEXT: retq
 ;
 ; CHECK-O3-LABEL: fold_constant_fence:
 ; CHECK-O3: # %bb.0:
-; CHECK-O3-NEXT: movq {{.*}}(%rip), %rax
+; CHECK-O3-NEXT: movq %rdi, %rax
 ; CHECK-O3-NEXT: mfence
-; CHECK-O3-NEXT: addq %rdi, %rax
+; CHECK-O3-NEXT: addq {{.*}}(%rip), %rax
 ; CHECK-O3-NEXT: retq
   %v = load atomic i64, i64* @Constant unordered, align 8
   fence seq_cst
   %ret = add i64 %v, %arg
   ret i64 %ret
 }

 define i64 @fold_invariant_clobber(i64* dereferenceable(8) %p, i64 %arg) {
 ; CHECK-O0-LABEL: fold_invariant_clobber:
 ; CHECK-O0: # %bb.0:
 ; CHECK-O0-NEXT: movq (%rdi), %rax
 ; CHECK-O0-NEXT: movq $5, (%rdi)
 ; CHECK-O0-NEXT: addq %rsi, %rax
 ; CHECK-O0-NEXT: retq
 ;
 ; CHECK-O3-LABEL: fold_invariant_clobber:
 ; CHECK-O3: # %bb.0:
-; CHECK-O3-NEXT: movq (%rdi), %rax
+; CHECK-O3-NEXT: movq %rsi, %rax
 ; CHECK-O3-NEXT: movq $5, (%rdi)
-; CHECK-O3-NEXT: addq %rsi, %rax
+; CHECK-O3-NEXT: addq (%rdi), %rax
 ; CHECK-O3-NEXT: retq
   %v = load atomic i64, i64* %p unordered, align 8, !invariant.load !{}
   store i64 5, i64* %p
   %ret = add i64 %v, %arg
   ret i64 %ret
 }


 define i64 @fold_invariant_fence(i64* dereferenceable(8) %p, i64 %arg) {
 ; CHECK-O0-LABEL: fold_invariant_fence:
 ; CHECK-O0: # %bb.0:
 ; CHECK-O0-NEXT: movq (%rdi), %rdi
 ; CHECK-O0-NEXT: mfence
 ; CHECK-O0-NEXT: addq %rsi, %rdi
 ; CHECK-O0-NEXT: movq %rdi, %rax
 ; CHECK-O0-NEXT: retq
 ;
 ; CHECK-O3-LABEL: fold_invariant_fence:
 ; CHECK-O3: # %bb.0:
-; CHECK-O3-NEXT: movq (%rdi), %rax
+; CHECK-O3-NEXT: movq %rsi, %rax
 ; CHECK-O3-NEXT: mfence
-; CHECK-O3-NEXT: addq %rsi, %rax
+; CHECK-O3-NEXT: addq (%rdi), %rax
 ; CHECK-O3-NEXT: retq
   %v = load atomic i64, i64* %p unordered, align 8, !invariant.load !{}
   fence seq_cst
   %ret = add i64 %v, %arg
   ret i64 %ret
 }