Details

Reviewers

rtereshin
dsanders

Summary

At the moment, MachineCSE allows CSE-ing convergent instrs which are non-local to each other. This can cause illegal codegen as convergent instrs are control flow dependent. The patch prevents non-local CSE of convergent instrs by adding a check in isProfitableToCSE and rejecting CSE-ing if we're considering CSE-ing non-local convergent instrs. We can still CSE convergent instrs which are in the same control flow scope, so the patch purposely does not make all convergent instrs non-CSE candidates in isCSECandidate.
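The rule described in the summary can be sketched outside of LLVM as a small predicate. This is only an illustrative model: `ToyInstr`, `mayReuse`, and `exampleUsage` are hypothetical names, not part of MachineCSE, and the `block` field stands in for `MachineInstr::getParent()`.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a machine instruction; not LLVM's MachineInstr.
struct ToyInstr {
  std::string opcode; // e.g. "DS_SWIZZLE_B32"
  int block;          // index of the basic block holding the instruction
  bool convergent;    // models the isConvergent flag
};

// Returns true if `mi` may be replaced by the earlier, identical `cs`.
// A convergent instruction may only reuse a def from its own block;
// non-convergent instructions are not restricted by this rule.
bool mayReuse(const ToyInstr &mi, const ToyInstr &cs) {
  if (mi.opcode != cs.opcode)
    return false;
  if (mi.convergent && mi.block != cs.block)
    return false;
  return true;
}

// Small usage example for the predicate above.
void exampleUsage() {
  ToyInstr first{"DS_SWIZZLE_B32", 1, true};
  ToyInstr otherBlock{"DS_SWIZZLE_B32", 2, true};
  ToyInstr sameBlock{"DS_SWIZZLE_B32", 1, true};
  assert(!mayReuse(otherBlock, first)); // non-local: rejected
  assert(mayReuse(sameBlock, first));   // local: still allowed
}
```

The model deliberately keeps same-block reuse legal, which mirrors the summary's point that convergent instructions are not removed from the set of CSE candidates altogether.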

Event Timeline

mkitzan created this revision. Apr 23 2021, 11:02 AM

Herald added a subscriber: hiraditya. Apr 23 2021, 11:02 AM

mkitzan requested review of this revision. Apr 23 2021, 11:02 AM

Herald added a project: Restricted Project. Apr 23 2021, 11:02 AM

Herald added a subscriber: llvm-commits.

The change LGTM but could you add a test case? There probably aren't many convergent instructions upstream but it should be possible to make a test case using AMDGPU instructions or via G_INTRINSIC

Harbormaster completed remote builds in B100630: Diff 340107. Apr 23 2021, 1:39 PM

Update: added handwritten MIR unit test for the MachineCSE change using AMDGPU's DS_SWIZZLE_B32 instr (which is marked isConvergent in llvm/lib/Target/AMDGPU/DSInstructions.td)

Herald added subscribers: kerbowa, nhaehnle, jvesely. Apr 23 2021, 3:09 PM

dsanders accepted this revision. Apr 23 2021, 4:15 PM

dsanders added inline comments.

llvm/test/CodeGen/AMDGPU/GlobalISel/no-cse-nonlocal-convergent-instrs.mir
9	CHECK-LABEL is about partitioning the input into multiple pieces that can be checked independently rather than about labels. LGTM with this and the bb.2 one below as either CHECK/CHECK-NEXT

This revision is now accepted and ready to land. Apr 23 2021, 4:15 PM

Ah ok, good to know. Thanks for the review! Changing them to CHECK.

Update: changed basic block checks from CHECK-LABEL to CHECK

mkitzan mentioned this in rG59f2dd5f1acd: [MachineCSE] Prevent CSE of non-local convergent instrs. Apr 23 2021, 4:53 PM

Harbormaster completed remote builds in B100679: Diff 340176. Apr 23 2021, 5:25 PM

dsanders requested changes to this revision. Apr 23 2021, 6:17 PM

dsanders added inline comments.

llvm/test/CodeGen/AMDGPU/GlobalISel/no-cse-nonlocal-convergent-instrs.mir
53	It's been pointed out to me off-list that CSE'ing to here isn't actually banned by isConvergent, it's just one of the cases we conservatively decline to CSE in the change. To be covered by isConvergent it'd have to be CSE'd into a more/differently predicated block (less is ok). Furthermore, the other cases where we wouldn't be conservative are already prevented by other checks in CSE. If we can find the field we actually mean, this patch will only need a small change. I haven't been able to find it though; it doesn't seem to exist in the backend, and that's probably what's gotten me confused (I don't think this is the first time either :-)) That actually reminded me of something else to double check: Does this CSE without the change too?

This revision now requires changes to proceed. Apr 23 2021, 6:17 PM

arsenm added a subscriber: arsenm. Apr 23 2021, 6:22 PM

arsenm added inline comments.

llvm/test/CodeGen/AMDGPU/GlobalISel/no-cse-nonlocal-convergent-instrs.mir
53	The definition of convergent is pretty broken. For AMDGPU in the MIR the control flow as represented by basic blocks no longer expresses the lane level CFG which we're concerned with for convergent ops. In the future when we have convergence tokens, it's not clear to me if we'll somehow preserve those through codegen. It's best to just not CSE convergent operations. It's really unlikely it would be worthwhile if it's even legal

Harbormaster completed remote builds in B100696: Diff 340193. Apr 23 2021, 6:44 PM

rtereshin added inline comments. Apr 23 2021, 6:59 PM

llvm/test/CodeGen/AMDGPU/GlobalISel/no-cse-nonlocal-convergent-instrs.mir
53	Maybe we should at least put a comment saying just about that, Matt, on the change made in MachineCSE? Otherwise I'm afraid it's way too easy to remove the check and be technically right about it. Thoughts? A green light from AMDGPU for this patch though is very helpful, thank you.

lkail added a subscriber: lkail. Apr 24 2021, 1:17 AM

foad added a subscriber: foad. Apr 26 2021, 1:40 AM

foad added inline comments.

llvm/lib/CodeGen/MachineCSE.cpp
437	If this is a correctness issue then surely it should not be done inside "is profitable to cse"?

dsanders added inline comments. Apr 26 2021, 2:58 PM

llvm/test/CodeGen/AMDGPU/GlobalISel/no-cse-nonlocal-convergent-instrs.mir
53	> It's best to just not CSE convergent operations. It's really unlikely it would be worthwhile if it's even legal

I'd be ok with that (with an explanatory comment). I could believe that any that were legal and worthwhile probably already happened during LLVM-IR.

dsanders added inline comments. Apr 26 2021, 3:01 PM

llvm/test/CodeGen/AMDGPU/GlobalISel/no-cse-nonlocal-convergent-instrs.mir
53	> I'd be ok with that (with an explanatory comment). I could believe that any that were legal and worthwhile probably already happened during LLVM-IR.

Just to clarify, I don't mean to prevent cases within the same BB there. Those happen and are legal and worthwhile.

My suggestion is to keep making progress here:

1. Move the check out of isProfitableToCSE to the top level of ProcessBlockCSE.
2. Put a comprehensive comment on it outlining the issues discussed here (and off Phabricator) so far.
3. Do (2) in the test as well (and keep the test otherwise as is).

The issues include:
a) isConvergent as of current definition in LLVM does not prove cross-block MachineCSE illegal, however, with the change MachineCSE pass takes the liberty to extend the definition of isConvergent as a practical necessity. The extension is: "assume it is illegal to make a convergent operation dependent not only on additional conditions, but also on fewer conditions than originally"
b) The current open source GPU backends as is do not appear to allow a reasonably simple test case that provably and undeniably functionally breaks w/o the MachineCSE change proposed, as a result, the test being added is merely a coverage test for the change being made, not a reproducer of an actual (execution) problem in AMDGPU backend.

And we merge it from there. This is a conditional LGTM from me, conditions are above. Thanks!
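Issue (a) above can be made concrete with a toy model in which each placement of an operation is described by the set of conditions it is control-dependent on. This is a sketch under that assumption only; LLVM has no such representation at the MachineInstr level, and `Conditions`, `legalUnderCurrentDefinition`, and `legalUnderExtendedDefinition` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>

// Hypothetical model: the conditions an operation is control-dependent on.
using Conditions = std::set<std::string>;

// True if every element of `a` is also an element of `b`.
bool isSubset(const Conditions &a, const Conditions &b) {
  return std::includes(b.begin(), b.end(), a.begin(), a.end());
}

// Reading of the current isConvergent definition as discussed in this
// review: the operation must not become dependent on additional
// conditions, so depending on fewer conditions is acceptable.
bool legalUnderCurrentDefinition(const Conditions &original,
                                 const Conditions &moved) {
  return isSubset(moved, original);
}

// Extended reading adopted by the patch: neither additional nor fewer
// conditions are accepted, so the two sets have to match exactly.
bool legalUnderExtendedDefinition(const Conditions &original,
                                  const Conditions &moved) {
  return original == moved;
}

// Usage example: dropping a condition passes the current reading only.
void exampleUsage() {
  Conditions original{"c1", "c2"};
  Conditions fewer{"c1"};
  assert(legalUnderCurrentDefinition(original, fewer));
  assert(!legalUnderExtendedDefinition(original, fewer));
}
```

In this model the MachineCSE check approximates "the condition sets match" by "both instructions are in the same basic block", which is conservative: it can reject cases that the current definition would allow.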

Following Roman's suggestions, the update:

Move the code preventing CSE of isConvergent instrs into ProcessBlockCSE from isProfitableToCSE
Adds comments in MachineCSE and the test explaining why isConvergent is checked to prevent CSE
Adds comment in the test explaining the test is not reproducing an AMDGPU backend bug, but rather is a coverage test for the MachineCSE change

Thanks for all the feedback!
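The shape of the updated check, a legality test that runs before any profitability heuristic and re-registers the rejected instruction as a fresh available expression, can be sketched with a toy value-numbering loop. This is a simplified model, not the MachineCSE implementation: `ToyInstr` and `runToyCSE` are hypothetical names, a flat `std::map` replaces the scoped value table, and dominance is ignored entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a machine instruction.
struct ToyInstr {
  std::string expr; // key identifying the computed expression
  int block;        // index of the containing basic block
  bool convergent;  // models the isConvergent flag
};

// For each instruction, returns the index of the earlier instruction that
// replaces it, or -1 if the instruction is kept.
std::vector<int> runToyCSE(const std::vector<ToyInstr> &instrs) {
  std::map<std::string, int> available; // expression -> index of latest def
  std::vector<int> replacedBy(instrs.size(), -1);
  for (std::size_t i = 0; i < instrs.size(); ++i) {
    const ToyInstr &mi = instrs[i];
    auto it = available.find(mi.expr);
    if (it == available.end()) {
      available[mi.expr] = static_cast<int>(i);
      continue;
    }
    const ToyInstr &cs = instrs[it->second];
    // Legality comes first: a convergent instruction is never replaced by
    // a def from another block. It becomes the available def instead, so
    // later duplicates in its own block can still reuse it.
    if (mi.convergent && mi.block != cs.block) {
      it->second = static_cast<int>(i);
      continue;
    }
    // Profitability heuristics would run here; the toy always accepts.
    replacedBy[i] = it->second;
  }
  return replacedBy;
}

// Usage example: the second swizzle is kept, the third reuses the second.
void exampleUsage() {
  std::vector<ToyInstr> instrs = {
      {"swizzle", 1, true}, {"swizzle", 2, true}, {"swizzle", 2, true}};
  std::vector<int> result = runToyCSE(instrs);
  assert(result[1] == -1);
  assert(result[2] == 1);
}
```

Keeping the legality test out of the profitability function means a later change to the heuristics cannot silently drop it, which is the concern raised about placing a correctness check inside isProfitableToCSE.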

Harbormaster completed remote builds in B101502: Diff 341316. Apr 28 2021, 4:38 PM

foad added inline comments. Apr 29 2021, 2:40 AM

llvm/lib/CodeGen/MachineCSE.cpp
600	Do we also need this check in ProcessBlockPRE?

LGTM with the check in ProcessBlockPRE

llvm/lib/CodeGen/MachineCSE.cpp
600	I think it's needed there too

This revision is now accepted and ready to land. Apr 29 2021, 9:57 AM

rtereshin added inline comments. Apr 29 2021, 2:03 PM

llvm/lib/CodeGen/MachineCSE.cpp
600	@mkitzan IIUC (which might not be the case) PRE not checking for isConvergent is a genuine bug, unlike the CSE part: PRE moves ops into predicated blocks, making them more predicated than before, which is illegal for isConvergent. If that's the case, perhaps for PRE the `isConvergent` check could be a part of `isPRECandidate`.

Update:

Added isConvergent check in ProcessBlockPRE

Note: @rtereshin and I talked off-list about whether PRE not checking for isConvergent is a bug, and it was determined that for MachineCSE's implementation of PRE it is not a bug.

Harbormaster completed remote builds in B102365: Diff 342509. May 3 2021, 3:48 PM

LGTM

Forgot to link the differential before pushing, but latest update is in a11489ae3e36063c64921439cbab89d1f3280f4a

Diff 342509

llvm/lib/CodeGen/MachineCSE.cpp

(427 lines not shown)

 /// isProfitableToCSE - Return true if it's profitable to eliminate MI with a
 /// common expression that defines Reg. CSBB is basic block where CSReg is
 /// defined.
 bool MachineCSE::isProfitableToCSE(Register CSReg, Register Reg,
                                    MachineBasicBlock *CSBB, MachineInstr *MI) {
   // FIXME: Heuristics that works around the lack the live range splitting.

-  MachineBasicBlock *BB = MI->getParent();
-  // Prevent CSE-ing non-local convergent instructions.
-  if (MI->isConvergent() && CSBB != BB)
-    return false;
-
   // If CSReg is used at all uses of Reg, CSE should not increase register
   // pressure of CSReg.
   bool MayIncreasePressure = true;
   if (Register::isVirtualRegister(CSReg) && Register::isVirtualRegister(Reg)) {
     MayIncreasePressure = false;
     SmallPtrSet<MachineInstr*, 8> CSUses;
     for (MachineInstr &MI : MRI->use_nodbg_instructions(CSReg)) {
       CSUses.insert(&MI);
     }
     for (MachineInstr &MI : MRI->use_nodbg_instructions(Reg)) {
       if (!CSUses.count(&MI)) {
         MayIncreasePressure = true;
         break;
       }
     }
   }
   if (!MayIncreasePressure) return true;

   // Heuristics #1: Don't CSE "cheap" computation if the def is not local or in
   // an immediate predecessor. We don't want to increase register pressure and
   // end up causing other computation to be spilled.
   if (TII->isAsCheapAsAMove(*MI)) {
+    MachineBasicBlock *BB = MI->getParent();
     if (CSBB != BB && !CSBB->isSuccessor(BB))
       return false;
   }

   // Heuristics #2: If the expression doesn't not use a vr and the only use
   // of the redundant computation are copies, do not cse.
   bool HasVRegUse = false;
   for (const MachineOperand &MO : MI->operands()) {

(116 lines not shown; enclosing loop: for (MachineBasicBlock::iterator I = MBB->begin(), E = MBB->end(); I != E; ) {)

     }

     // Found a common subexpression, eliminate it.
     unsigned CSVN = VNT.lookup(MI);
     MachineInstr *CSMI = Exps[CSVN];
     LLVM_DEBUG(dbgs() << "Examining: " << *MI);
     LLVM_DEBUG(dbgs() << "*** Found a common subexpression: " << *CSMI);

+    // Prevent CSE-ing non-local convergent instructions.
+    // LLVM's current definition of `isConvergent` does not necessarily prove
+    // that non-local CSE is illegal. The following check extends the definition
+    // of `isConvergent` to assume a convergent instruction is dependent not
+    // only on additional conditions, but also on fewer conditions. LLVM does
+    // not have a MachineInstr attribute which expresses this extended
+    // definition, so it's necessary to use `isConvergent` to prevent illegally
+    // CSE-ing the subset of `isConvergent` instructions which do fall into this
+    // extended definition.
+    if (MI->isConvergent() && MI->getParent() != CSMI->getParent()) {
+      LLVM_DEBUG(dbgs() << "*** Convergent MI and subexpression exist in "
+                           "different BBs, avoid CSE!\n");
+      VNT.insert(MI, CurrVN++);
+      Exps.push_back(MI);
+      continue;
+    }
+
     // Check if it's profitable to perform this CSE.
     bool DoCSE = true;
     unsigned NumDefs = MI->getNumDefs();

     for (unsigned i = 0, e = MI->getNumOperands(); NumDefs && i != e; ++i) {
       MachineOperand &MO = MI->getOperand(i);
       if (!MO.isReg() || !MO.isDef())
         continue;

(216 lines not shown; enclosing loop: for (MachineBasicBlock::iterator I = MBB->begin(), E = MBB->end(); I != E;) {)

       // Two instrs are partial redundant if their basic blocks are reachable
       // from one to another but one doesn't dominate another.
       if (CMBB != MBB1) {
         auto BB = MBB->getBasicBlock(), BB1 = MBB1->getBasicBlock();
         if (BB != nullptr && BB1 != nullptr &&
             (isPotentiallyReachable(BB1, BB) ||
              isPotentiallyReachable(BB, BB1))) {
+          // The following check extends the definition of `isConvergent` to
+          // assume a convergent instruction is dependent not only on additional
+          // conditions, but also on fewer conditions. LLVM does not have a
+          // MachineInstr attribute which expresses this extended definition, so
+          // it's necessary to use `isConvergent` to prevent illegally PRE-ing the
+          // subset of `isConvergent` instructions which do fall into this
+          // extended definition.
+          if (MI->isConvergent() && CMBB != MBB)
+            continue;
+
           assert(MI->getOperand(0).isDef() &&
                  "First operand of instr with one explicit def must be this def");
           Register VReg = MI->getOperand(0).getReg();
           Register NewReg = MRI->cloneVirtualRegister(VReg);
           if (!isProfitableToCSE(NewReg, VReg, CMBB, MI))
             continue;
           MachineInstr &NewMI =

(70 lines not shown)

llvm/test/CodeGen/AMDGPU/GlobalISel/no-cse-nonlocal-convergent-instrs.mir

 # RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -o - -run-pass=machine-cse %s | FileCheck %s

-# Check that we don't CSE non-local convergent instrs. Otherwise, reusing defs
-# of convergent instrs from different control flow scopes can cause illegal
-# codegen. Previously, the swizzle in bb2 would be CSE-ed in favor of using the
-# swizzle in bb1 despite bb2 being a different control flow scope.
+# LLVM's current definition of `isConvergent` does not necessarily prove that
+# non-local CSE is illegal. The following test extends the definition of
+# `isConvergent` to assume a convergent instruction is dependent not only on
+# additional conditions, but also on fewer conditions. LLVM does not have a
+# MachineInstr attribute which expresses this extended definition, so it's
+# necessary to use `isConvergent` to prevent illegally CSE-ing the subset of
+# `isConvergent` instructions which do fall into this extended definition.
+
+# This is a coverage test for the MachineCSE change. It does not reproduce an
+# actual bug in the AMDGPU backend. The current open source GPU backends as is
+# do not appear to allow a reasonably simple test case that provably and
+# undeniably functionally breaks without the associated MachineCSE changes.
+
+# The test checks that we don't CSE non-local convergent instrs. Otherwise,
+# reusing defs of convergent instrs from different control flow scopes can
+# cause illegal codegen. Previously, the swizzle in bb2 would be CSE-ed in
+# favor of using the swizzle in bb1 despite bb2 being a different BBs.

 # CHECK-LABEL: name: no_cse
 # CHECK: bb.1.if.then
 # CHECK: [[SWIZZLE1:%[0-9]+]]:vgpr_32 = DS_SWIZZLE_B32 [[SRC:%[0-9]+]], 100, 0, implicit $exec
 # CHECK-NEXT: V_ADD_CO_U32_e64 [[SWIZZLE1]], {{%[0-9]+}}, 0, implicit $exec
 # CHECK-NEXT: S_CMP_LT_I32 {{.*}} implicit-def $scc
 # CHECK-NEXT: S_CBRANCH_SCC1 %bb.3, implicit $scc
 # CHECK-NEXT: S_BRANCH %bb.2

(17 lines not shown)

 ...
 ---
 name: no_cse
 tracksRegLiveness: true
 body: |
   bb.0.entry:
     liveins: $sgpr4_sgpr5
     %0:sgpr_64(p4) = COPY $sgpr4_sgpr5
     %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0(p4), 0, 0
     %2:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0(p4), 2, 0
     %3:sreg_64 = COPY %1
     %4:sreg_32 = COPY %2.sub1
     %5:sreg_32 = S_MOV_B32 42
     S_CMP_EQ_U32 %4, %5, implicit-def $scc
     %6:vgpr_32 = COPY %5, implicit $exec
     S_CBRANCH_SCC1 %bb.4, implicit $scc
     S_BRANCH %bb.1

(25 lines not shown)

This is an archive of the discontinued LLVM Phabricator instance.

[MachineCSE] Prevent CSE of non-local convergent instrs
ClosedPublic
