This is an archive of the discontinued LLVM Phabricator instance.


Differential D15730

[MachineLICM] Fix handling of memoperands
ClosedPublic

Authored by reames on Dec 22 2015, 4:39 PM.

Details

Reviewers

qcolombet
atrick
stoklund
bkramer

Commits

rG42bd26f29d4b: [MachineLICM] Fix handling of memoperands
rL256335: [MachineLICM] Fix handling of memoperands

Summary

As far as I can tell, the correct interpretation of an empty memoperands list is that we didn't have sufficient room to store information about the MachineInstr, NOT that the MachineInstr doesn't access any particular bit of memory. This appears to be fairly consistent in a number of places, but I'm not 100% sure of this interpretation. I'd really appreciate someone more knowledgeable confirming my reading of the code.

This patch fixes two latent bugs in MachineLICM - given the above assumption - and adds comments to document the meaning and required handling. I don't have test cases; these were noticed by inspection.

Diff Detail

Repository: rL LLVM

Event Timeline

reames updated this revision to Diff 43493. Dec 22 2015, 4:39 PM

reames retitled this revision to [MachineLICM] Fix handling of memoperands.

reames updated this object.

reames added reviewers: qcolombet, atrick, stoklund, bkramer.

reames added a subscriber: llvm-commits.

Thanks for clarifying the spec and fixing assumptions. It looks like there is a temptation to assume that access to a frame index must have non-empty memoperands. But that would be totally unenforceable. If there's a performance issue, it's probably better to fix cases where LLVM drops memoperands rather than make those assumptions.

This revision is now accepted and ready to land. Dec 22 2015, 7:07 PM

I think the right way to model "touches all memory" is with a bit on the machine instruction, not with an empty memoperand list. Otherwise, for instance, every place that calls MachineInstr::addMemOperand has to be audited to ensure that it does not transition a machine instruction from 0 mem operands (allowed to clobber all memory) to non-zero mem operands (allowed to clobber only the specific named locations). That or we need to change MachineInstr::addMemOperand.
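The hazard described above can be sketched with a toy model. This is only an illustration: `ToyMemRef`, `ToyInstr`, `mayStoreToFI` and `addWouldNarrow` are hypothetical names invented here, not LLVM APIs, and a memory operand is reduced to a frame index plus a store flag.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a memory operand: a frame index and a store flag.
struct ToyMemRef {
  int FrameIndex;
  bool IsStore;
};

// Hypothetical stand-in for MachineInstr; not the real LLVM class.
struct ToyInstr {
  std::vector<ToyMemRef> MemRefs;

  // Follows the rule from this patch: an empty list means "don't know".
  bool mayStoreToFI(int FI) const {
    if (MemRefs.empty())
      return true; // conservative: could write to any slot
    for (const ToyMemRef &M : MemRefs)
      if (M.IsStore && M.FrameIndex == FI)
        return true;
    return false;
  }
};

// True if appending an operand would move the instruction from
// "may clobber everything" to "clobbers only the named locations".
inline bool addWouldNarrow(const ToyInstr &I) { return I.MemRefs.empty(); }
```

In this model, pushing a single operand onto an empty list flips `mayStoreToFI(3)` from true to false, which is the silent narrowing that every caller of an add-operand API would have to rule out.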

I'm going to move forward with the current change, but the point Sanjoy raises is entirely correct. In fact, I've already found several more instances of this same bug while digging through related code. I think we need to have three states (no memref, has memrefs, had memrefs/poison). An MI should never be able to transition backwards through those states.

I'll send a patch out that adds that, along with some assertions in the MachineVerifier and the access APIs. I suspect they'll uncover a lot more bugs of this nature. I suspect we're not seeing this in practice just because few MIs (even patchpoints and statepoints) actually have 256 unique memory locations they touch.
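A minimal sketch of the three-state idea, assuming the states are ordered "no memrefs", then "has memrefs", then "poisoned", and that overflow is what poisons an instruction. All names (`MemRefState`, `ToyInstr`, `MaxMemRefs`) and the cap of 4 are made up for illustration; this is not the follow-up patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed ordering: an instruction may only move forward through these.
enum class MemRefState { NoMemRefs = 0, HasMemRefs = 1, Poisoned = 2 };

// Hypothetical stand-in for MachineInstr; not the real LLVM class.
struct ToyInstr {
  // Illustrative cap only, chosen small so overflow is easy to trigger.
  static constexpr std::size_t MaxMemRefs = 4;

  MemRefState State = MemRefState::NoMemRefs;
  std::vector<int> MemRefs; // ints stand in for memory operands

  // Drop all operands and remember that information was lost.
  void poison() {
    MemRefs.clear();
    State = MemRefState::Poisoned;
  }

  // Add one operand; overflow poisons, and a poisoned instruction stays so.
  void addMemRef(int Ref) {
    const MemRefState Old = State;
    if (State == MemRefState::Poisoned)
      return;
    if (MemRefs.size() == MaxMemRefs) {
      poison();
    } else {
      MemRefs.push_back(Ref);
      State = MemRefState::HasMemRefs;
    }
    assert(State >= Old && "state must never move backwards");
  }

  // Only the poisoned state forces callers to assume "touches anything".
  bool mustBeConservative() const { return State == MemRefState::Poisoned; }
};
```

The point of the explicit state is that an empty operand list no longer has to carry two meanings: a freshly built instruction and an instruction that lost its operands are distinguishable.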

In D15730#315930, @reames wrote:

I'm going to move forward with the current change, but the point Sanjoy raises is entirely correct. In fact, I've already found several more instances of this same bug while digging through related code. I think we need to have three states (no memref, has memrefs, had memrefs/poison). An MI should never be able to transition backwards through those states.

That, or we can have a special "everything" memref. In that scheme, it would always be correct (but not optimal) to replace an arbitrary set of memrefs with one instance of the "everything" memref. This would let you squash 256 memrefs into one if you want to.

(Side note: NumMemRefs is an uint16_t -- does this mean the limit is 65536, and not 256?)

I'll send a patch out that adds that, along with some assertions in the MachineVerifier and the access APIs. I suspect they'll uncover a lot more bugs of this nature. I suspect we're not seeing this in practice just because few MIs (even patchpoints and statepoints) actually have 256 unique memory locations they touch.
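The "everything" memref alternative might look roughly like the sketch below. It assumes a scheme in which an empty list really does mean "no memory access" and imprecision is expressed by one special operand. `ToyMemRef`, `everything`, `squashIfTooMany` and `mayAccess` are hypothetical names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical memory operand for this sketch.
struct ToyMemRef {
  bool Everything; // marker meaning "may touch any memory"
  int Location;    // meaningful only when Everything is false
};

inline ToyMemRef everything() { return {true, 0}; }

// Replace an over-long list with a single "everything" operand.
// Always correct under this scheme, but less precise than the original list.
inline void squashIfTooMany(std::vector<ToyMemRef> &Refs, std::size_t Limit) {
  assert(Limit > 0 && "need room for at least the 'everything' operand");
  if (Refs.size() > Limit)
    Refs.assign(1, everything());
}

// Under this scheme an empty list means no access at all.
inline bool mayAccess(const std::vector<ToyMemRef> &Refs, int Location) {
  for (const ToyMemRef &R : Refs)
    if (R.Everything || R.Location == Location)
      return true;
  return false;
}
```

Compared with the three-state proposal, this keeps the conservative answer inside the operand list itself, so adding further operands to a squashed list cannot make it less conservative.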

In D15730#315931, @sanjoy wrote:

(Side note: NumMemRefs is an uint16_t -- does this mean the limit is 65536, and not 256?)

Where are you looking? The one in MachineInstr.h is definitely an uint8_t. I just triple checked.

Oh, wait. I think you're looking at our downstream tree right? I think we patched that locally.
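For reference, the counter widths under discussion can be checked directly. Note that the largest count each type can hold is 255 and 65535 respectively, one less than the round numbers quoted in the thread. `fitsInCounter` and `toCounter` are hypothetical helpers written for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

static_assert(std::numeric_limits<std::uint8_t>::max() == 255,
              "an 8-bit counter tops out at 255");
static_assert(std::numeric_limits<std::uint16_t>::max() == 65535,
              "a 16-bit counter tops out at 65535");

// True if Count memory operands can be recorded in a counter of type CounterT.
template <typename CounterT>
constexpr bool fitsInCounter(unsigned long Count) {
  return Count <= std::numeric_limits<CounterT>::max();
}

// Narrow a count to the counter type, asserting that nothing is lost.
template <typename CounterT>
CounterT toCounter(unsigned long Count) {
  assert(fitsInCounter<CounterT>(Count) && "count overflows the counter");
  return static_cast<CounterT>(Count);
}
```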

[-CC all]

Philip Reames wrote:

reames added a comment.

In http://reviews.llvm.org/D15730#315931, @sanjoy wrote:

(Side note: NumMemRefs is an uint16_t -- does this mean the limit is 65536, and not 256?)

Where are you looking? The one in MachineInstr.h is definitely an uint8_t. I just triple checked.

Oh, wait. I think you're looking at our downstream tree right? I think we patched that locally.

The perils of living downstream. :)

I was the one who added this in Apr 24 2014; but somehow the leading
"/// AZUL BEGIN" got lost, possibly in some upstream merge. I'll add it
back.

Sanjoy

http://reviews.llvm.org/D15730

Here's my current understanding, but I'm open to other possibilities.

Empty memoperands should be interpreted as the most conservative state.

Going from zero to non-zero memoperands would indeed be an error.

Casual pass writers often forget to propagate memoperands entirely (naïve sloppiness is ok), but dropping only some of them and not all of them would be incorrect (conscious sloppiness is an error).

addMemOperand is supposed to be an "internal" API, not called directly from passes.

Merging memory accesses cannot be done naïvely.

We should work hard to avoid empty memoperands, and to that end we could introduce PseudoSourceValues with specific meanings. I certainly don't think that overflowing memoperands should result in a zero memoperands state, which as Sanjoy pointed out is more error prone.

There is a related change being proposed (handling mem operands in tail merge) here:

http://reviews.llvm.org/D15230

in case anyone interested in this subject would like to take a look.

Closed by commit rL256335: [MachineLICM] Fix handling of memoperands (authored by reames). Dec 23 2015, 9:09 AM

This revision was automatically updated to reflect the committed changes.

Original patch submitted, but let's keep the discussion going. Once we settle on what should be, I'm going to prepare a follow on patch to clarify/enforce invariants.

In D15730#315946, @atrick wrote:

Here's my current understanding, but I'm open to other possibilities.

Empty memoperands should be interpreted as the most conservative state.

Going from zero to non-zero memoperands would indeed be an error.

I'm not sure this will work in practice. I haven't tried yet, but I believe we're likely to use zero as a starting state and then add all the needed operands. We do need a conservative state, but I suspect that will have to be a separate explicit poison state.

Casual pass writers often forget to propagate memoperands entirely (naïve sloppiness is ok), but dropping only some of them and not all of them would be incorrect (conscious sloppiness is an error).

addMemOperand is supposed to be an "internal" API, not called directly from passes.

This doesn't look to be actually true. Or at least, "internal" means a lot more code than I'd think it should...

Merging memory accesses cannot be done naïvely.

Can you describe how it can be done at all? I don't know today.

We should work hard to avoid empty memoperands, and to that end we could introduce PseudoSourceValues with specific meanings. I certainly don't think that overflowing memoperands should result in a zero memoperands state, which as Sanjoy pointed out is more error prone.

How do we avoid this? Do we introduce more abstract/less precise PSVs? I'm open to ideas here; I don't really claim to understand what all we're using this for.

In D15730#316212, @reames wrote:

Original patch submitted, but let's keep the discussion going. Once we settle on what should be, I'm going to prepare a follow on patch to clarify/enforce invariants.

Going from zero to non-zero memoperands would indeed be an error.

I'm not sure this will work in practice. I haven't tried yet, but I believe we're likely to use zero as a starting state and then add all the needed operands. We do need a conservative state, but I suspect that will have to be a separate explicit poison state.

Casual pass writers often forget to propagate memoperands entirely (naïve sloppiness is ok), but dropping only some of them and not all of them would be incorrect (conscious sloppiness is an error).

addMemOperand is supposed to be an "internal" API, not called directly from passes.

This doesn't look to be actually true. Or at least, "internal" means a lot more code than I'd think it should...

"Internal" was a bad choice of words. I mean that it is designed as a utility to use within code that already knows how to correctly build an instruction or merge memoperands. addMemOperand seems mostly like a convenience for constructing stack load/stores. It turns out we do this a lot. There's no utility to encapsulate that operation, and there's no way for addOperands to know this is a "new" instruction.

In fact, I'm not aware of any reason to add memoperands at all outside of building a new instruction. When merging instructions you're likely to just create a new one, not modify an old one.

Maybe your fear is that an instruction is built up in stages. Initially something indicates an unknown memory location so we zero memoperands, then later we acquire a memoperand. That would be a bug. I agree it would be better to insert a PseudoSourceValue placeholder when we see an unknown location.

Merging memory accesses cannot be done naïvely.

Can you describe how it can be done at all? I don't know today.

I should restate that: merging load/stores can be done naïvely by dropping all memoperands. The danger would be in referring to the memoperands list from only one of the original load/stores without merging them into a new memoperands list.

AArch64 uses a concatenateMemOperands utility.

I meant to contrast this with splitting a load/store, which could either very conservatively drop memoperands or much less conservatively refer to the original memoperands list in both places.

We should work hard to avoid empty memoperands, and to that end we could introduce PseudoSourceValues with specific meanings. I certainly don't think that overflowing memoperands should result in a zero memoperands state, which as Sanjoy pointed out is more error prone.

How do we avoid this? Do we introduce more abstract/less precise PSVs? I'm open to ideas here; I don't really claim to understand what all we're using this for.

That was just a suggestion for improvement. I don't have a better idea for representing an unknown memory location than with a new kind of PSV.

Andy

jevinskie added a subscriber: jevinskie. Dec 23 2015, 10:32 AM

In D15730#316291, @atrick wrote:

In D15730#316212, @reames wrote:

Original patch submitted, but let's keep the discussion going. Once we settle on what should be, I'm going to prepare a follow on patch to clarify/enforce invariants.

Going from zero to non-zero memoperands would indeed be an error.

I'm not sure this will work in practice. I haven't tried yet, but I believe we're likely to use zero as a starting state and then add all the needed operands. We do need a conservative state, but I suspect that will have to be a separate explicit poison state.

Casual pass writers often forget to propagate memoperands entirely (naïve sloppiness is ok), but dropping only some of them and not all of them would be incorrect (conscious sloppiness is an error).

addMemOperand is supposed to be an "internal" API, not called directly from passes.

This doesn't look to be actually true. Or at least, "internal" means a lot more code than I'd think it should...

"Internal" was a bad choice of words. I mean that it is designed as a utility to use within code that already knows how to correctly build an instruction or merge memoperands. addMemOperand seems mostly like a convenience for constructing stack load/stores. It turns out we do this a lot. There's no utility to encapsulate that operation, and there's no way for addOperands to know this is a "new" instruction.

In fact, I'm not aware of any reason to add memoperands at all outside of building a new instruction. When merging instructions you're likely to just create a new one, not modify an old one.

Maybe your fear is that an instruction is built up in stages. Initially something indicates an unknown memory location so we zero memoperands, then later we acquire a memoperand. That would be a bug.

This is exactly my concern. As I mentioned in my separate email to you, it looks like we're doing exactly this for patchpoints/statepoints. It might not technically be a bug per your description (we're creating a new instruction to replace an old one), but it does look suspicious.

I agree it would be better to insert a PseudoSourceValue placeholder when we see an unknown location.

Ok, that's the mental model I'll run with going forward.

Merging memory accesses cannot be done naïvely.

Can you describe how it can be done at all? I don't know today.

I should restate that: merging load/stores can be done naïvely by dropping all memoperands. The danger would be in referring to the memoperands list from only one of the original load/stores without merging them into a new memoperands list.

AArch64 uses a concatenateMemOperands utility.

I meant to contrast this with splitting a load/store, which could either very conservatively drop memoperands or much less conservatively refer to the original memoperands list in both places.

To restate to make sure I understand. When merging two instructions, it's always legal to concatenate (and possibly remove duplicates) the original two lists. When splitting, it's correct - but slightly conservative - to keep the original list. In either case, it is *legal* to use a new empty list, but not recommended due to imprecision.

We should work hard to avoid empty memoperands, and to that end we could introduce PseudoSourceValues with specific meanings. I certainly don't think that overflowing memoperands should result in a zero memoperands state, which as Sanjoy pointed out is more error prone.

How do we avoid this? Do we introduce more abstract/less precise PSVs? I'm open to ideas here; I don't really claim to understand what all we're using this for.

That was just a suggestion for improvement. I don't have a better idea for representing an unknown memory location than with a new kind of PSV.

Andy

To restate to make sure I understand. When merging two instructions, it's always legal to concatenate (and possibly remove duplicates) the original two lists. When splitting, it's correct - but slightly conservative - to keep the original list. In either case, it is *legal* to use a new empty list, but not recommended due to imprecision.

That's my understanding, FWIW. We should try not to drop memoperands, but anticipate that some authors of target passes could be lazy.
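The merge rule restated above can be sketched as follows. This is a toy model, not the AArch64 `concatenateMemOperands` utility: memory operands are plain `int`s, and `concatMemRefs` is a hypothetical name. One detail is an inference from the interpretation adopted in this patch rather than something stated in the thread: if either input list is empty ("don't know"), the merged list has to be empty as well, since naming only the other instruction's locations would understate what the merged instruction may touch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// ints stand in for memory operands in this sketch.
using ToyMemRef = int;

// Merge the operand lists of two instructions that are being combined.
inline std::vector<ToyMemRef> concatMemRefs(const std::vector<ToyMemRef> &A,
                                            const std::vector<ToyMemRef> &B) {
  // An empty input means "unknown", so the result must be unknown too.
  if (A.empty() || B.empty())
    return {};

  // Otherwise keep everything from A and append the operands unique to B.
  std::vector<ToyMemRef> Out(A);
  for (ToyMemRef R : B)
    if (std::find(Out.begin(), Out.end(), R) == Out.end())
      Out.push_back(R);

  assert(Out.size() <= A.size() + B.size() && "merge cannot invent operands");
  return Out;
}
```

Splitting needs no helper in this model: each half can simply reuse the original list, which is correct but slightly conservative, as described above.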

In D15730#316291, @atrick wrote:

AArch64 uses a concatenateMemOperands utility.

Looking around, I see three copies of this routine (2 in tree, 1 under review). I think it makes sense to factor this functionality out into one place. Would a static function on MachineInstr be a reasonable place for this? Or is there a better location for this type of utility function?

Revision Contents

Path                                              Size
llvm/trunk/include/llvm/CodeGen/MachineInstr.h    9 lines
llvm/trunk/lib/CodeGen/MachineLICM.cpp            14 lines

Diff 43542

llvm/trunk/include/llvm/CodeGen/MachineInstr.h

[86 lines not shown; context: private:]
   uint8_t AsmPrinterFlags;              // Various bits of information used by
                                         // the AsmPrinter to emit helpful
                                         // comments. This is not semantic
                                         // information. Do not use this for
                                         // anything other than to convey comment
                                         // information to AsmPrinter.

   uint8_t NumMemRefs;                   // Information on memory references.
+  // Note that MemRefs == nullptr, means 'don't know', not 'no memory access'.
+  // Calling code must treat missing information conservatively. If the number
+  // of memory operands required to be precise exceeds the maximum value of
+  // NumMemRefs - currently 256 - we remove the operands entirely. Note also
+  // that this is a non-owning reference to a shared copy on write buffer owned
+  // by the MachineFunction and created via MF.allocateMemRefsArray.
   mmo_iterator MemRefs;

   DebugLoc debugLoc;                    // Source line information.

   MachineInstr(const MachineInstr&) = delete;
   void operator=(const MachineInstr&) = delete;
   // Use MachineFunction::DeleteMachineInstr() instead.
   ~MachineInstr() = delete;
[238 lines not shown; context: public:]
   /// Returns the number of the operand iterator \p I points to.
   unsigned getOperandNo(const_mop_iterator I) const {
     return I - operands_begin();
   }

   /// Access to memory operands of the instruction
   mmo_iterator memoperands_begin() const { return MemRefs; }
   mmo_iterator memoperands_end() const { return MemRefs + NumMemRefs; }
+  /// Return true if we don't have any memory operands which described the the
+  /// memory access done by this instruction. If this is true, calling code
+  /// must be conservative.
   bool memoperands_empty() const { return NumMemRefs == 0; }

   iterator_range<mmo_iterator> memoperands() {
     return make_range(memoperands_begin(), memoperands_end());
   }
   iterator_range<mmo_iterator> memoperands() const {
     return make_range(memoperands_begin(), memoperands_end());
   }
[902 lines not shown]

llvm/trunk/lib/CodeGen/MachineLICM.cpp

[324 lines not shown; context: while (!Worklist.empty()) {]
     }
   }

   return Changed;
 }

 /// Return true if instruction stores to the specified frame.
 static bool InstructionStoresToFI(const MachineInstr *MI, int FI) {
+  // If we lost memory operands, conservatively assume that the instruction
+  // writes to all slots.
+  if (MI->memoperands_empty())
+    return true;
   for (MachineInstr::mmo_iterator o = MI->memoperands_begin(),
          oe = MI->memoperands_end(); o != oe; ++o) {
     if (!(*o)->isStore() || !(*o)->getPseudoValue())
       continue;
     if (const FixedStackPseudoSourceValue *Value =
         dyn_cast<FixedStackPseudoSourceValue>((*o)->getPseudoValue())) {
       if (Value->getFrameIndex() == FI)
         return true;
[500 lines not shown; context: for (; *PS != -1; ++PS) {]
         Cost[*PS] += RCCost;
     }
   }
   return Cost;
 }

 /// Return true if this machine instruction loads from global offset table or
 /// constant pool.
-static bool isLoadFromGOTOrConstantPool(MachineInstr &MI) {
+static bool mayLoadFromGOTOrConstantPool(MachineInstr &MI) {
   assert (MI.mayLoad() && "Expected MI that loads!");
+
+  // If we lost memory operands, conservatively assume that the instruction
+  // reads from everything..
+  if (MI.memoperands_empty())
+    return true;
+
   for (MachineInstr::mmo_iterator I = MI.memoperands_begin(),
          E = MI.memoperands_end(); I != E; ++I) {
     if (const PseudoSourceValue *PSV = (*I)->getPseudoValue()) {
       if (PSV->isGOT() || PSV->isConstantPool())
         return true;
     }
   }
   return false;
 }

 /// Returns true if the instruction may be a suitable candidate for LICM.
 /// e.g. If the instruction is a call, then it's obviously not safe to hoist it.
 bool MachineLICM::IsLICMCandidate(MachineInstr &I) {
   // Check if it's safe to move the instruction.
   bool DontMoveAcrossStore = true;
   if (!I.isSafeToMove(AA, DontMoveAcrossStore))
     return false;

   // If it is load then check if it is guaranteed to execute by making sure that
   // it dominates all exiting blocks. If it doesn't, then there is a path out of
   // the loop which does not execute this load, so we can't hoist it. Loads
   // from constant memory are not safe to speculate all the time, for example
   // indexed load from a jump table.
   // Stores and side effects are already checked by isSafeToMove.
-  if (I.mayLoad() && !isLoadFromGOTOrConstantPool(I) &&
+  if (I.mayLoad() && !mayLoadFromGOTOrConstantPool(I) &&
       !IsGuaranteedToExecute(I.getParent()))
     return false;

   return true;
 }

 /// Returns true if the instruction is loop invariant.
 /// I.e., all virtual register operands are defined outside of the loop,
[525 lines not shown]