This is an archive of the discontinued LLVM Phabricator instance.

[MachineLICM] Fix handling of memoperands
ClosedPublic

Authored by reames on Dec 22 2015, 4:39 PM.

Details

Summary

As far as I can tell, the correct interpretation of an empty memoperands list is that we didn't have sufficient room to store information about the MachineInstr, NOT that the MachineInstr doesn't access any particular bit of memory. This appears to be fairly consistent in a number of places, but I'm not 100% sure of this interpretation. I'd really appreciate someone more knowledgeable confirming my reading of the code.

This patch fixes two latent bugs in MachineLICM - given the above assumption - and adds comments to document the meaning and required handling. I don't have test cases; these were noticed by inspection.

Diff Detail

Repository
rL LLVM

Event Timeline

reames updated this revision to Diff 43493.Dec 22 2015, 4:39 PM
reames retitled this revision from to [MachineLICM] Fix handling of memoperands.
reames updated this object.
reames added reviewers: qcolombet, atrick, stoklund, bkramer.
reames added a subscriber: llvm-commits.
atrick accepted this revision.Dec 22 2015, 7:07 PM
atrick edited edge metadata.

Thanks for clarifying the spec and fixing assumptions. It looks like there is a temptation to assume that access to a frame index must have non-empty memoperands. But that would be totally unenforceable. If there's a performance issue, it's probably better to fix cases where LLVM drops memoperands rather than make those assumptions.

This revision is now accepted and ready to land.Dec 22 2015, 7:07 PM
sanjoy added a subscriber: sanjoy.Dec 22 2015, 7:19 PM

I think the right way to model "touches all memory" is with a bit on the machine instruction, not with an empty memoperand list. Otherwise, for instance, every place that calls MachineInstr::addMemOperand has to be audited to ensure that it does not transition a machine instruction from 0 mem operands (allowed to clobber all memory) to non-zero mem operands (allowed to clobber only the specific named locations). That or we need to change MachineInstr::addMemOperand.

I'm going to move forward with the current change, but the point Sanjoy raises is entirely correct. In fact, I've already found several more instances of this same bug while digging through related code. I think we need to have three states (no memref, has memrefs, had memrefs/poison). An MI should never be able to transition backwards through those states.

I'll sent a patch out that adds that, along with some assertions in the MachineVerifier and the access APIs. I suspect they'll uncover a lot more bugs of this nature. I suspect we're not seeing this in practice just because few MIs (even patchpoints and statepoints) actually have 256 unique memory locations they touch.

I'm going to move forward with the current change, but the point Sanjoy raises is entirely correct. In fact, I've already found several more instances of this same bug while digging through related code. I think we need to have three states (no memref, has memrefs, had memrefs/poison). An MI should never be able to transition backwards through those states.

That, or we can have a special "everything" memref. In that scheme, it would always be correct (but not optimal) to replace an arbitrary set of memrefs with one instance of the "everything" memref. This would let you squash 256 memrefs into one if you want to.

(Side note: NumMemRefs is an uint16_t -- does this mean the limit is 65536, and not 256?)

I'll sent a patch out that adds that, along with some assertions in the MachineVerifier and the access APIs. I suspect they'll uncover a lot more bugs of this nature. I suspect we're not seeing this in practice just because few MIs (even patchpoints and statepoints) actually have 256 unique memory locations they touch.

(Side note: NumMemRefs is an uint16_t -- does this mean the limit is 65536, and not 256?)

Where are you looking? The one in MachineInstr.h is definitely an uint8_t. I just triple checked.

Oh, wait. I think you're looking at our downstream tree right? I think we patched that locally.

[-CC all]

Philip Reames wrote:

reames added a comment.

In http://reviews.llvm.org/D15730#315931, @sanjoy wrote:

(Side note: NumMemRefs is an uint16_t -- does this mean the limit is 65536, and not 256?)

Where are you looking? The one in MachineInstr.h is definitely an uint8_t. I just triple checked.

Oh, wait. I think you're looking at our downstream tree right? I think we patched that locally.

The perils of living downstream. :)

I was the one who added this in Apr 24 2014; but somehow the leading
"/// AZUL BEGIN" got lost, possibly in some upstream merge. I'll add it
back.

  • Sanjoy

http://reviews.llvm.org/D15730

Here's my current understanding, but I'm open to other possibilities.

  • Empty memoperands should be interpreted as the most conservative state.
  • Going from zero to non-zero memoperands would indeed be an error.
  • Casual pass writers often forget to propagate memoperands entirely (naïve sloppiness is ok), but dropping only some of them and not all of them would be incorrect (conscious sloppiness is an error).
  • addMemOperand is supposed to be an "internal" API, not called directly from passes.
  • Merging memory accesses cannot be done naïvely.
  • We should work hard to avoid empty memoperands, and to that end we could introduce PseudoSourceValues with specific meanings. I certainly don't think that overflowing memoperands should result in a zero memoperands state, which as Sanjoy pointed out is more error prone.
gberry added a subscriber: gberry.Dec 23 2015, 7:18 AM

There is a related change being proposed (handling mem operands in tail merge) here:

http://reviews.llvm.org/D15230

in case anyone interested in this subject would like to take a look.

This revision was automatically updated to reflect the committed changes.

Original patch submitted, but let's keep the discussion going. Once we settle on what should be, I'm going to prepare a follow on patch to clarify/enforce invariants.

Here's my current understanding, but I'm open to other possibilities.

  • Empty memoperands should be interpreted as the most conservative state.
  • Going from zero to non-zero memoperands would indeed be an error.

I'm not sure this will work in practice. I haven't tried yet, but I believe we're likely to use zero as a starting state and then add all the needed operands. We do need a conservative state, but I suspect that will have to be a separate explicit poison state.

  • Casual pass writers often forget to propagate memoperands entirely (naïve sloppiness is ok), but dropping only some of them and not all of them would be incorrect (conscious sloppiness is an error).
  • addMemOperand is supposed to be an "internal" API, not called directly from passes.

This doesn't look to be actual true. Or at least, "internal" means a lot more code than I'd think it should...

  • Merging memory accesses cannot be done naïvely.

Can you describe how it can be done at all? I don't know today.

  • We should work hard to avoid empty memoperands, and to that end we could introduce PseudoSourceValues with specific meanings. I certainly don't think that overflowing memoperands should result in a zero memoperands state, which as Sanjoy pointed out is more error prone.

How do we avoid this? Do we introduce more abstract/less precise PSVs? I'm open to ideas here; I don't really claim to understand what all we're using this for.

Original patch submitted, but let's keep the discussion going. Once we settle on what should be, I'm going to prepare a follow on patch to clarify/enforce invariants.

  • Going from zero to non-zero memoperands would indeed be an error.

I'm not sure this will work in practice. I haven't tried yet, but I believe we're likely to use zero as a starting state and then add all the needed operands. We do need a conservative state, but I suspect that will have to be a separate explicit poison state.

  • Casual pass writers often forget to propagate memoperands entirely (naïve sloppiness is ok), but dropping only some of them and not all of them would be incorrect (conscious sloppiness is an error).
  • addMemOperand is supposed to be an "internal" API, not called directly from passes.

This doesn't look to be actual true. Or at least, "internal" means a lot more code than I'd think it should...

"Internal" was a bad choice of words. I mean that it is designed as a utility to use within code that already knows how to correctly build an instruction or merge memoperands. addMemOperand seems mostly like a convenience for constructing stack load/stores. It turns out we do this a lot. There's no utility to encapsulate that operation, and there's no way for addOperands to know this is a "new" instruction.

In fact, I'm not aware of any reason to add memoperands at all outside of building a new instruction. When merging instructions you're likely to just create a new one, not modify an old one.

Maybe your fear is that an instruction is built up in stages. Initially something indicates an unknown memory location so we zero memoperands, then later we acquire a memoperand. That would be a bug. I agree it would be better to insert a PseudoSourceValue placeholder when we see an unknown location.

  • Merging memory accesses cannot be done naïvely.

Can you describe how it can be done at all? I don't know today.

I should restate that: merging load/stores can be done naïvely by dropping all memoperands. The danger would be in referring to the memoperands list from only one of the original load/stores without merging them into a new memoperands list.

AArch64 uses a concatenateMemOperands utility.

I meant to contrast this with splitting a load/store, which could either very conservatively drop memoperands or much less conservatively refer to the original memoperands list in both places.

  • We should work hard to avoid empty memoperands, and to that end we could introduce PseudoSourceValues with specific meanings. I certainly don't think that overflowing memoperands should result in a zero memoperands state, which as Sanjoy pointed out is more error prone.

How do we avoid this? Do we introduce more abstract/less precise PSVs? I'm open to ideas here; I don't really claim to understand what all we're using this for.

That was just a suggestion for improvement. I don't have a better idea for representing an unknown memory location than with a new kind of PSV.

Andy

Original patch submitted, but let's keep the discussion going. Once we settle on what should be, I'm going to prepare a follow on patch to clarify/enforce invariants.

  • Going from zero to non-zero memoperands would indeed be an error.

I'm not sure this will work in practice. I haven't tried yet, but I believe we're likely to use zero as a starting state and then add all the needed operands. We do need a conservative state, but I suspect that will have to be a separate explicit poison state.

  • Casual pass writers often forget to propagate memoperands entirely (naïve sloppiness is ok), but dropping only some of them and not all of them would be incorrect (conscious sloppiness is an error).
  • addMemOperand is supposed to be an "internal" API, not called directly from passes.

This doesn't look to be actual true. Or at least, "internal" means a lot more code than I'd think it should...

"Internal" was a bad choice of words. I mean that it is designed as a utility to use within code that already knows how to correctly build an instruction or merge memoperands. addMemOperand seems mostly like a convenience for constructing stack load/stores. It turns out we do this a lot. There's no utility to encapsulate that operation, and there's no way for addOperands to know this is a "new" instruction.

In fact, I'm not aware of any reason to add memoperands at all outside of building a new instruction. When merging instructions you're likely to just create a new one, not modify an old one.

Maybe your fear is that an instruction is built up in stages. Initially something indicates an unknown memory location so we zero memoperands, then later we acquire a memoperand. That would be a bug.

This is exactly my concern. As I mentioned in my separate email to you, it looks like we're doing exactly this for patchpoints/statepoints. I might not technically be a bug per your description (we're creating a new instruction to replace an old one), but it does look suspicious.

I agree it would be better to insert a PseudoSourceValue placeholder when we see an unknown location.

Ok, that's the mental model I'll run with going forward.

  • Merging memory accesses cannot be done naïvely.

Can you describe how it can be done at all? I don't know today.

I should restate that: merging load/stores can be done naïvely by dropping all memoperands. The danger would be in referring to the memoperands list from only one of the original load/stores without merging them into a new memoperands list.

AArch64 uses a concatenateMemOperands utility.

I meant to contrast this with splitting a load/store, which could either very conservatively drop memoperands or much less conservatively refer to the original memoperands list in both places.

To restate to make sure I understand. When merging two instructions, it's always legal to concatenate (and possibly remove duplicates) the original two lists. When splitting, it's correct - but slightly conservative - to keep the original list. In either case, it is *legal* to use a new empty list, but not recommended due to imprecision.

  • We should work hard to avoid empty memoperands, and to that end we could introduce PseudoSourceValues with specific meanings. I certainly don't think that overflowing memoperands should result in a zero memoperands state, which as Sanjoy pointed out is more error prone.

How do we avoid this? Do we introduce more abstract/less precise PSVs? I'm open to ideas here; I don't really claim to understand what all we're using this for.

That was just a suggestion for improvement. I don't have a better idea for representing an unknown memory location than with a new kind of PSV.

Andy

To restate to make sure I understand. When merging two instructions, it's always legal to concatenate (and possibly remove duplicates) the original two lists. When splitting, it's correct - but slightly conservative - to keep the original list. In either case, it is *legal* to use a new empty list, but not recommended due to imprecision.

That's my understanding, FWIW. We should try not to drop memoperands, but anticipate that some authors of target passes could be lazy.

AArch64 uses a concatenateMemOperands utility.

Looking around, I see three copes of this routine (2 in tree, 1 under review). I think it makes sense to common this functionality. Would a static function on MachineInstr be a reasonable place for this? Or is there a better location for this type of utility function.