This is an archive of the discontinued LLVM Phabricator instance.

Fix for bootstrap bug introduced in r244921
ClosedPublic

Authored by nemanjai on Sep 3 2015, 4:34 AM.

Details

Summary

Since I have enabled building vectors using vector shuffles for v2i64 when direct moves are available, there was something I missed in PPCDAGToDAGISel::Select.
Namely, when a ISD::VECTOR_SHUFFLE is fed by an ISD::SCALAR_TO_VECTOR which is fed by an unindexed load, we were transforming the node to a VSX load and splat. However, since we produce an MTVSRD and a swap for ISD::SCALAR_TO_VECTOR, this is no longer the right thing to do.
So I have added logic to not change this scalar load to a VSX load-and-splat when we are building a v2i64 and have direct moves.

Diff Detail

Repository
rL LLVM

Event Timeline

nemanjai updated this revision to Diff 33929.Sep 3 2015, 4:34 AM
nemanjai retitled this revision from to Fix for bootstrap bug introduced in r244921.
nemanjai updated this object.
nemanjai added reviewers: wschmidt, seurer, kbarton, hfinkel.
nemanjai set the repository for this revision to rL LLVM.
nemanjai added a subscriber: llvm-commits.
wschmidt edited edge metadata.Sep 3 2015, 6:39 AM

First, congrats on sorting this out; bootstrap issues are always a pain. I have one minor inline comment. Also, can you add a test case for this situation?

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
2798

Minor nit: Positive logic is easier to read than negative logic. My preference would be to compute DirectMovable and use !DirectMovable here instead.

nemanjai updated this revision to Diff 34045.Sep 4 2015, 10:48 AM
nemanjai edited edge metadata.

DeMorgan-ized the negative logic as per Bill's comment.
Added a test case for which a similar shuffle produces a VSX load and splat with v2f64.

wschmidt accepted this revision.Sep 4 2015, 10:53 AM
wschmidt edited edge metadata.

LGTM! Thanks.

This revision is now accepted and ready to land.Sep 4 2015, 10:53 AM
hfinkel edited edge metadata.Sep 4 2015, 3:00 PM

Since I have enabled building vectors using vector shuffles for v2i64 when direct moves are available, there was something I missed in PPCDAGToDAGISel::Select. Namely, when a ISD::VECTOR_SHUFFLE is fed by an ISD::SCALAR_TO_VECTOR which is fed by an unindexed load, we were transforming the node to a VSX load and splat. However, since we produce an MTVSRD and a swap for ISD::SCALAR_TO_VECTOR, this is no longer the right thing to do.

So we have shuffle(scalar_to_vector(load), <0, 0>), and we had been producing a vsx load+splat. Now, instead, we have a load + direct move + swap? Is that better?

So I have added logic to not change this scalar load to a VSX load-and-splat when we are building a v2i64 and have direct moves.

Hi Hal, sorry I was on vacation so I'm only getting to this now.
I agree that this code sequence in isolation is not better than what we had in the particular case. However, the direct move sequence is applicable in every situation whereas the old simpler sequence only occurred in the specific sequence you mentioned. Furthermore, if the loaded value is used for non-vector computations as well, it is still in the register we moved it from - although I have no idea if this actually happens or if LLVM infrastructure knows this.
Please advise whether I should:

  1. Check this in as-is and move on
  2. Add some info in the README about improving this sequence
  3. Re-implement SCALAR_TO_VECTOR of v2i64 as custom lowering, detect this sequence and do not emit the direct move
  4. Do something different - i.e. a peephole optimization or something along those lines

I am of the opinion that going with options 1 or 2 is fine as the new sequence is only marginally worse than what we had and it is a lot of work to get this small win.

hfinkel requested changes to this revision.Sep 25 2015, 1:31 PM
hfinkel edited edge metadata.

Hi Hal, sorry I was on vacation so I'm only getting to this now.
I agree that this code sequence in isolation is not better than what we had in the particular case. However, the direct move sequence is applicable in every situation whereas the old simpler sequence only occurred in the specific sequence you mentioned. Furthermore, if the loaded value is used for non-vector computations as well, it is still in the register we moved it from - although I have no idea if this actually happens or if LLVM infrastructure knows this.
Please advise whether I should:

  1. Check this in as-is and move on
  2. Add some info in the README about improving this sequence
  3. Re-implement SCALAR_TO_VECTOR of v2i64 as custom lowering, detect this sequence and do not emit the direct move
  4. Do something different - i.e. a peephole optimization or something along those lines

I am of the opinion that going with options 1 or 2 is fine as the new sequence is only marginally worse than what we had and it is a lot of work to get this small win.

Something more like (3), although you might be able to do that in TableGen too (we have patterns that use callbacks to match shuffle patterns already).

Before that, however, I still don't understand the problem. Why does it matter how SCALAR_TO_VECTOR is lowered? We're selecting the shuffle node to be the LXVDSX, so it becomes the result of that shuffle (which does represent a splat). The SCALAR_TO_VECTOR is then dead regardless of how it is lowered, if it is lowered at all.

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
2797

You'd need to explain *why* in the comment.

This revision now requires changes to proceed.Sep 25 2015, 1:31 PM
hfinkel added inline comments.Sep 25 2015, 1:35 PM
lib/Target/PowerPC/PPCISelDAGToDAG.cpp
2796

Also, is this optimization missing some hasOneUse() checks? It seems like we might need to make sure that the load and the scalar_to_vector have no other uses.

nemanjai updated this revision to Diff 37563.Oct 16 2015, 1:46 AM
nemanjai edited edge metadata.

Hal, thank you for your comments and suggestions regarding this fix.
It is quite obvious that I did not adequately understand the problem. When you suggested the hasOneUse() calls, I tried that but clearly I did it incorrectly when I didn't understand the problem.

So the crux of the problem is that the load being replaced has a user of its chain (in this case, a store that nullifies the unique_ptr in the source). Because we introduced the LXVDSX, the use of the chain in the store was not updated and it still used the chain from the load (which goes away). The store was then free to move up (before the LXVDSX) and we end up with a load and splat of a null. I do not see a way to chain the store to this target specific node.

The reason we never hit this before the legalization of SCALAR_TO_VECTOR for v2i64 is that we never got in this code because the BUILD_VECTOR for v2i64 was never lowered to a vector_shuffle. I mistakenly assumed that my lowering code for SCALAR_TO_VECTOR was somehow broken and was causing this issue so my investigation took me down the wrong path.

So the crux of the problem is that the load being replaced has a user of its chain (in this case, a store that nullifies the unique_ptr in the source). Because we introduced the LXVDSX, the use of the chain in the store was not updated and it still used the chain from the load (which goes away). The store was then free to move up (before the LXVDSX) and we end up with a load and splat of a null. I do not see a way to chain the store to this target specific node.

So, you're going to have to ensure that the DAG nodes for LXVDSX have chains on them. This is an oversight in the way things are implemented today. See what's done for LXVD2X:

def PPClxvd2x  : SDNode<"PPCISD::LXVD2X", SDT_PPClxvd2x,
                        [SDNPHasChain, SDNPMayLoad]>;

So you'll need to add LXVDSX to the PPCISD enumeration in PPCISelLowering.h, add an entry like the above in PPCInstrVSX.td, and make sure we expand to that node type in the DAGtoDAG code. Then you'll have a chain that you can manipulate.

I should clarify that this is a bug that's independent of your work, so I think it should be treated separately.

So the crux of the problem is that the load being replaced has a user of its chain (in this case, a store that nullifies the unique_ptr in the source). Because we introduced the LXVDSX, the use of the chain in the store was not updated and it still used the chain from the load (which goes away). The store was then free to move up (before the LXVDSX) and we end up with a load and splat of a null. I do not see a way to chain the store to this target specific node.

So, you're going to have to ensure that the DAG nodes for LXVDSX have chains on them. This is an oversight in the way things are implemented today. See what's done for LXVD2X:

def PPClxvd2x  : SDNode<"PPCISD::LXVD2X", SDT_PPClxvd2x,
                        [SDNPHasChain, SDNPMayLoad]>;

So you'll need to add LXVDSX to the PPCISD enumeration in PPCISelLowering.h, add an entry like the above in PPCInstrVSX.td, and make sure we expand to that node type in the DAGtoDAG code. Then you'll have a chain that you can manipulate.

You don't need to add special ISD nodes to do instruction selection in DAGToDAG (only in ISelLowering). LXVDSX is already tagged as mayLoad, and so is already assumed to carry a chain operand. In DAGToDAG we directly generate machine-instruction SDAG nodes. You just need to make sure that chain users are appropriately updated before returning.

So the crux of the problem is that the load being replaced has a user of its chain (in this case, a store that nullifies the unique_ptr in the source). Because we introduced the LXVDSX, the use of the chain in the store was not updated and it still used the chain from the load (which goes away). The store was then free to move up (before the LXVDSX) and we end up with a load and splat of a null. I do not see a way to chain the store to this target specific node.

So, you're going to have to ensure that the DAG nodes for LXVDSX have chains on them. This is an oversight in the way things are implemented today. See what's done for LXVD2X:

def PPClxvd2x  : SDNode<"PPCISD::LXVD2X", SDT_PPClxvd2x,
                        [SDNPHasChain, SDNPMayLoad]>;

So you'll need to add LXVDSX to the PPCISD enumeration in PPCISelLowering.h, add an entry like the above in PPCInstrVSX.td, and make sure we expand to that node type in the DAGtoDAG code. Then you'll have a chain that you can manipulate.

You don't need to add special ISD nodes to do instruction selection in DAGToDAG (only in ISelLowering). LXVDSX is already tagged as mayLoad, and so is already assumed to carry a chain operand. In DAGToDAG we directly generate machine-instruction SDAG nodes. You just need to make sure that chain users are appropriately updated before returning.

OK, I am confused by the need for SNDPHasChain, then. Is that redundant when SNDPMayLoad is specified? Just a curiosity question.

hfinkel accepted this revision.Oct 27 2015, 12:34 PM
hfinkel edited edge metadata.

So the crux of the problem is that the load being replaced has a user of its chain (in this case, a store that nullifies the unique_ptr in the source). Because we introduced the LXVDSX, the use of the chain in the store was not updated and it still used the chain from the load (which goes away). The store was then free to move up (before the LXVDSX) and we end up with a load and splat of a null. I do not see a way to chain the store to this target specific node.

So, you're going to have to ensure that the DAG nodes for LXVDSX have chains on them. This is an oversight in the way things are implemented today. See what's done for LXVD2X:

def PPClxvd2x  : SDNode<"PPCISD::LXVD2X", SDT_PPClxvd2x,
                        [SDNPHasChain, SDNPMayLoad]>;

So you'll need to add LXVDSX to the PPCISD enumeration in PPCISelLowering.h, add an entry like the above in PPCInstrVSX.td, and make sure we expand to that node type in the DAGtoDAG code. Then you'll have a chain that you can manipulate.

You don't need to add special ISD nodes to do instruction selection in DAGToDAG (only in ISelLowering). LXVDSX is already tagged as mayLoad, and so is already assumed to carry a chain operand. In DAGToDAG we directly generate machine-instruction SDAG nodes. You just need to make sure that chain users are appropriately updated before returning.

OK, I am confused by the need for SNDPHasChain, then. Is that redundant when SNDPMayLoad is specified? Just a curiosity question.

No, it is not. But we only need these when defining matching patterns for custom nodes. For instructions themselves, there's nothing being matched (since we're doing isel manually).

In any case, this change now LGTM.

This revision is now accepted and ready to land.Oct 27 2015, 12:34 PM
nemanjai closed this revision.Nov 2 2015, 6:03 AM

Committed revision 251798.