This is an archive of the discontinued LLVM Phabricator instance.

Emit a swap for STXVD2X when it's emitted by matching a 'store' node
ClosedPublic

Authored by nemanjai on Jun 15 2016, 2:43 PM.

Details

Summary

PR 28130 points out that we have a missing swap prior to a permuting vector store with fast isel. This patch changes the dag patterns for STXVD2X to make them endianness-sensitive.

Diff Detail

Repository
rL LLVM

Event Timeline

nemanjai updated this revision to Diff 60895.Jun 15 2016, 2:43 PM
nemanjai retitled this revision from to Emit a swap for STXVD2X when it's emitted by matching a 'store' node.
nemanjai updated this object.
nemanjai set the repository for this revision to rL LLVM.
nemanjai added a subscriber: llvm-commits.
timshen added inline comments.Jun 15 2016, 3:24 PM
lib/Target/PowerPC/PPCInstrVSX.td
940

Without this change, powerpc le stxvd2x still works in most of the cases. Does that mean this line introduces a second code path to handle powerpc le stxvd2x? If so, can we remove the first one?

timshen edited edge metadata.Jun 16 2016, 2:57 PM
nemanjai added inline comments.Jun 17 2016, 1:44 AM
lib/Target/PowerPC/PPCInstrVSX.td
940

I am not sure which definitions you're referring to as first and second and what you're suggesting we remove. As you can see above, I've removed the DAG pattern that was part of the instruction definition.

timshen added inline comments.Jun 17 2016, 10:52 AM
lib/Target/PowerPC/PPCInstrVSX.td
940

The DAG pattern "(store v2f64:$XT, xoaddr:$dst)" is for big endian, not little endian, because it doesn't consider swap at all.

By saying the first code path I mean PPCTargetLowering::expandVSXStoreForLE().

nemanjai added inline comments.Jun 17 2016, 12:04 PM
lib/Target/PowerPC/PPCInstrVSX.td
940

But both code paths now will emit a swap. The code in PPCTargetLowering::expandVSXStoreForLE() handles more than just this pattern you mentioned. I believe the code in this patch will only ever be exercised with fast isel, but I may be wrong about that though.

amehsan added inline comments.Jun 17 2016, 12:36 PM
lib/Target/PowerPC/PPCInstrVSX.td
940

This may or may not help to address the concern raised.

What we generate in PPCTargetLowering::expandVSXStoreForLE() is PPCISD::STXVD2X which is different from PPC::STXVD2X which is defined in the td file.

amehsan edited edge metadata.Jun 17 2016, 12:46 PM

Having said that, I have not done a full investigation of these patches. But from a quick look, with the other fix, for this bug, PPCTargetLowering::expandVSXStoreForLE() was invoked and this patch was not working. So the other patch seems to depend on the first path. The question is, with that patch do we still have cases, where this path of code is executed.

Again, I just did a quick experiment with the two patches, to make sure there is no obvious inconsistency between them. There might be something that I missed.

Having said that, I have not done a full investigation of these patches. But from a quick look, with the other fix, for this bug, PPCTargetLowering::expandVSXStoreForLE() was invoked and this patch was not working. So the other patch seems to depend on the first path. The question is, with that patch do we still have cases, where this path of code is executed.

As the conclusion of my investigation, the other patch switches the lowering from the second path ((store v2f64:$XT, xoaddr:$dst) in the .td file above, which is also a fallback path when DAGCombiner::combine() fails) to the first path (PPCTargetLowering::expandVSXStoreForLE, generated by a successful combine()). This patch fixes the second path.

I guess the question is why do we have two paths? I haven't looked at FastISel yet.

lib/Target/PowerPC/PPCInstrVSX.td
940

I believe the code in this patch will only ever be exercised with fast isel, but I may be wrong about that though.

Does that mean, the STXVD2X selected by fast isel was entirely wrong before this patch? Wouldn't we have noticed that before?

940

What we generate in PPCTargetLowering::expandVSXStoreForLE() is PPCISD::STXVD2X which is different from PPC::STXVD2X which is defined in the td file.

This is helpful, thanks! Actually my patch D21416 fixes the test failure because it makes SelectionDAG successfully lower store to PPCISD::STXVD2X.

Just and idea, but now that you are investigating it, you may consider this. We should not have C++ code for code gen if it is possible to do things in .td files. Could we keep this patch and get rid of the code in PPCTargetLowering::expandVSXStoreForLE() which generates PPCISD::STXVD2X (and possibly get rid of PPCISD::STXVC2X altogether?).

nemanjai added a comment.EditedJun 20 2016, 11:15 AM

Just and idea, but now that you are investigating it, you may consider this. We should not have C++ code for code gen if it is possible to do things in .td files. Could we keep this patch and get rid of the code in PPCTargetLowering::expandVSXStoreForLE() which generates PPCISD::STXVD2X (and possibly get rid of PPCISD::STXVC2X altogether?).

Yes, I think it would be a good idea to get rid of the custom ISD nodes for LXVD2X and STXVD2X as well as the XXSWAPD node. This way we will always get a swap after the load and before the store as we did with the custom code. Furthermore, we can lower int_ppc_vsx_stxvd2x directly to the STXVD2X instruction (and conversely for the load). This will ensure that if someone uses the intrinsic through a builtin, they just get the instruction without the swap allowing us to trivially implement vec_xl_be/vec_xst_be for 64-bit data types.

I think this is a worth-while simplification and can update the patch to do that if @hfinkel and @wschmidt agree.

Yes, I think it would be a good idea to get rid of the custom ISD nodes for LXVD2X and STXVD2X as well as the XXSWAPD node. This way we will always get a swap after the load and before the store as we did with the custom code. Furthermore, we can lower int_ppc_vsx_stxvd2x directly to the STXVD2X instruction (and conversely for the load). This will ensure that if someone uses the intrinsic through a builtin, they just get the instruction without the swap allowing us to trivially implement vec_xl_be/vec_xst_be for 64-bit data types.

I think this is a worth-while simplification and can update the patch to do that if @hfinkel and @wschmidt agree.

As we discussed separately, let's first wait for the other patch to be approved. Then re-purposing this patch to do some clean up in the code, will depend on priorities.

kbarton requested changes to this revision.Aug 31 2016, 9:47 AM
kbarton edited edge metadata.

Test case?

This revision now requires changes to proceed.Aug 31 2016, 9:47 AM

Test case?

There is currently no known way to actually hit this code. This was initially proposed as a fix for a bug in which an invalid optimization left the store node past the DAG combine phase so we matched the node to the SDAG ISel pattern.
When everything is working correctly, there is no way to actually get to the DAG ISel with this node so this pattern should never be matched. However, the pattern is indeed incorrect on little endian platforms.

I can abandon this revision if we choose to leave this pattern in, but ultimately it is incorrect and I would argue that it'd be better to not have it (and crash if we get here) than it is to have an incorrect one.

kbarton accepted this revision.Sep 21 2016, 11:12 AM
kbarton edited edge metadata.

One minor request for a comment, aside from that LGTM.

lib/Target/PowerPC/PPCInstrVSX.td
123

Please add a comment here that this pattern is intentionally left blank to catch an error case that should not happen (because everything should be handled by SDAG ISel).

This revision is now accepted and ready to land.Sep 21 2016, 11:12 AM
nemanjai closed this revision.Sep 22 2016, 3:41 AM

Committed revision 282144.