PR 28130 points out that we have a missing swap prior to a permuting vector store with fast isel. This patch changes the dag patterns for STXVD2X to make them endianness-sensitive.
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/Target/PowerPC/PPCInstrVSX.td | ||
---|---|---|
940 | Without this change, powerpc le stxvd2x still works in most of the cases. Does that mean this line introduces a second code path to handle powerpc le stxvd2x? If so, can we remove the first one? |
lib/Target/PowerPC/PPCInstrVSX.td | ||
---|---|---|
940 | I am not sure which definitions you're referring to as first and second and what you're suggesting we remove. As you can see above, I've removed the DAG pattern that was part of the instruction definition. |
lib/Target/PowerPC/PPCInstrVSX.td | ||
---|---|---|
940 | The DAG pattern "(store v2f64:$XT, xoaddr:$dst)" is for big endian, not little endian, because it doesn't consider swap at all. By saying the first code path I mean PPCTargetLowering::expandVSXStoreForLE(). |
lib/Target/PowerPC/PPCInstrVSX.td | ||
---|---|---|
940 | But both code paths now will emit a swap. The code in PPCTargetLowering::expandVSXStoreForLE() handles more than just this pattern you mentioned. I believe the code in this patch will only ever be exercised with fast isel, but I may be wrong about that though. |
lib/Target/PowerPC/PPCInstrVSX.td | ||
---|---|---|
940 | This may or may not help to address the concern raised. What we generate in PPCTargetLowering::expandVSXStoreForLE() is PPCISD::STXVD2X which is different from PPC::STXVD2X which is defined in the td file. |
Having said that, I have not done a full investigation of these patches. But from a quick look, with the other fix, for this bug, PPCTargetLowering::expandVSXStoreForLE() was invoked and this patch was not working. So the other patch seems to depend on the first path. The question is, with that patch do we still have cases, where this path of code is executed.
Again, I just did a quick experiment with the two patches, to make sure there is no obvious inconsistency between them. There might be something that I missed.
Having said that, I have not done a full investigation of these patches. But from a quick look, with the other fix, for this bug, PPCTargetLowering::expandVSXStoreForLE() was invoked and this patch was not working. So the other patch seems to depend on the first path. The question is, with that patch do we still have cases, where this path of code is executed.
As the conclusion of my investigation, the other patch switches the lowering from the second path ((store v2f64:$XT, xoaddr:$dst) in the .td file above, which is also a fallback path when DAGCombiner::combine() fails) to the first path (PPCTargetLowering::expandVSXStoreForLE, generated by a successful combine()). This patch fixes the second path.
I guess the question is why do we have two paths? I haven't looked at FastISel yet.
lib/Target/PowerPC/PPCInstrVSX.td | ||
---|---|---|
940 |
Does that mean, the STXVD2X selected by fast isel was entirely wrong before this patch? Wouldn't we have noticed that before? | |
940 |
This is helpful, thanks! Actually my patch D21416 fixes the test failure because it makes SelectionDAG successfully lower store to PPCISD::STXVD2X. |
Just and idea, but now that you are investigating it, you may consider this. We should not have C++ code for code gen if it is possible to do things in .td files. Could we keep this patch and get rid of the code in PPCTargetLowering::expandVSXStoreForLE() which generates PPCISD::STXVD2X (and possibly get rid of PPCISD::STXVC2X altogether?).
Yes, I think it would be a good idea to get rid of the custom ISD nodes for LXVD2X and STXVD2X as well as the XXSWAPD node. This way we will always get a swap after the load and before the store as we did with the custom code. Furthermore, we can lower int_ppc_vsx_stxvd2x directly to the STXVD2X instruction (and conversely for the load). This will ensure that if someone uses the intrinsic through a builtin, they just get the instruction without the swap allowing us to trivially implement vec_xl_be/vec_xst_be for 64-bit data types.
I think this is a worth-while simplification and can update the patch to do that if @hfinkel and @wschmidt agree.
Yes, I think it would be a good idea to get rid of the custom ISD nodes for LXVD2X and STXVD2X as well as the XXSWAPD node. This way we will always get a swap after the load and before the store as we did with the custom code. Furthermore, we can lower int_ppc_vsx_stxvd2x directly to the STXVD2X instruction (and conversely for the load). This will ensure that if someone uses the intrinsic through a builtin, they just get the instruction without the swap allowing us to trivially implement vec_xl_be/vec_xst_be for 64-bit data types.
I think this is a worth-while simplification and can update the patch to do that if @hfinkel and @wschmidt agree.
As we discussed separately, let's first wait for the other patch to be approved. Then re-purposing this patch to do some clean up in the code, will depend on priorities.
There is currently no known way to actually hit this code. This was initially proposed as a fix for a bug in which an invalid optimization left the store node past the DAG combine phase so we matched the node to the SDAG ISel pattern.
When everything is working correctly, there is no way to actually get to the DAG ISel with this node so this pattern should never be matched. However, the pattern is indeed incorrect on little endian platforms.
I can abandon this revision if we choose to leave this pattern in, but ultimately it is incorrect and I would argue that it'd be better to not have it (and crash if we get here) than it is to have an incorrect one.
One minor request for a comment, aside from that LGTM.
lib/Target/PowerPC/PPCInstrVSX.td | ||
---|---|---|
123 | Please add a comment here that this pattern is intentionally left blank to catch an error case that should not happen (because everything should be handled by SDAG ISel). |
Please add a comment here that this pattern is intentionally left blank to catch an error case that should not happen (because everything should be handled by SDAG ISel).