This is an archive of the discontinued LLVM Phabricator instance.


Differential D21409

Emit a swap for STXVD2X when it's emitted by matching a 'store' node
Closed, Public

Authored by nemanjai on Jun 15 2016, 2:43 PM.


Details

Reviewers

wschmidt
echristo
kbarton
amehsan
timshen
hfinkel

Summary

PR 28130 points out that we have a missing swap prior to a permuting vector store with fast isel. This patch changes the DAG patterns for STXVD2X to make them endianness-sensitive.
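To make the missing swap concrete, here is a minimal Python model of the doubleword ordering involved. It assumes the usual VSX convention that stxvd2x writes register doubleword 0 to the lower address on both endiannesses, and that on little-endian targets vector element i lives in doubleword 1 - i. The function names are hypothetical and only mirror the instruction mnemonics; this is not LLVM code.

```python
# Register modeled as [dw0, dw1] using the ISA's left-to-right doubleword
# numbering; memory modeled as two 8-byte slots, slot 0 at the lower address.

def make_v2f64(e0, e1, little_endian):
    """Place vector elements into doublewords.
    Assumption: on LE, element i sits in dw(1 - i); on BE, in dw(i)."""
    return [e1, e0] if little_endian else [e0, e1]

def xxswapd(reg):
    """xxpermdi XT, XA, XA, 2: exchange the two doublewords."""
    return [reg[1], reg[0]]

def stxvd2x(reg):
    """dw0 always goes to the lower address, dw1 to the higher one."""
    return [reg[0], reg[1]]

def store_v2f64(e0, e1, little_endian, swap):
    reg = make_v2f64(e0, e1, little_endian)
    return stxvd2x(xxswapd(reg) if swap else reg)

# Old pattern (store v2f64) -> (STXVD2X $XT): no swap on either endianness.
print(store_v2f64(1.0, 2.0, little_endian=False, swap=False))  # [1.0, 2.0]
print(store_v2f64(1.0, 2.0, little_endian=True, swap=False))   # [2.0, 1.0], wrong order
# Patched LE pattern: (STXVD2X (XXPERMDI $rS, $rS, 2), ...).
print(store_v2f64(1.0, 2.0, little_endian=True, swap=True))    # [1.0, 2.0]
```

The middle case is what the old `(store v2f64:$XT, xoaddr:$dst)` pattern produces on little endian, which is why the patch inserts `XXPERMDI $rS, $rS, 2` before `STXVD2X` only under `IsLittleEndian`.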

Diff Detail

Repository: rL LLVM

Event Timeline

nemanjai updated this revision to Diff 60895 (Jun 15 2016, 2:43 PM).

nemanjai retitled this revision to Emit a swap for STXVD2X when it's emitted by matching a 'store' node.

nemanjai updated this object.

nemanjai added reviewers: timshen, echristo, wschmidt, hfinkel, kbarton, amehsan.

nemanjai set the repository for this revision to rL LLVM.

nemanjai added a subscriber: llvm-commits.

Herald added a subscriber: mehdi_amini (Jun 15 2016, 2:43 PM).

timshen added inline comments (Jun 15 2016, 3:24 PM).

lib/Target/PowerPC/PPCInstrVSX.td
940	Without this change, powerpc le stxvd2x still works in most of the cases. Does that mean this line introduces a second code path to handle powerpc le stxvd2x? If so, can we remove the first one?

timshen mentioned this in D21416: [DAGCombiner] Fix visitSTORE to continue processing current SDNode, if findBetterNeighborChains doesn't actually CombineTo it. (Jun 15 2016, 3:32 PM)

nemanjai added inline comments (Jun 17 2016, 1:44 AM).

lib/Target/PowerPC/PPCInstrVSX.td
940	I am not sure which definitions you're referring to as first and second and what you're suggesting we remove. As you can see above, I've removed the DAG pattern that was part of the instruction definition.

timshen added inline comments (Jun 17 2016, 10:52 AM).

lib/Target/PowerPC/PPCInstrVSX.td
940	The DAG pattern "(store v2f64:$XT, xoaddr:$dst)" is for big endian, not little endian, because it doesn't consider swap at all. By saying the first code path I mean PPCTargetLowering::expandVSXStoreForLE().

nemanjai added inline comments (Jun 17 2016, 12:04 PM).

lib/Target/PowerPC/PPCInstrVSX.td
940	But both code paths now will emit a swap. The code in PPCTargetLowering::expandVSXStoreForLE() handles more than just this pattern you mentioned. I believe the code in this patch will only ever be exercised with fast isel, but I may be wrong about that though.

amehsan added inline comments (Jun 17 2016, 12:36 PM).

lib/Target/PowerPC/PPCInstrVSX.td
940	This may or may not help to address the concern raised. What we generate in PPCTargetLowering::expandVSXStoreForLE() is PPCISD::STXVD2X which is different from PPC::STXVD2X which is defined in the td file.

Having said that, I have not done a full investigation of these patches. But from a quick look, with the other fix for this bug applied, PPCTargetLowering::expandVSXStoreForLE() was invoked and the pattern in this patch was not exercised. So the other patch seems to depend on the first path. The question is whether, with that patch applied, there are still cases where this code path is executed.

Again, I just did a quick experiment with the two patches, to make sure there is no obvious inconsistency between them. There might be something that I missed.


To conclude my investigation: the other patch switches the lowering from the second path ((store v2f64:$XT, xoaddr:$dst) in the .td file above, which is also the fallback when DAGCombiner::combine() fails) to the first path (PPCTargetLowering::expandVSXStoreForLE, reached through a successful combine()). This patch fixes the second path.
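The two routes described above can be summarized as a small decision function. This is a hypothetical Python sketch that encodes only what this review states: path 1 goes through PPCTargetLowering::expandVSXStoreForLE after a successful combine and produces the custom XXSWAPD/STXVD2X nodes (mapped to XXPERMDI and STXVD2X by the .td patterns shown in the diff), while path 2 is the .td `store` pattern used as a fallback. It does not model LLVM's real control flow in detail, and instructions are plain strings.

```python
def lower_le_v2f64_store(combine_succeeded, with_this_patch):
    """Return the instruction sequence for a little-endian v2f64 store,
    following the two-path description in this review (illustrative only)."""
    if combine_succeeded:
        # Path 1: custom lowering emits PPCISD::XXSWAPD feeding
        # PPCISD::STXVD2X; .td patterns select XXPERMDI + STXVD2X.
        return ["XXPERMDI $rS, $rS, 2", "STXVD2X"]
    # Path 2: the generic 'store' node survives to instruction selection
    # and is matched directly by a .td pattern.
    if with_this_patch:
        # New IsLittleEndian pattern: swap first, then store.
        return ["XXPERMDI $rS, $rS, 2", "STXVD2X"]
    # Old (store v2f64:$XT, xoaddr:$dst) pattern: no swap, wrong on LE.
    return ["STXVD2X"]


for patched in (False, True):
    agree = lower_le_v2f64_store(True, patched) == lower_le_v2f64_store(False, patched)
    print(f"with_this_patch={patched}: both paths agree = {agree}")
# with_this_patch=False: both paths agree = False
# with_this_patch=True: both paths agree = True
```

The point of the sketch is the consistency check: before the patch, the result depended on whether the combine succeeded; afterwards both paths emit the swap.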

I guess the question is why we have two paths. I haven't looked at FastISel yet.

lib/Target/PowerPC/PPCInstrVSX.td
940	> I believe the code in this patch will only ever be exercised with fast isel, but I may be wrong about that though.
Does that mean the STXVD2X selected by fast isel was entirely wrong before this patch? Wouldn't we have noticed that before?
940	> What we generate in PPCTargetLowering::expandVSXStoreForLE() is PPCISD::STXVD2X which is different from PPC::STXVD2X which is defined in the td file.
This is helpful, thanks! Actually, my patch D21416 fixes the test failure because it makes SelectionDAG successfully lower `store` to PPCISD::STXVD2X.

Just an idea, but now that you are investigating it, you may consider this. We should not have C++ code for codegen if it is possible to do things in .td files. Could we keep this patch and get rid of the code in PPCTargetLowering::expandVSXStoreForLE() which generates PPCISD::STXVD2X (and possibly get rid of PPCISD::STXVD2X altogether)?

In D21409#461284, @amehsan wrote:

Just an idea, but now that you are investigating it, you may consider this. We should not have C++ code for codegen if it is possible to do things in .td files. Could we keep this patch and get rid of the code in PPCTargetLowering::expandVSXStoreForLE() which generates PPCISD::STXVD2X (and possibly get rid of PPCISD::STXVD2X altogether)?

Yes, I think it would be a good idea to get rid of the custom ISD nodes for LXVD2X and STXVD2X as well as the XXSWAPD node. This way we will always get a swap after the load and before the store as we did with the custom code. Furthermore, we can lower int_ppc_vsx_stxvd2x directly to the STXVD2X instruction (and conversely for the load). This will ensure that if someone uses the intrinsic through a builtin, they just get the instruction without the swap allowing us to trivially implement vec_xl_be/vec_xst_be for 64-bit data types.

I think this is a worthwhile simplification and can update the patch to do that if @hfinkel and @wschmidt agree.
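The proposal above can be illustrated with a small model: a plain load or store on little endian gets a doubleword swap around lxvd2x/stxvd2x, while the intrinsic lowered directly to STXVD2X leaves memory in big-endian element order, which is what a vec_xst_be-style store of 64-bit elements needs, per the comment above. The helpers are hypothetical Python stand-ins for the instructions, not LLVM APIs, and assume that LE vector element i lives in register doubleword 1 - i.

```python
# Register modeled as [dw0, dw1] (ISA left-to-right numbering); memory as
# two 8-byte slots, slot 0 at the lower address.

def lxvd2x(mem):
    """Load slot 0 into dw0 and slot 1 into dw1 (big-endian element order)."""
    return [mem[0], mem[1]]

def stxvd2x(reg):
    """Store dw0 to slot 0 and dw1 to slot 1."""
    return [reg[0], reg[1]]

def xxswapd(reg):
    """Exchange the two doublewords (xxpermdi XT, XA, XA, 2)."""
    return [reg[1], reg[0]]

def le_elements(reg):
    """Read the register as LE vector elements [e0, e1]; e0 lives in dw1."""
    return [reg[1], reg[0]]

# Plain 'load'/'store' on LE: swap after the load, swap before the store.
def le_load(mem):
    return xxswapd(lxvd2x(mem))

def le_store(reg):
    return stxvd2x(xxswapd(reg))

# Intrinsic lowered directly to STXVD2X: no swap, big-endian element order.
def le_store_be_order(reg):
    return stxvd2x(reg)

mem = [1.0, 2.0]               # a double[2] array in memory
reg = le_load(mem)
print(le_elements(reg))        # [1.0, 2.0]: element 0 comes from mem[0]
print(le_store(reg))           # [1.0, 2.0]: the round trip preserves order
print(le_store_be_order(reg))  # [2.0, 1.0]: element 1 at the lower address
```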


As we discussed separately, let's first wait for the other patch to be approved. Whether we then repurpose this patch to clean up the code will depend on priorities.

timshen mentioned this in D21692: [DAGCombiner] Fix visitSTORE to continue processing current SDNode, if findBetterNeighborChains doesn't actually CombineTo it. (Jun 24 2016, 11:02 AM)

timshen mentioned this in rL274644: [DAGCombiner] Fix visitSTORE to continue processing current SDNode, if… (Jul 6 2016, 10:51 AM)

Test case?

This revision now requires changes to proceed (Aug 31 2016, 9:47 AM).

In D21409#530485, @kbarton wrote:

Test case?

There is currently no known way to actually hit this code. This was initially proposed as a fix for a bug in which an invalid optimization left the store node in the DAG past the combine phase, so the node was matched by the SelectionDAG ISel pattern.
When everything is working correctly, this node never reaches DAG ISel, so this pattern should never be matched. However, the pattern is indeed incorrect on little-endian platforms.

I can abandon this revision if we choose to leave this pattern in, but ultimately it is incorrect, and I would argue that it is better not to have it (and crash if we get here) than to have an incorrect one.
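The trade-off argued above, failing loudly rather than emitting wrong code, can be sketched as a pattern table with no little-endian entry for a plain v2f64 store. This is a hypothetical Python model: the table, the exception, and the `select` function are illustrative names, and the big-endian entry is an assumption made only so the lookup has something to succeed on. In LLVM itself, a node that no pattern matches is typically reported as a fatal "Cannot select" error.

```python
class CannotSelect(Exception):
    """Stand-in for an instruction-selection failure."""

# Hypothetical pattern table: a big-endian pattern for a plain v2f64
# store, and deliberately no little-endian one.
PATTERNS = {
    ("store", "v2f64", "BE"): ["STXVD2X"],
}

def select(opcode, vt, endianness):
    """Look up the instructions for a node, failing if nothing matches."""
    try:
        return PATTERNS[(opcode, vt, endianness)]
    except KeyError:
        # Failing here surfaces the upstream bug instead of silently
        # emitting a store with the doublewords in the wrong order.
        raise CannotSelect(f"{opcode} {vt} on {endianness}") from None

print(select("store", "v2f64", "BE"))  # ['STXVD2X']
try:
    select("store", "v2f64", "LE")
except CannotSelect as err:
    print("Cannot select:", err)       # Cannot select: store v2f64 on LE
```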

One minor request for a comment, aside from that LGTM.

lib/Target/PowerPC/PPCInstrVSX.td
123	Please add a comment here that this pattern is intentionally left blank to catch an error case that should not happen (because everything should be handled by SDAG ISel).

This revision is now accepted and ready to land (Sep 21 2016, 11:12 AM).

Committed revision 282144.

Revision Contents

Path: lib/Target/PowerPC/PPCInstrVSX.td
Size: 11 lines

Diff 60895

lib/Target/PowerPC/PPCInstrVSX.td

(114 unchanged lines not shown; enclosing block: let Uses = [RM] in {)
 } // mayLoad

 // Store indexed instructions
 let mayStore = 1 in {
 def STXSDX : XX1Form<31, 716,
 (outs), (ins vsfrc:$XT, memrr:$dst),
 "stxsdx $XT, $dst", IIC_LdStSTFD,
 [(store f64:$XT, xoaddr:$dst)]>;

 def STXVD2X : XX1Form<31, 972,
 (outs), (ins vsrc:$XT, memrr:$dst),
 "stxvd2x $XT, $dst", IIC_LdStSTFD,
-[(store v2f64:$XT, xoaddr:$dst)]>;
+[]>;

 def STXVW4X : XX1Form<31, 908,
 (outs), (ins vsrc:$XT, memrr:$dst),
 "stxvw4x $XT, $dst", IIC_LdStSTFD,
 [(store v4i32:$XT, xoaddr:$dst)]>;

 } // mayStore
(796 unchanged lines not shown)
 def : Pat<(v2f64 (load xoaddr:$src)), (LXVD2X xoaddr:$src)>;
 def : Pat<(v2i64 (load xoaddr:$src)), (LXVD2X xoaddr:$src)>;
 def : Pat<(v4i32 (load xoaddr:$src)), (LXVW4X xoaddr:$src)>;
 def : Pat<(v2f64 (PPClxvd2x xoaddr:$src)), (LXVD2X xoaddr:$src)>;

 // Stores.
 def : Pat<(int_ppc_vsx_stxvd2x v2f64:$rS, xoaddr:$dst),
 (STXVD2X $rS, xoaddr:$dst)>;
+let Predicates = [IsLittleEndian] in {
+def : Pat<(store v2i64:$rS, xoaddr:$dst),
+(STXVD2X (XXPERMDI $rS, $rS, 2), xoaddr:$dst)>;
+def : Pat<(store v2f64:$rS, xoaddr:$dst),
+(STXVD2X (XXPERMDI $rS, $rS, 2), xoaddr:$dst)>;
+}
+let Predicates = [IsBigEndian] in {
 def : Pat<(store v2i64:$rS, xoaddr:$dst), (STXVD2X $rS, xoaddr:$dst)>;
+def : Pat<(store v2f64:$rS, xoaddr:$dst), (STXVD2X $rS, xoaddr:$dst)>;
+}
 def : Pat<(int_ppc_vsx_stxvw4x v4i32:$rS, xoaddr:$dst),
 (STXVW4X $rS, xoaddr:$dst)>;
 def : Pat<(PPCstxvd2x v2f64:$rS, xoaddr:$dst), (STXVD2X $rS, xoaddr:$dst)>;

 // Permutes.
 def : Pat<(v2f64 (PPCxxswapd v2f64:$src)), (XXPERMDI $src, $src, 2)>;
 def : Pat<(v2i64 (PPCxxswapd v2i64:$src)), (XXPERMDI $src, $src, 2)>;
 def : Pat<(v4f32 (PPCxxswapd v4f32:$src)), (XXPERMDI $src, $src, 2)>;
(1,210 unchanged lines not shown)