This is an archive of the discontinued LLVM Phabricator instance.


Differential D56175

[PowerPC] Exploit store instructions that store a single vector element
ClosedPublic

Authored by nemanjai on Dec 31 2018, 11:34 AM.


Details

Reviewers

hfinkel
jsji
steven.zhang
stefanp

Commits

rGb9b75de0aebc: [PowerPC] Exploit store instructions that store a single vector element
rL352131: [PowerPC] Exploit store instructions that store a single vector element

Summary

This patch was originally the CodeGen portion of https://reviews.llvm.org/D44528.
Namely, it exploits the instructions that store a single element from a vector to perform a (store (extract_elt)). We already have code that does this with ISA 3.0 instructions that were added to handle i8/i16 types. However, we had never exploited the existing ones that handle f32/f64/i32/i64 types. This patch does that.

Diff Detail

Repository: rL LLVM

Event Timeline

nemanjai created this revision. Dec 31 2018, 11:34 AM

Herald added a subscriber: kbarton. Dec 31 2018, 11:34 AM

Indentation was off in the td file.

nemanjai marked an inline comment as done. Dec 31 2018, 12:09 PM

nemanjai added inline comments.

test/CodeGen/PowerPC/extract-and-store.ll
118	One of the unanswered comments from the original patch was along the lines of:
"The `stfs` instruction performs a conversion from 8-byte single precision (that PPC uses for single precision representation in registers) to 4-byte single precision (the in-memory single precision representation). The updated code no longer involves such a conversion, is that semantically correct?"
The short answer is yes. The reason we don't need this conversion is that single precision vector elements are represented the same way in registers and memory. If you inspect the original code sequence carefully, you'll see that it does the following:
- `xxsldwi` to line up the element into the correct location in the register
- `xscvspdpn` to convert vector single precision to scalar single precision
- `stfs` to implicitly convert the value back and store it
The new sequence just skips all conversion and stores the single-precision vector element as a 4-byte single-precision value.

Thanks for updating!

lib/Target/PowerPC/PPCInstrVSX.td
3343	These are identical to little endian patterns above?
3881	Can we add some comments about the "magic" indexes here.
test/CodeGen/PowerPC/extract-and-store.ll
118	Thanks for checking! Yes, you are right, the new sequence should work the same way as the old sequence for normal floating point values. But I am more concerned about abnormal values. Since conversions will change abnormal values, I think we may get different results for out-of-range values (e.g. NaN, Inf, denormals) if we remove those conversions. Of course, we may argue that the new behavior without conversion may be better. So I still think the semantics are actually different, but it is a good idea to remove the conversions if we can accept the risk of changing some floating point applications' behavior.
118	Also, there look to be some typos in the comment above, which made it confusing to me. My understanding from the ISA is as follows.
Power ISA defines two data formats for floating point:
- 32-bit (4-byte) single format
- 64-bit (8-byte) double format
Each FPR contains 64 bits that support the floating-point double format. Every instruction that interprets the contents of an FPR as a floating-point value uses the floating-point double format for this interpretation.
A Vector Floating-Point (VMX) instruction interprets the contents of a Vector Register (VR) as a sequence of equal-length elements, each element using the floating-point single data format.
A Vector-Scalar Floating-Point (VSX) instruction interprets the contents of a Vector Register (VR) as a sequence of equal-length elements, each element using the floating-point single data format (4 bytes) or the floating-point double format (8 bytes), depending on the length.
A single-precision floating-point operation takes operands from the FPRs in double format, performs the operation, and then coerces this intermediate result to fit in single format. Status bits, in the FPSCR and optionally in the Condition Register, are set to reflect the single-precision result. The result is then converted to double format and placed into an FPR.
- `xscvspdpn` converts the element from single data format to double format, without setting FPSCR etc.
- `stfs` converts the value in the FPR from floating-point double format to floating-point single format, then stores it.
- `stfiwx` stores bits 32-63 of the FPR to memory directly, without conversion.

nemanjai marked 3 inline comments as done. Jan 11 2019, 6:58 AM

nemanjai added inline comments.

lib/Target/PowerPC/PPCInstrVSX.td
3343	Oops. The extract indices were supposed to be reversed. Good catch. Thank you.
3881	Sure, will do.
test/CodeGen/PowerPC/extract-and-store.ll
118	I'm not really sure what typos you're referring to, but we can think of it as there being two different single precision representations (i.e. bit patterns). The "in-scalar-register" representation is 64 bits wide and is equivalent to the double precision bit pattern for the same value. The in-memory representation is 32 bits wide and conforms to IEEE specifications for binary 32-bit floating point. Vectors of single precision values use the latter for each element. Each instruction that operates on scalar floating point registers and "produces a single precision result" actually produces a double precision result that has equivalent precision to the single precision value that would be produced by performing that operation on its inputs.
Now for the issue at hand (i.e. convert-to-extract-then-convert-to-store vs. store-with-no-conversion):
- The current implementation does the conversions
- The implementation in this patch does not do the conversions, just the store
- The first conversion (`xscvspdpn`) will perform normalization; NaNs and INFs just produce double precision versions of the same
- The second conversion (`stfs`) will perform denormalization (NaNs, INFs and zero remain)
- Now the memory contains an unmodified or denormalized version of the vector element
  - If it's unmodified (the conversion simply extracts the correct number of bits from exponent/mantissa), nothing to discuss here
  - If it's denormalized, it will produce a 32-bit bit pattern that is equivalent to the original denormal value in the vector
So unless I am missing something subtle in the conversion that `stfs` does, I believe that this change does not modify the semantics.

Updated the wrong indices for big endian systems. Added comments for magic numbers for indices/shift amounts.
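
The index/shift pairs in the `foreach` lists of this diff can be sanity-checked with a small model. The Python sketch below is only an illustration, not code from the patch. It assumes that `xxsldwi $A, $A, N` acts as a left rotation by N words, that `stfiwx` stores word 1 (bits 32-63) of the register, and that the doubleword stores write doubleword 0. All function names are hypothetical.

```python
# Assumptions (not taken from the patch itself):
#  - xxsldwi with both inputs equal rotates the register left by N words
#  - stfiwx stores big-endian word 1 (bits 32-63) of the register
#  - the doubleword stores write big-endian doubleword 0
STORED_WORD = 1


def be_word_of_element(idx: int, little_endian: bool) -> int:
    """Big-endian word position that holds element idx of a 4 x 32-bit vector."""
    return 3 - idx if little_endian else idx


def xxsldwi_shift(idx: int, little_endian: bool) -> int:
    """Words to rotate left so that element idx lands in the stored word."""
    return (be_word_of_element(idx, little_endian) - STORED_WORD) % 4


def needs_xxpermdi_swap(idx: int, little_endian: bool) -> bool:
    """True if element idx of a 2 x 64-bit vector is not already in doubleword 0."""
    be_dword = 1 - idx if little_endian else idx
    return be_dword != 0


# Elements with a zero shift need no xxsldwi, so they are left out of the lists.
be_pairs = [[i, xxsldwi_shift(i, False)] for i in range(4) if xxsldwi_shift(i, False)]
le_pairs = [[i, xxsldwi_shift(i, True)] for i in range(4) if xxsldwi_shift(i, True)]

print(be_pairs)  # [[0, 3], [2, 1], [3, 2]]
print(le_pairs)  # [[0, 2], [1, 1], [3, 3]]
```

Under these assumptions the computed pairs match the lists in the big-endian and little-endian `HasP8Vector` blocks, and the doubleword helper reflects why the big-endian and little-endian `extractelt` indices have to be mirrored rather than identical.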

jsji added inline comments. Jan 11 2019, 7:54 AM

test/CodeGen/PowerPC/extract-and-store.ll
118	The typo I was referring to is the "8 byte single precision", as I don't think there is any "8 byte single precision". There are only two data formats: 4-byte single format and 8-byte double format. Anyway, if it helps your understanding, it is fine to "think of it as there being two different single precision" representations.
I think our key divergence here is whether convert-convert will change the semantics for NaN/INFs. From my point of view, the conversions might change the values. E.g. `ConvertSPtoDP_NS`, used by `xscvspdpn`:
    if (x.bit[1:8] == 255) then do
        exponent <- 2047
    end
    else if (x.bit[1:8] == 0) && (fraction == 0) then do
        exponent <- 0
    end
The exponent might be overridden. So `convert-to-extract-then-convert-to-store` vs. `store-with-no-conversion` might get different results. But I also agree that both results should be valid. It is just that we may want to be aware of, or accept, the risk of changing some floating point applications' behavior.

jsji added inline comments. Jan 11 2019, 7:57 AM

lib/Target/PowerPC/PPCInstrVSX.td
3343	Thanks for updating! One question from me: why was this not exposed by the test cases below? Are we missing some coverage there?

nemanjai marked 2 inline comments as done. Jan 14 2019, 1:50 PM

nemanjai added inline comments.

lib/Target/PowerPC/PPCInstrVSX.td
3343	I'll add it to the respective test.
test/CodeGen/PowerPC/extract-and-store.ll
118	But that's precisely what I'm saying is not the case. Setting the exponent to 2047 is precisely the "keeping NaNs NaNs and INFs INFs" that I was referring to. The `if` in the RTL you have pasted accomplishes that very thing. The `else if` just says that a zero stays a zero. A NaN has the exponent set to all 1's. All 1's in single precision is `255` and in double precision, it's `2047`. INF has the same property with the additional property that the fraction bits are all zeros. https://en.wikipedia.org/wiki/IEEE_754-1985#NaN
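
The exponent mapping described here can be sketched as a bit-level model. The Python below is an illustration written for this discussion, not the ISA's RTL. The function name `sp_to_dp_bits` is hypothetical, and the denormal branch (which the quoted fragment omits) is an assumption based on the "will perform normalization" remark earlier in the thread.

```python
def sp_to_dp_bits(sp: int) -> int:
    """Widen a binary32 bit pattern to a binary64 bit pattern without rounding."""
    sign = (sp >> 31) & 0x1
    exp = (sp >> 23) & 0xFF
    frac = sp & 0x7FFFFF

    if exp == 255:
        # NaN or INF: all-ones exponent stays all-ones (255 -> 2047),
        # and the fraction bits are carried over unchanged.
        dp_exp = 2047
    elif exp == 0 and frac == 0:
        # Zero stays zero.
        dp_exp = 0
    elif exp == 0:
        # Denormal input (assumed handling): normalize by shifting the
        # leading 1 out of the 23-bit fraction and adjusting the exponent.
        shift = 24 - frac.bit_length()
        frac = (frac << shift) & 0x7FFFFF
        dp_exp = (1 - 127 - shift) + 1023
    else:
        # Normal number: re-bias the exponent from 127 to 1023.
        dp_exp = exp - 127 + 1023

    # The 23 fraction bits become the top bits of the 52-bit fraction.
    return (sign << 63) | (dp_exp << 52) | (frac << 29)


print(hex(sp_to_dp_bits(0x7FD55555)))  # NaN with payload: 0x7ffaaaaaa0000000
print(hex(sp_to_dp_bits(0x7F800000)))  # +INF:             0x7ff0000000000000
```

In this model the first branch only rewrites the exponent field from its single-precision all-ones encoding to the double-precision one, so a NaN remains a NaN with the same leading fraction bits and an INF remains an INF.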

nemanjai marked an inline comment as done. Jan 14 2019, 2:10 PM

nemanjai added inline comments.

test/CodeGen/PowerPC/extract-and-store.ll
118	There is of course obviously a very real possibility that I am missing some corner case or am misunderstanding something.

Thanks! Yes, you are right. Sorry, it was my mistake to paste the conversion code without going through it carefully.

test/CodeGen/PowerPC/extract-and-store.ll
118	Ah, yes, you are right. After going through the two algorithms carefully, I agree that Inf/NaNs should be fine. E.g.:
0_11111111_10101010101010101010101
=> (ConvertSPtoDP_NS) 0_11111111111_10101010101010101010101_0_0000_0000_0000_0000_0000_0000_0000
=> (SINGLE) 0_11111111_10101010101010101010101
118	What I meant to discuss is the case that involves denormalization, as you mentioned that `it will produce a 32-bit bit pattern that is equivalent to the original denormal value in the vector`. Here, the 32-bit pattern may be different from what it was without conversion, although it is `equivalent to the original denormal value`. That is what I meant by `both results should be valid`: the bit pattern may be different. And since we have `-flax-vector-conversions`, if some application relies on the bit patterns produced with conversion, it may get a different result without conversion.
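
One way to probe the denormal concern is to round-trip a few denormal bit patterns through a wider format and compare the stored bits. The Python sketch below is a hedged illustration only: it uses the host's IEEE 754 conversions (via the standard `struct` module) as a stand-in for `xscvspdpn` and `stfs`, so it says nothing definitive about Power hardware. The helper name and the sample values are hypothetical.

```python
import struct


def widen_then_narrow(sp_bits: int) -> int:
    """Round-trip a binary32 bit pattern through binary64 and back."""
    # Interpret the 32 bits as a float; Python holds it as a binary64 value
    # (the widening step, analogous to the convert-on-extract).
    (value,) = struct.unpack("<f", struct.pack("<I", sp_bits))
    # Narrow back to binary32 and read the bits that would be stored
    # (analogous to the convert-on-store).
    (out_bits,) = struct.unpack("<I", struct.pack("<f", value))
    return out_bits


# Sample denormal patterns: smallest, a mid-range one, largest, and a negative one.
denormals = [0x00000001, 0x00400000, 0x007FFFFF, 0x80000001]
mismatches = [hex(bits) for bits in denormals if widen_then_narrow(bits) != bits]
print(mismatches)
```

On a host with default IEEE 754 behavior (no flush-to-zero mode), the sampled patterns are expected to come back unchanged, since each nonzero binary32 value has a single encoding. Whether that carries over to every corner case of the two Power instructions is the question the thread is debating, and this sketch does not settle it.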

LGTM, as long as we add the missing BE test. Thanks for exploiting this, and also great patience while discussion.

This revision is now accepted and ready to land. Jan 14 2019, 3:21 PM

In D56175#1356963, @jsji wrote:

LGTM, as long as we add the missing BE test. Thanks for exploiting this, and also great patience while discussion.

Not at all, thank you for the close scrutiny of the changes. I always feel much more at ease committing something after a thorough discussion in the review than something that just gets an automatic approval. Also yup, I'll add the BE test on the commit.

Also, some SPEC numbers with this transformation for motivation:
blender - 6.5%/9.5% improvement on base/peak rate
gcc - 2% improvement on both base/peak rate
deepsjeng - 1.31% improvement on base rate
xz - 15.9% improvement on base

lbm - 1.38% degradation on peak rate, no observable change in base rate

All other changes are below 1% with the majority of those in the improvement category.

Closed by commit rL352131: [PowerPC] Exploit store instructions that store a single vector element (authored by nemanjai). Jan 24 2019, 3:44 PM

This revision was automatically updated to reflect the committed changes.

bzEq added a subscriber: bzEq. Apr 30 2019, 2:28 AM

Herald added a project: Restricted Project. Apr 30 2019, 2:28 AM

bzEq removed a subscriber: bzEq. Apr 30 2019, 2:28 AM

Revision Contents

Path	Size
lib/Target/PowerPC/PPCInstrVSX.td	94 lines
test/CodeGen/PowerPC/extract-and-store.ll	145 lines
test/CodeGen/PowerPC/scalar_vector_test_2.ll	55 lines

Diff 179776

lib/Target/PowerPC/PPCInstrVSX.td

[... 3,301 lines not shown ...]	// of all 64 VSX registers.
(COPY_TO_REGCLASS (XFLOADf64 xaddr:$src), VSRC), 2))>;		(COPY_TO_REGCLASS (XFLOADf64 xaddr:$src), VSRC), 2))>;

def : Pat<(v2f64 (scalar_to_vector (f64 (load ixaddr:$src)))),		def : Pat<(v2f64 (scalar_to_vector (f64 (load ixaddr:$src)))),
(v2f64 (XXPERMDIs		(v2f64 (XXPERMDIs
(COPY_TO_REGCLASS (DFLOADf64 ixaddr:$src), VSRC), 2))>;		(COPY_TO_REGCLASS (DFLOADf64 ixaddr:$src), VSRC), 2))>;
def : Pat<(v2f64 (scalar_to_vector (f64 (load xaddr:$src)))),		def : Pat<(v2f64 (scalar_to_vector (f64 (load xaddr:$src)))),
(v2f64 (XXPERMDIs		(v2f64 (XXPERMDIs
(COPY_TO_REGCLASS (XFLOADf64 xaddr:$src), VSRC), 2))>;		(COPY_TO_REGCLASS (XFLOADf64 xaddr:$src), VSRC), 2))>;
}		def : Pat<(store (i64 (extractelt v2i64:$A, 0)), xaddr:$src),
		(XFSTOREf64 (EXTRACT_SUBREG (XXPERMDI $A, $A, 2),
		sub_64), xaddr:$src)>;
		def : Pat<(store (f64 (extractelt v2f64:$A, 0)), xaddr:$src),
		(XFSTOREf64 (EXTRACT_SUBREG (XXPERMDI $A, $A, 2),
		sub_64), xaddr:$src)>;
		def : Pat<(store (i64 (extractelt v2i64:$A, 1)), xaddr:$src),
		(XFSTOREf64 (EXTRACT_SUBREG $A, sub_64), xaddr:$src)>;
		def : Pat<(store (f64 (extractelt v2f64:$A, 1)), xaddr:$src),
		(XFSTOREf64 (EXTRACT_SUBREG $A, sub_64), xaddr:$src)>;
		def : Pat<(store (i64 (extractelt v2i64:$A, 0)), ixaddr:$src),
		(DFSTOREf64 (EXTRACT_SUBREG (XXPERMDI $A, $A, 2),
		sub_64), ixaddr:$src)>;
		def : Pat<(store (f64 (extractelt v2f64:$A, 0)), ixaddr:$src),
		(DFSTOREf64 (EXTRACT_SUBREG (XXPERMDI $A, $A, 2), sub_64),
		ixaddr:$src)>;
		def : Pat<(store (i64 (extractelt v2i64:$A, 1)), ixaddr:$src),
		(DFSTOREf64 (EXTRACT_SUBREG $A, sub_64), ixaddr:$src)>;
		def : Pat<(store (f64 (extractelt v2f64:$A, 1)), ixaddr:$src),
		(DFSTOREf64 (EXTRACT_SUBREG $A, sub_64), ixaddr:$src)>;
		} // IsLittleEndian, HasP9Vector

let Predicates = [IsBigEndian, HasP9Vector] in {		let Predicates = [IsBigEndian, HasP9Vector] in {
def : Pat<(v2i64 (scalar_to_vector (i64 (load ixaddr:$src)))),		def : Pat<(v2i64 (scalar_to_vector (i64 (load ixaddr:$src)))),
(v2i64 (COPY_TO_REGCLASS (DFLOADf64 ixaddr:$src), VSRC))>;		(v2i64 (COPY_TO_REGCLASS (DFLOADf64 ixaddr:$src), VSRC))>;
def : Pat<(v2i64 (scalar_to_vector (i64 (load xaddr:$src)))),		def : Pat<(v2i64 (scalar_to_vector (i64 (load xaddr:$src)))),
(v2i64 (COPY_TO_REGCLASS (XFLOADf64 xaddr:$src), VSRC))>;		(v2i64 (COPY_TO_REGCLASS (XFLOADf64 xaddr:$src), VSRC))>;

def : Pat<(v2f64 (scalar_to_vector (f64 (load ixaddr:$src)))),		def : Pat<(v2f64 (scalar_to_vector (f64 (load ixaddr:$src)))),
(v2f64 (COPY_TO_REGCLASS (DFLOADf64 ixaddr:$src), VSRC))>;		(v2f64 (COPY_TO_REGCLASS (DFLOADf64 ixaddr:$src), VSRC))>;
def : Pat<(v2f64 (scalar_to_vector (f64 (load xaddr:$src)))),		def : Pat<(v2f64 (scalar_to_vector (f64 (load xaddr:$src)))),
(v2f64 (COPY_TO_REGCLASS (XFLOADf64 xaddr:$src), VSRC))>;		(v2f64 (COPY_TO_REGCLASS (XFLOADf64 xaddr:$src), VSRC))>;
}		def : Pat<(store (i64 (extractelt v2i64:$A, 0)), xaddr:$src),
		(XFSTOREf64 (EXTRACT_SUBREG (XXPERMDI $A, $A, 2),
		sub_64), xaddr:$src)>;
		def : Pat<(store (f64 (extractelt v2f64:$A, 0)), xaddr:$src),
		(XFSTOREf64 (EXTRACT_SUBREG (XXPERMDI $A, $A, 2),
		sub_64), xaddr:$src)>;
		def : Pat<(store (i64 (extractelt v2i64:$A, 1)), xaddr:$src),
		(XFSTOREf64 (EXTRACT_SUBREG $A, sub_64), xaddr:$src)>;
		def : Pat<(store (f64 (extractelt v2f64:$A, 1)), xaddr:$src),
		(XFSTOREf64 (EXTRACT_SUBREG $A, sub_64), xaddr:$src)>;
		def : Pat<(store (i64 (extractelt v2i64:$A, 0)), ixaddr:$src),
		(DFSTOREf64 (EXTRACT_SUBREG (XXPERMDI $A, $A, 2),
		sub_64), ixaddr:$src)>;
		def : Pat<(store (f64 (extractelt v2f64:$A, 0)), ixaddr:$src),
		(DFSTOREf64 (EXTRACT_SUBREG (XXPERMDI $A, $A, 2),
		sub_64), ixaddr:$src)>;
		def : Pat<(store (i64 (extractelt v2i64:$A, 1)), ixaddr:$src),
		(DFSTOREf64 (EXTRACT_SUBREG $A, sub_64), ixaddr:$src)>;
		def : Pat<(store (f64 (extractelt v2f64:$A, 1)), ixaddr:$src),
		(DFSTOREf64 (EXTRACT_SUBREG $A, sub_64), ixaddr:$src)>;
		} // IsBigEndian, HasP9Vector
}		}

let Predicates = [IsBigEndian, HasP9Vector] in {		let Predicates = [IsBigEndian, HasP9Vector] in {

// (Un)Signed DWord vector extract -> QP		// (Un)Signed DWord vector extract -> QP
def : Pat<(f128 (sint_to_fp (i64 (extractelt v2i64:$src, 0)))),		def : Pat<(f128 (sint_to_fp (i64 (extractelt v2i64:$src, 0)))),
(f128 (XSCVSDQP (COPY_TO_REGCLASS $src, VFRC)))>;		(f128 (XSCVSDQP (COPY_TO_REGCLASS $src, VFRC)))>;
def : Pat<(f128 (sint_to_fp (i64 (extractelt v2i64:$src, 1)))),		def : Pat<(f128 (sint_to_fp (i64 (extractelt v2i64:$src, 1)))),
[... 498 lines not shown ...]	let AddedComplexity = 400 in {

let Predicates = [IsBigEndian, HasP8Vector] in {		let Predicates = [IsBigEndian, HasP8Vector] in {
def : Pat<DWToSPExtractConv.BVU,		def : Pat<DWToSPExtractConv.BVU,
(v4f32 (VPKUDUM (XXSLDWI (XVCVUXDSP $S1), (XVCVUXDSP $S1), 3),		(v4f32 (VPKUDUM (XXSLDWI (XVCVUXDSP $S1), (XVCVUXDSP $S1), 3),
(XXSLDWI (XVCVUXDSP $S2), (XVCVUXDSP $S2), 3)))>;		(XXSLDWI (XVCVUXDSP $S2), (XVCVUXDSP $S2), 3)))>;
def : Pat<DWToSPExtractConv.BVS,		def : Pat<DWToSPExtractConv.BVS,
(v4f32 (VPKUDUM (XXSLDWI (XVCVSXDSP $S1), (XVCVSXDSP $S1), 3),		(v4f32 (VPKUDUM (XXSLDWI (XVCVSXDSP $S1), (XVCVSXDSP $S1), 3),
(XXSLDWI (XVCVSXDSP $S2), (XVCVSXDSP $S2), 3)))>;		(XXSLDWI (XVCVSXDSP $S2), (XVCVSXDSP $S2), 3)))>;
		def : Pat<(store (i32 (extractelt v4i32:$A, 1)), xoaddr:$src),
		(STIWX (EXTRACT_SUBREG $A, sub_64), xoaddr:$src)>;
		def : Pat<(store (f32 (extractelt v4f32:$A, 1)), xoaddr:$src),
		(STIWX (EXTRACT_SUBREG $A, sub_64), xoaddr:$src)>;
		foreach Idx = [ [0,3], [2,1], [3,2] ] in {
		def : Pat<(store (i32 (extractelt v4i32:$A, !head(Idx))), xoaddr:$src),
		(STIWX (EXTRACT_SUBREG (XXSLDWI $A, $A, !head(!tail(Idx))),
		sub_64), xoaddr:$src)>;
		def : Pat<(store (f32 (extractelt v4f32:$A, !head(Idx))), xoaddr:$src),
		(STIWX (EXTRACT_SUBREG (XXSLDWI $A, $A, !head(!tail(Idx))),
		sub_64), xoaddr:$src)>;
		}
		}

		let Predicates = [HasP8Vector, IsBigEndian, NoP9Vector] in {
		def : Pat<(store (i64 (extractelt v2i64:$A, 0)), xoaddr:$src),
		(XFSTOREf64 (EXTRACT_SUBREG $A, sub_64), xoaddr:$src)>;
		def : Pat<(store (f64 (extractelt v2f64:$A, 0)), xoaddr:$src),
		(XFSTOREf64 (EXTRACT_SUBREG $A, sub_64), xoaddr:$src)>;
		def : Pat<(store (i64 (extractelt v2i64:$A, 1)), xoaddr:$src),
		(XFSTOREf64 (EXTRACT_SUBREG (XXPERMDI $A, $A, 2), sub_64),
		xoaddr:$src)>;
		def : Pat<(store (f64 (extractelt v2f64:$A, 1)), xoaddr:$src),
		(XFSTOREf64 (EXTRACT_SUBREG (XXPERMDI $A, $A, 2), sub_64),
		xoaddr:$src)>;
}		}

// Big endian, available on all targets with VSX		// Big endian, available on all targets with VSX
let Predicates = [IsBigEndian, HasVSX] in {		let Predicates = [IsBigEndian, HasVSX] in {
def : Pat<(v2f64 (build_vector f64:$A, f64:$B)),		def : Pat<(v2f64 (build_vector f64:$A, f64:$B)),
(v2f64 (XXPERMDI		(v2f64 (XXPERMDI
(COPY_TO_REGCLASS $A, VSRC),		(COPY_TO_REGCLASS $A, VSRC),
(COPY_TO_REGCLASS $B, VSRC), 0))>;		(COPY_TO_REGCLASS $B, VSRC), 0))>;

[... 20 lines not shown ...]	let AddedComplexity = 400 in {

let Predicates = [IsLittleEndian, HasP8Vector] in {		let Predicates = [IsLittleEndian, HasP8Vector] in {
def : Pat<DWToSPExtractConv.BVU,		def : Pat<DWToSPExtractConv.BVU,
(v4f32 (VPKUDUM (XXSLDWI (XVCVUXDSP $S2), (XVCVUXDSP $S2), 3),		(v4f32 (VPKUDUM (XXSLDWI (XVCVUXDSP $S2), (XVCVUXDSP $S2), 3),
(XXSLDWI (XVCVUXDSP $S1), (XVCVUXDSP $S1), 3)))>;		(XXSLDWI (XVCVUXDSP $S1), (XVCVUXDSP $S1), 3)))>;
def : Pat<DWToSPExtractConv.BVS,		def : Pat<DWToSPExtractConv.BVS,
(v4f32 (VPKUDUM (XXSLDWI (XVCVSXDSP $S2), (XVCVSXDSP $S2), 3),		(v4f32 (VPKUDUM (XXSLDWI (XVCVSXDSP $S2), (XVCVSXDSP $S2), 3),
(XXSLDWI (XVCVSXDSP $S1), (XVCVSXDSP $S1), 3)))>;		(XXSLDWI (XVCVSXDSP $S1), (XVCVSXDSP $S1), 3)))>;
		def : Pat<(store (i32 (extractelt v4i32:$A, 2)), xoaddr:$src),
		(STIWX (EXTRACT_SUBREG $A, sub_64), xoaddr:$src)>;
		def : Pat<(store (f32 (extractelt v4f32:$A, 2)), xoaddr:$src),
		(STIWX (EXTRACT_SUBREG $A, sub_64), xoaddr:$src)>;
		foreach Idx = [ [0,2], [1,1], [3,3] ] in {
		def : Pat<(store (i32 (extractelt v4i32:$A, !head(Idx))), xoaddr:$src),
		(STIWX (EXTRACT_SUBREG (XXSLDWI $A, $A, !head(!tail(Idx))),
		sub_64), xoaddr:$src)>;
		def : Pat<(store (f32 (extractelt v4f32:$A, !head(Idx))), xoaddr:$src),
		(STIWX (EXTRACT_SUBREG (XXSLDWI $A, $A, !head(!tail(Idx))),
		sub_64), xoaddr:$src)>;
		}
		}

		let Predicates = [HasP8Vector, IsLittleEndian, NoP9Vector] in {
		def : Pat<(store (i64 (extractelt v2i64:$A, 0)), xoaddr:$src),
		(XFSTOREf64 (EXTRACT_SUBREG (XXPERMDI $A, $A, 2), sub_64),
		xoaddr:$src)>;
		def : Pat<(store (f64 (extractelt v2f64:$A, 0)), xoaddr:$src),
		(XFSTOREf64 (EXTRACT_SUBREG (XXPERMDI $A, $A, 2), sub_64),
		xoaddr:$src)>;
		def : Pat<(store (i64 (extractelt v2i64:$A, 1)), xoaddr:$src),
		(XFSTOREf64 (EXTRACT_SUBREG $A, sub_64), xoaddr:$src)>;
		def : Pat<(store (f64 (extractelt v2f64:$A, 1)), xoaddr:$src),
		(XFSTOREf64 (EXTRACT_SUBREG $A, sub_64), xoaddr:$src)>;
}		}

let Predicates = [IsLittleEndian, HasVSX] in {		let Predicates = [IsLittleEndian, HasVSX] in {
// Little endian, available on all targets with VSX		// Little endian, available on all targets with VSX
def : Pat<(v2f64 (build_vector f64:$A, f64:$B)),		def : Pat<(v2f64 (build_vector f64:$A, f64:$B)),
(v2f64 (XXPERMDI		(v2f64 (XXPERMDI
(COPY_TO_REGCLASS $B, VSRC),		(COPY_TO_REGCLASS $B, VSRC),
(COPY_TO_REGCLASS $A, VSRC), 0))>;		(COPY_TO_REGCLASS $A, VSRC), 0))>;

[... 191 lines not shown ...]

test/CodeGen/PowerPC/extract-and-store.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mcpu=pwr8 -mtriple=powerpc64le-unkknown-unknown \		; RUN: llc -mcpu=pwr8 -mtriple=powerpc64le-unkknown-unknown \
; RUN: -ppc-asm-full-reg-names -verify-machineinstrs -O2 < %s | FileCheck %s		; RUN: -ppc-asm-full-reg-names -verify-machineinstrs -O2 < %s | FileCheck %s
; RUN: llc -mcpu=pwr8 -mtriple=powerpc64-unkknown-unknown \		; RUN: llc -mcpu=pwr8 -mtriple=powerpc64-unkknown-unknown \
; RUN: -ppc-asm-full-reg-names -verify-machineinstrs -O2 < %s | FileCheck %s \		; RUN: -ppc-asm-full-reg-names -verify-machineinstrs -O2 < %s | FileCheck %s \
; RUN: --check-prefix=CHECK-BE		; RUN: --check-prefix=CHECK-BE
; RUN: llc -mcpu=pwr9 -mtriple=powerpc64le-unkknown-unknown \		; RUN: llc -mcpu=pwr9 -mtriple=powerpc64le-unkknown-unknown \
; RUN: -ppc-asm-full-reg-names -verify-machineinstrs -O2 < %s | FileCheck %s \		; RUN: -ppc-asm-full-reg-names -verify-machineinstrs -O2 < %s | FileCheck %s \
; RUN: --check-prefix=CHECK-P9		; RUN: --check-prefix=CHECK-P9
; Function Attrs: norecurse nounwind writeonly		; Function Attrs: norecurse nounwind writeonly
define <2 x i64> @testll0(<2 x i64> returned %a, <2 x i64> %b, i64* nocapture %ap) local_unnamed_addr #0 {		define <2 x i64> @testll0(<2 x i64> returned %a, <2 x i64> %b, i64* nocapture %ap) local_unnamed_addr #0 {
; CHECK-LABEL: testll0:		; CHECK-LABEL: testll0:
; CHECK: # %bb.0: # %entry		; CHECK: # %bb.0: # %entry
; CHECK-NEXT: xxswapd vs0, vs34		; CHECK-NEXT: xxswapd vs0, vs34
; CHECK-NEXT: mfvsrd r3, f0		; CHECK-NEXT: stfd f0, 24(r7)
; CHECK-NEXT: std r3, 24(r7)
; CHECK-NEXT: blr		; CHECK-NEXT: blr
;		;
; CHECK-BE-LABEL: testll0:		; CHECK-BE-LABEL: testll0:
; CHECK-BE: # %bb.0: # %entry		; CHECK-BE: # %bb.0: # %entry
; CHECK-BE-NEXT: mfvsrd r3, vs34		; CHECK-BE-NEXT: addi r3, r7, 24
; CHECK-BE-NEXT: std r3, 24(r7)		; CHECK-BE-NEXT: stxsdx vs34, 0, r3
; CHECK-BE-NEXT: blr		; CHECK-BE-NEXT: blr
;		;
; CHECK-P9-LABEL: testll0:		; CHECK-P9-LABEL: testll0:
; CHECK-P9: # %bb.0: # %entry		; CHECK-P9: # %bb.0: # %entry
; CHECK-P9-NEXT: mfvsrld r3, vs34		; CHECK-P9-NEXT: xxswapd vs0, vs34
; CHECK-P9-NEXT: std r3, 24(r7)		; CHECK-P9-NEXT: stfd f0, 24(r7)
; CHECK-P9-NEXT: blr		; CHECK-P9-NEXT: blr
entry:		entry:
%vecext = extractelement <2 x i64> %a, i32 0		%vecext = extractelement <2 x i64> %a, i32 0
%arrayidx = getelementptr inbounds i64, i64* %ap, i64 3		%arrayidx = getelementptr inbounds i64, i64* %ap, i64 3
store i64 %vecext, i64* %arrayidx, align 8		store i64 %vecext, i64* %arrayidx, align 8
ret <2 x i64> %a		ret <2 x i64> %a
}		}

; Function Attrs: norecurse nounwind writeonly		; Function Attrs: norecurse nounwind writeonly
define <2 x i64> @testll1(<2 x i64> returned %a, i64 %b, i64* nocapture %ap) local_unnamed_addr #0 {		define <2 x i64> @testll1(<2 x i64> returned %a, i64 %b, i64* nocapture %ap) local_unnamed_addr #0 {
; CHECK-LABEL: testll1:		; CHECK-LABEL: testll1:
; CHECK: # %bb.0: # %entry		; CHECK: # %bb.0: # %entry
; CHECK-NEXT: mfvsrd r3, vs34		; CHECK-NEXT: addi r3, r6, 24
; CHECK-NEXT: std r3, 24(r6)		; CHECK-NEXT: stxsdx vs34, 0, r3
; CHECK-NEXT: blr		; CHECK-NEXT: blr
;		;
; CHECK-BE-LABEL: testll1:		; CHECK-BE-LABEL: testll1:
; CHECK-BE: # %bb.0: # %entry		; CHECK-BE: # %bb.0: # %entry
; CHECK-BE-NEXT: xxswapd vs0, vs34		; CHECK-BE-NEXT: xxswapd vs0, vs34
; CHECK-BE-NEXT: mfvsrd r3, f0		; CHECK-BE-NEXT: stfd f0, 24(r6)
; CHECK-BE-NEXT: std r3, 24(r6)
; CHECK-BE-NEXT: blr		; CHECK-BE-NEXT: blr
;		;
; CHECK-P9-LABEL: testll1:		; CHECK-P9-LABEL: testll1:
; CHECK-P9: # %bb.0: # %entry		; CHECK-P9: # %bb.0: # %entry
; CHECK-P9-NEXT: mfvsrd r3, vs34		; CHECK-P9-NEXT: stxsd v2, 24(r6)
; CHECK-P9-NEXT: std r3, 24(r6)
; CHECK-P9-NEXT: blr		; CHECK-P9-NEXT: blr
entry:		entry:
%vecext = extractelement <2 x i64> %a, i32 1		%vecext = extractelement <2 x i64> %a, i32 1
%arrayidx = getelementptr inbounds i64, i64* %ap, i64 3		%arrayidx = getelementptr inbounds i64, i64* %ap, i64 3
store i64 %vecext, i64* %arrayidx, align 8		store i64 %vecext, i64* %arrayidx, align 8
ret <2 x i64> %a		ret <2 x i64> %a
}		}

[... 47 lines not shown ...]	entry:
store double %vecext, double* %arrayidx, align 8		store double %vecext, double* %arrayidx, align 8
ret <2 x double> %a		ret <2 x double> %a
}		}

; Function Attrs: norecurse nounwind writeonly		; Function Attrs: norecurse nounwind writeonly
define <4 x float> @testf0(<4 x float> returned %a, <4 x float> %b, float* nocapture %ap) local_unnamed_addr #0 {		define <4 x float> @testf0(<4 x float> returned %a, <4 x float> %b, float* nocapture %ap) local_unnamed_addr #0 {
; CHECK-LABEL: testf0:		; CHECK-LABEL: testf0:
; CHECK: # %bb.0: # %entry		; CHECK: # %bb.0: # %entry
; CHECK-NEXT: xxsldwi vs0, vs34, vs34, 3		; CHECK-NEXT: xxsldwi vs0, vs34, vs34, 2
; CHECK-NEXT: xscvspdpn f0, vs0		; CHECK-NEXT: addi r3, r7, 12
; CHECK-NEXT: stfs f0, 12(r7)		; CHECK-NEXT: stfiwx f0, 0, r3
		nemanjaiAuthorUnsubmitted Done Reply Inline Actions One of the unanswered comments from the original patch was along the lines of: "The `stfs` instruction performs a conversion from 8-byte single precision (that PPC uses for single precision representation in registers) to 4-byte single precision (the in-memory single precision representation). The updated code no longer involves such a conversion, is that semantically correct?" The short answer is yes. The reason we don't need this conversion is that single precision vector elements are represented the same way in registers and memory. If you inspect the original code sequence carefully, you'll see that it does the following `xxsldwi` to line up the element into the correct location in the register `xscvspdpn` to convert vector single precision to scalar single precision `stfs` to implicitly convert the value back and store it The new sequence just skips all conversion and stores the single-precision vector element as a 4-byte single-precision value. nemanjai: One of the unanswered comments from the original patch was along the lines of: "The `stfs`…
		jsjiUnsubmitted Not Done Reply Inline Actions Thanks for checking! yes, you are right, the new sequence should work the same way at the old sequences for normal floating point values. But I am more concern with abnormal values. Since conversions will change abnormal values, so I think we may get different results for out of range values eg: NaN, Inf, Denormals if we remove those conversions. Of course we may argue that the new behavior without conversion maybe be better? So I still think the semantic is actually different, but it is a good idea to remove the conversions if we can accept the risk of changing some floating point application's behavior. jsji: Thanks for checking! yes, you are right, the new sequence should work the same way at the old…
		jsjiUnsubmitted Not Done Reply Inline Actions Also, looks like some typos in above comments, so it looks confusing to me. My understandings from ISA are: Power ISA defined two data formats for floating point: 32-bit (4-byte) single format: 64-bit (8-byte) double format: Each FPR contains 64 bits that support the floating-point double format. Every instruction that interprets the contents of an FPR as a floating-point value uses the floating-point double format for this interpretation. Vector Floating-Point (VMX) instruction interprets the contents of a Vector Register (VR) as a sequence of equal-length elements, each element use the floating-point single data format. Vector-Scalar Floating-Point (VSX) instruction interprets the contents of a Vector Register (VR) as a sequence of equal-length elements, each element use the floating-point single data format (4 bytes) or floating-point double format (8 bytes) depending on the length. Single Precision Floating-Point takes operands from the FPRs in double format, performs the operation, and then coerces this intermediate result to fit in single format. Status bits, in the FPSCR and optionally in the Condition Register, are set to reflect the single-precision result. The result is then converted to double format and placed into an FPR. `xscvspdpn` is used to convert the element from single data format to double format, without setting FPSCR etc. `stfs` will converting x in FPR from floating-point double format to floating-point single format, then store. `stfiwx` will store the 32-63 bits of FPR to memory directly, without conversion. jsji: Also, looks like some typos in above comments, so it looks confusing to me. My understandings…
		nemanjaiAuthorUnsubmitted Done Reply Inline Actions I'm not really sure what typos you're referring to, but we can think of it as there being two different single precision representations (i.e. bit patterns). The "in-scalar-register" representation is 64-bits wide and is equivalent to the double precision bit pattern for the same value. The in-memory representation is 32-bits wide and conforms to IEEE specifications for binary 32-bit floating point. Vectors of single precision values use the latter for each element. Each instruction that operates on scalar floating point registers and "produces a single precision result" actually produces a double precision result that has equivalent precision to the single precision value that would be produced by performing that operation on its inputs. Now for the issue at hand (i.e. convert-to-extract-then-convert-to-store vs. store-with-no-conversion): The current implementation does the conversions The implementation in this patch does not do the conversions, just store The first conversion (`xscvspdpn`) will perform normalization, NaNs and INFs just produce double precision versions of the same The second conversion (`stfs`) will perform denormalization (NaNs, INFs and zero remain) Now the memory contains an unmodified or denormalized version of the vector element If it's unmodified (the conversion simply extracts the correct number of bits from exponent/mantissa), nothing to discuss here If it's denormalized, it will produce a 32-bit bit pattern that is equivalent to the original denormal value in the vector So unless I am missing something subtle in the conversion that `stfs` does, I believe that this change does not modify the semantics. nemanjai: I'm not really sure what typos you're referring to, but we can think of it as there being two…
jsji: The typo I was referring to is the "8 byte single precision", as I don't think there is any "8 byte single precision"... There are only two data formats: the 4-byte single format and the 8-byte double format. Anyway, if this helps your understanding, it is fine to "think of it as there being two different single precision".

I think our key divergence here is whether convert-convert will change the semantics for NaNs/INFs. From my point of view, the conversions might change the values. E.g. `ConvertSPtoDP_NS`, used by `xscvspdpn`:

    if (x.bit[1:8] == 255) then do
      exponent <- 2047
    end
    else if (x.bit[1:8] == 0) && (fraction == 0) then do
      exponent <- 0
    end

The exponent might be overridden. So `convert-to-extract-then-convert-to-store` vs. `store-with-no-conversion` might give different results. But I also agree that both results should be valid. It is just that we may want to be aware of, and accept, the risk of changing the behavior of some floating point applications.
nemanjai (Author): But that's precisely what I'm saying is not the case. Setting the exponent to 2047 is precisely the "keeping NaNs NaNs and INFs INFs" that I was referring to. The `if` in the RTL you have pasted accomplishes that very thing. The `else if` just says that a zero stays a zero. A NaN has the exponent set to all 1's. All 1's in single precision is `255` and in double precision, it's `2047`. INF has the same property, with the additional property that the fraction bits are all zeros. https://en.wikipedia.org/wiki/IEEE_754-1985#NaN
nemanjai (Author): There is, of course, a very real possibility that I am missing some corner case or am misunderstanding something.
jsji: Ah, yes, you are right. After going through the two algorithms carefully, I agree that Inf/NaNs should be fine. E.g.:

    0_11111111_10101010101010101010101
    => (ConvertSPtoDP_NS) 0_11111111111_10101010101010101010101_0_0000_0000_0000_0000_0000_0000_0000
    => (SINGLE) 0_11111111_10101010101010101010101
jsji: What I meant to discuss is the case that involves denormalization, as you mentioned that `it will produce a 32-bit bit pattern that is equivalent to the original denormal value in the vector`. Here, the 32-bit pattern may be different from what it was without conversion, although it is `equivalent to the original denormal value`. That is what I meant by `both results should be valid`, but the bit pattern may be different. And since we have `-flax-vector-conversions`, if some application relies on the bit patterns produced with conversion, it may get a different result without conversion.
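To make the round trip discussed in this thread concrete, here is a minimal Python sketch that models the two conversions on raw bit patterns. It is an illustration under stated assumptions, not the ISA text: the function names `sp_to_dp` and `dp_to_sp_store` are hypothetical, and the denormal and normal-number branches are reconstructed from the description in this thread rather than copied from the ISA pseudocode.

```python
import struct

def sp_to_dp(w):
    """Model of the xscvspdpn widening: 32-bit single pattern -> 64-bit double pattern."""
    sign, exp, frac = (w >> 31) & 1, (w >> 23) & 0xFF, w & 0x7FFFFF
    if exp == 255:                  # Inf/NaN: exponent stays all ones (2047)
        dexp = 2047
    elif exp == 0 and frac == 0:    # zero stays zero
        dexp = 0
    elif exp == 0:                  # denormal: normalize into the wider double range
        dexp = 897                  # -126 + 1023
        while not frac & 0x800000:  # shift until the leading 1 becomes the implicit bit
            frac <<= 1
            dexp -= 1
        frac &= 0x7FFFFF            # drop the now-implicit leading 1
    else:                           # normal: rebias the exponent (-127 + 1023)
        dexp = exp + 896
    return (sign << 63) | (dexp << 52) | (frac << 29)

def dp_to_sp_store(d):
    """Model of the narrowing that stfs performs before storing 4 bytes."""
    sign, dexp = (d >> 63) & 1, (d >> 52) & 0x7FF
    if dexp > 896 or (d & 0x7FFFFFFFFFFFFFFF) == 0:
        # No denormalization: keep sign and exponent MSB, then the next 30 bits.
        return (((d >> 62) & 0x3) << 30) | ((d >> 29) & 0x3FFFFFFF)
    if 874 <= dexp <= 896:
        # Denormalize: shift the significand right until the exponent reaches -126.
        frac = (1 << 52) | (d & ((1 << 52) - 1))
        frac >>= 897 - dexp
        return (sign << 31) | ((frac >> 29) & 0x7FFFFF)
    raise ValueError("value too small for single format; stored result is undefined")

# Zeros, denormals, normals, infinities and NaNs (including the example payload above).
SAMPLES = [0x00000000, 0x80000000, 0x00000001, 0x80000001, 0x007FFFFF,
           0x00800000, 0x3F800000, 0x7F7FFFFF, 0x7F800000, 0xFF800000,
           0x7FC00000, 0x7FD55555, 0x7F800001]

round_trip_ok = all(dp_to_sp_store(sp_to_dp(w)) == w for w in SAMPLES)
```

In this model every sampled pattern, denormals and NaN payloads included, comes back unchanged after widening and narrowing. That only covers values that entered the FPR through the widening conversion; it says nothing about double-format values produced some other way.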
; CHECK-NEXT: blr		; CHECK-NEXT: blr
;		;
; CHECK-BE-LABEL: testf0:		; CHECK-BE-LABEL: testf0:
; CHECK-BE: # %bb.0: # %entry		; CHECK-BE: # %bb.0: # %entry
; CHECK-BE-NEXT: xscvspdpn f0, vs34		; CHECK-BE-NEXT: xxsldwi vs0, vs34, vs34, 3
; CHECK-BE-NEXT: stfs f0, 12(r7)		; CHECK-BE-NEXT: addi r3, r7, 12
		; CHECK-BE-NEXT: stfiwx f0, 0, r3
; CHECK-BE-NEXT: blr		; CHECK-BE-NEXT: blr
;		;
; CHECK-P9-LABEL: testf0:		; CHECK-P9-LABEL: testf0:
; CHECK-P9: # %bb.0: # %entry		; CHECK-P9: # %bb.0: # %entry
; CHECK-P9-NEXT: xxsldwi vs0, vs34, vs34, 3		; CHECK-P9-NEXT: xxsldwi vs0, vs34, vs34, 2
; CHECK-P9-NEXT: xscvspdpn f0, vs0		; CHECK-P9-NEXT: addi r3, r7, 12
; CHECK-P9-NEXT: stfs f0, 12(r7)		; CHECK-P9-NEXT: stfiwx f0, 0, r3
; CHECK-P9-NEXT: blr		; CHECK-P9-NEXT: blr
entry:		entry:
%vecext = extractelement <4 x float> %a, i32 0		%vecext = extractelement <4 x float> %a, i32 0
%arrayidx = getelementptr inbounds float, float* %ap, i64 3		%arrayidx = getelementptr inbounds float, float* %ap, i64 3
store float %vecext, float* %arrayidx, align 4		store float %vecext, float* %arrayidx, align 4
ret <4 x float> %a		ret <4 x float> %a
}		}

; Function Attrs: norecurse nounwind writeonly		; Function Attrs: norecurse nounwind writeonly
define <4 x float> @testf1(<4 x float> returned %a, <4 x float> %b, float* nocapture %ap) local_unnamed_addr #0 {		define <4 x float> @testf1(<4 x float> returned %a, <4 x float> %b, float* nocapture %ap) local_unnamed_addr #0 {
; CHECK-LABEL: testf1:		; CHECK-LABEL: testf1:
; CHECK: # %bb.0: # %entry		; CHECK: # %bb.0: # %entry
; CHECK-NEXT: xxswapd vs0, vs34		; CHECK-NEXT: xxsldwi vs0, vs34, vs34, 1
; CHECK-NEXT: xscvspdpn f0, vs0		; CHECK-NEXT: addi r3, r7, 12
; CHECK-NEXT: stfs f0, 12(r7)		; CHECK-NEXT: stfiwx f0, 0, r3
; CHECK-NEXT: blr		; CHECK-NEXT: blr
;		;
; CHECK-BE-LABEL: testf1:		; CHECK-BE-LABEL: testf1:
; CHECK-BE: # %bb.0: # %entry		; CHECK-BE: # %bb.0: # %entry
; CHECK-BE-NEXT: xxsldwi vs0, vs34, vs34, 1		; CHECK-BE-NEXT: addi r3, r7, 12
; CHECK-BE-NEXT: xscvspdpn f0, vs0		; CHECK-BE-NEXT: stxsiwx vs34, 0, r3
; CHECK-BE-NEXT: stfs f0, 12(r7)
; CHECK-BE-NEXT: blr		; CHECK-BE-NEXT: blr
;		;
; CHECK-P9-LABEL: testf1:		; CHECK-P9-LABEL: testf1:
; CHECK-P9: # %bb.0: # %entry		; CHECK-P9: # %bb.0: # %entry
; CHECK-P9-NEXT: xxswapd vs0, vs34		; CHECK-P9-NEXT: xxsldwi vs0, vs34, vs34, 1
; CHECK-P9-NEXT: xscvspdpn f0, vs0		; CHECK-P9-NEXT: addi r3, r7, 12
; CHECK-P9-NEXT: stfs f0, 12(r7)		; CHECK-P9-NEXT: stfiwx f0, 0, r3
; CHECK-P9-NEXT: blr		; CHECK-P9-NEXT: blr
entry:		entry:
%vecext = extractelement <4 x float> %a, i32 1		%vecext = extractelement <4 x float> %a, i32 1
%arrayidx = getelementptr inbounds float, float* %ap, i64 3		%arrayidx = getelementptr inbounds float, float* %ap, i64 3
store float %vecext, float* %arrayidx, align 4		store float %vecext, float* %arrayidx, align 4
ret <4 x float> %a		ret <4 x float> %a
}		}

; Function Attrs: norecurse nounwind writeonly		; Function Attrs: norecurse nounwind writeonly
define <4 x float> @testf2(<4 x float> returned %a, <4 x float> %b, float* nocapture %ap) local_unnamed_addr #0 {		define <4 x float> @testf2(<4 x float> returned %a, <4 x float> %b, float* nocapture %ap) local_unnamed_addr #0 {
; CHECK-LABEL: testf2:		; CHECK-LABEL: testf2:
; CHECK: # %bb.0: # %entry		; CHECK: # %bb.0: # %entry
; CHECK-NEXT: xxsldwi vs0, vs34, vs34, 1		; CHECK-NEXT: addi r3, r7, 12
; CHECK-NEXT: xscvspdpn f0, vs0		; CHECK-NEXT: stxsiwx vs34, 0, r3
; CHECK-NEXT: stfs f0, 12(r7)
; CHECK-NEXT: blr		; CHECK-NEXT: blr
;		;
; CHECK-BE-LABEL: testf2:		; CHECK-BE-LABEL: testf2:
; CHECK-BE: # %bb.0: # %entry		; CHECK-BE: # %bb.0: # %entry
; CHECK-BE-NEXT: xxswapd vs0, vs34		; CHECK-BE-NEXT: xxsldwi vs0, vs34, vs34, 1
; CHECK-BE-NEXT: xscvspdpn f0, vs0		; CHECK-BE-NEXT: addi r3, r7, 12
; CHECK-BE-NEXT: stfs f0, 12(r7)		; CHECK-BE-NEXT: stfiwx f0, 0, r3
; CHECK-BE-NEXT: blr		; CHECK-BE-NEXT: blr
;		;
; CHECK-P9-LABEL: testf2:		; CHECK-P9-LABEL: testf2:
; CHECK-P9: # %bb.0: # %entry		; CHECK-P9: # %bb.0: # %entry
; CHECK-P9-NEXT: xxsldwi vs0, vs34, vs34, 1		; CHECK-P9-NEXT: addi r3, r7, 12
; CHECK-P9-NEXT: xscvspdpn f0, vs0		; CHECK-P9-NEXT: stxsiwx vs34, 0, r3
; CHECK-P9-NEXT: stfs f0, 12(r7)
; CHECK-P9-NEXT: blr		; CHECK-P9-NEXT: blr
entry:		entry:
%vecext = extractelement <4 x float> %a, i32 2		%vecext = extractelement <4 x float> %a, i32 2
%arrayidx = getelementptr inbounds float, float* %ap, i64 3		%arrayidx = getelementptr inbounds float, float* %ap, i64 3
store float %vecext, float* %arrayidx, align 4		store float %vecext, float* %arrayidx, align 4
ret <4 x float> %a		ret <4 x float> %a
}		}

; Function Attrs: norecurse nounwind writeonly		; Function Attrs: norecurse nounwind writeonly
define <4 x float> @testf3(<4 x float> returned %a, <4 x float> %b, float* nocapture %ap) local_unnamed_addr #0 {		define <4 x float> @testf3(<4 x float> returned %a, <4 x float> %b, float* nocapture %ap) local_unnamed_addr #0 {
; CHECK-LABEL: testf3:		; CHECK-LABEL: testf3:
; CHECK: # %bb.0: # %entry		; CHECK: # %bb.0: # %entry
; CHECK-NEXT: xscvspdpn f0, vs34		; CHECK-NEXT: xxsldwi vs0, vs34, vs34, 3
; CHECK-NEXT: stfs f0, 12(r7)		; CHECK-NEXT: addi r3, r7, 12
		; CHECK-NEXT: stfiwx f0, 0, r3
; CHECK-NEXT: blr		; CHECK-NEXT: blr
;		;
; CHECK-BE-LABEL: testf3:		; CHECK-BE-LABEL: testf3:
; CHECK-BE: # %bb.0: # %entry		; CHECK-BE: # %bb.0: # %entry
; CHECK-BE-NEXT: xxsldwi vs0, vs34, vs34, 3		; CHECK-BE-NEXT: xxsldwi vs0, vs34, vs34, 2
; CHECK-BE-NEXT: xscvspdpn f0, vs0		; CHECK-BE-NEXT: addi r3, r7, 12
; CHECK-BE-NEXT: stfs f0, 12(r7)		; CHECK-BE-NEXT: stfiwx f0, 0, r3
; CHECK-BE-NEXT: blr		; CHECK-BE-NEXT: blr
;		;
; CHECK-P9-LABEL: testf3:		; CHECK-P9-LABEL: testf3:
; CHECK-P9: # %bb.0: # %entry		; CHECK-P9: # %bb.0: # %entry
; CHECK-P9-NEXT: xscvspdpn f0, vs34		; CHECK-P9-NEXT: xxsldwi vs0, vs34, vs34, 3
; CHECK-P9-NEXT: stfs f0, 12(r7)		; CHECK-P9-NEXT: addi r3, r7, 12
		; CHECK-P9-NEXT: stfiwx f0, 0, r3
; CHECK-P9-NEXT: blr		; CHECK-P9-NEXT: blr
entry:		entry:
%vecext = extractelement <4 x float> %a, i32 3		%vecext = extractelement <4 x float> %a, i32 3
%arrayidx = getelementptr inbounds float, float* %ap, i64 3		%arrayidx = getelementptr inbounds float, float* %ap, i64 3
store float %vecext, float* %arrayidx, align 4		store float %vecext, float* %arrayidx, align 4
ret <4 x float> %a		ret <4 x float> %a
}		}

; Function Attrs: norecurse nounwind writeonly		; Function Attrs: norecurse nounwind writeonly
define <4 x i32> @testi0(<4 x i32> returned %a, <4 x i32> %b, i32* nocapture %ap) local_unnamed_addr #0 {		define <4 x i32> @testi0(<4 x i32> returned %a, <4 x i32> %b, i32* nocapture %ap) local_unnamed_addr #0 {
; CHECK-LABEL: testi0:		; CHECK-LABEL: testi0:
; CHECK: # %bb.0: # %entry		; CHECK: # %bb.0: # %entry
; CHECK-NEXT: xxswapd vs0, vs34		; CHECK-NEXT: xxsldwi vs0, vs34, vs34, 2
; CHECK-NEXT: mfvsrwz r3, f0		; CHECK-NEXT: addi r3, r7, 12
; CHECK-NEXT: stw r3, 12(r7)		; CHECK-NEXT: stfiwx f0, 0, r3
; CHECK-NEXT: blr		; CHECK-NEXT: blr
;		;
; CHECK-BE-LABEL: testi0:		; CHECK-BE-LABEL: testi0:
; CHECK-BE: # %bb.0: # %entry		; CHECK-BE: # %bb.0: # %entry
; CHECK-BE-NEXT: xxsldwi vs0, vs34, vs34, 3		; CHECK-BE-NEXT: xxsldwi vs0, vs34, vs34, 3
; CHECK-BE-NEXT: mfvsrwz r3, f0		; CHECK-BE-NEXT: addi r3, r7, 12
; CHECK-BE-NEXT: stw r3, 12(r7)		; CHECK-BE-NEXT: stfiwx f0, 0, r3
; CHECK-BE-NEXT: blr		; CHECK-BE-NEXT: blr
;		;
; CHECK-P9-LABEL: testi0:		; CHECK-P9-LABEL: testi0:
; CHECK-P9: # %bb.0: # %entry		; CHECK-P9: # %bb.0: # %entry
; CHECK-P9-NEXT: li r3, 0		; CHECK-P9-NEXT: xxsldwi vs0, vs34, vs34, 2
; CHECK-P9-NEXT: vextuwrx r3, r3, v2		; CHECK-P9-NEXT: addi r3, r7, 12
; CHECK-P9-NEXT: stw r3, 12(r7)		; CHECK-P9-NEXT: stfiwx f0, 0, r3
; CHECK-P9-NEXT: blr		; CHECK-P9-NEXT: blr
entry:		entry:
%vecext = extractelement <4 x i32> %a, i32 0		%vecext = extractelement <4 x i32> %a, i32 0
%arrayidx = getelementptr inbounds i32, i32* %ap, i64 3		%arrayidx = getelementptr inbounds i32, i32* %ap, i64 3
store i32 %vecext, i32* %arrayidx, align 4		store i32 %vecext, i32* %arrayidx, align 4
ret <4 x i32> %a		ret <4 x i32> %a
}		}

; Function Attrs: norecurse nounwind writeonly		; Function Attrs: norecurse nounwind writeonly
define <4 x i32> @testi1(<4 x i32> returned %a, <4 x i32> %b, i32* nocapture %ap) local_unnamed_addr #0 {		define <4 x i32> @testi1(<4 x i32> returned %a, <4 x i32> %b, i32* nocapture %ap) local_unnamed_addr #0 {
; CHECK-LABEL: testi1:		; CHECK-LABEL: testi1:
; CHECK: # %bb.0: # %entry		; CHECK: # %bb.0: # %entry
; CHECK-NEXT: xxsldwi vs0, vs34, vs34, 1		; CHECK-NEXT: xxsldwi vs0, vs34, vs34, 1
; CHECK-NEXT: mfvsrwz r3, f0		; CHECK-NEXT: addi r3, r7, 12
; CHECK-NEXT: stw r3, 12(r7)		; CHECK-NEXT: stfiwx f0, 0, r3
; CHECK-NEXT: blr		; CHECK-NEXT: blr
;		;
; CHECK-BE-LABEL: testi1:		; CHECK-BE-LABEL: testi1:
; CHECK-BE: # %bb.0: # %entry		; CHECK-BE: # %bb.0: # %entry
; CHECK-BE-NEXT: mfvsrwz r3, vs34		; CHECK-BE-NEXT: addi r3, r7, 12
; CHECK-BE-NEXT: stw r3, 12(r7)		; CHECK-BE-NEXT: stxsiwx vs34, 0, r3
; CHECK-BE-NEXT: blr		; CHECK-BE-NEXT: blr
;		;
; CHECK-P9-LABEL: testi1:		; CHECK-P9-LABEL: testi1:
; CHECK-P9: # %bb.0: # %entry		; CHECK-P9: # %bb.0: # %entry
; CHECK-P9-NEXT: li r3, 4		; CHECK-P9-NEXT: xxsldwi vs0, vs34, vs34, 1
; CHECK-P9-NEXT: vextuwrx r3, r3, v2		; CHECK-P9-NEXT: addi r3, r7, 12
; CHECK-P9-NEXT: stw r3, 12(r7)		; CHECK-P9-NEXT: stfiwx f0, 0, r3
; CHECK-P9-NEXT: blr		; CHECK-P9-NEXT: blr
entry:		entry:
%vecext = extractelement <4 x i32> %a, i32 1		%vecext = extractelement <4 x i32> %a, i32 1
%arrayidx = getelementptr inbounds i32, i32* %ap, i64 3		%arrayidx = getelementptr inbounds i32, i32* %ap, i64 3
store i32 %vecext, i32* %arrayidx, align 4		store i32 %vecext, i32* %arrayidx, align 4
ret <4 x i32> %a		ret <4 x i32> %a
}		}

; Function Attrs: norecurse nounwind writeonly		; Function Attrs: norecurse nounwind writeonly
define <4 x i32> @testi2(<4 x i32> returned %a, <4 x i32> %b, i32* nocapture %ap) local_unnamed_addr #0 {		define <4 x i32> @testi2(<4 x i32> returned %a, <4 x i32> %b, i32* nocapture %ap) local_unnamed_addr #0 {
; CHECK-LABEL: testi2:		; CHECK-LABEL: testi2:
; CHECK: # %bb.0: # %entry		; CHECK: # %bb.0: # %entry
; CHECK-NEXT: mfvsrwz r3, vs34		; CHECK-NEXT: addi r3, r7, 12
; CHECK-NEXT: stw r3, 12(r7)		; CHECK-NEXT: stxsiwx vs34, 0, r3
; CHECK-NEXT: blr		; CHECK-NEXT: blr
;		;
; CHECK-BE-LABEL: testi2:		; CHECK-BE-LABEL: testi2:
; CHECK-BE: # %bb.0: # %entry		; CHECK-BE: # %bb.0: # %entry
; CHECK-BE-NEXT: xxsldwi vs0, vs34, vs34, 1		; CHECK-BE-NEXT: xxsldwi vs0, vs34, vs34, 1
; CHECK-BE-NEXT: mfvsrwz r3, f0		; CHECK-BE-NEXT: addi r3, r7, 12
; CHECK-BE-NEXT: stw r3, 12(r7)		; CHECK-BE-NEXT: stfiwx f0, 0, r3
; CHECK-BE-NEXT: blr		; CHECK-BE-NEXT: blr
;		;
; CHECK-P9-LABEL: testi2:		; CHECK-P9-LABEL: testi2:
; CHECK-P9: # %bb.0: # %entry		; CHECK-P9: # %bb.0: # %entry
; CHECK-P9-NEXT: mfvsrwz r3, vs34		; CHECK-P9-NEXT: addi r3, r7, 12
; CHECK-P9-NEXT: stw r3, 12(r7)		; CHECK-P9-NEXT: stxsiwx vs34, 0, r3
; CHECK-P9-NEXT: blr		; CHECK-P9-NEXT: blr
entry:		entry:
%vecext = extractelement <4 x i32> %a, i32 2		%vecext = extractelement <4 x i32> %a, i32 2
%arrayidx = getelementptr inbounds i32, i32* %ap, i64 3		%arrayidx = getelementptr inbounds i32, i32* %ap, i64 3
store i32 %vecext, i32* %arrayidx, align 4		store i32 %vecext, i32* %arrayidx, align 4
ret <4 x i32> %a		ret <4 x i32> %a
}		}

; Function Attrs: norecurse nounwind writeonly		; Function Attrs: norecurse nounwind writeonly
define <4 x i32> @testi3(<4 x i32> returned %a, <4 x i32> %b, i32* nocapture %ap) local_unnamed_addr #0 {		define <4 x i32> @testi3(<4 x i32> returned %a, <4 x i32> %b, i32* nocapture %ap) local_unnamed_addr #0 {
; CHECK-LABEL: testi3:		; CHECK-LABEL: testi3:
; CHECK: # %bb.0: # %entry		; CHECK: # %bb.0: # %entry
; CHECK-NEXT: xxsldwi vs0, vs34, vs34, 3		; CHECK-NEXT: xxsldwi vs0, vs34, vs34, 3
; CHECK-NEXT: mfvsrwz r3, f0		; CHECK-NEXT: addi r3, r7, 12
; CHECK-NEXT: stw r3, 12(r7)		; CHECK-NEXT: stfiwx f0, 0, r3
; CHECK-NEXT: blr		; CHECK-NEXT: blr
;		;
; CHECK-BE-LABEL: testi3:		; CHECK-BE-LABEL: testi3:
; CHECK-BE: # %bb.0: # %entry		; CHECK-BE: # %bb.0: # %entry
; CHECK-BE-NEXT: xxswapd vs0, vs34		; CHECK-BE-NEXT: xxsldwi vs0, vs34, vs34, 2
; CHECK-BE-NEXT: mfvsrwz r3, f0		; CHECK-BE-NEXT: addi r3, r7, 12
; CHECK-BE-NEXT: stw r3, 12(r7)		; CHECK-BE-NEXT: stfiwx f0, 0, r3
; CHECK-BE-NEXT: blr		; CHECK-BE-NEXT: blr
;		;
; CHECK-P9-LABEL: testi3:		; CHECK-P9-LABEL: testi3:
; CHECK-P9: # %bb.0: # %entry		; CHECK-P9: # %bb.0: # %entry
; CHECK-P9-NEXT: li r3, 12		; CHECK-P9-NEXT: xxsldwi vs0, vs34, vs34, 3
; CHECK-P9-NEXT: vextuwrx r3, r3, v2		; CHECK-P9-NEXT: addi r3, r7, 12
; CHECK-P9-NEXT: stw r3, 12(r7)		; CHECK-P9-NEXT: stfiwx f0, 0, r3
; CHECK-P9-NEXT: blr		; CHECK-P9-NEXT: blr
entry:		entry:
%vecext = extractelement <4 x i32> %a, i32 3		%vecext = extractelement <4 x i32> %a, i32 3
%arrayidx = getelementptr inbounds i32, i32* %ap, i64 3		%arrayidx = getelementptr inbounds i32, i32* %ap, i64 3
store i32 %vecext, i32* %arrayidx, align 4		store i32 %vecext, i32* %arrayidx, align 4
ret <4 x i32> %a		ret <4 x i32> %a
}		}
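The shift amounts in the updated CHECK lines above follow a regular pattern. The sketch below is a small Python model, not code from the patch. It assumes that `stfiwx` and `stxsiwx` store word 1 of the source register (bits 32-63, counting words from the most significant end), that `xxsldwi vt, va, va, n` rotates the four words left by `n`, and that on little-endian targets vector element `e` sits in register word `3 - e`. The helper names are hypothetical.

```python
def xxsldwi_same(words, shift):
    """Model `xxsldwi vt, va, va, shift`: rotate four words left by `shift`."""
    return [words[(i + shift) % 4] for i in range(4)]

def shift_for_element(elem, little_endian):
    """Rotation that brings vector element `elem` into word 1, the word that is stored."""
    word = 3 - elem if little_endian else elem  # register word holding the element
    return (word - 1) % 4

def stored_word(words, shift):
    """Word written by the 4-byte store after the optional rotation."""
    return xxsldwi_same(words, shift)[1]

# Shift amounts for extracting elements 0..3 on each endianness.
BE_SHIFTS = [shift_for_element(e, little_endian=False) for e in range(4)]
LE_SHIFTS = [shift_for_element(e, little_endian=True) for e in range(4)]
```

A shift of 0 corresponds to the cases where the diff shows `stxsiwx vs34` with no preceding `xxsldwi`, since the requested element is already in the stored word.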

test/CodeGen/PowerPC/scalar_vector_test_2.ll

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mcpu=pwr9 -verify-machineinstrs -ppc-vsr-nums-as-vr -ppc-asm-full-reg-names \			; RUN: llc -mcpu=pwr9 -verify-machineinstrs -ppc-vsr-nums-as-vr -ppc-asm-full-reg-names \
	; RUN: -mtriple=powerpc64le-unknown-linux-gnu < %s | FileCheck %s --check-prefix=P9LE			; RUN: -mtriple=powerpc64le-unknown-linux-gnu < %s | FileCheck %s --check-prefix=P9LE
	; RUN: llc -mcpu=pwr9 -verify-machineinstrs -ppc-vsr-nums-as-vr -ppc-asm-full-reg-names \			; RUN: llc -mcpu=pwr9 -verify-machineinstrs -ppc-vsr-nums-as-vr -ppc-asm-full-reg-names \
	; RUN: -mtriple=powerpc64-unknown-linux-gnu < %s | FileCheck %s --check-prefix=P9BE			; RUN: -mtriple=powerpc64-unknown-linux-gnu < %s | FileCheck %s --check-prefix=P9BE
	; RUN: llc -mcpu=pwr8 -verify-machineinstrs -ppc-vsr-nums-as-vr -ppc-asm-full-reg-names \			; RUN: llc -mcpu=pwr8 -verify-machineinstrs -ppc-vsr-nums-as-vr -ppc-asm-full-reg-names \
	; RUN: -mtriple=powerpc64le-unknown-linux-gnu < %s | FileCheck %s --check-prefix=P8LE			; RUN: -mtriple=powerpc64le-unknown-linux-gnu < %s | FileCheck %s --check-prefix=P8LE
	; RUN: llc -mcpu=pwr8 -verify-machineinstrs -ppc-vsr-nums-as-vr -ppc-asm-full-reg-names \			; RUN: llc -mcpu=pwr8 -verify-machineinstrs -ppc-vsr-nums-as-vr -ppc-asm-full-reg-names \
	; RUN: -mtriple=powerpc64-unknown-linux-gnu < %s | FileCheck %s --check-prefix=P8BE			; RUN: -mtriple=powerpc64-unknown-linux-gnu < %s | FileCheck %s --check-prefix=P8BE

	define void @test_liwzx1(<1 x float>* %A, <1 x float>* %B, <1 x float>* %C) {			define void @test_liwzx1(<1 x float>* %A, <1 x float>* %B, <1 x float>* %C) {
	; P9LE-LABEL: test_liwzx1:			; P9LE-LABEL: test_liwzx1:
	; P9LE: # %bb.0:			; P9LE: # %bb.0:
	; P9LE-NEXT: lfiwzx f0, 0, r3			; P9LE-NEXT: lfiwzx f0, 0, r3
	; P9LE-NEXT: lfiwzx f1, 0, r4			; P9LE-NEXT: lfiwzx f1, 0, r4
	; P9LE-NEXT: xxpermdi vs0, f0, f0, 2			; P9LE-NEXT: xxpermdi vs0, f0, f0, 2
	; P9LE-NEXT: xxpermdi vs1, f1, f1, 2			; P9LE-NEXT: xxpermdi vs1, f1, f1, 2
	; P9LE-NEXT: xvaddsp vs0, vs0, vs1			; P9LE-NEXT: xvaddsp vs0, vs0, vs1
	; P9LE-NEXT: xxsldwi vs0, vs0, vs0, 3			; P9LE-NEXT: xxsldwi vs0, vs0, vs0, 2
	; P9LE-NEXT: xscvspdpn f0, vs0			; P9LE-NEXT: stfiwx f0, 0, r5
	; P9LE-NEXT: stfs f0, 0(r5)
	; P9LE-NEXT: blr			; P9LE-NEXT: blr
				;
	; P9BE-LABEL: test_liwzx1:			; P9BE-LABEL: test_liwzx1:
	; P9BE: # %bb.0:			; P9BE: # %bb.0:
	; P9BE-NEXT: lfiwzx f0, 0, r3			; P9BE-NEXT: lfiwzx f0, 0, r3
	; P9BE-NEXT: lfiwzx f1, 0, r4			; P9BE-NEXT: lfiwzx f1, 0, r4
	; P9BE-NEXT: xxsldwi vs0, f0, f0, 1			; P9BE-NEXT: xxsldwi vs0, f0, f0, 1
	; P9BE-NEXT: xxsldwi vs1, f1, f1, 1			; P9BE-NEXT: xxsldwi vs1, f1, f1, 1
	; P9BE-NEXT: xvaddsp vs0, vs0, vs1			; P9BE-NEXT: xvaddsp vs0, vs0, vs1
	; P9BE-NEXT: xscvspdpn f0, vs0			; P9BE-NEXT: xxsldwi vs0, vs0, vs0, 3
	; P9BE-NEXT: stfs f0, 0(r5)			; P9BE-NEXT: stfiwx f0, 0, r5
	; P9BE-NEXT: blr			; P9BE-NEXT: blr
				;
	; P8LE-LABEL: test_liwzx1:			; P8LE-LABEL: test_liwzx1:
	; P8LE: # %bb.0:			; P8LE: # %bb.0:
	; P8LE-NEXT: lfiwzx f0, 0, r3			; P8LE-NEXT: lfiwzx f0, 0, r3
	; P8LE-NEXT: lfiwzx f1, 0, r4			; P8LE-NEXT: lfiwzx f1, 0, r4
	; P8LE-NEXT: xxpermdi vs0, f0, f0, 2			; P8LE-NEXT: xxpermdi vs0, f0, f0, 2
	; P8LE-NEXT: xxpermdi vs1, f1, f1, 2			; P8LE-NEXT: xxpermdi vs1, f1, f1, 2
	; P8LE-NEXT: xvaddsp vs0, vs0, vs1			; P8LE-NEXT: xvaddsp vs0, vs0, vs1
	; P8LE-NEXT: xxsldwi vs0, vs0, vs0, 3			; P8LE-NEXT: xxsldwi vs0, vs0, vs0, 2
	; P8LE-NEXT: xscvspdpn f0, vs0			; P8LE-NEXT: stfiwx f0, 0, r5
	; P8LE-NEXT: stfsx f0, 0, r5
	; P8LE-NEXT: blr			; P8LE-NEXT: blr
				;
	; P8BE-LABEL: test_liwzx1:			; P8BE-LABEL: test_liwzx1:
	; P8BE: # %bb.0:			; P8BE: # %bb.0:
	; P8BE-NEXT: lfiwzx f0, 0, r3			; P8BE-NEXT: lfiwzx f0, 0, r3
	; P8BE-NEXT: lfiwzx f1, 0, r4			; P8BE-NEXT: lfiwzx f1, 0, r4
	; P8BE-NEXT: xxsldwi vs0, f0, f0, 1			; P8BE-NEXT: xxsldwi vs0, f0, f0, 1
	; P8BE-NEXT: xxsldwi vs1, f1, f1, 1			; P8BE-NEXT: xxsldwi vs1, f1, f1, 1
	; P8BE-NEXT: xvaddsp vs0, vs0, vs1			; P8BE-NEXT: xvaddsp vs0, vs0, vs1
	; P8BE-NEXT: xscvspdpn f0, vs0			; P8BE-NEXT: xxsldwi vs0, vs0, vs0, 3
	; P8BE-NEXT: stfsx f0, 0, r5			; P8BE-NEXT: stfiwx f0, 0, r5
	; P8BE-NEXT: blr			; P8BE-NEXT: blr



	%a = load <1 x float>, <1 x float>* %A			%a = load <1 x float>, <1 x float>* %A
	%b = load <1 x float>, <1 x float>* %B			%b = load <1 x float>, <1 x float>* %B
	%X = fadd <1 x float> %a, %b			%X = fadd <1 x float> %a, %b
	store <1 x float> %X, <1 x float>* %C			store <1 x float> %X, <1 x float>* %C
	ret void			ret void
	}			}

	define <1 x float>* @test_liwzx2(<1 x float>* %A, <1 x float>* %B, <1 x float>* %C) {			define <1 x float>* @test_liwzx2(<1 x float>* %A, <1 x float>* %B, <1 x float>* %C) {
	; P9LE-LABEL: test_liwzx2:			; P9LE-LABEL: test_liwzx2:
	; P9LE: # %bb.0:			; P9LE: # %bb.0:
	; P9LE-NEXT: lfiwzx f0, 0, r3			; P9LE-NEXT: lfiwzx f0, 0, r3
	; P9LE-NEXT: lfiwzx f1, 0, r4			; P9LE-NEXT: lfiwzx f1, 0, r4
	; P9LE-NEXT: mr r3, r5			; P9LE-NEXT: mr r3, r5
	; P9LE-NEXT: xxpermdi vs0, f0, f0, 2			; P9LE-NEXT: xxpermdi vs0, f0, f0, 2
	; P9LE-NEXT: xxpermdi vs1, f1, f1, 2			; P9LE-NEXT: xxpermdi vs1, f1, f1, 2
	; P9LE-NEXT: xvsubsp vs0, vs0, vs1			; P9LE-NEXT: xvsubsp vs0, vs0, vs1
	; P9LE-NEXT: xxsldwi vs0, vs0, vs0, 3			; P9LE-NEXT: xxsldwi vs0, vs0, vs0, 2
	; P9LE-NEXT: xscvspdpn f0, vs0			; P9LE-NEXT: stfiwx f0, 0, r5
	; P9LE-NEXT: stfs f0, 0(r5)
	; P9LE-NEXT: blr			; P9LE-NEXT: blr
				;
	; P9BE-LABEL: test_liwzx2:			; P9BE-LABEL: test_liwzx2:
	; P9BE: # %bb.0:			; P9BE: # %bb.0:
	; P9BE-NEXT: lfiwzx f0, 0, r3			; P9BE-NEXT: lfiwzx f0, 0, r3
	; P9BE-NEXT: lfiwzx f1, 0, r4			; P9BE-NEXT: lfiwzx f1, 0, r4
	; P9BE-NEXT: mr r3, r5			; P9BE-NEXT: mr r3, r5
	; P9BE-NEXT: xxsldwi vs0, f0, f0, 1			; P9BE-NEXT: xxsldwi vs0, f0, f0, 1
	; P9BE-NEXT: xxsldwi vs1, f1, f1, 1			; P9BE-NEXT: xxsldwi vs1, f1, f1, 1
	; P9BE-NEXT: xvsubsp vs0, vs0, vs1			; P9BE-NEXT: xvsubsp vs0, vs0, vs1
	; P9BE-NEXT: xscvspdpn f0, vs0			; P9BE-NEXT: xxsldwi vs0, vs0, vs0, 3
	; P9BE-NEXT: stfs f0, 0(r5)			; P9BE-NEXT: stfiwx f0, 0, r5
	; P9BE-NEXT: blr			; P9BE-NEXT: blr
				;
	; P8LE-LABEL: test_liwzx2:			; P8LE-LABEL: test_liwzx2:
	; P8LE: # %bb.0:			; P8LE: # %bb.0:
	; P8LE-NEXT: lfiwzx f0, 0, r3			; P8LE-NEXT: lfiwzx f0, 0, r3
	; P8LE-NEXT: lfiwzx f1, 0, r4			; P8LE-NEXT: lfiwzx f1, 0, r4
	; P8LE-NEXT: mr r3, r5			; P8LE-NEXT: mr r3, r5
	; P8LE-NEXT: xxpermdi vs0, f0, f0, 2			; P8LE-NEXT: xxpermdi vs0, f0, f0, 2
	; P8LE-NEXT: xxpermdi vs1, f1, f1, 2			; P8LE-NEXT: xxpermdi vs1, f1, f1, 2
	; P8LE-NEXT: xvsubsp vs0, vs0, vs1			; P8LE-NEXT: xvsubsp vs0, vs0, vs1
	; P8LE-NEXT: xxsldwi vs0, vs0, vs0, 3			; P8LE-NEXT: xxsldwi vs0, vs0, vs0, 2
	; P8LE-NEXT: xscvspdpn f0, vs0			; P8LE-NEXT: stfiwx f0, 0, r5
	; P8LE-NEXT: stfsx f0, 0, r5
	; P8LE-NEXT: blr			; P8LE-NEXT: blr
				;
	; P8BE-LABEL: test_liwzx2:			; P8BE-LABEL: test_liwzx2:
	; P8BE: # %bb.0:			; P8BE: # %bb.0:
	; P8BE-NEXT: lfiwzx f0, 0, r3			; P8BE-NEXT: lfiwzx f0, 0, r3
	; P8BE-NEXT: lfiwzx f1, 0, r4			; P8BE-NEXT: lfiwzx f1, 0, r4
	; P8BE-NEXT: mr r3, r5			; P8BE-NEXT: mr r3, r5
	; P8BE-NEXT: xxsldwi vs0, f0, f0, 1			; P8BE-NEXT: xxsldwi vs0, f0, f0, 1
	; P8BE-NEXT: xxsldwi vs1, f1, f1, 1			; P8BE-NEXT: xxsldwi vs1, f1, f1, 1
	; P8BE-NEXT: xvsubsp vs0, vs0, vs1			; P8BE-NEXT: xvsubsp vs0, vs0, vs1
	; P8BE-NEXT: xscvspdpn f0, vs0			; P8BE-NEXT: xxsldwi vs0, vs0, vs0, 3
	; P8BE-NEXT: stfsx f0, 0, r5			; P8BE-NEXT: stfiwx f0, 0, r5
	; P8BE-NEXT: blr			; P8BE-NEXT: blr



	%a = load <1 x float>, <1 x float>* %A			%a = load <1 x float>, <1 x float>* %A
	%b = load <1 x float>, <1 x float>* %B			%b = load <1 x float>, <1 x float>* %B
	%X = fsub <1 x float> %a, %b			%X = fsub <1 x float> %a, %b
	store <1 x float> %X, <1 x float>* %C			store <1 x float> %X, <1 x float>* %C
	ret <1 x float>* %C			ret <1 x float>* %C
	}			}

Revision Contents

Diff 179776
