This is an archive of the discontinued LLVM Phabricator instance.


Differential D44528

[PowerPC] Implement canCombineStoreAndExtract and provide the missing codegen patterns
Abandoned · Public

Authored by nemanjai on Mar 15 2018, 10:34 AM.


Details

Reviewers

hfinkel
Carrot
kbarton
echristo
jsji
steven.zhang

Summary

We have VSX store instructions that store a single element from a vector without modifying it in any way. Previous-generation cores can do this for word-sized elements, and Power9 can also do it for half-word and byte-sized elements.
The TableGen patterns for the word-sized versions were missing; this patch adds them.

It also reports the cost of such a combine: zero when the index is the element that the instruction stores natively, and 3 for any other element. The nonzero cost reflects the vector permute needed to move the element into place; the combine is only worthwhile if keeping the value in a vector register saves enough to offset the cost of that permute.
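The cost rule can be sketched as a standalone function. This is a hypothetical helper (`naturalIndex` and `combineCost` are illustrative names, not LLVM APIs), assuming, as the constants in the patch's switch statement suggest, that the single-element VSX store writes the rightmost element of doubleword 0 of the register. That element is `64/BitWidth - 1` in big-endian numbering and its mirror image in little-endian numbering.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper: the element index ("natural index") that a
// single-element VSX store writes without a permute. Assumes the stored
// element is the rightmost one in doubleword 0 of the 128-bit register.
std::optional<unsigned> naturalIndex(unsigned bitWidth, bool isLittleEndian) {
  if (bitWidth != 8 && bitWidth != 16 && bitWidth != 32)
    return std::nullopt; // no single-element store for this width
  unsigned numElts = 128 / bitWidth;
  unsigned beIdx = 64 / bitWidth - 1; // last element of doubleword 0 (BE order)
  return isLittleEndian ? numElts - 1 - beIdx : beIdx;
}

// Cost model from the summary: free when the extracted index is the natural
// one, 3 otherwise (a permute is needed to move the element into place).
std::optional<unsigned> combineCost(unsigned bitWidth, bool isLittleEndian,
                                    uint64_t idx) {
  std::optional<unsigned> natural = naturalIndex(bitWidth, isLittleEndian);
  if (!natural)
    return std::nullopt; // the combine is not offered at all
  return idx == *natural ? 0u : 3u;
}
```

The derived indexes reproduce the patch's constants (8/7 for bytes, 4/3 for half-words, 2/1 for words, LE/BE). For example, `combineCost(32, true, 2)` yields 0, while any other word index yields 3.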

Diff Detail

Repository: rL LLVM

Event Timeline

nemanjai created this revision. Mar 15 2018, 10:34 AM

Add the missing predicates on the patterns.

lei added a subscriber: lei. Apr 26 2018, 12:43 PM

lei added inline comments.

test/CodeGen/PowerPC/combine-extract-store.ll
Line 8: nit: line past col 80
Line 11: Should these two lines be CHECK-DAG, since they don't have to happen in this order?

Ping.

test/CodeGen/PowerPC/combine-extract-store.ll
Line 8: I think we generally allow code in test cases to have lines however long clang produces them.
Line 11: They certainly don't need to happen in this order, but I find that changing the CHECK directives produced by the tool sometimes makes the test case harder to maintain.

Implemented all the patterns for codegen of extract->store.

Herald added a subscriber: jsji. Oct 22 2018, 5:14 AM

nemanjai retitled this revision from [PowerPC] Implement canCombineStoreAndExtract and provide the missing pattern for the combination to [PowerPC] Implement canCombineStoreAndExtract and provide the missing codegen patterns. Oct 22 2018, 5:15 AM

nemanjai added reviewers: jsji, steven.zhang.

steven.zhang added inline comments. Oct 25 2018, 12:27 AM

lib/Target/PowerPC/PPCISelLowering.cpp
Line 13959: I didn't take a deep look at the implementation in this patch. The condition here does not seem to align with the comment: if the bit width is 32, we combine the store and extract regardless of whether the target is Power9. I am not sure whether this is intentional.

It is a good idea to combine the store and extract when that saves an instruction.
But it looks to me like it is not always safe or beneficial to do so, especially for floating-point values.

lib/Target/PowerPC/PPCInstrVSX.td
Lines 3073–3074: This does not look related to the "extract+store" exploitation. Maybe we should do it in a separate patch?
test/CodeGen/PowerPC/combine-extract-store.ll
Line 2: It would be better if we could split this new test case into a separate patch, and then show only the diff due to this change here.
Line 3: How about pwr9? Are there any changes due to this patch?
test/CodeGen/PowerPC/extract-and-store.ll
Line 69 (on Diff #170392): This does not look better than using stfd. Can we avoid combining in this case?
Line 122 (on Diff #170392): Are we sure that the semantics are equivalent here for stfs -> stfiwx? For stfs: `The contents of register FRS are converted to single format (see page 160) and stored into the word in storage addressed by EA.` For stfiwx: `(FRS)32:63 are stored, without conversion, into the word in storage addressed by EA.` So is it safe to assume that there is no difference due to the conversion?
Line 157 (on Diff #170392): Similar to the above, are we sure `stxsiwx` will store the same value as `stfs` for all single-precision values?
test/CodeGen/PowerPC/store_fptoi.ll
Line 21 (on Diff #170392): This does not look related to the "extract+store" exploitation. Maybe we should do it in a separate patch?
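The stfs versus stfiwx concern can be illustrated on the host. A minimal sketch, assuming the scalar f32 sits in a floating-point register in double-precision format (as it does after lfs or scalar arithmetic): stfs converts the value back to single format, while stfiwx writes the low-order word (bits 32:63) of the register unchanged, which for a double-format value is the low half of the double's fraction rather than the float's encoding. The two agree only when that word already holds single-format bits, as intended for an element extracted from a v4f32 vector. `modelStfs` and `modelStfiwx` are hypothetical names that model the store semantics quoted above; they are not LLVM or ISA APIs.

```cpp
#include <cstdint>
#include <cstring>

// Models stfs: the register holds a double-format value; the stored word is
// that value converted to single precision.
uint32_t modelStfs(double regValue) {
  float f = static_cast<float>(regValue);
  uint32_t word;
  std::memcpy(&word, &f, sizeof word);
  return word;
}

// Models stfiwx: the low-order 32 bits of the 64-bit register image
// (bits 32:63 in PowerPC bit numbering) are stored without any conversion.
uint32_t modelStfiwx(double regValue) {
  uint64_t bits;
  std::memcpy(&bits, &regValue, sizeof bits);
  return static_cast<uint32_t>(bits);
}
```

For 1.5, stfs writes 0x3FC00000 whereas stfiwx would write 0, so replacing one with the other is only safe when the source word is known to be in single format.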

Upon closer inspection, this almost never fires on PPC, so spending any more time on it does not seem useful. Abandoning this patch.

lib/Target/PowerPC/PPCISelLowering.cpp

Line 13959: That is correct, and I feel that the comment does not suggest otherwise. Perhaps I could change the comment to something like:

// Prior to Power9, we only have an instruction that combines a store and extract
// for i32 (STXSIWX). ISA 3.0 (Power9) introduced instructions that do this for
// subword types (i8, i16). There is also no advantage to doing this for i64 on
// any subtarget.
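The width and subtarget check discussed above reduces to a small predicate. A minimal sketch, where `hasP9Vector` stands in for `Subtarget.hasP9Vector()` and `widthQualifies` is a hypothetical name; it combines the guard with the later switch and omits the earlier direct-move/Altivec/VSX check.

```cpp
// Hypothetical restatement of which element widths the patch accepts:
//   if (BitWidth > 32 || (!Subtarget.hasP9Vector() && BitWidth != 32))
//     return false;
// followed by a switch that only handles 8, 16 and 32.
bool widthQualifies(unsigned bitWidth, bool hasP9Vector) {
  switch (bitWidth) {
  case 32:
    return true;        // i32/f32: word store available before Power9
  case 8:
  case 16:
    return hasP9Vector; // i8/i16: byte/half-word stores are ISA 3.0
  default:
    return false;       // i64 and any other width: no combine
  }
}
```

This makes the intended behaviour explicit: a 32-bit element qualifies on any subtarget, which is what the review comment observed.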

Revision Contents

Path                                             Size
lib/Target/PowerPC/PPCISelLowering.h             5 lines
lib/Target/PowerPC/PPCISelLowering.cpp           32 lines
lib/Target/PowerPC/PPCInstrVSX.td                12 lines
test/CodeGen/PowerPC/combine-extract-store.ll    85 lines

Diff 138591

lib/Target/PowerPC/PPCISelLowering.h

@@ 1,090 unchanged lines not shown @@ private:
 /// handled by the VINSERTB instruction introduced in ISA 3.0. This is
 /// essentially v16i8 vector version of VINSERTH.
 SDValue lowerToVINSERTB(ShuffleVectorSDNode *N, SelectionDAG &DAG) const;

 // Return whether the call instruction can potentially be optimized to a
 // tail call. This will cause the optimizers to attempt to move, or
 // duplicate return instructions to help enable tail call optimizations.
 bool mayBeEmittedAsTailCall(const CallInst *CI) const override;

+// If the input vector will require a direct-move to extract the element
+// but the store can be combined into PPC::STIWX, we want to combine it.
+bool canCombineStoreAndExtract(Type *VectorTy, Value *Idx,
+                               unsigned &Cost) const override;
 }; // end class PPCTargetLowering

 namespace PPC {

 FastISel *createFastISel(FunctionLoweringInfo &FuncInfo,
                          const TargetLibraryInfo *LibInfo);

 } // end namespace PPC
@@ 31 unchanged lines not shown @@

lib/Target/PowerPC/PPCISelLowering.cpp


@@ 13,934 unchanged lines not shown @@ bool PPCTargetLowering::mayBeEmittedAsTailCall(const CallInst *CI) const {
   // Make sure the callee and caller calling conventions are eligible for tco.
   if (!areCallingConvEligibleForTCO_64SVR4(Caller->getCallingConv(),
                                            CI->getCallingConv()))
     return false;

   // If the function is local then we have a good chance at tail-calling it
   return getTargetMachine().shouldAssumeDSOLocal(*Caller->getParent(), Callee);
 }

+bool PPCTargetLowering::canCombineStoreAndExtract(Type *VectorTy, Value *Idx,
+                                                  unsigned &Cost) const {
+  if (!Subtarget.hasDirectMove() || !Subtarget.hasAltivec() ||
+      !Subtarget.hasVSX())
+    return false;
+
+  // If the index is unknown at compile time, this is very expensive to lower
+  // and it is not possible to combine the store with the extract.
+  ConstantInt *CI = dyn_cast<ConstantInt>(Idx);
+  if (!CI)
+    return false;
+
+  assert(VectorTy->isVectorTy() && "VectorTy is not a vector type");
+  unsigned BitWidth = VectorTy->getScalarSizeInBits();
+
+  // Only have combined stores for sub-word types on Power9.
+  if (BitWidth > 32 || (!Subtarget.hasP9Vector() && BitWidth != 32))
+    return false;
+
+  uint64_t CIdx = CI->getZExtValue();
+  uint64_t NaturalIdx = -1UL;
+  switch (BitWidth) {
+  default: return false;
+  case 8: NaturalIdx = Subtarget.isLittleEndian() ? 8 : 7; break;
+  case 16: NaturalIdx = Subtarget.isLittleEndian() ? 4 : 3; break;
+  case 32: NaturalIdx = Subtarget.isLittleEndian() ? 2 : 1; break;
+  }
+
+  Cost = CIdx == NaturalIdx ? 0 : 3;
+  return true;
+}

lib/Target/PowerPC/PPCInstrVSX.td

@@ 1,425 unchanged lines not shown @@ let AddedComplexity = 400 in { // Prefer VSX patterns over non-VSX patterns.

 // Conversions between vector and scalar single precision
 def XSCVDPSPN : XX2Form<60, 267, (outs vsrc:$XT), (ins vssrc:$XB),
 "xscvdpspn $XT, $XB", IIC_VecFP, []>;
 def XSCVSPDPN : XX2Form<60, 331, (outs vssrc:$XT), (ins vsrc:$XB),
 "xscvspdpn $XT, $XB", IIC_VecFP, []>;
 } // UseVSXReg = 1

-let Predicates = [IsLittleEndian] in {
+let Predicates = [HasP8Vector, IsLittleEndian] in {
 def : Pat<(f32 (PPCfcfids
 (f64 (PPCmtvsra (i64 (vector_extract v2i64:$S, 0)))))),
 (f32 (XSCVSXDSP (COPY_TO_REGCLASS (XXPERMDI $S, $S, 2), VSFRC)))>;
 def : Pat<(f32 (PPCfcfids
 (f64 (PPCmtvsra (i64 (vector_extract v2i64:$S, 1)))))),
 (f32 (XSCVSXDSP (COPY_TO_REGCLASS
 (f64 (COPY_TO_REGCLASS $S, VSRC)), VSFRC)))>;
 def : Pat<(f32 (PPCfcfidus
 (f64 (PPCmtvsra (i64 (vector_extract v2i64:$S, 0)))))),
 (f32 (XSCVUXDSP (COPY_TO_REGCLASS (XXPERMDI $S, $S, 2), VSFRC)))>;
 def : Pat<(f32 (PPCfcfidus
 (f64 (PPCmtvsra (i64 (vector_extract v2i64:$S, 1)))))),
 (f32 (XSCVUXDSP (COPY_TO_REGCLASS
 (f64 (COPY_TO_REGCLASS $S, VSRC)), VSFRC)))>;
+def : Pat<(store (i32 (extractelt v4i32:$A, 2)), xoaddr:$src),
+(STIWX (EXTRACT_SUBREG $A, sub_64), xoaddr:$src)>;
+def : Pat<(store (f32 (extractelt v4f32:$A, 2)), xoaddr:$src),
+(STIWX (EXTRACT_SUBREG $A, sub_64), xoaddr:$src)>;
 }

-let Predicates = [IsBigEndian] in {
+let Predicates = [HasP8Vector, IsBigEndian] in {
 def : Pat<(f32 (PPCfcfids
 (f64 (PPCmtvsra (i64 (vector_extract v2i64:$S, 0)))))),
 (f32 (XSCVSXDSP (COPY_TO_REGCLASS $S, VSFRC)))>;
 def : Pat<(f32 (PPCfcfids
 (f64 (PPCmtvsra (i64 (vector_extract v2i64:$S, 1)))))),
 (f32 (XSCVSXDSP (COPY_TO_REGCLASS (XXPERMDI $S, $S, 2), VSFRC)))>;
 def : Pat<(f32 (PPCfcfidus
 (f64 (PPCmtvsra (i64 (vector_extract v2i64:$S, 0)))))),
 (f32 (XSCVUXDSP (COPY_TO_REGCLASS $S, VSFRC)))>;
 def : Pat<(f32 (PPCfcfidus
 (f64 (PPCmtvsra (i64 (vector_extract v2i64:$S, 1)))))),
 (f32 (XSCVUXDSP (COPY_TO_REGCLASS (XXPERMDI $S, $S, 2), VSFRC)))>;
+def : Pat<(store (i32 (extractelt v4i32:$A, 1)), xoaddr:$src),
+(STIWX (EXTRACT_SUBREG $A, sub_64), xoaddr:$src)>;
+def : Pat<(store (f32 (extractelt v4f32:$A, 1)), xoaddr:$src),
+(STIWX (EXTRACT_SUBREG $A, sub_64), xoaddr:$src)>;
 }
 def : Pat<(v4i32 (scalar_to_vector ScalarLoads.Li32)),
 (v4i32 (XXSPLTWs (LIWAX xoaddr:$src), 1))>;
 } // AddedComplexity = 400
 } // HasP8Vector

 let UseVSXReg = 1, AddedComplexity = 400 in {
 let Predicates = [HasDirectMove] in {
@@ 1,585 unchanged lines not shown @@ def DFSTOREf32 : Pseudo<(outs), (ins vssrc:$XT, memrix:$dst),
 [(store f32:$XT, ixaddr:$dst)]>;
 def DFSTOREf64 : Pseudo<(outs), (ins vsfrc:$XT, memrix:$dst),
 "#DFSTOREf64",
 [(store f64:$XT, ixaddr:$dst)]>;
 }
 def : Pat<(f64 (extloadf32 ixaddr:$src)),
 (COPY_TO_REGCLASS (DFLOADf32 ixaddr:$src), VSFRC)>;
 def : Pat<(f32 (fpround (f64 (extloadf32 ixaddr:$src)))),
 (f32 (DFLOADf32 ixaddr:$src))>;
 } // end HasP9Vector, AddedComplexity

 let Predicates = [HasP9Vector] in {
 let isPseudo = 1 in {
 let mayStore = 1 in {
 def SPILLTOVSR_STX : Pseudo<(outs), (ins spilltovsrrc:$XT, memrr:$dst),
 "#SPILLTOVSR_STX", []>;
 def SPILLTOVSR_ST : Pseudo<(outs), (ins spilltovsrrc:$XT, memrix:$dst),
 "#SPILLTOVSR_ST", []>;
@@ 378 unchanged lines not shown @@

test/CodeGen/PowerPC/combine-extract-store.ll

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mcpu=pwr8 -mtriple=powerpc64le-unknown-unknown \
				; RUN: -verify-machineinstrs -O2 < %s | FileCheck %s
				; RUN: llc -mcpu=pwr8 -mtriple=powerpc64-unknown-unknown \
				; RUN: -verify-machineinstrs -O2 < %s | FileCheck %s --check-prefix=CHECK-BE

				; Function Attrs: norecurse nounwind
				define void @test(<4 x i32>* noalias nocapture readonly %VP, <4 x i32>* noalias nocapture %VP2, i32* noalias nocapture %IP) local_unnamed_addr #0 {
				; CHECK-LABEL: test:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: vspltisw 2, 4
				; CHECK-NEXT: lvx 3, 0, 3
				; CHECK-NEXT: vadduwm 2, 3, 2
				; CHECK-NEXT: stxsiwx 34, 0, 5
				; CHECK-NEXT: stvx 3, 0, 4
				; CHECK-NEXT: blr
				entry:
				%0 = load <4 x i32>, <4 x i32>* %VP, align 16
				%vecext = extractelement <4 x i32> %0, i32 2
				%add = add nsw i32 %vecext, 4
				store i32 %add, i32* %IP, align 4
				store <4 x i32> %0, <4 x i32>* %VP2, align 16
				ret void
				}

				; Function Attrs: norecurse nounwind
				define void @testf(<4 x float>* noalias nocapture readonly %VP, <4 x float>* noalias nocapture %VP2, float* noalias nocapture %IP) local_unnamed_addr #0 {
				; CHECK-LABEL: testf:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: addis 6, 2, .LCPI1_0@toc@ha
				; CHECK-NEXT: lvx 2, 0, 3
				; CHECK-NEXT: addi 6, 6, .LCPI1_0@toc@l
				; CHECK-NEXT: lvx 3, 0, 6
				; CHECK-NEXT: stvx 2, 0, 4
				; CHECK-NEXT: xvaddsp 0, 34, 35
				; CHECK-NEXT: stfiwx 0, 0, 5
				; CHECK-NEXT: blr
				entry:
				%0 = load <4 x float>, <4 x float>* %VP, align 16
				%vecext = extractelement <4 x float> %0, i32 2
				%add = fadd float %vecext, 4.000000e+00
				store float %add, float* %IP, align 4
				store <4 x float> %0, <4 x float>* %VP2, align 16
				ret void
				}

				; Function Attrs: norecurse nounwind
				define void @testBE(<4 x i32>* noalias nocapture readonly %VP, <4 x i32>* noalias nocapture %VP2, i32* noalias nocapture %IP) local_unnamed_addr #0 {
				; CHECK-BE-LABEL: testBE:
				; CHECK-BE: # %bb.0: # %entry
				; CHECK-BE-NEXT: vspltisw 2, 4
				; CHECK-BE-NEXT: lxvw4x 35, 0, 3
				; CHECK-BE-NEXT: vadduwm 2, 3, 2
				; CHECK-BE-NEXT: stxsiwx 34, 0, 5
				; CHECK-BE-NEXT: stxvw4x 35, 0, 4
				; CHECK-BE-NEXT: blr
				entry:
				%0 = load <4 x i32>, <4 x i32>* %VP, align 16
				%vecext = extractelement <4 x i32> %0, i32 1
				%add = add nsw i32 %vecext, 4
				store i32 %add, i32* %IP, align 4
				store <4 x i32> %0, <4 x i32>* %VP2, align 16
				ret void
				}

				; Function Attrs: norecurse nounwind
				define void @testBEf(<4 x float>* noalias nocapture readonly %VP, <4 x float>* noalias nocapture %VP2, float* noalias nocapture %IP) local_unnamed_addr #0 {
				; CHECK-BE-LABEL: testBEf:
				; CHECK-BE: # %bb.0: # %entry
				; CHECK-BE-NEXT: addis 6, 2, .LCPI3_0@toc@ha
				; CHECK-BE-NEXT: lxvw4x 0, 0, 3
				; CHECK-BE-NEXT: addi 6, 6, .LCPI3_0@toc@l
				; CHECK-BE-NEXT: lxvw4x 1, 0, 6
				; CHECK-BE-NEXT: stxvw4x 0, 0, 4
				; CHECK-BE-NEXT: xvaddsp 1, 0, 1
				; CHECK-BE-NEXT: stfiwx 1, 0, 5
				; CHECK-BE-NEXT: blr
				entry:
				%0 = load <4 x float>, <4 x float>* %VP, align 16
				%vecext = extractelement <4 x float> %0, i32 1
				%add = fadd float %vecext, 4.000000e+00
				store float %add, float* %IP, align 4
				store <4 x float> %0, <4 x float>* %VP2, align 16
				ret void
				}
