This is an archive of the discontinued LLVM Phabricator instance.

Differential D62890

[DAGCombiner] Improve tryStoreMergeOfExtracts to merge stores before type is legalized
Needs Review · Public

Authored by tingwang on Jun 4 2019, 10:40 PM.

Details

Reviewers

nemanjai
bogner
niravd
craig.topper
RKSimon
uweigand
jonpa
lkail
shchenz
qiucf
Esme

Group Reviewers

Restricted Project

Summary

Enable merging of consecutive stores of extracted vector elements before types are legalized on PPC.

Diff Detail

Event Timeline

lkail created this revision. Jun 4 2019, 10:40 PM

Herald added subscribers: llvm-commits, kbarton, hiraditya. Jun 4 2019, 10:40 PM

lkail edited the summary of this revision. Jun 4 2019, 10:44 PM

lkail retitled this revision from Merge consecutive stores of vector elements before types are legalized to [PowerPC] Merge consecutive stores of vector elements before types are legalized. Jun 4 2019, 10:49 PM

lkail added reviewers: bogner, niravd, MaskRay. Jun 5 2019, 11:44 PM

This seems like it should be folded into the already existing checks. Do you know why NumStoresToMerge was not being set before? I expect it's either the legal-type requirement in pre-legalization merges or PPC's check for allowed misaligned accesses missing some cases. If it's the former, I suspect we can disable the legality requirement before type legalization for non-truncating stores (replace TLI.isTypeLegal(ty) with isTypeLegal(ty)).

niravd added a subscriber: fhahn. Jun 7 2019, 11:34 AM

In D62890#1532667, @niravd wrote:

This seems like it should be folded into the already existing checks. Do you know why NumStoresToMerge was not being set before? I expect it's either the legal-type requirement in pre-legalization merges or PPC's check for allowed misaligned accesses missing some cases. If it's the former, I suspect we can disable the legality requirement before type legalization for non-truncating stores (replace TLI.isTypeLegal(ty) with isTypeLegal(ty)).

Thanks for responding, @niravd. I currently have no idea why NumStoresToMerge was not being set before; I'll look at the patches related to this portion of code. I notice that merging consecutive stores of vector elements was first introduced by https://reviews.llvm.org/rL224611, in which a legal type was already a requirement. I'll try what you suggested.

Hi @niravd, after investigating the code carefully, I think this check might not be redundant. Consider the case where we have 3 i32 values extracted from vectors: both isTypeLegal and TLI.isTypeLegal report v3i32 as illegal. However, MergeStoresOfConstantsOrVecElts, which is called later by MergeConsecutiveStores, doesn't require type legality and builds a BUILD_VECTOR node whose elements are the 3 EXTRACT_VECTOR_ELT values. PowerPC's vector type legalizer can handle such cases, so it can benefit from getNumStoresOfVectorElementsToMergePreLegalize. I know it's rather odd to add such a check in a context surrounded by type legality checks. I originally wanted to implement it in PPCTargetLowering::PerformDAGCombine, but that would likely duplicate code.
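For reference, the PPC override in Diff 203077 (shown further down this page) answers this hook by capping the merge count at the number of elements that fit in one 128-bit vector register. The sketch below is a standalone model of just that arithmetic, not LLVM code: the function name is hypothetical, plain integers stand in for EVT and Subtarget.hasVSX(), and the DisablePPCUnaligned and VT.isSimple() early exits are omitted.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical standalone model (not LLVM code) of the arithmetic in the PPC
// override of getNumStoresOfVectorElementsToMergePreLegalize (Diff 203077).
// ElemBits stands in for the scalar element type's getSizeInBits(), HasVSX
// for Subtarget.hasVSX(); the DisablePPCUnaligned and isSimple() checks are
// left out.
unsigned numStoresToMergePreLegalize(unsigned ElemBits,
                                     unsigned NumConsecutiveStores,
                                     bool HasVSX) {
  assert(ElemBits > 0 && "element size must be non-zero");
  // Base TargetLowering default in the diff: std::min(1U, N), i.e. no merge.
  if (NumConsecutiveStores < 2 || !HasVSX)
    return std::min(1U, NumConsecutiveStores);
  // A PPC vector register is 128 bits wide, so at most 128 / ElemBits
  // elements fit into one merged vector store.
  unsigned MaxNumberOfLegalStores = 128U / ElemBits;
  return std::min(MaxNumberOfLegalStores, NumConsecutiveStores);
}
```

Under these assumptions, three consecutive i32 extracts yield 3 (the v3i32 BUILD_VECTOR case above), six i32 stores are capped at 4, and three i64 stores at 2.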

Hi @niravd, after investigating the code carefully, I think this check might not be redundant. Consider the case where we have 3 i32 values extracted from vectors: both isTypeLegal and TLI.isTypeLegal report v3i32 as illegal.

Are you certain isTypeLegal was returning false in prelegalization? isTypeLegal(x) should be "!LegalTypes || TLI.isTypeLegal(x)". I was expecting that if you swapped out TLI.isTypeLegal for isTypeLegal where it was failing, we could generate an invalid node, but it sounds like that's not the case. If so, I think we should double check.
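The distinction drawn here can be modeled in a few lines. In this toy version (strings instead of EVT, an assumed set of legal types, and struct names that are not LLVM's), the combiner-level wrapper accepts any type until type legalization has run, so v3i32 is rejected only by the target-level query:

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy stand-in for TargetLowering::isTypeLegal. Strings replace EVT, and the
// set of legal vector types is an assumption loosely modeled on a VSX target.
struct ToyTLI {
  std::set<std::string> Legal{"v4i32", "v4f32", "v2i64", "v2f64"};
  bool isTypeLegal(const std::string &VT) const { return Legal.count(VT) != 0; }
};

// Toy stand-in for the DAGCombiner wrapper described above:
// isTypeLegal(VT) == !LegalTypes || TLI.isTypeLegal(VT).
struct ToyCombiner {
  const ToyTLI &TLI;
  bool LegalTypes; // becomes true once type legalization has run
  bool isTypeLegal(const std::string &VT) const {
    return !LegalTypes || TLI.isTypeLegal(VT);
  }
};
```

With `ToyTLI TLI;`, `ToyCombiner{TLI, false}.isTypeLegal("v3i32")` is true while `TLI.isTypeLegal("v3i32")` is false. This is consistent with the follow-up below: once the wrapper is used, it is TLI.allowsMemoryAccess, not type legality, that turns the v3i32 merge down on PPC.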

However, MergeStoresOfConstantsOrVecElts, which is called later by MergeConsecutiveStores, doesn't require type legality and builds a BUILD_VECTOR node whose elements are the 3 EXTRACT_VECTOR_ELT values. PowerPC's vector type legalizer can handle such cases, so it can benefit from getNumStoresOfVectorElementsToMergePreLegalize. I know it's rather odd to add such a check in a context surrounded by type legality checks. I originally wanted to implement it in PPCTargetLowering::PerformDAGCombine, but that would likely duplicate code.

@niravd Thanks for pointing out my mistake. I tried replacing TLI.isTypeLegal with isTypeLegal, but the code generated for PowerPC is not what I expected, because TLI.allowsMemoryAccess returns false when Ty is something like v3i32. This change also breaks some SystemZ and X86 regression tests. I don't quite understand what you mean by "double check" here; could you explain more?

By double check, I just meant to look at the results again with isTypeLegal checked, which is where we are?

FWIW, it's probably fine to do something similar to isTypeLegal for allowsMemoryAccess, though there will likely be more changes in other backends. I expect most of them to be mundane. It may be worth updating the patch to see whether others are motivated to look into the real regressions.

lkail updated this revision to Diff 205740. Jun 19 2019, 11:26 PM

lkail retitled this revision from [PowerPC] Merge consecutive stores of vector elements before types are legalized to [DAGCombiner] Merge consecutive stores of vector elements before types are legalized.

lkail edited the summary of this revision.

Updated the patch. @niravd any further suggestions?

lkail edited reviewers, added: craig.topper, RKSimon, uweigand; removed: MaskRay. Jun 19 2019, 11:34 PM

Looks like it's mostly an improvement though there are some potential regressions around vector shuffles. I'll leave it to the others if it's acceptable to land.

Also, what happened to the PPC test changes?

The X86 changes LGTM

niravd added a reviewer: jonpa. Jun 20 2019, 1:22 PM

Also, what happened to the PPC test changes?

Because PPC's allowsMemoryAccess has no information about the current CombineLevel, it fails the check (only a few vector types are allowed misaligned access). As a result, PPC's codegen currently doesn't change. I might try to solve this problem in another patch.
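A toy model of the vector branch of PPC's misaligned-access check, as it appears in the context lines of nemanjai's diff further down this page, illustrates the failure. It assumes the merged store is treated as under-aligned (the test stores use align 4), so the query reaches this check; strings stand in for MVT, and the real hook handles more cases than shown:

```cpp
#include <cassert>
#include <string>

// Toy model (strings instead of MVT) of the VSX branch of
// PPCTargetLowering::allowsMisalignedMemoryAccesses for vector types, before
// nemanjai's suggested change. The real hook checks more than this.
bool allowsMisalignedVectorAccess(const std::string &VT, bool HasVSX) {
  if (!HasVSX)
    return false;
  // Only these full 128-bit VSX types may be accessed misaligned.
  return VT == "v2f64" || VT == "v2i64" || VT == "v4f32" || VT == "v4i32";
}
```

Any merged type outside those four, such as v2i32 or v3i32, is refused regardless of the combine level; nemanjai's later suggestion closes part of that gap by adding v2i32 and v2f32.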

Hi @jonpa, could you take a look and check whether the SystemZ change is a real regression?

Because PPC's allowsMemoryAccess has no information about the current CombineLevel, it fails the check (only a few vector types are allowed misaligned access). As a result, PPC's codegen currently doesn't change. I might try to solve this problem in another patch.

Ah. Can you solve this by doing the analog of isTypeLegal (vs. TLI.isTypeLegal) and assuming it's true before legalization? If it's that small, you should (but don't feel like you must) fold it into this patch.

In D62890#1554934, @lkail wrote:

Hi @jonpa, could you take a look and check whether the SystemZ change is a real regression?

At first glance this seems to generate worse code now, since we need all those permute-type instructions to get the bytes into the correct order in a single register to store ... I'll clarify with the hardware folks which of the sequences would actually be preferable.

If it turns out that there are instances where *not* merging stores (even when it would be *possible*) is preferred from a performance perspective, should common code use some cost function here?

Considering the suggestions of @niravd and @uweigand, would an implementation like this be appropriate?

bool DAGCombiner::isAbleToMergeConsecutiveStoresPreLegalize(ArrayRef<SDNode*> elements, .../* Params related to alignment, address space, etc. */) {
  if (LegalTypes)
    return false;
  // Let the target decide the cost, considering the elements to be stored.
  return TLI.canMergeConsecutiveStoresOfVectorElements(elements, ...);
}

And the new check would be:

if (isTypeLegal(Ty) &&
    TLI.canMergeStoresTo(FirstStoreAS, Ty, DAG) &&
    ((TLI.allowsMemoryAccess(Context, DL, Ty,
                             *FirstInChain->getMemOperand(), &IsFast) &&
      IsFast) || isAbleToMergeConsecutiveStoresPreLegalize(elements, ...)))

It's reasonable assuming we want to prohibit the SystemZ cases. Note that TLI.canMergeConsecutiveStoresOfVectorElements and TLI.canMergeStoresTo are both only called here and should be merged into a single method.
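Written out as plain boolean logic, the proposal above lets an otherwise illegal merged type through only before type legalization, and only when the target opts in. In this sketch each LLVM query is replaced by a precomputed flag (an assumption so it runs without LLVM), and the field and function names are illustrative, not LLVM APIs:

```cpp
#include <cassert>

// Precomputed answers to the LLVM queries named in the proposal. Using plain
// flags instead of real SelectionDAG queries is an assumption of this sketch.
struct Candidate {
  bool TLITypeLegal;     // TLI.isTypeLegal(Ty)
  bool CanMergeStoresTo; // TLI.canMergeStoresTo(FirstStoreAS, Ty, DAG)
  bool FastAccess;       // TLI.allowsMemoryAccess(...) && IsFast
  bool TargetOptsIn;     // proposed canMergeConsecutiveStoresOfVectorElements(...)
};

// Combiner-level isTypeLegal: every type counts as legal before legalization.
bool combinerIsTypeLegal(bool LegalTypes, const Candidate &C) {
  return !LegalTypes || C.TLITypeLegal;
}

// Model of the proposed isAbleToMergeConsecutiveStoresPreLegalize.
bool isAbleToMergePreLegalize(bool LegalTypes, const Candidate &C) {
  if (LegalTypes)
    return false;
  return C.TargetOptsIn;
}

// The proposed condition written as one expression.
bool proposedCheck(bool LegalTypes, const Candidate &C) {
  return combinerIsTypeLegal(LegalTypes, C) && C.CanMergeStoresTo &&
         (C.FastAccess || isAbleToMergePreLegalize(LegalTypes, C));
}
```

Folding canMergeStoresTo and canMergeConsecutiveStoresOfVectorElements into one method, as suggested, would roughly collapse CanMergeStoresTo and TargetOptsIn into a single target answer.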

That said, the permute expression suggests that some permutation peepholes may be worthwhile. We could simplify the permute into a BUILD_VECTOR of vector stores, which should be close enough to the original element-wise stores here.

@lkail Are you still looking at this please?

Herald added a subscriber: wuzish. Aug 19 2019, 8:28 AM

Hi @RKSimon, I'm not currently working on it. I should have abandoned this patch. I might have another patch that also works for PowerPC.

Since this patch has not been updated for a long time, I'll close it and plan another patch that also works for PowerPC.

tingwang commandeered this revision. Jul 31 2022, 11:24 PM

tingwang added a reviewer: lkail.

Herald added a project: Restricted Project. Jul 31 2022, 11:24 PM

Herald added subscribers: StephenFan, ecnelises, pengfei.

With this change, the SystemZ case is unaffected. Two x86 cases still need to be confirmed.

According to previous comments, it may be better to have a cost function decide whether merging is preferred. I'm not sure how to implement such a cost function; one reason is that the final code sequence may depend on target-specific flags etc., and the backend may not have enough information at this stage to decide. For example, the results in extract-and-store.ll depend on -ppc-disable-perfect-shuffle=false.

Herald added a project: Restricted Project. Jul 31 2022, 11:45 PM

Harbormaster completed remote builds in B178505: Diff 448921. Jul 31 2022, 11:46 PM

Instead of a cost function, could we use (possibly tweaked) isMultiStoresCheaperThanBitsMerge?

llvm/test/CodeGen/PowerPC/extract-and-store.ll
528	Is this really an issue with the store merging, or have the PPC shuffle combines gotten messed up?

In D62890#3690606, @RKSimon wrote:

Instead of a cost function, could we use (possibly tweaked) isMultiStoresCheaperThanBitsMerge?

Thank you for pointing that out. I will try to come up with something to guard against the cases that degenerated.

llvm/test/CodeGen/PowerPC/extract-and-store.ll
528	I had a quick check: this is a case where PPC::LowerVECTOR_SHUFFLE has no efficient lowering, so it falls back to VPERM as a last resort. The cost function should probably avoid this kind of situation.

RKSimon added inline comments. Aug 1 2022, 5:12 AM

llvm/test/CodeGen/PowerPC/extract-and-store.ll
528	I'd be nervous about a cost function as that is likely to be very difficult to keep balanced. I'd probably recommend just overriding canMergeStoresTo or isMultiStoresCheaperThanBitsMerge for PPC

Added a cost function for the target, as suggested. The PPC implementation tries to avoid some patterns that lead to TOC accesses.

Harbormaster completed remote builds in B178715: Diff 449219. Aug 2 2022, 1:38 AM

tingwang added inline comments. Aug 2 2022, 1:42 AM

llvm/test/CodeGen/PowerPC/extract-and-store.ll
528	Thank you for the advice. I created a new function; I hope that is fine.

This code seems quite unnecessarily complex. I can achieve essentially the same results with something like this:

diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index 36973f5bddb0..984e84ba6fdc 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -939,6 +939,9 @@ public:
            (unsigned)VT.getSimpleVT().SimpleTy < array_lengthof(RegClassForVT));
     return VT.isSimple() && RegClassForVT[VT.getSimpleVT().SimpleTy] != nullptr;
   }
+  virtual bool isTypeLegalForMemAccess(EVT VT) const {
+    return isTypeLegal(VT);
+  }
 
   class ValueTypeActionImpl {
     /// ValueTypeActions - For each value type, keep a LegalizeTypeAction enum
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 5e77317572af..6acde2a5ae91 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -18486,7 +18486,7 @@ bool DAGCombiner::tryStoreMergeOfExtracts(
       if (Ty.getSizeInBits() > MaximumLegalStoreInBits)
         break;
 
-      if (TLI.isTypeLegal(Ty) &&
+      if (TLI.isTypeLegalForMemAccess(Ty) &&
           TLI.canMergeStoresTo(FirstStoreAS, Ty, DAG.getMachineFunction()) &&
           TLI.allowsMemoryAccess(Context, DL, Ty,
                                  *FirstInChain->getMemOperand(), &IsFast) &&
diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
index 862d2ebc75a6..ceed4c1ffc91 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
@@ -17569,7 +17569,8 @@ bool PPCTargetLowering::allowsMisalignedMemoryAccesses(EVT VT,
       return true;
     if (Subtarget.hasVSX()) {
       if (VT != MVT::v2f64 && VT != MVT::v2i64 &&
-          VT != MVT::v4f32 && VT != MVT::v4i32)
+          VT != MVT::v4f32 && VT != MVT::v4i32 &&
+          VT != MVT::v2f32 && VT != MVT::v2i32)
         return false;
     } else {
       return false;
diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.h b/llvm/lib/Target/PowerPC/PPCISelLowering.h
index 2fa6d45bfe1a..1f0051f8d273 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.h
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.h
@@ -1101,6 +1101,11 @@ namespace llvm {
     EVT getOptimalMemOpType(const MemOp &Op,
                             const AttributeList &FuncAttributes) const override;
 
+    bool isTypeLegalForMemAccess(EVT VT) const override {
+      bool Ret = TargetLoweringBase::isTypeLegalForMemAccess(VT) || VT == MVT::v2i32 || VT == MVT::v2f32;
+      return Ret;
+    }
+
     /// Is unaligned memory access allowed for the given type, and is it fast
     /// relative to software emulation.
     bool allowsMisalignedMemoryAccesses(

Sure, it produces some vperm's with this test case, but I don't see an issue with that - in most cases that matter, the constant pool loads aren't likely to lead to a lot of cache misses.
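The shape of this hook can be shown without LLVM: a virtual method on the base class that defaults to plain type legality, and a target override that additionally accepts 64-bit vectors for memory accesses. The classes below are toy stand-ins (strings instead of EVT, an assumed set of legal types), not TargetLoweringBase or PPCTargetLowering:

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy base class: strings stand in for EVT, and the legal set is assumed.
class ToyTargetLoweringBase {
public:
  virtual ~ToyTargetLoweringBase() = default;
  bool isTypeLegal(const std::string &VT) const { return Legal.count(VT) != 0; }
  // Default mirrors the suggested hook: memory-access legality == type legality.
  virtual bool isTypeLegalForMemAccess(const std::string &VT) const {
    return isTypeLegal(VT);
  }

protected:
  std::set<std::string> Legal{"v4i32", "v4f32", "v2i64", "v2f64"};
};

// PPC-like override: also accept v2i32 and v2f32 for memory accesses,
// as in the suggested PPCISelLowering.h change.
class ToyPPCLowering : public ToyTargetLoweringBase {
public:
  bool isTypeLegalForMemAccess(const std::string &VT) const override {
    return ToyTargetLoweringBase::isTypeLegalForMemAccess(VT) ||
           VT == "v2i32" || VT == "v2f32";
  }
};
```

Because the combiner calls the hook through the base class, only targets that override it see any change in which merged types are considered; plain isTypeLegal is unaffected.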

Thank you. A good lesson for me to learn how to simplify logic!

The original approach is too complex, and the same effect can be achieved more simply as Nemanja pointed out.

I'm adopting the whole approach, and added a guard to make sure it is applied only before types are legalized.
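The updated diff (Diff 451048) is not shown on this page, so the following is only a guess at the shape of such a guard: use the relaxed memory-access query while the combine level is BeforeLegalizeTypes, and fall back to strict type legality afterwards. The enum is a trimmed, hypothetical copy of LLVM's CombineLevel, and both query functions are toy stand-ins:

```cpp
#include <cassert>
#include <string>

// Trimmed, hypothetical copy of LLVM's CombineLevel (the real enum has more
// levels).
enum CombineLevel { BeforeLegalizeTypes, AfterLegalizeTypes };

// Toy stand-ins for TLI.isTypeLegal and the relaxed memory-access query.
bool toyIsTypeLegal(const std::string &VT) { return VT == "v4i32"; }
bool toyIsTypeLegalForMemAccess(const std::string &VT) {
  return toyIsTypeLegal(VT) || VT == "v2i32";
}

// Use the relaxed query only before types are legalized; afterwards fall
// back to strict type legality so no illegal type is created late.
bool mergedTypeAcceptable(CombineLevel Level, const std::string &VT) {
  if (Level == BeforeLegalizeTypes)
    return toyIsTypeLegalForMemAccess(VT);
  return toyIsTypeLegal(VT);
}
```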

Harbormaster completed remote builds in B180097: Diff 451048. Aug 9 2022, 1:52 AM

RKSimon added inline comments. Aug 9 2022, 2:27 AM

llvm/include/llvm/CodeGen/TargetLowering.h
701	please add doxygen descriptions of the params (B in particular....)

Update parameter naming and comments.

Harbormaster completed remote builds in B180132: Diff 451097. Aug 9 2022, 5:29 AM

Gentle ping.

Revision Contents

Path (size)

llvm/include/llvm/CodeGen/TargetLowering.h (8 lines)
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (12 lines)
llvm/lib/Target/PowerPC/PPCISelLowering.h (6 lines)
llvm/lib/Target/PowerPC/PPCISelLowering.cpp (18 lines)
llvm/test/CodeGen/PowerPC/extract-and-store.ll (415 lines)

Diff 203077

llvm/include/llvm/CodeGen/TargetLowering.h

[442 unchanged lines not shown; context: public:]
   }

   /// Returns if it's reasonable to merge stores to MemVT size.
   virtual bool canMergeStoresTo(unsigned AS, EVT MemVT,
                                 const SelectionDAG &DAG) const {
     return true;
   }

+  /// Return number of consecutive stores of vector elements that can be merged
+  /// before legalizing types.
+  virtual unsigned getNumStoresOfVectorElementsToMergePreLegalize(
+      LLVMContext &Context, const DataLayout &DL, EVT MemVT, unsigned AS,
+      unsigned Align, unsigned NumConsecutiveStores) const {
+    return std::min(1U, NumConsecutiveStores);
+  }
+
   /// Return true if it is cheap to speculate a call to intrinsic cttz.
   virtual bool isCheapToSpeculateCttz() const {
     return false;
   }

   /// Return true if it is cheap to speculate a call to intrinsic ctlz.
   virtual bool isCheapToSpeculateCtlz() const {
     return false;
[226 unchanged lines not shown; context: public:]
   bool isTypeLegal(EVT VT) const {
     assert(!VT.isSimple() ||
            (unsigned)VT.getSimpleVT().SimpleTy < array_lengthof(RegClassForVT));
     return VT.isSimple() && RegClassForVT[VT.getSimpleVT().SimpleTy] != nullptr;
   }

   class ValueTypeActionImpl {
     /// ValueTypeActions - For each value type, keep a LegalizeTypeAction enum
     /// that indicates how instruction selection should deal with the type.
     LegalizeTypeAction ValueTypeActions[MVT::LAST_VALUETYPE];

   public:
     ValueTypeActionImpl() {
       std::fill(std::begin(ValueTypeActions), std::end(ValueTypeActions),
                 TypeLegal);
     }

[3,368 unchanged lines not shown]

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

[15,343 unchanged lines not shown; context: if (IsExtractVecSrc) {]
           if (TLI.isTypeLegal(Ty) &&
               TLI.canMergeStoresTo(FirstStoreAS, Ty, DAG) &&
               TLI.allowsMemoryAccess(Context, DL, Ty, FirstStoreAS,
                                      FirstStoreAlign, &IsFast) &&
               IsFast)
             NumStoresToMerge = i + 1;
         }

+        // In case the loop above found no merges and NumStoresToMerge is not
+        // changed.
+        if (NumStoresToMerge == 1 && Level == BeforeLegalizeTypes) {
+          // Some targets support shuffling of vector elements in type
+          // legalizing phase, so at BeforeLegalizeTypes level, a legal type for
+          // the vector store is not essential. Let target decide how many
+          // elements it can merge.
+          NumStoresToMerge = TLI.getNumStoresOfVectorElementsToMergePreLegalize(
+              Context, DL, MemVT.getScalarType(), FirstStoreAS, FirstStoreAlign,
+              NumConsecutiveStores);
+        }
+
         // Check if we found a legal integer type creating a meaningful
         // merge.
         if (NumStoresToMerge < 2) {
           // We know that candidate stores are in order and of correct
           // shape. While there is no mergeable sequence from the
           // beginning one may start later in the sequence. The only
           // reason a merge of size N could have failed where another of
           // the same size would not have, is if the alignment has
[5,033 unchanged lines not shown]

llvm/lib/Target/PowerPC/PPCISelLowering.h

[846 unchanged lines not shown; context: public:]

     /// Is unaligned memory access allowed for the given type, and is it fast
     /// relative to software emulation.
     bool allowsMisalignedMemoryAccesses(EVT VT,
                                         unsigned AddrSpace,
                                         unsigned Align = 1,
                                         bool *Fast = nullptr) const override;

+    /// For some consecutive stores of vector elements that can't fit in legal
+    /// vector type, merge is still allowed before type legalizing.
+    unsigned getNumStoresOfVectorElementsToMergePreLegalize(
+        LLVMContext &Context, const DataLayout &DL, EVT MemVT, unsigned AS,
+        unsigned Align, unsigned NumConsecutiveStores) const override;
+
     /// isFMAFasterThanFMulAndFAdd - Return true if an FMA operation is faster
     /// than a pair of fmul and fadd instructions. fmuladd intrinsics will be
     /// expanded to FMAs when this method returns true, otherwise fmuladd is
     /// expanded to fmul + fadd.
     bool isFMAFasterThanFMulAndFAdd(EVT VT) const override;

     const MCPhysReg *getScratchRegisters(CallingConv::ID CC) const override;

[337 unchanged lines not shown]

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

[14,482 unchanged lines not shown; context: if (VT == MVT::ppcf128)]
     return false;

   if (Fast)
     *Fast = true;

   return true;
 }

+unsigned PPCTargetLowering::getNumStoresOfVectorElementsToMergePreLegalize(
+    LLVMContext &Context, const DataLayout &DL, EVT VT, unsigned AS,
+    unsigned Align, unsigned NumConsecutiveStores) const {
+  if (DisablePPCUnaligned) {
+    Type *Ty = VT.getTypeForEVT(Context);
+    if (Align < DL.getABITypeAlignment(Ty))
+      return TargetLowering::getNumStoresOfVectorElementsToMergePreLegalize(
+          Context, DL, VT, AS, Align, NumConsecutiveStores);
+  }
+
+  if (NumConsecutiveStores < 2 || !Subtarget.hasVSX() || !VT.isSimple())
+    return TargetLowering::getNumStoresOfVectorElementsToMergePreLegalize(
+        Context, DL, VT, AS, Align, NumConsecutiveStores);
+  // PPC's vector has a size of 128 bits.
+  unsigned MaxNumberOfLegalStores = 128U / VT.getSizeInBits();
+  return std::min(MaxNumberOfLegalStores, NumConsecutiveStores);
+}
+
 bool PPCTargetLowering::isFMAFasterThanFMulAndFAdd(EVT VT) const {
   VT = VT.getScalarType();

   if (!VT.isSimple())
     return false;

   switch (VT.getSimpleVT().SimpleTy) {
   case MVT::f32:
[617 unchanged lines not shown]

llvm/test/CodeGen/PowerPC/extract-and-store.ll

[476 unchanged lines not shown; context: entry:]
%arrayidx = getelementptr inbounds i32, i32* %ap, i64 3		%arrayidx = getelementptr inbounds i32, i32* %ap, i64 3
store i32 %vecext, i32* %arrayidx, align 4		store i32 %vecext, i32* %arrayidx, align 4
ret <4 x i32> %a		ret <4 x i32> %a
}		}

define dso_local void @test_consecutive_i32(<4 x i32> %a, i32* nocapture %b) local_unnamed_addr #0 {		define dso_local void @test_consecutive_i32(<4 x i32> %a, i32* nocapture %b) local_unnamed_addr #0 {
; CHECK-LABEL: test_consecutive_i32:		; CHECK-LABEL: test_consecutive_i32:
; CHECK: # %bb.0: # %entry		; CHECK: # %bb.0: # %entry
; CHECK-NEXT: xxsldwi vs0, vs34, vs34, 2		; CHECK-NEXT: vpkudum v2, v2, v2
; CHECK-NEXT: li r3, 4		; CHECK-NEXT: xxswapd vs0, vs34
; CHECK-NEXT: stfiwx f0, 0, r5		; CHECK-NEXT: stfdx f0, 0, r5
; CHECK-NEXT: stxsiwx vs34, r5, r3
; CHECK-NEXT: blr		; CHECK-NEXT: blr
;		;
; CHECK-BE-LABEL: test_consecutive_i32:		; CHECK-BE-LABEL: test_consecutive_i32:
; CHECK-BE: # %bb.0: # %entry		; CHECK-BE: # %bb.0: # %entry
; CHECK-BE-NEXT: xxsldwi vs0, vs34, vs34, 3		; CHECK-BE-NEXT: xxswapd vs35, vs34
; CHECK-BE-NEXT: xxsldwi vs1, vs34, vs34, 1		; CHECK-BE-NEXT: vmrghw v2, v2, v3
; CHECK-BE-NEXT: li r3, 4		; CHECK-BE-NEXT: stxsdx vs34, 0, r5
; CHECK-BE-NEXT: stfiwx f0, 0, r5
; CHECK-BE-NEXT: stfiwx f1, r5, r3
; CHECK-BE-NEXT: blr		; CHECK-BE-NEXT: blr
;		;
; CHECK-P9-LABEL: test_consecutive_i32:		; CHECK-P9-LABEL: test_consecutive_i32:
; CHECK-P9: # %bb.0: # %entry		; CHECK-P9: # %bb.0: # %entry
; CHECK-P9-NEXT: xxsldwi vs0, vs34, vs34, 2		; CHECK-P9-NEXT: vpkudum v2, v2, v2
; CHECK-P9-NEXT: li r3, 4		; CHECK-P9-NEXT: xxswapd vs0, vs34
; CHECK-P9-NEXT: stfiwx f0, 0, r5		; CHECK-P9-NEXT: stfd f0, 0(r5)
; CHECK-P9-NEXT: stxsiwx vs34, r5, r3
; CHECK-P9-NEXT: blr		; CHECK-P9-NEXT: blr
;		;
; CHECK-P9-BE-LABEL: test_consecutive_i32:		; CHECK-P9-BE-LABEL: test_consecutive_i32:
; CHECK-P9-BE: # %bb.0: # %entry		; CHECK-P9-BE: # %bb.0: # %entry
; CHECK-P9-BE-NEXT: xxsldwi vs0, vs34, vs34, 3		; CHECK-P9-BE-NEXT: xxswapd vs35, vs34
; CHECK-P9-BE-NEXT: stfiwx f0, 0, r5		; CHECK-P9-BE-NEXT: vmrghw v2, v2, v3
; CHECK-P9-BE-NEXT: xxsldwi vs0, vs34, vs34, 1		; CHECK-P9-BE-NEXT: stxsd v2, 0(r5)
; CHECK-P9-BE-NEXT: li r3, 4
; CHECK-P9-BE-NEXT: stfiwx f0, r5, r3
; CHECK-P9-BE-NEXT: blr		; CHECK-P9-BE-NEXT: blr
entry:		entry:
%vecext = extractelement <4 x i32> %a, i32 0		%vecext = extractelement <4 x i32> %a, i32 0
store i32 %vecext, i32* %b, align 4		store i32 %vecext, i32* %b, align 4
%vecext1 = extractelement <4 x i32> %a, i32 2		%vecext1 = extractelement <4 x i32> %a, i32 2
%arrayidx2 = getelementptr inbounds i32, i32* %b, i64 1		%arrayidx2 = getelementptr inbounds i32, i32* %b, i64 1
store i32 %vecext1, i32* %arrayidx2, align 4		store i32 %vecext1, i32* %arrayidx2, align 4
ret void		ret void
}		}

define dso_local void @test_consecutive_float(<4 x float> %a, float* nocapture %b) local_unnamed_addr #0 {		define dso_local void @test_consecutive_float(<4 x float> %a, float* nocapture %b) local_unnamed_addr #0 {
; CHECK-LABEL: test_consecutive_float:		; CHECK-LABEL: test_consecutive_float:
; CHECK: # %bb.0: # %entry		; CHECK: # %bb.0: # %entry
; CHECK-NEXT: xxsldwi vs0, vs34, vs34, 1		; CHECK-NEXT: addis r3, r2, .LCPI15_0@toc@ha
; CHECK-NEXT: xxsldwi vs1, vs34, vs34, 3		; CHECK-NEXT: addi r3, r3, .LCPI15_0@toc@l
; CHECK-NEXT: li r3, 4		; CHECK-NEXT: lvx v3, 0, r3
; CHECK-NEXT: stfiwx f0, 0, r5		; CHECK-NEXT: vperm v2, v2, v2, v3
; CHECK-NEXT: stfiwx f1, r5, r3		; CHECK-NEXT: xxswapd vs0, vs34
		; CHECK-NEXT: stfdx f0, 0, r5
; CHECK-NEXT: blr		; CHECK-NEXT: blr
;		;
; CHECK-BE-LABEL: test_consecutive_float:		; CHECK-BE-LABEL: test_consecutive_float:
; CHECK-BE: # %bb.0: # %entry		; CHECK-BE: # %bb.0: # %entry
; CHECK-BE-NEXT: xxsldwi vs0, vs34, vs34, 2		; CHECK-BE-NEXT: vpkudum v2, v2, v2
; CHECK-BE-NEXT: li r3, 4		; CHECK-BE-NEXT: stxsdx vs34, 0, r5
; CHECK-BE-NEXT: stxsiwx vs34, 0, r5
; CHECK-BE-NEXT: stfiwx f0, r5, r3
; CHECK-BE-NEXT: blr		; CHECK-BE-NEXT: blr
;		;
; CHECK-P9-LABEL: test_consecutive_float:		; CHECK-P9-LABEL: test_consecutive_float:
; CHECK-P9: # %bb.0: # %entry		; CHECK-P9: # %bb.0: # %entry
; CHECK-P9-NEXT: xxsldwi vs0, vs34, vs34, 1		; CHECK-P9-NEXT: addis r3, r2, .LCPI15_0@toc@ha
; CHECK-P9-NEXT: stfiwx f0, 0, r5		; CHECK-P9-NEXT: addi r3, r3, .LCPI15_0@toc@l
; CHECK-P9-NEXT: xxsldwi vs0, vs34, vs34, 3		; CHECK-P9-NEXT: lxvx vs35, 0, r3
; CHECK-P9-NEXT: li r3, 4		; CHECK-P9-NEXT: vperm v2, v2, v2, v3
; CHECK-P9-NEXT: stfiwx f0, r5, r3		; CHECK-P9-NEXT: xxswapd vs0, vs34
		; CHECK-P9-NEXT: stfd f0, 0(r5)
; CHECK-P9-NEXT: blr		; CHECK-P9-NEXT: blr
;		;
; CHECK-P9-BE-LABEL: test_consecutive_float:		; CHECK-P9-BE-LABEL: test_consecutive_float:
; CHECK-P9-BE: # %bb.0: # %entry		; CHECK-P9-BE: # %bb.0: # %entry
; CHECK-P9-BE-NEXT: xxsldwi vs0, vs34, vs34, 2		; CHECK-P9-BE-NEXT: vpkudum v2, v2, v2
; CHECK-P9-BE-NEXT: li r3, 4		; CHECK-P9-BE-NEXT: stxsd v2, 0(r5)
; CHECK-P9-BE-NEXT: stxsiwx vs34, 0, r5
; CHECK-P9-BE-NEXT: stfiwx f0, r5, r3
; CHECK-P9-BE-NEXT: blr		; CHECK-P9-BE-NEXT: blr
entry:		entry:
%vecext = extractelement <4 x float> %a, i32 1		%vecext = extractelement <4 x float> %a, i32 1
store float %vecext, float* %b, align 4		store float %vecext, float* %b, align 4
%vecext1 = extractelement <4 x float> %a, i32 3		%vecext1 = extractelement <4 x float> %a, i32 3
%arrayidx2 = getelementptr inbounds float, float* %b, i64 1		%arrayidx2 = getelementptr inbounds float, float* %b, i64 1
store float %vecext1, float* %arrayidx2, align 4		store float %vecext1, float* %arrayidx2, align 4
ret void		ret void
}		}

define dso_local void @test_stores_exceed_vec_size(<4 x i32> %a, i32* nocapture %b) local_unnamed_addr #0 {		define dso_local void @test_stores_exceed_vec_size(<4 x i32> %a, i32* nocapture %b) local_unnamed_addr #0 {
; CHECK-LABEL: test_stores_exceed_vec_size:		; CHECK-LABEL: test_stores_exceed_vec_size:
; CHECK: # %bb.0: # %entry		; CHECK: # %bb.0: # %entry
; CHECK-NEXT: addis r3, r2, .LCPI16_0@toc@ha		; CHECK-NEXT: addis r3, r2, .LCPI16_0@toc@ha
; CHECK-NEXT: xxsldwi vs1, vs34, vs34, 1
; CHECK-NEXT: li r4, 20
; CHECK-NEXT: addi r3, r3, .LCPI16_0@toc@l		; CHECK-NEXT: addi r3, r3, .LCPI16_0@toc@l
; CHECK-NEXT: lvx v3, 0, r3		; CHECK-NEXT: lvx v3, 0, r3
; CHECK-NEXT: li r3, 16
; CHECK-NEXT: vperm v3, v2, v2, v3		; CHECK-NEXT: vperm v3, v2, v2, v3
		; CHECK-NEXT: vsldoi v2, v2, v2, 12
; CHECK-NEXT: xxswapd vs0, vs35		; CHECK-NEXT: xxswapd vs0, vs35
		; CHECK-NEXT: xxswapd vs1, vs34
; CHECK-NEXT: stxvd2x vs0, 0, r5		; CHECK-NEXT: stxvd2x vs0, 0, r5
; CHECK-NEXT: stfiwx f1, r5, r3		; CHECK-NEXT: stfd f1, 16(r5)
; CHECK-NEXT: stxsiwx vs34, r5, r4
; CHECK-NEXT: blr		; CHECK-NEXT: blr
;		;
; CHECK-BE-LABEL: test_stores_exceed_vec_size:		; CHECK-BE-LABEL: test_stores_exceed_vec_size:
; CHECK-BE: # %bb.0: # %entry		; CHECK-BE: # %bb.0: # %entry
; CHECK-BE-NEXT: xxspltw vs0, vs34, 0		; CHECK-BE-NEXT: xxspltw vs0, vs34, 0
; CHECK-BE-NEXT: xxsldwi vs1, vs34, vs34, 1		; CHECK-BE-NEXT: vsldoi v3, v2, v2, 4
; CHECK-BE-NEXT: li r3, 16		; CHECK-BE-NEXT: li r3, 16
; CHECK-BE-NEXT: li r4, 20
; CHECK-BE-NEXT: stxsiwx vs34, r5, r3
; CHECK-BE-NEXT: xxsldwi vs0, vs34, vs0, 2		; CHECK-BE-NEXT: xxsldwi vs0, vs34, vs0, 2
; CHECK-BE-NEXT: stxvw4x vs0, 0, r5		; CHECK-BE-NEXT: stxvw4x vs0, 0, r5
; CHECK-BE-NEXT: stfiwx f1, r5, r4		; CHECK-BE-NEXT: stxsdx vs35, r5, r3
; CHECK-BE-NEXT: blr		; CHECK-BE-NEXT: blr
;		;
; CHECK-P9-LABEL: test_stores_exceed_vec_size:		; CHECK-P9-LABEL: test_stores_exceed_vec_size:
; CHECK-P9: # %bb.0: # %entry		; CHECK-P9: # %bb.0: # %entry
; CHECK-P9-NEXT: addis r3, r2, .LCPI16_0@toc@ha		; CHECK-P9-NEXT: addis r3, r2, .LCPI16_0@toc@ha
; CHECK-P9-NEXT: addi r3, r3, .LCPI16_0@toc@l		; CHECK-P9-NEXT: addi r3, r3, .LCPI16_0@toc@l
; CHECK-P9-NEXT: lxvx vs35, 0, r3		; CHECK-P9-NEXT: lxvx vs35, 0, r3
; CHECK-P9-NEXT: li r3, 16
; CHECK-P9-NEXT: vperm v3, v2, v2, v3		; CHECK-P9-NEXT: vperm v3, v2, v2, v3
; CHECK-P9-NEXT: xxsldwi vs0, vs34, vs34, 1		; CHECK-P9-NEXT: vsldoi v2, v2, v2, 12
		; CHECK-P9-NEXT: xxswapd vs0, vs34
; CHECK-P9-NEXT: stxv vs35, 0(r5)		; CHECK-P9-NEXT: stxv vs35, 0(r5)
; CHECK-P9-NEXT: stfiwx f0, r5, r3		; CHECK-P9-NEXT: stfd f0, 16(r5)
; CHECK-P9-NEXT: li r3, 20
; CHECK-P9-NEXT: stxsiwx vs34, r5, r3
; CHECK-P9-NEXT: blr		; CHECK-P9-NEXT: blr
;		;
; CHECK-P9-BE-LABEL: test_stores_exceed_vec_size:		; CHECK-P9-BE-LABEL: test_stores_exceed_vec_size:
; CHECK-P9-BE: # %bb.0: # %entry		; CHECK-P9-BE: # %bb.0: # %entry
; CHECK-P9-BE-NEXT: xxspltw vs0, vs34, 0		; CHECK-P9-BE-NEXT: xxspltw vs0, vs34, 0
; CHECK-P9-BE-NEXT: xxsldwi vs0, vs34, vs0, 2		; CHECK-P9-BE-NEXT: xxsldwi vs0, vs34, vs0, 2
; CHECK-P9-BE-NEXT: li r3, 16		; CHECK-P9-BE-NEXT: vsldoi v2, v2, v2, 4
; CHECK-P9-BE-NEXT: stxv vs0, 0(r5)		; CHECK-P9-BE-NEXT: stxv vs0, 0(r5)
; CHECK-P9-BE-NEXT: xxsldwi vs0, vs34, vs34, 1		; CHECK-P9-BE-NEXT: stxsd v2, 16(r5)
; CHECK-P9-BE-NEXT: stxsiwx vs34, r5, r3
; CHECK-P9-BE-NEXT: li r3, 20
; CHECK-P9-BE-NEXT: stfiwx f0, r5, r3
; CHECK-P9-BE-NEXT: blr		; CHECK-P9-BE-NEXT: blr
entry:		entry:
%vecext = extractelement <4 x i32> %a, i32 2		%vecext = extractelement <4 x i32> %a, i32 2
store i32 %vecext, i32* %b, align 4		store i32 %vecext, i32* %b, align 4
%vecext1 = extractelement <4 x i32> %a, i32 3		%vecext1 = extractelement <4 x i32> %a, i32 3
%arrayidx2 = getelementptr inbounds i32, i32* %b, i64 1		%arrayidx2 = getelementptr inbounds i32, i32* %b, i64 1
store i32 %vecext1, i32* %arrayidx2, align 4		store i32 %vecext1, i32* %arrayidx2, align 4
%vecext3 = extractelement <4 x i32> %a, i32 0		%vecext3 = extractelement <4 x i32> %a, i32 0
%arrayidx4 = getelementptr inbounds i32, i32* %b, i64 2		%arrayidx4 = getelementptr inbounds i32, i32* %b, i64 2
store i32 %vecext3, i32* %arrayidx4, align 4		store i32 %vecext3, i32* %arrayidx4, align 4
%arrayidx6 = getelementptr inbounds i32, i32* %b, i64 3		%arrayidx6 = getelementptr inbounds i32, i32* %b, i64 3
store i32 %vecext3, i32* %arrayidx6, align 4		store i32 %vecext3, i32* %arrayidx6, align 4
%vecext7 = extractelement <4 x i32> %a, i32 1		%vecext7 = extractelement <4 x i32> %a, i32 1
%arrayidx8 = getelementptr inbounds i32, i32* %b, i64 4		%arrayidx8 = getelementptr inbounds i32, i32* %b, i64 4
store i32 %vecext7, i32* %arrayidx8, align 4		store i32 %vecext7, i32* %arrayidx8, align 4
%arrayidx10 = getelementptr inbounds i32, i32* %b, i64 5		%arrayidx10 = getelementptr inbounds i32, i32* %b, i64 5
store i32 %vecext, i32* %arrayidx10, align 4		store i32 %vecext, i32* %arrayidx10, align 4
ret void		ret void
}		}

define void @test_5_consecutive_stores_of_bytes(<16 x i8> %a, i8* nocapture %b) local_unnamed_addr #0 {		define void @test_5_consecutive_stores_of_bytes(<16 x i8> %a, i8* nocapture %b) local_unnamed_addr #0 {
; CHECK-LABEL: test_5_consecutive_stores_of_bytes:		; CHECK-LABEL: test_5_consecutive_stores_of_bytes:
; CHECK: # %bb.0: # %entry		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: addis r3, r2, .LCPI17_0@toc@ha
; CHECK-NEXT: xxswapd vs0, vs34		; CHECK-NEXT: xxswapd vs0, vs34
; CHECK-NEXT: mfvsrd r3, vs34		; CHECK-NEXT: addi r3, r3, .LCPI17_0@toc@l
; CHECK-NEXT: rldicl r6, r3, 32, 56		; CHECK-NEXT: lvx v3, 0, r3
; CHECK-NEXT: rldicl r3, r3, 56, 56		; CHECK-NEXT: mfvsrd r3, f0
; CHECK-NEXT: mfvsrd r4, f0		; CHECK-NEXT: vperm v3, v2, v2, v3
; CHECK-NEXT: stb r6, 1(r5)		; CHECK-NEXT: rldicl r3, r3, 16, 56
; CHECK-NEXT: stb r3, 2(r5)		; CHECK-NEXT: stb r3, 4(r5)
; CHECK-NEXT: rldicl r6, r4, 32, 56		; CHECK-NEXT: xxsldwi vs1, vs35, vs35, 2
; CHECK-NEXT: rldicl r3, r4, 8, 56		; CHECK-NEXT: stfiwx f1, 0, r5
; CHECK-NEXT: rldicl r4, r4, 16, 56
; CHECK-NEXT: stb r6, 0(r5)
; CHECK-NEXT: stb r3, 3(r5)
; CHECK-NEXT: stb r4, 4(r5)
; CHECK-NEXT: blr		; CHECK-NEXT: blr
;		;
; CHECK-BE-LABEL: test_5_consecutive_stores_of_bytes:		; CHECK-BE-LABEL: test_5_consecutive_stores_of_bytes:
; CHECK-BE: # %bb.0: # %entry		; CHECK-BE: # %bb.0: # %entry
; CHECK-BE-NEXT: xxswapd vs0, vs34		; CHECK-BE-NEXT: addis r3, r2, .LCPI17_0@toc@ha
		; CHECK-BE-NEXT: addi r3, r3, .LCPI17_0@toc@l
		; CHECK-BE-NEXT: lxvw4x vs35, 0, r3
; CHECK-BE-NEXT: mfvsrd r3, vs34		; CHECK-BE-NEXT: mfvsrd r3, vs34
; CHECK-BE-NEXT: rldicl r6, r3, 40, 56
; CHECK-BE-NEXT: mfvsrd r4, f0
; CHECK-BE-NEXT: stb r6, 0(r5)
; CHECK-BE-NEXT: rldicl r6, r4, 40, 56
; CHECK-BE-NEXT: rldicl r4, r4, 16, 56
; CHECK-BE-NEXT: stb r6, 1(r5)
; CHECK-BE-NEXT: clrldi r6, r3, 56
; CHECK-BE-NEXT: rldicl r3, r3, 56, 56		; CHECK-BE-NEXT: rldicl r3, r3, 56, 56
; CHECK-BE-NEXT: stb r4, 2(r5)		; CHECK-BE-NEXT: vperm v3, v2, v2, v3
; CHECK-BE-NEXT: stb r6, 3(r5)
; CHECK-BE-NEXT: stb r3, 4(r5)		; CHECK-BE-NEXT: stb r3, 4(r5)
		; CHECK-BE-NEXT: xxsldwi vs0, vs35, vs35, 3
		; CHECK-BE-NEXT: stfiwx f0, 0, r5
; CHECK-BE-NEXT: blr		; CHECK-BE-NEXT: blr
;		;
; CHECK-P9-LABEL: test_5_consecutive_stores_of_bytes:		; CHECK-P9-LABEL: test_5_consecutive_stores_of_bytes:
; CHECK-P9: # %bb.0: # %entry		; CHECK-P9: # %bb.0: # %entry
; CHECK-P9-NEXT: vsldoi v3, v2, v2, 4		; CHECK-P9-NEXT: vsldoi v3, v2, v2, 2
; CHECK-P9-NEXT: stxsibx vs35, 0, r5
; CHECK-P9-NEXT: vsldoi v3, v2, v2, 12
; CHECK-P9-NEXT: li r3, 1
; CHECK-P9-NEXT: stxsibx vs35, r5, r3
; CHECK-P9-NEXT: vsldoi v3, v2, v2, 15
; CHECK-P9-NEXT: li r3, 2
; CHECK-P9-NEXT: stxsibx vs35, r5, r3
; CHECK-P9-NEXT: vsldoi v3, v2, v2, 1
; CHECK-P9-NEXT: li r3, 3
; CHECK-P9-NEXT: vsldoi v2, v2, v2, 2
; CHECK-P9-NEXT: stxsibx vs35, r5, r3
; CHECK-P9-NEXT: li r3, 4		; CHECK-P9-NEXT: li r3, 4
; CHECK-P9-NEXT: stxsibx vs34, r5, r3		; CHECK-P9-NEXT: stxsibx vs35, r5, r3
		; CHECK-P9-NEXT: addis r3, r2, .LCPI17_0@toc@ha
		; CHECK-P9-NEXT: addi r3, r3, .LCPI17_0@toc@l
		; CHECK-P9-NEXT: lxvx vs35, 0, r3
		; CHECK-P9-NEXT: vperm v2, v2, v2, v3
		; CHECK-P9-NEXT: xxsldwi vs0, vs34, vs34, 2
		; CHECK-P9-NEXT: stfiwx f0, 0, r5
; CHECK-P9-NEXT: blr		; CHECK-P9-NEXT: blr
;		;
; CHECK-P9-BE-LABEL: test_5_consecutive_stores_of_bytes:		; CHECK-P9-BE-LABEL: test_5_consecutive_stores_of_bytes:
; CHECK-P9-BE: # %bb.0: # %entry		; CHECK-P9-BE: # %bb.0: # %entry
; CHECK-P9-BE-NEXT: vsldoi v3, v2, v2, 13		; CHECK-P9-BE-NEXT: vsldoi v3, v2, v2, 15
; CHECK-P9-BE-NEXT: stxsibx vs35, 0, r5
; CHECK-P9-BE-NEXT: vsldoi v3, v2, v2, 5
; CHECK-P9-BE-NEXT: li r3, 1
; CHECK-P9-BE-NEXT: stxsibx vs35, r5, r3
; CHECK-P9-BE-NEXT: vsldoi v3, v2, v2, 2
; CHECK-P9-BE-NEXT: li r3, 2
; CHECK-P9-BE-NEXT: stxsibx vs35, r5, r3
; CHECK-P9-BE-NEXT: li r3, 3
; CHECK-P9-BE-NEXT: stxsibx vs34, r5, r3
; CHECK-P9-BE-NEXT: vsldoi v2, v2, v2, 15
; CHECK-P9-BE-NEXT: li r3, 4		; CHECK-P9-BE-NEXT: li r3, 4
; CHECK-P9-BE-NEXT: stxsibx vs34, r5, r3		; CHECK-P9-BE-NEXT: stxsibx vs35, r5, r3
		; CHECK-P9-BE-NEXT: addis r3, r2, .LCPI17_0@toc@ha
		; CHECK-P9-BE-NEXT: addi r3, r3, .LCPI17_0@toc@l
		; CHECK-P9-BE-NEXT: lxvx vs35, 0, r3
		; CHECK-P9-BE-NEXT: vperm v2, v2, v2, v3
		; CHECK-P9-BE-NEXT: xxsldwi vs0, vs34, vs34, 3
		; CHECK-P9-BE-NEXT: stfiwx f0, 0, r5
; CHECK-P9-BE-NEXT: blr		; CHECK-P9-BE-NEXT: blr
entry:		entry:
%vecext = extractelement <16 x i8> %a, i32 4		%vecext = extractelement <16 x i8> %a, i32 4
store i8 %vecext, i8* %b, align 1		store i8 %vecext, i8* %b, align 1
%vecext1 = extractelement <16 x i8> %a, i32 12		%vecext1 = extractelement <16 x i8> %a, i32 12
%arrayidx2 = getelementptr inbounds i8, i8* %b, i64 1		%arrayidx2 = getelementptr inbounds i8, i8* %b, i64 1
store i8 %vecext1, i8* %arrayidx2, align 1		store i8 %vecext1, i8* %arrayidx2, align 1
%vecext3 = extractelement <16 x i8> %a, i32 9		%vecext3 = extractelement <16 x i8> %a, i32 9
%arrayidx4 = getelementptr inbounds i8, i8* %b, i64 2		%arrayidx4 = getelementptr inbounds i8, i8* %b, i64 2
store i8 %vecext3, i8* %arrayidx4, align 1		store i8 %vecext3, i8* %arrayidx4, align 1
%vecext5 = extractelement <16 x i8> %a, i32 7		%vecext5 = extractelement <16 x i8> %a, i32 7
%arrayidx6 = getelementptr inbounds i8, i8* %b, i64 3		%arrayidx6 = getelementptr inbounds i8, i8* %b, i64 3
store i8 %vecext5, i8* %arrayidx6, align 1		store i8 %vecext5, i8* %arrayidx6, align 1
%vecext7 = extractelement <16 x i8> %a, i32 6		%vecext7 = extractelement <16 x i8> %a, i32 6
%arrayidx8 = getelementptr inbounds i8, i8* %b, i64 4		%arrayidx8 = getelementptr inbounds i8, i8* %b, i64 4
store i8 %vecext7, i8* %arrayidx8, align 1		store i8 %vecext7, i8* %arrayidx8, align 1
ret void		ret void
}		}

define void @test_13_consecutive_stores_of_bytes(<16 x i8> %a, i8* nocapture %b) local_unnamed_addr #0 {		define void @test_13_consecutive_stores_of_bytes(<16 x i8> %a, i8* nocapture %b) local_unnamed_addr #0 {
; CHECK-LABEL: test_13_consecutive_stores_of_bytes:		; CHECK-LABEL: test_13_consecutive_stores_of_bytes:
; CHECK: # %bb.0: # %entry		; CHECK: # %bb.0: # %entry
; CHECK-NEXT: xxswapd vs0, vs34		; CHECK-NEXT: addis r3, r2, .LCPI18_0@toc@ha
		; CHECK-NEXT: li r4, 8
		; CHECK-NEXT: addi r3, r3, .LCPI18_0@toc@l
		; CHECK-NEXT: lvx v3, 0, r3
; CHECK-NEXT: mfvsrd r3, vs34		; CHECK-NEXT: mfvsrd r3, vs34
; CHECK-NEXT: rldicl r4, r3, 32, 56
; CHECK-NEXT: rldicl r6, r3, 56, 56
; CHECK-NEXT: stb r4, 1(r5)
; CHECK-NEXT: rldicl r4, r3, 40, 56
; CHECK-NEXT: mfvsrd r7, f0
; CHECK-NEXT: stb r6, 2(r5)
; CHECK-NEXT: rldicl r6, r3, 24, 56
; CHECK-NEXT: stb r4, 6(r5)
; CHECK-NEXT: rldicl r4, r3, 8, 56
; CHECK-NEXT: stb r6, 7(r5)
; CHECK-NEXT: rldicl r3, r3, 16, 56		; CHECK-NEXT: rldicl r3, r3, 16, 56
; CHECK-NEXT: stb r4, 9(r5)		; CHECK-NEXT: vperm v3, v2, v2, v3
; CHECK-NEXT: rldicl r4, r7, 32, 56		; CHECK-NEXT: xxswapd vs0, vs35
; CHECK-NEXT: rldicl r6, r7, 8, 56		; CHECK-NEXT: stxsiwx vs35, r5, r4
; CHECK-NEXT: stb r4, 0(r5)
; CHECK-NEXT: rldicl r4, r7, 16, 56
; CHECK-NEXT: stb r6, 3(r5)
; CHECK-NEXT: clrldi r6, r7, 56
; CHECK-NEXT: stb r4, 4(r5)
; CHECK-NEXT: rldicl r4, r7, 48, 56
; CHECK-NEXT: stb r6, 5(r5)
; CHECK-NEXT: rldicl r6, r7, 56, 56
; CHECK-NEXT: stb r4, 8(r5)
; CHECK-NEXT: rldicl r4, r7, 24, 56
; CHECK-NEXT: stb r6, 10(r5)
; CHECK-NEXT: stb r4, 11(r5)
; CHECK-NEXT: stb r3, 12(r5)		; CHECK-NEXT: stb r3, 12(r5)
		; CHECK-NEXT: stfdx f0, 0, r5
; CHECK-NEXT: blr		; CHECK-NEXT: blr
;		;
; CHECK-BE-LABEL: test_13_consecutive_stores_of_bytes:		; CHECK-BE-LABEL: test_13_consecutive_stores_of_bytes:
; CHECK-BE: # %bb.0: # %entry		; CHECK-BE: # %bb.0: # %entry
; CHECK-BE-NEXT: mfvsrd r3, vs34		; CHECK-BE-NEXT: addis r3, r2, .LCPI18_0@toc@ha
; CHECK-BE-NEXT: xxswapd vs0, vs34		; CHECK-BE-NEXT: xxswapd vs0, vs34
; CHECK-BE-NEXT: rldicl r4, r3, 40, 56		; CHECK-BE-NEXT: li r4, 8
; CHECK-BE-NEXT: clrldi r6, r3, 56		; CHECK-BE-NEXT: addi r3, r3, .LCPI18_0@toc@l
; CHECK-BE-NEXT: stb r4, 0(r5)		; CHECK-BE-NEXT: lxvw4x vs35, 0, r3
; CHECK-BE-NEXT: rldicl r4, r3, 56, 56		; CHECK-BE-NEXT: mfvsrd r3, f0
; CHECK-BE-NEXT: mfvsrd r7, f0		; CHECK-BE-NEXT: vperm v3, v2, v2, v3
; CHECK-BE-NEXT: stb r6, 3(r5)		; CHECK-BE-NEXT: rldicl r3, r3, 56, 56
; CHECK-BE-NEXT: rldicl r6, r3, 8, 56		; CHECK-BE-NEXT: stb r3, 12(r5)
; CHECK-BE-NEXT: stb r4, 4(r5)		; CHECK-BE-NEXT: xxsldwi vs1, vs35, vs35, 1
; CHECK-BE-NEXT: rldicl r4, r3, 24, 56		; CHECK-BE-NEXT: stxsdx vs35, 0, r5
; CHECK-BE-NEXT: stb r6, 5(r5)		; CHECK-BE-NEXT: stfiwx f1, r5, r4
; CHECK-BE-NEXT: rldicl r6, r3, 16, 56
; CHECK-BE-NEXT: stb r4, 8(r5)
; CHECK-BE-NEXT: rldicl r4, r7, 40, 56
; CHECK-BE-NEXT: stb r6, 10(r5)
; CHECK-BE-NEXT: rldicl r6, r7, 16, 56
; CHECK-BE-NEXT: stb r4, 1(r5)
; CHECK-BE-NEXT: rldicl r4, r7, 32, 56
; CHECK-BE-NEXT: stb r6, 2(r5)
; CHECK-BE-NEXT: rldicl r6, r7, 48, 56
; CHECK-BE-NEXT: stb r4, 6(r5)
; CHECK-BE-NEXT: clrldi r4, r7, 56
; CHECK-BE-NEXT: stb r6, 7(r5)
; CHECK-BE-NEXT: rldicl r3, r3, 48, 56
; CHECK-BE-NEXT: rldicl r6, r7, 56, 56
; CHECK-BE-NEXT: stb r4, 9(r5)
; CHECK-BE-NEXT: stb r3, 11(r5)
; CHECK-BE-NEXT: stb r6, 12(r5)
; CHECK-BE-NEXT: blr		; CHECK-BE-NEXT: blr
;		;
; CHECK-P9-LABEL: test_13_consecutive_stores_of_bytes:		; CHECK-P9-LABEL: test_13_consecutive_stores_of_bytes:
; CHECK-P9: # %bb.0: # %entry		; CHECK-P9: # %bb.0: # %entry
; CHECK-P9-NEXT: vsldoi v3, v2, v2, 4		; CHECK-P9-NEXT: vsldoi v3, v2, v2, 10
; CHECK-P9-NEXT: stxsibx vs35, 0, r5		; CHECK-P9-NEXT: li r3, 12
; CHECK-P9-NEXT: vsldoi v3, v2, v2, 12
; CHECK-P9-NEXT: li r3, 1
; CHECK-P9-NEXT: stxsibx vs35, r5, r3
; CHECK-P9-NEXT: vsldoi v3, v2, v2, 15
; CHECK-P9-NEXT: li r3, 2
; CHECK-P9-NEXT: stxsibx vs35, r5, r3
; CHECK-P9-NEXT: vsldoi v3, v2, v2, 1
; CHECK-P9-NEXT: li r3, 3
; CHECK-P9-NEXT: stxsibx vs35, r5, r3
; CHECK-P9-NEXT: vsldoi v3, v2, v2, 2
; CHECK-P9-NEXT: li r3, 4
; CHECK-P9-NEXT: stxsibx vs35, r5, r3
; CHECK-P9-NEXT: vsldoi v3, v2, v2, 8
; CHECK-P9-NEXT: li r3, 5
; CHECK-P9-NEXT: stxsibx vs35, r5, r3
; CHECK-P9-NEXT: vsldoi v3, v2, v2, 13
; CHECK-P9-NEXT: li r3, 6
; CHECK-P9-NEXT: stxsibx vs35, r5, r3
; CHECK-P9-NEXT: vsldoi v3, v2, v2, 11
; CHECK-P9-NEXT: li r3, 7
; CHECK-P9-NEXT: stxsibx vs35, r5, r3		; CHECK-P9-NEXT: stxsibx vs35, r5, r3
; CHECK-P9-NEXT: vsldoi v3, v2, v2, 6		; CHECK-P9-NEXT: addis r3, r2, .LCPI18_0@toc@ha
		; CHECK-P9-NEXT: addi r3, r3, .LCPI18_0@toc@l
		; CHECK-P9-NEXT: lxvx vs35, 0, r3
; CHECK-P9-NEXT: li r3, 8		; CHECK-P9-NEXT: li r3, 8
; CHECK-P9-NEXT: stxsibx vs35, r5, r3		; CHECK-P9-NEXT: vperm v2, v2, v2, v3
; CHECK-P9-NEXT: vsldoi v3, v2, v2, 9		; CHECK-P9-NEXT: xxswapd vs0, vs34
; CHECK-P9-NEXT: li r3, 9		; CHECK-P9-NEXT: stxsiwx vs34, r5, r3
; CHECK-P9-NEXT: stxsibx vs35, r5, r3		; CHECK-P9-NEXT: stfd f0, 0(r5)
; CHECK-P9-NEXT: vsldoi v3, v2, v2, 7
; CHECK-P9-NEXT: li r3, 10
; CHECK-P9-NEXT: stxsibx vs35, r5, r3
; CHECK-P9-NEXT: vsldoi v3, v2, v2, 3
; CHECK-P9-NEXT: li r3, 11
; CHECK-P9-NEXT: vsldoi v2, v2, v2, 10
; CHECK-P9-NEXT: stxsibx vs35, r5, r3
; CHECK-P9-NEXT: li r3, 12
; CHECK-P9-NEXT: stxsibx vs34, r5, r3
; CHECK-P9-NEXT: blr		; CHECK-P9-NEXT: blr
;		;
; CHECK-P9-BE-LABEL: test_13_consecutive_stores_of_bytes:		; CHECK-P9-BE-LABEL: test_13_consecutive_stores_of_bytes:
; CHECK-P9-BE: # %bb.0: # %entry		; CHECK-P9-BE: # %bb.0: # %entry
; CHECK-P9-BE-NEXT: vsldoi v3, v2, v2, 13		; CHECK-P9-BE-NEXT: vsldoi v3, v2, v2, 7
; CHECK-P9-BE-NEXT: stxsibx vs35, 0, r5		; CHECK-P9-BE-NEXT: li r3, 12
; CHECK-P9-BE-NEXT: vsldoi v3, v2, v2, 5
; CHECK-P9-BE-NEXT: li r3, 1
; CHECK-P9-BE-NEXT: stxsibx vs35, r5, r3
; CHECK-P9-BE-NEXT: vsldoi v3, v2, v2, 2
; CHECK-P9-BE-NEXT: li r3, 2
; CHECK-P9-BE-NEXT: stxsibx vs35, r5, r3
; CHECK-P9-BE-NEXT: li r3, 3
; CHECK-P9-BE-NEXT: vsldoi v3, v2, v2, 15
; CHECK-P9-BE-NEXT: stxsibx vs34, r5, r3
; CHECK-P9-BE-NEXT: li r3, 4
; CHECK-P9-BE-NEXT: stxsibx vs35, r5, r3
; CHECK-P9-BE-NEXT: vsldoi v3, v2, v2, 9
; CHECK-P9-BE-NEXT: li r3, 5
; CHECK-P9-BE-NEXT: stxsibx vs35, r5, r3
; CHECK-P9-BE-NEXT: vsldoi v3, v2, v2, 4
; CHECK-P9-BE-NEXT: li r3, 6
; CHECK-P9-BE-NEXT: stxsibx vs35, r5, r3
; CHECK-P9-BE-NEXT: vsldoi v3, v2, v2, 6
; CHECK-P9-BE-NEXT: li r3, 7
; CHECK-P9-BE-NEXT: stxsibx vs35, r5, r3		; CHECK-P9-BE-NEXT: stxsibx vs35, r5, r3
; CHECK-P9-BE-NEXT: vsldoi v3, v2, v2, 11		; CHECK-P9-BE-NEXT: addis r3, r2, .LCPI18_0@toc@ha
		; CHECK-P9-BE-NEXT: addi r3, r3, .LCPI18_0@toc@l
		; CHECK-P9-BE-NEXT: lxvx vs35, 0, r3
; CHECK-P9-BE-NEXT: li r3, 8		; CHECK-P9-BE-NEXT: li r3, 8
; CHECK-P9-BE-NEXT: stxsibx vs35, r5, r3		; CHECK-P9-BE-NEXT: vperm v2, v2, v2, v3
; CHECK-P9-BE-NEXT: vsldoi v3, v2, v2, 8		; CHECK-P9-BE-NEXT: xxsldwi vs0, vs34, vs34, 1
; CHECK-P9-BE-NEXT: li r3, 9		; CHECK-P9-BE-NEXT: stfiwx f0, r5, r3
; CHECK-P9-BE-NEXT: stxsibx vs35, r5, r3		; CHECK-P9-BE-NEXT: stxsd v2, 0(r5)
; CHECK-P9-BE-NEXT: vsldoi v3, v2, v2, 10
; CHECK-P9-BE-NEXT: li r3, 10
; CHECK-P9-BE-NEXT: stxsibx vs35, r5, r3
; CHECK-P9-BE-NEXT: vsldoi v3, v2, v2, 14
; CHECK-P9-BE-NEXT: li r3, 11
; CHECK-P9-BE-NEXT: vsldoi v2, v2, v2, 7
; CHECK-P9-BE-NEXT: stxsibx vs35, r5, r3
; CHECK-P9-BE-NEXT: li r3, 12
; CHECK-P9-BE-NEXT: stxsibx vs34, r5, r3
; CHECK-P9-BE-NEXT: blr		; CHECK-P9-BE-NEXT: blr
entry:		entry:
%vecext = extractelement <16 x i8> %a, i32 4		%vecext = extractelement <16 x i8> %a, i32 4
store i8 %vecext, i8* %b, align 1		store i8 %vecext, i8* %b, align 1
%vecext1 = extractelement <16 x i8> %a, i32 12		%vecext1 = extractelement <16 x i8> %a, i32 12
%arrayidx2 = getelementptr inbounds i8, i8* %b, i64 1		%arrayidx2 = getelementptr inbounds i8, i8* %b, i64 1
store i8 %vecext1, i8* %arrayidx2, align 1		store i8 %vecext1, i8* %arrayidx2, align 1
%vecext3 = extractelement <16 x i8> %a, i32 9		%vecext3 = extractelement <16 x i8> %a, i32 9
; ... (30 lines not shown)
%arrayidx24 = getelementptr inbounds i8, i8* %b, i64 12		%arrayidx24 = getelementptr inbounds i8, i8* %b, i64 12
store i8 %vecext23, i8* %arrayidx24, align 1		store i8 %vecext23, i8* %arrayidx24, align 1
ret void		ret void
}		}

define void @test_elements_from_two_vec(<4 x i32> %a, <4 x i32> %b, i32* nocapture %c) local_unnamed_addr #0 {		define void @test_elements_from_two_vec(<4 x i32> %a, <4 x i32> %b, i32* nocapture %c) local_unnamed_addr #0 {
; CHECK-LABEL: test_elements_from_two_vec:		; CHECK-LABEL: test_elements_from_two_vec:
; CHECK: # %bb.0: # %entry		; CHECK: # %bb.0: # %entry
; CHECK-NEXT: xxsldwi vs0, vs34, vs34, 2		; CHECK-NEXT: addis r3, r2, .LCPI19_0@toc@ha
; CHECK-NEXT: xxsldwi vs1, vs35, vs35, 1		; CHECK-NEXT: addi r3, r3, .LCPI19_0@toc@l
; CHECK-NEXT: li r3, 4		; CHECK-NEXT: lvx v4, 0, r3
; CHECK-NEXT: stfiwx f0, r7, r3		; CHECK-NEXT: vperm v2, v2, v3, v4
; CHECK-NEXT: stfiwx f1, 0, r7		; CHECK-NEXT: xxswapd vs0, vs34
		; CHECK-NEXT: stfdx f0, 0, r7
; CHECK-NEXT: blr		; CHECK-NEXT: blr
;		;
; CHECK-BE-LABEL: test_elements_from_two_vec:		; CHECK-BE-LABEL: test_elements_from_two_vec:
; CHECK-BE: # %bb.0: # %entry		; CHECK-BE: # %bb.0: # %entry
; CHECK-BE-NEXT: xxsldwi vs0, vs34, vs34, 3		; CHECK-BE-NEXT: vmrghw v3, v3, v3
; CHECK-BE-NEXT: li r3, 4		; CHECK-BE-NEXT: xxsldwi vs0, vs35, vs34, 3
; CHECK-BE-NEXT: stfiwx f0, r7, r3		; CHECK-BE-NEXT: stfdx f0, 0, r7
; CHECK-BE-NEXT: stxsiwx vs35, 0, r7
; CHECK-BE-NEXT: blr		; CHECK-BE-NEXT: blr
;		;
; CHECK-P9-LABEL: test_elements_from_two_vec:		; CHECK-P9-LABEL: test_elements_from_two_vec:
; CHECK-P9: # %bb.0: # %entry		; CHECK-P9: # %bb.0: # %entry
; CHECK-P9-NEXT: xxsldwi vs0, vs34, vs34, 2		; CHECK-P9-NEXT: addis r3, r2, .LCPI19_0@toc@ha
; CHECK-P9-NEXT: li r3, 4		; CHECK-P9-NEXT: addi r3, r3, .LCPI19_0@toc@l
; CHECK-P9-NEXT: stfiwx f0, r7, r3		; CHECK-P9-NEXT: lxvx vs36, 0, r3
; CHECK-P9-NEXT: xxsldwi vs0, vs35, vs35, 1		; CHECK-P9-NEXT: vperm v2, v2, v3, v4
; CHECK-P9-NEXT: stfiwx f0, 0, r7		; CHECK-P9-NEXT: xxswapd vs0, vs34
		; CHECK-P9-NEXT: stfd f0, 0(r7)
; CHECK-P9-NEXT: blr		; CHECK-P9-NEXT: blr
;		;
; CHECK-P9-BE-LABEL: test_elements_from_two_vec:		; CHECK-P9-BE-LABEL: test_elements_from_two_vec:
; CHECK-P9-BE: # %bb.0: # %entry		; CHECK-P9-BE: # %bb.0: # %entry
; CHECK-P9-BE-NEXT: xxsldwi vs0, vs34, vs34, 3		; CHECK-P9-BE-NEXT: vmrghw v3, v3, v3
; CHECK-P9-BE-NEXT: li r3, 4		; CHECK-P9-BE-NEXT: xxsldwi vs0, vs35, vs34, 3
; CHECK-P9-BE-NEXT: stfiwx f0, r7, r3		; CHECK-P9-BE-NEXT: stfd f0, 0(r7)
; CHECK-P9-BE-NEXT: stxsiwx vs35, 0, r7
; CHECK-P9-BE-NEXT: blr		; CHECK-P9-BE-NEXT: blr
entry:		entry:
%vecext = extractelement <4 x i32> %a, i32 0		%vecext = extractelement <4 x i32> %a, i32 0
%arrayidx = getelementptr inbounds i32, i32* %c, i64 1		%arrayidx = getelementptr inbounds i32, i32* %c, i64 1
store i32 %vecext, i32* %arrayidx, align 4		store i32 %vecext, i32* %arrayidx, align 4
%vecext1 = extractelement <4 x i32> %b, i32 1		%vecext1 = extractelement <4 x i32> %b, i32 1
store i32 %vecext1, i32* %c, align 4		store i32 %vecext1, i32* %c, align 4
ret void		ret void
}		}

define dso_local void @test_elements_from_three_vec(<4 x float> %a, <4 x float> %b, <4 x float> %c, float* nocapture %d) local_unnamed_addr #0 {		define dso_local void @test_elements_from_three_vec(<4 x float> %a, <4 x float> %b, <4 x float> %c, float* nocapture %d) local_unnamed_addr #0 {
; CHECK-LABEL: test_elements_from_three_vec:		; CHECK-LABEL: test_elements_from_three_vec:
; CHECK: # %bb.0: # %entry		; CHECK: # %bb.0: # %entry
; CHECK-NEXT: xxsldwi vs0, vs34, vs34, 3		; CHECK-NEXT: addis r3, r2, .LCPI20_0@toc@ha
; CHECK-NEXT: xxsldwi vs1, vs36, vs36, 1		; CHECK-NEXT: xxsldwi vs0, vs36, vs36, 1
; CHECK-NEXT: li r3, 4		; CHECK-NEXT: addi r3, r3, .LCPI20_0@toc@l
; CHECK-NEXT: li r4, 8		; CHECK-NEXT: lvx v5, 0, r3
; CHECK-NEXT: stxsiwx vs35, r9, r3		; CHECK-NEXT: li r3, 8
; CHECK-NEXT: stfiwx f0, 0, r9		; CHECK-NEXT: stfiwx f0, r9, r3
; CHECK-NEXT: stfiwx f1, r9, r4		; CHECK-NEXT: vperm v2, v3, v2, v5
		; CHECK-NEXT: xxswapd vs1, vs34
		; CHECK-NEXT: stfdx f1, 0, r9
; CHECK-NEXT: blr		; CHECK-NEXT: blr
;		;
; CHECK-BE-LABEL: test_elements_from_three_vec:		; CHECK-BE-LABEL: test_elements_from_three_vec:
; CHECK-BE: # %bb.0: # %entry		; CHECK-BE: # %bb.0: # %entry
; CHECK-BE-NEXT: xxsldwi vs0, vs34, vs34, 2		; CHECK-BE-NEXT: xxsldwi vs34, vs34, vs34, 1
; CHECK-BE-NEXT: xxsldwi vs1, vs35, vs35, 1		; CHECK-BE-NEXT: li r3, 8
; CHECK-BE-NEXT: li r3, 4		; CHECK-BE-NEXT: stxsiwx vs36, r9, r3
; CHECK-BE-NEXT: li r4, 8		; CHECK-BE-NEXT: vmrglw v2, v2, v3
; CHECK-BE-NEXT: stfiwx f1, r9, r3		; CHECK-BE-NEXT: stxsdx vs34, 0, r9
; CHECK-BE-NEXT: stfiwx f0, 0, r9
; CHECK-BE-NEXT: stxsiwx vs36, r9, r4
; CHECK-BE-NEXT: blr		; CHECK-BE-NEXT: blr
;		;
; CHECK-P9-LABEL: test_elements_from_three_vec:		; CHECK-P9-LABEL: test_elements_from_three_vec:
; CHECK-P9: # %bb.0: # %entry		; CHECK-P9: # %bb.0: # %entry
; CHECK-P9-NEXT: xxsldwi vs0, vs34, vs34, 3
; CHECK-P9-NEXT: li r3, 4
; CHECK-P9-NEXT: stfiwx f0, 0, r9
; CHECK-P9-NEXT: xxsldwi vs0, vs36, vs36, 1		; CHECK-P9-NEXT: xxsldwi vs0, vs36, vs36, 1
; CHECK-P9-NEXT: stxsiwx vs35, r9, r3
; CHECK-P9-NEXT: li r3, 8		; CHECK-P9-NEXT: li r3, 8
; CHECK-P9-NEXT: stfiwx f0, r9, r3		; CHECK-P9-NEXT: stfiwx f0, r9, r3
		; CHECK-P9-NEXT: addis r3, r2, .LCPI20_0@toc@ha
		; CHECK-P9-NEXT: addi r3, r3, .LCPI20_0@toc@l
		; CHECK-P9-NEXT: lxvx vs36, 0, r3
		; CHECK-P9-NEXT: vperm v2, v3, v2, v4
		; CHECK-P9-NEXT: xxswapd vs0, vs34
		; CHECK-P9-NEXT: stfd f0, 0(r9)
; CHECK-P9-NEXT: blr		; CHECK-P9-NEXT: blr
;		;
; CHECK-P9-BE-LABEL: test_elements_from_three_vec:		; CHECK-P9-BE-LABEL: test_elements_from_three_vec:
; CHECK-P9-BE: # %bb.0: # %entry		; CHECK-P9-BE: # %bb.0: # %entry
; CHECK-P9-BE-NEXT: xxsldwi vs0, vs34, vs34, 2		; CHECK-P9-BE-NEXT: xxsldwi vs34, vs34, vs34, 1
; CHECK-P9-BE-NEXT: stfiwx f0, 0, r9		; CHECK-P9-BE-NEXT: vmrglw v2, v2, v3
; CHECK-P9-BE-NEXT: xxsldwi vs0, vs35, vs35, 1
; CHECK-P9-BE-NEXT: li r3, 4
; CHECK-P9-BE-NEXT: stfiwx f0, r9, r3
; CHECK-P9-BE-NEXT: li r3, 8		; CHECK-P9-BE-NEXT: li r3, 8
; CHECK-P9-BE-NEXT: stxsiwx vs36, r9, r3		; CHECK-P9-BE-NEXT: stxsiwx vs36, r9, r3
		; CHECK-P9-BE-NEXT: stxsd v2, 0(r9)
; CHECK-P9-BE-NEXT: blr		; CHECK-P9-BE-NEXT: blr
entry:		entry:
%vecext = extractelement <4 x float> %a, i32 3		%vecext = extractelement <4 x float> %a, i32 3
store float %vecext, float* %d, align 4		store float %vecext, float* %d, align 4
%vecext1 = extractelement <4 x float> %b, i32 2		%vecext1 = extractelement <4 x float> %b, i32 2
%arrayidx2 = getelementptr inbounds float, float* %d, i64 1		%arrayidx2 = getelementptr inbounds float, float* %d, i64 1
store float %vecext1, float* %arrayidx2, align 4		store float %vecext1, float* %arrayidx2, align 4
%vecext3 = extractelement <4 x float> %c, i32 1		%vecext3 = extractelement <4 x float> %c, i32 1
%arrayidx4 = getelementptr inbounds float, float* %d, i64 2		%arrayidx4 = getelementptr inbounds float, float* %d, i64 2
store float %vecext3, float* %arrayidx4, align 4		store float %vecext3, float* %arrayidx4, align 4
ret void		ret void
}		}
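The updated CHECK lines follow a consistent pattern. A run of consecutive extract-and-store pairs becomes a shuffle (`vperm`, `vmrglw`, `xxsldwi`, ...) followed by one wider store, and the tail of the run is split into smaller power-of-two pieces: 13 bytes become 8 + 4 + 1, 5 bytes become 4 + 1, and six `i32` stores become 16 + 8 bytes. In `test_elements_from_three_vec`, the elements from `%a` and `%b` are merged, but the element from `%c` is still stored on its own.

The C++ below is a hypothetical model of that decomposition, inferred only from the test output. It is not the DAGCombiner code. `ExtractStore`, `MergedStore`, `planMergedStores` and `countSources` are invented names, not LLVM APIs. The "at most two source vectors per piece" rule is an assumption based on `vperm` taking two vector inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model type: one scalar store of a lane extracted from a vector.
struct ExtractStore {
  int Offset; // byte offset of the store from the base pointer
  int SrcVec; // identifier of the vector the lane was extracted from
  int Lane;   // lane index within that vector
};

// Hypothetical model type: one store in the merged plan, covering NumElts
// consecutive elements starting at Offset.
struct MergedStore {
  int Offset;
  int NumElts;
  // Two-input shuffle mask: lanes of the first source are 0..N-1 and lanes
  // of the second source are N..2N-1, where N is LanesPerVec.
  std::vector<int> Mask;
};

// Counts the distinct source vectors in Run[Begin, Begin + Len).
static std::size_t countSources(const std::vector<ExtractStore> &Run,
                                std::size_t Begin, std::size_t Len) {
  std::vector<int> Seen;
  for (std::size_t K = Begin; K < Begin + Len; ++K) {
    bool Found = false;
    for (int S : Seen)
      Found = Found || S == Run[K].SrcVec;
    if (!Found)
      Seen.push_back(Run[K].SrcVec);
  }
  return Seen.size();
}

// Greedy planner (assumption): Run is sorted by Offset, contiguous in memory,
// and every store has the same element size.
std::vector<MergedStore> planMergedStores(const std::vector<ExtractStore> &Run,
                                          int LanesPerVec) {
  assert(LanesPerVec > 0 && "a vector has at least one lane");
  const std::size_t Lanes = static_cast<std::size_t>(LanesPerVec);
  std::vector<MergedStore> Plan;
  std::size_t I = 0;
  while (I < Run.size()) {
    // Largest power of two that fits in one register and in the rest of Run.
    std::size_t Chunk = 1;
    while (Chunk * 2 <= Lanes && I + Chunk * 2 <= Run.size())
      Chunk *= 2;
    // A single two-input shuffle reads at most two vectors; shrink until the
    // piece satisfies that (a piece of 1 or 2 elements always does).
    while (Chunk > 1 && countSources(Run, I, Chunk) > 2)
      Chunk /= 2;

    MergedStore M{Run[I].Offset, static_cast<int>(Chunk), {}};
    const int First = Run[I].SrcVec;
    for (std::size_t K = I; K < I + Chunk; ++K) {
      const bool FromFirst = Run[K].SrcVec == First;
      M.Mask.push_back(Run[K].Lane + (FromFirst ? 0 : LanesPerVec));
    }
    Plan.push_back(M);
    I += Chunk;
  }
  return Plan;
}
```

Take the lanes of `test_5_consecutive_stores_of_bytes` (4, 12, 9, 7, 6 at byte offsets 0 to 4) with 16 lanes. The model yields a 4-element piece at offset 0 with mask {4, 12, 9, 7} and a single byte at offset 4. This has the same shape as the `stfiwx` + `stb` pair in the updated CHECK lines. The model does not predict the actual constant-pool contents or the exact instruction choice.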

Diff 203077

llvm/include/llvm/CodeGen/TargetLowering.h

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

llvm/lib/Target/PowerPC/PPCISelLowering.h

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

llvm/test/CodeGen/PowerPC/extract-and-store.ll
