This is an archive of the discontinued LLVM Phabricator instance.

Differential D138883

[SelectionDAG][PowerPC] Memset reuse vector element for tail store
ClosedPublic

Authored by tingwang on Nov 28 2022, 5:29 PM.

Details

Reviewers

shchenz
nemanjai
rzurob
RKSimon
dmgreen
asavonic
lkail
ecnelises

Group Reviewers

Restricted Project

Commits

rG71be020dda2c: [SelectionDAG][PowerPC] Memset reuse vector element for tail store

Summary

On PPC there are instructions that store a single element from a vector register (e.g. stxsdx/stxsiwx). These instructions can be leveraged to avoid materializing a separate tail constant in memset and in constant splat array initialization.

This patch explores these opportunities.
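As a rough illustration of why the vector register can stand in for the tail constant (a standalone sketch, not code from the patch; `splatByte`, `elementAt` and `scalarSplat` are hypothetical helper names): when a 16-byte vector is a splat of the memset byte, every 32-bit or 64-bit element of it already holds the scalar value that the tail store would otherwise need to materialize.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: a 16-byte "vector register" filled with one byte.
std::array<uint8_t, 16> splatByte(uint8_t B) {
  std::array<uint8_t, 16> V;
  V.fill(B);
  return V;
}

// Hypothetical helper: read element Idx of the vector viewed as lanes of T.
template <typename T>
T elementAt(const std::array<uint8_t, 16> &V, unsigned Idx) {
  assert((Idx + 1) * sizeof(T) <= V.size() && "element out of range");
  T Out;
  std::memcpy(&Out, V.data() + Idx * sizeof(T), sizeof(T));
  return Out;
}

// The scalar constant a memset tail store of width sizeof(T) would need.
template <typename T>
T scalarSplat(uint8_t B) {
  T Out;
  std::memset(&Out, B, sizeof(T));
  return Out;
}
```

Because all lanes are identical, the choice of element index does not affect the stored value; it only matters for which store instruction the target can use cheaply.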

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	3,730 ms	x64 debian > AddressSanitizer-x86_64-linux-dynamic.TestCases/Linux::auto_memory_profile_test.cpp

Event Timeline

tingwang created this revision. · Nov 28 2022, 5:29 PM

Herald added a project: Restricted Project. · Nov 28 2022, 5:29 PM

Herald added subscribers: kbarton, hiraditya.

tingwang requested review of this revision. · Nov 28 2022, 5:29 PM

Herald added a subscriber: llvm-commits. · Nov 28 2022, 5:29 PM

tingwang added a child revision: D138881: [PowerPC][NFC] Add test case for memset tail store. · Nov 28 2022, 5:29 PM

Harbormaster completed remote builds in B199907: Diff 478424. · Nov 28 2022, 5:29 PM

lkail added a reviewer: RKSimon. · Nov 28 2022, 5:44 PM

I think this is useful, but we should ensure we can get rid of the swap that this introduces (in a separate patch).

llvm/include/llvm/CodeGen/TargetLowering.h
677	Do we need this? Can `canCombineStoreAndExtract()` suffice for this purpose?
llvm/test/CodeGen/PowerPC/memset-tail.ll
197	Why do we now get the redundant swap for the vector store that we didn't get before? Was it eliminated by the swap elimination before and now it is not because we have a use of the partial vector?

nemanjai added inline comments. · Nov 28 2022, 8:00 PM

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7586	Can we reduce the nesting here by converting this to `else if`?

Updated according to comments:
(1) Use the existing canCombineStoreAndExtract() instead of creating a new hook.
(2) Nest the else statement properly.
(3) Two test cases changed due to (1).

Harbormaster completed remote builds in B199988: Diff 478535. · Nov 29 2022, 5:15 AM

tingwang marked an inline comment as done. · Nov 29 2022, 5:20 AM

tingwang added inline comments.

llvm/include/llvm/CodeGen/TargetLowering.h
677	Thank you! I gave it a try and updated the patch. Two test cases changed; I will look into the details tomorrow.
llvm/test/CodeGen/PowerPC/memset-tail.ll
197	Debug-only `ppc-vsx-swaps` shows "Web 0 rejected for physreg, partial reg, or not swap[pable]". I will look into it and probably post another patch to fix the issue. Thank you!

I realized I need to use the PPCTTIImpl::getVectorInstrCost() API to determine the cost of instructions. I'm working on it now.

Changes in this update:
(1) I tried to use TTI.getVectorInstrCost() to query the instruction cost in PPCTargetLowering::canCombineStoreAndExtract(). However, I was not able to reach TTI from there and did not find any precedent for doing so in SDAG. Since the original ARM implementation of canCombineStoreAndExtract() computes Cost with its own logic, I followed that approach and implemented the logic with reference to PPCTTIImpl::getVectorInstrCost().

(2) Refactored the logic inside getMemsetStores(). On PPC, CombineStoreAndExtract is beneficial only for specific element indexes, depending on endianness and instruction (extract-and-store on other indexes requires a vector permutation, which makes the whole idea less attractive). Since this is platform-independent logic, I query the cost for each index and pick the cheapest one for the combine.
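The index selection described in (2) can be sketched roughly as follows. This is a standalone illustration, not the code in the diff: `CostQuery` is a hypothetical stand-in for a target hook in the style of canCombineStoreAndExtract(), and `pickCheapestIndex` is an invented name.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for a target hook: returns true and sets Cost when
// store(extractelement V, Idx) can be combined for element index Idx.
using CostQuery = std::function<bool(unsigned Idx, unsigned &Cost)>;

// Scan all element indexes and keep the cheapest one the target accepts.
// Returns -1 when the target rejects every index.
int pickCheapestIndex(unsigned NumElts, const CostQuery &Query) {
  int Best = -1;
  unsigned BestCost = 0;
  for (unsigned Idx = 0; Idx < NumElts; ++Idx) {
    unsigned Cost = 0;
    if (!Query(Idx, Cost))
      continue; // The target cannot combine at this index.
    if (Best < 0 || Cost < BestCost) {
      Best = static_cast<int>(Idx);
      BestCost = Cost;
    }
  }
  return Best;
}
```

Ties go to the lowest index here; that choice is arbitrary in this sketch.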

tingwang added inline comments. · Nov 29 2022, 11:57 PM

llvm/test/CodeGen/PowerPC/p10-fi-elim.ll
31 ↗	(On Diff #478808)	The instruction sequence change in `PowerPC/p10-fi-elim.ll` is the result of `CodeGenPrepare::optimizeExtractElementInst()`, which can now generate the combined pattern since we enabled `canCombineStoreAndExtract()`. It seems we can avoid two mfvsrd instructions.

tingwang added inline comments. · Nov 29 2022, 11:59 PM

llvm/test/CodeGen/ARM/memset-align.ll
21 ↗	(On Diff #478808)	Hello @dmgreen @asavonic. This patch tries to reuse a vector element for the tail store in memset by implementing `canCombineStoreAndExtract()` on PPC. This change introduced a test case change on ARM in llvm/test/CodeGen/ARM/memset-align.ll. Could you please help me check whether the change looks good? Thank you! I looked into the scenario on ARM: if the i8 fill value of memset is zero, it creates a vector for the initial 16B and a constant tail for the remaining bytes, which is exactly the scenario this patch targets. For other values, it creates i32 stores for memset and is not impacted by this patch.

Add memset-tail.ll changes.

Harbormaster completed remote builds in B200192: Diff 478827. · Nov 30 2022, 12:10 AM

I would expect that not only memset but also some consecutive stores could reuse the result of a vector splat; see https://godbolt.org/z/77aMvncb4.
For

void foo(long a[3]) {
    a[0] = 12;
    a[1] = 12;
    a[2] = 12;
}

foo(long*):                               # @foo(long*)
        .quad   .Lfunc_begin0
        .quad   .TOC.@tocbase
        .quad   0
.Lfunc_begin0:
        xxlxor 0, 0, 0
        li 4, 12
        xxsplti32dx 0, 1, 12
        std 4, 16(3)
        stxv 0, 0(3)
        blr
        .long   0
        .quad   0

We don't reuse the result of xxsplti32dx.

In D138883#3962255, @lkail wrote:
I would expect that not only memset but also some consecutive stores could reuse the result of a vector splat; see https://godbolt.org/z/77aMvncb4.

Sure. The posted IR could be handled by DAGCombiner::mergeConsecutiveStores(), and I agree a similar combine can be applied there. But in this case the memset stores are volatile, and DAGCombiner::getStoreMergeCandidates() does not currently accept volatile stores.

asavonic added inline comments. · Dec 1 2022, 12:22 PM

llvm/test/CodeGen/ARM/memset-align.ll
21 ↗	(On Diff #478808)	This looks fine to me. VST1 and scalar STR seem equivalent in this case, if I'm reading the docs right.

tingwang added inline comments. · Dec 1 2022, 3:45 PM

llvm/test/CodeGen/ARM/memset-align.ll
21 ↗	(On Diff #478808)	Thank you for confirming!

Test case update.

Harbormaster completed remote builds in B200714: Diff 479543. · Dec 2 2022, 12:50 AM

tingwang mentioned this in D139193: [PowerPC] remove XXSWAPD after vector splat immediate. · Dec 2 2022, 4:42 AM

tingwang added a parent revision: D139193: [PowerPC] remove XXSWAPD after vector splat immediate.

tingwang added a child revision: D139491: [PowerPC] remove XXSWAPD after load from CP which is a splat value. · Dec 6 2022, 5:07 PM

In D138883#3955904, @nemanjai wrote:

I think this is useful, but we should ensure we can get rid of the swap that this introduces (in a separate patch).

According to my test case, there are two kinds of swap on LE: (1) a swap after a vector splat immediate; (2) a swap after a load from the constant pool. I submitted two patches, D139193 and D139491, to address them separately.

shchenz mentioned this in D138881: [PowerPC][NFC] Add test case for memset tail store. · Dec 7 2022, 12:37 AM

Gentle ping.

Updated the patch because the pattern in the following test case changed:
llvm/test/CodeGen/ARM/memset-align.ll

Harbormaster completed remote builds in B205563: Diff 486119. · Jan 3 2023, 5:15 PM

Gentle ping.

Rebase && Gentle ping.

Harbormaster completed remote builds in B210890: Diff 493461. · Jan 30 2023, 5:39 PM

tingwang mentioned this in D139691: [PowerPC] add a peephole to remove redundant swap instructions created by expandVSXStoreForLE. · Feb 5 2023, 5:46 PM

While I am not principally against this approach, it doesn't really give me a great feeling going in this direction. The issue is more widespread than just memset/memcpy/memmove as Kai's example illustrates.
I wonder if it would be a more complete solution to add a DAG combine that looks up the chain from the store to see if a store of a splat of the same value exists. That should certainly cover both examples.
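To make the suggested chain walk concrete, here is a toy model (not LLVM code; `ToyStore` and `findSplatStore` are invented for illustration, and the bounded search depth is an assumption of this sketch rather than something stated in the comment above):

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a store on a chain; this is not an LLVM type.
struct ToyStore {
  const ToyStore *ChainPred; // previous store on the chain, or nullptr
  bool IsVectorSplat;        // true if this store writes a splat vector
  uint8_t SplatByte;         // the repeated byte when IsVectorSplat is true
};

// Walk up the chain from Start, visiting at most MaxSearchNodes predecessors,
// and return the first store of a splat of Byte, or nullptr if none is found.
const ToyStore *findSplatStore(const ToyStore *Start, uint8_t Byte,
                               unsigned MaxSearchNodes) {
  const ToyStore *Cur = Start ? Start->ChainPred : nullptr;
  for (unsigned I = 0; Cur && I < MaxSearchNodes; ++I, Cur = Cur->ChainPred)
    if (Cur->IsVectorSplat && Cur->SplatByte == Byte)
      return Cur;
  return nullptr;
}
```

A real combine would also have to prove that reusing the vector is legal and profitable; the sketch only shows the search itself.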

I will try to get similar results with a DAG combine. Thanks to Nemanja and Kai for the insight!

tingwang added a parent revision: D144235: [PowerPC][NFC] add const-splat-array-init.ll. · Feb 16 2023, 10:52 PM

tingwang retitled this revision from [SelectionDAG][PowerPC] Memset reuse vector element for tail store to [PowerPC] find and reuse ConstantSplatVector to combine constant store into extract and store. · Feb 16 2023, 11:06 PM

tingwang edited the summary of this revision.

tingwang added reviewers: lkail, ecnelises.

tingwang removed a project: Restricted Project.

Herald added a project: Restricted Project. · Feb 16 2023, 11:06 PM

Redid the implementation; both memset and constant splat array initialization are now affected.

Herald added a subscriber: qcolombet. · Feb 16 2023, 11:12 PM

Harbormaster completed remote builds in B214332: Diff 498255. · Feb 16 2023, 11:13 PM

I plan to continue improving the patch...

(1) Format code to follow coding style guidance.
(2) Fix SplatValue check.
(3) Remaining redundant instructions like mtfprd will be fixed in separate patches.

Harbormaster completed remote builds in B216705: Diff 501485. · Mar 1 2023, 6:11 AM

(1) Update the element type ElemTy, which now matches the type expected by both the STFIWX and STXSIX PPCISD nodes.
(2) Add the missing match pattern for PPCstxsix.

Harbormaster completed remote builds in B216925: Diff 501775. · Mar 2 2023, 12:40 AM

Updated the StoreSizeInBits check to skip power-of-2 bit sizes smaller than 8.

Harbormaster completed remote builds in B217094: Diff 502028. · Mar 2 2023, 5:15 PM

Attempt to push the logic into DAGCombiner::mergeConsecutiveStores()...

Sorry, I had unsubmitted comments. Not sure if they still apply.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
15091	Nit: you don't need the name of the function here.
15105	What if `dyn_cast` returns `null` (i.e. if operand 1 is not a constant)?
15114–15116	We don't need to construct an `APInt` just to check whether it is a power of 2. You can just use `isPowerOf2_64()` from `MathExtras.h`.
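For reference, the power-of-2 test suggested in the last comment comes down to a single bit trick. The helper below is a standalone equivalent written for illustration (`isPow2` is a made-up name); in-tree code would call `isPowerOf2_64()` instead of defining its own.

```cpp
#include <cassert>
#include <cstdint>

// Standalone equivalent of a power-of-two test on a 64-bit value:
// true only for values with exactly one bit set, so 0 is rejected.
constexpr bool isPow2(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }
```

No arbitrary-precision integer object is needed for this kind of check when the value already fits in 64 bits.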

In D138883#4181453, @nemanjai wrote:

Sorry, I had unsubmitted comments. Not sure if they still apply.

Hi Nemanja,

Appreciate your help! I planned the change because DAGCombiner::getStoreMergeCandidates() already walks the chain of stores, and I realized it could be a better place to find candidates for this opportunity. Also, the constant-splat criterion could perhaps be relaxed to match only the part of the vector that the target store extracts; I would like to try that.

I hope the next version will be the final one for review. Thank you again for taking the time to look into this!

Since the canCombineStoreAndExtract() target hook does not look beneficial on PPC, I am reconsidering this approach for the combine.

Changes in this version:
(1) Address comments from the previous review.
(2) Add a check to make sure we do not combine on truncating stores or on stores that return a value.

Harbormaster completed remote builds in B223906: Diff 511247. · Apr 5 2023, 6:32 PM

Gentle ping. Since the alternative path (D146602/D146610) does not look good, shall we take this approach forward? Any comments are welcome. Thank you!

Minor update:
(1) Add a bit-width-is-a-multiple check before the isSplat() call.
(2) Reduce MaxSearchNodes from 4 to 3; this is the minimum setting that still allows the target patterns in the test cases.

And ping...

Harbormaster completed remote builds in B237197: Diff 529204. · Jun 7 2023, 1:01 AM

Rebase and added some comments.

Hi @nemanjai, I accepted and addressed your previous comments. Do you have any more concerns on the approach that is implemented here? Thank you!

Harbormaster completed remote builds in B243642: Diff 537959. · Jul 6 2023, 8:21 PM

In D138883#4479380, @tingwang wrote:

Rebase and added some comments.

Hi @nemanjai, I accepted and addressed your previous comments. Do you have any more concerns on the approach that is implemented here? Thank you!

I am really sorry about the delay...
While I am not completely opposed to this, it seems like a fair bit of machinery to add for something that we could solve more simply with unaligned stores (i.e. the same way we would codegen a memset where the tail is *not* a power of 2).

I don't think that

xxspltib 0, 165
li 4, 16
stxsibx 0, 3, 4
stxv 0, 0(3)

is any better than

xxspltib 0, 165
li 4, 1
stxvx 0, 3, 4
stxv 0, 0(3)

Is it possible to do something like that without walking the chain and with existing capabilities?
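The two sequences above can be modelled at the byte level as follows. This is only a sketch of which bytes each variant writes for a 17-byte memset of the value 165; it says nothing about which sequence is faster, and `storeSplat`, `tailByElementStore` and `tailByOverlappingStore` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Buf = std::array<uint8_t, 32>;

// Model of a store that writes Len copies of Byte at byte offset Off.
void storeSplat(Buf &B, std::size_t Off, std::size_t Len, uint8_t Byte) {
  assert(Off + Len <= B.size() && "store past the end of the buffer");
  std::memset(B.data() + Off, Byte, Len);
}

// First sequence: a 1-byte element store at offset 16 (stxsibx) plus a
// 16-byte vector store at offset 0 (stxv).
Buf tailByElementStore(uint8_t Byte) {
  Buf B{};
  storeSplat(B, 16, 1, Byte);
  storeSplat(B, 0, 16, Byte);
  return B;
}

// Second sequence: a 16-byte vector store at offset 1 (stxvx) overlapping a
// 16-byte vector store at offset 0 (stxv).
Buf tailByOverlappingStore(uint8_t Byte) {
  Buf B{};
  storeSplat(B, 1, 16, Byte);
  storeSplat(B, 0, 16, Byte);
  return B;
}
```

Both variants touch exactly bytes 0 through 16, so they are interchangeable as far as memory contents go.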

Hi @nemanjai, appreciate your time looking into this patch. Thank you!

I agree with you, and I think walking the chain is burning CPU cycles without achieving anything. I realized it is difficult for me to take on both targets (the original memset case, and the one @lkail mentioned) in this patch, so I would like to drop the second target in order to focus on the first one.

I like the idea of using an unaligned store, and quickly tested it to check for potential issues. I created memset.c with multiple memset(p, 0xXY, 24); lines to stress performance. According to my numbers on Power10, extract-and-store (https://reviews.llvm.org/D138883?id=493461) was 17% faster than baseline, whereas the unaligned store was about 30% slower than baseline.

From a performance perspective, I think I should pursue the original approach. However, since the canCombineStoreAndExtract target hook has proven not to be beneficial on PPC (https://reviews.llvm.org/D146602), I probably need to create a PPC-only hook for now.

Let me know if you have any comments. I will post the patch shortly.

Attachment: memset.c (92 KB)

Returned to the initial proposal after exploring different approaches. Since canCombineStoreAndExtract() is not beneficial on PPC, I created a separate filter for PPC.

Harbormaster completed remote builds in B244374: Diff 538961. · Jul 11 2023, 1:45 AM

tingwang added inline comments. · Jul 11 2023, 1:50 AM

llvm/test/CodeGen/PowerPC/memset-tail.ll
197	Will be eliminated by https://reviews.llvm.org/D139193.
236	Plan to address this pattern in separate patch.
246	Will be eliminated by https://reviews.llvm.org/D139193.
258	Plan to address this pattern in separate patch.
297	Will be eliminated by https://reviews.llvm.org/D139193.
350	Will be eliminated by https://reviews.llvm.org/D139193.
709	Plan to address this pattern in separate patch.
732	Plan to address this pattern in separate patch.

tingwang mentioned this in rG0bcef1d93de8: [PowerPC] remove XXSWAPD after vector splat immediate. · Jul 11 2023, 9:59 PM

(1) Rebase after commit D139193.
(2) Add two P10 patterns to match extract-and-store.

Now test case is clean. Gentle ping.

Harbormaster completed remote builds in B244667: Diff 539375. · Jul 12 2023, 1:54 AM

Rebase && Ping.

Harbormaster completed remote builds in B249970: Diff 546725. · Aug 3 2023, 1:57 AM

Gentle ping...

Herald added a subscriber: sunshaoce. · Aug 20 2023, 5:18 PM

shchenz added inline comments. · Sep 4 2023, 7:45 PM

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
17096	If using `stfd` is allowed for tail size 5/6/7, then can we use `stfd` for tail size 3/4 too? (I assume the change here impacts cases `memsetTailV1B3` and `memsetTailV1B4`?)
llvm/test/CodeGen/PowerPC/memset-tail.ll
244	This seems to be a legacy issue, because I found the same issue in case `memsetTailV1B12` and on the left side of this case as well. Is it safe to extend the store length from 23 bytes to 32 (or 24) bytes here? Nothing indicates that memory after `(char *)p + 7` is writable by the user. The related logic is in `allowsMisalignedMemoryAccesses()`. But is it correct to assume that this memset can write more memory even when it handles aligned memory? What do you think? @nemanjai

shchenz added inline comments. · Sep 4 2023, 8:12 PM

llvm/test/CodeGen/PowerPC/memset-tail.ll
244	Sorry, please ignore this comment. I didn't realize that the two stores `stxsdx` and `stxvd2x` overlap, so the real write size is not extended.
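The overlap can be checked with a little arithmetic. The sketch below is a simplified standalone model (`StoreOp` and `layoutStores` are hypothetical names) of the offset adjustment `DstOff -= VTSize - Size` that the memset lowering applies when the last store is wider than the bytes that remain:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct StoreOp {
  uint64_t Offset; // byte offset from the destination pointer
  uint64_t Width;  // bytes written by this store
};

// Lay out stores of the given widths over Size bytes. When a store is wider
// than what remains, slide it back so that it overlaps the previous store
// instead of writing past the end.
std::vector<StoreOp> layoutStores(uint64_t Size,
                                  const std::vector<uint64_t> &Widths) {
  std::vector<StoreOp> Ops;
  uint64_t DstOff = 0;
  for (uint64_t W : Widths) {
    if (W > Size) {
      assert(!Ops.empty() && DstOff >= W - Size && "nothing to overlap with");
      DstOff -= W - Size;
    }
    Ops.push_back({DstOff, W});
    DstOff += W;
    Size -= (W > Size ? Size : W);
  }
  return Ops;
}
```

For a 23-byte memset lowered as a 16-byte store followed by an 8-byte store, the second store lands at offset 15 and ends at byte 23, so nothing beyond the requested range is written.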

tingwang added inline comments. · Sep 5 2023, 12:30 AM

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
17096	It seems `TargetLowering::findOptimalMemOpLowering()` decides the type of each store. I guess if we change the type for the size 3/4 case from i32 to i64, then it will result in stfd.

This LGTM with some nits.

Let's first target the memset cases, as this is the common case where splat values occur.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7565	Nit: this comment needs update?
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
1639	Nit: we may need a comment here explaining why we don't try to extract the constant for `ElemSizeInBits` 8/16. (I guess the reason is that there is no benefit: we need an `li` to load the index, and that `li` could just as well load the 8/16-bit immediate?)
17096	Thanks. It would be better to add a comment here explaining why we need to set the type to `MVT::v8i16`.

This revision is now accepted and ready to land. · Sep 5 2023, 7:29 PM

In D138883#4639161, @shchenz wrote:

This LGTM with some nits.

Let's first target for the memset cases as this is the common case where splat values happens.

Thank you! I will address the remaining comments in the commit.

Closed by commit rG71be020dda2c: [SelectionDAG][PowerPC] Memset reuse vector element for tail store (authored by tingwang). · Sep 5 2023, 10:55 PM

This revision was automatically updated to reflect the committed changes.

tingwang added a commit: rG71be020dda2c: [SelectionDAG][PowerPC] Memset reuse vector element for tail store.

Revision Contents

Path	Size
llvm/include/llvm/CodeGen/TargetLowering.h	9 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp	16 lines
llvm/lib/Target/PowerPC/PPCISelLowering.h	5 lines
llvm/lib/Target/PowerPC/PPCISelLowering.cpp	38 lines
llvm/lib/Target/PowerPC/PPCInstrP10.td	7 lines
llvm/test/CodeGen/PowerPC/memset-tail.ll	236 lines

Diff 539375

llvm/include/llvm/CodeGen/TargetLowering.h

[668 lines not shown]	public:
   /// from a pair of smaller values into multiple stores.
   virtual bool isMultiStoresCheaperThanBitsMerge(EVT LTy, EVT HTy) const {
     return false;
   }

   /// Return if the target supports combining a
   /// chain like:
   /// \code
   /// %andResult = and %val1, #mask
Inline comment (nemanjai, Not Done): Do we need this? Can `canCombineStoreAndExtract()` suffice for this purpose?
Inline comment (tingwang, Author, Done): Thank you! I did a try and updated patch. Saw two cases changed. I will look into the detail tomorrow.
   /// %icmpResult = icmp %andResult, 0
   /// \endcode
   /// into a single machine instruction of a form like:
   /// \code
   /// cc = test %register, #mask
   /// \endcode
   virtual bool isMaskAndCmp0FoldingBeneficial(const Instruction &AndI) const {
     return false;
[146 lines not shown]	public:
   /// Return true if the target can combine store(extractelement VectorTy,
   /// Idx).
   /// \p Cost[out] gives the cost of that transformation when this is true.
   virtual bool canCombineStoreAndExtract(Type *VectorTy, Value *Idx,
                                          unsigned &Cost) const {
     return false;
   }

+  /// Return true if the target shall perform extract vector element and store
+  /// given that the vector is known to be splat of constant.
+  /// \p Index[out] gives the index of the vector element to be extracted when
+  /// this is true.
+  virtual bool shallExtractConstSplatVectorElementToStore(
+      Type *VectorTy, unsigned ElemSizeInBits, unsigned &Index) const {
+    return false;
+  }
+
   /// Return true if inserting a scalar into a variable element of an undef
   /// vector is more efficiently handled by splatting the scalar instead.
   virtual bool shouldSplatInsEltVarIndex(EVT) const {
     return false;
   }

   /// Return true if target always benefits from combining into FMA for a
   /// given value type. This must typically return false on targets where FMA
[4,498 lines not shown]

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

[7,556 lines not shown]	for (unsigned i = 0; i < NumMemOps; i++) {
     if (VTSize > Size) {
       // Issuing an unaligned load / store pair that overlaps with the previous
       // pair. Adjust the offset accordingly.
       assert(i == NumMemOps-1 && i != 0);
       DstOff -= VTSize - Size;
     }

     // If this store is smaller than the largest store see whether we can get
     // the smaller value for free with a truncate.
Inline comment (shchenz, Not Done): Nit: this comment needs update?
     SDValue Value = MemSetValue;
     if (VT.bitsLT(LargestVT)) {
+      unsigned Index;
+      unsigned NElts = LargestVT.getSizeInBits() / VT.getSizeInBits();
+      EVT SVT = EVT::getVectorVT(*DAG.getContext(), VT.getScalarType(), NElts);
       if (!LargestVT.isVector() && !VT.isVector() &&
           TLI.isTruncateFree(LargestVT, VT))
         Value = DAG.getNode(ISD::TRUNCATE, dl, VT, MemSetValue);
-      else
+      else if (LargestVT.isVector() && !VT.isVector() &&
+               TLI.shallExtractConstSplatVectorElementToStore(
+                   LargestVT.getTypeForEVT(*DAG.getContext()),
+                   VT.getSizeInBits(), Index) &&
+               TLI.isTypeLegal(SVT) &&
+               LargestVT.getSizeInBits() == SVT.getSizeInBits()) {
+        // Target which can combine store(extractelement VectorTy, Idx) can get
+        // the smaller value for free.
+        SDValue TailValue = DAG.getNode(ISD::BITCAST, dl, SVT, MemSetValue);
+        Value = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, VT, TailValue,
+                            DAG.getVectorIdxConstant(Index, dl));
+      } else
         Value = getMemsetValue(Src, VT, DAG, dl);
Inline comment (nemanjai, Done): Can we reduce the nesting here by converting this to `else if`?
     }
     assert(Value.getValueType() == VT && "Value with wrong type.");
     SDValue Store = DAG.getStore(
         Chain, dl, Value,
         DAG.getMemBasePlusOffset(Dst, TypeSize::Fixed(DstOff), dl),
         DstPtrInfo.getWithOffset(DstOff), Alignment,
         isVol ? MachineMemOperand::MOVolatile : MachineMemOperand::MONone,
         NewAAInfo);
[5,044 lines not shown]

llvm/lib/Target/PowerPC/PPCISelLowering.h

[785 lines not shown]	public:
   bool isCheapToSpeculateCttz(Type *Ty) const override {
     return true;
   }

   bool isCheapToSpeculateCtlz(Type *Ty) const override {
     return true;
   }

+  bool
+  shallExtractConstSplatVectorElementToStore(Type *VectorTy,
+                                             unsigned ElemSizeInBits,
+                                             unsigned &Index) const override;
+
   bool isCtlzFast() const override {
     return true;
   }

   bool isEqualityCmpFoldedWithSignedCmp() const override {
     return false;
   }

[696 lines not shown]

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

[1,624 lines not shown]
 bool PPCTargetLowering::hasSPE() const {
   return Subtarget.hasSPE();
 }

 bool PPCTargetLowering::preferIncOfAddToSubOfNot(EVT VT) const {
   return VT.isScalarInteger();
 }

+bool PPCTargetLowering::shallExtractConstSplatVectorElementToStore(
+    Type *VectorTy, unsigned ElemSizeInBits, unsigned &Index) const {
+  if (!Subtarget.isPPC64() || !Subtarget.hasVSX())
+    return false;
+
+  if (auto *VTy = dyn_cast<VectorType>(VectorTy)) {
+    if (VTy->getScalarType()->isIntegerTy()) {
Inline comment (shchenz, Not Done): nit: We may need comments here why we don't try to extract constant for `ElemSizeInBits` 8/16. (I guess the reason is we don't have benefit as we need `li` to load the index and this `li` can also be used to load the 8/16 bit imm?)
+      if (ElemSizeInBits == 32) {
+        Index = Subtarget.isLittleEndian() ? 2 : 1;
+        return true;
+      }
+      if (ElemSizeInBits == 64) {
+        Index = Subtarget.isLittleEndian() ? 1 : 0;
+        return true;
+      }
+    }
+  }
+  return false;
+}
+
 const char *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const {
   switch ((PPCISD::NodeType)Opcode) {
   case PPCISD::FIRST_NUMBER: break;
   case PPCISD::FSEL: return "PPCISD::FSEL";
   case PPCISD::XSMAXC: return "PPCISD::XSMAXC";
   case PPCISD::XSMINC: return "PPCISD::XSMINC";
   case PPCISD::FCFID: return "PPCISD::FCFID";
   case PPCISD::FCFIDU: return "PPCISD::FCFIDU";
▲ Show 20 Lines • Show All 13,422 Lines • ▼ Show 20 Lines	SDValue PPCTargetLowering::expandVSXStoreForLE(SDNode *N,
SDValue StoreOps[] = { Chain, Swap, Base };		SDValue StoreOps[] = { Chain, Swap, Base };
SDValue Store = DAG.getMemIntrinsicNode(PPCISD::STXVD2X, dl,		SDValue Store = DAG.getMemIntrinsicNode(PPCISD::STXVD2X, dl,
DAG.getVTList(MVT::Other),		DAG.getVTList(MVT::Other),
StoreOps, VecTy, MMO);		StoreOps, VecTy, MMO);
DCI.AddToWorklist(Store.getNode());		DCI.AddToWorklist(Store.getNode());
return Store;		return Store;
}		}

// Handle DAG combine for STORE (FP_TO_INT F).		// Handle DAG combine for STORE (FP_TO_INT F).
		nemanjaiUnsubmitted Not Done Reply Inline Actions Nit: you don't need the name of the function here. nemanjai: Nit: you don't need the name of the function here.
SDValue PPCTargetLowering::combineStoreFPToInt(SDNode *N,		SDValue PPCTargetLowering::combineStoreFPToInt(SDNode *N,
DAGCombinerInfo &DCI) const {		DAGCombinerInfo &DCI) const {
SelectionDAG &DAG = DCI.DAG;		SelectionDAG &DAG = DCI.DAG;
SDLoc dl(N);		SDLoc dl(N);
unsigned Opcode = N->getOperand(1).getOpcode();		unsigned Opcode = N->getOperand(1).getOpcode();
(void)Opcode;		(void)Opcode;
bool Strict = N->getOperand(1)->isStrictFPOpcode();		bool Strict = N->getOperand(1)->isStrictFPOpcode();

assert((Opcode == ISD::FP_TO_SINT \|\| Opcode == ISD::FP_TO_UINT \|\|		assert((Opcode == ISD::FP_TO_SINT \|\| Opcode == ISD::FP_TO_UINT \|\|
Opcode == ISD::STRICT_FP_TO_SINT \|\| Opcode == ISD::STRICT_FP_TO_UINT)		Opcode == ISD::STRICT_FP_TO_SINT \|\| Opcode == ISD::STRICT_FP_TO_UINT)
&& "Not a FP_TO_INT Instruction!");		&& "Not a FP_TO_INT Instruction!");

SDValue Val = N->getOperand(1).getOperand(Strict ? 1 : 0);		SDValue Val = N->getOperand(1).getOperand(Strict ? 1 : 0);
EVT Op1VT = N->getOperand(1).getValueType();		EVT Op1VT = N->getOperand(1).getValueType();
		nemanjaiUnsubmitted Not Done Reply Inline Actions What if `dyn_cast` returns `null` (i.e. if operand 1 is not a constant)? nemanjai: What if `dyn_cast` returns `null` (i.e. if operand 1 is not a constant)?
EVT ResVT = Val.getValueType();		EVT ResVT = Val.getValueType();

if (!Subtarget.hasVSX() \|\| !Subtarget.hasFPCVT() \|\| !isTypeLegal(ResVT))		if (!Subtarget.hasVSX() \|\| !Subtarget.hasFPCVT() \|\| !isTypeLegal(ResVT))
return SDValue();		return SDValue();

// Only perform combine for conversion to i64/i32 or power9 i16/i8.		// Only perform combine for conversion to i64/i32 or power9 i16/i8.
bool ValidTypeForStoreFltAsInt =		bool ValidTypeForStoreFltAsInt =
(Op1VT == MVT::i32 \|\| (Op1VT == MVT::i64 && Subtarget.isPPC64()) \|\|		(Op1VT == MVT::i32 \|\| (Op1VT == MVT::i64 && Subtarget.isPPC64()) \|\|
(Subtarget.hasP9Vector() && (Op1VT == MVT::i16 \|\| Op1VT == MVT::i8)));		(Subtarget.hasP9Vector() && (Op1VT == MVT::i16 \|\| Op1VT == MVT::i8)));

// TODO: Lower conversion from f128 on all VSX targets		// TODO: Lower conversion from f128 on all VSX targets
		nemanjaiUnsubmitted Not Done Reply Inline Actions We don't need to construct an `APInt` just to check whether it is a power of 2. You can just use `isPowerOf2_64()` from `MathExtras.h`. nemanjai: We don't need to construct an `APInt` just to check whether it is a power of 2. You can just…
if (ResVT == MVT::ppcf128 \|\| (ResVT == MVT::f128 && !Subtarget.hasP9Vector()))		if (ResVT == MVT::ppcf128 \|\| (ResVT == MVT::f128 && !Subtarget.hasP9Vector()))
return SDValue();		return SDValue();

if ((Op1VT != MVT::i64 && !Subtarget.hasP8Vector()) \|\|		if ((Op1VT != MVT::i64 && !Subtarget.hasP8Vector()) \|\|
cast<StoreSDNode>(N)->isTruncatingStore() \|\| !ValidTypeForStoreFltAsInt)		cast<StoreSDNode>(N)->isTruncatingStore() \|\| !ValidTypeForStoreFltAsInt)
return SDValue();		return SDValue();

Val = convertFPToInt(N->getOperand(1), DAG, Subtarget);		Val = convertFPToInt(N->getOperand(1), DAG, Subtarget);
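The inline suggestion above to call `isPowerOf2_64()` instead of building an `APInt` can be illustrated without LLVM headers. The following is a minimal stand-alone sketch; `isPowerOfTwo64` and `powerOfTwoExamples` are hypothetical names used here only for illustration, and the bit trick mirrors the documented behaviour of `llvm::isPowerOf2_64` (true only for powers of two greater than zero).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::isPowerOf2_64 (llvm/Support/MathExtras.h).
// A nonzero value is a power of two when exactly one bit is set, so clearing
// the lowest set bit with Value & (Value - 1) must leave zero.
constexpr bool isPowerOfTwo64(uint64_t Value) {
  return Value != 0 && (Value & (Value - 1)) == 0;
}

// Usage sketch: the kind of checks a caller might rely on.
inline void powerOfTwoExamples() {
  assert(isPowerOfTwo64(16));   // single bit set
  assert(!isPowerOfTwo64(24));  // two bits set
  assert(!isPowerOfTwo64(0));   // zero is not a power of two
}
```

The check works on a plain `uint64_t`, so no arbitrary-precision object needs to be constructed just to answer a yes/no question.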
(1,957 lines not shown)

/// It returns EVT::Other if the type should be determined using generic		/// It returns EVT::Other if the type should be determined using generic
/// target-independent logic.		/// target-independent logic.
EVT PPCTargetLowering::getOptimalMemOpType(		EVT PPCTargetLowering::getOptimalMemOpType(
const MemOp &Op, const AttributeList &FuncAttributes) const {		const MemOp &Op, const AttributeList &FuncAttributes) const {
if (getTargetMachine().getOptLevel() != CodeGenOpt::None) {		if (getTargetMachine().getOptLevel() != CodeGenOpt::None) {
// We should use Altivec/VSX loads and stores when available. For unaligned		// We should use Altivec/VSX loads and stores when available. For unaligned
// addresses, unaligned VSX loads are only fast starting with the P8.		// addresses, unaligned VSX loads are only fast starting with the P8.
if (Subtarget.hasAltivec() && Op.size() >= 16 &&		if (Subtarget.hasAltivec() && Op.size() >= 16) {
(Op.isAligned(Align(16)) \|\|		if (Op.isMemset() && Subtarget.hasVSX()) {
((Op.isMemset() && Subtarget.hasVSX()) \|\| Subtarget.hasP8Vector())))		uint64_t TailSize = Op.size() % 16;
		// For memset lowering, tail size need be different from vector element
		// size to allow borrow tail from vector, otherwise constant tail will
		// be generated.
		if (TailSize > 2 && TailSize <= 4) {
shchenz: If using `stfd` is allowed for tail size 5/6/7, then can we use `stfd` for tail size 3/4 too? (I assume the change here impacts cases `memsetTailV1B3` and `memsetTailV1B4`?)
tingwang (author): It seems `TargetLowering::findOptimalMemOpLowering()` decides the type of each store. I guess if we change the type for the size 3/4 case from i32 to i64, then it will result in stfd.
shchenz: Thanks. Better to add some comment here why we need to set the type to `MVT::v8i16`.
		return MVT::v8i16;
		}
return MVT::v4i32;		return MVT::v4i32;
}		}
		if (Op.isAligned(Align(16)) \|\| Subtarget.hasP8Vector())
		return MVT::v4i32;
		}
		}

if (Subtarget.isPPC64()) {		if (Subtarget.isPPC64()) {
return MVT::i64;		return MVT::i64;
}		}

return MVT::i32;		return MVT::i32;
}		}
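The type selection in the right-hand (new) column above can be modelled outside of LLVM. This is only a sketch of one reading of that column: `MemOpType`, `SubtargetModel`, `MemOpModel`, `pickMemOpType`, and `pickMemOpTypeExamples` are hypothetical stand-ins, not LLVM types or functions. Per the patch's own comment, a 3- or 4-byte tail gets `v8i16` so that the tail store size differs from the vector element size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the MVT values returned by the patched function.
enum class MemOpType { I32, I64, V4I32, V8I16 };

// Hypothetical model of the subtarget queries used in the diff.
struct SubtargetModel {
  bool HasAltivec;
  bool HasVSX;
  bool HasP8Vector;
  bool IsPPC64;
};

// Hypothetical model of the MemOp queries used in the diff.
struct MemOpModel {
  uint64_t Size;
  bool IsMemset;
  bool IsAligned16;
};

// Mirrors the control flow of the new column: memset with VSX picks a vector
// type based on the tail size; other operations keep the previous
// alignment / P8 vector condition; everything else falls back to a scalar.
inline MemOpType pickMemOpType(const MemOpModel &Op, const SubtargetModel &ST,
                               bool Optimizing) {
  if (Optimizing && ST.HasAltivec && Op.Size >= 16) {
    if (Op.IsMemset && ST.HasVSX) {
      uint64_t TailSize = Op.Size % 16;
      if (TailSize > 2 && TailSize <= 4)
        return MemOpType::V8I16;
      return MemOpType::V4I32;
    }
    if (Op.IsAligned16 || ST.HasP8Vector)
      return MemOpType::V4I32;
  }
  return ST.IsPPC64 ? MemOpType::I64 : MemOpType::I32;
}

// Usage sketch with the memset sizes exercised by memset-tail.ll.
inline void pickMemOpTypeExamples() {
  SubtargetModel P8 = {true, true, true, true};
  MemOpModel Tail3 = {19, true, false};
  MemOpModel Tail7 = {23, true, false};
  assert(pickMemOpType(Tail3, P8, true) == MemOpType::V8I16);
  assert(pickMemOpType(Tail7, P8, true) == MemOpType::V4I32);
}
```

The model ignores everything the real hook receives through `FuncAttributes`, which the shown diff lines do not consult either.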

(1,492 lines not shown)

llvm/lib/Target/PowerPC/PPCInstrP10.td

(2,025 lines not shown)
def : Pat<(store (i64 (extractelt v2i64:$src, 0)), ForceXForm:$dst),
(STXVRDX $src, ForceXForm:$dst)>;		(STXVRDX $src, ForceXForm:$dst)>;
def : Pat<(store (f64 (extractelt v2f64:$src, 0)), ForceXForm:$dst),		def : Pat<(store (f64 (extractelt v2f64:$src, 0)), ForceXForm:$dst),
(STXVRDX $src, ForceXForm:$dst)>;		(STXVRDX $src, ForceXForm:$dst)>;
// Load element 0 of a VSX register to memory		// Load element 0 of a VSX register to memory
def : Pat<(v8i16 (scalar_to_vector (i32 (extloadi16 ForceXForm:$src)))),		def : Pat<(v8i16 (scalar_to_vector (i32 (extloadi16 ForceXForm:$src)))),
(v8i16 (COPY_TO_REGCLASS (LXVRHX ForceXForm:$src), VSRC))>;		(v8i16 (COPY_TO_REGCLASS (LXVRHX ForceXForm:$src), VSRC))>;
def : Pat<(v16i8 (scalar_to_vector (i32 (extloadi8 ForceXForm:$src)))),		def : Pat<(v16i8 (scalar_to_vector (i32 (extloadi8 ForceXForm:$src)))),
(v16i8 (COPY_TO_REGCLASS (LXVRBX ForceXForm:$src), VSRC))>;		(v16i8 (COPY_TO_REGCLASS (LXVRBX ForceXForm:$src), VSRC))>;
		def : Pat<(store (i64 (extractelt v2i64:$A, 1)), ForceXForm:$src),
		(XFSTOREf64 (EXTRACT_SUBREG $A, sub_64), ForceXForm:$src)>;
		}

		let Predicates = [IsISA3_1, IsBigEndian] in {
		def : Pat<(store (i64 (extractelt v2i64:$A, 0)), ForceXForm:$src),
		(XFSTOREf64 (EXTRACT_SUBREG $A, sub_64), ForceXForm:$src)>;
}		}

// FIXME: The swap is overkill when the shift amount is a constant.		// FIXME: The swap is overkill when the shift amount is a constant.
// We should just fix the constant in the DAG.		// We should just fix the constant in the DAG.
let AddedComplexity = 400, Predicates = [IsISA3_1, HasVSX] in {		let AddedComplexity = 400, Predicates = [IsISA3_1, HasVSX] in {
def : Pat<(v1i128 (shl v1i128:$VRA, v1i128:$VRB)),		def : Pat<(v1i128 (shl v1i128:$VRA, v1i128:$VRB)),
(v1i128 (VSLQ v1i128:$VRA,		(v1i128 (VSLQ v1i128:$VRA,
(XXPERMDI (COPY_TO_REGCLASS $VRB, VSRC),		(XXPERMDI (COPY_TO_REGCLASS $VRB, VSRC),
(COPY_TO_REGCLASS $VRB, VSRC), 2)))>;		(COPY_TO_REGCLASS $VRB, VSRC), 2)))>;
(339 lines not shown)

llvm/test/CodeGen/PowerPC/memset-tail.ll

(163 lines not shown)
entry:
tail call void @llvm.memset.p0.i64(ptr %p, i8 15, i64 25, i1 false)		tail call void @llvm.memset.p0.i64(ptr %p, i8 15, i64 25, i1 false)
ret void		ret void
}		}

define dso_local void @memsetTailV1B8(ptr nocapture noundef writeonly %p) local_unnamed_addr {		define dso_local void @memsetTailV1B8(ptr nocapture noundef writeonly %p) local_unnamed_addr {
; P8-BE-LABEL: memsetTailV1B8:		; P8-BE-LABEL: memsetTailV1B8:
; P8-BE: # %bb.0: # %entry		; P8-BE: # %bb.0: # %entry
; P8-BE-NEXT: vspltisb 2, 15		; P8-BE-NEXT: vspltisb 2, 15
; P8-BE-NEXT: lis 4, 3855		; P8-BE-NEXT: li 4, 16
; P8-BE-NEXT: ori 4, 4, 3855		; P8-BE-NEXT: stxsdx 34, 3, 4
; P8-BE-NEXT: rldimi 4, 4, 32, 0
; P8-BE-NEXT: stxvw4x 34, 0, 3		; P8-BE-NEXT: stxvw4x 34, 0, 3
; P8-BE-NEXT: std 4, 16(3)
; P8-BE-NEXT: blr		; P8-BE-NEXT: blr
;		;
; P9-BE-LABEL: memsetTailV1B8:		; P9-BE-LABEL: memsetTailV1B8:
; P9-BE: # %bb.0: # %entry		; P9-BE: # %bb.0: # %entry
; P9-BE-NEXT: lis 4, 3855
; P9-BE-NEXT: xxspltib 0, 15		; P9-BE-NEXT: xxspltib 0, 15
; P9-BE-NEXT: ori 4, 4, 3855
; P9-BE-NEXT: stxv 0, 0(3)		; P9-BE-NEXT: stxv 0, 0(3)
; P9-BE-NEXT: rldimi 4, 4, 32, 0		; P9-BE-NEXT: stfd 0, 16(3)
; P9-BE-NEXT: std 4, 16(3)
; P9-BE-NEXT: blr		; P9-BE-NEXT: blr
;		;
; P10-BE-LABEL: memsetTailV1B8:		; P10-BE-LABEL: memsetTailV1B8:
; P10-BE: # %bb.0: # %entry		; P10-BE: # %bb.0: # %entry
; P10-BE-NEXT: pli 4, 252645135
; P10-BE-NEXT: rldimi 4, 4, 32, 0
; P10-BE-NEXT: std 4, 16(3)
; P10-BE-NEXT: xxspltib 0, 15		; P10-BE-NEXT: xxspltib 0, 15
; P10-BE-NEXT: stxv 0, 0(3)		; P10-BE-NEXT: stxv 0, 0(3)
		; P10-BE-NEXT: stfd 0, 16(3)
; P10-BE-NEXT: blr		; P10-BE-NEXT: blr
;		;
; P8-LE-LABEL: memsetTailV1B8:		; P8-LE-LABEL: memsetTailV1B8:
; P8-LE: # %bb.0: # %entry		; P8-LE: # %bb.0: # %entry
; P8-LE-NEXT: lis 4, 3855
; P8-LE-NEXT: vspltisb 2, 15		; P8-LE-NEXT: vspltisb 2, 15
; P8-LE-NEXT: ori 4, 4, 3855		; P8-LE-NEXT: li 4, 16
; P8-LE-NEXT: rldimi 4, 4, 32, 0		; P8-LE-NEXT: stxsdx 34, 3, 4
; P8-LE-NEXT: std 4, 16(3)
; P8-LE-NEXT: stxvd2x 34, 0, 3		; P8-LE-NEXT: stxvd2x 34, 0, 3
; P8-LE-NEXT: blr		; P8-LE-NEXT: blr
nemanjai: Why do we now get the redundant swap for the vector store that we didn't get before? Was it eliminated by the swap elimination before and now it is not because we have a use of the partial vector?
tingwang (author): Debug-only `ppc-vsx-swaps` shows "Web 0 rejected for physreg, partial reg, or not swap[pable]". I will look into it and probably post another patch to fix the issue. Thank you!
tingwang (author): Will be eliminated by https://reviews.llvm.org/D139193.
;		;
; P9-LE-LABEL: memsetTailV1B8:		; P9-LE-LABEL: memsetTailV1B8:
; P9-LE: # %bb.0: # %entry		; P9-LE: # %bb.0: # %entry
; P9-LE-NEXT: lis 4, 3855
; P9-LE-NEXT: xxspltib 0, 15		; P9-LE-NEXT: xxspltib 0, 15
; P9-LE-NEXT: ori 4, 4, 3855
; P9-LE-NEXT: stxv 0, 0(3)		; P9-LE-NEXT: stxv 0, 0(3)
; P9-LE-NEXT: rldimi 4, 4, 32, 0		; P9-LE-NEXT: stfd 0, 16(3)
; P9-LE-NEXT: std 4, 16(3)
; P9-LE-NEXT: blr		; P9-LE-NEXT: blr
;		;
; P10-LE-LABEL: memsetTailV1B8:		; P10-LE-LABEL: memsetTailV1B8:
; P10-LE: # %bb.0: # %entry		; P10-LE: # %bb.0: # %entry
; P10-LE-NEXT: pli 4, 252645135
; P10-LE-NEXT: rldimi 4, 4, 32, 0
; P10-LE-NEXT: std 4, 16(3)
; P10-LE-NEXT: xxspltib 0, 15		; P10-LE-NEXT: xxspltib 0, 15
; P10-LE-NEXT: stxv 0, 0(3)		; P10-LE-NEXT: stxv 0, 0(3)
		; P10-LE-NEXT: stfd 0, 16(3)
; P10-LE-NEXT: blr		; P10-LE-NEXT: blr
entry:		entry:
tail call void @llvm.memset.p0.i64(ptr %p, i8 15, i64 24, i1 false)		tail call void @llvm.memset.p0.i64(ptr %p, i8 15, i64 24, i1 false)
ret void		ret void
}		}
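The left-only lines in the test above (for example `lis 4, 3855` / `ori 4, 4, 3855` / `rldimi 4, 4, 32, 0`, or `pli 4, 252645135`) materialize the memset byte as a scalar splat constant, which is the work the patch avoids by storing a vector element instead. The arithmetic behind those immediates can be sketched as follows; `splatByte32`, `splatByte64`, and `splatExamples` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Replicate one byte into every byte of a 32-bit word.
// Multiplying by 0x01010101 copies the byte into each of the four lanes.
constexpr uint32_t splatByte32(uint8_t Byte) {
  return Byte * 0x01010101u;
}

// Replicate one byte into every byte of a 64-bit doubleword.
constexpr uint64_t splatByte64(uint8_t Byte) {
  return Byte * 0x0101010101010101ull;
}

// Usage sketch: the constants that appear as immediates in the CHECK lines.
inline void splatExamples() {
  assert(splatByte32(15) == 252645135u);     // pli 4, 252645135
  assert(splatByte32(165) == 2779096485u);   // pli 4, 2779096485
  assert(splatByte64(15) == 0x0F0F0F0F0F0F0F0Full);
}
```

Read as a signed 32-bit value, `2779096485` is `-1515870811`, which is the form the `memset2TailV1B4` and `memset2TailV1B3` checks use.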

define dso_local void @memsetTailV1B7(ptr nocapture noundef writeonly %p) local_unnamed_addr {		define dso_local void @memsetTailV1B7(ptr nocapture noundef writeonly %p) local_unnamed_addr {
; P8-BE-LABEL: memsetTailV1B7:		; P8-BE-LABEL: memsetTailV1B7:
; P8-BE: # %bb.0: # %entry		; P8-BE: # %bb.0: # %entry
; P8-BE-NEXT: lis 4, 3855
; P8-BE-NEXT: vspltisb 2, 15		; P8-BE-NEXT: vspltisb 2, 15
; P8-BE-NEXT: li 5, 15		; P8-BE-NEXT: li 4, 15
; P8-BE-NEXT: ori 4, 4, 3855		; P8-BE-NEXT: stxsdx 34, 3, 4
; P8-BE-NEXT: rldimi 4, 4, 32, 0
; P8-BE-NEXT: stdx 4, 3, 5
; P8-BE-NEXT: stxvw4x 34, 0, 3		; P8-BE-NEXT: stxvw4x 34, 0, 3
; P8-BE-NEXT: blr		; P8-BE-NEXT: blr
;		;
; P9-BE-LABEL: memsetTailV1B7:		; P9-BE-LABEL: memsetTailV1B7:
; P9-BE: # %bb.0: # %entry		; P9-BE: # %bb.0: # %entry
; P9-BE-NEXT: lis 4, 3855
; P9-BE-NEXT: li 5, 15
; P9-BE-NEXT: ori 4, 4, 3855
; P9-BE-NEXT: rldimi 4, 4, 32, 0
; P9-BE-NEXT: stdx 4, 3, 5
; P9-BE-NEXT: xxspltib 0, 15		; P9-BE-NEXT: xxspltib 0, 15
		; P9-BE-NEXT: stfd 0, 15(3)
; P9-BE-NEXT: stxv 0, 0(3)		; P9-BE-NEXT: stxv 0, 0(3)
; P9-BE-NEXT: blr		; P9-BE-NEXT: blr
;		;
; P10-BE-LABEL: memsetTailV1B7:		; P10-BE-LABEL: memsetTailV1B7:
; P10-BE: # %bb.0: # %entry		; P10-BE: # %bb.0: # %entry
; P10-BE-NEXT: pli 4, 252645135
; P10-BE-NEXT: rldimi 4, 4, 32, 0
; P10-BE-NEXT: pstd 4, 15(3), 0
; P10-BE-NEXT: xxspltib 0, 15		; P10-BE-NEXT: xxspltib 0, 15
		; P10-BE-NEXT: stfd 0, 15(3)
tingwang (author): Plan to address this pattern in separate patch.
; P10-BE-NEXT: stxv 0, 0(3)		; P10-BE-NEXT: stxv 0, 0(3)
; P10-BE-NEXT: blr		; P10-BE-NEXT: blr
;		;
; P8-LE-LABEL: memsetTailV1B7:		; P8-LE-LABEL: memsetTailV1B7:
; P8-LE: # %bb.0: # %entry		; P8-LE: # %bb.0: # %entry
; P8-LE-NEXT: lis 4, 3855
; P8-LE-NEXT: vspltisb 2, 15		; P8-LE-NEXT: vspltisb 2, 15
; P8-LE-NEXT: li 5, 15		; P8-LE-NEXT: li 4, 15
; P8-LE-NEXT: ori 4, 4, 3855		; P8-LE-NEXT: stxsdx 34, 3, 4
shchenz: This seems a legacy issue because I also found same issue in case `memsetTailV1B12` and also from the left side of this case. Is it safe to extend the store length from 23 bytes to 32 (or 24) bytes here? There is no clue saying that memory after `(char *)p + 7` is writable by the user? The related logic is in `allowsMisalignedMemoryAccesses()`. But is it correct that we can safely assume this memset can write more memory even this memset handles aligned memory? What do you think? @nemanjai
shchenz: Sorry, please ignore this comment. I didn't realize that the two stores `stxsdx` and `stxvd2x` have overlaps. So the real write size is not extended.
; P8-LE-NEXT: rldimi 4, 4, 32, 0
; P8-LE-NEXT: stdx 4, 3, 5
; P8-LE-NEXT: stxvd2x 34, 0, 3		; P8-LE-NEXT: stxvd2x 34, 0, 3
; P8-LE-NEXT: blr		; P8-LE-NEXT: blr
tingwang (author): Will be eliminated by https://reviews.llvm.org/D139193.
;		;
; P9-LE-LABEL: memsetTailV1B7:		; P9-LE-LABEL: memsetTailV1B7:
; P9-LE: # %bb.0: # %entry		; P9-LE: # %bb.0: # %entry
; P9-LE-NEXT: lis 4, 3855
; P9-LE-NEXT: li 5, 15
; P9-LE-NEXT: ori 4, 4, 3855
; P9-LE-NEXT: rldimi 4, 4, 32, 0
; P9-LE-NEXT: stdx 4, 3, 5
; P9-LE-NEXT: xxspltib 0, 15		; P9-LE-NEXT: xxspltib 0, 15
		; P9-LE-NEXT: stfd 0, 15(3)
; P9-LE-NEXT: stxv 0, 0(3)		; P9-LE-NEXT: stxv 0, 0(3)
; P9-LE-NEXT: blr		; P9-LE-NEXT: blr
;		;
; P10-LE-LABEL: memsetTailV1B7:		; P10-LE-LABEL: memsetTailV1B7:
; P10-LE: # %bb.0: # %entry		; P10-LE: # %bb.0: # %entry
; P10-LE-NEXT: pli 4, 252645135
; P10-LE-NEXT: rldimi 4, 4, 32, 0
; P10-LE-NEXT: pstd 4, 15(3), 0
; P10-LE-NEXT: xxspltib 0, 15		; P10-LE-NEXT: xxspltib 0, 15
		; P10-LE-NEXT: stfd 0, 15(3)
tingwang (author): Plan to address this pattern in separate patch.
; P10-LE-NEXT: stxv 0, 0(3)		; P10-LE-NEXT: stxv 0, 0(3)
; P10-LE-NEXT: blr		; P10-LE-NEXT: blr
entry:		entry:
tail call void @llvm.memset.p0.i64(ptr %p, i8 15, i64 23, i1 false)		tail call void @llvm.memset.p0.i64(ptr %p, i8 15, i64 23, i1 false)
ret void		ret void
}		}
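The point settled in the inline discussion of `memsetTailV1B7`, that the 8-byte tail store at offset 15 overlaps the 16-byte vector store and so does not write past byte 22, can be checked with a small simulation. This is a sketch only: `memsetWithOverlappingTail` is a hypothetical helper that models the two stores with `memcpy`, and it assumes a length between 16 and 24 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of the lowered sequence for 16 <= Len <= 24:
// one 16-byte vector store at offset 0 plus one 8-byte element store
// that ends exactly at Len, overlapping the vector store when Len < 24.
inline void memsetWithOverlappingTail(uint8_t *Dst, uint8_t Byte, size_t Len) {
  assert(Len >= 16 && Len <= 24 && "sketch only covers one vector plus tail");
  uint8_t Splat[16];
  std::memset(Splat, Byte, sizeof(Splat)); // models the splatted register
  std::memcpy(Dst, Splat, 16);             // models stxv / stxvd2x at offset 0
  std::memcpy(Dst + Len - 8, Splat, 8);    // models stfd / stxsdx at Len - 8
}
```

For a 23-byte memset the tail store covers bytes 15 through 22, so bytes 15 is written twice and nothing at offset 23 or beyond is touched.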

define dso_local void @memsetTailV1B4(ptr nocapture noundef writeonly %p) local_unnamed_addr {		define dso_local void @memsetTailV1B4(ptr nocapture noundef writeonly %p) local_unnamed_addr {
; P8-BE-LABEL: memsetTailV1B4:		; P8-BE-LABEL: memsetTailV1B4:
; P8-BE: # %bb.0: # %entry		; P8-BE: # %bb.0: # %entry
; P8-BE-NEXT: vspltisb 2, 15		; P8-BE-NEXT: vspltisb 2, 15
; P8-BE-NEXT: lis 4, 3855		; P8-BE-NEXT: li 4, 16
; P8-BE-NEXT: ori 4, 4, 3855		; P8-BE-NEXT: stxsiwx 34, 3, 4
; P8-BE-NEXT: stw 4, 16(3)
; P8-BE-NEXT: stxvw4x 34, 0, 3		; P8-BE-NEXT: stxvw4x 34, 0, 3
; P8-BE-NEXT: blr		; P8-BE-NEXT: blr
;		;
; P9-BE-LABEL: memsetTailV1B4:		; P9-BE-LABEL: memsetTailV1B4:
; P9-BE: # %bb.0: # %entry		; P9-BE: # %bb.0: # %entry
; P9-BE-NEXT: lis 4, 3855
; P9-BE-NEXT: ori 4, 4, 3855
; P9-BE-NEXT: stw 4, 16(3)
; P9-BE-NEXT: xxspltib 0, 15		; P9-BE-NEXT: xxspltib 0, 15
		; P9-BE-NEXT: li 4, 16
		; P9-BE-NEXT: stfiwx 0, 3, 4
; P9-BE-NEXT: stxv 0, 0(3)		; P9-BE-NEXT: stxv 0, 0(3)
; P9-BE-NEXT: blr		; P9-BE-NEXT: blr
;		;
; P10-BE-LABEL: memsetTailV1B4:		; P10-BE-LABEL: memsetTailV1B4:
; P10-BE: # %bb.0: # %entry		; P10-BE: # %bb.0: # %entry
; P10-BE-NEXT: pli 4, 252645135
; P10-BE-NEXT: stw 4, 16(3)
; P10-BE-NEXT: xxspltib 0, 15		; P10-BE-NEXT: xxspltib 0, 15
		; P10-BE-NEXT: li 4, 16
		; P10-BE-NEXT: stfiwx 0, 3, 4
; P10-BE-NEXT: stxv 0, 0(3)		; P10-BE-NEXT: stxv 0, 0(3)
; P10-BE-NEXT: blr		; P10-BE-NEXT: blr
;		;
; P8-LE-LABEL: memsetTailV1B4:		; P8-LE-LABEL: memsetTailV1B4:
; P8-LE: # %bb.0: # %entry		; P8-LE: # %bb.0: # %entry
; P8-LE-NEXT: vspltisb 2, 15		; P8-LE-NEXT: vspltisb 2, 15
; P8-LE-NEXT: lis 4, 3855		; P8-LE-NEXT: li 4, 16
; P8-LE-NEXT: ori 4, 4, 3855		; P8-LE-NEXT: stxsiwx 34, 3, 4
; P8-LE-NEXT: stw 4, 16(3)
; P8-LE-NEXT: stxvd2x 34, 0, 3		; P8-LE-NEXT: stxvd2x 34, 0, 3
; P8-LE-NEXT: blr		; P8-LE-NEXT: blr
tingwang (author): Will be eliminated by https://reviews.llvm.org/D139193.
;		;
; P9-LE-LABEL: memsetTailV1B4:		; P9-LE-LABEL: memsetTailV1B4:
; P9-LE: # %bb.0: # %entry		; P9-LE: # %bb.0: # %entry
; P9-LE-NEXT: lis 4, 3855
; P9-LE-NEXT: ori 4, 4, 3855
; P9-LE-NEXT: stw 4, 16(3)
; P9-LE-NEXT: xxspltib 0, 15		; P9-LE-NEXT: xxspltib 0, 15
		; P9-LE-NEXT: li 4, 16
		; P9-LE-NEXT: stfiwx 0, 3, 4
; P9-LE-NEXT: stxv 0, 0(3)		; P9-LE-NEXT: stxv 0, 0(3)
; P9-LE-NEXT: blr		; P9-LE-NEXT: blr
;		;
; P10-LE-LABEL: memsetTailV1B4:		; P10-LE-LABEL: memsetTailV1B4:
; P10-LE: # %bb.0: # %entry		; P10-LE: # %bb.0: # %entry
; P10-LE-NEXT: pli 4, 252645135
; P10-LE-NEXT: stw 4, 16(3)
; P10-LE-NEXT: xxspltib 0, 15		; P10-LE-NEXT: xxspltib 0, 15
		; P10-LE-NEXT: li 4, 16
		; P10-LE-NEXT: stfiwx 0, 3, 4
; P10-LE-NEXT: stxv 0, 0(3)		; P10-LE-NEXT: stxv 0, 0(3)
; P10-LE-NEXT: blr		; P10-LE-NEXT: blr
entry:		entry:
tail call void @llvm.memset.p0.i32(ptr %p, i8 15, i32 20, i1 false)		tail call void @llvm.memset.p0.i32(ptr %p, i8 15, i32 20, i1 false)
ret void		ret void
}		}

define dso_local void @memsetTailV1B3(ptr nocapture noundef writeonly %p) local_unnamed_addr {		define dso_local void @memsetTailV1B3(ptr nocapture noundef writeonly %p) local_unnamed_addr {
; P8-BE-LABEL: memsetTailV1B3:		; P8-BE-LABEL: memsetTailV1B3:
; P8-BE: # %bb.0: # %entry		; P8-BE: # %bb.0: # %entry
; P8-BE-NEXT: vspltisb 2, 15		; P8-BE-NEXT: vspltisb 2, 15
; P8-BE-NEXT: lis 4, 3855		; P8-BE-NEXT: li 4, 15
; P8-BE-NEXT: ori 4, 4, 3855		; P8-BE-NEXT: stxsiwx 34, 3, 4
; P8-BE-NEXT: stxvw4x 34, 0, 3		; P8-BE-NEXT: stxvw4x 34, 0, 3
; P8-BE-NEXT: stw 4, 15(3)
; P8-BE-NEXT: blr		; P8-BE-NEXT: blr
;		;
; P9-BE-LABEL: memsetTailV1B3:		; P9-BE-LABEL: memsetTailV1B3:
; P9-BE: # %bb.0: # %entry		; P9-BE: # %bb.0: # %entry
; P9-BE-NEXT: lis 4, 3855
; P9-BE-NEXT: ori 4, 4, 3855
; P9-BE-NEXT: stw 4, 15(3)
; P9-BE-NEXT: xxspltib 0, 15		; P9-BE-NEXT: xxspltib 0, 15
		; P9-BE-NEXT: li 4, 15
		; P9-BE-NEXT: stfiwx 0, 3, 4
; P9-BE-NEXT: stxv 0, 0(3)		; P9-BE-NEXT: stxv 0, 0(3)
; P9-BE-NEXT: blr		; P9-BE-NEXT: blr
;		;
; P10-BE-LABEL: memsetTailV1B3:		; P10-BE-LABEL: memsetTailV1B3:
; P10-BE: # %bb.0: # %entry		; P10-BE: # %bb.0: # %entry
; P10-BE-NEXT: pli 4, 252645135
; P10-BE-NEXT: stw 4, 15(3)
; P10-BE-NEXT: xxspltib 0, 15		; P10-BE-NEXT: xxspltib 0, 15
		; P10-BE-NEXT: li 4, 15
		; P10-BE-NEXT: stfiwx 0, 3, 4
; P10-BE-NEXT: stxv 0, 0(3)		; P10-BE-NEXT: stxv 0, 0(3)
; P10-BE-NEXT: blr		; P10-BE-NEXT: blr
;		;
; P8-LE-LABEL: memsetTailV1B3:		; P8-LE-LABEL: memsetTailV1B3:
; P8-LE: # %bb.0: # %entry		; P8-LE: # %bb.0: # %entry
; P8-LE-NEXT: vspltisb 2, 15		; P8-LE-NEXT: vspltisb 2, 15
; P8-LE-NEXT: lis 4, 3855		; P8-LE-NEXT: li 4, 15
; P8-LE-NEXT: ori 4, 4, 3855		; P8-LE-NEXT: stxsiwx 34, 3, 4
; P8-LE-NEXT: stw 4, 15(3)
; P8-LE-NEXT: stxvd2x 34, 0, 3		; P8-LE-NEXT: stxvd2x 34, 0, 3
; P8-LE-NEXT: blr		; P8-LE-NEXT: blr
tingwang (author): Will be eliminated by https://reviews.llvm.org/D139193.
;		;
; P9-LE-LABEL: memsetTailV1B3:		; P9-LE-LABEL: memsetTailV1B3:
; P9-LE: # %bb.0: # %entry		; P9-LE: # %bb.0: # %entry
; P9-LE-NEXT: lis 4, 3855
; P9-LE-NEXT: ori 4, 4, 3855
; P9-LE-NEXT: stw 4, 15(3)
; P9-LE-NEXT: xxspltib 0, 15		; P9-LE-NEXT: xxspltib 0, 15
		; P9-LE-NEXT: li 4, 15
		; P9-LE-NEXT: stfiwx 0, 3, 4
; P9-LE-NEXT: stxv 0, 0(3)		; P9-LE-NEXT: stxv 0, 0(3)
; P9-LE-NEXT: blr		; P9-LE-NEXT: blr
;		;
; P10-LE-LABEL: memsetTailV1B3:		; P10-LE-LABEL: memsetTailV1B3:
; P10-LE: # %bb.0: # %entry		; P10-LE: # %bb.0: # %entry
; P10-LE-NEXT: pli 4, 252645135
; P10-LE-NEXT: stw 4, 15(3)
; P10-LE-NEXT: xxspltib 0, 15		; P10-LE-NEXT: xxspltib 0, 15
		; P10-LE-NEXT: li 4, 15
		; P10-LE-NEXT: stfiwx 0, 3, 4
; P10-LE-NEXT: stxv 0, 0(3)		; P10-LE-NEXT: stxv 0, 0(3)
; P10-LE-NEXT: blr		; P10-LE-NEXT: blr
entry:		entry:
tail call void @llvm.memset.p0.i64(ptr %p, i8 15, i64 19, i1 false)		tail call void @llvm.memset.p0.i64(ptr %p, i8 15, i64 19, i1 false)
ret void		ret void
}		}

define dso_local void @memsetTailV1B2(ptr nocapture noundef writeonly %p) local_unnamed_addr {		define dso_local void @memsetTailV1B2(ptr nocapture noundef writeonly %p) local_unnamed_addr {
(264 lines not shown)
entry:
ret void		ret void
}		}

define dso_local void @memset2TailV1B8(ptr nocapture noundef writeonly %p) local_unnamed_addr {		define dso_local void @memset2TailV1B8(ptr nocapture noundef writeonly %p) local_unnamed_addr {
; P8-BE-LABEL: memset2TailV1B8:		; P8-BE-LABEL: memset2TailV1B8:
; P8-BE: # %bb.0: # %entry		; P8-BE: # %bb.0: # %entry
; P8-BE-NEXT: ld 4, L..C3(2) # %const.0		; P8-BE-NEXT: ld 4, L..C3(2) # %const.0
; P8-BE-NEXT: lxvw4x 0, 0, 4		; P8-BE-NEXT: lxvw4x 0, 0, 4
; P8-BE-NEXT: lis 4, -23131		; P8-BE-NEXT: stfd 0, 16(3)
; P8-BE-NEXT: ori 4, 4, 42405
; P8-BE-NEXT: rldimi 4, 4, 32, 0
; P8-BE-NEXT: stxvw4x 0, 0, 3		; P8-BE-NEXT: stxvw4x 0, 0, 3
; P8-BE-NEXT: std 4, 16(3)
; P8-BE-NEXT: blr		; P8-BE-NEXT: blr
;		;
; P9-BE-LABEL: memset2TailV1B8:		; P9-BE-LABEL: memset2TailV1B8:
; P9-BE: # %bb.0: # %entry		; P9-BE: # %bb.0: # %entry
; P9-BE-NEXT: lis 4, -23131
; P9-BE-NEXT: xxspltib 0, 165		; P9-BE-NEXT: xxspltib 0, 165
; P9-BE-NEXT: ori 4, 4, 42405
; P9-BE-NEXT: stxv 0, 0(3)		; P9-BE-NEXT: stxv 0, 0(3)
; P9-BE-NEXT: rldimi 4, 4, 32, 0		; P9-BE-NEXT: stfd 0, 16(3)
; P9-BE-NEXT: std 4, 16(3)
; P9-BE-NEXT: blr		; P9-BE-NEXT: blr
;		;
; P10-BE-LABEL: memset2TailV1B8:		; P10-BE-LABEL: memset2TailV1B8:
; P10-BE: # %bb.0: # %entry		; P10-BE: # %bb.0: # %entry
; P10-BE-NEXT: pli 4, 2779096485
; P10-BE-NEXT: rldimi 4, 4, 32, 0
; P10-BE-NEXT: std 4, 16(3)
; P10-BE-NEXT: xxspltib 0, 165		; P10-BE-NEXT: xxspltib 0, 165
; P10-BE-NEXT: stxv 0, 0(3)		; P10-BE-NEXT: stxv 0, 0(3)
		; P10-BE-NEXT: stfd 0, 16(3)
; P10-BE-NEXT: blr		; P10-BE-NEXT: blr
;		;
; P8-LE-LABEL: memset2TailV1B8:		; P8-LE-LABEL: memset2TailV1B8:
; P8-LE: # %bb.0: # %entry		; P8-LE: # %bb.0: # %entry
; P8-LE-NEXT: addis 4, 2, .LCPI12_0@toc@ha		; P8-LE-NEXT: addis 4, 2, .LCPI12_0@toc@ha
; P8-LE-NEXT: addi 4, 4, .LCPI12_0@toc@l		; P8-LE-NEXT: addi 4, 4, .LCPI12_0@toc@l
; P8-LE-NEXT: lxvd2x 0, 0, 4		; P8-LE-NEXT: lxvd2x 0, 0, 4
; P8-LE-NEXT: lis 4, -23131		; P8-LE-NEXT: stfd 0, 16(3)
; P8-LE-NEXT: ori 4, 4, 42405
; P8-LE-NEXT: rldimi 4, 4, 32, 0
; P8-LE-NEXT: std 4, 16(3)
; P8-LE-NEXT: stxvd2x 0, 0, 3		; P8-LE-NEXT: stxvd2x 0, 0, 3
; P8-LE-NEXT: blr		; P8-LE-NEXT: blr
;		;
; P9-LE-LABEL: memset2TailV1B8:		; P9-LE-LABEL: memset2TailV1B8:
; P9-LE: # %bb.0: # %entry		; P9-LE: # %bb.0: # %entry
; P9-LE-NEXT: lis 4, -23131
; P9-LE-NEXT: xxspltib 0, 165		; P9-LE-NEXT: xxspltib 0, 165
; P9-LE-NEXT: ori 4, 4, 42405
; P9-LE-NEXT: stxv 0, 0(3)		; P9-LE-NEXT: stxv 0, 0(3)
; P9-LE-NEXT: rldimi 4, 4, 32, 0		; P9-LE-NEXT: stfd 0, 16(3)
; P9-LE-NEXT: std 4, 16(3)
; P9-LE-NEXT: blr		; P9-LE-NEXT: blr
;		;
; P10-LE-LABEL: memset2TailV1B8:		; P10-LE-LABEL: memset2TailV1B8:
; P10-LE: # %bb.0: # %entry		; P10-LE: # %bb.0: # %entry
; P10-LE-NEXT: pli 4, 2779096485
; P10-LE-NEXT: rldimi 4, 4, 32, 0
; P10-LE-NEXT: std 4, 16(3)
; P10-LE-NEXT: xxspltib 0, 165		; P10-LE-NEXT: xxspltib 0, 165
; P10-LE-NEXT: stxv 0, 0(3)		; P10-LE-NEXT: stxv 0, 0(3)
		; P10-LE-NEXT: stfd 0, 16(3)
; P10-LE-NEXT: blr		; P10-LE-NEXT: blr
entry:		entry:
tail call void @llvm.memset.p0.i64(ptr %p, i8 165, i64 24, i1 false)		tail call void @llvm.memset.p0.i64(ptr %p, i8 165, i64 24, i1 false)
ret void		ret void
}		}

define dso_local void @memset2TailV1B7(ptr nocapture noundef writeonly %p) local_unnamed_addr {		define dso_local void @memset2TailV1B7(ptr nocapture noundef writeonly %p) local_unnamed_addr {
; P8-BE-LABEL: memset2TailV1B7:		; P8-BE-LABEL: memset2TailV1B7:
; P8-BE: # %bb.0: # %entry		; P8-BE: # %bb.0: # %entry
; P8-BE-NEXT: ld 4, L..C4(2) # %const.0		; P8-BE-NEXT: ld 4, L..C4(2) # %const.0
; P8-BE-NEXT: lis 5, -23131
; P8-BE-NEXT: lxvw4x 0, 0, 4		; P8-BE-NEXT: lxvw4x 0, 0, 4
; P8-BE-NEXT: ori 4, 5, 42405		; P8-BE-NEXT: stfd 0, 15(3)
; P8-BE-NEXT: li 5, 15
; P8-BE-NEXT: rldimi 4, 4, 32, 0
; P8-BE-NEXT: stdx 4, 3, 5
; P8-BE-NEXT: stxvw4x 0, 0, 3		; P8-BE-NEXT: stxvw4x 0, 0, 3
; P8-BE-NEXT: blr		; P8-BE-NEXT: blr
;		;
; P9-BE-LABEL: memset2TailV1B7:		; P9-BE-LABEL: memset2TailV1B7:
; P9-BE: # %bb.0: # %entry		; P9-BE: # %bb.0: # %entry
; P9-BE-NEXT: lis 4, -23131
; P9-BE-NEXT: li 5, 15
; P9-BE-NEXT: ori 4, 4, 42405
; P9-BE-NEXT: rldimi 4, 4, 32, 0
; P9-BE-NEXT: stdx 4, 3, 5
; P9-BE-NEXT: xxspltib 0, 165		; P9-BE-NEXT: xxspltib 0, 165
		; P9-BE-NEXT: stfd 0, 15(3)
; P9-BE-NEXT: stxv 0, 0(3)		; P9-BE-NEXT: stxv 0, 0(3)
; P9-BE-NEXT: blr		; P9-BE-NEXT: blr
;		;
; P10-BE-LABEL: memset2TailV1B7:		; P10-BE-LABEL: memset2TailV1B7:
; P10-BE: # %bb.0: # %entry		; P10-BE: # %bb.0: # %entry
; P10-BE-NEXT: pli 4, 2779096485
; P10-BE-NEXT: rldimi 4, 4, 32, 0
; P10-BE-NEXT: pstd 4, 15(3), 0
; P10-BE-NEXT: xxspltib 0, 165		; P10-BE-NEXT: xxspltib 0, 165
		; P10-BE-NEXT: stfd 0, 15(3)
tingwang (author): Plan to address this pattern in separate patch.
; P10-BE-NEXT: stxv 0, 0(3)		; P10-BE-NEXT: stxv 0, 0(3)
; P10-BE-NEXT: blr		; P10-BE-NEXT: blr
;		;
; P8-LE-LABEL: memset2TailV1B7:		; P8-LE-LABEL: memset2TailV1B7:
; P8-LE: # %bb.0: # %entry		; P8-LE: # %bb.0: # %entry
; P8-LE-NEXT: addis 4, 2, .LCPI13_0@toc@ha		; P8-LE-NEXT: addis 4, 2, .LCPI13_0@toc@ha
; P8-LE-NEXT: lis 5, -23131
; P8-LE-NEXT: addi 4, 4, .LCPI13_0@toc@l		; P8-LE-NEXT: addi 4, 4, .LCPI13_0@toc@l
; P8-LE-NEXT: lxvd2x 0, 0, 4		; P8-LE-NEXT: lxvd2x 0, 0, 4
; P8-LE-NEXT: ori 4, 5, 42405		; P8-LE-NEXT: stfd 0, 15(3)
; P8-LE-NEXT: li 5, 15
; P8-LE-NEXT: rldimi 4, 4, 32, 0
; P8-LE-NEXT: stdx 4, 3, 5
; P8-LE-NEXT: stxvd2x 0, 0, 3		; P8-LE-NEXT: stxvd2x 0, 0, 3
; P8-LE-NEXT: blr		; P8-LE-NEXT: blr
;		;
; P9-LE-LABEL: memset2TailV1B7:		; P9-LE-LABEL: memset2TailV1B7:
; P9-LE: # %bb.0: # %entry		; P9-LE: # %bb.0: # %entry
; P9-LE-NEXT: lis 4, -23131
; P9-LE-NEXT: li 5, 15
; P9-LE-NEXT: ori 4, 4, 42405
; P9-LE-NEXT: rldimi 4, 4, 32, 0
; P9-LE-NEXT: stdx 4, 3, 5
; P9-LE-NEXT: xxspltib 0, 165		; P9-LE-NEXT: xxspltib 0, 165
		; P9-LE-NEXT: stfd 0, 15(3)
; P9-LE-NEXT: stxv 0, 0(3)		; P9-LE-NEXT: stxv 0, 0(3)
; P9-LE-NEXT: blr		; P9-LE-NEXT: blr
;		;
; P10-LE-LABEL: memset2TailV1B7:		; P10-LE-LABEL: memset2TailV1B7:
; P10-LE: # %bb.0: # %entry		; P10-LE: # %bb.0: # %entry
; P10-LE-NEXT: pli 4, 2779096485
; P10-LE-NEXT: rldimi 4, 4, 32, 0
; P10-LE-NEXT: pstd 4, 15(3), 0
; P10-LE-NEXT: xxspltib 0, 165		; P10-LE-NEXT: xxspltib 0, 165
		; P10-LE-NEXT: stfd 0, 15(3)
tingwang (author): Plan to address this pattern in separate patch.
; P10-LE-NEXT: stxv 0, 0(3)		; P10-LE-NEXT: stxv 0, 0(3)
; P10-LE-NEXT: blr		; P10-LE-NEXT: blr
entry:		entry:
tail call void @llvm.memset.p0.i64(ptr %p, i8 165, i64 23, i1 false)		tail call void @llvm.memset.p0.i64(ptr %p, i8 165, i64 23, i1 false)
ret void		ret void
}		}

define dso_local void @memset2TailV1B4(ptr nocapture noundef writeonly %p) local_unnamed_addr {		define dso_local void @memset2TailV1B4(ptr nocapture noundef writeonly %p) local_unnamed_addr {
; P8-BE-LABEL: memset2TailV1B4:		; P8-BE-LABEL: memset2TailV1B4:
; P8-BE: # %bb.0: # %entry		; P8-BE: # %bb.0: # %entry
; P8-BE-NEXT: ld 4, L..C5(2) # %const.0		; P8-BE-NEXT: ld 4, L..C5(2) # %const.0
; P8-BE-NEXT: lxvw4x 0, 0, 4		; P8-BE-NEXT: lxvw4x 0, 0, 4
; P8-BE-NEXT: lis 4, -23131		; P8-BE-NEXT: li 4, 16
; P8-BE-NEXT: ori 4, 4, 42405		; P8-BE-NEXT: stfiwx 0, 3, 4
; P8-BE-NEXT: stw 4, 16(3)
; P8-BE-NEXT: stxvw4x 0, 0, 3		; P8-BE-NEXT: stxvw4x 0, 0, 3
; P8-BE-NEXT: blr		; P8-BE-NEXT: blr
;		;
; P9-BE-LABEL: memset2TailV1B4:		; P9-BE-LABEL: memset2TailV1B4:
; P9-BE: # %bb.0: # %entry		; P9-BE: # %bb.0: # %entry
; P9-BE-NEXT: lis 4, -23131
; P9-BE-NEXT: ori 4, 4, 42405
; P9-BE-NEXT: stw 4, 16(3)
; P9-BE-NEXT: xxspltib 0, 165		; P9-BE-NEXT: xxspltib 0, 165
		; P9-BE-NEXT: li 4, 16
		; P9-BE-NEXT: stfiwx 0, 3, 4
; P9-BE-NEXT: stxv 0, 0(3)		; P9-BE-NEXT: stxv 0, 0(3)
; P9-BE-NEXT: blr		; P9-BE-NEXT: blr
;		;
; P10-BE-LABEL: memset2TailV1B4:		; P10-BE-LABEL: memset2TailV1B4:
; P10-BE: # %bb.0: # %entry		; P10-BE: # %bb.0: # %entry
; P10-BE-NEXT: pli 4, -1515870811
; P10-BE-NEXT: stw 4, 16(3)
; P10-BE-NEXT: xxspltib 0, 165		; P10-BE-NEXT: xxspltib 0, 165
		; P10-BE-NEXT: li 4, 16
		; P10-BE-NEXT: stfiwx 0, 3, 4
; P10-BE-NEXT: stxv 0, 0(3)		; P10-BE-NEXT: stxv 0, 0(3)
; P10-BE-NEXT: blr		; P10-BE-NEXT: blr
;		;
; P8-LE-LABEL: memset2TailV1B4:		; P8-LE-LABEL: memset2TailV1B4:
; P8-LE: # %bb.0: # %entry		; P8-LE: # %bb.0: # %entry
; P8-LE-NEXT: addis 4, 2, .LCPI14_0@toc@ha		; P8-LE-NEXT: addis 4, 2, .LCPI14_0@toc@ha
; P8-LE-NEXT: addi 4, 4, .LCPI14_0@toc@l		; P8-LE-NEXT: addi 4, 4, .LCPI14_0@toc@l
; P8-LE-NEXT: lxvd2x 0, 0, 4		; P8-LE-NEXT: lxvd2x 0, 0, 4
; P8-LE-NEXT: lis 4, -23131		; P8-LE-NEXT: li 4, 16
; P8-LE-NEXT: ori 4, 4, 42405		; P8-LE-NEXT: stfiwx 0, 3, 4
; P8-LE-NEXT: stw 4, 16(3)
; P8-LE-NEXT: stxvd2x 0, 0, 3		; P8-LE-NEXT: stxvd2x 0, 0, 3
; P8-LE-NEXT: blr		; P8-LE-NEXT: blr
;		;
; P9-LE-LABEL: memset2TailV1B4:		; P9-LE-LABEL: memset2TailV1B4:
; P9-LE: # %bb.0: # %entry		; P9-LE: # %bb.0: # %entry
; P9-LE-NEXT: lis 4, -23131
; P9-LE-NEXT: ori 4, 4, 42405
; P9-LE-NEXT: stw 4, 16(3)
; P9-LE-NEXT: xxspltib 0, 165		; P9-LE-NEXT: xxspltib 0, 165
		; P9-LE-NEXT: li 4, 16
		; P9-LE-NEXT: stfiwx 0, 3, 4
; P9-LE-NEXT: stxv 0, 0(3)		; P9-LE-NEXT: stxv 0, 0(3)
; P9-LE-NEXT: blr		; P9-LE-NEXT: blr
;		;
; P10-LE-LABEL: memset2TailV1B4:		; P10-LE-LABEL: memset2TailV1B4:
; P10-LE: # %bb.0: # %entry		; P10-LE: # %bb.0: # %entry
; P10-LE-NEXT: pli 4, -1515870811
; P10-LE-NEXT: stw 4, 16(3)
; P10-LE-NEXT: xxspltib 0, 165		; P10-LE-NEXT: xxspltib 0, 165
		; P10-LE-NEXT: li 4, 16
		; P10-LE-NEXT: stfiwx 0, 3, 4
; P10-LE-NEXT: stxv 0, 0(3)		; P10-LE-NEXT: stxv 0, 0(3)
; P10-LE-NEXT: blr		; P10-LE-NEXT: blr
entry:		entry:
tail call void @llvm.memset.p0.i32(ptr %p, i8 165, i32 20, i1 false)		tail call void @llvm.memset.p0.i32(ptr %p, i8 165, i32 20, i1 false)
ret void		ret void
}		}

define dso_local void @memset2TailV1B3(ptr nocapture noundef writeonly %p) local_unnamed_addr {		define dso_local void @memset2TailV1B3(ptr nocapture noundef writeonly %p) local_unnamed_addr {
; P8-BE-LABEL: memset2TailV1B3:		; P8-BE-LABEL: memset2TailV1B3:
; P8-BE: # %bb.0: # %entry		; P8-BE: # %bb.0: # %entry
; P8-BE-NEXT: ld 4, L..C6(2) # %const.0		; P8-BE-NEXT: ld 4, L..C6(2) # %const.0
; P8-BE-NEXT: lxvw4x 0, 0, 4		; P8-BE-NEXT: lxvw4x 0, 0, 4
; P8-BE-NEXT: lis 4, -23131		; P8-BE-NEXT: li 4, 15
; P8-BE-NEXT: ori 4, 4, 42405		; P8-BE-NEXT: stfiwx 0, 3, 4
; P8-BE-NEXT: stw 4, 15(3)
; P8-BE-NEXT: stxvw4x 0, 0, 3		; P8-BE-NEXT: stxvw4x 0, 0, 3
; P8-BE-NEXT: blr		; P8-BE-NEXT: blr
;		;
; P9-BE-LABEL: memset2TailV1B3:		; P9-BE-LABEL: memset2TailV1B3:
; P9-BE: # %bb.0: # %entry		; P9-BE: # %bb.0: # %entry
; P9-BE-NEXT: lis 4, -23131
; P9-BE-NEXT: ori 4, 4, 42405
; P9-BE-NEXT: stw 4, 15(3)
; P9-BE-NEXT: xxspltib 0, 165		; P9-BE-NEXT: xxspltib 0, 165
		; P9-BE-NEXT: li 4, 15
		; P9-BE-NEXT: stfiwx 0, 3, 4
; P9-BE-NEXT: stxv 0, 0(3)		; P9-BE-NEXT: stxv 0, 0(3)
; P9-BE-NEXT: blr		; P9-BE-NEXT: blr
;		;
; P10-BE-LABEL: memset2TailV1B3:		; P10-BE-LABEL: memset2TailV1B3:
; P10-BE: # %bb.0: # %entry		; P10-BE: # %bb.0: # %entry
; P10-BE-NEXT: pli 4, -1515870811
; P10-BE-NEXT: stw 4, 15(3)
; P10-BE-NEXT: xxspltib 0, 165		; P10-BE-NEXT: xxspltib 0, 165
		; P10-BE-NEXT: li 4, 15
		; P10-BE-NEXT: stfiwx 0, 3, 4
; P10-BE-NEXT: stxv 0, 0(3)		; P10-BE-NEXT: stxv 0, 0(3)
; P10-BE-NEXT: blr		; P10-BE-NEXT: blr
;		;
; P8-LE-LABEL: memset2TailV1B3:		; P8-LE-LABEL: memset2TailV1B3:
; P8-LE: # %bb.0: # %entry		; P8-LE: # %bb.0: # %entry
; P8-LE-NEXT: addis 4, 2, .LCPI15_0@toc@ha		; P8-LE-NEXT: addis 4, 2, .LCPI15_0@toc@ha
; P8-LE-NEXT: addi 4, 4, .LCPI15_0@toc@l		; P8-LE-NEXT: addi 4, 4, .LCPI15_0@toc@l
; P8-LE-NEXT: lxvd2x 0, 0, 4		; P8-LE-NEXT: lxvd2x 0, 0, 4
; P8-LE-NEXT: lis 4, -23131		; P8-LE-NEXT: li 4, 15
; P8-LE-NEXT: ori 4, 4, 42405		; P8-LE-NEXT: stfiwx 0, 3, 4
; P8-LE-NEXT: stw 4, 15(3)
; P8-LE-NEXT: stxvd2x 0, 0, 3		; P8-LE-NEXT: stxvd2x 0, 0, 3
; P8-LE-NEXT: blr		; P8-LE-NEXT: blr
;		;
; P9-LE-LABEL: memset2TailV1B3:		; P9-LE-LABEL: memset2TailV1B3:
; P9-LE: # %bb.0: # %entry		; P9-LE: # %bb.0: # %entry
; P9-LE-NEXT: lis 4, -23131
; P9-LE-NEXT: ori 4, 4, 42405
; P9-LE-NEXT: stw 4, 15(3)
; P9-LE-NEXT: xxspltib 0, 165		; P9-LE-NEXT: xxspltib 0, 165
		; P9-LE-NEXT: li 4, 15
		; P9-LE-NEXT: stfiwx 0, 3, 4
; P9-LE-NEXT: stxv 0, 0(3)		; P9-LE-NEXT: stxv 0, 0(3)
; P9-LE-NEXT: blr		; P9-LE-NEXT: blr
;		;
; P10-LE-LABEL: memset2TailV1B3:		; P10-LE-LABEL: memset2TailV1B3:
; P10-LE: # %bb.0: # %entry		; P10-LE: # %bb.0: # %entry
; P10-LE-NEXT: pli 4, -1515870811
; P10-LE-NEXT: stw 4, 15(3)
; P10-LE-NEXT: xxspltib 0, 165		; P10-LE-NEXT: xxspltib 0, 165
		; P10-LE-NEXT: li 4, 15
		; P10-LE-NEXT: stfiwx 0, 3, 4
; P10-LE-NEXT: stxv 0, 0(3)		; P10-LE-NEXT: stxv 0, 0(3)
; P10-LE-NEXT: blr		; P10-LE-NEXT: blr
entry:		entry:
tail call void @llvm.memset.p0.i64(ptr %p, i8 165, i64 19, i1 false)		tail call void @llvm.memset.p0.i64(ptr %p, i8 165, i64 19, i1 false)
ret void		ret void
}		}

define dso_local void @memset2TailV1B2(ptr nocapture noundef writeonly %p) local_unnamed_addr {		define dso_local void @memset2TailV1B2(ptr nocapture noundef writeonly %p) local_unnamed_addr {
(539 lines not shown)

Revision Contents

Diff 539375

llvm/include/llvm/CodeGen/TargetLowering.h

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

llvm/lib/Target/PowerPC/PPCISelLowering.h

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

llvm/lib/Target/PowerPC/PPCInstrP10.td

llvm/test/CodeGen/PowerPC/memset-tail.ll
