This is an archive of the discontinued LLVM Phabricator instance.

Differential D23646

Generalize strided store pattern in interleave access pass
ClosedPublic

Authored by asbirlea on Aug 17 2016, 11:46 PM.

Details

Reviewers

silviu.baranga
rengolin
t.p.northover
• HaoLiu
jmolloy
mssimpso

Commits

rG77c5eaaedac7: Generalize strided store pattern in interleave access pass
rL289573: Generalize strided store pattern in interleave access pass

Summary

This patch aims to generalize matching of the strided store accesses to more general masks.
The more general rule is to have consecutive accesses based on the stride:
[x, y, ... z, x+1, y+1, ...z+1, x+2, y+2, ...z+2, ...]
and for each start element in each stride (x, y, ... z) to be aligned.
However, the elements in the masks need not form a contiguous space; there may be gaps.
As before, undefs are allowed and filled in with adjacent element loads.
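
The rule above can be sketched as a standalone check. This is only an illustration under stated assumptions, not the code in the patch: `isGeneralInterleaveMask` is a hypothetical name, the mask is a plain `std::vector<int>` with -1 standing for undef, and the alignment and power-of-two lane-length conditions are deliberately left out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of the patch): returns true if `mask` has the
// shape <x, y, ..., z, x+1, y+1, ..., z+1, ...> for the given factor.
// Undef elements are encoded as -1 and are skipped.
bool isGeneralInterleaveMask(const std::vector<int> &mask, unsigned factor) {
  if (factor < 2 || mask.empty() || mask.size() % factor != 0)
    return false;
  const unsigned laneLen = static_cast<unsigned>(mask.size()) / factor;

  for (unsigned lane = 0; lane < factor; ++lane) {
    bool haveStart = false;
    int start = 0;
    for (unsigned j = 0; j < laneLen; ++j) {
      const int value = mask[j * factor + lane];
      if (value < 0)
        continue; // undef: no constraint
      // Every defined element implies where its lane must start.
      const int impliedStart = value - static_cast<int>(j);
      if (impliedStart < 0)
        return false; // the lane would start before element 0
      if (haveStart && impliedStart != start)
        return false; // defined elements are not consecutive
      start = impliedStart;
      haveStart = true;
    }
  }
  return true;
}
```

Under these assumptions the classic mask <0, 4, 1, 5, 2, 6, 3, 7> and the gapped mask <4, 32, 5, 33, 6, 34, 7, 35> are both accepted for a factor of 2, since each lane is consecutive on its own.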

Note this patch is not final, but I would like to get feedback on the approach.
There are at least the pending TODOs.

Diff Detail

Repository: rL LLVM

Event Timeline

asbirlea updated this revision to Diff 68484. Aug 17 2016, 11:46 PM

asbirlea retitled this revision to Generalize strided store pattern in interleave access pass.

asbirlea updated this object.

asbirlea added reviewers: • HaoLiu, mssimpso.

asbirlea added subscribers: llvm-commits, delena, mkuper.

Hi Alina,

I think I understand this, but I just want to be sure I get how this differs from what we currently have before going further. Currently, we only match [x, y, ..., z, x+1, y+1, z+1, ...] where each y-x and each z-y equals the number of sub elements for the given factor. Or said another way, if I create a list of all the x's followed by all the y's and then all the z's, the entire list would be consecutive. With your patch, the only requirement is that each sub-list be consecutive. Is this right?

The current approach was designed to match the shuffle patterns produced by the loop vectorizer. I'm curious to know where we are generating these more general patterns. Have you run across some code examples?

Also, another high level comment before I start looking at the details: you'll want to include some IR test cases as well (to be run with opt instead of llc).

Matt.
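
For reference, the currently matched pattern described in this comment can be sketched as a standalone check. It uses the formula from the pre-patch loop shown in the diff, `(i % Factor) * NumSubElts + i / Factor`; the function name and the `std::vector<int>` mask with -1 for undef are assumptions made for this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical standalone version of the pre-patch rule: element i of the
// mask must equal (i % factor) * numSubElts + i / factor. -1 encodes undef.
bool isStrictReInterleaveMask(const std::vector<int> &mask, unsigned factor) {
  if (factor < 2 || mask.empty() || mask.size() % factor != 0)
    return false;
  const unsigned numSubElts = static_cast<unsigned>(mask.size()) / factor;

  for (unsigned i = 0; i < mask.size(); ++i) {
    if (mask[i] < 0)
      continue; // undef elements are ignored
    const unsigned expected = (i % factor) * numSubElts + i / factor;
    if (static_cast<unsigned>(mask[i]) != expected)
      return false;
  }
  return true;
}
```

With this rule <0, 4, 1, 5, 2, 6, 3, 7> matches for a factor of 2, while <4, 32, 5, 33, 6, 34, 7, 35> does not, even though each of its sub-lists is consecutive.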

Hi Matt,

Thanks for looking to review this. Please find my answers below.

In D23646#521091, @mssimpso wrote:

> Hi Alina,
>
> I think I understand this, but I just want to be sure I get how this differs from what we currently have before going further. Currently, we only match [x, y, ..., z, x+1, y+1, z+1, ...] where each y-x and each z-y equals the number of sub elements for the given factor. Or said another way, if I create a list of all the x's followed by all the y's and then all the z's, the entire list would be consecutive. With your patch, the only requirement is that each sub-list be consecutive. Is this right?

That's right. Also, from my understanding, x is always 0. So all elements form a consecutive sublist which always starts at 0.
My first approach was actually to generalize this just to add a prefix to remove the "starts with 0" restriction and a more general stride that allowed gaps. But this still didn't cover all the testcases I came across, such as the example I added in "store_general_mask_factor4".

To answer your question below, the use cases I'm looking at are generated by Halide (https://github.com/halide/Halide).
Halide generates LLVM IR and relies on its optimization pipeline and lowering, but they need to generate explicit intrinsics (including strided loads and stores) for arm and aarch64, because their patterns are not lowered to intrinsics by LLVM.
Since this approach was taken before the interleaved-access pass was added, it's quite understandable, but LLVM is more powerful now and I'm trying to make use of this, and in the process, cover the cases missing in LLVM.
For example, for strided loads the interleaved-access pass does cover the code patterns generated by Halide, so the "custom" intrinsic code generation in Halide will soon be removed. My goal is to improve the pass to make this happen for the stores as well.
The tests I will add are actually simplified versions of what Halide is generating.

> The current approach was designed to match the shuffle patterns produced by the loop vectorizer. I'm curious to know where we are generating these more general patterns. Have you run across some code examples?
>
> Also, another high level comment before I start looking at the details: you'll want to include some IR test cases as well (to be run with opt instead of llc).

Agreed, the plan is to add more tests, including IR tests.

> Matt.

mssimpso added inline comments. Aug 26 2016, 10:42 AM

lib/CodeGen/InterleavedAccessPass.cpp
159–164 ↗	(On Diff #68484)	You should probably update this to define the more general pattern.
181–207 ↗	(On Diff #68484)	This looks fairly reasonable to me, but the parts dealing with undef are pretty difficult to follow. I think some more high-level comments would help people better understand what's going on here.

Address comments re. comments.
Complete TODOs.
Requesting help on whether to include TLI.misalignedAccess check and on what's the correct way to do it.

asbirlea added inline comments. Sep 6 2016, 3:37 PM

lib/CodeGen/InterleavedAccessPass.cpp
400 ↗	(On Diff #70479)	This is a part that I'm not sure is needed, and how to address it. The goal was to check for the alignment of each of the strides, i.e. BaseStoreAddress + StartingIncrementInStride, for all stride [0, Factor). The commented attempt has a series of problems and does not achieve this. Should this check exist and what's the correct way to handle it?

mssimpso added a reviewer: t.p.northover. Sep 13 2016, 1:35 PM

Hi Alina,

Sorry for the delay. I'm not quite sure I understand this patch anymore. I'm adding Tim Northover (ARM/AArch64 code owner) as a reviewer to hopefully get this unstuck.

SG, thank you!

I'm going back to look at the alignment check this afternoon (that's the big commented out block).
I'd really like to understand why some basic alignment checks lead to ARM tests failing and not their AArch64 counterparts.

mssimpso added inline comments. Sep 13 2016, 1:51 PM

lib/CodeGen/InterleavedAccessPass.cpp
400 ↗	(On Diff #70479)	I could be wrong about this, but I don't think you need to worry about alignment here. I'm not seeing how the memory behavior with this patch would be different than the current situation.

asbirlea added inline comments. Sep 13 2016, 4:09 PM

lib/CodeGen/InterleavedAccessPass.cpp
400 ↗	(On Diff #70479)	I agree with you that there should be no significant difference from the current situation. There is one small difference though...before there could be one misaligned access, now there may be Factor such accesses. That's why I'd still like to understand the alignment issue - whether the check is needed or not and in what form. Perhaps it would be better to have it in a separate patch though.

Remove comment block checking for alignment. Will revisit in a future patch.

Pinging patch.

Also, working around the case when masks are larger than 16 elements.
This can happen now if Halide takes advantage of this pass for strided stores.
This is not the only use-case of larger shuffle masks, but the topic is beyond the scope of this patch.

Minor edit of temporary variables.

rengolin added reviewers: rengolin, silviu.baranga, jmolloy. Oct 8 2016, 6:11 AM

rengolin added inline comments. Oct 14 2016, 7:36 AM

lib/CodeGen/InterleavedAccessPass.cpp
193 ↗	(On Diff #73332)	Nit, `ij` is hard to follow. Try `lane` or something more expressive. (this is not a matrix :)
204 ↗	(On Diff #73332)	This is really confusing. Can you factor the comparison elements out with expressive names, so the if becomes a comparison of obvious terms?
208 ↗	(On Diff #73332)	PreviousMask is always used in conjunction with PreviousPos, so you don't need the mask to be signed and you can compare the pos in the block above and get rid of the static casts. Or you could have an additional boolean flag and make them both unsigned.
lib/Target/AArch64/AArch64ISelLowering.cpp
7243 ↗	(On Diff #73332)	Nit. use brackets here: if (...) { ... } else {
7247 ↗	(On Diff #73332)	I don't get the `- j` here.
lib/Target/ARM/ARMISelLowering.cpp
13126 ↗	(On Diff #73332)	Better to duplicate the comment, I think. These back-ends evolve at different paces.
test/CodeGen/ARM/arm-interleaved-accesses.ll
321 ↗	(On Diff #73332)	is this really guaranteed to reproduce? they don't seem connected to the pcs directly...

Address comments.

asbirlea added inline comments. Oct 14 2016, 11:34 AM

lib/CodeGen/InterleavedAccessPass.cpp
193 ↗	(On Diff #73332)	Renamed to Lane. I hope changing the NumSubElts to LaneLen makes more sense too. I'm inclined to change it in the lowering files as well for consistency.
lib/Target/AArch64/AArch64ISelLowering.cpp
7247 ↗	(On Diff #73332)	Assuming the mask starts with a few undefs, this computes what the start of the mask would be based on the first non-undef value. The computation is done first in the pass to make sure the start is a positive value (hence the correctness comment below on "StartMask cannot be negative")
test/CodeGen/ARM/arm-interleaved-accesses.ll
321 ↗	(On Diff #73332)	I'm not sure about this TBH, and not sure how to verify it. Should I replace it by a simple vst4.32 check?

Hi Alina,

This is looking much better, thanks!

The code has a lot of undef handling, but not much in the way of testing it. I think we should have at least the following:

one and two undefs in the middle
one undef at the beginning and one at the end
all undefs in one lane
one undef in each lane, at different positions

Repeating the pattern of your current tests but adding undefs should be enough.

cheers,
--renato

lib/CodeGen/InterleavedAccessPass.cpp
193 ↗	(On Diff #73332)	Nice, much better! I agree with renaming the lowering code, too.
188 ↗	(On Diff #74721)	Better to declare I and J inside the `for` declaration.
200 ↗	(On Diff #74721)	A comment here would help... // If both defined, only sequential values allowed
216 ↗	(On Diff #74721)	What about the case where the first lane is undef, but the others aren't?
226 ↗	(On Diff #74721)	Instead of using a `SavedNonUndef` above, you could save the last non-undef value and the number of undefs since that value. That'd make the next-value computation easier: If (NextValue != SavedValue + NumUndefs) break; and also help get the StartMask here, for free.
237 ↗	(On Diff #74721)	"Found an interleaved..."
lib/Target/AArch64/AArch64ISelLowering.cpp
7247 ↗	(On Diff #73332)	Right, I agree you could repeat the naming pattern above, here.
test/CodeGen/ARM/arm-interleaved-accesses.ll
321 ↗	(On Diff #73332)	Something like: vst4.32 {d{{\n+}}, d{{\n+}}, d{{\n+}}, d{{\n+}}}, [r0] would do. (I'm not sure of the triple brackets there...)
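
The suggestion on line 226 (track the last non-undef value and the number of undefs seen since) can be sketched for a single lane as follows. This is an illustrative reading of the comment, not code from the patch: the function name is hypothetical, the lane is passed as its own vector with -1 for undef, and the counter here counts only the undefs, so the expected next value is the saved value plus the undef count plus one.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-lane check: `lane` holds the mask values of one lane in
// order (x, x+1, ...), with -1 encoding undef. Defined values must stay
// consecutive across any run of undefs between them.
bool laneIsConsecutive(const std::vector<int> &lane) {
  bool haveSaved = false; // has any defined value been seen yet?
  int savedValue = 0;     // last defined value
  int undefsSince = 0;    // undefs seen since savedValue

  for (int value : lane) {
    if (value < 0) {
      if (haveSaved)
        ++undefsSince;
      continue;
    }
    if (haveSaved && value != savedValue + undefsSince + 1)
      return false;
    savedValue = value;
    undefsSince = 0;
    haveSaved = true;
  }
  return true;
}
```

For example, a lane of the form x, undef, x+2, undef, undef, x+5 passes, while x, undef, x+3 does not.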

Address comments. One pending.

Great point on the lack of testing. But I wasn't happy with the coverage the vst4/st4 had.
I added a pattern for vst3/st3 that covers the undefs in the middle of a lane.

lib/CodeGen/InterleavedAccessPass.cpp
188 ↗	(On Diff #74721)	There's a check on I and J following each loop. I could add an additional flag to check that we broke out of the loop early, but it seemed overkill to do that when I and J could be used if declared outside the loop.
216 ↗	(On Diff #74721)	Nothing wrong with that (unless I'm missing something).. It'll check the correctness for the ones that follow and the first one will receive a value based on the following values - that's the start mask value.
226 ↗	(On Diff #74721)	I'm still looking into this one. I can do without SaveNonUndef, and update the condition to a "SavedLaneValue+SavedNoUndefs (+1)". This needs an additional if clause in the loop to increment the SavedNoUndefs, and at least another check to help with computing the mask. The second check is because right now I only store SavedLaneValue if a value is followed by an undef, but at the end of the mask we'll need this updated too to get the correct StartMask as something like SavedLaneValue+SavedNoUndefs -LaneLen (+/- 1). Right now I find it easier to just compute the StartMask in the same j loop. So, yeah, still looking what's the cleanest way to do this.

Address remaining comment. Add additional testcases.

[clang-format]

Gentle ping.

Re-pinging patch.

Pinging again.

Hi,

Sorry to keep you waiting, this completely fell off my radar.

I think the code looks good now, just need to make sure the test is generic enough on the CHECK line (see inline comment).

cheers,
--renato

test/CodeGen/AArch64/aarch64-interleaved-accesses.ll
285 ↗	(On Diff #75332)	Please, also use: st4.32 {d{{\n+}}, d{{\n+}}, d{{\n+}}, d{{\n+}}}, [x0] here.

This revision is now accepted and ready to land. Dec 13 2016, 7:31 AM

Update aarch64 test.

Thank you for the review, Renato!

Closed by commit rL289573: Generalize strided store pattern in interleave access pass (authored by asbirlea). Dec 13 2016, 11:43 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                              Size
llvm/trunk/lib/CodeGen/InterleavedAccessPass.cpp                  88 lines
llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp             44 lines
llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp                     42 lines
llvm/trunk/test/CodeGen/AArch64/aarch64-interleaved-accesses.ll   111 lines
llvm/trunk/test/CodeGen/ARM/arm-interleaved-accesses.ll           144 lines

Diff 81270

llvm/trunk/lib/CodeGen/InterleavedAccessPass.cpp

@@ static bool isDeInterleaveMask(ArrayRef<int> Mask, unsigned &Factor,
   // Check potential Factors.
   for (Factor = 2; Factor <= MaxFactor; Factor++)
     if (isDeInterleaveMaskOfFactor(Mask, Factor, Index))
       return true;

   return false;
 }

-/// \brief Check if the mask is RE-interleave mask for an interleaved store.
-///
-/// I.e. <0, NumSubElts, ... , NumSubElts*(Factor - 1), 1, NumSubElts + 1, ...>
+/// \brief Check if the mask can be used in an interleaved store.
+//
+/// It checks for a more general pattern than the RE-interleave mask.
+/// I.e. <x, y, ... z, x+1, y+1, ...z+1, x+2, y+2, ...z+2, ...>
+/// E.g. For a Factor of 2 (LaneLen=4): <4, 32, 5, 33, 6, 34, 7, 35>
+/// E.g. For a Factor of 3 (LaneLen=4): <4, 32, 16, 5, 33, 17, 6, 34, 18, 7, 35, 19>
+/// E.g. For a Factor of 4 (LaneLen=2): <8, 2, 12, 4, 9, 3, 13, 5>
 ///
-/// E.g. The RE-interleave mask (Factor = 2) could be:
-/// <0, 4, 1, 5, 2, 6, 3, 7>
+/// The particular case of an RE-interleave mask is:
+/// I.e. <0, LaneLen, ... , LaneLen*(Factor - 1), 1, LaneLen + 1, ...>
+/// E.g. For a Factor of 2 (LaneLen=4): <0, 4, 1, 5, 2, 6, 3, 7>
 static bool isReInterleaveMask(ArrayRef<int> Mask, unsigned &Factor,
                                unsigned MaxFactor) {
   unsigned NumElts = Mask.size();
   if (NumElts < 4)
     return false;

   // Check potential Factors.
   for (Factor = 2; Factor <= MaxFactor; Factor++) {
     if (NumElts % Factor)
       continue;

-    unsigned NumSubElts = NumElts / Factor;
-    if (!isPowerOf2_32(NumSubElts))
+    unsigned LaneLen = NumElts / Factor;
+    if (!isPowerOf2_32(LaneLen))
       continue;

-    // Check whether each element matchs the RE-interleaved rule. Ignore undef
-    // elements.
-    unsigned i = 0;
-    for (; i < NumElts; i++)
-      if (Mask[i] >= 0 &&
-          static_cast<unsigned>(Mask[i]) !=
-              (i % Factor) * NumSubElts + i / Factor)
-        break;
-
-    // Find a RE-interleaved mask of current factor.
-    if (i == NumElts)
+    // Check whether each element matches the general interleaved rule.
+    // Ignore undef elements, as long as the defined elements match the rule.
+    // Outer loop processes all factors (x, y, z in the above example)
+    unsigned I = 0, J;
+    for (; I < Factor; I++) {
+      unsigned SavedLaneValue;
+      unsigned SavedNoUndefs = 0;
+
+      // Inner loop processes consecutive accesses (x, x+1... in the example)
+      for (J = 0; J < LaneLen - 1; J++) {
+        // Lane computes x's position in the Mask
+        unsigned Lane = J * Factor + I;
+        unsigned NextLane = Lane + Factor;
+        int LaneValue = Mask[Lane];
+        int NextLaneValue = Mask[NextLane];
+
+        // If both are defined, values must be sequential
+        if (LaneValue >= 0 && NextLaneValue >= 0 &&
+            LaneValue + 1 != NextLaneValue)
+          break;
+
+        // If the next value is undef, save the current one as reference
+        if (LaneValue >= 0 && NextLaneValue < 0) {
+          SavedLaneValue = LaneValue;
+          SavedNoUndefs = 1;
+        }
+
+        // Undefs are allowed, but defined elements must still be consecutive:
+        // i.e.: x,..., undef,..., x + 2,..., undef,..., undef,..., x + 5, ....
+        // Verify this by storing the last non-undef followed by an undef
+        // Check that following non-undef masks are incremented with the
+        // corresponding distance.
+        if (SavedNoUndefs > 0 && LaneValue < 0) {
+          SavedNoUndefs++;
+          if (NextLaneValue >= 0 &&
+              SavedLaneValue + SavedNoUndefs != (unsigned)NextLaneValue)
+            break;
+        }
+      }
+
+      if (J < LaneLen - 1)
+        break;
+
+      int StartMask = 0;
+      if (Mask[I] >= 0) {
+        // Check that the start of the I range (J=0) is greater than 0
+        StartMask = Mask[I];
+      } else if (Mask[(LaneLen - 1) * Factor + I] >= 0) {
+        // StartMask defined by the last value in lane
+        StartMask = Mask[(LaneLen - 1) * Factor + I] - J;
+      } else if (SavedNoUndefs > 0) {
+        // StartMask defined by some non-zero value in the j loop
+        StartMask = SavedLaneValue - (LaneLen - 1 - SavedNoUndefs);
+      }
+      // else StartMask remains set to 0, i.e. all elements are undefs
+
+      if (StartMask < 0)
+        break;
+    }
+
+    // Found an interleaved mask of current factor.
+    if (I == Factor)
       return true;
   }

   return false;
 }

 bool InterleavedAccess::lowerInterleavedLoad(
     LoadInst *LI, SmallVector<Instruction *, 32> &DeadInsts) {

llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp

@@ static Constant *getSequentialMask(IRBuilder<> &Builder, unsigned Start,

   return ConstantVector::get(Mask);
 }

 /// \brief Lower an interleaved store into a stN intrinsic.
 ///
 /// E.g. Lower an interleaved store (Factor = 3):
 /// %i.vec = shuffle <8 x i32> %v0, <8 x i32> %v1,
 /// <0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11>
 /// store <12 x i32> %i.vec, <12 x i32>* %ptr
 ///
 /// Into:
 /// %sub.v0 = shuffle <8 x i32> %v0, <8 x i32> v1, <0, 1, 2, 3>
 /// %sub.v1 = shuffle <8 x i32> %v0, <8 x i32> v1, <4, 5, 6, 7>
 /// %sub.v2 = shuffle <8 x i32> %v0, <8 x i32> v1, <8, 9, 10, 11>
 /// call void llvm.aarch64.neon.st3(%sub.v0, %sub.v1, %sub.v2, %ptr)
 ///
 /// Note that the new shufflevectors will be removed and we'll only generate one
 /// st3 instruction in CodeGen.
+///
+/// Example for a more general valid mask (Factor 3). Lower:
+/// %i.vec = shuffle <32 x i32> %v0, <32 x i32> %v1,
+/// <4, 32, 16, 5, 33, 17, 6, 34, 18, 7, 35, 19>
+/// store <12 x i32> %i.vec, <12 x i32>* %ptr
+///
+/// Into:
+/// %sub.v0 = shuffle <32 x i32> %v0, <32 x i32> v1, <4, 5, 6, 7>
+/// %sub.v1 = shuffle <32 x i32> %v0, <32 x i32> v1, <32, 33, 34, 35>
+/// %sub.v2 = shuffle <32 x i32> %v0, <32 x i32> v1, <16, 17, 18, 19>
+/// call void llvm.aarch64.neon.st3(%sub.v0, %sub.v1, %sub.v2, %ptr)
 bool AArch64TargetLowering::lowerInterleavedStore(StoreInst *SI,
                                                   ShuffleVectorInst *SVI,
                                                   unsigned Factor) const {
   assert(Factor >= 2 && Factor <= getMaxSupportedInterleaveFactor() &&
          "Invalid interleave factor");

   VectorType *VecTy = SVI->getType();
   assert(VecTy->getVectorNumElements() % Factor == 0 &&
          "Invalid interleaved store");

-  unsigned NumSubElts = VecTy->getVectorNumElements() / Factor;
+  unsigned LaneLen = VecTy->getVectorNumElements() / Factor;
   Type *EltTy = VecTy->getVectorElementType();
-  VectorType *SubVecTy = VectorType::get(EltTy, NumSubElts);
+  VectorType *SubVecTy = VectorType::get(EltTy, LaneLen);

   const DataLayout &DL = SI->getModule()->getDataLayout();
   unsigned SubVecSize = DL.getTypeSizeInBits(SubVecTy);

   // Skip if we do not have NEON and skip illegal vector types.
   if (!Subtarget->hasNEON() || (SubVecSize != 64 && SubVecSize != 128))
     return false;

   Value *Op0 = SVI->getOperand(0);
   Value *Op1 = SVI->getOperand(1);
   IRBuilder<> Builder(SI);

   // StN intrinsics don't support pointer vectors as arguments. Convert pointer
   // vectors to integer vectors.
   if (EltTy->isPointerTy()) {
     Type *IntTy = DL.getIntPtrType(EltTy);
     unsigned NumOpElts =
         dyn_cast<VectorType>(Op0->getType())->getVectorNumElements();

     // Convert to the corresponding integer vector.
     Type *IntVecTy = VectorType::get(IntTy, NumOpElts);
     Op0 = Builder.CreatePtrToInt(Op0, IntVecTy);
     Op1 = Builder.CreatePtrToInt(Op1, IntVecTy);

-    SubVecTy = VectorType::get(IntTy, NumSubElts);
+    SubVecTy = VectorType::get(IntTy, LaneLen);
   }

   Type *PtrTy = SubVecTy->getPointerTo(SI->getPointerAddressSpace());
   Type *Tys[2] = {SubVecTy, PtrTy};
   static const Intrinsic::ID StoreInts[3] = {Intrinsic::aarch64_neon_st2,
                                              Intrinsic::aarch64_neon_st3,
                                              Intrinsic::aarch64_neon_st4};
   Function *StNFunc =
       Intrinsic::getDeclaration(SI->getModule(), StoreInts[Factor - 2], Tys);

   SmallVector<Value *, 5> Ops;

   // Split the shufflevector operands into sub vectors for the new stN call.
-  for (unsigned i = 0; i < Factor; i++)
-    Ops.push_back(Builder.CreateShuffleVector(
-        Op0, Op1, getSequentialMask(Builder, NumSubElts * i, NumSubElts)));
+  auto Mask = SVI->getShuffleMask();
+  for (unsigned i = 0; i < Factor; i++) {
+    if (Mask[i] >= 0) {
+      Ops.push_back(Builder.CreateShuffleVector(
+          Op0, Op1, getSequentialMask(Builder, Mask[i], LaneLen)));
+    } else {
+      unsigned StartMask = 0;
+      for (unsigned j = 1; j < LaneLen; j++) {
+        if (Mask[j*Factor + i] >= 0) {
+          StartMask = Mask[j*Factor + i] - j;
+          break;
+        }
+      }
+      // Note: If all elements in a chunk are undefs, StartMask=0!
+      // Note: Filling undef gaps with random elements is ok, since
+      // those elements were being written anyway (with undefs).
+      // In the case of all undefs we're defaulting to using elems from 0
+      // Note: StartMask cannot be negative, it's checked in isReInterleaveMask
+      Ops.push_back(Builder.CreateShuffleVector(
+          Op0, Op1, getSequentialMask(Builder, StartMask, LaneLen)));
+    }
+  }

   Ops.push_back(Builder.CreateBitCast(SI->getPointerOperand(), PtrTy));
   Builder.CreateCall(StNFunc, Ops);
   return true;
 }

 static bool memOpAlign(unsigned DstAlign, unsigned SrcAlign,
                        unsigned AlignCheck) {
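
The effect of the new operand-splitting loop in lowerInterleavedStore can be tried outside LLVM with a small sketch. It assumes a mask that the pass has already accepted, represents the mask as a `std::vector<int>` with -1 for undef, and uses hypothetical helper names; `sequentialMask` only stands in for getSequentialMask and returns plain integers rather than an IR constant.

```cpp
#include <cassert>
#include <vector>

// Stand-in for getSequentialMask: <start, start + 1, ..., start + numElts - 1>.
std::vector<int> sequentialMask(int start, unsigned numElts) {
  std::vector<int> result;
  result.reserve(numElts);
  for (unsigned k = 0; k < numElts; ++k)
    result.push_back(start + static_cast<int>(k));
  return result;
}

// Hypothetical helper: builds one sequential sub-mask per lane.
// Assumes factor >= 2, mask.size() % factor == 0, and a validated mask,
// so that the start derived from any defined element is non-negative.
std::vector<std::vector<int>> splitIntoSubMasks(const std::vector<int> &mask,
                                                unsigned factor) {
  const unsigned laneLen = static_cast<unsigned>(mask.size()) / factor;
  std::vector<std::vector<int>> subMasks;
  for (unsigned i = 0; i < factor; ++i) {
    int start = 0; // a lane made only of undefs defaults to element 0
    for (unsigned j = 0; j < laneLen; ++j) {
      const int value = mask[j * factor + i];
      if (value >= 0) {
        // The first defined element at position j implies the lane start.
        start = value - static_cast<int>(j);
        break;
      }
    }
    subMasks.push_back(sequentialMask(start, laneLen));
  }
  return subMasks;
}
```

For the Factor 3 example in the doc comment, <4, 32, 16, 5, 33, 17, 6, 34, 18, 7, 35, 19>, this yields <4, 5, 6, 7>, <32, 33, 34, 35> and <16, 17, 18, 19>. A leading undef in a lane is filled from the next defined element, which is what the `- j` term in the patch accounts for.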

llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp

@@
 /// Into:
 /// %sub.v0 = shuffle <8 x i32> %v0, <8 x i32> v1, <0, 1, 2, 3>
 /// %sub.v1 = shuffle <8 x i32> %v0, <8 x i32> v1, <4, 5, 6, 7>
 /// %sub.v2 = shuffle <8 x i32> %v0, <8 x i32> v1, <8, 9, 10, 11>
 /// call void llvm.arm.neon.vst3(%ptr, %sub.v0, %sub.v1, %sub.v2, 4)
 ///
 /// Note that the new shufflevectors will be removed and we'll only generate one
 /// vst3 instruction in CodeGen.
+///
+/// Example for a more general valid mask (Factor 3). Lower:
+/// %i.vec = shuffle <32 x i32> %v0, <32 x i32> %v1,
+/// <4, 32, 16, 5, 33, 17, 6, 34, 18, 7, 35, 19>
+/// store <12 x i32> %i.vec, <12 x i32>* %ptr
+///
+/// Into:
+/// %sub.v0 = shuffle <32 x i32> %v0, <32 x i32> v1, <4, 5, 6, 7>
+/// %sub.v1 = shuffle <32 x i32> %v0, <32 x i32> v1, <32, 33, 34, 35>
+/// %sub.v2 = shuffle <32 x i32> %v0, <32 x i32> v1, <16, 17, 18, 19>
+/// call void llvm.arm.neon.vst3(%ptr, %sub.v0, %sub.v1, %sub.v2, 4)
 bool ARMTargetLowering::lowerInterleavedStore(StoreInst *SI,
                                               ShuffleVectorInst *SVI,
                                               unsigned Factor) const {
   assert(Factor >= 2 && Factor <= getMaxSupportedInterleaveFactor() &&
          "Invalid interleave factor");

   VectorType *VecTy = SVI->getType();
   assert(VecTy->getVectorNumElements() % Factor == 0 &&
          "Invalid interleaved store");

-  unsigned NumSubElts = VecTy->getVectorNumElements() / Factor;
+  unsigned LaneLen = VecTy->getVectorNumElements() / Factor;
   Type *EltTy = VecTy->getVectorElementType();
-  VectorType *SubVecTy = VectorType::get(EltTy, NumSubElts);
+  VectorType *SubVecTy = VectorType::get(EltTy, LaneLen);

   const DataLayout &DL = SI->getModule()->getDataLayout();
   unsigned SubVecSize = DL.getTypeSizeInBits(SubVecTy);
   bool EltIs64Bits = DL.getTypeSizeInBits(EltTy) == 64;

   // Skip if we do not have NEON and skip illegal vector types and vector types
   // with i64/f64 elements (vstN doesn't support i64/f64 elements).
   if (!Subtarget->hasNEON() || (SubVecSize != 64 && SubVecSize != 128) ||
@@ (10 lines not shown) if (EltTy->isPointerTy()) {
     Type *IntTy = DL.getIntPtrType(EltTy);

     // Convert to the corresponding integer vector.
     Type *IntVecTy =
         VectorType::get(IntTy, Op0->getType()->getVectorNumElements());
     Op0 = Builder.CreatePtrToInt(Op0, IntVecTy);
     Op1 = Builder.CreatePtrToInt(Op1, IntVecTy);

-    SubVecTy = VectorType::get(IntTy, NumSubElts);
+    SubVecTy = VectorType::get(IntTy, LaneLen);
   }

   static const Intrinsic::ID StoreInts[3] = {Intrinsic::arm_neon_vst2,
                                              Intrinsic::arm_neon_vst3,
                                              Intrinsic::arm_neon_vst4};
   SmallVector<Value *, 6> Ops;

   Type *Int8Ptr = Builder.getInt8PtrTy(SI->getPointerAddressSpace());
   Ops.push_back(Builder.CreateBitCast(SI->getPointerOperand(), Int8Ptr));

   Type *Tys[] = { Int8Ptr, SubVecTy };
   Function *VstNFunc = Intrinsic::getDeclaration(
       SI->getModule(), StoreInts[Factor - 2], Tys);

   // Split the shufflevector operands into sub vectors for the new vstN call.
-  for (unsigned i = 0; i < Factor; i++)
-    Ops.push_back(Builder.CreateShuffleVector(
-        Op0, Op1, getSequentialMask(Builder, NumSubElts * i, NumSubElts)));
+  auto Mask = SVI->getShuffleMask();
+  for (unsigned i = 0; i < Factor; i++) {
+    if (Mask[i] >= 0) {
+      Ops.push_back(Builder.CreateShuffleVector(
+          Op0, Op1, getSequentialMask(Builder, Mask[i], LaneLen)));
+    } else {
+      unsigned StartMask = 0;
+      for (unsigned j = 1; j < LaneLen; j++) {
+        if (Mask[j*Factor + i] >= 0) {
+          StartMask = Mask[j*Factor + i] - j;
+          break;
+        }
+      }
+      // Note: If all elements in a chunk are undefs, StartMask=0!
+      // Note: Filling undef gaps with random elements is ok, since
+      // those elements were being written anyway (with undefs).
+      // In the case of all undefs we're defaulting to using elems from 0
+      // Note: StartMask cannot be negative, it's checked in isReInterleaveMask
+      Ops.push_back(Builder.CreateShuffleVector(
+          Op0, Op1, getSequentialMask(Builder, StartMask, LaneLen)));
+    }
+  }

   Ops.push_back(Builder.getInt32(SI->getAlignment()));
   Builder.CreateCall(VstNFunc, Ops);
   return true;
 }

 enum HABaseType {
   HA_UNKNOWN = 0,

llvm/trunk/test/CodeGen/AArch64/aarch64-interleaved-accesses.ll

	; NONEON-LABEL: load_factor2_with_extract_user:			; NONEON-LABEL: load_factor2_with_extract_user:
	; NONEON-NOT: ld2			; NONEON-NOT: ld2
	define i32 @load_factor2_with_extract_user(<8 x i32>* %a) {			define i32 @load_factor2_with_extract_user(<8 x i32>* %a) {
	%1 = load <8 x i32>, <8 x i32>* %a, align 8			%1 = load <8 x i32>, <8 x i32>* %a, align 8
	%2 = shufflevector <8 x i32> %1, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>			%2 = shufflevector <8 x i32> %1, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
	%3 = extractelement <8 x i32> %1, i32 2			%3 = extractelement <8 x i32> %1, i32 2
	ret i32 %3			ret i32 %3
	}			}

				; NEON-LABEL: store_general_mask_factor4:
				; NEON: st4 { v{{[0-9]+}}.{{[0-9]+}}s, v{{[0-9]+}}.{{[0-9]+}}s, v{{[0-9]+}}.{{[0-9]+}}s, v{{[0-9]+}}.{{[0-9]+}}s }, [x0]
				; NONEON-LABEL: store_general_mask_factor4:
				; NONEON-NOT: st4
				define void @store_general_mask_factor4(i32* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
				%base = bitcast i32* %ptr to <8 x i32>*
				%i.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <8 x i32> <i32 4, i32 16, i32 32, i32 8, i32 5, i32 17, i32 33, i32 9>
				store <8 x i32> %i.vec, <8 x i32>* %base, align 4
				ret void
				}

				; NEON-LABEL: store_general_mask_factor4_undefbeg:
				; NEON: st4 { v{{[0-9]+}}.{{[0-9]+}}s, v{{[0-9]+}}.{{[0-9]+}}s, v{{[0-9]+}}.{{[0-9]+}}s, v{{[0-9]+}}.{{[0-9]+}}s }, [x0]
				; NONEON-LABEL: store_general_mask_factor4_undefbeg:
				; NONEON-NOT: st4
				define void @store_general_mask_factor4_undefbeg(i32* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
				%base = bitcast i32* %ptr to <8 x i32>*
				%i.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <8 x i32> <i32 undef, i32 16, i32 32, i32 8, i32 5, i32 17, i32 33, i32 9>
				store <8 x i32> %i.vec, <8 x i32>* %base, align 4
				ret void
				}

				; NEON-LABEL: store_general_mask_factor4_undefend:
				; NEON: st4 { v{{[0-9]+}}.{{[0-9]+}}s, v{{[0-9]+}}.{{[0-9]+}}s, v{{[0-9]+}}.{{[0-9]+}}s, v{{[0-9]+}}.{{[0-9]+}}s }, [x0]
				; NONEON-LABEL: store_general_mask_factor4_undefend:
				; NONEON-NOT: st4
				define void @store_general_mask_factor4_undefend(i32* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
				%base = bitcast i32* %ptr to <8 x i32>*
				%i.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <8 x i32> <i32 4, i32 16, i32 32, i32 8, i32 5, i32 17, i32 33, i32 undef>
				store <8 x i32> %i.vec, <8 x i32>* %base, align 4
				ret void
				}

				; NEON-LABEL: store_general_mask_factor4_undefmid:
				; NEON: st4 { v{{[0-9]+}}.{{[0-9]+}}s, v{{[0-9]+}}.{{[0-9]+}}s, v{{[0-9]+}}.{{[0-9]+}}s, v{{[0-9]+}}.{{[0-9]+}}s }, [x0]
				; NONEON-LABEL: store_general_mask_factor4_undefmid:
				; NONEON-NOT: st4
				define void @store_general_mask_factor4_undefmid(i32* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
				%base = bitcast i32* %ptr to <8 x i32>*
				%i.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <8 x i32> <i32 4, i32 undef, i32 32, i32 8, i32 5, i32 17, i32 undef, i32 9>
				store <8 x i32> %i.vec, <8 x i32>* %base, align 4
				ret void
				}

				; NEON-LABEL: store_general_mask_factor4_undefmulti:
				; NEON: st4 { v{{[0-9]+}}.{{[0-9]+}}s, v{{[0-9]+}}.{{[0-9]+}}s, v{{[0-9]+}}.{{[0-9]+}}s, v{{[0-9]+}}.{{[0-9]+}}s }, [x0]
				; NONEON-LABEL: store_general_mask_factor4_undefmulti:
				; NONEON-NOT: st4
				define void @store_general_mask_factor4_undefmulti(i32* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
				%base = bitcast i32* %ptr to <8 x i32>*
				%i.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <8 x i32> <i32 4, i32 undef, i32 undef, i32 8, i32 undef, i32 undef, i32 undef, i32 9>
				store <8 x i32> %i.vec, <8 x i32>* %base, align 4
				ret void
				}

				; NEON-LABEL: store_general_mask_factor3:
				; NEON: st3 { v{{[0-9]+}}.{{[0-9]+}}s, v{{[0-9]+}}.{{[0-9]+}}s, v{{[0-9]+}}.{{[0-9]+}}s }, [x0]
				; NONEON-LABEL: store_general_mask_factor3:
				; NONEON-NOT: st3
				define void @store_general_mask_factor3(i32* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
				%base = bitcast i32* %ptr to <12 x i32>*
				%i.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <12 x i32> <i32 4, i32 32, i32 16, i32 5, i32 33, i32 17, i32 6, i32 34, i32 18, i32 7, i32 35, i32 19>
				store <12 x i32> %i.vec, <12 x i32>* %base, align 4
				ret void
				}

				; NEON-LABEL: store_general_mask_factor3_undefmultimid:
				; NEON: st3 { v{{[0-9]+}}.{{[0-9]+}}s, v{{[0-9]+}}.{{[0-9]+}}s, v{{[0-9]+}}.{{[0-9]+}}s }, [x0]
				; NONEON-LABEL: store_general_mask_factor3_undefmultimid:
				; NONEON-NOT: st3
				define void @store_general_mask_factor3_undefmultimid(i32* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
				%base = bitcast i32* %ptr to <12 x i32>*
				%i.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <12 x i32> <i32 4, i32 32, i32 16, i32 undef, i32 33, i32 17, i32 undef, i32 34, i32 18, i32 7, i32 35, i32 19>
				store <12 x i32> %i.vec, <12 x i32>* %base, align 4
				ret void
				}

				; NEON-LABEL: store_general_mask_factor3_undef_fail:
				; NEON-NOT: st3
				; NONEON-LABEL: store_general_mask_factor3_undef_fail:
				; NONEON-NOT: st3
				define void @store_general_mask_factor3_undef_fail(i32* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
				%base = bitcast i32* %ptr to <12 x i32>*
				%i.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <12 x i32> <i32 4, i32 32, i32 16, i32 undef, i32 33, i32 17, i32 undef, i32 34, i32 18, i32 8, i32 35, i32 19>
				store <12 x i32> %i.vec, <12 x i32>* %base, align 4
				ret void
				}

				; NEON-LABEL: store_general_mask_factor3_undeflane:
				; NEON: st3 { v{{[0-9]+}}.{{[0-9]+}}s, v{{[0-9]+}}.{{[0-9]+}}s, v{{[0-9]+}}.{{[0-9]+}}s }, [x0]
				; NONEON-LABEL: store_general_mask_factor3_undeflane:
				; NONEON-NOT: st3
				define void @store_general_mask_factor3_undeflane(i32* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
				%base = bitcast i32* %ptr to <12 x i32>*
				%i.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <12 x i32> <i32 undef, i32 32, i32 16, i32 undef, i32 33, i32 17, i32 undef, i32 34, i32 18, i32 undef, i32 35, i32 19>
				store <12 x i32> %i.vec, <12 x i32>* %base, align 4
				ret void
				}

				; NEON-LABEL: store_general_mask_factor3_negativestart:
				; NEON-NOT: st3
				; NONEON-LABEL: store_general_mask_factor3_negativestart:
				; NONEON-NOT: st3
				define void @store_general_mask_factor3_negativestart(i32* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
				%base = bitcast i32* %ptr to <12 x i32>*
				%i.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <12 x i32> <i32 undef, i32 32, i32 16, i32 undef, i32 33, i32 17, i32 undef, i32 34, i32 18, i32 2, i32 35, i32 19>
				store <12 x i32> %i.vec, <12 x i32>* %base, align 4
				ret void
				}

llvm/trunk/test/CodeGen/ARM/arm-interleaved-accesses.ll

	; NONEON-LABEL: load_factor2_with_extract_user:			; NONEON-LABEL: load_factor2_with_extract_user:
	; NONEON-NOT: vld2			; NONEON-NOT: vld2
	define i32 @load_factor2_with_extract_user(<8 x i32>* %a) {			define i32 @load_factor2_with_extract_user(<8 x i32>* %a) {
	%1 = load <8 x i32>, <8 x i32>* %a, align 8			%1 = load <8 x i32>, <8 x i32>* %a, align 8
	%2 = shufflevector <8 x i32> %1, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>			%2 = shufflevector <8 x i32> %1, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
	%3 = extractelement <8 x i32> %1, i32 2			%3 = extractelement <8 x i32> %1, i32 2
	ret i32 %3			ret i32 %3
	}			}

				; NEON-LABEL: store_general_mask_factor4:
				; NEON: vst4.32 {d{{[0-9]+}}, d{{[0-9]+}}, d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
				; NONEON-LABEL: store_general_mask_factor4:
				; NONEON-NOT: vst4.32
				define void @store_general_mask_factor4(i32* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
				%base = bitcast i32* %ptr to <8 x i32>*
				%i.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <8 x i32> <i32 4, i32 16, i32 32, i32 8, i32 5, i32 17, i32 33, i32 9>
				store <8 x i32> %i.vec, <8 x i32>* %base, align 4
				ret void
				}

				; NEON-LABEL: store_general_mask_factor4_undefbeg:
				; NEON: vst4.32 {d{{[0-9]+}}, d{{[0-9]+}}, d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
				; NONEON-LABEL: store_general_mask_factor4_undefbeg:
				; NONEON-NOT: vst4.32
				define void @store_general_mask_factor4_undefbeg(i32* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
				%base = bitcast i32* %ptr to <8 x i32>*
				%i.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <8 x i32> <i32 undef, i32 16, i32 32, i32 8, i32 5, i32 17, i32 33, i32 9>
				store <8 x i32> %i.vec, <8 x i32>* %base, align 4
				ret void
				}

				; NEON-LABEL: store_general_mask_factor4_undefend:
				; NEON: vst4.32 {d{{[0-9]+}}, d{{[0-9]+}}, d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
				; NONEON-LABEL: store_general_mask_factor4_undefend:
				; NONEON-NOT: vst4.32
				define void @store_general_mask_factor4_undefend(i32* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
				%base = bitcast i32* %ptr to <8 x i32>*
				%i.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <8 x i32> <i32 4, i32 16, i32 32, i32 8, i32 5, i32 17, i32 33, i32 undef>
				store <8 x i32> %i.vec, <8 x i32>* %base, align 4
				ret void
				}

				; NEON-LABEL: store_general_mask_factor4_undefmid:
				; NEON: vst4.32 {d{{[0-9]+}}, d{{[0-9]+}}, d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
				; NONEON-LABEL: store_general_mask_factor4_undefmid:
				; NONEON-NOT: vst4.32
				define void @store_general_mask_factor4_undefmid(i32* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
				%base = bitcast i32* %ptr to <8 x i32>*
				%i.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <8 x i32> <i32 4, i32 undef, i32 32, i32 8, i32 5, i32 17, i32 undef, i32 9>
				store <8 x i32> %i.vec, <8 x i32>* %base, align 4
				ret void
				}

				; NEON-LABEL: store_general_mask_factor4_undefmulti:
				; NEON: vst4.32 {d{{[0-9]+}}, d{{[0-9]+}}, d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
				; NONEON-LABEL: store_general_mask_factor4_undefmulti:
				; NONEON-NOT: vst4.32
				define void @store_general_mask_factor4_undefmulti(i32* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
				%base = bitcast i32* %ptr to <8 x i32>*
				%i.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <8 x i32> <i32 4, i32 undef, i32 undef, i32 8, i32 undef, i32 undef, i32 undef, i32 9>
				store <8 x i32> %i.vec, <8 x i32>* %base, align 4
				ret void
				}

				; NEON-LABEL: store_general_mask_factor3:
				; NEON: vst3.32 {d{{[0-9]+}}, d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
				; NONEON-LABEL: store_general_mask_factor3:
				; NONEON-NOT: vst3.32
				define void @store_general_mask_factor3(i32* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
				%base = bitcast i32* %ptr to <12 x i32>*
				%i.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <12 x i32> <i32 4, i32 32, i32 16, i32 5, i32 33, i32 17, i32 6, i32 34, i32 18, i32 7, i32 35, i32 19>
				store <12 x i32> %i.vec, <12 x i32>* %base, align 4
				ret void
				}

				; NEON-LABEL: store_general_mask_factor3_undefmultimid:
				; NEON: vst3.32 {d{{[0-9]+}}, d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
				; NONEON-LABEL: store_general_mask_factor3_undefmultimid:
				; NONEON-NOT: vst3.32
				define void @store_general_mask_factor3_undefmultimid(i32* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
				%base = bitcast i32* %ptr to <12 x i32>*
				%i.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <12 x i32> <i32 4, i32 32, i32 16, i32 undef, i32 33, i32 17, i32 undef, i32 34, i32 18, i32 7, i32 35, i32 19>
				store <12 x i32> %i.vec, <12 x i32>* %base, align 4
				ret void
				}

				; NEON-LABEL: store_general_mask_factor3_undef_fail:
				; NEON-NOT: vst3.32
				; NONEON-LABEL: store_general_mask_factor3_undef_fail:
				; NONEON-NOT: vst3.32
				define void @store_general_mask_factor3_undef_fail(i32* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
				%base = bitcast i32* %ptr to <12 x i32>*
				%i.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <12 x i32> <i32 4, i32 32, i32 16, i32 undef, i32 33, i32 17, i32 undef, i32 34, i32 18, i32 8, i32 35, i32 19>
				store <12 x i32> %i.vec, <12 x i32>* %base, align 4
				ret void
				}

				; NEON-LABEL: store_general_mask_factor3_undeflane:
				; NEON: vst3.32 {d{{[0-9]+}}, d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
				; NONEON-LABEL: store_general_mask_factor3_undeflane:
				; NONEON-NOT: vst3.32
				define void @store_general_mask_factor3_undeflane(i32* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
				%base = bitcast i32* %ptr to <12 x i32>*
				%i.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <12 x i32> <i32 undef, i32 32, i32 16, i32 undef, i32 33, i32 17, i32 undef, i32 34, i32 18, i32 undef, i32 35, i32 19>
				store <12 x i32> %i.vec, <12 x i32>* %base, align 4
				ret void
				}

				; NEON-LABEL: store_general_mask_factor3_endstart_fail:
				; NEON-NOT: vst3.32
				; NONEON-LABEL: store_general_mask_factor3_endstart_fail:
				; NONEON-NOT: vst3.32
				define void @store_general_mask_factor3_endstart_fail(i32* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
				%base = bitcast i32* %ptr to <12 x i32>*
				%i.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <12 x i32> <i32 undef, i32 32, i32 16, i32 undef, i32 33, i32 17, i32 undef, i32 34, i32 18, i32 2, i32 35, i32 19>
				store <12 x i32> %i.vec, <12 x i32>* %base, align 4
				ret void
				}

				; NEON-LABEL: store_general_mask_factor3_endstart_pass:
				; NEON: vst3.32 {d{{[0-9]+}}, d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
				; NONEON-LABEL: store_general_mask_factor3_endstart_pass:
				; NONEON-NOT: vst3.32
				define void @store_general_mask_factor3_endstart_pass(i32* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
				%base = bitcast i32* %ptr to <12 x i32>*
				%i.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <12 x i32> <i32 undef, i32 32, i32 16, i32 undef, i32 33, i32 17, i32 undef, i32 34, i32 18, i32 7, i32 35, i32 19>
				store <12 x i32> %i.vec, <12 x i32>* %base, align 4
				ret void
				}

				; NEON-LABEL: store_general_mask_factor3_midstart_fail:
				; NEON-NOT: vst3.32
				; NONEON-LABEL: store_general_mask_factor3_midstart_fail:
				; NONEON-NOT: vst3.32
				define void @store_general_mask_factor3_midstart_fail(i32* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
				%base = bitcast i32* %ptr to <12 x i32>*
				%i.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <12 x i32> <i32 undef, i32 32, i32 16, i32 0, i32 33, i32 17, i32 undef, i32 34, i32 18, i32 undef, i32 35, i32 19>
				store <12 x i32> %i.vec, <12 x i32>* %base, align 4
				ret void
				}

				; NEON-LABEL: store_general_mask_factor3_midstart_pass:
				; NEON: vst3.32 {d{{[0-9]+}}, d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
				; NONEON-LABEL: store_general_mask_factor3_midstart_pass:
				; NONEON-NOT: vst3.32
				define void @store_general_mask_factor3_midstart_pass(i32* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
				%base = bitcast i32* %ptr to <12 x i32>*
				%i.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <12 x i32> <i32 undef, i32 32, i32 16, i32 1, i32 33, i32 17, i32 undef, i32 34, i32 18, i32 undef, i32 35, i32 19>
				store <12 x i32> %i.vec, <12 x i32>* %base, align 4
				ret void
				}
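
The pass and fail expectations in the tests above follow from one rule: within each lane, every defined element must imply the same non-negative start index. The sketch below illustrates that rule under stated assumptions. `isGeneralInterleaveMask` is a hypothetical, simplified stand-in and not the patch's isReInterleaveMask; undef is represented as -1, and the upper-bound check against the length of the input vectors is deliberately omitted.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified check (not the patch's isReInterleaveMask).
// A mask is accepted when, for each of the Factor lanes, all defined
// elements agree on a single non-negative start index. Undef is -1.
// Note: no upper-bound check against the input vector length is done here.
bool isGeneralInterleaveMask(const std::vector<int> &Mask, unsigned Factor) {
  if (Factor < 2 || Mask.empty() || Mask.size() % Factor != 0)
    return false;
  unsigned LaneLen = static_cast<unsigned>(Mask.size()) / Factor;
  for (unsigned Lane = 0; Lane < Factor; ++Lane) {
    bool HaveStart = false;
    int Start = 0;
    for (unsigned J = 0; J < LaneLen; ++J) {
      int Elt = Mask[J * Factor + Lane];
      if (Elt < 0)
        continue; // undef entries place no constraint on the lane
      int Candidate = Elt - static_cast<int>(J);
      if (Candidate < 0)
        return false; // the lane would have to start before index 0
      if (HaveStart && Candidate != Start)
        return false; // the lane is not consecutive
      Start = Candidate;
      HaveStart = true;
    }
  }
  return true;
}
```

Under this rule, store_general_mask_factor3_undef_fail is rejected because lane 0 contains 4 at position 0 and 8 at position 3 (implied starts 4 and 5), and the endstart_fail and midstart_fail cases are rejected because their only defined lane-0 element implies a start of -1.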

Revision Contents

Diff 81270
