This is an archive of the discontinued LLVM Phabricator instance.

Differential D138874

[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 3
Abandoned · Public

Authored by spatel on Nov 28 2022, 3:34 PM.

Details

Reviewers

RKSimon
nikic
dmgreen
lebedev.ri

Summary

This enhances the folds from part 1 and part 2 to allow insertion into an arbitrary vector. This means we form a select-shuffle (no cross-lane movement is allowed).
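To make the "select-shuffle" term concrete, here is a minimal C++ sketch of the mask shape. It is not code from the patch. The helper names `makeInsertSelectMask` and `isSelectMask` are hypothetical, and the sketch assumes the base vector is shuffle operand 0 and the bitcast scalar is operand 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: build a shuffle mask that keeps every lane of
// operand 0 (the base vector) except lane `Index`, which is taken from
// the same lane of operand 1 (the bitcast scalar). Mask values follow the
// shufflevector convention: 0..N-1 select from operand 0, N..2N-1 select
// from operand 1.
std::vector<int> makeInsertSelectMask(std::size_t NumElts, std::size_t Index) {
  assert(Index < NumElts && "insert index out of range");
  std::vector<int> Mask(NumElts);
  for (std::size_t I = 0; I != NumElts; ++I)
    Mask[I] = static_cast<int>(I == Index ? NumElts + I : I);
  return Mask;
}

// A mask is a select-shuffle if lane I of the result always comes from
// lane I of one of the two operands (no cross-lane movement).
bool isSelectMask(const std::vector<int> &Mask) {
  const int N = static_cast<int>(Mask.size());
  for (int I = 0; I != N; ++I)
    if (Mask[I] != I && Mask[I] != I + N)
      return false;
  return true;
}
```

For a 4-element base vector and index 3 this gives <0, 1, 2, 7>, which matches the big-endian check lines for `high_index_same_length_basevec` in this diff. The tests also contain the commuted form, such as <0, 5, 6, 7> with the bitcast value as operand 0.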

Example proofs with endian diffs:
https://alive2.llvm.org/ce/z/Mqfgt8

We can create a select-shuffle for all targets because targets are expected to be able to lower select-shuffles reasonably. This transform could be generalized further if it was implemented in a target-specific pass (with a cost/legality model).

The transform can result in more instructions than we started with (in the case where the vector size is longer/shorter than the scalar), but I think that's a reasonable trade-off to make the canonicalization more consistent.
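The extra instruction comes from resizing the bitcast scalar to the base vector's length before the select-shuffle. A minimal sketch of that resize mask, assuming a hypothetical helper `makeResizeMask` and using -1 to mark an undefined lane (a convention of this sketch):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch-only constant: marks a result lane whose value does not matter.
constexpr int UndefLane = -1;

// Hypothetical helper: mask for a one-operand shuffle that resizes the
// bitcast scalar (SrcElts lanes) to the base vector's length (DstElts
// lanes). Only lane `Index` is kept in place; every other lane is left
// undefined because the later select-shuffle never reads it.
std::vector<int> makeResizeMask(std::size_t SrcElts, std::size_t DstElts,
                                std::size_t Index) {
  assert(Index < SrcElts && Index < DstElts && "lane must exist on both sides");
  std::vector<int> Mask(DstElts, UndefLane);
  Mask[Index] = static_cast<int>(Index);
  return Mask;
}
```

With a 4-lane source, an 8-lane base vector, and index 3, only lane 3 is defined. That is the shape of the first shufflevector in the longer-length tests of this diff.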

This allows removing a pair of instructions from the motivating example from issue #17113, but it is still not the ideal IR/codegen.

Event Timeline

spatel created this revision. Nov 28 2022, 3:34 PM

Herald added a project: Restricted Project. Nov 28 2022, 3:34 PM

Herald added subscribers: hiraditya, mcrosier.

spatel requested review of this revision. Nov 28 2022, 3:34 PM

Herald added a project: Restricted Project. Nov 28 2022, 3:34 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B199890: Diff 478385. Nov 28 2022, 3:35 PM

spatel added a parent revision: D138873: [InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 2. Nov 28 2022, 3:35 PM

Rebased on top of part 1 (a4c466766db7).

Harbormaster completed remote builds in B200323: Diff 479017. Nov 30 2022, 11:14 AM

We can create a select-shuffle for all targets because targets are expected to be able to lower select-shuffles reasonably.

A perhaps minor point, and I don't think I have objections to the patch, but I have always considered select-shuffles to be somewhat of an x86-ism. I believe there is a special set of instructions for handling them, where the mask is stored as part of the instruction. As far as I understand, there usually isn't a truly generic way to lower them efficiently (I'd be interested if there was!), and in the worst case the lowering has to resort to either lane moves or a constant mask + and/or. If it's only a single lane, as in all the tests, then it would just be an extract+insert, which is simpler.

Generally I would consider shuffles to be complex operations that often have a fairly high cost. Insert and trunc and bitcast are all usually simple.

Replying to @dmgreen's comment D138874#3962771:

That's a good point. x86 does have limited specialized select-shuffles (blends in x86 lingo) depending on which level of SSE/AVX is implemented. Most other SIMD targets have a vector bitwise select (bsl on AArch64 IIRC).
But yes, in the cases here "select-shuffle" is actually an over-specification/misnomer because we're only inserting a single element (what started as the scalar value) into the base vector.
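As a rough illustration of the "constant mask + and/or" lowering mentioned above, here is a minimal C++ model of a lane-wise bitwise select, with a single-element insert as the special case where exactly one mask lane is set. The names `bitwiseSelect` and `insertLaneViaSelect` are hypothetical, and this models the idea only, not any particular target instruction.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V4i16 = std::array<std::uint16_t, 4>;

// Lane-wise bitwise select: where a mask lane is all-ones take the lane
// from A, where it is all-zeros take the lane from B.
V4i16 bitwiseSelect(const V4i16 &Mask, const V4i16 &A, const V4i16 &B) {
  V4i16 R{};
  for (std::size_t I = 0; I != R.size(); ++I)
    R[I] = static_cast<std::uint16_t>((Mask[I] & A[I]) | (~Mask[I] & B[I]));
  return R;
}

// Inserting one lane is the special case where exactly one mask lane is set.
V4i16 insertLaneViaSelect(const V4i16 &Base, const V4i16 &Src,
                          std::size_t Index) {
  V4i16 Mask{};            // all lanes zero: keep Base
  Mask.at(Index) = 0xFFFF; // this lane only: take Src
  return bitwiseSelect(Mask, Src, Base);
}
```

The single-lane case is why "select-shuffle" over-specifies what these folds need: a plain lane insert produces the same result without materializing a mask.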

I tried pushing a couple of tests through AArch64 codegen, and see diffs like this:

lsr	x8, x0, #48
mov	v0.h[3], w8
->
fmov	d1, x0
mov	v0.h[3], v1.h[3]

Does that seem neutral? If not, we could try harder to fold back to an insertelt in codegen or convert to a target-dependent transform in VectorCombine instead of a generic fold here.
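For reference, the two sequences compute the same value. A minimal C++ model, assuming a little-endian 4 x 16-bit view of the 64-bit scalar; the function names are hypothetical and the code models only the values, not instruction cost:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V4i16 = std::array<std::uint16_t, 4>;

// Model of `bitcast i64 %x to <4 x i16>` on a little-endian target:
// lane 0 holds the least significant 16 bits.
V4i16 bitcastLE(std::uint64_t X) {
  V4i16 R{};
  for (std::size_t I = 0; I != R.size(); ++I)
    R[I] = static_cast<std::uint16_t>(X >> (16 * I));
  return R;
}

// Scalar form: shift, truncate, insert (the lsr + mov-from-GPR shape).
V4i16 insertViaShiftTrunc(V4i16 Base, std::uint64_t X) {
  Base[3] = static_cast<std::uint16_t>(X >> 48);
  return Base;
}

// Vector form: move the scalar into a vector register, then copy lane 3
// (the fmov + lane-to-lane mov shape).
V4i16 insertViaLaneCopy(V4i16 Base, std::uint64_t X) {
  Base[3] = bitcastLE(X)[3];
  return Base;
}
```

Both functions return the same vector for any input, so the codegen question is purely about the relative cost of a shift versus a lane move.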

That would come down to the difference between shift (cheap) and lane mov (should be cheapish too). I don't think there's a lot in it.

https://godbolt.org/z/haP87afo9 has some other cases from the tests here. A bitcast can be awkward if it secretly includes an extend, which is more difficult than it should be for MVE, where most vectors are assumed to be 128-bit. We've had problems in the past with instcombine transforming shuffles where it isn't helpful, and I think we still have some. Like I said, I don't want to block anything, but this doesn't seem very general, and it might be better done in the backend or with a cost model. (I'm not sure we have sensible costs for bitcasts though. They don't often come up from the vectorizers.)

So I think the question is - do we treat this as a canonicalization issue (in which case this code can remain in InstCombine) or do we make it a cost driven fold and move it to VectorCombine?

Also, which option will make it easier to address the remaining missing GVN handling?

Replying to @RKSimon's comment D138874#3970326:

Having it in InstCombine will definitely be easier than VectorCombine with respect to phase ordering/dependencies on other passes.
In the motivating example, we don't get the folding opportunity until late because it requires inlining to see the pattern. That means we wouldn't do this until near the very end of optimization (and so no subsequent GVN).

I wasn't aware of the SDAG shuffle problems that @dmgreen noted for Thumb/MVE, so I was looking at that a bit closer. Even without this patch, we've already uncovered some awful codegen with the earlier folds like:

define <8 x i16> @low_index_longer_length_poison_basevec_i64(i64 %x) {
  %t = trunc i64 %x to i16
  %r = insertelement <8 x i16> poison, i16 %t, i64 0
  ret <8 x i16> %r
}

$ llc -o - -mtriple=thumbv8.1-m.main -mattr=+mve.fp -float-abi=hard
	vmov.16	q0[0], r0

-->

define <8 x i16> @low_index_longer_length_poison_basevec_i64(i64 %x) {
  %vec.x = bitcast i64 %x to <4 x i16>
  %r = shufflevector <4 x i16> %vec.x, <4 x i16> poison, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  ret <8 x i16> %r
}

	sub	sp, #8
	strd	r0, r1, [sp]
	mov	r0, sp
	vldrh.u32	q0, [r0]
	vmov	r2, r3, d0
	vmov	r0, r1, d1
	vmov.16	q0[0], r2
	vmov.16	q0[1], r3
	vmov.16	q0[2], r0
	vmov.16	q0[3], r1
	add	sp, #8

The reason for that is what seems like a bug in SelectionDAGBuilder. It creates these nodes for the bitcast + shuffle sequence:

Creating new node: t5: i64 = build_pair t2, t4
Creating new node: t6: v4i16 = bitcast t5
Creating new node: t7: v4i16 = undef
Creating new node: t8: v8i16 = concat_vectors t6, undef:v4i16

But that's discarding information - the upper 48-bits of the build_pair are zapped to undef by the shuffle in IR, but that's gone with the translation to concat_vectors.
I'll try to fix that.
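The lost information can be phrased as "which source lanes does the shuffle actually read". A minimal C++ sketch of that query, with a hypothetical helper name; this is not SelectionDAG code, and negative mask entries stand for undefined lanes by assumption:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: report which lanes of a one-operand shuffle's
// source are read by the mask. Negative mask entries are undefined
// result lanes and read nothing.
std::vector<bool> demandedSourceLanes(const std::vector<int> &Mask,
                                      std::size_t SrcElts) {
  std::vector<bool> Demanded(SrcElts, false);
  for (int M : Mask)
    if (M >= 0 && static_cast<std::size_t>(M) < SrcElts)
      Demanded[static_cast<std::size_t>(M)] = true;
  return Demanded;
}
```

For the 8-lane mask in the IR above, only source lane 0 is demanded, so the upper 48 bits of the i64 are never read. A concat_vectors of the whole v4i16 with undef no longer expresses that.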

The good news is that potential regressions like above have been in main for almost a week now, and I haven't seen any bug reports/complaints yet.
So maybe this kind of IR pattern doesn't happen much in real code where it would be noticed.

I had regressions reported from 2 places from those patches. I was looking into fixing those in the backend using a combine of splat(bitcast(buildvector)) or splat(bitcast(scalar_to_vector)) to splat (it was having trouble getting the buildvector legal types correct). Transforming trunc+insert to bitcast+shuffle feels like a bit of a strange canonicalization to me. We can probably fix it up in the backend (and the only regressions I've seen have both been unsimplified splats), but trunc+insert seems simpler.

Replying to @dmgreen's comment D138874#3978809:

Thanks for the update - so there has been some fallout.

I agree that trunc+insert is simpler in the basic case. The challenge is that we haven't found any other way to solve the motivating bug. Leaving it to the backend is too late, so we need to convert a chain of inserts into a shuffle in IR to get the optimal result.

This line of patches got us at least partially there. If we convert back to insert at SDAG builder/combine time, that seems like it could mitigate the problems. If that's not feasible, then I think we should revert the preceding patches in this set.

Yeah I don't know of a better way to fix it, I'm afraid. There is a quick fix for the regressions I ran into in D139611.

Replying to @dmgreen's comment D138874#3980790:

After thinking this over again, we should be able to add a more specific peephole that finds the common source op by looking through shifts and casts:
https://alive2.llvm.org/ce/z/_4iTEu
It's bigger than the typical pattern match, but it's not that bad. It could start off very narrow and be generalized in a few steps.
That avoids creating a shuffle, so it sidesteps the backend problems noted here. The question of whether we should canonicalize in the opposite direction from this patch is still open.
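One instance of such a peephole, written as a C++ value model. The linked proof is not reproduced here. The example is an assumption for illustration: the low and high halves of one 32-bit value are inserted into adjacent 16-bit lanes, with a little-endian layout. All function names are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V4i16 = std::array<std::uint16_t, 4>;
using V2i32 = std::array<std::uint32_t, 2>;

// Little-endian models of bitcasting between <4 x i16> and <2 x i32>.
V2i32 toV2i32(const V4i16 &V) {
  return {static_cast<std::uint32_t>(V[0]) | (static_cast<std::uint32_t>(V[1]) << 16),
          static_cast<std::uint32_t>(V[2]) | (static_cast<std::uint32_t>(V[3]) << 16)};
}

V4i16 toV4i16(const V2i32 &V) {
  return {static_cast<std::uint16_t>(V[0]), static_cast<std::uint16_t>(V[0] >> 16),
          static_cast<std::uint16_t>(V[1]), static_cast<std::uint16_t>(V[1] >> 16)};
}

// Before: two inserts whose scalars share the source X, one through a
// trunc and one through a shift plus trunc.
V4i16 insertPair(V4i16 Base, std::uint32_t X, std::size_t WideIndex) {
  Base.at(2 * WideIndex) = static_cast<std::uint16_t>(X);
  Base.at(2 * WideIndex + 1) = static_cast<std::uint16_t>(X >> 16);
  return Base;
}

// After: one insert of X into the wider-element view of the same vector.
V4i16 insertWide(const V4i16 &Base, std::uint32_t X, std::size_t WideIndex) {
  V2i32 Wide = toV2i32(Base);
  Wide.at(WideIndex) = X;
  return toV4i16(Wide);
}
```

The two forms agree for every input in this model, and the second needs no shuffle, only bitcasts around a single insert.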

I don't want to block any work, so please don't consider me as having any real objection. Some of this feels a little X86-shaped for instcombine, but the motivating example in #17113 seems like it would apply to any architecture, and I haven't seen any cases that can't be fixed up in the backend.

spatel mentioned this in rG286ae63e168b: Revert "[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 2". Dec 8 2022, 7:07 AM

spatel mentioned this in rG99254f925185: Revert "[InstCombine] improve efficiency of bool logic; NFC". Dec 8 2022, 11:18 AM

spatel mentioned this in rG05dbdb0088a3: Revert "[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1….

spatel mentioned this in D139668: [InstCombine] try to fold a pair of insertelements into one insertelement. Dec 8 2022, 12:44 PM

Replying to @dmgreen's comment D138874#3981419:

I agree that this is on the borderline - we're not ready to canonicalize to shuffles more generally. I've reverted the earlier patches and put up an alternative:
D139668

spatel mentioned this in rG4446f71ce392: [InstCombine] try to fold a pair of insertelements into one insertelement. Dec 12 2022, 7:55 AM

Revision Contents

Path                                                          Size
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp      6 lines
llvm/test/Transforms/InstCombine/insert-trunc.ll              178 lines
llvm/test/Transforms/PhaseOrdering/X86/vec-load-combine.ll    28 lines

Diff 479017

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp

@@ ... @@ static Instruction *narrowInsElt(InsertElementInst &InsElt,
   return CastInst::Create(CastOpcode, NewInsElt, InsElt.getType());
 }

 /// Try to convert scalar extraction ops (shift+trunc) with insertelt to
 /// bitcast and shuffle:
 /// inselt V, (lshr (trunc X)), IndexC --> shuffle (bitcast X), V, Mask
 static Instruction *foldTruncInsElt(InsertElementInst &InsElt, bool IsBigEndian,
                                     InstCombiner::BuilderTy &Builder) {
-  // inselt undef, (trunc T), IndexC
-  // TODO: Allow any base vector value.
+  // inselt V, (trunc T), IndexC
   // TODO: The one-use limitation could be removed for some cases (eg, no
   // extra shuffle is needed and a shift is eliminated).
   auto *VTy = dyn_cast<FixedVectorType>(InsElt.getType());
   Value *T, *V = InsElt.getOperand(0);
   uint64_t IndexC;
   if (!VTy || !match(InsElt.getOperand(1), m_OneUse(m_Trunc(m_Value(T)))) ||
-      !match(InsElt.getOperand(2), m_ConstantInt(IndexC)) ||
-      !match(V, m_Undef()))
+      !match(InsElt.getOperand(2), m_ConstantInt(IndexC)))
     return nullptr;

   Type *SrcTy = T->getType();
   unsigned ScalarWidth = SrcTy->getScalarSizeInBits();
   unsigned VecEltWidth = VTy->getScalarSizeInBits();
   if (ScalarWidth % VecEltWidth != 0)
     return nullptr;
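The relationship between shift amount, endianness, and the lane that a shifted truncate reads can be sketched as follows. This mirrors the idea of the fold, not its code. The helper name `laneForShiftedTrunc` and the -1 "no match" return value are assumptions of the sketch.

```cpp
// Hypothetical helper: given a scalar of ScalarWidth bits viewed as a
// vector of VecEltWidth-bit lanes, return the lane that holds the bits
// selected by `trunc (lshr X, ShiftAmt)`. Returns -1 when the scalar
// cannot be split into whole lanes or the shift is not lane-aligned.
int laneForShiftedTrunc(unsigned ScalarWidth, unsigned VecEltWidth,
                        unsigned ShiftAmt, bool IsBigEndian) {
  if (VecEltWidth == 0 || ScalarWidth % VecEltWidth != 0)
    return -1;
  if (ShiftAmt % VecEltWidth != 0 || ShiftAmt >= ScalarWidth)
    return -1;
  unsigned NumLanes = ScalarWidth / VecEltWidth;
  unsigned LaneFromLow = ShiftAmt / VecEltWidth;
  // Little-endian: lane 0 is the least significant element.
  // Big-endian: lane 0 is the most significant element.
  return static_cast<int>(IsBigEndian ? NumLanes - 1 - LaneFromLow
                                      : LaneFromLow);
}
```

In the tests of this diff, the shift is absorbed into the bitcast when the insert index equals this lane: `lshr i64 %x, 32` into 16-bit lanes gives index 2 for little-endian (`lshr_same_length_basevec_le`) and index 1 for big-endian (`lshr_same_length_basevec_be`).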

llvm/test/Transforms/InstCombine/insert-trunc.ll

Show First 20 Lines • Show All 336 Lines • ▼ Show 20 Lines	;
%s = lshr i64 %x, 40		%s = lshr i64 %x, 40
call void @use64(i64 %s)		call void @use64(i64 %s)
%t = trunc i64 %s to i8		%t = trunc i64 %s to i8
%r = insertelement <4 x i8> poison, i8 %t, i64 2		%r = insertelement <4 x i8> poison, i8 %t, i64 2
ret <4 x i8> %r		ret <4 x i8> %r
}		}

define <4 x i16> @low_index_same_length_basevec(i64 %x, <4 x i16> %v) {		define <4 x i16> @low_index_same_length_basevec(i64 %x, <4 x i16> %v) {
; ALL-LABEL: @low_index_same_length_basevec(		; BE-LABEL: @low_index_same_length_basevec(
; ALL-NEXT: [[T:%.]] = trunc i64 [[X:%.]] to i16		; BE-NEXT: [[T:%.]] = trunc i64 [[X:%.]] to i16
; ALL-NEXT: [[R:%.]] = insertelement <4 x i16> [[V:%.]], i16 [[T]], i64 0		; BE-NEXT: [[R:%.]] = insertelement <4 x i16> [[V:%.]], i16 [[T]], i64 0
; ALL-NEXT: ret <4 x i16> [[R]]		; BE-NEXT: ret <4 x i16> [[R]]
		;
		; LE-LABEL: @low_index_same_length_basevec(
		; LE-NEXT: [[VEC_X:%.]] = bitcast i64 [[X:%.]] to <4 x i16>
		; LE-NEXT: [[R:%.]] = shufflevector <4 x i16> [[VEC_X]], <4 x i16> [[V:%.]], <4 x i32> <i32 0, i32 5, i32 6, i32 7>
		; LE-NEXT: ret <4 x i16> [[R]]
;		;
%t = trunc i64 %x to i16		%t = trunc i64 %x to i16
%r = insertelement <4 x i16> %v, i16 %t, i64 0		%r = insertelement <4 x i16> %v, i16 %t, i64 0
ret <4 x i16> %r		ret <4 x i16> %r
}		}

define <4 x i16> @high_index_same_length_basevec(i64 %x, <4 x i16> %v) {		define <4 x i16> @high_index_same_length_basevec(i64 %x, <4 x i16> %v) {
; ALL-LABEL: @high_index_same_length_basevec(		; BE-LABEL: @high_index_same_length_basevec(
; ALL-NEXT: [[T:%.]] = trunc i64 [[X:%.]] to i16		; BE-NEXT: [[VEC_X:%.]] = bitcast i64 [[X:%.]] to <4 x i16>
; ALL-NEXT: [[R:%.]] = insertelement <4 x i16> [[V:%.]], i16 [[T]], i64 3		; BE-NEXT: [[R:%.]] = shufflevector <4 x i16> [[V:%.]], <4 x i16> [[VEC_X]], <4 x i32> <i32 0, i32 1, i32 2, i32 7>
; ALL-NEXT: ret <4 x i16> [[R]]		; BE-NEXT: ret <4 x i16> [[R]]
		;
		; LE-LABEL: @high_index_same_length_basevec(
		; LE-NEXT: [[T:%.]] = trunc i64 [[X:%.]] to i16
		; LE-NEXT: [[R:%.]] = insertelement <4 x i16> [[V:%.]], i16 [[T]], i64 3
		; LE-NEXT: ret <4 x i16> [[R]]
;		;
%t = trunc i64 %x to i16		%t = trunc i64 %x to i16
%r = insertelement <4 x i16> %v, i16 %t, i64 3		%r = insertelement <4 x i16> %v, i16 %t, i64 3
ret <4 x i16> %r		ret <4 x i16> %r
}		}

define <4 x i16> @wrong_index_same_length_basevec(i64 %x, <4 x i16> %v) {		define <4 x i16> @wrong_index_same_length_basevec(i64 %x, <4 x i16> %v) {
; ALL-LABEL: @wrong_index_same_length_basevec(		; ALL-LABEL: @wrong_index_same_length_basevec(
; ALL-NEXT: [[T:%.]] = trunc i64 [[X:%.]] to i16		; ALL-NEXT: [[T:%.]] = trunc i64 [[X:%.]] to i16
; ALL-NEXT: [[R:%.]] = insertelement <4 x i16> [[V:%.]], i16 [[T]], i64 1		; ALL-NEXT: [[R:%.]] = insertelement <4 x i16> [[V:%.]], i16 [[T]], i64 1
; ALL-NEXT: ret <4 x i16> [[R]]		; ALL-NEXT: ret <4 x i16> [[R]]
;		;
%t = trunc i64 %x to i16		%t = trunc i64 %x to i16
%r = insertelement <4 x i16> %v, i16 %t, i64 1		%r = insertelement <4 x i16> %v, i16 %t, i64 1
ret <4 x i16> %r		ret <4 x i16> %r
}		}

define <8 x i16> @low_index_longer_length_basevec(i64 %x, <8 x i16> %v) {		define <8 x i16> @low_index_longer_length_basevec(i64 %x, <8 x i16> %v) {
; ALL-LABEL: @low_index_longer_length_basevec(		; BE-LABEL: @low_index_longer_length_basevec(
; ALL-NEXT: [[T:%.]] = trunc i64 [[X:%.]] to i16		; BE-NEXT: [[T:%.]] = trunc i64 [[X:%.]] to i16
; ALL-NEXT: [[R:%.]] = insertelement <8 x i16> [[V:%.]], i16 [[T]], i64 0		; BE-NEXT: [[R:%.]] = insertelement <8 x i16> [[V:%.]], i16 [[T]], i64 0
; ALL-NEXT: ret <8 x i16> [[R]]		; BE-NEXT: ret <8 x i16> [[R]]
		;
		; LE-LABEL: @low_index_longer_length_basevec(
		; LE-NEXT: [[VEC_X:%.]] = bitcast i64 [[X:%.]] to <4 x i16>
		; LE-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[VEC_X]], <4 x i16> poison, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; LE-NEXT: [[R:%.]] = shufflevector <8 x i16> [[TMP1]], <8 x i16> [[V:%.]], <8 x i32> <i32 0, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; LE-NEXT: ret <8 x i16> [[R]]
;		;
%t = trunc i64 %x to i16		%t = trunc i64 %x to i16
%r = insertelement <8 x i16> %v, i16 %t, i64 0		%r = insertelement <8 x i16> %v, i16 %t, i64 0
ret <8 x i16> %r		ret <8 x i16> %r
}		}

define <8 x i16> @high_index_longer_length_basevec(i64 %x, <8 x i16> %v) {		define <8 x i16> @high_index_longer_length_basevec(i64 %x, <8 x i16> %v) {
; ALL-LABEL: @high_index_longer_length_basevec(		; BE-LABEL: @high_index_longer_length_basevec(
; ALL-NEXT: [[T:%.]] = trunc i64 [[X:%.]] to i16		; BE-NEXT: [[VEC_X:%.]] = bitcast i64 [[X:%.]] to <4 x i16>
; ALL-NEXT: [[R:%.]] = insertelement <8 x i16> [[V:%.]], i16 [[T]], i64 3		; BE-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[VEC_X]], <4 x i16> poison, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; ALL-NEXT: ret <8 x i16> [[R]]		; BE-NEXT: [[R:%.]] = shufflevector <8 x i16> [[V:%.]], <8 x i16> [[TMP1]], <8 x i32> <i32 0, i32 1, i32 2, i32 11, i32 4, i32 5, i32 6, i32 7>
		; BE-NEXT: ret <8 x i16> [[R]]
		;
		; LE-LABEL: @high_index_longer_length_basevec(
		; LE-NEXT: [[T:%.]] = trunc i64 [[X:%.]] to i16
		; LE-NEXT: [[R:%.]] = insertelement <8 x i16> [[V:%.]], i16 [[T]], i64 3
		; LE-NEXT: ret <8 x i16> [[R]]
;		;
%t = trunc i64 %x to i16		%t = trunc i64 %x to i16
%r = insertelement <8 x i16> %v, i16 %t, i64 3		%r = insertelement <8 x i16> %v, i16 %t, i64 3
ret <8 x i16> %r		ret <8 x i16> %r
}		}

define <8 x i16> @wrong_index_longer_length_basevec(i64 %x, <8 x i16> %v) {		define <8 x i16> @wrong_index_longer_length_basevec(i64 %x, <8 x i16> %v) {
; ALL-LABEL: @wrong_index_longer_length_basevec(		; ALL-LABEL: @wrong_index_longer_length_basevec(
; ALL-NEXT: [[T:%.]] = trunc i64 [[X:%.]] to i16		; ALL-NEXT: [[T:%.]] = trunc i64 [[X:%.]] to i16
; ALL-NEXT: [[R:%.]] = insertelement <8 x i16> [[V:%.]], i16 [[T]], i64 7		; ALL-NEXT: [[R:%.]] = insertelement <8 x i16> [[V:%.]], i16 [[T]], i64 7
; ALL-NEXT: ret <8 x i16> [[R]]		; ALL-NEXT: ret <8 x i16> [[R]]
;		;
%t = trunc i64 %x to i16		%t = trunc i64 %x to i16
%r = insertelement <8 x i16> %v, i16 %t, i64 7		%r = insertelement <8 x i16> %v, i16 %t, i64 7
ret <8 x i16> %r		ret <8 x i16> %r
}		}

define <2 x i16> @low_index_shorter_length_basevec(i64 %x, <2 x i16> %v) {		define <2 x i16> @low_index_shorter_length_basevec(i64 %x, <2 x i16> %v) {
; ALL-LABEL: @low_index_shorter_length_basevec(		; BE-LABEL: @low_index_shorter_length_basevec(
; ALL-NEXT: [[T:%.]] = trunc i64 [[X:%.]] to i16		; BE-NEXT: [[T:%.]] = trunc i64 [[X:%.]] to i16
; ALL-NEXT: [[R:%.]] = insertelement <2 x i16> [[V:%.]], i16 [[T]], i64 0		; BE-NEXT: [[R:%.]] = insertelement <2 x i16> [[V:%.]], i16 [[T]], i64 0
; ALL-NEXT: ret <2 x i16> [[R]]		; BE-NEXT: ret <2 x i16> [[R]]
		;
		; LE-LABEL: @low_index_shorter_length_basevec(
		; LE-NEXT: [[VEC_X:%.]] = bitcast i64 [[X:%.]] to <4 x i16>
		; LE-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[VEC_X]], <4 x i16> poison, <2 x i32> <i32 0, i32 undef>
		; LE-NEXT: [[R:%.]] = shufflevector <2 x i16> [[TMP1]], <2 x i16> [[V:%.]], <2 x i32> <i32 0, i32 3>
		; LE-NEXT: ret <2 x i16> [[R]]
;		;
%t = trunc i64 %x to i16		%t = trunc i64 %x to i16
%r = insertelement <2 x i16> %v, i16 %t, i64 0		%r = insertelement <2 x i16> %v, i16 %t, i64 0
ret <2 x i16> %r		ret <2 x i16> %r
}		}

define <4 x i8> @wrong_index_shorter_length_basevec(i64 %x, <4 x i8> %v) {		define <4 x i8> @wrong_index_shorter_length_basevec(i64 %x, <4 x i8> %v) {
; ALL-LABEL: @wrong_index_shorter_length_basevec(		; ALL-LABEL: @wrong_index_shorter_length_basevec(
; ALL-NEXT: [[T:%.]] = trunc i64 [[X:%.]] to i8		; ALL-NEXT: [[T:%.]] = trunc i64 [[X:%.]] to i8
; ALL-NEXT: [[R:%.]] = insertelement <4 x i8> [[V:%.]], i8 [[T]], i64 3		; ALL-NEXT: [[R:%.]] = insertelement <4 x i8> [[V:%.]], i8 [[T]], i64 3
; ALL-NEXT: ret <4 x i8> [[R]]		; ALL-NEXT: ret <4 x i8> [[R]]
;		;
%t = trunc i64 %x to i8		%t = trunc i64 %x to i8
%r = insertelement <4 x i8> %v, i8 %t, i64 3		%r = insertelement <4 x i8> %v, i8 %t, i64 3
ret <4 x i8> %r		ret <4 x i8> %r
}		}

define <4 x i16> @lshr_same_length_basevec_le(i64 %x, <4 x i16> %v) {		define <4 x i16> @lshr_same_length_basevec_le(i64 %x, <4 x i16> %v) {
; ALL-LABEL: @lshr_same_length_basevec_le(		; BE-LABEL: @lshr_same_length_basevec_le(
; ALL-NEXT: [[S:%.]] = lshr i64 [[X:%.]], 32		; BE-NEXT: [[S:%.]] = lshr i64 [[X:%.]], 32
; ALL-NEXT: [[T:%.*]] = trunc i64 [[S]] to i16		; BE-NEXT: [[T:%.*]] = trunc i64 [[S]] to i16
; ALL-NEXT: [[R:%.]] = insertelement <4 x i16> [[V:%.]], i16 [[T]], i64 2		; BE-NEXT: [[R:%.]] = insertelement <4 x i16> [[V:%.]], i16 [[T]], i64 2
; ALL-NEXT: ret <4 x i16> [[R]]		; BE-NEXT: ret <4 x i16> [[R]]
		;
		; LE-LABEL: @lshr_same_length_basevec_le(
		; LE-NEXT: [[VEC_X:%.]] = bitcast i64 [[X:%.]] to <4 x i16>
		; LE-NEXT: [[R:%.]] = shufflevector <4 x i16> [[V:%.]], <4 x i16> [[VEC_X]], <4 x i32> <i32 0, i32 1, i32 6, i32 3>
		; LE-NEXT: ret <4 x i16> [[R]]
;		;
%s = lshr i64 %x, 32		%s = lshr i64 %x, 32
%t = trunc i64 %s to i16		%t = trunc i64 %s to i16
%r = insertelement <4 x i16> %v, i16 %t, i64 2		%r = insertelement <4 x i16> %v, i16 %t, i64 2
ret <4 x i16> %r		ret <4 x i16> %r
}		}

define <4 x i16> @lshr_same_length_basevec_be(i64 %x, <4 x i16> %v) {		define <4 x i16> @lshr_same_length_basevec_be(i64 %x, <4 x i16> %v) {
; ALL-LABEL: @lshr_same_length_basevec_be(		; BE-LABEL: @lshr_same_length_basevec_be(
; ALL-NEXT: [[S:%.]] = lshr i64 [[X:%.]], 32		; BE-NEXT: [[VEC_X:%.]] = bitcast i64 [[X:%.]] to <4 x i16>
; ALL-NEXT: [[T:%.*]] = trunc i64 [[S]] to i16		; BE-NEXT: [[R:%.]] = shufflevector <4 x i16> [[V:%.]], <4 x i16> [[VEC_X]], <4 x i32> <i32 0, i32 5, i32 2, i32 3>
; ALL-NEXT: [[R:%.]] = insertelement <4 x i16> [[V:%.]], i16 [[T]], i64 1		; BE-NEXT: ret <4 x i16> [[R]]
; ALL-NEXT: ret <4 x i16> [[R]]		;
		; LE-LABEL: @lshr_same_length_basevec_be(
		; LE-NEXT: [[S:%.]] = lshr i64 [[X:%.]], 32
		; LE-NEXT: [[T:%.*]] = trunc i64 [[S]] to i16
		; LE-NEXT: [[R:%.]] = insertelement <4 x i16> [[V:%.]], i16 [[T]], i64 1
		; LE-NEXT: ret <4 x i16> [[R]]
;		;
%s = lshr i64 %x, 32		%s = lshr i64 %x, 32
%t = trunc i64 %s to i16		%t = trunc i64 %s to i16
%r = insertelement <4 x i16> %v, i16 %t, i64 1		%r = insertelement <4 x i16> %v, i16 %t, i64 1
ret <4 x i16> %r		ret <4 x i16> %r
}		}

define <4 x i16> @lshr_same_length_basevec_both_endian(i64 %x, <4 x i16> %v) {		define <4 x i16> @lshr_same_length_basevec_both_endian(i64 %x, <4 x i16> %v) {
; ALL-LABEL: @lshr_same_length_basevec_both_endian(		; BE-LABEL: @lshr_same_length_basevec_both_endian(
; ALL-NEXT: [[S:%.]] = lshr i64 [[X:%.]], 48		; BE-NEXT: [[S:%.]] = lshr i64 [[X:%.]], 48
; ALL-NEXT: [[T:%.*]] = trunc i64 [[S]] to i16		; BE-NEXT: [[VEC_S:%.*]] = bitcast i64 [[S]] to <4 x i16>
; ALL-NEXT: [[R:%.]] = insertelement <4 x i16> [[V:%.]], i16 [[T]], i64 3		; BE-NEXT: [[R:%.]] = shufflevector <4 x i16> [[V:%.]], <4 x i16> [[VEC_S]], <4 x i32> <i32 0, i32 1, i32 2, i32 7>
; ALL-NEXT: ret <4 x i16> [[R]]		; BE-NEXT: ret <4 x i16> [[R]]
		;
		; LE-LABEL: @lshr_same_length_basevec_both_endian(
		; LE-NEXT: [[VEC_X:%.]] = bitcast i64 [[X:%.]] to <4 x i16>
		; LE-NEXT: [[R:%.]] = shufflevector <4 x i16> [[V:%.]], <4 x i16> [[VEC_X]], <4 x i32> <i32 0, i32 1, i32 2, i32 7>
		; LE-NEXT: ret <4 x i16> [[R]]
;		;
%s = lshr i64 %x, 48		%s = lshr i64 %x, 48
%t = trunc i64 %s to i16		%t = trunc i64 %s to i16
%r = insertelement <4 x i16> %v, i16 %t, i64 3		%r = insertelement <4 x i16> %v, i16 %t, i64 3
ret <4 x i16> %r		ret <4 x i16> %r
}		}

define <4 x i16> @lshr_wrong_index_same_length_basevec(i64 %x, <4 x i16> %v) {		define <4 x i16> @lshr_wrong_index_same_length_basevec(i64 %x, <4 x i16> %v) {
; ALL-LABEL: @lshr_wrong_index_same_length_basevec(		; ALL-LABEL: @lshr_wrong_index_same_length_basevec(
; ALL-NEXT: [[S:%.]] = lshr i64 [[X:%.]], 48		; ALL-NEXT: [[S:%.]] = lshr i64 [[X:%.]], 48
; ALL-NEXT: [[T:%.*]] = trunc i64 [[S]] to i16		; ALL-NEXT: [[T:%.*]] = trunc i64 [[S]] to i16
; ALL-NEXT: [[R:%.]] = insertelement <4 x i16> [[V:%.]], i16 [[T]], i64 1		; ALL-NEXT: [[R:%.]] = insertelement <4 x i16> [[V:%.]], i16 [[T]], i64 1
; ALL-NEXT: ret <4 x i16> [[R]]		; ALL-NEXT: ret <4 x i16> [[R]]
;		;
%s = lshr i64 %x, 48		%s = lshr i64 %x, 48
%t = trunc i64 %s to i16		%t = trunc i64 %s to i16
%r = insertelement <4 x i16> %v, i16 %t, i64 1		%r = insertelement <4 x i16> %v, i16 %t, i64 1
ret <4 x i16> %r		ret <4 x i16> %r
}		}

define <8 x i16> @lshr_longer_length_basevec_le(i64 %x, <8 x i16> %v) {		define <8 x i16> @lshr_longer_length_basevec_le(i64 %x, <8 x i16> %v) {
; ALL-LABEL: @lshr_longer_length_basevec_le(		; BE-LABEL: @lshr_longer_length_basevec_le(
; ALL-NEXT: [[S:%.]] = lshr i64 [[X:%.]], 48		; BE-NEXT: [[S:%.]] = lshr i64 [[X:%.]], 48
; ALL-NEXT: [[T:%.*]] = trunc i64 [[S]] to i16		; BE-NEXT: [[VEC_S:%.*]] = bitcast i64 [[S]] to <4 x i16>
; ALL-NEXT: [[R:%.]] = insertelement <8 x i16> [[V:%.]], i16 [[T]], i64 3		; BE-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[VEC_S]], <4 x i16> poison, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; ALL-NEXT: ret <8 x i16> [[R]]		; BE-NEXT: [[R:%.]] = shufflevector <8 x i16> [[V:%.]], <8 x i16> [[TMP1]], <8 x i32> <i32 0, i32 1, i32 2, i32 11, i32 4, i32 5, i32 6, i32 7>
		; BE-NEXT: ret <8 x i16> [[R]]
		;
		; LE-LABEL: @lshr_longer_length_basevec_le(
		; LE-NEXT: [[VEC_X:%.]] = bitcast i64 [[X:%.]] to <4 x i16>
		; LE-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[VEC_X]], <4 x i16> poison, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
		; LE-NEXT: [[R:%.]] = shufflevector <8 x i16> [[V:%.]], <8 x i16> [[TMP1]], <8 x i32> <i32 0, i32 1, i32 2, i32 11, i32 4, i32 5, i32 6, i32 7>
		; LE-NEXT: ret <8 x i16> [[R]]
;		;
%s = lshr i64 %x, 48		%s = lshr i64 %x, 48
%t = trunc i64 %s to i16		%t = trunc i64 %s to i16
%r = insertelement <8 x i16> %v, i16 %t, i64 3		%r = insertelement <8 x i16> %v, i16 %t, i64 3
ret <8 x i16> %r		ret <8 x i16> %r
}		}

define <8 x i16> @lshr_longer_length_basevec_be(i64 %x, <8 x i16> %v) {		define <8 x i16> @lshr_longer_length_basevec_be(i64 %x, <8 x i16> %v) {
; ALL-LABEL: @lshr_longer_length_basevec_be(		; BE-LABEL: @lshr_longer_length_basevec_be(
; ALL-NEXT: [[S:%.]] = lshr i64 [[X:%.]], 32		; BE-NEXT: [[VEC_X:%.]] = bitcast i64 [[X:%.]] to <4 x i16>
; ALL-NEXT: [[T:%.*]] = trunc i64 [[S]] to i16		; BE-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[VEC_X]], <4 x i16> poison, <8 x i32> <i32 undef, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; ALL-NEXT: [[R:%.]] = insertelement <8 x i16> [[V:%.]], i16 [[T]], i64 1		; BE-NEXT: [[R:%.]] = shufflevector <8 x i16> [[V:%.]], <8 x i16> [[TMP1]], <8 x i32> <i32 0, i32 9, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; ALL-NEXT: ret <8 x i16> [[R]]		; BE-NEXT: ret <8 x i16> [[R]]
		;
		; LE-LABEL: @lshr_longer_length_basevec_be(
		; LE-NEXT: [[S:%.*]] = lshr i64 [[X:%.*]], 32
		; LE-NEXT: [[T:%.*]] = trunc i64 [[S]] to i16
		; LE-NEXT: [[R:%.*]] = insertelement <8 x i16> [[V:%.*]], i16 [[T]], i64 1
		; LE-NEXT: ret <8 x i16> [[R]]
;		;
%s = lshr i64 %x, 32		%s = lshr i64 %x, 32
%t = trunc i64 %s to i16		%t = trunc i64 %s to i16
%r = insertelement <8 x i16> %v, i16 %t, i64 1		%r = insertelement <8 x i16> %v, i16 %t, i64 1
ret <8 x i16> %r		ret <8 x i16> %r
}		}

define <8 x i16> @lshr_wrong_index_longer_length_basevec(i64 %x, <8 x i16> %v) {		define <8 x i16> @lshr_wrong_index_longer_length_basevec(i64 %x, <8 x i16> %v) {
; ALL-LABEL: @lshr_wrong_index_longer_length_basevec(		; ALL-LABEL: @lshr_wrong_index_longer_length_basevec(
; ALL-NEXT: [[S:%.*]] = lshr i64 [[X:%.*]], 16		; ALL-NEXT: [[S:%.*]] = lshr i64 [[X:%.*]], 16
; ALL-NEXT: [[T:%.*]] = trunc i64 [[S]] to i16		; ALL-NEXT: [[T:%.*]] = trunc i64 [[S]] to i16
; ALL-NEXT: [[R:%.*]] = insertelement <8 x i16> [[V:%.*]], i16 [[T]], i64 6		; ALL-NEXT: [[R:%.*]] = insertelement <8 x i16> [[V:%.*]], i16 [[T]], i64 6
; ALL-NEXT: ret <8 x i16> [[R]]		; ALL-NEXT: ret <8 x i16> [[R]]
;		;
%s = lshr i64 %x, 16		%s = lshr i64 %x, 16
%t = trunc i64 %s to i16		%t = trunc i64 %s to i16
%r = insertelement <8 x i16> %v, i16 %t, i64 6		%r = insertelement <8 x i16> %v, i16 %t, i64 6
ret <8 x i16> %r		ret <8 x i16> %r
}		}

define <2 x i16> @lshr_shorter_length_basevec_le(i64 %x, <2 x i16> %v) {		define <2 x i16> @lshr_shorter_length_basevec_le(i64 %x, <2 x i16> %v) {
; ALL-LABEL: @lshr_shorter_length_basevec_le(		; BE-LABEL: @lshr_shorter_length_basevec_le(
; ALL-NEXT: [[S:%.*]] = lshr i64 [[X:%.*]], 16		; BE-NEXT: [[S:%.*]] = lshr i64 [[X:%.*]], 16
; ALL-NEXT: [[T:%.*]] = trunc i64 [[S]] to i16		; BE-NEXT: [[T:%.*]] = trunc i64 [[S]] to i16
; ALL-NEXT: [[R:%.*]] = insertelement <2 x i16> [[V:%.*]], i16 [[T]], i64 1		; BE-NEXT: [[R:%.*]] = insertelement <2 x i16> [[V:%.*]], i16 [[T]], i64 1
; ALL-NEXT: ret <2 x i16> [[R]]		; BE-NEXT: ret <2 x i16> [[R]]
		;
		; LE-LABEL: @lshr_shorter_length_basevec_le(
		; LE-NEXT: [[VEC_X:%.*]] = bitcast i64 [[X:%.*]] to <4 x i16>
		; LE-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[VEC_X]], <4 x i16> poison, <2 x i32> <i32 undef, i32 1>
		; LE-NEXT: [[R:%.*]] = shufflevector <2 x i16> [[V:%.*]], <2 x i16> [[TMP1]], <2 x i32> <i32 0, i32 3>
		; LE-NEXT: ret <2 x i16> [[R]]
;		;
%s = lshr i64 %x, 16		%s = lshr i64 %x, 16
%t = trunc i64 %s to i16		%t = trunc i64 %s to i16
%r = insertelement <2 x i16> %v, i16 %t, i64 1		%r = insertelement <2 x i16> %v, i16 %t, i64 1
ret <2 x i16> %r		ret <2 x i16> %r
}		}

define <4 x i8> @lshr_shorter_length_basevec_be(i64 %x, <4 x i8> %v) {		define <4 x i8> @lshr_shorter_length_basevec_be(i64 %x, <4 x i8> %v) {
; ALL-LABEL: @lshr_shorter_length_basevec_be(		; BE-LABEL: @lshr_shorter_length_basevec_be(
; ALL-NEXT: [[S:%.*]] = lshr i64 [[X:%.*]], 48		; BE-NEXT: [[VEC_X:%.*]] = bitcast i64 [[X:%.*]] to <8 x i8>
; ALL-NEXT: [[T:%.*]] = trunc i64 [[S]] to i8		; BE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i8> [[VEC_X]], <8 x i8> poison, <4 x i32> <i32 undef, i32 1, i32 undef, i32 undef>
; ALL-NEXT: [[R:%.*]] = insertelement <4 x i8> [[V:%.*]], i8 [[T]], i64 1		; BE-NEXT: [[R:%.*]] = shufflevector <4 x i8> [[V:%.*]], <4 x i8> [[TMP1]], <4 x i32> <i32 0, i32 5, i32 2, i32 3>
; ALL-NEXT: ret <4 x i8> [[R]]		; BE-NEXT: ret <4 x i8> [[R]]
		;
		; LE-LABEL: @lshr_shorter_length_basevec_be(
		; LE-NEXT: [[S:%.*]] = lshr i64 [[X:%.*]], 48
		; LE-NEXT: [[T:%.*]] = trunc i64 [[S]] to i8
		; LE-NEXT: [[R:%.*]] = insertelement <4 x i8> [[V:%.*]], i8 [[T]], i64 1
		; LE-NEXT: ret <4 x i8> [[R]]
;		;
%s = lshr i64 %x, 48		%s = lshr i64 %x, 48
%t = trunc i64 %s to i8		%t = trunc i64 %s to i8
%r = insertelement <4 x i8> %v, i8 %t, i64 1		%r = insertelement <4 x i8> %v, i8 %t, i64 1
ret <4 x i8> %r		ret <4 x i8> %r
}		}

define <4 x i8> @lshr_wrong_index_shorter_length_basevec(i64 %x, <4 x i8> %v) {		define <4 x i8> @lshr_wrong_index_shorter_length_basevec(i64 %x, <4 x i8> %v) {
Show All 11 Lines

llvm/test/Transforms/PhaseOrdering/X86/vec-load-combine.ll


	define noundef <4 x float> @ConvertVectors_ByVal(ptr noundef nonnull align 16 dereferenceable(16) %V) #0 {			define noundef <4 x float> @ConvertVectors_ByVal(ptr noundef nonnull align 16 dereferenceable(16) %V) #0 {
	; SSE-LABEL: @ConvertVectors_ByVal(			; SSE-LABEL: @ConvertVectors_ByVal(
	; SSE-NEXT: entry:			; SSE-NEXT: entry:
	; SSE-NEXT: [[V_VAL20:%.*]] = load i64, ptr [[V:%.*]], align 16			; SSE-NEXT: [[V_VAL20:%.*]] = load i64, ptr [[V:%.*]], align 16
	; SSE-NEXT: [[TMP0:%.*]] = getelementptr i8, ptr [[V]], i64 8			; SSE-NEXT: [[TMP0:%.*]] = getelementptr i8, ptr [[V]], i64 8
	; SSE-NEXT: [[V_VAL421:%.*]] = load i64, ptr [[TMP0]], align 8			; SSE-NEXT: [[V_VAL421:%.*]] = load i64, ptr [[TMP0]], align 8
	; SSE-NEXT: [[VEC_V_VAL20:%.*]] = bitcast i64 [[V_VAL20]] to <2 x i32>			; SSE-NEXT: [[VEC_V_VAL20:%.*]] = bitcast i64 [[V_VAL20]] to <2 x i32>
	; SSE-NEXT: [[TMP1:%.*]] = shufflevector <2 x i32> [[VEC_V_VAL20]], <2 x i32> poison, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[VEC_V_VAL2022:%.*]] = bitcast i64 [[V_VAL20]] to <2 x i32>
	; SSE-NEXT: [[TMP2:%.*]] = lshr i64 [[V_VAL20]], 32			; SSE-NEXT: [[TMP1:%.*]] = shufflevector <2 x i32> [[VEC_V_VAL20]], <2 x i32> [[VEC_V_VAL2022]], <4 x i32> <i32 0, i32 3, i32 undef, i32 undef>
	; SSE-NEXT: [[TMP3:%.*]] = trunc i64 [[TMP2]] to i32			; SSE-NEXT: [[TMP2:%.*]] = trunc i64 [[V_VAL421]] to i32
	; SSE-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[TMP3]], i64 1			; SSE-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[TMP2]], i64 2
	; SSE-NEXT: [[TMP5:%.*]] = trunc i64 [[V_VAL421]] to i32			; SSE-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[TMP2]], i64 3
	; SSE-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[TMP5]], i64 2			; SSE-NEXT: [[VECINIT16:%.*]] = bitcast <4 x i32> [[TMP4]] to <4 x float>
	; SSE-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[TMP5]], i64 3
	; SSE-NEXT: [[VECINIT16:%.*]] = bitcast <4 x i32> [[TMP7]] to <4 x float>
	; SSE-NEXT: ret <4 x float> [[VECINIT16]]			; SSE-NEXT: ret <4 x float> [[VECINIT16]]
	;			;
	; AVX-LABEL: @ConvertVectors_ByVal(			; AVX-LABEL: @ConvertVectors_ByVal(
	; AVX-NEXT: entry:			; AVX-NEXT: entry:
	; AVX-NEXT: [[V_VAL20:%.*]] = load i64, ptr [[V:%.*]], align 16			; AVX-NEXT: [[V_VAL20:%.*]] = load i64, ptr [[V:%.*]], align 16
	; AVX-NEXT: [[TMP0:%.*]] = getelementptr i8, ptr [[V]], i64 8			; AVX-NEXT: [[TMP0:%.*]] = getelementptr i8, ptr [[V]], i64 8
	; AVX-NEXT: [[V_VAL421:%.*]] = load i64, ptr [[TMP0]], align 8			; AVX-NEXT: [[V_VAL421:%.*]] = load i64, ptr [[TMP0]], align 8
	; AVX-NEXT: [[VEC_V_VAL20:%.*]] = bitcast i64 [[V_VAL20]] to <2 x i32>			; AVX-NEXT: [[VEC_V_VAL20:%.*]] = bitcast i64 [[V_VAL20]] to <2 x i32>
	; AVX-NEXT: [[TMP1:%.*]] = shufflevector <2 x i32> [[VEC_V_VAL20]], <2 x i32> poison, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[VEC_V_VAL2022:%.*]] = bitcast i64 [[V_VAL20]] to <2 x i32>
	; AVX-NEXT: [[TMP2:%.*]] = lshr i64 [[V_VAL20]], 32			; AVX-NEXT: [[TMP1:%.*]] = shufflevector <2 x i32> [[VEC_V_VAL20]], <2 x i32> [[VEC_V_VAL2022]], <4 x i32> <i32 0, i32 3, i32 undef, i32 undef>
	; AVX-NEXT: [[TMP3:%.*]] = trunc i64 [[TMP2]] to i32			; AVX-NEXT: [[TMP2:%.*]] = trunc i64 [[V_VAL421]] to i32
	; AVX-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[TMP3]], i64 1			; AVX-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[TMP2]], i64 2
	; AVX-NEXT: [[TMP5:%.*]] = trunc i64 [[V_VAL421]] to i32			; AVX-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[TMP2]], i64 3
	; AVX-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[TMP5]], i64 2			; AVX-NEXT: [[VECINIT16:%.*]] = bitcast <4 x i32> [[TMP4]] to <4 x float>
	; AVX-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[TMP5]], i64 3
	; AVX-NEXT: [[VECINIT16:%.*]] = bitcast <4 x i32> [[TMP7]] to <4 x float>
	; AVX-NEXT: ret <4 x float> [[VECINIT16]]			; AVX-NEXT: ret <4 x float> [[VECINIT16]]
	;			;
	entry:			entry:
	%V.addr = alloca ptr, align 8			%V.addr = alloca ptr, align 8
	%.compoundliteral = alloca <4 x float>, align 16			%.compoundliteral = alloca <4 x float>, align 16
	%ref.tmp = alloca %union.ElementWiseAccess, align 16			%ref.tmp = alloca %union.ElementWiseAccess, align 16
	%ref.tmp2 = alloca %union.ElementWiseAccess, align 16			%ref.tmp2 = alloca %union.ElementWiseAccess, align 16
	%ref.tmp7 = alloca %union.ElementWiseAccess, align 16			%ref.tmp7 = alloca %union.ElementWiseAccess, align 16