This is an archive of the discontinued LLVM Phabricator instance.

Differential D4043

ARMEB: Vector extend operations
ClosedPublic

Authored by cpirker on Jun 6 2014, 6:44 AM.

Details

Reviewers

cpirker

Summary

Hi all,

This patch implements correct vector extensions from <2 x 8>, <2 x 16>, and <2 x 32> to <2 x 64>, and from <4 x 8> and <4 x 16> to <4 x 32> vector types, for big endian.
It applies to both integer and float type vectors. The Clang vectorizer is likely to generate such vector extensions for array type casts.

Background: LLVM generates a scalar load for under-sized vectors and simply uses the loaded value as vector data.
In particular, LLVM employs the VLD1_LN instruction to load one lane of a vector.
This works well in little endian. In big endian mode, however, the scalar and vector data representations differ.
Please refer to http://llvm.org/docs/BigEndianNEON.html for more information on vectors in big endian.

Fix: Insert corresponding VREV instruction to convert scalar-loaded data into vector representation in big endian mode.

Please review.

Thanks,
Christian
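
To make the lane mismatch described in the summary concrete, here is a minimal C++ sketch (not part of the patch). It models a 64-bit D register as eight bytes and mimics a big-endian 16-bit lane load of a <2 x i8> vector, followed by the byte reversal that VREV16.8 performs. The names `DReg`, `loadLane16BigEndian`, and `vrevBytes` are hypothetical helpers introduced for illustration, and the byte model (index 0 is the least significant byte) is a simplifying assumption, not a description of LLVM internals.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// A 64-bit D register modelled as 8 bytes; index 0 holds bits [7:0].
using DReg = std::array<std::uint8_t, 8>;

// Model of a big-endian 16-bit lane load into lane 0: the halfword is read
// from memory most significant byte first, so the first memory byte ends up
// in the upper byte of the lane.
DReg loadLane16BigEndian(const std::uint8_t *mem) {
  assert(mem != nullptr);
  DReg d{};      // remaining lanes are undefined on hardware; zeroed here
  d[0] = mem[1]; // low byte of the halfword is the second memory byte
  d[1] = mem[0]; // high byte of the halfword is the first memory byte
  return d;
}

// Model of a VREV on 8-bit elements: reverse the byte order inside every
// group of groupBytes bytes (2 for VREV16.8, 4 for VREV32.8, 8 for VREV64.8).
DReg vrevBytes(const DReg &src, std::size_t groupBytes) {
  assert(groupBytes == 2 || groupBytes == 4 || groupBytes == 8);
  DReg out{};
  for (std::size_t base = 0; base < src.size(); base += groupBytes)
    for (std::size_t i = 0; i < groupBytes; ++i)
      out[base + i] = src[base + groupBytes - 1 - i];
  return out;
}
```

With memory bytes {0xAA, 0xBB}, the modelled lane load leaves 0xBB in 8-bit lane 0. Calling `vrevBytes(d, 2)` puts 0xAA back in lane 0, which is the element order a byte-wise vector load would have produced.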

Diff Detail

Event Timeline

cpirker updated this revision to Diff 10179. Jun 6 2014, 6:44 AM

cpirker retitled this revision from to ARMEB: Vector extend operations.

cpirker updated this object.

cpirker edited the test plan for this revision.

cpirker added subscribers: Unknown Object (MLST), Konrad.

Herald added a subscriber: aemerson. Jun 6 2014, 6:44 AM

Hi Christian,

Could you please explain why this is necessary? I'm somewhat confused.

We have canonicalised on using LD1 for vector loads, and our register content is in the form "as if" loaded by an LD1. I therefore do not understand why you need a VREV32 after the LD1. The lane order should be correct, and all you need to do is lengthen.

In fact, we chose LD1 as our form precisely because we didn't want to change the vectorizer!

Cheers,

James

Hi James,

To load a 2x8 vector, LLVM generates an ld1.16; to load a 2x16
vector, it uses an ld1.32. VREV instructions are therefore needed
to get the correct vector representation in registers.

Cheers,
Conny

On 2014-06-12 12:01, James Molloy wrote:

Hi Christian,

Could you please explain why this is necessary? I'm somewhat confused.

We have canonicalised on using LD1 for vector loads, and our register
content is in the form "as if" loaded by an LD1. I therefore do not
understand why you need a VREV32 after the LD1. The lane order should
be correct, and all you need to do is lengthen.

In fact, we chose LD1 as our form precisely because we didn't want
to change the vectorizer!

Cheers,

James

http://reviews.llvm.org/D4043

Hi Konrad,

Thanks for the explanation. So as I understand the problem, LLVM is generating a scalar load for under-sized vectors.

I'd like to see a fuller explanation of what is going on and why in the source code and commit message. The important bits being:

We need to load an under-sized vector.
To do this we need to use a VLD1_LN to load one lane of a vector.
So we need to pretend that we're loading a larger vector element size than we are.
This means we load "as-if" VLDR, and need to perform a REV to get us back right again.

Also, the following testcase doesn't generate the best code sequence:

; CHECK-LABEL: vector_ext_2i32_to_2i64:
; CHECK: vldr [[REG:d[0-9]+]]
; CHECK: vrev64.32 [[REG]], [[REG]]
; CHECK: vmovl.u32 {{q[0-9]+}}, [[REG]]

That VLDR can be a VLD1.32 dX, which means we don't need a REV. Can you please change this? I suspect this only affects extloads from 64-bit types to 128-bit types.

Cheers,

James
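
The suggestion above, that a VLD1.32 can replace the VLDR and VREV64.32 pair, can be checked with a small C++ model. This is a sketch under the assumption that a D register is eight bytes with index 0 as the least significant byte; `vldrBigEndian`, `vld1_32BigEndian`, and `vrev64_32` are hypothetical names, not LLVM or ARM APIs. In big endian mode the modelled VLDR reads the doubleword as one 64-bit scalar, whereas VLD1.32 reads two 32-bit elements, so the two loads differ exactly by a swap of the 32-bit words.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// A 64-bit D register modelled as 8 bytes; index 0 holds bits [7:0].
using DReg = std::array<std::uint8_t, 8>;

// Big-endian VLDR: the 8 bytes are read as one 64-bit scalar, so the first
// memory byte becomes the most significant register byte.
DReg vldrBigEndian(const std::uint8_t *mem) {
  assert(mem != nullptr);
  DReg d{};
  for (std::size_t i = 0; i < 8; ++i)
    d[i] = mem[7 - i];
  return d;
}

// Big-endian VLD1.32 {dN}: two 32-bit elements; element k goes to lane k and
// each element is read most significant byte first.
DReg vld1_32BigEndian(const std::uint8_t *mem) {
  assert(mem != nullptr);
  DReg d{};
  for (std::size_t lane = 0; lane < 2; ++lane)
    for (std::size_t i = 0; i < 4; ++i)
      d[4 * lane + i] = mem[4 * lane + 3 - i];
  return d;
}

// VREV64.32: swap the two 32-bit words of the 64-bit register.
DReg vrev64_32(const DReg &src) {
  DReg out{};
  for (std::size_t i = 0; i < 4; ++i) {
    out[i] = src[4 + i];
    out[4 + i] = src[i];
  }
  return out;
}
```

In this model `vrev64_32(vldrBigEndian(mem))` and `vld1_32BigEndian(mem)` produce identical register contents for any input, which is why the single VLD1.32 makes the extra reversal unnecessary.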

Hi James,

Would you please accept http://reviews.llvm.org/D4043 as functionally OK,
and consider any VREV/VLD optimization as a separate issue?

Cheers,
Conny

Hi Konrad,

No, I don't think so. As I mentioned in my previous review comment, I would like to see more explanations in the code and commit message before I'm happy.

Also, I believe you're actually editing the code that generates this VLDR/VREV pair now. So I think that for 64-bit to 128-bit vector extloads, you can just use the little-endian version, not predicate it, and make little-endian generate an LD1 instead of a LDR.

Also, I take it this affects AArch64 too?

Cheers,

James

Updated summary and added some source comments.

Reverted the big endian version of the "Lengthen_Single" pattern.
Current version is fine for both little and big endian modes.

Thanks,
Christian

Hi James,

Can you please review my third revision?

Thanks,
Christian

I committed this patch as rL211520.

This revision is now accepted and ready to land. Jun 23 2014, 11:52 AM

cpirker closed this revision. Jun 23 2014, 11:52 AM

Revision Contents

Path                                          Size
lib/Target/ARM/ARMISelLowering.cpp            5 lines
lib/Target/ARM/ARMInstrNEON.td                186 lines
test/CodeGen/ARM/big-endian-neon-extend.ll    103 lines

Diff 10179

lib/Target/ARM/ARMISelLowering.cpp

[... 4,486 lines hidden ...] for (int ByteNum = 0; ByteNum < 8; ++ByteNum) {
         Val |= BitMask;
         Imm |= ImmMask;
       } else if ((SplatBits & BitMask) != 0) {
         return SDValue();
       }
       BitMask <<= 8;
       ImmMask <<= 1;
     }

+    if (DAG.getTargetLoweringInfo().isBigEndian())
+      // swap higher and lower 32 bit word
+      Imm = ((Imm & 0xf) << 4) | ((Imm & 0xf0) >> 4);
+
     // Op=1, Cmode=1110.
     OpCmode = 0x1e;
     VT = is128Bits ? MVT::v2i64 : MVT::v1i64;
     break;
   }

   default:
     llvm_unreachable("unexpected size for isNEONModifiedImm");
[... 6,290 lines hidden ...]
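
The nibble swap added in this hunk relies on the loop visible above it: each bit of the 8-bit immediate stands for one byte of the 64-bit value, so swapping the two 4-bit halves of the immediate swaps the two 32-bit words it describes. A minimal C++ sketch of that relationship follows. `encodeByteMask`, `expandByteMask`, and `swapNibbles` are hypothetical helpers written for illustration; they only approximate the loop in the patched function (undef bits are ignored here, which is an assumption of this sketch).

```cpp
#include <cassert>
#include <cstdint>

// Encode a 64-bit value whose bytes are all 0x00 or 0xff as an 8-bit mask:
// bit i of *imm is set when byte i of the value is 0xff.
// Returns false if some byte is neither 0x00 nor 0xff.
bool encodeByteMask(std::uint64_t splat, std::uint8_t *imm) {
  assert(imm != nullptr);
  std::uint8_t mask = 0;
  for (int byteNum = 0; byteNum < 8; ++byteNum) {
    const std::uint8_t byte =
        static_cast<std::uint8_t>((splat >> (8 * byteNum)) & 0xff);
    if (byte == 0xff)
      mask = static_cast<std::uint8_t>(mask | (1u << byteNum));
    else if (byte != 0)
      return false;
  }
  *imm = mask;
  return true;
}

// Inverse of encodeByteMask: expand each mask bit back into a 0xff byte.
std::uint64_t expandByteMask(std::uint8_t imm) {
  std::uint64_t value = 0;
  for (int byteNum = 0; byteNum < 8; ++byteNum)
    if (imm & (1u << byteNum))
      value |= std::uint64_t{0xff} << (8 * byteNum);
  return value;
}

// The big-endian adjustment from the patch: exchange the low and high nibble,
// which exchanges the low and high 32-bit word of the expanded value.
std::uint8_t swapNibbles(std::uint8_t imm) {
  return static_cast<std::uint8_t>(((imm & 0x0f) << 4) | ((imm & 0xf0) >> 4));
}
```

For example, the value 0x00000000000000ff encodes to the mask 0x01; after `swapNibbles` the mask is 0x10, which expands to 0x000000ff00000000.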

lib/Target/ARM/ARMInstrNEON.td

[... 6,340 lines hidden ...] multiclass Lengthen_Single<string DestLanes, string DestTy, string SrcTy> {

     def _S : Pat<(!cast<ValueType>("v" # DestLanes # DestTy)
                    (!cast<PatFrag>("sextloadvi" # SrcTy) addrmode6:$addr)),
                  (!cast<Instruction>("VMOVLsv" # DestLanes # DestTy)
                    (!cast<Instruction>("VLD1d" # SrcTy) addrmode6:$addr))>;
   }
 }

+multiclass Lengthen_Single_Big_Endian<string DestLanes, string DestTy, string SrcTy> {
+  let AddedComplexity = 10 in {
+    def _Any : Pat<(!cast<ValueType>("v" # DestLanes # DestTy)
+                     (!cast<PatFrag>("extloadvi" # SrcTy) addrmode6:$addr)),
+                   (!cast<Instruction>("VMOVLuv" # DestLanes # DestTy)
+                     (!cast<Instruction>("VREV64d" # SrcTy)
+                       (!cast<Instruction>("VLD1d" # SrcTy) addrmode6:$addr)))>;
+
+    def _Z : Pat<(!cast<ValueType>("v" # DestLanes # DestTy)
+                   (!cast<PatFrag>("zextloadvi" # SrcTy) addrmode6:$addr)),
+                 (!cast<Instruction>("VMOVLuv" # DestLanes # DestTy)
+                   (!cast<Instruction>("VREV64d" # SrcTy)
+                     (!cast<Instruction>("VLD1d" # SrcTy) addrmode6:$addr)))>;
+
+    def _S : Pat<(!cast<ValueType>("v" # DestLanes # DestTy)
+                   (!cast<PatFrag>("sextloadvi" # SrcTy) addrmode6:$addr)),
+                 (!cast<Instruction>("VMOVLsv" # DestLanes # DestTy)
+                   (!cast<Instruction>("VREV64d" # SrcTy)
+                     (!cast<Instruction>("VLD1d" # SrcTy) addrmode6:$addr)))>;
+  }
+}
+
 // extload, zextload and sextload for a lengthening load which only uses
 // half the lanes available. Example:
 // Lengthen_HalfSingle<"4", "i16", "8", "i16", "i8"> =
 // Pat<(v4i16 (extloadvi8 addrmode6oneL32:$addr)),
 // (EXTRACT_SUBREG (VMOVLuv8i16 (VLD1LNd32 addrmode6oneL32:$addr,
 // (f64 (IMPLICIT_DEF)), (i32 0))),
 // dsub_0)>;
 multiclass Lengthen_HalfSingle<string DestLanes, string DestTy, string SrcTy,
[... 10 lines hidden ...] def _Z : Pat<(!cast<ValueType>("v" # DestLanes # DestTy)
                  dsub_0)>;
   def _S : Pat<(!cast<ValueType>("v" # DestLanes # DestTy)
                  (!cast<PatFrag>("sextloadv" # SrcTy) addrmode6oneL32:$addr)),
                (EXTRACT_SUBREG (!cast<Instruction>("VMOVLsv" # InsnLanes # InsnTy)
                  (VLD1LNd32 addrmode6oneL32:$addr, (f64 (IMPLICIT_DEF)), (i32 0))),
                  dsub_0)>;
 }

+multiclass Lengthen_HalfSingle_Big_Endian<string DestLanes, string DestTy, string SrcTy,
+                                          string InsnLanes, string InsnTy, string RevLanes> {
+  def _Any : Pat<(!cast<ValueType>("v" # DestLanes # DestTy)
+                   (!cast<PatFrag>("extloadv" # SrcTy) addrmode6oneL32:$addr)),
+                 (EXTRACT_SUBREG (!cast<Instruction>("VMOVLuv" # InsnLanes # InsnTy)
+                   (!cast<Instruction>("VREV32d" # RevLanes)
+                     (VLD1LNd32 addrmode6oneL32:$addr, (f64 (IMPLICIT_DEF)), (i32 0)))),
+                   dsub_0)>;
+  def _Z : Pat<(!cast<ValueType>("v" # DestLanes # DestTy)
+                 (!cast<PatFrag>("zextloadv" # SrcTy) addrmode6oneL32:$addr)),
+               (EXTRACT_SUBREG (!cast<Instruction>("VMOVLuv" # InsnLanes # InsnTy)
+                 (!cast<Instruction>("VREV32d" # RevLanes)
+                   (VLD1LNd32 addrmode6oneL32:$addr, (f64 (IMPLICIT_DEF)), (i32 0)))),
+                 dsub_0)>;
+  def _S : Pat<(!cast<ValueType>("v" # DestLanes # DestTy)
+                 (!cast<PatFrag>("sextloadv" # SrcTy) addrmode6oneL32:$addr)),
+               (EXTRACT_SUBREG (!cast<Instruction>("VMOVLsv" # InsnLanes # InsnTy)
+                 (!cast<Instruction>("VREV32d" # RevLanes)
+                   (VLD1LNd32 addrmode6oneL32:$addr, (f64 (IMPLICIT_DEF)), (i32 0)))),
+                 dsub_0)>;
+}
+
 // extload, zextload and sextload for a lengthening load followed by another
 // lengthening load, to quadruple the initial length.
 //
 // Lengthen_Double<"4", "i32", "i8", "8", "i16", "4", "i32"> =
 // Pat<(v4i32 (extloadvi8 addrmode6oneL32:$addr))
 // (EXTRACT_SUBREG (VMOVLuv4i32
 // (EXTRACT_SUBREG (VMOVLuv8i16 (VLD1LNd32 addrmode6oneL32:$addr,
 // (f64 (IMPLICIT_DEF)),
[... 18 lines hidden ...] multiclass Lengthen_Double<string DestLanes, string DestTy, string SrcTy,
   def _S : Pat<(!cast<ValueType>("v" # DestLanes # DestTy)
                  (!cast<PatFrag>("sextloadv" # SrcTy) addrmode6oneL32:$addr)),
                (!cast<Instruction>("VMOVLsv" # Insn2Lanes # Insn2Ty)
                  (EXTRACT_SUBREG (!cast<Instruction>("VMOVLsv" # Insn1Lanes # Insn1Ty)
                    (VLD1LNd32 addrmode6oneL32:$addr, (f64 (IMPLICIT_DEF)), (i32 0))),
                    dsub_0))>;
 }

+multiclass Lengthen_Double_Big_Endian<string DestLanes, string DestTy, string SrcTy,
+                                      string Insn1Lanes, string Insn1Ty, string Insn2Lanes,
+                                      string Insn2Ty, string RevLanes> {
+  def _Any : Pat<(!cast<ValueType>("v" # DestLanes # DestTy)
+                   (!cast<PatFrag>("extloadv" # SrcTy) addrmode6oneL32:$addr)),
+                 (!cast<Instruction>("VMOVLuv" # Insn2Lanes # Insn2Ty)
+                   (EXTRACT_SUBREG (!cast<Instruction>("VMOVLuv" # Insn1Lanes # Insn1Ty)
+                     (!cast<Instruction>("VREV32d" # RevLanes)
+                       (VLD1LNd32 addrmode6oneL32:$addr, (f64 (IMPLICIT_DEF)), (i32 0)))),
+                     dsub_0))>;
+  def _Z : Pat<(!cast<ValueType>("v" # DestLanes # DestTy)
+                 (!cast<PatFrag>("zextloadv" # SrcTy) addrmode6oneL32:$addr)),
+               (!cast<Instruction>("VMOVLuv" # Insn2Lanes # Insn2Ty)
+                 (EXTRACT_SUBREG (!cast<Instruction>("VMOVLuv" # Insn1Lanes # Insn1Ty)
+                   (!cast<Instruction>("VREV32d" # RevLanes)
+                     (VLD1LNd32 addrmode6oneL32:$addr, (f64 (IMPLICIT_DEF)), (i32 0)))),
+                   dsub_0))>;
+  def _S : Pat<(!cast<ValueType>("v" # DestLanes # DestTy)
+                 (!cast<PatFrag>("sextloadv" # SrcTy) addrmode6oneL32:$addr)),
+               (!cast<Instruction>("VMOVLsv" # Insn2Lanes # Insn2Ty)
+                 (EXTRACT_SUBREG (!cast<Instruction>("VMOVLsv" # Insn1Lanes # Insn1Ty)
+                   (!cast<Instruction>("VREV32d" # RevLanes)
+                     (VLD1LNd32 addrmode6oneL32:$addr, (f64 (IMPLICIT_DEF)), (i32 0)))),
+                   dsub_0))>;
+}
+
 // extload, zextload and sextload for a lengthening load followed by another
 // lengthening load, to quadruple the initial length, but which ends up only
 // requiring half the available lanes (a 64-bit outcome instead of a 128-bit).
 //
 // Lengthen_HalfDouble<"2", "i32", "i8", "8", "i16", "4", "i32"> =
 // Pat<(v2i32 (extloadvi8 addrmode6:$addr))
 // (EXTRACT_SUBREG (VMOVLuv4i32
 // (EXTRACT_SUBREG (VMOVLuv8i16 (VLD1LNd16 addrmode6:$addr,
[... 21 lines hidden ...] def _S : Pat<(!cast<ValueType>("v" # DestLanes # DestTy)
                  (!cast<PatFrag>("sextloadv" # SrcTy) addrmode6:$addr)),
                (EXTRACT_SUBREG (!cast<Instruction>("VMOVLsv" # Insn2Lanes # Insn2Ty)
                  (EXTRACT_SUBREG (!cast<Instruction>("VMOVLsv" # Insn1Lanes # Insn1Ty)
                    (VLD1LNd16 addrmode6:$addr, (f64 (IMPLICIT_DEF)), (i32 0))),
                    dsub_0)),
                  dsub_0)>;
 }

+multiclass Lengthen_HalfDouble_Big_Endian<string DestLanes, string DestTy, string SrcTy,
+                                          string Insn1Lanes, string Insn1Ty, string Insn2Lanes,
+                                          string Insn2Ty> {
+  def _Any : Pat<(!cast<ValueType>("v" # DestLanes # DestTy)
+                   (!cast<PatFrag>("extloadv" # SrcTy) addrmode6:$addr)),
+                 (EXTRACT_SUBREG (!cast<Instruction>("VMOVLuv" # Insn2Lanes # Insn2Ty)
+                   (EXTRACT_SUBREG (!cast<Instruction>("VMOVLuv" # Insn1Lanes # Insn1Ty)
+                     (!cast<Instruction>("VREV16d8")
+                       (VLD1LNd16 addrmode6:$addr, (f64 (IMPLICIT_DEF)), (i32 0)))),
+                     dsub_0)),
+                   dsub_0)>;
+  def _Z : Pat<(!cast<ValueType>("v" # DestLanes # DestTy)
+                 (!cast<PatFrag>("zextloadv" # SrcTy) addrmode6:$addr)),
+               (EXTRACT_SUBREG (!cast<Instruction>("VMOVLuv" # Insn2Lanes # Insn2Ty)
+                 (EXTRACT_SUBREG (!cast<Instruction>("VMOVLuv" # Insn1Lanes # Insn1Ty)
+                   (!cast<Instruction>("VREV16d8")
+                     (VLD1LNd16 addrmode6:$addr, (f64 (IMPLICIT_DEF)), (i32 0)))),
+                   dsub_0)),
+                 dsub_0)>;
+  def _S : Pat<(!cast<ValueType>("v" # DestLanes # DestTy)
+                 (!cast<PatFrag>("sextloadv" # SrcTy) addrmode6:$addr)),
+               (EXTRACT_SUBREG (!cast<Instruction>("VMOVLsv" # Insn2Lanes # Insn2Ty)
+                 (EXTRACT_SUBREG (!cast<Instruction>("VMOVLsv" # Insn1Lanes # Insn1Ty)
+                   (!cast<Instruction>("VREV16d8")
+                     (VLD1LNd16 addrmode6:$addr, (f64 (IMPLICIT_DEF)), (i32 0)))),
+                   dsub_0)),
+                 dsub_0)>;
+}
+
+let Predicates = [IsLE] in {
 defm : Lengthen_Single<"8", "i16", "8">; // v8i8 -> v8i16
 defm : Lengthen_Single<"4", "i32", "16">; // v4i16 -> v4i32
 defm : Lengthen_Single<"2", "i64", "32">; // v2i32 -> v2i64

 defm : Lengthen_HalfSingle<"4", "i16", "i8", "8", "i16">; // v4i8 -> v4i16
 defm : Lengthen_HalfSingle<"2", "i32", "i16", "4", "i32">; // v2i16 -> v2i32

 // Double lengthening - v4i8 -> v4i16 -> v4i32
 defm : Lengthen_Double<"4", "i32", "i8", "8", "i16", "4", "i32">;
 // v2i8 -> v2i16 -> v2i32
 defm : Lengthen_HalfDouble<"2", "i32", "i8", "8", "i16", "4", "i32">;
 // v2i16 -> v2i32 -> v2i64
 defm : Lengthen_Double<"2", "i64", "i16", "4", "i32", "2", "i64">;
+}
+
+let Predicates = [IsBE] in {
+  defm : Lengthen_Single_Big_Endian<"8", "i16", "8">; // v8i8 -> v8i16
+  defm : Lengthen_Single_Big_Endian<"4", "i32", "16">; // v4i16 -> v4i32
+  defm : Lengthen_Single_Big_Endian<"2", "i64", "32">; // v2i32 -> v2i64
+
+  defm : Lengthen_HalfSingle_Big_Endian<"4", "i16", "i8", "8", "i16", "8">; // v4i8 -> v4i16
+  defm : Lengthen_HalfSingle_Big_Endian<"2", "i32", "i16", "4", "i32", "16">; // v2i16 -> v2i32
+
+  // Double lengthening - v4i8 -> v4i16 -> v4i32
+  defm : Lengthen_Double_Big_Endian<"4", "i32", "i8", "8", "i16", "4", "i32", "8">;
+  // v2i8 -> v2i16 -> v2i32
+  defm : Lengthen_HalfDouble_Big_Endian<"2", "i32", "i8", "8", "i16", "4", "i32">;
+  // v2i16 -> v2i32 -> v2i64
+  defm : Lengthen_Double_Big_Endian<"2", "i64", "i16", "4", "i32", "2", "i64", "16">;
+}

 // Triple lengthening - v2i8 -> v2i16 -> v2i32 -> v2i64
+let Predicates = [IsLE] in {
 def : Pat<(v2i64 (extloadvi8 addrmode6:$addr)),
           (VMOVLuv2i64 (EXTRACT_SUBREG (VMOVLuv4i32 (EXTRACT_SUBREG (VMOVLuv8i16
             (VLD1LNd16 addrmode6:$addr,
               (f64 (IMPLICIT_DEF)), (i32 0))), dsub_0)), dsub_0))>;
 def : Pat<(v2i64 (zextloadvi8 addrmode6:$addr)),
           (VMOVLuv2i64 (EXTRACT_SUBREG (VMOVLuv4i32 (EXTRACT_SUBREG (VMOVLuv8i16
             (VLD1LNd16 addrmode6:$addr,
               (f64 (IMPLICIT_DEF)), (i32 0))), dsub_0)), dsub_0))>;
 def : Pat<(v2i64 (sextloadvi8 addrmode6:$addr)),
           (VMOVLsv2i64 (EXTRACT_SUBREG (VMOVLsv4i32 (EXTRACT_SUBREG (VMOVLsv8i16
             (VLD1LNd16 addrmode6:$addr,
               (f64 (IMPLICIT_DEF)), (i32 0))), dsub_0)), dsub_0))>;
+}
+let Predicates = [IsBE] in {
+  def : Pat<(v2i64 (extloadvi8 addrmode6:$addr)),
+            (VMOVLuv2i64 (EXTRACT_SUBREG (VMOVLuv4i32 (EXTRACT_SUBREG (VMOVLuv8i16
+              (!cast<Instruction>("VREV16d8")
+                (VLD1LNd16 addrmode6:$addr,
+                  (f64 (IMPLICIT_DEF)), (i32 0)))), dsub_0)), dsub_0))>;
+  def : Pat<(v2i64 (zextloadvi8 addrmode6:$addr)),
+            (VMOVLuv2i64 (EXTRACT_SUBREG (VMOVLuv4i32 (EXTRACT_SUBREG (VMOVLuv8i16
+              (!cast<Instruction>("VREV16d8")
+                (VLD1LNd16 addrmode6:$addr,
+                  (f64 (IMPLICIT_DEF)), (i32 0)))), dsub_0)), dsub_0))>;
+  def : Pat<(v2i64 (sextloadvi8 addrmode6:$addr)),
+            (VMOVLsv2i64 (EXTRACT_SUBREG (VMOVLsv4i32 (EXTRACT_SUBREG (VMOVLsv8i16
+              (!cast<Instruction>("VREV16d8")
+                (VLD1LNd16 addrmode6:$addr,
+                  (f64 (IMPLICIT_DEF)), (i32 0)))), dsub_0)), dsub_0))>;
+}

 //===----------------------------------------------------------------------===//
 // Assembler aliases
 //

 def : VFP2InstAlias<"fmdhr${p} $Dd, $Rn",
                     (VSETLNi32 DPR:$Dd, GPR:$Rn, 1, pred:$p)>;
 def : VFP2InstAlias<"fmdlr${p} $Dd, $Rn",
[... 1,021 lines hidden ...]

test/CodeGen/ARM/big-endian-neon-extend.ll

+; RUN: llc < %s -mtriple armeb-eabi -mattr v7,neon -o - | FileCheck %s
+
+define void @vector_ext_2i8_to_2i64( <2 x i8>* %loadaddr, <2 x i64>* %storeaddr ) {
+; CHECK-LABEL: vector_ext_2i8_to_2i64:
+; CHECK: vld1.16 {[[REG:d[0-9]+]]
+; CHECK: vmov.i64 {{q[0-9]+}}, #0xff
+; CHECK: vrev16.8 [[REG]], [[REG]]
+; CHECK: vmovl.u8 {{q[0-9]+}}, [[REG]]
+  %1 = load <2 x i8>* %loadaddr
+  %2 = zext <2 x i8> %1 to <2 x i64>
+  store <2 x i64> %2, <2 x i64>* %storeaddr
+  ret void
+}
+
+define void @vector_ext_2i16_to_2i64( <2 x i16>* %loadaddr, <2 x i64>* %storeaddr ) {
+; CHECK-LABEL: vector_ext_2i16_to_2i64:
+; CHECK: vld1.32 {[[REG:d[0-9]+]]
+; CHECK: vmov.i64 {{q[0-9]+}}, #0xffff
+; CHECK: vrev32.16 [[REG]], [[REG]]
+; CHECK: vmovl.u16 {{q[0-9]+}}, [[REG]]
+  %1 = load <2 x i16>* %loadaddr
+  %2 = zext <2 x i16> %1 to <2 x i64>
+  store <2 x i64> %2, <2 x i64>* %storeaddr
+  ret void
+}
+
+define void @vector_ext_2i32_to_2i64( <2 x i32>* %loadaddr, <2 x i64>* %storeaddr ) {
+; CHECK-LABEL: vector_ext_2i32_to_2i64:
+; CHECK: vldr [[REG:d[0-9]+]]
+; CHECK: vrev64.32 [[REG]], [[REG]]
+; CHECK: vmovl.u32 {{q[0-9]+}}, [[REG]]
+  %1 = load <2 x i32>* %loadaddr
+  %2 = zext <2 x i32> %1 to <2 x i64>
+  store <2 x i64> %2, <2 x i64>* %storeaddr
+  ret void
+}
+
+
+define void @vector_ext_2i8_to_2i32( <2 x i8>* %loadaddr, <2 x i32>* %storeaddr ) {
+; CHECK-LABEL: vector_ext_2i8_to_2i32:
+; CHECK: vld1.16 {[[REG:d[0-9]+]]
+; CHECK: vrev16.8 [[REG]], [[REG]]
+  %1 = load <2 x i8>* %loadaddr
+  %2 = zext <2 x i8> %1 to <2 x i32>
+  store <2 x i32> %2, <2 x i32>* %storeaddr
+  ret void
+}
+
+define void @vector_ext_2i16_to_2i32( <2 x i16>* %loadaddr, <2 x i32>* %storeaddr ) {
+; CHECK-LABEL: vector_ext_2i16_to_2i32:
+; CHECK: vld1.32 {[[REG:d[0-9]+]]
+; CHECK: vrev32.16 [[REG]], [[REG]]
+; CHECK: vmovl.u16 {{q[0-9]+}}, [[REG]]
+  %1 = load <2 x i16>* %loadaddr
+  %2 = zext <2 x i16> %1 to <2 x i32>
+  store <2 x i32> %2, <2 x i32>* %storeaddr
+  ret void
+}
+
+define void @vector_ext_2i8_to_2i16( <2 x i8>* %loadaddr, <2 x i16>* %storeaddr ) {
+; CHECK-LABEL: vector_ext_2i8_to_2i16:
+; CHECK: vld1.16 {[[REG:d[0-9]+]]
+; CHECK: vrev16.8 [[REG]], [[REG]]
+; CHECK: vmovl.u8 {{q[0-9]+}}, [[REG]]
+  %1 = load <2 x i8>* %loadaddr
+  %2 = zext <2 x i8> %1 to <2 x i16>
+  store <2 x i16> %2, <2 x i16>* %storeaddr
+  ret void
+}
+
+define void @vector_ext_4i8_to_4i32( <4 x i8>* %loadaddr, <4 x i32>* %storeaddr ) {
+; CHECK-LABEL: vector_ext_4i8_to_4i32:
+; CHECK: vld1.32 {[[REG:d[0-9]+]]
+; CHECK: vrev32.8 [[REG]], [[REG]]
+; CHECK: vmovl.u8 {{q[0-9]+}}, [[REG]]
+  %1 = load <4 x i8>* %loadaddr
+  %2 = zext <4 x i8> %1 to <4 x i32>
+  store <4 x i32> %2, <4 x i32>* %storeaddr
+  ret void
+}
+
+define void @vector_ext_4i16_to_4i32( <4 x i16>* %loadaddr, <4 x i32>* %storeaddr ) {
+; CHECK-LABEL: vector_ext_4i16_to_4i32:
+; CHECK: vldr [[REG:d[0-9]+]]
+; CHECK: vrev64.16 [[REG]], [[REG]]
+; CHECK: vmovl.u16 {{q[0-9]+}}, [[REG]]
+  %1 = load <4 x i16>* %loadaddr
+  %2 = zext <4 x i16> %1 to <4 x i32>
+  store <4 x i32> %2, <4 x i32>* %storeaddr
+  ret void
+}
+
+define void @vector_ext_4i8_to_4i16( <4 x i8>* %loadaddr, <4 x i16>* %storeaddr ) {
+; CHECK-LABEL: vector_ext_4i8_to_4i16:
+; CHECK: vld1.32 {[[REG:d[0-9]+]]
+; CHECK: vrev32.8 [[REG]], [[REG]]
+; CHECK: vmovl.u8 {{q[0-9]+}}, [[REG]]
+  %1 = load <4 x i8>* %loadaddr
+  %2 = zext <4 x i8> %1 to <4 x i16>
+  store <4 x i16> %2, <4 x i16>* %storeaddr
+  ret void
+}
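
As a cross-check of the `vector_ext_4i8_to_4i32` expectations, the following C++ sketch models the big-endian double-lengthening sequence lane by lane: a 32-bit lane load, VREV32.8, VMOVL.U8, taking the low half, and VMOVL.U16. All type and function names are hypothetical, and the register model (index 0 is the least significant lane) is an assumption made for illustration; nothing here is generated from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using U8x8 = std::array<std::uint8_t, 8>;
using U16x8 = std::array<std::uint16_t, 8>;
using U32x4 = std::array<std::uint32_t, 4>;

// Big-endian 32-bit lane load into lane 0: the word is read most significant
// byte first, so the four bytes land in reverse order as 8-bit lanes.
U8x8 vld1Lane32BigEndian(const std::uint8_t *mem) {
  assert(mem != nullptr);
  U8x8 d{}; // upper lanes are undefined on hardware; zeroed here
  for (std::size_t i = 0; i < 4; ++i)
    d[i] = mem[3 - i];
  return d;
}

// VREV32.8: reverse the bytes inside each 32-bit word.
U8x8 vrev32_8(const U8x8 &src) {
  U8x8 out{};
  for (std::size_t base = 0; base < 8; base += 4)
    for (std::size_t i = 0; i < 4; ++i)
      out[base + i] = src[base + 3 - i];
  return out;
}

// VMOVL.U8: zero-extend eight 8-bit lanes to eight 16-bit lanes.
U16x8 vmovl_u8(const U8x8 &src) {
  U16x8 q{};
  for (std::size_t i = 0; i < 8; ++i)
    q[i] = src[i];
  return q;
}

// Low half of the Q register (dsub_0) followed by VMOVL.U16.
U32x4 vmovl_u16_low(const U16x8 &q) {
  U32x4 out{};
  for (std::size_t i = 0; i < 4; ++i)
    out[i] = q[i];
  return out;
}

// The whole sequence, with the VREV step optional so both variants can be compared.
U32x4 extLoad4i8To4i32(const std::uint8_t *mem, bool insertRev) {
  U8x8 d = vld1Lane32BigEndian(mem);
  if (insertRev)
    d = vrev32_8(d);
  return vmovl_u16_low(vmovl_u8(d));
}
```

In this model, calling `extLoad4i8To4i32(mem, true)` returns the four memory bytes zero-extended in their original order, while passing `false` returns them reversed, which is the mismatch the added VREV32.8 is meant to correct.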
