This is an archive of the discontinued LLVM Phabricator instance.

Differential D3651

ARM: Implement big endian bit-conversion for NEON types
Closed (Public)

Authored by cpirker on May 7 2014, 7:48 AM.

Details

Reviewers

jmolloy

Summary

Hi All,

This patch enables correct conversion between integer and vector data in big endian mode (the data format differs between the two endian modes).
The patch covers the conversion between floating point and vector types, and the function argument passing of vector types providing compatibility to the ARM ABI.
Big endian bit-conversions are implemented using VREV64 and VREV32 instructions.

Thanks,
Christian
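
As a rough illustration of why lane-reversal instructions come into play, the sketch below models a 64-bit D register in big endian mode. It is not taken from the patch: the helper names `loadBigEndianLanes` and `vrev` are hypothetical, and the register model (lane 0 in the least significant bits, each lane read from memory as a big-endian integer) is an assumption about how element-wise NEON loads behave in big endian mode. Under that assumption, reinterpreting a register that holds narrow lanes as one that holds wider lanes requires reversing the narrow lanes inside each wider region, which is what VREV64 and VREV32 do.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes8 = std::array<std::uint8_t, 8>;

// Assumed model of a big-endian element-wise load into a 64-bit register:
// lane i is read from memory bytes [i*laneBytes, (i+1)*laneBytes) as a
// big-endian integer and placed at register bits [i*laneBits, (i+1)*laneBits).
std::uint64_t loadBigEndianLanes(const Bytes8 &mem, unsigned laneBits) {
  const unsigned laneBytes = laneBits / 8;
  std::uint64_t reg = 0;
  for (unsigned lane = 0; lane < 8 / laneBytes; ++lane) {
    std::uint64_t value = 0;
    for (unsigned b = 0; b < laneBytes; ++b)
      value = (value << 8) | mem[lane * laneBytes + b];
    reg |= value << (lane * laneBits);
  }
  return reg;
}

// Model of VREV<regionBits>.<laneBits>: reverse the order of the lanes inside
// every regionBits-wide region of the register. laneBits must be 8, 16 or 32
// and smaller than regionBits (32 or 64).
std::uint64_t vrev(std::uint64_t reg, unsigned regionBits, unsigned laneBits) {
  const unsigned lanesPerRegion = regionBits / laneBits;
  const std::uint64_t mask = (1ULL << laneBits) - 1;
  std::uint64_t out = 0;
  for (unsigned lane = 0; lane < 64 / laneBits; ++lane) {
    const unsigned region = lane / lanesPerRegion;
    const unsigned index = lane % lanesPerRegion;
    const unsigned dst = region * lanesPerRegion + (lanesPerRegion - 1 - index);
    out |= ((reg >> (lane * laneBits)) & mask) << (dst * laneBits);
  }
  return out;
}
```

In this model, `vrev(loadBigEndianLanes(mem, 32), 64, 32)` yields the same register image as `loadBigEndianLanes(mem, 64)`, which corresponds to a v2i32 to v1i64 conversion, and `vrev(loadBigEndianLanes(mem, 16), 32, 16)` matches `loadBigEndianLanes(mem, 32)`, which corresponds to a v4i16 to v2i32 conversion.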

Diff Detail

Event Timeline

cpirker updated this revision to Diff 9173. May 7 2014, 7:48 AM

cpirker retitled this revision to "ARM: Implement big endian bit-conversion for NEON types".

cpirker updated this object.

cpirker edited the test plan for this revision.

cpirker added a subscriber: Unknown Object (MLST).

Herald added a subscriber: aemerson. May 7 2014, 7:48 AM

cpirker added a subscriber: Konrad. May 7 2014, 8:17 AM

Hi Christian,

Generally this looks good and I have no real problems with it, just a few small nits.

The patch covers the conversion between floating point and vector types, and the function argument passing of vector types providing compatibility to the ARM ABI.

It doesn't look like this patch does anything for argument passing or lowering formal arguments. Is this going to follow in another patch or do you think there's nothing to do here?

Cheers,

James

lib/Target/ARM/ARMFastISel.cpp
192	This function seems unused?
lib/Target/ARM/ARMISelLowering.cpp
3827	This seems too complex for a ternary - I'd prefer to see it written as an if/else.
lib/Target/ARM/ARMInstrNEON.td
2369	Why are these not allowed in big-endian mode? And why only these patterns?
6261	These look fine, but what was your methodology for generating them? Did you do them all by hand, or use some script?

jmolloy added a reviewer: jmolloy. May 7 2014, 8:24 AM

Hi James,

Thanks for your feedback.

It doesn't look like this patch does anything for argument passing or lowering formal arguments. Is this going to follow in another patch or do you think there's nothing to do here?

The patch converts vector arguments into integer format (and vice versa) when passing vectors as function arguments, as required by the ABI.

Thanks,
Christian

lib/Target/ARM/ARMFastISel.cpp
192	This function is required to evaluate the [IsBE] predicate for the bitconversion rules that specify the VREV instructions.
lib/Target/ARM/ARMISelLowering.cpp
3827	Yes, if/else would be more readable.
lib/Target/ARM/ARMInstrNEON.td
2369	v2f64 is not compatible with v4i32 in big endian mode (a vrev instruction would be needed). All other patterns like v8i8,... are already disabled in big endian mode. Not sure why this specific rule was left active for both endian modes.
6261	I did it by hand, based on similar patch for AArch64 (D3424).
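
The point about v2f64 and the 32-bit load pattern can be sketched with a small model of a 128-bit Q register, treated as two 64-bit D halves. This is an illustration only, not code from the patch: `loadQ` and `swapWords` are hypothetical helpers, and the model assumes that an element-wise load reads each lane in the memory byte order of the current endian mode and places lane 0 in the least significant bits. Under that assumption the lane size of the load does not matter in little endian mode, while in big endian mode a load with 32-bit lanes leaves the two words of each 64-bit element swapped compared with a load with 64-bit lanes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;
using QReg = std::array<std::uint64_t, 2>; // [0] = low D half, [1] = high D half

// Assumed model of an element-wise 128-bit load with the given lane size.
QReg loadQ(const Bytes16 &mem, unsigned laneBits, bool bigEndian) {
  const unsigned laneBytes = laneBits / 8;
  QReg q = {{0, 0}};
  for (unsigned lane = 0; lane < 16 / laneBytes; ++lane) {
    std::uint64_t value = 0;
    for (unsigned b = 0; b < laneBytes; ++b) {
      // Big endian: first byte is most significant. Little endian: least.
      const unsigned shift = bigEndian ? 8 * (laneBytes - 1 - b) : 8 * b;
      value |= static_cast<std::uint64_t>(mem[lane * laneBytes + b]) << shift;
    }
    const unsigned bit = lane * laneBits; // position in the 128-bit register
    q[bit / 64] |= value << (bit % 64);
  }
  return q;
}

// Swap the two 32-bit words of a 64-bit value (the effect of a 32-bit lane
// reversal within one D register).
std::uint64_t swapWords(std::uint64_t d) { return (d << 32) | (d >> 32); }
```

With this model, `loadQ(mem, 32, false) == loadQ(mem, 64, false)` holds for any input, whereas in big endian mode each half of `loadQ(mem, 32, true)` has to pass through `swapWords` to match `loadQ(mem, 64, true)`. That is consistent with restricting the 32-bit load and store patterns for v2f64 to little endian mode.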

Hi Christian,

The patch converts vector arguments into integer format (and vice versa) when passing vectors as function arguments, as required by the ABI.

Whereabouts, exactly? I see no changes to the calling convention .td or the calling convention parts of ISel lowering.

Cheers,

James

lib/Target/ARM/ARMFastISel.cpp
192	But that predicate isn't defined anywhere in this patch.
lib/Target/ARM/ARMInstrNEON.td
2369	Ah yes, I see. Thanks.

Hi James,

Whereabouts, exactly? I see no changes to the calling convention .td or the calling convention parts of ISel lowering.

These conversions are done implicitly when fitting a vector value into GPR(s), therefore no changes in the calling conventions .td.

Thanks,
Christian

lib/Target/ARM/ARMFastISel.cpp
192	The [IsBE] predicate is defined in ARMInstrInfo.td (see line 305).

These conversions are done implicitly when fitting a vector value into GPR(s), therefore no changes in the calling conventions .td.

But vectors don't get implicitly moved to GPRs unless you're compiling soft-float. Are you only testing softfloat, not hardfloat?

Cheers,

James

lib/Target/ARM/ARMFastISel.cpp
192	I see. Strange that this hasn't been required before now - is it that you haven't tested with FastISel before this patch? The * should go nearest the function name: const TargetLowering *getTargetLowering().

Hi James,

But vectors don't get implicitly moved to GPRs unless you're compiling soft-float. Are you only testing softfloat, not hardfloat?

All testing includes both soft and hard-float mode.

Thanks,
Christian

Hi James,

I updated the patch:

"*" nearest the function name in "lib/Target/ARM/ARMFastISel.cpp"
Use if/else in "lib/Target/ARM/ARMISelLowering.cpp"

Thanks,
Christian

Hi James,

The mentioned ABI related vector conversions are applicable in soft-mode only.
In hard-mode vectors are passed using the vector registers. In such a case these conversions are triggered by BITCAST operations only.

Thanks,
Christian

Hi all,

I updated the patch with two updated test files.

Thanks,
Christian

Hi Christian,

Sorry, must have missed your comment first time round.

In hard-mode vectors are passed using the vector registers. In such a case these conversions are triggered by BITCAST operations only.

Indeed - this is what I was getting at. So where are those BITCASTs inserted in your patch? They aren't, and I don't think your patch will work in hard float mode. So do you want to make a followon patch to work for hard-float mode, or do you want me to treat this patch as working for hard-float mode as well (as you mentioned in an earlier comment)?

Cheers,
James

Hi James,

Indeed - this is what I was getting at. So where are those BITCASTs inserted in your patch? They aren't, and I don't think your patch will work in hard float mode. So do you want to make a followon patch to work for hard-float mode, or do you want me to treat this patch as working for hard-float mode as well (as you mentioned in an earlier comment)?

The "bitconvert" patterns in this patch are valid for both soft and hard float modes.
However, in hard float mode LLVM generates fewer conversions between integer and vector types.

Thanks,
Christian

Hi Christian,

This is still not what I'm getting at. The A32 ABI says that vectors passed over function call boundaries in vector registers (which happens in hard float mode) must be passed in a form as if they were loaded by a VLDM. From the ABI:

"""A 128-bit containerized vector type is passed as if it were loaded from its
memory format into a 128-bit vector register (Qn) with a single VLDM of the
two component 64-bit vector registers (for example, VLDM r0,{d2,d3} would
load q1)"""

Your patch does not address this. Vectors will be passed with incompatible types across ABI boundaries.

James

Hi James,

You are right, vector passing in VLDM format is not addressed by this patch.

Thanks,
Christian

Hi Christian,

OK, we're finally on the same page :) So do you want to address this as part of this patch, or leave it for a followup?

Cheers,

James

Hi James,

I will do this in another patch.

Thanks,
Christian

Hi James,

I checked the following testcase:

llc -march armeb -mtriple=arm-eabi -mattr v7,neon -float-abi=hard

with:

define void @test( <4 x i32> %var, <4 x i32>* %store ) {
  store <4 x i32> %var, <4 x i32>* %store
  ret void
}

The generated code is as follows (both for le and be):

vst1.64 {d0, d1}, [r0:128]
bx lr

I believe that is ABI compliant. Please let me know if you think otherwise.

Thanks,
Christian

Hi Christian,

OK, the patch looks good as-is. It really needs a hard-float testcase - otherwise someone could change the behaviour that seems to accidentally work for BE and regress BE without us noticing.

I'll leave it up to you whether to change this patch or to do it in a followup, though.

Thanks!

James

This revision is now accepted and ready to land. May 12 2014, 2:28 AM

Hi James,

I added the hard-float testcase to the test file "test/CodeGen/ARM/big-endian-neon-bitconv.ll".
I committed this patch as rL208538.

Thanks,
Christian

Hi Christian,

Thanks, but I'm afraid that's not sufficient. That testcase has all arguments passed as pointers, which it then loads. It does not test vector argument passing. See for example test/CodeGen/ARM64/big-endian-callee.ll and test/CodeGen/ARM64/big-endian-caller.ll.

Please could you add an equivalent testcase for ARM.

Cheers,

James

Hi Christian,

I've taken a closer look, and it looks like this should actually Just Work (tm).

It turns out all vector types are explicitly bitcast to v2f64 in ARMCallingConv.td:

// Handle all vector types as either f64 or v2f64.
CCIfType<[v1i64, v2i32, v4i16, v8i8, v2f32], CCBitConvertToType<f64>>,
CCIfType<[v2i64, v4i32, v8i16, v16i8, v4f32], CCBitConvertToType<v2f64>>,

CCIfType<[v2f64], CCAssignToReg<[Q0, Q1, Q2, Q3]>>,

I'm not sure why it was decided to implement it that way (happened in r73095, or approximately the stone age, when Anton landed initial hard float support).

v2f64 is also coincidentally exactly the right type for proper parameter passing on Big Endian, and so yes, this should just work for hardfloat.

I'll re-review the rest of your patch now.

Cheers,

James
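
The type handling described in the two `CCIfType` rules quoted above can be sketched as a small lookup. This is only a toy model of those rules, not LLVM code: `conventionType` and `passedInQRegister` are hypothetical names, and value types are represented as plain strings for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <string>

// Toy model of the quoted rules: 64-bit vector types are bit-converted to
// f64 and 128-bit vector types to v2f64 before registers are assigned.
// Any other type is returned unchanged.
std::string conventionType(const std::string &vt) {
  static const std::array<const char *, 5> toF64 = {
      {"v1i64", "v2i32", "v4i16", "v8i8", "v2f32"}};
  static const std::array<const char *, 5> toV2F64 = {
      {"v2i64", "v4i32", "v8i16", "v16i8", "v4f32"}};
  auto contains = [&vt](const std::array<const char *, 5> &list) {
    return std::any_of(list.begin(), list.end(),
                       [&vt](const char *name) { return vt == name; });
  };
  if (contains(toF64))
    return "f64";
  if (contains(toV2F64))
    return "v2f64";
  return vt;
}

// In the quoted fragment only v2f64 is assigned to Q0-Q3, so a 128-bit
// vector reaches a Q register only after the conversion above.
bool passedInQRegister(const std::string &vt) {
  return conventionType(vt) == "v2f64";
}
```

For example, `conventionType("v4i32")` returns `"v2f64"`, so a `<4 x i32>` argument crosses the call boundary as a v2f64 value, and any lane reordering needed in big endian mode comes from that bit conversion rather than from the register assignment itself.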

Hi Christian,

Please don't be confused by this email - I "sent" it much earlier today, as justification for why I accepted your Phab review.

It turns out that my work inbox was full so the email silently didn't send until I'd cleared it out.

Sorry for any confusion,

James

Hi James,

The test file "test/CodeGen/ARM/big-endian-neon-bitconv.ll" in rL208538 includes three test functions with 128-bit vector arguments, tested in both hard and soft float modes.

Thanks,
Christian

Hi James,

Would you please confirm my comment above, just to be sure we are on the same page?
Nevertheless, you can find a bunch of additional test cases for soft and hard float in D3766.

Thanks,
Christian

Hi Christian,

The testcases in this patch were insufficient, but the ones you've subsequently added (callee and caller) are sufficient.

Cheers,

James

Revision Contents

Path                                           Size
lib/Target/ARM/ARMFastISel.cpp                 2 lines
lib/Target/ARM/ARMISelLowering.cpp             10 lines
lib/Target/ARM/ARMInstrNEON.td                 186 lines
test/CodeGen/ARM/big-endian-neon-bitconv.ll    355 lines
test/CodeGen/ARM/dagcombine-concatvector.ll    4 lines
test/CodeGen/ARM/vcombine.ll                   2 lines

Diff 9257

lib/Target/ARM/ARMFastISel.cpp

[... 183 lines hidden; context: private: ...]
     unsigned ARMMaterializeFP(const ConstantFP *CFP, MVT VT);
     unsigned ARMMaterializeInt(const Constant *C, MVT VT);
     unsigned ARMMaterializeGV(const GlobalValue *GV, MVT VT);
     unsigned ARMMoveToFPReg(MVT VT, unsigned SrcReg);
     unsigned ARMMoveToIntReg(MVT VT, unsigned SrcReg);
     unsigned ARMSelectCallOp(bool UseReg);
     unsigned ARMLowerPICELF(const GlobalValue *GV, unsigned Align, MVT VT);

+    const TargetLowering *getTargetLowering() { return TM.getTargetLowering(); }

     // Call handling routines.
   private:
     CCAssignFn *CCAssignFnForCall(CallingConv::ID CC,
                                   bool Return,
                                   bool isVarArg);
     bool ProcessCallArgs(SmallVectorImpl<Value*> &Args,
                          SmallVectorImpl<unsigned> &ArgRegs,
                          SmallVectorImpl<MVT> &ArgVTs,
[... 2,880 lines hidden ...]

lib/Target/ARM/ARMISelLowering.cpp

[... 3,818 lines hidden ...]
 unsigned ARMTargetLowering::getRegisterByName(const char* RegName) const {
   unsigned Reg = StringSwitch<unsigned>(RegName)
                        .Case("sp", ARM::SP)
                        .Default(0);
   if (Reg)
     return Reg;
   report_fatal_error("Invalid register name global variable");
 }

 /// ExpandBITCAST - If the target supports VFP, this function is called to
 /// expand a bit convert where either the source or destination type is i64 to
 /// use a VMOVDRR or VMOVRRD node. This should not be done when the non-i64
 /// operand type is illegal (e.g., v2f32 for a target that doesn't support
 /// vectors), since the legalizer won't know what to do with that.
 static SDValue ExpandBITCAST(SDNode *N, SelectionDAG &DAG) {
   const TargetLowering &TLI = DAG.getTargetLoweringInfo();
   SDLoc dl(N);
[... 13 lines hidden; context: if (SrcVT == MVT::i64 && TLI.isTypeLegal(DstVT)) { ...]
     SDValue Hi = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i32, Op,
                              DAG.getConstant(1, MVT::i32));
     return DAG.getNode(ISD::BITCAST, dl, DstVT,
                        DAG.getNode(ARMISD::VMOVDRR, dl, MVT::f64, Lo, Hi));
   }

   // Turn f64->i64 into VMOVRRD.
   if (DstVT == MVT::i64 && TLI.isTypeLegal(SrcVT)) {
-    SDValue Cvt = DAG.getNode(ARMISD::VMOVRRD, dl,
-                              DAG.getVTList(MVT::i32, MVT::i32), Op);
+    SDValue Cvt;
+    if (TLI.isBigEndian() && SrcVT.isVector())
+      Cvt = DAG.getNode(ARMISD::VMOVRRD, dl,
+                        DAG.getVTList(MVT::i32, MVT::i32),
+                        DAG.getNode(ARMISD::VREV64, dl, SrcVT, Op));
+    else
+      Cvt = DAG.getNode(ARMISD::VMOVRRD, dl,
+                        DAG.getVTList(MVT::i32, MVT::i32), Op);
     // Merge the pieces into a single i64 value.
     return DAG.getNode(ISD::BUILD_PAIR, dl, MVT::i64, Cvt, Cvt.getValue(1));
   }

   return SDValue();
 }

 /// getZeroVector - Returns a vector of specified type with all zero elements.
[... 6,764 lines hidden ...]

lib/Target/ARM/ARMInstrNEON.td

[... 2,360 lines hidden ...]

 // Use vld1/vst1 for Q and QQ. Also use them for unaligned v2f64
 // load / store if it's legal.
 def : Pat<(v2f64 (dword_alignedload addrmode6:$addr)),
           (VLD1q64 addrmode6:$addr)>;
 def : Pat<(dword_alignedstore (v2f64 QPR:$value), addrmode6:$addr),
           (VST1q64 addrmode6:$addr, QPR:$value)>;
 def : Pat<(v2f64 (word_alignedload addrmode6:$addr)),
-          (VLD1q32 addrmode6:$addr)>;
+          (VLD1q32 addrmode6:$addr)>, Requires<[IsLE]>;
 def : Pat<(word_alignedstore (v2f64 QPR:$value), addrmode6:$addr),
-          (VST1q32 addrmode6:$addr, QPR:$value)>;
+          (VST1q32 addrmode6:$addr, QPR:$value)>, Requires<[IsLE]>;
 def : Pat<(v2f64 (hword_alignedload addrmode6:$addr)),
           (VLD1q16 addrmode6:$addr)>, Requires<[IsLE]>;
 def : Pat<(hword_alignedstore (v2f64 QPR:$value), addrmode6:$addr),
           (VST1q16 addrmode6:$addr, QPR:$value)>, Requires<[IsLE]>;
 def : Pat<(v2f64 (byte_alignedload addrmode6:$addr)),
           (VLD1q8 addrmode6:$addr)>, Requires<[IsLE]>;
 def : Pat<(byte_alignedstore (v2f64 QPR:$value), addrmode6:$addr),
           (VST1q8 addrmode6:$addr, QPR:$value)>, Requires<[IsLE]>;
▲ Show 20 Lines • Show All 3,791 Lines • ▼ Show 20 Lines	def : Pat<(f32 (bitconvert GPR:$a)),
(EXTRACT_SUBREG (VMOVDRR GPR:$a, GPR:$a), ssub_0)>,		(EXTRACT_SUBREG (VMOVDRR GPR:$a, GPR:$a), ssub_0)>,
Requires<[HasNEON, DontUseVMOVSR]>;		Requires<[HasNEON, DontUseVMOVSR]>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Non-Instruction Patterns		// Non-Instruction Patterns
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

// bit_convert		// bit_convert
		let Predicates = [IsLE] in {
def : Pat<(v1i64 (bitconvert (v2i32 DPR:$src))), (v1i64 DPR:$src)>;		def : Pat<(v1i64 (bitconvert (v2i32 DPR:$src))), (v1i64 DPR:$src)>;
def : Pat<(v1i64 (bitconvert (v4i16 DPR:$src))), (v1i64 DPR:$src)>;		def : Pat<(v1i64 (bitconvert (v4i16 DPR:$src))), (v1i64 DPR:$src)>;
def : Pat<(v1i64 (bitconvert (v8i8 DPR:$src))), (v1i64 DPR:$src)>;		def : Pat<(v1i64 (bitconvert (v8i8 DPR:$src))), (v1i64 DPR:$src)>;
		}
def : Pat<(v1i64 (bitconvert (f64 DPR:$src))), (v1i64 DPR:$src)>;		def : Pat<(v1i64 (bitconvert (f64 DPR:$src))), (v1i64 DPR:$src)>;
		let Predicates = [IsLE] in {
def : Pat<(v1i64 (bitconvert (v2f32 DPR:$src))), (v1i64 DPR:$src)>;		def : Pat<(v1i64 (bitconvert (v2f32 DPR:$src))), (v1i64 DPR:$src)>;
def : Pat<(v2i32 (bitconvert (v1i64 DPR:$src))), (v2i32 DPR:$src)>;		def : Pat<(v2i32 (bitconvert (v1i64 DPR:$src))), (v2i32 DPR:$src)>;
def : Pat<(v2i32 (bitconvert (v4i16 DPR:$src))), (v2i32 DPR:$src)>;		def : Pat<(v2i32 (bitconvert (v4i16 DPR:$src))), (v2i32 DPR:$src)>;
def : Pat<(v2i32 (bitconvert (v8i8 DPR:$src))), (v2i32 DPR:$src)>;		def : Pat<(v2i32 (bitconvert (v8i8 DPR:$src))), (v2i32 DPR:$src)>;
def : Pat<(v2i32 (bitconvert (f64 DPR:$src))), (v2i32 DPR:$src)>;		def : Pat<(v2i32 (bitconvert (f64 DPR:$src))), (v2i32 DPR:$src)>;
		}
def : Pat<(v2i32 (bitconvert (v2f32 DPR:$src))), (v2i32 DPR:$src)>;		def : Pat<(v2i32 (bitconvert (v2f32 DPR:$src))), (v2i32 DPR:$src)>;
		let Predicates = [IsLE] in {
def : Pat<(v4i16 (bitconvert (v1i64 DPR:$src))), (v4i16 DPR:$src)>;		def : Pat<(v4i16 (bitconvert (v1i64 DPR:$src))), (v4i16 DPR:$src)>;
def : Pat<(v4i16 (bitconvert (v2i32 DPR:$src))), (v4i16 DPR:$src)>;		def : Pat<(v4i16 (bitconvert (v2i32 DPR:$src))), (v4i16 DPR:$src)>;
def : Pat<(v4i16 (bitconvert (v8i8 DPR:$src))), (v4i16 DPR:$src)>;		def : Pat<(v4i16 (bitconvert (v8i8 DPR:$src))), (v4i16 DPR:$src)>;
def : Pat<(v4i16 (bitconvert (f64 DPR:$src))), (v4i16 DPR:$src)>;		def : Pat<(v4i16 (bitconvert (f64 DPR:$src))), (v4i16 DPR:$src)>;
def : Pat<(v4i16 (bitconvert (v2f32 DPR:$src))), (v4i16 DPR:$src)>;		def : Pat<(v4i16 (bitconvert (v2f32 DPR:$src))), (v4i16 DPR:$src)>;
def : Pat<(v8i8 (bitconvert (v1i64 DPR:$src))), (v8i8 DPR:$src)>;		def : Pat<(v8i8 (bitconvert (v1i64 DPR:$src))), (v8i8 DPR:$src)>;
def : Pat<(v8i8 (bitconvert (v2i32 DPR:$src))), (v8i8 DPR:$src)>;		def : Pat<(v8i8 (bitconvert (v2i32 DPR:$src))), (v8i8 DPR:$src)>;
def : Pat<(v8i8 (bitconvert (v4i16 DPR:$src))), (v8i8 DPR:$src)>;		def : Pat<(v8i8 (bitconvert (v4i16 DPR:$src))), (v8i8 DPR:$src)>;
def : Pat<(v8i8 (bitconvert (f64 DPR:$src))), (v8i8 DPR:$src)>;		def : Pat<(v8i8 (bitconvert (f64 DPR:$src))), (v8i8 DPR:$src)>;
def : Pat<(v8i8 (bitconvert (v2f32 DPR:$src))), (v8i8 DPR:$src)>;		def : Pat<(v8i8 (bitconvert (v2f32 DPR:$src))), (v8i8 DPR:$src)>;
		}
def : Pat<(f64 (bitconvert (v1i64 DPR:$src))), (f64 DPR:$src)>;		def : Pat<(f64 (bitconvert (v1i64 DPR:$src))), (f64 DPR:$src)>;
		let Predicates = [IsLE] in {
def : Pat<(f64 (bitconvert (v2i32 DPR:$src))), (f64 DPR:$src)>;		def : Pat<(f64 (bitconvert (v2i32 DPR:$src))), (f64 DPR:$src)>;
def : Pat<(f64 (bitconvert (v4i16 DPR:$src))), (f64 DPR:$src)>;		def : Pat<(f64 (bitconvert (v4i16 DPR:$src))), (f64 DPR:$src)>;
def : Pat<(f64 (bitconvert (v8i8 DPR:$src))), (f64 DPR:$src)>;		def : Pat<(f64 (bitconvert (v8i8 DPR:$src))), (f64 DPR:$src)>;
def : Pat<(f64 (bitconvert (v2f32 DPR:$src))), (f64 DPR:$src)>;		def : Pat<(f64 (bitconvert (v2f32 DPR:$src))), (f64 DPR:$src)>;
def : Pat<(v2f32 (bitconvert (f64 DPR:$src))), (v2f32 DPR:$src)>;		def : Pat<(v2f32 (bitconvert (f64 DPR:$src))), (v2f32 DPR:$src)>;
def : Pat<(v2f32 (bitconvert (v1i64 DPR:$src))), (v2f32 DPR:$src)>;		def : Pat<(v2f32 (bitconvert (v1i64 DPR:$src))), (v2f32 DPR:$src)>;
		}
def : Pat<(v2f32 (bitconvert (v2i32 DPR:$src))), (v2f32 DPR:$src)>;		def : Pat<(v2f32 (bitconvert (v2i32 DPR:$src))), (v2f32 DPR:$src)>;
		let Predicates = [IsLE] in {
def : Pat<(v2f32 (bitconvert (v4i16 DPR:$src))), (v2f32 DPR:$src)>;		def : Pat<(v2f32 (bitconvert (v4i16 DPR:$src))), (v2f32 DPR:$src)>;
def : Pat<(v2f32 (bitconvert (v8i8 DPR:$src))), (v2f32 DPR:$src)>;		def : Pat<(v2f32 (bitconvert (v8i8 DPR:$src))), (v2f32 DPR:$src)>;
		}

		let Predicates = [IsLE] in {
def : Pat<(v2i64 (bitconvert (v4i32 QPR:$src))), (v2i64 QPR:$src)>;		def : Pat<(v2i64 (bitconvert (v4i32 QPR:$src))), (v2i64 QPR:$src)>;
def : Pat<(v2i64 (bitconvert (v8i16 QPR:$src))), (v2i64 QPR:$src)>;		def : Pat<(v2i64 (bitconvert (v8i16 QPR:$src))), (v2i64 QPR:$src)>;
def : Pat<(v2i64 (bitconvert (v16i8 QPR:$src))), (v2i64 QPR:$src)>;		def : Pat<(v2i64 (bitconvert (v16i8 QPR:$src))), (v2i64 QPR:$src)>;
		}
def : Pat<(v2i64 (bitconvert (v2f64 QPR:$src))), (v2i64 QPR:$src)>;		def : Pat<(v2i64 (bitconvert (v2f64 QPR:$src))), (v2i64 QPR:$src)>;
		let Predicates = [IsLE] in {
def : Pat<(v2i64 (bitconvert (v4f32 QPR:$src))), (v2i64 QPR:$src)>;		def : Pat<(v2i64 (bitconvert (v4f32 QPR:$src))), (v2i64 QPR:$src)>;
def : Pat<(v4i32 (bitconvert (v2i64 QPR:$src))), (v4i32 QPR:$src)>;		def : Pat<(v4i32 (bitconvert (v2i64 QPR:$src))), (v4i32 QPR:$src)>;
def : Pat<(v4i32 (bitconvert (v8i16 QPR:$src))), (v4i32 QPR:$src)>;		def : Pat<(v4i32 (bitconvert (v8i16 QPR:$src))), (v4i32 QPR:$src)>;
def : Pat<(v4i32 (bitconvert (v16i8 QPR:$src))), (v4i32 QPR:$src)>;		def : Pat<(v4i32 (bitconvert (v16i8 QPR:$src))), (v4i32 QPR:$src)>;
def : Pat<(v4i32 (bitconvert (v2f64 QPR:$src))), (v4i32 QPR:$src)>;		def : Pat<(v4i32 (bitconvert (v2f64 QPR:$src))), (v4i32 QPR:$src)>;
		}
def : Pat<(v4i32 (bitconvert (v4f32 QPR:$src))), (v4i32 QPR:$src)>;		def : Pat<(v4i32 (bitconvert (v4f32 QPR:$src))), (v4i32 QPR:$src)>;
		let Predicates = [IsLE] in {
def : Pat<(v8i16 (bitconvert (v2i64 QPR:$src))), (v8i16 QPR:$src)>;		def : Pat<(v8i16 (bitconvert (v2i64 QPR:$src))), (v8i16 QPR:$src)>;
def : Pat<(v8i16 (bitconvert (v4i32 QPR:$src))), (v8i16 QPR:$src)>;		def : Pat<(v8i16 (bitconvert (v4i32 QPR:$src))), (v8i16 QPR:$src)>;
def : Pat<(v8i16 (bitconvert (v16i8 QPR:$src))), (v8i16 QPR:$src)>;		def : Pat<(v8i16 (bitconvert (v16i8 QPR:$src))), (v8i16 QPR:$src)>;
def : Pat<(v8i16 (bitconvert (v2f64 QPR:$src))), (v8i16 QPR:$src)>;		def : Pat<(v8i16 (bitconvert (v2f64 QPR:$src))), (v8i16 QPR:$src)>;
def : Pat<(v8i16 (bitconvert (v4f32 QPR:$src))), (v8i16 QPR:$src)>;		def : Pat<(v8i16 (bitconvert (v4f32 QPR:$src))), (v8i16 QPR:$src)>;
def : Pat<(v16i8 (bitconvert (v2i64 QPR:$src))), (v16i8 QPR:$src)>;		def : Pat<(v16i8 (bitconvert (v2i64 QPR:$src))), (v16i8 QPR:$src)>;
def : Pat<(v16i8 (bitconvert (v4i32 QPR:$src))), (v16i8 QPR:$src)>;		def : Pat<(v16i8 (bitconvert (v4i32 QPR:$src))), (v16i8 QPR:$src)>;
def : Pat<(v16i8 (bitconvert (v8i16 QPR:$src))), (v16i8 QPR:$src)>;		def : Pat<(v16i8 (bitconvert (v8i16 QPR:$src))), (v16i8 QPR:$src)>;
def : Pat<(v16i8 (bitconvert (v2f64 QPR:$src))), (v16i8 QPR:$src)>;		def : Pat<(v16i8 (bitconvert (v2f64 QPR:$src))), (v16i8 QPR:$src)>;
def : Pat<(v16i8 (bitconvert (v4f32 QPR:$src))), (v16i8 QPR:$src)>;		def : Pat<(v16i8 (bitconvert (v4f32 QPR:$src))), (v16i8 QPR:$src)>;
def : Pat<(v4f32 (bitconvert (v2i64 QPR:$src))), (v4f32 QPR:$src)>;		def : Pat<(v4f32 (bitconvert (v2i64 QPR:$src))), (v4f32 QPR:$src)>;
		}
def : Pat<(v4f32 (bitconvert (v4i32 QPR:$src))), (v4f32 QPR:$src)>;		def : Pat<(v4f32 (bitconvert (v4i32 QPR:$src))), (v4f32 QPR:$src)>;
		let Predicates = [IsLE] in {
def : Pat<(v4f32 (bitconvert (v8i16 QPR:$src))), (v4f32 QPR:$src)>;		def : Pat<(v4f32 (bitconvert (v8i16 QPR:$src))), (v4f32 QPR:$src)>;
def : Pat<(v4f32 (bitconvert (v16i8 QPR:$src))), (v4f32 QPR:$src)>;		def : Pat<(v4f32 (bitconvert (v16i8 QPR:$src))), (v4f32 QPR:$src)>;
def : Pat<(v4f32 (bitconvert (v2f64 QPR:$src))), (v4f32 QPR:$src)>;		def : Pat<(v4f32 (bitconvert (v2f64 QPR:$src))), (v4f32 QPR:$src)>;
		}
def : Pat<(v2f64 (bitconvert (v2i64 QPR:$src))), (v2f64 QPR:$src)>;		def : Pat<(v2f64 (bitconvert (v2i64 QPR:$src))), (v2f64 QPR:$src)>;
		let Predicates = [IsLE] in {
def : Pat<(v2f64 (bitconvert (v4i32 QPR:$src))), (v2f64 QPR:$src)>;		def : Pat<(v2f64 (bitconvert (v4i32 QPR:$src))), (v2f64 QPR:$src)>;
def : Pat<(v2f64 (bitconvert (v8i16 QPR:$src))), (v2f64 QPR:$src)>;		def : Pat<(v2f64 (bitconvert (v8i16 QPR:$src))), (v2f64 QPR:$src)>;
def : Pat<(v2f64 (bitconvert (v16i8 QPR:$src))), (v2f64 QPR:$src)>;		def : Pat<(v2f64 (bitconvert (v16i8 QPR:$src))), (v2f64 QPR:$src)>;
def : Pat<(v2f64 (bitconvert (v4f32 QPR:$src))), (v2f64 QPR:$src)>;		def : Pat<(v2f64 (bitconvert (v4f32 QPR:$src))), (v2f64 QPR:$src)>;
		}

		let Predicates = [IsBE] in {
		jmolloy: These look fine, but what was your methodology for generating them? Did you do them all by hand, or use some script?
		cpirker: I did it by hand, based on similar patch for AArch64 (D3424).
		// 64 bit conversions
		def : Pat<(v1i64 (bitconvert (v2i32 DPR:$src))), (VREV64d32 DPR:$src)>;
		def : Pat<(v1i64 (bitconvert (v4i16 DPR:$src))), (VREV64d16 DPR:$src)>;
		def : Pat<(v1i64 (bitconvert (v8i8 DPR:$src))), (VREV64d8 DPR:$src)>;
		def : Pat<(v1i64 (bitconvert (v2f32 DPR:$src))), (VREV64d32 DPR:$src)>;
		def : Pat<(v2i32 (bitconvert (v1i64 DPR:$src))), (VREV64d32 DPR:$src)>;
		def : Pat<(v2i32 (bitconvert (v4i16 DPR:$src))), (VREV32d16 DPR:$src)>;
		def : Pat<(v2i32 (bitconvert (v8i8 DPR:$src))), (VREV32d8 DPR:$src)>;
		def : Pat<(v2i32 (bitconvert (f64 DPR:$src))), (VREV64d32 DPR:$src)>;
		def : Pat<(v4i16 (bitconvert (v1i64 DPR:$src))), (VREV64d16 DPR:$src)>;
		def : Pat<(v4i16 (bitconvert (v2i32 DPR:$src))), (VREV32d16 DPR:$src)>;
		def : Pat<(v4i16 (bitconvert (v8i8 DPR:$src))), (VREV16d8 DPR:$src)>;
		def : Pat<(v4i16 (bitconvert (f64 DPR:$src))), (VREV64d16 DPR:$src)>;
		def : Pat<(v4i16 (bitconvert (v2f32 DPR:$src))), (VREV32d16 DPR:$src)>;
		def : Pat<(v8i8 (bitconvert (v1i64 DPR:$src))), (VREV64d8 DPR:$src)>;
		def : Pat<(v8i8 (bitconvert (v2i32 DPR:$src))), (VREV32d8 DPR:$src)>;
		def : Pat<(v8i8 (bitconvert (v4i16 DPR:$src))), (VREV16d8 DPR:$src)>;
		def : Pat<(v8i8 (bitconvert (f64 DPR:$src))), (VREV64d8 DPR:$src)>;
		def : Pat<(v8i8 (bitconvert (v2f32 DPR:$src))), (VREV32d8 DPR:$src)>;
		def : Pat<(f64 (bitconvert (v2i32 DPR:$src))), (VREV64d32 DPR:$src)>;
		def : Pat<(f64 (bitconvert (v4i16 DPR:$src))), (VREV64d16 DPR:$src)>;
		def : Pat<(f64 (bitconvert (v8i8 DPR:$src))), (VREV64d8 DPR:$src)>;
		def : Pat<(f64 (bitconvert (v2f32 DPR:$src))), (VREV64d32 DPR:$src)>;
		def : Pat<(v2f32 (bitconvert (f64 DPR:$src))), (VREV64d32 DPR:$src)>;
		def : Pat<(v2f32 (bitconvert (v1i64 DPR:$src))), (VREV64d32 DPR:$src)>;
		def : Pat<(v2f32 (bitconvert (v4i16 DPR:$src))), (VREV32d16 DPR:$src)>;
		def : Pat<(v2f32 (bitconvert (v8i8 DPR:$src))), (VREV32d8 DPR:$src)>;

		// 128 bit conversions
		def : Pat<(v2i64 (bitconvert (v4i32 QPR:$src))), (VREV64q32 QPR:$src)>;
		def : Pat<(v2i64 (bitconvert (v8i16 QPR:$src))), (VREV64q16 QPR:$src)>;
		def : Pat<(v2i64 (bitconvert (v16i8 QPR:$src))), (VREV64q8 QPR:$src)>;
		def : Pat<(v2i64 (bitconvert (v4f32 QPR:$src))), (VREV64q32 QPR:$src)>;
		def : Pat<(v4i32 (bitconvert (v2i64 QPR:$src))), (VREV64q32 QPR:$src)>;
		def : Pat<(v4i32 (bitconvert (v8i16 QPR:$src))), (VREV32q16 QPR:$src)>;
		def : Pat<(v4i32 (bitconvert (v16i8 QPR:$src))), (VREV32q8 QPR:$src)>;
		def : Pat<(v4i32 (bitconvert (v2f64 QPR:$src))), (VREV64q32 QPR:$src)>;
		def : Pat<(v8i16 (bitconvert (v2i64 QPR:$src))), (VREV64q16 QPR:$src)>;
		def : Pat<(v8i16 (bitconvert (v4i32 QPR:$src))), (VREV32q16 QPR:$src)>;
		def : Pat<(v8i16 (bitconvert (v16i8 QPR:$src))), (VREV16q8 QPR:$src)>;
		def : Pat<(v8i16 (bitconvert (v2f64 QPR:$src))), (VREV64q16 QPR:$src)>;
		def : Pat<(v8i16 (bitconvert (v4f32 QPR:$src))), (VREV32q16 QPR:$src)>;
		def : Pat<(v16i8 (bitconvert (v2i64 QPR:$src))), (VREV64q8 QPR:$src)>;
		def : Pat<(v16i8 (bitconvert (v4i32 QPR:$src))), (VREV32q8 QPR:$src)>;
		def : Pat<(v16i8 (bitconvert (v8i16 QPR:$src))), (VREV16q8 QPR:$src)>;
		def : Pat<(v16i8 (bitconvert (v2f64 QPR:$src))), (VREV64q8 QPR:$src)>;
		def : Pat<(v16i8 (bitconvert (v4f32 QPR:$src))), (VREV32q8 QPR:$src)>;
		def : Pat<(v4f32 (bitconvert (v2i64 QPR:$src))), (VREV64q32 QPR:$src)>;
		def : Pat<(v4f32 (bitconvert (v8i16 QPR:$src))), (VREV32q16 QPR:$src)>;
		def : Pat<(v4f32 (bitconvert (v16i8 QPR:$src))), (VREV32q8 QPR:$src)>;
		def : Pat<(v4f32 (bitconvert (v2f64 QPR:$src))), (VREV64q32 QPR:$src)>;
		def : Pat<(v2f64 (bitconvert (v4i32 QPR:$src))), (VREV64q32 QPR:$src)>;
		def : Pat<(v2f64 (bitconvert (v8i16 QPR:$src))), (VREV64q16 QPR:$src)>;
		def : Pat<(v2f64 (bitconvert (v16i8 QPR:$src))), (VREV64q8 QPR:$src)>;
		def : Pat<(v2f64 (bitconvert (v4f32 QPR:$src))), (VREV64q32 QPR:$src)>;
		}
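
The big-endian patterns above appear to follow one regular rule: the VREV region width is the larger of the two element widths, the lane width is the smaller, and conversions between equal element widths stay as no-ops. The Python sketch below reproduces the opcode names from that rule. It is only an illustration derived from reading the table (the author states the patterns were written by hand), and the helper name `vrev_for_bitconvert` is hypothetical.

```python
from typing import Optional


def vrev_for_bitconvert(dst_elt_bits: int, src_elt_bits: int, reg: str) -> Optional[str]:
    """Return the VREV opcode name for a big-endian bitconvert, or None.

    reg is "d" for 64-bit DPR types and "q" for 128-bit QPR types.
    Hypothetical helper: it encodes the regularity visible in the
    patterns, it is not code taken from the patch.
    """
    if reg not in ("d", "q"):
        raise ValueError("reg must be 'd' or 'q'")
    if dst_elt_bits == src_elt_bits:
        # Same element width (e.g. v2i32 <-> v2f32): no instruction needed.
        return None
    region = max(dst_elt_bits, src_elt_bits)  # width of each reversed region
    lane = min(dst_elt_bits, src_elt_bits)    # width of the lanes being reordered
    return f"VREV{region}{reg}{lane}"


print(vrev_for_bitconvert(64, 32, "d"))  # v1i64 <- v2i32: VREV64d32
print(vrev_for_bitconvert(16, 8, "q"))   # v8i16 <- v16i8: VREV16q8
print(vrev_for_bitconvert(32, 32, "q"))  # v4i32 <- v4f32: None
```

A generator like this would not replace review of the table, but it gives a quick way to cross-check each hand-written pattern against the rule.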

// Fold extracting an element out of a v2i32 into a vfp register.		// Fold extracting an element out of a v2i32 into a vfp register.
def : Pat<(f32 (bitconvert (i32 (extractelt (v2i32 DPR:$src), imm:$lane)))),		def : Pat<(f32 (bitconvert (i32 (extractelt (v2i32 DPR:$src), imm:$lane)))),
(f32 (EXTRACT_SUBREG DPR:$src, (SSubReg_f32_reg imm:$lane)))>;		(f32 (EXTRACT_SUBREG DPR:$src, (SSubReg_f32_reg imm:$lane)))>;

// Vector lengthening move with load, matching extending loads.		// Vector lengthening move with load, matching extending loads.

// extload, zextload and sextload for a standard lengthening load. Example:		// extload, zextload and sextload for a standard lengthening load. Example:

test/CodeGen/ARM/big-endian-neon-bitconv.ll

				; RUN: llc < %s -march armeb -mattr v7,neon -o - | FileCheck %s

				@v2i64 = global <2 x i64> zeroinitializer
				@v2i32 = global <2 x i32> zeroinitializer
				@v4i32 = global <4 x i32> zeroinitializer
				@v4i16 = global <4 x i16> zeroinitializer
				@v8i16 = global <8 x i16> zeroinitializer
				@v8i8 = global <8 x i8> zeroinitializer
				@v16i8 = global <16 x i8> zeroinitializer

				@v2f32 = global <2 x float> zeroinitializer
				@v2f64 = global <2 x double> zeroinitializer
				@v4f32 = global <4 x float> zeroinitializer


				; 64 bit conversions
				define void @conv_i64_to_v8i8( i64 %val, <8 x i8>* %store ) {
				; CHECK-LABEL: conv_i64_to_v8i8:
				; CHECK: vrev64.8
				%v = bitcast i64 %val to <8 x i8>
				%w = load <8 x i8>* @v8i8
				%a = add <8 x i8> %v, %w
				store <8 x i8> %a, <8 x i8>* %store
				ret void
				}

				define void @conv_v8i8_to_i64( <8 x i8>* %load, <8 x i8>* %store ) {
				; CHECK-LABEL: conv_v8i8_to_i64:
				; CHECK: vrev64.8
				%v = load <8 x i8>* %load
				%w = load <8 x i8>* @v8i8
				%a = add <8 x i8> %v, %w
				%f = bitcast <8 x i8> %a to i64
				call void @conv_i64_to_v8i8( i64 %f, <8 x i8>* %store )
				ret void
				}

				define void @conv_i64_to_v4i16( i64 %val, <4 x i16>* %store ) {
				; CHECK-LABEL: conv_i64_to_v4i16:
				; CHECK: vrev64.16
				%v = bitcast i64 %val to <4 x i16>
				%w = load <4 x i16>* @v4i16
				%a = add <4 x i16> %v, %w
				store <4 x i16> %a, <4 x i16>* %store
				ret void
				}

				define void @conv_v4i16_to_i64( <4 x i16>* %load, <4 x i16>* %store ) {
				; CHECK-LABEL: conv_v4i16_to_i64:
				; CHECK: vrev64.16
				%v = load <4 x i16>* %load
				%w = load <4 x i16>* @v4i16
				%a = add <4 x i16> %v, %w
				%f = bitcast <4 x i16> %a to i64
				call void @conv_i64_to_v4i16( i64 %f, <4 x i16>* %store )
				ret void
				}

				define void @conv_i64_to_v2i32( i64 %val, <2 x i32>* %store ) {
				; CHECK-LABEL: conv_i64_to_v2i32:
				; CHECK: vrev64.32
				%v = bitcast i64 %val to <2 x i32>
				%w = load <2 x i32>* @v2i32
				%a = add <2 x i32> %v, %w
				store <2 x i32> %a, <2 x i32>* %store
				ret void
				}

				define void @conv_v2i32_to_i64( <2 x i32>* %load, <2 x i32>* %store ) {
				; CHECK-LABEL: conv_v2i32_to_i64:
				; CHECK: vrev64.32
				%v = load <2 x i32>* %load
				%w = load <2 x i32>* @v2i32
				%a = add <2 x i32> %v, %w
				%f = bitcast <2 x i32> %a to i64
				call void @conv_i64_to_v2i32( i64 %f, <2 x i32>* %store )
				ret void
				}

				define void @conv_i64_to_v2f32( i64 %val, <2 x float>* %store ) {
				; CHECK-LABEL: conv_i64_to_v2f32:
				; CHECK: vrev64.32
				%v = bitcast i64 %val to <2 x float>
				%w = load <2 x float>* @v2f32
				%a = fadd <2 x float> %v, %w
				store <2 x float> %a, <2 x float>* %store
				ret void
				}

				define void @conv_v2f32_to_i64( <2 x float>* %load, <2 x float>* %store ) {
				; CHECK-LABEL: conv_v2f32_to_i64:
				; CHECK: vrev64.32
				%v = load <2 x float>* %load
				%w = load <2 x float>* @v2f32
				%a = fadd <2 x float> %v, %w
				%f = bitcast <2 x float> %a to i64
				call void @conv_i64_to_v2f32( i64 %f, <2 x float>* %store )
				ret void
				}

				define void @conv_f64_to_v8i8( double %val, <8 x i8>* %store ) {
				; CHECK-LABEL: conv_f64_to_v8i8:
				; CHECK: vrev64.8
				%v = bitcast double %val to <8 x i8>
				%w = load <8 x i8>* @v8i8
				%a = add <8 x i8> %v, %w
				store <8 x i8> %a, <8 x i8>* %store
				ret void
				}

				define void @conv_v8i8_to_f64( <8 x i8>* %load, <8 x i8>* %store ) {
				; CHECK-LABEL: conv_v8i8_to_f64:
				; CHECK: vrev64.8
				%v = load <8 x i8>* %load
				%w = load <8 x i8>* @v8i8
				%a = add <8 x i8> %v, %w
				%f = bitcast <8 x i8> %a to double
				call void @conv_f64_to_v8i8( double %f, <8 x i8>* %store )
				ret void
				}

				define void @conv_f64_to_v4i16( double %val, <4 x i16>* %store ) {
				; CHECK-LABEL: conv_f64_to_v4i16:
				; CHECK: vrev64.16
				%v = bitcast double %val to <4 x i16>
				%w = load <4 x i16>* @v4i16
				%a = add <4 x i16> %v, %w
				store <4 x i16> %a, <4 x i16>* %store
				ret void
				}

				define void @conv_v4i16_to_f64( <4 x i16>* %load, <4 x i16>* %store ) {
				; CHECK-LABEL: conv_v4i16_to_f64:
				; CHECK: vrev64.16
				%v = load <4 x i16>* %load
				%w = load <4 x i16>* @v4i16
				%a = add <4 x i16> %v, %w
				%f = bitcast <4 x i16> %a to double
				call void @conv_f64_to_v4i16( double %f, <4 x i16>* %store )
				ret void
				}

				define void @conv_f64_to_v2i32( double %val, <2 x i32>* %store ) {
				; CHECK-LABEL: conv_f64_to_v2i32:
				; CHECK: vrev64.32
				%v = bitcast double %val to <2 x i32>
				%w = load <2 x i32>* @v2i32
				%a = add <2 x i32> %v, %w
				store <2 x i32> %a, <2 x i32>* %store
				ret void
				}

				define void @conv_v2i32_to_f64( <2 x i32>* %load, <2 x i32>* %store ) {
				; CHECK-LABEL: conv_v2i32_to_f64:
				; CHECK: vrev64.32
				%v = load <2 x i32>* %load
				%w = load <2 x i32>* @v2i32
				%a = add <2 x i32> %v, %w
				%f = bitcast <2 x i32> %a to double
				call void @conv_f64_to_v2i32( double %f, <2 x i32>* %store )
				ret void
				}

				define void @conv_f64_to_v2f32( double %val, <2 x float>* %store ) {
				; CHECK-LABEL: conv_f64_to_v2f32:
				; CHECK: vrev64.32
				%v = bitcast double %val to <2 x float>
				%w = load <2 x float>* @v2f32
				%a = fadd <2 x float> %v, %w
				store <2 x float> %a, <2 x float>* %store
				ret void
				}

				define void @conv_v2f32_to_f64( <2 x float>* %load, <2 x float>* %store ) {
				; CHECK-LABEL: conv_v2f32_to_f64:
				; CHECK: vrev64.32
				%v = load <2 x float>* %load
				%w = load <2 x float>* @v2f32
				%a = fadd <2 x float> %v, %w
				%f = bitcast <2 x float> %a to double
				call void @conv_f64_to_v2f32( double %f, <2 x float>* %store )
				ret void
				}

				; 128 bit conversions


				define void @conv_i128_to_v16i8( i128 %val, <16 x i8>* %store ) {
				; CHECK-LABEL: conv_i128_to_v16i8:
				; CHECK: vrev32.8
				%v = bitcast i128 %val to <16 x i8>
				%w = load <16 x i8>* @v16i8
				%a = add <16 x i8> %v, %w
				store <16 x i8> %a, <16 x i8>* %store
				ret void
				}

				define void @conv_v16i8_to_i128( <16 x i8>* %load, <16 x i8>* %store ) {
				; CHECK-LABEL: conv_v16i8_to_i128:
				; CHECK: vrev32.8
				%v = load <16 x i8>* %load
				%w = load <16 x i8>* @v16i8
				%a = add <16 x i8> %v, %w
				%f = bitcast <16 x i8> %a to i128
				call void @conv_i128_to_v16i8( i128 %f, <16 x i8>* %store )
				ret void
				}

				define void @conv_i128_to_v8i16( i128 %val, <8 x i16>* %store ) {
				; CHECK-LABEL: conv_i128_to_v8i16:
				; CHECK: vrev32.16
				%v = bitcast i128 %val to <8 x i16>
				%w = load <8 x i16>* @v8i16
				%a = add <8 x i16> %v, %w
				store <8 x i16> %a, <8 x i16>* %store
				ret void
				}

				define void @conv_v8i16_to_i128( <8 x i16>* %load, <8 x i16>* %store ) {
				; CHECK-LABEL: conv_v8i16_to_i128:
				; CHECK: vrev32.16
				%v = load <8 x i16>* %load
				%w = load <8 x i16>* @v8i16
				%a = add <8 x i16> %v, %w
				%f = bitcast <8 x i16> %a to i128
				call void @conv_i128_to_v8i16( i128 %f, <8 x i16>* %store )
				ret void
				}

				define void @conv_i128_to_v4i32( i128 %val, <4 x i32>* %store ) {
				; CHECK-LABEL: conv_i128_to_v4i32:
				; CHECK: vrev64.32
				%v = bitcast i128 %val to <4 x i32>
				%w = load <4 x i32>* @v4i32
				%a = add <4 x i32> %v, %w
				store <4 x i32> %a, <4 x i32>* %store
				ret void
				}

				define void @conv_v4i32_to_i128( <4 x i32>* %load, <4 x i32>* %store ) {
				; CHECK-LABEL: conv_v4i32_to_i128:
				; CHECK: vrev64.32
				%v = load <4 x i32>* %load
				%w = load <4 x i32>* @v4i32
				%a = add <4 x i32> %v, %w
				%f = bitcast <4 x i32> %a to i128
				call void @conv_i128_to_v4i32( i128 %f, <4 x i32>* %store )
				ret void
				}

				define void @conv_i128_to_v4f32( i128 %val, <4 x float>* %store ) {
				; CHECK-LABEL: conv_i128_to_v4f32:
				; CHECK: vrev64.32
				%v = bitcast i128 %val to <4 x float>
				%w = load <4 x float>* @v4f32
				%a = fadd <4 x float> %v, %w
				store <4 x float> %a, <4 x float>* %store
				ret void
				}

				define void @conv_v4f32_to_i128( <4 x float>* %load, <4 x float>* %store ) {
				; CHECK-LABEL: conv_v4f32_to_i128:
				; CHECK: vrev64.32
				%v = load <4 x float>* %load
				%w = load <4 x float>* @v4f32
				%a = fadd <4 x float> %v, %w
				%f = bitcast <4 x float> %a to i128
				call void @conv_i128_to_v4f32( i128 %f, <4 x float>* %store )
				ret void
				}

				define void @conv_f128_to_v2f64( fp128 %val, <2 x double>* %store ) {
				; CHECK-LABEL: conv_f128_to_v2f64:
				; CHECK: vrev64.32
				%v = bitcast fp128 %val to <2 x double>
				%w = load <2 x double>* @v2f64
				%a = fadd <2 x double> %v, %w
				store <2 x double> %a, <2 x double>* %store
				ret void
				}

				define void @conv_v2f64_to_f128( <2 x double>* %load, <2 x double>* %store ) {
				; CHECK-LABEL: conv_v2f64_to_f128:
				; CHECK: vrev64.32
				%v = load <2 x double>* %load
				%w = load <2 x double>* @v2f64
				%a = fadd <2 x double> %v, %w
				%f = bitcast <2 x double> %a to fp128
				call void @conv_f128_to_v2f64( fp128 %f, <2 x double>* %store )
				ret void
				}

				define void @conv_f128_to_v16i8( fp128 %val, <16 x i8>* %store ) {
				; CHECK-LABEL: conv_f128_to_v16i8:
				; CHECK: vrev32.8
				%v = bitcast fp128 %val to <16 x i8>
				%w = load <16 x i8>* @v16i8
				%a = add <16 x i8> %v, %w
				store <16 x i8> %a, <16 x i8>* %store
				ret void
				}

				define void @conv_v16i8_to_f128( <16 x i8>* %load, <16 x i8>* %store ) {
				; CHECK-LABEL: conv_v16i8_to_f128:
				; CHECK: vrev32.8
				%v = load <16 x i8>* %load
				%w = load <16 x i8>* @v16i8
				%a = add <16 x i8> %v, %w
				%f = bitcast <16 x i8> %a to fp128
				call void @conv_f128_to_v16i8( fp128 %f, <16 x i8>* %store )
				ret void
				}

				define void @conv_f128_to_v8i16( fp128 %val, <8 x i16>* %store ) {
				; CHECK-LABEL: conv_f128_to_v8i16:
				; CHECK: vrev32.16
				%v = bitcast fp128 %val to <8 x i16>
				%w = load <8 x i16>* @v8i16
				%a = add <8 x i16> %v, %w
				store <8 x i16> %a, <8 x i16>* %store
				ret void
				}

				define void @conv_v8i16_to_f128( <8 x i16>* %load, <8 x i16>* %store ) {
				; CHECK-LABEL: conv_v8i16_to_f128:
				; CHECK: vrev32.16
				%v = load <8 x i16>* %load
				%w = load <8 x i16>* @v8i16
				%a = add <8 x i16> %v, %w
				%f = bitcast <8 x i16> %a to fp128
				call void @conv_f128_to_v8i16( fp128 %f, <8 x i16>* %store )
				ret void
				}

				define void @conv_f128_to_v4f32( fp128 %val, <4 x float>* %store ) {
				; CHECK-LABEL: conv_f128_to_v4f32:
				; CHECK: vrev64.32
				%v = bitcast fp128 %val to <4 x float>
				%w = load <4 x float>* @v4f32
				%a = fadd <4 x float> %v, %w
				store <4 x float> %a, <4 x float>* %store
				ret void
				}

				define void @conv_v4f32_to_f128( <4 x float>* %load, <4 x float>* %store ) {
				; CHECK-LABEL: conv_v4f32_to_f128:
				; CHECK: vrev64.32
				%v = load <4 x float>* %load
				%w = load <4 x float>* @v4f32
				%a = fadd <4 x float> %v, %w
				%f = bitcast <4 x float> %a to fp128
				call void @conv_f128_to_v4f32( fp128 %f, <4 x float>* %store )
				ret void
				}
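
Each pair of tests above converts in both directions, and both directions expect the same `vrev` instruction. A small Python model of the architectural behaviour of VREV (reverse the order of the lanes inside every region) suggests why that is consistent: the operation is its own inverse. The function `vrev` and its byte-list view of a register are assumptions made for illustration and are not part of the patch.

```python
from typing import List


def vrev(region_bits: int, lane_bits: int, data: List[int]) -> List[int]:
    """Model VREV<region>.<lane> on a register given as a list of bytes.

    Within every region of region_bits, the order of the lane_bits-wide
    lanes is reversed; bytes inside a lane keep their order.
    """
    if lane_bits % 8 or region_bits % lane_bits:
        raise ValueError("lane width must be a byte multiple dividing the region")
    region, lane = region_bits // 8, lane_bits // 8
    if len(data) % region:
        raise ValueError("register size must be a multiple of the region size")
    out: List[int] = []
    for start in range(0, len(data), region):
        chunk = data[start:start + region]
        # Split the region into lanes, then emit the lanes in reverse order.
        lanes = [chunk[i:i + lane] for i in range(0, region, lane)]
        for piece in reversed(lanes):
            out.extend(piece)
    return out


d_reg = list(range(8))                 # one 64-bit D register, bytes 0..7
swapped = vrev(64, 32, d_reg)          # swap the two 32-bit halves
print(swapped)                         # [4, 5, 6, 7, 0, 1, 2, 3]
print(vrev(64, 32, swapped) == d_reg)  # True: applying it twice is the identity
```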

test/CodeGen/ARM/dagcombine-concatvector.ll

	; RUN: llc < %s -mtriple=thumbv7s-apple-ios3.0.0 -mcpu=generic | FileCheck %s -check-prefix=CHECK -check-prefix=CHECK-LE			; RUN: llc < %s -mtriple=thumbv7s-apple-ios3.0.0 -mcpu=generic | FileCheck %s -check-prefix=CHECK -check-prefix=CHECK-LE
	; RUN: llc < %s -mtriple=thumbeb -mattr=v7,neon | FileCheck %s -check-prefix=CHECK -check-prefix=CHECK-BE			; RUN: llc < %s -mtriple=thumbeb -mattr=v7,neon | FileCheck %s -check-prefix=CHECK -check-prefix=CHECK-BE

	; PR15525			; PR15525
	; CHECK-LABEL: test1:			; CHECK-LABEL: test1:
	; CHECK: ldr.w [[REG:r[0-9]+]], [sp]			; CHECK: ldr.w [[REG:r[0-9]+]], [sp]
	; CHECK-LE-NEXT: vmov {{d[0-9]+}}, r1, r2			; CHECK-LE-NEXT: vmov {{d[0-9]+}}, r1, r2
	; CHECK-LE-NEXT: vmov {{d[0-9]+}}, r3, [[REG]]			; CHECK-LE-NEXT: vmov {{d[0-9]+}}, r3, [[REG]]
	; CHECK-BE-NEXT: vmov {{d[0-9]+}}, r2, r1			; CHECK-BE-NEXT: vmov {{d[0-9]+}}, r2, r1
	; CHECK-BE-NEXT: vmov {{d[0-9]+}}, [[REG]], r3			; CHECK-BE: vmov {{d[0-9]+}}, [[REG]], r3
	; CHECK-NEXT: vst1.8 {{{d[0-9]+}}, {{d[0-9]+}}}, [r0]			; CHECK: vst1.8 {{{d[0-9]+}}, {{d[0-9]+}}}, [r0]
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr
	define void @test1(i8* %arg, [4 x i64] %vec.coerce) {			define void @test1(i8* %arg, [4 x i64] %vec.coerce) {
	bb:			bb:
	%tmp = extractvalue [4 x i64] %vec.coerce, 0			%tmp = extractvalue [4 x i64] %vec.coerce, 0
	%tmp2 = bitcast i64 %tmp to <8 x i8>			%tmp2 = bitcast i64 %tmp to <8 x i8>
	%tmp3 = shufflevector <8 x i8> %tmp2, <8 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%tmp3 = shufflevector <8 x i8> %tmp2, <8 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%tmp4 = extractvalue [4 x i64] %vec.coerce, 1			%tmp4 = extractvalue [4 x i64] %vec.coerce, 1
	%tmp5 = bitcast i64 %tmp4 to <8 x i8>			%tmp5 = bitcast i64 %tmp4 to <8 x i8>
	%tmp6 = shufflevector <8 x i8> %tmp5, <8 x i8> undef, <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>			%tmp6 = shufflevector <8 x i8> %tmp5, <8 x i8> undef, <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
	%tmp7 = shufflevector <16 x i8> %tmp6, <16 x i8> %tmp3, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			%tmp7 = shufflevector <16 x i8> %tmp6, <16 x i8> %tmp3, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	tail call void @llvm.arm.neon.vst1.v16i8(i8* %arg, <16 x i8> %tmp7, i32 2)			tail call void @llvm.arm.neon.vst1.v16i8(i8* %arg, <16 x i8> %tmp7, i32 2)
	ret void			ret void
	}			}

	declare void @llvm.arm.neon.vst1.v16i8(i8*, <16 x i8>, i32)			declare void @llvm.arm.neon.vst1.v16i8(i8*, <16 x i8>, i32)

test/CodeGen/ARM/vcombine.ll

%tmp1 = load <8 x i16>* %A
%tmp2 = shufflevector <8 x i16> %tmp1, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		%tmp2 = shufflevector <8 x i16> %tmp1, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
ret <4 x i16> %tmp2		ret <4 x i16> %tmp2
}		}

define <8 x i8> @vget_high8(<16 x i8>* %A) nounwind {		define <8 x i8> @vget_high8(<16 x i8>* %A) nounwind {
; CHECK: vget_high8		; CHECK: vget_high8
; CHECK-NOT: vst		; CHECK-NOT: vst
; CHECK-LE: vmov r0, r1, d17		; CHECK-LE: vmov r0, r1, d17
; CHECK-BE: vmov r1, r0, d17		; CHECK-BE: vmov r1, r0, d16
%tmp1 = load <16 x i8>* %A		%tmp1 = load <16 x i8>* %A
%tmp2 = shufflevector <16 x i8> %tmp1, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%tmp2 = shufflevector <16 x i8> %tmp1, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
ret <8 x i8> %tmp2		ret <8 x i8> %tmp2
}		}

Revision Contents

Diff 9257

lib/Target/ARM/ARMFastISel.cpp

lib/Target/ARM/ARMISelLowering.cpp

lib/Target/ARM/ARMInstrNEON.td

test/CodeGen/ARM/big-endian-neon-bitconv.ll

test/CodeGen/ARM/dagcombine-concatvector.ll

test/CodeGen/ARM/vcombine.ll
