
Details

Reviewers

dmgreen
david-arm
sdesmalen
nikic
spatel
paulwalker-arm
efriedma

Commits

rG43b2df03e842: [LegalizeTypes][AArch64] Use scalar_to_vector to eliminate bitcast

Summary

Legalize t3: v2i16 = bitcast i32
with (v2i16 extract_subvector (v4i16 bitcast (v2i32 scalar_to_vector (i32 in))), 0)

Fix https://github.com/llvm/llvm-project/issues/61638

NOTE: This does not change getPreferredVectorAction as X86 does, because that would touch too many test cases.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Allen created this revision. Apr 5 2023, 8:32 PM

Herald added a project: Restricted Project. Apr 5 2023, 8:32 PM

Herald added subscribers: StephenFan, pengfei, hiraditya, kristof.beyls.

Allen requested review of this revision. Apr 5 2023, 8:32 PM

Herald added a project: Restricted Project. Apr 5 2023, 8:32 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B223913: Diff 511261. Apr 5 2023, 8:40 PM

My gut feeling is this isn't a valid transformation. Looking at the output from:

define <2 x i16> @bitcast_from_int(i32 %word) {
  %ret = bitcast i32 %word to <2 x i16>
  ret <2 x i16> %ret
}

define i16 @extract_lo(<2 x i16> %vec) {
  %ret = extractelement <2 x i16> %vec, i64 0
  ret i16 %ret
}

define i16 @extract_hi(<2 x i16> %vec) {
  %ret = extractelement <2 x i16> %vec, i64 1
  ret i16 %ret
}

bitcast_from_int:                       // @bitcast_from_int
	sub	sp, sp, #16
	.cfi_def_cfa_offset 16
	add	x8, sp, #12
	str	w0, [sp, #12]
	ld1	{ v0.h }[0], [x8]
	orr	x8, x8, #0x2
	ld1	{ v0.h }[2], [x8]
	add	sp, sp, #16
	ret

extract_lo:                             // @extract_lo
	fmov	w0, s0
	ret

extract_hi:                             // @extract_hi
	mov	w0, v0.s[1]
	ret

This shows that <2 x i16> is represented as an unpacked <2 x i32> vector, with the valid bits being the bottom 16 bits of each 32-bit element. This means the ISD::ANY_EXTEND within the DAG you're matching is critical, and it gets dropped when replacing with ISD::SCALAR_TO_VECTOR. Ultimately, it looks like ISD::BITCAST for smaller-than-legal vector types is not a no-op unless the src/dst is a load/store, in which case the extension can be removed. The poor code is the result of type legalisation, so perhaps you'd be better off implementing custom legalisation for ISD::BITCAST involving 32-bit vector types. I recall we did something similar for SVE to handle its unpacked vector types (see ReplaceBITCASTResults).
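To make the layout argument concrete, here is a small C++ model of the register contents. It is an illustrative sketch, not LLVM code: the type and function names are made up for this example, and filling the "don't care" upper bits of each lane with zero is an assumption of the model. It compares the promoted <2 x i16> that extract_lo/extract_hi above expect with what a lone fmov s0, w0 would leave in the register:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Illustrative model only, not LLVM code: a 64-bit D register viewed as two
// 32-bit lanes, which is how a promoted <2 x i16> is held on AArch64.
using DReg = std::array<uint32_t, 2>;

// Promoted form of "bitcast i32 -> <2 x i16>" (little-endian): each 16-bit
// element sits in the low half of its own 32-bit lane. The upper 16 bits of a
// lane are "don't care"; this model fills them with zero.
DReg promotedFromI32(uint32_t word) { return {word & 0xFFFFu, word >> 16}; }

// What a bare "fmov s0, w0" leaves behind: the whole word in lane 0 and the
// rest of the register zeroed.
DReg fmovOnly(uint32_t word) { return {word, 0u}; }

// Mirror extract_lo ("fmov w0, s0") and extract_hi ("mov w0, v0.s[1]"); the
// i16 result is the low 16 bits of the 32-bit lane that was read.
uint16_t extractLo(const DReg &r) { return static_cast<uint16_t>(r[0]); }
uint16_t extractHi(const DReg &r) { return static_cast<uint16_t>(r[1]); }
```

In this model extractHi(promotedFromI32(0xBEEF1234)) is 0xBEEF, whereas extractHi(fmovOnly(0xBEEF1234)) is 0, which is the dropped-extension problem described in the comment.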

Thanks for your advice, I'll try that.

In the current case, I also check that N->getValueType(0).getScalarType() == N00.getNode()->getValueType(0), so we only handle the pattern A --> VecB --> VecA. Since the scalar type of VecA is the same as A, can we ignore the difference between how VecB and VecA are laid out?

In addition, do you mean we should "prefer to widen v4i16 to v2i32", similar to X86, as shown below? (I tested it and it works fine, but too many test cases need updating, so I want to make sure I have correctly understood what you meant by implementing custom legalisation for ISD::BITCAST.)

@@ -22680,6 +22680,11 @@ AArch64TargetLowering::getPreferredVectorAction(MVT VT) const {
       VT == MVT::v1f32)
     return TypeWidenVector;
 
+  // We prefer to widen v4i16 to v2i32 instead of to promote.
+  // TODO: Add more types.
+  if (VT == MVT::v2i16)
+    return TypeWidenVector;
+
   return TargetLoweringBase::getPreferredVectorAction(VT);
 }
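For comparison, the two type-legalisation actions place a <2 x i16> differently inside a 64-bit register. The sketch below uses hypothetical helper names, assumes little-endian lane order, and models undefined lanes as zero; it illustrates why widening to <4 x i16> would make bitcast i32 -> <2 x i16> a no-op on the low 32 bits, while the default promotion to <2 x i32> does not:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Illustrative model (not LLVM code): a 64-bit D register viewed as four
// 16-bit lanes, lane 0 being the least-significant halfword (little-endian).
using Halves = std::array<uint16_t, 4>;

// TypePromoteInteger: <2 x i16> is held as <2 x i32>, so element i occupies
// halfword lane 2*i and the odd lanes hold extension bits (zero here).
Halves promotedLayout(uint16_t e0, uint16_t e1) { return {e0, 0, e1, 0}; }

// TypeWidenVector: <2 x i16> is held as <4 x i16>, so the elements occupy
// lanes 0 and 1; lanes 2 and 3 are undefined (zero here).
Halves widenedLayout(uint16_t e0, uint16_t e1) { return {e0, e1, 0, 0}; }

// The low 32 bits of the register, i.e. what "fmov s0, w0" writes and
// "fmov w0, s0" reads back.
uint32_t low32(const Halves &h) {
  return static_cast<uint32_t>(h[0]) | (static_cast<uint32_t>(h[1]) << 16);
}
```

With the widened layout, low32 of the elements {0x1234, 0xBEEF} equals the original word 0xBEEF1234; with the promoted layout it is only 0x1234. Presumably this change of layout for every <2 x i16> value is also why so many existing tests would need updating.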

Matt added a subscriber: Matt. Apr 11 2023, 11:45 AM

Define custom legalisation for ISD::BITCAST, as suggested in the review

Allen updated this revision to Diff 530152. Jun 9 2023, 9:42 PM

Harbormaster completed remote builds in B237916: Diff 530152. Jun 9 2023, 10:29 PM

efriedma added a subscriber: efriedma. Jun 9 2023, 11:06 PM

efriedma added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
22895	SCALAR_TO_VECTOR from i32 to v2i16 implicitly discards the high 16 bits of the input. (The other lane is UNDEF. UNDEF might happen to be what you want in simple cases, but it won't be in general.) Something like `(v2i16 extract_subvector (v4i16 bitcast (v2i32 scalar_to_vector (i32 in))), 0)` should have the semantics you want, and lower to something reasonably efficient.
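The difference between the two forms can be checked with a lane-level model of the nodes involved. This is a hedged sketch, not LLVM code: it assumes little-endian lane order, represents an UNDEF lane as std::nullopt, and the function names are invented for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Lane-level model of DAG values; std::nullopt stands for an UNDEF lane.
template <typename T, std::size_t N>
using Vec = std::array<std::optional<T>, N>;

// (v2i16 scalar_to_vector (i32 in)): the scalar is implicitly truncated to
// the element type, so the high 16 bits are lost and lane 1 is UNDEF.
Vec<uint16_t, 2> naiveScalarToVector(uint32_t in) {
  return {static_cast<uint16_t>(in), std::nullopt};
}

// (v2i32 scalar_to_vector (i32 in)): lane 0 holds the full word.
Vec<uint32_t, 2> scalarToVectorV2I32(uint32_t in) {
  return {in, std::nullopt};
}

// (v4i16 bitcast v2i32), little-endian: each 32-bit lane splits into its low
// halfword followed by its high halfword; UNDEF lanes stay UNDEF.
Vec<uint16_t, 4> bitcastToV4I16(const Vec<uint32_t, 2> &v) {
  Vec<uint16_t, 4> out{};
  for (std::size_t i = 0; i < 2; ++i) {
    if (v[i]) {
      out[2 * i] = static_cast<uint16_t>(*v[i]);
      out[2 * i + 1] = static_cast<uint16_t>(*v[i] >> 16);
    }
  }
  return out;
}

// (v2i16 extract_subvector v4i16, 0): keep lanes 0 and 1.
Vec<uint16_t, 2> extractLow2(const Vec<uint16_t, 4> &v) { return {v[0], v[1]}; }

// The suggested pattern, composed from the pieces above.
Vec<uint16_t, 2> suggestedPattern(uint32_t in) {
  return extractLow2(bitcastToV4I16(scalarToVectorV2I32(in)));
}
```

For an input of 0xBEEF1234, naiveScalarToVector keeps only 0x1234 with lane 1 UNDEF, while suggestedPattern yields {0x1234, 0xBEEF}.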


Harbormaster completed remote builds in B237927: Diff 530172. Jun 10 2023, 1:17 AM

Allen edited the summary of this revision. Jun 10 2023, 2:47 AM

Allen added a reviewer: efriedma.

Allen added inline comments. Jun 10 2023, 2:52 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
22895	Thanks, I applied your suggestion. The extra instruction ushll v0.4s, v0.4h, #0 for (v4i32 any_extend v4i16) will clear the other lane:
t2: i32,ch = CopyFromReg t0, Register:i32 %0
t8: v2i32 = scalar_to_vector t2
t9: v4i16 = bitcast t8
t20: v4i32 = any_extend t9
t21: v2i32 = extract_subvector t20, Constant:i64<0>
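A rough model of why the ushll step is enough: after fmov s0, w0 the v4i16 view holds {lo, hi, 0, 0}, and ushll zero-extends every 16-bit lane into its own 32-bit lane, so the first two 32-bit lanes (t21) are exactly the promoted <2 x i16>. The sketch is illustrative only (hypothetical names, little-endian, not LLVM code). In the committed little-endian test output, the extract_subvector shows up only as a `// kill` comment rather than an instruction.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative model (not LLVM code), little-endian lane order.

// After "fmov s0, w0", the v4i16 view (t9) is {lo, hi, 0, 0}.
std::array<uint16_t, 4> afterFmov(uint32_t word) {
  return {static_cast<uint16_t>(word), static_cast<uint16_t>(word >> 16), 0,
          0};
}

// "ushll v0.4s, v0.4h, #0" (t20): every 16-bit lane is zero-extended into a
// 32-bit lane; a shift amount of 0 means no shift is applied.
std::array<uint32_t, 4> ushll4s(const std::array<uint16_t, 4> &src) {
  std::array<uint32_t, 4> dst{};
  for (std::size_t i = 0; i < 4; ++i)
    dst[i] = src[i];
  return dst;
}

// t21: extract_subvector at index 0 keeps the first two 32-bit lanes, i.e.
// the promoted <2 x i16> with one element per 32-bit lane.
std::array<uint32_t, 2> lowerHalf(const std::array<uint32_t, 4> &v) {
  return {v[0], v[1]};
}
```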

Is it worth adding other types like i16->v2i8 and i32->v4i8 too?

Add new bitcasts i16->v2i8 and i32->v4i8

Harbormaster completed remote builds in B238371: Diff 530746. Jun 12 2023, 8:38 PM

Thanks for doing this. That's great.

The v2i8 version doesn't look too efficient with it moving back and forth, but they all look better than going via the stack. LGTM, thanks.

This revision is now accepted and ready to land. Jun 13 2023, 3:07 AM

In D147678#4416884, @dmgreen wrote:

Thanks for doing this. That's great.

The v2i8 version doesn't look too efficient with it moving back and forth, but they all look better than going via the stack. LGTM, thanks.

I added a comment on https://github.com/llvm/llvm-project/issues/61638#issuecomment-1495316614; that approach produces efficient assembly (only a single fmov s0, w0, as expected), but too many tests are affected.
So I'm not sure whether this is a direction for further optimization?

I have one extra nit, but otherwise this looks good.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
22894–22897	Is this necessary given it is a requirement of `ISD::BITCAST`? By which I mean you shouldn't need to manually verify the DAG is valid here? I'd expect `getNode()` to catch such errors.
22902	Please can you use `getVectorIdxConstant` here rather than hardwiring `MVT::i64`?

Fix comments
a) delete assert
b) use getVectorIdxConstant

Allen marked 3 inline comments as done. Jun 13 2023, 6:14 AM

Allen added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
22894–22897	deleted assert, thanks
22902	Apply your comment, thanks

In D147678#4416899, @Allen wrote:

I added a comment on https://github.com/llvm/llvm-project/issues/61638#issuecomment-1495316614; that approach produces efficient assembly (only a single fmov s0, w0, as expected), but too many tests are affected.
So I'm not sure whether this is a direction for further optimization?

The <2 x i16> across a call boundary needs to be treated like a <2 x i32> (as that is a legal type). Doing otherwise would be an ABI break, which would be a lot of work. There are certainly places where we could do better for small vector types, but I don't think a fmov s0, w0 would be valid on its own. My comment was more about the bitcast i16 %word to <2 x i8> case, which moves back and forth between GPRs and vector registers more than it needs to. That can be improved in other patches, though.

By the way, it might be worth making sure the tests cover BE too. I believe they are OK, but it would be a good idea to make sure it is tested.
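Big-endian matters here because an IR bitcast is defined as if the value were stored to memory and read back as the other type, with vector element 0 at the lowest address. The following sketch (an illustrative model with a hypothetical function name, not LLVM code) shows that element 0 of bitcast i32 to <2 x i16> is the low half on little-endian but the high half on big-endian, which is presumably what the extra rev32/rev64 instructions in the BE check lines account for.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Illustrative model (not LLVM code) of IR "bitcast i32 %w to <2 x i16>":
// store the i32, then load the vector with element 0 at the lowest address.
std::array<uint16_t, 2> bitcastI32ToV2I16(uint32_t w, bool bigEndian) {
  // Store: memory byte 0 gets the least significant byte (LE) or the most
  // significant byte (BE).
  std::array<uint8_t, 4> mem{};
  for (int i = 0; i < 4; ++i) {
    int shift = bigEndian ? 8 * (3 - i) : 8 * i;
    mem[i] = static_cast<uint8_t>(w >> shift);
  }
  // Load: each 16-bit element is read from two consecutive bytes, again in
  // the target byte order.
  std::array<uint16_t, 2> v{};
  for (int e = 0; e < 2; ++e) {
    unsigned b0 = mem[2 * e], b1 = mem[2 * e + 1];
    v[e] = static_cast<uint16_t>(bigEndian ? (b0 << 8) | b1 : (b1 << 8) | b0);
  }
  return v;
}
```

For 0xBEEF1234, the model gives {0x1234, 0xBEEF} on little-endian and {0xBEEF, 0x1234} on big-endian.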

rebase to cover BE

Allen added a parent revision: D152815: [test] Update the checking base for LE and BE. Jun 13 2023, 7:05 AM

Harbormaster completed remote builds in B238484: Diff 530896. Jun 13 2023, 8:00 AM

Rebase to pick up the test changes

Harbormaster completed remote builds in B238787: Diff 531307. Jun 14 2023, 7:49 AM

This revision was landed with ongoing or failed builds. Jun 14 2023, 8:34 AM

Closed by commit rG43b2df03e842: [LegalizeTypes][AArch64] Use scalar_to_vector to eliminate bitcast (authored by Allen).

This revision was automatically updated to reflect the committed changes.

Allen added a commit: rG43b2df03e842: [LegalizeTypes][AArch64] Use scalar_to_vector to eliminate bitcast.

Allen mentioned this in rGe108aee956e1: [test] Update the checking base for LE and BE.

Allen mentioned this in D153394: [AArch64][GlobalISel] Legalize <2 x s8> and <4 x s8> for G_BUILD_VECTOR. Jun 20 2023, 7:01 PM

Diff 531355

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp


@@ … 1,222 unchanged lines not shown … @@ if (Subtarget->hasNEON()) {
     setTruncStoreAction(MVT::v4i16, MVT::v4i8, Custom);
 
     setOperationAction(ISD::BITCAST, MVT::i2, Custom);
     setOperationAction(ISD::BITCAST, MVT::i4, Custom);
     setOperationAction(ISD::BITCAST, MVT::i8, Custom);
     setOperationAction(ISD::BITCAST, MVT::i16, Custom);
 
+    setOperationAction(ISD::BITCAST, MVT::v2i8, Custom);
+    setOperationAction(ISD::BITCAST, MVT::v2i16, Custom);
+    setOperationAction(ISD::BITCAST, MVT::v4i8, Custom);
+
     setLoadExtAction(ISD::EXTLOAD, MVT::v4i16, MVT::v4i8, Custom);
     setLoadExtAction(ISD::SEXTLOAD, MVT::v4i16, MVT::v4i8, Custom);
     setLoadExtAction(ISD::ZEXTLOAD, MVT::v4i16, MVT::v4i8, Custom);
     setLoadExtAction(ISD::EXTLOAD, MVT::v4i32, MVT::v4i8, Custom);
     setLoadExtAction(ISD::SEXTLOAD, MVT::v4i32, MVT::v4i8, Custom);
     setLoadExtAction(ISD::ZEXTLOAD, MVT::v4i32, MVT::v4i8, Custom);
 
     // ADDP custom lowering
@@ … 21,643 unchanged lines not shown … @@ for (SDNode *Node : Copy->uses()) {
     HasRet = true;
   }
 
   if (!HasRet)
     return false;
 
   Chain = TCChain;
   return true;
 }

 // Return whether the an instruction can potentially be optimized to a tail
 // call. This will cause the optimizers to attempt to move, or duplicate,
 // return instructions to help enable tail call optimizations for this
 // instruction.
 bool AArch64TargetLowering::mayBeEmittedAsTailCall(const CallInst *CI) const {
   return CI->isTailCall();
 }

 bool AArch64TargetLowering::getIndexedAddressParts(SDNode *N, SDNode *Op,
                                                    SDValue &Base,
                                                    SDValue &Offset,
                                                    SelectionDAG &DAG) const {
   if (Op->getOpcode() != ISD::ADD && Op->getOpcode() != ISD::SUB)
     return false;

@@ … 107 unchanged lines not shown … @@ if (AllUndef)
       Op = Op.getOperand(0);
   }
 
   SDValue VectorBits = vectorToScalarBitmask(Op.getNode(), DAG);
   if (VectorBits)
     Results.push_back(DAG.getZExtOrTrunc(VectorBits, DL, VT));
 }

+static void CustomNonLegalBITCASTResults(SDNode *N,
+                                         SmallVectorImpl<SDValue> &Results,
+                                         SelectionDAG &DAG, EVT ExtendVT,
+                                         EVT CastVT) {
+  SDLoc DL(N);
+  SDValue Op = N->getOperand(0);
+  EVT VT = N->getValueType(0);
+
+  // Use SCALAR_TO_VECTOR for lane zero
+  SDValue Vec = DAG.getNode(ISD::SCALAR_TO_VECTOR, DL, ExtendVT, Op);
+  SDValue CastVal = DAG.getNode(ISD::BITCAST, DL, CastVT, Vec);
+  SDValue IdxZero = DAG.getVectorIdxConstant(0, DL);
+  Results.push_back(
+      DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VT, CastVal, IdxZero));
+  return;
+}

 void AArch64TargetLowering::ReplaceBITCASTResults(
     SDNode *N, SmallVectorImpl<SDValue> &Results, SelectionDAG &DAG) const {
   SDLoc DL(N);
   SDValue Op = N->getOperand(0);
   EVT VT = N->getValueType(0);
   EVT SrcVT = Op.getValueType();
 
+  if (VT == MVT::v2i16 && SrcVT == MVT::i32) {
+    CustomNonLegalBITCASTResults(N, Results, DAG, MVT::v2i32, MVT::v4i16);
+    return;
+  }
+
+  if (VT == MVT::v4i8 && SrcVT == MVT::i32) {
+    CustomNonLegalBITCASTResults(N, Results, DAG, MVT::v2i32, MVT::v8i8);
+    return;
+  }
+
+  if (VT == MVT::v2i8 && SrcVT == MVT::i16) {
+    CustomNonLegalBITCASTResults(N, Results, DAG, MVT::v4i16, MVT::v8i8);
+    return;
+  }
+
   if (VT.isScalableVector() && !isTypeLegal(VT) && isTypeLegal(SrcVT)) {
     assert(!VT.isFloatingPoint() && SrcVT.isFloatingPoint() &&
            "Expected fp->int bitcast!");
 
     // Bitcasting between unpacked vector types of different element counts is
     // not a NOP because the live elements are laid out differently.
     //                01234567
     // e.g. nxv2i32 = XX??XX??
@@ … 2,507 unchanged lines not shown … @@
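The three ExtendVT/CastVT pairs passed to CustomNonLegalBITCASTResults above follow a simple pattern: ExtendVT is the 64-bit vector whose element type is the source scalar, and CastVT is the 64-bit vector whose element type is the destination element. The helper below is hypothetical (it is not in the patch and does not use LLVM's MVT/EVT classes); it only reproduces that observation for the three handled cases.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type, not LLVM's MVT/EVT: just lane count and width.
struct SimpleVT {
  unsigned NumElts;
  unsigned EltBits;
};

constexpr unsigned DRegBits = 64; // one AArch64 D register

// ExtendVT: the 64-bit vector whose element is the source scalar.
SimpleVT extendVTFor(unsigned SrcScalarBits) {
  return {DRegBits / SrcScalarBits, SrcScalarBits};
}

// CastVT: the 64-bit vector whose element is the destination element.
SimpleVT castVTFor(unsigned DstEltBits) {
  return {DRegBits / DstEltBits, DstEltBits};
}

// Render in LLVM's MVT spelling, e.g. {2, 32} -> "v2i32".
std::string name(SimpleVT VT) {
  return "v" + std::to_string(VT.NumElts) + "i" + std::to_string(VT.EltBits);
}
```

For example, v2i16 from i32 gives extendVTFor(32) = v2i32 and castVTFor(16) = v4i16, matching the first call in ReplaceBITCASTResults.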

llvm/test/CodeGen/AArch64/aarch64-load-ext.ll

@@ … 440 unchanged lines not shown … @@ ; CHECK-BE-NEXT: ret
   %z = add <4 x i8> %x, %y
   %s = sext <4 x i8> %z to <4 x i32>
   ret <4 x i32> %s
 }
 
 define <4 x i8> @bitcast(i32 %0) {
 ; CHECK-LE-LABEL: bitcast:
 ; CHECK-LE: // %bb.0:
-; CHECK-LE-NEXT: sub sp, sp, #16
-; CHECK-LE-NEXT: .cfi_def_cfa_offset 16
-; CHECK-LE-NEXT: str w0, [sp, #12]
-; CHECK-LE-NEXT: ldr s0, [sp, #12]
-; CHECK-LE-NEXT: ushll v0.8h, v0.8b, #0
-; CHECK-LE-NEXT: // kill: def $d0 killed $d0 killed $q0
-; CHECK-LE-NEXT: add sp, sp, #16
+; CHECK-LE-NEXT: fmov s0, w0
+; CHECK-LE-NEXT: zip1 v0.8b, v0.8b, v0.8b
 ; CHECK-LE-NEXT: ret
 ;
 ; CHECK-BE-LABEL: bitcast:
 ; CHECK-BE: // %bb.0:
-; CHECK-BE-NEXT: sub sp, sp, #16
-; CHECK-BE-NEXT: .cfi_def_cfa_offset 16
-; CHECK-BE-NEXT: str w0, [sp, #12]
-; CHECK-BE-NEXT: ldr s0, [sp, #12]
+; CHECK-BE-NEXT: fmov s0, w0
 ; CHECK-BE-NEXT: rev32 v0.8b, v0.8b
-; CHECK-BE-NEXT: ushll v0.8h, v0.8b, #0
-; CHECK-BE-NEXT: rev64 v0.4h, v0.4h
-; CHECK-BE-NEXT: add sp, sp, #16
+; CHECK-BE-NEXT: zip1 v0.8b, v0.8b, v0.8b
+; CHECK-BE-NEXT: rev64 v0.8b, v0.8b
 ; CHECK-BE-NEXT: ret
   %2 = bitcast i32 %0 to <4 x i8>
   ret <4 x i8> %2
 }
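The new little-endian sequence for bitcast i32 to <4 x i8> is fmov s0, w0 followed by zip1 v0.8b, v0.8b, v0.8b. A small model (illustrative only, hypothetical names, little-endian) suggests why that suffices: zip1 with both operands equal duplicates each of the four bytes, so every 16-bit lane of the promoted <4 x i8> (held as <4 x i16>) has the original byte in its low 8 bits. The copy in the high byte is harmless under the assumption, made here, that those bits are don't-care (an any-extend).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative model (not LLVM code), little-endian byte order.
using Bytes8 = std::array<uint8_t, 8>;

// "zip1 Vd.8b, Vn.8b, Vm.8b": interleave the low halves of the two sources,
// giving n0 m0 n1 m1 n2 m2 n3 m3.
Bytes8 zip1(const Bytes8 &n, const Bytes8 &m) {
  Bytes8 d{};
  for (std::size_t i = 0; i < 4; ++i) {
    d[2 * i] = n[i];
    d[2 * i + 1] = m[i];
  }
  return d;
}

// View the 8 bytes as four 16-bit lanes, i.e. the <4 x i16> that holds a
// promoted <4 x i8>.
std::array<uint16_t, 4> asV4I16(const Bytes8 &b) {
  std::array<uint16_t, 4> h{};
  for (std::size_t i = 0; i < 4; ++i)
    h[i] = static_cast<uint16_t>(b[2 * i] | (b[2 * i + 1] << 8));
  return h;
}
```

Starting from the register after fmov s0, w0 with w0 = 0x44332211 (bytes 11 22 33 44 00 00 00 00), the lanes become 0x1111, 0x2222, 0x3333, 0x4444.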

llvm/test/CodeGen/AArch64/neon-bitcast.ll

@@ … 508 unchanged lines not shown … @@
 define <16 x i8> @test_v2f64_to_v16i8(<2 x double> %in) nounwind{
 ; CHECK-LABEL: test_v2f64_to_v16i8:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: ret
   %val = bitcast <2 x double> %in to <16 x i8>
   ret <16 x i8> %val
 }
+
+define <2 x i16> @bitcast_i32_to_v2i16(i32 %word) {
+; CHECK-LE-LABEL: bitcast_i32_to_v2i16:
+; CHECK-LE: // %bb.0:
+; CHECK-LE-NEXT: fmov s0, w0
+; CHECK-LE-NEXT: ushll v0.4s, v0.4h, #0
+; CHECK-LE-NEXT: // kill: def $d0 killed $d0 killed $q0
+; CHECK-LE-NEXT: ret
+;
+; CHECK-BE-LABEL: bitcast_i32_to_v2i16:
+; CHECK-BE: // %bb.0:
+; CHECK-BE-NEXT: fmov s0, w0
+; CHECK-BE-NEXT: rev32 v0.4h, v0.4h
+; CHECK-BE-NEXT: ushll v0.4s, v0.4h, #0
+; CHECK-BE-NEXT: rev64 v0.2s, v0.2s
+; CHECK-BE-NEXT: ret
+  %ret = bitcast i32 %word to <2 x i16>
+  ret <2 x i16> %ret
+}
+
+define <4 x i8> @bitcast_i32_to_v4i8(i32 %word) {
+; CHECK-LE-LABEL: bitcast_i32_to_v4i8:
+; CHECK-LE: // %bb.0:
+; CHECK-LE-NEXT: fmov s0, w0
+; CHECK-LE-NEXT: zip1 v0.8b, v0.8b, v0.8b
+; CHECK-LE-NEXT: ret
+;
+; CHECK-BE-LABEL: bitcast_i32_to_v4i8:
+; CHECK-BE: // %bb.0:
+; CHECK-BE-NEXT: fmov s0, w0
+; CHECK-BE-NEXT: rev32 v0.8b, v0.8b
+; CHECK-BE-NEXT: zip1 v0.8b, v0.8b, v0.8b
+; CHECK-BE-NEXT: rev64 v0.8b, v0.8b
+; CHECK-BE-NEXT: ret
+  %ret = bitcast i32 %word to <4 x i8>
+  ret <4 x i8> %ret
+}
+
+; TODO: Eliminate redundant moving back and forth between gpr and vectors
+define <2 x i8> @bitcast_i16_to_v2i8(i16 %word) {
+; CHECK-LE-LABEL: bitcast_i16_to_v2i8:
+; CHECK-LE: // %bb.0:
+; CHECK-LE-NEXT: fmov s0, w0
+; CHECK-LE-NEXT: umov w8, v0.b[0]
+; CHECK-LE-NEXT: umov w9, v0.b[1]
+; CHECK-LE-NEXT: fmov s0, w8
+; CHECK-LE-NEXT: mov v0.s[1], w9
+; CHECK-LE-NEXT: // kill: def $d0 killed $d0 killed $q0
+; CHECK-LE-NEXT: ret
+;
+; CHECK-BE-LABEL: bitcast_i16_to_v2i8:
+; CHECK-BE: // %bb.0:
+; CHECK-BE-NEXT: fmov s0, w0
+; CHECK-BE-NEXT: rev16 v0.16b, v0.16b
+; CHECK-BE-NEXT: umov w8, v0.b[0]
+; CHECK-BE-NEXT: umov w9, v0.b[1]
+; CHECK-BE-NEXT: fmov s0, w8
+; CHECK-BE-NEXT: mov v0.s[1], w9
+; CHECK-BE-NEXT: rev64 v0.2s, v0.2s
+; CHECK-BE-NEXT: ret
+  %ret = bitcast i16 %word to <2 x i8>
+  ret <2 x i8> %ret
+}

This is an archive of the discontinued LLVM Phabricator instance.

[LegalizeTypes][AArch64] Use scalar_to_vector to eliminate bitcast
Closed · Public
