This is an archive of the discontinued LLVM Phabricator instance.

Differential D5097

[AArch64 - BE] BUILD_VECTOR lane order is reversed in big-endian mode
Abandoned, Public

Authored by rmaprath on Aug 28 2014, 3:51 AM.

Details

Reviewers

t.p.northover

Summary

Hi Tim,

This is a revamp of [1].

That patch was rejected mainly because it lacked enough testing. I've added tests to cover all ModImmTypes and addressed the other minor points you mentioned there.

Notes:

The need to use rev64 instructions with big-endian vectors is documented at [2].
Most ModImmTypes have a symmetric counterpart, and the lane reversal causes them to be encoded in that opposite pattern. A few ModImmTypes (7, 8, 11, 12) do not have that property and get pushed into memory. This does not affect correctness.

Thanks.

Asiri

[1] http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140714/226398.html
[2] http://llvm.org/docs/BigEndianNEON.html

Event Timeline

rmaprath updated this revision to Diff 13026. Aug 28 2014, 3:51 AM

rmaprath retitled this revision from to [AArch64 - BE] BUILD_VECTOR lane order is reversed in big-endian mode.

rmaprath updated this object.

rmaprath edited the test plan for this revision.

rmaprath added a reviewer: t.p.northover.

rmaprath added a subscriber: Unknown Object (MLST).

Herald added subscribers: mcrosier, aemerson. Aug 28 2014, 3:51 AM

Hi Asiri,

Sorry for the delay on this one, you picked just the wrong time to start working on it as I've been away for the last week.

That patch was rejected mainly because it lacked enough testing.

No it wasn't, it was rejected (or at least delayed, it may well be correct) mainly because I thought the implementation was suspect. The testing would have to improve as well, but that was less important to me.

I'll take another look in light of your question, once I've caught up with everything.

Cheers.

Tim.

Hi Tim,

Thanks for looking into this.

Earlier I had some uncertain points about your original review comments (as they were written for James, not at the level I can process ;)), but I had a chat with James and got them resolved.

Asiri

Hi Asiri,

I'm afraid I still think that this isn't fixing the real issue, which is the dodgy casting that goes on *after* resolveBuildVector. Putting the change there seems like it would be trying to make two bugs cancel out into a feature (though I can't find any currently incorrect code that results).

It makes the resolveBuildVector's output incompatible with its fundamental building-block, isConstantSplat.
It means that CnstBits no longer represent the in-register value directly: "CnstBits & 1" no longer checks if the value would be even, for example.
We now end up passing a big-endian value into endian-agnostic isAdvSIMDModImmTypeN functions.

We bodge that last point up with some REVs afterwards, but that only works in the direct MOVI case, I think. For example:

@vec = global <4 x i16> <i16 1234, i16 5678, i16 9101, i16 1121>

define i16 @foo() {
  %in = load <4 x i16>* @vec
  %res_vec = and <4 x i16> %in, <i16 65400, i16 65535, i16 65400, i16 65535>
  %elt = extractelement <4 x i16> %res_vec, i16 0
  ret i16 %elt
}

This currently produces:

ld1     { v0.4h }, [x8]
bic     v0.2s, #0x87
rev32   v0.4h, v0.4h
umov    w0, v0.h[0]

where that REV32 is highly suspect. But after your patch it produces:

ld1     { v0.4h }, [x8]
bic     v0.2s, #0x87, lsl #16
rev32   v0.4h, v0.4h
umov    w0, v0.h[0]

where both the BIC and the REV32 are wrong, and don't cancel each other out.
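The two BIC immediates above can be reproduced with a little arithmetic. The following is only a sketch under stated assumptions: it models the 64-bit constant as a plain uint64_t with element 0 in the low bits, and the helper names (packV4I16, reverseV4I16, bicSplat32) are invented for this illustration and are not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Pack four i16 elements into 64 bits with element 0 in the low bits
// (the order assumed here for CnstBits before any lane reversal).
inline uint64_t packV4I16(uint16_t e0, uint16_t e1, uint16_t e2, uint16_t e3) {
  return uint64_t{e0} | (uint64_t{e1} << 16) | (uint64_t{e2} << 32) |
         (uint64_t{e3} << 48);
}

// Reverse the four 16-bit lanes, as the patch does in big-endian mode.
inline uint64_t reverseV4I16(uint64_t bits) {
  return packV4I16(uint16_t(bits >> 48), uint16_t(bits >> 32),
                   uint16_t(bits >> 16), uint16_t(bits));
}

// The bits BIC has to clear, viewed as a 32-bit splat. Asserts that both
// 32-bit halves agree, which holds for the mask used in this example.
inline uint32_t bicSplat32(uint64_t andMask) {
  const uint64_t clearBits = ~andMask;  // BIC is and-not
  const uint32_t lo = uint32_t(clearBits);
  const uint32_t hi = uint32_t(clearBits >> 32);
  assert(lo == hi);
  return lo;
}
```

For the AND mask <65400, 65535, 65400, 65535>, bicSplat32 gives 0x87 without the reversal and 0x87 << 16 with it, which lines up with the "#0x87" and "#0x87, lsl #16" operands in the two listings.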

Do my concerns make more sense now?

Cheers.

Tim.

Hi Tim,

I'm afraid I still think that this isn't fixing the real issue, which is the dodgy casting that goes on *after* resolveBuildVector. Putting the change there seems like it would be trying to make two bugs cancel out into a feature (though I can't find any currently incorrect code that results).

I had an itch about this and spent a couple of days trying to come up with a counter-example to prove that this fix was a bit too narrow. But I finally gave in (after trying to break resolveBuildVector with various ModImmTypes) and thought this is something we have to live with given how we handle vectors in big-endian mode.

It makes the resolveBuildVector's output incompatible with its fundamental building-block, isConstantSplat.

It means that CnstBits no longer represent the in-register value directly: "CnstBits & 1" no longer checks if the value would be even, for example.

We now end up passing a big-endian value into endian-agnostic isAdvSIMDModImmTypeN functions.

We bodge that last point up with some REVs afterwards, but that only works in the direct MOVI case, I think.

AFAICS the REV instructions are redundant (in the MOVI case). My understanding was that we have to perform the lane reversal within resolveBuildVector() because at the callee end (when a vector is passed to a function) we get something like:

define i16 @f(<4 x i16> %arg) nounwind {
  ; CHECK:          rev64   v0.4h, v0.4h
  ; CHECK-NEXT:     umov    w0, v0.h[0]
  ; CHECK-NEXT:     ret
  %v = extractelement <4 x i16> %arg, i32 0
  ret i16 %v
}

So, to cater for that rev64, we keep the vector lane-reversed. Perhaps a proper fix will have to get rid of that rev64.

Admittedly, my understanding of the problem was a bit narrow at the time so my focus was on somehow resurrecting the existing patch :)

For example:

@vec = global <4 x i16> <i16 1234, i16 5678, i16 9101, i16 1121>

define i16 @foo() {
  %in = load <4 x i16>* @vec
  %res_vec = and <4 x i16> %in, <i16 65400, i16 65535, i16 65400, i16 65535>
  %elt = extractelement <4 x i16> %res_vec, i16 0
  ret i16 %elt
}

This currently produces:

ld1     { v0.4h }, [x8]
bic     v0.2s, #0x87
rev32   v0.4h, v0.4h
umov    w0, v0.h[0]

where that REV32 is highly suspect. But after your patch it produces:

ld1     { v0.4h }, [x8]
bic     v0.2s, #0x87, lsl #16
rev32   v0.4h, v0.4h
umov    w0, v0.h[0]

where both the BIC and the REV32 are wrong, and don't cancel each other out.

Thanks for this example. This should be enough for me to look for a different fix.

Cheers!

Asiri.

AFAICS the REV instructions are redundant (in the MOVI case). My understanding was that we have to perform the lane reversal within resolveBuildVector() because at the callee end (when a vector is passed to a function) we get something like:

Ah, if you're looking at the call case specifically then some kind of
rev is needed for ABI compatibility, but what we're emitting at the
moment is a merged version of the necessary REV and the dodgy one.
When passing a v4i16 we emit:

REV64 v0.2s, v0.2s

However, the DAG that actually gets created is closer to:

REV32 v0.4h, v0.4h // By the code associated with resolveBuildVector
REV64 v0.4h, v0.4h // By LowerCall

And the combination of those two operations is indeed a "REV64
v0.2s"[1]. I think that first one is a problem (particularly in the
BIC/OR case, but conceptually not what we want even for MOVI).

Cheers.

Tim.

[1] A good way to do these in your head is that the bigger size always
gets attached to the REV mnemonic, and combining two REVs with a
common size ("16" in this case) you drop that common size. So we have
64 <-> 16 and 32 <-> 16, which means the result is a 32 <-> 64,
written REV64 v0.2s.
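The rule in footnote [1] can be checked on a toy model. This is a sketch, not LLVM code: a 64-bit vector is modelled as an array of four 16-bit lanes indexed by lane number, and the type V4H and the function names are invented for this illustration. It makes no claim about byte order in memory; it only checks that REV32 on .4h followed by REV64 on .4h permutes lanes the same way as a single REV64 on .2s.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

using V4H = std::array<uint16_t, 4>;  // four 16-bit lanes, index = lane number

// Reverse the order of the 16-bit lanes inside each group of `group` lanes.
inline V4H reverseWithin(V4H v, unsigned group) {
  for (unsigned base = 0; base < v.size(); base += group)
    std::reverse(v.begin() + base, v.begin() + base + group);
  return v;
}

// REV32 Vd.4H: reverse halfwords within each 32-bit word.
inline V4H rev32_4h(V4H v) { return reverseWithin(v, 2); }

// REV64 Vd.4H: reverse halfwords within the 64-bit doubleword.
inline V4H rev64_4h(V4H v) { return reverseWithin(v, 4); }

// REV64 Vd.2S: swap the two 32-bit words, keeping each word intact.
inline V4H rev64_2s(V4H v) {
  std::swap_ranges(v.begin(), v.begin() + 2, v.begin() + 2);
  return v;
}
```

Starting from lanes {10, 11, 12, 13}, both rev64_4h(rev32_4h(v)) and rev64_2s(v) yield {12, 13, 10, 11}, which is the "64 <-> 16 plus 32 <-> 16 gives 32 <-> 64" shortcut in action.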

Hi Tim,

However, the DAG that actually gets created is closer to:
REV32 v0.4h, v0.4h // By the code associated with resolveBuildVector
REV64 v0.4h, v0.4h // By LowerCall
And the combination of those two operations is indeed a "REV64
v0.2s"[1]. I think that first one is a problem (particularly in the
BIC/OR case, but conceptually not what we want even for MOVI).

Thanks for this explanation, helped a lot :)

I will play around getting rid of that dodgy rev.

Cheers!

Asiri.

rmaprath abandoned this revision. Sep 3 2014, 6:11 AM

Revision Contents

Path                                               Size
lib/Target/AArch64/AArch64ISelLowering.cpp         66 lines
test/CodeGen/AArch64/aarch64-big-endian-movi.ll    227 lines

Diff 13026

lib/Target/AArch64/AArch64ISelLowering.cpp

Context not available.
   return GenerateTBL(Op, ShuffleMask, DAG);
 }

+//
+// In little-endian mode, CnstBits and UndefBits will consist of repeated
+// copies of the underlying SplatBits and SplatUndef from isConstantSplat().
+// In big-endian mode, the same will be reversed by chunks of lane width.
+//
+// Example: <i16 0, i16 1, i16 0, i16 1>
+//
+// little-endian:
+// CnstBits -> [0x00010000, 0x00010000]
+// UndefBits -> [0x00010000, 0x00010000]
+// big-endian:
+// CnstBits -> [0x0000, 0x0001, 0x0000, 0x0001]
+// UndefBits -> [0x0000, 0x0001, 0x0000, 0x0001]
+//
 static bool resolveBuildVector(BuildVectorSDNode *BVN, APInt &CnstBits,
-                               APInt &UndefBits) {
+                               APInt &UndefBits, bool isBigEndian) {
   EVT VT = BVN->getValueType(0);
+  CnstBits = APInt(VT.getSizeInBits(), 0);
+  UndefBits = APInt(VT.getSizeInBits(), 0);
   APInt SplatBits, SplatUndef;
   unsigned SplatBitSize;
   bool HasAnyUndefs;
Context not available.
       UndefBits <<= SplatBitSize;
       CnstBits |= SplatBits.zextOrTrunc(VT.getSizeInBits());
       UndefBits |= (SplatBits ^ SplatUndef).zextOrTrunc(VT.getSizeInBits());
+    }
+
+    // In big endian mode, the underlying lanes are reversed.
+    if (isBigEndian) {
+      APInt BeCnstBits(VT.getSizeInBits(), 0), BeUndefBits(VT.getSizeInBits(), 0);
+      unsigned Sz = BVN->getValueType(0).getVectorElementType().getSizeInBits();
+      APInt Mask = APInt::getAllOnesValue(Sz);
+      Mask = Mask.zextOrTrunc(VT.getSizeInBits());
+      for (unsigned I = 0; I < BVN->getValueType(0).getVectorNumElements(); ++I) {
+        BeCnstBits <<= Sz;
+        BeUndefBits <<= Sz;
+        BeCnstBits |= CnstBits.lshr(I * Sz) & Mask;
+        BeUndefBits |= UndefBits.lshr(I * Sz) & Mask;
+      }
+
+      CnstBits = BeCnstBits;
+      UndefBits = BeUndefBits;
     }

     return true;
Context not available.
   if (!BVN)
     return Op;

-  APInt CnstBits(VT.getSizeInBits(), 0);
-  APInt UndefBits(VT.getSizeInBits(), 0);
-  if (resolveBuildVector(BVN, CnstBits, UndefBits)) {
+  APInt CnstBits, UndefBits;
+  if (resolveBuildVector(BVN, CnstBits, UndefBits, !Subtarget->isLittleEndian())) {
     // We only have BIC vector immediate instruction, which is and-not.
     CnstBits = ~CnstBits;

Context not available.
   if (!BVN)
     return Op;

-  APInt CnstBits(VT.getSizeInBits(), 0);
-  APInt UndefBits(VT.getSizeInBits(), 0);
-  if (resolveBuildVector(BVN, CnstBits, UndefBits)) {
+  APInt CnstBits, UndefBits;
+  if (resolveBuildVector(BVN, CnstBits, UndefBits, !Subtarget->isLittleEndian())) {
     // We make use of a little bit of goto ickiness in order to avoid having to
     // duplicate the immediate matching logic for the undef toggled case.
     bool SecondTry = false;
Context not available.
   Op = NormalizeBuildVector(Op, DAG);
   BuildVectorSDNode *BVN = cast<BuildVectorSDNode>(Op.getNode());

-  APInt CnstBits(VT.getSizeInBits(), 0);
-  APInt UndefBits(VT.getSizeInBits(), 0);
-  if (resolveBuildVector(BVN, CnstBits, UndefBits)) {
+  APInt CnstBits, UndefBits;
+  if (resolveBuildVector(BVN, CnstBits, UndefBits, !Subtarget->isLittleEndian())) {
     // We make use of a little bit of goto ickiness in order to avoid having to
     // duplicate the immediate matching logic for the undef toggled case.
     bool SecondTry = false;
Context not available.

 static SDValue EmitVectorComparison(SDValue LHS, SDValue RHS,
                                     AArch64CC::CondCode CC, bool NoNans, EVT VT,
-                                    SDLoc dl, SelectionDAG &DAG) {
+                                    SDLoc dl, SelectionDAG &DAG,
+                                    bool isBigEndian) {
   EVT SrcVT = LHS.getValueType();

   BuildVectorSDNode *BVN = dyn_cast<BuildVectorSDNode>(RHS.getNode());
-  APInt CnstBits(VT.getSizeInBits(), 0);
-  APInt UndefBits(VT.getSizeInBits(), 0);
-  bool IsCnst = BVN && resolveBuildVector(BVN, CnstBits, UndefBits);
+  APInt CnstBits, UndefBits;
+  bool IsCnst = BVN && resolveBuildVector(BVN, CnstBits, UndefBits, isBigEndian);
   bool IsZero = IsCnst && (CnstBits == 0);

   if (SrcVT.getVectorElementType().isFloatingPoint()) {
Context not available.
     assert(LHS.getValueType() == RHS.getValueType());
     AArch64CC::CondCode AArch64CC = changeIntCCToAArch64CC(CC);
     return EmitVectorComparison(LHS, RHS, AArch64CC, false, Op.getValueType(),
-                                dl, DAG);
+                                dl, DAG, !Subtarget->isLittleEndian());
   }

   assert(LHS.getValueType().getVectorElementType() == MVT::f32 ||
Context not available.

   bool NoNaNs = getTargetMachine().Options.NoNaNsFPMath;
   SDValue Cmp =
-      EmitVectorComparison(LHS, RHS, CC1, NoNaNs, Op.getValueType(), dl, DAG);
+      EmitVectorComparison(LHS, RHS, CC1, NoNaNs, Op.getValueType(), dl, DAG,
+                           !Subtarget->isLittleEndian());
   if (!Cmp.getNode())
     return SDValue();

   if (CC2 != AArch64CC::AL) {
     SDValue Cmp2 =
-        EmitVectorComparison(LHS, RHS, CC2, NoNaNs, Op.getValueType(), dl, DAG);
+        EmitVectorComparison(LHS, RHS, CC2, NoNaNs, Op.getValueType(), dl, DAG,
+                             !Subtarget->isLittleEndian());
     if (!Cmp2.getNode())
       return SDValue();

Context not available.
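
The lane reversal that this diff adds to resolveBuildVector can be sketched outside LLVM. The following is a minimal illustration and not the patch itself: it assumes a 64-bit vector held in a plain uint64_t rather than an APInt, and the names reverseLanes64 and examplePacked are made up for this example. It shows why the low bit of the packed constant stops describing element 0 once the lanes are swapped, which is the "CnstBits & 1" concern raised in the review.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the big-endian loop in the patch: reverse the
// order of the lanes in a packed 64-bit constant. laneBits must be 8, 16
// or 32 (a single 64-bit lane has nothing to reverse).
inline uint64_t reverseLanes64(uint64_t bits, unsigned laneBits) {
  assert(laneBits == 8 || laneBits == 16 || laneBits == 32);
  const uint64_t mask = (uint64_t{1} << laneBits) - 1;
  uint64_t out = 0;
  for (unsigned i = 0; i < 64 / laneBits; ++i) {
    out <<= laneBits;                        // make room for the next lane
    out |= (bits >> (i * laneBits)) & mask;  // append lane i, lowest first
  }
  return out;
}

// <i16 1, i16 0, i16 1, i16 0> packed with element 0 in the low bits.
inline uint64_t examplePacked() { return 0x0000000100000001ULL; }
```

With these definitions, reverseLanes64(examplePacked(), 16) is 0x0001000000010000, so the low bit is 1 before the reversal and 0 after it. Each 32-bit chunk of the reversed value is 0x00010000, which lines up with the "movi ..., #0x1, lsl #16" pattern that the test file expects for this input.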

test/CodeGen/AArch64/aarch64-big-endian-movi.ll

This file was added.

				; RUN: llc -mtriple=aarch64_be--linux-gnu < %s | FileCheck %s

				; CHECK-LABEL: f:
				define i16 @f(<4 x i16> %arg) nounwind {
				; CHECK: rev64 v[[REG:[0-9]+]].4h, v[[REG]].4h
				; CHECK-NEXT: umov w{{[0-9]+}}, v[[REG]].h[0]
				; CHECK-NEXT: ret
				%v = extractelement <4 x i16> %arg, i32 0
				ret i16 %v
				}

				; CHECK-LABEL: g:
				define i16 @g(<8 x i16> %arg) nounwind {
				; CHECK: rev64 v[[REG:[0-9]+]].8h, v[[REG]].8h
				; CHECK-NEXT: umov w{{[0-9]+}}, v[[REG]].h[0]
				; CHECK-NEXT: ret
				%v = extractelement <8 x i16> %arg, i32 0
				ret i16 %v
				}

				; CHECK-LABEL: symmetric:
				define i32 @symmetric() {
				; #### ModImmType1 ####
				; CHECK: movi v[[REG:[0-9]+]].2s, #0x1, lsl #16
				; CHECK-NEXT: rev64 v[[REG]].2s, v[[REG]].2s
				; CHECK-NEXT: bl f
				call i16 @f(<4 x i16> <i16 1, i16 0, i16 1, i16 0>)
				; CHECK: movi v[[REG:[0-9]+]].4s, #0x1, lsl #16
				; CHECK-NEXT: rev64 v[[REG]].4s, v[[REG]].4s
				; CHECK-NEXT: ext v[[REG]].16b, v[[REG]].16b, v[[REG]].16b, #8
				; CHECK-NEXT: bl g
				call i16 @g(<8 x i16> <i16 1, i16 0, i16 1, i16 0, i16 1, i16 0, i16 1, i16 0>)

				; #### ModImmType2 ####
				; CHECK: movi v[[REG:[0-9]+]].2s, #0x1, lsl #24
				; CHECK-NEXT: rev64 v[[REG]].2s, v[[REG]].2s
				; CHECK-NEXT: bl f
				call i16 @f(<4 x i16> <i16 256, i16 0, i16 256, i16 0>)
				; CHECK: movi v[[REG:[0-9]+]].4s, #0x1, lsl #24
				; CHECK-NEXT: rev64 v[[REG]].4s, v[[REG]].4s
				; CHECK-NEXT: ext v[[REG]].16b, v[[REG]].16b, v[[REG]].16b, #8
				; CHECK-NEXT: bl g
				call i16 @g(<8 x i16> <i16 256, i16 0, i16 256, i16 0, i16 256, i16 0, i16 256, i16 0>)

				; #### ModImmType3 ####
				; CHECK: movi v[[REG:[0-9]+]].2s, #0x1
				; CHECK-NEXT: rev64 v[[REG]].2s, v[[REG]].2s
				; CHECK-NEXT: bl f
				call i16 @f(<4 x i16> <i16 0, i16 1, i16 0, i16 1>)
				; CHECK: movi v[[REG:[0-9]+]].4s, #0x1
				; CHECK-NEXT: rev64 v[[REG]].4s, v[[REG]].4s
				; CHECK-NEXT: ext v[[REG]].16b, v[[REG]].16b, v[[REG]].16b, #8
				; CHECK-NEXT: bl g
				call i16 @g(<8 x i16> <i16 0, i16 1, i16 0, i16 1, i16 0, i16 1, i16 0, i16 1>)

				; #### ModImmType4 #####
				; CHECK: movi v[[REG:[0-9]+]].2s, #0x1, lsl #8
				; CHECK-NEXT: rev64 v[[REG]].2s, v[[REG]].2s
				; CHECK-NEXT: bl f
				call i16 @f(<4 x i16> <i16 0, i16 256, i16 0, i16 256>)
				; CHECK: movi v[[REG:[0-9]+]].4s, #0x1, lsl #8
				; CHECK-NEXT: rev64 v[[REG]].4s, v[[REG]].4s
				; CHECK-NEXT: ext v[[REG]].16b, v[[REG]].16b, v[[REG]].16b, #8
				; CHECK-NEXT: bl g
				call i16 @g(<8 x i16> <i16 0, i16 256, i16 0, i16 256, i16 0, i16 256, i16 0, i16 256>)

				; #### ModImmType5 ####
				; CHECK: movi v[[REG:[0-9]+]].4h, #0x1
				; CHECK-NEXT: rev64 v[[REG]].4h, v[[REG]].4h
				; CHECK-NEXT: bl f
				call i16 @f(<4 x i16> <i16 1, i16 1, i16 1, i16 1>)
				; CHECK: movi v[[REG:[0-9]+]].8h, #0x1
				; CHECK-NEXT: rev64 v[[REG]].8h, v[[REG]].8h
				; CHECK-NEXT: ext v[[REG]].16b, v[[REG]].16b, v[[REG]].16b, #8
				; CHECK-NEXT: bl g
				call i16 @g(<8 x i16> <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>)

				; #### ModImmType6 ####
				; CHECK: movi v[[REG:[0-9]+]].4h, #0x1, lsl #8
				; CHECK-NEXT: rev64 v[[REG]].4h, v[[REG]].4h
				; CHECK-NEXT: bl f
				call i16 @f(<4 x i16> <i16 256, i16 256, i16 256, i16 256>)
				; CHECK: movi v[[REG:[0-9]+]].8h, #0x1, lsl #8
				; CHECK-NEXT: rev64 v[[REG]].8h, v[[REG]].8h
				; CHECK-NEXT: ext v[[REG]].16b, v[[REG]].16b, v[[REG]].16b, #8
				; CHECK-NEXT: bl g
				call i16 @g(<8 x i16> <i16 256, i16 256, i16 256, i16 256, i16 256, i16 256, i16 256, i16 256>)

				; #### ModImmType9 ####
				; CHECK: movi v[[REG:[0-9]+]].8b, #0x1
				; CHECK-NEXT: rev64 v[[REG]].8b, v[[REG]].8b
				; CHECK-NEXT: bl f
				call i16 @f(<4 x i16> <i16 257, i16 257, i16 257, i16 257>)
				; CHECK: movi v[[REG:[0-9]+]].16b, #0x1
				; CHECK-NEXT: rev64 v[[REG]].16b, v[[REG]].16b
				; CHECK-NEXT: ext v[[REG]].16b, v[[REG]].16b, v[[REG]].16b, #8
				; CHECK-NEXT: bl g
				call i16 @g(<8 x i16> <i16 257, i16 257, i16 257, i16 257, i16 257, i16 257, i16 257, i16 257>)

				; #### ModImmType10 ####
				; CHECK: movi d{{[0-9]+}}, #0xffff0000ffff0000
				; CHECK-NEXT: bl f
				call i16 @f(<4 x i16> <i16 -1, i16 0, i16 -1, i16 0>)
				; CHECK: movi v[[REG:[0-9]+]].2d, #0xffff0000ffff0000
				; CHECK-NEXT: ext v[[REG]].16b, v[[REG]].16b, v[[REG]].16b, #8
				; CHECK-NEXT: bl g
				call i16 @g(<8 x i16> <i16 -1, i16 0, i16 -1, i16 0, i16 -1, i16 0, i16 -1, i16 0>)

				ret i32 0
				}

				; #### ModImmType7 ####
				define i32 @ModImmType7() {
				; CHECK: [[LBL1:LCPI...]]:
				; CHECK-NEXT: .hword 511 // 0x1ff
				; CHECK-NEXT: .hword 0 // 0x0
				; CHECK-NEXT: .hword 511 // 0x1ff
				; CHECK-NEXT: .hword 0 // 0x0

				; CHECK: [[LBL2:LCPI...]]:
				; CHECK-NEXT: .hword 511 // 0x1ff
				; CHECK-NEXT: .hword 0 // 0x0
				; CHECK-NEXT: .hword 511 // 0x1ff
				; CHECK-NEXT: .hword 0 // 0x0
				; CHECK-NEXT: .hword 511 // 0x1ff
				; CHECK-NEXT: .hword 0 // 0x0
				; CHECK-NEXT: .hword 511 // 0x1ff
				; CHECK-NEXT: .hword 0 // 0x0

				; CHECK-LABEL: ModImmType7
				; CHECK: adrp x[[REG:[0-9]+]], .[[LBL1]]
				; CHECK-NEXT: ldr d{{[0-9]+}}, [x[[REG]], :lo12:.[[LBL1]]]
				; CHECK-NEXT: bl f
				call i16 @f(<4 x i16> <i16 511, i16 0, i16 511, i16 0>)

				; CHECK: adrp x[[REG:[0-9]+]], .[[LBL2]]
				; CHECK-NEXT: ldr q{{[0-9]+}}, [x[[REG]], :lo12:.[[LBL2]]]
				; CHECK-NEXT: bl g
				call i16 @g(<8 x i16> <i16 511, i16 0, i16 511, i16 0, i16 511, i16 0, i16 511, i16 0>)

				ret i32 0
				}

				; #### ModImmType8 ####
				define i32 @ModImmType8() {
				; CHECK: [[LBL1:LCPI...]]:
				; CHECK-NEXT: .hword 65535 // 0xffff
				; CHECK-NEXT: .hword 1 // 0x1
				; CHECK-NEXT: .hword 65535 // 0xffff
				; CHECK-NEXT: .hword 1 // 0x1

				; CHECK: [[LBL2:LCPI...]]:
				; CHECK-NEXT: .hword 65535 // 0xffff
				; CHECK-NEXT: .hword 1 // 0x1
				; CHECK-NEXT: .hword 65535 // 0xffff
				; CHECK-NEXT: .hword 1 // 0x1
				; CHECK-NEXT: .hword 65535 // 0xffff
				; CHECK-NEXT: .hword 1 // 0x1
				; CHECK-NEXT: .hword 65535 // 0xffff
				; CHECK-NEXT: .hword 1 // 0x1

				; CHECK-LABEL: ModImmType8
				; CHECK: adrp x[[REG:[0-9]+]], .[[LBL1]]
				; CHECK-NEXT: ldr d{{[0-9]+}}, [x[[REG]], :lo12:.[[LBL1]]]
				; CHECK-NEXT: bl f
				call i16 @f(<4 x i16> <i16 65535, i16 1, i16 65535, i16 1>)

				; CHECK: adrp x[[REG:[0-9]+]], .[[LBL2]]
				; CHECK-NEXT: ldr q{{[0-9]+}}, [x[[REG]], :lo12:.[[LBL2]]]
				; CHECK-NEXT: bl g
				call i16 @g(<8 x i16> <i16 65535, i16 1, i16 65535, i16 1, i16 65535, i16 1, i16 65535, i16 1>)

				ret i32 0
				}

				; #### ModImmType11 ####
				define i32 @ModImmType11() {
				; CHECK: [[LBL1:LCPI...]]:
				; CHECK-NEXT: .hword 0 // 0x0
				; CHECK-NEXT: .hword 16392 // 0x4008
				; CHECK-NEXT: .hword 0 // 0x0
				; CHECK-NEXT: .hword 16392 // 0x4008

				; CHECK: [[LBL2:LCPI...]]:
				; CHECK-NEXT: .hword 0 // 0x0
				; CHECK-NEXT: .hword 16392 // 0x4008
				; CHECK-NEXT: .hword 0 // 0x0
				; CHECK-NEXT: .hword 16392 // 0x4008
				; CHECK-NEXT: .hword 0 // 0x0
				; CHECK-NEXT: .hword 16392 // 0x4008
				; CHECK-NEXT: .hword 0 // 0x0
				; CHECK-NEXT: .hword 16392 // 0x4008

				; CHECK-LABEL: ModImmType11
				; CHECK: adrp x[[REG:[0-9]+]], .[[LBL1]]
				; CHECK-NEXT: ldr d{{[0-9]+}}, [x[[REG]], :lo12:.[[LBL1]]]
				; CHECK-NEXT: bl f
				call i16 @f(<4 x i16> <i16 0, i16 16392, i16 0, i16 16392>)

				; CHECK: adrp x[[REG:[0-9]+]], .[[LBL2]]
				; CHECK-NEXT: ldr q{{[0-9]+}}, [x[[REG]], :lo12:.[[LBL2]]]
				; CHECK-NEXT: bl g
				call i16 @g(<8 x i16> <i16 0, i16 16392, i16 0, i16 16392, i16 0, i16 16392, i16 0, i16 16392>)

				ret i32 0
				}

				; #### ModImmType12 ####
				define i32 @ModImmType12() {
				; CHECK: [[LBL2:LCPI...]]:
				; CHECK-NEXT: .hword 0 // 0x0
				; CHECK-NEXT: .hword 0 // 0x0
				; CHECK-NEXT: .hword 0 // 0x0
				; CHECK-NEXT: .hword 16384 // 0x4000
				; CHECK-NEXT: .hword 0 // 0x0
				; CHECK-NEXT: .hword 0 // 0x0
				; CHECK-NEXT: .hword 0 // 0x0
				; CHECK-NEXT: .hword 16384 // 0x4000

				; CHECK-LABEL: ModImmType12
				; CHECK: adrp x[[REG:[0-9]+]], .[[LBL2]]
				; CHECK-NEXT: ldr q{{[0-9]+}}, [x[[REG]], :lo12:.[[LBL2]]]
				; CHECK-NEXT: bl g
				call i16 @g(<8 x i16> <i16 0, i16 0, i16 0, i16 16384, i16 0, i16 0, i16 0, i16 16384>)

				ret i32 0
				}
