This is an archive of the discontinued LLVM Phabricator instance.

Differential D4135

ARMEB: Fix trunc store for vector types
ClosedPublic

Authored by cpirker on Jun 13 2014, 8:19 AM.

Details

Reviewers

jmolloy

Summary

Hi all,

The ARM backend transforms a trunc store into shuffle and store operations.
A vector (or multiple scalars) is packed into one wide value so that it can be stored with fewer memory operations.
The shuffle operation extracts the narrowed elements to mimic the trunc operation.
Currently, the shuffle indices assume little-endian byte order.

On big-endian targets, this patch calculates the shuffle indices for the least significant data (the truncated value) from the "higher side" of each vector element.

Please review.

Thanks,
Christian
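
The index calculation described in the summary can be sketched in isolation. This is only an illustrative model, not the code from the patch: `truncShuffleMask` and its parameters are hypothetical names, and a plain `std::vector<int>` stands in for the `SmallVector<int, 8>` used in the backend. Entries of -1 mark lanes whose value is undefined.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of LLVM): builds the shuffle mask that moves
// the narrow lane holding the least significant part of each wide element to
// the front of the vector. Unused lanes stay at -1 (undefined).
std::vector<int> truncShuffleMask(unsigned numElems, unsigned sizeRatio,
                                  bool bigEndian) {
  assert(sizeRatio > 0 && "wide element must contain at least one narrow lane");
  std::vector<int> mask(numElems * sizeRatio, -1);
  for (unsigned i = 0; i < numElems; ++i) {
    // Little endian: the low part is the first lane of each wide element.
    // Big endian: the low part is the last lane of each wide element.
    mask[i] = bigEndian ? static_cast<int>((i + 1) * sizeRatio - 1)
                        : static_cast<int>(i * sizeRatio);
  }
  return mask;
}
```

For a `<2 x i64>` to `<2 x i16>` truncation (two elements, size ratio 4), this model selects lanes 3 and 7 on big endian and lanes 0 and 4 on little endian.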

Diff Detail

Event Timeline

cpirker updated this revision to Diff 10391. Jun 13 2014, 8:19 AM

cpirker retitled this revision to ARMEB: Fix trunc store for vector types.

cpirker updated this object.

cpirker edited the test plan for this revision.

cpirker added subscribers: Unknown Object (MLST), Konrad.

Herald added a subscriber: aemerson. Jun 13 2014, 8:19 AM

Hi Christian,

This looks good to me, I have only one comment that should be trivial to address before commit.

Cheers,

James

test/CodeGen/ARM/big-endian-neon-trunc-store.ll
17	I just tried your patch, and this test case is producing another REV just above this line. Is that expected? If so, please add it as a CHECK line so that it is obviously expected behaviour.

This revision is now accepted and ready to land. Jun 15 2014, 12:48 PM

I committed this patch as rL211010.

test/CodeGen/ARM/big-endian-neon-trunc-store.ll
17	The vrev64.32 instruction that you are observing belongs to a bundle of vld and vrev that loads a vector from memory. This load instruction sequence is not touched by this patch, so a CHECK line for the vrev instruction is purposely not included.

Revision Contents

Path	Size
lib/Target/ARM/ARMISelLowering.cpp	3 lines
test/CodeGen/ARM/big-endian-neon-trunc-store.ll	26 lines

Diff 10391

lib/Target/ARM/ARMISelLowering.cpp

  if (St->isTruncatingStore() && VT.isVector()) {

    // Create a type on which we perform the shuffle.
    EVT WideVecVT = EVT::getVectorVT(*DAG.getContext(), StVT.getScalarType(),
                                     NumElems*SizeRatio);
    assert(WideVecVT.getSizeInBits() == VT.getSizeInBits());

    SDLoc DL(St);
    SDValue WideVec = DAG.getNode(ISD::BITCAST, DL, WideVecVT, StVal);
    SmallVector<int, 8> ShuffleVec(NumElems * SizeRatio, -1);
-   for (unsigned i = 0; i < NumElems; ++i) ShuffleVec[i] = i * SizeRatio;
+   for (unsigned i = 0; i < NumElems; ++i)
+     ShuffleVec[i] = TLI.isBigEndian() ? (i+1) * SizeRatio - 1 : i * SizeRatio;

    // Can't shuffle using an illegal type.
    if (!TLI.isTypeLegal(WideVecVT)) return SDValue();

    SDValue Shuff = DAG.getVectorShuffle(WideVecVT, DL, WideVec,
                                DAG.getUNDEF(WideVec.getValueType()),
                                ShuffleVec.data());
    // At this point all of the data is stored at the bottom of the
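
Why the big-endian index is `(i+1) * SizeRatio - 1` can be checked with a small standalone model. The sketch below is an assumption-laden illustration, not backend code: `bitcastToI16Lanes` and `truncViaShuffle` are hypothetical names, the element sizes are fixed at 64 and 16 bits, and the bitcast is modelled as reinterpreting each element's bytes in memory order, so lane 0 of a big-endian element is its most significant part.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of the BITCAST: split each 64-bit element into four
// 16-bit lanes in memory order. On big endian the most significant lane comes
// first; on little endian the least significant lane comes first.
std::vector<uint16_t> bitcastToI16Lanes(const std::vector<uint64_t> &elems,
                                        bool bigEndian) {
  std::vector<uint16_t> lanes;
  for (uint64_t e : elems) {
    for (unsigned j = 0; j < 4; ++j) {
      unsigned shift = bigEndian ? 16 * (3 - j) : 16 * j;
      lanes.push_back(static_cast<uint16_t>(e >> shift));
    }
  }
  return lanes;
}

// Hypothetical model of the shuffle: pick, for each wide element, the lane
// that holds its low 16 bits, using the index formula from the patch.
std::vector<uint16_t> truncViaShuffle(const std::vector<uint64_t> &elems,
                                      bool bigEndian) {
  const unsigned sizeRatio = 4; // 64-bit elements split into 16-bit lanes
  std::vector<uint16_t> lanes = bitcastToI16Lanes(elems, bigEndian);
  std::vector<uint16_t> out;
  for (unsigned i = 0; i < elems.size(); ++i) {
    unsigned idx = bigEndian ? (i + 1) * sizeRatio - 1 : i * sizeRatio;
    assert(idx < lanes.size());
    out.push_back(lanes[idx]);
  }
  return out;
}
```

Under this model both byte orders yield the low 16 bits of each element, whereas using `i * SizeRatio` on big endian would select the most significant lane instead.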

test/CodeGen/ARM/big-endian-neon-trunc-store.ll

				; RUN: llc < %s -mtriple armeb-eabi -mattr v7,neon -o - | FileCheck %s

				define void @vector_trunc_store_2i64_to_2i16( <2 x i64>* %loadaddr, <2 x i16>* %storeaddr ) {
				; CHECK-LABEL: vector_trunc_store_2i64_to_2i16:
				; CHECK: vmovn.i64 [[REG:d[0-9]+]]
				; CHECK: vrev32.16 [[REG]], [[REG]]
				; CHECK: vuzp.16 [[REG]], [[REG2:d[0-9]+]]
				; CHECK: vrev32.16 [[REG]], [[REG2]]
				%1 = load <2 x i64>* %loadaddr
				%2 = trunc <2 x i64> %1 to <2 x i16>
				store <2 x i16> %2, <2 x i16>* %storeaddr
				ret void
				}

				define void @vector_trunc_store_4i32_to_4i8( <4 x i32>* %loadaddr, <4 x i8>* %storeaddr ) {
				; CHECK-LABEL: vector_trunc_store_4i32_to_4i8:
				; CHECK: vmovn.i32 [[REG:d[0-9]+]]
				; CHECK: vrev16.8 [[REG]], [[REG]]
				; CHECK: vuzp.8 [[REG]], [[REG2:d[0-9]+]]
				; CHECK: vrev32.8 [[REG]], [[REG2]]
				%1 = load <4 x i32>* %loadaddr
				%2 = trunc <4 x i32> %1 to <4 x i8>
				store <4 x i8> %2, <4 x i8>* %storeaddr
				ret void
				}