This is an archive of the discontinued LLVM Phabricator instance.

Differential D146176

[RISCV] Don't accidentally match deinterleave masks as interleaves
Closed, Public

Authored by luke on Mar 15 2023, 4:10 PM.

Details

Reviewers

reames

Commits

rG4e1ba0c51868: [RISCV] Don't accidentally match deinterleave masks as interleaves

Summary

Consider a shuffle mask of <0, 2>.
This is one of the two masks that deinterleave a vector of 4
elements with factor 2.
Unfortunately, it is also technically an interleave mask, in which
two subvectors of length 1, starting at indexes 0 and 2, are interleaved.
This is because a mask can interleave non-contiguous subvectors,
e.g. <0, 6, 4, 1, 7, 5> (factor 3) on a vector of size 8:

<0 1 2 3 4 5 6 7> indices
 ^ ^     ^ ^ ^ ^
 0 0     2 2 1 1  deinterleaved subvector
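
The lane structure in the diagram can be checked mechanically. The following is a minimal C++ sketch, not LLVM's implementation: strideMaskSketch and laneStartsSketch are hypothetical stand-ins written for illustration, undef (-1) indices are not modelled, and no bound on the input size is applied.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a stride (deinterleave) mask:
// <Start, Start + Stride, Start + 2 * Stride, ...> with VF elements.
std::vector<int> strideMaskSketch(int Start, int Stride, int VF) {
  std::vector<int> Mask;
  for (int I = 0; I < VF; ++I)
    Mask.push_back(Start + I * Stride);
  return Mask;
}

// Hypothetical helper: treats Mask as Factor interleaved lanes and succeeds
// if every lane is a run of consecutive indices. Lanes need not be adjacent
// to each other. On success, Starts holds the first index of each lane.
// Assumption: all indices are defined (no -1 entries).
bool laneStartsSketch(const std::vector<int> &Mask, int Factor,
                      std::vector<int> &Starts) {
  assert(Factor > 0 && "Factor must be positive");
  Starts.clear();
  const int NumElts = static_cast<int>(Mask.size());
  if (NumElts == 0 || NumElts % Factor != 0)
    return false;
  const int LaneLen = NumElts / Factor;
  for (int I = 0; I < Factor; ++I) {
    if (Mask[I] < 0)
      return false;
    // Element J of lane I sits at position J * Factor + I in the mask.
    for (int J = 1; J < LaneLen; ++J)
      if (Mask[J * Factor + I] != Mask[I] + J)
        return false;
    Starts.push_back(Mask[I]);
  }
  return true;
}
```

Under these assumptions, <0, 6, 4, 1, 7, 5> with Factor 3 splits into lanes starting at 0, 6 and 4, matching the diagram. The stride-2 mask <0, 2> produced by strideMaskSketch(0, 2, 2) also passes with Factor 2, as two length-1 lanes starting at 0 and 2, which is the ambiguity described here.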

This means that deinterleaving shuffles can accidentally be costed as
interleaves.
This is incorrect in the context of interleaves, because the
only interleave shuffles we model at the moment are single-source
permutation shuffles (SK_PermuteSingleSrc), i.e. we interleave the
first vector below and ignore the second:

shufflevector <2 x i32> %v0, <2 x i32> poison, <2 x i32> <i32 0, i32 2>

A mask of <0, 2>, by contrast, interleaves across both vectors: index 2
is the first element of the second operand.
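
To see why index 2 crosses into the second operand: shufflevector numbers the elements of its two operands consecutively, so with operands of length N, indices 0..N-1 select from the first operand and N..2N-1 from the second. A minimal sketch of that numbering follows; describeSelection and the v0/v1 labels are illustrative names, not LLVM APIs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: for each mask index, name the operand and element it
// selects, given two operands of N elements each. -1 denotes an undefined
// mask element; anything else outside [0, 2N) is reported as invalid.
std::vector<std::string> describeSelection(const std::vector<int> &Mask,
                                           int N) {
  assert(N > 0 && "operands must have at least one element");
  std::vector<std::string> Out;
  for (int Idx : Mask) {
    if (Idx == -1)
      Out.push_back("undef");
    else if (Idx < 0 || Idx >= 2 * N)
      Out.push_back("invalid");
    else if (Idx < N)
      Out.push_back("v0[" + std::to_string(Idx) + "]");
    else
      Out.push_back("v1[" + std::to_string(Idx - N) + "]");
  }
  return Out;
}
```

For the <2 x i32> example above, describeSelection({0, 2}, 2) names v0[0] and v1[0], so the second mask element reads from the poison operand rather than from %v0.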

The fix here is to set NumInputElts correctly. We were setting it to
twice the mask length, i.e. treating both input vectors as in use. But
we are only using the first vector here, and isInterleaveMask
already has logic to ensure that the mask indices stay within the bounds
of the input vectors.
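
The effect of the bound can be sketched as follows. This is a simplified model of the check described above, not the body of ShuffleVectorInst::isInterleaveMask: the name isInterleaveMaskSketch is hypothetical, undef indices are not modelled, and only the consecutive-lane rule and the NumInputElts bound are checked.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified interleave-mask check. Mask is split into Factor
// lanes; each lane must be a run of consecutive indices that lies entirely
// within the first NumInputElts input elements.
// Assumption: all indices are defined (no -1 entries).
bool isInterleaveMaskSketch(const std::vector<int> &Mask, int Factor,
                            int NumInputElts) {
  assert(Factor > 0 && "Factor must be positive");
  const int NumElts = static_cast<int>(Mask.size());
  if (NumElts == 0 || NumElts % Factor != 0)
    return false;
  const int LaneLen = NumElts / Factor;
  for (int I = 0; I < Factor; ++I) {
    const int Start = Mask[I];
    // Reject lanes that start or end outside the permitted input range.
    if (Start < 0 || Start + LaneLen > NumInputElts)
      return false;
    // Element J of lane I sits at position J * Factor + I in the mask.
    for (int J = 1; J < LaneLen; ++J)
      if (Mask[J * Factor + I] != Start + J)
        return false;
  }
  return true;
}
```

With Mask = <0, 2> and Factor = 2, passing NumInputElts = 4 (twice the mask length, the old behaviour) accepts the mask, while NumInputElts = 2 (the mask length) rejects it because the lane starting at index 2 falls outside the single input vector. A single-vector interleave such as <0, 2, 1, 3> with NumInputElts = 4 is still accepted.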

This lacks a test case because we are unable to test deinterleave
shuffles (they are length changing), but it is covered by the tests
in D145155.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

luke created this revision. Mar 15 2023, 4:10 PM

Herald added a project: Restricted Project. Mar 15 2023, 4:10 PM

Herald added subscribers: jobnoorman, asb, pmatos and 29 others.

luke requested review of this revision. Mar 15 2023, 4:10 PM

Herald added a project: Restricted Project. Mar 15 2023, 4:10 PM

Herald added subscribers: llvm-commits, pcwang-thead, eopXD, MaskRay.

luke added a child revision: D145155: [RISCV] Enable interleaved access vectorization. Mar 15 2023, 4:11 PM

luke mentioned this in D145155: [RISCV] Enable interleaved access vectorization. Mar 15 2023, 4:17 PM

luke edited the summary of this revision. Mar 15 2023, 4:25 PM

Harbormaster completed remote builds in B219748: Diff 505647. Mar 15 2023, 5:26 PM

LGTM

This revision is now accepted and ready to land. Mar 16 2023, 7:33 AM

Rebase

This revision was landed with ongoing or failed builds. Mar 16 2023, 8:49 AM

Closed by commit rG4e1ba0c51868: [RISCV] Don't accidentally match deinterleave masks as interleaves (authored by luke).

This revision was automatically updated to reflect the committed changes.

luke added a commit: rG4e1ba0c51868: [RISCV] Don't accidentally match deinterleave masks as interleaves.

Harbormaster completed remote builds in B219884: Diff 505825. Mar 16 2023, 10:08 AM

Revision Contents

Path                                                 Size
llvm/include/llvm/IR/Instructions.h                  4 lines
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp   3 lines

Diff 505647

llvm/include/llvm/IR/Instructions.h

 public:
   /// E.g. For a Factor of 4 (LaneLen=2):
   /// <0, 2, 6, 4, 1, 3, 7, 5>
   ///
   /// NumInputElts is the total number of elements in the input vectors.
   ///
   /// StartIndexes are the first indexes of each vector being interleaved,
   /// substituting any indexes that were undef
   /// E.g. <4, -1, 2, 5, 1, 3> (Factor=3): StartIndexes=<4, 0, 2>
+  ///
+  /// Note that this does not check if the input vectors are consecutive:
+  /// It will return true for masks such as
+  /// <0, 4, 6, 1, 5, 7> (Factor=3, LaneLen=2)
   static bool isInterleaveMask(ArrayRef<int> Mask, unsigned Factor,
                                unsigned NumInputElts,
                                SmallVectorImpl<unsigned> &StartIndexes);
   static bool isInterleaveMask(ArrayRef<int> Mask, unsigned Factor,
                                unsigned NumInputElts) {
     SmallVector<unsigned, 8> StartIndexes;
     return isInterleaveMask(Mask, Factor, NumInputElts, StartIndexes);
   }

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

   default:
     break;
   case TTI::SK_PermuteSingleSrc: {
     if (Mask.size() >= 2 && LT.second.isFixedLengthVector()) {
       MVT EltTp = LT.second.getVectorElementType();
       // If the size of the element is < ELEN then shuffles of interleaves and
       // deinterleaves of 2 vectors can be lowered into the following
       // sequences
       if (EltTp.getScalarSizeInBits() < ST->getELEN()) {
-        auto InterleaveMask = createInterleaveMask(Mask.size() / 2, 2);
         // Example sequence:
         // vsetivli zero, 4, e8, mf4, ta, ma (ignored)
         // vwaddu.vv v10, v8, v9
         // li a0, -1 (ignored)
         // vwmaccu.vx v10, a0, v9
-        if (ShuffleVectorInst::isInterleaveMask(Mask, 2, Mask.size() * 2))
+        if (ShuffleVectorInst::isInterleaveMask(Mask, 2, Mask.size()))
           return 2 * LT.first * getLMULCost(LT.second);
 
         if (Mask[0] == 0 || Mask[0] == 1) {
           auto DeinterleaveMask = createStrideMask(Mask[0], 2, Mask.size());
           // Example sequence:
           // vnsrl.wi v10, v8, 0
           if (equal(DeinterleaveMask, Mask))
             return LT.first * getLMULCost(LT.second);