This is an archive of the discontinued LLVM Phabricator instance.

[NFC][X86] lowerVECTOR_SHUFFLE(): drop FIXME about widening to i128 (YMM half) element type
ClosedPublic

Authored by lebedev.ri on Jun 7 2021, 8:17 AM.


Details

Reviewers

RKSimon
spatel
craig.topper

Commits

rG308f6a5245a2: [NFC][X86] lowerVECTOR_SHUFFLE(): drop FIXME about widening to i128 (YMM half)…

Summary

As per the discussion in D103818, so far, this does not appear to be worthwhile.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lebedev.ri created this revision. Jun 7 2021, 8:17 AM

Herald added subscribers: pengfei, hiraditya. Jun 7 2021, 8:17 AM

lebedev.ri requested review of this revision. Jun 7 2021, 8:17 AM

Why is this better? vinsertf128 tends to be faster than broadcasts

llvm/test/CodeGen/X86/vector-shuffle-256-v4.ll, line 1016 (On Diff #350299): The AVX1 shuffle looks to be much better...

Harbormaster completed remote builds in B107993: Diff 350299. Jun 7 2021, 8:55 AM

lebedev.ri updated this revision to Diff 350321. Jun 7 2021, 9:14 AM

lebedev.ri retitled this revision from [X86] lowerVECTOR_SHUFFLE(): allow widening shuffle to have i128 (YMM half) element type to [NFC][X86] lowerVECTOR_SHUFFLE(): drop FIXME about widening to i128 (YMM half) element type.

lebedev.ri edited the summary of this revision.

Harbormaster completed remote builds in B108009: Diff 350321. Jun 7 2021, 9:58 AM

LGTM

This revision is now accepted and ready to land. Jun 7 2021, 12:35 PM

Closed by commit rG308f6a5245a2: [NFC][X86] lowerVECTOR_SHUFFLE(): drop FIXME about widening to i128 (YMM half)… (authored by lebedev.ri). Jun 16 2021, 12:25 AM

This revision was automatically updated to reflect the committed changes.

lebedev.ri added a commit: rG308f6a5245a2: [NFC][X86] lowerVECTOR_SHUFFLE(): drop FIXME about widening to i128 (YMM half)….

Revision Contents

Path: llvm/lib/Target/X86/X86ISelLowering.cpp
Size: 2 lines

Diff 352355

llvm/lib/Target/X86/X86ISelLowering.cpp

 static SDValue lowerVECTOR_SHUFFLE(SDValue Op, const X86Subtarget &Subtarget,
 ...
   APInt Zeroable = KnownUndef | KnownZero;
   if (Zeroable.isAllOnesValue())
     return getZeroVector(VT, Subtarget, DAG, DL);

   bool V2IsZero = !V2IsUndef && ISD::isBuildVectorAllZeros(V2.getNode());

   // Try to collapse shuffles into using a vector type with fewer elements but
   // wider element types. We cap this to not form integers or floating point
-  // elements wider than 64 bits, but it might be interesting to form i128
+  // elements wider than 64 bits. It does not seem beneficial to form i128
   // integers to handle flipping the low and high halves of AVX 256-bit vectors.
   SmallVector<int, 16> WidenedMask;
   if (VT.getScalarSizeInBits() < 64 && !Is1BitVector &&
       canWidenShuffleElements(OrigMask, Zeroable, V2IsZero, WidenedMask)) {
     // Shuffle mask widening should not interfere with a broadcast opportunity
     // by obfuscating the operands with bitcasts.
     // TODO: Avoid lowering directly from this top-level function: make this
     // a query (canLowerAsBroadcast) and defer lowering to the type-based calls.
 ...
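For readers unfamiliar with the "collapse shuffles into ... fewer elements but wider element types" step mentioned in the comment above, here is a minimal, standalone sketch of the idea. It is not LLVM's canWidenShuffleElements: the helper name widenShuffleMask is hypothetical, it uses plain std::vector rather than SmallVector, and it ignores the Zeroable and V2IsZero inputs that the real function takes. It only assumes the usual convention that a mask entry of -1 means "undef".

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper (an illustration, not the LLVM implementation).
// Tries to rewrite a shuffle mask over N elements as an equivalent mask over
// N/2 elements of twice the width. A mask entry of -1 means "undef".
// Returns std::nullopt when some pair of adjacent entries does not select one
// aligned pair of source elements.
std::optional<std::vector<int>> widenShuffleMask(const std::vector<int> &Mask) {
  if (Mask.size() % 2 != 0)
    return std::nullopt;

  std::vector<int> Widened;
  for (std::size_t I = 0; I < Mask.size(); I += 2) {
    int Lo = Mask[I], Hi = Mask[I + 1];
    if (Lo < 0 && Hi < 0) {
      // Both halves undef: the wide element is undef as well.
      Widened.push_back(-1);
    } else if (Lo >= 0 && Lo % 2 == 0 && (Hi < 0 || Hi == Lo + 1)) {
      // Low half starts an aligned pair; high half matches or is undef.
      Widened.push_back(Lo / 2);
    } else if (Lo < 0 && Hi % 2 == 1) {
      // Low half undef; high half is the upper element of an aligned pair.
      Widened.push_back(Hi / 2);
    } else {
      return std::nullopt;
    }
  }
  return Widened;
}
```

Under these assumptions, a v8i32 mask that swaps the two 128-bit halves, {4, 5, 6, 7, 0, 1, 2, 3}, widens to the four-element mask {2, 3, 0, 1} on 64-bit elements. Applying the sketch once more would give {1, 0} on 128-bit elements, which is the step that the `VT.getScalarSizeInBits() < 64` check in the hunk above prevents and that this revision decided not to pursue.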