This is an archive of the discontinued LLVM Phabricator instance.

[SLP] allow non-power-of-2 vectorization
Abandoned · Public

Authored by spatel on Aug 19 2019, 6:10 AM.

Details

Summary

From what I can tell, we are artificially restricting the pass by bailing out whenever we would vectorize to a non-power-of-2 number of elements. Everything below the part of the code changed by this patch already works as intended for calculating costs and tree elements. Still, I am proposing to add a debug flag for experimentation in case this reveals regressions.
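
A minimal sketch of the kind of gate being relaxed, for illustration only. The names `shouldTryVectorize` and `AllowNonPow2` are hypothetical stand-ins (the latter for the proposed debug flag); this is not the actual SLPVectorizer code, only a model of "bail out on non-power-of-2 unless a flag says otherwise".

```cpp
#include <cstdint>

// Same test as LLVM's isPowerOf2_32(): nonzero with exactly one bit set.
constexpr bool isPowerOf2(uint32_t N) { return N != 0 && (N & (N - 1)) == 0; }

// Hypothetical gate in front of the cost model. AllowNonPow2 stands in for
// the proposed debug flag; its real name and default may differ.
bool shouldTryVectorize(uint32_t NumElts, bool AllowNonPow2) {
  if (NumElts < 2)
    return false;  // a single scalar is not a vector candidate
  if (!isPowerOf2(NumElts) && !AllowNonPow2)
    return false;  // the artificial bail-out, e.g. for 3 lanes
  return true;     // otherwise let the cost model decide
}
```

With the flag off, a 3-element bundle such as the `<3 x float>` case from PR16739 is rejected before any costing happens; with it on, the decision falls through to the cost model, which is the part the summary argues already works.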

A test similar to the one in this diff (rL369255) shows that we can already generate a non-standard vector size (<2 x float>) and shuffle.

The motivating case is from PR16739:
https://bugs.llvm.org/show_bug.cgi?id=16739
...and after instcombine, we end up with:

define <4 x float> @PR16739_byref(<4 x float>* nocapture readonly dereferenceable(16) %x) {
  %1 = bitcast <4 x float>* %x to <3 x float>*
  %2 = load <3 x float>, <3 x float>* %1, align 4
  %i3 = shufflevector <3 x float> %2, <3 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 2>
  ret <4 x float> %i3
}
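
As a reading aid, here is a scalar C++ model of what `@PR16739_byref` computes. It models the IR above, not the original source from the bug report: only the first three floats are loaded, and the shuffle mask `<0, 1, 2, 2>` repeats element 2 in the last lane. The function name `pr16739Model` is hypothetical.

```cpp
#include <array>

// Scalar model of @PR16739_byref: a <3 x float> load from %x followed by a
// shufflevector with mask <0, 1, 2, 2> that widens the result to 4 lanes.
std::array<float, 4> pr16739Model(const std::array<float, 4>& X) {
  const float Loaded[3] = {X[0], X[1], X[2]};  // the 12-byte <3 x float> load
  constexpr int Mask[4] = {0, 1, 2, 2};        // shufflevector mask
  std::array<float, 4> Result{};
  for (int I = 0; I < 4; ++I)
    Result[I] = Loaded[Mask[I]];               // lane I takes Loaded[Mask[I]]
  return Result;
}
```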

Because we know that the pointer is dereferenceable for 16 bytes, the backend can widen the 12-byte <3 x float> load into a full 16-byte load, and it generates the optimal code for x86:

	movups	(%rdi), %xmm0
	shufps	$164, %xmm0, %xmm0      ## xmm0 = xmm0[0,1,2,2]
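
To check the `xmm0[0,1,2,2]` annotation, the sketch below decodes a SHUFPS immediate: each 2-bit field, from least to most significant, selects the source element for result lanes 0 to 3. Lanes 0-1 come from the destination operand and lanes 2-3 from the source operand; here both are `%xmm0`, so all four index the same register. The helper name `decodeShufpsImm` is hypothetical.

```cpp
#include <array>
#include <cstdint>

// Decode an 8-bit SHUFPS immediate into four 2-bit lane selectors.
std::array<int, 4> decodeShufpsImm(uint8_t Imm) {
  std::array<int, 4> Mask{};
  for (int Lane = 0; Lane < 4; ++Lane)
    Mask[Lane] = (Imm >> (2 * Lane)) & 0x3;  // bits [2*Lane+1 : 2*Lane]
  return Mask;
}
```

For example, 164 is `0b10'10'01'00`, which decodes to `[0, 1, 2, 2]`, the same mask as the `shufflevector` in the IR.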

This does not appear to interact with proposal D57779, but maybe we are just lacking the regression tests to show it?

Event Timeline

spatel created this revision.Aug 19 2019, 6:10 AM
Herald added a project: Restricted Project. Aug 19 2019, 6:10 AM

There is a more complex patch, D57059, that adds support for non-power-of-2 vectorization. It should be split into several smaller patches and must be tested very carefully, but I just don't have time to work on this. I have this patch updated to the latest version; it would be good if somebody else could take it, split it, etc.

Thanks! I knew this had come up before, but I didn't find that patch. Let me discuss with Simon and Dinar to see what we can do.

@spatel Should we abandon this? D57059 is close to being completed.

spatel abandoned this revision.Sep 29 2020, 9:03 AM

Abandoning in favor of the bigger fix in D57059.