This is an archive of the discontinued LLVM Phabricator instance.

[X86] Refactor PMOV[SZ]Xrm to add missing AVX2 patterns.
ClosedPublic

Authored by ab on Nov 4 2014, 4:21 PM.

Details

Summary

Title says all, there were a few of them missing, making us generate code like:

movq  m64, xmm0
vpmovzxbd   xmm0, ymm0

instead of:

vpmovzxbd  m64, ymm0

I only glanced at the test results for non-AVX2, but them seemed correct.

Diff Detail

Repository
rL LLVM

Event Timeline

ab updated this revision to Diff 15791.Nov 4 2014, 4:21 PM
ab retitled this revision from to [X86] Add patterns for AVX2 VPMOV[SZ]XYrm intrinsics..
ab updated this object.
ab edited the test plan for this revision. (Show Details)
ab added reviewers: delena, craig.topper, qcolombet.
ab added a subscriber: Unknown Object (MLST).
delena edited edge metadata.Nov 5 2014, 2:11 AM

I'd add these intrinsics to X86IntrinsicsInfo.h in the following form:
X86_INTRINSIC_DATA(avx2_pmovsxbq, INTR_TYPE_1OP, X86ISD::VSEXT, 0).
We try to use this way instead adding patterns.

Please add Adam Nemet to reviewers.

ab planned changes to this revision.Nov 5 2014, 10:49 AM
ab added a reviewer: anemet.

That's pretty nifty, thanks for the heads up!

That makes the *Yrm patterns from D6126 necessary though.
I'll merge the two patches together and update this later today.

-Ahmed

ab updated this revision to Diff 15837.Nov 5 2014, 5:53 PM
ab retitled this revision from [X86] Add patterns for AVX2 VPMOV[SZ]XYrm intrinsics. to [X86] Refactor PMOV[SZ]Xrm to add missing AVX2 patterns..
ab edited edge metadata.

I followed Elena's advice to use X86IntrinsicsInfo, and got a bit carried away: I removed the existing PMOVX intrinsic patterns, and unified them into only X86v[sz]ext patterns.

So let's just say this patch is about refactoring the PMOVX handling, which happens to add a few of the missing patterns.
If people don't like pattern multiclasses I also have the handwritten version (just adding the patterns in the right places), which I guess is easier to read.

ab updated this revision to Diff 15838.Nov 5 2014, 6:01 PM

More pattern multiclasses, for AVX2 patterns.

ab updated this revision to Diff 15839.Nov 5 2014, 6:02 PM

..and without the AVX2->AVX typo.

delena added inline comments.Nov 5 2014, 10:39 PM
lib/Target/X86/X86InstrSSE.td
6094 ↗(On Diff #15839)

I'd remove SSE4.1 from the names. We started from sse4.1 many years ago, but now we have AVX and AVX2 instructions here as well.

6102 ↗(On Diff #15839)

reg-to-reg substitution pattern can be here, inside [ ]

6107 ↗(On Diff #15839)

And here, inside [ ], we can put reg-mem substitution pattern

ab added inline comments.Nov 6 2014, 11:36 AM
lib/Target/X86/X86InstrSSE.td
6094 ↗(On Diff #15839)

So I left it as is because I see it as the minimum feature introducing the instructions, and that seems consistent with the rest of the file. I can add /AVX/AVX2 if you feel strongly about it.

6102 ↗(On Diff #15839)

The problem is that you would need to carry around the output types for each of the *[WDQ] variants, so I figured it's easier to just write explicit Pats later.

ab added a comment.Nov 12 2014, 10:33 AM

Ping for review!

-Ahmed

Hi, In my opinion you have a lot of redundant patterns. And not each pattern is accompanied with test.
If you can't write a code that use the pattern - it should be removed.

As far as I know, we don't fold FP load to integer operation. Am I wrong?

lib/Target/X86/X86InstrSSE.td
6168 ↗(On Diff #15839)

Why v32i8 ? How do you get it? If you have v8i8 it will be legalized to v8i32 or v8i16.
I think you have a lot of redundant patterns.

6175 ↗(On Diff #15839)

How do we get this pattern? Why do we try v16i16 ->zext->v8i32?
I think that the only possible pattern is v8i16 ->zext->v8i32.

6286 ↗(On Diff #15839)

How do we extend float to i32? I don't think that we need to fold FP load to integer operation.

ab updated this revision to Diff 16171.Nov 13 2014, 11:55 AM

Remove a few useless patterns.
Regenerate intrinsics testcases from C source.
Update testcase for upstream vpunpck changes.

lib/Target/X86/X86InstrSSE.td
6168 ↗(On Diff #15839)

Good point, lots of the patterns are unusual but were there before. So I started investigating, there's years of cruft behind them. Notably:

PR14887/r172353 added the VR256 AVX2 patterns, but they don't seem useful (anymore?)

r56594 added the vzmovl/vzload patterns, and I'm not sure if we want those: they fire on the sse41.ll tests, but I don't know how common these are. For now I left those patterns there, but we have to decide whether to keep those tests. In that case we need to cover all of the patterns of the form:

  1. load from an int type of the same size as the source vector
  2. insert elements into enough lanes to fill the register
  3. bitcast into the intrinsics expected input type
  4. intrinsic call
ab added inline comments.Nov 13 2014, 12:01 PM
lib/Target/X86/X86InstrSSE.td
6286 ↗(On Diff #15839)

We don't, but loading a v4i16 results in a loadf64 (see vector-sext.ll, load_sext_test1).

ab added a comment.Dec 2 2014, 3:36 PM

Hi Elena!
So, any thoughts on the sse41.ll tests and associated patterns?

To recap:

  • the vzmovl patterns fire on code with scalar loads:
	%0 = load i64* %p
	%tmp2 = insertelement <2 x i64> zeroinitializer, i64 %0, i32 0
	%1 = bitcast <2 x i64> %tmp2 to <8 x i16>
	%2 = call <4 x i32> @llvm.x86.sse41.pmovsxwd(<8 x i16> %1)
  • the vzload patterns fire on the same code, but in 32-bit mode
  • the loadf64 patterns fire on vector loads of a 64-bit type, in 32-bit mode (where load f64 is legal, but load i64 isn't):
%X = load <2 x i32>* %ptr
%Y = sext <2 x i32> %X to <2 x i64>

The other patterns cover the expected scalar load + extension, or vector load + extension (for intrinsics).

-Ahmed

I re-checked all patterns again. I think you can commit the patch.
LGTM.

Thanks

  • Elena
ab closed this revision.Dec 5 2014, 5:31 PM
ab updated this revision to Diff 17012.

Closed by commit rL223567 (authored by @ab).