The title says it all: a few of these patterns were missing, which made us generate code like:
movq m64, xmm0
vpmovzxbd xmm0, ymm0
instead of:
vpmovzxbd m64, ymm0
I only glanced at the test results for non-AVX2, but they seemed correct.
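As a rough illustration of why the two sequences above are interchangeable, here is a scalar C++ reference model of the instructions involved. It is only a sketch: the type aliases and function names (`Xmm`, `Ymm`, `model_movq_load`, `model_vpmovzxbd_reg`, `model_vpmovzxbd_mem`) are hypothetical and not part of LLVM, and the code mimics the zero-extension semantics in plain loops rather than executing any vector instruction.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reference model only: scalar C++ standing in for the vector registers.
using Xmm = std::array<std::uint8_t, 16>;  // 128-bit register viewed as 16 bytes
using Ymm = std::array<std::uint32_t, 8>;  // 256-bit register viewed as 8 dwords

// Models "movq m64, xmm0": load 8 bytes into the low half, zero the high half.
Xmm model_movq_load(const std::uint8_t* mem) {
    Xmm x{};                        // value-initialised, so every byte starts at zero
    std::memcpy(x.data(), mem, 8);  // only 8 bytes are read from memory
    return x;
}

// Models "vpmovzxbd xmm0, ymm0": zero-extend the low 8 bytes to 8 dwords.
Ymm model_vpmovzxbd_reg(const Xmm& x) {
    Ymm y{};
    for (std::size_t i = 0; i < 8; ++i) {
        y[i] = x[i];                // uint8_t -> uint32_t conversion zero-extends
    }
    return y;
}

// Models "vpmovzxbd m64, ymm0": the same extension with the load folded in.
Ymm model_vpmovzxbd_mem(const std::uint8_t* mem) {
    Ymm y{};
    for (std::size_t i = 0; i < 8; ++i) {
        y[i] = mem[i];              // reads exactly the same 8 bytes as the movq
    }
    return y;
}
```

In this model, `model_vpmovzxbd_reg(model_movq_load(p))` and `model_vpmovzxbd_mem(p)` produce the same eight dwords for any 8-byte input, which is the equivalence that lets the load be folded into the extension.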
Differential D6125
[X86] Refactor PMOV[SZ]Xrm to add missing AVX2 patterns.
Closed, Public. Authored by ab on Nov 4 2014, 4:21 PM.
Event Timeline

ab updated this object.

I'd add these intrinsics to X86IntrinsicsInfo.h in the following form:
Please add Adam Nemet to reviewers.

That's pretty nifty, thanks for the heads up! That makes the *Yrm patterns from D6126 necessary though.
-Ahmed

ab retitled this revision from "[X86] Add patterns for AVX2 VPMOV[SZ]XYrm intrinsics." to "[X86] Refactor PMOV[SZ]Xrm to add missing AVX2 patterns.".

ab edited edge metadata.

I followed Elena's advice to use X86IntrinsicsInfo, and got a bit carried away: I removed the existing PMOVX intrinsic patterns, and unified them into only X86v[sz]ext patterns. So let's just say this patch is about refactoring the PMOVX handling, which happens to add a few of the missing patterns.

Hi, in my opinion you have a lot of redundant patterns, and not every pattern is accompanied by a test. As far as I know, we don't fold an FP load into an integer operation. Am I wrong?

Remove a few useless patterns.

Hi Elena! To recap:

%0 = load i64* %p
%tmp2 = insertelement <2 x i64> zeroinitializer, i64 %0, i32 0
%1 = bitcast <2 x i64> %tmp2 to <8 x i16>
%2 = call <4 x i32> @llvm.x86.sse41.pmovsxwd(<8 x i16> %1)

%X = load <2 x i32>* %ptr
%Y = sext <2 x i32> %X to <2 x i64>

The other patterns cover the expected scalar load + extension, or vector load + extension (for intrinsics).
-Ahmed

I re-checked all patterns again. I think you can commit the patch. Thanks

Revision Contents
Diff 16171
lib/Target/X86/X86InstrSSE.td
lib/Target/X86/X86IntrinsicsInfo.h
test/CodeGen/X86/avx2-pmovxrm-intrinsics.ll
test/CodeGen/X86/sse41-pmovxrm-intrinsics.ll
test/CodeGen/X86/vector-sext.ll
test/CodeGen/X86/vector-zext.ll
I'd remove SSE4.1 from the names. We started from SSE4.1 many years ago, but now we have AVX and AVX2 instructions here as well.