This is an archive of the discontinued LLVM Phabricator instance.

[X86][MMX] Add support for MMX zero vector creation
ClosedPublic

Authored by RKSimon on Jan 10 2018, 9:18 AM.

Details

Summary

As mentioned on PR35869, (and came up recently on D41517) we don't create a MMX zero register via the PXOR but instead perform a spill to stack from a xmm zero register.

This patch adds support for direct MMX zero vector creation and should make it easier to add better constant vector creation in the future as well.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision.Jan 10 2018, 9:18 AM
craig.topper added inline comments.Jan 10 2018, 9:41 AM
lib/Target/X86/X86InstrInfo.cpp
8964 ↗(On Diff #129285)

Do the test cases cover this?

RKSimon added inline comments.Jan 10 2018, 10:10 AM
lib/Target/X86/X86InstrInfo.cpp
8964 ↗(On Diff #129285)

No I don't think they do. Any suggestions on suitable tests? I'm a little unclear as to when this kicks in given that the zero registers are all flagged as isReMaterializable.

reames added a subscriber: reames.Jan 10 2018, 2:42 PM

Reading along for my own education, feel free to ignore if I'm way off base.

I think I'm missing something here. If we have a bitcast of a zero element, why don't we have a generic combine to convert that to a zeroinitializer of the target type? Wouldn't we already need/have handling for a zeroinitializer constant of x86_mmx type?

More generally, why doesn't that generic combine exist for *any* constant?

Wouldn't we already need/have handling for a zeroinitializer constant of x86_mmx type?

From LangRef: "There are no arrays, vectors or constants of this type."

Wouldn't we already need/have handling for a zeroinitializer constant of x86_mmx type?

From LangRef: "There are no arrays, vectors or constants of this type."

Yes, we can't attach constants to the x86_mmx type, hence having to alias it via MMX_MOVW2D

lib/Target/X86/X86InstrInfo.cpp
8964 ↗(On Diff #129285)

I still haven't been able to cause this to fire, copying the approach from fold-pcmpeqd-2.ll doesn't seem to work - maybe just drop this part or add an assertion/unreachable?

Wouldn't we already need/have handling for a zeroinitializer constant of x86_mmx type?

From LangRef: "There are no arrays, vectors or constants of this type."

Should we consider changing that? We can always revise the LangRef to allow constants if that simplifies the implementation.

Wouldn't we already need/have handling for a zeroinitializer constant of x86_mmx type?

From LangRef: "There are no arrays, vectors or constants of this type."

Should we consider changing that? We can always revise the LangRef to allow constants if that simplifies the implementation.

It's worth looking into, but I'd prefer to get this in first.

This seems to cause a constant pool to be use for the second _mm_setzero_si64 call.

#include <x86intrin.h>

__m64 bar(__m64 a, __m64 b, __m64 z, __m64 y) {
  __m64 c = _mm_add_pi32(a, b);
  __m64 d = _mm_mul_su32(c, _mm_setzero_si64());
  __m64 e = _mm_add_pi32(d, z);
  __m64 f = _mm_add_pi16(a, e);
  __m64 g = _mm_add_pi16(b, f);
  __m64 h = _mm_mul_su32(g, y);
  __m64 i = _mm_add_pi16(h, z);
  __m64 j = _mm_add_pi16(i, y);
  __m64 k = _mm_add_pi16(j, e);
  __m64 l = _mm_add_pi16(k, f);
  __m64 m = _mm_add_pi16(l, d);
  __m64 n = _mm_mul_su32(m, c);
  __m64 o = _mm_add_pi16(n, _mm_setzero_si64());
  __m64 p = _mm_add_pi16(o, g);
  __m64 q = _mm_mul_su32(p, i);
  __m64 r = _mm_mul_su32(q, b);
  __m64 s = _mm_add_pi16(r, j);
  __m64 t = _mm_add_pi16(s, k);
  return t;
}
RKSimon updated this revision to Diff 129759.Jan 13 2018, 5:07 AM

Updated with zero fold test recommended by @craig.topper

This revision is now accepted and ready to land.Jan 15 2018, 11:36 AM
This revision was automatically updated to reflect the committed changes.