This is an archive of the discontinued LLVM Phabricator instance.

[x86, SSE] optimize pcmp results better (PR28484)
ClosedPublic

Authored by spatel on Jul 11 2016, 10:21 AM.

Details

Summary

We know that pcmp (SSE/AVX at least; I'm intentionally leaving 512-bit out of this patch because I don't know what happens there) produces all-ones/all-zeros bitmasks, so we can use that behavior to avoid unnecessary constant loading.

FWIW, I see no perf differences in test-suite with this change. I don't expect that a zext of a bitmask is a common pattern. This is a first step towards the better motivating example in PR28486:
https://llvm.org/bugs/show_bug.cgi?id=28486
...which is itself just an extract from a case where we seemingly get everything wrong:
https://godbolt.org/g/Ez2bDW

One could argue that load+and is actually a better solution for some CPUs (Intel big cores) because shifts don't have the same throughput potential as load+and on those cores, but I think that should be handled as a CPU-specific later transformation if it ever comes up. Removing the load is the more general x86 optimization. Note that the uneven usage of vpbroadcast in the test cases is filed as PR28505:
https://llvm.org/bugs/show_bug.cgi?id=28505

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 63519.Jul 11 2016, 10:21 AM
spatel retitled this revision from to [x86, SSE] optimize pcmp results better (PR28484).
spatel updated this object.
spatel added reviewers: mkuper, DavidKreitzer, ab, RKSimon.
spatel added a subscriber: llvm-commits.
delena added a subscriber: delena.Jul 11 2016, 11:42 AM
delena added inline comments.
lib/Target/X86/X86ISelLowering.cpp
28139 ↗(On Diff #63519)

We cam mark nodes like PCMP with AssertSext. And use this marker to simplify AND.

28148 ↗(On Diff #63519)

VT0 is always equal to VT1.

28152 ↗(On Diff #63519)

On AVX-512 (skylake-avx512) the result is in a mask reg, also for 256 and 128 vector inputs.

spatel added inline comments.Jul 11 2016, 12:53 PM
lib/Target/X86/X86ISelLowering.cpp
28139 ↗(On Diff #63519)

Ah - I didn't know about AssertSext. To use it, we would add one of those nodes any time we create a PCMPEQ/PCMPGT? And then we would check for an AssertSext at this point rather than PCMPEQ/PCMPGT?

Ok if I add a 'TODO' comment in this patch?

28148 ↗(On Diff #63519)

I was paranoid that something like this:

      t18: v4i32 = setcc t2, t4, seteq:ch
      t17: v4i32 = BUILD_VECTOR Constant:i32<1>, Constant:i32<1>, Constant:i32<1>, Constant:i32<1>
    t19: v4i32 = and t18, t17
  t8: v2i64 = bitcast t19
  t10: v2i64 = BUILD_VECTOR Constant:i64<4294967297>, Constant:i64<4294967297>
t11: v2i64 = and t8, t10

might cause the original (v4i32) constant to get folded away; ie, we might have a bitcast on one side of the 'and' but not the other. If this can't possibly happen, then certainly we can remove the check.

28152 ↗(On Diff #63519)

If it is the mask form, then wouldn't the node be PCMPEQM rather than PCMPEQ?

delena accepted this revision.Jul 12 2016, 2:29 AM
delena added a reviewer: delena.
delena added inline comments.
lib/Target/X86/X86ISelLowering.cpp
28139 ↗(On Diff #63519)

yes

28152 ↗(On Diff #63519)

you are right. I think you can remove "todo AVX-512" from the comments.

This revision is now accepted and ready to land.Jul 12 2016, 2:29 AM
This revision was automatically updated to reflect the committed changes.