This is an archive of the discontinued LLVM Phabricator instance.

[x86] translate SSE packed FP comparison builtins to IR
ClosedPublic

Authored by spatel on Jun 12 2016, 11:38 AM.

Details

Summary

As noted in the code comment, a potential follow-on would be to remove the builtins themselves. Other than ord/unord, this already works as expected. Eg:

typedef float v4sf __attribute__((__vector_size__(16)));
v4sf fcmpgt(v4sf a, v4sf b) { return a > b; }

I'll link a patch for the corresponding LLVM codegen tests next. A follow-on for that side would be to auto-upgrade and remove the LLVM intrinsics.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 60473.Jun 12 2016, 11:38 AM
spatel retitled this revision from to [x86] translate SSE packed FP comparison builtins to IR.
spatel updated this object.
spatel added reviewers: craig.topper, RKSimon, ab.
spatel added a subscriber: cfe-commits.
RKSimon edited edge metadata.Jun 12 2016, 12:10 PM

Is there any reason that we shouldn't include the avxintrin.h builtin_ia32_cmppd/builtin_ia32_cmpps/__builtin_ia32_cmppd256/__builtin_ia32_cmpps256 packed intrinsics in this CGBuiltin.cpp patch? Since we're heading towards nixing them anyhow.

Is there any reason that we shouldn't include the avxintrin.h builtin_ia32_cmppd/builtin_ia32_cmpps/__builtin_ia32_cmppd256/__builtin_ia32_cmpps256 packed intrinsics in this CGBuiltin.cpp patch? Since we're heading towards nixing them anyhow.

AVX is complicated by the enhancement to 32 compare ops (for Intel AVX). Note that avxintrin.h currently has conflicting comments about the immediate value meanings:

/* Compare */
#define _CMP_EQ_OQ 0x00 /* Equal (ordered, non-signaling) */
#define _CMP_LT_OS 0x01 /* Less-than (ordered, signaling) */
#define _CMP_LE_OS 0x02 /* Less-than-or-equal (ordered, signaling) */
#define _CMP_UNORD_Q 0x03 /* Unordered (non-signaling) */
#define _CMP_NEQ_UQ 0x04 /* Not-equal (unordered, non-signaling) */
#define _CMP_NLT_US 0x05 /* Not-less-than (unordered, signaling) */
#define _CMP_NLE_US 0x06 /* Not-less-than-or-equal (unordered, signaling) */
#define _CMP_ORD_Q 0x07 /* Ordered (nonsignaling) */
#define _CMP_EQ_UQ 0x08 /* Equal (unordered, non-signaling) */
#define _CMP_NGE_US 0x09 /* Not-greater-than-or-equal (unord, signaling) */
#define _CMP_NGT_US 0x0a /* Not-greater-than (unordered, signaling) */
#define _CMP_FALSE_OQ 0x0b /* False (ordered, non-signaling) */
#define _CMP_NEQ_OQ 0x0c /* Not-equal (ordered, non-signaling) */
#define _CMP_GE_OS 0x0d /* Greater-than-or-equal (ordered, signaling) */
#define _CMP_GT_OS 0x0e /* Greater-than (ordered, signaling) */
#define _CMP_TRUE_UQ 0x0f /* True (unordered, non-signaling) */
#define _CMP_EQ_OS 0x10 /* Equal (ordered, signaling) */
#define _CMP_LT_OQ 0x11 /* Less-than (ordered, non-signaling) */
#define _CMP_LE_OQ 0x12 /* Less-than-or-equal (ordered, non-signaling) */
#define _CMP_UNORD_S 0x13 /* Unordered (signaling) */
#define _CMP_NEQ_US 0x14 /* Not-equal (unordered, signaling) */
#define _CMP_NLT_UQ 0x15 /* Not-less-than (unordered, non-signaling) */
#define _CMP_NLE_UQ 0x16 /* Not-less-than-or-equal (unord, non-signaling) */
#define _CMP_ORD_S 0x17 /* Ordered (signaling) */
#define _CMP_EQ_US 0x18 /* Equal (unordered, signaling) */
#define _CMP_NGE_UQ 0x19 /* Not-greater-than-or-equal (unord, non-sign) */
#define _CMP_NGT_UQ 0x1a /* Not-greater-than (unordered, non-signaling) */
#define _CMP_FALSE_OS 0x1b /* False (ordered, signaling) */
#define _CMP_NEQ_OS 0x1c /* Not-equal (ordered, signaling) */
#define _CMP_GE_OQ 0x1d /* Greater-than-or-equal (ordered, non-signaling) */
#define _CMP_GT_OQ 0x1e /* Greater-than (ordered, non-signaling) */
#define _CMP_TRUE_US 0x1f /* True (unordered, signaling) */

/ \brief Compares each of the corresponding double-precision values of two
/ 128-bit vectors of [2 x double], using the operation specified by the
/ immediate integer operand. Returns a [2 x double] vector consisting of
/ two doubles corresponding to the two comparison results: zero if the
/ comparison is false, and all 1's if the comparison is true.
/
/ \headerfile <x86intrin.h>
/
/ \code
/ m128d _mm_cmp_pd(m128d a, __m128d b, const int c);
/ \endcode
/
/ This intrinsic corresponds to the \c VCMPPD / CMPPD instruction.
/
/ \param a
/ A 128-bit vector of [2 x double].
/ \param b
/ A 128-bit vector of [2 x double].
/ \param c
/ An immediate integer operand, with bits [4:0] specifying which comparison
/ operation to use:
/ 00h, 08h, 10h, 18h: Equal
/ 01h, 09h, 11h, 19h: Less than
/ 02h, 0Ah, 12h, 1Ah: Less than or equal / Greater than or equal (swapped
/ operands)
/ 03h, 0Bh, 13h, 1Bh: Unordered
/ 04h, 0Ch, 14h, 1Ch: Not equal
/ 05h, 0Dh, 15h, 1Dh: Not less than / Not greater than (swapped operands)
/ 06h, 0Eh, 16h, 1Eh: Not less than or equal / Not greater than or equal
/ (swapped operands)
/ 07h, 0Fh, 17h, 1Fh: Ordered
/ \returns A 128-bit vector of [2 x double] containing the comparison results.

Eeep that's certainly a lot more work than just adding a few extra cases! Please add a TODO explaining what we need to do?

If there is a problem with the header documentation please can you raise a bugzilla and CC Katya Romanova.

Eeep that's certainly a lot more work than just adding a few extra cases! Please add a TODO explaining what we need to do?

I don't know what the answer is yet...looks like this is going to require (a lot of) testing to sort out.

If there is a problem with the header documentation please can you raise a bugzilla and CC Katya Romanova.

Filed PR28110:
https://llvm.org/bugs/show_bug.cgi?id=28110

The initial test says that AMD's documentation is wrong: cmpps with immediate '8' produces a different answer than immediate '0' running on Jaguar.

RKSimon accepted this revision.Jun 13 2016, 2:45 PM
RKSimon edited edge metadata.

LGTM

This revision is now accepted and ready to land.Jun 13 2016, 2:45 PM
This revision was automatically updated to reflect the committed changes.