This is an archive of the discontinued LLVM Phabricator instance.

PR28129 expand vector oparation to an IR constant.
ClosedPublic

Authored by dtemirbulatov on May 22 2017, 5:01 AM.

Diff Detail

Event Timeline

dtemirbulatov created this revision.May 22 2017, 5:01 AM
RKSimon edited edge metadata.May 22 2017, 5:18 AM

Test _mm256_cmp_pd as well?

lib/CodeGen/CGBuiltin.cpp
7934

You need a comment here - explain what the constant represents and what the transform does.

test/CodeGen/avx-builtins.c
1434

Use _CMP_TRUE_UQ here instead of 0xf?

spatel added inline comments.May 22 2017, 6:42 AM
lib/CodeGen/CGBuiltin.cpp
7949–7959
  1. Should we handle the 'pd256' version the same way?
  2. How about the 0xb ('false') constant? It should produce a zero here?
  3. Can or should we deal with the signalling versions (0x1b, 0x1f) too?

add _mm256_cmp_pd double version
add comments in lib/CodeGen/CGBuiltin.cpp
replaced 0xf to _CMP_TRUE_UQ in avx-builtins.c

Should we handle the 'pd256' version the same way?
How about the 0xb ('false') constant? It should produce a zero here?
Can or should we deal with the signalling versions (0x1b, 0x1f) too?

hm looks like 0xb(_CMP_FALSE_OQ) is ordered, so it is not possible and 0x1b or 0x1f might emit a signal.

lib/CodeGen/CGBuiltin.cpp
7949–7959

hm looks like 0xb(_CMP_FALSE_OQ) is ordered, so it is not possible and 0x1b or 0x1f might emit a signal.

spatel added inline comments.
lib/CodeGen/CGBuiltin.cpp
7949–7959

I didn't follow this reasoning.

  1. The 0xB compare predicate will return 'false' (all zeros) no matter what the inputs are. "Ordered" in this definition is irrelevant; just like "unordered" is irrelevant for predicate 0xF (TRUE_UQ).

It's probably helpful to run the program attached to PR28110 ( https://bugs.llvm.org/show_bug.cgi?id=28110 ) to confirm or deny if these predicates behave like you expect.

  1. Another possibly misleading wording: "non-signaling" does not actually mean non-signaling for all values. It means "non-signaling for QNAN, but still signaling for SNAN". Therefore, I think we're changing SNAN behavior by folding *any* of these preds to constant values. We should've asked this first: is that fold allowed in the default FPENV state that we assume that clang is operating in? ( cc'ing @andrew.w.kaylor and @scanon for advice)

We should've asked this first: is that fold allowed in the default FPENV state that we assume that clang is operating in?

I suppose it is FE_ALL_EXCEPT.

dtemirbulatov added a comment.EditedJun 5 2017, 6:01 AM

Ping. [andrew.w.kaylor, scanon] Is it OK to assume that FP exceptions are off by default and allow such transformation to constants in the IR since we know that we would have exception with "1.00 -nan" for _mm256_cmp_ps(a, b, 15) in case FP exceptions are enabled?

dtemirbulatov added a reviewer: hfinkel.

Update after http://lists.llvm.org/pipermail/llvm-dev/2017-June/114120.html. Added 0x1b(_CMP_FALSE_OS), 0x1f(_CMP_TRUE_US) handling.

spatel edited edge metadata.Jun 15 2017, 9:53 AM

Functionally, I think this is correct and complete now. See inline for some nits.

lib/CodeGen/CGBuiltin.cpp
7925–7926

Fix comment to something like:
"Except for predicates that create constants, ..."

7934

would produce --> produces

7940

Formatting: over 80-col limit.

7950

would produce --> produces

7956

Formatting: over 80-col limit.

Update formatting, comments

spatel accepted this revision.Jun 15 2017, 3:49 PM

LGTM.

This revision is now accepted and ready to land.Jun 15 2017, 3:49 PM