This is an archive of the discontinued LLVM Phabricator instance.

[X86] Use __builtin_convertvector to implement some of the packed integer to packed float conversion intrinsics.
ClosedPublic

Authored by craig.topper on May 14 2018, 11:30 PM.

Details

Summary

I believe this is safe assuming default rounding mode. The conversion might be inexact, but it can never overflow the FP type so this shouldn't be undefined behavior for the uitofp/sitofp instructions.

We already do something similar for scalar conversions.

Diff Detail

Event Timeline

craig.topper created this revision.May 14 2018, 11:30 PM

I'm all for keeping the scalar/vector behaviour the same but I'm concerned about constant folding not taking into account runtime rounding mode:

e,.g. SelectionDAG::getNode - we don't check the return status of convertFromAPInt from *INT_TO_FP - but then again the FP_TO_*INT constant folds only bail on invalid conversions.....

I haven't checked what other passes are doing.

I'm concerned about constant folding not taking into account runtime rounding mode

Changing the rounding mode is UB without FENV_ACCESS. And with FENV_ACCESS, __builtin_convertvector should lower to @llvm.experimental.constrained.sitofp etc., which won't constant-fold. (llvm.experimental.constrained.sitofp doesn't actually exist yet, but I assume it will eventually get added.)

craig.topper retitled this revision from [X86] Use __builtin_convertvector to implement some of the packed integer to packed flow conversion intrinsics. to [X86] Use __builtin_convertvector to implement some of the packed integer to packed float conversion intrinsics..May 15 2018, 11:57 AM

This looks good to me, Craig. I am not worried about the constant folding issue, as I think constant folding these conversion intrinsics (assuming round-to-nearest) is a perfectly valid optimization in the absence of FENV_ACCESS. (FWIW, we don't do this constant folding in icc, but that is only because we have never gotten around to implementing it.) I am also not worried about the spurious 'inexact' exceptions that the changes to the mask intrinsics will cause.

GBuella added a subscriber: ashlykov.

So I think we've covered the whether this is ok to do questions. If someone can double check signed/unsigned and vector element sizes are all correct and approve this that would be great.

DavidKreitzer accepted this revision.May 21 2018, 11:55 AM

I had actually done that as part of my initial review, so LGTM.

This revision is now accepted and ready to land.May 21 2018, 11:55 AM
This revision was automatically updated to reflect the committed changes.

Hi Craig,

The __builtin_ia32_cvtdq2ps builtin seems to be supported by gcc, and this change breaks code which has been using this without problems for years. Can we restore it?

Amara

Hi @aemerson, I'm not opposed to adding it back. But the clang policy for vector builtins has always been that we won't support all the builtins that gcc does and to encourage the use of the _mm_* wrappers which are guaranteed to work in both compilers. It possible to change your source code to use the portable intrinsic name?