Page MenuHomePhabricator

[X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR

Authored by RKSimon on Jul 7 2016, 11:53 AM.



D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead.

It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems, INF/NAN/out of range values are guaranteed to result in a 0x80000000 value - which plays havoc with constant folding which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match).

This patch changes both scalar and packed versions back to using x86-specific builtins.

It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding.

A companion clang patch is at D22105

Diff Detail


Event Timeline

RKSimon updated this revision to Diff 63110.Jul 7 2016, 11:53 AM
RKSimon retitled this revision from to [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR.
RKSimon updated this object.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.
eli.friedman added inline comments.Jul 7 2016, 3:24 PM
1429 ↗(On Diff #63110)

This is supposed to give up if it finds an out-of-range float; does that not work correctly?

RKSimon added inline comments.Jul 7 2016, 4:07 PM
1429 ↗(On Diff #63110)

I was being possibly over thorough - there is scope for some very careful constant folding.

The cvt versions rely on the runtime rounding mode, so constant folding may not match the same result (we could maybe just permit exact conversions?).

The cvtt versions should work but its not what I saw in some basic tests on godbolt with large values such as FLT_MAX. I'll see if we can improve the existing tests to see what is going on and make a decision on what we can safely support.

eli.friedman added inline comments.Jul 7 2016, 4:40 PM
1429 ↗(On Diff #63110)

Hmm... for the rounding mode, I think we generally take the position that we don't support changing it. I mean, if we did allow that, we couldn't implement addps on top of the LLVM IR fadd. (We might have to revisit this once LLVM gets proper support for rounding modes.)

Exact conversions should be safe either way.

RKSimon updated this revision to Diff 64317.Jul 18 2016, 7:39 AM

Updated the non-truncating conversion constant folding not to accept inexact conversions

eli.friedman accepted this revision.Jul 18 2016, 10:25 AM
eli.friedman edited edge metadata.


1682 ↗(On Diff #64317)


This revision is now accepted and ready to land.Jul 18 2016, 10:25 AM
This revision was automatically updated to reflect the committed changes.

Thanks Eli - please can you confirm if D22105 is OK as well?