This is an archive of the discontinued LLVM Phabricator instance.

[X86][FastISel] Avoid introducing legacy SSE instructions if the subtarget has AVX.
ClosedPublic

Authored by andreadb on Feb 5 2015, 8:06 AM.

Details

Summary

This patch teaches X86FastISel how to select scalar float/double convert operations using AVX instructions.

Before this patch, X86FastISel always selected legacy SSE instructions for FPExt (from float to double) and FPTrunc (from double to float).

For example:
\code

define double @foo(float %f) {
  %conv = fpext float %f to double
  ret double %conv
}

\end code

Before (with -mattr=+avx -fast-isel), X86FastIsel selected a CVTSS2SDrr which is
legacy SSE:

cvtss2sd %xmm0, %xmm0

With this patch (with -mattr=+avx -fast-isel), X86FastIsel selects a VCVTSS2SDrr instead:

vcvtss2sd %xmm0, %xmm0, %xmm0

Added test fast-isel-fptrunc-fpext.ll to check both the register-register and the register-memory float/double conversion variants.

Please let me know if ok to submit.

Thanks!
Andrea

Diff Detail

Event Timeline

andreadb updated this revision to Diff 19407.Feb 5 2015, 8:06 AM
andreadb retitled this revision from to [X86][FastISel] Avoid introducing legacy SSE instructions if the subtarget has AVX..
andreadb updated this object.
andreadb edited the test plan for this revision. (Show Details)
andreadb added reviewers: craig.topper, mkuper, ributzka.
andreadb added a subscriber: Unknown Object (MLST).
mkuper edited edge metadata.Feb 6 2015, 1:57 PM

Looks good in general, with a couple of comments.

lib/Target/X86/X86FastISel.cpp
2019

I have the feeling just reusing OpReg here would be better.
It's true that having an IMPLICIT_DEF, in theory, gives the regalloc more freedom, but I'm afraid it may just cause false-dependence trouble further on if decides to choose a different register. So it may be better to just force OpReg.

This is what the pattern for the rr version does, btw:

def : Pat<(f64 (fextend FR32:$src)),
    (VCVTSS2SDrr FR32:$src, FR32:$src)>, Requires<[UseAVX]>;

The pattern for the rm version has an IMPLICIT_DEF, but in that case, there is no choice.

2046

Perhaps this is now large enough to move the common code from the two functions into a helper?

test/CodeGen/X86/fast-isel-fptrunc-fpext.ll
25

If I remember correctly, this will also match "vcvtss2sd %xmm0, %xmm0, %xmm0" since there's no requirement for the match to start at the beginning of a line, and this is a partial match. Perhaps have an SSE-NOT for the v version?

(Same applies to all testcases)

Hi Michael,

thanks for the review. I will send a new version of the patch that addresses all your comments.

-Andrea

lib/Target/X86/X86FastISel.cpp
2019

Right, I will remove the IMPLICIT_DEF and just use OpReg for both operands in the AVX case.
The reason why I ended up using IMPLICIT_DEF was to give a chance to FastISel to also match the 'register-memory' variants of scalar float/double convert.

2046

Good point. I'll try to move the common code into a helper function.

test/CodeGen/X86/fast-isel-fptrunc-fpext.ll
25

Sure, I will change the tests.

andreadb updated this revision to Diff 19587.Feb 9 2015, 9:18 AM
andreadb edited edge metadata.

Hi Michael,

Here is a new version of the patch which hopefully addresses all your comments.
These are the main differences with respect to the previous patch:

  1. I removed all the uses of IMPLICIT_DEF registers. If the target has AVX, the same register is passed to both input operands.
  2. I changed the tests so that on SSE, we check that the instruction generated doesn't have a vex prefix.
  3. I tried to factor out common code into a separate helper named 'X86SelectFPExtOrFPTrunc'.

Please let me know if ok to submit.

Thanks again for your time,
Andrea

mkuper accepted this revision.Feb 9 2015, 10:36 PM
mkuper edited edge metadata.

Thanks, Andrea!
LGTM.

This revision is now accepted and ready to land.Feb 9 2015, 10:36 PM
This revision was automatically updated to reflect the committed changes.

Thanks Michael!
committed revision 228682.