This is an archive of the discontinued LLVM Phabricator instance.

[X86] Fix PR30926 - Add patterns for optimizing (v)cvtsi2ss, (v)cvtsi2sd, (v)cvtsd2ss and (v)cvtss2sd clang intrinsic sequences
ClosedPublic

Authored by eladcohen on Jan 7 2017, 10:47 PM.

Details

Summary

The code emitted by Clang's intrinsics for (v)cvtsi2ss, (v)cvtsi2sd, (v)cvtsd2ss and (v)cvtss2sd is lowered to a code sequence that includes redundant (v)movss/(v)movsd instructions. This patch adds patterns for optimizing these sequences.

Diff Detail

Repository
rL LLVM

Event Timeline

eladcohen updated this revision to Diff 83548.Jan 7 2017, 10:47 PM
eladcohen retitled this revision from to [X86] Fix PR30926 - Add patterns for optimizing cvtsi2ss, cvtsi2sd, and cvtss2sd clang intrinsic sequences.
eladcohen updated this object.
eladcohen added reviewers: igorb, craig.topper.
eladcohen added a subscriber: llvm-commits.
RKSimon added a subscriber: RKSimon.

Doesn't (v)cvtsd2ss suffer from the same issue as well?

Doesn't (v)cvtsd2ss suffer from the same issue as well?

The Clang intrinsic for (v)cvtsd2ss is implemented using a builtin (_builtin_ia32_cvtsd2ss) and not lowered to generic IR so we don't see this happening. However looking at the semantics IIUC it seems that:

static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_cvtsd_ss(__m128 __a, __m128d __b) {
  return (__m128)__builtin_ia32_cvtsd2ss((__v4sf)__a, (__v2df)__b);
}

could be also implemented with:

static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_cvtsd_ss(__m128 __a, __m128d __b)
{
  __a[0] =__b[0];
 return a;
}

Bottom line, I think you are right because the above code doesn't have to come from an intrinsic. I'll add the pattern (I should probably also open a bugzilla for lowering _mm_cvtsd_ss() to generic IR, Right?)
Thanks for the catch!

eladcohen updated this revision to Diff 83592.Jan 9 2017, 12:56 AM
eladcohen updated this object.

Added patterns for (v)cvtsd2ss

eladcohen retitled this revision from [X86] Fix PR30926 - Add patterns for optimizing cvtsi2ss, cvtsi2sd, and cvtss2sd clang intrinsic sequences to [X86] Fix PR30926 - Add patterns for optimizing cvtsi2ss, cvtsi2sd, cvtsd2ss and cvtss2sd clang intrinsic sequences.Jan 9 2017, 12:56 AM
RKSimon added inline comments.Jan 9 2017, 5:25 AM
lib/Target/X86/X86InstrSSE.td
1962 ↗(On Diff #83592)

Should this be [UseAVX] and then add AVX512 patterns in X86InstrAVX512.td ?

1994 ↗(On Diff #83592)

It's more typical to put the AVX patterns before the SSE.

eladcohen added inline comments.Jan 9 2017, 5:42 AM
lib/Target/X86/X86InstrSSE.td
1962 ↗(On Diff #83592)

For AVX512 I see two types of intrinsics that correspond to these instructions:

  1. With Rounding mode or masks (e.g. _mm_cvt_roundi64_sd) - These will generate a builtin and not generic IR, so they don't require any new patterns.
  2. Without rounding mode and masks (e.g. _mm_cvti32_sd) - These are mapped by macros to their matching AVX instruction which are handled in the above patterns - And that's why I want [HasAVX] to catch them.
1994 ↗(On Diff #83592)

I'll change this. Thanks

eladcohen updated this revision to Diff 83617.Jan 9 2017, 5:49 AM

Changed patterns order.

RKSimon added inline comments.Jan 9 2017, 6:03 AM
lib/Target/X86/X86InstrSSE.td
1962 ↗(On Diff #83592)

@craig.topper @igorb Is using the AVX path for AVX512 alright with you guys?

craig.topper edited edge metadata.Jan 10 2017, 7:29 PM

For AVX-512 we can do the following equivalents

The Int_VCVTSD2SSrr can use VCVTSD2SSZrr. This for some reason doesn't use _Int but does have VR128X register type.
The Int_VCVTSS2SDrr can use VCVTSS2SDZrr. Again it uses VR128X type.
The Int_VCVTSI2SS64rr patterns can use VCVTSI642SSZrr_Int
The Int_VCVTSI2SSrr patterns can use VCVTSI2SSZrr_Int
The Int_VCVTSI2SD64rr patterns can use VCVTSI642SDZrr_Int
The Int_VCVTSI2SDrr patterns can use VCVTSI2SDZrr_Int.

eladcohen updated this revision to Diff 83918.Jan 10 2017, 10:58 PM
eladcohen retitled this revision from [X86] Fix PR30926 - Add patterns for optimizing cvtsi2ss, cvtsi2sd, cvtsd2ss and cvtss2sd clang intrinsic sequences to [X86] Fix PR30926 - Add patterns for optimizing (v)cvtsi2ss, (v)cvtsi2sd, (v)cvtsd2ss and (v)cvtss2sd clang intrinsic sequences.
eladcohen updated this object.
eladcohen edited edge metadata.

Thanks for the pointers!
Added AVX512 patterns

craig.topper added inline comments.Jan 10 2017, 11:15 PM
lib/Target/X86/X86InstrAVX512.td
5936 ↗(On Diff #83918)

unnecessary is spelled wrong

lib/Target/X86/X86InstrSSE.td
1961 ↗(On Diff #83918)

Can you add a blank like between the AVX block ending and the SSE block starting

1984 ↗(On Diff #83918)

Blank line here too

eladcohen updated this revision to Diff 83920.Jan 10 2017, 11:21 PM

Fixed Craig's comments.

craig.topper accepted this revision.Jan 10 2017, 11:41 PM
craig.topper edited edge metadata.

LGTM

This revision is now accepted and ready to land.Jan 10 2017, 11:41 PM
This revision was automatically updated to reflect the committed changes.

Thanks for the review! committed r291660