The code emitted by Clang's intrinsics for (v)cvtsi2ss, (v)cvtsi2sd, (v)cvtsd2ss and (v)cvtss2sd is lowered to a code sequence that includes redundant (v)movss/(v)movsd instructions. This patch adds patterns for optimizing these sequences.
Repository: rL LLVM
Event Timeline
The Clang intrinsic for (v)cvtsd2ss is implemented using a builtin (__builtin_ia32_cvtsd2ss) and is not lowered to generic IR, so we don't see this happening. However, looking at the semantics, IIUC it seems that:
static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_cvtsd_ss(__m128 __a, __m128d __b) {
  return (__m128)__builtin_ia32_cvtsd2ss((__v4sf)__a, (__v2df)__b);
}
could also be implemented with:
static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_cvtsd_ss(__m128 __a, __m128d __b)
{
  __a[0] = __b[0];
  return __a;
}
Bottom line, I think you are right, because the above code doesn't have to come from an intrinsic. I'll add the pattern. (I should probably also open a Bugzilla for lowering _mm_cvtsd_ss() to generic IR, right?)
Thanks for the catch!
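To make the claimed equivalence concrete, here is a minimal C sketch that compares the builtin-based intrinsic with the element-assignment form. The helper names `cvtsd_ss_generic` and `same_as_intrinsic` are hypothetical and not part of the patch or of Clang's headers. The sketch assumes a GCC- or Clang-compatible compiler (vector subscripting on `__m128`), an SSE2 target, and the default MXCSR rounding mode; it says nothing about floating-point exception flags.

```c
#include <emmintrin.h> /* SSE2: __m128, __m128d, _mm_cvtsd_ss */

/* Hypothetical helper: converts lane 0 of b to float, stores it in lane 0
   of a, and leaves lanes 1..3 of a untouched. Relies on the GCC/Clang
   vector subscript extension. */
static inline __m128 cvtsd_ss_generic(__m128 a, __m128d b) {
  a[0] = (float)b[0];
  return a;
}

/* Hypothetical helper: returns 1 when the intrinsic and the generic form
   compare equal in all four lanes. NaN lanes never compare equal, so this
   check is only meaningful for non-NaN inputs. */
static int same_as_intrinsic(__m128 a, __m128d b) {
  __m128 x = _mm_cvtsd_ss(a, b);
  __m128 y = cvtsd_ss_generic(a, b);
  return _mm_movemask_ps(_mm_cmpeq_ps(x, y)) == 0xF;
}
```

Code shaped like `cvtsd_ss_generic` can appear in user code without any intrinsic being involved, which is why the pattern is worth matching independently of how `_mm_cvtsd_ss` is implemented.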
| lib/Target/X86/X86InstrSSE.td | ||
|---|---|---|
| 1962 ↗ | (On Diff #83592) | For AVX512 I see two types of intrinsics that correspond to these instructions: |
| 1994 ↗ | (On Diff #83592) | I'll change this. Thanks | 
| lib/Target/X86/X86InstrSSE.td | ||
|---|---|---|
| 1962 ↗ | (On Diff #83592) | @craig.topper @igorb Is using the AVX path for AVX512 alright with you guys? | 
For AVX-512 we can use the following equivalents:
The Int_VCVTSD2SSrr patterns can use VCVTSD2SSZrr. For some reason this one doesn't use the _Int suffix, but it does have the VR128X register type.
The Int_VCVTSS2SDrr patterns can use VCVTSS2SDZrr. Again, it uses the VR128X type.
The Int_VCVTSI2SS64rr patterns can use VCVTSI642SSZrr_Int.
The Int_VCVTSI2SSrr patterns can use VCVTSI2SSZrr_Int.
The Int_VCVTSI2SD64rr patterns can use VCVTSI642SDZrr_Int.
The Int_VCVTSI2SDrr patterns can use VCVTSI2SDZrr_Int.
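For reference, the integer-to-float conversions behind these patterns have the same shape at the C level: a scalar convert whose result replaces lane 0 while the upper lanes pass through. The sketch below is illustrative only. The helper names are hypothetical, it assumes an x86-64 target with a GCC- or Clang-compatible compiler, and it does not show which instruction definition the backend selects.

```c
#include <emmintrin.h> /* SSE/SSE2: _mm_cvtsi32_ss, _mm_cvtsi64_sd */

/* Hypothetical helper: 32-bit integer to float in lane 0 (cvtsi2ss shape). */
static inline __m128 cvtsi2ss_generic(__m128 a, int i) {
  a[0] = (float)i;
  return a;
}

/* Hypothetical helper: 64-bit integer to double in lane 0 (cvtsi2sd shape
   with a 64-bit source). */
static inline __m128d cvtsi642sd_generic(__m128d a, long long i) {
  a[0] = (double)i;
  return a;
}

/* Each check returns 1 when the intrinsic and the generic form compare
   equal in every lane (non-NaN inputs assumed). */
static int matches_cvtsi2ss(__m128 a, int i) {
  __m128 eq = _mm_cmpeq_ps(_mm_cvtsi32_ss(a, i), cvtsi2ss_generic(a, i));
  return _mm_movemask_ps(eq) == 0xF;
}

static int matches_cvtsi642sd(__m128d a, long long i) {
  __m128d eq = _mm_cmpeq_pd(_mm_cvtsi64_sd(a, i), cvtsi642sd_generic(a, i));
  return _mm_movemask_pd(eq) == 0x3;
}
```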