This is an archive of the discontinued LLVM Phabricator instance.

[X86] Move 128-bit f16c intrinsics to __emmintrin_f16c.h include from emmintrin.h. Move 256-bit f16c intrinsics back to f16cintrin.h
ClosedPublic

Authored by craig.topper on May 21 2018, 6:38 PM.

Details

Summary

Intel documents the 128-bit versions as being in emmintrin.h and the 256-bit version as being in immintrin.h.

This patch makes a new __emmtrin_f16c.h to hold the 128-bit versions to be included from emmintrin.h. And makes the existing f16cintrin.h contain the 256-bit versions and include it from immintrin.h with an error if its included directly.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision.May 21 2018, 6:38 PM
craig.topper added inline comments.May 21 2018, 6:43 PM
lib/Headers/immintrin.h
72 ↗(On Diff #147932)

Interesting this to note here, the 256-bit f16c intrinsics were being guarded by AVX2 when MSC_VER was defined and modules weren't supported. This was definitely incorrect.

rnk accepted this revision.May 22 2018, 11:18 AM

lgtm

This revision is now accepted and ready to land.May 22 2018, 11:18 AM

Aren't all the instructions from the same CPUID bit? It seems odd to split them across multiple files.

It is odd, but they really are split in the icc include files. So they got split a while back in clang to match the Intel Intrinsic Guide documentation.

It is odd, but they really are split in the icc include files. So they got split a while back in clang to match the Intel Intrinsic Guide documentation.

OK - if that means we're matching latest icc/gcc. LGTM.

This revision was automatically updated to reflect the committed changes.

A bit of history: In icc, the f16<=>f32 conversion intrinsics are a bit of an anomaly in that they can be implemented using either native code or emulation code based on the target architecture switch. See https://godbolt.org/g/bQy7xY (thanks, Craig, for the example code). The emulation code lives in the Intel Math Library.

The reason icc chose to declare the scalar & 128-bit versions of the intrinsics in emmintrin.h rather than a header file that more closely corresponds to the f16c feature is that emmintrin.h contains the minimum necessary to use the emulation code, i.e. the declaration of the __m128i type.

Given that clang doesn't support the lowering of these intrinsics to emulation code, I don't see much benefit including them in emmintrin.h. It would make more sense to just put everything in f16cintrin.h and include that from immintrin.h.

In brief, I like your changes in immintrin.h. I would move the code from _emmintrin_f16c.h into f16cintrin.h. And I would remove the include from emmintrin.h. I think that would be consistent with gcc as well. We can let the emulation behavior of these intrinsics remain an icc-specific anomaly.

Implemented @DavidKreitzer's suggestion in r333033