Regarding _mm_cvtps_ph and ...cvtph_ps, I can find an Intel doc that say the _mm_* functions are in emmintrin.h and the _mm256_* are in immintrin.h, so putting them all with emmintrin.h is not consistent with what Intel says. I can find a Microsoft doc that say these are all available with immintrin.h (not emmintrin.h). It looks like gcc puts them with immintrin.h also.
Can we move the #include of f16cintrin.h from emmintrin.h to immintrin.h? It would be more compatible with everybody else, and equally inconsistent with Intel's doc AFAICT.
I'd rather not move them into immintrin.h. The current situation is that if you include emmintrin.h you get "too many" intrinsics w.r.t to the Intel docs (and immintrin.h is just right, since it recursively includes emmintrin.h), while if we move it to immintrin.h, including emmintrin.h would provide "not enough" intrinsics. So, in terms of compatibility, I think this is slightly better.
Regarding MSVC and GCC - MSVC doesn't have an emmintrin.h (or any other specialized intrinsic file) and puts everything in immintrin.h. GCC, in my opinion, should probably also be fixed. :-)
On the other hand, I undersand why providing _mm256 stuff in emmintrin.h may be a problem. Perhaps we can split the file in two (or move the relevant intrinsics directly into emm/immintrin)?
That would means our header structure is slightly different from GCC's, but since this file should not be included directly anyway, that doesn't really bother me.
Does that sound reasonable to you?
Thanks Michael! Moving the mm256 stuff to immintrin.h ought to work. What actually motivates this is mucking around with modularizing the intrinsic headers, and the immediate problem is the duplicate typedefs for v8sf and m256 (which are in both f16cintrin.h and avxintrin.h). If we move the mm256 intrinsics to a point in immintrin.h that comes after where it includes avxintrin.h, then the duplicate typedefs can go away.
I'll see if I can get a patch for this up today.