Adds missing SSE/AVX 'undefined' intrinsics (PR24040):
_mm_undefined_pd + _mm256_undefined_pd
_mm_undefined_ps + _mm256_undefined_ps
_mm_undefined_si128 + _mm256_undefined_si256
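
For context, a minimal sketch of how one of these intrinsics might be used, assuming an x86 target with SSE enabled and a compiler whose `xmmintrin.h` already provides `_mm_undefined_ps`. The helper name `int_to_float_via_sse` is hypothetical and not part of the patch. The placeholder register comes from `_mm_undefined_ps`, and only the lane that is explicitly written afterwards is ever read.

```c
#if defined(__SSE__)
#include <xmmintrin.h>  /* _mm_undefined_ps, _mm_cvtsi32_ss, _mm_cvtss_f32 */
#endif

/* Hypothetical helper (not part of the patch): convert an int to a float
   through the low lane of an SSE register. */
static float int_to_float_via_sse(int x)
{
#if defined(__SSE__)
    /* The upper three lanes are never read, so their contents do not
       matter and the compiler need not emit code to zero them. */
    __m128 scratch = _mm_undefined_ps();
    /* Writes the converted value into lane 0; lanes 1..3 come from scratch. */
    __m128 r = _mm_cvtsi32_ss(scratch, x);
    /* Reads lane 0 only. */
    return _mm_cvtss_f32(r);
#else
    /* Scalar fallback for targets without SSE (assumption for portability). */
    return (float)x;
#endif
}
```

The point of the pattern is that every lane read later has been written first, so the result does not depend on whatever the undefined value happens to contain.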
Differential D12052
[X86][SSE] Add _mm_undefined_* intrinsics
Authored by RKSimon on Aug 15 2015, 8:56 AM.
Event Timeline

Thanks, Simon! This sort of implementation somewhat worries me, though: it relies on what may be undefined behavior in the header files, and I'd rather not have it implemented like that.
__builtin_undef seems like a pretty big hammer and does not sound trivial to implement. Not all types that can be emitted have an undef representation. For example, x86_mmx doesn't have an undef representation because there can be no constants of that type. I'd recommend a more narrow implementation technique unless we really need a more general one.

Yes, using that uninitialized value has worried me as well. I originally set it to zero (and considered using __LINE__ or __COUNTER__), but both introduce defined behaviour that I could see causing all sorts of problems further down the line in debug vs release builds. How undefined do we want our undefined to be? ;-) I can create __builtin_ia32_undef64mmx / __builtin_ia32_undef128 / __builtin_ia32_undef256 / __builtin_ia32_undef512 if nobody can think of a better alternative?

I’m not sure how much people actually use these, but the AVX-512 versions, at least, can be very useful internally to implement AVX-512 intrinsics. However, since we don’t actually have the undef intrinsics right now, we put a zero in the unmasked version as well, which is definitely a pessimization.

Added ia32 builtin undef intrinsics (I didn't bother with the mmx one, as I can't find any evidence of an undefined intrinsic for it). Added the avx512 intrinsics referenced in the Intel intrinsics guide. Technically there's nothing stopping us from making these builtins more general (non-x86-specific), but I don't know if people want us to go that way. I'll make the tests more explicit once we're happy that this is the way to go.

I think this is slightly less elegant than having a generic builtin, but I'm just fine with it, especially if David/Eric prefer it to the generic version.

Actually, thinking about it a bit more, a generic builtin most probably won't be more elegant.
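
The trade-off discussed in this thread can be sketched roughly as follows. This is an illustration, not the code from the patch: `sketch_undefined_ps`, `sketch_low_lane` and `SKETCH_HAVE_UNDEF128` are hypothetical names, and the sketch assumes that a compiler exposing `__builtin_ia32_undef128` reports it through `__has_builtin`. Where the builtin is not available, it falls back to the zero-initialised variant mentioned above, which is well defined but costs an extra instruction.

```c
#if defined(__SSE__)
#include <xmmintrin.h>  /* __m128, _mm_setzero_ps, _mm_move_ss, ... */
#endif

/* Detect the x86-specific undef builtin where the compiler can tell us
   about it (assumption: it is visible to __has_builtin). */
#if defined(__SSE__) && defined(__has_builtin)
#  if __has_builtin(__builtin_ia32_undef128)
#    define SKETCH_HAVE_UNDEF128 1
#  endif
#endif

#if defined(__SSE__)
/* Hypothetical stand-in for _mm_undefined_ps. */
static __m128 sketch_undefined_ps(void)
{
#if defined(SKETCH_HAVE_UNDEF128)
    /* Narrow, x86-specific builtin: no uninitialised variable in the header. */
    return (__m128)__builtin_ia32_undef128();
#else
    /* Defined-behaviour fallback: correct, but forces a zeroing instruction. */
    return _mm_setzero_ps();
#endif
}
#endif

/* Hypothetical consumer: overwrites lane 0, then reads only lane 0. */
static float sketch_low_lane(float x)
{
#if defined(__SSE__)
    __m128 v = sketch_undefined_ps();
    v = _mm_move_ss(v, _mm_set_ss(x));  /* lane 0 from x, lanes 1..3 from v */
    return _mm_cvtss_f32(v);
#else
    return x;  /* scalar fallback for non-SSE targets */
#endif
}
```

Both branches give the same observable result here because the undefined lanes are never read; the difference is only in what the header relies on and in the code the compiler is allowed to generate.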