Many architectures provide SIMD operations (e.g. x86 SSE/AVX, Power AltiVec/VSX,
ARM NEON). It is common that SIMD code implementing the same algorithm, is
written in multiple target-dispatching pieces to optimize for different
architectures or micro-architectures.
The C++ standard proposal P0214 and its extensions cover many common SIMD
operations. By migrating from target-dependent intrinsics to P0214 operations,
the SIMD code can be simplified and pieces for different targets can be unified.
Refer to http://wg21.link/p0214 for introduction and motivation for the
data-parallel standard library.
I think usually this is in .cpp file