Also refactor the existing AVX / AVX2 code so that it isn't duplicated between the 32-bit and 64-bit cases.
For /arch:AVX512F, we define __AVX512F__, __AVX512CD__, __AVX512ER__, and __AVX512PF__. cl.exe defines (according to /Bz) __AVX512F__ and __AVX512CD__, and 64-bit cl also defines _NO_PREFETCHW.
For /arch:AVX512, we define __AVX512F__, __AVX512CD__, __AVX512BW__, __AVX512DQ__, and __AVX512VL__. cl.exe defines __AVX512F__, __AVX512CD__, __AVX512BW__, __AVX512DQ__, and __AVX512VL__, and 64-bit cl also defines _NO_PREFETCHW.
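So the result is not 100% identical (for /arch:AVX512F we also define __AVX512ER__ and __AVX512PF__, and we don't define _NO_PREFETCHW), but it seems pretty close.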