Originally we said that -mprefer-vector-width was only going to restrict the vectorizer and some of code generation, but here's another spot to handle if we want to make sure we don't canonicalize a memcpy/memmove and then lower it to the widest vector type.
Original testcase:
void Copy256(const char* src, char* dst) {
  char tmp[32];
  for (int i = 0; i < 32; ++i) tmp[i] = src[i];
  for (int i = 0; i < 32; ++i) dst[i] = tmp[i];
}
which is pretty boring, but shows the problem:
  vmovups ymm0, ymmword ptr [rdi]
  vmovups ymmword ptr [rsi], ymm0
  vzeroupper
  ret
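For reference, here is a hand-written sketch of what the two loops presumably look like once they have been canonicalized into memcpy calls. The name Copy256Memcpy is made up for illustration, and I have not verified that the optimizer produces exactly this form; it is only meant to show why a 32-byte copy is a natural candidate for a single ymm load/store.

```c
#include <string.h>

/* Hypothetical variant of the testcase: the two byte loops written by hand
   as the memcpy calls they are assumed to be canonicalized into. */
void Copy256Memcpy(const char *src, char *dst) {
  char tmp[32];
  memcpy(tmp, src, sizeof tmp); /* stands in for the first loop: src -> tmp */
  memcpy(dst, tmp, sizeof tmp); /* stands in for the second loop: tmp -> dst */
}
```

Functionally this copies the same 32 bytes as Copy256, so it can be used to compare the generated code with and without the preferred width set.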
While the option does not promise that no vector code will be generated, I think this is a fairly reasonable place to stop some optimization.
Thoughts?