This is a 'no functional change intended' patch. It removes one FIXME, but it serves as a delivery mechanism for several more. :)
Motivation: we have a FeatureFastUAMem attribute that may be too general. It is used to determine whether misaligned memory accesses of any size under 32 bytes are 'fast'. At some point around Nehalem for Intel and Bobcat for AMD, all scalar and SSE unaligned accesses apparently became fast enough that we can happily use them whenever we want. From the added FIXME comments, however, you can see that we're not consistent about this. Renaming the attribute makes the logic holes easier to see, IMO.
Further motivation: this is a preliminary step for PR24449 (https://llvm.org/bugs/show_bug.cgi?id=24449). I'm hoping to answer a few questions about this seemingly simple test case:
#include <string.h>
void foo(char *x) { memset(x, 0, 32); }
Both of these:
$ clang -O2 memset.c -S -o -
$ clang -O2 -mavx memset.c -S -o -
Produce:
movq $0, 24(%rdi)
movq $0, 16(%rdi)
movq $0, 8(%rdi)
movq $0, (%rdi)
- Is it OK to generate misaligned 8-byte stores by default?
- Is it better to generate misaligned 16-byte SSE stores for the default case? (The default CPU is Core2/Merom.)
- Is it better to generate a misaligned 32-byte AVX store for the AVX case? (Both vector alternatives are sketched below.)
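For reference, a rough hand-written sketch of what those vector alternatives might look like (an illustration, not actual clang output; the register choices are arbitrary):

  # 16-byte SSE: two misaligned stores
  xorps   %xmm0, %xmm0
  movups  %xmm0, 16(%rdi)
  movups  %xmm0, (%rdi)

  # 32-byte AVX: one misaligned store
  vxorps  %xmm0, %xmm0, %xmm0
  vmovups %ymm0, (%rdi)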