This patch adds initial support for the prefer vector width function attribute by processing it in getSubtargetImpl and translating it into a subtarget feature, as we do for soft float.
I've implemented it this way specifically so that skylake-avx512 can eventually enable this feature by default, simply by adding the subtarget feature to the CPU definition. If the attribute isn't present, we'll take the CPU default. If the attribute is present and specifies a width larger than 256 bits, we'll append -disable-prefer-avx256 to the feature string, which overrides the CPU default.
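The attribute handling described above can be sketched as a standalone function. This is a simplified illustration, not the code in the patch: `appendPreferWidthFeature` and its parameters are hypothetical names, the feature string is modeled as a plain comma-separated `std::string`, and the feature name is copied from the description above.

```cpp
#include <string>

// Hypothetical helper (not part of LLVM): folds a "prefer vector width"
// attribute value into a comma-separated subtarget feature string.
// HasAttr is false when the function carries no such attribute.
std::string appendPreferWidthFeature(std::string Features, bool HasAttr,
                                     unsigned WidthBits) {
  // No attribute: leave the string untouched so the CPU default applies.
  if (!HasAttr)
    return Features;

  // Attribute requests more than 256 bits: append the disabling feature.
  // It comes last in the string, so it overrides any CPU default.
  if (WidthBits > 256) {
    if (!Features.empty())
      Features += ',';
    Features += "-disable-prefer-avx256";
  }

  // The description does not say what happens for widths of 256 bits or
  // less; this sketch assumes the string is left unchanged in that case.
  return Features;
}
```

For example, under these assumptions a function on a CPU whose feature string is `+avx512f` and whose attribute requests 512 bits would end up with `+avx512f,-disable-prefer-avx256`.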
I've then passed this information out to TTI's getRegisterBitWidth() method. We probably also want to add support for the function attribute by itself to the vectorizers so that it works for non-x86 targets, but I've left that for a separate patch since it's not directly required by my final goal and I'm less familiar with the vectorizers. X86 will still need to expose something via subtarget/TTI no matter what, due to the skylake-avx512 requirement.
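As a rough sketch of how the preference might surface through a register-width query, consider the model below. `SubtargetModel` and `vectorRegisterBitWidth` are hypothetical stand-ins, not the real X86Subtarget or TTI interfaces, and the exact width returned for each feature level is an assumption made for illustration.

```cpp
// Hypothetical stand-in for the subtarget queries; not the real X86Subtarget.
struct SubtargetModel {
  bool HasSSE;
  bool HasAVX;
  bool HasAVX512;
  bool PreferAVX256; // models the new subtarget feature
};

// Reports the vector register width, in bits, that the vectorizers should
// assume. With the preference set, an AVX512 target is reported as 256 bits.
unsigned vectorRegisterBitWidth(const SubtargetModel &ST) {
  if (ST.HasAVX512)
    return ST.PreferAVX256 ? 256 : 512;
  if (ST.HasAVX)
    return 256;
  if (ST.HasSSE)
    return 128;
  return 0; // no vector registers modeled
}
```

The point of routing this through the subtarget is that a CPU definition and a function attribute can both influence the same answer.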
After this patch, I plan to start using this subtarget feature in X86ISelLowering.cpp to tell the type legalizer and assorted lowering code not to use 512-bit vectors. This seems like the easiest way to ensure no 512-bit vectors are used while still allowing the vectorizer to use larger types for interleaved accesses and other things. This will make the code as similar to AVX2 legalization as possible while still allowing xmm16-31, masking, gather/scatter, etc.
To support user code that uses target-specific intrinsics that require wider vectors, I plan to add an X86 IR pass just before isel that will detect such intrinsics and explicitly add a prefer-vector-width=512 function attribute or replace an existing lower attribute with a higher value. This way we won't constrain the legalizer and will allow the wider types. Unfortunately, AVX512 C intrinsics that we can represent with native IR would not trigger this pass and would be subject to being split by legalization.
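The "add or raise" behavior planned for that pass can be illustrated with a small model. This is only a sketch: function attributes are represented as a plain string map rather than LLVM's attribute classes, and `FnAttrs` and `raisePreferVectorWidth` are hypothetical names.

```cpp
#include <map>
#include <string>

// Hypothetical model of a function's string attributes (key -> value).
using FnAttrs = std::map<std::string, std::string>;

// Ensures "prefer-vector-width" is at least RequiredBits: adds the attribute
// if it is missing, raises it if it is lower, and never lowers it.
void raisePreferVectorWidth(FnAttrs &Attrs, unsigned RequiredBits) {
  auto It = Attrs.find("prefer-vector-width");
  // An existing value that is already wide enough is left alone.
  if (It != Attrs.end() && std::stoul(It->second) >= RequiredBits)
    return;
  Attrs["prefer-vector-width"] = std::to_string(RequiredBits);
}
```

In this model, a function that uses a 512-bit intrinsic would have the helper called with 512, whether it started with no attribute or with a value of 256.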
Longer term, we should add an IR pass earlier in the pipeline that alters the vector width based on any vectors that were present in the original IR. This would fix the native-IR-based intrinsic problem mentioned above. We would probably still need the X86-specific pass as protection against cases where the IR optimization pipeline isn't run.
This plan is derived from a conversation I had with Chandler and Eric Christopher on IRC a few weeks ago. Hopefully I've correctly captured what they were suggesting.