This patch adds a pass that determines whether 512-bit registers are definitely needed, based on the size of the function arguments it detects. It explicitly ignores target-independent intrinsics, assuming that SelectionDAG can gracefully split those.
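To make the intended check concrete, here is a rough standalone model of the decision in plain C++. It is only a sketch and not the code in this patch: `CallSite`, `requires512BitRegisters`, and the "512 bits or wider" threshold are hypothetical names and assumptions made for illustration, and no LLVM APIs are used.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a call found in a function body.
struct CallSite {
  std::string callee;                 // name of the called function or intrinsic
  bool isTargetIndependentIntrinsic;  // true for generic intrinsics the legalizer can split
  std::vector<unsigned> operandBits;  // bit widths of the call's arguments
};

// Returns true if any function argument, or any argument of a call that is
// not a target-independent intrinsic, is at least 512 bits wide.
// Assumption: "definitely needed" is modeled as "width >= 512".
bool requires512BitRegisters(const std::vector<unsigned>& functionArgBits,
                             const std::vector<CallSite>& calls) {
  auto isWide = [](unsigned bits) { return bits >= 512; };

  // Wide arguments on the function itself pin the ABI to 512-bit registers.
  for (unsigned bits : functionArgBits)
    if (isWide(bits))
      return true;

  for (const CallSite& call : calls) {
    // Target-independent intrinsics are skipped on the assumption that
    // they can be split into narrower operations later.
    if (call.isTargetIndependentIntrinsic)
      continue;
    for (unsigned bits : call.operandBits)
      if (isWide(bits))
        return true;
  }
  return false;
}
```

In this model, a function whose only wide operation is a call to a target-independent intrinsic is reported as not requiring 512-bit registers, which mirrors the assumption that such calls can be split.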
I'm hoping what's here is enough to protect ABI, inline assembly, and intrinsic requirements. Is there anything else we need for correctness that I haven't thought of?
This doesn't help with explicit vector code in the original source that gets lowered to native IR without using target-specific intrinsics. We'll need to set the attribute from clang or an earlier IR pass to detect that. This pass is intended to get something testable, so we can see whether we can enable prefer-vector-width and get the legalizer to clip to 256 bits safely on real source code. Nothing will happen unless -mprefer-vector-width=256 is passed to clang.
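As an illustration of the kind of source this pass would not catch, here is a small sketch using the GCC/Clang `vector_size` extension. The function name `add16` is made up for this example. The 512-bit vector type appears only inside the function body, so neither the function's arguments nor any target-specific intrinsic call would reveal it.

```cpp
#include <cstring>

// 16 floats = 64 bytes = 512 bits, via the GCC/Clang vector extension.
typedef float v16f __attribute__((vector_size(64)));

// Hypothetical example function: the signature uses only pointers, so no
// 512-bit value crosses the function boundary. The wide vector exists only
// in the body, as a plain vector add with no target-specific intrinsics.
void add16(const float* a, const float* b, float* out) {
  v16f va, vb;
  std::memcpy(&va, a, sizeof va);   // load 16 floats into a 512-bit vector
  std::memcpy(&vb, b, sizeof vb);
  v16f vc = va + vb;                // element-wise add on the vector type
  std::memcpy(out, &vc, sizeof vc); // store the 16 results
}
```

With this patch alone, code like this would presumably be subject to clipping to 256 bits whenever -mprefer-vector-width=256 is passed, which is why the attribute would have to be set from clang or an earlier IR pass.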