This change enables vectorization (scalable vectorization only; fixed-length vectors are not yet enabled). There is no test change, as all of our tests are pinned to particular vectorization flags; if desired, I can add a test which exercises the default heuristics.
At this point, the resulting configuration should be both stable (i.e. no crashes) and profitable (i.e. few cases where scalar loops beat vector ones), but is not going to be particularly well tuned (i.e. we will not always emit the best possible vector loop). The goal of this change is to align testing across organizations and ensure the default configuration matches what downstreams are using as closely as possible.
I would appreciate any help testing this before it lands. I've detailed my testing to date below, but as a practical matter, it's smaller than I'd prefer for something this major.
So far, I have successfully cross-built the following: sqlite3, LLVM, Clang, Flang, LLD, LLVM's test-suite, spec2017. Not all of these *link*, due to problems with my cross-compilation environments, but all of the source files compile, and all crashes have been fixed.
Additionally, I have successfully *run* (on qemu-riscv64*) the llvm-lit portion of check-llvm, and the test-suite. Both have some failures, but everything I've looked at appears to be due to either a) human error in the run setup or b) cross build configuration problems. So, we have at least some confidence that we're not miscompiling when vectorization is enabled.
\* I had to use a downstream qemu-riscv64 implementation, as the package available on Ubuntu appears to not include +v at all.
Additionally, sqlite3 and clang + LLVM are invalid-cost clean, meaning the cost model never returns Invalid when compiling them. Other codebases, in particular test-suite, do return Invalid costs, and I don't consider that to be blocking. After the fixes to bail out properly on invalid costs, an invalid cost should prevent vectorization, but otherwise have no impact.
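To illustrate the intended behavior of an invalid cost, here is a minimal, self-contained sketch. It is not the vectorizer's actual code: `Cost`, `Candidate`, and `selectVF` are hypothetical names invented for this example, and the per-lane comparison is a simplification that ignores scalable vectorization factors. It only shows the shape of the bailout: a candidate with an invalid cost is skipped, and if nothing valid and profitable remains, the loop stays scalar.

```cpp
#include <optional>
#include <vector>

// Hypothetical stand-in for a cost that may be invalid.
// This is NOT LLVM's InstructionCost, just a toy with the same idea.
struct Cost {
  std::optional<unsigned> Value; // empty means "Invalid"
  bool isValid() const { return Value.has_value(); }
};

// Hypothetical candidate: one vectorization factor and its estimated cost.
struct Candidate {
  unsigned VF;   // number of lanes; 1 would mean scalar
  Cost LoopCost; // estimated cost of one vector iteration at this VF
};

// Pick the candidate with the lowest cost per lane, skipping any candidate
// whose cost is invalid. Returns 1 (stay scalar) if no vector candidate is
// both valid and cheaper per lane than the scalar loop.
unsigned selectVF(unsigned ScalarCost, const std::vector<Candidate> &Cands) {
  unsigned BestVF = 1;
  double BestPerLane = static_cast<double>(ScalarCost);
  for (const Candidate &C : Cands) {
    if (!C.LoopCost.isValid())
      continue; // invalid cost: this candidate is never chosen
    double PerLane = static_cast<double>(*C.LoopCost.Value) / C.VF;
    if (PerLane < BestPerLane) {
      BestPerLane = PerLane;
      BestVF = C.VF;
    }
  }
  return BestVF;
}
```

In this sketch, an invalid cost has no effect other than removing that candidate from consideration, which matches the "prevent vectorization, but otherwise have no impact" expectation described above.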
I'll note that there's a bunch of work pending to improve the output of the vectorizer. At the moment, I believe this all to be tuning work, and do not consider any of it blocking for this patch.
I have not done any native builds, or been able to run any of the resulting code on real hardware. If anyone else has the potential to do so, I'd greatly appreciate the help.
Why is there no override?