The vector.fma operation is portable enough across targets that we do not want
to keep it wrapped under vector.outerproduct and llvm.intrin.fmuladd.
This revision lifts the op into the vector dialect and implements the lowering to LLVM by using two patterns:
- a pattern that lowers from n-D to (n-1)-D by unrolling when n > 1
- a pattern that converts from 1-D to the proper LLVM representation
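
The two patterns can be pictured with a small Python model. This is only a sketch of the intended semantics on nested lists, not the MLIR rewrite patterns themselves: the names fma_1d, fma_nd and rank are hypothetical, and the 1-D case is a stand-in for what the real lowering hands to the LLVM fmuladd intrinsic.

```python
def fma_1d(a, b, c):
    # Stand-in for the 1-D case: elementwise a * b + c.
    # In the actual lowering this is where the LLVM representation is emitted.
    return [x * y + z for x, y, z in zip(a, b, c)]


def rank(v):
    # Number of nested list levels; assumes non-empty, rectangular input.
    r = 0
    while isinstance(v, list):
        r += 1
        v = v[0]
    return r


def fma_nd(a, b, c):
    # Peel off the outermost dimension and recurse on each (n-1)-D slice
    # until only 1-D vectors remain.
    if rank(a) == 1:
        return fma_1d(a, b, c)
    return [fma_nd(x, y, z) for x, y, z in zip(a, b, c)]
```

Applying the unrolling step repeatedly reduces any n-D operand to a set of 1-D operations, so the 1-D conversion is the only place that needs to know about the target representation.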