This commit implements the following patterns:
    fmul (ptrue sv_all) (dup 1.0) V => V
    fmul (ptrue sv_all) V (dup 1.0) => V
    mul (ptrue sv_all) (dup 1) V => V
    mul (ptrue sv_all) V (dup 1) => V
That is: using the SVE mul/fmul intrinsic with an all-true predicate to
multiply a vector V by a vector of all ones is redundant.
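To illustrate why the all-true predicate matters, here is a minimal scalar model of a merging-predicated multiply. It is only a sketch: `model_mul_m` is a hypothetical helper written for this example, not an ACLE intrinsic, and it assumes the usual merging behaviour where inactive lanes take their value from the first data operand.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scalar model of a merging-predicated multiply:
 * active lanes compute a[i] * b[i], inactive lanes keep a[i]. */
static void model_mul_m(const bool *pg, const double *a, const double *b,
                        double *out, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = pg[i] ? a[i] * b[i] : a[i];
}
```

With every lane active and one operand equal to all ones, `out` matches the other operand in every lane, whichever side the ones are on. If some lanes are inactive and the ones are the first operand, those lanes return 1.0 rather than V, so the fold would no longer be valid.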
The result of this commit is that code such as:

    #include <arm_sve.h>

    svfloat64_t foo(svfloat64_t a) {
      svbool_t t = svptrue_b64();
      svfloat64_t b = svdup_f64(1.0);
      return svmul_m(t, a, b);
    }

will compile to a nop.
This commit does not capture all possibilities, only the simple case
described above. There is still room for further optimisation.
Is it worth naming this something like combineSVEIntrinsicBinOp, with a matching combineSVEIntrinsicFPBinOp for FP? A bit like SelectionDAG::simplifyFPBinop. I mention this because I can imagine you wanting similar folds for divides and adds at some point too, e.g. fdiv X, 1.0 -> X or fadd X, 0.0 -> X.
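As a rough sketch of what a shared FP helper along those lines might check, here is a standalone C model rather than LLVM code. Every name in it (`FPBinOp`, `Operand`, `is_identity`, `fold_fp_binop`, the `nsz` flag) is hypothetical and chosen only for illustration. It assumes the same all-true predicate requirement as the patterns in this commit. Note that for fadd only -0.0 is an exact identity: X + (+0.0) turns -0.0 into +0.0, so the +0.0 case is gated on a no-signed-zeros flag here.

```c
#include <math.h>
#include <stdbool.h>

typedef enum { OP_FMUL, OP_FDIV, OP_FADD } FPBinOp;

typedef struct {
    bool is_splat;  /* operand is a dup of a single constant */
    double value;   /* splat value, meaningful only if is_splat */
} Operand;

/* Is constant c an identity element for op (as the right-hand operand)? */
static bool is_identity(FPBinOp op, double c, bool nsz) {
    switch (op) {
    case OP_FMUL:
    case OP_FDIV:
        return c == 1.0;
    case OP_FADD:
        /* c == 0.0 matches both zeros; +0.0 is accepted only under nsz. */
        return c == 0.0 && (nsz || signbit(c));
    }
    return false;
}

/* Returns the index of the operand the op folds to (0 = lhs, 1 = rhs),
 * or -1 if no fold applies. */
static int fold_fp_binop(FPBinOp op, bool pg_all_true,
                         Operand lhs, Operand rhs, bool nsz) {
    if (!pg_all_true)
        return -1;
    if (rhs.is_splat && is_identity(op, rhs.value, nsz))
        return 0;
    bool commutative = (op != OP_FDIV);  /* 1.0 / X is not X */
    if (commutative && lhs.is_splat && is_identity(op, lhs.value, nsz))
        return 1;
    return -1;
}
```

Keeping the identity test separate from the operand matching means a new opcode would only need one extra case in `is_identity` plus a commutativity decision.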