Replacing instrinsics with normal binops results in more succinct AArch64 SVE output, e.g.: 4: 65428041 fmul z1.h, p0/m, z1.h, z2.h 8: 65408020 fadd z0.h, p0/m, z0.h, z1.h -> 4: 65620020 fmla z0.h, p0/m, z1.h, z2.h
This is not enough as you're not considering the predicate the intrinsic takes. We can only do this transform when the predicate is the equivalent of ptrue all.
It seems odd for aarch64_sve_fadd to call instCombineSVEVectorMul. That said the functionality you're after is pretty generic so perhaps it's worth creating instCombineSVEVectorBinOp as I can see us extending this for other cases. This can be called directly for case Intrinsic::aarch64_sve_fadd and called at the bottom of instCombineSVEVectorMul whilst we figure out how much of that function is still useful.
This is not taking into consideration the type of the ptrue, this could be an i64 type ptrue passed to an i32 type mul/add, hence does not cover all lanes of the operation. There needs to be a check in here that checks that the ptrue type is the same or smaller than the predicated operation. For example, for a <vscale x 4 x i32> vector op, only ptrue types of <vscale x 4 x i1>, <vscale x 8 x i1>, <vscale x 16 x i1> are allowed, <vscale x 2 x i1> is not.
I would prefer to see if (match(II, m_Intrinsic(Intrinsic::aarch64_sve_fmul))) && ... which is more idiomatic rather than introducing an IsIntrinsic lambda.
You could restructure this as:
auto BinOpCode = intrinsicIDToBinOpCode(II.getIntrinsicID()); if (BinOpCode == Instruction::BinaryOpsEnd || !match(...)) return None; IRBuilder<> Builder(II.getContext()); ... return IC.replaceInstUsesWith
If you agree that looks better?
Since we're doing this for fadd shall we also do this for fsub too? It's just literally adding another case statement I think.
LGTM! Thanks for making the changes.
nit: Could you change the test names before committing to have something different to the intrinsic name? i.e. you could replace the '.' with '_' so that you have
declare <vscale x 8 x half> @llvm_aarch64_sve_fmul_nxv8f16(