Replacing instrinsics with normal binops results in more succinct AArch64 SVE output, e.g.: 4: 65428041 fmul z1.h, p0/m, z1.h, z2.h 8: 65408020 fadd z0.h, p0/m, z0.h, z1.h -> 4: 65620020 fmla z0.h, p0/m, z1.h, z2.h
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
780 | This is not enough as you're not considering the predicate the intrinsic takes. We can only do this transform when the predicate is the equivalent of ptrue all. | |
869–870 | It seems odd for aarch64_sve_fadd to call instCombineSVEVectorMul. That said the functionality you're after is pretty generic so perhaps it's worth creating instCombineSVEVectorBinOp as I can see us extending this for other cases. This can be called directly for case Intrinsic::aarch64_sve_fadd and called at the bottom of instCombineSVEVectorMul whilst we figure out how much of that function is still useful. |
Created new method instCombineSVEVectorBinOp and narrowed the intrinsic conditional replacement to svp_true_xx/Intrinsic::aarch64_sve_convert_from_svbool
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
702–707 | This is not taking into consideration the type of the ptrue, this could be an i64 type ptrue passed to an i32 type mul/add, hence does not cover all lanes of the operation. There needs to be a check in here that checks that the ptrue type is the same or smaller than the predicated operation. For example, for a <vscale x 4 x i32> vector op, only ptrue types of <vscale x 4 x i1>, <vscale x 8 x i1>, <vscale x 16 x i1> are allowed, <vscale x 2 x i1> is not. |
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
703 | I would prefer to see if (match(II, m_Intrinsic(Intrinsic::aarch64_sve_fmul))) && ... which is more idiomatic rather than introducing an IsIntrinsic lambda. |
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
700 | You could restructure this as: auto BinOpCode = intrinsicIDToBinOpCode(II.getIntrinsicID()); if (BinOpCode == Instruction::BinaryOpsEnd || !match(...)) return None; IRBuilder<> Builder(II.getContext()); ... return IC.replaceInstUsesWith If you agree that looks better? | |
870 | Since we're doing this for fadd shall we also do this for fsub too? It's just literally adding another case statement I think. |
Hi @MattDevereau, thanks for making the changes - it looks really good now! Is it worth adding at least one negative test for the case where the predicate isn't ptrue all?
LGTM! Thanks for making the changes.
llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-fma-binops.ll | ||
---|---|---|
10 ↗ | (On Diff #376276) | nit: Could you change the test names before committing to have something different to the intrinsic name? i.e. you could replace the '.' with '_' so that you have declare <vscale x 8 x half> @llvm_aarch64_sve_fmul_nxv8f16( |
Closed in commit f085a9db8b8d408d08adcba8e283e637a0116622
@david-arm changing the function declarations to underscores broke the tests, so i've left them for now
You could restructure this as:
If you agree that looks better?