Page MenuHomePhabricator

[AArch64][SVE] Replace fmul, fadd and fsub LLVM IR instrinsics with LLVM IR binary ops

Authored by MattDevereau on Sep 2 2021, 4:24 AM.


Replacing instrinsics with normal binops results in
more succinct AArch64 SVE output, e.g.:

4:   65428041   fmul    z1.h, p0/m, z1.h, z2.h
8:   65408020   fadd    z0.h, p0/m, z0.h, z1.h
4:   65620020   fmla    z0.h, p0/m, z1.h, z2.h

Diff Detail

Event Timeline

MattDevereau created this revision.Sep 2 2021, 4:24 AM
MattDevereau requested review of this revision.Sep 2 2021, 4:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 2 2021, 4:24 AM
paulwalker-arm added inline comments.Sep 2 2021, 5:27 AM

This is not enough as you're not considering the predicate the intrinsic takes. We can only do this transform when the predicate is the equivalent of ptrue all.


It seems odd for aarch64_sve_fadd to call instCombineSVEVectorMul. That said the functionality you're after is pretty generic so perhaps it's worth creating instCombineSVEVectorBinOp as I can see us extending this for other cases. This can be called directly for case Intrinsic::aarch64_sve_fadd and called at the bottom of instCombineSVEVectorMul whilst we figure out how much of that function is still useful.

Matt added a subscriber: Matt.Sep 2 2021, 7:05 AM

Created new method instCombineSVEVectorBinOp and narrowed the intrinsic conditional replacement to svp_true_xx/Intrinsic::aarch64_sve_convert_from_svbool

Added ptrue check

bsmith added inline comments.Sep 7 2021, 4:44 AM

This is not taking into consideration the type of the ptrue, this could be an i64 type ptrue passed to an i32 type mul/add, hence does not cover all lanes of the operation. There needs to be a check in here that checks that the ptrue type is the same or smaller than the predicated operation. For example, for a <vscale x 4 x i32> vector op, only ptrue types of <vscale x 4 x i1>, <vscale x 8 x i1>, <vscale x 16 x i1> are allowed, <vscale x 2 x i1> is not.

peterwaller-arm added inline comments.Sep 7 2021, 5:46 AM

I would prefer to see if (match(II, m_Intrinsic(Intrinsic::aarch64_sve_fmul))) && ... which is more idiomatic rather than introducing an IsIntrinsic lambda.

Added test, removed Intrinsic::aarch64_sve_from_svbool case

Make instCombineSVEVectorBinOp more succinct

updated for clang-format and clang-tidy

david-arm added inline comments.Sep 15 2021, 7:18 AM

You could restructure this as:

auto BinOpCode = intrinsicIDToBinOpCode(II.getIntrinsicID());
if (BinOpCode  == Instruction::BinaryOpsEnd ||
  return None;

IRBuilder<> Builder(II.getContext());
return IC.replaceInstUsesWith

If you agree that looks better?


Since we're doing this for fadd shall we also do this for fsub too? It's just literally adding another case statement I think.

Added fsub case and tests. Rearranged instCombineSVEVectorBinOp logic

MattDevereau retitled this revision from [AArch64][SVE] Replace fmul and fadd LLVM IR instrinsics with fmul and fadd to [AArch64][SVE] Replace fmul, fadd and fsub LLVM IR instrinsics with LLVM IR binary ops.Sep 16 2021, 7:56 AM
MattDevereau edited the summary of this revision. (Show Details)

Hi @MattDevereau, thanks for making the changes - it looks really good now! Is it worth adding at least one negative test for the case where the predicate isn't ptrue all?

added test no_replace_on_non_ptrue to assert only ptrue_all is replaced

renamed no_replace_on_non_ptrue to no_replace_on_non_ptrue_all

david-arm accepted this revision.Fri, Oct 1, 12:13 AM

LGTM! Thanks for making the changes.


nit: Could you change the test names before committing to have something different to the intrinsic name? i.e. you could replace the '.' with '_' so that you have

declare <vscale x 8 x half> @llvm_aarch64_sve_fmul_nxv8f16(
This revision is now accepted and ready to land.Fri, Oct 1, 12:13 AM
MattDevereau closed this revision.Fri, Oct 1, 3:28 AM

Closed in commit f085a9db8b8d408d08adcba8e283e637a0116622

@david-arm changing the function declarations to underscores broke the tests, so i've left them for now