This is an archive of the discontinued LLVM Phabricator instance.

[SVE] Add patterns to delete redundant sel instructions
Needs ReviewPublic

Authored by lizhijin on Apr 5 2023, 8:03 AM.

Details

Summary

For instructions with FalseLanesUndef, we can transform:

(sel $pg (inst $pg, $op1, $op2), $op2)

To

(inst $pg, $op1, $op2)

Diff Detail

Event Timeline

lizhijin created this revision.Apr 5 2023, 8:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 5 2023, 8:03 AM
lizhijin requested review of this revision.Apr 5 2023, 8:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 5 2023, 8:03 AM
Matt added a subscriber: Matt.Apr 5 2023, 10:37 PM

Hi @lizhijin, I've added comments relating to the correctness of the patch but I'm a little concerned about the relevance of the patch's intent.

What real world use case are you targeting? I ask because on first glance it seems like you care about instances where users simulate svadd_m like builtins via a combination of svadd_x and svsel builtins. If this is the case then I think a cleaner implementation would be to handle this as an instcombine. However, if you're using the _u intrinsics purely as a proxy, then I'm wondering what the real IR looks like.

If you look at AArch64fadd_m1 you'll see we already have some support for the style of merging operations we expect to get during auto-vectorisation and there's sure to be benefit in extending this to cover more of the binops as your patch does but the patterns themselves are slightly different (i.e. the binop typically takes an all active predicate).

None of this necessary blocks this patch but I'm worried about unnecessarily polluting isel.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
442

This looks wrong and the likely reason for the incorrect test output. It doesn't make sense for the same operation to match to both these vselect patterns because their expected results differ.

If you follow the advice attached to AArch64fadd_p1 below, you'll not need this class but for reference you'll either need one of these vselect patterns for non-commutative operations like (sub, fdiv) with the other pattern useful for the AArch64fsubr_m1 if you want to cover those cases.

For the commutative operations the second vselect pattern probably wants to be something like

(vselect node:$pg, (sdnode node:$pg, node:$op2, node:$op1), node:$op1)
449

We already have a naming scheme for such operations which in this case is AArch64fadd_m1, which already exists. You should be able to extend the existing PatFrags and then the existing patterns will just work.

458–459

Given we already have AArch64ISD nodes for the unary passthru operations I think it'll be better to remove the select via a DAG combine? If you agree then I'd prefer this to be done as a separate patch.

651–660

These pseudo instructions intentionally have no requirement for the results for the inactive lanes and thus do not represent the behaviour you expect from from your _p1 nodes. If you follow the advice above you'll not need to change this block of code.

llvm/test/CodeGen/AArch64/sve-sel-instruction-undef-predicate.ll
2

Please remove the -mtriple parameter because the triple is already set within the IR.

11

Please can you remove all the entry:'s because the tests shouldn't need them.

40–43

This output is identical to fadd_f64 which is incorrect because this test wants the results for inactive lanes to come from %b (i.e. z1). I'd expect the output to be fadd z1.d, p0/m, z1.d, z0.d?

The aim is to combine svadd_x and svsel intrinsics.
As we can see gcc has patterns to combine sel and some other instructions, while clang can't.
gcc : https://godbolt.org/z/G4aYefG51
clang : https://godbolt.org/z/G4aYefG51

Firstly I want to implement it by DAG Combiner for most instructions (sel + svmul , sel + fmla and etc) , but I find some instructions don't have DAG opcode, like fmulx, fsubr, etc. Do you mean to implement it in the mid-end inst combine or DAG-Combine or machine-combiner ?

Hi @lizhijin, I've added comments relating to the correctness of the patch but I'm a little concerned about the relevance of the patch's intent.

What real world use case are you targeting? I ask because on first glance it seems like you care about instances where users simulate svadd_m like builtins via a combination of svadd_x and svsel builtins. If this is the case then I think a cleaner implementation would be to handle this as an instcombine. However, if you're using the _u intrinsics purely as a proxy, then I'm wondering what the real IR looks like.

If you look at AArch64fadd_m1 you'll see we already have some support for the style of merging operations we expect to get during auto-vectorisation and there's sure to be benefit in extending this to cover more of the binops as your patch does but the patterns themselves are slightly different (i.e. the binop typically takes an all active predicate).

None of this necessary blocks this patch but I'm worried about unnecessarily polluting isel.

Hi @lizhijin, I've added comments relating to the correctness of the patch but I'm a little concerned about the relevance of the patch's intent.

What real world use case are you targeting? I ask because on first glance it seems like you care about instances where users simulate svadd_m like builtins via a combination of svadd_x and svsel builtins. If this is the case then I think a cleaner implementation would be to handle this as an instcombine. However, if you're using the _u intrinsics purely as a proxy, then I'm wondering what the real IR looks like.

If you look at AArch64fadd_m1 you'll see we already have some support for the style of merging operations we expect to get during auto-vectorisation and there's sure to be benefit in extending this to cover more of the binops as your patch does but the patterns themselves are slightly different (i.e. the binop typically takes an all active predicate).

None of this necessary blocks this patch but I'm worried about unnecessarily polluting isel.

Thanks for the clarification. In this instance, based on the current design, these look like inst combines to me (see AArch64TTIImpl::instCombineIntrinsic).