For instructions with FalseLanesUndef, we can transform:
(sel $pg (inst $pg, $op1, $op2), $op2)
To
(inst $pg, $op1, $op2)
Paths
| Differential D147619
[SVE] Add patterns to delete redundant sel instructions Needs ReviewPublic Authored by lizhijin on Apr 5 2023, 8:03 AM.
Details Summary For instructions with FalseLanesUndef, we can transform: (sel $pg (inst $pg, $op1, $op2), $op2) To (inst $pg, $op1, $op2)
Diff Detail Event TimelineComment Actions Hi @lizhijin, I've added comments relating to the correctness of the patch but I'm a little concerned about the relevance of the patch's intent. What real world use case are you targeting? I ask because on first glance it seems like you care about instances where users simulate svadd_m like builtins via a combination of svadd_x and svsel builtins. If this is the case then I think a cleaner implementation would be to handle this as an instcombine. However, if you're using the _u intrinsics purely as a proxy, then I'm wondering what the real IR looks like. If you look at AArch64fadd_m1 you'll see we already have some support for the style of merging operations we expect to get during auto-vectorisation and there's sure to be benefit in extending this to cover more of the binops as your patch does but the patterns themselves are slightly different (i.e. the binop typically takes an all active predicate). None of this necessary blocks this patch but I'm worried about unnecessarily polluting isel.
Comment Actions The aim is to combine svadd_x and svsel intrinsics. Firstly I want to implement it by DAG Combiner for most instructions (sel + svmul , sel + fmla and etc) , but I find some instructions don't have DAG opcode, like fmulx, fsubr, etc. Do you mean to implement it in the mid-end inst combine or DAG-Combine or machine-combiner ?
Comment Actions Thanks for the clarification. In this instance, based on the current design, these look like inst combines to me (see AArch64TTIImpl::instCombineIntrinsic).
Revision Contents
Diff 511106 llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
llvm/test/CodeGen/AArch64/sve-sel-instruction-undef-predicate.ll
|
This looks wrong and the likely reason for the incorrect test output. It doesn't make sense for the same operation to match to both these vselect patterns because their expected results differ.
If you follow the advice attached to AArch64fadd_p1 below, you'll not need this class but for reference you'll either need one of these vselect patterns for non-commutative operations like (sub, fdiv) with the other pattern useful for the AArch64fsubr_m1 if you want to cover those cases.
For the commutative operations the second vselect pattern probably wants to be something like