If the add/sub is not single use, it will need to be materialized
later, in which case using the BMI instruction is a de-optimization in
terms of code size and throughput.
For example:
// Good
leal -1(%rdi), %eax
andl %eax, %eax
xorl %eax, %esi
...

// Unnecessary BMI (lower throughput, larger code size)
leal -1(%rdi), %eax
blsr %edi, %eax
xorl %eax, %esi
...
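As a rough source-level illustration of the two situations, the sketch below shows one function where (x - 1) has a single use and one where it has two. The names clear_lowest_set_bit, clear_and_mix and Result are hypothetical and do not come from the backend; which instructions a compiler actually selects for them depends on the target and the optimizer.

```cpp
#include <cstdint>

// Hypothetical helper type, for illustration only.
struct Result {
  uint32_t cleared;  // x with its lowest set bit cleared
  uint32_t mixed;    // y xor-ed with (x - 1)
};

// Single use: (x - 1) feeds only the AND, so the whole expression can be
// folded into one BLSR-style operation with nothing left to materialize.
constexpr uint32_t clear_lowest_set_bit(uint32_t x) {
  return x & (x - 1);
}

// Multiple uses: (x - 1) feeds both the AND and the XOR, so it has to be
// computed into a register anyway. Folding it into a BMI instruction as
// well would duplicate that work.
constexpr Result clear_and_mix(uint32_t x, uint32_t y) {
  uint32_t dec = x - 1;
  return Result{x & dec, y ^ dec};
}
```

The second function is the shape the one-use restriction is meant to catch: once dec is needed elsewhere, a plain and on the already-materialized value is enough.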
Note that this can sometimes cause extra mov instructions to be
emitted, because BMI instructions take a single source operand and only
write their destination, whereas a plain and/xor overwrites one of its
inputs. A better approach may be to avoid BMI for (and/xor X, (add/sub
0/-1, X)) only if this is the last use of X but NOT the last use of
(add/sub 0/-1, X).
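The proposed condition can be written down as a small predicate. This is only a sketch of the rule as stated above, not existing backend code: UseInfo and avoid_bmi are hypothetical names, and the two flags are assumed to come from some liveness or use-count query that is not shown here.

```cpp
// Hypothetical summary of what is known at the and/xor node.
struct UseInfo {
  bool x_dies_here;       // this and/xor is the last use of X
  bool addsub_dies_here;  // this and/xor is the last use of (add/sub 0/-1, X)
};

// Avoid the BMI form only when X is dead afterwards (so a plain and/xor
// may overwrite it without an extra mov) while the add/sub result is
// still needed later (so it has to be materialized anyway).
constexpr bool avoid_bmi(UseInfo u) {
  return u.x_dies_here && !u.addsub_dies_here;
}
```

In every other combination the BMI form is kept, including the case where X is still live after the and/xor.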
Maybe we should have
class binop_oneuse<SDPatternOperator operator>
    : PatFrag<(ops node:$A, node:$B),
              (operator node:$A, node:$B), [{
  return N->hasOneUse();
}]>;

def add_su : binop_oneuse<add>;
def and_su : binop_oneuse<and>;
def srl_su : binop_oneuse<srl>;

class unop_oneuse<SDPatternOperator operator>
    : PatFrag<(ops node:$A),
              (operator node:$A), [{
  return N->hasOneUse();
}]>;

def ineg_su : unop_oneuse<ineg>;
def trunc_su : unop_oneuse<trunc>;