This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] `select_cc seteq X, 0, 0, cttz_zero_undef(X) -> and(cttz(X), sizeof(X) - 1)`
Needs ReviewPublic

Authored by mgudim on May 30 2023, 2:00 PM.

Details

Reviewers
craig.topper
Summary

This and similar cases are not yet covered by SimplifySelectCC

Diff Detail

Event Timeline

mgudim created this revision.May 30 2023, 2:00 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 30 2023, 2:00 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
mgudim requested review of this revision.May 30 2023, 2:00 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 30 2023, 2:00 PM
mgudim retitled this revision from [DAGCombine] `select_cc seteq X, 0, 0, cttz_zero_undef(X) -> and(cttz(X), 63)` to [DAGCombine] `select_cc seteq X, 0, 0, cttz_zero_undef(X) -> and(cttz(X), sizeof(X) - 1)`.May 30 2023, 2:02 PM
craig.topper added inline comments.May 30 2023, 4:00 PM
llvm/test/CodeGen/AArch64/fold-csel-cttz-and.ll
130

Did the update_llc_test_checks.py script add this blank line?

What if the target doesn't natively support CTLZ/CTTZ and only has CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF will end up with a select followed by the AND? What if the target doesn't support CTTZ/CTLZ at all?

mgudim added a comment.EditedMay 31 2023, 5:51 AM

@craig.topper

What if the target doesn't natively support CTLZ/CTTZ and only has CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF will end up with a select followed by the AND? What if the target doesn't support CTTZ/CTLZ at all?

The extra cases that I've added will go through the same legality checks as previous cases. The checks (!LegalOperations || TLI.isOperationLegal(ISD::CTTZ, VT))) still apply, right?

But, thanks to your comment, I've realized that I should probably add legality checks for AND.

llvm/test/CodeGen/AArch64/fold-csel-cttz-and.ll
130

It was me, my bad. Fixed it.

mgudim updated this revision to Diff 527058.May 31 2023, 8:31 AM

(1) Added check for legality of AND
(2) Made the check before insertion of AND stronger. Previous check could be not enough if the type size was not a power of 2.
(3) Updated affected tests

@craig.topper

What if the target doesn't natively support CTLZ/CTTZ and only has CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF will end up with a select followed by the AND? What if the target doesn't support CTTZ/CTLZ at all?

The extra cases that I've added will go through the same legality checks as previous cases. The checks (!LegalOperations || TLI.isOperationLegal(ISD::CTTZ, VT))) still apply, right?

But, thanks to your comment, I've realized that I should probably add legality checks for AND.

(!LegalOperations || TLI.isOperationLegal(ISD::CTTZ, VT))) only prevents creating CTTZ in the last DAGCombine run if CTTZ isn't legal. It will still allow it to be created in the earlier runs. I don't know if we have test coverage of this pattern on a target that doesn't support cttz/ctlz.

craig.topper added a comment.EditedMay 31 2023, 10:31 AM

This test

define i32 @foo(i32 %x) {                                                        
entry:                                                                           
  %0 = call i32 @llvm.cttz.i32(i32 %x, i1 true)                                  
  %1 = icmp eq i32 %x, 0                                                         
  %2 = select i1 %1, i32 0, i32 %0                                               
  ret i32 %2                                                                     
}                                                                                
                                                                                 
declare i32 @llvm.cttz.i32(i32, i1)

compiled with llc -mtriple=riscv64 -o - -mattr=+m produces

foo:                                    # @foo
        .cfi_startproc
# %bb.0:                                # %entry
        sext.w  a1, a0
        bnez    a1, .LBB0_2
# %bb.1:                                # %entry
        li      a0, 32
        andi    a0, a0, 31
        ret
.LBB0_2:
        negw    a1, a0
        and     a0, a0, a1
        lui     a1, 30667
        addiw   a1, a1, 1329
        mul     a0, a0, a1
        srliw   a0, a0, 27
        lui     a1, %hi(.LCPI0_0)
        addi    a1, a1, %lo(.LCPI0_0)
        add     a0, a1, a0
        lbu     a0, 0(a0)
        andi    a0, a0, 31
        ret
.Lfunc_end0:
        .size   foo, .Lfunc_end0-foo
        .cfi_endproc
                                        # -- End function
        .section        ".note.GNU-stack","",@progbits

That looks to be the table based expansion of CTTZ_ZERO_UNDEF, followed by a select to pick 32 for the zero case, followed by an AND with 31. RISC-V doesn't have cmov so the select became control flow.

This is what the code looked like before the patch

foo:                                    # @foo
        .cfi_startproc
# %bb.0:                                # %entry
        negw    a1, a0
        and     a1, a0, a1
        lui     a2, 30667
        addiw   a2, a2, 1329
        mul     a1, a1, a2
        srliw   a1, a1, 27
        lui     a2, %hi(.LCPI0_0)
        addi    a2, a2, %lo(.LCPI0_0)
        add     a1, a2, a1
        lbu     a1, 0(a1)
        sext.w  a0, a0
        seqz    a0, a0
        addi    a0, a0, -1
        and     a0, a0, a1
        ret
.Lfunc_end0:
        .size   foo, .Lfunc_end0-foo
        .cfi_endproc
                                        # -- End function
        .section        ".note.GNU-stack","",@progbits

I can see two alternatives to the problem you've pointed out.

(1) we could just not do the transformtion at all even before legalization if target doesn't support cttz.

(2) This is more complicated:

With my patch the dag looks like this before lowerSELECT:

SelectionDAG has 23 nodes:
  t0: ch,glue = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %0
        t26: i64 = setcc t2, Constant:i64<0>, seteq:ch
                  t15: i64 = sub Constant:i64<0>, t2
                t16: i64 = and t2, t15
              t18: i64 = mul t16, Constant:i64<151050438420815295>
            t20: i64 = srl t18, Constant:i64<58>
          t22: i64 = add ConstantPool:i64<[64 x i8] c"\00\01\02\07\03\0D\08\13\04\19\0E\1C\09\22\14(\05\11\1A&\0F.\1D0\0A\1F#6\152)9?\06\0C\12\18\1B!'\10%-/\1E518>\0B\17 $,47=\16+3<*;:"> 0, t20
        t24: i64,ch = load<(load (s8) from constant-pool), zext from i8> t0, t22, undef:i64
      t28: i64 = select t26, Constant:i64<64>, t24
    t13: i64 = and t28, Constant:i64<63>
  t9: ch,glue = CopyToReg t0, Register:i64 $x10, t13
  t10: ch = RISCVISD::RET_GLUE t9, Register:i64 $x10, t9:1

Inside lowerSELECT we can look at the uses of t28: i64 = select t26, Constant:i64<64>, t24 and realize that it will be ANDed with 63. So we'll see that

t28: i64 = select t26, Constant:i64<64>, t24
  t13: i64 = and t28, Constant:i64<63>

could be replaced with select c 0, X (where X = and(cttz(...), 63). This is a profitable reduction for RISCV because that can be turned into and((c - 1), X).

To do this, we'll have to try to see if commuting select with a binop makes the resulting select into something profitable (like when one of the choices is zero). This can work in other cases too.

@craig.topper What do you suggest?

mgudim added a comment.Jun 5 2023, 5:49 AM

I have posted a patch for folding binop into select: https://reviews.llvm.org/D152147
That solves the degradation that @craig.topper pointed out.

I have posted a patch for folding binop into select: https://reviews.llvm.org/D152147
That solves the degradation that @craig.topper pointed out.

Does the issue exist on other targets?

mgudim added a comment.Jun 8 2023, 1:00 PM

Does the issue exist on other targets?

Right, sorry. Didn't think about it. My guess would be that select_cc is at least as expensive as and. I see two alternatives:
(1) Introduce a target hook - isAndCheaperThanSelectCC which defaults to false but is true on RISCV (maybe in the presence of some extensions).
(2) Make this RISCV specific

What do you think?