This and similar cases are not yet covered by SimplifySelectCC
Details
Diff Detail
Unit Tests
Event Timeline
llvm/test/CodeGen/AArch64/fold-csel-cttz-and.ll | ||
---|---|---|
130 | Did the update_llc_test_checks.py script add this blank line? |
What if the target doesn't natively support CTLZ/CTTZ and only has CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF will end up with a select followed by the AND? What if the target doesn't support CTTZ/CTLZ at all?
What if the target doesn't natively support CTLZ/CTTZ and only has CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF will end up with a select followed by the AND? What if the target doesn't support CTTZ/CTLZ at all?
The extra cases that I've added will go through the same legality checks as previous cases. The checks (!LegalOperations || TLI.isOperationLegal(ISD::CTTZ, VT))) still apply, right?
But, thanks to your comment, I've realized that I should probably add legality checks for AND.
llvm/test/CodeGen/AArch64/fold-csel-cttz-and.ll | ||
---|---|---|
130 | It was me, my bad. Fixed it. |
(1) Added check for legality of AND
(2) Made the check before insertion of AND stronger. Previous check could be not enough if the type size was not a power of 2.
(3) Updated affected tests
(!LegalOperations || TLI.isOperationLegal(ISD::CTTZ, VT))) only prevents creating CTTZ in the last DAGCombine run if CTTZ isn't legal. It will still allow it to be created in the earlier runs. I don't know if we have test coverage of this pattern on a target that doesn't support cttz/ctlz.
This test
define i32 @foo(i32 %x) { entry: %0 = call i32 @llvm.cttz.i32(i32 %x, i1 true) %1 = icmp eq i32 %x, 0 %2 = select i1 %1, i32 0, i32 %0 ret i32 %2 } declare i32 @llvm.cttz.i32(i32, i1)
compiled with llc -mtriple=riscv64 -o - -mattr=+m produces
foo: # @foo .cfi_startproc # %bb.0: # %entry sext.w a1, a0 bnez a1, .LBB0_2 # %bb.1: # %entry li a0, 32 andi a0, a0, 31 ret .LBB0_2: negw a1, a0 and a0, a0, a1 lui a1, 30667 addiw a1, a1, 1329 mul a0, a0, a1 srliw a0, a0, 27 lui a1, %hi(.LCPI0_0) addi a1, a1, %lo(.LCPI0_0) add a0, a1, a0 lbu a0, 0(a0) andi a0, a0, 31 ret .Lfunc_end0: .size foo, .Lfunc_end0-foo .cfi_endproc # -- End function .section ".note.GNU-stack","",@progbits
That looks to be the table based expansion of CTTZ_ZERO_UNDEF, followed by a select to pick 32 for the zero case, followed by an AND with 31. RISC-V doesn't have cmov so the select became control flow.
This is what the code looked like before the patch
foo: # @foo .cfi_startproc # %bb.0: # %entry negw a1, a0 and a1, a0, a1 lui a2, 30667 addiw a2, a2, 1329 mul a1, a1, a2 srliw a1, a1, 27 lui a2, %hi(.LCPI0_0) addi a2, a2, %lo(.LCPI0_0) add a1, a2, a1 lbu a1, 0(a1) sext.w a0, a0 seqz a0, a0 addi a0, a0, -1 and a0, a0, a1 ret .Lfunc_end0: .size foo, .Lfunc_end0-foo .cfi_endproc # -- End function .section ".note.GNU-stack","",@progbits
I can see two alternatives to the problem you've pointed out.
(1) we could just not do the transformtion at all even before legalization if target doesn't support cttz.
(2) This is more complicated:
With my patch the dag looks like this before lowerSELECT:
SelectionDAG has 23 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %0 t26: i64 = setcc t2, Constant:i64<0>, seteq:ch t15: i64 = sub Constant:i64<0>, t2 t16: i64 = and t2, t15 t18: i64 = mul t16, Constant:i64<151050438420815295> t20: i64 = srl t18, Constant:i64<58> t22: i64 = add ConstantPool:i64<[64 x i8] c"\00\01\02\07\03\0D\08\13\04\19\0E\1C\09\22\14(\05\11\1A&\0F.\1D0\0A\1F#6\152)9?\06\0C\12\18\1B!'\10%-/\1E518>\0B\17 $,47=\16+3<*;:"> 0, t20 t24: i64,ch = load<(load (s8) from constant-pool), zext from i8> t0, t22, undef:i64 t28: i64 = select t26, Constant:i64<64>, t24 t13: i64 = and t28, Constant:i64<63> t9: ch,glue = CopyToReg t0, Register:i64 $x10, t13 t10: ch = RISCVISD::RET_GLUE t9, Register:i64 $x10, t9:1
Inside lowerSELECT we can look at the uses of t28: i64 = select t26, Constant:i64<64>, t24 and realize that it will be ANDed with 63. So we'll see that
t28: i64 = select t26, Constant:i64<64>, t24 t13: i64 = and t28, Constant:i64<63>
could be replaced with select c 0, X (where X = and(cttz(...), 63). This is a profitable reduction for RISCV because that can be turned into and((c - 1), X).
To do this, we'll have to try to see if commuting select with a binop makes the resulting select into something profitable (like when one of the choices is zero). This can work in other cases too.
@craig.topper What do you suggest?
I have posted a patch for folding binop into select: https://reviews.llvm.org/D152147
That solves the degradation that @craig.topper pointed out.
Does the issue exist on other targets?
Right, sorry. Didn't think about it. My guess would be that select_cc is at least as expensive as and. I see two alternatives:
(1) Introduce a target hook - isAndCheaperThanSelectCC which defaults to false but is true on RISCV (maybe in the presence of some extensions).
(2) Make this RISCV specific
What do you think?
Did the update_llc_test_checks.py script add this blank line?