Fold:
(X & (signbit l>> Y)) ==/!= 0 -> (X << Y) s>=/s< 0 (X & (signbit << Y)) ==/!= 0 -> (X l>> Y) s>=/s< 0
Differential D62818
[InstCombine] Introduce fold for icmp pred (and X, (sh signbit, Y)), 0. huihuiz on Jun 3 2019, 10:23 AM. Authored by
Details
Fold: (X & (signbit l>> Y)) ==/!= 0 -> (X << Y) s>=/s< 0 (X & (signbit << Y)) ==/!= 0 -> (X l>> Y) s>=/s< 0
Diff Detail
Event TimelineComment Actions For thumb target, this optimization allow generation of more compacted instructions. @ %bb.0: @ %entry subs r0, r0, #1 lsls r1, r0 cmp r1, #0 blt .LBB0_2 @ %bb.1: @ %entry mov r2, r3 .LBB0_2: @ %entry mov r0, r2 bx lr Otherwise will generate more instructions with signmask shifting @ %bb.0: @ %entry .save {r4, lr} push {r4, lr} subs r0, r0, #1 movs r4, #1 lsls r4, r4, #31 lsrs r4, r0 tst r4, r1 beq .LBB0_2 @ %bb.1: @ %entry mov r3, r2 .LBB0_2: @ %entry mov r0, r3 pop {r4, pc} ARM and thumb2 target allow flexible second operand, for this case test bit instruction with shift. This optimization does not affect performance of generated instructions. With this optimization @ %bb.0: @ %entry sub r0, r0, #1 lsl r0, r1, r0 cmp r0, #0 movge r2, r3 mov r0, r2 bx lr Without this optimization: @ %bb.0: @ %entry sub r12, r0, #1 mov r0, #-2147483648 tst r1, r0, lsr r12 moveq r2, r3 mov r0, r2 bx lr Comment Actions This looks like a missing backend-level transform, either a generic-one in DAGCombiner, or in ARMISelLowering.cpp. This fix is not the right thing to do because even if you disable this fold, Comment Actions Though this transform is also bad for X86: https://godbolt.org/z/KFM3gQ Comment Actions So we're missing an undo fold: https://rise4fun.com/Alive/w25 Comment Actions On the instcombine side, one thing worth noting which isn't called out in the commit message is the interaction with other instcombine patterns. In the testcase, note that the final IR actually doesn't contain any mask; instead, it checks icmp slt i32 [[SHL]], 0. Huihui, please update the commit message to make this clear. It's possible we should also implement the related pattern to transform (x & (signbit >> y)) != 0 to (x << y) < 0, sure. In terms of whether it's universally profitable, I'm not sure... I guess if somehow "icmp ne X, 0" is free, but "icmp slt X, 0" isn't, it could be an issue, but I don't think that applies to any architecture I can think of. Comment Actions I'm about to post dagcominer undo-fold, hold on.. Yes, now that would be a good patch, +see inline comment.
I think there may or may not bea confusion here. We are in a middle-end here. Other than TTI,
Comment Actions And posted: D62871 As for the instcombine side, Comment Actions The other approach could be changing the order of folding. Move foldICmpBinOpEqualityWithConstant to the very beginning of foldICmpInstWithConstant. Comment Actions Yes , changing the order would allow these folds. (X & (signbit >> Y)) != 0 -> (X << Y) s< 0 (X & (signbit >> Y)) == 0 -> (X << Y) >= 0 ((X << Y) & signbit) != 0 -> (X << Y) s< 0 ((X << Y) & signbit) == 0 -> (X << Y) >= 0
Comment Actions Test cases in icmp-shift-and-signbit.ll shows the updated fold order can generate better IR. Comment Actions Nice, getting closer.
Comment Actions Thank you so much for all the review feedback, really appreciate it! :)
Comment Actions Hey @xbolva00 , I don't see there is much difference between codegen of x86-clang and x86-gcc. (X & (signbit l>> Y)) ==/!= 0 -> (X << Y) >=/< 0 (X & (signbit << Y)) ==/!= 0 -> (X l>> Y) >=/< 0 and fold order issue of ((X << Y) & signbit) ==/!= 0) -> (X << Y) >=/< 0; (X << Y) & ~C ==/!= 0 -> (X << Y) </>= C+1, C+1 is power of 2; and ((X << Y) & C) == 0 -> (X & (C >> Y)) == 0. Comment Actions Oh, i thought i commented on these reviews, apparently not :(
Comment Actions Rebased patch, and addressed review comments.
Comment Actions for onehot_merge.ll (signbit l>> C) is equivalent to (one l<< (bitwidth - C - 1)) In D63903, I update the test input, so that we are still checking fold for 'or' of ICmps and 'and' of ICmps.
Comment Actions This isn't specific to sign bit, the more general pattern is https://rise4fun.com/Alive/2zpl
|