"X % C == 0" is optimized to "X & (C-1) == 0" (where C is a power of two).
However, "X % Y" can also be written as "X - (X / Y) * Y", so if I rewrite the initial expression as
"X - (X / C) * C == 0", it is currently not optimized to "X & (C-1) == 0"; see godbolt: https://godbolt.org/z/KzuXUj
This is my first contribution to LLVM so I hope I didn't mess things up
I think we should instead first fold "X - (X / C) * C" / "((X / -C1) << C2) + X" into "X % C".
We have an integer division/remainder either way, but with the remainder it's fewer instructions.
But that won't help here, because your target pattern @is_rem32_pos_decomposed_i8
is currently folded into

```llvm
define i1 @is_rem32_pos_decomposed_i8(i8 %x) {
  %d.neg = sdiv i8 %x, -32
  %m.neg = shl i8 %d.neg, 5
  %s = sub i8 0, %x
  %r = icmp eq i8 %m.neg, %s
  ret i1 %r
}
```

instead of

```llvm
define i1 @tgt(i8 %x) {
  %d.neg = sdiv i8 %x, -32
  %m.neg = shl i8 %d.neg, 5
  %m.neg.add = add i8 %m.neg, %x
  %r = icmp eq i8 %m.neg.add, 0
  ret i1 %r
}
```

For that, I'd think we should extend foldIRemByPowerOfTwoToBitTest().