As noted in:
https://bugs.llvm.org/show_bug.cgi?id=31028
...I initially thought this would be a CGP patch and limited to targets like x86. But doing this in SimplifyCFG improves code even for targets like AArch64 that don't have a divrem instruction. That's because we can replace compare and branch with csel.
I know that hoisting a rem may stretch the conservative limits of SimplifyCFG, but the benefit of collapsing the blocks seems to make it a worthy transform. If that is a concern, we could split this back up in CGP, as we do for other expensive ops.
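For readers without the PR open, here is a minimal C sketch of the kind of source that produces this pattern. It is reconstructed from the assembly in this summary, not copied from the bug report, and the function names are hypothetical. The second function shows by hand roughly what the flattened form computes. Speculating the division looks safe here because the remainder with the same operands has already executed: in C, `a % b` and `a / b` are undefined for exactly the same inputs (`b == 0`, or `INT_MIN` with `-1`).

```c
/* Hypothetical reconstruction of the source pattern (names are illustrative). */
int rem_then_div(int a, int b) {
    /* The remainder is computed in the entry block... */
    if (a % b == 42)
        return a / b;   /* ...and the paired division sits in a conditional block. */
    return 3;
}

/* Hand-written equivalent of the flattened form: both halves of the
 * div/rem pair are computed up front, then a select picks the result. */
int rem_and_div_flat(int a, int b) {
    int rem = a % b;
    int quot = a / b;   /* speculated: same operands as the rem above */
    return rem == 42 ? quot : 3;
}
```

Both functions return the same value for every input on which the original is defined, which is what lets the branch become a csel or cmov.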
For the example in the PR, AArch64 had:
    mov w8, w0
    sdiv w0, w8, w1
    msub w8, w0, w1, w8
    cmp w8, #42 // =42
    b.eq .LBB0_2 <--- nothing in the backend is flattening this
    orr w0, wzr, #0x3
  .LBB0_2:
    ret
After:
    sdiv w8, w0, w1
    msub w9, w8, w1, w0
    cmp w9, #42 // =42
    orr w9, wzr, #0x3
    csel w0, w8, w9, eq
    ret
On x86, we had:
    movl %edi, %ecx
    movl %ecx, %eax
    cltd
    idivl %esi
    movl $3, %eax
    cmpl $42, %edx
    jne .LBB0_2
  # BB#1:
    movl %ecx, %eax
    cltd
    idivl %esi <--- very expensive and useless instruction
  .LBB0_2:
    retq
After:
    movl %edi, %eax
    cltd
    idivl %esi
    cmpl $42, %edx
    movl $3, %ecx
    cmovnel %ecx, %eax
    retq
Please make one test per pair, too.