Forking this off from D140850 -
https://alive2.llvm.org/ce/z/TgBeK_
https://alive2.llvm.org/ce/z/STVD7d
We could almost justify doing this in IR, but consideration for minsize compiles requires that we only try it in codegen -- the transform is not reversible.
In all other cases, avoiding the multiply should be a win because a mul is more expensive than simple/parallelizable compares. AArch64 even has a trick (assuming that's the correct asm) to keep the instruction count even for some types.
Wow nice!