The division-by-constant strength reduction into multiply-shift sequence of instructions can be
applied on ~all target at any integer width to gain significant throughput boost for the operation,
at a (fairly significant) cost of code size.
LLVM already has this optimisation, but it would only fire on integers
with bit-widths supported natively. For example on x86_64 divisions up to 64-bits would trigger the
optimisation and on i686 64-bit integers would no longer be strength-reduced anymore.
This commit adjusts the lowering code to apply this strength-reduction even on integer bit-widths
not natively supported by the target. Ideally this would've been implemented via fallback lowerings
for the ISD::MULHU and ISD::MULHS – not all of the backends support them – but I found that to
require significant refactors and it still failed to work on some backends such as the ARM or the
RISCV (without m instructions) regardless.
However, the targets will universally support ISD::MUL of any bit-width so we just take the upper
half of the regular MUL result. This will likely be sub-optimal in a sense that some of the
instructions may not do anything useful, but even with those instructions present the resulting
lowering should be significantly better compared to conventional software division implementations.
Depends on D88785
clang-tidy: warning: invalid case style for parameter 'isDivergent' [readability-identifier-naming]
not useful