Adds a pass, ExpandLargeDivRem, that expands div/rem instructions
on integers wider than 128 bits into auto-generated loops.
For example, urem i129 is expanded into a loop that implements
a simple shift-subtract algorithm, similar to:

  loop:                                             ; preds = %if.end, %entry
    %i = phi i32 [ 128, %entry ], [ %new_i, %if.end ]
    %r = phi i129 [ 0, %entry ], [ %r3, %if.end ]
    %iext = zext i32 %i to i129
    %2 = lshr i129 %0, %iext
    %3 = trunc i129 %2 to i1
    %new_r = shl i129 %r, 1
    %4 = zext i1 %3 to i129
    %new_r1 = or i129 %new_r, %4
    %loop_exit_cond = icmp eq i32 %i, 0
    %new_i = add i32 %i, -1
    %5 = icmp uge i129 %new_r1, %1
    br i1 %5, label %then, label %if.end

  then:                                             ; preds = %loop
    %new_r2 = sub i129 %new_r1, %1
    br label %if.end

  if.end:                                           ; preds = %then, %loop
    %r3 = phi i129 [ %new_r2, %then ], [ %new_r1, %loop ]
    br i1 %loop_exit_cond, label %exit, label %loop

  ; Result is in %r3
  }

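For readers who want to step through the algorithm outside of IR, the loop
above can be modelled roughly as follows. This is only an illustrative sketch,
not code from the pass: the function name urem_shift_subtract and its bits
parameter are made up for this example, and the running remainder is held in
an unbounded Python integer rather than in an i129 value.

```python
def urem_shift_subtract(n: int, d: int, bits: int = 129) -> int:
    """Unsigned remainder of n / d, computed one dividend bit per iteration.

    Hypothetical helper for illustration; it mirrors the structure of the
    IR loop but is not part of ExpandLargeDivRem.
    """
    mask = (1 << bits) - 1
    n &= mask  # treat both operands as unsigned `bits`-wide values
    d &= mask
    if d == 0:
        raise ZeroDivisionError("remainder by zero")

    r = 0
    # i counts down from bits - 1 (128 for i129) to 0, like %i in the IR.
    for i in range(bits - 1, -1, -1):
        bit = (n >> i) & 1   # lshr + trunc: extract bit i of the dividend
        r = (r << 1) | bit   # shl + or: shift that bit into the remainder
        if r >= d:           # icmp uge
            r -= d           # sub in the 'then' block
    return r
```

Each iteration consumes one bit of the dividend, most significant first, so
the trip count equals the bit width of the type.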
As discussed on https://reviews.llvm.org/D120327, this approach has the advantage
that it is independent of the runtime library. This also helps the clang driver,
which would otherwise need to know enough about the runtime library
to decide whether to allow _BitInts wider than 128 bits.
Targets are still free to disable this pass and instead provide a faster
implementation in a runtime library.
Update the name.