This avoids falling back to calling out to the GCC rem functions (__moddi3, __umoddi3) when targeting Windows.
The __rt_div functions take their two arguments in the opposite order from the __aeabi_divmod functions.
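To make the argument flip concrete, here is a minimal sketch in C. The two functions are hypothetical stand-ins (mock_aeabi_idivmod, mock_rt_sdiv), not the real runtime helpers, and the struct return is only an illustration of the quotient/remainder pair that the real helpers hand back in registers. It assumes the __rt_div convention is divisor first, dividend second.

```c
#include <assert.h>

/* Quotient/remainder pair; an illustrative stand-in for the register pair
   the real helpers return. */
typedef struct { int quot; int rem; } divmod_result;

/* Hypothetical stand-in for the __aeabi_divmod convention: dividend first. */
static divmod_result mock_aeabi_idivmod(int dividend, int divisor) {
    divmod_result r = { dividend / divisor, dividend % divisor };
    return r;
}

/* Hypothetical stand-in for the __rt_div convention (assumed here):
   divisor first, dividend second. */
static divmod_result mock_rt_sdiv(int divisor, int dividend) {
    return mock_aeabi_idivmod(dividend, divisor);
}

/* The same division needs its operands swapped between the two conventions. */
static void check_flipped_order(void) {
    divmod_result a = mock_aeabi_idivmod(7, 2);
    divmod_result b = mock_rt_sdiv(2, 7);
    assert(a.quot == b.quot && a.rem == b.rem);
}
```

So a lowering that reuses the __aeabi_divmod call sequence has to swap the operands before emitting the __rt_div call.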
Unlike the existing calls to division functions (and unlike what MSVC does), this doesn't add any check for division by zero. (Why would it have to, other than the fact that MSVC does it?)
In practice, the __rt_div family of functions checks for that itself, so I'm unsure why MSVC and clang need to include the check in the calling code.
Calls to __rt_div functions for division aren't merged with calls to the same function with the same parameters for the remainder. This is more wasteful than a div + mls as before, but it avoids calls to __moddi3.
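For reference, the div + mls pattern mentioned above recovers the remainder from an already computed quotient with one multiply-subtract. A rough C sketch of that arithmetic follows; the function names are made up for illustration, and this is not the code the backend emits.

```c
#include <assert.h>

/* Remainder recovered from the quotient, mirroring sdiv followed by mls:
   mls computes n - q * d. */
static int rem_via_mls(int n, int d) {
    int q = n / d;     /* what sdiv would produce */
    return n - q * d;  /* what mls would produce  */
}

/* The result matches C's % operator, including for negative dividends. */
static void check_rem_via_mls(void) {
    assert(rem_via_mls(7, 2) == 7 % 2);
    assert(rem_via_mls(-7, 2) == -7 % 2);
}
```

With the unmerged calls, the division and the remainder each go through a separate __rt_div call even though a single call already yields both results.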
Checking hasDivide makes sense, but it is unrelated to this change and may break some expectations I don't know about. I'd leave it for later, since this code path shouldn't be hit if the sub-arch has divide anyway.