This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Use __rt_div functions for DIVREM on Windows
ClosedPublic

Authored by mstorsjo on Aug 31 2016, 6:33 AM.

Details

Summary

This avoids falling back to calling out to the GCC rem functions (moddi3, umoddi3) when targeting Windows.

The rt_div functions have flipped the two arguments compared to the aeabi_divmod functions.

Contrary to the existing calls to division functions (and to what MSVC does), this doesn't add any check for division by zero (why does it have to, other than the fact that MSVC does it?)

In practice, the __rt_div family of functions check that themselves though, so I'm unsure why MSVC and clang needs to include it in the calling code.

Calls to rt_div functions for division aren't merged with calls to the same function with the same parameters for the remainder, which is more wasteful than a div + mls as before, but avoids calls to moddi3.

Diff Detail

Event Timeline

mstorsjo updated this revision to Diff 69841.Aug 31 2016, 6:33 AM
mstorsjo retitled this revision from to [ARM] Use __rt_div functions for DIVREM on Windows.
mstorsjo updated this object.
mstorsjo added reviewers: rengolin, compnerd, jmolloy.
mstorsjo added a subscriber: llvm-commits.
rengolin requested changes to this revision.Aug 31 2016, 7:35 AM
rengolin edited edge metadata.

Hi Martin,

The fact that divmod-eabi.ll changes is an indication that the interaction between the Dag nodes and the values they produce is weird. I suggest you walk through the back-end and print the Dag before and after the transformations.

One thing that, unfortunately, happens is that Div/Mod patters are passed multiple times over the same nodes, as they get converted from Div/Mod to Divmod, etc. This is particular to ARM and EABI, and may be getting in the way of your lowering. I can't imagine why, since it also returns two values, so it may be some corner case no one knew about.

divmod-eabi.ll is a good first approach to the problem, but you will probably have more corner cases, so please add them to the same file as you add a windows test to it.

cheers,
--renato

lib/Target/ARM/ARMISelLowering.cpp
984

hasDivide makes sense, but is unrelated and may break some expectations I don't know about. I'd leave it for later, since this shouldn't hit here if the sub-arch has divide anyway.

989–1007

I'd have put this in a separate if block, since the function names are *all* different.

12430

This is not about windows or not, it's about rt_div instead of eabi_divmod. Please, make it dependent on the RT type (not OS).

Also, this induction fiddling is confusing. The easiest thing to do is to swap the two first arguments before the loop. It's clear and precise.

This revision now requires changes to proceed.Aug 31 2016, 7:35 AM

I tested to extend this by removing the current custom code for SDIV/UDIV for windows, and divmod-eabi.ll now looks better (it produces two calls to __rt_sdiv instead of three).

Previously, all divisions included a check for whether the divisor is zero (with an udf.w #349 instruction in case of a division by zero), while when I made it work in the same way as __aeabi_idivmod I don't get those.

I'll upload the updated patch for further comments and clarifications.

lib/Target/ARM/ARMISelLowering.cpp
989–1007

I chose putting it here since it shares the code for all the setLibcallCallingConv below anyway, instead of duplicating all the 15 lines of shared code above and below. But you'd still prefer me to duplicate all of it including setOperationAction(), HasStandaloneRem and setLibcallCallingConv?

12430

How do I find the RT type here - I didn't find it within ARMSubtarget on a quick look at least.

And how would I go about to swap the two arguments? N is a const SDNode here - is it ok to make it mutable and swap the arguments there?

mstorsjo updated this revision to Diff 71121.Sep 13 2016, 2:18 AM
mstorsjo edited edge metadata.

Now the output from divmod-eabi.ll makes more sense (I still need to update the test references though).

Given that all of these changes are related to isTargetWindows everywhere, the assumption is that this can only work for windows. Once someone wants to use rt_div on non-Windows, we'll need to come up with a isWeirdABI which Windows and and that taget are members of, just like isEABI and others. So, for now, isWindows is ok.

Just a few more inline comments, and the tests and it should be good.

lib/Target/ARM/ARMISelLowering.cpp
989–1008

Ah, makes sense. Looks good this way.

12430–12431

No, don't change the SDNode, just std::swap the two arguments in Args after the loop.

Just a few more inline comments, and the tests and it should be good.

Actually, it turned out to be harder than this after all. With the current diff, div+rem gets nicely merged into one rt_sdiv call, but a lone div becomes a divsi3 instead.

Given all the changes back and forth in this area in the last year (see SVN revisions 248561, 253865, 254158, 263714 or git mirror commits 64ed61ca6b, ee5418798, 937e2d588c, a944db98cd) I'm a bit hesitant to change this further without input from @compnerd. I'll update the patch to a state which works (and doesn't produce calls to divsi3 or similar), but which doesn't merge the div+rem calls in all cases, and write a bug report for the actual issue that I'm trying to work around (LLVM producing calls to moddi3 which is unavailable on windows).

mstorsjo updated this revision to Diff 71309.Sep 14 2016, 2:16 AM
mstorsjo updated this object.
mstorsjo edited edge metadata.

Updated to a working patch

Filed a plain bug report on the original issue at https://llvm.org/bugs/show_bug.cgi?id=30378 since it seems like my attempt at this won't be completable easily.

test/CodeGen/ARM/divmod-eabi.ll
159

Here, the old test reference even shows a call to __moddi3, which is not available on ARM/Windows.

Right, the change now looks good, but I'll let @compnerd approve, since I'm not that well versed in the Windows ABI.

Right, the change now looks good, but I'll let @compnerd approve, since I'm not that well versed in the Windows ABI.

FWIW, one of the remaining issues is that I don't (currently) add the DBZCHK (division by zero check) before the __rt_div calls for divrem or rem. I can add that as well (I got a patch for it locally). But IMO the question remains whether we should add that at all. MSVC does add it (both for div and rem, which are calls to the same function), but the actual implementations of the functions also do similar checks internally, so it feels redundant. But I guess the ABI of those functions mandate that the caller checks it.

The current implementation of DBZCHK also has got an issue, that it issues a CBZ, but later passes may move the branch target might out of range, see https://llvm.org/bugs/show_bug.cgi?id=30356 and the function EmitLowered__dbzchk.

If we can remove DBZCHK altogether, it'd be easy, but I'm not sure how to fix the current one.

mstorsjo updated this revision to Diff 73448.Oct 4 2016, 3:52 AM

Added DBZCHK for the divrem/rem calls as well.

compnerd added inline comments.Oct 5 2016, 10:15 AM
lib/Target/ARM/ARMISelLowering.cpp
984

Wow, this is getting unwieldy. I wonder if its easier to negate the condition.

989–1007

Not your fault since you are replicating what is there, but I think that refactoring this a slight bit will be nicer:

for (const auto &LC : {RTLIB::SDIVREM_I8, RTLIB::SDIVREM_I16, RTLIB::SDIVREM_I32})
  setLibcallName(LC, Subtarget->isTargetWindows() ? "__rt_sdiv" : "__aeabi_idivmod");
setLibcallName(RTLIB::SDIVREM_I64, Subtarget->isTargetWindows() ? "__rt_sdiv64" : "__aeabi_ldivmod");
for (const auto &LC : {RTLIB::UDIVREM_I8, RTLIB::UDIVREM_I16, RTLIB::UDIVREM_I32})
  setLibcallName(LC, Subtarget->isTargetWindows() ? "__rt_udiv" : "__aeabi_uidivmod");
setLibcallName(RTLIB::UDIVREM_I64, Subtarget->isTargetWindows() ? "__rt_udiv64" : " __aeabi_uldivmod");
12520

Sink this inside the if condition, as it is unused otherwise.

12532

How about doing a trivial static function ExtractDenominator or even better, create a WinDBZCheckDenominator to handle the duplication of the DBZ check.

static SDValue WinDBZCheckDenominator(SDNode *N, SDValue InChain) {
  SDLoc DL(N);
  Value *Op = N->getOperand(1);
  if (N->getValueType(0) == MVT::32)
    return DAG.getNode(ARMISD::WIN__DBZCHK, DL, MVT::Other, InChain, Op);
  SDValue Lo = DAG.getNode(ISD::EXTRACT_ELEMENT, DL, MVT::i32, Op),
                           DAG.getConstant(0, DL, MVT::i32));
  SDValue Hi = DAG.getNode(ISD::EXTRACT_ELEMENT, DL, MVT::i32, Op),
                           DAG.getConstant(1, DL, MVT::i32));
  return DAG.getNode(ARMISD::WIN__DBZCHK, DL, MVT::Other, InChain,
                     DAG.getNode(ISD::OR, DL, MVT::i32, Lo, Hi));
}
test/CodeGen/ARM/Windows/dbzchk.ll
185

Why are you removing the check for the actual dbzchk codegen?

mstorsjo marked 3 inline comments as done.Oct 5 2016, 1:06 PM

Thanks for the comments, I'll update the patch accordingly.

test/CodeGen/ARM/Windows/dbzchk.ll
185

Sorry about that, that change is from an earlier version of the patch where some of the DBZCHKs vanished (when I hadn't added the new ones into REM and DIVREM). I restored this hunk (and some of the ones above as well).

mstorsjo updated this revision to Diff 73686.Oct 5 2016, 1:08 PM

Updated based on the comments.

compnerd accepted this revision.Oct 5 2016, 1:27 PM
compnerd edited edge metadata.
compnerd added inline comments.
test/CodeGen/ARM/Windows/dbzchk.ll
137

This is amazing. Glad to see this improved.

test/CodeGen/ARM/divmod-eabi.ll
45–47

Hmm, it would be nice to see that we can still generate the hardware division. But, I don't think that we need to hold up on that.

This revision was automatically updated to reflect the committed changes.