This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] Fix calling convention in expansion of ?MULO.
ClosedPublic

Authored by whitequark on Oct 3 2016, 8:59 PM.

Details

Summary
The SMULO/UMULO DAG nodes, when not directly supported by the target,
expand to a multiplication twice as wide. In case that the resulting
type is not legal, an __mul?i3 intrinsic is used. Since the type is
not legal, the legalizer cannot directly call the intrinsic with
the wide arguments; instead, it "pre-lowers" them by splitting them
in halves.

The "pre-lowering" code in essence made assumptions about
the calling convention, specifically that i(N*2) values will be
split into two iN values and passed in consecutive registers in
little-endian order. This, naturally, breaks on a big-endian system,
such as our OR1K out-of-tree backend.

Thanks to James Miller <james@aatch.net> for help in debugging.

Diff Detail

Repository
rL LLVM

Event Timeline

whitequark updated this revision to Diff 73393.Oct 3 2016, 8:59 PM
whitequark retitled this revision from to [SelectionDAG] Fix calling convention in expansion of ?MULO..
whitequark updated this object.
whitequark added a reviewer: resistor.
whitequark set the repository for this revision to rL LLVM.

It is quite regretful that this change comes without tests, but I wasn't able to trick any of the existing in-tree bi-endian backends into generating a __muldi3 sequence that I need to match against. Even worse, just unconditionally swapping the registers doesn't break any of the tests, implying that this branch wasn't tested in the first place (and also that there's nothing I can easily modify).

whitequark updated this object.Oct 3 2016, 9:08 PM

Maybe ppc32 big endian for the testcase? (If you've tried it and it doesn't work that's fine, just a thought).

-eric

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
3689–3696 ↗(On Diff #73393)

Comment about it the difference here?

Maybe ppc32 big endian for the testcase?

Nope, for __builtin_mul_overflow this generates a completely incomprehensible sequence that nevertheless includes no builtin calls:

stw 31, -4(1)
stwu 1, -16(1)
mulhwu 6, 3, 4
mullw 3, 3, 4
mr 31, 1
cntlzw   12, 6
stw 3, 0(5)
nor 4, 12, 12
rlwinm 3, 4, 27, 31, 31
addi 1, 1, 16
lwz 31, -4(1)
blr

Something similar happens with MIPS.

Oh well.

Anyhow, one inline comment then I guess it looks OK?

Thanks.

whitequark updated this revision to Diff 73402.Oct 3 2016, 11:54 PM
whitequark removed rL LLVM as the repository for this revision.

Added explanatory comment

echristo accepted this revision.Oct 3 2016, 11:55 PM
echristo added a reviewer: echristo.

LGTM.

This revision is now accepted and ready to land.Oct 3 2016, 11:55 PM
This revision was automatically updated to reflect the committed changes.