Skip to content

Commit 6f3387f

Browse files
author
Hal Finkel
committedMay 25, 2016
[SDAG] Add a fallback multiplication expansion
LegalizeIntegerTypes does not have a way to expand multiplications for large integer types (i.e. larger than twice the native bit width). There's no standard runtime call to use in that case, and so we'd just assert. Unfortunately, as it turns out, it is possible to hit this case from standard-ish C code in rare cases. A particular case a user ran into yesterday involved an __int128 induction variable and a loop with a quadratic (not linear) recurrence which triggered some backend logic using SCEVExpander. In this case, the BinomialCoefficient code in SCEV generates some i129 variables, which get widened to i256. At a high level, this is not actually good (i.e. the underlying optimization, PPCLoopPreIncPrep, should not be transforming the loop in question for performance reasons), but regardless, the backend shouldn't crash because of cost-modeling issues in the optimizer. This is a straightforward implementation of the multiplication expansion, based on the algorithm in Hacker's Delight. I validated it against the code for the mul256b function from http://locklessinc.com/articles/256bit_arithmetic/ using random inputs. There should be no functional change for previously-working code (the new expansion code only replaces an assert). Fixes PR19797. llvm-svn: 270720
1 parent bef0eb0 commit 6f3387f

File tree

2 files changed

+70
-1
lines changed

2 files changed

+70
-1
lines changed
 

‎llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

+43-1
Original file line numberDiff line numberDiff line change
@@ -2133,7 +2133,49 @@ void DAGTypeLegalizer::ExpandIntRes_MUL(SDNode *N,
21332133
LC = RTLIB::MUL_I64;
21342134
else if (VT == MVT::i128)
21352135
LC = RTLIB::MUL_I128;
2136-
assert(LC != RTLIB::UNKNOWN_LIBCALL && "Unsupported MUL!");
2136+
2137+
if (LC == RTLIB::UNKNOWN_LIBCALL) {
2138+
// We'll expand the multiplication by brute force because we have no other
2139+
// options. This is a trivially-generalized version of the code from
2140+
// Hacker's Delight (itself derived from Knuth's Algorithm M from section
2141+
// 4.3.1).
2142+
SDValue Mask =
2143+
DAG.getConstant(APInt::getLowBitsSet(NVT.getSizeInBits(),
2144+
NVT.getSizeInBits() >> 1), dl, NVT);
2145+
SDValue LLL = DAG.getNode(ISD::AND, dl, NVT, LL, Mask);
2146+
SDValue RLL = DAG.getNode(ISD::AND, dl, NVT, RL, Mask);
2147+
2148+
SDValue T = DAG.getNode(ISD::MUL, dl, NVT, LLL, RLL);
2149+
SDValue TL = DAG.getNode(ISD::AND, dl, NVT, T, Mask);
2150+
2151+
SDValue Shift =
2152+
DAG.getConstant(NVT.getSizeInBits() >> 1, dl,
2153+
TLI.getShiftAmountTy(NVT, DAG.getDataLayout()));
2154+
SDValue TH = DAG.getNode(ISD::SRL, dl, NVT, T, Shift);
2155+
SDValue LLH = DAG.getNode(ISD::SRL, dl, NVT, LL, Shift);
2156+
SDValue RLH = DAG.getNode(ISD::SRL, dl, NVT, RL, Shift);
2157+
2158+
SDValue U = DAG.getNode(ISD::ADD, dl, NVT,
2159+
DAG.getNode(ISD::MUL, dl, NVT, LLH, RLL), TL);
2160+
SDValue UL = DAG.getNode(ISD::AND, dl, NVT, U, Mask);
2161+
SDValue UH = DAG.getNode(ISD::SRL, dl, NVT, U, Shift);
2162+
2163+
SDValue V = DAG.getNode(ISD::ADD, dl, NVT,
2164+
DAG.getNode(ISD::MUL, dl, NVT, LLL, RLH), UL);
2165+
SDValue VH = DAG.getNode(ISD::SRL, dl, NVT, V, Shift);
2166+
2167+
SDValue W = DAG.getNode(ISD::ADD, dl, NVT,
2168+
DAG.getNode(ISD::MUL, dl, NVT, LL, RL),
2169+
DAG.getNode(ISD::ADD, dl, NVT, UH, VH));
2170+
Lo = DAG.getNode(ISD::ADD, dl, NVT, TH,
2171+
DAG.getNode(ISD::SHL, dl, NVT, V, Shift));
2172+
2173+
Hi = DAG.getNode(ISD::ADD, dl, NVT, W,
2174+
DAG.getNode(ISD::ADD, dl, NVT,
2175+
DAG.getNode(ISD::MUL, dl, NVT, RH, LL),
2176+
DAG.getNode(ISD::MUL, dl, NVT, RL, LH)));
2177+
return;
2178+
}
21372179

21382180
SDValue Ops[2] = { N->getOperand(0), N->getOperand(1) };
21392181
SplitInteger(TLI.makeLibCall(DAG, LC, VT, Ops, true/*irrelevant*/, dl).first,

‎llvm/test/CodeGen/X86/mul-i256.ll

+27
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,27 @@
1+
; RUN: llc < %s | FileCheck %s
2+
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
3+
target triple = "x86_64-unknown-linux-gnu"
4+
5+
define void @test(i256* %a, i256* %b, i256* %out) #0 {
6+
entry:
7+
%av = load i256, i256* %a
8+
%bv = load i256, i256* %b
9+
%r = mul i256 %av, %bv
10+
store i256 %r, i256* %out
11+
ret void
12+
}
13+
14+
; CHECK-LABEL: @test
15+
; There is a lot of inter-register motion, and so matching the instruction
16+
; sequence will be fragile. There should be 6 underlying multiplications.
17+
; CHECK: imulq
18+
; CHECK: imulq
19+
; CHECK: imulq
20+
; CHECK: imulq
21+
; CHECK: imulq
22+
; CHECK: imulq
23+
; CHECK-NOT: imulq
24+
; CHECK: retq
25+
26+
attributes #0 = { norecurse nounwind uwtable "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" }
27+

0 commit comments

Comments
 (0)
Please sign in to comment.