This is an archive of the discontinued LLVM Phabricator instance.

Differential D25966

[AArch64] Lower multiplication by a constant int to shl+add+shl
ClosedPublic

Authored by haicheng on Oct 25 2016, 2:23 PM.

Details

Reviewers

Gerolf
mcrosier

Summary

GCC can lower a = b * C where C = (2^n + 1) * 2^m to

add     w0, w0, w0, lsl n
lsl     w0, w0, m

GCC can also lower C = (2^n - 1) * 2^m to

lsl     w1, w0, n
sub     w0, w1, w0
lsl     w0, w0, m

LLVM cannot perform either of the above transformations and generates code like this

mov     w8, C    
mul     w0, w0, w8

This change handles only the first case, because the second case requires an extra instruction. The change is also deliberately conservative: it tries not to touch a mul that could be folded into s(u)mull, madd/msub, or s(u)maddl/s(u)msubl, since the costs of those instructions are not known during ISelLowering.
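
As an illustration of the first case, the following standalone C++ sketch decomposes a 32-bit constant into the form (2^N + 1) * 2^M and evaluates the product with shift, add, shift. It is not the patch's code: the names `ShiftAddShift`, `decompose`, and `mulByShiftAddShift` are hypothetical, it works on plain integers rather than SelectionDAG nodes, and it ignores the madd/smull folding concerns mentioned above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: C == (2^N + 1) * 2^M.
struct ShiftAddShift {
  unsigned N; // inner shift amount, used in x + (x << N)
  unsigned M; // outer shift amount, the number of trailing zero bits of C
};

inline bool isPowerOf2(uint32_t V) { return V != 0 && (V & (V - 1)) == 0; }

inline unsigned log2OfPowerOf2(uint32_t V) {
  unsigned Log = 0;
  while (V > 1) {
    V >>= 1;
    ++Log;
  }
  return Log;
}

// Returns true and fills Out if C has the form (2^N + 1) * 2^M with N >= 1.
inline bool decompose(uint32_t C, ShiftAddShift &Out) {
  if (C == 0)
    return false;
  unsigned M = 0;
  while (((C >> M) & 1u) == 0)
    ++M; // count trailing zero bits
  uint32_t Odd = C >> M; // C with its trailing zeros stripped
  if (Odd < 3 || !isPowerOf2(Odd - 1))
    return false;
  Out.N = log2OfPowerOf2(Odd - 1);
  Out.M = M;
  return true;
}

// Evaluates x * C as ((x << N) + x) << M, modulo 2^32.
inline uint32_t mulByShiftAddShift(uint32_t X, const ShiftAddShift &D) {
  return ((X << D.N) + X) << D.M;
}
```

For C = 6 this yields N = 1 and M = 1, and for C = 10 it yields N = 2 and M = 1, which correspond to the add/lsl pairs checked by test6 and test10 in mul_pow2.ll.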

I am also open to suggestions about a better place (the machine combiner, perhaps?) to implement the transformation. If I had cycle counts for 32-bit and 64-bit mul, I could consider more constants, such as C = (2^m + 1) * (2^n + 1), C = (2^m + 1) * 2^n + 1, or C = ((2^m + 1) * 2^n + 1) * 2^p.
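
The three additional constant shapes listed above can each be evaluated with shifts and adds only. The sketch below spells out one possible sequence per shape on plain 32-bit integers; the function names are hypothetical and nothing here is part of the patch, which does not implement these forms.

```cpp
#include <cassert>
#include <cstdint>

// C = (2^M + 1) * (2^N + 1): two shift+add steps.
inline uint32_t mulProductForm(uint32_t X, unsigned M, unsigned N) {
  uint32_t T = X + (X << M); // x * (2^M + 1)
  return T + (T << N);       // t * (2^N + 1)
}

// C = (2^M + 1) * 2^N + 1: shift+add, shift, add.
inline uint32_t mulShiftPlusOne(uint32_t X, unsigned M, unsigned N) {
  return ((X + (X << M)) << N) + X;
}

// C = ((2^M + 1) * 2^N + 1) * 2^P: the previous form followed by a shift.
inline uint32_t mulShiftPlusOneShifted(uint32_t X, unsigned M, unsigned N,
                                       unsigned P) {
  return mulShiftPlusOne(X, M, N) << P;
}
```

Each form needs more instructions than the two-instruction mov+mul sequence, which is presumably why the summary ties them to knowing the mul latency.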

Diff Detail

Repository: rL LLVM

Event Timeline

haicheng updated this revision to Diff 75789. Oct 25 2016, 2:23 PM

haicheng retitled this revision to [AArch64] Lower multiplication by a constant int to shl+add+shl.

haicheng updated this object.

haicheng added reviewers: mcrosier, Gerolf.

haicheng set the repository for this revision to rL LLVM.

haicheng added a subscriber: llvm-commits.

Herald added subscribers: rengolin, aemerson. Oct 25 2016, 2:23 PM

Hi Haicheng,

This looks simple enough to be worthwhile, even if the benefit is probably very small. But as it stands, the patch complicates one of two identical twins (the positive path and the negative path) and not the other, which makes the code harder to follow.

I recommend changing that part of the code entirely to set temporary variables (e.g. +1/-1, ISD::ADD/ISD::SUB) based on the result of isNonNegative, and then using the same piece of code for both paths.
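
One way to read this suggestion is as a two-step scheme: first classify the constant into a small plan (shift amount, add or sub, operand order, final negation), then emit code from the plan along a single path. The standalone C++ sketch below models that on plain integers. The names `MulPlan`, `planMul`, and `applyPlan` are hypothetical, and the constants 0, +/-1 and +/-2 are assumed to be handled elsewhere.

```cpp
#include <cassert>
#include <cstdint>

enum class Op { Add, Sub };

// Hypothetical plan for x * C where |C| == 2^N + 1 or |C| == 2^N - 1.
struct MulPlan {
  unsigned N;        // shift amount
  Op Operation;      // how (x << N) is combined with x
  bool SwapOperands; // compute x - (x << N) instead of (x << N) - x
  bool NegateResult; // negate the combined value
};

inline bool isPowerOf2(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

inline unsigned log2OfPowerOf2(uint64_t V) {
  unsigned Log = 0;
  while (V > 1) {
    V >>= 1;
    ++Log;
  }
  return Log;
}

// Classifies C once; the sign only toggles two flags.
inline bool planMul(int32_t C, MulPlan &P) {
  bool NonNegative = C >= 0;
  // Take the magnitude in 64 bits so that INT32_MIN does not overflow.
  uint64_t Abs = NonNegative ? uint64_t(C) : uint64_t(-int64_t(C));
  if (Abs < 3)
    return false; // 0, 1 and 2 are assumed to be handled elsewhere
  if (isPowerOf2(Abs - 1)) { // |C| == 2^N + 1
    P = MulPlan{log2OfPowerOf2(Abs - 1), Op::Add, false, !NonNegative};
    return true;
  }
  if (isPowerOf2(Abs + 1)) { // |C| == 2^N - 1
    P = MulPlan{log2OfPowerOf2(Abs + 1), Op::Sub, !NonNegative, false};
    return true;
  }
  return false;
}

// Single code path for all four patterns; arithmetic is modulo 2^32.
inline uint32_t applyPlan(uint32_t X, const MulPlan &P) {
  uint32_t Shifted = X << P.N;
  uint32_t R;
  if (P.Operation == Op::Add)
    R = Shifted + X;
  else
    R = P.SwapOperands ? X - Shifted : Shifted - X;
  return P.NegateResult ? 0u - R : R;
}
```

With this split, the four comments in the original code, (2^N + 1), (2^N - 1), -(2^N - 1) and -(2^N + 1), become four combinations of flags rather than four copies of the node-building code.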

A more generic approach could be done with some smarter constant-splitting, but this patch is simple as it is and there is already plenty of prior art for that, so let's stick to the pattern.

Also, I was expecting a much larger body of tests, with different constant sizes, multiple edge cases, and those that cannot be done, remaining a mul.

I have some comments inline, but my only additional question is: What is the motivation behind this? Benchmark numbers? Can you share them?

cheers,
--renato

lib/Target/AArch64/AArch64ISelLowering.cpp
7552	why not early return? why is this not a problem for the previous case as well?
7561	`Shift` is not a good name, since this implies the "shift amount" not the "shifted value".
test/CodeGen/AArch64/mul_pow2.ll
5	You say "shift+add+shift" but your tests are of the form "add+shift".

Hi Haicheng,

I just have a few observations/food for thought:

Nit: In your Summary I think you swapped n and m in your code snippets vs your formulas. Your code is correct though.
The (2^N - 1) * 2^M reduction increases code size, so it should not fire under Oz. Otherwise, similar considerations as for your major case apply.
The (2^N + 1) * 2^M reduction increases schedule height (at least on most processors). It might also increase code size when e.g. add+mul could be combined to madd. But when code size is *not* a concern and latency(lsl) + 1 is less than both latency(mul) and latency(madd), it should always be a win. But that target dependence is not checked in your code yet.
I would look at the machine combiner only for cases that need more global scheduling context to decide
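
The latency condition in the observations above can be written down as a small predicate. The sketch below is only a model of that sentence: the `Latencies` struct and `shouldExpandMul` are hypothetical and do not correspond to any LLVM scheduling-model API, and real latencies would have to come from the subtarget.

```cpp
#include <cassert>

// Hypothetical per-target latencies in cycles; not an LLVM data structure.
struct Latencies {
  unsigned Lsl;
  unsigned Mul;
  unsigned Madd;
};

// True when expanding mul into shift+add(+shift) is expected to win:
// code size is not a concern and latency(lsl) + 1 is below both
// latency(mul) and latency(madd).
inline bool shouldExpandMul(const Latencies &L, bool CodeSizeMatters) {
  if (CodeSizeMatters)
    return false;
  unsigned Expanded = L.Lsl + 1;
  return Expanded < L.Mul && Expanded < L.Madd;
}
```

As an example, the source comment in performMulCombine quotes a 4-cycle 32-bit MADD on Cyclone; assuming a 1-cycle lsl and the same 4 cycles for mul (both assumptions), the predicate returns true when code size is not a concern.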

Like Renato I'm also curious about your gains. How big? Which benchmarks?

Cheers
Gerolf

haicheng updated this object. Oct 28 2016, 9:52 AM

haicheng edited edge metadata.

Rewrite performMulCombine(), make the conversion a little less conservative to improve the performance and reduce the compilation time, add more tests.

Thank you, Renato. I rewrote my change and added more tests, please let me know if I did what you recommended.

In D25966#579090, @rengolin wrote:

I have some comments inline, but my only additional question is: What is the motivation behind this? Benchmark numbers? Can you share them?

The biggest motivation is that GCC can do this, but LLVM cannot. My patch is conservative and does not change performance much. I have not observed any noticeable regression, but the gain is small: spec2006/h264ref and spec2006/povray improve by around 1%. One internal benchmark that is integer-multiplication-centric improves much more.

rengolin added inline comments. Oct 29 2016, 9:02 AM

test/CodeGen/AArch64/mul_pow2.ll
119	Please use {{w[0-9]+}} instead of w8.
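
FileCheck treats the text between {{ and }} as a regular expression, so {{w[0-9]+}} accepts whichever 32-bit scratch register the allocator happens to pick. As a rough analogue only (FileCheck itself is not involved, and `matchesAddShift` is a hypothetical helper), the same idea can be expressed with std::regex:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Accepts "add wN, w0, w0, lsl #1" for any destination register wN,
// searching anywhere in the line, much as a CHECK line does.
inline bool matchesAddShift(const std::string &AsmLine) {
  static const std::regex Pattern("add w[0-9]+, w0, w0, lsl #1");
  return std::regex_search(AsmLine, Pattern);
}
```

A check written against a literal w8 would start failing as soon as register allocation changed, even though the generated code was still correct.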

In D25966#582890, @haicheng wrote:

The biggest motivation is that GCC can do this, but LLVM cannot. My patch is conservative and does not change performance much. I have not observed any noticeable regression, but the gain is small: spec2006/h264ref and spec2006/povray improve by around 1%. One internal benchmark that is integer-multiplication-centric improves much more.

Any regressions? Not that I'm expecting any, but... :)

I'll come back a bit later once I've done a proper review.

Thanks!

Thank you, Gerolf

In D25966#581742, @Gerolf wrote:

Hi Haicheng,

I just have a few observations/food for thought:

Nit: In your Summary I think you swapped n and m in your code snippets vs your formulas. Your code is correct though.

Thank you for catching this. I updated the summary.

The (2^N - 1) * 2^M reduction increases code size, so it should not fire under Oz. Otherwise, similar considerations as for your major case apply.

The (2^N + 1) * 2^M reduction increases schedule height (at least on most processors). It might also increase code size when e.g. add+mul could be combined to madd. But when code size is *not* a concern and latency(lsl) + 1 is less than both latency(mul) and latency(madd), it should always be a win. But that target dependence is not checked in your code yet.

I would look at the machine combiner only for cases that need more global scheduling context to decide

I agree with everything you said. I tried to be conservative in this patch to not increase code size or impact the generation of madd. If I want to support my cases, I think I need to check the target and compare the cost of different code sequences.

Like Renato I'm also curious about your gains. How big? Which benchmarks?

Please see my response to Renato above.

Update the tests according to Renato's comments.

Hi,

I have a few variable name proposals, mostly to aid the understanding of the code. I apologise for beating on that key, but the code is getting more dense, less repetitive, so it has to be well understood.

I'd welcome another review from @Gerolf at this point. :)

cheers,
--renato

lib/Target/AArch64/AArch64ISelLowering.cpp
7548	I think one simple formula here would be more than enough: (+/- 2^N +/- 1) * (+/- 2^M +/- 1)
7556	`N0` refers to `x`, so maybe calling it `Var` or something more meaningful would be easier to understand.
7558	`Lg2` implies `log base 2` of `Value`, which is not true. `TrailingZeroes` is a better name for this.
7575	Better call this `SwapValues`, as this is the intention of the flag.
7579	`VM1` implies `V Minus One` and `VP1` implies `V Plus One`, but the `Vs` are different. In the case where `VM1` is a power of two, then `Lg2` is zero and `ShiftedInt == Value`, but not always. I wouldn't mind `ShiftedMinus1` and `ValuePlus1`.
7579	Feel free to hoist those two flags out of the conditional. This will make it clear that they're invariants here.

Rename the variable names as suggested by Renato. Thank you.

I thought I understood this until about the middle of the review. Now I could use some help, perhaps with variable names and comments that reflect more clearly the expression(s) you simplify. I think this is what Renato is looking for, too.

Thanks
Gerolf

lib/Target/AArch64/AArch64ISelLowering.cpp
7545–7561	You could tie it more to the code, e.g. some multiplications Var * C can be ...
7548	I liked the spaces in Renato's comment. Or would it be clearer to say " constants C = A*B where A, B are of type +/- (2^N +/- 1)"?
7553	ValueOfC?
7561	no -> not
7566	ditto
7572	If you declare e.g C = A * B then ShiftedInt could be ConstantA etc
7575	Operation would be more general than AddOrSub
7577	Please add a comment like what values get swapped. ExtraNeg -> NegExp? Or NegSubExp?
7579	ShiftedMinus1 could be ConstantAMinus1
7580	Shouldn't ValuePlus1 be ConstantAPlus1? But then it should be ConstantAPlus1 = ShiftedInt (ConstantA) -1. At this point I can't match your specification and your code. However, if I'm right about this I will need to dig deep into your test cases, too ...
7589	After this point I think you can assert(IntValue == 2^N, some power of 2).
7591	I think Value should be ShiftedMinus1 from here on.
7608	I'll take another look at this code after I (think I) understand the code above.
7614	It is not clear to me why TrailingZeros and ExtraNeg are exclusive.

Address Gerolf's comments.

haicheng added inline comments. Nov 9 2016, 8:03 AM

lib/Target/AArch64/AArch64ISelLowering.cpp
7580	ValueOfC is used here to support (mul x, 2^N - 1) => (sub (shl x, N), x). This patch does not support (mul x, (2^N - 1) * 2^M) => (shl (sub (shl x, N), x), M) yet. If we want to support it in the future, we just need to use ConstantA here as you said.
7591	Similarly, I do not support
	(mul x, -(2^N - 1) * 2^M) => (shl (sub x, (shl x, N)), M)
	(mul x, -(2^N + 1) * 2^M) => -(shl (add (shl x, N), x), M)
	So I use ValueOfC here.
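
The three rewrites that these inline comments describe as not yet supported can be checked on plain 32-bit integers. The sketch below is only a sanity check of the identities under wrap-around arithmetic; the function names are hypothetical and none of this is in the patch.

```cpp
#include <cassert>
#include <cstdint>

// (mul x, (2^N - 1) * 2^M) => (shl (sub (shl x, N), x), M)
inline uint32_t subThenShift(uint32_t X, unsigned N, unsigned M) {
  return ((X << N) - X) << M;
}

// (mul x, -(2^N - 1) * 2^M) => (shl (sub x, (shl x, N)), M)
inline uint32_t reverseSubThenShift(uint32_t X, unsigned N, unsigned M) {
  return (X - (X << N)) << M;
}

// (mul x, -(2^N + 1) * 2^M) => -(shl (add (shl x, N), x), M)
inline uint32_t negatedAddThenShift(uint32_t X, unsigned N, unsigned M) {
  return 0u - (((X << N) + X) << M);
}

// Builds Sign * (2^N + Delta) * 2^M modulo 2^32, with Sign and Delta
// each being +1 or -1.
inline uint32_t makeConstant(int Sign, unsigned N, int Delta, unsigned M) {
  uint32_t Base = (1u << N) + uint32_t(Delta);
  uint32_t C = Base << M;
  return Sign < 0 ? 0u - C : C;
}
```

For example, with N = 3 and M = 2 the first form multiplies by 28, the second by -28, and the third by -36.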

Hi Gerolf,

Would you please take another look? Does my latest update make the code easier to read?

Thank you,

Haicheng

I've done a bit of refactoring in r286601 and r286606, which I hope makes this a much easier code review. If you rebase the patch, I'd be happy to take a look.

FYI, Chad has made some big refactoring in this area, you will have to re-base:

http://llvm.org/viewvc/llvm-project?rev=286606&view=rev

Rebase the code.

With the minor comment, this looks good to me. But I'll let @Gerolf and @mcrosier have a final look and approve.

Thanks!

lib/Target/AArch64/AArch64ISelLowering.cpp
7537	IIGIR, the old case is still covered because if `TrailingZeroes == 0`, then `ConstantA == ConstValue`. Your comment should reflect that. Not here, but above, before `ConstantA`'s instantiation. Here, you can just add the new case:
	// (mul x, (2^N + 1) * 2^M) => (shl (add (shl x, N), x), M)

Address Renato's comment. Thank you.

LGTM with a few nits about naming of variables.

lib/Target/AArch64/AArch64ISelLowering.cpp
7536–7541	CAMinus1 -> SCVMinus1
7537	CAMinus1 -> SCVMinus1
7542	'Var' isn't descriptive. I'd prefer 'N0' as this is a common idiom in ISel lowering.
7558	I'd prefer 'ShiftedConstValue' over 'ConstantA'.
7607–7608	Var -> N0
7607–7608	Var -> N0
7607–7608	Please add an assert showing that TrailingZeroes and NegateResult can't both be true. Then the last bit of logic here can be written as:

	// Negate the result.
	if (NegateResult)
	  return DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, DL, VT), Res);
	// Shift the result.
	if (TrailingZeroes)
	  return DAG.getNode(ISD::SHL, DL, VT, Res,
	                     DAG.getConstant(TrailingZeroes, DL, MVT::i64));
	return Res;
7608	Var -> N0

This revision is now accepted and ready to land. Nov 14 2016, 6:52 AM

Thanks for following up!
LGTM

lib/Target/AArch64/AArch64ISelLowering.cpp
7559	CAMinus1 is consistent with the comment. Perhaps ConstantAMinus1 would be even more clear.

This was committed in r287019.

Revision Contents

Path

Size

lib/

Target/

AArch64/

AArch64ISelLowering.cpp

34 lines

test/

CodeGen/

AArch64/

arm64-mul.ll

2 lines

mul_pow2.ll

20 lines

Diff 75789

lib/Target/AArch64/AArch64ISelLowering.cpp


[7,527 lines not shown]
static SDValue performMulCombine(SDNode *N, SelectionDAG &DAG,
const AArch64Subtarget *Subtarget) {		const AArch64Subtarget *Subtarget) {
if (DCI.isBeforeLegalizeOps())		if (DCI.isBeforeLegalizeOps())
return SDValue();		return SDValue();

// Multiplication of a power of two plus/minus one can be done more		// Multiplication of a power of two plus/minus one can be done more
// cheaply as as shift+add/sub. For now, this is true unilaterally. If		// cheaply as as shift+add/sub. For now, this is true unilaterally. If
// future CPUs have a cheaper MADD instruction, this may need to be		// future CPUs have a cheaper MADD instruction, this may need to be
// gated on a subtarget feature. For Cyclone, 32-bit MADD is 4 cycles and		// gated on a subtarget feature. For Cyclone, 32-bit MADD is 4 cycles and
// 64-bit is 5 cycles, so this is always a win.		// 64-bit is 5 cycles, so this is always a win.
		// More aggressively, some multiplications can be lowered to shift+add+shift
		// if the constant is (2^N + 1) * 2^M.
		// TODO: consider constants in the form of (2^N - 1 ) * 2^M,
		// (2^N + 1 ) * 2^M + 1, or (2^N + 1) * (2^M + 1).
if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(N->getOperand(1))) {		if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(N->getOperand(1))) {
const APInt &Value = C->getAPIntValue();		const APInt &Value = C->getAPIntValue();
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
SDLoc DL(N);		SDLoc DL(N);
if (Value.isNonNegative()) {		if (Value.isNonNegative()) {
		// Lg2 is used to test if the mul can be lowered to shift+add+shift.
		unsigned Lg2 = Value.countTrailingZeros();
		// Conservatively do no lower to shift+add+shift if the mul might be
		// folded into smul or umul.
		if (Lg2 && (isSignExtended(N->getOperand(0).getNode(), DAG) ||
		isZeroExtended(N->getOperand(0).getNode(), DAG)))
		Lg2 = 0;
		// Conservatively do no lower to shift+add+shift if the mul might be
		// folded into madd or msub.
		if (Lg2)
		for (SDNode *Use : N->uses())
		if (Use->getOpcode() == ISD::ADD || Use->getOpcode() == ISD::SUB) {
		Lg2 = 0;
		break;
		}
		APInt Shift = Value.ashr(Lg2);
// (mul x, 2^N + 1) => (add (shl x, N), x)		// (mul x, 2^N + 1) => (add (shl x, N), x)
APInt VM1 = Value - 1;		// (mul x, (2^N + 1) * 2^M) => (shl (add (shl x, N), x), M)
		APInt VM1 = Shift - 1;
if (VM1.isPowerOf2()) {		if (VM1.isPowerOf2()) {
SDValue ShiftedVal =		SDValue ShiftedValue =
DAG.getNode(ISD::SHL, DL, VT, N->getOperand(0),		DAG.getNode(ISD::SHL, DL, VT, N->getOperand(0),
DAG.getConstant(VM1.logBase2(), DL, MVT::i64));		DAG.getConstant(VM1.logBase2(), DL, MVT::i64));
return DAG.getNode(ISD::ADD, DL, VT, ShiftedVal,		SDValue Add =
N->getOperand(0));		DAG.getNode(ISD::ADD, DL, VT, ShiftedValue, N->getOperand(0));
		if (Lg2)
		return DAG.getNode(ISD::SHL, DL, VT, Add,
		DAG.getConstant(Lg2, DL, MVT::i64));
		else
		return Add;
}		}
// (mul x, 2^N - 1) => (sub (shl x, N), x)		// (mul x, 2^N - 1) => (sub (shl x, N), x)
APInt VP1 = Value + 1;		APInt VP1 = Value + 1;
if (VP1.isPowerOf2()) {		if (VP1.isPowerOf2()) {
SDValue ShiftedVal =		SDValue ShiftedVal =
DAG.getNode(ISD::SHL, DL, VT, N->getOperand(0),		DAG.getNode(ISD::SHL, DL, VT, N->getOperand(0),
DAG.getConstant(VP1.logBase2(), DL, MVT::i64));		DAG.getConstant(VP1.logBase2(), DL, MVT::i64));
return DAG.getNode(ISD::SUB, DL, VT, ShiftedVal,		return DAG.getNode(ISD::SUB, DL, VT, ShiftedVal,
N->getOperand(0));		N->getOperand(0));
}		}
} else {		} else {
// (mul x, -(2^N - 1)) => (sub x, (shl x, N))		// (mul x, -(2^N - 1)) => (sub x, (shl x, N))
APInt VNP1 = -Value + 1;		APInt VNP1 = -Value + 1;
if (VNP1.isPowerOf2()) {		if (VNP1.isPowerOf2()) {
SDValue ShiftedVal =		SDValue ShiftedVal =
DAG.getNode(ISD::SHL, DL, VT, N->getOperand(0),		DAG.getNode(ISD::SHL, DL, VT, N->getOperand(0),
DAG.getConstant(VNP1.logBase2(), DL, MVT::i64));		DAG.getConstant(VNP1.logBase2(), DL, MVT::i64));
return DAG.getNode(ISD::SUB, DL, VT, N->getOperand(0),		return DAG.getNode(ISD::SUB, DL, VT, N->getOperand(0),
ShiftedVal);		ShiftedVal);
}		}
// (mul x, -(2^N + 1)) => - (add (shl x, N), x)		// (mul x, -(2^N + 1)) => - (add (shl x, N), x)
APInt VNM1 = -Value - 1;		APInt VNM1 = -Value - 1;
if (VNM1.isPowerOf2()) {		if (VNM1.isPowerOf2()) {
SDValue ShiftedVal =		SDValue ShiftedVal =
DAG.getNode(ISD::SHL, DL, VT, N->getOperand(0),		DAG.getNode(ISD::SHL, DL, VT, N->getOperand(0),
DAG.getConstant(VNM1.logBase2(), DL, MVT::i64));		DAG.getConstant(VNM1.logBase2(), DL, MVT::i64));
SDValue Add =		SDValue Add =
DAG.getNode(ISD::ADD, DL, VT, ShiftedVal, N->getOperand(0));		DAG.getNode(ISD::ADD, DL, VT, ShiftedVal, N->getOperand(0));
return DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, DL, VT), Add);		return DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, DL, VT), Add);
}		}
}		}
}		}
return SDValue();		return SDValue();
}		}

static SDValue performVectorCompareAndMaskUnaryOpCombine(SDNode *N,		static SDValue performVectorCompareAndMaskUnaryOpCombine(SDNode *N,
SelectionDAG &DAG) {		SelectionDAG &DAG) {
// Take advantage of vector comparisons producing 0 or -1 in each lane to		// Take advantage of vector comparisons producing 0 or -1 in each lane to
// optimize away operation when it's from a constant.		// optimize away operation when it's from a constant.
//		//
// The general transformation is:		// The general transformation is:
// UNARYOP(AND(VECTOR_CMP(x,y), constant)) -->		// UNARYOP(AND(VECTOR_CMP(x,y), constant)) -->
// AND(VECTOR_CMP(x,y), constant2)		// AND(VECTOR_CMP(x,y), constant2)
// constant2 = UNARYOP(constant)		// constant2 = UNARYOP(constant)

// Early exit if this isn't a vector operation, the operand of the		// Early exit if this isn't a vector operation, the operand of the
// unary operation isn't a bitwise AND, or if the sizes of the operations		// unary operation isn't a bitwise AND, or if the sizes of the operations
[2,781 lines not shown]

test/CodeGen/AArch64/arm64-mul.ll

	[98 lines not shown]
	}			}

	; Check 64-bit multiplication is used for constants > 32 bits.			; Check 64-bit multiplication is used for constants > 32 bits.
	define i64 @t10(i32 %a) nounwind {			define i64 @t10(i32 %a) nounwind {
	entry:			entry:
	; CHECK-LABEL: t10:			; CHECK-LABEL: t10:
	; CHECK: mul {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}			; CHECK: mul {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}
	%tmp1 = sext i32 %a to i64			%tmp1 = sext i32 %a to i64
	%tmp2 = mul i64 %tmp1, 2147483650 ; = 2^31 + 2			%tmp2 = mul i64 %tmp1, 2147483650 ; = 2^31 + 2
	ret i64 %tmp2			ret i64 %tmp2
	}			}

	; Check the sext_inreg case.			; Check the sext_inreg case.
	define i64 @t11(i64 %a) nounwind {			define i64 @t11(i64 %a) nounwind {
	entry:			entry:
	; CHECK-LABEL: t11:			; CHECK-LABEL: t11:
	; CHECK: smnegl {{x[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}			; CHECK: smnegl {{x[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}
	[37 lines not shown]

test/CodeGen/AArch64/mul_pow2.ll

	; RUN: llc < %s -mtriple=aarch64-eabi | FileCheck %s			; RUN: llc < %s -mtriple=aarch64-eabi | FileCheck %s

	; Convert mul x, pow2 to shift.			; Convert mul x, pow2 to shift.
	; Convert mul x, pow2 +/- 1 to shift + add/sub.			; Convert mul x, pow2 +/- 1 to shift + add/sub.
				; Convert mul x, (pow2 + 1) * pow2 to shift + add + shift.

	define i32 @test2(i32 %x) {			define i32 @test2(i32 %x) {
	; CHECK-LABEL: test2			; CHECK-LABEL: test2
	; CHECK: lsl w0, w0, #1			; CHECK: lsl w0, w0, #1

	%mul = shl nsw i32 %x, 1			%mul = shl nsw i32 %x, 1
	ret i32 %mul			ret i32 %mul
	}			}
	[18 lines not shown]
	; CHECK-LABEL: test5			; CHECK-LABEL: test5
	; CHECK: add w0, w0, w0, lsl #2			; CHECK: add w0, w0, w0, lsl #2


	%mul = mul nsw i32 %x, 5			%mul = mul nsw i32 %x, 5
	ret i32 %mul			ret i32 %mul
	}			}

				define i32 @test6(i32 %x) {
				; CHECK-LABEL: test6
				; CHECK: add w8, w0, w0, lsl #1
				; CHECK: lsl w0, w8, #1

				%mul = mul nsw i32 %x, 6
				ret i32 %mul
				}

	define i32 @test7(i32 %x) {			define i32 @test7(i32 %x) {
	; CHECK-LABEL: test7			; CHECK-LABEL: test7
	; CHECK: lsl {{w[0-9]+}}, w0, #3			; CHECK: lsl {{w[0-9]+}}, w0, #3
	; CHECK: sub w0, {{w[0-9]+}}, w0			; CHECK: sub w0, {{w[0-9]+}}, w0

	%mul = mul nsw i32 %x, 7			%mul = mul nsw i32 %x, 7
	ret i32 %mul			ret i32 %mul
	}			}

	define i32 @test8(i32 %x) {			define i32 @test8(i32 %x) {
	; CHECK-LABEL: test8			; CHECK-LABEL: test8
	; CHECK: lsl w0, w0, #3			; CHECK: lsl w0, w0, #3

	%mul = shl nsw i32 %x, 3			%mul = shl nsw i32 %x, 3
	ret i32 %mul			ret i32 %mul
	}			}

	define i32 @test9(i32 %x) {			define i32 @test9(i32 %x) {
	; CHECK-LABEL: test9			; CHECK-LABEL: test9
	; CHECK: add w0, w0, w0, lsl #3			; CHECK: add w0, w0, w0, lsl #3

	%mul = mul nsw i32 %x, 9			%mul = mul nsw i32 %x, 9
	ret i32 %mul			ret i32 %mul
	}			}

				define i32 @test10(i32 %x) {
				; CHECK-LABEL: test10
				; CHECK: add w8, w0, w0, lsl #2
				; CHECK: lsl w0, w8, #1

				%mul = mul nsw i32 %x, 10
				ret i32 %mul
				}
	; Convert mul x, -pow2 to shift.			; Convert mul x, -pow2 to shift.
	; Convert mul x, -(pow2 +/- 1) to shift + add/sub.			; Convert mul x, -(pow2 +/- 1) to shift + add/sub.

	define i32 @ntest2(i32 %x) {			define i32 @ntest2(i32 %x) {
	; CHECK-LABEL: ntest2			; CHECK-LABEL: ntest2
	; CHECK: neg w0, w0, lsl #1			; CHECK: neg w0, w0, lsl #1

	%mul = mul nsw i32 %x, -2			%mul = mul nsw i32 %x, -2
	[21 lines not shown]
	; CHECK: add {{w[0-9]+}}, w0, w0, lsl #2			; CHECK: add {{w[0-9]+}}, w0, w0, lsl #2
	; CHECK: neg w0, {{w[0-9]+}}			; CHECK: neg w0, {{w[0-9]+}}
	%mul = mul nsw i32 %x, -5			%mul = mul nsw i32 %x, -5
	ret i32 %mul			ret i32 %mul
	}			}

	define i32 @ntest7(i32 %x) {			define i32 @ntest7(i32 %x) {
	; CHECK-LABEL: ntest7			; CHECK-LABEL: ntest7
	; CHECK: sub w0, w0, w0, lsl #3			; CHECK: sub w0, w0, w0, lsl #3

	%mul = mul nsw i32 %x, -7			%mul = mul nsw i32 %x, -7
	ret i32 %mul			ret i32 %mul
	}			}

	define i32 @ntest8(i32 %x) {			define i32 @ntest8(i32 %x) {
	; CHECK-LABEL: ntest8			; CHECK-LABEL: ntest8
	; CHECK: neg w0, w0, lsl #3			; CHECK: neg w0, w0, lsl #3
	[13 lines not shown]