This is an archive of the discontinued LLVM Phabricator instance.

Differential D25966

[AArch64] Lower multiplication by a constant int to shl+add+shl
ClosedPublic

Authored by haicheng on Oct 25 2016, 2:23 PM.

Details

Reviewers

Gerolf
mcrosier

Summary

GCC can lower a = b * C where C = (2^n + 1) * 2^m to

add     w0, w0, w0, lsl n
lsl     w0, w0, m

and can also lower C = (2^n - 1) * 2^m to

lsl     w1, w0, n
sub     w0, w1, w0
lsl     w0, w0, m

LLVM cannot do either of the above transformations and generates code like this

mov     w8, C    
mul     w0, w0, w8

This change handles only the first case, since the second case requires an extra instruction. It is also deliberately conservative: it tries not to touch any mul that could be folded into s(u)mull, madd(sub), or s(u)madd(sub)l, since their costs seem to be unknown during ISelLowering.
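
As an illustration of the first case, the following standalone C++ sketch checks whether a constant has the form (2^n + 1) * 2^m and evaluates the product with one shifted add and one shift. It does not use any LLVM API; the names matchShlAddShl and mulByShlAddShl are hypothetical, and the patch itself works on SelectionDAG nodes rather than plain integers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true if C == (2^n + 1) * 2^m with n >= 1,
// and stores the two shift amounts in n and m.
bool matchShlAddShl(uint32_t C, unsigned &n, unsigned &m) {
  if (C == 0)
    return false;
  unsigned zeros = 0;
  while ((C & 1u) == 0) { // strip trailing zero bits: the 2^m factor
    C >>= 1;
    ++zeros;
  }
  uint32_t P = C - 1; // the remaining odd factor minus one must be 2^n
  if (P < 2 || (P & (P - 1)) != 0)
    return false;
  unsigned log = 0;
  while ((P >> log) != 1u)
    ++log;
  n = log;
  m = zeros;
  return true;
}

// Computes x * (2^n + 1) * 2^m with a shifted add followed by a shift.
// Unsigned arithmetic wraps modulo 2^32, like a 32-bit mul.
uint32_t mulByShlAddShl(uint32_t x, unsigned n, unsigned m) {
  assert(n < 32 && m < 32 && "shift amounts must be in range");
  return (x + (x << n)) << m;
}
```

In this sketch, 6 decomposes as n = 1, m = 1 and 12 as n = 1, m = 2, while 11 and 14 are rejected, which is in line with the constants exercised in mul_pow2.ll.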

I am also open to suggestions about a better place (the machine combiner?) to implement the transformation. If I had the cycle counts of the 32-bit and 64-bit mul, I could consider more constants, such as C = (2^m + 1) * (2^n + 1), C = (2^m + 1) * 2^n + 1, or C = ((2^m + 1) * 2^n + 1) * 2^p.
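
To make the first two of these extra forms concrete, here is a small standalone C++ sketch, assuming plain 32-bit wrapping arithmetic. The function names are hypothetical and nothing here is part of the patch; it only shows that each form needs two shifted adds.

```cpp
#include <cassert>
#include <cstdint>

// x * (2^m + 1) * (2^n + 1):  t = x + (x << m);  result = t + (t << n)
uint32_t mulByTwoShiftedAdds(uint32_t x, unsigned m, unsigned n) {
  assert(m < 32 && n < 32 && "shift amounts must be in range");
  uint32_t t = x + (x << m);
  return t + (t << n);
}

// x * ((2^m + 1) * 2^n + 1):  t = x + (x << m);  result = (t << n) + x
uint32_t mulByShiftedAddPlusOne(uint32_t x, unsigned m, unsigned n) {
  assert(m < 32 && n < 32 && "shift amounts must be in range");
  uint32_t t = x + (x << m);
  return (t << n) + x;
}
```

For example, 45 = (2^2 + 1) * (2^3 + 1) fits the first form with m = 2, n = 3, and 11 = (2^2 + 1) * 2^1 + 1 fits the second with m = 2, n = 1.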

Diff Detail

Repository: rL LLVM

Event Timeline

haicheng updated this revision to Diff 75789. Oct 25 2016, 2:23 PM

haicheng retitled this revision to [AArch64] Lower multiplication by a constant int to shl+add+shl.

haicheng updated this object.

haicheng added reviewers: mcrosier, Gerolf.

haicheng set the repository for this revision to rL LLVM.

haicheng added a subscriber: llvm-commits.

Herald added subscribers: rengolin, aemerson. Oct 25 2016, 2:23 PM

Hi Haicheng,

This looks simple enough to be worth doing, even if the benefit is probably very small. But as it stands, the patch changes only one of two near-identical code paths (positive and negative constants) and not the other, which makes the code harder to follow.

I recommend changing that part of the code entirely so that it sets temporary variables, such as +1/-1 and ISD::ADD/ISD::SUB, based on the result of isNonNegative, and then uses the same piece of code for both paths.
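
A minimal standalone sketch of that restructuring, assuming plain integers instead of SelectionDAG nodes: the sign-dependent branches only fill in a small plan, and a single shared function builds the result. MulPlan, planMul and applyPlan are hypothetical names, not code from the patch.

```cpp
#include <cassert>
#include <cstdint>

enum class Op { Add, Sub };

// Hypothetical plan for x * C where |C| == 2^N + 1 or 2^N - 1.
struct MulPlan {
  unsigned shift; // N
  Op op;          // Add or Sub
  bool swap;      // true: x op (x << N); false: (x << N) op x
  bool negate;    // true: negate the final result
};

static bool isPow2(int64_t v) { return v > 0 && (v & (v - 1)) == 0; }

static unsigned log2Exact(int64_t v) {
  assert(isPow2(v) && "expected a power of two");
  unsigned n = 0;
  while ((v >> n) != 1)
    ++n;
  return n;
}

// The only sign-dependent part: choose the temporaries.
bool planMul(int32_t C, MulPlan &plan) {
  int64_t V = C; // widened so that -V and V +/- 1 cannot overflow
  if (V >= 0) {
    if (isPow2(V - 1)) { plan = {log2Exact(V - 1), Op::Add, false, false}; return true; }
    if (isPow2(V + 1)) { plan = {log2Exact(V + 1), Op::Sub, false, false}; return true; }
  } else {
    if (isPow2(-V + 1)) { plan = {log2Exact(-V + 1), Op::Sub, true, false}; return true; }
    if (isPow2(-V - 1)) { plan = {log2Exact(-V - 1), Op::Add, false, true}; return true; }
  }
  return false;
}

// Shared by the positive and negative paths.
int32_t applyPlan(int32_t x, const MulPlan &p) {
  uint32_t ux = static_cast<uint32_t>(x); // unsigned arithmetic wraps, like mul
  uint32_t shifted = ux << p.shift;
  uint32_t lhs = p.swap ? ux : shifted;
  uint32_t rhs = p.swap ? shifted : ux;
  uint32_t r = (p.op == Op::Add) ? lhs + rhs : lhs - rhs;
  if (p.negate)
    r = 0u - r;
  return static_cast<int32_t>(r);
}
```

The design point is that only planMul looks at the sign of the constant; everything that emits operations lives in one place.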

A more generic approach could be done with some smarter constant-splitting, but this patch is simple as it is and there is already plenty of prior art for that, so let's stick to the pattern.

Also, I was expecting a much larger body of tests, with different constant sizes, multiple edge cases, and cases that cannot be lowered and must remain a mul.

I have some comments inline, but my only additional question is: What is the motivation behind this? Benchmark numbers? Can you share them?

cheers,
--renato

lib/Target/AArch64/AArch64ISelLowering.cpp
7624	why not early return? why is this not a problem for the previous case as well?
7633	`Shift` is not a good name, since this implies the "shift amount" not the "shifted value".
test/CodeGen/AArch64/mul_pow2.ll
5	You say "shift+add+shift" but your tests are of the form "add+shift".

Hi Haicheng,

I just have a few observations/food for thought:

Nit: In your Summary I think you swapped n and m in your code snippets vs your formulas. Your code is correct though.
The (2^N - 1) * 2^M reduction increases code size, so it should not fire under Oz. Otherwise, considerations similar to those for your main case apply.
The (2^N + 1) * 2^M reduction increases schedule height (at least on most processors). It might also increase code size when, e.g., add+mul could be combined into madd. But when code size is *not* a concern and latency(lsl) + 1 < latency(mul), latency(madd), it should always be a win. That target dependence is not checked in your code yet.
I would look at the machine combiner only for cases that need more global scheduling context to decide
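
The profitability condition described above could be sketched as a simple predicate. This is only an illustration: the Latencies struct and shouldLowerToShiftAdd are hypothetical, the cycle counts would have to come from a target scheduling model, and the check reads the condition literally as "lsl plus one cycle must beat both mul and madd".

```cpp
#include <cassert>

// Hypothetical per-target cycle counts.
struct Latencies {
  unsigned lsl;
  unsigned mul;
  unsigned madd;
};

// True if replacing the mul by shift+add is expected to pay off:
// never when optimizing for minimum size, and otherwise only when
// latency(lsl) + 1 is below both latency(mul) and latency(madd).
bool shouldLowerToShiftAdd(const Latencies &lat, bool optForMinSize) {
  if (optForMinSize)
    return false;
  unsigned shiftAdd = lat.lsl + 1;
  return shiftAdd < lat.mul && shiftAdd < lat.madd;
}
```

With an assumed 1-cycle lsl and a 4-cycle mul and madd the predicate holds, whereas a hypothetical core with a 2-cycle multiply would keep the mul.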

Like Renato I'm also curious about your gains. How big? Which benchmarks?

Cheers
Gerolf

haicheng updated this object. Oct 28 2016, 9:52 AM

haicheng edited edge metadata.

Rewrite performMulCombine(), make the conversion a little less conservative to improve the performance and reduce the compilation time, add more tests.

Thank you, Renato. I rewrote my change and added more tests, please let me know if I did what you recommended.

In D25966#579090, @rengolin wrote:

I have some comments inline, but my only additional question is: What is the motivation behind this? Benchmark numbers? Can you share them?

The biggest motivation is that GCC can do this but LLVM cannot. My patch is conservative and does not change performance much. I have not observed any noticeable regression, but the gain is small: spec2006/h264ref and spec2006/povray improve by around 1%. One internal benchmark that is integer-multiplication-centric shows a much larger improvement.

rengolin added inline comments. Oct 29 2016, 9:02 AM

test/CodeGen/AArch64/mul_pow2.ll
279	Please use {{w[0-9]+}} instead of w8.
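
The point of the pattern is that the check should not depend on which scratch register the allocator happens to pick. The sketch below imitates that idea with std::regex; it is not how FileCheck is implemented, and matchesAnyWReg is a hypothetical helper used only for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Mimics a check line such as
//   ; CHECK: add {{w[0-9]+}}, w0, w0, lsl #1
// by accepting any 32-bit destination register.
bool matchesAnyWReg(const std::string &line) {
  static const std::regex pattern("add w[0-9]+, w0, w0, lsl #1");
  return std::regex_search(line, pattern);
}
```

A check hard-coded to w8 would start failing as soon as register allocation chose w9, even though the generated code is equally good.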

In D25966#582890, @haicheng wrote:

The biggest motivation is that GCC can do this but LLVM cannot. My patch is conservative and does not change performance much. I have not observed any noticeable regression, but the gain is small: spec2006/h264ref and spec2006/povray improve by around 1%. One internal benchmark that is integer-multiplication-centric shows a much larger improvement.

Any regressions? Not that I'm expecting any, but... :)

I'll come back a bit later once I've done a proper review.

Thanks!

Thank you, Gerolf

In D25966#581742, @Gerolf wrote:

Hi Haicheng,

I just have a few observations/food for thought:

Nit: In your Summary I think you swapped n and m in your code snippets vs your formulas. Your code is correct though.

Thank you for catching this. I updated the summary.

The (2^N - 1) * 2^M reduction increases code size, so it should not fire under Oz. Otherwise, considerations similar to those for your main case apply.

The (2^N + 1) * 2^M reduction increases schedule height (at least on most processors). It might also increase code size when, e.g., add+mul could be combined into madd. But when code size is *not* a concern and latency(lsl) + 1 < latency(mul), latency(madd), it should always be a win. That target dependence is not checked in your code yet.

I would look at the machine combiner only for cases that need more global scheduling context to decide

I agree with everything you said. I tried to be conservative in this patch so as not to increase code size or affect the generation of madd. If I want to support those cases, I think I need to check the target and compare the costs of the different code sequences.

Like Renato I'm also curious about your gains. How big? Which benchmarks?

Please see my response to Renato above.

Update the tests according to Renato's comments.

Hi,

I have a few variable name proposals, mostly to aid the understanding of the code. I apologise for beating on that key, but the code is getting more dense, less repetitive, so it has to be well understood.

I'd welcome another review from @Gerolf at this point. :)

cheers,
--renato

lib/Target/AArch64/AArch64ISelLowering.cpp
7620	I think one simple formula here would be more than enough: (+/- 2^N +/- 1) * (+/- 2^M +/- 1)
7628	`N0` refers to `x`, so maybe calling it `Var` or something more meaningful would be easier to understand.
7630	`Lg2` implies `log base 2` of `Value`, which is not true. `TrailingZeroes` is a better name for this.
7647	Better call this `SwapValues`, as this is the intention of the flag.
7651	`VM1` implies `V Minus One` and `VP1` implies `V Plus One`, but the `Vs` are different. In the case where `VM1` is a power of two, then `Lg2` is zero and `ShiftedInt == Value`, but not always. I wouldn't mind `ShiftedMinus1` and `ValuePlus1`.
7665	Feel free to hoist those two flags out of the conditional. This will make it clear that they're invariants here.
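
The naming remark about `Lg2` versus `TrailingZeroes` rests on the fact that the two quantities agree only for exact powers of two. A short standalone sketch, using hypothetical helper names rather than the APInt methods from the patch:

```cpp
#include <cassert>
#include <cstdint>

// Number of consecutive zero bits starting at bit 0.
unsigned countTrailingZeros32(uint32_t v) {
  assert(v != 0 && "undefined for zero in this sketch");
  unsigned n = 0;
  while ((v & 1u) == 0) {
    v >>= 1;
    ++n;
  }
  return n;
}

// Position of the highest set bit, i.e. floor(log2(v)).
unsigned floorLog2(uint32_t v) {
  assert(v != 0 && "undefined for zero in this sketch");
  unsigned n = 0;
  while (v >>= 1)
    ++n;
  return n;
}
```

For 12 (binary 1100) the trailing-zero count is 2 while the base-2 logarithm rounds down to 3; for 8 both are 3.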

Rename the variable names as suggested by Renato. Thank you.

I thought I understood this until about the middle of the review. Now I could use some help, perhaps with variable names and comments that reflect more clearly the expression(s) you simplify. I think this is what Renato is looking for, too.

Thanks
Gerolf

lib/Target/AArch64/AArch64ISelLowering.cpp
7617–7660	You could tie it more to the code, e.g. some multiplications Var * C can be ...
7620	I liked the spaces in Renato's comment. Or would it be clearer to say " constants C = A*B where A, B are of type +/- (2^N +/- 1)"?
7625	ValueOfC?
7633	no -> not
7638	ditto
7644	If you declare e.g C = A * B then ShiftedInt could be ConstantA etc
7647	Operation would be more general than AddOrSub
7649	Please add a comment like what values get swapped. ExtraNeg -> NegExp? Or NegSubExp?
7651	ShiftedMinus1 could be ConstantAMinus1
7652	Shouldn't ValuePlus1 be ConstantAPlus1? But then it should be ConstantAPlus1 = ShiftedInt (ConstantA) -1. At this point I can't match your specification and your code. However, if I'm right about this I will need to dig deep into your test cases, too ...
7670	After this point I think you can assert(IntValue == 2^N, some power of 2).
7672	I think Value should be ShiftedMinus1 from here on.
7689–7702	I'll take another look at this code after I (think I) understand the code above.
7695	It is not clear to me why TrailingZeros and ExtraNeg are exclusive.

Address Gerolf's comments.

haicheng added inline comments. Nov 9 2016, 8:03 AM

lib/Target/AArch64/AArch64ISelLowering.cpp
7652	ValueOfC is used here to support (mul x, 2^N - 1) => (sub (shl x, N), x). This patch does not support (mul x, (2^N - 1) * 2^M) => (shl (sub (shl x, N), x), M) yet. If we want to support it in the future, we just need to use ConstantA here as you said.
7672	Similarly, I do not support (mul x, -(2^N - 1) * 2^M) => (shl (sub x, (shl x, N)), M) or (mul x, -(2^N + 1) * 2^M) => -(shl (add (shl x, N), x), M), so I use ValueOfC here.
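
For reference, the two negative forms that the patch leaves unsupported can be checked with a standalone C++ sketch. The function names are hypothetical and the arithmetic is plain 32-bit wrapping arithmetic, not SelectionDAG code.

```cpp
#include <cassert>
#include <cstdint>

// x * (-(2^N - 1) * 2^M)  ==  (x - (x << N)) << M
uint32_t mulNegPow2Minus1Shl(uint32_t x, unsigned N, unsigned M) {
  assert(N < 32 && M < 32 && "shift amounts must be in range");
  return (x - (x << N)) << M;
}

// x * (-(2^N + 1) * 2^M)  ==  -(((x << N) + x) << M)
uint32_t mulNegPow2Plus1Shl(uint32_t x, unsigned N, unsigned M) {
  assert(N < 32 && M < 32 && "shift amounts must be in range");
  return 0u - (((x << N) + x) << M);
}
```

With N = 2 and M = 1 these correspond to the constants -6 and -10, which the tests ntest6 and ntest10 in mul_pow2.ll expect to remain a mul under this patch.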

Hi Gerolf,

Would you please take another look? Does my latest update make the code easier to read?

Thank you,

Haicheng

I've done a bit of refactoring in r286601 and r286606, which I hope makes this a much easier code review. If you rebase the patch, I'd be happy to take a look.

FYI, Chad has made some big refactoring in this area, you will have to re-base:

http://llvm.org/viewvc/llvm-project?rev=286606&view=rev

Rebase the code.

With the minor comment, this looks good to me. But I'll let @Gerolf and @mcrosier have a final look and approve.

Thanks!

lib/Target/AArch64/AArch64ISelLowering.cpp
7617	IIGIR, the old case is still covered because if `TrailingZeroes == 0`, then `ConstantA == ConstValue`. Your comment should reflect that. Not here, but above, before `ConstantA`'s instantiation. Here, you can just add the new case: // (mul x, (2^N + 1) * 2^M) => (shl (add (shl x, N), x), M)

Address Renato's comment. Thank you.

LGTM with a few nits about naming of variables.

lib/Target/AArch64/AArch64ISelLowering.cpp
7616–7617	CAMinus1 -> SCVMinus1
7617	CAMinus1 -> SCVMinus1
7622	'Var' isn't descriptive. I'd prefer 'N0' as this is a common idiom in ISel lowering.
7638	I'd prefer 'ShiftedConstValue' over 'ConstantA'.
7688–7702	Var -> N0
7688–7702	Var -> N0
7688–7702	Please add an assert showing that TrailingZeroes and NegateResult can't both be true. Then the last bit of logic here can be written as:
	// Negate the result.
	if (NegateResult)
	  return DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, DL, VT), Res);
	// Shift the result.
	if (TrailingZeroes)
	  return DAG.getNode(ISD::SHL, DL, VT, Res, DAG.getConstant(TrailingZeroes, DL, MVT::i64));
	return Res;
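
The suggested tail logic can be modelled on plain integers as follows. This is a sketch under the assumption, taken from the review comment, that the shift and the negation never apply together; finishResult is a hypothetical name and the real code returns SDValue nodes.

```cpp
#include <cassert>
#include <cstdint>

// Final step after the shifted add/sub has been built.
uint32_t finishResult(uint32_t res, unsigned trailingZeroes, bool negateResult) {
  // Invariant requested in the review: at most one of the two applies.
  assert(!(trailingZeroes != 0 && negateResult) &&
         "shift and negate are not expected together");
  assert(trailingZeroes < 32 && "shift amount must be in range");
  // Negate the result.
  if (negateResult)
    return 0u - res;
  // Shift the result.
  if (trailingZeroes)
    return res << trailingZeroes;
  return res;
}
```

Stating the invariant as an assert lets the two early returns stay independent, instead of encoding the exclusivity in a combined condition.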
7689	Var -> N0

This revision is now accepted and ready to land. Nov 14 2016, 6:52 AM

Thanks for following up!
LGTM

lib/Target/AArch64/AArch64ISelLowering.cpp
7639	CAMinus1 is consistent with the comment. Perhaps ConstantAMinus1 would be even more clear.

This was committed in r287019.

Revision Contents

Path	Size
lib/Target/AArch64/AArch64ISelLowering.cpp	124 lines
test/CodeGen/AArch64/mul_pow2.ll	243 lines

Diff 76762

lib/Target/AArch64/AArch64ISelLowering.cpp

static SDValue performMulCombine(SDNode *N, SelectionDAG &DAG,		static SDValue performMulCombine(SDNode *N, SelectionDAG &DAG,
const AArch64Subtarget *Subtarget) {		const AArch64Subtarget *Subtarget) {
if (DCI.isBeforeLegalizeOps())		if (DCI.isBeforeLegalizeOps())
return SDValue();		return SDValue();

// Multiplication of a power of two plus/minus one can be done more		// Multiplication of a power of two plus/minus one can be done more
// cheaply as shift+add/sub. For now, this is true unilaterally. If		// cheaply as shift+add/sub. For now, this is true unilaterally. If
// future CPUs have a cheaper MADD instruction, this may need to be		// future CPUs have a cheaper MADD instruction, this may need to be
// gated on a subtarget feature. For Cyclone, 32-bit MADD is 4 cycles and		// gated on a subtarget feature. For Cyclone, 32-bit MADD is 4 cycles and
// 64-bit is 5 cycles, so this is always a win.		// 64-bit is 5 cycles, so this is always a win.
if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(N->getOperand(1))) {		// More aggressively, some multiplications Var * C can be lowered to
const APInt &Value = C->getAPIntValue();		// shift+add+shift if the constant C = A * B where A = 2^N + 1 and B = 2^M,
		// e.g. 6=3*2=(2+1)*2.
		// TODO: consider lowering more cases, e.g. C = 14, -6, -14 or even 45
		// which equals (1+2)*16-(1+2).
		auto C = dyn_cast<ConstantSDNode>(N->getOperand(1));
		if (!C)
		return SDValue();

		const APInt &ValueOfC = C->getAPIntValue();
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
SDLoc DL(N);		SDLoc DL(N);
if (Value.isNonNegative()) {		SDValue Var = N->getOperand(0);
		// TrailingZeroes is used to test if the mul can be lowered to
		// shift+add+shift.
		unsigned TrailingZeroes = ValueOfC.countTrailingZeros();
		if (TrailingZeroes) {
		// Conservatively do not lower to shift+add+shift if the mul might be
		// folded into smul or umul.
		if (Var->hasOneUse() && (isSignExtended(Var.getNode(), DAG) ||
		isZeroExtended(Var.getNode(), DAG)))
		return SDValue();
		// Conservatively do not lower to shift+add+shift if the mul might be
		// folded into madd or msub.
		if (N->hasOneUse() && (N->use_begin()->getOpcode() == ISD::ADD ||
		N->use_begin()->getOpcode() == ISD::SUB))
		return SDValue();
		}
		APInt ConstantA = ValueOfC.ashr(TrailingZeroes);

		APInt IntValue;
		unsigned Operation;
		// SwapValues decides (Var - ShiftedValue) or (ShiftedValue - Var). It does
		// not matter if the operation is Add.
		bool SwapValues;
		// ExtraNeg decides if a Neg is needed at last if C is negative.
		bool ExtraNeg;
		if (ValueOfC.isNonNegative()) {
		// add+shl+add is supported. Use ConstantA instead of ValueOfC.
		APInt ConstantAMinus1 = ConstantA - 1;
		APInt ValueOfCPlus1 = ValueOfC + 1;
		SwapValues = false;
		ExtraNeg = false;
		if (ConstantAMinus1.isPowerOf2()) {
// (mul x, 2^N + 1) => (add (shl x, N), x)		// (mul x, 2^N + 1) => (add (shl x, N), x)
APInt VM1 = Value - 1;		// Or (mul x, (2^N + 1) * 2^M) => (shl (add (shl x, N), x), M)
if (VM1.isPowerOf2()) {		IntValue = ConstantAMinus1;
SDValue ShiftedVal =		Operation = ISD::ADD;
DAG.getNode(ISD::SHL, DL, VT, N->getOperand(0),		} else if (ValueOfCPlus1.isPowerOf2()) {
DAG.getConstant(VM1.logBase2(), DL, MVT::i64));
return DAG.getNode(ISD::ADD, DL, VT, ShiftedVal,
N->getOperand(0));
}
// (mul x, 2^N - 1) => (sub (shl x, N), x)		// (mul x, 2^N - 1) => (sub (shl x, N), x)
APInt VP1 = Value + 1;		IntValue = ValueOfCPlus1;
if (VP1.isPowerOf2()) {		Operation = ISD::SUB;
SDValue ShiftedVal =		} else
DAG.getNode(ISD::SHL, DL, VT, N->getOperand(0),		return SDValue();
DAG.getConstant(VP1.logBase2(), DL, MVT::i64));
return DAG.getNode(ISD::SUB, DL, VT, ShiftedVal,
N->getOperand(0));
}
} else {		} else {
		APInt NegativeValueOfCPlus1 = -ValueOfC + 1;
		APInt NegativeValueOfCMinus1 = -ValueOfC - 1;
		if (NegativeValueOfCPlus1.isPowerOf2()) {
// (mul x, -(2^N - 1)) => (sub x, (shl x, N))		// (mul x, -(2^N - 1)) => (sub x, (shl x, N))
APInt VNP1 = -Value + 1;		IntValue = NegativeValueOfCPlus1;
if (VNP1.isPowerOf2()) {		Operation = ISD::SUB;
SDValue ShiftedVal =		SwapValues = true;
DAG.getNode(ISD::SHL, DL, VT, N->getOperand(0),		ExtraNeg = false;
DAG.getConstant(VNP1.logBase2(), DL, MVT::i64));		} else if (NegativeValueOfCMinus1.isPowerOf2()) {
return DAG.getNode(ISD::SUB, DL, VT, N->getOperand(0),
ShiftedVal);
}
// (mul x, -(2^N + 1)) => - (add (shl x, N), x)		// (mul x, -(2^N + 1)) => -(add (shl x, N), x)
APInt VNM1 = -Value - 1;		IntValue = NegativeValueOfCMinus1;
if (VNM1.isPowerOf2()) {		Operation = ISD::ADD;
SDValue ShiftedVal =		SwapValues = false;
DAG.getNode(ISD::SHL, DL, VT, N->getOperand(0),		ExtraNeg = true;
DAG.getConstant(VNM1.logBase2(), DL, MVT::i64));		} else
SDValue Add =
DAG.getNode(ISD::ADD, DL, VT, ShiftedVal, N->getOperand(0));
return DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, DL, VT), Add);
}
}
}
return SDValue();		return SDValue();
}		}
		assert(IntValue.isPowerOf2() && "IntValue must be power of 2");
		SDValue ShiftedVal =
		DAG.getNode(ISD::SHL, DL, VT, Var,
		DAG.getConstant(IntValue.logBase2(), DL, MVT::i64));
		SDValue AddOrSubVal =
		DAG.getNode(Operation, DL, VT, SwapValues ? Var : ShiftedVal,
		SwapValues ? ShiftedVal : Var);

		if (TrailingZeroes == 0 && !ExtraNeg)
		return AddOrSubVal;
		if (TrailingZeroes)
		return DAG.getNode(ISD::SHL, DL, VT, AddOrSubVal,
		DAG.getConstant(TrailingZeroes, DL, MVT::i64));
		return DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, DL, VT), AddOrSubVal);
		}

static SDValue performVectorCompareAndMaskUnaryOpCombine(SDNode *N,		static SDValue performVectorCompareAndMaskUnaryOpCombine(SDNode *N,
SelectionDAG &DAG) {		SelectionDAG &DAG) {
// Take advantage of vector comparisons producing 0 or -1 in each lane to		// Take advantage of vector comparisons producing 0 or -1 in each lane to
// optimize away operation when it's from a constant.		// optimize away operation when it's from a constant.
//		//
// The general transformation is:		// The general transformation is:
// UNARYOP(AND(VECTOR_CMP(x,y), constant)) -->		// UNARYOP(AND(VECTOR_CMP(x,y), constant)) -->

test/CodeGen/AArch64/mul_pow2.ll

	; RUN: llc < %s -mtriple=aarch64-eabi | FileCheck %s			; RUN: llc < %s -mtriple=aarch64-eabi | FileCheck %s

	; Convert mul x, pow2 to shift.			; Convert mul x, pow2 to shift.
	; Convert mul x, pow2 +/- 1 to shift + add/sub.			; Convert mul x, pow2 +/- 1 to shift + add/sub.
				; Convert mul x, (pow2 + 1) * pow2 to shift + add + shift.
				; Lowering other positive constants is not supported yet.

	define i32 @test2(i32 %x) {			define i32 @test2(i32 %x) {
	; CHECK-LABEL: test2			; CHECK-LABEL: test2
	; CHECK: lsl w0, w0, #1			; CHECK: lsl w0, w0, #1

	%mul = shl nsw i32 %x, 1			%mul = shl nsw i32 %x, 1
	ret i32 %mul			ret i32 %mul
	}			}
	[18 lines not shown]
	; CHECK-LABEL: test5			; CHECK-LABEL: test5
	; CHECK: add w0, w0, w0, lsl #2			; CHECK: add w0, w0, w0, lsl #2


	%mul = mul nsw i32 %x, 5			%mul = mul nsw i32 %x, 5
	ret i32 %mul			ret i32 %mul
	}			}

				define i32 @test6_32b(i32 %x) {
				; CHECK-LABEL: test6_32b
				; CHECK: add {{w[0-9]+}}, w0, w0, lsl #1
				; CHECK: lsl w0, {{w[0-9]+}}, #1

				%mul = mul nsw i32 %x, 6
				ret i32 %mul
				}

				define i64 @test6_64b(i64 %x) {
				; CHECK-LABEL: test6_64b
				; CHECK: add {{x[0-9]+}}, x0, x0, lsl #1
				; CHECK: lsl x0, {{x[0-9]+}}, #1

				%mul = mul nsw i64 %x, 6
				ret i64 %mul
				}

				; A mul that appears together with add, sub, or s(z)ext is not yet
				; converted to a combination of lsl and add/sub.
				define i64 @test6_umull(i32 %x) {
				; CHECK-LABEL: test6_umull
				; CHECK: umull x0, w0, {{w[0-9]+}}

				%ext = zext i32 %x to i64
				%mul = mul nsw i64 %ext, 6
				ret i64 %mul
				}

				define i64 @test6_smull(i32 %x) {
				; CHECK-LABEL: test6_smull
				; CHECK: smull x0, w0, {{w[0-9]+}}

				%ext = sext i32 %x to i64
				%mul = mul nsw i64 %ext, 6
				ret i64 %mul
				}

				define i32 @test6_madd(i32 %x, i32 %y) {
				; CHECK-LABEL: test6_madd
				; CHECK: madd w0, w0, {{w[0-9]+}}, w1

				%mul = mul nsw i32 %x, 6
				%add = add i32 %mul, %y
				ret i32 %add
				}

				define i32 @test6_msub(i32 %x, i32 %y) {
				; CHECK-LABEL: test6_msub
				; CHECK: msub w0, w0, {{w[0-9]+}}, w1

				%mul = mul nsw i32 %x, 6
				%sub = sub i32 %y, %mul
				ret i32 %sub
				}

				define i64 @test6_umaddl(i32 %x, i64 %y) {
				; CHECK-LABEL: test6_umaddl
				; CHECK: umaddl x0, w0, {{w[0-9]+}}, x1

				%ext = zext i32 %x to i64
				%mul = mul nsw i64 %ext, 6
				%add = add i64 %mul, %y
				ret i64 %add
				}

				define i64 @test6_smaddl(i32 %x, i64 %y) {
				; CHECK-LABEL: test6_smaddl
				; CHECK: smaddl x0, w0, {{w[0-9]+}}, x1

				%ext = sext i32 %x to i64
				%mul = mul nsw i64 %ext, 6
				%add = add i64 %mul, %y
				ret i64 %add
				}

				define i64 @test6_umsubl(i32 %x, i64 %y) {
				; CHECK-LABEL: test6_umsubl
				; CHECK: umsubl x0, w0, {{w[0-9]+}}, x1

				%ext = zext i32 %x to i64
				%mul = mul nsw i64 %ext, 6
				%sub = sub i64 %y, %mul
				ret i64 %sub
				}

				define i64 @test6_smsubl(i32 %x, i64 %y) {
				; CHECK-LABEL: test6_smsubl
				; CHECK: smsubl x0, w0, {{w[0-9]+}}, x1

				%ext = sext i32 %x to i64
				%mul = mul nsw i64 %ext, 6
				%sub = sub i64 %y, %mul
				ret i64 %sub
				}

				define i64 @test6_umnegl(i32 %x) {
				; CHECK-LABEL: test6_umnegl
				; CHECK: umnegl x0, w0, {{w[0-9]+}}

				%ext = zext i32 %x to i64
				%mul = mul nsw i64 %ext, 6
				%sub = sub i64 0, %mul
				ret i64 %sub
				}

				define i64 @test6_smnegl(i32 %x) {
				; CHECK-LABEL: test6_smnegl
				; CHECK: smnegl x0, w0, {{w[0-9]+}}

				%ext = sext i32 %x to i64
				%mul = mul nsw i64 %ext, 6
				%sub = sub i64 0, %mul
				ret i64 %sub
				}

	define i32 @test7(i32 %x) {			define i32 @test7(i32 %x) {
	; CHECK-LABEL: test7			; CHECK-LABEL: test7
	; CHECK: lsl {{w[0-9]+}}, w0, #3			; CHECK: lsl {{w[0-9]+}}, w0, #3
	; CHECK: sub w0, {{w[0-9]+}}, w0			; CHECK: sub w0, {{w[0-9]+}}, w0

	%mul = mul nsw i32 %x, 7			%mul = mul nsw i32 %x, 7
	ret i32 %mul			ret i32 %mul
	}			}

	define i32 @test8(i32 %x) {			define i32 @test8(i32 %x) {
	; CHECK-LABEL: test8			; CHECK-LABEL: test8
	; CHECK: lsl w0, w0, #3			; CHECK: lsl w0, w0, #3

	%mul = shl nsw i32 %x, 3			%mul = shl nsw i32 %x, 3
	ret i32 %mul			ret i32 %mul
	}			}

	define i32 @test9(i32 %x) {			define i32 @test9(i32 %x) {
	; CHECK-LABEL: test9			; CHECK-LABEL: test9
	; CHECK: add w0, w0, w0, lsl #3			; CHECK: add w0, w0, w0, lsl #3

	%mul = mul nsw i32 %x, 9			%mul = mul nsw i32 %x, 9
	ret i32 %mul			ret i32 %mul
	}			}

				define i32 @test10(i32 %x) {
				; CHECK-LABEL: test10
				; CHECK: add {{w[0-9]+}}, w0, w0, lsl #2
				; CHECK: lsl w0, {{w[0-9]+}}, #1

				%mul = mul nsw i32 %x, 10
				ret i32 %mul
				}

				define i32 @test11(i32 %x) {
				; CHECK-LABEL: test11
				; CHECK: mul w0, w0, {{w[0-9]+}}

				%mul = mul nsw i32 %x, 11
				ret i32 %mul
				}

				define i32 @test12(i32 %x) {
				; CHECK-LABEL: test12
				; CHECK: add {{w[0-9]+}}, w0, w0, lsl #1
				; CHECK: lsl w0, {{w[0-9]+}}, #2

				%mul = mul nsw i32 %x, 12
				ret i32 %mul
				}

				define i32 @test13(i32 %x) {
				; CHECK-LABEL: test13
				; CHECK: mul w0, w0, {{w[0-9]+}}

				%mul = mul nsw i32 %x, 13
				ret i32 %mul
				}

				define i32 @test14(i32 %x) {
				; CHECK-LABEL: test14
				; CHECK: mul w0, w0, {{w[0-9]+}}

				%mul = mul nsw i32 %x, 14
				ret i32 %mul
				}

				define i32 @test15(i32 %x) {
				; CHECK-LABEL: test15
				; CHECK: lsl {{w[0-9]+}}, w0, #4
				; CHECK: sub w0, {{w[0-9]+}}, w0

				%mul = mul nsw i32 %x, 15
				ret i32 %mul
				}

				define i32 @test16(i32 %x) {
				; CHECK-LABEL: test16
				; CHECK: lsl w0, w0, #4

				%mul = mul nsw i32 %x, 16
				ret i32 %mul
				}

	; Convert mul x, -pow2 to shift.			; Convert mul x, -pow2 to shift.
	; Convert mul x, -(pow2 +/- 1) to shift + add/sub.			; Convert mul x, -(pow2 +/- 1) to shift + add/sub.
				; Lowering other negative constants is not supported yet.

	define i32 @ntest2(i32 %x) {			define i32 @ntest2(i32 %x) {
	; CHECK-LABEL: ntest2			; CHECK-LABEL: ntest2
	; CHECK: neg w0, w0, lsl #1			; CHECK: neg w0, w0, lsl #1

	%mul = mul nsw i32 %x, -2			%mul = mul nsw i32 %x, -2
	ret i32 %mul			ret i32 %mul
	}			}
	[17 lines not shown]
	define i32 @ntest5(i32 %x) {			define i32 @ntest5(i32 %x) {
	; CHECK-LABEL: ntest5			; CHECK-LABEL: ntest5
	; CHECK: add {{w[0-9]+}}, w0, w0, lsl #2			; CHECK: add {{w[0-9]+}}, w0, w0, lsl #2
	; CHECK: neg w0, {{w[0-9]+}}			; CHECK: neg w0, {{w[0-9]+}}
	%mul = mul nsw i32 %x, -5			%mul = mul nsw i32 %x, -5
	ret i32 %mul			ret i32 %mul
	}			}

				define i32 @ntest6(i32 %x) {
				; CHECK-LABEL: ntest6
				; CHECK: mul w0, w0, {{w[0-9]+}}

				%mul = mul nsw i32 %x, -6
				ret i32 %mul
				}

	define i32 @ntest7(i32 %x) {			define i32 @ntest7(i32 %x) {
	; CHECK-LABEL: ntest7			; CHECK-LABEL: ntest7
	; CHECK: sub w0, w0, w0, lsl #3			; CHECK: sub w0, w0, w0, lsl #3

	%mul = mul nsw i32 %x, -7			%mul = mul nsw i32 %x, -7
	ret i32 %mul			ret i32 %mul
	}			}

	define i32 @ntest8(i32 %x) {			define i32 @ntest8(i32 %x) {
	; CHECK-LABEL: ntest8			; CHECK-LABEL: ntest8
	; CHECK: neg w0, w0, lsl #3			; CHECK: neg w0, w0, lsl #3

	%mul = mul nsw i32 %x, -8			%mul = mul nsw i32 %x, -8
	ret i32 %mul			ret i32 %mul
	}			}

	define i32 @ntest9(i32 %x) {			define i32 @ntest9(i32 %x) {
	; CHECK-LABEL: ntest9			; CHECK-LABEL: ntest9
	; CHECK: add {{w[0-9]+}}, w0, w0, lsl #3			; CHECK: add {{w[0-9]+}}, w0, w0, lsl #3
	; CHECK: neg w0, {{w[0-9]+}}			; CHECK: neg w0, {{w[0-9]+}}

	%mul = mul nsw i32 %x, -9			%mul = mul nsw i32 %x, -9
	ret i32 %mul			ret i32 %mul
	}			}

				define i32 @ntest10(i32 %x) {
				; CHECK-LABEL: ntest10
				; CHECK: mul w0, w0, {{w[0-9]+}}

				%mul = mul nsw i32 %x, -10
				ret i32 %mul
				}

				define i32 @ntest11(i32 %x) {
				; CHECK-LABEL: ntest11
				; CHECK: mul w0, w0, {{w[0-9]+}}

				%mul = mul nsw i32 %x, -11
				ret i32 %mul
				}

				define i32 @ntest12(i32 %x) {
				; CHECK-LABEL: ntest12
				; CHECK: mul w0, w0, {{w[0-9]+}}

				%mul = mul nsw i32 %x, -12
				ret i32 %mul
				}

				define i32 @ntest13(i32 %x) {
				; CHECK-LABEL: ntest13
				; CHECK: mul w0, w0, {{w[0-9]+}}
				%mul = mul nsw i32 %x, -13
				ret i32 %mul
				}

				define i32 @ntest14(i32 %x) {
				; CHECK-LABEL: ntest14
				; CHECK: mul w0, w0, {{w[0-9]+}}

				%mul = mul nsw i32 %x, -14
				ret i32 %mul
				}

				define i32 @ntest15(i32 %x) {
				; CHECK-LABEL: ntest15
				; CHECK: sub w0, w0, w0, lsl #4

				%mul = mul nsw i32 %x, -15
				ret i32 %mul
				}

				define i32 @ntest16(i32 %x) {
				; CHECK-LABEL: ntest16
				; CHECK: neg w0, w0, lsl #4

				%mul = mul nsw i32 %x, -16
				ret i32 %mul
				}