This is an archive of the discontinued LLVM Phabricator instance.


Differential D25966

[AArch64] Lower multiplication by a constant int to shl+add+shl
Closed, Public

Authored by haicheng on Oct 25 2016, 2:23 PM.


Details

Reviewers

Gerolf
mcrosier

Summary

GCC can lower a = b * C where C = (2^n + 1) * 2^m to

add     w0, w0, w0, lsl n
lsl     w0, w0, m

and can also lower C = (2^n - 1) * 2^m to

lsl     w1, w0, n
sub     w0, w1, w0
lsl     w0, w0, m

LLVM cannot do either of the above transformations and generates code like this:

mov     w8, C    
mul     w0, w0, w8

This change handles only the first case, since the second case requires an extra instruction. The change is also deliberately conservative: it tries not to touch any mul that can be folded into s(u)mull, madd(sub), or s(u)madd(sub)l, since their costs seem unknown during ISelLowering.

I am also open to suggestions about a better place (machine-combiner???) to implement the transformation. If I had the cycle counts of 32-bit and 64-bit mul, I could consider more constants such as C = (2^m + 1) * (2^n + 1), C = (2^m + 1) * 2^n + 1, or C = ((2^m + 1) * 2^n + 1) * 2^p.
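
To make the first case concrete, here is a minimal standalone sketch in plain C++ (not the LLVM code from this patch) of how a constant of the form C = (2^n + 1) * 2^m could be recognized and then evaluated with the add+lsl sequence. The names ShlAddShl, decompose, and mulByShlAddShl are hypothetical, and 32-bit wraparound arithmetic is assumed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: describes C = (2^n + 1) * 2^m.
struct ShlAddShl {
  unsigned n; // inner shift, as in "add w0, w0, w0, lsl n"
  unsigned m; // outer shift, as in "lsl w0, w0, m"
};

// Returns true and fills Out when C has the form (2^n + 1) * 2^m with
// n >= 1; returns false otherwise.
inline bool decompose(uint32_t C, ShlAddShl &Out) {
  if (C == 0)
    return false;
  unsigned m = 0;
  while ((C & 1u) == 0) { // strip the 2^m factor
    C >>= 1;
    ++m;
  }
  const uint32_t P = C - 1; // must be a power of two, 2^n
  if (P < 2 || (P & (P - 1)) != 0)
    return false;
  unsigned n = 0;
  while ((P >> n) != 1u)
    ++n;
  Out.n = n;
  Out.m = m;
  return true;
}

// Evaluates x * C modulo 2^32 the way the two-instruction sequence would.
inline uint32_t mulByShlAddShl(uint32_t X, const ShlAddShl &D) {
  assert(D.n < 32 && D.m < 32 && "shift amounts must fit the register width");
  return ((X << D.n) + X) << D.m;
}
```

With this sketch, C = 6 decomposes to n = 1, m = 1 and C = 12 to n = 1, m = 2, while C = 14 = (2^3 - 1) * 2 is rejected because it belongs to the second case.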

Diff Detail

Repository: rL LLVM

Event Timeline

haicheng updated this revision to Diff 75789. Oct 25 2016, 2:23 PM

haicheng retitled this revision to [AArch64] Lower multiplication by a constant int to shl+add+shl.

haicheng updated this object.

haicheng added reviewers: mcrosier, Gerolf.

haicheng set the repository for this revision to rL LLVM.

haicheng added a subscriber: llvm-commits.

Herald added subscribers: rengolin, aemerson. Oct 25 2016, 2:23 PM

Hi Haicheng,

This looks simple enough to be worthwhile, even if the benefit is probably very small. But as it is, the patch complicates one of two near-identical twins (the positive and negative paths) and not the other, which makes the code harder to follow.

I recommend changing that part of the code entirely into setting temporary variables, like +1/-1 or ISD::ADD/ISD::SUB, based on the result of isNonNegative, and using the same piece of code for both paths.

A more generic approach could be done with some smarter constant-splitting, but this patch is simple as it is and there is already plenty of prior art for that, so let's stick to the pattern.

Also, I was expecting a much larger body of tests, with different constant sizes, multiple edge cases, and those that cannot be done, remaining a mul.

I have some comments inline, but my only additional question is: What is the motivation behind this? Benchmark numbers? Can you share them?

cheers,
--renato

lib/Target/AArch64/AArch64ISelLowering.cpp
7634	why not early return? why is this not a problem for the previous case as well?
7643	`Shift` is not a good name, since this implies the "shift amount" not the "shifted value".
test/CodeGen/AArch64/mul_pow2.ll
5	You say "shift+add+shift" but your tests are of the form "add+shift".

Hi Haicheng,

I just have a few observations/food for thought:

Nit: In your Summary I think you swapped n and m in your code snippets vs your formulas. Your code is correct though.
The (2^N - 1) * 2^M reduction increases code size, so it should not fire under Oz. Otherwise, similar considerations as for your major case apply.
The (2^N + 1) * 2^M reduction increases schedule height (at least on most processors). It might also increase code size when e.g. add+mul could be combined to madd. But when code size is *not* a concern and latency(lsl) + 1 is less than both latency(mul) and latency(madd), it should always be a win. But that target dependence is not checked in your code yet.
I would look at the machine combiner only for cases that need more global scheduling context to decide.
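
The latency condition in the list above can be written down as a small predicate. This is only an illustrative sketch: MulCosts, its fields, and shlAddShlLooksProfitable are hypothetical names, and the numbers a caller passes in are assumptions rather than values read from any LLVM scheduling model.

```cpp
#include <cassert>

// Hypothetical per-target latencies in cycles (illustrative only).
struct MulCosts {
  unsigned LslLatency;
  unsigned MulLatency;
  unsigned MaddLatency;
};

// Returns true when the add-with-shift + lsl sequence for
// C = (2^N + 1) * 2^M looks like a win: code size is not a concern and
// latency(lsl) + 1 is below both latency(mul) and latency(madd).
inline bool shlAddShlLooksProfitable(const MulCosts &Costs,
                                     bool CodeSizeMatters) {
  if (CodeSizeMatters)
    return false;
  assert(Costs.MulLatency > 0 && Costs.MaddLatency > 0 &&
         "latencies must be filled in by the caller");
  const unsigned SeqLatency = Costs.LslLatency + 1;
  return SeqLatency < Costs.MulLatency && SeqLatency < Costs.MaddLatency;
}
```

For instance, with an assumed 1-cycle lsl and 4-cycle mul and madd, the predicate accepts the transformation unless code size matters; with a 2-cycle multiplier it rejects it.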

Like Renato I'm also curious about your gains. How big? Which benchmarks?

Cheers
Gerolf

haicheng updated this object. Oct 28 2016, 9:52 AM

haicheng edited edge metadata.

Rewrite performMulCombine(), make the conversion a little less conservative to improve the performance and reduce the compilation time, add more tests.

Thank you, Renato. I rewrote my change and added more tests, please let me know if I did what you recommended.

In D25966#579090, @rengolin wrote:

I have some comments inline, but my only additional question is: What is the motivation behind this? Benchmark numbers? Can you share them?

The biggest motivation is that GCC can do this, but LLVM cannot. My patch is conservative and does not make a big change to performance. I have not observed any noticeable regression, but the gain is small. spec2006/h264ref and spec2006/povray improve by around 1%. One internal benchmark that is centered on integer multiplication improves much more.

rengolin added inline comments. Oct 29 2016, 9:02 AM

test/CodeGen/AArch64/mul_pow2.ll
279	Please use {{w[0-9]+}} instead of w8.
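
As a rough illustration of why the pattern is more robust than a hard-coded register: FileCheck treats the text inside {{...}} as a regular expression and the rest of a CHECK line as literal text. The sketch below mimics one such check with std::regex; the function name matchesAddLine is hypothetical, and unlike FileCheck it does not normalize whitespace.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Rough stand-in for "; CHECK: add {{w[0-9]+}}, w0, w0, lsl #1".
// Any 32-bit destination register is accepted, so the check does not
// depend on which scratch register the allocator happens to pick.
inline bool matchesAddLine(const std::string &AsmLine) {
  static const std::regex Pattern("add w[0-9]+, w0, w0, lsl #1");
  return std::regex_search(AsmLine, Pattern);
}
```

A line using w8 and a line using w9 both match, while a line using a 64-bit x register does not.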

In D25966#582890, @haicheng wrote:

The biggest motivation is that GCC can do this, but LLVM cannot. My patch is conservative and does not make a big change to performance. I have not observed any noticeable regression, but the gain is small. spec2006/h264ref and spec2006/povray improve by around 1%. One internal benchmark that is centered on integer multiplication improves much more.

Any regressions? Not that I'm expecting any, but... :)

I'll come back a bit later once I've done a proper review.

Thanks!

Thank you, Gerolf

In D25966#581742, @Gerolf wrote:

Hi Haicheng,

I just have a few observations/food for thought:

Nit: In your Summary I think you swapped n and m in your code snippets vs your formulas. Your code is correct though.

Thank you for catching this. I updated the summary.

The (2^N - 1) * 2^M reduction increases code size, so it should not fire under Oz. Otherwise, similar considerations as for your major case apply.

The (2^N + 1) * 2^M reduction increases schedule height (at least on most processors). It might also increase code size when e.g. add+mul could be combined to madd. But when code size is *not* a concern and latency(lsl) + 1 is less than both latency(mul) and latency(madd), it should always be a win. But that target dependence is not checked in your code yet.

I would look at the machine combiner only for cases that need more global scheduling context to decide.

I agree with everything you said. I tried to be conservative in this patch so as not to increase code size or impact the generation of madd. If I want to support those cases, I think I need to check the target and compare the cost of different code sequences.

Like Renato I'm also curious about your gains. How big? Which benchmarks?

Please see my response to Renato above.

Update the tests according to Renato's comments.

Hi,

I have a few variable name proposals, mostly to aid the understanding of the code. I apologise for beating on that key, but the code is getting more dense, less repetitive, so it has to be well understood.

I'd welcome another review from @Gerolf at this point. :)

cheers,
--renato

lib/Target/AArch64/AArch64ISelLowering.cpp
7630	I think one simple formula here would be more than enough: (+/- 2^N +/- 1) * (+/- 2^M +/- 1)
7630	Feel free to hoist those two flags out of the conditional. This will make it clear that they're invariants here.
7638	`N0` refers to `x`, so maybe calling it `Var` or something more meaningful would be easier to understand.
7640	`Lg2` implies `log base 2` of `Value`, which is not true. `TrailingZeroes` is a better name for this.
7657	Better call this `SwapValues`, as this is the intention of the flag.
7661	`VM1` implies `V Minus One` and `VP1` implies `V Plus One`, but the `Vs` are different. In the case where `VM1` is a power of two, then `Lg2` is zero and `ShiftedInt == Value`, but not always. I wouldn't mind `ShiftedMinus1` and `ValuePlus1`.

Rename the variable names as suggested by Renato. Thank you.

I thought I understood this until about the middle of the review. Now I could use some help, perhaps with variable names and comments that reflect more clearly the expression(s) you simplify. I think this is what Renato is looking for, too.

Thanks
Gerolf

lib/Target/AArch64/AArch64ISelLowering.cpp
7627–7653	You could tie it more to the code, e.g. some multiplications Var * C can be ...
7630	I liked the spaces in Renato's comment. Or would it be clearer to say " constants C = A*B where A, B are of type +/- (2^N +/- 1)"?
7630	After this point I think you can assert(IntValue == 2^N, some power of 2).
7632	I think Value should be ShiftedMinus1 from here on.
7635	ValueOfC?
7643	no -> not
7648	ditto
7654	If you declare e.g C = A * B then ShiftedInt could be ConstantA etc
7657	Operation would be more general than AddOrSub
7659	Please add a comment like what values get swapped. ExtraNeg -> NegExp? Or NegSubExp?
7661	ShiftedMinus1 could be ConstantAMinus1
7662	Shouldn't ValuePlus1 be ConstantAPlus1? But then it should be ConstantAPlus1 = ShiftedInt (ConstantA) -1. At this point I can't match your specification and your code. However, if I'm right about this I will need to dig deep into your test cases, too ...
7689–7705	I'll take another look at this code after I (think I) understand the code above.
7695	It is not clear to me why TrailingZeros and ExtraNeg are exclusive.

Address Gerolf's comments.

haicheng added inline comments. Nov 9 2016, 8:03 AM

lib/Target/AArch64/AArch64ISelLowering.cpp
7632	Similarly, I do not support (mul x, -(2^N - 1) * 2^M) => (shl (sub x, (shl x, N)), M) or (mul x, -(2^N + 1) * 2^M) => -(shl (add (shl x, N), x), M). So I use ValueOfC here.
7662	ValueOfC is used here to support (mul x, 2^N - 1) => (sub (shl x, N), x). This patch does not support (mul x, (2^N - 1) * 2^M) => (shl (sub (shl x, N), x), M) yet. If we want to support it in the future, we just need to use ConstantA here as you said.
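
The set of supported and unsupported patterns described above can be modelled on plain integers. The following is a sketch under stated assumptions, not the LLVM implementation: it ignores the early exits for sign/zero-extended operands and add/sub users, it assumes that powers of two and their negations are handled elsewhere (the tests expect a plain lsl or neg+lsl for them), and the names lowerMulByConst, isPowerOfTwo, and log2Exact are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

static bool isPowerOfTwo(int64_t V) { return V > 0 && (V & (V - 1)) == 0; }

// Precondition: V is a power of two.
static unsigned log2Exact(int64_t V) {
  unsigned L = 0;
  while ((V >> L) != 1)
    ++L;
  return L;
}

// Returns true and stores X * C (mod 2^32) in Out when one of the
// patterns discussed in this review applies; returns false when the
// multiplication would be kept as a mul (or is handled elsewhere).
bool lowerMulByConst(uint32_t X, int32_t C32, uint32_t &Out) {
  const int64_t C = C32; // widened so that -C and C + 1 cannot overflow
  if (C == 0)
    return false;

  unsigned TrailingZeroes = 0;
  for (uint32_t U = static_cast<uint32_t>(C32); (U & 1u) == 0; U >>= 1)
    ++TrailingZeroes;

  unsigned ShiftAmt = 0;
  bool IsAdd = true, ShiftedIsLHS = true, NegateResult = false;
  if (C > 0) {
    const int64_t ConstantA = C >> TrailingZeroes;
    if (isPowerOfTwo(ConstantA - 1)) {
      // (mul x, (2^N + 1) * 2^M) => (shl (add (shl x, N), x), M)
      ShiftAmt = log2Exact(ConstantA - 1);
    } else if (isPowerOfTwo(C + 1)) {
      // (mul x, 2^N - 1) => (sub (shl x, N), x); uses C, not ConstantA
      ShiftAmt = log2Exact(C + 1);
      IsAdd = false;
    } else {
      return false;
    }
  } else {
    // Assumption: negative constants with trailing zeroes are not lowered.
    if (TrailingZeroes != 0)
      return false;
    if (isPowerOfTwo(-C + 1)) {
      // (mul x, -(2^N - 1)) => (sub x, (shl x, N))
      ShiftAmt = log2Exact(-C + 1);
      IsAdd = false;
      ShiftedIsLHS = false;
    } else if (isPowerOfTwo(-C - 1)) {
      // (mul x, -(2^N + 1)) => -(add (shl x, N), x)
      ShiftAmt = log2Exact(-C - 1);
      NegateResult = true;
    } else {
      return false;
    }
  }

  const uint32_t Shifted = X << ShiftAmt;
  const uint32_t LHS = ShiftedIsLHS ? Shifted : X;
  const uint32_t RHS = ShiftedIsLHS ? X : Shifted;
  uint32_t Res = IsAdd ? LHS + RHS : LHS - RHS;
  assert(!(TrailingZeroes && NegateResult) && "mutually exclusive here");
  if (NegateResult)
    Res = 0u - Res;
  if (TrailingZeroes)
    Res <<= TrailingZeroes;
  Out = Res;
  return true;
}
```

Under these assumptions the model agrees with the expectations in mul_pow2.ll: 6, 10, and 12 are lowered, while 11, 13, 14 and -6, -10, -12, -14 stay as mul.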

Hi Gerolf,

Would you please take another look? Does my latest update make the code easier to read?

Thank you,

Haicheng

I've done a bit of refactoring in r286601 and r286606, which I hope makes this a much easier code review. If you rebase the patch, I'd be happy to take a look.

FYI, Chad has made some big refactoring in this area, you will have to re-base:

http://llvm.org/viewvc/llvm-project?rev=286606&view=rev

Rebase the code.

With the minor comment, this looks good to me. But I'll let @Gerolf and @mcrosier have a final look and approve.

Thanks!

lib/Target/AArch64/AArch64ISelLowering.cpp
7663	IIGIR, the old case is still covered because if `TrailingZeroes == 0`, then `ConstantA == ConstValue`. Your comment should reflect that. Not here, but above, before `ConstantA`'s instantiation. Here, you can just add the new case: // (mul x, (2^N + 1) * 2^M) => (shl (add (shl x, N), x), M)

Address Renato's comment. Thank you.

LGTM with a few nits about naming of variables.

lib/Target/AArch64/AArch64ISelLowering.cpp
7633	'Var' isn't descriptive. I'd prefer 'N0' as this is a common idiom in ISel lowering.
7649	I'd prefer 'ShiftedConstValue' over 'ConstantA'.
7663	CAMinus1 -> SCVMinus1
7665	CAMinus1 -> SCVMinus1
7692	Var -> N0
7695	Var -> N0
7696	Var -> N0
7698	Please add an assert showing that TrailingZeroes and NegateResult can't both be true. Then the last bit of logic here can be written as:
	// Negate the result.
	if (NegateResult)
	  return DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, DL, VT), Res);
	// Shift the result.
	if (TrailingZeroes)
	  return DAG.getNode(ISD::SHL, DL, VT, Res, DAG.getConstant(TrailingZeroes, DL, MVT::i64));
	return Res;

This revision is now accepted and ready to land. Nov 14 2016, 6:52 AM

Thanks for following up!
LGTM

lib/Target/AArch64/AArch64ISelLowering.cpp
7650	CAMinus1 is consistent with the comment. Perhaps ConstantAMinus1 would be even more clear.

This was committed in r287019.

Revision Contents

Path	Size
lib/Target/AArch64/AArch64ISelLowering.cpp	46 lines
test/CodeGen/AArch64/mul_pow2.ll	243 lines

Diff 77734

lib/Target/AArch64/AArch64ISelLowering.cpp


static SDValue performMulCombine(SDNode *N, SelectionDAG &DAG,

ConstantSDNode *C = cast<ConstantSDNode>(N->getOperand(1));		ConstantSDNode *C = cast<ConstantSDNode>(N->getOperand(1));
const APInt &ConstValue = C->getAPIntValue();		const APInt &ConstValue = C->getAPIntValue();

// Multiplication of a power of two plus/minus one can be done more		// Multiplication of a power of two plus/minus one can be done more
// cheaply as as shift+add/sub. For now, this is true unilaterally. If		// cheaply as as shift+add/sub. For now, this is true unilaterally. If
// future CPUs have a cheaper MADD instruction, this may need to be		// future CPUs have a cheaper MADD instruction, this may need to be
// gated on a subtarget feature. For Cyclone, 32-bit MADD is 4 cycles and		// gated on a subtarget feature. For Cyclone, 32-bit MADD is 4 cycles and
// 64-bit is 5 cycles, so this is always a win.		// 64-bit is 5 cycles, so this is always a win.
		// More aggressively, some multiplications Var * C can be lowered to
		// shift+add+shift if the constant C = A * B where A = 2^N + 1 and B = 2^M,
		// e.g. 6=3*2=(2+1)*2.
		// TODO: consider lowering more cases, e.g. C = 14, -6, -14 or even 45
		// which equals to (1+2)*16-(1+2).
		SDValue Var = N->getOperand(0);
		// TrailingZeroes is used to test if the mul can be lowered to
		// shift+add+shift.
		unsigned TrailingZeroes = ConstValue.countTrailingZeros();
		if (TrailingZeroes) {
		// Conservatively do not lower to shift+add+shift if the mul might be
		// folded into smul or umul.
		if (Var->hasOneUse() && (isSignExtended(Var.getNode(), DAG) ||
		isZeroExtended(Var.getNode(), DAG)))
		return SDValue();
		// Conservatively do not lower to shift+add+shift if the mul might be
		// folded into madd or msub.
		if (N->hasOneUse() && (N->use_begin()->getOpcode() == ISD::ADD ||
		N->use_begin()->getOpcode() == ISD::SUB))
		return SDValue();
		}
		// Use ConstantA instead of ConstValue to support both shift+add/sub and
		// shift+add+shift.
		APInt ConstantA = ConstValue.ashr(TrailingZeroes);

unsigned ShiftAmt, AddSubOpc;		unsigned ShiftAmt, AddSubOpc;
// Is the shifted value the LHS operand of the add/sub?		// Is the shifted value the LHS operand of the add/sub?
bool ShiftValUseIsN0 = true;		bool ShiftValUseIsN0 = true;
// Do we need to negate the result?		// Do we need to negate the result?
bool NegateResult = false;		bool NegateResult = false;

if (ConstValue.isNonNegative()) {		if (ConstValue.isNonNegative()) {
// (mul x, 2^N + 1) => (add (shl x, N), x)		// (mul x, 2^N + 1) => (add (shl x, N), x)
// (mul x, 2^N - 1) => (sub (shl x, N), x)		// (mul x, 2^N - 1) => (sub (shl x, N), x)
APInt CVMinus1 = ConstValue - 1;		// (mul x, (2^N + 1) * 2^M) => (shl (add (shl x, N), x), M)
		APInt CAMinus1 = ConstantA - 1;
APInt CVPlus1 = ConstValue + 1;		APInt CVPlus1 = ConstValue + 1;
if (CVMinus1.isPowerOf2()) {		if (CAMinus1.isPowerOf2()) {
ShiftAmt = CVMinus1.logBase2();		ShiftAmt = CAMinus1.logBase2();
AddSubOpc = ISD::ADD;		AddSubOpc = ISD::ADD;
} else if (CVPlus1.isPowerOf2()) {		} else if (CVPlus1.isPowerOf2()) {
ShiftAmt = CVPlus1.logBase2();		ShiftAmt = CVPlus1.logBase2();
AddSubOpc = ISD::SUB;		AddSubOpc = ISD::SUB;
} else		} else
return SDValue();		return SDValue();
} else {		} else {
// (mul x, -(2^N - 1)) => (sub x, (shl x, N))		// (mul x, -(2^N - 1)) => (sub x, (shl x, N))
// (mul x, -(2^N + 1)) => - (add (shl x, N), x)		// (mul x, -(2^N + 1)) => - (add (shl x, N), x)
APInt CVNegPlus1 = -ConstValue + 1;		APInt CVNegPlus1 = -ConstValue + 1;
APInt CVNegMinus1 = -ConstValue - 1;		APInt CVNegMinus1 = -ConstValue - 1;
if (CVNegPlus1.isPowerOf2()) {		if (CVNegPlus1.isPowerOf2()) {
ShiftAmt = CVNegPlus1.logBase2();		ShiftAmt = CVNegPlus1.logBase2();
AddSubOpc = ISD::SUB;		AddSubOpc = ISD::SUB;
ShiftValUseIsN0 = false;		ShiftValUseIsN0 = false;
} else if (CVNegMinus1.isPowerOf2()) {		} else if (CVNegMinus1.isPowerOf2()) {
ShiftAmt = CVNegMinus1.logBase2();		ShiftAmt = CVNegMinus1.logBase2();
AddSubOpc = ISD::ADD;		AddSubOpc = ISD::ADD;
NegateResult = true;		NegateResult = true;
} else		} else
return SDValue();		return SDValue();
}		}

SDLoc DL(N);		SDLoc DL(N);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
SDValue N0 = N->getOperand(0);		SDValue ShiftedVal = DAG.getNode(ISD::SHL, DL, VT, Var,
SDValue ShiftedVal = DAG.getNode(ISD::SHL, DL, VT, N->getOperand(0),
DAG.getConstant(ShiftAmt, DL, MVT::i64));		DAG.getConstant(ShiftAmt, DL, MVT::i64));

SDValue AddSubN0 = ShiftValUseIsN0 ? ShiftedVal : N0;		SDValue AddSubN0 = ShiftValUseIsN0 ? ShiftedVal : Var;
SDValue AddSubN1 = ShiftValUseIsN0 ? N0 : ShiftedVal;		SDValue AddSubN1 = ShiftValUseIsN0 ? Var : ShiftedVal;
SDValue Res = DAG.getNode(AddSubOpc, DL, VT, AddSubN0, AddSubN1);		SDValue Res = DAG.getNode(AddSubOpc, DL, VT, AddSubN0, AddSubN1);
if (!NegateResult)		if (TrailingZeroes == 0 && !NegateResult)
return Res;		return Res;
		// Shift the result.
		if (TrailingZeroes)
		return DAG.getNode(ISD::SHL, DL, VT, Res,
		DAG.getConstant(TrailingZeroes, DL, MVT::i64));
// Negate the result.		// Negate the result.
return DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, DL, VT), Res);		return DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, DL, VT), Res);
}		}

static SDValue performVectorCompareAndMaskUnaryOpCombine(SDNode *N,		static SDValue performVectorCompareAndMaskUnaryOpCombine(SDNode *N,
SelectionDAG &DAG) {		SelectionDAG &DAG) {
// Take advantage of vector comparisons producing 0 or -1 in each lane to		// Take advantage of vector comparisons producing 0 or -1 in each lane to
// optimize away operation when it's from a constant.		// optimize away operation when it's from a constant.
//		//
// The general transformation is:		// The general transformation is:

test/CodeGen/AArch64/mul_pow2.ll

	; RUN: llc < %s -mtriple=aarch64-eabi | FileCheck %s			; RUN: llc < %s -mtriple=aarch64-eabi | FileCheck %s

	; Convert mul x, pow2 to shift.			; Convert mul x, pow2 to shift.
	; Convert mul x, pow2 +/- 1 to shift + add/sub.			; Convert mul x, pow2 +/- 1 to shift + add/sub.
				; Convert mul x, (pow2 + 1) * pow2 to shift + add + shift.
				; Lowering other positive constants are not supported yet.

	define i32 @test2(i32 %x) {			define i32 @test2(i32 %x) {
	; CHECK-LABEL: test2			; CHECK-LABEL: test2
	; CHECK: lsl w0, w0, #1			; CHECK: lsl w0, w0, #1

	%mul = shl nsw i32 %x, 1			%mul = shl nsw i32 %x, 1
	ret i32 %mul			ret i32 %mul
	}			}
	[18 lines not shown]
	; CHECK-LABEL: test5			; CHECK-LABEL: test5
	; CHECK: add w0, w0, w0, lsl #2			; CHECK: add w0, w0, w0, lsl #2


	%mul = mul nsw i32 %x, 5			%mul = mul nsw i32 %x, 5
	ret i32 %mul			ret i32 %mul
	}			}

				define i32 @test6_32b(i32 %x) {
				; CHECK-LABEL: test6
				; CHECK: add {{w[0-9]+}}, w0, w0, lsl #1
				; CHECK: lsl w0, {{w[0-9]+}}, #1

				%mul = mul nsw i32 %x, 6
				ret i32 %mul
				}

				define i64 @test6_64b(i64 %x) {
				; CHECK-LABEL: test6_64b
				; CHECK: add {{x[0-9]+}}, x0, x0, lsl #1
				; CHECK: lsl x0, {{x[0-9]+}}, #1

				%mul = mul nsw i64 %x, 6
				ret i64 %mul
				}

				; mul that appears together with add, sub, s(z)ext is not supported to be
				; converted to the combination of lsl, add/sub yet.
				define i64 @test6_umull(i32 %x) {
				; CHECK-LABEL: test6_umull
				; CHECK: umull x0, w0, {{w[0-9]+}}

				%ext = zext i32 %x to i64
				%mul = mul nsw i64 %ext, 6
				ret i64 %mul
				}

				define i64 @test6_smull(i32 %x) {
				; CHECK-LABEL: test6_smull
				; CHECK: smull x0, w0, {{w[0-9]+}}

				%ext = sext i32 %x to i64
				%mul = mul nsw i64 %ext, 6
				ret i64 %mul
				}

				define i32 @test6_madd(i32 %x, i32 %y) {
				; CHECK-LABEL: test6_madd
				; CHECK: madd w0, w0, {{w[0-9]+}}, w1

				%mul = mul nsw i32 %x, 6
				%add = add i32 %mul, %y
				ret i32 %add
				}

				define i32 @test6_msub(i32 %x, i32 %y) {
				; CHECK-LABEL: test6_msub
				; CHECK: msub w0, w0, {{w[0-9]+}}, w1

				%mul = mul nsw i32 %x, 6
				%sub = sub i32 %y, %mul
				ret i32 %sub
				}

				define i64 @test6_umaddl(i32 %x, i64 %y) {
				; CHECK-LABEL: test6_umaddl
				; CHECK: umaddl x0, w0, {{w[0-9]+}}, x1

				%ext = zext i32 %x to i64
				%mul = mul nsw i64 %ext, 6
				%add = add i64 %mul, %y
				ret i64 %add
				}

				define i64 @test6_smaddl(i32 %x, i64 %y) {
				; CHECK-LABEL: test6_smaddl
				; CHECK: smaddl x0, w0, {{w[0-9]+}}, x1

				%ext = sext i32 %x to i64
				%mul = mul nsw i64 %ext, 6
				%add = add i64 %mul, %y
				ret i64 %add
				}

				define i64 @test6_umsubl(i32 %x, i64 %y) {
				; CHECK-LABEL: test6_umsubl
				; CHECK: umsubl x0, w0, {{w[0-9]+}}, x1

				%ext = zext i32 %x to i64
				%mul = mul nsw i64 %ext, 6
				%sub = sub i64 %y, %mul
				ret i64 %sub
				}

				define i64 @test6_smsubl(i32 %x, i64 %y) {
				; CHECK-LABEL: test6_smsubl
				; CHECK: smsubl x0, w0, {{w[0-9]+}}, x1

				%ext = sext i32 %x to i64
				%mul = mul nsw i64 %ext, 6
				%sub = sub i64 %y, %mul
				ret i64 %sub
				}

				define i64 @test6_umnegl(i32 %x) {
				; CHECK-LABEL: test6_umnegl
				; CHECK: umnegl x0, w0, {{w[0-9]+}}

				%ext = zext i32 %x to i64
				%mul = mul nsw i64 %ext, 6
				%sub = sub i64 0, %mul
				ret i64 %sub
				}

				define i64 @test6_smnegl(i32 %x) {
				; CHECK-LABEL: test6_smnegl
				; CHECK: smnegl x0, w0, {{w[0-9]+}}

				%ext = sext i32 %x to i64
				%mul = mul nsw i64 %ext, 6
				%sub = sub i64 0, %mul
				ret i64 %sub
				}

	define i32 @test7(i32 %x) {			define i32 @test7(i32 %x) {
	; CHECK-LABEL: test7			; CHECK-LABEL: test7
	; CHECK: lsl {{w[0-9]+}}, w0, #3			; CHECK: lsl {{w[0-9]+}}, w0, #3
	; CHECK: sub w0, {{w[0-9]+}}, w0			; CHECK: sub w0, {{w[0-9]+}}, w0

	%mul = mul nsw i32 %x, 7			%mul = mul nsw i32 %x, 7
	ret i32 %mul			ret i32 %mul
	}			}

	define i32 @test8(i32 %x) {			define i32 @test8(i32 %x) {
	; CHECK-LABEL: test8			; CHECK-LABEL: test8
	; CHECK: lsl w0, w0, #3			; CHECK: lsl w0, w0, #3

	%mul = shl nsw i32 %x, 3			%mul = shl nsw i32 %x, 3
	ret i32 %mul			ret i32 %mul
	}			}

	define i32 @test9(i32 %x) {			define i32 @test9(i32 %x) {
	; CHECK-LABEL: test9			; CHECK-LABEL: test9
	; CHECK: add w0, w0, w0, lsl #3			; CHECK: add w0, w0, w0, lsl #3

	%mul = mul nsw i32 %x, 9			%mul = mul nsw i32 %x, 9
	ret i32 %mul			ret i32 %mul
	}			}

				define i32 @test10(i32 %x) {
				; CHECK-LABEL: test10
				; CHECK: add {{w[0-9]+}}, w0, w0, lsl #2
				; CHECK: lsl w0, {{w[0-9]+}}, #1

				%mul = mul nsw i32 %x, 10
				ret i32 %mul
				}

				define i32 @test11(i32 %x) {
				; CHECK-LABEL: test11
				; CHECK: mul w0, w0, {{w[0-9]+}}

				%mul = mul nsw i32 %x, 11
				ret i32 %mul
				}

				define i32 @test12(i32 %x) {
				; CHECK-LABEL: test12
				; CHECK: add {{w[0-9]+}}, w0, w0, lsl #1
				; CHECK: lsl w0, {{w[0-9]+}}, #2

				%mul = mul nsw i32 %x, 12
				ret i32 %mul
				}

				define i32 @test13(i32 %x) {
				; CHECK-LABEL: test13
				; CHECK: mul w0, w0, {{w[0-9]+}}

				%mul = mul nsw i32 %x, 13
				ret i32 %mul
				}

				define i32 @test14(i32 %x) {
				; CHECK-LABEL: test14
				; CHECK: mul w0, w0, {{w[0-9]+}}

				%mul = mul nsw i32 %x, 14
				ret i32 %mul
				}

				define i32 @test15(i32 %x) {
				; CHECK-LABEL: test15
				; CHECK: lsl {{w[0-9]+}}, w0, #4
				; CHECK: sub w0, {{w[0-9]+}}, w0

				%mul = mul nsw i32 %x, 15
				ret i32 %mul
				}

				define i32 @test16(i32 %x) {
				; CHECK-LABEL: test16
				; CHECK: lsl w0, w0, #4

				%mul = mul nsw i32 %x, 16
				ret i32 %mul
				}

	; Convert mul x, -pow2 to shift.			; Convert mul x, -pow2 to shift.
	; Convert mul x, -(pow2 +/- 1) to shift + add/sub.			; Convert mul x, -(pow2 +/- 1) to shift + add/sub.
				; Lowering other negative constants are not supported yet.

	define i32 @ntest2(i32 %x) {			define i32 @ntest2(i32 %x) {
	; CHECK-LABEL: ntest2			; CHECK-LABEL: ntest2
	; CHECK: neg w0, w0, lsl #1			; CHECK: neg w0, w0, lsl #1

	%mul = mul nsw i32 %x, -2			%mul = mul nsw i32 %x, -2
	ret i32 %mul			ret i32 %mul
	}			}
	[17 lines not shown]
	define i32 @ntest5(i32 %x) {			define i32 @ntest5(i32 %x) {
	; CHECK-LABEL: ntest5			; CHECK-LABEL: ntest5
	; CHECK: add {{w[0-9]+}}, w0, w0, lsl #2			; CHECK: add {{w[0-9]+}}, w0, w0, lsl #2
	; CHECK: neg w0, {{w[0-9]+}}			; CHECK: neg w0, {{w[0-9]+}}
	%mul = mul nsw i32 %x, -5			%mul = mul nsw i32 %x, -5
	ret i32 %mul			ret i32 %mul
	}			}

				define i32 @ntest6(i32 %x) {
				; CHECK-LABEL: ntest6
				; CHECK: mul w0, w0, {{w[0-9]+}}

				%mul = mul nsw i32 %x, -6
				ret i32 %mul
				}

	define i32 @ntest7(i32 %x) {			define i32 @ntest7(i32 %x) {
	; CHECK-LABEL: ntest7			; CHECK-LABEL: ntest7
	; CHECK: sub w0, w0, w0, lsl #3			; CHECK: sub w0, w0, w0, lsl #3

	%mul = mul nsw i32 %x, -7			%mul = mul nsw i32 %x, -7
	ret i32 %mul			ret i32 %mul
	}			}

	define i32 @ntest8(i32 %x) {			define i32 @ntest8(i32 %x) {
	; CHECK-LABEL: ntest8			; CHECK-LABEL: ntest8
	; CHECK: neg w0, w0, lsl #3			; CHECK: neg w0, w0, lsl #3

	%mul = mul nsw i32 %x, -8			%mul = mul nsw i32 %x, -8
	ret i32 %mul			ret i32 %mul
	}			}

	define i32 @ntest9(i32 %x) {			define i32 @ntest9(i32 %x) {
	; CHECK-LABEL: ntest9			; CHECK-LABEL: ntest9
	; CHECK: add {{w[0-9]+}}, w0, w0, lsl #3			; CHECK: add {{w[0-9]+}}, w0, w0, lsl #3
	; CHECK: neg w0, {{w[0-9]+}}			; CHECK: neg w0, {{w[0-9]+}}

	%mul = mul nsw i32 %x, -9			%mul = mul nsw i32 %x, -9
	ret i32 %mul			ret i32 %mul
	}			}

				define i32 @ntest10(i32 %x) {
				; CHECK-LABEL: ntest10
				; CHECK: mul w0, w0, {{w[0-9]+}}

				%mul = mul nsw i32 %x, -10
				ret i32 %mul
				}

				define i32 @ntest11(i32 %x) {
				; CHECK-LABEL: ntest11
				; CHECK: mul w0, w0, {{w[0-9]+}}

				%mul = mul nsw i32 %x, -11
				ret i32 %mul
				}

				define i32 @ntest12(i32 %x) {
				; CHECK-LABEL: ntest12
				; CHECK: mul w0, w0, {{w[0-9]+}}

				%mul = mul nsw i32 %x, -12
				ret i32 %mul
				}

				define i32 @ntest13(i32 %x) {
				; CHECK-LABEL: ntest13
				; CHECK: mul w0, w0, {{w[0-9]+}}
				%mul = mul nsw i32 %x, -13
				ret i32 %mul
				}

				define i32 @ntest14(i32 %x) {
				; CHECK-LABEL: ntest14
				; CHECK: mul w0, w0, {{w[0-9]+}}

				%mul = mul nsw i32 %x, -14
				ret i32 %mul
				}

				define i32 @ntest15(i32 %x) {
				; CHECK-LABEL: ntest15
				; CHECK: sub w0, w0, w0, lsl #4

				%mul = mul nsw i32 %x, -15
				ret i32 %mul
				}

				define i32 @ntest16(i32 %x) {
				; CHECK-LABEL: ntest16
				; CHECK: neg w0, w0, lsl #4

				%mul = mul nsw i32 %x, -16
				ret i32 %mul
				}