This is an archive of the discontinued LLVM Phabricator instance.

[MSP430] Implement multiplication by a constant
Needs ReviewPublic

Authored by pftbest on Jul 30 2017, 9:00 AM.

Details

Reviewers
asl
awygle
Summary

The algorithm is borrowed from the Mips backend.

This also makes jumptables more efficient, they no longer require
a libcall.

Event Timeline

pftbest created this revision.Jul 30 2017, 9:00 AM

The algorithm is borrowed from the Mips backend.

This looks like target-independent code, and if multiple backends will be using it, please turn it into a target-independent utility, and use that function from both backends, or make it part of the default lowering.

This looks like target-independent code

You are right, genConstMult function is target-independent.

make it part of the default lowering

I don't think that would work, because we use this function differently.
Mips is calling it from DAGCombine, but MSP430 needs it for lowering.

please turn it into a target-independent utility

Can you please suggest where can I put this function? I've never heard about target-independent utilities.

Can you please suggest where can I put this function? I've never heard about target-independent utilities.

For this particular function, if you just want to share the code between MSP430 and MIPS, probably makes sense to stick it into the SelectionDAG class

The alternative would be to make this a target-independent lowering in LegalizeDAG.


Do you need to worry about codesize here? Lowering something like "a * 0x3333" to an inline sequence like this is going to generate a lot of code.

Do you need to worry about codesize here? Lowering something like "a * 0x3333" to an inline sequence like this is going to generate a lot of code.

The a * 0x3333 generates the following sequence (the register allocation is not very good in this case):

; BB#0:
	mov.b	r12, r13
	rla.b	r13
	add.b	r13, r12
	rla.b	r13
	rla.b	r13
	rla.b	r13
	mov.b	r13, r14
	sub.b	r12, r14
	rla.b	r13
	rla.b	r13
	sub.b	r14, r13
	mov.w	r13, r12
	ret

It is 24 bytes in size and takes only 12 cycles, which is both smaller and faster than a library call.
If we have the hardware multiplier, then a library function is 20 bytes in size (+4 bytes per call), and takes 20 cycles to execute (including interrupts disabling).
But if we don't have a hardware multiplier, the library function is 70 bytes in size and i don't know how much slower.

We can provide an option like Lanai did ("lanai-constant-mul-threshold"), to limit the number of operations, but I think it would only be useful if we have a hardware multiplier.

That looks like the code for "a * 0x33", not "a * 0x3333"... but I see your point.

@efriedma
Good catch, I accidentally used i8 type instead of i16 and didn't notice, sorry.
The correct code is rather large indeed (27 ops), so I'll add the option, and set the default limit to 12.

; BB#0:
	mov.w	r12, r13
	rla.w	r12
	add.w	r12, r13
	rla.w	r12
	rla.w	r12
	rla.w	r12
	mov.w	r12, r14
	sub.w	r13, r14
	rla.w	r12
	rla.w	r12
	mov.w	r12, r13
	sub.w	r14, r13
	rla.w	r12
	rla.w	r12
	mov.w	r12, r14
	sub.w	r13, r14
	rla.w	r12
	rla.w	r12
	mov.w	r12, r13
	sub.w	r14, r13
	rla.w	r12
	rla.w	r12
	mov.w	r12, r14
	sub.w	r13, r14
	rla.w	r12
	rla.w	r12
	sub.w	r14, r12
	ret
pftbest updated this revision to Diff 111437.Aug 16 2017, 5:04 PM

Moved the algorithm into SelectionDAG
Added an option to limit the number of operations

pftbest updated this revision to Diff 111444.Aug 16 2017, 5:52 PM

Fixed an issue with i8 not being promoted to i16 in case we fall back
to libcall.

Added more tests.

ping, please review

I'm not really the best person to review this (since I'm not really that familiar with either MIPS or MSP430), but I can provide some more comments.

lib/Target/MSP430/MSP430ISelLowering.cpp
1337

I'm not sure this algorithm is right? In any case, it's a lot different from what getMulByConstant is actually doing, so needs a lot of comments.

test/CodeGen/MSP430/mul-by-constant.ll
10

A testcase for some negative numbers would make sense. (e.g. x*-3 should lower to something like x - (x << 2)).