Page MenuHomePhabricator

[ARM] Fix issue with SMLAL (Signed Multiply Accumulate Long) lowering
ClosedPublic

Authored by jyoti.allur on Jan 15 2015, 7:43 AM.

Details

Summary

[ARM] This patch addresses following issue.

long long foo(int a, int b, int c, int d) {
  long long acc = (long long)a * (long long)b;
  acc += (long long)c * (long long)d;
  return acc;
}

Should compile to use SMLAL (Signed Multiply Accumulate Long) which multiplies
two signed 32-bit values to produce a 64-bit value, and accumulates this with
a 64-bit value.

We currently get this for v7:

_foo:
        smull	 r0, r1, r1, r0
	smull	 r2, r3, r3, r2
	adds	r0, r2, r0
	adc	r1, r3, r1
	bx	lr

The above is reduced to following with this patch:

_foo:
        smull	 r0, r1, r1, r0
	smlal	 r0, r1, r3, r2
	bx	lr

Diff Detail

Event Timeline

jyoti.allur retitled this revision from to [ARM] Fix SMLAL (Signed Multiply Accumulate Long) lowering.
jyoti.allur updated this object.
jyoti.allur edited the test plan for this revision. (Show Details)
jyoti.allur set the repository for this revision to rL LLVM.
jyoti.allur added a subscriber: Unknown Object (MLST).
jyoti.allur updated this object.Jan 15 2015, 7:44 AM
jyoti.allur retitled this revision from [ARM] Fix SMLAL (Signed Multiply Accumulate Long) lowering to [ARM] Fix issue with SMLAL (Signed Multiply Accumulate Long) lowering.

This doesn't look like the right fix: just because AddcOp0 is *some* SMUL_LOHI doesn't mean AddcOp1 isn't the right one. This patch just perturbs which one we investigate slightly and works in this case but will probably make an equal number worse.

../llvm/test/CodeGen/ARM/longMAC.ll
81

We should probably test the actual dataflow here, rather than just the existence of an smlal instruction.

jyoti.allur edited edge metadata.

Hi Tim,
The rationale behind my thinking was that there would never be a case where we need to update the same LoMul and LowAdd variables
consecutively, but as you said, it could be a wrong assumption. I am finding it hard to imagine a case though.
I have updated by adding appropriate checks to ensure LoMul and LowAdd variables are updated with its linked ISD::SMUL_LOHI node results
Thanks for reviewing.

Hi Jyoti,

The rationale behind my thinking was that there would never be a case where we need to update the same LoMul and LowAdd variables consecutively.

I agree with that, but think the problem was really how we were deciding which one we should be updating. This fix looks more likely, but still improvable:

../llvm/lib/Target/ARM/ARMISelLowering.cpp
8081

I think we actually want this check to be

if (AddcOp0 == MULOp.getValue(0))

The existing weird opcode logic looks like it was a buggy attempt to work around the problem that ADDE and ADDC use different halves of *MUL_LOHI. I suspect you could get wrong codegen if you could arrange that the low half of SMUL_LOHI gets fed into the ADDE and the high half into the ADDC.

If that's right, we also want to check that the AddeOp is using the MULOp->getValue(1) part; and while we're at it, this check just below has become redundant:

if (LoMul->getNode() != HiMul->getNode())
  return SDValue();

I suspect you could get wrong codegen if you could arrange that the low half of SMUL_LOHI gets fed into the ADDE and the high half into the ADDC.

Yep:

#include <stdint.h>

uint64_t foo(uint64_t acc, int lhs, int rhs) {
  uint64_t prod = (uint64_t)lhs * rhs;
  uint64_t prod_swap = ((prod & 0xffffffffULL) << 32) | (prod >> 32);
  return acc + prod_swap;
}

I don't think that should be an smlal.

jyoti.allur updated this revision to Diff 18513.EditedJan 21 2015, 7:23 AM
jyoti.allur removed rL LLVM as the repository for this revision.

Hi Tim,
You were right, the issue existed with the original buggy opcode checking which is now corrected without additional checks.
Thanks.

Hi Jyoti,

Thanks for updating the patch, but I don't think the logic is quite there yet:

../llvm/lib/Target/ARM/ARMISelLowering.cpp
8085

I think you've misunderstood. Addc should always refer to the 0 (low) value of MUL_LOHI. We also need to check that Adde always refers to the 1 (high) value.

E.g.

#include <stdint.h>

uint64_t foo(uint64_t acc, uint32_t lhs, uint32_t rhs) {
  uint64_t prod = (uint64_t)lhs * rhs;
  uint64_t weird_in = (prod & 0xffffffffULL) | ((prod & 0xffffffffULL) << 32);
  return acc + prod_weird;
}

Hi Tim,
I think i had submitted the previous one in a bit of hurry. Sorry about that.
Should we include a new test case in which we have weird operation on prod as well ?

t.p.northover accepted this revision.Jan 22 2015, 7:11 AM
t.p.northover edited edge metadata.

Hi Jyoti,

Thanks for updating the patch. The code looks good to me now.

Should we include a new test case in which we have weird operation on prod as well ?

It would probably be a good idea; it's only one line different from the second one you're already adding I think. Feel free to commit with that test added.

Cheers.

Tim.

This revision is now accepted and ready to land.Jan 22 2015, 7:11 AM
jyoti.allur closed this revision.Jan 23 2015, 1:12 AM

Thanks for the review Tim.
Closed by commit rL226904 (authored @jyoti.allur )