This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Fix issue with SMLAL (Signed Multiply Accumulate Long) lowering
ClosedPublic

Authored by jyoti.allur on Jan 15 2015, 7:43 AM.

Download Raw Diff

Details

Reviewers

compnerd
t.p.northover
jmolloy

Summary

[ARM] This patch addresses following issue.

long long foo(int a, int b, int c, int d) {
  long long acc = (long long)a * (long long)b;
  acc += (long long)c * (long long)d;
  return acc;
}

Should compile to use SMLAL (Signed Multiply Accumulate Long) which multiplies
two signed 32-bit values to produce a 64-bit value, and accumulates this with
a 64-bit value.

We currently get this for v7:

_foo:
        smull	 r0, r1, r1, r0
	smull	 r2, r3, r3, r2
	adds	r0, r2, r0
	adc	r1, r3, r1
	bx	lr

The above is reduced to following with this patch:

_foo:
        smull	 r0, r1, r1, r0
	smlal	 r0, r1, r3, r2
	bx	lr

Diff Detail

Event Timeline

jyoti.allur updated this revision to Diff 18228.Jan 15 2015, 7:43 AM

jyoti.allur retitled this revision from to [ARM] Fix SMLAL (Signed Multiply Accumulate Long) lowering.

jyoti.allur updated this object.

jyoti.allur edited the test plan for this revision. (Show Details)

jyoti.allur added reviewers: t.p.northover, compnerd.

jyoti.allur set the repository for this revision to rL LLVM.

jyoti.allur added a subscriber: Unknown Object (MLST).

Herald added a subscriber: aemerson. · View Herald TranscriptJan 15 2015, 7:43 AM

jyoti.allur updated this object.Jan 15 2015, 7:44 AM

jyoti.allur added a reviewer: jmolloy.Jan 15 2015, 12:25 PM

jyoti.allur retitled this revision from [ARM] Fix SMLAL (Signed Multiply Accumulate Long) lowering to [ARM] Fix issue with SMLAL (Signed Multiply Accumulate Long) lowering.

This doesn't look like the right fix: just because AddcOp0 is *some* SMUL_LOHI doesn't mean AddcOp1 isn't the right one. This patch just perturbs which one we investigate slightly and works in this case but will probably make an equal number worse.

../llvm/test/CodeGen/ARM/longMAC.ll
81	We should probably test the actual dataflow here, rather than just the existence of an smlal instruction.

Hi Tim,
The rationale behind my thinking was that there would never be a case where we need to update the same LoMul and LowAdd variables
consecutively, but as you said, it could be a wrong assumption. I am finding it hard to imagine a case though.
I have updated by adding appropriate checks to ensure LoMul and LowAdd variables are updated with its linked ISD::SMUL_LOHI node results
Thanks for reviewing.

Hi Jyoti,

The rationale behind my thinking was that there would never be a case where we need to update the same LoMul and LowAdd variables consecutively.

I agree with that, but think the problem was really how we were deciding which one we should be updating. This fix looks more likely, but still improvable:

../llvm/lib/Target/ARM/ARMISelLowering.cpp
8081	I think we actually want this check to be if (AddcOp0 == MULOp.getValue(0)) The existing weird opcode logic looks like it was a buggy attempt to work around the problem that ADDE and ADDC use different halves of *MUL_LOHI. I suspect you could get wrong codegen if you could arrange that the low half of SMUL_LOHI gets fed into the ADDE and the high half into the ADDC. If that's right, we also want to check that the AddeOp is using the MULOp->getValue(1) part; and while we're at it, this check just below has become redundant: if (LoMul->getNode() != HiMul->getNode()) return SDValue();

I suspect you could get wrong codegen if you could arrange that the low half of SMUL_LOHI gets fed into the ADDE and the high half into the ADDC.

Yep:

#include <stdint.h>

uint64_t foo(uint64_t acc, int lhs, int rhs) {
  uint64_t prod = (uint64_t)lhs * rhs;
  uint64_t prod_swap = ((prod & 0xffffffffULL) << 32) | (prod >> 32);
  return acc + prod_swap;
}

I don't think that should be an smlal.

Hi Tim,
You were right, the issue existed with the original buggy opcode checking which is now corrected without additional checks.
Thanks.

Hi Jyoti,

Thanks for updating the patch, but I don't think the logic is quite there yet:

../llvm/lib/Target/ARM/ARMISelLowering.cpp

8085

I think you've misunderstood. Addc should always refer to the 0 (low) value of MUL_LOHI. We also need to check that Adde always refers to the 1 (high) value.

E.g.

#include <stdint.h>

uint64_t foo(uint64_t acc, uint32_t lhs, uint32_t rhs) {
  uint64_t prod = (uint64_t)lhs * rhs;
  uint64_t weird_in = (prod & 0xffffffffULL) | ((prod & 0xffffffffULL) << 32);
  return acc + prod_weird;
}

Hi Tim,
I think i had submitted the previous one in a bit of hurry. Sorry about that.
Should we include a new test case in which we have weird operation on prod as well ?

Hi Jyoti,

Thanks for updating the patch. The code looks good to me now.

Should we include a new test case in which we have weird operation on prod as well ?

It would probably be a good idea; it's only one line different from the second one you're already adding I think. Feel free to commit with that test added.

Cheers.

Tim.

This revision is now accepted and ready to land.Jan 22 2015, 7:11 AM

Thanks for the review Tim.
Closed by commit rL226904 (authored @jyoti.allur )

Revision Contents

Path

Size

../

llvm/

lib/

Target/

ARM/

ARMISelLowering.cpp

17 lines

test/

CodeGen/

ARM/

longMAC.ll

28 lines

Diff 18614

../llvm/lib/Target/ARM/ARMISelLowering.cpp

Context not available.
	else	else
	IsLeftOperandMUL = true;	IsLeftOperandMUL = true;
	if (MULOp == SDValue())	if (MULOp == SDValue())
	return SDValue();	return SDValue();

	// Figure out the right opcode.	// Figure out the right opcode.
	unsigned Opc = MULOp->getOpcode();	unsigned Opc = MULOp->getOpcode();
Context not available.
	unsigned FinalOpc = (Opc == ISD::SMUL_LOHI) ? ARMISD::SMLAL : ARMISD::UMLAL;	unsigned FinalOpc = (Opc == ISD::SMUL_LOHI) ? ARMISD::SMLAL : ARMISD::UMLAL;

	// Figure out the high and low input values to the MLAL node.	// Figure out the high and low input values to the MLAL node.
	SDValue* HiMul = &MULOp;
	SDValue* HiAdd = nullptr;	SDValue* HiAdd = nullptr;
	SDValue* LoMul = nullptr;	SDValue* LoMul = nullptr;
	SDValue* LowAdd = nullptr;	SDValue* LowAdd = nullptr;

		// Ensure that ADDE is from high result of ISD::SMUL_LOHI.
		if ((AddeOp0 != MULOp.getValue(1)) && (AddeOp1 != MULOp.getValue(1)))
		return SDValue();

	if (IsLeftOperandMUL)	if (IsLeftOperandMUL)
	HiAdd = &AddeOp1;	HiAdd = &AddeOp1;
	else	else
Context not available.
	HiAdd = &AddeOp0;	HiAdd = &AddeOp0;


		t.p.northoverUnsubmitted Not Done Reply Inline Actions I think we actually want this check to be if (AddcOp0 == MULOp.getValue(0)) The existing weird opcode logic looks like it was a buggy attempt to work around the problem that ADDE and ADDC use different halves of MUL_LOHI. I suspect you could get wrong codegen if you could arrange that the low half of SMUL_LOHI gets fed into the ADDE and the high half into the ADDC. If that's right, we also want to check that the AddeOp is using the MULOp->getValue(1) part; and while we're at it, this check just below has become redundant: if (LoMul->getNode() != HiMul->getNode()) return SDValue(); t.p.northover:* I think we actually want this check to be if (AddcOp0 == MULOp.getValue(0)) The existing…
	if (AddcOp0->getOpcode() == Opc) {	// Ensure that LoMul and LowAdd are taken from correct ISD::SMUL_LOHI node
		// whose low result is fed to the ADDC we are checking.

		if (AddcOp0 == MULOp.getValue(0)) {
		t.p.northoverUnsubmitted Not Done Reply Inline Actions I think you've misunderstood. Addc should always refer to the 0 (low) value of MUL_LOHI. We also need to check that Adde always refers to the 1 (high) value. E.g. #include <stdint.h> uint64_t foo(uint64_t acc, uint32_t lhs, uint32_t rhs) { uint64_t prod = (uint64_t)lhs * rhs; uint64_t weird_in = (prod & 0xffffffffULL) \| ((prod & 0xffffffffULL) << 32); return acc + prod_weird; } t.p.northover: I think you've misunderstood. Addc should always refer to the 0 (low) value of MUL_LOHI. We…
	LoMul = &AddcOp0;	LoMul = &AddcOp0;
	LowAdd = &AddcOp1;	LowAdd = &AddcOp1;
	}	}
	if (AddcOp1->getOpcode() == Opc) {	if (AddcOp1 == MULOp.getValue(0)) {
	LoMul = &AddcOp1;	LoMul = &AddcOp1;
	LowAdd = &AddcOp0;	LowAdd = &AddcOp0;
	}	}
Context not available.
	if (!LoMul)	if (!LoMul)
	return SDValue();	return SDValue();

	if (LoMul->getNode() != HiMul->getNode())
	return SDValue();

	// Create the merged node.	// Create the merged node.
	SelectionDAG &DAG = DCI.DAG;	SelectionDAG &DAG = DCI.DAG;

Context not available.

../llvm/test/CodeGen/ARM/longMAC.ll

Context not available.
	%add = add i64 %mul, %c	%add = add i64 %mul, %c
	ret i64 %add	ret i64 %add
	}	}

		define i64 @MACLongTest6(i32 %a, i32 %b, i32 %c, i32 %d) {
		;CHECK-LABEL: MACLongTest6:
		;CHECK: smull r12, lr, r1, r0
		t.p.northoverUnsubmitted Not Done Reply Inline Actions We should probably test the actual dataflow here, rather than just the existence of an smlal instruction. t.p.northover: We should probably test the actual dataflow here, rather than just the existence of an smlal…
		;CHECK: smlal r12, lr, r3, r2
		%conv = sext i32 %a to i64
		%conv1 = sext i32 %b to i64
		%mul = mul nsw i64 %conv1, %conv
		%conv2 = sext i32 %c to i64
		%conv3 = sext i32 %d to i64
		%mul4 = mul nsw i64 %conv3, %conv2
		%add = add nsw i64 %mul4, %mul
		ret i64 %add
		}

		define i64 @MACLongTest7(i64 %acc, i32 %lhs, i32 %rhs) {
		;CHECK-LABEL: MACLongTest7:
		;CHECK-NOT: smlal
		%conv = sext i32 %lhs to i64
		%conv1 = sext i32 %rhs to i64
		%mul = mul nsw i64 %conv1, %conv
		%shl = shl i64 %mul, 32
		%shr = lshr i64 %mul, 32
		%or = or i64 %shl, %shr
		%add = add i64 %or, %acc
		ret i64 %add
		}

Context not available.

This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Fix issue with SMLAL (Signed Multiply Accumulate Long) loweringClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 18614

../llvm/lib/Target/ARM/ARMISelLowering.cpp

../llvm/test/CodeGen/ARM/longMAC.ll

[ARM] Fix issue with SMLAL (Signed Multiply Accumulate Long) lowering
ClosedPublic