Page MenuHomePhabricator

[AAch64] Optimize muls with operands having enough sign bits.
ClosedPublic

Authored by bipmis on Nov 28 2022, 7:07 AM.

Details

Summary

Muls with 64bit operands where each of the operand is having more than 32 sign bits, we can generate a single smull instruction on a 32bit operand.

Diff Detail

Event Timeline

bipmis requested review of this revision.Nov 28 2022, 7:07 AM
bipmis created this revision.

Could this apply to umull too? We should also (not necessarily in this commit) look into improving GlobalISel too. I think it should have enough info nowadays to perform the same ComputeNumSignBits check.

llvm/lib/Target/AArch64/AArch64InstrInfo.td
1914

Could the same thing be done for SMSUBLrrr too?
And the additional add/sub patterns below too.

1957

Can the Pat be moved into the above block, and the PatFrag be moved maybe closer to the add_and_or_is_add existing PatFrag.

1958

I think it maybe needs to be 33 bits. Sign bits are always off by one. Can you add some tests for the edge cases?

llvm/test/CodeGen/AArch64/aarch64-mull-masks.ll
92

We can remove the nsw from the mul.
Do you have any tests for the commuted form of this too?

bipmis updated this revision to Diff 478560.Nov 29 2022, 6:29 AM

Update the patch with support to madd,msub. Handle review comments.

bipmis marked 4 inline comments as done.Nov 29 2022, 6:35 AM

Could this apply to umull too? We should also (not necessarily in this commit) look into improving GlobalISel too. I think it should have enough info nowadays to perform the same ComputeNumSignBits check.

I think in this patch we are looking for handling the muls with sign-bits. We can possibly handle the umull scenario in a separate patch.

llvm/lib/Target/AArch64/AArch64InstrInfo.td
1958

I think it maybe needs to be 33 bits. Sign bits are always off by one. Can you add some tests for the edge cases?

I think it has to be 32 bits as we are comparing that number of sign bit in mul operand is greater than 32. The test smull_ldrsw_w(), fails with a value of 33 as it is an edge case with a i32 operand.

define i64 @smull_ldrsw_w(i32* %x0, i32 %x1) {
; CHECK-LABEL: smull_ldrsw_w:
; CHECK:       // %bb.0: // %entry
; CHECK-NEXT:    ldrsw x8, [x0]
; CHECK-NEXT:    // kill: def $w1 killed $w1 def $x1
; CHECK-NEXT:    sxtw x9, w1
; CHECK-NEXT:    mul x0, x8, x9
; CHECK-NEXT:    ret
entry:
  %ext64 = load i32, i32* %x0
  %sext = sext i32 %ext64 to i64
  %sext4 = sext i32 %x1 to i64
  %mul = mul i64 %sext, %sext4
  ret i64 %mul
}
dmgreen added inline comments.Nov 29 2022, 9:13 AM
llvm/lib/Target/AArch64/AArch64InstrInfo.td
1958

Hmm. I don't think this case should transform:

define i64 @t31(i32 %a, i64 %b) nounwind {
; CHECK-LABEL: t31:
; CHECK:       // %bb.0: // %entry
; CHECK-NEXT:    asr x8, x1, #31
; CHECK-NEXT:    smull x0, w8, w0
; CHECK-NEXT:    ret
entry:
  %tmp1 = sext i32 %a to i64
  %c = ashr i64 %b, 31
  %tmp3 = mul i64 %tmp1, %c
  ret i64 %tmp3
}

An ashr by 32 would be OK. Same for the constant in @t10. Could it be that the smullwithonesignbits is treated as commutative, and because it is is matching 32 sign bits from the sext? This version isn't being transformed:

define i64 @t312(i64 %a, i64 %b) nounwind {
; CHECK-LABEL: t312:
; CHECK:       // %bb.0: // %entry
; CHECK-NEXT:    asr x8, x0, #31
; CHECK-NEXT:    asr x9, x1, #31
; CHECK-NEXT:    mul x0, x8, x9
; CHECK-NEXT:    ret
entry:
  %tmp1 = ashr i64 %a, 31
  %c = ashr i64 %b, 31
  %tmp3 = mul i64 %tmp1, %c
  ret i64 %tmp3
}

If that is the case then all the patterns could use smullwithsignbits, I think.

bipmis updated this revision to Diff 478635.Nov 29 2022, 10:41 AM
bipmis marked an inline comment as done.

Fixed for scenario where smullwithonesignbits is treated as commutative.
Added corner case tests where signBits<32

bipmis marked an inline comment as done.Nov 29 2022, 10:45 AM
bipmis added inline comments.
llvm/lib/Target/AArch64/AArch64InstrInfo.td
1958

Hmm. I don't think this case should transform:
If that is the case then all the patterns could use smullwithsignbits, I think.

Agreed. I think that was the case really. However, we would need a sign extend check with smullwithsignbits as otherwise we would end up having an additional sign-extend in case where operand to a sign extend is a i32.

dmgreen accepted this revision.Nov 30 2022, 3:01 AM

Thanks. LGTM

This revision is now accepted and ready to land.Nov 30 2022, 3:01 AM
This revision was automatically updated to reflect the committed changes.
bipmis marked an inline comment as done.