This is an archive of the discontinued LLVM Phabricator instance.

[AAch64] Optimize muls with operands having enough sign bits.
Closed · Public

Authored by bipmis on Nov 28 2022, 7:07 AM.


Details

Reviewers

dmgreen
samtebbs

Commits

rG081b7f6b0313: [AAch64] Optimize muls with operands having enough sign bits.

Summary

For 64-bit muls where each operand has more than 32 sign bits, we can generate a single smull instruction on the 32-bit operands.
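A minimal C++ sketch of the arithmetic behind this, assuming concrete 64-bit integers stand in for DAG nodes. `numSignBits` is a hypothetical helper that computes, for one known value, the quantity `SelectionDAG::ComputeNumSignBits` bounds for a node (the count includes the sign bit itself). `smull`, `mul64` and `selectMul` are illustrative models, not LLVM code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: how many leading bits of v equal its sign bit,
// counting the sign bit itself, so the result is in [1, 64].
unsigned numSignBits(int64_t v) {
  uint64_t u = static_cast<uint64_t>(v);
  uint64_t sign = u >> 63;
  unsigned n = 1;
  while (n < 64 && ((u >> (63 - n)) & 1) == sign)
    ++n;
  return n;
}

// Model of SMULL Xd, Wn, Wm: sign-extend both 32-bit sources, then multiply.
// The product of two int32 values always fits in int64, so this cannot overflow.
int64_t smull(int32_t wn, int32_t wm) {
  return static_cast<int64_t>(wn) * static_cast<int64_t>(wm);
}

// A 64-bit MUL with wrap-around semantics (unsigned arithmetic avoids UB).
int64_t mul64(int64_t a, int64_t b) {
  return static_cast<int64_t>(static_cast<uint64_t>(a) *
                              static_cast<uint64_t>(b));
}

// Sketch of the selection rule: with more than 32 sign bits, each operand is
// exactly the sign extension of its low 32 bits, so SMULL on the W registers
// computes the same 64-bit result as the full MUL.
int64_t selectMul(int64_t a, int64_t b) {
  if (numSignBits(a) > 32 && numSignBits(b) > 32) {
    int64_t narrow = smull(static_cast<int32_t>(a), static_cast<int32_t>(b));
    assert(narrow == mul64(a, b));  // both lowerings must agree here
    return narrow;
  }
  return mul64(a, b);
}
```

For example, `selectMul(INT32_MIN, INT32_MIN)` takes the SMULL path and yields 2^62, while an operand such as 2^40 (23 sign bits) falls back to the full multiply.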

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

bipmis requested review of this revision. Nov 28 2022, 7:07 AM

bipmis created this revision.

Harbormaster completed remote builds in B199759: Diff 478232. Nov 28 2022, 8:09 AM

Could this apply to umull too? We should also (not necessarily in this commit) look into improving GlobalISel too. I think it should have enough info nowadays to perform the same ComputeNumSignBits check.

llvm/lib/Target/AArch64/AArch64InstrInfo.td
1914	Could the same thing be done for SMSUBLrrr too? And the additional add/sub patterns below too.
1957	Can the Pat be moved into the above block, and the PatFrag be moved maybe closer to the add_and_or_is_add existing PatFrag.
1958	I think it maybe needs to be 33 bits. Sign bits are always off by one. Can you add some tests for the edge cases?
llvm/test/CodeGen/AArch64/aarch64-mull-masks.ll
92	We can remove the nsw from the mul. Do you have any tests for the commuted form of this too?

Updated the patch with support for madd/msub. Addressed review comments.
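The madd/msub support maps the add, sub and negate forms of the same narrowed multiply onto SMADDL, SMSUBL and SMNEGL (SMNEGL being SMSUBL with XZR as the accumulator, matching the `XZR` operands in the patterns). A small C++ model of those instruction semantics, assuming wrap-around 64-bit arithmetic; the function names are illustrative, not LLVM APIs.

```cpp
#include <cstdint>

// Sign-extended 32x32->64 product; cannot overflow int64.
static int64_t wideProduct(int32_t wn, int32_t wm) {
  return static_cast<int64_t>(wn) * static_cast<int64_t>(wm);
}

// SMADDL Xd, Wn, Wm, Xa: Xa + sext(Wn) * sext(Wm), e.g. IR `add i64 %x2, %mul`.
int64_t smaddl(int32_t wn, int32_t wm, int64_t xa) {
  return static_cast<int64_t>(static_cast<uint64_t>(xa) +
                              static_cast<uint64_t>(wideProduct(wn, wm)));
}

// SMSUBL Xd, Wn, Wm, Xa: Xa - sext(Wn) * sext(Wm), e.g. IR `sub i64 %x2, %mul`.
int64_t smsubl(int32_t wn, int32_t wm, int64_t xa) {
  return static_cast<int64_t>(static_cast<uint64_t>(xa) -
                              static_cast<uint64_t>(wideProduct(wn, wm)));
}

// SMNEGL Xd, Wn, Wm is SMSUBL with XZR: -(sext(Wn) * sext(Wm)),
// e.g. IR `sub i64 0, %mul`.
int64_t smnegl(int32_t wn, int32_t wm) { return smsubl(wn, wm, 0); }
```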

In D138817#3955002, @dmgreen wrote:

Could this apply to umull too? We should also (not necessarily in this commit) look into improving GlobalISel too. I think it should have enough info nowadays to perform the same ComputeNumSignBits check.

I think this patch should focus on handling muls whose operands have enough sign bits. We can handle the umull scenario in a separate patch.
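For comparison, a hedged sketch of the unsigned analogue raised in the review (not part of this patch): UMULL zero-extends its 32-bit sources, so an operand would qualify when its top 32 bits are known to be zero. The threshold differs by one from the signed case: 32 known-zero high bits suffice, whereas sign bits must exceed 32 because the count also has to cover bit 31, the sign of the narrow value. The helpers are hypothetical stand-ins for known-bits queries on concrete values.

```cpp
#include <cstdint>

// Hypothetical helper: number of leading zero bits in v (64 for v == 0).
unsigned numLeadingZeros(uint64_t v) {
  unsigned n = 0;
  while (n < 64 && ((v >> (63 - n)) & 1) == 0)
    ++n;
  return n;
}

// Model of UMULL Xd, Wn, Wm: zero-extend both 32-bit sources, then multiply.
uint64_t umull(uint32_t wn, uint32_t wm) {
  return static_cast<uint64_t>(wn) * static_cast<uint64_t>(wm);
}

// Hypothetical selection rule: if the high 32 bits of both operands are zero,
// each operand equals zext(trunc(x)), so UMULL on the low halves is exact.
uint64_t selectUnsignedMul(uint64_t a, uint64_t b) {
  if (numLeadingZeros(a) >= 32 && numLeadingZeros(b) >= 32)
    return umull(static_cast<uint32_t>(a), static_cast<uint32_t>(b));
  return a * b;  // unsigned multiply wraps modulo 2^64
}
```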

llvm/lib/Target/AArch64/AArch64InstrInfo.td

1958

I think it maybe needs to be 33 bits. Sign bits are always off by one. Can you add some tests for the edge cases?

I think it has to be 32, as we are checking that the number of sign bits in each mul operand is greater than 32. The test smull_ldrsw_w() fails with a value of 33, as it is an edge case with an i32 operand.

define i64 @smull_ldrsw_w(i32* %x0, i32 %x1) {
; CHECK-LABEL: smull_ldrsw_w:
; CHECK:       // %bb.0: // %entry
; CHECK-NEXT:    ldrsw x8, [x0]
; CHECK-NEXT:    // kill: def $w1 killed $w1 def $x1
; CHECK-NEXT:    sxtw x9, w1
; CHECK-NEXT:    mul x0, x8, x9
; CHECK-NEXT:    ret
entry:
  %ext64 = load i32, i32* %x0
  %sext = sext i32 %ext64 to i64
  %sext4 = sext i32 %x1 to i64
  %mul = mul i64 %sext, %sext4
  ret i64 %mul
}
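Why the comparison is `> 32` (that is, at least 33 sign bits): the sign-bit count includes the sign bit itself, and a 64-bit value is the sign extension of its low 32 bits exactly when bits 63 through 31 all agree, which is 33 bits. A minimal C++ check, with hypothetical helper names, that a value with exactly 32 sign bits such as 2^31 does not survive truncation to i32, while any value sign-extended from i32 has at least 33.

```cpp
#include <cstdint>

// Hypothetical helper: leading bits equal to the sign bit, including it.
unsigned numSignBits(int64_t v) {
  uint64_t u = static_cast<uint64_t>(v);
  uint64_t sign = u >> 63;
  unsigned n = 1;
  while (n < 64 && ((u >> (63 - n)) & 1) == sign)
    ++n;
  return n;
}

// Truncate to i32 and sign-extend back to i64, written with unsigned
// arithmetic so every step is well-defined.
int64_t sextOfLow32(int64_t v) {
  uint32_t lo = static_cast<uint32_t>(static_cast<uint64_t>(v));
  return (lo & 0x80000000u) ? static_cast<int64_t>(lo) - (int64_t{1} << 32)
                            : static_cast<int64_t>(lo);
}

// SMULL reads only the W register, so an operand is safe exactly when it
// equals the sign extension of its low 32 bits.
bool safeForSmull(int64_t v) { return sextOfLow32(v) == v; }
```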

Harbormaster completed remote builds in B200007: Diff 478560. Nov 29 2022, 7:48 AM

dmgreen added inline comments. Nov 29 2022, 9:13 AM

llvm/lib/Target/AArch64/AArch64InstrInfo.td

1958

Hmm. I don't think this case should transform:

define i64 @t31(i32 %a, i64 %b) nounwind {
; CHECK-LABEL: t31:
; CHECK:       // %bb.0: // %entry
; CHECK-NEXT:    asr x8, x1, #31
; CHECK-NEXT:    smull x0, w8, w0
; CHECK-NEXT:    ret
entry:
  %tmp1 = sext i32 %a to i64
  %c = ashr i64 %b, 31
  %tmp3 = mul i64 %tmp1, %c
  ret i64 %tmp3
}

An ashr by 32 would be OK. Same for the constant in @t10. Could it be that smullwithonesignbits is treated as commutative, so it is matching the 32 sign bits from the sext? This version isn't being transformed:

define i64 @t312(i64 %a, i64 %b) nounwind {
; CHECK-LABEL: t312:
; CHECK:       // %bb.0: // %entry
; CHECK-NEXT:    asr x8, x0, #31
; CHECK-NEXT:    asr x9, x1, #31
; CHECK-NEXT:    mul x0, x8, x9
; CHECK-NEXT:    ret
entry:
  %tmp1 = ashr i64 %a, 31
  %c = ashr i64 %b, 31
  %tmp3 = mul i64 %tmp1, %c
  ret i64 %tmp3
}

If that is the case then all the patterns could use smullwithsignbits, I think.
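A concrete input shows why @t31 must not be transformed. `ashr i64 %b, 31` only guarantees 32 sign bits (bits 63 to 32 copy bit 63, but bit 31 comes from bit 62 of %b), so its low 32 bits need not represent the full value. The C++ model below, with hypothetical function names, compares the IR semantics against the emitted `asr x8, x1, #31` / `smull x0, w8, w0` sequence for a = 1 and b = 2^62. It does not model TableGen's commuted-pattern matching, which is only the reviewer's suggested cause.

```cpp
#include <cstdint>

// IR `ashr`: arithmetic shift right, written so it is well-defined for
// negative inputs in any C++ standard (~v is non-negative when v < 0).
int64_t ashr(int64_t v, unsigned s) { return v < 0 ? ~(~v >> s) : v >> s; }

// Truncate to i32 and sign-extend back: what reading a W register means here.
int64_t sextOfLow32(int64_t v) {
  uint32_t lo = static_cast<uint32_t>(static_cast<uint64_t>(v));
  return (lo & 0x80000000u) ? static_cast<int64_t>(lo) - (int64_t{1} << 32)
                            : static_cast<int64_t>(lo);
}

// Reference semantics: sext(a) * ashr(b, shift), as a wrapping 64-bit mul.
int64_t referenceMul(int32_t a, int64_t b, unsigned shift) {
  uint64_t p = static_cast<uint64_t>(static_cast<int64_t>(a)) *
               static_cast<uint64_t>(ashr(b, shift));
  return static_cast<int64_t>(p);
}

// The lowering seen in @t31: asr x8, x1, #shift ; smull x0, w8, w0.
int64_t smullLowering(int32_t a, int64_t b, unsigned shift) {
  int64_t w8 = sextOfLow32(ashr(b, shift));  // only the low 32 bits survive
  return w8 * static_cast<int64_t>(a);       // |w8|, |a| <= 2^31: no overflow
}
```

With a shift of 31 the two results differ in sign (2^31 versus -2^31); with a shift of 32 the shifted value always has at least 33 sign bits and they agree.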

Fixed the scenario where smullwithonesignbits is treated as commutative.
Added corner-case tests where signBits < 32.

bipmis marked an inline comment as done. Nov 29 2022, 10:45 AM

bipmis added inline comments.

llvm/lib/Target/AArch64/AArch64InstrInfo.td
1958	> Hmm. I don't think this case should transform: … > If that is the case then all the patterns could use smullwithsignbits, I think. Agreed; I think that was indeed the case. However, we would still need a sign-extend check alongside smullwithsignbits, as otherwise we would end up with an additional sign-extend when the operand of the sign extend is an i32.

Harbormaster completed remote builds in B200059: Diff 478635. Nov 29 2022, 11:47 AM

Thanks. LGTM

This revision is now accepted and ready to land. Nov 30 2022, 3:01 AM

Closed by commit rG081b7f6b0313: [AAch64] Optimize muls with operands having enough sign bits. (authored by bipmis). Dec 5 2022, 7:09 AM

This revision was automatically updated to reflect the committed changes.

bipmis marked an inline comment as done.

bipmis added a commit: rG081b7f6b0313: [AAch64] Optimize muls with operands having enough sign bits.

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64InstrInfo.td	26 lines
llvm/test/CodeGen/AArch64/aarch64-mull-masks.ll	828 lines
llvm/test/CodeGen/AArch64/aarch64-smull.ll	8 lines

Diff 480098

llvm/lib/Target/AArch64/AArch64InstrInfo.td


}]> {		}]> {
let GISelPredicateCode = [{		let GISelPredicateCode = [{
// Only handle G_ADD for now. FIXME. build capability to compute whether		// Only handle G_ADD for now. FIXME. build capability to compute whether
// operands of G_OR have common bits set or not.		// operands of G_OR have common bits set or not.
return MI.getOpcode() == TargetOpcode::G_ADD;		return MI.getOpcode() == TargetOpcode::G_ADD;
}];		}];
}		}

		// Match mul with enough sign-bits. Can be reduced to a smaller mul operand.
		def smullwithsignbits : PatFrag<(ops node:$l, node:$r), (mul node:$l, node:$r), [{
		return CurDAG->ComputeNumSignBits(N->getOperand(0)) > 32 &&
		CurDAG->ComputeNumSignBits(N->getOperand(1)) > 32;
		}]>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

// AArch64 Instruction Predicate Definitions.		// AArch64 Instruction Predicate Definitions.
// We could compute these on a per-module basis but doing so requires accessing		// We could compute these on a per-module basis but doing so requires accessing
// the Function object through the <Target>Subtarget and objections were raised		// the Function object through the <Target>Subtarget and objections were raised
// to that (see post-commit review comments for r301750).		// to that (see post-commit review comments for r301750).
def : Pat<(i64 (mul (sext_inreg GPR64:$Rn, i32), (s64imm_32bit:$C))),		def : Pat<(i64 (mul (sext_inreg GPR64:$Rn, i32), (s64imm_32bit:$C))),
(SMADDLrrr (i32 (EXTRACT_SUBREG GPR64:$Rn, sub_32)),		(SMADDLrrr (i32 (EXTRACT_SUBREG GPR64:$Rn, sub_32)),
(MOVi32imm (trunc_imm imm:$C)), XZR)>;		(MOVi32imm (trunc_imm imm:$C)), XZR)>;

def : Pat<(i64 (ineg (mul (sext GPR32:$Rn), (s64imm_32bit:$C)))),		def : Pat<(i64 (ineg (mul (sext GPR32:$Rn), (s64imm_32bit:$C)))),
(SMSUBLrrr GPR32:$Rn, (MOVi32imm (trunc_imm imm:$C)), XZR)>;		(SMSUBLrrr GPR32:$Rn, (MOVi32imm (trunc_imm imm:$C)), XZR)>;
def : Pat<(i64 (ineg (mul (zext GPR32:$Rn), (i64imm_32bit:$C)))),		def : Pat<(i64 (ineg (mul (zext GPR32:$Rn), (i64imm_32bit:$C)))),
(UMSUBLrrr GPR32:$Rn, (MOVi32imm (trunc_imm imm:$C)), XZR)>;		(UMSUBLrrr GPR32:$Rn, (MOVi32imm (trunc_imm imm:$C)), XZR)>;
def : Pat<(i64 (ineg (mul (sext_inreg GPR64:$Rn, i32), (s64imm_32bit:$C)))),		def : Pat<(i64 (ineg (mul (sext_inreg GPR64:$Rn, i32), (s64imm_32bit:$C)))),
(SMSUBLrrr (i32 (EXTRACT_SUBREG GPR64:$Rn, sub_32)),		(SMSUBLrrr (i32 (EXTRACT_SUBREG GPR64:$Rn, sub_32)),
(MOVi32imm (trunc_imm imm:$C)), XZR)>;		(MOVi32imm (trunc_imm imm:$C)), XZR)>;

def : Pat<(i64 (add (mul (sext GPR32:$Rn), (s64imm_32bit:$C)), GPR64:$Ra)),		def : Pat<(i64 (add (mul (sext GPR32:$Rn), (s64imm_32bit:$C)), GPR64:$Ra)),
(SMADDLrrr GPR32:$Rn, (MOVi32imm (trunc_imm imm:$C)), GPR64:$Ra)>;		(SMADDLrrr GPR32:$Rn, (MOVi32imm (trunc_imm imm:$C)), GPR64:$Ra)>;
def : Pat<(i64 (add (mul (zext GPR32:$Rn), (i64imm_32bit:$C)), GPR64:$Ra)),		def : Pat<(i64 (add (mul (zext GPR32:$Rn), (i64imm_32bit:$C)), GPR64:$Ra)),
(UMADDLrrr GPR32:$Rn, (MOVi32imm (trunc_imm imm:$C)), GPR64:$Ra)>;		(UMADDLrrr GPR32:$Rn, (MOVi32imm (trunc_imm imm:$C)), GPR64:$Ra)>;
def : Pat<(i64 (add (mul (sext_inreg GPR64:$Rn, i32), (s64imm_32bit:$C)),		def : Pat<(i64 (add (mul (sext_inreg GPR64:$Rn, i32), (s64imm_32bit:$C)),
GPR64:$Ra)),		GPR64:$Ra)),
(SMADDLrrr (i32 (EXTRACT_SUBREG GPR64:$Rn, sub_32)),		(SMADDLrrr (i32 (EXTRACT_SUBREG GPR64:$Rn, sub_32)),
(MOVi32imm (trunc_imm imm:$C)), GPR64:$Ra)>;		(MOVi32imm (trunc_imm imm:$C)), GPR64:$Ra)>;

def : Pat<(i64 (sub GPR64:$Ra, (mul (sext GPR32:$Rn), (s64imm_32bit:$C)))),		def : Pat<(i64 (sub GPR64:$Ra, (mul (sext GPR32:$Rn), (s64imm_32bit:$C)))),
(SMSUBLrrr GPR32:$Rn, (MOVi32imm (trunc_imm imm:$C)), GPR64:$Ra)>;		(SMSUBLrrr GPR32:$Rn, (MOVi32imm (trunc_imm imm:$C)), GPR64:$Ra)>;
def : Pat<(i64 (sub GPR64:$Ra, (mul (zext GPR32:$Rn), (i64imm_32bit:$C)))),		def : Pat<(i64 (sub GPR64:$Ra, (mul (zext GPR32:$Rn), (i64imm_32bit:$C)))),
(UMSUBLrrr GPR32:$Rn, (MOVi32imm (trunc_imm imm:$C)), GPR64:$Ra)>;		(UMSUBLrrr GPR32:$Rn, (MOVi32imm (trunc_imm imm:$C)), GPR64:$Ra)>;
def : Pat<(i64 (sub GPR64:$Ra, (mul (sext_inreg GPR64:$Rn, i32),		def : Pat<(i64 (sub GPR64:$Ra, (mul (sext_inreg GPR64:$Rn, i32),
(s64imm_32bit:$C)))),		(s64imm_32bit:$C)))),
(SMSUBLrrr (i32 (EXTRACT_SUBREG GPR64:$Rn, sub_32)),		(SMSUBLrrr (i32 (EXTRACT_SUBREG GPR64:$Rn, sub_32)),
(MOVi32imm (trunc_imm imm:$C)), GPR64:$Ra)>;		(MOVi32imm (trunc_imm imm:$C)), GPR64:$Ra)>;

		def : Pat<(i64 (smullwithsignbits GPR64:$Rn, GPR64:$Rm)),
		(SMADDLrrr (EXTRACT_SUBREG $Rn, sub_32), (EXTRACT_SUBREG $Rm, sub_32), XZR)>;
		def : Pat<(i64 (smullwithsignbits GPR64:$Rn, (sext GPR32:$Rm))),
		(SMADDLrrr (EXTRACT_SUBREG $Rn, sub_32), $Rm, XZR)>;

		def : Pat<(i64 (add (smullwithsignbits GPR64:$Rn, GPR64:$Rm), GPR64:$Ra)),
		(SMADDLrrr (EXTRACT_SUBREG $Rn, sub_32), (EXTRACT_SUBREG $Rm, sub_32), GPR64:$Ra)>;
		def : Pat<(i64 (add (smullwithsignbits GPR64:$Rn, (sext GPR32:$Rm)), GPR64:$Ra)),
		(SMADDLrrr (EXTRACT_SUBREG $Rn, sub_32), $Rm, GPR64:$Ra)>;

		def : Pat<(i64 (ineg (smullwithsignbits GPR64:$Rn, GPR64:$Rm))),
		(SMSUBLrrr (EXTRACT_SUBREG $Rn, sub_32), (EXTRACT_SUBREG $Rm, sub_32), XZR)>;
		def : Pat<(i64 (ineg (smullwithsignbits GPR64:$Rn, (sext GPR32:$Rm)))),
		(SMSUBLrrr (EXTRACT_SUBREG $Rn, sub_32), $Rm, XZR)>;

		def : Pat<(i64 (sub GPR64:$Ra, (smullwithsignbits GPR64:$Rn, GPR64:$Rm))),
		(SMSUBLrrr (EXTRACT_SUBREG $Rn, sub_32), (EXTRACT_SUBREG $Rm, sub_32), GPR64:$Ra)>;
		def : Pat<(i64 (sub GPR64:$Ra, (smullwithsignbits GPR64:$Rn, (sext GPR32:$Rm)))),
		(SMSUBLrrr (EXTRACT_SUBREG $Rn, sub_32), $Rm, GPR64:$Ra)>;
} // AddedComplexity = 5		} // AddedComplexity = 5

def : MulAccumWAlias<"mul", MADDWrrr>;		def : MulAccumWAlias<"mul", MADDWrrr>;
def : MulAccumXAlias<"mul", MADDXrrr>;		def : MulAccumXAlias<"mul", MADDXrrr>;
def : MulAccumWAlias<"mneg", MSUBWrrr>;		def : MulAccumWAlias<"mneg", MSUBWrrr>;
def : MulAccumXAlias<"mneg", MSUBXrrr>;		def : MulAccumXAlias<"mneg", MSUBXrrr>;
def : WideMulAccumAlias<"smull", SMADDLrrr>;		def : WideMulAccumAlias<"smull", SMADDLrrr>;
def : WideMulAccumAlias<"smnegl", SMSUBLrrr>;		def : WideMulAccumAlias<"smnegl", SMSUBLrrr>;
def : WideMulAccumAlias<"umull", UMADDLrrr>;		def : WideMulAccumAlias<"umull", UMADDLrrr>;
def : WideMulAccumAlias<"umnegl", UMSUBLrrr>;		def : WideMulAccumAlias<"umnegl", UMSUBLrrr>;

// Multiply-high		// Multiply-high
def SMULHrr : MulHi<0b010, "smulh", mulhs>;		def SMULHrr : MulHi<0b010, "smulh", mulhs>;

llvm/test/CodeGen/AArch64/aarch64-mull-masks.ll

	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%shl = shl i64 %x, 32			%shl = shl i64 %x, 32
	%shr = ashr exact i64 %shl, 32			%shr = ashr exact i64 %shl, 32
	%conv = sext i32 %y to i64			%conv = sext i32 %y to i64
	%mul = mul nsw i64 %conv, %shr			%mul = mul nsw i64 %conv, %shr
	ret i64 %mul			ret i64 %mul
	}			}

				define i64 @smull_ldrsb_b(i8* %x0, i8 %x1) {
				; CHECK-LABEL: smull_ldrsb_b:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsb x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: sxtb x9, w1
				; CHECK-NEXT: smull x0, w8, w9
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i8, i8* %x0
				%sext = sext i8 %ext64 to i64
				%sext4 = sext i8 %x1 to i64
				%mul = mul i64 %sext, %sext4
				ret i64 %mul
				}

				define i64 @smull_ldrsb_b_commuted(i8* %x0, i8 %x1) {
				; CHECK-LABEL: smull_ldrsb_b_commuted:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsb x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: sxtb x9, w1
				; CHECK-NEXT: smull x0, w9, w8
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i8, i8* %x0
				%sext = sext i8 %ext64 to i64
				%sext4 = sext i8 %x1 to i64
				%mul = mul i64 %sext4, %sext
				ret i64 %mul
				}

				define i64 @smull_ldrsb_h(i8* %x0, i16 %x1) {
				; CHECK-LABEL: smull_ldrsb_h:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsb x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: sxth x9, w1
				; CHECK-NEXT: smull x0, w8, w9
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i8, i8* %x0
				%sext = sext i8 %ext64 to i64
				%sext4 = sext i16 %x1 to i64
				%mul = mul i64 %sext, %sext4
				ret i64 %mul
				}

				define i64 @smull_ldrsb_w(i8* %x0, i32 %x1) {
				; CHECK-LABEL: smull_ldrsb_w:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsb x8, [x0]
				; CHECK-NEXT: smull x0, w8, w1
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i8, i8* %x0
				%sext = sext i8 %ext64 to i64
				%sext4 = sext i32 %x1 to i64
				%mul = mul i64 %sext, %sext4
				ret i64 %mul
				}

				define i64 @smull_ldrsh_b(i16* %x0, i8 %x1) {
				; CHECK-LABEL: smull_ldrsh_b:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsh x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: sxtb x9, w1
				; CHECK-NEXT: smull x0, w8, w9
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i16, i16* %x0
				%sext = sext i16 %ext64 to i64
				%sext4 = sext i8 %x1 to i64
				%mul = mul i64 %sext, %sext4
				ret i64 %mul
				}

				define i64 @smull_ldrsh_h(i16* %x0, i16 %x1) {
				; CHECK-LABEL: smull_ldrsh_h:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsh x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: sxth x9, w1
				; CHECK-NEXT: smull x0, w8, w9
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i16, i16* %x0
				%sext = sext i16 %ext64 to i64
				%sext4 = sext i16 %x1 to i64
				%mul = mul i64 %sext, %sext4
				ret i64 %mul
				}

				define i64 @smull_ldrsh_h_commuted(i16* %x0, i16 %x1) {
				; CHECK-LABEL: smull_ldrsh_h_commuted:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsh x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: sxth x9, w1
				; CHECK-NEXT: smull x0, w9, w8
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i16, i16* %x0
				%sext = sext i16 %ext64 to i64
				%sext4 = sext i16 %x1 to i64
				%mul = mul i64 %sext4, %sext
				ret i64 %mul
				}

				define i64 @smull_ldrsh_w(i16* %x0, i32 %x1) {
				; CHECK-LABEL: smull_ldrsh_w:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsh x8, [x0]
				; CHECK-NEXT: smull x0, w8, w1
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i16, i16* %x0
				%sext = sext i16 %ext64 to i64
				%sext4 = sext i32 %x1 to i64
				%mul = mul i64 %sext, %sext4
				ret i64 %mul
				}

				define i64 @smull_ldrsw_b(i32* %x0, i8 %x1) {
				; CHECK-LABEL: smull_ldrsw_b:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsw x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: sxtb x9, w1
				; CHECK-NEXT: smull x0, w8, w9
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i32, i32* %x0
				%sext = sext i32 %ext64 to i64
				%sext4 = sext i8 %x1 to i64
				%mul = mul i64 %sext, %sext4
				ret i64 %mul
				}

				define i64 @smull_ldrsw_h(i32* %x0, i16 %x1) {
				; CHECK-LABEL: smull_ldrsw_h:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsw x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: sxth x9, w1
				; CHECK-NEXT: smull x0, w8, w9
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i32, i32* %x0
				%sext = sext i32 %ext64 to i64
				%sext4 = sext i16 %x1 to i64
				%mul = mul i64 %sext, %sext4
				ret i64 %mul
				}

				define i64 @smull_ldrsw_w(i32* %x0, i32 %x1) {
				; CHECK-LABEL: smull_ldrsw_w:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsw x8, [x0]
				; CHECK-NEXT: smull x0, w8, w1
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i32, i32* %x0
				%sext = sext i32 %ext64 to i64
				%sext4 = sext i32 %x1 to i64
				%mul = mul i64 %sext, %sext4
				ret i64 %mul
				}

				define i64 @smull_ldrsw_w_commuted(i32* %x0, i32 %x1) {
				; CHECK-LABEL: smull_ldrsw_w_commuted:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsw x8, [x0]
				; CHECK-NEXT: smull x0, w8, w1
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i32, i32* %x0
				%sext = sext i32 %ext64 to i64
				%sext4 = sext i32 %x1 to i64
				%mul = mul i64 %sext4, %sext
				ret i64 %mul
				}

				define i64 @smull_sext_bb(i8 %x0, i8 %x1) {
				; CHECK-LABEL: smull_sext_bb:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
				; CHECK-NEXT: sxtb x8, w0
				; CHECK-NEXT: sxtb x9, w1
				; CHECK-NEXT: smull x0, w8, w9
				; CHECK-NEXT: ret
				entry:
				%sext = sext i8 %x0 to i64
				%sext4 = sext i8 %x1 to i64
				%mul = mul i64 %sext, %sext4
				ret i64 %mul
				}

				define i64 @smull_ldrsw_shift(i32* %x0, i64 %x1) {
				; CHECK-LABEL: smull_ldrsw_shift:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsw x8, [x0]
				; CHECK-NEXT: sxtw x9, w1
				; CHECK-NEXT: smull x0, w8, w9
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i32, i32* %x0
				%sext = sext i32 %ext64 to i64
				%shl = shl i64 %x1, 32
				%shr = ashr exact i64 %shl, 32
				%mul = mul i64 %sext, %shr
				ret i64 %mul
				}

				define i64 @smull_ldrsh_zextw(i16* %x0, i32 %x1) {
				; CHECK-LABEL: smull_ldrsh_zextw:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsh x8, [x0]
				; CHECK-NEXT: mov w9, w1
				; CHECK-NEXT: mul x0, x8, x9
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i16, i16* %x0
				%sext = sext i16 %ext64 to i64
				%zext = zext i32 %x1 to i64
				%mul = mul i64 %sext, %zext
				ret i64 %mul
				}

				define i64 @smull_ldrsw_zexth(i32* %x0, i16 %x1) {
				; CHECK-LABEL: smull_ldrsw_zexth:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsw x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: and x9, x1, #0xffff
				; CHECK-NEXT: smull x0, w8, w9
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i32, i32* %x0
				%sext = sext i32 %ext64 to i64
				%zext = zext i16 %x1 to i64
				%mul = mul i64 %sext, %zext
				ret i64 %mul
				}

				define i64 @smull_ldrsw_zextb(i32* %x0, i8 %x1) {
				; CHECK-LABEL: smull_ldrsw_zextb:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsw x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: and x9, x1, #0xff
				; CHECK-NEXT: smull x0, w8, w9
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i32, i32* %x0
				%sext = sext i32 %ext64 to i64
				%zext = zext i8 %x1 to i64
				%mul = mul i64 %sext, %zext
				ret i64 %mul
				}

				define i64 @smull_ldrsw_zextb_commuted(i32* %x0, i8 %x1) {
				; CHECK-LABEL: smull_ldrsw_zextb_commuted:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsw x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: and x9, x1, #0xff
				; CHECK-NEXT: smull x0, w9, w8
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i32, i32* %x0
				%sext = sext i32 %ext64 to i64
				%zext = zext i8 %x1 to i64
				%mul = mul i64 %zext, %sext
				ret i64 %mul
				}

				define i64 @smaddl_ldrsb_h(i8* %x0, i16 %x1, i64 %x2) {
				; CHECK-LABEL: smaddl_ldrsb_h:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsb x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: sxth x9, w1
				; CHECK-NEXT: smaddl x0, w8, w9, x2
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i8, i8* %x0
				%sext = sext i8 %ext64 to i64
				%sext4 = sext i16 %x1 to i64
				%mul = mul i64 %sext, %sext4
				%add = add i64 %x2, %mul
				ret i64 %add
				}

				define i64 @smaddl_ldrsb_h_commuted(i8* %x0, i16 %x1, i64 %x2) {
				; CHECK-LABEL: smaddl_ldrsb_h_commuted:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsb x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: sxth x9, w1
				; CHECK-NEXT: smaddl x0, w9, w8, x2
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i8, i8* %x0
				%sext = sext i8 %ext64 to i64
				%sext4 = sext i16 %x1 to i64
				%mul = mul i64 %sext4, %sext
				%add = add i64 %x2, %mul
				ret i64 %add
				}

				define i64 @smaddl_ldrsh_w(i16* %x0, i32 %x1, i64 %x2) {
				; CHECK-LABEL: smaddl_ldrsh_w:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsh x8, [x0]
				; CHECK-NEXT: smaddl x0, w8, w1, x2
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i16, i16* %x0
				%sext = sext i16 %ext64 to i64
				%sext4 = sext i32 %x1 to i64
				%mul = mul i64 %sext, %sext4
				%add = add i64 %x2, %mul
				ret i64 %add
				}

				define i64 @smaddl_ldrsh_w_commuted(i16* %x0, i32 %x1, i64 %x2) {
				; CHECK-LABEL: smaddl_ldrsh_w_commuted:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsh x8, [x0]
				; CHECK-NEXT: smaddl x0, w8, w1, x2
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i16, i16* %x0
				%sext = sext i16 %ext64 to i64
				%sext4 = sext i32 %x1 to i64
				%mul = mul i64 %sext4, %sext
				%add = add i64 %x2, %mul
				ret i64 %add
				}

				define i64 @smaddl_ldrsw_b(i32* %x0, i8 %x1, i64 %x2) {
				; CHECK-LABEL: smaddl_ldrsw_b:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldrsw x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: sxtb x9, w1
				; CHECK-NEXT: smaddl x0, w8, w9, x2
				; CHECK-NEXT: ret
				%ext64 = load i32, i32* %x0
				%sext = sext i32 %ext64 to i64
				%sext2 = sext i8 %x1 to i64
				%mul = mul i64 %sext, %sext2
				%add = add i64 %x2, %mul
				ret i64 %add
				}

				define i64 @smaddl_ldrsw_b_commuted(i32* %x0, i8 %x1, i64 %x2) {
				; CHECK-LABEL: smaddl_ldrsw_b_commuted:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldrsw x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: sxtb x9, w1
				; CHECK-NEXT: smaddl x0, w9, w8, x2
				; CHECK-NEXT: ret
				%ext64 = load i32, i32* %x0
				%sext = sext i32 %ext64 to i64
				%sext2 = sext i8 %x1 to i64
				%mul = mul i64 %sext2, %sext
				%add = add i64 %x2, %mul
				ret i64 %add
				}

				define i64 @smaddl_ldrsw_ldrsw(i32* %x0, i32* %x1, i64 %x2) {
				; CHECK-LABEL: smaddl_ldrsw_ldrsw:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsw x8, [x0]
				; CHECK-NEXT: ldrsw x9, [x1]
				; CHECK-NEXT: smaddl x0, w8, w9, x2
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i32, i32* %x0
				%ext64_2 = load i32, i32* %x1
				%sext = sext i32 %ext64 to i64
				%sext2 = sext i32 %ext64_2 to i64
				%mul = mul i64 %sext, %sext2
				%add = add i64 %x2, %mul
				ret i64 %add
				}

				define i64 @smaddl_sext_hh(i16 %x0, i16 %x1, i64 %x2) {
				; CHECK-LABEL: smaddl_sext_hh:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
				; CHECK-NEXT: sxth x8, w0
				; CHECK-NEXT: sxth x9, w1
				; CHECK-NEXT: smaddl x0, w8, w9, x2
				; CHECK-NEXT: ret
				entry:
				%sext = sext i16 %x0 to i64
				%sext2 = sext i16 %x1 to i64
				%mul = mul i64 %sext, %sext2
				%add = add i64 %x2, %mul
				ret i64 %add
				}

				define i64 @smaddl_ldrsw_shift(i32* %x0, i64 %x1, i64 %x2) {
				; CHECK-LABEL: smaddl_ldrsw_shift:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsw x8, [x0]
				; CHECK-NEXT: sxtw x9, w1
				; CHECK-NEXT: smaddl x0, w8, w9, x2
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i32, i32* %x0
				%sext = sext i32 %ext64 to i64
				%shl = shl i64 %x1, 32
				%shr = ashr exact i64 %shl, 32
				%mul = mul i64 %sext, %shr
				%add = add i64 %x2, %mul
				ret i64 %add
				}

				define i64 @smaddl_ldrsw_zextb(i32* %x0, i8 %x1, i64 %x2) {
				; CHECK-LABEL: smaddl_ldrsw_zextb:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsw x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: and x9, x1, #0xff
				; CHECK-NEXT: smaddl x0, w8, w9, x2
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i32, i32* %x0
				%sext = sext i32 %ext64 to i64
				%zext = zext i8 %x1 to i64
				%mul = mul i64 %sext, %zext
				%add = add i64 %x2, %mul
				ret i64 %add
				}

				define i64 @smnegl_ldrsb_h(i8* %x0, i16 %x1) {
				; CHECK-LABEL: smnegl_ldrsb_h:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsb x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: sxth x9, w1
				; CHECK-NEXT: smnegl x0, w8, w9
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i8, i8* %x0
				%sext = sext i8 %ext64 to i64
				%sext4 = sext i16 %x1 to i64
				%mul = mul i64 %sext, %sext4
				%sub = sub i64 0, %mul
				ret i64 %sub
				}

				define i64 @smnegl_ldrsb_h_commuted(i8* %x0, i16 %x1) {
				; CHECK-LABEL: smnegl_ldrsb_h_commuted:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsb x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: sxth x9, w1
				; CHECK-NEXT: smnegl x0, w9, w8
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i8, i8* %x0
				%sext = sext i8 %ext64 to i64
				%sext4 = sext i16 %x1 to i64
				%mul = mul i64 %sext4, %sext
				%sub = sub i64 0, %mul
				ret i64 %sub
				}

				define i64 @smnegl_ldrsh_w(i16* %x0, i32 %x1) {
				; CHECK-LABEL: smnegl_ldrsh_w:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsh x8, [x0]
				; CHECK-NEXT: smnegl x0, w8, w1
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i16, i16* %x0
				%sext = sext i16 %ext64 to i64
				%sext4 = sext i32 %x1 to i64
				%mul = mul i64 %sext, %sext4
				%sub = sub i64 0, %mul
				ret i64 %sub
				}

				define i64 @smnegl_ldrsh_w_commuted(i16* %x0, i32 %x1) {
				; CHECK-LABEL: smnegl_ldrsh_w_commuted:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsh x8, [x0]
				; CHECK-NEXT: smnegl x0, w8, w1
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i16, i16* %x0
				%sext = sext i16 %ext64 to i64
				%sext4 = sext i32 %x1 to i64
				%mul = mul i64 %sext4, %sext
				%sub = sub i64 0, %mul
				ret i64 %sub
				}

				define i64 @smnegl_ldrsw_b(i32* %x0, i8 %x1) {
				; CHECK-LABEL: smnegl_ldrsw_b:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldrsw x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: sxtb x9, w1
				; CHECK-NEXT: smnegl x0, w8, w9
				; CHECK-NEXT: ret
				%ext64 = load i32, i32* %x0
				%sext = sext i32 %ext64 to i64
				%sext2 = sext i8 %x1 to i64
				%mul = mul i64 %sext, %sext2
				%sub = sub i64 0, %mul
				ret i64 %sub
				}

				define i64 @smnegl_ldrsw_b_commuted(i32* %x0, i8 %x1) {
				; CHECK-LABEL: smnegl_ldrsw_b_commuted:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldrsw x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: sxtb x9, w1
				; CHECK-NEXT: smnegl x0, w9, w8
				; CHECK-NEXT: ret
				%ext64 = load i32, i32* %x0
				%sext = sext i32 %ext64 to i64
				%sext2 = sext i8 %x1 to i64
				%mul = mul i64 %sext2, %sext
				%sub = sub i64 0, %mul
				ret i64 %sub
				}

				define i64 @smnegl_ldrsw_ldrsw(i32* %x0, i32* %x1) {
				; CHECK-LABEL: smnegl_ldrsw_ldrsw:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsw x8, [x0]
				; CHECK-NEXT: ldrsw x9, [x1]
				; CHECK-NEXT: smnegl x0, w8, w9
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i32, i32* %x0
				%ext64_2 = load i32, i32* %x1
				%sext = sext i32 %ext64 to i64
				%sext2 = sext i32 %ext64_2 to i64
				%mul = mul i64 %sext, %sext2
				%sub = sub i64 0, %mul
				ret i64 %sub
				}

				define i64 @smnegl_sext_hh(i16 %x0, i16 %x1) {
				; CHECK-LABEL: smnegl_sext_hh:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
				; CHECK-NEXT: sxth x8, w0
				; CHECK-NEXT: sxth x9, w1
				; CHECK-NEXT: smnegl x0, w8, w9
				; CHECK-NEXT: ret
				entry:
				%sext = sext i16 %x0 to i64
				%sext2 = sext i16 %x1 to i64
				%mul = mul i64 %sext, %sext2
				%sub = sub i64 0, %mul
				ret i64 %sub
				}

				define i64 @smnegl_ldrsw_shift(i32* %x0, i64 %x1) {
				; CHECK-LABEL: smnegl_ldrsw_shift:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsw x8, [x0]
				; CHECK-NEXT: sxtw x9, w1
				; CHECK-NEXT: smnegl x0, w8, w9
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i32, i32* %x0
				%sext = sext i32 %ext64 to i64
				%shl = shl i64 %x1, 32
				%shr = ashr exact i64 %shl, 32
				%mul = mul i64 %sext, %shr
				%sub = sub i64 0, %mul
				ret i64 %sub
				}

				define i64 @smnegl_ldrsw_zextb(i32* %x0, i8 %x1) {
				; CHECK-LABEL: smnegl_ldrsw_zextb:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsw x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: and x9, x1, #0xff
				; CHECK-NEXT: smnegl x0, w8, w9
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i32, i32* %x0
				%sext = sext i32 %ext64 to i64
				%zext = zext i8 %x1 to i64
				%mul = mul i64 %sext, %zext
				%sub = sub i64 0, %mul
				ret i64 %sub
				}

				define i64 @smsubl_ldrsb_h(i8* %x0, i16 %x1, i64 %x2) {
				; CHECK-LABEL: smsubl_ldrsb_h:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsb x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: sxth x9, w1
				; CHECK-NEXT: smsubl x0, w8, w9, x2
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i8, i8* %x0
				%sext = sext i8 %ext64 to i64
				%sext4 = sext i16 %x1 to i64
				%mul = mul i64 %sext, %sext4
				%sub = sub i64 %x2, %mul
				ret i64 %sub
				}

				define i64 @smsubl_ldrsb_h_commuted(i8* %x0, i16 %x1, i64 %x2) {
				; CHECK-LABEL: smsubl_ldrsb_h_commuted:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsb x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: sxth x9, w1
				; CHECK-NEXT: smsubl x0, w9, w8, x2
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i8, i8* %x0
				%sext = sext i8 %ext64 to i64
				%sext4 = sext i16 %x1 to i64
				%mul = mul i64 %sext4, %sext
				%sub = sub i64 %x2, %mul
				ret i64 %sub
				}

				define i64 @smsubl_ldrsh_w(i16* %x0, i32 %x1, i64 %x2) {
				; CHECK-LABEL: smsubl_ldrsh_w:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsh x8, [x0]
				; CHECK-NEXT: smsubl x0, w8, w1, x2
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i16, i16* %x0
				%sext = sext i16 %ext64 to i64
				%sext4 = sext i32 %x1 to i64
				%mul = mul i64 %sext, %sext4
				%sub = sub i64 %x2, %mul
				ret i64 %sub
				}

				define i64 @smsubl_ldrsh_w_commuted(i16* %x0, i32 %x1, i64 %x2) {
				; CHECK-LABEL: smsubl_ldrsh_w_commuted:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsh x8, [x0]
				; CHECK-NEXT: smsubl x0, w8, w1, x2
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i16, i16* %x0
				%sext = sext i16 %ext64 to i64
				%sext4 = sext i32 %x1 to i64
				%mul = mul i64 %sext4, %sext
				%sub = sub i64 %x2, %mul
				ret i64 %sub
				}

				define i64 @smsubl_ldrsw_b(i32* %x0, i8 %x1, i64 %x2) {
				; CHECK-LABEL: smsubl_ldrsw_b:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldrsw x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: sxtb x9, w1
				; CHECK-NEXT: smsubl x0, w8, w9, x2
				; CHECK-NEXT: ret
				%ext64 = load i32, i32* %x0
				%sext = sext i32 %ext64 to i64
				%sext2 = sext i8 %x1 to i64
				%mul = mul i64 %sext, %sext2
				%sub = sub i64 %x2, %mul
				ret i64 %sub
				}

				define i64 @smsubl_ldrsw_b_commuted(i32* %x0, i8 %x1, i64 %x2) {
				; CHECK-LABEL: smsubl_ldrsw_b_commuted:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldrsw x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: sxtb x9, w1
				; CHECK-NEXT: smsubl x0, w9, w8, x2
				; CHECK-NEXT: ret
				%ext64 = load i32, i32* %x0
				%sext = sext i32 %ext64 to i64
				%sext2 = sext i8 %x1 to i64
				%mul = mul i64 %sext2, %sext
				%sub = sub i64 %x2, %mul
				ret i64 %sub
				}

				define i64 @smsubl_ldrsw_ldrsw(i32* %x0, i32* %x1, i64 %x2) {
				; CHECK-LABEL: smsubl_ldrsw_ldrsw:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsw x8, [x0]
				; CHECK-NEXT: ldrsw x9, [x1]
				; CHECK-NEXT: smsubl x0, w8, w9, x2
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i32, i32* %x0
				%ext64_2 = load i32, i32* %x1
				%sext = sext i32 %ext64 to i64
				%sext2 = sext i32 %ext64_2 to i64
				%mul = mul i64 %sext, %sext2
				%sub = sub i64 %x2, %mul
				ret i64 %sub
				}

				define i64 @smsubl_sext_hh(i16 %x0, i16 %x1, i64 %x2) {
				; CHECK-LABEL: smsubl_sext_hh:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
				; CHECK-NEXT: sxth x8, w0
				; CHECK-NEXT: sxth x9, w1
				; CHECK-NEXT: smsubl x0, w8, w9, x2
				; CHECK-NEXT: ret
				entry:
				%sext = sext i16 %x0 to i64
				%sext2 = sext i16 %x1 to i64
				%mul = mul i64 %sext, %sext2
				%sub = sub i64 %x2, %mul
				ret i64 %sub
				}

				define i64 @smsubl_ldrsw_shift(i32* %x0, i64 %x1, i64 %x2) {
				; CHECK-LABEL: smsubl_ldrsw_shift:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsw x8, [x0]
				; CHECK-NEXT: sxtw x9, w1
				; CHECK-NEXT: smsubl x0, w8, w9, x2
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i32, i32* %x0
				%sext = sext i32 %ext64 to i64
				%shl = shl i64 %x1, 32
				%shr = ashr exact i64 %shl, 32
				%mul = mul i64 %sext, %shr
				%sub = sub i64 %x2, %mul
				ret i64 %sub
				}

				define i64 @smsubl_ldrsw_zextb(i32* %x0, i8 %x1, i64 %x2) {
				; CHECK-LABEL: smsubl_ldrsw_zextb:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrsw x8, [x0]
				; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: and x9, x1, #0xff
				; CHECK-NEXT: smsubl x0, w8, w9, x2
				; CHECK-NEXT: ret
				entry:
				%ext64 = load i32, i32* %x0
				%sext = sext i32 %ext64 to i64
				%zext = zext i8 %x1 to i64
				%mul = mul i64 %sext, %zext
				%sub = sub i64 %x2, %mul
				ret i64 %sub
				}

				define i64 @smull_sext_ashr31(i32 %a, i64 %b) nounwind {
				; CHECK-LABEL: smull_sext_ashr31:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
				; CHECK-NEXT: sxtw x8, w0
				; CHECK-NEXT: asr x9, x1, #31
				; CHECK-NEXT: mul x0, x8, x9
				; CHECK-NEXT: ret
				entry:
				%tmp1 = sext i32 %a to i64
				%c = ashr i64 %b, 31
				%tmp3 = mul i64 %tmp1, %c
				ret i64 %tmp3
				}

				define i64 @smull_sext_ashr32(i32 %a, i64 %b) nounwind {
				; CHECK-LABEL: smull_sext_ashr32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: asr x8, x1, #32
				; CHECK-NEXT: smull x0, w8, w0
				; CHECK-NEXT: ret
				entry:
				%tmp1 = sext i32 %a to i64
				%c = ashr i64 %b, 32
				%tmp3 = mul i64 %tmp1, %c
				ret i64 %tmp3
				}


				define i64 @smull_ashr31_both(i64 %a, i64 %b) nounwind {
				; CHECK-LABEL: smull_ashr31_both:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: asr x8, x0, #31
				; CHECK-NEXT: asr x9, x1, #31
				; CHECK-NEXT: mul x0, x8, x9
				; CHECK-NEXT: ret
				entry:
				%tmp1 = ashr i64 %a, 31
				%c = ashr i64 %b, 31
				%tmp3 = mul i64 %tmp1, %c
				ret i64 %tmp3
				}

				define i64 @smull_ashr32_both(i64 %a, i64 %b) nounwind {
				; CHECK-LABEL: smull_ashr32_both:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: asr x8, x0, #32
				; CHECK-NEXT: asr x9, x1, #32
				; CHECK-NEXT: smull x0, w8, w9
				; CHECK-NEXT: ret
				entry:
				%tmp1 = ashr i64 %a, 32
				%c = ashr i64 %b, 32
				%tmp3 = mul i64 %tmp1, %c
				ret i64 %tmp3
				}

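The smull_sext_ashr31/smull_sext_ashr32 and smull_ashr31_both/smull_ashr32_both tests above cover the edge case raised in review. An i64 operand can feed a W register of smull, smsubl or smnegl only if it has at least 33 sign bits, meaning its value survives truncation to i32 followed by sign extension. `ashr i64 %b, 31` only guarantees 32 sign bits, so those tests expect a plain `mul`. `ashr ... 32` guarantees 33, so `smull` is selected. The Python sketch below is a hypothetical model, not LLVM's `ComputeNumSignBits`. The helpers `num_sign_bits`, `sext32`, `smull_model` and `ashr` are illustrative names. The sketch shows an input for which the ashr-by-31 result breaks the narrowing.

```python
# Hypothetical model (not LLVM code) of the sign-bit condition behind the
# smull_sext_ashr31/32 and smull_ashr*_both tests.

MASK64 = (1 << 64) - 1


def to_i64(v: int) -> int:
    """Wrap an unbounded Python int to a signed 64-bit value."""
    v &= MASK64
    return v - (1 << 64) if v >> 63 else v


def num_sign_bits(v: int) -> int:
    """Count the leading bits of a 64-bit value that equal its sign bit,
    including the sign bit itself (always between 1 and 64)."""
    u = v & MASK64
    sign = u >> 63
    n = 1
    for bit in range(62, -1, -1):
        if (u >> bit) & 1 != sign:
            break
        n += 1
    return n


def sext32(v: int) -> int:
    """Keep the low 32 bits and sign-extend them, as a W-register read does."""
    lo = v & 0xFFFFFFFF
    return lo - (1 << 32) if lo >> 31 else lo


def smull_model(a: int, b: int) -> int:
    """SMULL Xd, Wn, Wm: signed 32x32 -> 64-bit multiply of the low halves.
    The product of two i32 values always fits in i64, so no wrap is needed."""
    return sext32(a) * sext32(b)


def ashr(v: int, amount: int) -> int:
    """Arithmetic shift right of a signed 64-bit value.
    Python's >> on negative ints rounds toward -inf, i.e. it is arithmetic."""
    return to_i64(v) >> amount


if __name__ == "__main__":
    b = 1 << 62                          # any i64 input is legal
    c31, c32 = ashr(b, 31), ashr(b, 32)  # 2**31 and 2**30
    for name, c in (("ashr 31", c31), ("ashr 32", c32)):
        # prints: ashr 31 2147483648 32 False / ashr 32 1073741824 33 True
        print(name, c, num_sign_bits(c), sext32(c) == c)
    # With only 32 sign bits, narrowing to W registers changes the result.
    print(to_i64(c31 * 3), smull_model(c31, 3))  # 6442450944 -6442450944
```

For any value, `sext32(v) == v` holds exactly when `num_sign_bits(v) >= 33`. That is why the threshold is 33 and not 32.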
llvm/test/CodeGen/AArch64/aarch64-smull.ll

 ; CHECK-LABEL: smull_zext_v2i32_v2i64:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: ldr d0, [x1]
 ; CHECK-NEXT: ldrh w8, [x0]
 ; CHECK-NEXT: ldrh w11, [x0, #2]
 ; CHECK-NEXT: sshll v0.2d, v0.2s, #0
 ; CHECK-NEXT: fmov x9, d0
 ; CHECK-NEXT: mov x10, v0.d[1]
-; CHECK-NEXT: mul x8, x8, x9
-; CHECK-NEXT: mul x9, x11, x10
+; CHECK-NEXT: smull x8, w8, w9
+; CHECK-NEXT: smull x9, w11, w10
 ; CHECK-NEXT: fmov d0, x8
 ; CHECK-NEXT: mov v0.d[1], x9
 ; CHECK-NEXT: ret
 %load.A = load <2 x i16>, <2 x i16>* %A
 %load.B = load <2 x i32>, <2 x i32>* %B
 %zext.A = zext <2 x i16> %load.A to <2 x i64>
 %sext.B = sext <2 x i32> %load.B to <2 x i64>
 %res = mul <2 x i64> %zext.A, %sext.B
 ret <2 x i64> %res
 }

 define <2 x i64> @smull_zext_and_v2i32_v2i64(<2 x i32>* %A, <2 x i32>* %B) nounwind {
 ; CHECK-LABEL: smull_zext_and_v2i32_v2i64:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: ldr d0, [x0]
 ; CHECK-NEXT: ldr d1, [x1]
 ; CHECK-NEXT: bic v0.2s, #128, lsl #24
 ; CHECK-NEXT: sshll v1.2d, v1.2s, #0
 ; CHECK-NEXT: ushll v0.2d, v0.2s, #0
 ; CHECK-NEXT: fmov x9, d1
 ; CHECK-NEXT: fmov x10, d0
 ; CHECK-NEXT: mov x8, v1.d[1]
 ; CHECK-NEXT: mov x11, v0.d[1]
-; CHECK-NEXT: mul x9, x10, x9
-; CHECK-NEXT: mul x8, x11, x8
+; CHECK-NEXT: smull x9, w10, w9
+; CHECK-NEXT: smull x8, w11, w8
 ; CHECK-NEXT: fmov d0, x9
 ; CHECK-NEXT: mov v0.d[1], x8
 ; CHECK-NEXT: ret
 %load.A = load <2 x i32>, <2 x i32>* %A
 %and.A = and <2 x i32> %load.A, <i32 u0x7FFFFFFF, i32 u0x7FFFFFFF>
 %load.B = load <2 x i32>, <2 x i32>* %B
 %zext.A = zext <2 x i32> %and.A to <2 x i64>
 %sext.B = sext <2 x i32> %load.B to <2 x i64>

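In the aarch64-smull.ll hunk above, the only change is that the scalarized 64-bit `mul`s become `smull`. One operand is a `sext <2 x i32>`, which has at least 33 sign bits. The other operand is one of two things. It may be a zero-extended i16, with at least 48 leading zeros. Or it may be an i32 whose bit 31 is cleared by `and ... u0x7FFFFFFF` (the `bic v0.2s, #128, lsl #24`) before the `zext`, which again leaves at least 33 sign bits. The Python sketch below is a hypothetical model with illustrative helper names (`sext32`, `smull_model`, `zext_i16`, `zext_and_i32`, `sext_i32`). It checks that the W-register product equals the full product for such operands and shows why the mask matters.

```python
# Hypothetical model (not LLVM code) of why the zero-extended operands in
# smull_zext_v2i32_v2i64 and smull_zext_and_v2i32_v2i64 may use smull.


def sext32(v: int) -> int:
    """Low 32 bits of v, sign-extended: what a W-register operand of SMULL sees."""
    lo = v & 0xFFFFFFFF
    return lo - (1 << 32) if lo >> 31 else lo


def smull_model(a: int, b: int) -> int:
    """SMULL Xd, Wn, Wm on the low halves of two 64-bit operands."""
    return sext32(a) * sext32(b)


def zext_i16(x: int) -> int:
    return x & 0xFFFF        # zext i16 -> i64: at least 48 leading zeros


def zext_and_i32(x: int) -> int:
    return x & 0x7FFFFFFF    # and 0x7FFFFFFF, then zext: bit 31 is clear


def sext_i32(x: int) -> int:
    return sext32(x)         # sext i32 -> i64


if __name__ == "__main__":
    # Both products stay below 2**63 in magnitude, so comparing against
    # Python's unbounded product is the same as comparing 64-bit results.
    a, b = 0xFFFF, -5
    print(smull_model(zext_i16(a), sext_i32(b)) == zext_i16(a) * sext_i32(b))  # True
    # Without the mask, a zero-extended 0xFFFFFFFF has only 32 sign bits
    # and SMULL would read it as -1.
    print(smull_model(0xFFFFFFFF, 2), 0xFFFFFFFF * 2)  # -2 8589934590
```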
Revision Contents

Diff 480098

llvm/lib/Target/AArch64/AArch64InstrInfo.td

llvm/test/CodeGen/AArch64/aarch64-mull-masks.ll

llvm/test/CodeGen/AArch64/aarch64-smull.ll