This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Optimize muls with operands having enough sign bits. One operand is a sub.
Abandoned · Public

Authored by bipmis on Dec 6 2022, 3:58 AM.


Details

Reviewers

dmgreen
samtebbs

Summary

Muls with 64-bit operands, where one operand is a register with enough sign bits and the other operand is a sub with enough sign bits, can be lowered to a 32-bit sub and a single smull instruction on 32-bit operands.

Diff Detail

Unit Tests: Failed

	Time	Test
	60,040 ms	x64 debian > libFuzzer.libFuzzer::fuzzer-leak.test
	60,050 ms	x64 debian > libFuzzer.libFuzzer::minimize_crash.test

Event Timeline

bipmis requested review of this revision. Dec 6 2022, 3:58 AM

bipmis created this revision.

dmgreen added a comment.

Can you give some more details about why this is true? I would expect the sub to have 31 sign bits.

The mul in submulwithsignbits will be commutative, so it will match either way. The code needs to account for that, I think, not just check for operand(1).
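To illustrate the commutativity point, here is a minimal C++ sketch of a check that accepts the sub in either operand position and reports which one it found. It deliberately does not use LLVM's API: `Node`, `signBits`, `fits32`, `matchOrder` and `subOperandIndex` are hypothetical stand-ins, with `signBits` playing the role of `CurDAG->ComputeNumSignBits` on that operand. The returned index stands in for the operand binding; how a generated TableGen matcher would consume it is not modelled here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified stand-in for a SelectionDAG node; this is
// not LLVM's SDNode. `signBits` plays the role of the value
// CurDAG->ComputeNumSignBits would report for the node.
struct Node {
  bool isSub = false;
  unsigned signBits = 1;
  std::vector<const Node *> ops;
};

// More than 32 sign bits in an i64 means the value fits in a signed i32.
bool fits32(const Node *n) { return n->signBits > 32; }

// Checks one operand order: `plain` is the non-sub operand and `sub` must be
// a sub whose two inputs both fit in a signed i32.
bool matchOrder(const Node *plain, const Node *sub) {
  return sub->isSub && sub->ops.size() == 2 && fits32(plain) &&
         fits32(sub->ops[0]) && fits32(sub->ops[1]);
}

// mul is commutative, so the sub may sit in either operand. Returns the index
// of the operand that is the matching sub, or -1 if neither order matches.
int subOperandIndex(const Node &mul) {
  if (mul.ops.size() != 2) return -1;
  if (matchOrder(mul.ops[0], mul.ops[1])) return 1;
  if (matchOrder(mul.ops[1], mul.ops[0])) return 0;
  return -1;
}
```

Checking only `ops[1]`, as the predicate in the diff does with `N->getOperand(1)`, would reject the commuted form `mul(sub(b, c), a)` even though it is the same computation.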

Replying to @dmgreen's comments above (D139413#3974025):

Basically, from the IR perspective, we are trying to implement the following:

define i64 @smull_sext_sub(i32* %x0, i32 %x1, i32 %x2) {
entry:
  %ext64 = load i32, i32* %x0
  %sext = sext i32 %ext64 to i64
  %sext2 = sext i32 %x1 to i64
  %sext3 = sext i32 %x2 to i64
  %sub = sub i64 %sext, %sext2
  %mul = mul i64 %sext3, %sub
  ret i64 %mul
}

define i64 @smull_sext_sub2(i32* %x0, i32 %x1, i32 %x2) {
entry:
  %ext64 = load i32, i32* %x0
  %sext3 = sext i32 %x2 to i64
  %sub = sub i32 %ext64, %x1
  %sext = sext i32 %sub to i64
  %mul = mul i64 %sext3, %sext
  ret i64 %mul
}
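The two IR functions above can be transliterated into C++ to compare them on concrete inputs. This is a minimal sketch that assumes the load of `%x0` is replaced by a plain parameter; the function names simply mirror the IR. The functions agree whenever `x0 - x1` fits in a signed 32-bit integer, but at the edge of the range they differ: with `x0 = INT32_MIN`, `x1 = 1`, `x2 = 1` the first returns -2147483649, while the i32 `sub` in the second wraps to 2147483647.

```cpp
#include <cstdint>

// @smull_sext_sub: sign-extend all inputs to i64, then sub and mul in i64.
// (The load from %x0 is modelled as a plain parameter.)
constexpr int64_t smull_sext_sub(int32_t x0, int32_t x1, int32_t x2) {
  return int64_t{x2} * (int64_t{x0} - int64_t{x1});
}

// @smull_sext_sub2: sub in i32 first. IR `sub` without nsw wraps modulo 2^32,
// so the subtraction is done on unsigned values and converted back
// (two's complement wrap: guaranteed since C++20, universal in practice).
constexpr int64_t smull_sext_sub2(int32_t x0, int32_t x1, int32_t x2) {
  const int32_t sub = static_cast<int32_t>(static_cast<uint32_t>(x0) -
                                           static_cast<uint32_t>(x1));
  return int64_t{x2} * int64_t{sub};
}
```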

Why 31 bits? If we look at sub and mul as arithmetic instructions, they both need the same number of sign bits to determine whether a 64-bit operation can be reduced to an equivalent 32-bit one.
A simple example: https://alive2.llvm.org/ce/z/RJU_ua
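For the sign-bit question itself, a minimal sketch: the hypothetical helper `numSignBits` below (not an LLVM API) counts how many leading bits of a 64-bit constant are equal to its sign bit, which is what `ComputeNumSignBits` would report for a known constant. A value sign-extended from i32 always has at least 33, matching the `> 32` checks in the PatFrags, while the i64 difference of two such values can have only 32, for example `INT32_MIN - 1`.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): number of leading bits of `v`,
// sign bit included, that are equal to the sign bit.
constexpr unsigned numSignBits(int64_t v) {
  // Inspect the two's complement bit pattern via an unsigned copy.
  const uint64_t bits = static_cast<uint64_t>(v);
  const uint64_t sign = bits >> 63;
  unsigned n = 1;
  while (n < 64 && ((bits >> (63 - n)) & 1u) == sign) ++n;
  return n;
}

// A 64-bit value fits in a signed 32-bit integer iff it has more than 32
// sign bits, which is the condition the PatFrags test.
constexpr bool fitsInI32(int64_t v) { return numSignBits(v) > 32; }
```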

Harbormaster completed remote builds in B201347: Diff 480424. Dec 6 2022, 8:40 AM

bipmis abandoned this revision. Jul 27 2023, 1:55 AM

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64InstrInfo.td	9 lines
llvm/test/CodeGen/AArch64/aarch64-mull-masks.ll	19 lines

Diff 480424

llvm/lib/Target/AArch64/AArch64InstrInfo.td


… 794 unchanged lines not shown … def add_and_or_is_add : PatFrags<(ops node:$lhs, node:$rhs),
 }];
 }

 // Match mul with enough sign-bits. Can be reduced to a smaller mul operand.
 def smullwithsignbits : PatFrag<(ops node:$l, node:$r), (mul node:$l, node:$r), [{
 return CurDAG->ComputeNumSignBits(N->getOperand(0)) > 32 &&
 CurDAG->ComputeNumSignBits(N->getOperand(1)) > 32;
 }]>;
+def submulwithsignbits : PatFrag<(ops node:$wd, node:$ws, node:$wt),
+(mul node:$wd, (sub node:$ws, node:$wt)) , [{
+return CurDAG->ComputeNumSignBits(N->getOperand(0)) > 32 &&
+CurDAG->ComputeNumSignBits(N->getOperand(1)->getOperand(0)) > 32 &&
+CurDAG->ComputeNumSignBits(N->getOperand(1)->getOperand(1)) > 32;
+}]>;

 //===----------------------------------------------------------------------===//

 //===----------------------------------------------------------------------===//

 // AArch64 Instruction Predicate Definitions.
 // We could compute these on a per-module basis but doing so requires accessing
 // the Function object through the <Target>Subtarget and objections were raised
… 1,135 unchanged lines not shown … def : Pat<(i64 (ineg (smullwithsignbits GPR64:$Rn, GPR64:$Rm))),
 (SMSUBLrrr (EXTRACT_SUBREG $Rn, sub_32), (EXTRACT_SUBREG $Rm, sub_32), XZR)>;
 def : Pat<(i64 (ineg (smullwithsignbits GPR64:$Rn, (sext GPR32:$Rm)))),
 (SMSUBLrrr (EXTRACT_SUBREG $Rn, sub_32), $Rm, XZR)>;

 def : Pat<(i64 (sub GPR64:$Ra, (smullwithsignbits GPR64:$Rn, GPR64:$Rm))),
 (SMSUBLrrr (EXTRACT_SUBREG $Rn, sub_32), (EXTRACT_SUBREG $Rm, sub_32), GPR64:$Ra)>;
 def : Pat<(i64 (sub GPR64:$Ra, (smullwithsignbits GPR64:$Rn, (sext GPR32:$Rm)))),
 (SMSUBLrrr (EXTRACT_SUBREG $Rn, sub_32), $Rm, GPR64:$Ra)>;

+def : Pat<(i64 (submulwithsignbits GPR64:$Rn, GPR64:$Rm, (sext GPR32:$Rt))),
+(SMADDLrrr (EXTRACT_SUBREG $Rn, sub_32), (SUBSWrr (EXTRACT_SUBREG $Rm, sub_32), $Rt), XZR)>;
 } // AddedComplexity = 5

 def : MulAccumWAlias<"mul", MADDWrrr>;
 def : MulAccumXAlias<"mul", MADDXrrr>;
 def : MulAccumWAlias<"mneg", MSUBWrrr>;
 def : MulAccumXAlias<"mneg", MSUBXrrr>;
 def : WideMulAccumAlias<"smull", SMADDLrrr>;
 def : WideMulAccumAlias<"smnegl", SMSUBLrrr>;
… 6,771 unchanged lines not shown …

llvm/test/CodeGen/AArch64/aarch64-mull-masks.ll

… 898 unchanged lines not shown …
 ; CHECK-NEXT: smull x0, w8, w9
 ; CHECK-NEXT: ret
 entry:
   %tmp1 = ashr i64 %a, 32
   %c = ashr i64 %b, 32
   %tmp3 = mul i64 %tmp1, %c
   ret i64 %tmp3
 }

+define i64 @smull_sext_sub(i32* %x0, i32 %x1, i16 %x2) {
+; CHECK-LABEL: smull_sext_sub:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: ldrsw x9, [x0]
+; CHECK-NEXT: // kill: def $w2 killed $w2 def $x2
+; CHECK-NEXT: sxth x8, w2
+; CHECK-NEXT: sub w9, w9, w1
+; CHECK-NEXT: smull x0, w8, w9
+; CHECK-NEXT: ret
+entry:
+  %ext64 = load i32, i32* %x0
+  %sext = sext i32 %ext64 to i64
+  %sext2 = sext i32 %x1 to i64
+  %sext3 = sext i16 %x2 to i64
+  %sub = sub i64 %sext, %sext2
+  %mul = mul nsw i64 %sext3, %sub
+  ret i64 %mul
+}