This is an archive of the discontinued LLVM Phabricator instance.

[TargetLowering][X86][AMDGPU] Teach expandMUL_LOHI to handle a mix of sign and zero extend.
AbandonedPublic

Authored by craig.topper on Sep 5 2022, 12:45 AM.

Details

Summary

If one input of a MUL that needs to expanded is sign extended and
the other is zero extended, we can use an unsigned mul and apply
a correction. If the signed number is negative, we subtract the lower
bits of the zero extended input.

Fixes PR57549, but not in the same way.

Diff Detail

Event Timeline

craig.topper created this revision.Sep 5 2022, 12:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 5 2022, 12:45 AM
craig.topper requested review of this revision.Sep 5 2022, 12:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 5 2022, 12:45 AM
Herald added a subscriber: wdng. · View Herald Transcript

Should we have a generic combine instead?

If we have something like this - https://alive2.llvm.org/ce/z/NAmVa3 - then it's always better to get rid of the mul unless we have 'minsize':

define i32 @src(i32 %x, i32 %y) {
  %ysign = ashr i32 %y, 31
  %m = mul i32 %x, %ysign
  ret i32 %m
}

define i32 @tgt(i32 %x, i32 %y) {
  %isneg = icmp slt i32 %y, 0
  %negx = sub i32 0, %x
  %m = select i1 %isneg, i32 %negx, i32 0
  ret i32 %m
}
% llc -o - mul.ll -mtriple=riscv32 -mattr=m
	srai	a1, a1, 31
	mul	a0, a0, a1
...
	neg	a0, a0
	srai	a1, a1, 31
	and	a0, a1, a0

% llc -o - mul.ll -mtriple=aarch64         
	asr	w8, w1, #31
	mul	w0, w0, w8
...
	neg	w8, w0
	and	w0, w8, w1, asr #31

% llc -o - mul.ll -mtriple=x86_64 
	movl	%esi, %eax
	sarl	$31, %eax
	imull	%edi, %eax
...
	movl	%esi, %eax
	negl	%edi
	sarl	$31, %eax
	andl	%edi, %eax

Should we have a generic combine instead?

If we have something like this - https://alive2.llvm.org/ce/z/NAmVa3 - then it's always better to get rid of the mul unless we have 'minsize':

define i32 @src(i32 %x, i32 %y) {
  %ysign = ashr i32 %y, 31
  %m = mul i32 %x, %ysign
  ret i32 %m
}

define i32 @tgt(i32 %x, i32 %y) {
  %isneg = icmp slt i32 %y, 0
  %negx = sub i32 0, %x
  %m = select i1 %isneg, i32 %negx, i32 0
  ret i32 %m
}
% llc -o - mul.ll -mtriple=riscv32 -mattr=m
	srai	a1, a1, 31
	mul	a0, a0, a1
...
	neg	a0, a0
	srai	a1, a1, 31
	and	a0, a1, a0

% llc -o - mul.ll -mtriple=aarch64         
	asr	w8, w1, #31
	mul	w0, w0, w8
...
	neg	w8, w0
	and	w0, w8, w1, asr #31

% llc -o - mul.ll -mtriple=x86_64 
	movl	%esi, %eax
	sarl	$31, %eax
	imull	%edi, %eax
...
	movl	%esi, %eax
	negl	%edi
	sarl	$31, %eax
	andl	%edi, %eax

Would we do a second combine for (add Z, (and X, (neg Y)) to get the (sub Z, (and X, Y))?

Would we do a second combine for (add Z, (and X, (neg Y)) to get the (sub Z, (and X, Y))?

I didn't look past the first missing transform...but sure, if that's missing too. :)

Something like this?
https://alive2.llvm.org/ce/z/zFxp6A

Seems to be missed in IR too.

craig.topper abandoned this revision.Sep 7 2022, 7:51 PM

Replaced by D133399