This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Add combine logic to use MADDLD/MADDHD/MADDHDU in multiply-add patterns
Needs ReviewPublic

Authored by tingwang on Aug 18 2022, 6:55 PM.

Details

Reviewers
nemanjai
shchenz
Group Reviewers
Restricted Project
Summary

Some i64 multiply-add patterns can use MADDLD/MADDHD/MADDHDU to simplify as shown in this patch.

Diff Detail

Event Timeline

tingwang created this revision.Aug 18 2022, 6:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 18 2022, 6:55 PM
tingwang requested review of this revision.Aug 18 2022, 6:55 PM
shchenz added inline comments.Aug 18 2022, 8:19 PM
llvm/test/CodeGen/PowerPC/adde-sube-int128-madd.ll
4 ↗(On Diff #453857)

commit the new tests as a NFC patch first please.

tingwang updated this revision to Diff 453883.Aug 18 2022, 10:18 PM

Update test case to show the change.

tingwang marked an inline comment as done.Aug 18 2022, 10:20 PM
tingwang added inline comments.
llvm/test/CodeGen/PowerPC/adde-sube-int128-madd.ll
4 ↗(On Diff #453857)

Sure, will do. Updated test case to highlight the delta.

tingwang updated this revision to Diff 455823.Aug 26 2022, 1:19 AM
tingwang marked an inline comment as done.

Move the combine logic in front of type legalization to simplify the pattern.

lkail added a subscriber: lkail.Aug 30 2022, 6:59 AM
This comment was removed by lkail.
tingwang planned changes to this revision.Aug 30 2022, 11:28 PM
tingwang added a comment.EditedAug 31 2022, 12:37 AM

The third version has issue that it missed some opportunities that should have been picked up by the second version. For example:

%conv = sext i64 %a to i128
%conv1 = sext i64 %b to i128
%or = or i128 %conv, %conv1
%mul = mul nsw i128 %conv1, %or
%conv2 = sext i64 %c to i128
%add = add nsw i128 %mul, %conv2

It is not obvious to me that %or = or i128 %conv, %conv1 in the context can be represented by sext i64 (or i64 %a, %b) to i128. I'm not aware of any API that can help deduce how many bits are the same as signed bit, so that we can apply truncate here. KnownBits calculates ones and zeros, can work for unsigned case, however cannot help signed case.

After reading some code, realized maybe I can use ComputeNumSignBits to determine signed case.

tingwang updated this revision to Diff 457142.Aug 31 2022, 6:17 PM

(1) Validate pattern by using NumSignBits and SignBitIsZero
(2) Add a test case to show the pattern

IMO, this version is better than the first one, after the type legalization recognizing the legalized i128 types seems more complicated than recognizing the MADD pattern for type i128 before the type legalization.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
17262

Can we do a std::swap first to put the MUL operand to a settled index? So that we don't need to check MUL operand many times in below code

17281

We may need some false check test cases here, like extending i65 to i128?

17346

Maybe we need to explicitly check this function is called before type legalization(isBeforeLegalize)? BUILD_PAIR is a node which should only be generated before type legalization.

llvm/test/CodeGen/PowerPC/add-sub-int128-madd.ll
4 ↗(On Diff #457142)

Can we add some cases for other types extension like i32/i16 to i128? I think for these cases, we may don't need maddhd/maddhdu to generate the higher 64 bits of the i128.

tingwang updated this revision to Diff 457175.Aug 31 2022, 10:32 PM

Address comments:
(1) Add isBeforeLegalize() check (updated function scope to allow access to DAGCombinerInfo).
(2) Add test case to show invalid case is not transformed.
(3) Update code structure, reduce duplicated logic.
(4) Add check to identify cases which lead to zero higher half result.

tingwang marked 4 inline comments as done.Aug 31 2022, 10:33 PM
shchenz added inline comments.Oct 14 2022, 12:27 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
17264

I am worried about the swapping for SUB and then adding a SUB(0, swapped_result) to get it back. They are not equal I think.

For example for your below case:

define i128 @sub_int128_CmAxB(i64 noundef %a, i64 noundef %b, i64 noundef %c) local_unnamed_addr #0 {
; CHECK-P9-NEXT:    neg 4, 4
; CHECK-P9-NEXT:    maddld 6, 4, 3, 5
; CHECK-P9-NEXT:    maddhd 4, 4, 3, 5
; CHECK-P9-NEXT:    mr 3, 6
; CHECK-P9-NEXT:    blr
entry:
  %conv = sext i64 %c to i128
  %conv1 = sext i64 %a to i128
  %conv2 = sext i64 %b to i128
  %mul = mul nsw i128 %conv2, %conv1
  %sub = sub nsw i128 %conv, %mul
  ret i128 %sub
}

Suppose %b is the most negative value.

17276

maddld and maddhd are valid for 64 bit integers which include sign bit at the bit-0. So I think here we should expect there are at least 64 + 1= 65 sign bits?
For the 64 sign bits and zero sign bit case, I guess the case is like: we have 0 in all the high 64 bits, and we have 1 in the first bit of the low 64 bits?(Otherwise, the sign bits number must be at least 65?) If so, this is not a case can be handled correctly either. We are expecting zero extension, but the low 64 bits value which can be accessed by maddld/maddhd is a signed value.

17314

This seems overkill. For signed values, cases that some operands are negative while others are positive are still valid.

I think for add(mul(A, B), C),

  • if A, B C all have zero sign bit, it is equal to maddhdu + maddld.
  • if any of A, B, C can not be proven have zero sign bit, it is equal to maddhd + maddld
tingwang updated this revision to Diff 468442.Oct 18 2022, 1:00 AM

Update according to comments:
(1) Drop sub patterns.
(2) Update check criteria to allow combined signed/unsigned operands as long as NumSignBits >= 65, and is served by madd[l/h]d.
(3) NumSignBits == 64 and SignBitIsZero is the only case served by maddhdu.

All IR test pattern results compared between P8 and P9 by attached script.

tingwang marked 2 inline comments as done.Oct 18 2022, 1:03 AM
tingwang added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
17276

Thank you for pointing out. Updated accordingly.