Some i64 multiply-add patterns can use MADDLD/MADDHD/MADDHDU to simplify as shown in this patch.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/PowerPC/adde-sube-int128-madd.ll | ||
---|---|---|
4 ↗ | (On Diff #453857) | commit the new tests as a NFC patch first please. |
llvm/test/CodeGen/PowerPC/adde-sube-int128-madd.ll | ||
---|---|---|
4 ↗ | (On Diff #453857) | Sure, will do. Updated test case to highlight the delta. |
The third version has issue that it missed some opportunities that should have been picked up by the second version. For example:
%conv = sext i64 %a to i128 %conv1 = sext i64 %b to i128 %or = or i128 %conv, %conv1 %mul = mul nsw i128 %conv1, %or %conv2 = sext i64 %c to i128 %add = add nsw i128 %mul, %conv2
It is not obvious to me that %or = or i128 %conv, %conv1 in the context can be represented by sext i64 (or i64 %a, %b) to i128. I'm not aware of any API that can help deduce how many bits are the same as signed bit, so that we can apply truncate here. KnownBits calculates ones and zeros, can work for unsigned case, however cannot help signed case.
After reading some code, realized maybe I can use ComputeNumSignBits to determine signed case.
(1) Validate pattern by using NumSignBits and SignBitIsZero
(2) Add a test case to show the pattern
IMO, this version is better than the first one, after the type legalization recognizing the legalized i128 types seems more complicated than recognizing the MADD pattern for type i128 before the type legalization.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
17262 | Can we do a std::swap first to put the MUL operand to a settled index? So that we don't need to check MUL operand many times in below code | |
17281 | We may need some false check test cases here, like extending i65 to i128? | |
17346 | Maybe we need to explicitly check this function is called before type legalization(isBeforeLegalize)? BUILD_PAIR is a node which should only be generated before type legalization. | |
llvm/test/CodeGen/PowerPC/add-sub-int128-madd.ll | ||
4 ↗ | (On Diff #457142) | Can we add some cases for other types extension like i32/i16 to i128? I think for these cases, we may don't need maddhd/maddhdu to generate the higher 64 bits of the i128. |
Address comments:
(1) Add isBeforeLegalize() check (updated function scope to allow access to DAGCombinerInfo).
(2) Add test case to show invalid case is not transformed.
(3) Update code structure, reduce duplicated logic.
(4) Add check to identify cases which lead to zero higher half result.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
17264 | I am worried about the swapping for SUB and then adding a SUB(0, swapped_result) to get it back. They are not equal I think. For example for your below case: define i128 @sub_int128_CmAxB(i64 noundef %a, i64 noundef %b, i64 noundef %c) local_unnamed_addr #0 { ; CHECK-P9-NEXT: neg 4, 4 ; CHECK-P9-NEXT: maddld 6, 4, 3, 5 ; CHECK-P9-NEXT: maddhd 4, 4, 3, 5 ; CHECK-P9-NEXT: mr 3, 6 ; CHECK-P9-NEXT: blr entry: %conv = sext i64 %c to i128 %conv1 = sext i64 %a to i128 %conv2 = sext i64 %b to i128 %mul = mul nsw i128 %conv2, %conv1 %sub = sub nsw i128 %conv, %mul ret i128 %sub } Suppose %b is the most negative value. | |
17276 | maddld and maddhd are valid for 64 bit integers which include sign bit at the bit-0. So I think here we should expect there are at least 64 + 1= 65 sign bits? | |
17314 | This seems overkill. For signed values, cases that some operands are negative while others are positive are still valid. I think for add(mul(A, B), C),
|
Update according to comments:
(1) Drop sub patterns.
(2) Update check criteria to allow combined signed/unsigned operands as long as NumSignBits >= 65, and is served by madd[l/h]d.
(3) NumSignBits == 64 and SignBitIsZero is the only case served by maddhdu.
All IR test pattern results compared between P8 and P9 by attached script.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
17276 | Thank you for pointing out. Updated accordingly. |
Can we do a std::swap first to put the MUL operand to a settled index? So that we don't need to check MUL operand many times in below code