Page MenuHomePhabricator

[X86][FP16] Combine the FADD(A, FMA(B, C, 0)) to FMA(B, C, A)
ClosedPublic

Authored by LiuChen3 on Sep 17 2021, 12:40 AM.

Details

Summary

This patch is to support transform something like
_mm512_add_ph(acc, _mm512_fmadd_pch(a, b, _mm512_setzero_ph()))
to _mm512_fmadd_pch(a, b, acc).

Diff Detail

Unit TestsFailed

TimeTest
280 msx64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::trivial-cxa-atexit.S
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -m64 -c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/orc/X86_64LinuxConfig/TestCases/Linux/x86-64/Output/trivial-cxa-atexit.S.tmp /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/orc/TestCases/Linux/x86-64/trivial-cxa-atexit.S
130 msx64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::trivial-static-initializer.S
Script: -- : 'RUN: at line 7'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -m64 -c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/orc/X86_64LinuxConfig/TestCases/Linux/x86-64/Output/trivial-static-initializer.S.tmp /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/orc/TestCases/Linux/x86-64/trivial-static-initializer.S
140 msx64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::trivial-tls.S
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -m64 -c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/orc/X86_64LinuxConfig/TestCases/Linux/x86-64/Output/trivial-tls.S.tmp /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/orc/TestCases/Linux/x86-64/trivial-tls.S

Event Timeline

LiuChen3 created this revision.Sep 17 2021, 12:40 AM
LiuChen3 requested review of this revision.Sep 17 2021, 12:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 17 2021, 12:40 AM
xbolva00 added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
47481

Do we really need this output here? Simplify it a bit? Something like you wrote "Combine the FADD(A, FMA(B, C, 0)) to FMA(B, C, A)"?

pengfei added inline comments.Sep 17 2021, 1:10 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47522

I think we can use bool IsConj, SDValue MulOp0, MulOp0 instead of CFmul. Then you don't need to create a temp mul node.

47529–47530

Better to add parentheses.

LiuChen3 added inline comments.Sep 17 2021, 1:54 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47481

Good idea.

47522

It seems we create more temp node. Is it better?

pengfei added inline comments.Sep 17 2021, 2:28 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47522

They are temp variables rather than nodes. And compiler may likly optimize them.

LiuChen3 updated this revision to Diff 373190.Sep 17 2021, 4:19 AM

Address comments

pengfei added inline comments.Sep 17 2021, 5:17 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47517–47518

Can these be

MulOp0 = Op0->getOperand(1);
MulOp1 = Op0->getOperand(2);
47522

I think we can then use

if ((Opcode == X86ISD::VFMULC || Opcode == X86ISD::VFCMULC)) {
  ...
  return true;
}
if ((Opcode == X86ISD::VFMADDC || Opcode == X86ISD::VFCMADDC) ... {
  ...
  return true;
}
return false;
47533–47534

I think we can remove the assert now.

LiuChen3 updated this revision to Diff 373206.Sep 17 2021, 6:35 AM

Adress comments.

pengfei added inline comments.Sep 17 2021, 7:00 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47514–47517

Why we still need this?

pengfei added inline comments.Sep 17 2021, 7:01 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47512

We don't need else after return. See the Lint comment.

LiuChen3 added inline comments.Sep 17 2021, 7:19 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47514–47517

We need transfer FMA(A, B 0) to MUL(A, B) firstly.

LiuChen3 added inline comments.Sep 17 2021, 7:25 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47514–47517

My bad. I got what's your mean.

LiuChen3 updated this revision to Diff 373216.Sep 17 2021, 7:34 AM

Adress comments.

pengfei accepted this revision.Sep 17 2021, 8:36 PM

LGTM with a nit.

llvm/lib/Target/X86/X86ISelLowering.cpp
47514

indent

This revision is now accepted and ready to land.Sep 17 2021, 8:36 PM

LGTM with a nit.

Can we make the title of this patch more obvious that we're talking about the f16 packed complex operations? FMA and FADD make it look generic.

LiuChen3 updated this revision to Diff 373392.Sep 17 2021, 11:01 PM

Adress comments

LiuChen3 retitled this revision from [X86] Combine the FADD(A, FMA(B, C, 0)) to FMA(B, C, A) to [X86][FP16] Combine the FADD(A, FMA(B, C, 0)) to FMA(B, C, A).Sep 17 2021, 11:01 PM

I'm not sure this transform is valid without no signed zeros. fma(x, y, -0.0) is equivalent to fmul.

I'm not sure this transform is valid without no signed zeros. fma(x, y, -0.0) is equivalent to fmul.

I agree, we may need to check -0.0 and 0.0 only when 'hasNoSignedZeros' is true?

We need to mention complex FMA in the title.

I'm not sure this transform is valid without no signed zeros. fma(x, y, -0.0) is equivalent to fmul.

Sorry, I lack the knowledge of 0.0 and -0.0. Why fma(x, y, 0.0) is not equal fmul?

There are special rules for adding or subtracting signed zero:

  • x + (+0) = x (for x different than 0)
  • x + (-0) = x (for x different than 0)
  • (-0) + (-0) = (-0) - (+0) = -0
  • (+0) + (+0) = (+0) - (-0) = +0
  • x - x = x + (-x) = +0 for any finite x unless rounding towards negative infininity then the result is -0 instead

If A * B is -0, then FMA(A, B, +0) would be (-0 + +0) or (-0 - (-0)) which should produce +0 by the last rule above with x = -0. This is different than FMUL(A,B) which we said was -0.

There are special rules for adding or subtracting signed zero:

  • x + (+0) = x (for x different than 0)
  • x + (-0) = x (for x different than 0)
  • (-0) + (-0) = (-0) - (+0) = -0
  • (+0) + (+0) = (+0) - (-0) = +0
  • x - x = x + (-x) = +0 for any finite x unless rounding towards negative infininity then the result is -0 instead

If A * B is -0, then FMA(A, B, +0) would be (-0 + +0) or (-0 - (-0)) which should produce +0 by the last rule above with x = -0. This is different than FMUL(A,B) which we said was -0.

Got it. Thanks a lot. :)

LiuChen3 updated this revision to Diff 374158.Sep 22 2021, 1:53 AM

Distinguish 0.0 and -0.0 in FMA.

LiuChen3 added inline comments.Sep 22 2021, 1:56 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47533

Maybe we can just check hasNoSignedZeros() and hasAllowContract() as pengfei said?

llvm/test/CodeGen/X86/avx512fp16-combine-vfmac-fadd.ll
196

Should we do this combine standalone?

pengfei added inline comments.Sep 22 2021, 2:13 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47612–47621

This seems been changed unconsciously.

47629–47634

The indentation is wrong too. The same below.

pengfei requested changes to this revision.Sep 22 2021, 2:28 AM

The format in this file was wrongly formatted.

llvm/lib/Target/X86/X86ISelLowering.cpp
47533

Yeah, I prefer to checking both in line 47582.

47582

Should this be

AllowContract(Op0->getFlags()) && (ISD::isBuildVectorAllZeros(Op0->getOperand(0).getNode()) && Op0->getFlags().hasNoSignedZeros()) || IsVectorAllNegativeZero(Op0->getOperand(0).getNode()))

I.e, check AllowContract together with IsVectorAllNegativeZero as well.

This revision now requires changes to proceed.Sep 22 2021, 2:28 AM
LiuChen3 added inline comments.Sep 22 2021, 6:33 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47582

AllowContract will check hasNoSignedZeros(). It seems that we can only do this combination when the fast-math flag is set, No matter if the third operand is +0.0 or 0.0.
+0.0 or -0.0 affects the conversion of FMA(a, b, ±0.0) to FMUL(a, b).

47612–47621

Sorry for this. Looks like I accidentally do some change here.

LiuChen3 updated this revision to Diff 374432.Sep 22 2021, 8:27 PM
  1. Remove redundant option checkRemove redundant option check.
  2. Reformat.
pengfei added inline comments.Sep 22 2021, 10:50 PM
llvm/test/CodeGen/X86/avx512fp16-combine-vfmac-fadd.ll
3

How about CHECK,NO-SZ

4

How about CHECK,HAS-SZ

LiuChen3 updated this revision to Diff 374445.Sep 22 2021, 11:02 PM

Address comments.

pengfei accepted this revision.Sep 22 2021, 11:11 PM

LGTM.

This revision is now accepted and ready to land.Sep 22 2021, 11:11 PM
This revision was landed with ongoing or failed builds.Sep 23 2021, 12:37 AM
This revision was automatically updated to reflect the committed changes.

Thanks for you review. The order of operand is changed in final commit: The addend of FMA builtin is moved from the first to the third.