This is an archive of the discontinued LLVM Phabricator instance.

[X86] Don't fold (fneg (fma (fneg X), Y, (fneg Z))) to (fma X, Y, Z)
ClosedPublic

Authored by Jim on Nov 5 2020, 10:30 PM.

Details

Summary

Check if it has no signed zeros flag (nsz) in getNegatedExpression for x86.
This patch fixed miscompilation: https://alive2.llvm.org/ce/z/XxwBAJ

Diff Detail

Event Timeline

Jim created this revision.Nov 5 2020, 10:30 PM
Jim requested review of this revision.Nov 5 2020, 10:30 PM

No test changes?

The transform would be legal with no signed zeros fast math flag right? I believe that's checked in getNegatedExpression. But not checked in the X86 specific override.

I think the same issue may also exist in ARMInstrVFP.td. It matches both (fneg (fma x, y, z)) and (fma (fneg x), y, (fneg z)) to the same instruction.

// Match @llvm.fma.* intrinsics                                                                                                                                                                
// (fneg (fma x, y, z)) -> (vfnma z, x, y)                                                                                                                                                     
def : Pat<(fneg (fma (f64 DPR:$Dn), (f64 DPR:$Dm), (f64 DPR:$Ddin))),                                                                                                                          
          (VFNMAD DPR:$Ddin, DPR:$Dn, DPR:$Dm)>,                                                                                                                                               
      Requires<[HasVFP4,HasDPVFP]>;                                                                                                                                                            
def : Pat<(fneg (fma (f32 SPR:$Sn), (f32 SPR:$Sm), (f32 SPR:$Sdin))),                                                                                                                          
          (VFNMAS SPR:$Sdin, SPR:$Sn, SPR:$Sm)>,                                                                                                                                               
      Requires<[HasVFP4]>;                                                                                                                                                                     
def : Pat<(fneg (fma (f16 HPR:$Sn), (f16 HPR:$Sm), (f16 (f16 HPR:$Sdin)))),                                                                                                                    
          (VFNMAH (f16 HPR:$Sdin), (f16 HPR:$Sn), (f16 HPR:$Sm))>,                                                                                                                             
      Requires<[HasFullFP16]>;                                                                                                                                                                 
// (fma (fneg x), y, (fneg z)) -> (vfnma z, x, y)                                                                                                                                              
def : Pat<(f64 (fma (fneg DPR:$Dn), DPR:$Dm, (fneg DPR:$Ddin))),                                                                                                                               
          (VFNMAD DPR:$Ddin, DPR:$Dn, DPR:$Dm)>,                                                                                                                                               
      Requires<[HasVFP4,HasDPVFP]>;                                                                                                                                                            
def : Pat<(f32 (fma (fneg SPR:$Sn), SPR:$Sm, (fneg SPR:$Sdin))),                                                                                                                               
          (VFNMAS SPR:$Sdin, SPR:$Sn, SPR:$Sm)>,                                                                                                                                               
      Requires<[HasVFP4]>;                                                                                                                                                                     
def : Pat<(f16 (fma (fneg (f16 HPR:$Sn)), (f16 HPR:$Sm), (fneg (f16 HPR:$Sdin)))),                                                                                                             
          (VFNMAH (f16 HPR:$Sdin), (f16 HPR:$Sn), (f16 HPR:$Sm))>,                                                                                                                             
      Requires<[HasFullFP16]>;
Jim updated this revision to Diff 303364.Nov 6 2020, 1:21 AM

Update testcase.

lkail added a subscriber: shchenz.Nov 6 2020, 1:24 AM
Jim added a comment.Nov 6 2020, 1:55 AM

No test changes?

The transform would be legal with no signed zeros fast math flag right? I believe that's checked in getNegatedExpression. But not checked in the X86 specific override.

I think the same issue may also exist in ARMInstrVFP.td. It matches both (fneg (fma x, y, z)) and (fma (fneg x), y, (fneg z)) to the same instruction.

// Match @llvm.fma.* intrinsics                                                                                                                                                                
// (fneg (fma x, y, z)) -> (vfnma z, x, y)                                                                                                                                                     
def : Pat<(fneg (fma (f64 DPR:$Dn), (f64 DPR:$Dm), (f64 DPR:$Ddin))),                                                                                                                          
          (VFNMAD DPR:$Ddin, DPR:$Dn, DPR:$Dm)>,                                                                                                                                               
      Requires<[HasVFP4,HasDPVFP]>;                                                                                                                                                            
def : Pat<(fneg (fma (f32 SPR:$Sn), (f32 SPR:$Sm), (f32 SPR:$Sdin))),                                                                                                                          
          (VFNMAS SPR:$Sdin, SPR:$Sn, SPR:$Sm)>,                                                                                                                                               
      Requires<[HasVFP4]>;                                                                                                                                                                     
def : Pat<(fneg (fma (f16 HPR:$Sn), (f16 HPR:$Sm), (f16 (f16 HPR:$Sdin)))),                                                                                                                    
          (VFNMAH (f16 HPR:$Sdin), (f16 HPR:$Sn), (f16 HPR:$Sm))>,                                                                                                                             
      Requires<[HasFullFP16]>;                                                                                                                                                                 
// (fma (fneg x), y, (fneg z)) -> (vfnma z, x, y)                                                                                                                                              
def : Pat<(f64 (fma (fneg DPR:$Dn), DPR:$Dm, (fneg DPR:$Ddin))),                                                                                                                               
          (VFNMAD DPR:$Ddin, DPR:$Dn, DPR:$Dm)>,                                                                                                                                               
      Requires<[HasVFP4,HasDPVFP]>;                                                                                                                                                            
def : Pat<(f32 (fma (fneg SPR:$Sn), SPR:$Sm, (fneg SPR:$Sdin))),                                                                                                                               
          (VFNMAS SPR:$Sdin, SPR:$Sn, SPR:$Sm)>,                                                                                                                                               
      Requires<[HasVFP4]>;                                                                                                                                                                     
def : Pat<(f16 (fma (fneg (f16 HPR:$Sn)), (f16 HPR:$Sm), (fneg (f16 HPR:$Sdin)))),                                                                                                             
          (VFNMAH (f16 HPR:$Sdin), (f16 HPR:$Sn), (f16 HPR:$Sm))>,                                                                                                                             
      Requires<[HasFullFP16]>;

Yes, it is legal with no signed zeros fast math flag. I see that PowerPC deal with this case specifically.

Should I add the condition with no signed zero to permit this transform instead of deleting it?

In D90901#2378338, @Jim wrote:

No test changes?

The transform would be legal with no signed zeros fast math flag right? I believe that's checked in getNegatedExpression. But not checked in the X86 specific override.

I think the same issue may also exist in ARMInstrVFP.td. It matches both (fneg (fma x, y, z)) and (fma (fneg x), y, (fneg z)) to the same instruction.

// Match @llvm.fma.* intrinsics                                                                                                                                                                
// (fneg (fma x, y, z)) -> (vfnma z, x, y)                                                                                                                                                     
def : Pat<(fneg (fma (f64 DPR:$Dn), (f64 DPR:$Dm), (f64 DPR:$Ddin))),                                                                                                                          
          (VFNMAD DPR:$Ddin, DPR:$Dn, DPR:$Dm)>,                                                                                                                                               
      Requires<[HasVFP4,HasDPVFP]>;                                                                                                                                                            
def : Pat<(fneg (fma (f32 SPR:$Sn), (f32 SPR:$Sm), (f32 SPR:$Sdin))),                                                                                                                          
          (VFNMAS SPR:$Sdin, SPR:$Sn, SPR:$Sm)>,                                                                                                                                               
      Requires<[HasVFP4]>;                                                                                                                                                                     
def : Pat<(fneg (fma (f16 HPR:$Sn), (f16 HPR:$Sm), (f16 (f16 HPR:$Sdin)))),                                                                                                                    
          (VFNMAH (f16 HPR:$Sdin), (f16 HPR:$Sn), (f16 HPR:$Sm))>,                                                                                                                             
      Requires<[HasFullFP16]>;                                                                                                                                                                 
// (fma (fneg x), y, (fneg z)) -> (vfnma z, x, y)                                                                                                                                              
def : Pat<(f64 (fma (fneg DPR:$Dn), DPR:$Dm, (fneg DPR:$Ddin))),                                                                                                                               
          (VFNMAD DPR:$Ddin, DPR:$Dn, DPR:$Dm)>,                                                                                                                                               
      Requires<[HasVFP4,HasDPVFP]>;                                                                                                                                                            
def : Pat<(f32 (fma (fneg SPR:$Sn), SPR:$Sm, (fneg SPR:$Sdin))),                                                                                                                               
          (VFNMAS SPR:$Sdin, SPR:$Sn, SPR:$Sm)>,                                                                                                                                               
      Requires<[HasVFP4]>;                                                                                                                                                                     
def : Pat<(f16 (fma (fneg (f16 HPR:$Sn)), (f16 HPR:$Sm), (fneg (f16 HPR:$Sdin)))),                                                                                                             
          (VFNMAH (f16 HPR:$Sdin), (f16 HPR:$Sn), (f16 HPR:$Sm))>,                                                                                                                             
      Requires<[HasFullFP16]>;

Yes, it is legal with no signed zeros fast math flag. I see that PowerPC deal with this case specifically.

Should I add the condition with no signed zero to permit this transform instead of deleting it?

I think so. As Craig pointed out, the default implementation of getNegatedExpression will take care of the fast-math flags. You need check the nsz inside X86::getNegatedExpression() when perform some folding that might change the sign bit of zero.

Also, target specific test is missing. The powerpc part test change is unexpected as there is already nsz flags there.

spatel added a comment.Nov 6 2020, 6:07 AM

Also, target specific test is missing. The powerpc part test change is unexpected as there is already nsz flags there.

+1. The bug(s) appear to be in target-specific code, so we need a minimal test for x86 (and ARM and possibly others) to go with the code fix. IIUC, something like this:

define double @fneg_fma(double %x, double %y, double %z) {
  %negx = fneg double %x
  %negz = fneg double %z
  %fma = call double @llvm.fma.f64(double %negx, double %y, double %negz)
  %n = fneg double %fma
  ret double %n
}

Currently, that is transformed to vfmadd213sd, but that's a miscompile. We can simulate that in IR with Alive2:
https://alive2.llvm.org/ce/z/XxwBAJ

reverse ping

RKSimon requested changes to this revision.May 4 2021, 7:13 AM

As @spatel said, we need more per-target test coverage for this

This revision now requires changes to proceed.May 4 2021, 7:13 AM
Jim updated this revision to Diff 345677.May 16 2021, 12:59 AM

Check if it has no signed zeros flag (nsz) or no NoSignedZerosFPMath option
in getNegatedExpression for x86.

Jim retitled this revision from [DAGCombiner] Don't fold ((fma (fneg X), Y, (fneg Z)) to fneg (fma X, Y, Z)) to [X86] Don't fold (fneg (fma (fneg X), Y, (fneg Z))) to (fma X, Y, Z).May 16 2021, 1:00 AM
Jim edited the summary of this revision. (Show Details)
qiucf added inline comments.May 17 2021, 2:47 AM
llvm/test/CodeGen/X86/fma-signed-zero.ll
12 ↗(On Diff #345677)

You can pre-commit (or at least stage them first and do git-diff) this test so that what optimizations are prevented is clear.

Jim updated this revision to Diff 345856.May 17 2021, 6:26 AM

Address @qiucf's comment.

Pre-commit patch https://reviews.llvm.org/D102621.

spatel added inline comments.May 17 2021, 8:16 AM
llvm/test/CodeGen/X86/avx2-fma-fneg-combine.ll
2 ↗(On Diff #345856)

To confirm: this patch is based on a local update with the test changes?

We really want to avoid adding llc options to toggle the FP state as discussed in D99080 (so this diff would invert what we want to do there).

Unfortunately, this is a yak shaving exercise because the tests are using deprecated target-specific intrinsics. I hopefully fixed that for this file at least here:
8854b27b19

Ideally, we can now add tests with FMF (nsz on the appropriate instructions) to preserve the intent of these tests and also demonstrate the bug.

The fma-fneg-combine.ll file has the same problem. Let me know if I should update that or if you want to give that a try.

Jim added inline comments.May 17 2021, 10:30 PM
llvm/test/CodeGen/X86/avx2-fma-fneg-combine.ll
2 ↗(On Diff #345856)

I don't understand what do you mean "this patch is based on a local update with the test changes?"

I am not sure how to update the tests in fma-fneg-combine.ll. The intrinsics used in fma-fneg-combine.ll have additional arguments not just only 3 arguments.

I would update avx2-fma-fneg-combine.ll by adding FMF on the intrunctions.

spatel added inline comments.May 18 2021, 2:10 PM
llvm/test/CodeGen/X86/avx2-fma-fneg-combine.ll
2 ↗(On Diff #345856)

Sorry - I didn't realize the tests in fma-fneg-combine.ll were for still active target-specific nodes.
I posted D102725, so we can try to convert those to use IR-level FMF too.
I think you can update this patch to use IR-level FMF on avx2-fma-fneg-combine.ll now. It would be good to add tests with 'nsz' and leave the existing tests as-is. That way we'll show that we are not miscompiling but we are optimizing if possible.

Jim added inline comments.May 18 2021, 8:50 PM
llvm/test/CodeGen/X86/avx2-fma-fneg-combine.ll
2 ↗(On Diff #345856)

I found that llvm.x86.avx512.vfmadd.ps.512 in fma-fneg-combine.ll would be lowered to general fma but it losts FMF. Do you have any comment to fix it ?

spatel added inline comments.May 19 2021, 11:40 AM
llvm/test/CodeGen/X86/avx2-fma-fneg-combine.ll
2 ↗(On Diff #345856)

Yes, it's a mess. Please have a look and update this patch after:
f66ba4cfa
333c968
f12f9be
9b59a61

If I got it right, we have the necessary test coverage in fma-fneg-combine.ll now, so there is no need to change the RUN lines in that file. Just regenerate the CHECK lines using utils/update_llc_test_checks.py after applying this patch.

Jim updated this revision to Diff 346618.May 19 2021, 7:39 PM

Rebase and update testcases.

spatel added inline comments.May 20 2021, 5:38 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47066 ↗(On Diff #346618)

Do not check the TargetOptions here; that's the legacy/deprecated construct. We should have fixed FMF propagation enough at this point, so it should not be necessary.

And we should have enough test coverage to verify that (although it's hard to tell what is redundant/missing in the existing tests). That also means we should not use target options or function attrs in the new test file. I'll comment directly on D102621 to make that clearer.

Jim updated this revision to Diff 346913.May 20 2021, 7:47 PM

Address @spatel's comment

spatel added inline comments.May 21 2021, 5:14 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47049 ↗(On Diff #346913)

Don't need to check TargetOptions?

Jim updated this revision to Diff 347012.EditedMay 21 2021, 6:31 AM

Address @spatel's comment.

spatel accepted this revision.May 21 2021, 6:41 AM

LGTM - thanks for working through the FMF/test updates!
Please update the commit message since we are not checking the target options now.
Also, you may want to add a link to an Alive2 proof and mention that this patch fixes miscompiles:
https://alive2.llvm.org/ce/z/XxwBAJ

RKSimon accepted this revision.May 21 2021, 7:16 AM

LGTM (just to unblock) - @spatel has the final decision - cheers

This revision is now accepted and ready to land.May 21 2021, 7:16 AM
Jim edited the summary of this revision. (Show Details)May 21 2021, 7:43 AM