Page MenuHomePhabricator

[DAGCombiner] Don't fold ((fma (fneg X), Y, (fneg Z)) to fneg (fma X, Y, Z))
Needs ReviewPublic

Authored by Jim on Nov 5 2020, 10:30 PM.

Details

Summary

In the special case, X = -1, Y = 0 and Z = 0 that (-(-1))*0+(-0)=0 is not equal to -(-1*0+0)=-0.
So it is a incorrect transform.

Diff Detail

Unit TestsFailed

TimeTest
40 mslinux > Clang.CodeGen::lto-newpm-pipeline.c
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/llvm-project/build/bin/clang -cc1 -internal-isystem /mnt/disks/ssd0/agent/llvm-project/build/lib/clang/12.0.0/include -nostdsysteminc -triple x86_64-unknown-linux-gnu -emit-llvm-bc -o /dev/null -fexperimental-new-pass-manager -fdebug-pass-manager -flto=full -O0 /mnt/disks/ssd0/agent/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c -check-prefix=CHECK-FULL-O0
380 mslinux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -m64 -gline-tables-only -fsanitize=hwaddress -fuse-ld=lld -mcmodel=large -mllvm -hwasan-globals -mllvm -hwasan-use-short-granules -mllvm -hwasan-instrument-landing-pads=0 -mllvm -hwasan-instrument-personality-functions /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/hwasan/TestCases/sizes.cpp -nostdlib++ -lstdc++ -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/hwasan/X86_64/TestCases/Output/sizes.cpp.tmp
720 mslinux > LLVM.Other::new-pass-manager.ll
Script: -- : 'RUN: at line 8'; /mnt/disks/ssd0/agent/llvm-project/build/bin/opt -disable-output -disable-verify -debug-pass-manager -passes=no-op-module /mnt/disks/ssd0/agent/llvm-project/llvm/test/Other/new-pass-manager.ll 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/llvm/test/Other/new-pass-manager.ll --check-prefix=CHECK-MODULE-PASS

Event Timeline

Jim created this revision.Nov 5 2020, 10:30 PM
Jim requested review of this revision.Nov 5 2020, 10:30 PM

No test changes?

The transform would be legal with no signed zeros fast math flag right? I believe that's checked in getNegatedExpression. But not checked in the X86 specific override.

I think the same issue may also exist in ARMInstrVFP.td. It matches both (fneg (fma x, y, z)) and (fma (fneg x), y, (fneg z)) to the same instruction.

// Match @llvm.fma.* intrinsics                                                                                                                                                                
// (fneg (fma x, y, z)) -> (vfnma z, x, y)                                                                                                                                                     
def : Pat<(fneg (fma (f64 DPR:$Dn), (f64 DPR:$Dm), (f64 DPR:$Ddin))),                                                                                                                          
          (VFNMAD DPR:$Ddin, DPR:$Dn, DPR:$Dm)>,                                                                                                                                               
      Requires<[HasVFP4,HasDPVFP]>;                                                                                                                                                            
def : Pat<(fneg (fma (f32 SPR:$Sn), (f32 SPR:$Sm), (f32 SPR:$Sdin))),                                                                                                                          
          (VFNMAS SPR:$Sdin, SPR:$Sn, SPR:$Sm)>,                                                                                                                                               
      Requires<[HasVFP4]>;                                                                                                                                                                     
def : Pat<(fneg (fma (f16 HPR:$Sn), (f16 HPR:$Sm), (f16 (f16 HPR:$Sdin)))),                                                                                                                    
          (VFNMAH (f16 HPR:$Sdin), (f16 HPR:$Sn), (f16 HPR:$Sm))>,                                                                                                                             
      Requires<[HasFullFP16]>;                                                                                                                                                                 
// (fma (fneg x), y, (fneg z)) -> (vfnma z, x, y)                                                                                                                                              
def : Pat<(f64 (fma (fneg DPR:$Dn), DPR:$Dm, (fneg DPR:$Ddin))),                                                                                                                               
          (VFNMAD DPR:$Ddin, DPR:$Dn, DPR:$Dm)>,                                                                                                                                               
      Requires<[HasVFP4,HasDPVFP]>;                                                                                                                                                            
def : Pat<(f32 (fma (fneg SPR:$Sn), SPR:$Sm, (fneg SPR:$Sdin))),                                                                                                                               
          (VFNMAS SPR:$Sdin, SPR:$Sn, SPR:$Sm)>,                                                                                                                                               
      Requires<[HasVFP4]>;                                                                                                                                                                     
def : Pat<(f16 (fma (fneg (f16 HPR:$Sn)), (f16 HPR:$Sm), (fneg (f16 HPR:$Sdin)))),                                                                                                             
          (VFNMAH (f16 HPR:$Sdin), (f16 HPR:$Sn), (f16 HPR:$Sm))>,                                                                                                                             
      Requires<[HasFullFP16]>;
Jim updated this revision to Diff 303364.Nov 6 2020, 1:21 AM

Update testcase.

lkail added a subscriber: shchenz.Nov 6 2020, 1:24 AM
Jim added a comment.Nov 6 2020, 1:55 AM

No test changes?

The transform would be legal with no signed zeros fast math flag right? I believe that's checked in getNegatedExpression. But not checked in the X86 specific override.

I think the same issue may also exist in ARMInstrVFP.td. It matches both (fneg (fma x, y, z)) and (fma (fneg x), y, (fneg z)) to the same instruction.

// Match @llvm.fma.* intrinsics                                                                                                                                                                
// (fneg (fma x, y, z)) -> (vfnma z, x, y)                                                                                                                                                     
def : Pat<(fneg (fma (f64 DPR:$Dn), (f64 DPR:$Dm), (f64 DPR:$Ddin))),                                                                                                                          
          (VFNMAD DPR:$Ddin, DPR:$Dn, DPR:$Dm)>,                                                                                                                                               
      Requires<[HasVFP4,HasDPVFP]>;                                                                                                                                                            
def : Pat<(fneg (fma (f32 SPR:$Sn), (f32 SPR:$Sm), (f32 SPR:$Sdin))),                                                                                                                          
          (VFNMAS SPR:$Sdin, SPR:$Sn, SPR:$Sm)>,                                                                                                                                               
      Requires<[HasVFP4]>;                                                                                                                                                                     
def : Pat<(fneg (fma (f16 HPR:$Sn), (f16 HPR:$Sm), (f16 (f16 HPR:$Sdin)))),                                                                                                                    
          (VFNMAH (f16 HPR:$Sdin), (f16 HPR:$Sn), (f16 HPR:$Sm))>,                                                                                                                             
      Requires<[HasFullFP16]>;                                                                                                                                                                 
// (fma (fneg x), y, (fneg z)) -> (vfnma z, x, y)                                                                                                                                              
def : Pat<(f64 (fma (fneg DPR:$Dn), DPR:$Dm, (fneg DPR:$Ddin))),                                                                                                                               
          (VFNMAD DPR:$Ddin, DPR:$Dn, DPR:$Dm)>,                                                                                                                                               
      Requires<[HasVFP4,HasDPVFP]>;                                                                                                                                                            
def : Pat<(f32 (fma (fneg SPR:$Sn), SPR:$Sm, (fneg SPR:$Sdin))),                                                                                                                               
          (VFNMAS SPR:$Sdin, SPR:$Sn, SPR:$Sm)>,                                                                                                                                               
      Requires<[HasVFP4]>;                                                                                                                                                                     
def : Pat<(f16 (fma (fneg (f16 HPR:$Sn)), (f16 HPR:$Sm), (fneg (f16 HPR:$Sdin)))),                                                                                                             
          (VFNMAH (f16 HPR:$Sdin), (f16 HPR:$Sn), (f16 HPR:$Sm))>,                                                                                                                             
      Requires<[HasFullFP16]>;

Yes, it is legal with no signed zeros fast math flag. I see that PowerPC deal with this case specifically.

Should I add the condition with no signed zero to permit this transform instead of deleting it?

In D90901#2378338, @Jim wrote:

No test changes?

The transform would be legal with no signed zeros fast math flag right? I believe that's checked in getNegatedExpression. But not checked in the X86 specific override.

I think the same issue may also exist in ARMInstrVFP.td. It matches both (fneg (fma x, y, z)) and (fma (fneg x), y, (fneg z)) to the same instruction.

// Match @llvm.fma.* intrinsics                                                                                                                                                                
// (fneg (fma x, y, z)) -> (vfnma z, x, y)                                                                                                                                                     
def : Pat<(fneg (fma (f64 DPR:$Dn), (f64 DPR:$Dm), (f64 DPR:$Ddin))),                                                                                                                          
          (VFNMAD DPR:$Ddin, DPR:$Dn, DPR:$Dm)>,                                                                                                                                               
      Requires<[HasVFP4,HasDPVFP]>;                                                                                                                                                            
def : Pat<(fneg (fma (f32 SPR:$Sn), (f32 SPR:$Sm), (f32 SPR:$Sdin))),                                                                                                                          
          (VFNMAS SPR:$Sdin, SPR:$Sn, SPR:$Sm)>,                                                                                                                                               
      Requires<[HasVFP4]>;                                                                                                                                                                     
def : Pat<(fneg (fma (f16 HPR:$Sn), (f16 HPR:$Sm), (f16 (f16 HPR:$Sdin)))),                                                                                                                    
          (VFNMAH (f16 HPR:$Sdin), (f16 HPR:$Sn), (f16 HPR:$Sm))>,                                                                                                                             
      Requires<[HasFullFP16]>;                                                                                                                                                                 
// (fma (fneg x), y, (fneg z)) -> (vfnma z, x, y)                                                                                                                                              
def : Pat<(f64 (fma (fneg DPR:$Dn), DPR:$Dm, (fneg DPR:$Ddin))),                                                                                                                               
          (VFNMAD DPR:$Ddin, DPR:$Dn, DPR:$Dm)>,                                                                                                                                               
      Requires<[HasVFP4,HasDPVFP]>;                                                                                                                                                            
def : Pat<(f32 (fma (fneg SPR:$Sn), SPR:$Sm, (fneg SPR:$Sdin))),                                                                                                                               
          (VFNMAS SPR:$Sdin, SPR:$Sn, SPR:$Sm)>,                                                                                                                                               
      Requires<[HasVFP4]>;                                                                                                                                                                     
def : Pat<(f16 (fma (fneg (f16 HPR:$Sn)), (f16 HPR:$Sm), (fneg (f16 HPR:$Sdin)))),                                                                                                             
          (VFNMAH (f16 HPR:$Sdin), (f16 HPR:$Sn), (f16 HPR:$Sm))>,                                                                                                                             
      Requires<[HasFullFP16]>;

Yes, it is legal with no signed zeros fast math flag. I see that PowerPC deal with this case specifically.

Should I add the condition with no signed zero to permit this transform instead of deleting it?

I think so. As Craig pointed out, the default implementation of getNegatedExpression will take care of the fast-math flags. You need check the nsz inside X86::getNegatedExpression() when perform some folding that might change the sign bit of zero.

Also, target specific test is missing. The powerpc part test change is unexpected as there is already nsz flags there.

spatel added a comment.Nov 6 2020, 6:07 AM

Also, target specific test is missing. The powerpc part test change is unexpected as there is already nsz flags there.

+1. The bug(s) appear to be in target-specific code, so we need a minimal test for x86 (and ARM and possibly others) to go with the code fix. IIUC, something like this:

define double @fneg_fma(double %x, double %y, double %z) {
  %negx = fneg double %x
  %negz = fneg double %z
  %fma = call double @llvm.fma.f64(double %negx, double %y, double %negz)
  %n = fneg double %fma
  ret double %n
}

Currently, that is transformed to vfmadd213sd, but that's a miscompile. We can simulate that in IR with Alive2:
https://alive2.llvm.org/ce/z/XxwBAJ

reverse ping