Check if it has no signed zeros flag (nsz) in getNegatedExpression for x86.
This patch fixed miscompilation: https://alive2.llvm.org/ce/z/XxwBAJ
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
No test changes?
The transform would be legal with no signed zeros fast math flag right? I believe that's checked in getNegatedExpression. But not checked in the X86 specific override.
I think the same issue may also exist in ARMInstrVFP.td. It matches both (fneg (fma x, y, z)) and (fma (fneg x), y, (fneg z)) to the same instruction.
// Match @llvm.fma.* intrinsics                                                                                                                                                                
// (fneg (fma x, y, z)) -> (vfnma z, x, y)                                                                                                                                                     
def : Pat<(fneg (fma (f64 DPR:$Dn), (f64 DPR:$Dm), (f64 DPR:$Ddin))),                                                                                                                          
          (VFNMAD DPR:$Ddin, DPR:$Dn, DPR:$Dm)>,                                                                                                                                               
      Requires<[HasVFP4,HasDPVFP]>;                                                                                                                                                            
def : Pat<(fneg (fma (f32 SPR:$Sn), (f32 SPR:$Sm), (f32 SPR:$Sdin))),                                                                                                                          
          (VFNMAS SPR:$Sdin, SPR:$Sn, SPR:$Sm)>,                                                                                                                                               
      Requires<[HasVFP4]>;                                                                                                                                                                     
def : Pat<(fneg (fma (f16 HPR:$Sn), (f16 HPR:$Sm), (f16 (f16 HPR:$Sdin)))),                                                                                                                    
          (VFNMAH (f16 HPR:$Sdin), (f16 HPR:$Sn), (f16 HPR:$Sm))>,                                                                                                                             
      Requires<[HasFullFP16]>;                                                                                                                                                                 
// (fma (fneg x), y, (fneg z)) -> (vfnma z, x, y)                                                                                                                                              
def : Pat<(f64 (fma (fneg DPR:$Dn), DPR:$Dm, (fneg DPR:$Ddin))),                                                                                                                               
          (VFNMAD DPR:$Ddin, DPR:$Dn, DPR:$Dm)>,                                                                                                                                               
      Requires<[HasVFP4,HasDPVFP]>;                                                                                                                                                            
def : Pat<(f32 (fma (fneg SPR:$Sn), SPR:$Sm, (fneg SPR:$Sdin))),                                                                                                                               
          (VFNMAS SPR:$Sdin, SPR:$Sn, SPR:$Sm)>,                                                                                                                                               
      Requires<[HasVFP4]>;                                                                                                                                                                     
def : Pat<(f16 (fma (fneg (f16 HPR:$Sn)), (f16 HPR:$Sm), (fneg (f16 HPR:$Sdin)))),                                                                                                             
          (VFNMAH (f16 HPR:$Sdin), (f16 HPR:$Sn), (f16 HPR:$Sm))>,                                                                                                                             
      Requires<[HasFullFP16]>;Yes, it is legal with no signed zeros fast math flag. I see that PowerPC deal with this case specifically.
Should I add the condition with no signed zero to permit this transform instead of deleting it?
I think so. As Craig pointed out, the default implementation of getNegatedExpression will take care of the fast-math flags. You need check the nsz inside X86::getNegatedExpression() when perform some folding that might change the sign bit of zero.
Also, target specific test is missing. The powerpc part test change is unexpected as there is already nsz flags there.
+1. The bug(s) appear to be in target-specific code, so we need a minimal test for x86 (and ARM and possibly others) to go with the code fix. IIUC, something like this:
define double @fneg_fma(double %x, double %y, double %z) {
  %negx = fneg double %x
  %negz = fneg double %z
  %fma = call double @llvm.fma.f64(double %negx, double %y, double %negz)
  %n = fneg double %fma
  ret double %n
}Currently, that is transformed to vfmadd213sd, but that's a miscompile. We can simulate that in IR with Alive2:
https://alive2.llvm.org/ce/z/XxwBAJ
Check if it has no signed zeros flag (nsz) or no NoSignedZerosFPMath option
in getNegatedExpression for x86.
| llvm/test/CodeGen/X86/fma-signed-zero.ll | ||
|---|---|---|
| 12 | You can pre-commit (or at least stage them first and do git-diff) this test so that what optimizations are prevented is clear. | |
| llvm/test/CodeGen/X86/avx2-fma-fneg-combine.ll | ||
|---|---|---|
| 3 | To confirm: this patch is based on a local update with the test changes? We really want to avoid adding llc options to toggle the FP state as discussed in D99080 (so this diff would invert what we want to do there). Unfortunately, this is a yak shaving exercise because the tests are using deprecated target-specific intrinsics. I hopefully fixed that for this file at least here: Ideally, we can now add tests with FMF (nsz on the appropriate instructions) to preserve the intent of these tests and also demonstrate the bug. The fma-fneg-combine.ll file has the same problem. Let me know if I should update that or if you want to give that a try. | |
| llvm/test/CodeGen/X86/avx2-fma-fneg-combine.ll | ||
|---|---|---|
| 3 | I don't understand what do you mean "this patch is based on a local update with the test changes?" I am not sure how to update the tests in fma-fneg-combine.ll. The intrinsics used in fma-fneg-combine.ll have additional arguments not just only 3 arguments. I would update avx2-fma-fneg-combine.ll by adding FMF on the intrunctions. | |
| llvm/test/CodeGen/X86/avx2-fma-fneg-combine.ll | ||
|---|---|---|
| 3 | Sorry - I didn't realize the tests in fma-fneg-combine.ll were for still active target-specific nodes. | |
| llvm/test/CodeGen/X86/avx2-fma-fneg-combine.ll | ||
|---|---|---|
| 3 | I found that llvm.x86.avx512.vfmadd.ps.512 in fma-fneg-combine.ll would be lowered to general fma but it losts FMF. Do you have any comment to fix it ? | |
| llvm/test/CodeGen/X86/avx2-fma-fneg-combine.ll | ||
|---|---|---|
| 3 | Yes, it's a mess. Please have a look and update this patch after: If I got it right, we have the necessary test coverage in fma-fneg-combine.ll now, so there is no need to change the RUN lines in that file. Just regenerate the CHECK lines using utils/update_llc_test_checks.py after applying this patch. | |
| llvm/lib/Target/X86/X86ISelLowering.cpp | ||
|---|---|---|
| 47065 | Do not check the TargetOptions here; that's the legacy/deprecated construct. We should have fixed FMF propagation enough at this point, so it should not be necessary. And we should have enough test coverage to verify that (although it's hard to tell what is redundant/missing in the existing tests). That also means we should not use target options or function attrs in the new test file. I'll comment directly on D102621 to make that clearer. | |
| llvm/lib/Target/X86/X86ISelLowering.cpp | ||
|---|---|---|
| 47049 | Don't need to check TargetOptions? | |
LGTM - thanks for working through the FMF/test updates!
Please update the commit message since we are not checking the target options now.
Also, you may want to add a link to an Alive2 proof and mention that this patch fixes miscompiles:
https://alive2.llvm.org/ce/z/XxwBAJ
Don't need to check TargetOptions?