
[FPEnv][InstSimplify] Enable more folds for constrained fadd
Needs Review · Public

Authored by kpn on Jul 20 2021, 7:24 AM.



Currently there are optimizations for the fadd instruction that do not fire for a constrained fadd. Add some of these optimizations.

Diff Detail

Unit Tests: Failed

210 ms · x64 debian > LLVM.CodeGen/AMDGPU::loop_break.ll
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -mtriple=amdgcn-- -S -structurizecfg -si-annotate-control-flow /var/lib/buildkite-agent/builds/llvm-project/llvm/test/CodeGen/AMDGPU/loop_break.ll | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck -check-prefix=OPT /var/lib/buildkite-agent/builds/llvm-project/llvm/test/CodeGen/AMDGPU/loop_break.ll
140 ms · x64 debian > LLVM.CodeGen/AMDGPU::sgpr-control-flow.ll
Script: -- : 'RUN: at line 2'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs < /var/lib/buildkite-agent/builds/llvm-project/llvm/test/CodeGen/AMDGPU/sgpr-control-flow.ll | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck -enable-var-scope -check-prefix=SI /var/lib/buildkite-agent/builds/llvm-project/llvm/test/CodeGen/AMDGPU/sgpr-control-flow.ll
260 ms · x64 windows > LLVM.CodeGen/AMDGPU::loop_break.ll
Script: -- : 'RUN: at line 1'; c:\ws\w4\llvm-project\premerge-checks\build\bin\opt.exe -mtriple=amdgcn-- -S -structurizecfg -si-annotate-control-flow C:\ws\w4\llvm-project\premerge-checks\llvm\test\CodeGen\AMDGPU\loop_break.ll | c:\ws\w4\llvm-project\premerge-checks\build\bin\filecheck.exe -check-prefix=OPT C:\ws\w4\llvm-project\premerge-checks\llvm\test\CodeGen\AMDGPU\loop_break.ll
140 ms · x64 windows > LLVM.CodeGen/AMDGPU::sgpr-control-flow.ll
Script: -- : 'RUN: at line 2'; c:\ws\w4\llvm-project\premerge-checks\build\bin\llc.exe -march=amdgcn -mcpu=tahiti -verify-machineinstrs < C:\ws\w4\llvm-project\premerge-checks\llvm\test\CodeGen\AMDGPU\sgpr-control-flow.ll | c:\ws\w4\llvm-project\premerge-checks\build\bin\filecheck.exe -enable-var-scope -check-prefix=SI C:\ws\w4\llvm-project\premerge-checks\llvm\test\CodeGen\AMDGPU\sgpr-control-flow.ll

Event Timeline

kpn created this revision. Jul 20 2021, 7:24 AM
kpn requested review of this revision. Jul 20 2021, 7:24 AM
Herald added a project: Restricted Project. Jul 20 2021, 7:24 AM

Pre-commit the tests, so we just show diffs in this patch?

kpn updated this revision to Diff 362043. Jul 27 2021, 8:44 AM

Tests have been precommitted.

sepavloff added inline comments. Jul 29 2021, 3:24 AM

Even if ExBehavior != fp::ebStrict, this transformation is invalid if X is an SNaN, because the SNaN must be converted to a QNaN.


The same applies when X == SNaN.


What about making this transformation in non-default mode?

kpn added inline comments. Jul 29 2021, 6:53 AM

It requires adding support to the IR matchers like m_FSub(), and those are used elsewhere. Which implies testing in places in addition to here. So I'm saving that for a subsequent patch.

Small steps.

kpn added inline comments. Aug 2 2021, 9:22 AM

Can I put this in a different patch? We have the same issue for constrained and non-constrained cases, and we don't have a good way to distinguish them -- nor should we.

Most of the required code is present in APFloat, but not all. Would it be OK for me to add the needed bits to APFloat and use them from InstSimplify in a different ticket?

sepavloff added inline comments. Aug 2 2021, 10:57 AM

I am not sure I understand what you are going to put into another patch. The transformation:

fadd X, -0 ==> X

is valid only if X != SNaN, otherwise we have:

fadd SNaN, -0 ==> QNaN

This does not depend on whether the FP environment is the default one; the code before your changes was already incorrect. The transformation is valid if the operation has the nnan flag, but not in the general case. Code in APFloat can hardly help, because constant folding has already run: if X is still a constant at this point, it is one that could not be folded.
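The quieting described in this comment can be observed at the bit level. The following is a minimal sketch, not code from the patch: the helper names are hypothetical, and it assumes IEEE 754 binary64 doubles where the quiet bit is the top mantissa bit (as on x86-64 and AArch64). The volatile operands are there only to keep the compiler from folding the addition at compile time.

```c
#include <stdint.h>
#include <string.h>

/* Assumed binary64 layout: quiet bit = most significant mantissa bit. */
#define EXP_MASK  0x7FF0000000000000ULL
#define MANT_MASK 0x000FFFFFFFFFFFFFULL
#define QUIET_BIT 0x0008000000000000ULL
#define SNAN_BITS 0x7FF0000000000001ULL /* one possible SNaN encoding */

/* Hypothetical classification helpers working on raw bit patterns. */
int is_nan_bits(uint64_t b)  { return (b & EXP_MASK) == EXP_MASK && (b & MANT_MASK) != 0; }
int is_snan_bits(uint64_t b) { return is_nan_bits(b) && !(b & QUIET_BIT); }
int is_qnan_bits(uint64_t b) { return is_nan_bits(b) && (b & QUIET_BIT); }

/* Computes x + -0.0 at run time and returns the bits of the result. */
uint64_t add_neg_zero_bits(uint64_t x_bits) {
    double tmp;
    memcpy(&tmp, &x_bits, sizeof tmp);
    volatile double x = tmp;          /* volatile: force a real hardware add */
    volatile double neg_zero = -0.0;
    double r = x + neg_zero;
    uint64_t r_bits;
    memcpy(&r_bits, &r, sizeof r_bits);
    return r_bits;
}
```

With an SNaN input the result should classify as a QNaN, whereas replacing the add with its operand would leave the SNaN in place. For an ordinary value such as 1.5 the add is an identity, which is the case the fold is meant to capture.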

spatel added inline comments.

IIUC, you are saying SNaN vs. QNaN is more than just a part of the exception state. But that's not how I interpret the current LangRef:

...and that's why *none* of the transforms here are intentionally SNaN preserving/clearing.

If we are going to change the behavior of the default FP env, the LangRef must be updated to make that clear. I don't see the motivation yet.

sepavloff added inline comments. Aug 6 2021, 10:43 AM

That's true for the default FP environment. The difference between SNaN and QNaN is that operations on an SNaN raise the invalid exception. In the default FP environment exceptions are ignored, so SNaN and QNaN behave identically.

But this code runs in non-default FP environments as well. Here fadd designates both the regular IR instruction used in the default environment and its constrained counterpart, so SNaN must be handled more carefully.

kpn added a comment. Aug 6 2021, 11:25 AM

We could fold in the "strict" exception behavior cases if we had a matcher for a QNaN. But the instructions still wouldn't be removed, since under "strict" isInstructionTriviallyDead() returns false for them. A TODO note is probably sufficient in case an m_QNaN() is added in the future.


We won't see a NaN here if 'nnan' is specified since simplifyFPOp() will have already turned it to poison and returned. So I think the check for FMF.noNaNs() should be removed since it can't happen and is therefore misleading.

We won't fold in the 'strict' exception case because we check for it and reject it.

I suspect the real objection is that we're not turning a SNaN into a QNaN here. But with the constant folding that Serge recently added we won't even be here in the "ignore" or "maytrap" cases at all. There won't be a "fadd NaN, -0" case except in the "strict" exception case which we decline to fold.

Aside from the unneeded check for FMF.noNaNs() I don't see a problem here. We aren't returning the wrong result here or below.

And I do now have code for the APFloat and ConstantFP classes to make it easy for simplifyFPOp() to convert SNaN->QNaN. That's what I was planning on submitting in a different ticket.

spatel added inline comments. Aug 6 2021, 11:50 AM

This seems to be getting fuzzy when the exception state is "MayTrap", so we probably need to clarify that definition in the LangRef.

Is there a regression test below that shows where you think this patch is wrong?

I think the fold is correct with MayTrap (assuming the rounding mode is known suitable) because "passes are not required to preserve all exceptions that are implied by the original code"; presumably some other operation is eventually going to generate the invalid exception when it sees the SNaN?

kpn added inline comments. Aug 6 2021, 1:19 PM

Actually, the check for FMF.noNaNs() is required to catch non-constants. We can still fold "fadd X, -0" if we know X is not a NaN. We still have the "trivially dead" issue where the instruction would be hanging around needlessly, but still...

kpn added inline comments. Aug 6 2021, 2:16 PM

Rereading @spatel's comment: what about the "X is a variable that happens to have the value of an SNaN" case: it's true that we'll be removing an instruction that would have converted the SNaN to a QNaN. So the removal of the instruction would be observable. I don't see a way around that without eliminating the ability to optimize at all.

andrew.w.kaylor added inline comments.

I don't think you can make this transformation in the "maytrap" case. It would be ok to optimize an SNaN to a QNaN, but you can't eliminate an instruction that would raise the exception while performing the same conversion. Consider this example:

#include <fenv.h>

double foo(double x, double y) {
 #pragma clang fp exceptions(maytrap)
  double temp;
  temp = x + -0.0;
  // Check the x = SNaN case: an SNaN operand raises FE_INVALID
  if (fetestexcept(FE_INVALID))
    return -1.0;
  temp = temp + y;
  // Check the y = SNaN case
  if (fetestexcept(FE_INVALID))
    return -2.0;
  return temp;
}

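Eliminating the fadd x, -0 causes the exception to be raised in the wrong place: with x = SNaN, the exception is raised later, by temp + y, so the x = SNaN case is reported by the second check instead of the first.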