Page MenuHomePhabricator

[FPEnv][RFC] Invert sense of MIFlag::FPExcept flag
ClosedPublic

Authored by uweigand on Jan 9 2020, 10:28 AM.

Details

Summary

In D71841 we inverted the sense of the SDNode-level flag to ensure all nodes default to potentially raising FP exceptions unless otherwise specified -- i.e. if we forget to propagate the flag somewhere, the effect is now only lost performance, not incorrect code.

However, the related flag at the MI level still defaults to nodes not raising FP exceptions unless otherwise specified. To be fully on the (conservatively) safe side, we should invert that flag as well.

This patch does so by replacing MIFlag::FPExcept with MIFlag::NoFPExcept. (Note that this does also introduce an incompatible change in the MIR format.)

This should be NFC for all MI instructions emitted by SelectionDAG. However, we could see some changes now in these cases:

  • MI instructions emitted by target code (post-ISEL passes). I believe I fixed all those on SystemZ -- I'm not seeing any assembler changes at all with this patch when building a full benchmark suite. However, there may be issues on other targets.
  • Code emitted by FastISel and GlobalISel. These will now default to raising exceptions for all FP instructions (on targets that already define mayRaiseFPExceptions). For FastISel, this probably doesn't matter much since this should mostly be used at -O0, where the effect of instructions raising (or not) FP exceptions should be minimal anyway. GlobalISel will definitely need to be fixed to set the NoFPExcept flag where appropriate.

The patch introduces a number of changes to X86 test cases. Some of them are expected (replace fpexcept output with nofpexcept), some are a bit surprising.

  • In two places in fp-intrinsics-flags.ll, we see an LD_Fp64m80 marked nofpexcept. This is wrong (but it was already wrong before, where it should have been marked fpexcept but wasn't). This is probably due to the X86 target somewhere not copying flags correctly.
  • In fast-isel-select-sse.ll, I was seeing different output for the -fast-isel vs. not case. This turned out to be yet another artifact of the differing behavior of MachineCSE. With -fast-isel, FP instructions are now marked as raising exceptions. Therefore MachineCSE no longer coalesces some COPYs into the FP instruction. Therefore, we get some differences in relative lenghts of use-def chains which lead the final heuristic in TwoAddressInstructionPass::isProfitableToCommute to come to different conclusions. In the "bad" case this means one extra move instruction in the output. It's not quite clear to me that this can be "fixed" as the "good" case seems to be just plain luck here -- the heuristic there doesn't actually address the underlying problem at all. I'm now simply using -O0 with both -fast-isel and without, which gets back to identical output (including an extra copy). Given that -fast-isel tends to be used primarily with -O0 this may be fine ...

Diff Detail

Event Timeline

uweigand created this revision.Jan 9 2020, 10:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 9 2020, 10:28 AM
craig.topper accepted this revision.Jan 9 2020, 6:08 PM

LGTM

llvm/test/CodeGen/X86/fp-intrinsics-flags.ll
9

I don't know how to fix this. This is a pattern match from a any extending load instruction. There's no flag to copy from. Unfortunately, X87 generates an exception if you load a NAN from a float or double memory location. But I don't know how to represent that in SelectionDAG or IR.

This revision is now accepted and ready to land.Jan 9 2020, 6:08 PM
pengfei added a subscriber: pengfei.Jan 9 2020, 6:10 PM
pengfei added inline comments.Jan 9 2020, 6:38 PM
llvm/test/CodeGen/X86/fp-intrinsics-flags.ll
9

SSE instructions don't generate exceptions when only loading from memory to register. Maybe other targets too. And current strict FP semantics don't define a strict load. Can we add masking the #I before FLD and FPCW recovery operations into the pattern?

pengfei added inline comments.Jan 9 2020, 6:48 PM
llvm/test/CodeGen/X86/fp-intrinsics-flags.ll
9

Oh, we also add those instructions in non-strict scenarios. So it's not practicable.

craig.topper added inline comments.Jan 9 2020, 6:52 PM
llvm/test/CodeGen/X86/fp-intrinsics-flags.ll
9

What happens to the SNAN or denormal value that was loaded if the exception is masked. Does the SNAN get quieted or does it stay an SNAN in 80-bit format? If masking quiets it then the SNAN would never signal. For SSE it would be signalled when the SNAN is operated on by an arithmetic instruction.

pengfei added inline comments.Jan 9 2020, 10:24 PM
llvm/test/CodeGen/X86/fp-intrinsics-flags.ll
9

sNaN will be kept sNaN in 80-bit while qNaN to be qNaN, tested by below code:

void foo(unsigned a) {
  fedisableexcept(FE_INVALID);
  asm("fld1\nflds %0\nfwait" :: "m"(a));
  feenableexcept(FE_INVALID);
  asm("fucom\nfwait");
}

But it becomes more complicated for denormal. Because any denormal in float and double will become normal value in 80-bit format. So we should keep #D unchanged, which means we may still raise exception when loading memory in X87, and we can't keep the same behavior with SSE.

craig.topper added inline comments.Jan 9 2020, 11:41 PM
llvm/test/CodeGen/X86/fp-intrinsics-flags.ll
9

For the SNAN test, you need to clear the exception status bits before unmasking the exception. Otherwise the exception is still pending and will be taken when its unmasked.

pengfei added inline comments.Jan 9 2020, 11:59 PM
llvm/test/CodeGen/X86/fp-intrinsics-flags.ll
9

You are correct! They all turn to qNaN after fld. The former exception came from the pending exception.

uweigand marked an inline comment as done.Jan 10 2020, 6:30 AM
uweigand added inline comments.
llvm/test/CodeGen/X86/fp-intrinsics-flags.ll
9

The test case starts out as just a plain @llvm.experimental.constrained.sitofp.f64.i8
This gets translated at the start of isel to

    t3: i8,ch = load<(load 1 from %fixed-stack.0)> t0, FrameIndex:i32<-1>, undef:i32
  t4: f64,ch = strict_sint_to_fp t0, t3
t6: f80 = fp_extend t4

The strict_sint_to_fp then becomes a CVTSI2SDrr (correctly marked as raising exceptions), while the fp_extend becomes a combination of MOVSDmr and LD_Fp64m80.

It seems to me the problem originates with the fp_extend -- which already should be a strict_fp_extend, really. Then the strict_fp_extend should be converted to a series of fpexcept MI instructions.

I'm not sure exactly where the fp_extend comes from, but I'd assume this is a conversion mandated by the ABI? In that case, it might make sense to check whether the function is marked with the strictfp attribute and generate strict conversions by the ABI interface code in that case.

This revision was automatically updated to reflect the committed changes.