In D71841 we inverted the sense of the SDNode-level flag to ensure all nodes default to potentially raising FP exceptions unless otherwise specified -- i.e. if we forget to propagate the flag somewhere, the effect is now only lost performance, not incorrect code.
However, the related flag at the MI level still defaults to nodes not raising FP exceptions unless otherwise specified. To be fully on the (conservatively) safe side, we should invert that flag as well.
This patch does so by replacing MIFlag::FPExcept with MIFlag::NoFPExcept. (Note that this does also introduce an incompatible change in the MIR format.)
This should be NFC for all MI instructions emitted by SelectionDAG. However, we could see some changes now in these cases:
- MI instructions emitted by target code (post-ISEL passes). I believe I fixed all those on SystemZ -- I'm not seeing any assembler changes at all with this patch when building a full benchmark suite. However, there may be issues on other targets.
- Code emitted by FastISel and GlobalISel. These will now default to raising exceptions for all FP instructions (on targets that already define mayRaiseFPExceptions). For FastISel, this probably doesn't matter much since this should mostly be used at -O0, where the effect of instructions raising (or not) FP exceptions should be minimal anyway. GlobalISel will definitely need to be fixed to set the NoFPExcept flag where appropriate.
The patch introduces a number of changes to X86 test cases. Some of them are expected (replace fpexcept output with nofpexcept), some are a bit surprising.
- In two places in fp-intrinsics-flags.ll, we see an LD_Fp64m80 marked nofpexcept. This is wrong (but it was already wrong before, where it should have been marked fpexcept but wasn't). This is probably due to the X86 target somewhere not copying flags correctly.
- In fast-isel-select-sse.ll, I was seeing different output for the -fast-isel vs. not case. This turned out to be yet another artifact of the differing behavior of MachineCSE. With -fast-isel, FP instructions are now marked as raising exceptions. Therefore MachineCSE no longer coalesces some COPYs into the FP instruction. Therefore, we get some differences in relative lenghts of use-def chains which lead the final heuristic in TwoAddressInstructionPass::isProfitableToCommute to come to different conclusions. In the "bad" case this means one extra move instruction in the output. It's not quite clear to me that this can be "fixed" as the "good" case seems to be just plain luck here -- the heuristic there doesn't actually address the underlying problem at all. I'm now simply using -O0 with both -fast-isel and without, which gets back to identical output (including an extra copy). Given that -fast-isel tends to be used primarily with -O0 this may be fine ...