This is an archive of the discontinued LLVM Phabricator instance.

[X86][ConstraintFP] Model `MXCSR` for function call
ClosedPublic

Authored by pengfei on Dec 7 2022, 8:56 AM.

Details

Summary

This patch is inspired by D111433. It would affect the performance under
strict FP mode. But it preserves the correct rounding behavior accross
function calls.

Fixes #59305

Diff Detail

Event Timeline

pengfei created this revision.Dec 7 2022, 8:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 7 2022, 8:56 AM
pengfei requested review of this revision.Dec 7 2022, 8:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 7 2022, 8:56 AM
craig.topper added inline comments.Dec 7 2022, 11:16 AM
llvm/test/CodeGen/X86/fp-strict-scalar-inttofp-fp16.ll
384

What happened here?

pengfei added inline comments.Dec 7 2022, 7:18 PM
llvm/test/CodeGen/X86/fp-strict-scalar-inttofp-fp16.ll
384

Before this change, addss can be sunk into %bb.1. But it fails now because of https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/MachineSink.cpp#L918
due to MXCSR has a def in this function.

sepavloff added inline comments.Jan 12 2023, 10:08 PM
llvm/include/llvm/CodeGen/TargetLowering.h
4491–4495

The function name is a bit misleading, maybe getFPEnvironmentRegisters or some thing like that?

If this function lists FP environment registers that can be changed by a call, then any FP control mode register, not only rounding control (for example, denormal treatment or precision control) also can be modified. FP status information, like raised exception, also may be changed in the called procedure, so it is reasonable to include relevant registers as well.

MXCSR contains all this information but other architectures may have different registers for such information.

pengfei added inline comments.Jan 13 2023, 5:38 PM
llvm/include/llvm/CodeGen/TargetLowering.h
4491–4495

Thanks for the inputs! Modeling all the FP environment registers sounds attractive. However, it is not what the current strict FP required.

  1. Denormal treatment is a good point. We haven't modeled it, but I think it's not a problem if we sort it into rounding control.
  2. Precision as well as exception control registers do nothing with strict FP, because we just use MIFlag to ensure exception correctness.
  3. The exception status registers are the same. We never use this information in strict FP.

In a word, only the rounding behaviors in strict FP mode relies on register modeling. So I think the name reflects the requirments correctly.

sepavloff added inline comments.Jan 16 2023, 8:10 AM
llvm/include/llvm/CodeGen/TargetLowering.h
4491–4495

Current definition of strictfp in https://llvm.org/docs/LangRef.html#function-attributes explicitly mentions status flags:

LLVM will not attempt any optimizations that require assumptions about the floating-point rounding mode or that might alter the state of floating-point status flags ...

As for other control modes, like denormal mode, they behave similar to rounding direction. If a function call (like C2x library function fesetmode) modifies them, nothing prevents from moving the call across the users of the modified control register.

Listing all FP environment registers saves us from undesired code motion.

Reverse ping.

pengfei added inline comments.Jan 26 2023, 4:48 AM
llvm/include/llvm/CodeGen/TargetLowering.h
4491–4495

It is how it works across the whole compiler pipeline. But I tend to take rounding mode and floating-point status flags as 2 different aspects of strictfp due to their different implementation in backend.
To prevent backend optimizations over FP exceptions, we introduced an MI flag NoFPExcept, which will be set to false for FP instructions that may raise exception under strictfp mode and check mayRaiseFPException before doing optimizations. This is different from we preventing optimizations through modeling rounding control registers.
For example, preventing instruction moving in MachineInstr::isSafeToMove, preventing CSE we are facing here in MachineCSE::isCSECandidate, where isCall is checked too.
That said we don't need to model any FP status flags for call instructions again. Thus we should avoid to use FPEnvironmentRegisters for confusion.
As for denormal mode control, I think it's OK we take them as rounding control class and return them with the same method in future.

sepavloff added inline comments.Jan 26 2023, 5:28 AM
llvm/include/llvm/CodeGen/TargetLowering.h
4491–4495

Thank you for the detailed explanation!

You are right, separation of status flags from control modes has benefits. It is OK to limit affected registers to control modes only in this patch, status flags anyway are in the same register on X86.

What about using something like getFPControlRegisters as a name?

pengfei added inline comments.Jan 26 2023, 6:06 AM
llvm/include/llvm/CodeGen/TargetLowering.h
4491–4495

status flags anyway are in the same register on X86.

Actually not, X87 has separate FPSW too.

What about using something like getFPControlRegisters as a name?

I don't think so. The exception mask register is also a FP control register. We don't want to model them too. Actually we have a huge space for performance improvement under strict exception mode if we can correctly model these mask in bits. It needs a large refactor and I don't have a clear idea at the moment.

pengfei added inline comments.Jan 26 2023, 6:12 AM
llvm/include/llvm/CodeGen/TargetLowering.h
4491–4495

If you like, we may expand it in future with an enum argument RoundingControl, DenormalControl, ExceptionControl, ExceptionStatus etc. for different requirments, then we can change is to getFPEnvironmentRegisters. I think it's sufficient for this patch to go with it.

sepavloff accepted this revision.Jan 26 2023, 8:32 AM

LGTM.

Thanks!

This revision is now accepted and ready to land.Jan 26 2023, 8:32 AM
This revision was landed with ongoing or failed builds.Jan 26 2023, 6:07 PM
This revision was automatically updated to reflect the committed changes.