This is an archive of the discontinued LLVM Phabricator instance.

[x86] Split MXCSR into two pseudo-registers
AbandonedPublic

Authored by andrew.w.kaylor on Mar 6 2017, 9:41 AM.

Details

Reviewers
zvi
rnk
efriedma
Summary

Split MXCSR into two pseudo-registers so that the control bits and the status bits can be modeled separately. This register cannot be used as an operand to any instruction so we are free to model it in whatever way is most useful for producing correct code.

This patch only updates the instructions that load and save the entire contents of the register, so both control and status parts are referenced together here. A subsequent patch will update floating point operations to add an implicit use of the control bits and an implicit def of the status bits. This will guarantee that FP instructions are not hoisted above or sunk below the instructions that set the control bits or read the status bits without causing FP operations to act as barriers to one another.

I will be posting another patch shortly to update the clang front end to recognize this change in register naming.

Diff Detail

Repository
rL LLVM

Event Timeline

andrew.w.kaylor created this revision.Mar 6 2017, 9:41 AM
craig.topper removed a subscriber: craig.topper.
craig.topper added a subscriber: craig.topper.
rnk requested changes to this revision.Mar 6 2017, 2:20 PM

lgtm with other clang fix

This revision now requires changes to proceed.Mar 6 2017, 2:20 PM
efriedma requested changes to this revision.Mar 6 2017, 3:20 PM
efriedma added a subscriber: efriedma.

This isn't backward-compatible with existing IR which clobbers mxcsr. You could auto-upgrade, I guess. Alternatively, you could make the status bits a subregister of MXCSR instead of modeling it as two completely separate registers.

A subsequent patch will update floating point operations to add an implicit use of the control bits and an implicit def of the status bits

This seems kind of confusing... strict floating-point ops need to implicitly use and def the status bits, because the new value depends on the previous value. You can think of an FP operation as a logical OR acting on the status register. Many kinds of code motion are legal (e.g. you can reorder FP operations with each other, or hoist them out of loops). But if you omit the use, other optimizations won't work correctly; for example, dead code elimination will eliminate FP operations which have a visible effect on the status register.

Given that, I'm not sure what splitting the status register buys you; I guess it becomes easier to check whether an instruction modifies the control bits?

rnk added a comment.Mar 6 2017, 3:37 PM

This isn't backward-compatible with existing IR which clobbers mxcsr. You could auto-upgrade, I guess. Alternatively, you could make the status bits a subregister of MXCSR instead of modeling it as two completely separate registers.

I didn't think these were supported for modelling inline asm constraints. Besides, the "mxcsr" constraint is less than two months old. Certainly we aren't required to be backwards compatible with it yet.

A subsequent patch will update floating point operations to add an implicit use of the control bits and an implicit def of the status bits

This seems kind of confusing... strict floating-point ops need to implicitly use and def the status bits, because the new value depends on the previous value. You can think of an FP operation as a logical OR acting on the status register. Many kinds of code motion are legal (e.g. you can reorder FP operations with each other, or hoist them out of loops). But if you omit the use, other optimizations won't work correctly; for example, dead code elimination will eliminate FP operations which have a visible effect on the status register.

Given that, I'm not sure what splitting the status register buys you; I guess it becomes easier to check whether an instruction modifies the control bits?

My goal was to restrict the motion of FP operations relative to the instructions that set the control bits or read the status bits without imposing unnecessary restrictions. I believe we agreed when this was discussed previously that the order of exceptions does not need to be preserved, so long as all exceptions are accounted for in the status bits.

I am indeed introducing this register modeling with a view to supporting strict floating point semantics. Initially I intended to model MXCSR use as you indicated, with all strict FP operations having an implicit use and def of this register. However, I was having problems finding a clean way to communicate the fact that an operation required strict semantics across the ISel boundary. This split register modeling was an attempt to get something that was "strict enough" for the strict case without imposing a restriction on the default case.

I suppose you are correct that this has a vulnerability to operations being DCE'd. I'm not sure preserving exception status from floating point operations whose results are never used is a critical use case. I guess that depends on the way we document the semantics of strict FP support. I'll have to think about that.

There is also a question to be resolved as to which function calls should act as barriers and which should not, or in the present context I suppose that equates to which should clobber MXCSR and which should not..

I guess it isn't a backward-compatibility problem if nothing is actually using it yet, I guess. Still, it would be nice to make clobbering "mxcsr" do the obvious, correct thing, as opposed to splitting it into registers which don't actually exist.

I didn't think these were supported for modelling inline asm constraints

Support for what, exactly?

I suppose you are correct that this has a vulnerability to operations being DCE'd. I'm not sure preserving exception status from floating point operations whose results are never used is a critical use case. I guess that depends on the way we document the semantics of strict FP support. I'll have to think about that.

It's not just DCE which is problematic... we could also sink a floating-point operation past a read from the status register. I suppose you could prevent that particular problem by making reads from the status register write to the control bits, but that causes its own problems.

It's not just DCE which is problematic... we could also sink a floating-point operation past a read from the status register. I suppose you could prevent that particular problem by making reads from the status register write to the control bits, but that causes its own problems.

Well, that's exactly the sort of thing I am trying to stop. I was under the impression that if all FP operations have an implicit def of the status bits that would be sufficient to prevent the from being sunk past a read of the status bits. Are you saying that if I have two FP operations with defs of 'mxcsr_s' the back end will be free to assume that the first one can sink past the second operation and a subsequent read of the status bits?

Well, that's exactly the sort of thing I am trying to stop. I was under the impression that if all FP operations have an implicit def of the status bits that would be sufficient to prevent the from being sunk past a read of the status bits. Are you saying that if I have two FP operations with defs of 'mxcsr_s' the back end will be free to assume that the first one can sink past the second operation and a subsequent read of the status bits?

Yes. "def" means completely overwriting the old value, so if you have two operations which def a register, the first definition is dead (whether or not the instruction is dead as a whole). You might want to look at how the x86 backend models arithmetic instructions which set EFLAGS to see how this works in practice; the scheduler will, for example, move an ADD across a CMP+CMOV.

OK, so maybe that puts me back in the position of needing to find a way to conditionally add the MXCSR use/def information only when the strict semantics are required, in which case there would be no significant advantage to splitting the register as I'm proposing here.

andrew.w.kaylor abandoned this revision.Mar 13 2017, 11:49 AM

It looks like I need to rethink this.