When floating-point environment is not examined, reordering instructions doesn't
harm, otherwise library call that checks floating-point environment state can be
moved before the actual computation, producing wrong results (not affected by
side-effects of operations).
Please drop the AMDGPU changes. I don't see a point in trying to support these at this time, and they can't be meaningfully tested now. While the hardware supports floating point exceptions, they require a lot of additional compiler work to implement beyond the core operations. A lot of SGPRs need to be reserved and initialized, the control registers set to enable them, somehow supporting trap handlers and probably other things.
Other than that, Legal is a bad default for these. These should use the new LibCall action by default set in TargetLoweringBase. There is also a hasFloatingPointExceptions() target hook already, perhaps something could error if these are used and not supported.
You can use a statically sized C array for this
Using the integer size for address space 0 seems like a poor choice. Do targets actually care about this type? Why isn't it an i1 target constant?
A new utility function for isFPOpWithChain would be useful.
Why is this Op.getOperand(0) instead of just Op? I would also move the chain handling out of the loop and handle separately or have a separate w/chain and wo/chain loop
A Chain out arguments looks weird here. Why can't UnrollVectorOp's result have the chain result?
|1309–1312 ↗||(On Diff #38417)|
Why would only the last fmul need to be FMUL_W_CHAIN?
|1450–1454 ↗||(On Diff #38417)|
The instruction asm string should not include the _wchain. A better name I think for this would be V_ADD_F32_FPE or something else that doesn't mention chains.
Not really, this only fixes ordering of FP instructions, everything else (i.e., changes in floating-point state registers) is left for future improvements. I just wanted to ensure that new instructions won't cause selection failures on supported targets, but if you think this is useless for PTX or other targets I can remove these changes.
Is there an advantage to duplicating FADD, FSUB, FMUL, FDIV, and FREM into a second W_CHAIN set, instead of having only a single set of ops which have the chain operand? It would naively seem better to have less duplication? (That is, if the compilation flags say to not care about fp ordering, the input chain would still be there, but would always be set to the entry node.)
This was the first implementation, but the issue was that in that case you can't (or I don't know how) use same instruction twice to create one node with side effects and the other one without (and model changes in related registers probably too). Hal actually said that one could add new instructions and for me it also appeared like a safer choice to do not break current floating-point math.
Dropped AMDGPU. Addressed comments.
This is a rewrite of code in the case directly above, I just made it aware of the chain.
Thanks, it should be Op.
It's not the chain of the node we're processing, it's reference to the chain of an operand. Node shouldn't be legalized by reference to its chain, which will be replaced with new value when such node gets legalized through reverence to its value.
Because its result is processed is assumed to have type of vector element and it is processed that way.
Haven't looked at the target specific code, seems mostly mechanical changes.
Some comments on the generic part.
Remove spurious space change.
I'm not sure I understand what's going on here, this need to be documented.
It is not clear to me that this will preserve the FP env?
spurious formatting change?
Not clear how it relates to the "FP with chain" stuff.
spurious formatting change
In the rest of the patch, I think you grouped the *_W_CHAIN cases.
I'm not convince you maintain correct FPenv semantics here.
Another case of mixing UnsafeFPMath with FPenv, I'm not sure about where this is going. Shouldn't we just drop the chain and turn every XXX_W_CHAIN into XXX when UnsafeFPMath is enabled?
This has changed recently :)