Constant fold, canonicalize constants to RHS,
reduce to minnum/maxnum when inputs are nan/undef.
Details
- Reviewers: artem.tamazov, majnemer, tstellarAMD
Event Timeline
Looks good, but IEEE-754 correctness needs to be verified. Is IEEE compliance required for llvm.amdgcn.fmed3.f32? If it is, we should look at the formal definition of fmed3 and check it carefully.
For example, transformations like fmed3(0.0, 1.0, x) -> fmed3(x, 0.0, 1.0) may be non-IEEE-compliant w.r.t. sNaNs when the shader is in IEEE mode. That depends on the expected semantics of fmed3, of course. For example, this is how the V_MED3_F semantics are defined for Gfx8:
    If (isNan(Src0) || isNan(Src1) || isNan(Src2))
        Result = MIN3(Src0, Src1, Src2)
    Else if (MAX3(Src0, Src1, Src2) == Src0)
        Result = MAX(Src1, Src2)
    Else if (MAX3(Src0, Src1, Src2) == Src1)
        Result = MAX(Src0, Src2)
    Else
        Result = MAX(Src0, Src1)
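To make the quoted definition easier to experiment with, here is a minimal host-side sketch of it in C++. The names `med3_model`, `min3`, and `max3` are hypothetical. It assumes that MIN3, MAX3, and MAX behave like `std::fmin`/`std::fmax` on quiet NaNs, and it does not model sNaN quieting in IEEE mode, so it cannot reproduce the operand-order effects discussed in this thread.

```cpp
#include <cmath>

// Hypothetical reference model of the quoted Gfx8 V_MED3_F pseudocode.
// Assumption: MIN3/MAX3/MAX behave like std::fmin/std::fmax, i.e. a quiet
// NaN operand is ignored when the other operand is a number.
// sNaN quieting in IEEE mode is deliberately NOT modeled.
static float min3(float a, float b, float c) {
  return std::fmin(std::fmin(a, b), c);
}

static float max3(float a, float b, float c) {
  return std::fmax(std::fmax(a, b), c);
}

float med3_model(float s0, float s1, float s2) {
  // Any NaN input: fall back to MIN3 of the operands.
  if (std::isnan(s0) || std::isnan(s1) || std::isnan(s2))
    return min3(s0, s1, s2);

  // Otherwise drop the largest operand and take the max of the other two.
  const float m = max3(s0, s1, s2);
  if (m == s0) return std::fmax(s1, s2);
  if (m == s1) return std::fmax(s0, s2);
  return std::fmax(s0, s1);
}
```

For ordinary inputs the model returns the middle value regardless of operand order; whether its NaN branch matches hardware depends on the MIN3 assumption stated above.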
Clarification:
...and, in IEEE mode, V_MED3_F32(0.0, 1.0, sNAN) yields qNAN, while V_MED3_F32(sNAN, 0.0, 1.0) produces 1.0.
It should match the instruction behavior, but we don't necessarily care about it treating signaling NaNs correctly. LLVM in general isn't aware of them and breaks their behavior everywhere. The new constrained FP intrinsics should be aware of proper sNaN behavior, though. When we have a complete set of constrained FP intrinsics and people start using them, we could add a constrained version which would need to handle sNaNs properly. As far as this intrinsic is concerned, as long as it preserves general NaN behavior (ignoring quieting etc.), that should be OK.
All right. I would just like to make the case clear. When the shader is in IEEE mode, this intrinsic does not preserve NaN behavior in some cases, e.g. if (x == sNAN), then (fmed3(0,1,x) == qNAN), but (fmed3(x,0,1) == 1). This is OK as long as we do not try to fold OpenCL constructs like
    if (fmax(fmax(a, b), c) == a)
        d = fmax(b, c);
    else if (fmax(fmax(a, b), c) == b)
        d = fmax(a, c);
    else
        d = fmax(a, b);
to
    d = llvm.amdgcn.fmed3(a, b, c);
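As a rough illustration of why such a fold is tempting, the if-chain can be written as a plain C++ function and compared against a sort-based median. This is only a host-side sketch: `med3_via_fmax` and `median_by_sort` are hypothetical names, `std::fmax` stands in for OpenCL `fmax`, and nothing here models the hardware instruction or IEEE mode.

```cpp
#include <algorithm>
#include <cmath>

// The if-chain from the comment above, written as a plain function.
// std::fmax is used as a stand-in for OpenCL fmax (an assumption).
float med3_via_fmax(float a, float b, float c) {
  const float m = std::fmax(std::fmax(a, b), c);
  if (m == a) return std::fmax(b, c);
  if (m == b) return std::fmax(a, c);
  return std::fmax(a, b);
}

// Hypothetical reference: sort the three values and take the middle one.
// Only meaningful for non-NaN inputs.
float median_by_sort(float a, float b, float c) {
  float v[3] = {a, b, c};
  std::sort(v, v + 3);
  return v[1];
}
```

For non-NaN inputs the two functions agree in every operand order. With `std::fmax` a quiet NaN operand is simply ignored, so the chain does not reproduce the order-dependent IEEE-mode results described above, which is the gap a fold to the intrinsic would have to account for.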