This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Basic folds for fmed3 intrinsic
ClosedPublic

Authored by arsenm on Jan 31 2017, 10:36 AM.

Details

Summary

Constant fold, canonicalize constants to RHS,
reduce to minnum/maxnum when inputs are nan/undef.

Diff Detail

Event Timeline

arsenm created this revision. Jan 31 2017, 10:36 AM
artem.tamazov edited edge metadata. Edited Feb 27 2017, 7:16 AM

Looks good, but IEEE-754 correctness needs to be verified. Is IEEE compliance required for llvm.amdgcn.fmed3.f32? If it is, we should look at the formal definition of fmed3 and check it carefully.

For example, transformations like fmed3(0.0, 1.0, x) -> fmed3(x, 0.0, 1.0) may be non-IEEE-compliant w.r.t. sNANs when the shader is in IEEE mode. That depends on the expected semantics of fmed3, of course. For example, this is how the V_MED3_F semantics are defined for Gfx8:

If (isNan(Src0) || isNan(Src1) || isNan(Src2))
  Result = MIN3(Src0, Src1, Src2)
Else if (MAX3(Src0, Src1, Src2) == Src0)
  Result = MAX(Src1, Src2)
Else if (MAX3(Src0, Src1, Src2) == Src1)
  Result = MAX(Src0, Src2)
Else
  Result = MAX(Src0, Src1)

Clarification:

...Is IEEE compliance required for llvm.amdgcn.fmed3.f32? If it is, we should look at the formal definition of fmed3 and check it carefully.
For example, transformations like fmed3(0.0, 1.0, x) -> fmed3(x, 0.0, 1.0) may be non-IEEE-compliant w.r.t. sNANs when the shader is in IEEE mode.
That depends on the expected semantics of fmed3, of course. For example, this is how the V_MED3_F semantics are defined for Gfx8...

...and, in IEEE mode, V_MED3_F32(0.0, 1.0, sNAN) yields qNAN, while V_MED3_F32(sNAN, 0.0, 1.0) produces 1.0.
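
The operand-order dependence described here can be reproduced with a small model of the quoted pseudocode. This is only a sketch under assumptions that the thread does not state: MIN3/MAX3 are taken to evaluate left to right as MIN(MIN(a, b), c), a two-operand MIN/MAX in IEEE mode is taken to return a quiet NaN whenever either input is an sNaN, and a qNaN input is treated as missing. The names SNAN, QNAN, ieee_min, ieee_max and med3 are hypothetical, and NaNs are modeled as string tags rather than real bit patterns.

```python
# Hypothetical model of the Gfx8 V_MED3_F pseudocode quoted in this thread.
# NaNs are string tags so that signaling and quiet NaNs stay distinguishable.
SNAN = "sNaN"
QNAN = "qNaN"


def is_nan(x):
    return x == SNAN or x == QNAN


def _ieee_minmax(a, b, pick):
    # Assumption: in IEEE mode any sNaN input produces a quiet NaN.
    if a == SNAN or b == SNAN:
        return QNAN
    # Assumption: a quiet NaN is treated as missing data.
    if a == QNAN:
        return b
    if b == QNAN:
        return a
    return pick(a, b)


def ieee_min(a, b):
    return _ieee_minmax(a, b, min)


def ieee_max(a, b):
    return _ieee_minmax(a, b, max)


def min3(a, b, c):
    # Assumption: three-operand forms evaluate left to right.
    return ieee_min(ieee_min(a, b), c)


def max3(a, b, c):
    return ieee_max(ieee_max(a, b), c)


def med3(src0, src1, src2):
    # Direct transcription of the quoted pseudocode.
    if is_nan(src0) or is_nan(src1) or is_nan(src2):
        return min3(src0, src1, src2)
    if max3(src0, src1, src2) == src0:
        return ieee_max(src1, src2)
    if max3(src0, src1, src2) == src1:
        return ieee_max(src0, src2)
    return ieee_max(src0, src1)


print(med3(0.0, 1.0, SNAN))  # sNaN in the last operand
print(med3(SNAN, 0.0, 1.0))  # same values, sNaN moved to the first operand
```

Under these assumptions the first call gives a quiet NaN and the second gives 1.0, which matches the results stated above, so moving a constant to a different operand slot is observable when sNaNs are involved.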

It should match the instruction behavior, but we don't necessarily care about it treating signaling NaNs correctly. LLVM in general isn't aware of them and breaks their behavior everywhere. The new constrained FP intrinsics should be aware of proper sNaN behavior, though. When we have a complete set of constrained FP intrinsics and people start using them, we could add a constrained version that would need to handle sNaNs properly. As far as this intrinsic is concerned, as long as it preserves general NaN behavior, ignoring quieting etc., that should be OK.

artem.tamazov accepted this revision. Feb 27 2017, 12:46 PM

As far as this intrinsic is concerned, as long as it preserves general NaN behavior, ignoring quieting etc., that should be OK.

All right. I would just like to make the case clear. When the shader is in IEEE mode, this intrinsic does not preserve NaN behavior in some cases, e.g. if (x == sNAN), then (fmed3(0,1,x) == qNAN), but (fmed3(x,0,1) == 1). This is OK as long as we do not try to fold OpenCL constructs like

if (fmax(fmax(a, b), c) == a)
  d = fmax(b, c);
else if (fmax(fmax(a, b), c) == b)
  d = fmax(a, c);
else
  d = fmax(a, b);

to

d = llvm.amdgcn.fmed3(a, b, c);
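
To illustrate why such a fold would be unsafe, the fmax cascade above can be modeled on its own. This is a rough sketch, assuming OpenCL's fmax returns the other argument when one argument is a NaN and that an sNaN is handled the same way as a quiet NaN (Python floats do not distinguish the two). The helper names fmax and fmax_cascade are hypothetical.

```python
import math

NAN = math.nan  # stands in for any NaN; the sNaN/qNaN distinction is not modeled


def fmax(x, y):
    # Assumption: a NaN argument is treated as missing data.
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return max(x, y)


def fmax_cascade(a, b, c):
    # Transcription of the OpenCL construct quoted above.
    m = fmax(fmax(a, b), c)
    if m == a:
        return fmax(b, c)
    if m == b:
        return fmax(a, c)
    return fmax(a, b)


# The NaN is moved through every operand position.
print(fmax_cascade(0.0, 1.0, NAN))
print(fmax_cascade(NAN, 0.0, 1.0))
print(fmax_cascade(0.0, NAN, 1.0))
```

In this model the cascade returns 0.0 wherever the NaN sits, whereas the results quoted above for the instruction in IEEE mode are qNAN for fmed3(0,1,x) and 1 for fmed3(x,0,1) with x == sNAN, so the two forms would disagree on such inputs.
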
This revision is now accepted and ready to land. Feb 27 2017, 12:46 PM
arsenm closed this revision. Feb 27 2017, 3:20 PM

r296409