This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/GlobalISel: Add clamp combine
ClosedPublic

Authored by Petar.Avramovic on Oct 23 2020, 8:43 AM.

Details

Summary

Add clamp combine. Source is fminnum(fmaxnum(Val, K0), K1) or
fmaxnum(fminnum(Val, K1), K0) with K0=0.0 and K1=1.0 or fmed3
intrinsic with 0.0 and 1.0 as two out of three operands.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptOct 23 2020, 8:43 AM
Petar.Avramovic requested review of this revision.Oct 23 2020, 8:43 AM

Missing tests for f16 cases

llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
187 ↗(On Diff #300312)

Check this first around the whole section?

207–216 ↗(On Diff #300312)

These look like two distinct apply actions to me

224–228 ↗(On Diff #300312)

getConstantFPVRegVal

233 ↗(On Diff #300312)

This should be implied by the matcher opcode

234 ↗(On Diff #300312)

Early return on intrinsic check

235–246 ↗(On Diff #300312)

I think the flow of the DAG version is easier to handle, swapping the values and then checking isClampZeroToOne at the end

247 ↗(On Diff #300312)

Typo remainig

254 ↗(On Diff #300312)

Should get this from the MachineIRBuilder

256–260 ↗(On Diff #300312)

I think something is confused here. IEEE mode only changes signaling nan behavior. I also think trying to handle IEEE=0 correctly with snans is a fools errand since *no* operations will behave correctly. The DAG combine for this does not consider IEEE mode

275 ↗(On Diff #300312)

Should not construct a new MachineIRBuilder, there should be one existing already in the combiner

llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
207–216 ↗(On Diff #300312)

They share same match action. If we want to split them to matchMinMaxToClamp and matchMinMaxToMed3 we might end up matching min/max pattern twice.

235–246 ↗(On Diff #300312)

That swaps operands and might produce different result with IEEE=1

256–260 ↗(On Diff #300312)

IEEE=0 does not treat SNaN differently and needs no SNaN checks. With IEEE=0 DX10Clamp=1 (shaders want all possible x2, x4, div2, and clamp modifiers) there is no need for any nan check. Other operation don't really mention that backend needs to check for special values. They do have different behavior when input is SNaN but correctness in not mentioned. What do we have to do force correct behavior for IEEE=0?
The considering of SNaN only comes from compute kernel functions being in IEEE=1 mode and trying to lower fminnum to fminnum_ieee (If higher language used fminnum_ieee with IEEE=1 there would be no need for SNaN checks at all).

foad added a comment.Oct 27 2020, 3:49 AM

Is it possible to separate this into two different combines:

  1. min(max(...)) or max(min(...)) -> med(x, K0, K1)
  2. med(x, 0.0, 1.0) -> clamp(x)

?

arsenm added inline comments.Nov 2 2020, 1:13 PM
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
235–246 ↗(On Diff #300312)

Are you saying the DAG version is broken?

256–260 ↗(On Diff #300312)

With IEEE=0, no operations quiet signalling nans. All operations behave incorrectly and pass through the snan instead of quieting it. The lack of quieting the snan is a correctness issue. It would be a massive amount of work to actually implement the correct behavior and manually quiet snan inputs to implement the correct generic operation semantics, which we will probably never do.

Petar.Avramovic added inline comments.Nov 3 2020, 5:52 AM
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
235–246 ↗(On Diff #300312)

DAG version should also check for SNaN (This is for IEEE=true) alongside DX10Clamp.
result of fmed3 depends on operand order when one is SNaN
swapping the operands and deciding to give up since input might be SNaN might create different result:
med3_f32 (0, 1, SNaN) = min(min(0,1), SNaN) = QNaN
med3_f32 (0, 1, SNaN) -(swap)-> med3_f32 (SNaN, 0, 1) = min(min(SNaN, 0), 1) = 1

swapping is fine for QNaN.

256–260 ↗(On Diff #300312)

All operations behave incorrectly and pass through the snan instead of quieting it. The lack of quieting the snan is a correctness issue.

Is this for llvm-ir? Would it then be acceptable to only perform combine if nnan flag (check using isKnownNeverNaN) is present on the instruction with IEEE=false?
Also when IEEE=false we should never use isKnownNeverSNaN.
For IEEE=true we could then check for NaN input or DX10Clamp.

arsenm added inline comments.Nov 3 2020, 8:00 AM
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
256–260 ↗(On Diff #300312)

Yes. Officially the IR description pretends snans do not exist, but really they do and need to be quieted by FP canonicalizing FP operations as per IEEE754. I'm not sure it even makes sense to try to get correct snan behavior with IEEE=0 since it's never going to be implemented. The comment here absolutely needs to make clear the distinction between snan and qnan handling

Getting IEEE=true correct is far more interesting/important.

reverse ping

arsenm requested changes to this revision.Mar 30 2021, 6:02 PM
This revision now requires changes to proceed.Mar 30 2021, 6:02 PM
Petar.Avramovic retitled this revision from AMDGPU/GlobalISel: Add clamp combine for IEEE=false to AMDGPU/GlobalISel: Add clamp combine.
Petar.Avramovic edited the summary of this revision. (Show Details)

Rebase/rework.

Add one test for splat padded with undef with ieee=true and dx10_clamp=true.

arsenm added inline comments.Jun 28 2021, 5:23 PM
llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
237

I would expect the helper above to return the two constants directly rather than having to requery the registers

246–248

Should swap these conditions so the cheap checks are first

351–357

There's already a helper for this, getDefIgnoringCopies/getDefSrcRegIgnoringCopies

Use matcher that returns APFloat.

arsenm accepted this revision.Jul 14 2021, 3:50 PM
This revision is now accepted and ready to land.Jul 14 2021, 3:50 PM
foad added inline comments.Jul 15 2021, 6:54 AM
llvm/lib/Target/AMDGPU/AMDGPUCombine.td
76

Can you use the standard register_matchdata for this?

Use register_matchinfo.

arsenm accepted this revision.Jul 19 2021, 7:16 PM
This revision was landed with ongoing or failed builds.Dec 3 2021, 4:02 AM
This revision was automatically updated to reflect the committed changes.