This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Use V_FMA_MIX* more often
ClosedPublic

Authored by matejam on Jun 22 2023, 6:19 AM.

Details

Summary

Combine mul (f32) + fptrunc (f32->f16) to "v_fma_mixlo_f16 mulSrc1, mulSrc2, 0".

Diff Detail

Event Timeline

matejam created this revision.Jun 22 2023, 6:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 22 2023, 6:19 AM
matejam requested review of this revision.Jun 22 2023, 6:19 AM

What other cases should I cover?
Should I cover the case when we have to pick the higher 16 bits of mul instruction and select v_fma_mixhi_f16?

Should it also be done for GlobalISel in the same patch?

arsenm added inline comments.Jun 22 2023, 6:28 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
2573 ↗(On Diff #533565)

Does this need to consider contract flags? I think the FMA selection question is a bit complicated for a selection pattern. If you have to consider fast math flags I think you're better off moving this to the combiners so the existing mix patterns work

2574 ↗(On Diff #533565)

Does the pattern import in globalisel if you add register class annotations to the output operands?

llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll
2294

Use named values. Also add some multiple use tests

arsenm added inline comments.Jun 22 2023, 6:29 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
2574 ↗(On Diff #533565)

Also handle the mad_mix case?

arsenm added inline comments.Jun 22 2023, 9:58 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
2573 ↗(On Diff #533565)

This is also missing source modifier usage

@arsenm Would it be correct if I wrote a pattern in MadFmaMixPats multiclass, that finds (f16 (fptrunc (f32 (fmul %src1, %src2)))) and turns it into v_fma_mixlo_f16 %src1, %src2, 0.
The add would stay the same.

matejam updated this revision to Diff 535399.Jun 28 2023, 7:29 AM
matejam edited the summary of this revision. (Show Details)

Transform fptrunc (mul a, b) -> fma_mix/mad_mix a, b, 0, implicit_def
or
build_vector el0, (fptrunc (mul a, b)) -> fma_mix/mad_mix a, b, 0, el0.

matejam added inline comments.Jun 28 2023, 12:31 PM
llvm/lib/Target/AMDGPU/VOP3PInstructions.td
194

I could add two more patterns similar to this one, with AMDGPUclamp.
One with elt0 as its first element and the other with lo_src* and hi_src*.

llvm/test/CodeGen/AMDGPU/fdiv.f16.ll
0–1

I guess this was an accident.

FMA and MAD formation is kind of complicated and I'm still leaning towards you being better off doing this in a combine where all other fma formation is done

I'm assuming all these instructions internally perform the fma as f32?

llvm/lib/Target/AMDGPU/VOP3PInstructions.td
185

probably should be just fmul, this may not be the correct behavior with strictfp if the fmul were to raise an exception

187

If this is using v_mad_mix (i.e >= gfx900 && < gfx906), you can't introduce v_mad* without checking if denormal flushing is enabled

llvm/test/CodeGen/AMDGPU/fdiv.f16.ll
0–1

Conversion of tests to generated checks should be done separate from a functional change

llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll
2205

Why the struct return?

Should pre-commit the baseline tests for any new cases

2261

Use named values and drop the flags

2263

The base pattern doesn't include the add. I think tests including the add could be useful, but the base pattern tests should end at the fmul

2264–2266

Don't need all this casting noise to return a result

2267

don't bother with the struct wrapper?

2270

Need tests with source modifiers and multiple uses, plus the permutations with the lo and hi cases. Also denormal flushing on and off

Also need all the tests for the other pattern

matejam updated this revision to Diff 536914.Jul 3 2023, 4:01 PM

Instead of writing patterns that select fma/mad_mix*, write combiners for sdag and global isel that will transform
fptrunc (fmul a, b) into fptrunc (fma a, b, 0), which will later be selected into v_fma/mad_mix*.

matejam updated this revision to Diff 536982.Jul 4 2023, 1:18 AM

Remove add instruction from fptrunc + fmul -> v_fma/mad_mix*.

matejam marked 4 inline comments as done.Jul 4 2023, 1:28 AM
matejam added inline comments.
llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll
2270

Multiple uses meaning that I should have more instructions which use %mul?
Denormal flushing is covered in test/CodeGen/AMDGPU/GlobalISel/fdiv.f16.ll.
I will work on v_fma/mad_mixhi combing now.

foad added a comment.Jul 4 2023, 7:39 AM

FMA and MAD formation is kind of complicated and I'm still leaning towards you being better off doing this in a combine where all other fma formation is done

But this isn't really FMA/MAD formation - if the addend is hard coded to 0 then these instructions just do a regular multiply. The "fused"ness or not doesn't have any effect.

I'm assuming all these instructions internally perform the fma as f32?

Yes the "mix" forms all do a 32-bit mad/fma internally.

@foad do you think I should do something similar to Diff 2?
Or should I write AMDGPU-only combiners?
Or something else?

foad added a comment.Jul 4 2023, 9:16 AM

@foad do you think I should do something similar to Diff 2?

Yes, something like that.

foad added inline comments.Jul 4 2023, 9:18 AM
llvm/lib/Target/AMDGPU/VOP3PInstructions.td
187

Is that a pre-existing problem with the other patterns in this multiclass (I see there is a TODO comment about it at the top)? Or do they check for denormal handling before creating anything that matches fma_like?

matejam added inline comments.Jul 5 2023, 1:56 AM
llvm/lib/Target/AMDGPU/VOP3PInstructions.td
187

If you run llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll with -denormal-fp-math=ieee or preserve-sign, you will get the same result, both will select v_fma/mad_mix* instructions.

llvm/lib/Target/AMDGPU/VOP3PInstructions.td
187

Or do they check for denormal handling before creating anything that matches fma_like?

It is checked in SITargetLowering::isFMAFasterThanFMulAndFAdd and SITargetLowering::isFMADLegal, I think.

arsenm added inline comments.Jul 6 2023, 3:24 PM
llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
5340 ↗(On Diff #536982)

I think CombinerHelper already has a TLI member

5361 ↗(On Diff #536982)

I think just using the type here works

5365 ↗(On Diff #536982)

Missing an erase?

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15838 ↗(On Diff #536982)

Swap the order of these checks

matejam updated this revision to Diff 539931.Jul 13 2023, 3:45 AM

Use selection patterns for selecting V_FMA/MAD_MIX* instead of combiners.
With combiners there is a possibility that fptrunc (fmul a, b) would be selected into fptrunc (fma a, b, 0), and some other combiner could transform it back to its original state.

arsenm added inline comments.Jul 13 2023, 6:44 AM
llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll
2246

Drop unnecessary flags

2293

Drop unnecessary flags

2297

Add tests with source modifier usage

matejam updated this revision to Diff 540007.Jul 13 2023, 7:06 AM

Removed unnecessary flags from fmul instruction in tests.
Add test with abs as a src0 modifier.

tsymalla requested changes to this revision.Jul 13 2023, 7:10 AM

Can you please rebase since the patch does not seem to apply?

This revision now requires changes to proceed.Jul 13 2023, 7:10 AM
tsymalla resigned from this revision.Jul 13 2023, 7:11 AM
This revision now requires review to proceed.Jul 13 2023, 7:11 AM
arsenm accepted this revision.Jul 13 2023, 7:13 AM
arsenm added inline comments.
llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll
2339

fneg one would be good too

This revision is now accepted and ready to land.Jul 13 2023, 7:13 AM
matejam updated this revision to Diff 540020.Jul 13 2023, 7:23 AM

Add test with fneg as src modifier.
Something weird happening with patch application, maybe the parent review https://reviews.llvm.org/D155171 should be merged first?

matejam updated this revision to Diff 540031.Jul 13 2023, 7:29 AM

Rebase and merge precommit.

matejam updated this revision to Diff 540039.Jul 13 2023, 7:50 AM

Rebase again.

This revision was landed with ongoing or failed builds.Jul 13 2023, 7:58 AM
This revision was automatically updated to reflect the committed changes.