Combine mul (f32) + fptrunc (f32->f16) to "v_fma_mixlo_f16 mulSrc1, mulSrc2, 0".
Details
Diff Detail
Event Timeline
What other cases should I cover?
Should I cover the case when we have to pick the higher 16 bits of mul instruction and select v_fma_mixhi_f16?
llvm/lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
2573 ↗ | (On Diff #533565) | Does this need to consider contract flags? I think the FMA selection question is a bit complicated for a selection pattern. If you have to consider fast math flags I think you're better off moving this to the combiners so the existing mix patterns work |
2574 ↗ | (On Diff #533565) | Does the pattern import in globalisel if you add register class annotations to the output operands? |
llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll | ||
2294 | Use named values. Also add some multiple use tests |
llvm/lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
2574 ↗ | (On Diff #533565) | Also handle the mad_mix case? |
llvm/lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
2573 ↗ | (On Diff #533565) | This is also missing source modifier usage |
@arsenm Would it be correct if I wrote a pattern in MadFmaMixPats multiclass, that finds (f16 (fptrunc (f32 (fmul %src1, %src2)))) and turns it into v_fma_mixlo_f16 %src1, %src2, 0.
The add would stay the same.
Transform fptrunc (mul a, b) -> fma_mix/mad_mix a, b, 0, implicit_def
or
build_vector el0, (fptrunc (mul a, b)) -> fma_mix/mad_mix a, b, 0, el0.
FMA and MAD formation is kind of complicated and I'm still leaning towards you being better off doing this in a combine where all other fma formation is done
I'm assuming all these instructions internally perform the fma as f32?
llvm/lib/Target/AMDGPU/VOP3PInstructions.td | ||
---|---|---|
185 | probably should be just fmul, this may not be the correct behavior with strictfp if the fmul were to raise an exception | |
187 | If this is using v_mad_mix (i.e >= gfx900 && < gfx906), you can't introduce v_mad* without checking if denormal flushing is enabled | |
llvm/test/CodeGen/AMDGPU/fdiv.f16.ll | ||
0–1 | Conversion of tests to generated checks should be done separate from a functional change | |
llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll | ||
2205 | Why the struct return? Should pre-commit the baseline tests for any new cases | |
2261 | Use named values and drop the flags | |
2263 | The base pattern doesn't include the add. I think tests including the add could be useful, but the base pattern tests should end at the fmul | |
2264–2266 | Don't need all this casting noise to return a result | |
2267 | don't bother with the struct wrapper? | |
2270 | Need tests with source modifiers and multiple uses, plus the permutations with the lo and hi cases. Also denormal flushing on and off Also need all the tests for the other pattern |
Instead of writing patterns that select fma/mad_mix*, write combiners for sdag and global isel that will transform
fptrunc (fmul a, b) into fptrunc (fma a, b, 0), which will later be selected into v_fma/mad_mix*.
llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll | ||
---|---|---|
2270 | Multiple uses meaning that I should have more instructions which use %mul? |
FMA and MAD formation is kind of complicated and I'm still leaning towards you being better off doing this in a combine where all other fma formation is done
But this isn't really FMA/MAD formation - if the addend is hard coded to 0 then these instructions just do a regular multiply. The "fused"ness or not doesn't have any effect.
I'm assuming all these instructions internally perform the fma as f32?
Yes the "mix" forms all do a 32-bit mad/fma internally.
llvm/lib/Target/AMDGPU/VOP3PInstructions.td | ||
---|---|---|
187 | Is that a pre-existing problem with the other patterns in this multiclass (I see there is a TODO comment about it at the top)? Or do they check for denormal handling before creating anything that matches fma_like? |
llvm/lib/Target/AMDGPU/VOP3PInstructions.td | ||
---|---|---|
187 | If you run llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll with -denormal-fp-math=ieee or preserve-sign, you will get the same result, both will select v_fma/mad_mix* instructions. |
llvm/lib/Target/AMDGPU/VOP3PInstructions.td | ||
---|---|---|
187 |
It is checked in SITargetLowering::isFMAFasterThanFMulAndFAdd and SITargetLowering::isFMADLegal, I think. |
llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp | ||
---|---|---|
5340 ↗ | (On Diff #536982) | I think CombinerHelper already has a TLI member |
5361 ↗ | (On Diff #536982) | I think just using the type here works |
5365 ↗ | (On Diff #536982) | Missing an erase? |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
15838 ↗ | (On Diff #536982) | Swap the order of these checks |
Use selection patterns for selecting V_FMA/MAD_MIX* instead of combiners.
With combiners there is a possibility that fptrunc (fmul a, b) would be selected into fptrunc (fma a, b, 0), and some other combiner could transform it back to its original state.
Removed unnecessary flags from fmul instruction in tests.
Add test with abs as a src0 modifier.
llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll | ||
---|---|---|
2339 | fneg one would be good too |
Add test with fneg as src modifier.
Something weird happening with patch application, maybe the parent review https://reviews.llvm.org/D155171 should be merged first?
probably should be just fmul, this may not be the correct behavior with strictfp if the fmul were to raise an exception