This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Add patterns to select masked FP arith
ClosedPublic

Authored by c-rhodes on Jul 26 2022, 5:44 AM.

Details

Summary

Add patterns to select predicated instructions when lowering:

fadd(a, select(mask, b, splat(0)))
fsub(a, select(mask, b, splat(0)))

'fadd' is unsafe unless no-signed zeros fast-math flag is set, since

-0.0 + 0.0 = 0.0

changes the sign. Alive2: https://alive2.llvm.org/ce/z/wbhJh_

Also adds FMA patterns for:

fadd(a, select(mask, mul(b, c), splat(0))) -> fmla(a, mask, b, c)
fsub(a, select(mask, mul(b, c), splat(0))) -> fmla(a, mask, b, c)

These patterns require the 'contract' fast-math flag to be set, and the
fadd 'nsz' as above.

Diff Detail

Event Timeline

c-rhodes created this revision.Jul 26 2022, 5:44 AM
c-rhodes requested review of this revision.Jul 26 2022, 5:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 26 2022, 5:44 AM
efriedma added inline comments.Jul 26 2022, 9:29 AM
llvm/test/CodeGen/AArch64/sve-masked-fp-arith.ll
135 ↗(On Diff #447655)

This transform isn't legal without the "contract" fast-math flag.

Matt added a subscriber: Matt.Jul 26 2022, 12:37 PM
c-rhodes updated this revision to Diff 447989.Jul 27 2022, 3:34 AM
  • Check ‘contract’ fast-math flag is set for FMLA/FMLS transform.
  • Tests moved to existing sve-fp-combine.ll
c-rhodes marked an inline comment as done.Jul 27 2022, 3:37 AM
c-rhodes added inline comments.
llvm/test/CodeGen/AArch64/sve-masked-fp-arith.ll
135 ↗(On Diff #447655)

This transform isn't legal without the "contract" fast-math flag.

Thanks for pointing that out Eli, fixed now.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
386–387

Do flags on the fmul matter? It's the result of the fadd/fsub that's affected by the contraction and so I think only those nodes require the contract flag.

I'm not totally sure but I do wonder if we need to also check for no-signed-zeros because for the equivalent reduction code -0.0 is the nop value.

c-rhodes marked an inline comment as done.Jul 29 2022, 1:36 AM
c-rhodes added inline comments.
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
386–387

Do flags on the fmul matter? It's the result of the fadd/fsub that's affected by the contraction and so I think only those nodes require the contract flag.

Not entirely sure to be honest, the existing SVE patterns we have to combine fmul+fadd into fma don't kick in unless contract is also on the fmul: https://godbolt.org/z/xWsn7vs5f

I checked some other targets (X86 and Power9) and they also don't combine unless contract is on the fmul, but there is a combine in AArch64 for fmadd that kicks in without contract on fmul: https://godbolt.org/z/rzzTb8s9W

I'm not totally sure but I do wonder if we need to also check for no-signed-zeros because for the equivalent reduction code -0.0 is the nop value.

Not sure either, I'll look into it.

c-rhodes added inline comments.Aug 2 2022, 3:27 AM
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
386–387

I'm not totally sure but I do wonder if we need to also check for no-signed-zeros because for the equivalent reduction code -0.0 is the nop value.

Not sure either, I'll look into it.

I think I understand the issue now.

printf("%g %g\n",  0.0f + 0.0f, -0.0f + 0.0f);
printf("%g %g\n",  0.0f - 0.0f, -0.0f - 0.0f);
printf("%g %g\n",  0.0f * 0.0f, -0.0f * 0.0f);

gives:

0 0
0 -0
0 -0

so fadd produces different result and is unsafe with no-sign zeroes. Alive2 agrees:

nsz is required for the fadd/sel and fadd/sel/fmul (FMLA) patterns. Although the fmul/sel patterns aren't valid according to Alive2.

c-rhodes updated this revision to Diff 449584.Aug 3 2022, 1:38 AM
c-rhodes edited the summary of this revision. (Show Details)
c-rhodes set the repository for this revision to rG LLVM Github Monorepo.
  • fmul(a, select(mask, b, splat(0))) transform isn’t correct, remove it.
  • contract fast-math flag only required on the fadd/fsub op.
  • fadd(a, select(mask, b, splat(0))’transform is unsafe unless nsz fast-math flag is specified.
c-rhodes marked an inline comment as done.Aug 3 2022, 1:39 AM
paulwalker-arm accepted this revision.Aug 5 2022, 4:01 AM

As we increase the idioms that need to match to fma like instructions we may want to move some of this logic into c++ code to canonicalise the DAG and emit FMA_PRED/vselect combos that are easier to match, but that can wait for another day.

This revision is now accepted and ready to land.Aug 5 2022, 4:01 AM
This revision was landed with ongoing or failed builds.Aug 8 2022, 1:45 AM
This revision was automatically updated to reflect the committed changes.