This is an archive of the discontinued LLVM Phabricator instance.

[SDAG][x86] check for relaxed math when matching an FP reduction
ClosedPublic

Authored by spatel on Aug 14 2019, 11:50 AM.

Details

Summary

If the last step in a reduction allows reassociation and doesn't care about -0.0, then we are free to recognize that computation as a reduction that may reorder the intermediate steps.

This is requested directly by PR42705:
https://bugs.llvm.org/show_bug.cgi?id=42705
and solves PR42947 (if horizontal math instructions are actually faster than the alternative):
https://bugs.llvm.org/show_bug.cgi?id=42947

Diff Detail

Event Timeline

spatel created this revision.Aug 14 2019, 11:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 14 2019, 11:50 AM
lebedev.ri added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
9057

precommit?

craig.topper added inline comments.Aug 14 2019, 12:22 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
35487

So was the FP part of this just unusable before now?

spatel marked 2 inline comments as done.Aug 14 2019, 12:32 PM
spatel added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
9057

Yes, will do.

llvm/lib/Target/X86/X86ISelLowering.cpp
35487

Yes - afaict, we put in the ability for FP down here, but there's no way that was getting down here without matching the reduction.

Please could you add a couple of partial reduction tests (e.g. 4f32 in 8f32)

RKSimon added inline comments.Aug 14 2019, 12:45 PM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
9008

Is this satisfactory for all possible float ops (FMUL/FMIN/etc)? Also, shouldn't we be testing this at every stage?

spatel marked 3 inline comments as done.Aug 14 2019, 1:49 PM
spatel added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
9008

Good point. We don't have any FP callers for this besides the FADD that's being added in this patch. FMUL would not need 'nsz', and I'm not sure about FMIN/FMAX:

/// FMIN/FMAX nodes can have flags, for NaN/NoNaN variants.
VECREDUCE_FMAX, VECREDUCE_FMIN,

I think the safe thing for this patch is to handle FADD and assert on the other opcodes until we have an actual caller and know exactly what that behavior should be.

For the flags, I'm following the FMF semantics that we use in IR: the final calculation alone determines what relaxations are allowed for any intermediate calcs leading up to that value. I've added a test to confirm that logic.

spatel updated this revision to Diff 215234.Aug 14 2019, 1:55 PM
spatel marked an inline comment as done.

Patch updated:

  1. Switch flag requirements based on opcode (only FADD handled for now).
  2. Added partial reduction tests (rL368913).
  3. Added negative test for mismatch on fast-math-flags.
RKSimon accepted this revision.Aug 15 2019, 2:18 AM

LGTM - cheers

This revision is now accepted and ready to land.Aug 15 2019, 2:18 AM
This revision was automatically updated to reflect the committed changes.