This is an archive of the discontinued LLVM Phabricator instance.

[x86] use instruction-level fast-math-flags to drive MachineCombiner
ClosedPublic

Authored by spatel on Feb 19 2020, 10:32 AM.

Details

Summary

The code changes here are hopefully straightforward:

  1. Use SDAG node-level flags to decide if FP ops can be reassociated (use both "reassoc" and "nsz" to be consistent with IR transforms; we probably don't need "nsz", but that's a safer interpretation of the FMF).
  2. Check that both nodes allow reassociation to change instructions. This is a stronger requirement than we've usually implemented in IR/DAG, but this is needed to solve the motivating bug (see below), and it seems unlikely to impede optimization at this late stage.
  3. Intersect/propagate MachineIR flags to enable further reassociation in MachineCombiner.

We managed to make MachineCombiner flexible enough that no changes are needed to that pass itself. So this patch should only affect x86 (assuming no other targets have implemented the hooks using MachineIR flags yet).

The motivating example in PR43609 is another case of fast-math transforms interacting badly with special FP ops created by lowering:
https://bugs.llvm.org/show_bug.cgi?id=43609
The special fadd ops used for converting int to FP assume that they will not be altered, so those are created without FMF.

However, the MachineCombiner pass was being enabled for FP ops using the global/function-level TargetOption for "UnsafeFPMath". We managed to run instruction/node-level FMF all the way down to MachineIR sometime in the last 1-2 years though, so we can do better now.

The test diffs require some explanation:

  1. llvm/test/CodeGen/X86/fmf-flags.ll - no target option for unsafe math was specified here, so MachineCombiner kicks in where it did not previously; to make it behave consistently, we need to specify a CPU schedule model, so use the default model, and there are no code diffs.
  2. llvm/test/CodeGen/X86/machine-combiner.ll - replace the target option for unsafe math with the equivalent IR-level flags, and there are no code diffs; we can't remove the NaN/nsz options because those are still used to drive x86 fmin/fmax codegen (special SDAG opcodes).
  3. llvm/test/CodeGen/X86/pow.ll - similar to #1
  4. llvm/test/CodeGen/X86/sqrt-fastmath.ll - similar to #1/#3, but MachineCombiner does some reassociation of the estimate sequence ops; presumably these are perf wins based on latency/throughput (and we get some reduction of move instructions too); I'm not sure how it affects numerical accuracy, but the test reflects reality better now because we would expect MachineCombiner to be enabled if the IR was generated via something like "-ffast-math" with clang.
  5. llvm/test/CodeGen/X86/vec_int_to_fp.ll - this is the test added to model PR43609; the fadds are not reassociated now, so we should get the expected results.
  6. llvm/test/CodeGen/X86/vector-reduce-fadd-fast.ll - similar to #1
  7. llvm/test/CodeGen/X86/vector-reduce-fmul-fast.ll - similar to #1

Diff Detail

Event Timeline

spatel created this revision.Feb 19 2020, 10:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 19 2020, 10:32 AM

All in all LGTM, will check back in bit to see what others have to say.

llvm/lib/Target/X86/X86InstrInfo.cpp
7661

The reassociate pass uses nsz too, and as a convention this seems fine.

7850

I like this as for us in the case when we have divergent FMF environments brought together via inlining, this preserves the bounds of expression interfacing, where neither model would incur on the other. Perhaps we should have a way to zap the int/float flags as a utility so that we can keep contexts separate in MachineInstr.h.

wristow accepted this revision.Feb 26 2020, 7:09 PM

LGTM

This revision is now accepted and ready to land.Feb 26 2020, 7:09 PM
This revision was automatically updated to reflect the committed changes.