This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] Copy FP flags when visiting a binary instruction.
ClosedPublic

Authored by vchuravy on Dec 13 2019, 3:17 PM.

Details

Summary

We noticed in Julia that the sequence below no longer turned into
a sequence of FMA instructions in LLVM 7+, but it did in LLVM 6.

%29 = fmul contract <4 x double> %wide.load, %wide.load16
%30 = fmul contract <4 x double> %wide.load13, %wide.load17
%31 = fmul contract <4 x double> %wide.load14, %wide.load18
%32 = fmul contract <4 x double> %wide.load15, %wide.load19
%33 = fadd fast <4 x double> %vec.phi, %29
%34 = fadd fast <4 x double> %vec.phi10, %30
%35 = fadd fast <4 x double> %vec.phi11, %31
%36 = fadd fast <4 x double> %vec.phi12, %32

Unlike Clang, Julia doesn't set the unsafe-fp-math=true function
attribute, but rather emits more local instruction flags.

This partially undoes https://reviews.llvm.org/D46854 and if required I can try to minimize the test further.

Diff Detail

Event Timeline

vchuravy created this revision.Dec 13 2019, 3:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 13 2019, 3:17 PM
vchuravy edited the summary of this revision. (Show Details)Dec 13 2019, 3:18 PM
vchuravy added a reviewer: spatel.
vchuravy edited the summary of this revision. (Show Details)

Unit tests: pass. 60868 tests passed, 0 failed and 726 were skipped.

clang-format: pass.

Build artifacts: console-log.txt, CMakeCache.txt, test-results.xml, diff.json

It's not clear to me where the bug is. I see this debug output:

Detected a reduction operation:  %34 = fadd fast <4 x double> %vec.phi, %30
Creating new node: t36: v4f64 = fadd vector-reduction t35, t30 <-- dropped all other FMF

So that seems like we just accidentally cleared FMF by setting vector-reduction. If so, then a better fix would be after line 3127?

if (isVectorReductionOp(&I)) {
  Flags.setVectorReduction(true);
  LLVM_DEBUG(dbgs() << "Detected a reduction operation:" << I << "\n");
}

Reducing the test case would be good - no need to unroll 4x vector.body?

vchuravy updated this revision to Diff 234574.Dec 18 2019, 10:26 AM
  • reduce test
  • update test

It's not clear to me where the bug is. I see this debug output:

Detected a reduction operation:  %34 = fadd fast <4 x double> %vec.phi, %30
Creating new node: t36: v4f64 = fadd vector-reduction t35, t30 <-- dropped all other FMF

So that seems like we just accidentally cleared FMF by setting vector-reduction. If so, then a better fix would be after line 3127?

We dropped all FMF because we created a new SDNodeFlags Flags and then create a new node with those flags and replace the input node with the new one.
Setting the vector reduction doesn't clear the FMF, so where we copy them doesn't matter much. I can of course move the copying of the FMF after setting the vector reduction flag.

I reduced the test as much as possible using llvm-reduce, I had to check whether a patched binary produced and fma and a unpatched binary not yet produced a fma.
If the test is missing the middle block the fma is generated in either case.

Unit tests: pass. 61005 tests passed, 0 failed and 728 were skipped.

clang-tidy: pass.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

It's not clear to me where the bug is. I see this debug output:

Detected a reduction operation:  %34 = fadd fast <4 x double> %vec.phi, %30
Creating new node: t36: v4f64 = fadd vector-reduction t35, t30 <-- dropped all other FMF

So that seems like we just accidentally cleared FMF by setting vector-reduction. If so, then a better fix would be after line 3127?

We dropped all FMF because we created a new SDNodeFlags Flags and then create a new node with those flags and replace the input node with the new one.
Setting the vector reduction doesn't clear the FMF, so where we copy them doesn't matter much. I can of course move the copying of the FMF after setting the vector reduction flag.

Ok, I see how this fails now - back in the parent SelectionDAGBuilder::visit() function, we intersect flags for nodes that already have flags, so we are intersecting 'fast' with 'vector-reduction' which is nothing.

I reduced the test as much as possible using llvm-reduce, I had to check whether a patched binary produced and fma and a unpatched binary not yet produced a fma.
If the test is missing the middle block the fma is generated in either case.

We have to trigger isVectorReductionOp() to be true, but that doesn't require separate basic blocks. I added a further reduced test here:
rG13d30bd54b8b

That covers the only code path for this bug, so you should be able to rebase and update that test file by running utils/update_llc_test_checks.py.

When I looked at this recently as well, it looked like intersect was were we lost our flags.

When I looked at this recently as well, it looked like intersect was were we lost our flags.

Thanks. I don't see much downside to the fix as proposed currently, but let me know if you agree. If we take the flags from the IR instruction here, it will probably be redundant with the propagation in visit(). Or we can pull that flags copying to line 3128 and avoid intersecting flags so much. I doubt this makes any difference to overall codegen time.

I like the notion of avoiding/reducing intersect context hits by placing it in the vector reduction scope (line 3128).

vchuravy updated this revision to Diff 235012.Dec 21 2019, 2:37 AM
update PR

Unit tests: pass. 61083 tests passed, 0 failed and 728 were skipped.

clang-tidy: pass.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

vchuravy updated this revision to Diff 235014.Dec 21 2019, 5:02 AM

clang-format

Unit tests: pass. 61083 tests passed, 0 failed and 728 were skipped.

clang-tidy: pass.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

spatel accepted this revision.Dec 21 2019, 7:11 AM

LGTM - thanks!

This revision is now accepted and ready to land.Dec 21 2019, 7:11 AM
spatel added inline comments.Dec 21 2019, 7:12 AM
llvm/test/CodeGen/X86/fmf-reduction.ll
4

remove 'FIXME'

This revision was automatically updated to reflect the committed changes.