This is an archive of the discontinued LLVM Phabricator instance.

transform fadd chains to increase parallelism
ClosedPublic

Authored by spatel on Apr 23 2015, 1:50 PM.

Details

Summary

This is a compromise: with this simple patch, we should always handle a chain of exactly 3 operations optimally, but we're not generating the optimal balanced binary tree for a longer sequence.

In general, this transform will reduce the dependency chain for a sequence of instructions using N operands from a worst case N-1 dependent operations to N/2 dependent operations. The optimal balanced binary tree would reduce the chain to log2(N).

As I see it, the trade-off for not dealing with longer sequences is: (1) we have less complexity in the compiler, (2) we avoid unknown compile-time blowup calculating a balanced tree, and (3) we don't need to worry about the increased register pressure required to parallelize longer sequences. It also seems unlikely that we would ever encounter really long strings of dependent ops like that in the wild, but I'm not sure how to verify that speculation. FWIW, I see no perf difference for test-suite running on btver2 (x86-64) with -ffast-math and this patch.

If this patch looks ok, then I can extend it to cover other associative operations such as fmul, fmax, fmin, integer add, integer mul.

This is a partial fix for:
https://llvm.org/bugs/show_bug.cgi?id=17305

and if extended:
https://llvm.org/bugs/show_bug.cgi?id=21768
https://llvm.org/bugs/show_bug.cgi?id=23116

The issue also came up in:
http://reviews.llvm.org/D8941

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 24326.Apr 23 2015, 1:50 PM
spatel retitled this revision from to transform fadd chains to increase parallelism.
spatel updated this object.
spatel edited the test plan for this revision. (Show Details)
spatel added reviewers: hfinkel, qcolombet, andreadb.
spatel added a subscriber: Unknown Object (MLST).
qcolombet added inline comments.Apr 28 2015, 11:15 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7662 ↗(On Diff #24326)

I would prefer the comment to match the actual code, i.e., invert the order of the operand:
(fadd (fadd (fadd z, w), y), x) -> (fadd (fadd z, w), (fadd x, y))

You could even use named operands like this:
(fadd N0: (fadd N00: (fadd z, w), N01: y), N1: x) -> (fadd N00: (fadd z, w), (fadd N1: x, M01: y))

7666 ↗(On Diff #24326)

You can move this assignment into the next if.

test/CodeGen/X86/fp-fast.ll
124 ↗(On Diff #24326)

Can’t you be more specific on the input registers?
With a pattern like this, I believe even the old inefficient sequence would match, wouldn’t it?

spatel added inline comments.Apr 28 2015, 11:27 AM
test/CodeGen/X86/fp-fast.ll
124 ↗(On Diff #24326)

Hi Quentin,

Thanks for reviewing this patch. I don't think we can be more specific on the inputs: we know that xmm0 - xmm3 are the input registers, but the order of the operands as well as the order of the first two adds may be commuted (not by this patch, but some future patch)?

I made sure that the last check will not work without this patch. It requires that the outputs of the first two adds are inputs to the third add. This final add check is actually too specific because it fixes the order of the operands. I tried every regex combo that I could think of to make that more flexible, but couldn't get anything to work with FileCheck.

qcolombet added inline comments.Apr 28 2015, 11:39 AM
test/CodeGen/X86/fp-fast.ll
124 ↗(On Diff #24326)

Let stick to the current order of the operands. If they change we can fix them.

Anyhow, this is not a bit deal, I was just not sure, it wouldn’t match the old sequence :).

spatel added inline comments.Apr 28 2015, 1:29 PM
test/CodeGen/X86/fp-fast.ll
124 ↗(On Diff #24326)

Ok - if we're not too concerned with flexibility of the match, it becomes easy. :)
I can just use update_llc_test_checks.py for the exact match.

spatel updated this revision to Diff 24574.Apr 28 2015, 1:34 PM

Patch updated:

  1. Fixed fold comment to match code
  2. Moved variable declaration closer to use
  3. Made test CHECK lines match expected output exactly
qcolombet accepted this revision.Apr 28 2015, 1:49 PM
qcolombet edited edge metadata.

Thanks Sanjay!

LGTM.

Q.

This revision is now accepted and ready to land.Apr 28 2015, 1:49 PM
This revision was automatically updated to reflect the committed changes.