This is an archive of the discontinued LLVM Phabricator instance.

Combine fmul vector FP constants when unsafe math is allowed
ClosedPublic

Authored by spatel on Sep 8 2014, 6:09 PM.

Details

Summary

This is an extension of the change made with r215820:
http://llvm.org/viewvc/llvm-project?view=revision&revision=215820

That patch allowed combining of splatted vector FP constants that are multiplied.

This patch allows combining non-uniform vector FP constants too by relaxing the check on the type of vector. I've also tried to canonicalize a vector fmul in the same way that we already do for scalars - if only one operand of the fmul is a constant, make it operand 1. Otherwise, we miss potential folds.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 13432.Sep 8 2014, 6:09 PM
spatel retitled this revision from to Combine fmul vector FP constants when unsafe math is allowed.
spatel updated this object.
spatel edited the test plan for this revision. (Show Details)
spatel added a reviewer: arsenm.
spatel added a subscriber: Unknown Object (MLST).

Hi Sanjay,

Thanks for the patch.
I have a couple of comments (see below):

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
6812–6814 ↗(On Diff #13432)

This is ok. However, it might be better to check that N0 is a build vector of all constants before commuting the operands of the FMUL. This is to avoid that we trigger the canonicalization for nodes where N0 is a build vector, but not all elements in N0 are constants or undef.

6835–6846 ↗(On Diff #13432)

I think we probably don't need the 'hasOneUse()' constraint here.

in the worst case scenario, we still have two fmul instructions; that is because the new (fmul c1, c2) would be always constant folded.
Also, if you remove the 'hasOneUse()' constraint, every time the dag combiner triggers your new rule, the number of uses for the 'fmul x, c1' is decreased by one.
This would have a positive effect on the following code example:

define <4 x float> @foo(<4 x float> %A, <4 x float> %B) {
  %1 = fmul <4 x float> %A, <float 3.0, float 4.0, float 5.0, float 6.0>
  %2 = fmul <4 x float> %1, <float 1.0, float 2.0, float 3.0, float 4.0>
  %3 = fmul <4 x float> %1, <float 2.0, float 3.0, float 4.0, float 5.0>
  %4 = fadd <4 x float> %2, %3
  ret <4 x float> %4
}

Without the 'hasOneUse()' check, we only get two (v)mulps. With the 'hasOneUse()' we instead get three mulps.

spatel updated this revision to Diff 13497.Sep 9 2014, 2:16 PM

Thanks very much for the prompt review, Andrea!

I agree with both of your comments. Please see the updated patch which:

  1. Confirms that a build vector is a constant before canonicalizing.
  2. Removes the hasOneUse() restriction on the fold.
  3. Adds a minimal test case for the multiple use scenario; it has the same number of fmuls, but we can confirm that the optimization occurs by checking the constant pool values.
andreadb accepted this revision.Sep 9 2014, 3:59 PM
andreadb edited edge metadata.

LGTM. Thanks!

This revision is now accepted and ready to land.Sep 9 2014, 3:59 PM

I'm now wondering if this patch is necessary. I'd rather not bloat up DAGCombiner any more if there's no reason to.

The constant reassociations in all 3 of the test cases that I created here are already done by -instcombine if 'fast' is specified on each fmul inst. And in http://reviews.llvm.org/D5222, I've said that 'fast' will be specified for any function produced by clang with -ffast-math.

Matt, what was the motivation for r215820? Is there some scenario you were seeing where we can't do these folds in -instcombine?

spatel closed this revision.Sep 11 2014, 8:55 AM
spatel updated this revision to Diff 13590.

Closed by commit rL217599 (authored by @spatel).

Thanks all - checked in at r217599.