This is an archive of the discontinued LLVM Phabricator instance.

[VectorCombine] make cost calc consistent for binops and cmps
ClosedPublic

Authored by spatel on Feb 24 2020, 9:54 AM.

Details

Summary

Code duplication (subsequently removed by refactoring) allowed a logic discrepancy to creep in here.

We were being conservative about creating a vector binop -- but not a vector cmp -- in the case where a vector op has the same estimated cost as the scalar op. We want to be more aggressive here because that can allow other combines based on reduced instruction count/uses.

We can reverse the transform in DAGCombiner (potentially with a more accurate cost model) if this causes regressions.

AFAIK, this does not conflict with InstCombine. We have a scalarize transform there, but it relies on finding a constant operand or a matching insertelement, so that means it eliminates an extractelement from the sequence (so we won't have 2 extracts by the time we get here if InstCombine succeeds).

Diff Detail

Event Timeline

spatel created this revision.Feb 24 2020, 9:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 24 2020, 9:54 AM

We have the reverse transform with target hook in DAGCombiner already -- scalarizeExtractedBinop() -- so that provides a later opportunity to invert (potentially with a more accurate cost model).

I'm a little bit lost there.
Unless i'm not reading it right, even if the TLI.shouldScalarizeBinop(Vec) returns true,
the code only scalarizes if at least one of the vector operands is constant.

We have the reverse transform with target hook in DAGCombiner already -- scalarizeExtractedBinop() -- so that provides a later opportunity to invert (potentially with a more accurate cost model).

I'm a little bit lost there.
Unless i'm not reading it right, even if the TLI.shouldScalarizeBinop(Vec) returns true,
the code only scalarizes if at least one of the vector operands is constant.

Oops, I misread it - I saw an x86 FP example like:

define float @fadd_extract(<4 x float> %x, <4 x float> %y) {
  %a = fadd <4 x float> %x, %y
  %r = extractelement <4 x float> %a, i32 0
  ret float %r
}

...but that's handled by the x86-specific scalarizeExtEltFP().

I can try to make scalarizeExtractedBinop() stronger, but I'll need to find a target/example where it is a win (the extracts need to be free, or the vector op needs to be pretty expensive?). Or we just see if this causes wins or regressions for anyone as-is?

For reference, a motivating case for VectorCombine that I'm hoping to build up to (not there yet because we need a shuffle):
https://bugs.llvm.org/show_bug.cgi?id=42633

If we make the cost calc conservative, then we probably forego creating a vector op on that example for some targets even though it's a likely win.

Another possibility is that targets will need to have more complex cost modeling for extract/insert (D74976) - but that's probably not enough by itself to get all of the missed vectorizations we've seen.

We have the reverse transform with target hook in DAGCombiner already -- scalarizeExtractedBinop() -- so that provides a later opportunity to invert (potentially with a more accurate cost model).

I'm a little bit lost there.
Unless i'm not reading it right, even if the TLI.shouldScalarizeBinop(Vec) returns true,
the code only scalarizes if at least one of the vector operands is constant.

Oops, I misread it - I saw an x86 FP example like:

define float @fadd_extract(<4 x float> %x, <4 x float> %y) {
  %a = fadd <4 x float> %x, %y
  %r = extractelement <4 x float> %a, i32 0
  ret float %r
}

...but that's handled by the x86-specific scalarizeExtEltFP().

I can try to make scalarizeExtractedBinop() stronger, but I'll need to find a target/example where it is a win (the extracts need to be free, or the vector op needs to be pretty expensive?). Or we just see if this causes wins or regressions for anyone as-is?

I think that either scalarizeExtractedBinop() should actually be able to do the inverse transform,
or this patch's description shouldn't say scalarizeExtractedBinop() can do inverse transform.

We have the reverse transform with target hook in DAGCombiner already -- scalarizeExtractedBinop() -- so that provides a later opportunity to invert (potentially with a more accurate cost model).

I'm a little bit lost there.
Unless i'm not reading it right, even if the TLI.shouldScalarizeBinop(Vec) returns true,
the code only scalarizes if at least one of the vector operands is constant.

Oops, I misread it - I saw an x86 FP example like:

define float @fadd_extract(<4 x float> %x, <4 x float> %y) {
  %a = fadd <4 x float> %x, %y
  %r = extractelement <4 x float> %a, i32 0
  ret float %r
}

...but that's handled by the x86-specific scalarizeExtEltFP().

I can try to make scalarizeExtractedBinop() stronger, but I'll need to find a target/example where it is a win (the extracts need to be free, or the vector op needs to be pretty expensive?). Or we just see if this causes wins or regressions for anyone as-is?

I think that either scalarizeExtractedBinop() should actually be able to do the inverse transform,
or this patch's description shouldn't say scalarizeExtractedBinop() can do inverse transform.

Changing the description/comments is easy. :)
I'm willing to see if that works out. And if there are problems, I can adjust the DAGCombiner code.

spatel updated this revision to Diff 246275.Feb 24 2020, 12:08 PM
spatel edited the summary of this revision. (Show Details)
lebedev.ri accepted this revision.Feb 24 2020, 12:18 PM

SGTM, the difference wasn't really intentional on my side, i just kind-of not caught it.

This revision is now accepted and ready to land.Feb 24 2020, 12:18 PM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Feb 25 2020, 6:06 AM

Looks like this broke tests everywhere, eg http://45.33.8.238/linux/11165/step_7.txt

Can you take a look please?

Looks like this broke tests everywhere, eg http://45.33.8.238/linux/11165/step_7.txt

Can you take a look please?

Thanks. This hit an over-reaching clang test (again). Should be fixed with:
rG83f4372f3a70

This patch partly fixed a bug report that came in after approval:
https://bugs.llvm.org/show_bug.cgi?id=45015

I added a more complete test to detect that set of problems here:
rGf452f7b95a8c