This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Fixed cost model for horizontal reduction.
ClosedPublic

Authored by ABataev on Nov 3 2016, 4:27 AM.

Details

Summary

Currently when cost of scalar operations is evaluated the vector type is used for scalar operations. Patch fixes this issue and fixes evaluation of the vector operations cost.

Diff Detail

Repository
rL LLVM

Event Timeline

ABataev updated this revision to Diff 76836.Nov 3 2016, 4:27 AM
ABataev retitled this revision from to [SLP] Fixed cost model for horizontal reduction..
ABataev updated this object.
ABataev added reviewers: hfinkel, mkuper, mzolotukhin.
ABataev added subscribers: llvm-commits, spatel, RKSimon.
mkuper edited edge metadata.Nov 5 2016, 11:52 PM

Thanks for catching this Alexey, computing the scalar reduction cost with the vector type does look very odd.

include/llvm/CodeGen/BasicTTIImpl.h
928 ↗(On Diff #76836)

The computation is getting pretty hairy. Could you please add comments explaining what exactly it computes?

I understand you're trying to model the reduction steps performed on illegal vectors and steps performed on legal steps separately, but I'm not sure this computes precisely what we want.

932 ↗(On Diff #76836)

Maybe save static_cast<T *>(this) in a temporary, for readability's sake, now that you're adding more uses?

934 ↗(On Diff #76836)

I counts the number of illegal reduction steps, right?
Maybe give it a more meaningful name? It's not a regular loop counter where it's obvious what it counts up to.

ABataev marked 3 inline comments as done.Nov 9 2016, 5:59 AM
ABataev added inline comments.
include/llvm/CodeGen/BasicTTIImpl.h
928 ↗(On Diff #76836)

Added several lines of comments trying to explain the new code.

934 ↗(On Diff #76836)

Changed to while loop.

ABataev updated this revision to Diff 77338.Nov 9 2016, 6:01 AM
ABataev edited edge metadata.
ABataev marked 2 inline comments as done.

Updated after Michael's comments

mkuper added inline comments.Nov 13 2016, 11:56 AM
include/llvm/CodeGen/BasicTTIImpl.h
966 ↗(On Diff #77338)

ThisT -> ConcreteTTI, or something conveying the same information?
(That's what this actually is, right?)

983 ↗(On Diff #77338)

ont eh -> on the

988 ↗(On Diff #77338)

Why the +1?
If the type is legal, LongVectorsCount is 0, Ty doesn't change, so this changes the behavior from multiplying by NumReduxLevel to NumReduxLevel + 1. Was the previous version wrong in that respect?

ABataev updated this revision to Diff 78386.Nov 17 2016, 10:35 AM
ABataev marked 3 inline comments as done.

Fixed after comments from Michael

So, now, none of the costs on the tests actually change?
Does that mean that the changes in costs in the previous versions came from the +1?

Can you add a test that demonstrates the cost change? (Preferably in a way that shows what happened - e.g. commit a test with the "bad" cost, and then have a diff with the good one).

ABataev updated this revision to Diff 78549.Nov 18 2016, 9:45 AM

Fixed some issues in previous implementation. More correct cost calculation

So, now, none of the costs on the tests actually change?
Does that mean that the changes in costs in the previous versions came from the +1?

Can you add a test that demonstrates the cost change? (Preferably in a way that shows what happened - e.g. commit a test with the "bad" cost, and then have a diff with the good one).

Changed a code a little bit. Checked it, the calculated cost is very close to the real situation, but seems to me the BoUpSLP tree cost is too optimistic. Will look at it later.

So, now, none of the costs on the tests actually change?
Does that mean that the changes in costs in the previous versions came from the +1?

Can you add a test that demonstrates the cost change? (Preferably in a way that shows what happened - e.g. commit a test with the "bad" cost, and then have a diff with the good one).

Changed a code a little bit. Checked it, the calculated cost is very close to the real situation, but seems to me the BoUpSLP tree cost is too optimistic. Will look at it later.

I'm sorry, but I'm still confused.
The original rationale for this patch was that the vector cost model is too optimistic, but the only test change seems to show the cost model becoming even more optimistic.

Michael, added several test cases in r287801. Checked the test after my changes - no changes is found, the result is still the same as before this patch with fixes

ABataev updated this revision to Diff 79139.Nov 23 2016, 1:01 PM
ABataev updated this object.

Updated to the latest version

The point I'm trying to make is that there's still no adequate test coverage for this patch.

Could you please split this into two patches:

  1. A patch that contains line 4286, and a test where the behavior actually changes due to that line.
  1. A patch that contains getReductionCost() and a test where the behavior changes due to those changes. I assume (a) the change in reduction.ll you have in this patch is due to this part, and (b) 11 is a better cost here than 17, but it seems like <8 x i32> should be legal on AVX2, and illegal on SSE/AVX, so I wonder why we get the same cost for both case.

Would a smaller fix achieve the same result? If so, can you add a test that demonstrates where the logic you've added for illegal types matters.

The point I'm trying to make is that there's still no adequate test coverage for this patch.

Could you please split this into two patches:

  1. A patch that contains line 4286, and a test where the behavior actually changes due to that line.
  1. A patch that contains getReductionCost() and a test where the behavior changes due to those changes. I assume (a) the change in reduction.ll you have in this patch is due to this part, and (b) 11 is a better cost here than 17, but it seems like <8 x i32> should be legal on AVX2, and illegal on SSE/AVX, so I wonder why we get the same cost for both case.

Would a smaller fix achieve the same result? If so, can you add a test that demonstrates where the logic you've added for illegal types matters.

  1. Michael, I don't think this is a good idea. Actually, all these changes are required for cost model fix. I removed all optimizations from the patch, now it is a pure bug fix.
  2. Yes, the patch may reduce the cost of vector operations, but also it reduces the cost of scalar operations because of use of ScalarTy instead of VectorTy.

Currently, this kind of reduction is also allowed. The vector cost of this operation is 17, but the scalar cost is 16 (because of using of VectorTy). With this patch, the vector cost is 11, but the scalar cost will be 7, so the difference even better than in the original code.

mkuper accepted this revision.Nov 30 2016, 10:09 AM
mkuper edited edge metadata.

Ok, now I see, in this case it doesn't make sense to separate the two.
(Sorry for the response time, I was on vacation)

LGTM, except that I would still like to see a test for the scalar cost as part of the final patch you commit. Having the test to begin with (showing 17 -> 11 and 16 -> 7 together) would have saved us some miscommunication.

This revision is now accepted and ready to land.Nov 30 2016, 10:09 AM
This revision was automatically updated to reflect the committed changes.