Currently when cost of scalar operations is evaluated the vector type is used for scalar operations. Patch fixes this issue and fixes evaluation of the vector operations cost.
Details
Diff Detail
Event Timeline
Thanks for catching this Alexey, computing the scalar reduction cost with the vector type does look very odd.
include/llvm/CodeGen/BasicTTIImpl.h | ||
---|---|---|
928โ929 | The computation is getting pretty hairy. Could you please add comments explaining what exactly it computes? I understand you're trying to model the reduction steps performed on illegal vectors and steps performed on legal steps separately, but I'm not sure this computes precisely what we want. | |
932 | Maybe save static_cast<T *>(this) in a temporary, for readability's sake, now that you're adding more uses? | |
934 | I counts the number of illegal reduction steps, right? |
include/llvm/CodeGen/BasicTTIImpl.h | ||
---|---|---|
966 | ThisT -> ConcreteTTI, or something conveying the same information? | |
983 | ont eh -> on the | |
988 | Why the +1? |
So, now, none of the costs on the tests actually change?
Does that mean that the changes in costs in the previous versions came from the +1?
Can you add a test that demonstrates the cost change? (Preferably in a way that shows what happened - e.g. commit a test with the "bad" cost, and then have a diff with the good one).
Changed a code a little bit. Checked it, the calculated cost is very close to the real situation, but seems to me the BoUpSLP tree cost is too optimistic. Will look at it later.
I'm sorry, but I'm still confused.
The original rationale for this patch was that the vector cost model is too optimistic, but the only test change seems to show the cost model becoming even more optimistic.
Michael, added several test cases in r287801. Checked the test after my changes - no changes is found, the result is still the same as before this patch with fixes
The point I'm trying to make is that there's still no adequate test coverage for this patch.
Could you please split this into two patches:
- A patch that contains line 4286, and a test where the behavior actually changes due to that line.
- A patch that contains getReductionCost() and a test where the behavior changes due to those changes. I assume (a) the change in reduction.ll you have in this patch is due to this part, and (b) 11 is a better cost here than 17, but it seems like <8 x i32> should be legal on AVX2, and illegal on SSE/AVX, so I wonder why we get the same cost for both case.
Would a smaller fix achieve the same result? If so, can you add a test that demonstrates where the logic you've added for illegal types matters.
- Michael, I don't think this is a good idea. Actually, all these changes are required for cost model fix. I removed all optimizations from the patch, now it is a pure bug fix.
- Yes, the patch may reduce the cost of vector operations, but also it reduces the cost of scalar operations because of use of ScalarTy instead of VectorTy.
Currently, this kind of reduction is also allowed. The vector cost of this operation is 17, but the scalar cost is 16 (because of using of VectorTy). With this patch, the vector cost is 11, but the scalar cost will be 7, so the difference even better than in the original code.
Ok, now I see, in this case it doesn't make sense to separate the two.
(Sorry for the response time, I was on vacation)
LGTM, except that I would still like to see a test for the scalar cost as part of the final patch you commit. Having the test to begin with (showing 17 -> 11 and 16 -> 7 together) would have saved us some miscommunication.
The computation is getting pretty hairy. Could you please add comments explaining what exactly it computes?
I understand you're trying to model the reduction steps performed on illegal vectors and steps performed on legal steps separately, but I'm not sure this computes precisely what we want.