The cost of splitting a large vector instruction was not being taken into account by the getUserCost function, which led to some loops being over-unrolled. The cost of a vector instruction is now multiplied by the cost of its type legalization, which gives a more accurate cost.
lib/Target/PowerPC/PPCTargetTransformInfo.cpp | ||
---|---|---|
194–197 | I have concerns about this being Power only, and about which costs we're applying this to, etc. Thoughts? |
lib/Target/PowerPC/PPCTargetTransformInfo.cpp | ||
---|---|---|
194–197 | It could be done for all platforms, but I was worried that it would hurt some tuning that I wasn't able to test. This should only affect cases where there are vector instructions in the IR that are wider than the machine can handle. In those cases, getUserCost will now give a more accurate description of the cost. For example, if there is a 16-wide add of i32s, getUserCost will now return a value of 4 instead of 1, since it will eventually be split into 4 instructions on a machine that only has 128-bit-wide vector instructions. |
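To make the arithmetic in the example above concrete, here is a minimal standalone sketch. It is not the code from the patch: `splitFactor` and `scaledCost` are hypothetical names, and the split count is computed directly from bit widths as a stand-in for whatever the target's type legalization reports.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): how many legal-width vector
// registers are needed to hold NumElts elements of EltBits bits each.
// A vector that already fits in one register yields a factor of 1.
constexpr std::uint64_t splitFactor(std::uint64_t NumElts,
                                    std::uint64_t EltBits,
                                    std::uint64_t VecRegBits) {
  return NumElts * EltBits <= VecRegBits
             ? 1
             : (NumElts * EltBits + VecRegBits - 1) / VecRegBits; // round up
}

// Hypothetical helper: scale a per-instruction base cost by the number of
// pieces the instruction is assumed to be split into during legalization.
constexpr std::uint64_t scaledCost(std::uint64_t BaseCost,
                                   std::uint64_t NumElts,
                                   std::uint64_t EltBits,
                                   std::uint64_t VecRegBits) {
  return BaseCost * splitFactor(NumElts, EltBits, VecRegBits);
}
```

Under these assumptions, `scaledCost(1, 16, 32, 128)` evaluates to 4 for the 16-wide i32 add, while a 4-wide i32 add that already fits in a 128-bit register keeps its base cost of 1.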
I guess the number of instructions is rather hard-coded in. I'd definitely like to see this across other targets, but it works for now.
Thanks!
-eric