Since there is no legal instruction for integer vector division, factor in the cost of singling out each element to be used with the scalar division instruction.
Details
Diff Detail
Event Timeline
This patch results in some image processing benchmarks to improve by over 1% on big Cortex and Exynos cores.
Makes sense to me. I tested this patch on Falkor and didn't notice any significant performance differences on the benchmarks I ran. Can we add a test case, though? Maybe in test/Analysis/CostModel?
For the test case, why not just run "-cost-model -analyze" like we do for the other tests instead of running the vectorizer? Am I missing something?
The test case has scalar types and it seems more interesting to see the cost rising proportionally with the vector factor.
You can't really see the costs of the divisions increasing, though. Why not follow the example in bswap.ll with something like:
define <2 x i32> @sdiv_v2i32(<2 x i32> %a, <2 x i32> %b) { ; CHECK: Found an estimated cost of 8 for instruction: %tmp = sdiv <2 x i32> %a, %b %tmp = sdiv <2 x i32> %a, %b ret <2 x i32> %tmp } define <4 x i32> @sdiv_v4i32(<4 x i32> %a, <4 x i32> %b) { ; CHECK: Found an estimated cost of 22 for instruction: %tmp = sdiv <4 x i32> %a, %b %tmp = sdiv <4 x i32> %a, %b ret <4 x i32> %tmp } ...
llvm/test/Analysis/CostModel/AArch64/div.ll | ||
---|---|---|
7 โ | (On Diff #137467) | Show the actual numbers instead of a regex? |
llvm/test/Analysis/CostModel/AArch64/div.ll | ||
---|---|---|
7 โ | (On Diff #137467) | I was afraid you were going to say that... ๐ |