If the vector instruction that would be generated is going to be scalarized during lowering, there is no need to try to vectorize it. Instead, it is better to generate a buildvector/gather node and try to vectorize the operands independently. This should improve vectorization quality/performance.
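A minimal sketch of the legality check described above, assuming LLVM's TargetLowering interface; the helper name `willBeScalarized` is hypothetical, while `InstructionOpcodeToISD`, `getValueType`, and `isOperationLegalOrCustom` are existing TargetLoweringBase methods:

```cpp
#include "llvm/CodeGen/TargetLowering.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"

using namespace llvm;

// Hypothetical helper: returns true if the target would expand (scalarize)
// the vector form of \p Opcode on \p VecTy, in which case the SLP tree
// should get a gather/buildvector node instead of a vectorized node.
static bool willBeScalarized(const TargetLowering &TLI, unsigned Opcode,
                             FixedVectorType *VecTy, const DataLayout &DL) {
  int ISDOpcode = TLI.InstructionOpcodeToISD(Opcode);
  if (!ISDOpcode)
    return false; // No ISD mapping; be conservative and allow vectorization.
  EVT VT = TLI.getValueType(DL, VecTy, /*AllowUnknown=*/true);
  if (!VT.isSimple())
    return false;
  // Legal and custom-lowered operations produce real vector code; anything
  // marked Expand is scalarized by the legalizer.
  return !TLI.isOperationLegalOrCustom(ISDOpcode, VT);
}
```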
Event Timeline
llvm/test/Transforms/SLPVectorizer/AMDGPU/packed-math.ll:5
This FIXME was fixed but has now been unfixed.
llvm/test/Transforms/SLPVectorizer/AMDGPU/packed-math.ll:5
Hmm, I checked the VI tests and it looks like the fmul <2 x half> tests are scalarized in the end (https://godbolt.org/z/hPcjoETWT), so it is not worth vectorizing them.
I don't understand why this patch is in terms of *legality*. The lowering choices should already be reflected in the *cost* returned for these operations. Why do we need to directly access the lowering choices rather than using the existing cost based APIs?
In particular, if an intermediate sub-tree is illegal, but the cost of lowering is low enough, then vectorizing the entire tree (if the total cost accounting shows that being profitable) is the right answer. You're spending a huge amount of additional work to reclaim that ability by hard failing the vectorization.
It's not really clear to me what your motivating case here is, but I suspect you've got an inaccurate cost model for one of the illegal operations. I'd strongly prefer fixing that rather than taking this approach.
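For reference, the existing cost-based API being advocated here looks roughly like this; the wrapper function and cost-kind choice are illustrative assumptions, while `getArithmeticInstrCost` is the real TargetTransformInfo entry point:

```cpp
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Instruction.h"

using namespace llvm;

// Existing cost-based API: ask TTI what a vector fmul would cost. An illegal
// operation is not rejected outright; it simply comes back with a higher
// (e.g. scalarization) cost that the tree-level accounting can weigh.
static InstructionCost getVectorFMulCost(const TargetTransformInfo &TTI,
                                         FixedVectorType *VecTy) {
  return TTI.getArithmeticInstrCost(Instruction::FMul, VecTy,
                                    TargetTransformInfo::TCK_RecipThroughput);
}
```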
Cost estimation happens too late and affects the whole vectorization result. This patch prevents building nodes that should not be tried for vectorization at all, since they are later going to be scalarized. The cost model does not help here at all.
> In particular, if an intermediate sub-tree is illegal, but the cost of lowering is low enough, then vectorizing the entire tree (if the total cost accounting shows that being profitable) is the right answer. You're spending a huge amount of additional work to reclaim that ability by hard failing the vectorization.
No. It prevents building vector nodes that will be scalarized and that should be gather/buildvector nodes instead. If the operands can be vectorized, we need to vectorize them on their own. Otherwise we may actually miss vectorization of some trees because of the too-high cost of those scalarized nodes.
> It's not really clear to me what your motivating case here is, but I suspect you've got an inaccurate cost model for one of the illegal operations. I'd strongly prefer fixing that rather than taking this approach.
Again, no: it is not about cost, it is a legality check.
llvm/test/Transforms/SLPVectorizer/X86/sin-sqrt.ll:2
Do we have a fveclib that we can add test coverage for that will vectorize sin calls?
llvm/test/Transforms/SLPVectorizer/X86/sin-sqrt.ll:2
There is AArch64/accelerate-vector-functions-inseltpoison.ll for AArch64; I will add SVML coverage for x86.
I wonder if we'd be better off just adding an isScalarized flag (and maybe a VF value?) as a member variable inside InstructionCost to track when the cost was scalarized. What do you think?
Yes, I'm happy to investigate this - it crosses over with another issue we're starting to hit: we need better methods to compare VF * scalar_cost vs vector_cost for cases where the throughput costs are below 1.0 (but clamped to 1).
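A rough sketch of how such a flag might look; this is a hypothetical, simplified mock-up, not the real class from llvm/Support/InstructionCost.h, and the Scalarized/ScalarVF members are invented for illustration:

```cpp
#include <cstdint>
#include <optional>

// Hypothetical extension of llvm::InstructionCost, reduced to the fields
// relevant here. Only Scalarized and ScalarVF are new, made-up members that
// would let callers compare VF * scalar_cost against a true vector_cost.
class InstructionCost {
  using CostType = int64_t;
  enum CostState { Valid, Invalid };

  CostType Value = 0;
  CostState State = Valid;
  bool Scalarized = false;          // cost came from scalarization
  std::optional<unsigned> ScalarVF; // VF used when scalarizing

public:
  void setScalarized(unsigned VF) {
    Scalarized = true;
    ScalarVF = VF;
  }
  bool isScalarized() const { return Scalarized; }
  std::optional<unsigned> getScalarizationVF() const { return ScalarVF; }
};
```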
How dependent are your other patches on getting this patch in?
There is no dependency; this was just a finding after the discussion with Philip Reames in another patch.
getTypeBasedIntrinsicInstrCost already has a similar IntrinsicID->ISD conversion - merge them?
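For context, a condensed sketch of the kind of IntrinsicID->ISD mapping being referred to; the helper name is hypothetical, and the real switch in getTypeBasedIntrinsicInstrCost covers far more intrinsics:

```cpp
#include "llvm/CodeGen/ISDOpcodes.h"
#include "llvm/IR/Intrinsics.h"

using namespace llvm;

// Condensed IntrinsicID -> ISD mapping of the kind duplicated between the
// cost model and this patch; returning 0 means "no direct ISD equivalent".
static int getISDForIntrinsic(Intrinsic::ID IID) {
  switch (IID) {
  case Intrinsic::sqrt:   return ISD::FSQRT;
  case Intrinsic::sin:    return ISD::FSIN;
  case Intrinsic::cos:    return ISD::FCOS;
  case Intrinsic::fma:    return ISD::FMA;
  case Intrinsic::minnum: return ISD::FMINNUM;
  case Intrinsic::maxnum: return ISD::FMAXNUM;
  default:
    return 0;
  }
}
```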