Adds a TTI option to consider scalarization unprofitable for
targets that prefer vector code to stay vectorized.
Details
- Reviewers
majnemer jmolloy ThomasRaoux nikic
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
InstCombine is a target-independent canonicalization pass. The use of TTI hooks is forbidden as a matter of policy.
If there is no subset of scalarization that is universally profitable, then the transform should be moved into VectorCombine, which is cost-model driven.
llvm/include/llvm/Transforms/InstCombine/InstCombiner.h | ||
---|---|---|
49 | Read the comment. |
I expected this kind of reply, I felt though that TTI is already used in some cases. If it's strictly target independent, I don't understand why we would scalarize in instcombine in the first place, since this is very unlikely to be target agnostic. Would you be ok with a switch (probably not)? The only problem with moving it to vector-combine is that besides scalarize phi, it seems entangled with extract_elt transforms that _are_ target independent, WDYT?
TTI is only used for handling of target intrinsics.
If it's strictly target independent, I don't understand why we would scalarize in instcombine in the first place, since this is very unlikely to be target agnostic.
Probably just an old transform that used to be universally profitable at the time.
Would you be ok with a switch (probably not)? The only problem with moving it to vector-combine is that besides scalarize phi, it seems entangled with extract_elt transforms that _are_ target independent, WDYT?
The patch description has no information on what problem you're actually trying to solve, so it's hard to give meaningful advice here. Is any kind of scalarization problematic for your target, or is it something more specific, e.g. an expensive variable extract from a binop being converted into two expensive variable extracts?
%0 = mul <N x i32> %x, <splat vector>
%1 = extractelement <N x i32> %m, i32 %i
%2 = insertelement <N x i32> poison, i32 %1, i32 0
%3 = shufflevector <N x i32> %2, <N x i32> poison, <N x i32> zeroinitializer
instcombine scalarizes this (extract + scalar mul strength reduced to shl + insert + shuffle). Our target does not like the vector to scalar code as a result, it is not "cheap". Another way to fix this is to re-vectorize the code later, but my preference was to avoid scalarizing like this in the first place. I suppose adding a switch to instcombine is not really an option (I don't like it)?
Read the comment.