This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Add target option to bail scalarization
Needs RevisionPublic

Authored by hgreving on Jun 9 2023, 6:04 PM.

Details

Summary

Adds a TTI option to consider scalarization unprofitable for
targets that prefer vector code to stay vectorized.

Diff Detail

Event Timeline

hgreving created this revision.Jun 9 2023, 6:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 9 2023, 6:04 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
hgreving requested review of this revision.Jun 9 2023, 6:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 9 2023, 6:04 PM
nikic requested changes to this revision.Jun 10 2023, 12:23 AM

InstCombine is a target-independent canonicalization pass. The use of TTI hooks is forbidden as a matter of policy.

If there is no subset of scalarization that is universally profitable, then the transform should be moved into VectorCombine, which is cost-model driven.

llvm/include/llvm/Transforms/InstCombine/InstCombiner.h
49

Read the comment.

This revision now requires changes to proceed.Jun 10 2023, 12:23 AM

InstCombine is a target-independent canonicalization pass. The use of TTI hooks is forbidden as a matter of policy.

If there is no subset of scalarization that is universally profitable, then the transform should be moved into VectorCombine, which is cost-model driven.

I expected this kind of reply, I felt though that TTI is already used in some cases. If it's strictly target independent, I don't understand why we would scalarize in instcombine in the first place, since this is very unlikely to be target agnostic. Would you be ok with a switch (probably not)? The only problem with moving it to vector-combine is that besides scalarize phi, it seems entangled with extract_elt transforms that _are_ target independent, WDYT?

nikic added a comment.Jun 13 2023, 5:39 AM

InstCombine is a target-independent canonicalization pass. The use of TTI hooks is forbidden as a matter of policy.

If there is no subset of scalarization that is universally profitable, then the transform should be moved into VectorCombine, which is cost-model driven.

I expected this kind of reply, I felt though that TTI is already used in some cases.

TTI is only used for handling of target intrinsics.

If it's strictly target independent, I don't understand why we would scalarize in instcombine in the first place, since this is very unlikely to be target agnostic.

Probably just an old transform that used to be universally profitable at the time.

Would you be ok with a switch (probably not)? The only problem with moving it to vector-combine is that besides scalarize phi, it seems entangled with extract_elt transforms that _are_ target independent, WDYT?

The patch description has no information on what problem you're actually trying to solve, so it's hard to give meaningful advice here. Is any kind of scalarization problematic for your target, or is it something more specific, e.g. an expensive variable extract from a binop being converted into two expensive variable extracts?

hgreving added a comment.EditedJun 13 2023, 7:16 AM

InstCombine is a target-independent canonicalization pass. The use of TTI hooks is forbidden as a matter of policy.

If there is no subset of scalarization that is universally profitable, then the transform should be moved into VectorCombine, which is cost-model driven.

I expected this kind of reply, I felt though that TTI is already used in some cases.

TTI is only used for handling of target intrinsics.

If it's strictly target independent, I don't understand why we would scalarize in instcombine in the first place, since this is very unlikely to be target agnostic.

Probably just an old transform that used to be universally profitable at the time.

Would you be ok with a switch (probably not)? The only problem with moving it to vector-combine is that besides scalarize phi, it seems entangled with extract_elt transforms that _are_ target independent, WDYT?

The patch description has no information on what problem you're actually trying to solve, so it's hard to give meaningful advice here. Is any kind of scalarization problematic for your target, or is it something more specific, e.g. an expensive variable extract from a binop being converted into two expensive variable extracts?

%0 = mul <N x i32> %x, <splat vector>
%1 = extractelement <N x i32> %m, i32 %i
%2 = insertelement <N x i32> poison, i32 %1, i32 0
%3 = shufflevector <N x i32> %2, <N x i32> poison, <N x i32> zeroinitializer

instcombine scalarizes this (extract + scalar mul strength reduced to shl + insert + shuffle). Our target does not like the vector to scalar code as a result, it is not "cheap". Another way to fix this is to re-vectorize the code later, but my preference was to avoid scalarizing like this in the first place. I suppose adding a switch to instcombine is not really an option (I don't like it)?