This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Introduce isLegalVectorOp to check if the vector instruction is going to be scalarized.
Needs ReviewPublic

Authored by ABataev on Jul 7 2023, 12:12 PM.

Details

Reviewers
RKSimon
reames
Summary

If the going-to-be-generated vector instruction is going to be
scalarized, not need to try to vectorize it. Instead, better to generate
buildvector/gather node and try-to-vectorize operands independently. It
shall improve vectorization quality/performance.

Diff Detail

Event Timeline

ABataev created this revision.Jul 7 2023, 12:12 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 7 2023, 12:12 PM
ABataev requested review of this revision.Jul 7 2023, 12:12 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 7 2023, 12:12 PM
RKSimon added inline comments.Jul 10 2023, 4:13 AM
llvm/include/llvm/CodeGen/BasicTTIImpl.h
355

getTypeBasedIntrinsicInstrCost already has a similar IntrinsicID->ISD conversion - merge them?

379

ISD::FCOS?

RKSimon retitled this revision from [SLP]Introduce isValiVectorOp to check if the vector instruction is going to be scalarized. to [SLP]Introduce isLegalVectorOp to check if the vector instruction is going to be scalarized..Jul 10 2023, 4:14 AM
ABataev updated this revision to Diff 538647.Jul 10 2023, 8:00 AM

Address comments, rebase

arsenm added a subscriber: arsenm.Jul 10 2023, 8:33 AM
arsenm added inline comments.
llvm/test/Transforms/SLPVectorizer/AMDGPU/packed-math.ll
5

This fixme was fixed but has now been unfixed

ABataev added inline comments.Jul 10 2023, 9:36 AM
llvm/test/Transforms/SLPVectorizer/AMDGPU/packed-math.ll
5

Hmm, I checked VI tests and loooks like fmul <2 x half> tests are scalarized in the end (https://godbolt.org/z/hPcjoETWT). So, it does not worth it to vectorize them.

I don't understand why this patch is in terms of *legality*. The lowering choices should already be reflected in the *cost* returned for these operations. Why do we need to directly access the lowering choices rather than using the existing cost based APIs?

In particular, if an intermediate sub-tree is illegal, but the cost of lowering is low enough, then vectorizing the entire tree (if the total cost accounting shows that being profitable) is the right answer. You're spending a huge amount of additional work to reclaim that ability by hard failing the vectorization.

It's not really clear to me what your motivating case here is, but I suspect you've got an inaccurate cost model for one of the illegal operations. I'd strongly prefer fixing that rather than taking this approach.

I don't understand why this patch is in terms of *legality*. The lowering choices should already be reflected in the *cost* returned for these operations. Why do we need to directly access the lowering choices rather than using the existing cost based APIs?

Cost estimation works too late and affects th whole vectorization result. This patch prevents building of the nodes, which should not be tried for the vectorization at all, as later they are going to be scalarized. Cost model does not help here at all.

In particular, if an intermediate sub-tree is illegal, but the cost of lowering is low enough, then vectorizing the entire tree (if the total cost accounting shows that being profitable) is the right answer. You're spending a huge amount of additional work to reclaim that ability by hard failing the vectorization.

No. It prevents building the vector nodes, which will be scalarized and should be gather/buildvector nodes instead. If the operands could be vectorized for some reason, we need to vectorize it. Otherwise we actually may miss vectorization of some trees because of too high cost of those scalarized nodes.

It's not really clear to me what your motivating case here is, but I suspect you've got an inaccurate cost model for one of the illegal operations. I'd strongly prefer fixing that rather than taking this approach.

Again, no, it is not the cost, it is the check for legality here.

RKSimon added inline comments.Jul 31 2023, 9:28 AM
llvm/test/Transforms/SLPVectorizer/X86/sin-sqrt.ll
2–3

do we have a fveclib that we can add test coverage for that will vectorize sin calls?

ABataev added inline comments.Aug 1 2023, 6:09 AM
llvm/test/Transforms/SLPVectorizer/X86/sin-sqrt.ll
2–3

There is AArch64/accelerate-vector-functions-inseltpoison.ll for AArch64, will add SVML for x86

ABataev updated this revision to Diff 551579.Aug 18 2023, 11:22 AM

Rebase, ping!

Matt added a subscriber: Matt.Aug 31 2023, 1:12 PM

I wonder if we'd be better off just adding a isScalarized flag (and maybe a VF value?) member variable inside InstructionCost to track when the cost was scalarized, what do you think?

I wonder if we'd be better off just adding a isScalarized flag (and maybe a VF value?) member variable inside InstructionCost to track when the cost was scalarized, what do you think?

It may work too. Will you try to make a patch? Or you prefer me to make it?

Yes, I'm happy to investigate this - it crosses over with another issue we're starting to hit that we need better methods to compare VF * scalar_cost vs vector_cost for cases where the throughput costs are below 1.0 (but clamped to 1)

How dependent are you other patches on getting this patch in?

Yes, I'm happy to investigate this - it crosses over with another issue we're starting to hit that we need better methods to compare VF * scalar_cost vs vector_cost for cases where the throughput costs are below 1.0 (but clamped to 1)

How dependent are you other patches on getting this patch in?

I don't have dependancy, this was just a finding after the discussion in another patch from Philip Reames