This is an archive of the discontinued LLVM Phabricator instance.

[SLP][COST][X86]Improve cost model for masked gather.
ClosedPublic

Authored by ABataev on Jun 28 2021, 10:39 AM.

Details

Summary

Revived D101297 in its original form + added some changes in X86
legalization cehcking for masked gathers.

This solution is the most stable and the most correct one. We have to
check the legality before trying to build the masked gather in SLP.
Without this check we have incorrect cost (for SLP) in case if the masked gather
is not legal/slower than the gather. And we're missing some
vectorization opportunities.

This can be fixed in the cost model, but in this case we need to add
special checks for the cost of GEPs for ScatterVectorize node, add
special check for small trees, etc., i.e. there are a lot of corner
cases here and there, which insrease code base and make it harder to
maintain the code.

Can't we rely on cost model to deal with this? This can be profitable for futher vectorization, when we can start from such gather loads as seed.

The question from D101297. Actually, no, it can't. Actually, simple
gather may give us better result, especially after we started
vectorization of insertelements. Plus, like I said before, the cost for
non-legal masked gathers leads to missed vectorization opportunities.

Diff Detail

Event Timeline

ABataev created this revision.Jun 28 2021, 10:39 AM
ABataev requested review of this revision.Jun 28 2021, 10:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 28 2021, 10:39 AM
anton-afanasyev added inline comments.Jul 6 2021, 3:33 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3007

Maybe it's worth to use CommonAlignment for getEntryCost() as well? This patch looks good one to take this code from D57059.

RKSimon added inline comments.Jul 7 2021, 6:05 AM
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
4651

I'm really not sure about this - by moving this into isLegalMaskedGather we're affecting codegen as well as IR.

While we agreed that we'd avoid this on AVX2 targets (where legal gathers are actually REALLY bad), more minor cost differences on AVX512 targets its not so clear.

ABataev added inline comments.Jul 7 2021, 6:29 AM
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
4651

I think this change is ok. Actually, we saw perf degradations with masked gathers even for AVX512 and had to change to use only for at least 8 elements, even for 4 elements simple gathering might be more profitable than the masked gathering. I would agree, that this requires some extra fine-tuning but I believe this kind of check should be in legality rather than in the cost. Without this change, we return the cost of loads+gathering, which is not the cost of the MaskedGather op. And I think this is good for codegen too since it relies on the legality check when chooses between masked gathering and simple gathering.

ABataev updated this revision to Diff 356943.Jul 7 2021, 7:00 AM

Address comment

RKSimon accepted this revision.Jul 8 2021, 9:18 AM

LGTM - if you improve the VLX codegen test coverage this should be OK, cheers

llvm/test/CodeGen/X86/masked_gather.ll
5

Please can you add AVX512VL coverage:

; RUN: llc -mtriple=x86_64-unknown-linux-gnu  -mattr=+avx512f < %s | FileCheck %s --check-prefixes=AVX512,AVX512F
; RUN: llc -mtriple=x86_64-unknown-linux-gnu  -mattr=+avx512f,+avx512vl < %s | FileCheck %s --check-prefixes=AVX512,AVX512VL
This revision is now accepted and ready to land.Jul 8 2021, 9:18 AM

LGTM - if you improve the VLX codegen test coverage this should be OK, cheers

Sure, will add.

This revision was automatically updated to reflect the committed changes.
vtjnash added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4080

Was this supposed to be CommonAlignment here? It seems like that added computation is currently unused

ABataev added inline comments.Jul 22 2021, 2:58 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4080

Yes, missed it, will fix, thanks!