This is an archive of the discontinued LLVM Phabricator instance.

[SLPVectorize] Call isLegalMaskedGather before creating a gather TreeEntry
Needs ReviewPublic

Authored by Carrot on Dec 4 2020, 3:27 PM.

Details

Summary

Function isLegalMaskedGather should be queried when vectorizing a gather/scatter intrinsic/instruction, like LoopVectorize does in function setCostBasedWideningDecision. But SLPVectorize doesn't do this check, so it can even generate gather intrinsics on targets that don't support gather instructions, like x86 processor supports SSE but no AVX512.

Diff Detail

Event Timeline

Carrot created this revision.Dec 4 2020, 3:27 PM
Carrot requested review of this revision.Dec 4 2020, 3:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 4 2020, 3:27 PM
Carrot added a reviewer: fhahn.
fhahn added a comment.Dec 5 2020, 2:42 AM

I think using isLegalMaskedGather is a workaround for a cost-model deficiencies. If the cost-model accurately estimates the cost of masked gathers & scatters, they would not be selected by the SLP vectorizer. I think it would be preferable to improve the cost modeling for gathers/scatters on X86 and remove the workaround in LoopVectorize as well. I recently added an initial estimate for gather & scatter costs to BasicTTI (D91984) and this resolved similar issues on AArch64. Would it be possible to adjust X86TTIImpl::getGatherScatterOpCost to more accurately estimate the cost, maybe by falling back to BasicTTI in cases not handled there yet?

https://bugs.llvm.org/show_bug.cgi?id=48044 - related PR about (not ideal?) cost model for gathers.

I'm agree with @fhahn -- tuning cost model is more right way. Also it could be the case of arch where gathers are missing but it's beneficial to use them for vectorization tree building. They are lowered to scalarized instrs further.

Yes improving the cost model makes more sense (both for gather/scatter and gep vectorization costs) - SLP should only ever create a gather with a constant mask, so at the very least ScalarizeMaskedMemIntrin should do a good job of converting the loads to a BUILD_VECTOR sequence.

If we can get examples of bad vectorization using gathers I can take a look at working out exactly where the cost model is falling down.

Carrot added a comment.Dec 7 2020, 9:22 AM

@fhahn, thank you for the clarification of the isLegalMaskedGather and getGatherScatterOpCost usage.

@RKSimon, thank you for your help. I do have another test case that has bad vectorization with scatters, https://bugs.llvm.org/show_bug.cgi?id=48429.