This is an archive of the discontinued LLVM Phabricator instance.

[X86] Fix gather/scatter with large scales (PR55021)
AbandonedPublic

Authored by nikic on Apr 27 2022, 8:22 AM.

Details

Summary

The AVX512 gather/scatter instructions only support scales up to 8. If the scale is larger, perform an explicit shift operation instead.

I'm not familiar with the requirements for the instruction -- is a scale that does not match the scatter element width actually supported? The instruction looks sensible, but I don't know if this is actually legal. If not, the check could be change to compare with the element width.

Fixes https://github.com/llvm/llvm-project/issues/55021.

Diff Detail

Event Timeline

nikic created this revision.Apr 27 2022, 8:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2022, 8:22 AM
nikic requested review of this revision.Apr 27 2022, 8:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2022, 8:22 AM
craig.topper added inline comments.Apr 27 2022, 9:33 AM
llvm/test/CodeGen/X86/gather-scatter-opaque-ptr.ll
16

Maybe I don't understand opaque pointers, but why is the result element type of this GEP considered to be [512 x i8] and not i8?

We also have the X86ISD::MGATHER/MSCATTER nodes - if there any chance of encountering bad scales with those?

Do we need to fix the MGATHER handling in ReplaceNodeResults too?

craig.topper added inline comments.Apr 27 2022, 9:48 AM
llvm/test/CodeGen/X86/gather-scatter-opaque-ptr.ll
16

Nevermind, I don't think was thinking about this GEP correctly.

skan added a subscriber: skan.Apr 27 2022, 9:11 PM
nikic updated this revision to Diff 425723.Apr 28 2022, 2:07 AM

Also handle ReplaceNodeResults().

nikic added inline comments.Apr 28 2022, 2:09 AM
llvm/test/CodeGen/X86/gather-scatter-opaque-ptr-2.ll
4

As a side note, this seems to generate much worse code with AVX512? https://llvm.godbolt.org/z/16TvGrTY1

gpei added inline comments.Apr 28 2022, 3:06 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
31567
DATA_ADDR←BASE_ADDR + (SignExtend(VINDEX[i+31:i])*SCALE + DISP;

Gather will signextend(index) to 64-bit first, then multiple the scale. When Index is 32-bit, it may occur overflow if we just multiply scale to index.

BTW, can we just bail out in getUniformBase when the scale is not supported by the target.

nikic added inline comments.Apr 28 2022, 3:20 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
31567

BTW, can we just bail out in getUniformBase when the scale is not supported by the target.

This should be possible with a new TLI hook (unless something for this already exists?)

Alternatively, we can always enforce element type == scale in SDAG builder, and let target DAG combines combine mgather/mscatter + shift into scale. X86 already does this already.

nikic added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
31567

I've put up https://reviews.llvm.org/D124605 as a variant that adjusts SDAGBuilder instead. Let me know which general approach is preferred.

nikic abandoned this revision.Apr 29 2022, 5:57 AM

Abandoning this in favor of the simpler solution at D124530.