This is an archive of the discontinued LLVM Phabricator instance.

[X86] Allow pre-SSE41 targets to extract multiple v16i8 elements coming from the same DWORD/WORD super-element
ClosedPublic

Authored by RKSimon on Jul 26 2023, 10:49 AM.

Details

Summary

Pre-SSE41 targets tended to have weak (serial) GPR<->VEC moves, meaning we only allowed a single extraction before spilling the vector to stack and loading the element instead. But this didn't make use of the DWORD/WORD extraction we had to use could extract multiple i8 elements at the same time.

This patch attempts to determine if all uses of a vector are element extractions, and works out whether all the extractions share the same WORD or (lowest) DWORD, in which case we can perform a single extraction and just shift/truncate the individual elements.

Diff Detail

Event Timeline

RKSimon created this revision.Jul 26 2023, 10:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 26 2023, 10:49 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
RKSimon requested review of this revision.Jul 26 2023, 10:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 26 2023, 10:49 AM
RKSimon retitled this revision from [X86] Allow pre-SSE41 targets to extract multiple elements coming from the same DWORD/WORD super-element to [X86] Allow pre-SSE41 targets to extract multiple v16i8 elements coming from the same DWORD/WORD super-element.Jul 26 2023, 12:36 PM
goldstein.w.n added inline comments.Jul 26 2023, 1:13 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
20575

ZEXT_MOVL too no?

20581

Shouldn't the number of bits set be a function of the extraction width? I.e id expect pextrw to set 2x as many bits as pextrb or will it never be legal to have pextr* that doesn't match ele width?

RKSimon added inline comments.Jul 26 2023, 2:13 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
20575

we only need to handle extractions from the vector to gpr - ZEXT_MOVL is the other way around.

20581

That should be handled by recursion via the BITCAST case below - PEXTRW / PEXTRB only work with their own v8i16 / v16i8 types (although tbh we never needed seperate node types - creating a single X86ISD::PEXTR nodetype would have been enough),

any other comments?

pengfei added inline comments.Jul 31 2023, 2:41 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
20695

Why we use getSizeInBits rather than check for i8?

20713

It's not clear to me here, the old code should have more chance to generate SRL than the new code due to the restriction. Which one it better? I didn't find a case to reflect the difference.

RKSimon added inline comments.Jul 31 2023, 2:49 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
20695

I'm not sure - my guess if whoever did it was following the pattern for 32-bit scalars below, but that was to handle floats. I'll change it in a pre-commit.

20713

I'm not sure I understand? The original code was always limited to a single extract, and for odd-indices (greater than 3) PEXTRW+SRL would be used. The change just means that we allow a pair of extracts (odd + even) that share a PEXTRW - so we still allow the SRL case.

pengfei accepted this revision.Jul 31 2023, 5:18 AM

LGTM.

llvm/lib/Target/X86/X86ISelLowering.cpp
20713

I see the point, thanks!

This revision is now accepted and ready to land.Jul 31 2023, 5:18 AM