This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Extract i1 elements from vXi1 bool vectors
ClosedPublic

Authored by RKSimon on Apr 26 2019, 7:56 AM.

Details

Summary

This is an alternative to D59669 which more aggressively extracts i1 elements from vXi1 bool vectors using a MOVMSK.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision.Apr 26 2019, 7:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 26 2019, 7:56 AM
RKSimon added inline comments.Apr 26 2019, 8:09 AM
test/CodeGen/X86/avx2-masked-gather.ll
55 ↗(On Diff #196851)

Annoyingly this doesn't drop through the code, skipping zero-element gathers, instead repeating tests+branches.

393 ↗(On Diff #196851)

Repeated PACKSS+MOVMSK instructions - I assume due to it having a undef argument in the xmm0 slot.

test/CodeGen/X86/avx512-insert-extract.ll
1024 ↗(On Diff #196851)

we should be able to replace both shifts and the cmp with a single BT $32, %RAX ?

test/CodeGen/X86/masked_compressstore.ll
51 ↗(On Diff #196851)

More repeated PACKSS+MOVMSK

test/CodeGen/X86/movmsk-cmp.ll
4376

We should be able to reduce this to a TEST

4650

Repeated comparisons

test/CodeGen/X86/setcc-combine.ll
10 ↗(On Diff #196851)

I think this is effectively a NOT that we should be able to handle somehow.

RKSimon updated this revision to Diff 197084.Apr 29 2019, 4:17 AM

Updated version, which actually looks pretty similar to D59669.....

I'm performing all the extractions at the same time and only combining if (a) there are only extractions using the source vector and (b) there's more than 1 extract (I'd like to remove this limitation in the future).

The big difference is D59669 tries to limit to setcc usage only, which with our new SimplifyDemandedBits support is probably unnecessary.

@spatel How do you want to proceed with this + D59669?

RKSimon retitled this revision from [X86][SSE] Extract i1 elements from vXi1 bool vectors (WIP) to [X86][SSE] Extract i1 elements from vXi1 bool vectors.Apr 29 2019, 4:17 AM
RKSimon marked an inline comment as done.
RKSimon added inline comments.
test/CodeGen/X86/movmsk-cmp.ll
4543

Interesting that we merge this OR chain but fail with the AND chain on movmsk_v2i64

This looks like a good refinement of the earlier patch, so I'm happy to abandon D59669 and move forward here.

lib/Target/X86/X86ISelLowering.cpp
34919

Matter of taste, but don't need to explicitly use "llvm::" here.

34925–34926

Could use a formula comment of the transform around here such as:

// extelt vXi1 X, MaskIdx --> ((movmsk X) & Mask) == Mask
34931

Did framing this as:

(x & Mask == Mask)

rather than:

(x & Mask != 0)

make a difference in the output? If so, add a TODO comment about trying to avoid that problem.

RKSimon updated this revision to Diff 197324.Apr 30 2019, 7:33 AM
RKSimon marked an inline comment as done.

Addressed @spatel's comments

RKSimon marked an inline comment as done.Apr 30 2019, 7:39 AM
RKSimon added inline comments.
lib/Target/X86/X86ISelLowering.cpp
34931

Yes, using the (x & Mask != 0) causes the 2 cmoves in PR39665_c_ray to reappear - I'll raise a bug

34931
spatel accepted this revision.Apr 30 2019, 4:45 PM

LGTM

This revision is now accepted and ready to land.Apr 30 2019, 4:45 PM
RKSimon edited the summary of this revision. (Show Details)May 1 2019, 2:26 AM
This revision was automatically updated to reflect the committed changes.