This is an archive of the discontinued LLVM Phabricator instance.

[X86] Fix EXTRACT_VECTOR_ELT with variable index.
ClosedPublic

Authored by igorb on Feb 7 2017, 3:16 PM.

Details

Summary

Fix EXTRACT_VECTOR_ELT with variable index from v32i16 and v64i8 vector.
Its more profitable to goes through memory (1 cycles throughput)
than use VMOVD + VPERMV/PSHUFB sequence ( 2/3 cycles throughput) to implement EXTRACT_VECTOR_ELT with variable index.
IACA tool was used to get performace estimation (https://software.intel.com/en-us/articles/intel-architecture-code-analyzer)
For example for var_shuffle_v16i8_v16i8_xxxxxxxxxxxxxxxx_i8 test from vector-shuffle-variable-128.ll I get 26 cycles vs 79 cycles.
Tests on real hardware get similar results.

Removing the VINSERT node , we don't need it any more.

First patch to fix Bug 31731

Diff Detail

Repository
rL LLVM

Event Timeline

igorb created this revision.Feb 7 2017, 3:16 PM
zvi edited edge metadata.Feb 7 2017, 11:53 PM

Does it make sense to split this into 2 patches?

  1. Replace X86ISD::VINSERT with BUILD_VECTOR
  2. Implement missing functionality for variable-index extractelt?
lib/Target/X86/X86ISelLowering.cpp
13653

Make this a static function or consider making a member of X86Subtarget

igorb updated this revision to Diff 87639.Feb 8 2017, 5:39 AM
igorb retitled this revision from [AVX512] Fix EXTRACT_VECTOR_ELT for v2i1/v4i1/v32i1/v64i1 with variable index. to [AVX512] Improve EXTRACT_VECTOR_ELT with variable index. .
igorb edited the summary of this revision. (Show Details)
igorb marked an inline comment as done.
delena added inline comments.Feb 8 2017, 6:14 AM
test/CodeGen/X86/avx512-insert-extract.ll
1461

You can use VPERMQ here.
Permute qwords in ymm3/m256/m64bcst using indexes in ymm2 and store the result in ymm1.
VPERMQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst

1593

varible -> variable

1607

You don't need stack. Use VPEMD, for the both KNL and SKX.

1717

Please use VPSHUFB instead of stack.

RKSimon added a subscriber: RKSimon.Feb 8 2017, 6:36 AM
RKSimon added inline comments.
lib/Target/X86/X86ISelLowering.cpp
13656

// VPERMD/PS

13662

// VPERMW

igorb updated this revision to Diff 87661.Feb 8 2017, 8:07 AM
igorb marked 6 inline comments as done.
igorb added a reviewer: RKSimon.

Fix according to comments ,
Thanks!

RKSimon edited edge metadata.Feb 8 2017, 8:58 AM

Add the new tests to trunk now to show current codegen?

lib/Target/X86/X86ISelLowering.cpp
13708

SmallVector<SDValue, 8> Ops(NumElts, DAG.getConstant(0, dl, MaskEltVT));

27007

Haven't you've lost the hasVLX() check by using isVPERMVsupported?

zvi added a comment.Feb 8 2017, 10:16 PM

Add the new tests to trunk now to show current codegen?

According to pr31731 some cases should miscompile on ToT

delena added inline comments.Feb 8 2017, 11:03 PM
test/CodeGen/X86/avx512-insert-extract.ll
1503

vmovslq + vmovq can be replaced with vmovd

1518

It should be vpermpd.

1691

I propose to extend v8i16 to v8i32 and use vpermd.

1721

Again, you can extend to v16i32.

1778

The result will be wrong. If you have '1' in MSB, the destination will be zeroed. You should mask the %index with 0xF:
and $15, %edi
vmovd %edi, %xmm

1838

You can extend to v32i16 and use VPERMW

igorb updated this revision to Diff 87793.Feb 9 2017, 4:36 AM
igorb marked 2 inline comments as done.
igorb edited the summary of this revision. (Show Details)
igorb added inline comments.Feb 9 2017, 4:38 AM
lib/Target/X86/X86ISelLowering.cpp
27007

I added patterns to TD file to handle 128/256 bit vector in case NoVLX using 512bit.
I think the check here only to insure VPERMV can be used.

test/CodeGen/X86/avx512-insert-extract.ll
1518

Thanks!

1691

Thanks Elena,
I prefer to handle all cases that can be extend in separate path.

1778

Max 5 bits are used , according to LLVM doc "If idx exceeds the length of val, the results are undefined."

igorb added a comment.Feb 9 2017, 4:39 AM

Add the new tests to trunk now to show current codegen?

Done,
Thanks.

RKSimon added inline comments.Feb 9 2017, 9:48 AM
lib/Target/X86/X86InstrAVX512.td
4844

Very minor but is it worth renaming this if its not just being used for shifts/rotates?

igorb updated this revision to Diff 89090.Feb 20 2017, 1:55 AM
igorb retitled this revision from [AVX512] Improve EXTRACT_VECTOR_ELT with variable index. to [AVX512] Fix EXTRACT_VECTOR_ELT with variable index. .
igorb edited the summary of this revision. (Show Details)
delena accepted this revision.Feb 20 2017, 2:50 AM

It's not AVX-512 specific any more, please re-title the review.
Please specify that you are removing the VINSERT together with the code that using it and we don't need this node any more.

LGTM after the minor changes in the comments.

lib/Target/X86/X86ISelLowering.cpp
13709

It's more profitable to go ..
than using ..

This revision is now accepted and ready to land.Feb 20 2017, 2:50 AM
igorb retitled this revision from [AVX512] Fix EXTRACT_VECTOR_ELT with variable index. to [X86] Fix EXTRACT_VECTOR_ELT with variable index. .Feb 20 2017, 4:27 AM
igorb edited the summary of this revision. (Show Details)
zvi accepted this revision.Feb 20 2017, 5:15 AM

LGTM with a minor comment

lib/Target/X86/X86ISelLowering.cpp
13712

Would be nice if you added in the comment the two generated code sequences you compared in IACA and their IACA analysis.

This revision was automatically updated to reflect the committed changes.