This is an archive of the discontinued LLVM Phabricator instance.

[X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions).
ClosedPublic

Authored by aymanmus on May 15 2017, 4:36 AM.

Details

Summary

AVX512 compare instructions return v*i1 types.
In cases where the number of elements in the returned value are less than 8, clang adds zeroes to get a mask of v8i1 type.
Later on it's replaced with CONCAT_VECTORS, which then is lowered to many DAG nodes including insert/extract element and shift right/left nodes.
The fact that AVX512 compare instructions put the result in a k register and zeroes all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class.

When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in instructions selection phase and transform it into one avx512 cmp instruction.

Diff Detail

Event Timeline

aymanmus created this revision.May 15 2017, 4:36 AM
igorb edited edge metadata.May 22 2017, 12:50 AM

I think we need more generic solution, that latter could be expended to to other vxi1 types and improve cases like

    t44: v8i1 = X86ISD::CMPM t2, t4, Constant:i8<17>
    t45: i8 = bitcast t44
  t14: i16 = zero_extend t45
t32: v16i1 = bitcast t14

The relevant CONCAT_VECTORS implemented by insert_subvector, please move the implementation to insert1BitVector().
Would you consider to try the follow approach.

Detect that we have  insert_subvector zero, CMPM  or  insert_subvector zero, (AND  CMPM, MASK) , etc .. case and mark it legal.    
In td file define new instructions without DAG patterns.
add lower patters to map    
                 insert_subvector zeroVec,  CMP_DAG    ->  CMPM_NEW instructions
lib/Target/X86/X86InstrAVX512.td
2006

this part now has isCodeGenOnly= 1 and isAsmParserOnly = 1 , please check the result of tabel-gen

aymanmus updated this revision to Diff 100679.May 30 2017, 3:02 AM
aymanmus edited the summary of this revision. (Show Details)
aymanmus edited the summary of this revision. (Show Details)
zvi added inline comments.Jun 4 2017, 12:24 PM
lib/Target/X86/X86ISelLowering.cpp
5064

If think you should check here for AND

5101

promotes -> widens ?

5104

*during

zvi added inline comments.Jun 4 2017, 12:46 PM
lib/Target/X86/X86ISelLowering.cpp
7948

Can all these helpers return SDValue() instead of false? That would save the need to pass and modify the reference argument.

lib/Target/X86/X86InstrAVX512.td
2207

*axv -> avx

3033

Are all these covered by tests?

aymanmus updated this revision to Diff 101546.Jun 6 2017, 5:26 AM
aymanmus marked 3 inline comments as done.
aymanmus added inline comments.
lib/Target/X86/X86ISelLowering.cpp
5064

True should be returned only if the instruction zeros the upper bits of the destination k-register and can be masked.

7948

isExpandWithZeros doesn't modify the SDValue, but for isTypePromotionOfi1ZeroUpBits it is indeed better to change it.
Thanks.

lib/Target/X86/X86InstrAVX512.td
3033

Adding knl run for this (with checks on relevant test cases).
Thanks

igorb added inline comments.Jun 15 2017, 1:39 AM
lib/Target/X86/X86ISelLowering.cpp
5101

I think that the comment is unclear.
Please move the implementation after comments " // There are 3 possible cases: " line 5123

5105

please see line:5086

5106

Hi ,
This is incorrect, you should also check ISD::isBuildVectorAllZeros(Vec.getNode()) .
please add test case also.

7962

please check if you can use void SelectionDAG::computeKnownBits and update
void X86TargetLowering::computeKnownBitsForTargetNode instead

aymanmus marked 3 inline comments as done.Jun 15 2017, 5:09 AM
aymanmus added inline comments.
lib/Target/X86/X86ISelLowering.cpp
7962

I don't think we can use SelectionDAG::computeKnownBit here since its definition for vectors is to return known bits that are shared for all vector elements.
for example:

knownBits(<i8 00xxxxxx, i8 x00xxxxx> && <i8 xxxxxxxx, i8 xxxxxxxx>) = i8 x0xxxxxx

In our case, vector elements are of type i1, so known bits will always include nothing.

aymanmus updated this revision to Diff 102656.Jun 15 2017, 5:11 AM

Fixed condition for marking insert_vector legal. Thanks @igorb

aymanmus updated this revision to Diff 102661.Jun 15 2017, 5:39 AM
igorb accepted this revision.Jun 15 2017, 5:50 AM

LGTM

This revision is now accepted and ready to land.Jun 15 2017, 5:50 AM
This revision was automatically updated to reflect the committed changes.
RKSimon edited edge metadata.Jun 15 2017, 7:41 AM

@aymanmus I'm sorry but I had to revert this at rL305470 as it's causing tablegen crashes on windows buildbots: http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast/builds/10612

@RKSimon I'm looking into that at the moment.

Recommitting after fixing bug in tablegen in rL306251 and rL306371.

test/CodeGen/X86/avx512vl-intrinsics-upgrade.ll