This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Attempt to lower vec_reduce_add patterns with PSADBW for zero-extended vXi8 sources
Closed · Public

Authored by RKSimon on Feb 19 2022, 1:14 PM.

Details

Summary

For i16/32/64 vectors, if the upper bits are known to be zero, then we can try to truncate to vXi8 (if it's worth it) and perform this as a PSADBW to add+zext each v8i8 subvector to an i64 sum, which we can then reduce together.
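
To illustrate the idea, here is a minimal scalar sketch in portable C++. It is not the code from this patch: `psadbw_lane` and `reduce_add_via_psadbw` are hypothetical names, and the loop only models what one 64-bit lane of PSADBW computes (the sum of absolute differences of 8 unsigned bytes, zero-extended) when the second operand is all zeros.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of one 64-bit lane of PSADBW: the sum of |a[i] - b[i]| over
// 8 unsigned bytes, returned zero-extended to 64 bits.
uint64_t psadbw_lane(const uint8_t *a, const uint8_t *b) {
  uint64_t sum = 0;
  for (int i = 0; i < 8; ++i)
    sum += static_cast<uint64_t>(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
  return sum;
}

// Hypothetical helper: add-reduce i32 elements whose upper 24 bits are known
// to be zero by truncating to bytes and summing 8 bytes at a time.
uint64_t reduce_add_via_psadbw(const std::vector<uint32_t> &src) {
  std::vector<uint8_t> bytes;
  for (uint32_t v : src) {
    assert(v <= 0xFF && "upper bits must be known zero");
    bytes.push_back(static_cast<uint8_t>(v)); // lossless truncation
  }
  // Pad with zeros to a whole number of 8-byte lanes; zeros do not change the sum.
  while (bytes.size() % 8 != 0)
    bytes.push_back(0);

  const uint8_t zero[8] = {0, 0, 0, 0, 0, 0, 0, 0};
  uint64_t total = 0;
  // PSADBW against zero yields one partial sum per 8 bytes; reduce those together.
  for (std::size_t i = 0; i < bytes.size(); i += 8)
    total += psadbw_lane(&bytes[i], zero);
  return total;
}
```

Because every element fits in a byte, the absolute difference against zero is the element itself, so each lane result is simply the sum of its 8 bytes.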

This addresses some of the PR42674 test cases where the source data was vXi8 but had been extended to match a wider unsigned integer accumulator.

Diff Detail

Event Timeline

RKSimon created this revision. Feb 19 2022, 1:14 PM
RKSimon requested review of this revision. Feb 19 2022, 1:14 PM
Herald added a project: Restricted Project. Feb 19 2022, 1:14 PM

Clang-format the code.

llvm/lib/Target/X86/X86ISelLowering.cpp
43062–43064

I don't fully understand the code, so I have a few questions:

  1. If the source values are known to be <= 255, why do we need to truncate them? Wouldn't it be better to bitcast directly?
  2. If ByteVT is narrower than 128 bits, why don't we widen it with undef and return the value of lane 0 after PSADBW?
RKSimon added inline comments. Feb 27 2022, 3:54 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
43062–43064

1 - I'm not sure I follow - PSADBW could be used with purely bitcasted data but we'd lose some of the benefits of avoiding several stages of reduction. But I guess it could be useful for i16/i32 data to avoid a truncation and still get a horizontal-sum over fewer elements - is that what you meant?

2 - WidenToV16I8 does leave the upper 64-bit element undef; the v4i8 ISD::INSERT_VECTOR_ELT codepath is just to avoid some issues with combines failing to make use of the implicit upper-bit zeroing of MOVD.
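
One way to read the trade-off in point 1 is sketched below as a scalar C++ model; it is not taken from the patch, and `PartialSums`, `sad_against_zero`, `via_truncate` and `via_bitcast` are hypothetical names. Both routes give the same total when the upper bits are zero, but the bitcast route feeds four times as many bytes to PSADBW, leaving more 64-bit partial sums to reduce afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct PartialSums {
  uint64_t total;    // final reduced value
  std::size_t lanes; // number of 64-bit PSADBW results that had to be added
};

// Scalar model of PSADBW against zero, applied to every 8-byte group.
static PartialSums sad_against_zero(std::vector<uint8_t> bytes) {
  while (bytes.size() % 8 != 0)
    bytes.push_back(0); // zero padding does not change the sum
  PartialSums r{0, bytes.size() / 8};
  for (std::size_t lane = 0; lane < r.lanes; ++lane) {
    uint64_t lane_sum = 0;
    for (std::size_t i = 0; i < 8; ++i)
      lane_sum += bytes[lane * 8 + i]; // |x - 0| == x for unsigned bytes
    r.total += lane_sum;
  }
  return r;
}

// Truncate each i32 element to a byte first, then sum 8 elements per lane.
PartialSums via_truncate(const std::vector<uint32_t> &src) {
  std::vector<uint8_t> bytes;
  for (uint32_t v : src) {
    assert(v <= 0xFF && "upper bits must be known zero");
    bytes.push_back(static_cast<uint8_t>(v));
  }
  return sad_against_zero(bytes);
}

// Reinterpret the i32 elements as raw bytes: three of every four bytes are
// zero, so each lane only covers two source elements.
PartialSums via_bitcast(const std::vector<uint32_t> &src) {
  for (uint32_t v : src)
    assert(v <= 0xFF && "upper bits must be known zero");
  std::vector<uint8_t> bytes(src.size() * sizeof(uint32_t));
  if (!bytes.empty())
    std::memcpy(bytes.data(), src.data(), bytes.size());
  return sad_against_zero(bytes);
}
```

For 16 i32 elements, this model produces 2 partial sums on the truncate route and 8 on the bitcast route, with identical totals. The byte order of the bitcast does not matter here, since every byte is added regardless of position.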

pengfei accepted this revision. Feb 27 2022, 4:39 AM

LGTM.

llvm/lib/Target/X86/X86ISelLowering.cpp
43062–43064

1 - Yes.
2 - I see. I thought ZeroExtend would zero the upper 64 bits.

This revision is now accepted and ready to land. Feb 27 2022, 4:39 AM
This revision was landed with ongoing or failed builds. Feb 27 2022, 7:18 AM
This revision was automatically updated to reflect the committed changes.