This is an archive of the discontinued LLVM Phabricator instance.

[X86] Use PSADBW for v8i8 addition reductions.
Closed · Public

Authored by craig.topper on Aug 11 2019, 11:03 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision. Aug 11 2019, 11:03 PM
Herald added a project: Restricted Project. Aug 11 2019, 11:03 PM
Herald added a subscriber: hiraditya.
RKSimon added inline comments. Aug 12 2019, 3:36 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
Line 35451 (on Diff #214576)

We can easily support v2i8/v4i8 as well by replacing this with an insertion into a zero v16i8 vector.
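The idea behind this revision is that PSADBW with an all-zero second operand computes |x - 0| = x for each unsigned byte and sums each group of eight, so one instruction produces a horizontal byte sum. The Python model below is a sketch of those semantics only, not LLVM's lowering code, and `psadbw` and `reduce_add_bytes` are hypothetical helper names. It also models the suggestion above: a narrower vector inserted into a zero v16i8 gives the same sum, because the zero padding adds nothing.

```python
def psadbw(a, b):
    """Model of SSE2 PSADBW on two 16-byte vectors of unsigned bytes.

    For each 64-bit half, sums |a[i] - b[i]| over its 8 bytes. The hardware
    places that 16-bit sum in the low bits of the half and zeroes the rest,
    so this model returns one integer per half.
    """
    assert len(a) == len(b) == 16
    return [sum(abs(a[i] - b[i]) for i in range(lo, lo + 8)) for lo in (0, 8)]


def reduce_add_bytes(v):
    """Add-reduce 2, 4 or 8 i8 elements via PSADBW against zero.

    The elements are inserted into an otherwise-zero 16-byte vector. With at
    most 8 elements, everything lands in the low half, so lane 0 holds the
    full sum. Truncating it to 8 bits gives the wrapping i8 result, which is
    the same whether the bytes are read as signed or unsigned.
    """
    assert len(v) in (2, 4, 8)
    bytes_ = [x & 0xFF for x in v]          # view each i8 as an unsigned byte
    padded = bytes_ + [0] * (16 - len(v))   # insert into a zero v16i8
    lane0, _ = psadbw(padded, [0] * 16)     # |x - 0| == x for unsigned bytes
    return lane0 & 0xFF                     # truncate to i8
```

For example, eight bytes of 0xFF sum to 2040 in lane 0, and truncation yields 248, which is -8 as a signed i8, matching eight additions of -1. The model only shows why the zero insertion is correct; it says nothing about whether it is profitable for v2i8 or v4i8.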

That doesn’t seem profitable for v2i8. We’d be better off extracting both elements and doing a scalar add. For v4i8, I’m not sure. PSADBW is 5 cycles on some CPUs, if I remember right, so the normal expansion is probably faster on those CPUs.
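For comparison, the "normal expansion" mentioned in the reply can be pictured in its generic form as a tree of shuffle+add steps that repeatedly fold the upper half of the vector onto the lower half. The sketch below assumes that generic pattern (the exact sequence LLVM emits is not shown in this review) and uses a hypothetical helper, `shuffle_reduce_add`. It counts rounds, not cycles: v2i8 needs one round, v4i8 two and v8i8 three. Whether those rounds beat a single PSADBW depends on per-CPU latencies that this model does not capture.

```python
def shuffle_reduce_add(v):
    """Generic shuffle-and-add reduction of i8 elements (power-of-two count).

    Each round folds the upper half of the vector onto the lower half with a
    wrapping 8-bit add, so n elements need log2(n) rounds. Returns the
    reduced i8 value (as an unsigned byte) and the number of rounds.
    """
    n = len(v)
    assert n >= 2 and n & (n - 1) == 0, "expects a power-of-two element count"
    v = [x & 0xFF for x in v]  # treat each i8 as an unsigned byte
    rounds = 0
    while len(v) > 1:
        half = len(v) // 2
        # One "shuffle + add" round: upper half moved down and added lane-wise.
        v = [(v[i] + v[i + half]) & 0xFF for i in range(half)]
        rounds += 1
    return v[0], rounds


for width in (2, 4, 8):
    total, rounds = shuffle_reduce_add(list(range(1, width + 1)))
    print(f"v{width}i8: sum={total}, shuffle+add rounds={rounds}")
```

The round count grows with the element count, while PSADBW is one instruction regardless of width, which is the tradeoff the thread weighs per element count.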

RKSimon accepted this revision. Aug 14 2019, 7:38 AM

OK, let's just go for v8i8

This revision is now accepted and ready to land. Aug 14 2019, 7:38 AM
This revision was automatically updated to reflect the committed changes.