This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Fold scalar horizontal add/sub for non-0/1 element extractions
ClosedPublic

Authored by RKSimon on Apr 29 2019, 7:33 AM.

Details

Summary

We already perform horizontal add/sub when we extract from elements 0 and 1. This patch extends the fold to non-0/1 element extraction indices, as long as they come from the lowest 128-bit vector.
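For readers unfamiliar with the fold, the identity it relies on can be sketched with a scalar model of HADDPS. This is only an illustration, not the code from the patch: the names V4, hadd_model, add_pair_direct and add_pair_via_hadd are hypothetical, and the sketch assumes the standard HADDPS lane semantics for a 4 x float vector.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 128-bit vector of four floats (v4f32).
using V4 = std::array<float, 4>;

// Scalar model of HADDPS: the low two result lanes come from adjacent
// pairs of `a`, the high two from adjacent pairs of `b`.
V4 hadd_model(const V4 &a, const V4 &b) {
  return {{a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]}};
}

// What the scalar code computes: extract two adjacent elements and add them.
float add_pair_direct(const V4 &x, std::size_t lo) {
  assert(lo % 2 == 0 && lo + 1 < x.size());
  return x[lo] + x[lo + 1];
}

// The same value read out of a horizontal add of x with itself.
// For lo == 0 the result sits in lane 0; for lo == 2 it sits in lane 1,
// so the extraction index after the fold is lo / 2.
float add_pair_via_hadd(const V4 &x, std::size_t lo) {
  assert(lo % 2 == 0 && lo + 1 < x.size());
  return hadd_model(x, x)[lo / 2];
}
```

In this model, adding elements 2 and 3 of x gives the same value as lane 1 of hadd(x, x), which is the non-0/1 case the summary refers to. Reading lane 0 is the cheapest case; reading any other lane needs an extra shuffle or extract.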

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision. Apr 29 2019, 7:33 AM

I might've overlooked it - do we have test coverage for a 256-bit source vector where we extract from the upper elements?

lib/Target/X86/X86ISelLowering.cpp
19034 ↗(On Diff #197108)

Need to update this comment to something like:
This is a shuffle or free if the left index is 0.

RKSimon added inline comments. May 1 2019, 7:03 AM
test/CodeGen/X86/haddsub.ll
1012 ↗(On Diff #197538)

We still miss folding to extractf128+hadd+permilps - but the cost-benefit isn't great.

RKSimon marked an inline comment as done. May 1 2019, 9:22 AM
RKSimon added inline comments.
test/CodeGen/X86/haddsub.ll
1012 ↗(On Diff #197538)

FYI - I have a follow-up mini-patch that will fix this.

spatel accepted this revision. May 1 2019, 10:01 AM

LGTM

This revision is now accepted and ready to land. May 1 2019, 10:01 AM
This revision was automatically updated to reflect the committed changes.