
[x86] narrow 256-bit horizontal ops via demanded elements

Authored by spatel on Feb 6 2019, 12:58 PM.



256-bit horizontal math ops are an x86 monstrosity (and thankfully have not been extended to 512-bit AFAIK).

Each 128-bit half of the result is computed only from the corresponding 128-bit halves of the two inputs. So if we don't demand anything in the upper half of the result, we can extract the low halves of the inputs, do the math, and then insert that result into a 256-bit output.
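To make the lane structure concrete, here is a small Python model of the float variant. It is a sketch, not LLVM code, and it assumes HADDPS semantics with four 32-bit elements per 128-bit lane. `hadd128` and `hadd256` are hypothetical helper names. Changing only the upper halves of the inputs leaves the low half of the 256-bit result untouched, which is what makes the narrowing legal.

```python
# Scalar model of x86 horizontal add (assumed HADDPS semantics, 4 floats per 128-bit lane).

def hadd128(a, b):
    """128-bit HADDPS: sums of adjacent pairs of a, then of b."""
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]

def hadd256(a, b):
    """256-bit VHADDPS: an independent HADDPS on each 128-bit lane."""
    lo = hadd128(a[:4], b[:4])   # low lane uses only the low halves of a and b
    hi = hadd128(a[4:], b[4:])   # high lane uses only the high halves of a and b
    return lo + hi

a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]

wide = hadd256(a, b)
print(wide)  # [3.0, 7.0, 30.0, 70.0, 11.0, 15.0, 110.0, 150.0]

# Perturb only the upper halves of the inputs: the low half of the result is unchanged.
a2 = a[:4] + [-1.0] * 4
b2 = b[:4] + [-2.0] * 4
assert hadd256(a2, b2)[:4] == wide[:4]
```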

All of the extract/insert is free (ymm<-->xmm), so we're left with a narrower (cheaper) version of the original op.
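The rewrite itself can be modeled the same way. The Python below is illustrative only: `extract128` and `insert128` are hypothetical stand-ins loosely named after the `extract128BitVector`/`insert128BitVector` helpers mentioned later in the review, and `None` marks upper elements that are left undefined because nothing demands them.

```python
def hadd128(a, b):
    """Assumed 128-bit HADDPS semantics: adjacent-pair sums of a, then of b."""
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]

def hadd256(a, b):
    """Assumed 256-bit VHADDPS semantics: one HADDPS per 128-bit lane."""
    return hadd128(a[:4], b[:4]) + hadd128(a[4:], b[4:])

def extract128(v, idx):
    """Hypothetical: take 128-bit lane idx (4 floats) of an 8-float vector."""
    return v[4 * idx: 4 * idx + 4]

def insert128(base, sub, idx):
    """Hypothetical: place a 4-float vector into lane idx of an 8-float vector."""
    out = list(base)
    out[4 * idx: 4 * idx + 4] = sub
    return out

def narrowed_hadd256(a, b):
    """Extract the low xmm halves, do the narrow op, insert into an undef ymm."""
    undef = [None] * 8
    return insert128(undef, hadd128(extract128(a, 0), extract128(b, 0)), 0)

a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
narrow = narrowed_hadd256(a, b)
print(narrow)  # [3.0, 7.0, 30.0, 70.0, None, None, None, None]
```

Every demanded (low) element of `narrow` matches the wide op; only the undemanded upper elements differ.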

In the affected tests based on:
...we see that the h-op narrowing can result in further narrowing of other math via existing generic transforms.

I originally drafted this patch as an exact pattern match starting from extract_vector_elt, but I thought we might see diffs starting from extract_subvector too, so I changed it to a more general demanded elements solution. That switch doesn't produce any additional improvements in the existing regression tests, though, so we could go back. This way is also slightly less code, assuming I didn't miss any constraints.
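As a rough illustration of why a demanded-elements check covers both starting points, the sketch below (hypothetical names, not the actual X86 lowering code) treats the demanded result elements of an 8 x float h-op as a bitmask. A single-element extract_vector_elt use and a low-half extract_subvector use both leave the upper four bits clear, so one check handles either.

```python
NUM_ELTS = 8              # assumed: 8 x f32 in a 256-bit vector
HALF = NUM_ELTS // 2      # elements per 128-bit lane

# Bits for result elements 4..7 (the upper 128-bit half).
UPPER_MASK = ((1 << NUM_ELTS) - 1) & ~((1 << HALF) - 1)

def can_narrow_to_low_half(demanded):
    """True if no result element in the upper 128-bit half is demanded."""
    return (demanded & UPPER_MASK) == 0

uses = {
    "extract_vector_elt, element 1": 1 << 1,
    "extract_subvector, low 128 bits": 0b00001111,
    "full 256-bit use": 0xFF,
}
for name, mask in uses.items():
    print(f"{name}: {can_narrow_to_low_half(mask)}")
```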


Event Timeline

spatel created this revision. Feb 6 2019, 12:58 PM
RKSimon added inline comments. Feb 7 2019, 2:29 AM
Line 32980 (on Diff #185618):

Can you use the extract128BitVector helper to do this?

spatel updated this revision to Diff 185803. Feb 7 2019, 9:58 AM
spatel marked an inline comment as done.

Patch updated:
Use {insert/extract}128BitVector to reduce code.

RKSimon accepted this revision. Feb 7 2019, 10:40 AM

LGTM with one (optional) minor comment.

Line 32981 (on Diff #185803):

You should be able to just use Ext0.getValueType()?

This revision is now accepted and ready to land. Feb 7 2019, 10:40 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. Feb 10 2019, 7:22 AM