This is an archive of the discontinued LLVM Phabricator instance.

[X86] More accurately model the cost of horizontal reductions.
Closed · Public

Authored by craig.topper on Mar 19 2020, 11:45 PM.

Details

Summary

This patch attempts to more accurately model the reduction of
power-of-2 vectors of types we natively support. This takes into
account the narrowing of vectors that occurs as we go from 512
bits to 256 bits to 128 bits. It also takes into account the use
of wider elements in the shuffles for the first 2 steps of a
reduction from 128 bits, and the use of a v8i16 shift for the final
step of a vXi8 reduction.
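The step sequence described above can be sketched in Python for an integer vector of 128, 256 or 512 bits. This is a minimal illustration, not the LLVM implementation: `reduction_steps` and its step labels are hypothetical, and using a v8i16 shuffle for the 16-bit step of a vXi8 reduction is an assumption the summary does not spell out.

```python
from typing import List, Tuple


def reduction_steps(num_elts: int, elt_bits: int) -> List[Tuple[str, str]]:
    """Hypothetical sketch: list (data movement, arithmetic type) for each
    level of a power-of-2 integer reduction, following the summary."""
    width = num_elts * elt_bits  # bits still holding live partial results
    assert num_elts > 1 and num_elts & (num_elts - 1) == 0, "power of 2 only"
    assert width in (128, 256, 512), "natively supported widths only"
    steps = []
    while num_elts > 1:
        num_elts //= 2
        if width > 128:
            # 512 -> 256 -> 128: take the upper half and do the op on the
            # narrower type, so the arithmetic shrinks along with the data.
            half_ty = f"v{num_elts}i{elt_bits}"
            steps.append((f"extract {half_ty}", half_ty))
        else:
            moved = width // 2  # bits moved down into the low half
            if moved >= 16:
                # Move the chunk with the widest element that fits it:
                # v2i64, then v4i32, then (assumed) v8i16.
                move = f"shuffle v{128 // moved}i{moved}"
            else:
                # Final vXi8 step: shift v8i16 right by 8 instead of a
                # v16i8 byte shuffle, so no PSHUFB is needed.
                move = "shift v8i16"
            # Arithmetic stays on the full 128-bit vector type.
            steps.append((move, f"v{128 // elt_bits}i{elt_bits}"))
        width //= 2
    return steps


for step in reduction_steps(64, 8):  # v64i8, e.g. on an AVX-512 target
    print(step)
```

For v64i8 this yields two extracts (v32i8, then v16i8), shuffles on v2i64, v4i32 and v8i16, and a final v8i16 shift; no v16i8 shuffle appears anywhere in the sequence.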

The default implementation uses the legalized type for the arithmetic
at all levels, and it uses the single-source permute cost of the
legalized type at all levels. This penalizes things like the
lack of a v16i8 pshufb on pre-SSSE3 targets, and the splitting and
joining that needs to be done for integer types on AVX1. We never
need a v16i8 shuffle for a reduction, and we only need to split AVX1
ops when the type is wide enough that it has to be split. I think we're
still overcosting splits and joins for AVX1, but we're closer now.
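For contrast, the default model as described above can be sketched for one legal 128-bit integer vector. This is a simplified, hypothetical illustration (`default_levels` is not an LLVM function): every level is charged a single-source permute and an arithmetic op on the full legalized type.

```python
from typing import List, Tuple


def default_levels(elt_bits: int) -> List[Tuple[str, str]]:
    """Hypothetical sketch of the default model for one legal 128-bit
    integer vector: each of the log2(N) levels is charged a single-source
    permute and an arithmetic op, both on the full legalized type."""
    num_elts = 128 // elt_bits
    legal_ty = f"v{num_elts}i{elt_bits}"
    levels = num_elts.bit_length() - 1  # log2 of a power of 2
    return [(f"permute {legal_ty}", legal_ty)] * levels


# vXi8: every level is priced as a v16i8 permute.
print(default_levels(8))
```

For vXi8, all four levels are priced as v16i8 permutes. Without PSHUFB (before SSSE3), those are likely costed much higher than the PSHUFD/PSHUFLW-style shuffles and the v8i16 shift that the reduction actually needs.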

I've also removed all pairwise special casing because I don't
think we ever want to generate that on X86. I've also adjusted
the add handling to more accurately account for any type splitting
that occurs before we reach a legal type.
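The adjusted handling of type splitting can be sketched as follows. The sketch assumes that once legalization has split a wide vector into legal-width pieces, combining the pieces costs one arithmetic op on the legal type per piece beyond the first, with no shuffles, because the pieces already sit in separate registers. `split_level_ops` and its `legal_bits` parameter (the widest legal vector width, e.g. 256 for AVX2) are hypothetical.

```python
from typing import Tuple


def split_level_ops(num_elts: int, elt_bits: int, legal_bits: int) -> Tuple[int, str]:
    """Hypothetical sketch: if legalization splits the vector into `parts`
    legal-width pieces, combining them takes parts - 1 arithmetic ops on
    the legal type. Returns (number of ops, legal type)."""
    total_bits = num_elts * elt_bits
    legal_elts = min(num_elts, legal_bits // elt_bits)
    parts = max(1, total_bits // legal_bits)  # already-legal types: 1 piece
    return parts - 1, f"v{legal_elts}i{elt_bits}"


# v32i32 on a target whose widest legal vector is 256 bits (e.g. AVX2).
print(split_level_ops(32, 32, 256))
```

For example, a v32i32 add reduction with 256-bit legal vectors is split into four v8i32 pieces. It is costed as three v8i32 adds, and then the v8i32 reduction proceeds as usual.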


Event Timeline

craig.topper created this revision. · Mar 19 2020, 11:45 PM
Herald added a project: Restricted Project.
craig.topper retitled this revision from [X86] More accurately model the cost of horizontal reductions for types. to [X86] More accurately model the cost of horizontal reductions. · Mar 21 2020, 11:53 AM
RKSimon accepted this revision. · Mar 22 2020, 8:23 AM

LGTM - cheers!

This revision is now accepted and ready to land. · Mar 22 2020, 8:23 AM
This revision was automatically updated to reflect the committed changes.