This is an archive of the discontinued LLVM Phabricator instance.

[CostModel][X86] Fix overcounting arithmetic cost in illegal types getArithmeticReductionCost
ClosedPublic

Authored by craig.topper on Dec 6 2018, 4:04 PM.

Details

Summary

We were overcounting the number of arithmetic operations needed at each level before we reach a legal type. We were using the full vector type for that level, but we are going to split the input vector at that level in half. So the effective arithmetic operation cost at that level is the cost of an operation on the half-width type.

For example, take v8i32 on an SSE target. We were calculating the cost of a v8i32 op, which is likely 2 for basic integer operations. Then after the loop we count 2 more v4i32 ops, for a total arithmetic cost of 4. But if you look at the assembly there would only be 3 arithmetic ops.
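The counting difference can be sketched with a small standalone model. This is not the LLVM implementation: the function names and the UseHalfWidth flag are hypothetical, shuffle and extract costs are ignored, every legal-width op is assumed to cost 1, wider ops are assumed to split evenly into legal-width pieces, and element counts are assumed to be powers of two.

```cpp
// Simplified, hypothetical model of the arithmetic part of a vector
// reduction cost. It is NOT the LLVM code; it only mirrors the counting
// described in the summary.

// Assumed cost of one arithmetic op on NumElts elements: 1 if the type is
// legal, otherwise the number of legal-width pieces it splits into.
unsigned arithOpCost(unsigned NumElts, unsigned LegalElts) {
  return NumElts <= LegalElts ? 1 : (NumElts + LegalElts - 1) / LegalElts;
}

// Floor of log2(N) for N >= 1.
unsigned log2Floor(unsigned N) {
  unsigned Levels = 0;
  while (N > 1) {
    N /= 2;
    ++Levels;
  }
  return Levels;
}

// UseHalfWidth = false models the old counting (full type at each level);
// UseHalfWidth = true models the counting after this change.
unsigned reductionArithCost(unsigned NumElts, unsigned LegalElts,
                            bool UseHalfWidth) {
  unsigned Cost = 0;
  // Levels above the legal type: the input is split in half at each level.
  while (NumElts > LegalElts) {
    unsigned HalfElts = NumElts / 2;
    Cost += arithOpCost(UseHalfWidth ? HalfElts : NumElts, LegalElts);
    NumElts = HalfElts;
  }
  // Remaining levels all operate on the legal type.
  Cost += log2Floor(NumElts) * arithOpCost(NumElts, LegalElts);
  return Cost;
}
```

With 4 legal elements (v4i32 on SSE), reductionArithCost(8, 4, false) gives the old total of 4 and reductionArithCost(8, 4, true) gives 3, matching the numbers above.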

There are still more bugs in this code that I'm going to work on next. The non-pairwise code shouldn't count extract subvectors in the loop: there are no extracts, because the types are already split in registers. For pairwise we need to use 2 two-source permute shuffles.

Event Timeline

craig.topper created this revision. Dec 6 2018, 4:04 PM

Do the same thing to the min/max reduction cost.

RKSimon accepted this revision. Dec 7 2018, 4:02 AM

Nice catch! LGTM

This revision is now accepted and ready to land. Dec 7 2018, 4:02 AM
This revision was automatically updated to reflect the committed changes.