This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] Improve handling and cost estimates of vector integer div/rem
ClosedPublic

Authored by jonpa on Oct 12 2018, 7:16 AM.

Details

Reviewers
uweigand
jonpa
Summary

In getArithmeticInstrCost, I changed the constant cost of an sdiv/udiv to 20 for the divide-by-register case, which uses a target divide instruction.

I also discovered that there is actually a third case, in addition to the divisor being a register or a power-of-2 constant splat: a constant vector which is *not* a power-of-2 splat. In that case we can get a sequence with a mul and shifts. So there are now three costs; see the comment in the cost method.
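The three divisor cases described above can be sketched as a small standalone classifier. This is only an illustration, not the code from the patch: the names DivisorKind, classifyDivisor, RegisterDivCost and the example arrays are hypothetical, and the only cost value taken from the summary is 20 for the register case. The costs of the other two cases are not stated here, so they are left out.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names throughout; this is not the code from the patch.
enum class DivisorKind {
  Register,      // divisor is not a constant: target divide instruction
  PowerOf2Splat, // every lane holds the same power-of-2 constant
  OtherConstant  // any other constant vector: mul-and-shift sequence
};

// The one figure stated in the summary: cost of the divide-by-register case.
constexpr unsigned RegisterDivCost = 20;

constexpr bool isPowerOf2(std::uint64_t V) {
  return V != 0 && (V & (V - 1)) == 0;
}

// Elts == nullptr models a divisor that is not a constant.
constexpr DivisorKind classifyDivisor(const std::uint64_t *Elts,
                                      std::size_t N) {
  if (Elts == nullptr || N == 0)
    return DivisorKind::Register;
  for (std::size_t I = 1; I < N; ++I)
    if (Elts[I] != Elts[0])
      return DivisorKind::OtherConstant; // constant, but not a splat
  return isPowerOf2(Elts[0]) ? DivisorKind::PowerOf2Splat
                             : DivisorKind::OtherConstant;
}

// Example constant divisors for a <4 x i32> operation.
constexpr std::uint64_t SplatOf8[4] = {8, 8, 8, 8};
constexpr std::uint64_t SplatOf7[4] = {7, 7, 7, 7};
constexpr std::uint64_t Mixed[4] = {2, 4, 8, 16};
```

With these inputs, classifyDivisor(SplatOf8, 4) yields PowerOf2Splat, while SplatOf7 and Mixed both fall into OtherConstant, the "third case" from the summary.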

In addition, I found that only the incomplete vector ops, e.g. srem <2 x i32>, would actually get optimized into the mul sequence. A <4 x i32> would not. I handle that by adding a target DAGCombine for vector SDIV, UDIV, SREM and UREM that scalarizes them early. See the comment in combineIntDIVREM for the motivation (perhaps all of them could be scalarized early, but I added a check to handle only the case with a constant vector divisor).
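To illustrate what early scalarization buys, here is a standalone sketch of a <4 x i32> udiv by the constant vector <3, 5, 3, 5>, done lane by lane with a multiply and a shift instead of a divide. It does not use the SelectionDAG API and is not what combineIntDIVREM contains. Vec4, MagicPair, LaneMagic, udivByMagic and scalarizedUDiv are hypothetical names, and the constants are the well-known 32-bit unsigned reciprocals for 3 and 5 (divisors such as 7 need an extra fixup step and are left out, as are the signed and remainder variants).

```cpp
#include <cstdint>

// Hypothetical stand-in for a <4 x i32> value.
struct Vec4 {
  std::uint32_t Lane[4];
};

// Multiplier and shift amount replacing a divide by one constant.
struct MagicPair {
  std::uint64_t Magic;
  unsigned Shift;
};

// Per-lane constants for the divisor vector <3, 5, 3, 5>:
//   x / 3 == (x * 0xAAAAAAAB) >> 33   for every 32-bit unsigned x
//   x / 5 == (x * 0xCCCCCCCD) >> 34   for every 32-bit unsigned x
constexpr MagicPair LaneMagic[4] = {{0xAAAAAAABull, 33},
                                    {0xCCCCCCCDull, 34},
                                    {0xAAAAAAABull, 33},
                                    {0xCCCCCCCDull, 34}};

// One scalar lane: a 64-bit multiply followed by a right shift.
constexpr std::uint32_t udivByMagic(std::uint32_t X, MagicPair M) {
  return static_cast<std::uint32_t>((X * M.Magic) >> M.Shift);
}

// "Scalarized" vector udiv: each lane is handled on its own, so each
// lane can use the mul-and-shift form for its own constant divisor.
constexpr Vec4 scalarizedUDiv(Vec4 X) {
  Vec4 R{};
  for (int I = 0; I < 4; ++I)
    R.Lane[I] = udivByMagic(X.Lane[I], LaneMagic[I]);
  return R;
}
```

For example, scalarizedUDiv applied to the lanes {9, 10, 4294967295, 25} gives {3, 2, 1431655765, 5}, matching ordinary unsigned division by 3, 5, 3 and 5.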

The cost tests are extended. Some tests that are now redundant were removed from int-arith.ll.

The CostModel test cases now also run llc to check that constant divisors don't use a divide instruction (like X86 does).

SPEC impact:
z13/z14: 5 loops in 2 files improved. 2 other files also changed: one change seems to relate to SLP costs, and the other file gets a mul sequence instead of a div for a <4 x i32> div.

Event Timeline

jonpa created this revision. Oct 12 2018, 7:16 AM

LGTM, thanks!

jonpa accepted this revision. Oct 25 2018, 2:57 PM

Committed as r345321.

This revision is now accepted and ready to land. Oct 25 2018, 2:57 PM
jonpa closed this revision. Oct 25 2018, 2:57 PM
lib/Target/SystemZ/SystemZTargetTransformInfo.cpp