This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] Rework getInterleavedMemoryOpCost()
ClosedPublic

Authored by jonpa on Oct 10 2018, 4:12 AM.

Details

Reviewers
uweigand
Summary

Model this function more closely after the BasicTTIImpl version, with
separate handling of loads and stores. For loads, the set of actually loaded
vectors is checked.

This makes it more readable and just slightly more accurate generally. I think this is also a good starting point for any future further improvements.

Note: this should wait until the fix for scalarized loads has gone in, since I saw some loop that now got interleaved loads instead of scalarized, which seemed wrong in that case.

Diff Detail

Event Timeline

jonpa created this revision.Oct 10 2018, 4:12 AM
jonpa added a subscriber: uweigand.

It would be good to have some tests for this ...

lib/Target/SystemZ/SystemZTargetTransformInfo.cpp
1030

I think "VecTy" here is already the wide vector type, containing VF * Factor many elements. Why do you need to multiply it size *again* with VF here?

jonpa updated this revision to Diff 172141.Nov 1 2018, 9:03 AM
jonpa added a reviewer: uweigand.
jonpa removed a subscriber: uweigand.

I tried my best to make some tests for the load interleave groups costs (the cost computations for store groups should be unchanged).

Again, this is a bit theoretical, and this is mostly a rewrite to make further improvements easier. For instance, this is still not considering loop invariant or constant values.

This seems to have some positive effect still on 5 loops on spec. See

lib/Target/SystemZ/SystemZTargetTransformInfo.cpp
1030

Unless I am missing something, this is needed to compute the number of vector registers the operand at Index will inhabit after first loading the wide vector of WideTy and then doing the shuffling.

For instance, an i32 load group with factor=6 and VF=2 is first loaded with a <12 x i32> load (VecTy).
NumDstVecs here is 1, from ceil( (2 * 32) / 128).

In other words, 'VF * element size' is the size in bits of the loaded vector value derived from each Index.

uweigand accepted this revision.Nov 2 2018, 6:18 AM

LGTM, thanks.

lib/Target/SystemZ/SystemZTargetTransformInfo.cpp
1030

Ah, of course. I missed that this is using get*Scalar*Size, i.e. the *element* size. That's fine then.

This revision is now accepted and ready to land.Nov 2 2018, 6:18 AM
jonpa closed this revision.Nov 2 2018, 10:18 AM

r345998