This is an archive of the discontinued LLVM Phabricator instance.

[Polly] Align newly created arrays to the first level cache line boundary
Closed, Public

Authored by gareevroman on Dec 21 2016, 3:13 AM.

Details

Summary

Aligning data to cache-line boundaries helps to avoid overhead when accessing it ([1]). This patch aligns newly created arrays and adds an option to specify the first-level cache line size. By default we use 64 bytes, which is a typical cache line size ([2]).
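
To illustrate the idea outside of Polly, here is a minimal C++ sketch. It is not the code from this patch: the names `FirstLevelCacheLineSize`, `PackedBuffer`, `isAlignedTo`, and `cacheLinesTouched` are hypothetical, and the 64-byte constant simply mirrors the default mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed default, mirroring the 64-byte value named in the summary.
constexpr std::size_t FirstLevelCacheLineSize = 64;

// Returns true if Ptr is a multiple of Alignment (which must be a power of two).
inline bool isAlignedTo(const void *Ptr, std::size_t Alignment) {
  assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0 &&
         "alignment must be a power of two");
  return (reinterpret_cast<std::uintptr_t>(Ptr) & (Alignment - 1)) == 0;
}

// Hypothetical stand-in for a newly created array: alignas forces the
// storage to start on a cache-line boundary.
struct PackedBuffer {
  alignas(FirstLevelCacheLineSize) double Data[256];
};

// Number of cache lines covered by the byte range [Addr, Addr + Bytes).
inline std::size_t cacheLinesTouched(std::uintptr_t Addr, std::size_t Bytes,
                                     std::size_t LineSize) {
  if (Bytes == 0)
    return 0;
  return (Addr + Bytes - 1) / LineSize - Addr / LineSize + 1;
}
```

With these definitions, a 64-byte block starting at offset 0 covers one cache line, while the same block starting at offset 8 covers two, which is the kind of extra traffic that alignment is meant to avoid.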

On an Intel Core i7-3820 (Sandy Bridge) with the following options,

clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME -march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true -DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8 -mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm -polly-target-latency-vector-fma=8

it improves the performance from 11.303 GFlops/sec (39.247% of theoretical peak) to 12.63 GFlops/sec (43.8542% of theoretical peak).

Refs.:

[1] - http://www.alexonlinux.com/aligned-vs-unaligned-memory-access
[2] - http://igoro.com/archive/gallery-of-processor-cache-effects/

Event Timeline

gareevroman retitled this revision to [Polly] Align newly created arrays to the first level cache line boundary.
gareevroman updated this object.
grosser accepted this revision. Dec 21 2016, 3:15 AM
grosser edited edge metadata.

LGTM. Could you add the theoretical performance to the commit message as well?

This revision is now accepted and ready to land. Dec 21 2016, 3:15 AM
gareevroman updated this object.
gareevroman edited edge metadata.
gareevroman added a subscriber: pollydev.

Update according to the comments.

Hi Roman,

any plans to commit this?

gareevroman closed this revision. Jan 18 2017, 4:28 AM

Hi Tobias,

this was committed in r290253.

Interesting, maybe we need to get this from TargetTransformInfo eventually, because I saw a similar global variable/option in LoadStoreVectorizer as well.

Hi Hongbin,

thanks for the comment!

Right. We decided not to do it now, because it would probably require passing a TargetTransformInfo object to IslNodeBuilder, and such an object does not seem to be useful for anything else at the moment. Furthermore, on some platforms (e.g. Intel Core i7-3820 Sandy Bridge), it reports a cache line size of zero.
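
If the value were taken from the target description later on, the zero case would need a fallback. The sketch below shows one possible shape for that logic. It does not call any LLVM API: `chooseCacheLineSize` and `DefaultFirstLevelCacheLineSize` are hypothetical names, and `Reported` stands for whatever a target query returns.

```cpp
// Assumed default, matching the 64 bytes used by this patch.
constexpr unsigned DefaultFirstLevelCacheLineSize = 64;

// Picks the cache line size to align to. Reported is the value obtained
// from the target (0 is treated as "unknown"); Fallback is used in that case.
inline unsigned
chooseCacheLineSize(unsigned Reported,
                    unsigned Fallback = DefaultFirstLevelCacheLineSize) {
  return Reported != 0 ? Reported : Fallback;
}
```

With a rule like this, a platform that reports zero would still get the 64-byte default, while targets that report a real value would override it.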