This is an archive of the discontinued LLVM Phabricator instance.

[Polly] Change the determination of parameters of macro-kernel
Closed, Public

Authored by gareevroman on Dec 21 2016, 3:09 AM.

Details

Summary

Typically, processor architectures do not include an L3 cache, which means that Nc, the parameter of the macro-kernel, is, for all practical purposes, redundant ([1]). However, small values of Nc can cause redundant packing of the same elements of the matrix A, the first operand of the matrix multiplication. At the same time, large values of Nc can cause segmentation faults if the available stack is exceeded.

This patch adds an option to specify the parameter Nc as a multiple of Nr, the parameter of the micro-kernel.
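
The relationship between Nc and Nr described above can be sketched as follows. This is only an illustration, not the code from the patch: the names `MicroKernelParams`, `computeNc`, and `NcQuotient` are hypothetical, and in practice the multiplier would come from the command-line option this patch introduces.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical container for the micro-kernel tile sizes (Mr x Nr).
struct MicroKernelParams {
  int64_t Mr;
  int64_t Nr;
};

// Derive Nc as an integer multiple of Nr.
// NcQuotient stands in for the user-supplied multiplier (an assumption
// made for this sketch, not the patch's actual variable name).
int64_t computeNc(const MicroKernelParams &Micro, int64_t NcQuotient) {
  if (Micro.Nr <= 0 || NcQuotient <= 0)
    throw std::invalid_argument("Nr and the quotient must be positive");
  return NcQuotient * Micro.Nr;
}
```

For example, with Nr = 8 and a multiplier of 256, this sketch yields Nc = 2048. A larger multiplier should reduce repeated packing of A at the cost of more stack usage, which is the trade-off the summary describes.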

On an Intel Core i7-3820 (Sandy Bridge) with the following options,

clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME -march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true -DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8 -mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm -polly-target-latency-vector-fma=8

it improves performance from 11.303 GFlops/sec (39.247% of theoretical peak) to 17.896 GFlops/sec (62.14% of theoretical peak).
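
As a quick sanity check on the figures above, the following sketch computes the speedup and the theoretical peak implied by each measurement, reading the percentages with a decimal point (39.247% and 62.14%). The helper names `speedup` and `impliedPeak` are made up for this illustration.

```cpp
#include <cmath>

// Ratio of the new throughput to the old one.
double speedup(double BeforeGFlops, double AfterGFlops) {
  return AfterGFlops / BeforeGFlops;
}

// Theoretical peak implied by a measurement and its reported
// percentage of peak (PercentOfPeak is given in percent, e.g. 62.14).
double impliedPeak(double GFlops, double PercentOfPeak) {
  return GFlops / (PercentOfPeak / 100.0);
}
```

Both measurements imply roughly the same peak of about 28.8 GFlops/sec, and the speedup works out to about 1.58x.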

Refs.:

[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf

Diff Detail

Repository
rL LLVM

Event Timeline

gareevroman retitled this revision to [Polly] Change the determination of parameters of macro-kernel.
gareevroman updated this object.
gareevroman added a subscriber: pollydev.
grosser edited edge metadata. Dec 21 2016, 3:12 AM

LGTM, but can you add a test case?

gareevroman updated this object.
gareevroman edited edge metadata.

Update according to the comments.

grosser accepted this revision. Dec 21 2016, 4:40 AM
grosser edited edge metadata.
This revision is now accepted and ready to land. Dec 21 2016, 4:40 AM
This revision was automatically updated to reflect the committed changes.