Copy statements are implemented using extension nodes. This causes the following issues:
- The dependences cannot be recomputed, although they may be needed for dependence analysis and, subsequently, for the detection of parallel loops.
- It is not possible to print schedules that contain extension nodes.
- Anyone who wants to further optimize the schedule tree has to handle the extension nodes separately.
This patch makes Polly insert copy statements into the domain of the schedule tree to avoid the issues stated above. In particular, it helps to generate parallel code for GEMM.
For example, for the GEMM from PolyBench 3.2 with alpha = beta = 1, on an Intel Core i7-3820 (Sandy Bridge) with OMP_NUM_THREADS = 8 and the following options,
clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME -march=native -mllvm -polly -DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-parallel -lgomp
it improves the execution time of the generated code from 0.099 seconds to 0.038 seconds for the standard dataset, from 0.734 seconds to 0.242 seconds for the large dataset, and from 5.78 seconds to 1.74 seconds for the extra large dataset.
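The timings above can be turned into speedup factors with a few lines of arithmetic. The sketch below is only an illustration: the dictionary layout and the `speedup` helper are hypothetical names, not part of the patch, and the numbers are copied from the measurements quoted above.

```python
# Execution times in seconds (before, after), taken from the measurements above.
timings = {
    "standard": (0.099, 0.038),
    "large": (0.734, 0.242),
    "extra large": (5.78, 1.74),
}


def speedup(before, after):
    """Return how many times faster the 'after' run is (hypothetical helper)."""
    return before / after


speedups = {name: speedup(b, a) for name, (b, a) in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
```

Under these numbers the speedup is roughly 2.6x for the standard dataset, 3.0x for the large dataset, and 3.3x for the extra large dataset.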
TODO: Translate to C++ bindings.