Index: polly/trunk/docs/HowToManuallyUseTheIndividualPiecesOfPolly.rst =================================================================== --- polly/trunk/docs/HowToManuallyUseTheIndividualPiecesOfPolly.rst +++ polly/trunk/docs/HowToManuallyUseTheIndividualPiecesOfPolly.rst @@ -0,0 +1,475 @@ +================================================== +How to manually use the individual pieces of Polly +================================================== + +Execute the individual Polly passes manually +============================================ + +.. sectionauthor:: Singapuram Sanjay Srivallabh + +This example presents the individual passes that are involved when optimizing +code with Polly. We show how to execute them individually and explain for +each which analysis is performed or what transformation is applied. In this +example the polyhedral transformation is user-provided to show how much +performance improvement can be expected by an optimal automatic optimizer. + +1. **Create LLVM-IR from the C code** +------------------------------------- + Polly works on LLVM-IR. Hence it is necessary to translate the source + files into LLVM-IR. If more than one file should be optimized, the + files can be combined into a single file with llvm-link. + + .. code-block:: console + + $ clang -S -emit-llvm matmul.c -o matmul.s + + +2. **Prepare the LLVM-IR for Polly** +------------------------------------ + + Polly is only able to work with code that matches a canonical form. + To translate the LLVM-IR into this form we use a set of + canonicalization passes. They are scheduled by using + '-polly-canonicalize'. + + .. code-block:: console + + $ opt -S -polly-canonicalize matmul.s > matmul.preopt.ll + +3. **Show the SCoPs detected by Polly (optional)** +-------------------------------------------------- + + To understand if Polly was able to detect SCoPs, we print the structure + of the detected SCoPs. In our example two SCoPs are detected: one in + 'init_array', the other in 'main'.
+ + .. code-block:: console + + $ opt -polly-ast -analyze -q matmul.preopt.ll -polly-process-unprofitable + + .. code-block:: guess + + :: isl ast :: init_array :: %for.cond1.preheader---%for.end19 + + if (1) + + for (int c0 = 0; c0 <= 1535; c0 += 1) + for (int c1 = 0; c1 <= 1535; c1 += 1) + Stmt_for_body3(c0, c1); + + else + { /* original code */ } + + :: isl ast :: main :: %for.cond1.preheader---%for.end30 + + if (1) + + for (int c0 = 0; c0 <= 1535; c0 += 1) + for (int c1 = 0; c1 <= 1535; c1 += 1) { + Stmt_for_body3(c0, c1); + for (int c2 = 0; c2 <= 1535; c2 += 1) + Stmt_for_body8(c0, c1, c2); + } + + else + { /* original code */ } + +4. **Highlight the detected SCoPs in the CFGs of the program (requires graphviz/dotty)** +---------------------------------------------------------------------------------------- + + Polly can use graphviz to graphically show a CFG in which the detected + SCoPs are highlighted. It can also create '.dot' files that can be + translated by the 'dot' utility into various graphic formats. + + + .. code-block:: console + + $ opt -view-scops -disable-output matmul.preopt.ll + $ opt -view-scops-only -disable-output matmul.preopt.ll + + The output for the different functions: + + - view-scops : main_, init_array_, print_array_ + - view-scops-only : main-scopsonly_, init_array-scopsonly_, print_array-scopsonly_ + +.. _main: http://polly.llvm.org/experiments/matmul/scops.main.dot.png +.. _init_array: http://polly.llvm.org/experiments/matmul/scops.init_array.dot.png +.. _print_array: http://polly.llvm.org/experiments/matmul/scops.print_array.dot.png +.. _main-scopsonly: http://polly.llvm.org/experiments/matmul/scopsonly.main.dot.png +.. _init_array-scopsonly: http://polly.llvm.org/experiments/matmul/scopsonly.init_array.dot.png +.. _print_array-scopsonly: http://polly.llvm.org/experiments/matmul/scopsonly.print_array.dot.png + +5. **View the polyhedral representation of the SCoPs** +------------------------------------------------------ + + .. 
code-block:: console + + $ opt -polly-scops -analyze matmul.preopt.ll -polly-process-unprofitable + + .. code-block:: guess + + [...]Printing analysis 'Polly - Create polyhedral description of Scops' for region: 'for.cond1.preheader => for.end19' in function 'init_array': + Function: init_array + Region: %for.cond1.preheader---%for.end19 + Max Loop Depth: 2 + Invariant Accesses: { + } + Context: + { : } + Assumed Context: + { : } + Invalid Context: + { : 1 = 0 } + Arrays { + float MemRef_A[*][1536]; // Element size 4 + float MemRef_B[*][1536]; // Element size 4 + } + Arrays (Bounds as pw_affs) { + float MemRef_A[*][ { [] -> [(1536)] } ]; // Element size 4 + float MemRef_B[*][ { [] -> [(1536)] } ]; // Element size 4 + } + Alias Groups (0): + n/a + Statements { + Stmt_for_body3 + Domain := + { Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 }; + Schedule := + { Stmt_for_body3[i0, i1] -> [i0, i1] }; + MustWriteAccess := [Reduction Type: NONE] [Scalar: 0] + { Stmt_for_body3[i0, i1] -> MemRef_A[i0, i1] }; + MustWriteAccess := [Reduction Type: NONE] [Scalar: 0] + { Stmt_for_body3[i0, i1] -> MemRef_B[i0, i1] }; + } + [...]Printing analysis 'Polly - Create polyhedral description of Scops' for region: 'for.cond1.preheader => for.end30' in function 'main': + Function: main + Region: %for.cond1.preheader---%for.end30 + Max Loop Depth: 3 + Invariant Accesses: { + } + Context: + { : } + Assumed Context: + { : } + Invalid Context: + { : 1 = 0 } + Arrays { + float MemRef_C[*][1536]; // Element size 4 + float MemRef_A[*][1536]; // Element size 4 + float MemRef_B[*][1536]; // Element size 4 + } + Arrays (Bounds as pw_affs) { + float MemRef_C[*][ { [] -> [(1536)] } ]; // Element size 4 + float MemRef_A[*][ { [] -> [(1536)] } ]; // Element size 4 + float MemRef_B[*][ { [] -> [(1536)] } ]; // Element size 4 + } + Alias Groups (0): + n/a + Statements { + Stmt_for_body3 + Domain := + { Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 }; + Schedule := + { 
Stmt_for_body3[i0, i1] -> [i0, i1, 0, 0] }; + MustWriteAccess := [Reduction Type: NONE] [Scalar: 0] + { Stmt_for_body3[i0, i1] -> MemRef_C[i0, i1] }; + Stmt_for_body8 + Domain := + { Stmt_for_body8[i0, i1, i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1535 }; + Schedule := + { Stmt_for_body8[i0, i1, i2] -> [i0, i1, 1, i2] }; + ReadAccess := [Reduction Type: NONE] [Scalar: 0] + { Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }; + ReadAccess := [Reduction Type: NONE] [Scalar: 0] + { Stmt_for_body8[i0, i1, i2] -> MemRef_A[i0, i2] }; + ReadAccess := [Reduction Type: NONE] [Scalar: 0] + { Stmt_for_body8[i0, i1, i2] -> MemRef_B[i2, i1] }; + MustWriteAccess := [Reduction Type: NONE] [Scalar: 0] + { Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }; + } + + +6. **Show the dependences for the SCoPs** +----------------------------------------- + + .. code-block:: console + + $ opt -polly-dependences -analyze matmul.preopt.ll -polly-process-unprofitable + + .. code-block:: guess + + [...]Printing analysis 'Polly - Calculate dependences' for region: 'for.cond1.preheader => for.end19' in function 'init_array': + RAW dependences: + { } + WAR dependences: + { } + WAW dependences: + { } + Reduction dependences: + n/a + Transitive closure of reduction dependences: + { } + [...]Printing analysis 'Polly - Calculate dependences' for region: 'for.cond1.preheader => for.end30' in function 'main': + RAW dependences: + { Stmt_for_body3[i0, i1] -> Stmt_for_body8[i0, i1, 0] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535; Stmt_for_body8[i0, i1, i2] -> Stmt_for_body8[i0, i1, 1 + i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1534 } + WAR dependences: + { } + WAW dependences: + { Stmt_for_body3[i0, i1] -> Stmt_for_body8[i0, i1, 0] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535; Stmt_for_body8[i0, i1, i2] -> Stmt_for_body8[i0, i1, 1 + i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1534 } + Reduction dependences: + n/a + Transitive closure of reduction dependences: + { } + +7. 
**Export jscop files** +------------------------- + + .. code-block:: console + + $ opt -polly-export-jscop matmul.preopt.ll -polly-process-unprofitable + + .. code-block:: guess + + [...]Writing JScop '%for.cond1.preheader---%for.end19' in function 'init_array' to './init_array___%for.cond1.preheader---%for.end19.jscop'. + + Writing JScop '%for.cond1.preheader---%for.end30' in function 'main' to './main___%for.cond1.preheader---%for.end30.jscop'. + + + +8. **Import the changed jscop files and print the updated SCoP structure (optional)** +------------------------------------------------------------------------------------- + + Polly can reimport jscop files in which the schedules of the statements + have been changed. These changed schedules are used to describe + transformations. It is possible to import different jscop files by + providing the postfix of the jscop file that is imported. + + We apply three different transformations on the SCoP in the main + function. The jscop files describing these transformations are + handwritten (and available in docs/experiments/matmul). + + **No Polly** + + As a baseline we do not call any Polly code generation, but only apply the normal -O3 optimizations. + + .. code-block:: console + + $ opt matmul.preopt.ll -polly-import-jscop -polly-ast -analyze -polly-process-unprofitable + + .. code-block:: c + + [...] + :: isl ast :: main :: %for.cond1.preheader---%for.end30 + + if (1) + + for (int c0 = 0; c0 <= 1535; c0 += 1) + for (int c1 = 0; c1 <= 1535; c1 += 1) { + Stmt_for_body3(c0, c1); + for (int c3 = 0; c3 <= 1535; c3 += 1) + Stmt_for_body8(c0, c1, c3); + } + + else + { /* original code */ } + [...] + + **Loop Interchange (and Fission to allow the interchange)** + + We split the loops and can now apply an interchange of the loop dimensions that enumerate Stmt_for_body8. + + .. Although I feel (and have created a .jscop) we can avoid splitting the loops. + + ..
code-block:: console + + $ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged -polly-ast -analyze -polly-process-unprofitable + + .. code-block:: c + + [...] + :: isl ast :: main :: %for.cond1.preheader---%for.end30 + + if (1) + + { + for (int c1 = 0; c1 <= 1535; c1 += 1) + for (int c2 = 0; c2 <= 1535; c2 += 1) + Stmt_for_body3(c1, c2); + for (int c1 = 0; c1 <= 1535; c1 += 1) + for (int c2 = 0; c2 <= 1535; c2 += 1) + for (int c3 = 0; c3 <= 1535; c3 += 1) + Stmt_for_body8(c1, c3, c2); + } + + else + { /* original code */ } + [...] + + **Interchange + Tiling** + + In addition to the interchange we now tile the second loop nest. + + .. code-block:: console + + $ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled -polly-ast -analyze -polly-process-unprofitable + + .. code-block:: c + + [...] + :: isl ast :: main :: %for.cond1.preheader---%for.end30 + + if (1) + + { + for (int c1 = 0; c1 <= 1535; c1 += 1) + for (int c2 = 0; c2 <= 1535; c2 += 1) + Stmt_for_body3(c1, c2); + for (int c1 = 0; c1 <= 1535; c1 += 64) + for (int c2 = 0; c2 <= 1535; c2 += 64) + for (int c3 = 0; c3 <= 1535; c3 += 64) + for (int c4 = c1; c4 <= c1 + 63; c4 += 1) + for (int c5 = c3; c5 <= c3 + 63; c5 += 1) + for (int c6 = c2; c6 <= c2 + 63; c6 += 1) + Stmt_for_body8(c4, c6, c5); + } + + else + { /* original code */ } + [...] + + + **Interchange + Tiling + Strip-mining to prepare vectorization** + + To later allow vectorization we create a so-called trivially + parallelizable loop. It is innermost, parallel and has only four + iterations. It can be replaced by 4-element SIMD instructions. + + .. code-block:: console + + $ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector -polly-ast -analyze -polly-process-unprofitable + + .. code-block:: c + + [...]
+ :: isl ast :: main :: %for.cond1.preheader---%for.end30 + + if (1) + + { + for (int c1 = 0; c1 <= 1535; c1 += 1) + for (int c2 = 0; c2 <= 1535; c2 += 1) + Stmt_for_body3(c1, c2); + for (int c1 = 0; c1 <= 1535; c1 += 64) + for (int c2 = 0; c2 <= 1535; c2 += 64) + for (int c3 = 0; c3 <= 1535; c3 += 64) + for (int c4 = c1; c4 <= c1 + 63; c4 += 1) + for (int c5 = c3; c5 <= c3 + 63; c5 += 1) + for (int c6 = c2; c6 <= c2 + 63; c6 += 4) + for (int c7 = c6; c7 <= c6 + 3; c7 += 1) + Stmt_for_body8(c4, c7, c5); + } + + else + { /* original code */ } + [...] + +9. **Codegenerate the SCoPs** +----------------------------- + + This generates new code for the SCoPs detected by Polly. If + -polly-import-jscop is present, transformations specified in the + imported jscop files will be applied. + + + .. code-block:: console + + $ opt matmul.preopt.ll | opt -O3 > matmul.normalopt.ll + + .. code-block:: console + + $ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged -polly-codegen -polly-process-unprofitable | opt -O3 > matmul.polly.interchanged.ll + + .. code-block:: guess + + Reading JScop '%for.cond1.preheader---%for.end19' in function 'init_array' from './init_array___%for.cond1.preheader---%for.end19.jscop.interchanged'. + File could not be read: No such file or directory + Reading JScop '%for.cond1.preheader---%for.end30' in function 'main' from './main___%for.cond1.preheader---%for.end30.jscop.interchanged'. + + .. code-block:: console + + $ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled -polly-codegen -polly-process-unprofitable | opt -O3 > matmul.polly.interchanged+tiled.ll + + .. code-block:: guess + + Reading JScop '%for.cond1.preheader---%for.end19' in function 'init_array' from './init_array___%for.cond1.preheader---%for.end19.jscop.interchanged+tiled'.
+ File could not be read: No such file or directory + Reading JScop '%for.cond1.preheader---%for.end30' in function 'main' from './main___%for.cond1.preheader---%for.end30.jscop.interchanged+tiled'. + + .. code-block:: console + + $ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector -polly-codegen -polly-vectorizer=polly -polly-process-unprofitable | opt -O3 > matmul.polly.interchanged+tiled+vector.ll + + .. code-block:: guess + + Reading JScop '%for.cond1.preheader---%for.end19' in function 'init_array' from './init_array___%for.cond1.preheader---%for.end19.jscop.interchanged+tiled+vector'. + File could not be read: No such file or directory + Reading JScop '%for.cond1.preheader---%for.end30' in function 'main' from './main___%for.cond1.preheader---%for.end30.jscop.interchanged+tiled+vector'. + + .. code-block:: console + + $ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector -polly-codegen -polly-vectorizer=polly -polly-parallel -polly-process-unprofitable | opt -O3 > matmul.polly.interchanged+tiled+vector+openmp.ll + + .. code-block:: guess + + Reading JScop '%for.cond1.preheader---%for.end19' in function 'init_array' from './init_array___%for.cond1.preheader---%for.end19.jscop.interchanged+tiled+vector'. + File could not be read: No such file or directory + Reading JScop '%for.cond1.preheader---%for.end30' in function 'main' from './main___%for.cond1.preheader---%for.end30.jscop.interchanged+tiled+vector'. + + +10. **Create the executables** +------------------------------ + + ..
code-block:: console + + $ llc matmul.normalopt.ll -o matmul.normalopt.s && gcc matmul.normalopt.s -o matmul.normalopt.exe + $ llc matmul.polly.interchanged.ll -o matmul.polly.interchanged.s && gcc matmul.polly.interchanged.s -o matmul.polly.interchanged.exe + $ llc matmul.polly.interchanged+tiled.ll -o matmul.polly.interchanged+tiled.s && gcc matmul.polly.interchanged+tiled.s -o matmul.polly.interchanged+tiled.exe + $ llc matmul.polly.interchanged+tiled+vector.ll -o matmul.polly.interchanged+tiled+vector.s && gcc matmul.polly.interchanged+tiled+vector.s -o matmul.polly.interchanged+tiled+vector.exe + $ llc matmul.polly.interchanged+tiled+vector+openmp.ll -o matmul.polly.interchanged+tiled+vector+openmp.s && gcc -fopenmp matmul.polly.interchanged+tiled+vector+openmp.s -o matmul.polly.interchanged+tiled+vector+openmp.exe + +11. **Compare the runtime of the executables** +---------------------------------------------- + + By comparing the runtimes of the different executables we see that a + simple loop interchange gives the largest performance boost here, and + tiling improves on it slightly. However, in this case adding vectorization + and OpenMP on top degrades the performance again. + + ..
code-block:: console + + $ time ./matmul.normalopt.exe + + real 0m11.295s + user 0m11.288s + sys 0m0.004s + $ time ./matmul.polly.interchanged.exe + + real 0m0.988s + user 0m0.980s + sys 0m0.008s + $ time ./matmul.polly.interchanged+tiled.exe + + real 0m0.830s + user 0m0.816s + sys 0m0.012s + $ time ./matmul.polly.interchanged+tiled+vector.exe + + real 0m5.430s + user 0m5.424s + sys 0m0.004s + $ time ./matmul.polly.interchanged+tiled+vector+openmp.exe + + real 0m3.184s + user 0m11.972s + sys 0m0.036s + Index: polly/trunk/docs/experiments/matmul/init_array___%for.cond1.preheader---%for.end19.jscop =================================================================== --- polly/trunk/docs/experiments/matmul/init_array___%for.cond1.preheader---%for.end19.jscop +++ polly/trunk/docs/experiments/matmul/init_array___%for.cond1.preheader---%for.end19.jscop @@ -0,0 +1,33 @@ +{ + "arrays" : [ + { + "name" : "MemRef_A", + "sizes" : [ "1536" ], + "type" : "float" + }, + { + "name" : "MemRef_B", + "sizes" : [ "1536" ], + "type" : "float" + } + ], + "context" : "{ : }", + "name" : "%for.cond1.preheader---%for.end19", + "statements" : [ + { + "accesses" : [ + { + "kind" : "write", + "relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_A[i0, i1] }" + }, + { + "kind" : "write", + "relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_B[i0, i1] }" + } + ], + "domain" : "{ Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 }", + "name" : "Stmt_for_body3", + "schedule" : "{ Stmt_for_body3[i0, i1] -> [i0, i1] }" + } + ] +} Index: polly/trunk/docs/experiments/matmul/main___%for.cond1.preheader---%for.end30.jscop =================================================================== --- polly/trunk/docs/experiments/matmul/main___%for.cond1.preheader---%for.end30.jscop +++ polly/trunk/docs/experiments/matmul/main___%for.cond1.preheader---%for.end30.jscop @@ -0,0 +1,57 @@ +{ + "arrays" : [ + { + "name" : "MemRef_C", + "sizes" : [ "1536" ], + "type" : "float" + }, + { + "name" : "MemRef_A", + 
"sizes" : [ "1536" ], + "type" : "float" + }, + { + "name" : "MemRef_B", + "sizes" : [ "1536" ], + "type" : "float" + } + ], + "context" : "{ : }", + "name" : "%for.cond1.preheader---%for.end30", + "statements" : [ + { + "accesses" : [ + { + "kind" : "write", + "relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_C[i0, i1] }" + } + ], + "domain" : "{ Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 }", + "name" : "Stmt_for_body3", + "schedule" : "{ Stmt_for_body3[i0, i1] -> [i0, i1, 0, 0] }" + }, + { + "accesses" : [ + { + "kind" : "read", + "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }" + }, + { + "kind" : "read", + "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_A[i0, i2] }" + }, + { + "kind" : "read", + "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_B[i2, i1] }" + }, + { + "kind" : "write", + "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }" + } + ], + "domain" : "{ Stmt_for_body8[i0, i1, i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1535 }", + "name" : "Stmt_for_body8", + "schedule" : "{ Stmt_for_body8[i0, i1, i2] -> [i0, i1, 1, i2] }" + } + ] +} Index: polly/trunk/docs/experiments/matmul/main___%for.cond1.preheader---%for.end30.jscop.interchanged =================================================================== --- polly/trunk/docs/experiments/matmul/main___%for.cond1.preheader---%for.end30.jscop.interchanged +++ polly/trunk/docs/experiments/matmul/main___%for.cond1.preheader---%for.end30.jscop.interchanged @@ -0,0 +1,57 @@ +{ + "arrays" : [ + { + "name" : "MemRef_C", + "sizes" : [ "1536" ], + "type" : "float" + }, + { + "name" : "MemRef_A", + "sizes" : [ "1536" ], + "type" : "float" + }, + { + "name" : "MemRef_B", + "sizes" : [ "1536" ], + "type" : "float" + } + ], + "context" : "{ : }", + "name" : "%for.cond1.preheader---%for.end30", + "statements" : [ + { + "accesses" : [ + { + "kind" : "write", + "relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_C[i0, i1] }" + } + ], + "domain" : "{ 
Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 }", + "name" : "Stmt_for_body3", + "schedule" : "{ Stmt_for_body3[i0, i1] -> [0, i0, i1, 0] }" + }, + { + "accesses" : [ + { + "kind" : "read", + "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }" + }, + { + "kind" : "read", + "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_A[i0, i2] }" + }, + { + "kind" : "read", + "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_B[i2, i1] }" + }, + { + "kind" : "write", + "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }" + } + ], + "domain" : "{ Stmt_for_body8[i0, i1, i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1535 }", + "name" : "Stmt_for_body8", + "schedule" : "{ Stmt_for_body8[i0, i1, i2] -> [1, i0, i2, i1] }" + } + ] +} Index: polly/trunk/docs/experiments/matmul/main___%for.cond1.preheader---%for.end30.jscop.interchanged+tiled =================================================================== --- polly/trunk/docs/experiments/matmul/main___%for.cond1.preheader---%for.end30.jscop.interchanged+tiled +++ polly/trunk/docs/experiments/matmul/main___%for.cond1.preheader---%for.end30.jscop.interchanged+tiled @@ -0,0 +1,57 @@ +{ + "arrays" : [ + { + "name" : "MemRef_C", + "sizes" : [ "1536" ], + "type" : "float" + }, + { + "name" : "MemRef_A", + "sizes" : [ "1536" ], + "type" : "float" + }, + { + "name" : "MemRef_B", + "sizes" : [ "1536" ], + "type" : "float" + } + ], + "context" : "{ : }", + "name" : "%for.cond1.preheader---%for.end30", + "statements" : [ + { + "accesses" : [ + { + "kind" : "write", + "relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_C[i0, i1] }" + } + ], + "domain" : "{ Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 }", + "name" : "Stmt_for_body3", + "schedule" : "{ Stmt_for_body3[i0, i1] -> [0, i0, i1, 0, 0, 0, 0 ] }" + }, + { + "accesses" : [ + { + "kind" : "read", + "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }" + }, + { + "kind" : "read", + "relation" : "{ 
Stmt_for_body8[i0, i1, i2] -> MemRef_A[i0, i2] }" + }, + { + "kind" : "read", + "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_B[i2, i1] }" + }, + { + "kind" : "write", + "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }" + } + ], + "domain" : "{ Stmt_for_body8[i0, i1, i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1535 }", + "name" : "Stmt_for_body8", + "schedule" : "{ Stmt_for_body8[i0, i1, i2] -> [1, o0, o1, o2, i0, i2, i1]: o0 <= i0 < o0 + 64 and o1 <= i1 < o1 + 64 and o2 <= i2 < o2 + 64 and o0 % 64 = 0 and o1 % 64 = 0 and o2 % 64 = 0 }" + } + ] +} Index: polly/trunk/docs/experiments/matmul/main___%for.cond1.preheader---%for.end30.jscop.interchanged+tiled+vector =================================================================== --- polly/trunk/docs/experiments/matmul/main___%for.cond1.preheader---%for.end30.jscop.interchanged+tiled+vector +++ polly/trunk/docs/experiments/matmul/main___%for.cond1.preheader---%for.end30.jscop.interchanged+tiled+vector @@ -0,0 +1,57 @@ +{ + "arrays" : [ + { + "name" : "MemRef_C", + "sizes" : [ "1536" ], + "type" : "float" + }, + { + "name" : "MemRef_A", + "sizes" : [ "1536" ], + "type" : "float" + }, + { + "name" : "MemRef_B", + "sizes" : [ "1536" ], + "type" : "float" + } + ], + "context" : "{ : }", + "name" : "%for.cond1.preheader---%for.end30", + "statements" : [ + { + "accesses" : [ + { + "kind" : "write", + "relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_C[i0, i1] }" + } + ], + "domain" : "{ Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 }", + "name" : "Stmt_for_body3", + "schedule" : "{ Stmt_for_body3[i0, i1] -> [0, i0, i1, 0, 0, 0, 0, 0 ] }" + }, + { + "accesses" : [ + { + "kind" : "read", + "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }" + }, + { + "kind" : "read", + "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_A[i0, i2] }" + }, + { + "kind" : "read", + "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_B[i2, i1] }" + }, + { + "kind" : "write", + 
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }" + } + ], + "domain" : "{ Stmt_for_body8[i0, i1, i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1535 }", + "name" : "Stmt_for_body8", + "schedule" : "{ Stmt_for_body8[i0, i1, i2] -> [1, o0, o1, o2, i0, i2, oo1, i1]: o0 <= i0 < o0 + 64 and o1 <= oo1 < o1 + 64 and o2 <= i2 < o2 + 64 and oo1 <= i1 < oo1 + 4 and o0 % 64 = 0 and o1 % 64 = 0 and o2 % 64 = 0 and oo1 % 4 = 0 }" + } + ] +} Index: polly/trunk/docs/index.rst =================================================================== --- polly/trunk/docs/index.rst +++ polly/trunk/docs/index.rst @@ -23,8 +23,8 @@ Architecture UsingPollyWithClang + HowToManuallyUseTheIndividualPiecesOfPolly -* `How to manually use the individual pieces of Polly `_ * `A list of Polly passes `_ Indices and tables Index: polly/trunk/www/experiments/matmul/init_array___%for.cond---%for.end19.jscop =================================================================== --- polly/trunk/www/experiments/matmul/init_array___%for.cond---%for.end19.jscop +++ polly/trunk/www/experiments/matmul/init_array___%for.cond---%for.end19.jscop @@ -1,21 +0,0 @@ -{ - "context" : "{ : }", - "name" : "for.cond => for.end19", - "statements" : [ - { - "accesses" : [ - { - "kind" : "write", - "relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_A[1536i0 + i1] }" - }, - { - "kind" : "write", - "relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_B[1536i0 + i1] }" - } - ], - "domain" : "{ Stmt_for_body3[i0, i1] : i0 >= 0 and i0 <= 1535 and i1 >= 0 and i1 <= 1535 }", - "name" : "Stmt_for_body3", - "schedule" : "{ Stmt_for_body3[i0, i1] -> schedule[0, i0, 0, i1, 0] }" - } - ] -} Index: polly/trunk/www/experiments/matmul/main___%for.cond---%for.end30.jscop =================================================================== --- polly/trunk/www/experiments/matmul/main___%for.cond---%for.end30.jscop +++ polly/trunk/www/experiments/matmul/main___%for.cond---%for.end30.jscop @@ -1,40 +0,0 @@ -{ - "context" : "{ 
: }", - "name" : "for.cond => for.end30", - "statements" : [ - { - "accesses" : [ - { - "kind" : "write", - "relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_C[1536i0 + i1] }" - } - ], - "domain" : "{ Stmt_for_body3[i0, i1] : i0 >= 0 and i0 <= 1535 and i1 >= 0 and i1 <= 1535 }", - "name" : "Stmt_for_body3", - "schedule" : "{ Stmt_for_body3[i0, i1] -> schedule[0, i0, 0, i1, 0, 0, 0] }" - }, - { - "accesses" : [ - { - "kind" : "read", - "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[1536i0 + i1] }" - }, - { - "kind" : "read", - "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_A[1536i0 + i2] }" - }, - { - "kind" : "read", - "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_B[i1 + 1536i2] }" - }, - { - "kind" : "write", - "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[1536i0 + i1] }" - } - ], - "domain" : "{ Stmt_for_body8[i0, i1, i2] : i0 >= 0 and i0 <= 1535 and i1 >= 0 and i1 <= 1535 and i2 >= 0 and i2 <= 1535 }", - "name" : "Stmt_for_body8", - "schedule" : "{ Stmt_for_body8[i0, i1, i2] -> schedule[0, i0, 0, i1, 1, i2, 0] }" - } - ] -} Index: polly/trunk/www/experiments/matmul/main___%for.cond---%for.end30.jscop.interchanged =================================================================== --- polly/trunk/www/experiments/matmul/main___%for.cond---%for.end30.jscop.interchanged +++ polly/trunk/www/experiments/matmul/main___%for.cond---%for.end30.jscop.interchanged @@ -1,40 +0,0 @@ -{ - "context" : "{ [] }", - "name" : "%1 => %17", - "statements" : [ - { - "accesses" : [ - { - "kind" : "write", - "relation" : "{ Stmt_4[i0, i1] -> MemRef_C[1536i0 + i1] }" - } - ], - "domain" : "{ Stmt_4[i0, i1] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 }", - "name" : "Stmt_4", - "schedule" : "{ Stmt_4[i0, i1] -> schedule[0, i0, 0, i1, 0, 0, 0] }" - }, - { - "accesses" : [ - { - "kind" : "read", - "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1536i0 + i1] }" - }, - { - "kind" : "read", - "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_A[1536i0 + i2] }" - }, 
- { - "kind" : "read", - "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1536i2] }" - }, - { - "kind" : "write", - "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1536i0 + i1] }" - } - ], - "domain" : "{ Stmt_6[i0, i1, i2] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023 }", - "name" : "Stmt_6", - "schedule" : "{ Stmt_6[i0, i1, i2] -> schedule[1, i0, 0, i2, 0, i1, 0] }" - } - ] -} Index: polly/trunk/www/experiments/matmul/main___%for.cond---%for.end30.jscop.interchanged+tiled =================================================================== --- polly/trunk/www/experiments/matmul/main___%for.cond---%for.end30.jscop.interchanged+tiled +++ polly/trunk/www/experiments/matmul/main___%for.cond---%for.end30.jscop.interchanged+tiled @@ -1,40 +0,0 @@ -{ - "context" : "{ [] }", - "name" : "%1 => %17", - "statements" : [ - { - "accesses" : [ - { - "kind" : "write", - "relation" : "{ Stmt_4[i0, i1] -> MemRef_C[1536i0 + i1] }" - } - ], - "domain" : "{ Stmt_4[i0, i1] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 }", - "name" : "Stmt_4", - "schedule" : "{ Stmt_4[i0, i1] -> schedule[0, i0, 0, i1, 0, 0, 0] }" - }, - { - "accesses" : [ - { - "kind" : "read", - "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1536i0 + i1] }" - }, - { - "kind" : "read", - "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_A[1536i0 + i2] }" - }, - { - "kind" : "read", - "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1536i2] }" - }, - { - "kind" : "write", - "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1536i0 + i1] }" - } - ], - "domain" : "{ Stmt_6[i0, i1, i2] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023 }", - "name" : "Stmt_6", - "schedule" : "{ Stmt_6[i0, i1, i2] -> schedule[1, o0, o1, o2, i0, i2, i1]: o0 <= i0 < o0 + 64 and o1 <= i1 < o1 + 64 and o2 <= i2 < o2 + 64 and o0 % 64 = 0 and o1 % 64 = 0 and o2 % 64 = 0 }" - } - ] -} Index: 
polly/trunk/www/experiments/matmul/main___%for.cond---%for.end30.jscop.interchanged+tiled+vector =================================================================== --- polly/trunk/www/experiments/matmul/main___%for.cond---%for.end30.jscop.interchanged+tiled+vector +++ polly/trunk/www/experiments/matmul/main___%for.cond---%for.end30.jscop.interchanged+tiled+vector @@ -1,40 +0,0 @@ -{ - "context" : "{ [] }", - "name" : "%1 => %17", - "statements" : [ - { - "accesses" : [ - { - "kind" : "write", - "relation" : "{ Stmt_4[i0, i1] -> MemRef_C[1536i0 + i1] }" - } - ], - "domain" : "{ Stmt_4[i0, i1] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 }", - "name" : "Stmt_4", - "schedule" : "{ Stmt_4[i0, i1] -> schedule[0, i0, 0, i1, 0, 0, 0, 0] }" - }, - { - "accesses" : [ - { - "kind" : "read", - "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1536i0 + i1] }" - }, - { - "kind" : "read", - "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_A[1536i0 + i2] }" - }, - { - "kind" : "read", - "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1536i2] }" - }, - { - "kind" : "write", - "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1536i0 + i1] }" - } - ], - "domain" : "{ Stmt_6[i0, i1, i2] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023 }", - "name" : "Stmt_6", - "schedule" : "{ Stmt_6[i0, i1, i2] -> schedule[1, o0, o1, o2, i0, i2, ii1, i1]: o0 <= i0 < o0 + 64 and o1 <= i1 < o1 + 64 and o2 <= i2 < o2 + 64 and o0 % 64 = 0 and o1 % 64 = 0 and o2 % 64 = 0 and ii1 % 4 = 0 and ii1 <= i1 < ii1 + 4}" - } - ] -} Index: polly/trunk/www/experiments/matmul/matmul.preopt.ll =================================================================== --- polly/trunk/www/experiments/matmul/matmul.preopt.ll +++ polly/trunk/www/experiments/matmul/matmul.preopt.ll @@ -1,5 +1,6 @@ ; ModuleID = 'matmul.s' -target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" 
+source_filename = "matmul.c" +target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" %struct._IO_FILE = type { i32, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, %struct._IO_marker*, %struct._IO_FILE*, i32, i32, i64, i16, i8, [1 x i8], i8*, i64, i8*, i8*, i8*, i8*, i64, i32, [20 x i8] } @@ -7,114 +8,100 @@ @A = common global [1536 x [1536 x float]] zeroinitializer, align 16 @B = common global [1536 x [1536 x float]] zeroinitializer, align 16 -@stdout = external global %struct._IO_FILE* +@stdout = external global %struct._IO_FILE*, align 8 @.str = private unnamed_addr constant [5 x i8] c"%lf \00", align 1 @C = common global [1536 x [1536 x float]] zeroinitializer, align 16 -@.str1 = private unnamed_addr constant [2 x i8] c"\0A\00", align 1 +@.str.1 = private unnamed_addr constant [2 x i8] c"\0A\00", align 1 ; Function Attrs: nounwind uwtable define void @init_array() #0 { entry: - br label %for.cond + br label %entry.split -for.cond: ; preds = %for.inc17, %entry - %0 = phi i64 [ %indvar.next2, %for.inc17 ], [ 0, %entry ] - %exitcond3 = icmp ne i64 %0, 1536 - br i1 %exitcond3, label %for.body, label %for.end19 - -for.body: ; preds = %for.cond - br label %for.cond1 - -for.cond1: ; preds = %for.inc, %for.body - %indvar = phi i64 [ %indvar.next, %for.inc ], [ 0, %for.body ] - %arrayidx6 = getelementptr [1536 x [1536 x float]]* @A, i64 0, i64 %0, i64 %indvar - %arrayidx16 = getelementptr [1536 x [1536 x float]]* @B, i64 0, i64 %0, i64 %indvar - %1 = mul i64 %0, %indvar - %mul = trunc i64 %1 to i32 - %exitcond = icmp ne i64 %indvar, 1536 - br i1 %exitcond, label %for.body3, label %for.end +entry.split: ; preds = %entry + br label %for.cond1.preheader -for.body3: ; preds = %for.cond1 - %rem = srem i32 %mul, 1024 - %add = add nsw i32 1, %rem +for.cond1.preheader: ; preds = %entry.split, %for.inc17 + %indvars.iv5 = phi i64 [ 0, %entry.split ], [ %indvars.iv.next6, %for.inc17 ] + br label %for.body3 + +for.body3: ; 
preds = %for.cond1.preheader, %for.body3 + %indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next, %for.body3 ] + %0 = mul nuw nsw i64 %indvars.iv, %indvars.iv5 + %1 = trunc i64 %0 to i32 + %rem = srem i32 %1, 1024 + %add = add nsw i32 %rem, 1 %conv = sitofp i32 %add to double - %div = fdiv double %conv, 2.000000e+00 + %div = fmul double %conv, 5.000000e-01 %conv4 = fptrunc double %div to float + %arrayidx6 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @A, i64 0, i64 %indvars.iv5, i64 %indvars.iv store float %conv4, float* %arrayidx6, align 4 - %rem8 = srem i32 %mul, 1024 - %add9 = add nsw i32 1, %rem8 + %2 = mul nuw nsw i64 %indvars.iv, %indvars.iv5 + %3 = trunc i64 %2 to i32 + %rem8 = srem i32 %3, 1024 + %add9 = add nsw i32 %rem8, 1 %conv10 = sitofp i32 %add9 to double - %div11 = fdiv double %conv10, 2.000000e+00 + %div11 = fmul double %conv10, 5.000000e-01 %conv12 = fptrunc double %div11 to float + %arrayidx16 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @B, i64 0, i64 %indvars.iv5, i64 %indvars.iv store float %conv12, float* %arrayidx16, align 4 - br label %for.inc + %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 + %exitcond = icmp ne i64 %indvars.iv.next, 1536 + br i1 %exitcond, label %for.body3, label %for.inc17 + +for.inc17: ; preds = %for.body3 + %indvars.iv.next6 = add nuw nsw i64 %indvars.iv5, 1 + %exitcond7 = icmp ne i64 %indvars.iv.next6, 1536 + br i1 %exitcond7, label %for.cond1.preheader, label %for.end19 -for.inc: ; preds = %for.body3 - %indvar.next = add i64 %indvar, 1 - br label %for.cond1 - -for.end: ; preds = %for.cond1 - br label %for.inc17 - -for.inc17: ; preds = %for.end - %indvar.next2 = add i64 %0, 1 - br label %for.cond - -for.end19: ; preds = %for.cond +for.end19: ; preds = %for.inc17 ret void } ; Function Attrs: nounwind uwtable define void @print_array() #0 { entry: - br label %for.cond + br label %entry.split -for.cond: ; preds = %for.inc10, %entry - %indvar1 = 
phi i64 [ %indvar.next2, %for.inc10 ], [ 0, %entry ] - %exitcond3 = icmp ne i64 %indvar1, 1536 - br i1 %exitcond3, label %for.body, label %for.end12 - -for.body: ; preds = %for.cond - br label %for.cond1 - -for.cond1: ; preds = %for.inc, %for.body - %indvar = phi i64 [ %indvar.next, %for.inc ], [ 0, %for.body ] - %arrayidx5 = getelementptr [1536 x [1536 x float]]* @C, i64 0, i64 %indvar1, i64 %indvar - %j.0 = trunc i64 %indvar to i32 - %exitcond = icmp ne i64 %indvar, 1536 - br i1 %exitcond, label %for.body3, label %for.end +entry.split: ; preds = %entry + br label %for.cond1.preheader -for.body3: ; preds = %for.cond1 - %0 = load %struct._IO_FILE** @stdout, align 8 - %1 = load float* %arrayidx5, align 4 - %conv = fpext float %1 to double - %call = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %0, i8* getelementptr inbounds ([5 x i8]* @.str, i32 0, i32 0), double %conv) - %rem = srem i32 %j.0, 80 +for.cond1.preheader: ; preds = %entry.split, %for.end + %indvars.iv6 = phi i64 [ 0, %entry.split ], [ %indvars.iv.next7, %for.end ] + %0 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8 + br label %for.body3 + +for.body3: ; preds = %for.cond1.preheader, %for.inc + %indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next, %for.inc ] + %1 = phi %struct._IO_FILE* [ %0, %for.cond1.preheader ], [ %5, %for.inc ] + %arrayidx5 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @C, i64 0, i64 %indvars.iv6, i64 %indvars.iv + %2 = load float, float* %arrayidx5, align 4 + %conv = fpext float %2 to double + %call = tail call i32 (%struct._IO_FILE*, i8*, ...) 
@fprintf(%struct._IO_FILE* %1, i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), double %conv) #2 + %3 = trunc i64 %indvars.iv to i32 + %rem = srem i32 %3, 80 %cmp6 = icmp eq i32 %rem, 79 - br i1 %cmp6, label %if.then, label %if.end + br i1 %cmp6, label %if.then, label %for.inc if.then: ; preds = %for.body3 - %2 = load %struct._IO_FILE** @stdout, align 8 - %call8 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %2, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0)) - br label %if.end - -if.end: ; preds = %if.then, %for.body3 + %4 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8 + %fputc3 = tail call i32 @fputc(i32 10, %struct._IO_FILE* %4) br label %for.inc -for.inc: ; preds = %if.end - %indvar.next = add i64 %indvar, 1 - br label %for.cond1 - -for.end: ; preds = %for.cond1 - %3 = load %struct._IO_FILE** @stdout, align 8 - %call9 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %3, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0)) - br label %for.inc10 - -for.inc10: ; preds = %for.end - %indvar.next2 = add i64 %indvar1, 1 - br label %for.cond +for.inc: ; preds = %for.body3, %if.then + %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 + %5 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8 + %exitcond = icmp ne i64 %indvars.iv.next, 1536 + br i1 %exitcond, label %for.body3, label %for.end + +for.end: ; preds = %for.inc + %.lcssa = phi %struct._IO_FILE* [ %5, %for.inc ] + %fputc = tail call i32 @fputc(i32 10, %struct._IO_FILE* %.lcssa) + %indvars.iv.next7 = add nuw nsw i64 %indvars.iv6, 1 + %exitcond8 = icmp ne i64 %indvars.iv.next7, 1536 + br i1 %exitcond8, label %for.cond1.preheader, label %for.end12 -for.end12: ; preds = %for.cond +for.end12: ; preds = %for.end ret void } @@ -123,64 +110,62 @@ ; Function Attrs: nounwind uwtable define i32 @main() #0 { entry: - call void @init_array() - br label %for.cond + br label %entry.split -for.cond: ; preds = %for.inc28, 
%entry - %indvar3 = phi i64 [ %indvar.next4, %for.inc28 ], [ 0, %entry ] - %exitcond6 = icmp ne i64 %indvar3, 1536 - br i1 %exitcond6, label %for.body, label %for.end30 - -for.body: ; preds = %for.cond - br label %for.cond1 - -for.cond1: ; preds = %for.inc25, %for.body - %indvar1 = phi i64 [ %indvar.next2, %for.inc25 ], [ 0, %for.body ] - %arrayidx5 = getelementptr [1536 x [1536 x float]]* @C, i64 0, i64 %indvar3, i64 %indvar1 - %exitcond5 = icmp ne i64 %indvar1, 1536 - br i1 %exitcond5, label %for.body3, label %for.end27 - -for.body3: ; preds = %for.cond1 +entry.split: ; preds = %entry + tail call void @init_array() + br label %for.cond1.preheader + +for.cond1.preheader: ; preds = %entry.split, %for.inc28 + %indvars.iv7 = phi i64 [ 0, %entry.split ], [ %indvars.iv.next8, %for.inc28 ] + br label %for.body3 + +for.body3: ; preds = %for.cond1.preheader, %for.inc25 + %indvars.iv4 = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next5, %for.inc25 ] + %arrayidx5 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @C, i64 0, i64 %indvars.iv7, i64 %indvars.iv4 store float 0.000000e+00, float* %arrayidx5, align 4 - br label %for.cond6 + br label %for.body8 -for.cond6: ; preds = %for.inc, %for.body3 - %indvar = phi i64 [ %indvar.next, %for.inc ], [ 0, %for.body3 ] - %arrayidx16 = getelementptr [1536 x [1536 x float]]* @A, i64 0, i64 %indvar3, i64 %indvar - %arrayidx20 = getelementptr [1536 x [1536 x float]]* @B, i64 0, i64 %indvar, i64 %indvar1 - %exitcond = icmp ne i64 %indvar, 1536 - br i1 %exitcond, label %for.body8, label %for.end - -for.body8: ; preds = %for.cond6 - %0 = load float* %arrayidx5, align 4 - %1 = load float* %arrayidx16, align 4 - %2 = load float* %arrayidx20, align 4 +for.body8: ; preds = %for.body3, %for.body8 + %indvars.iv = phi i64 [ 0, %for.body3 ], [ %indvars.iv.next, %for.body8 ] + %arrayidx12 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @C, i64 0, i64 %indvars.iv7, i64 %indvars.iv4 + %0 = 
load float, float* %arrayidx12, align 4 + %arrayidx16 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @A, i64 0, i64 %indvars.iv7, i64 %indvars.iv + %1 = load float, float* %arrayidx16, align 4 + %arrayidx20 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @B, i64 0, i64 %indvars.iv, i64 %indvars.iv4 + %2 = load float, float* %arrayidx20, align 4 %mul = fmul float %1, %2 %add = fadd float %0, %mul - store float %add, float* %arrayidx5, align 4 - br label %for.inc + %arrayidx24 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @C, i64 0, i64 %indvars.iv7, i64 %indvars.iv4 + store float %add, float* %arrayidx24, align 4 + %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 + %exitcond = icmp ne i64 %indvars.iv.next, 1536 + br i1 %exitcond, label %for.body8, label %for.inc25 + +for.inc25: ; preds = %for.body8 + %indvars.iv.next5 = add nuw nsw i64 %indvars.iv4, 1 + %exitcond6 = icmp ne i64 %indvars.iv.next5, 1536 + br i1 %exitcond6, label %for.body3, label %for.inc28 + +for.inc28: ; preds = %for.inc25 + %indvars.iv.next8 = add nuw nsw i64 %indvars.iv7, 1 + %exitcond9 = icmp ne i64 %indvars.iv.next8, 1536 + br i1 %exitcond9, label %for.cond1.preheader, label %for.end30 -for.inc: ; preds = %for.body8 - %indvar.next = add i64 %indvar, 1 - br label %for.cond6 - -for.end: ; preds = %for.cond6 - br label %for.inc25 +for.end30: ; preds = %for.inc28 + ret i32 0 +} -for.inc25: ; preds = %for.end - %indvar.next2 = add i64 %indvar1, 1 - br label %for.cond1 +; Function Attrs: nounwind +declare i64 @fwrite(i8* nocapture, i64, i64, %struct._IO_FILE* nocapture) #2 -for.end27: ; preds = %for.cond1 - br label %for.inc28 +; Function Attrs: nounwind +declare i32 @fputc(i32, %struct._IO_FILE* nocapture) #2 -for.inc28: ; preds = %for.end27 - %indvar.next4 = add i64 %indvar3, 1 - br label %for.cond +attributes #0 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" 
"less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" } +attributes #1 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" } +attributes #2 = { nounwind } -for.end30: ; preds = %for.cond - ret i32 0 -} +!llvm.ident = !{!0} -attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" } -attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" } +!0 = !{!"clang version 4.0.0 (http://llvm.org/git/clang.git 081569d9a29c7bc827b2d41f8e62891bbc895e2f) (http://llvm.org/git/llvm.git e117e506536626352e8e47f6c72cd6e2a276622c)"} Index: polly/trunk/www/experiments/matmul/matmul.s =================================================================== --- polly/trunk/www/experiments/matmul/matmul.s +++ polly/trunk/www/experiments/matmul/matmul.s @@ -1,5 +1,6 @@ ; ModuleID = 'matmul.c' -target datalayout = 
"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" +source_filename = "matmul.c" +target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" %struct._IO_FILE = type { i32, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, %struct._IO_marker*, %struct._IO_FILE*, i32, i32, i64, i16, i8, [1 x i8], i8*, i64, i8*, i8*, i8*, i8*, i64, i32, [20 x i8] } @@ -7,10 +8,10 @@ @A = common global [1536 x [1536 x float]] zeroinitializer, align 16 @B = common global [1536 x [1536 x float]] zeroinitializer, align 16 -@stdout = external global %struct._IO_FILE* +@stdout = external global %struct._IO_FILE*, align 8 @.str = private unnamed_addr constant [5 x i8] c"%lf \00", align 1 @C = common global [1536 x [1536 x float]] zeroinitializer, align 16 -@.str1 = private unnamed_addr constant [2 x i8] c"\0A\00", align 1 +@.str.1 = private unnamed_addr constant [2 x i8] c"\0A\00", align 1 ; Function Attrs: nounwind uwtable define void @init_array() #0 { @@ -21,7 +22,7 @@ br label %for.cond for.cond: ; preds = %for.inc17, %entry - %0 = load i32* %i, align 4 + %0 = load i32, i32* %i, align 4 %cmp = icmp slt i32 %0, 1536 br i1 %cmp, label %for.body, label %for.end19 @@ -30,45 +31,45 @@ br label %for.cond1 for.cond1: ; preds = %for.inc, %for.body - %1 = load i32* %j, align 4 + %1 = load i32, i32* %j, align 4 %cmp2 = icmp slt i32 %1, 1536 br i1 %cmp2, label %for.body3, label %for.end for.body3: ; preds = %for.cond1 - %2 = load i32* %i, align 4 - %3 = load i32* %j, align 4 + %2 = load i32, i32* %i, align 4 + %3 = load i32, i32* %j, align 4 %mul = mul nsw i32 %2, %3 %rem = srem i32 %mul, 1024 %add = add nsw i32 1, %rem %conv = sitofp i32 %add to double %div = fdiv double %conv, 2.000000e+00 %conv4 = fptrunc double %div to float - %4 = load i32* %j, align 4 + %4 = load i32, i32* %j, align 4 %idxprom = sext i32 %4 to i64 - %5 = load i32* %i, align 4 + %5 = 
load i32, i32* %i, align 4 %idxprom5 = sext i32 %5 to i64 - %arrayidx = getelementptr inbounds [1536 x [1536 x float]]* @A, i32 0, i64 %idxprom5 - %arrayidx6 = getelementptr inbounds [1536 x float]* %arrayidx, i32 0, i64 %idxprom + %arrayidx = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @A, i64 0, i64 %idxprom5 + %arrayidx6 = getelementptr inbounds [1536 x float], [1536 x float]* %arrayidx, i64 0, i64 %idxprom store float %conv4, float* %arrayidx6, align 4 - %6 = load i32* %i, align 4 - %7 = load i32* %j, align 4 + %6 = load i32, i32* %i, align 4 + %7 = load i32, i32* %j, align 4 %mul7 = mul nsw i32 %6, %7 %rem8 = srem i32 %mul7, 1024 %add9 = add nsw i32 1, %rem8 %conv10 = sitofp i32 %add9 to double %div11 = fdiv double %conv10, 2.000000e+00 %conv12 = fptrunc double %div11 to float - %8 = load i32* %j, align 4 + %8 = load i32, i32* %j, align 4 %idxprom13 = sext i32 %8 to i64 - %9 = load i32* %i, align 4 + %9 = load i32, i32* %i, align 4 %idxprom14 = sext i32 %9 to i64 - %arrayidx15 = getelementptr inbounds [1536 x [1536 x float]]* @B, i32 0, i64 %idxprom14 - %arrayidx16 = getelementptr inbounds [1536 x float]* %arrayidx15, i32 0, i64 %idxprom13 + %arrayidx15 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @B, i64 0, i64 %idxprom14 + %arrayidx16 = getelementptr inbounds [1536 x float], [1536 x float]* %arrayidx15, i64 0, i64 %idxprom13 store float %conv12, float* %arrayidx16, align 4 br label %for.inc for.inc: ; preds = %for.body3 - %10 = load i32* %j, align 4 + %10 = load i32, i32* %j, align 4 %inc = add nsw i32 %10, 1 store i32 %inc, i32* %j, align 4 br label %for.cond1 @@ -77,7 +78,7 @@ br label %for.inc17 for.inc17: ; preds = %for.end - %11 = load i32* %i, align 4 + %11 = load i32, i32* %i, align 4 %inc18 = add nsw i32 %11, 1 store i32 %inc18, i32* %i, align 4 br label %for.cond @@ -95,7 +96,7 @@ br label %for.cond for.cond: ; preds = %for.inc10, %entry - %0 = load i32* %i, align 4 + %0 = load i32, i32* %i, 
align 4 %cmp = icmp slt i32 %0, 1536 br i1 %cmp, label %for.body, label %for.end12 @@ -104,47 +105,47 @@ br label %for.cond1 for.cond1: ; preds = %for.inc, %for.body - %1 = load i32* %j, align 4 + %1 = load i32, i32* %j, align 4 %cmp2 = icmp slt i32 %1, 1536 br i1 %cmp2, label %for.body3, label %for.end for.body3: ; preds = %for.cond1 - %2 = load %struct._IO_FILE** @stdout, align 8 - %3 = load i32* %j, align 4 + %2 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8 + %3 = load i32, i32* %j, align 4 %idxprom = sext i32 %3 to i64 - %4 = load i32* %i, align 4 + %4 = load i32, i32* %i, align 4 %idxprom4 = sext i32 %4 to i64 - %arrayidx = getelementptr inbounds [1536 x [1536 x float]]* @C, i32 0, i64 %idxprom4 - %arrayidx5 = getelementptr inbounds [1536 x float]* %arrayidx, i32 0, i64 %idxprom - %5 = load float* %arrayidx5, align 4 + %arrayidx = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @C, i64 0, i64 %idxprom4 + %arrayidx5 = getelementptr inbounds [1536 x float], [1536 x float]* %arrayidx, i64 0, i64 %idxprom + %5 = load float, float* %arrayidx5, align 4 %conv = fpext float %5 to double - %call = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %2, i8* getelementptr inbounds ([5 x i8]* @.str, i32 0, i32 0), double %conv) - %6 = load i32* %j, align 4 + %call = call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %2, i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i32 0, i32 0), double %conv) + %6 = load i32, i32* %j, align 4 %rem = srem i32 %6, 80 %cmp6 = icmp eq i32 %rem, 79 br i1 %cmp6, label %if.then, label %if.end if.then: ; preds = %for.body3 - %7 = load %struct._IO_FILE** @stdout, align 8 - %call8 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %7, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0)) + %7 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8 + %call8 = call i32 (%struct._IO_FILE*, i8*, ...) 
@fprintf(%struct._IO_FILE* %7, i8* getelementptr inbounds ([2 x i8], [2 x i8]* @.str.1, i32 0, i32 0)) br label %if.end if.end: ; preds = %if.then, %for.body3 br label %for.inc for.inc: ; preds = %if.end - %8 = load i32* %j, align 4 + %8 = load i32, i32* %j, align 4 %inc = add nsw i32 %8, 1 store i32 %inc, i32* %j, align 4 br label %for.cond1 for.end: ; preds = %for.cond1 - %9 = load %struct._IO_FILE** @stdout, align 8 - %call9 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %9, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0)) + %9 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8 + %call9 = call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %9, i8* getelementptr inbounds ([2 x i8], [2 x i8]* @.str.1, i32 0, i32 0)) br label %for.inc10 for.inc10: ; preds = %for.end - %10 = load i32* %i, align 4 + %10 = load i32, i32* %i, align 4 %inc11 = add nsw i32 %10, 1 store i32 %inc11, i32* %i, align 4 br label %for.cond @@ -164,13 +165,13 @@ %k = alloca i32, align 4 %t_start = alloca double, align 8 %t_end = alloca double, align 8 - store i32 0, i32* %retval + store i32 0, i32* %retval, align 4 call void @init_array() store i32 0, i32* %i, align 4 br label %for.cond for.cond: ; preds = %for.inc28, %entry - %0 = load i32* %i, align 4 + %0 = load i32, i32* %i, align 4 %cmp = icmp slt i32 %0, 1536 br i1 %cmp, label %for.body, label %for.end30 @@ -179,61 +180,61 @@ br label %for.cond1 for.cond1: ; preds = %for.inc25, %for.body - %1 = load i32* %j, align 4 + %1 = load i32, i32* %j, align 4 %cmp2 = icmp slt i32 %1, 1536 br i1 %cmp2, label %for.body3, label %for.end27 for.body3: ; preds = %for.cond1 - %2 = load i32* %j, align 4 + %2 = load i32, i32* %j, align 4 %idxprom = sext i32 %2 to i64 - %3 = load i32* %i, align 4 + %3 = load i32, i32* %i, align 4 %idxprom4 = sext i32 %3 to i64 - %arrayidx = getelementptr inbounds [1536 x [1536 x float]]* @C, i32 0, i64 %idxprom4 - %arrayidx5 = getelementptr inbounds [1536 x float]* 
%arrayidx, i32 0, i64 %idxprom + %arrayidx = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @C, i64 0, i64 %idxprom4 + %arrayidx5 = getelementptr inbounds [1536 x float], [1536 x float]* %arrayidx, i64 0, i64 %idxprom store float 0.000000e+00, float* %arrayidx5, align 4 store i32 0, i32* %k, align 4 br label %for.cond6 for.cond6: ; preds = %for.inc, %for.body3 - %4 = load i32* %k, align 4 + %4 = load i32, i32* %k, align 4 %cmp7 = icmp slt i32 %4, 1536 br i1 %cmp7, label %for.body8, label %for.end for.body8: ; preds = %for.cond6 - %5 = load i32* %j, align 4 + %5 = load i32, i32* %j, align 4 %idxprom9 = sext i32 %5 to i64 - %6 = load i32* %i, align 4 + %6 = load i32, i32* %i, align 4 %idxprom10 = sext i32 %6 to i64 - %arrayidx11 = getelementptr inbounds [1536 x [1536 x float]]* @C, i32 0, i64 %idxprom10 - %arrayidx12 = getelementptr inbounds [1536 x float]* %arrayidx11, i32 0, i64 %idxprom9 - %7 = load float* %arrayidx12, align 4 - %8 = load i32* %k, align 4 + %arrayidx11 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @C, i64 0, i64 %idxprom10 + %arrayidx12 = getelementptr inbounds [1536 x float], [1536 x float]* %arrayidx11, i64 0, i64 %idxprom9 + %7 = load float, float* %arrayidx12, align 4 + %8 = load i32, i32* %k, align 4 %idxprom13 = sext i32 %8 to i64 - %9 = load i32* %i, align 4 + %9 = load i32, i32* %i, align 4 %idxprom14 = sext i32 %9 to i64 - %arrayidx15 = getelementptr inbounds [1536 x [1536 x float]]* @A, i32 0, i64 %idxprom14 - %arrayidx16 = getelementptr inbounds [1536 x float]* %arrayidx15, i32 0, i64 %idxprom13 - %10 = load float* %arrayidx16, align 4 - %11 = load i32* %j, align 4 + %arrayidx15 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @A, i64 0, i64 %idxprom14 + %arrayidx16 = getelementptr inbounds [1536 x float], [1536 x float]* %arrayidx15, i64 0, i64 %idxprom13 + %10 = load float, float* %arrayidx16, align 4 + %11 = load i32, i32* %j, align 4 %idxprom17 = sext 
i32 %11 to i64 - %12 = load i32* %k, align 4 + %12 = load i32, i32* %k, align 4 %idxprom18 = sext i32 %12 to i64 - %arrayidx19 = getelementptr inbounds [1536 x [1536 x float]]* @B, i32 0, i64 %idxprom18 - %arrayidx20 = getelementptr inbounds [1536 x float]* %arrayidx19, i32 0, i64 %idxprom17 - %13 = load float* %arrayidx20, align 4 + %arrayidx19 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @B, i64 0, i64 %idxprom18 + %arrayidx20 = getelementptr inbounds [1536 x float], [1536 x float]* %arrayidx19, i64 0, i64 %idxprom17 + %13 = load float, float* %arrayidx20, align 4 %mul = fmul float %10, %13 %add = fadd float %7, %mul - %14 = load i32* %j, align 4 + %14 = load i32, i32* %j, align 4 %idxprom21 = sext i32 %14 to i64 - %15 = load i32* %i, align 4 + %15 = load i32, i32* %i, align 4 %idxprom22 = sext i32 %15 to i64 - %arrayidx23 = getelementptr inbounds [1536 x [1536 x float]]* @C, i32 0, i64 %idxprom22 - %arrayidx24 = getelementptr inbounds [1536 x float]* %arrayidx23, i32 0, i64 %idxprom21 + %arrayidx23 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @C, i64 0, i64 %idxprom22 + %arrayidx24 = getelementptr inbounds [1536 x float], [1536 x float]* %arrayidx23, i64 0, i64 %idxprom21 store float %add, float* %arrayidx24, align 4 br label %for.inc for.inc: ; preds = %for.body8 - %16 = load i32* %k, align 4 + %16 = load i32, i32* %k, align 4 %inc = add nsw i32 %16, 1 store i32 %inc, i32* %k, align 4 br label %for.cond6 @@ -242,7 +243,7 @@ br label %for.inc25 for.inc25: ; preds = %for.end - %17 = load i32* %j, align 4 + %17 = load i32, i32* %j, align 4 %inc26 = add nsw i32 %17, 1 store i32 %inc26, i32* %j, align 4 br label %for.cond1 @@ -251,7 +252,7 @@ br label %for.inc28 for.inc28: ; preds = %for.end27 - %18 = load i32* %i, align 4 + %18 = load i32, i32* %i, align 4 %inc29 = add nsw i32 %18, 1 store i32 %inc29, i32* %i, align 4 br label %for.cond @@ -260,5 +261,9 @@ ret i32 0 } -attributes #0 = { nounwind uwtable 
"less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" } -attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" } +attributes #0 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" } +attributes #1 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" } + +!llvm.ident = !{!0} + +!0 = !{!"clang version 4.0.0 (http://llvm.org/git/clang.git 081569d9a29c7bc827b2d41f8e62891bbc895e2f) (http://llvm.org/git/llvm.git e117e506536626352e8e47f6c72cd6e2a276622c)"} Index: polly/trunk/www/experiments/matmul/runall.sh =================================================================== --- polly/trunk/www/experiments/matmul/runall.sh +++ polly/trunk/www/experiments/matmul/runall.sh @@ -3,68 +3,69 @@ echo "--> 1. Create LLVM-IR from C" clang -S -emit-llvm matmul.c -o matmul.s -echo "--> 2. 
Load Polly automatically when calling the 'opt' tool" -export PATH_TO_POLLY_LIB="~/polly/build/lib/" -alias opt="opt -load ${PATH_TO_POLLY_LIB}/LLVMPolly.so" - -echo "--> 3. Prepare the LLVM-IR for Polly" +echo "--> 2. Prepare the LLVM-IR for Polly" opt -S -polly-canonicalize matmul.s > matmul.preopt.ll -echo "--> 4. Show the SCoPs detected by Polly" -opt -basicaa -polly-ast -analyze -q matmul.preopt.ll +echo "--> 3. Show the SCoPs detected by Polly" +opt -basicaa -polly-ast -analyze -q matmul.preopt.ll \ + -polly-process-unprofitable -echo "--> 5.1 Highlight the detected SCoPs in the CFGs of the program" +echo "--> 4.1 Highlight the detected SCoPs in the CFGs of the program" # We only create .dot files, as directly -view-scops directly calls graphviz # which would require user interaction to continue the script. # opt -basicaa -view-scops -disable-output matmul.preopt.ll opt -basicaa -dot-scops -disable-output matmul.preopt.ll -echo "--> 5.2 Highlight the detected SCoPs in the CFGs of the program (print \ +echo "--> 4.2 Highlight the detected SCoPs in the CFGs of the program (print \ no instructions)" # We only create .dot files, as directly -view-scops-only directly calls # graphviz which would require user interaction to continue the script. # opt -basicaa -view-scops-only -disable-output matmul.preopt.ll opt -basicaa -dot-scops-only -disable-output matmul.preopt.ll -echo "--> 5.3 Create .png files from the .dot files" +echo "--> 4.3 Create .png files from the .dot files" for i in `ls *.dot`; do dot -Tpng $i > $i.png; done -echo "--> 6. View the polyhedral representation of the SCoPs" -opt -basicaa -polly-scops -analyze matmul.preopt.ll +echo "--> 5. View the polyhedral representation of the SCoPs" +opt -basicaa -polly-scops -analyze matmul.preopt.ll -polly-process-unprofitable -echo "--> 7. Show the dependences for the SCoPs" -opt -basicaa -polly-dependences -analyze matmul.preopt.ll +echo "--> 6. 
Show the dependences for the SCoPs" +opt -basicaa -polly-dependences -analyze matmul.preopt.ll \ + -polly-process-unprofitable -echo "--> 8. Export jscop files" -opt -basicaa -polly-export-jscop matmul.preopt.ll +echo "--> 7. Export jscop files" +opt -basicaa -polly-export-jscop matmul.preopt.ll -polly-process-unprofitable -echo "--> 9. Import the updated jscop files and print the new SCoPs. (optional)" -opt -basicaa -polly-import-jscop -polly-ast -analyze matmul.preopt.ll +echo "--> 8. Import the updated jscop files and print the new SCoPs. (optional)" +opt -basicaa -polly-import-jscop -polly-ast -analyze matmul.preopt.ll \ + -polly-process-unprofitable opt -basicaa -polly-import-jscop -polly-ast -analyze matmul.preopt.ll \ - -polly-import-jscop-postfix=interchanged + -polly-import-jscop-postfix=interchanged -polly-process-unprofitable opt -basicaa -polly-import-jscop -polly-ast -analyze matmul.preopt.ll \ - -polly-import-jscop-postfix=interchanged+tiled + -polly-import-jscop-postfix=interchanged+tiled -polly-process-unprofitable opt -basicaa -polly-import-jscop -polly-ast -analyze matmul.preopt.ll \ - -polly-import-jscop-postfix=interchanged+tiled+vector + -polly-import-jscop-postfix=interchanged+tiled+vector \ + -polly-process-unprofitable -echo "--> 10. Codegenerate the SCoPs" +echo "--> 9. 
Codegenerate the SCoPs" opt -basicaa -polly-import-jscop -polly-import-jscop-postfix=interchanged \ - -polly-codegen \ + -polly-codegen -polly-process-unprofitable\ matmul.preopt.ll | opt -O3 > matmul.polly.interchanged.ll opt -basicaa -polly-import-jscop \ -polly-import-jscop-postfix=interchanged+tiled -polly-codegen \ - matmul.preopt.ll | opt -O3 > matmul.polly.interchanged+tiled.ll -opt -basicaa -polly-import-jscop \ + matmul.preopt.ll -polly-process-unprofitable \ + | opt -O3 > matmul.polly.interchanged+tiled.ll +opt -basicaa -polly-import-jscop -polly-process-unprofitable\ -polly-import-jscop-postfix=interchanged+tiled+vector -polly-codegen \ matmul.preopt.ll -polly-vectorizer=polly\ | opt -O3 > matmul.polly.interchanged+tiled+vector.ll -opt -basicaa -polly-import-jscop \ +opt -basicaa -polly-import-jscop -polly-process-unprofitable\ -polly-import-jscop-postfix=interchanged+tiled+vector -polly-codegen \ matmul.preopt.ll -polly-vectorizer=polly -polly-parallel\ | opt -O3 > matmul.polly.interchanged+tiled+vector+openmp.ll opt matmul.preopt.ll | opt -O3 > matmul.normalopt.ll -echo "--> 11. Create the executables" +echo "--> 10. Create the executables" llc matmul.polly.interchanged.ll -o matmul.polly.interchanged.s && gcc matmul.polly.interchanged.s \ -o matmul.polly.interchanged.exe llc matmul.polly.interchanged+tiled.ll -o matmul.polly.interchanged+tiled.s && gcc matmul.polly.interchanged+tiled.s \ @@ -80,7 +81,7 @@ llc matmul.normalopt.ll -o matmul.normalopt.s && gcc matmul.normalopt.s \ -o matmul.normalopt.exe -echo "--> 12. Compare the runtime of the executables" +echo "--> 11. 
Compare the runtime of the executables" echo "time ./matmul.normalopt.exe" time -f "%E real, %U user, %S sys" ./matmul.normalopt.exe Index: polly/trunk/www/experiments/matmul/scops.init_array.dot =================================================================== --- polly/trunk/www/experiments/matmul/scops.init_array.dot +++ polly/trunk/www/experiments/matmul/scops.init_array.dot @@ -1,47 +1,39 @@ digraph "Scop Graph for 'init_array' function" { label="Scop Graph for 'init_array' function"; - Node0x17d4370 [shape=record,label="{entry:\l br label %for.cond\l}"]; - Node0x17d4370 -> Node0x17da5d0; - Node0x17da5d0 [shape=record,label="{for.cond: \l %0 = phi i64 [ %indvar.next2, %for.inc17 ], [ 0, %entry ]\l %exitcond3 = icmp ne i64 %0, 1536\l br i1 %exitcond3, label %for.body, label %for.end19\l}"]; - Node0x17da5d0 -> Node0x17da5f0; - Node0x17da5d0 -> Node0x17da650; - Node0x17da5f0 [shape=record,label="{for.body: \l br label %for.cond1\l}"]; - Node0x17da5f0 -> Node0x17da900; - Node0x17da900 [shape=record,label="{for.cond1: \l %indvar = phi i64 [ %indvar.next, %for.inc ], [ 0, %for.body ]\l %arrayidx6 = getelementptr [1536 x [1536 x float]]* @A, i64 0, i64 %0, i64 %indvar\l %arrayidx16 = getelementptr [1536 x [1536 x float]]* @B, i64 0, i64 %0, i64 %indvar\l %1 = mul i64 %0, %indvar\l %mul = trunc i64 %1 to i32\l %exitcond = icmp ne i64 %indvar, 1536\l br i1 %exitcond, label %for.body3, label %for.end\l}"]; - Node0x17da900 -> Node0x17da670; - Node0x17da900 -> Node0x17da9a0; - Node0x17da670 [shape=record,label="{for.body3: \l %rem = srem i32 %mul, 1024\l %add = add nsw i32 1, %rem\l %conv = sitofp i32 %add to double\l %div = fdiv double %conv, 2.000000e+00\l %conv4 = fptrunc double %div to float\l store float %conv4, float* %arrayidx6, align 4\l %rem8 = srem i32 %mul, 1024\l %add9 = add nsw i32 1, %rem8\l %conv10 = sitofp i32 %add9 to double\l %div11 = fdiv double %conv10, 2.000000e+00\l %conv12 = fptrunc double %div11 to float\l store float %conv12, float* %arrayidx16, 
align 4\l br label %for.inc\l}"]; - Node0x17da670 -> Node0x17da8e0; - Node0x17da8e0 [shape=record,label="{for.inc: \l %indvar.next = add i64 %indvar, 1\l br label %for.cond1\l}"]; - Node0x17da8e0 -> Node0x17da900[constraint=false]; - Node0x17da9a0 [shape=record,label="{for.end: \l br label %for.inc17\l}"]; - Node0x17da9a0 -> Node0x17d9e70; - Node0x17d9e70 [shape=record,label="{for.inc17: \l %indvar.next2 = add i64 %0, 1\l br label %for.cond\l}"]; - Node0x17d9e70 -> Node0x17da5d0[constraint=false]; - Node0x17da650 [shape=record,label="{for.end19: \l ret void\l}"]; + Node0x5b5b5a0 [shape=record,label="{entry:\l br label %entry.split\l}"]; + Node0x5b5b5a0 -> Node0x5b5de30; + Node0x5b5de30 [shape=record,label="{entry.split: \l br label %for.cond1.preheader\l}"]; + Node0x5b5de30 -> Node0x5b5de50; + Node0x5b5de50 [shape=record,label="{for.cond1.preheader: \l %indvars.iv5 = phi i64 [ 0, %entry.split ], [ %indvars.iv.next6, %for.inc17 ]\l br label %for.body3\l}"]; + Node0x5b5de50 -> Node0x5b5b570; + Node0x5b5b570 [shape=record,label="{for.body3: \l %indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next,\l... %for.body3 ]\l %0 = mul nuw nsw i64 %indvars.iv, %indvars.iv5\l %1 = trunc i64 %0 to i32\l %rem = srem i32 %1, 1024\l %add = add nsw i32 %rem, 1\l %conv = sitofp i32 %add to double\l %div = fmul double %conv, 5.000000e-01\l %conv4 = fptrunc double %div to float\l %arrayidx6 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x\l... float]]* @A, i64 0, i64 %indvars.iv5, i64 %indvars.iv\l store float %conv4, float* %arrayidx6, align 4\l %2 = mul nuw nsw i64 %indvars.iv, %indvars.iv5\l %3 = trunc i64 %2 to i32\l %rem8 = srem i32 %3, 1024\l %add9 = add nsw i32 %rem8, 1\l %conv10 = sitofp i32 %add9 to double\l %div11 = fmul double %conv10, 5.000000e-01\l %conv12 = fptrunc double %div11 to float\l %arrayidx16 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536\l... 
x float]]* @B, i64 0, i64 %indvars.iv5, i64 %indvars.iv\l store float %conv12, float* %arrayidx16, align 4\l %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1\l %exitcond = icmp ne i64 %indvars.iv.next, 1536\l br i1 %exitcond, label %for.body3, label %for.inc17\l}"]; + Node0x5b5b570 -> Node0x5b5b570[constraint=false]; + Node0x5b5b570 -> Node0x5b5df30; + Node0x5b5df30 [shape=record,label="{for.inc17: \l %indvars.iv.next6 = add nuw nsw i64 %indvars.iv5, 1\l %exitcond7 = icmp ne i64 %indvars.iv.next6, 1536\l br i1 %exitcond7, label %for.cond1.preheader, label %for.end19\l}"]; + Node0x5b5df30 -> Node0x5b5de50[constraint=false]; + Node0x5b5df30 -> Node0x5b5df90; + Node0x5b5df90 [shape=record,label="{for.end19: \l ret void\l}"]; colorscheme = "paired12" - subgraph cluster_0x17d3a30 { + subgraph cluster_0x5b4bdd0 { label = ""; style = solid; color = 1 - subgraph cluster_0x17d4ec0 { - label = ""; - style = filled; - color = 3 subgraph cluster_0x17d4180 { + subgraph cluster_0x5b4bf50 { + label = "Region can not profitably be optimized!"; + style = solid; + color = 6 + subgraph cluster_0x5b4c0d0 { label = ""; style = solid; color = 5 - Node0x17da900; - Node0x17da670; - Node0x17da8e0; + Node0x5b5b570; } - Node0x17da5d0; - Node0x17da5f0; - Node0x17da9a0; - Node0x17d9e70; + Node0x5b5de50; + Node0x5b5df30; } - Node0x17d4370; - Node0x17da650; + Node0x5b5b5a0; + Node0x5b5de30; + Node0x5b5df90; } } Index: polly/trunk/www/experiments/matmul/scops.main.dot =================================================================== --- polly/trunk/www/experiments/matmul/scops.main.dot +++ polly/trunk/www/experiments/matmul/scops.main.dot @@ -1,65 +1,50 @@ digraph "Scop Graph for 'main' function" { label="Scop Graph for 'main' function"; - Node0x17d21a0 [shape=record,label="{entry:\l call void @init_array()\l br label %for.cond\l}"]; - Node0x17d21a0 -> Node0x17d2020; - Node0x17d2020 [shape=record,label="{for.cond: \l %indvar3 = phi i64 [ %indvar.next4, %for.inc28 ], [ 0, %entry ]\l %exitcond6 
= icmp ne i64 %indvar3, 1536\l br i1 %exitcond6, label %for.body, label %for.end30\l}"]; - Node0x17d2020 -> Node0x17d3950; - Node0x17d2020 -> Node0x17da500; - Node0x17d3950 [shape=record,label="{for.body: \l br label %for.cond1\l}"]; - Node0x17d3950 -> Node0x17da760; - Node0x17da760 [shape=record,label="{for.cond1: \l %indvar1 = phi i64 [ %indvar.next2, %for.inc25 ], [ 0, %for.body ]\l %arrayidx5 = getelementptr [1536 x [1536 x float]]* @C, i64 0, i64 %indvar3, i64 %indvar1\l %exitcond5 = icmp ne i64 %indvar1, 1536\l br i1 %exitcond5, label %for.body3, label %for.end27\l}"]; - Node0x17da760 -> Node0x17db1e0; - Node0x17da760 -> Node0x17db250; - Node0x17db1e0 [shape=record,label="{for.body3: \l store float 0.000000e+00, float* %arrayidx5, align 4\l br label %for.cond6\l}"]; - Node0x17db1e0 -> Node0x17da740; - Node0x17da740 [shape=record,label="{for.cond6: \l %indvar = phi i64 [ %indvar.next, %for.inc ], [ 0, %for.body3 ]\l %arrayidx16 = getelementptr [1536 x [1536 x float]]* @A, i64 0, i64 %indvar3, i64 %indvar\l %arrayidx20 = getelementptr [1536 x [1536 x float]]* @B, i64 0, i64 %indvar, i64 %indvar1\l %exitcond = icmp ne i64 %indvar, 1536\l br i1 %exitcond, label %for.body8, label %for.end\l}"]; - Node0x17da740 -> Node0x17da5a0; - Node0x17da740 -> Node0x17da800; - Node0x17da5a0 [shape=record,label="{for.body8: \l %0 = load float* %arrayidx5, align 4\l %1 = load float* %arrayidx16, align 4\l %2 = load float* %arrayidx20, align 4\l %mul = fmul float %1, %2\l %add = fadd float %0, %mul\l store float %add, float* %arrayidx5, align 4\l br label %for.inc\l}"]; - Node0x17da5a0 -> Node0x17da5c0; - Node0x17da5c0 [shape=record,label="{for.inc: \l %indvar.next = add i64 %indvar, 1\l br label %for.cond6\l}"]; - Node0x17da5c0 -> Node0x17da740[constraint=false]; - Node0x17da800 [shape=record,label="{for.end: \l br label %for.inc25\l}"]; - Node0x17da800 -> Node0x17dae20; - Node0x17dae20 [shape=record,label="{for.inc25: \l %indvar.next2 = add i64 %indvar1, 1\l br label 
%for.cond1\l}"]; - Node0x17dae20 -> Node0x17da760[constraint=false]; - Node0x17db250 [shape=record,label="{for.end27: \l br label %for.inc28\l}"]; - Node0x17db250 -> Node0x17dae80; - Node0x17dae80 [shape=record,label="{for.inc28: \l %indvar.next4 = add i64 %indvar3, 1\l br label %for.cond\l}"]; - Node0x17dae80 -> Node0x17d2020[constraint=false]; - Node0x17da500 [shape=record,label="{for.end30: \l ret i32 0\l}"]; + Node0x5b5c850 [shape=record,label="{entry:\l br label %entry.split\l}"]; + Node0x5b5c850 -> Node0x5b5a440; + Node0x5b5a440 [shape=record,label="{entry.split: \l tail call void @init_array()\l br label %for.cond1.preheader\l}"]; + Node0x5b5a440 -> Node0x5b38cd0; + Node0x5b38cd0 [shape=record,label="{for.cond1.preheader: \l %indvars.iv7 = phi i64 [ 0, %entry.split ], [ %indvars.iv.next8, %for.inc28 ]\l br label %for.body3\l}"]; + Node0x5b38cd0 -> Node0x5b4bd30; + Node0x5b4bd30 [shape=record,label="{for.body3: \l %indvars.iv4 = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next5,\l... %for.inc25 ]\l %arrayidx5 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x\l... float]]* @C, i64 0, i64 %indvars.iv7, i64 %indvars.iv4\l store float 0.000000e+00, float* %arrayidx5, align 4\l br label %for.body8\l}"]; + Node0x5b4bd30 -> Node0x5b38c50; + Node0x5b38c50 [shape=record,label="{for.body8: \l %indvars.iv = phi i64 [ 0, %for.body3 ], [ %indvars.iv.next, %for.body8 ]\l %arrayidx12 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536\l... x float]]* @C, i64 0, i64 %indvars.iv7, i64 %indvars.iv4\l %0 = load float, float* %arrayidx12, align 4\l %arrayidx16 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536\l... x float]]* @A, i64 0, i64 %indvars.iv7, i64 %indvars.iv\l %1 = load float, float* %arrayidx16, align 4\l %arrayidx20 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536\l... 
x float]]* @B, i64 0, i64 %indvars.iv, i64 %indvars.iv4\l %2 = load float, float* %arrayidx20, align 4\l %mul = fmul float %1, %2\l %add = fadd float %0, %mul\l %arrayidx24 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536\l... x float]]* @C, i64 0, i64 %indvars.iv7, i64 %indvars.iv4\l store float %add, float* %arrayidx24, align 4\l %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1\l %exitcond = icmp ne i64 %indvars.iv.next, 1536\l br i1 %exitcond, label %for.body8, label %for.inc25\l}"]; + Node0x5b38c50 -> Node0x5b38c50[constraint=false]; + Node0x5b38c50 -> Node0x5b5a290; + Node0x5b5a290 [shape=record,label="{for.inc25: \l %indvars.iv.next5 = add nuw nsw i64 %indvars.iv4, 1\l %exitcond6 = icmp ne i64 %indvars.iv.next5, 1536\l br i1 %exitcond6, label %for.body3, label %for.inc28\l}"]; + Node0x5b5a290 -> Node0x5b4bd30[constraint=false]; + Node0x5b5a290 -> Node0x5b5a340; + Node0x5b5a340 [shape=record,label="{for.inc28: \l %indvars.iv.next8 = add nuw nsw i64 %indvars.iv7, 1\l %exitcond9 = icmp ne i64 %indvars.iv.next8, 1536\l br i1 %exitcond9, label %for.cond1.preheader, label %for.end30\l}"]; + Node0x5b5a340 -> Node0x5b38cd0[constraint=false]; + Node0x5b5a340 -> Node0x5b5a3a0; + Node0x5b5a3a0 [shape=record,label="{for.end30: \l ret i32 0\l}"]; colorscheme = "paired12" - subgraph cluster_0x17d3f30 { + subgraph cluster_0x5b5c970 { label = ""; style = solid; color = 1 - subgraph cluster_0x17d38d0 { + subgraph cluster_0x5b5c5a0 { label = ""; style = filled; - color = 3 subgraph cluster_0x17d3850 { + color = 3 subgraph cluster_0x5b5c9f0 { label = ""; style = solid; color = 5 - subgraph cluster_0x17d37d0 { + subgraph cluster_0x5b5c110 { label = ""; style = solid; color = 7 - Node0x17da740; - Node0x17da5a0; - Node0x17da5c0; + Node0x5b38c50; } - Node0x17da760; - Node0x17db1e0; - Node0x17da800; - Node0x17dae20; + Node0x5b4bd30; + Node0x5b5a290; } - Node0x17d2020; - Node0x17d3950; - Node0x17db250; - Node0x17dae80; + Node0x5b38cd0; + Node0x5b5a340; } - 
Node0x17d21a0; - Node0x17da500; + Node0x5b5c850; + Node0x5b5a440; + Node0x5b5a3a0; } } Index: polly/trunk/www/experiments/matmul/scops.print_array.dot =================================================================== --- polly/trunk/www/experiments/matmul/scops.print_array.dot +++ polly/trunk/www/experiments/matmul/scops.print_array.dot @@ -1,60 +1,51 @@ digraph "Scop Graph for 'print_array' function" { label="Scop Graph for 'print_array' function"; - Node0x17d2200 [shape=record,label="{entry:\l br label %for.cond\l}"]; - Node0x17d2200 -> Node0x17d4f20; - Node0x17d4f20 [shape=record,label="{for.cond: \l %indvar1 = phi i64 [ %indvar.next2, %for.inc10 ], [ 0, %entry ]\l %exitcond3 = icmp ne i64 %indvar1, 1536\l br i1 %exitcond3, label %for.body, label %for.end12\l}"]; - Node0x17d4f20 -> Node0x17d3680; - Node0x17d4f20 -> Node0x17d9fc0; - Node0x17d3680 [shape=record,label="{for.body: \l br label %for.cond1\l}"]; - Node0x17d3680 -> Node0x17da220; - Node0x17da220 [shape=record,label="{for.cond1: \l %indvar = phi i64 [ %indvar.next, %for.inc ], [ 0, %for.body ]\l %arrayidx5 = getelementptr [1536 x [1536 x float]]* @C, i64 0, i64 %indvar1, i64 %indvar\l %j.0 = trunc i64 %indvar to i32\l %exitcond = icmp ne i64 %indvar, 1536\l br i1 %exitcond, label %for.body3, label %for.end\l}"]; - Node0x17da220 -> Node0x17d9ea0; - Node0x17da220 -> Node0x17da0f0; - Node0x17d9ea0 [shape=record,label="{for.body3: \l %0 = load %struct._IO_FILE** @stdout, align 8\l %1 = load float* %arrayidx5, align 4\l %conv = fpext float %1 to double\l %call = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %0, i8* getelementptr inbounds ([5 x i8]* @.str, i32 0, i32 0), double %conv)\l %rem = srem i32 %j.0, 80\l %cmp6 = icmp eq i32 %rem, 79\l br i1 %cmp6, label %if.then, label %if.end\l}"]; - Node0x17d9ea0 -> Node0x17d9ec0; - Node0x17d9ea0 -> Node0x17da060; - Node0x17d9ec0 [shape=record,label="{if.then: \l %2 = load %struct._IO_FILE** @stdout, align 8\l %call8 = call i32 
(%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %2, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0))\l br label %if.end\l}"]; - Node0x17d9ec0 -> Node0x17da060; - Node0x17da060 [shape=record,label="{if.end: \l br label %for.inc\l}"]; - Node0x17da060 -> Node0x17da200; - Node0x17da200 [shape=record,label="{for.inc: \l %indvar.next = add i64 %indvar, 1\l br label %for.cond1\l}"]; - Node0x17da200 -> Node0x17da220[constraint=false]; - Node0x17da0f0 [shape=record,label="{for.end: \l %3 = load %struct._IO_FILE** @stdout, align 8\l %call9 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %3, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0))\l br label %for.inc10\l}"]; - Node0x17da0f0 -> Node0x17da080; - Node0x17da080 [shape=record,label="{for.inc10: \l %indvar.next2 = add i64 %indvar1, 1\l br label %for.cond\l}"]; - Node0x17da080 -> Node0x17d4f20[constraint=false]; - Node0x17d9fc0 [shape=record,label="{for.end12: \l ret void\l}"]; + Node0x5b5ee00 [shape=record,label="{entry:\l br label %entry.split\l}"]; + Node0x5b5ee00 -> Node0x5b5ee50; + Node0x5b5ee50 [shape=record,label="{entry.split: \l br label %for.cond1.preheader\l}"]; + Node0x5b5ee50 -> Node0x5b5ee70; + Node0x5b5ee70 [shape=record,label="{for.cond1.preheader: \l %indvars.iv6 = phi i64 [ 0, %entry.split ], [ %indvars.iv.next7, %for.end ]\l %0 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8\l br label %for.body3\l}"]; + Node0x5b5ee70 -> Node0x5b5ee20; + Node0x5b5ee20 [shape=record,label="{for.body3: \l %indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next,\l... %for.inc ]\l %1 = phi %struct._IO_FILE* [ %0, %for.cond1.preheader ], [ %5, %for.inc ]\l %arrayidx5 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x\l... float]]* @C, i64 0, i64 %indvars.iv6, i64 %indvars.iv\l %2 = load float, float* %arrayidx5, align 4\l %conv = fpext float %2 to double\l %call = tail call i32 (%struct._IO_FILE*, i8*, ...)\l... 
@fprintf(%struct._IO_FILE* %1, i8* getelementptr inbounds ([5 x i8], [5 x\l... i8]* @.str, i64 0, i64 0), double %conv) #2\l %3 = trunc i64 %indvars.iv to i32\l %rem = srem i32 %3, 80\l %cmp6 = icmp eq i32 %rem, 79\l br i1 %cmp6, label %if.then, label %for.inc\l}"]; + Node0x5b5ee20 -> Node0x5b60d10; + Node0x5b5ee20 -> Node0x5b60d70; + Node0x5b60d10 [shape=record,label="{if.then: \l %4 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8\l %fputc3 = tail call i32 @fputc(i32 10, %struct._IO_FILE* %4)\l br label %for.inc\l}"]; + Node0x5b60d10 -> Node0x5b60d70; + Node0x5b60d70 [shape=record,label="{for.inc: \l %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1\l %5 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8\l %exitcond = icmp ne i64 %indvars.iv.next, 1536\l br i1 %exitcond, label %for.body3, label %for.end\l}"]; + Node0x5b60d70 -> Node0x5b5ee20[constraint=false]; + Node0x5b60d70 -> Node0x5b60e10; + Node0x5b60e10 [shape=record,label="{for.end: \l %.lcssa = phi %struct._IO_FILE* [ %5, %for.inc ]\l %fputc = tail call i32 @fputc(i32 10, %struct._IO_FILE* %.lcssa)\l %indvars.iv.next7 = add nuw nsw i64 %indvars.iv6, 1\l %exitcond8 = icmp ne i64 %indvars.iv.next7, 1536\l br i1 %exitcond8, label %for.cond1.preheader, label %for.end12\l}"]; + Node0x5b60e10 -> Node0x5b5ee70[constraint=false]; + Node0x5b60e10 -> Node0x5b60e70; + Node0x5b60e70 [shape=record,label="{for.end12: \l ret void\l}"]; colorscheme = "paired12" - subgraph cluster_0x17d38f0 { + subgraph cluster_0x5b349a0 { label = ""; style = solid; color = 1 - subgraph cluster_0x17d4030 { - label = "Non affine branch in BB 'for.body3' with LHS: %rem and RHS: 79"; + subgraph cluster_0x5b5c2c0 { + label = "Call instruction: %call = tail call i32 (%struct._IO_FILE*, i8*, ...) 
@fprintf(%struct._IO_FILE* %1, i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), double %conv) #2"; style = solid; color = 6 - subgraph cluster_0x17d3fb0 { - label = "Non affine branch in BB 'for.body3' with LHS: %rem and RHS: 79"; + subgraph cluster_0x5b5c240 { + label = "Call instruction: %call = tail call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %1, i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), double %conv) #2"; style = solid; color = 5 - subgraph cluster_0x17d3f30 { - label = "Non affine branch in BB 'for.body3' with LHS: %rem and RHS: 79"; + subgraph cluster_0x5b34a20 { + label = "Region can not profitably be optimized!"; style = solid; color = 7 - Node0x17d9ea0; - Node0x17d9ec0; + Node0x5b5ee20; + Node0x5b60d10; } - Node0x17da220; - Node0x17da060; - Node0x17da200; + Node0x5b60d70; } - Node0x17d4f20; - Node0x17d3680; - Node0x17da0f0; - Node0x17da080; + Node0x5b5ee70; + Node0x5b60e10; } - Node0x17d2200; - Node0x17d9fc0; + Node0x5b5ee00; + Node0x5b5ee50; + Node0x5b60e70; } } Index: polly/trunk/www/experiments/matmul/scopsonly.init_array.dot =================================================================== --- polly/trunk/www/experiments/matmul/scopsonly.init_array.dot +++ polly/trunk/www/experiments/matmul/scopsonly.init_array.dot @@ -1,47 +1,39 @@ digraph "Scop Graph for 'init_array' function" { label="Scop Graph for 'init_array' function"; - Node0x17d4370 [shape=record,label="{entry}"]; - Node0x17d4370 -> Node0x17d9de0; - Node0x17d9de0 [shape=record,label="{for.cond}"]; - Node0x17d9de0 -> Node0x17d9e40; - Node0x17d9de0 -> Node0x17d9ea0; - Node0x17d9e40 [shape=record,label="{for.body}"]; - Node0x17d9e40 -> Node0x17d9f90; - Node0x17d9f90 [shape=record,label="{for.cond1}"]; - Node0x17d9f90 -> Node0x17d9ff0; - Node0x17d9f90 -> Node0x17da050; - Node0x17d9ff0 [shape=record,label="{for.body3}"]; - Node0x17d9ff0 -> Node0x17d9f00; - Node0x17d9f00 [shape=record,label="{for.inc}"]; - Node0x17d9f00 -> 
Node0x17d9f90[constraint=false]; - Node0x17da050 [shape=record,label="{for.end}"]; - Node0x17da050 -> Node0x17da200; - Node0x17da200 [shape=record,label="{for.inc17}"]; - Node0x17da200 -> Node0x17d9de0[constraint=false]; - Node0x17d9ea0 [shape=record,label="{for.end19}"]; + Node0x5ae2570 [shape=record,label="{entry}"]; + Node0x5ae2570 -> Node0x5ae4e90; + Node0x5ae4e90 [shape=record,label="{entry.split}"]; + Node0x5ae4e90 -> Node0x5ae4f50; + Node0x5ae4f50 [shape=record,label="{for.cond1.preheader}"]; + Node0x5ae4f50 -> Node0x5ae50e0; + Node0x5ae50e0 [shape=record,label="{for.body3}"]; + Node0x5ae50e0 -> Node0x5ae50e0[constraint=false]; + Node0x5ae50e0 -> Node0x5ae5100; + Node0x5ae5100 [shape=record,label="{for.inc17}"]; + Node0x5ae5100 -> Node0x5ae4f50[constraint=false]; + Node0x5ae5100 -> Node0x5ae4ff0; + Node0x5ae4ff0 [shape=record,label="{for.end19}"]; colorscheme = "paired12" - subgraph cluster_0x17d3a30 { + subgraph cluster_0x5ad2dd0 { label = ""; style = solid; color = 1 - subgraph cluster_0x17d4ec0 { - label = ""; - style = filled; - color = 3 subgraph cluster_0x17d4180 { + subgraph cluster_0x5ad2f50 { + label = "Region can not profitably be optimized!"; + style = solid; + color = 6 + subgraph cluster_0x5ad30d0 { label = ""; style = solid; color = 5 - Node0x17d9f90; - Node0x17d9ff0; - Node0x17d9f00; + Node0x5ae50e0; } - Node0x17d9de0; - Node0x17d9e40; - Node0x17da050; - Node0x17da200; + Node0x5ae4f50; + Node0x5ae5100; } - Node0x17d4370; - Node0x17d9ea0; + Node0x5ae2570; + Node0x5ae4e90; + Node0x5ae4ff0; } } Index: polly/trunk/www/experiments/matmul/scopsonly.main.dot =================================================================== --- polly/trunk/www/experiments/matmul/scopsonly.main.dot +++ polly/trunk/www/experiments/matmul/scopsonly.main.dot @@ -1,65 +1,50 @@ digraph "Scop Graph for 'main' function" { label="Scop Graph for 'main' function"; - Node0x17d3950 [shape=record,label="{entry}"]; - Node0x17d3950 -> Node0x17d21a0; - Node0x17d21a0 
[shape=record,label="{for.cond}"]; - Node0x17d21a0 -> Node0x17db9a0; - Node0x17d21a0 -> Node0x17da4f0; - Node0x17db9a0 [shape=record,label="{for.body}"]; - Node0x17db9a0 -> Node0x17da5e0; - Node0x17da5e0 [shape=record,label="{for.cond1}"]; - Node0x17da5e0 -> Node0x17da640; - Node0x17da5e0 -> Node0x17da6a0; - Node0x17da640 [shape=record,label="{for.body3}"]; - Node0x17da640 -> Node0x17da550; - Node0x17da550 [shape=record,label="{for.cond6}"]; - Node0x17da550 -> Node0x17da5b0; - Node0x17da550 -> Node0x17da850; - Node0x17da5b0 [shape=record,label="{for.body8}"]; - Node0x17da5b0 -> Node0x17da8b0; - Node0x17da8b0 [shape=record,label="{for.inc}"]; - Node0x17da8b0 -> Node0x17da550[constraint=false]; - Node0x17da850 [shape=record,label="{for.end}"]; - Node0x17da850 -> Node0x17db930; - Node0x17db930 [shape=record,label="{for.inc25}"]; - Node0x17db930 -> Node0x17da5e0[constraint=false]; - Node0x17da6a0 [shape=record,label="{for.end27}"]; - Node0x17da6a0 -> Node0x17dada0; - Node0x17dada0 [shape=record,label="{for.inc28}"]; - Node0x17dada0 -> Node0x17d21a0[constraint=false]; - Node0x17da4f0 [shape=record,label="{for.end30}"]; + Node0x5abfcf0 [shape=record,label="{entry}"]; + Node0x5abfcf0 -> Node0x5ade060; + Node0x5ade060 [shape=record,label="{entry.split}"]; + Node0x5ade060 -> Node0x5ade0e0; + Node0x5ade0e0 [shape=record,label="{for.cond1.preheader}"]; + Node0x5ade0e0 -> Node0x5ade100; + Node0x5ade100 [shape=record,label="{for.body3}"]; + Node0x5ade100 -> Node0x5ae0020; + Node0x5ae0020 [shape=record,label="{for.body8}"]; + Node0x5ae0020 -> Node0x5ae0020[constraint=false]; + Node0x5ae0020 -> Node0x5ae0080; + Node0x5ae0080 [shape=record,label="{for.inc25}"]; + Node0x5ae0080 -> Node0x5ade100[constraint=false]; + Node0x5ae0080 -> Node0x5adfef0; + Node0x5adfef0 [shape=record,label="{for.inc28}"]; + Node0x5adfef0 -> Node0x5ade0e0[constraint=false]; + Node0x5adfef0 -> Node0x5adff50; + Node0x5adff50 [shape=record,label="{for.end30}"]; colorscheme = "paired12" - subgraph 
cluster_0x17d3f30 { + subgraph cluster_0x5ad2c80 { label = ""; style = solid; color = 1 - subgraph cluster_0x17d38d0 { + subgraph cluster_0x5ad2e50 { label = ""; style = filled; - color = 3 subgraph cluster_0x17d3850 { + color = 3 subgraph cluster_0x5ad2d00 { label = ""; style = solid; color = 5 - subgraph cluster_0x17d37d0 { + subgraph cluster_0x5ad2dd0 { label = ""; style = solid; color = 7 - Node0x17da550; - Node0x17da5b0; - Node0x17da8b0; + Node0x5ae0020; } - Node0x17da5e0; - Node0x17da640; - Node0x17da850; - Node0x17db930; + Node0x5ade100; + Node0x5ae0080; } - Node0x17d21a0; - Node0x17db9a0; - Node0x17da6a0; - Node0x17dada0; + Node0x5ade0e0; + Node0x5adfef0; } - Node0x17d3950; - Node0x17da4f0; + Node0x5abfcf0; + Node0x5ade060; + Node0x5adff50; } } Index: polly/trunk/www/experiments/matmul/scopsonly.print_array.dot =================================================================== --- polly/trunk/www/experiments/matmul/scopsonly.print_array.dot +++ polly/trunk/www/experiments/matmul/scopsonly.print_array.dot @@ -1,60 +1,51 @@ digraph "Scop Graph for 'print_array' function" { label="Scop Graph for 'print_array' function"; - Node0x17d2200 [shape=record,label="{entry}"]; - Node0x17d2200 -> Node0x17d4f20; - Node0x17d4f20 [shape=record,label="{for.cond}"]; - Node0x17d4f20 -> Node0x17d9fd0; - Node0x17d4f20 -> Node0x17da030; - Node0x17d9fd0 [shape=record,label="{for.body}"]; - Node0x17d9fd0 -> Node0x17da120; - Node0x17da120 [shape=record,label="{for.cond1}"]; - Node0x17da120 -> Node0x17da180; - Node0x17da120 -> Node0x17da1e0; - Node0x17da180 [shape=record,label="{for.body3}"]; - Node0x17da180 -> Node0x17da090; - Node0x17da180 -> Node0x17da0f0; - Node0x17da090 [shape=record,label="{if.then}"]; - Node0x17da090 -> Node0x17da0f0; - Node0x17da0f0 [shape=record,label="{if.end}"]; - Node0x17da0f0 -> Node0x17da390; - Node0x17da390 [shape=record,label="{for.inc}"]; - Node0x17da390 -> Node0x17da120[constraint=false]; - Node0x17da1e0 [shape=record,label="{for.end}"]; - 
Node0x17da1e0 -> Node0x17d9e40; - Node0x17d9e40 [shape=record,label="{for.inc10}"]; - Node0x17d9e40 -> Node0x17d4f20[constraint=false]; - Node0x17da030 [shape=record,label="{for.end12}"]; + Node0x5ae5e30 [shape=record,label="{entry}"]; + Node0x5ae5e30 -> Node0x5ae5f50; + Node0x5ae5f50 [shape=record,label="{entry.split}"]; + Node0x5ae5f50 -> Node0x5ae7d90; + Node0x5ae7d90 [shape=record,label="{for.cond1.preheader}"]; + Node0x5ae7d90 -> Node0x5ae7f20; + Node0x5ae7f20 [shape=record,label="{for.body3}"]; + Node0x5ae7f20 -> Node0x5ae7f40; + Node0x5ae7f20 -> Node0x5ae7f60; + Node0x5ae7f40 [shape=record,label="{if.then}"]; + Node0x5ae7f40 -> Node0x5ae7f60; + Node0x5ae7f60 [shape=record,label="{for.inc}"]; + Node0x5ae7f60 -> Node0x5ae7f20[constraint=false]; + Node0x5ae7f60 -> Node0x5ae7e30; + Node0x5ae7e30 [shape=record,label="{for.end}"]; + Node0x5ae7e30 -> Node0x5ae7d90[constraint=false]; + Node0x5ae7e30 -> Node0x5ae8110; + Node0x5ae8110 [shape=record,label="{for.end12}"]; colorscheme = "paired12" - subgraph cluster_0x17d38f0 { + subgraph cluster_0x5abb9a0 { label = ""; style = solid; color = 1 - subgraph cluster_0x17d4030 { - label = "Non affine branch in BB 'for.body3' with LHS: %rem and RHS: 79"; + subgraph cluster_0x5ae32c0 { + label = "Call instruction: %call = tail call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %1, i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), double %conv) #2"; style = solid; color = 6 - subgraph cluster_0x17d3fb0 { - label = "Non affine branch in BB 'for.body3' with LHS: %rem and RHS: 79"; + subgraph cluster_0x5ae3240 { + label = "Call instruction: %call = tail call i32 (%struct._IO_FILE*, i8*, ...) 
@fprintf(%struct._IO_FILE* %1, i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), double %conv) #2"; style = solid; color = 5 - subgraph cluster_0x17d3f30 { - label = "Non affine branch in BB 'for.body3' with LHS: %rem and RHS: 79"; + subgraph cluster_0x5abba20 { + label = "Region can not profitably be optimized!"; style = solid; color = 7 - Node0x17da180; - Node0x17da090; + Node0x5ae7f20; + Node0x5ae7f40; } - Node0x17da120; - Node0x17da0f0; - Node0x17da390; + Node0x5ae7f60; } - Node0x17d4f20; - Node0x17d9fd0; - Node0x17da1e0; - Node0x17d9e40; + Node0x5ae7d90; + Node0x5ae7e30; } - Node0x17d2200; - Node0x17da030; + Node0x5ae5e30; + Node0x5ae5f50; + Node0x5ae8110; } }