This is an archive of the discontinued LLVM Phabricator instance.

[Polly] [PATCH] Annotation of SIMD loops
ClosedPublic

Authored by gareevroman on Nov 8 2015, 10:43 PM.

Details

Summary

Use 'mark' nodes to annotate a SIMD loop during ScheduleTransformation and skip parallelism checks.

The buildbot shows the following compile/execution time changes:

Compile time:
  Improvements     Δ             Previous     Current   σ
  …/gesummv  	-6.06%	0.2640	  0.2480	 0.0055
  …/gemver	        -4.46%	0.4480	  0.4280	 0.0044
  …/covariance	-4.31%	0.8360	  0.8000	 0.0065
  …/adi                  -3.23%	0.9920	  0.9600	 0.0065
  …/doitgen	        -2.53%	0.9480	  0.9240	 0.0090
  …/3mm               -2.33%	1.0320	  1.0080	 0.0087

Execution time:
  Regressions      Δ             Previous     Current    σ
  …/viterbi	      1.70%     5.1840	5.2720	0.0074
  …/smallpt	      1.06%     12.4920	12.6240	0.0040

Diff Detail

Repository
rL LLVM

Event Timeline

gareevroman updated this revision to Diff 39666.
gareevroman retitled this revision from to [PATCH] Annotation of SIMD loops.
gareevroman updated this object.
gareevroman added a subscriber: pollydev.

Process ‘mark nodes’ in IslAst::astBuildBeforeFor and IslAst::astBuildAfterFor

jdoerfert edited edge metadata.Nov 9 2015, 2:26 PM

Could you update the diff using a bigger (the best is full) context as I am a little bit lost otherwise. If you use arcanist (the command line tool of phabricator) or follow the guide here it should be possible to look not only at the changes but also the surrounding code in the web interface.

lib/CodeGen/IslAst.cpp
128 ↗(On Diff #39666)

Can't we use the InParallelFor instead of a new flag? That would mean we would genelize that mark to express parallelism not necessarily SIMD parallelism.

In geneal I think 4 flags seem a bit much for the amount of information we want to get accross. Maybe we stick with this new one and can rid of some old ones instead.

grosser edited edge metadata.Nov 10 2015, 12:11 AM

Hi Roman,

thanks for the patch. Besides Johannes' comment, I just have a couple of smaller things that I would like to get feedback on.

Best,
Tobias

Use 'mark' nodes annotate a SIMD loop during ScheduleTransformation and skip parallelism checks.

However, I haven’t yet decided where we should process ‘mark nodes’. I would be very grateful, if you help me to do this.

The first variant uses IslNodeBuilder::createMark for this purpose and helps to skip parallelism checks in IslAst::astBuildBeforeFor and IslAst::astBuildAfterFor.

Seems good.

In my opinion, this approach makes the following problems:

We can't insert an isl_ast_node between isl_ast_node_mark with ‘SIMD mark’ and its isl_ast_node_for child. Otherwise, we should search for its isl_ast_node_for child in

IslNodeBuilder::createMark. Probably, this is not critical now and can be fixed somehow.

Sorry, I do not understand what you describe here. Could you try to reformulate?

If I’m not mistaken, there is no isl_ast_print_options_set_print for mark nodes. That’s why we have to modify IslAstInfo to be able print «#pragma simd» in IslAst::cbPrintFor.

I am fine with changing #pragma SIMD to SIMD.

The second approach is to process ‘mark nodes’ in IslAst::astBuildBeforeFor and IslAst::astBuildAfterFor. (Its implementation can be found in an updated revision)

lib/CodeGen/IslNodeBuilder.cpp
368 ↗(On Diff #39665)

You asked in a private email if we should also do something special to allow LLVM's Loop vectorizer to be used when using the option 'stripmine'. I think we indeed should do so, by calling createForSequantial and by passing to it the information that this for loop is parallel.

Currently createForSequential checks itself for parallelism:

Parallel =                                                                     
    IslAstInfo::isParallel(For) && !IslAstInfo::isReductionParallel(For);

We can probably add an optional parameter that skips these checks if isParallel is set.

grosser retitled this revision from [PATCH] Annotation of SIMD loops to [Polly[ [PATCH] Annotation of SIMD loops.Nov 10 2015, 12:13 AM
grosser edited edge metadata.
grosser retitled this revision from [Polly[ [PATCH] Annotation of SIMD loops to [Polly] [PATCH] Annotation of SIMD loops.

Hi Roman,

thanks for the patch. Besides Johannes' comment, I just have a couple of smaller things that I would like to get feedback on.

Thank you for the comments!

Use 'mark' nodes annotate a SIMD loop during ScheduleTransformation and skip parallelism checks.

However, I haven’t yet decided where we should process ‘mark nodes’. I would be very grateful, if you help me to do this.

The first variant uses IslNodeBuilder::createMark for this purpose and helps to skip parallelism checks in IslAst::astBuildBeforeFor and IslAst::astBuildAfterFor.

Seems good.

In my opinion, this approach makes the following problems:

We can't insert an isl_ast_node between isl_ast_node_mark with ‘SIMD mark’ and its isl_ast_node_for child. Otherwise, we should search for its isl_ast_node_for child in

IslNodeBuilder::createMark. Probably, this is not critical now and can be fixed somehow.

Sorry, I do not understand what you describe here. Could you try to reformulate?

I think that now we can access an isl_ast_node_for from the isl_ast_node_mark with ‘SIMD mark’ without problems. However, it’ll probably difficult, if, in future, we want to implement some additional functionality by inserting new mark nodes between the isl_ast_node_mark with ‘SIMD mark’ and the isl_ast_node_for. They may also require additional processing, which prevents us from skipping them.

If I’m not mistaken, there is no isl_ast_print_options_set_print for mark nodes. That’s why we have to modify IslAstInfo to be able print «#pragma simd» in IslAst::cbPrintFor.

I am fine with changing #pragma SIMD to SIMD.

OK. I’m also fine with this.

lib/CodeGen/IslAst.cpp
128 ↗(On Diff #39666)

Thank you for the comments! I’ll try to do something with this.

gareevroman added inline comments.Nov 10 2015, 6:36 AM
lib/CodeGen/IslNodeBuilder.cpp
368 ↗(On Diff #39665)

Thanks! I think it would be good.

In a new version we use the InParallelFor and the PollyVectorizerChoice instead of a new flag. I think that if the PollyVectorizerChoice isn’t equal to the VECTORIZER_NONE, we could skip testing for parallelism of innermost loops. Maybe I’m missing something.

However, we fail the /test/ScopInfo/stride_detection.ll and the /isl/CodeGen/. If I’m not mistaken, the first one has a wrong jscop, which produces

#pragma known-parallel
for (int c0 = 0; c0 <= 31; c0 += 1)
  for (int c1 = 0; c1 <= floord(nk - 1, 32); c1 += 1)
    for (int c2 = 0; c2 <= 7; c2 += 1)
      for (int c3 = 0; c3 <= min(31, nk - 32 * c1 - 1); c3 += 1)
        for (int c4 = 0; c4 <= 3; c4 += 1)
          Stmt_for_body_3(32 * c0 + 4 * c2 + c4, 32 * c1 + c3);

else

{  /* original code */ }

instead of

// 1st level tiling - Tiles
#pragma known-parallel
for (int c0 = 0; c0 <= 31; c0 += 1)
  for (int c1 = 0; c1 <= floord(nk - 1, 32); c1 += 1) {
    // 1st level tiling - Points
    for (int c2 = 0; c2 <= 7; c2 += 1)
      for (int c3 = 0; c3 <= min(31, nk - 32 * c1 - 1); c3 += 1) {
        // SIMD
        for (int c4 = 0; c4 <= 3; c4 += 1)
          Stmt_for_body_3(32 * c0 + 4 * c2 + c4, 32 * c1 + c3);
      }
  }

else

{  /* original code */ }

The second one doesn’t use polly-opt-isl, which prevents Polly from generating ‘SIMD’ mark nodes. I’ve used -polly-opt-isl to fix it.

@grosser, @jdoerfert, should we rewrite them to use jscops?

grosser accepted this revision.Feb 21 2016, 11:18 AM
grosser edited edge metadata.

LGTM.

You still need to update the commit message. If you could possibly add some compile time numbers to highlight the benefit of this reduction that would be great.

This revision is now accepted and ready to land.Feb 21 2016, 11:18 AM
gareevroman updated this object.
gareevroman edited edge metadata.

I’ve updated the commit message and fixed IslNodeBuilder::createMark to handle a situation when a child node of a 'SIMD mark' is a loop that has a single iteration and isl optimizes it away.

Closed by commit rL261620: Annotation of SIMD loops (authored by romangareev). · Explain WhyFeb 23 2016, 1:04 AM
This revision was automatically updated to reflect the committed changes.