Page MenuHomePhabricator

[Polly] [RFC] Calculate AST expression type
Needs ReviewPublic

Authored by maxf on Jul 16 2017, 7:35 PM.

Details

Reviewers
grosser
bollu
Summary

(This is a work-in-progress, feedback is welcome).
Until now, Polly always uses 64 bits for each expression as there is no
way to figure out the correct type. This patch changes the behavour:

  • (isl) At AST generation time the type of each AST expression is calculated. In the base case, this is the exact minimal type, optionally types can be approximated to reduce compile-time overhead
  • (isl) During this phase, we now take the concept of the 'biggest native type' into account. The biggest native type (fixed 64 bit atm) is the largest type we *want* to generate. Whenever an expression exceeds this size, a precondition is generated that, when satisfied, assures that the biggest native type is enough to hold said expression

Polly then uses this information to:

  • Take all preconditions into account when creating the run condition of a SCoP
  • Use a correct type for each expression. The default behaviour at the moment is to still use 64 bits per default to save on sext's.

The isl part of this patch has been partially discussed previously on
the isl mailinglist. Before upstreaming this into isl, we need to agree
on an interface, therefore it seems resonable to also put this on
Phabricator.

Event Timeline

maxf created this revision.Jul 16 2017, 7:35 PM
maxf added a comment.Jul 16 2017, 7:40 PM

Note: Modifications in codegen.c are only for debugging.

lib/External/isl/codegen.c
304

The changes in this file are for debugging only and will be removed for the final version.

grosser edited edge metadata.Jul 16 2017, 11:00 PM

Hi Maximilian,

thank you for pushing your WIP-patch. I know it still must be cleaned up, but here some comments which would make it easier to reason about the current patch:

  1. Many test cases fail because we print casts when printing the generated AST. Can you disable the printing of these casts? They only distract and removing them makes it a lot easier to understand what is still failing,
  1. In ScopInfo/int2ptr_ptr2int.ll and some others the IR is changed. Do you have an idea why? I would expect for us to not yet change the IR, no?

Best,
Tobias

include/polly/CodeGen/IslAst.h
86

Could you add documentation?

include/polly/CodeGen/IslExprBuilder.h
188

Could you add documentation?

265

Could you add documentation?

(I know there is not too much documentation around, but as this function is doing something rather special it would be good to document it).

322

Could you add documentation?

lib/CodeGen/IslAst.cpp
426

Could you add flags to be able to control all configurations?

428

Just write:

isl_options_set_ast_build_compute_bounds(Ctx, ComputeBounds);

without the if-condition around.

517

Would it make sense to use `isl_ast_node_foreach_descendant_top_down```?

lib/CodeGen/IslExprBuilder.cpp
20

No need to include iostream, just use llvm::errs()

736

`llvm::errs() <<` is what we use by default.

lib/CodeGen/IslNodeBuilder.cpp
550

descending

lib/External/isl/isl_ast_graft_private.h
102 ↗(On Diff #106830)

Unrelated

lib/External/isl/isl_options_private.h
62

Alignment?

maxf planned changes to this revision.Jul 17 2017, 4:18 PM

Hello Tobias,

as for (1): Hmm, I can do that, at least for now. I'd rather have them in in the final version so that we also test this feature whenever possible, but I guess it makes sense to differ between tests that fail and tests that only fail due to format changes now. I'd disable printing bounds in ISL_C format for now and put it back in the final version, along with fixed testcases, if that's okay with you?

for (2): Not exactly. There are two reasons why the given test fails:

  • We already generate a (more restrictive) precondition, the failure type you observed can be introduced because the register numbers increase because we codegenerate the precondition in front of the loop which now needs more registers.
  • We only use 64 bits when we can proof that it will be enough. In the test case you indicated this seems not to be possible inside the precondition as the inputs are already 64 bit integers if I read the test correctly.

Best,

Maximilian

lib/CodeGen/IslAst.cpp
517

Ah, I didn't know about that one. Well, this function visits all AST nodes and the expression trees they reference (that's why the method is overloaded), so in order to use isl_ast_node_foreach_descendant_top_down I'd still need a function that does a big switch on the node statement, so it wouldn't make the code much shorter.

Just out of curiosity, are we supposed to change the isl inside polly? what is the relationship between this isl and the "upstream" isl?

maxf added a comment.Jul 17 2017, 4:38 PM

No, this is not supposed to only change isl inside of polly. Among the things that were noted when I originally wanted to upstream the code into isl is a lack of motivation for the interface, since at that time it was a big patch to isl only. After discussion with Sven, we agreed on changing the original interface (store information in AST (expression) nodes instead of directly inside expressions) but before cleaning up and preparing another round of patches and resubmitting it seems like a good idea to show the whole code in one place so that we can agree on what and how we'd like isl to provide.

In D35471#812102, @maxf wrote:

Hello Tobias,

as for (1): Hmm, I can do that, at least for now. I'd rather have them in in the final version so that we also test this feature whenever possible, but I guess it makes sense to differ between tests that fail and tests that only fail due to format changes now. I'd disable printing bounds in ISL_C format for now and put it back in the final version, along with fixed testcases, if that's okay with you?

I personally believe the printed types in Polly make the output more difficult to read. I would prefer to not have the types there everywhere.

for (2): Not exactly. There are two reasons why the given test fails:

  • We already generate a (more restrictive) precondition, the failure type you observed can be introduced because the register numbers increase because we codegenerate the precondition in front of the loop which now needs more registers.
  • We only use 64 bits when we can proof that it will be enough. In the test case you indicated this seems not to be possible inside the precondition as the inputs are already 64 bit integers if I read the test correctly.

I see. Interesting.

Best,
Tobias

lib/CodeGen/IslAst.cpp
517

If it does not simplify the code, feel free to leave it as it is.

maxf updated this revision to Diff 107192.Jul 18 2017, 3:26 PM

First round of cleanups:

  • Down to four test cases that fail code generation
  • Adressed most comments
  • Added an option to isl to not print bounds on values that polly enables per default

Nice. I will have a look immediately. I am also at ETH tomorrow in case you wanna chat.

Hi Maximilian,

here another round of reviews:

It seems we loose some "braces":

/home/grosser/Projects/polly/git/tools/polly/test/ScheduleOptimizer/pattern-matching-based-opts_5.ll:100:15: error: expected string not found in input
; CHECK-NEXT: Stmt_for_body6(-((ni + 4) % 4) + ni + c6, 2048 * c0 + 8 * c3 + c7, 256 * c1 + c5);
              ^
<stdin>:80:2: note: scanning from here
 Stmt_for_body6(-ni + 4 % 4 + ni + c6, 2048 * c0 + 8 * c3 + c7, 256 * c1 + c5);
lib/CodeGen/IslAst.cpp
546

You are syntactically connecting preconditions with a "&&". This does not allow for any simplifications. Instead, I suggest to intersect Cond, and only at the very end to build preconditions.

lib/CodeGen/IslExprBuilder.cpp
804

Please use full types, rather than 'auto'. (except for iterators or where the type is obvious)

lib/External/isl/isl_ast_codegen.c
1649

Unrelated change.

More comments:

  • Can you check if the constructors miss some isl cleanups.
  • There are more isl print output changes
maxf updated this revision to Diff 107784.Jul 22 2017, 6:27 AM

Another round of cleanups:

  • Excluding GPU tests, we're down to 9 unexpected failures, two of which are miscompiles
  • isl printer output fixed
  • Usage of computed bounds disabled for now until as they will result in at least some changes to our test suite.

Hi Maximilian,

this is very cool! Good work!

Best,
Tobias

maxf updated this revision to Diff 108071.Jul 25 2017, 6:52 AM
maxf marked 12 inline comments as done.

Another round of fixes:

  • Fix last miscompiles in polly unittests. This brings us down to two testcases where the generated IR changed (GPU test cases excluded).
  • Prepare PPCGCodeGen. It's still not working (memory accesses need to be handled), but move initialization of options on isl_ctx so that they are set early enough for this. Also fix the addditional RTC generated by PPCGCodeGen.
  • Small cleanups
maxf marked 2 inline comments as done.Jul 25 2017, 6:55 AM
maxf updated this revision to Diff 108846.Jul 30 2017, 6:16 PM
maxf marked an inline comment as done.

Another round of fixes:

  • All unit tests pass now. The last remaining test was changed to reflect the newly generated preconditions
  • Types are now limited to 63 bits to leave space for the sign bit
  • Fixed preconditions meaning the exact opposite of what was intended
  • Generate preconditions and parts of polly-acc generated code (e.g. host <-> device memory transfer) using fixed 64 bits
  • More code cleanup

Hi Maximilian,

this looks indeed very good! Thanks for pushing this forward! I did a first round with stylistic comments.

Could you also please add test cases for some of the code generation changes. I will do another round with comments on the content of the patch itself soonish.

include/polly/CodeGen/IslExprBuilder.h
188

preceeded

194

Add "." at end of sentence?

270

THE inner

lib/CodeGen/IslAst.cpp
525

No braces around single statement bodies.

lib/CodeGen/IslExprBuilder.cpp
737

No need for else, If the previous branch condition holds, we anyhow assert.

744

"." at end of sentence. Also, can you check the grammar. Something seems wrong.

lib/CodeGen/IslNodeBuilder.cpp
95

I removed this in r309533. Thank you for spotting!

552

What is the "compute case"?

lib/External/isl/isl_ast_build_expr.c
2757

Indentation?

lib/External/isl/isl_ast_build_private.h
208

Indentation?

lib/External/isl/isl_ast_codegen.c
1441

Unrelated change.

Another set of smaller changes.

I also have another more higher-level comment for this. Could you possibly add a small set of test cases that focuses on:

  1. How we derive run-time conditions when generating 64 bit computations
  2. How we actually generate minimal types (maybe add a mode for this?)
include/polly/CodeGen/IslAst.h
85

Why are there two whitespaces after the parameter names?

100

Why are there two whitespaces after the parameter names?

include/polly/CodeGen/IslExprBuilder.h
173

"for e.g." -> "for example"

Now, instead of describing the user of this (which might change and we then certainly won't update this comment), please describe what the change to the generated code here is.

lib/CodeGen/IslAst.cpp
541

No braces around single-assignment bodies.

Just out of curiosity, did you compare the results (compile time, runtime, number of required overflow tracking intrinsics,...) to the simple bit-width tracking that was implemented in the IslExpr builder?

Hi Johannes,

AFAIR you talk about: https://llvm.org/svn/llvm-project/polly/trunk@271878

Thanks for initiating a discussion which we should have had a year ago. Maximilian is about to generate some numbers / example test cases. We will get back to you soonish.

Best,
Tobias

Hi Johannes,

just to clarify. I never said your approach is wrong or has an entirely wrong design, but said that a similar approach caused issues in graphite, hence I would like to understand your patch in detail before going this road. I still don't fully understand how this works, but tried some simple examples with https://llvm.org/svn/llvm-project/polly/trunk@271878. Maybe you can cross-check if what I observe is the intended behavior of your patch.

define void @foo32(float* %A, i32 %n) {
entry:
  br label %loop

loop:
  %indvar = phi i32 [0, %entry], [%indvar.next, %loop]
  %indvar.next = add nsw i32 %indvar, 1
  %cmp = icmp slt i32 %indvar, %n
  store float 1.0, float* %A
  br i1 %cmp, label %loop, label %exit

exit:
  ret void
}

define void @foo64(float* %A, i64 %n) {
entry:
  br label %loop

loop:
  %indvar = phi i64 [0, %entry], [%indvar.next, %loop]
  %indvar.next = add nsw i64 %indvar, 1
  %cmp = icmp slt i64 %indvar, %n
  store float 1.0, float* %A
  br i1 %cmp, label %loop, label %exit

exit:
  ret void
}

when I change the schedule from [i] -> [i + 1024] (using jscop to simulate a scheduler which decides to perform skewing) I get the following ASTs:

foo32:
if (1)

    {
      for (int c0 = 1024; c0 <= n + 1024; c0 += 1)
        Stmt_loop(c0 - 1024);
      if (n <= -1)
        Stmt_loop(0);
    }

foo64:

if (1)


    {
      for (int c0 = 1024; c0 <= n + 1024; c0 += 1)
        Stmt_loop(c0 - 1024);
      if (n <= -1)
        Stmt_loop(0);
    }

The code that is generated for foo32 to model the addition n + 1024 is:

%0 = sext i32 %n to i33
 %1 = add nsw i33 %0, 1024

This seems to be correct. Also, in general, I feel that generating non power-of-two types is something we should better avoid.

The code that is generated for foo64 to model the addition n + 1024 is:

%0 = add nsw i64 %n, 1024

but it seems no run-time bounds checks are generated. Your commit message says: " If the type adjustment is not possible, thus the necessary type is bigger than the type value of --polly-max-expr-bit-width, we will use assumptions to verify the computation will not wrap." hence I would expect a RTC to be generated. Is the behavior I observe the expected behavior?

I never said your approach is wrong or has an entirely wrong design, but said that a similar approach caused issues in graphite,

I am pretty sure you did say exactly that.

hence I would like to understand your patch in detail before going this road. I still don't fully understand how this works, but tried some simple examples with https://llvm.org/svn/llvm-project/polly/trunk@271878. Maybe you can cross-check if what I observe is the intended behavior of your patch.

I am not sure what this is supposed to achieve. I did (~1.5 years ago) try to get this in in order to improve performance but now I have a lot of other things on my plate. I asked if you compared yourself against this but it's OK if you feel that is impossible because the patch is hard to understand.

%0 = sext i32 %n to i33
 %1 = add nsw i33 %0, 1024

This seems to be correct. Also, in general, I feel that generating non power-of-two types is something we should better avoid.

That could have been done "virtually" but the discussion didn't come up.

The code that is generated for foo64 to model the addition n + 1024 is:

%0 = add nsw i64 %n, 1024

but it seems no run-time bounds checks are generated. Your commit message says: " If the type adjustment is not possible, thus the necessary type is bigger than the type value of --polly-max-expr-bit-width, we will use assumptions to verify the computation will not wrap." hence I would expect a RTC to be generated. Is the behavior I observe the expected behavior?

To be honest, I don't remember.

I never said your approach is wrong or has an entirely wrong design, but said that a similar approach caused issues in graphite,

I am pretty sure you did say exactly that.

hence I would like to understand your patch in detail before going this road. I still don't fully understand how this works, but tried some simple examples with https://llvm.org/svn/llvm-project/polly/trunk@271878. Maybe you can cross-check if what I observe is the intended behavior of your patch.

I am not sure what this is supposed to achieve. I did (~1.5 years ago) try to get this in in order to improve performance but now I have a lot of other things on my plate. I asked if you compared yourself against this but it's OK if you feel that is impossible because the patch is hard to understand.

It is hard to compare -- quality-wise -- if I am not certain about how the patch works. We will at some point need to solve this problem in LLVM and if you have a good solution, I would like to mention this in my presentation.

%0 = sext i32 %n to i33
 %1 = add nsw i33 %0, 1024

This seems to be correct. Also, in general, I feel that generating non power-of-two types is something we should better avoid.

That could have been done "virtually" but the discussion didn't come up.

Unfortunately, we did not manage to have one.

What do you mean by 'virtually'? Keeping track of the type information on the side and then pushing things out. Sure, this could have been done.

The code that is generated for foo64 to model the addition n + 1024 is:

%0 = add nsw i64 %n, 1024

but it seems no run-time bounds checks are generated. Your commit message says: " If the type adjustment is not possible, thus the necessary type is bigger than the type value of --polly-max-expr-bit-width, we will use assumptions to verify the computation will not wrap." hence I would expect a RTC to be generated. Is the behavior I observe the expected behavior?

To be honest, I don't remember.

That's unfortunate as this is the point I am very unclear about and I fail to find the corresponding code. Maybe you want to have a brief look. The patch is rather short and likely for you easy to understand.

Best,
Tobias

bollu added inline comments.Sep 29 2017, 8:27 AM
lib/CodeGen/PPCGCodeGeneration.cpp
1029

Can we create a wrapper that does this? Example,

template<typename FTy>
auto scopeFixedSizeMode(FTy F, IslExprBuilder &ExprBuilder) -> decltype(FTy()) typename std::enable_if<!std::is_same<decltype(std::declval<FTy>()()), void>::value,
                 decltype(std::declval<FTy>()())>::type
scopedFixedSizeMode(FTy F) {
    ExprBuilder.setBuildMode(true);
    auto Ret = F();
    ExprBuilder.setBuildMode(false);
    return Ret;
}

template <typename FTy>
typename std::enable_if<std::is_same<decltype(std::declval<FTy>()()), void>::value, void>::type
scopedFixedSizeMode(FTy F) {
    ExprBuilder.setBuildMode(true);
    F();
    ExprBuilder.setBuildMode(false);
}

The wrapper right now is crazy, but there has to be a way for this to be more "canonical".

philip.pfaffe added inline comments.
lib/CodeGen/PPCGCodeGeneration.cpp
1029

Here's a way to express the same thing more nicely:

template<typename T> T Foo(std::function<T()> F) {                              
  auto R = F();                                                                 
  return R;                                                                     
}                                                                               
                                                                                
void Foo(std::function<void()> F) {                                             
  F();                                                                          
}
bollu added inline comments.Sep 29 2017, 8:33 AM
lib/CodeGen/PPCGCodeGeneration.cpp
1029

This should let us write

ExprBuilder.setFixedSizeMode(true);
Value *NumElements = ExprBuilder.create(Res);
ExprBuilder.setFixedSizeMode(false);

as:

Value *NumElements = scopedFixedSizeMode([&] () { ExprBuilder.create(Res); });

I primarily want this because it's much less error prone.

maxf updated this revision to Diff 117294.Oct 1 2017, 3:20 PM
maxf marked 17 inline comments as done.

More cleanups. Also fixed a pathologic edge case...

maxf added inline comments.Oct 1 2017, 3:22 PM
lib/CodeGen/PPCGCodeGeneration.cpp
1029

Sounds good. We should probably also think about doing this for setTrackOverflow (which originally convinced me to use a simple flag for this).