Page MenuHomePhabricator

Use modulo semantic to generate non-integer-overflow assumptions
ClosedPublic

Authored by jdoerfert on Aug 12 2015, 4:20 AM.

Details

Summary
This will allow to generate non-wrap assumptions for integer expressions
that are part of the SCoP. We compare the common isl representation of
the expression with one computed with modulo semantic. For all parameter
combinations they are not equal we can have integer overflows.
  
The nsw flags are respected when the modulo representation is computed,
nuw and nw flags are ignored for now.
  
In order to not increase compile time to much, the non-wrap assumptions
are collected in a separate boundary context instead of the assumed
context. This helps compile time as the boundary context can become
complex and it is therefor not advised to use it in other operations
except runtime check generation. However, the assumed context is e.g.,
used to tighten dependences. While the boundary context might help to
tighten the assumed context it is doubtful that it will help in practice
(it does not effect lnt much) as the boundary (or no-wrap assumptions)
only restrict the very end of the possible value range of parameters.

Diff Detail

Event Timeline

jdoerfert updated this revision to Diff 31924.Aug 12 2015, 4:20 AM
jdoerfert retitled this revision from to Use modulo semantic to generate non-integer-overflow assumptions.
jdoerfert added reviewers: grosser, Meinersbur.
jdoerfert updated this object.
jdoerfert added subscribers: Restricted Project, pollydev, llvm-commits.
grosser edited edge metadata.Aug 12 2015, 7:03 AM

I did not had a detailed look yet, but wondered what the LNT results for
this patch looked like? Did this pass all tests and did we see a compile
time impact?

Tobias

I did not had a detailed look yet, but wondered what the LNT results for
this patch looked like? Did this pass all tests and did we see a compile
time impact?

It should pass all lnt tests but I'll run lnt on this version again tonight.

The compile time impact is severe for <5 tests and "ok" for the rest. Most tests do not show worse compile time actually. However, I am still working on some simplifications that will reduce compile time as well as some exit conditions that we can place in order to bail out before we spend to much time on a benchmark.

Meinersbur added inline comments.Aug 12 2015, 11:28 AM
include/polly/Support/SCEVAffinator.h
72

Too many spaces?

85

Why the empty line 85 between @brief and @param ?

lib/Support/SCEVAffinator.cpp
118

__isl_take isl_pw_aff *PWA

jdoerfert updated this revision to Diff 32136.Aug 14 2015, 1:51 AM
jdoerfert edited edge metadata.

Separated assumed and boundary context and minimized the compile time overhead

jdoerfert updated this revision to Diff 32137.Aug 14 2015, 1:53 AM

Fixed small issues raised by Michael

jdoerfert updated this object.Aug 14 2015, 1:55 AM

jdoerfert updated this revision to Diff 32136.
jdoerfert added a comment.

Separated assumed and boundary context and minimized the compile time overhead

Hi Johannes,

can you give more details on the current compile-time overhead? Maybe an LNT result
page, or the top-5 offenders?

Best,
Tobias

Attached 2 json files, without_overflow_assumptions.json
and with_overflow_assumptions.json. The former is a recent upstream
commit with (I think) 4 lnt runs, the latter the current integer
wrapping patch applied to the upstream head with 2 lnt runs.

We have 5 benchmarks that have more than 15% compile time regression:

; regress ; prev   ; Current;   σ

MultiSource/Applications/SIBsim4/SIBsim4 ; 136.24% ; 2.5200 ; 5.9533 ; 0.2801
MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk ; 21.86% ; 0.5334 ; 0.6500 ; 0.0016
SingleSource/Regression/C++/2003-05-14-expr_stmt ; 16.67% ; 0.1200 ; 0.1400 ; -
MultiSource/Applications/oggenc/oggenc ; 16.34% ; 7.0400 ; 8.1900 ; 0.2633
SingleSource/Regression/C++/EH/dead_try_block ; 15.38% ; 0.1300 ; 0.1500 ; 0.0016

Also the execution time regressed by more than 15% a few times:

                                                                           ;    Δ   ; Previous;  Current;   σ
SingleSource/Benchmarks/Polybench/stencils/fdtd-apml/fdtd-apml             ; 50.00% ; 0.3400  ; 0.5100  ; 0.0016
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trisolv/trisolv   ; 35.27% ; 0.0567  ; 0.0767  ; -     
SingleSource/Benchmarks/Shootout-C++/strcat                                ; 33.33% ; 0.0300  ; 0.0400  ; -     
MultiSource/Benchmarks/Olden/treeadd/treeadd                               ; 24.01% ; 0.0833  ; 0.1033  ; 0.0050
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/bicg/bicg         ; 22.22% ; 0.0900  ; 0.1100  ; 0.0016
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm         ; 19.73% ; 12.7233 ; 15.2333 ; 0.2967
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/cholesky/cholesky ; 18.05% ; 0.4433  ; 0.5233  ; 0.0050
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gesummv/gesummv   ; 16.70% ; 0.1000  ; 0.1167  ; 0.0016

However I am not sure how much, especially of the runtime results, is
real or only some noise.

All in all I think this is as good as it gets for now.

This time with attachments.

Hi Johannes,

I found one test case where modulo wrapping happens and where Polly currently, but also with your patch, produces incorrect results:

void __attribute__ ((noinline)) foo(float *A, long N) {
  for (long i = 0; i < N; i++)
    if ((signed char)i < 100)
      A[i] += i;
}

Compiled with polly-clang test-wrapping-in-condition.c -O3 -mllvm -polly -mllvm -polly-no-early-exit Polly derives the domain:

[ N ] -> { Stmt_if_then[i0] : i0 >= 0 and i0 <= -1 + N and N >= 1 and i0 <= 99 };

and only runs the first 99 iterations. Without Polly, we run 1000 iterations, but only for those where "i" casted to char is smaller than 99 the values of A are actually updated.

With your patch the BoundaryContext is empty, Should your patch not have a constraint N <= 128 in the BoundaryContext to prevent this miscompilation?

jdoerfert updated this revision to Diff 32241.Aug 16 2015, 8:45 AM

Fixed the problem pointed out by Tobias and added his test case.

Hi Johannes,

some first comments. I still need to understand the algorithm better, but probably just wait to read the writeup in the gitlab document.

include/polly/ScopInfo.h
852

measure

include/polly/Support/SCEVAffinator.h
69

happening

lib/Analysis/ScopInfo.cpp
1295

Is this valid? We are simplifying our assumption context here with the parameter constraints coming from the domain which was generated assuming the assumptions
hold. Is this not another <-> loophole? If it is not, maybe a comment would help that explains why this is correct.

2016

Maybe put this into a separate function with a comment what you are actually doing.

Also, why are you intersecting with getContext() the second time. This adds context constraints to the BoundaryContext which we know hold (and which will likely be dropped in simplifyContext().

2130

This simplification is again fishy as we use the already simplified domain to simplify
the assumptions that were used to simplify the domain.

lib/Support/SCEVAffinator.cpp
80

It seems worth describing what you are actually computing here. Also, it is unclear to me why intersecting here with the domain is save. Are you relying on the domain not
being fully built?

82

It seems you only add modulo wrapping at the very end, but not right when we construct the expressions. This makes the impression you only add modulo wrapping at the outermost level. I think you already replied that the inner levels will also be in
the cache and consequently we will actually have all the wrappings. If this is the case,
we probably want to add a comment.

jdoerfert updated this revision to Diff 33628.Aug 31 2015, 2:08 PM

Updated version. Depends on D12499.

jdoerfert accepted this revision.Sep 26 2015, 2:19 PM
jdoerfert added a reviewer: jdoerfert.

Commited in r247732.

This revision is now accepted and ready to land.Sep 26 2015, 2:19 PM
jdoerfert closed this revision.Sep 26 2015, 2:19 PM