This will allow to generate non-wrap assumptions for integer expressions that are part of the SCoP. We compare the common isl representation of the expression with one computed with modulo semantic. For all parameter combinations they are not equal we can have integer overflows. The nsw flags are respected when the modulo representation is computed, nuw and nw flags are ignored for now. In order to not increase compile time to much, the non-wrap assumptions are collected in a separate boundary context instead of the assumed context. This helps compile time as the boundary context can become complex and it is therefor not advised to use it in other operations except runtime check generation. However, the assumed context is e.g., used to tighten dependences. While the boundary context might help to tighten the assumed context it is doubtful that it will help in practice (it does not effect lnt much) as the boundary (or no-wrap assumptions) only restrict the very end of the possible value range of parameters.
Details
- Reviewers
Meinersbur grosser jdoerfert
Diff Detail
Event Timeline
I did not had a detailed look yet, but wondered what the LNT results for
this patch looked like? Did this pass all tests and did we see a compile
time impact?
Tobias
It should pass all lnt tests but I'll run lnt on this version again tonight.
The compile time impact is severe for <5 tests and "ok" for the rest. Most tests do not show worse compile time actually. However, I am still working on some simplifications that will reduce compile time as well as some exit conditions that we can place in order to bail out before we spend to much time on a benchmark.
jdoerfert updated this revision to Diff 32136.
jdoerfert added a comment.Separated assumed and boundary context and minimized the compile time overhead
Hi Johannes,
can you give more details on the current compile-time overhead? Maybe an LNT result
page, or the top-5 offenders?
Best,
Tobias
Attached 2 json files, without_overflow_assumptions.json
and with_overflow_assumptions.json. The former is a recent upstream
commit with (I think) 4 lnt runs, the latter the current integer
wrapping patch applied to the upstream head with 2 lnt runs.
We have 5 benchmarks that have more than 15% compile time regression:
; regress ; prev ; Current; σ
MultiSource/Applications/SIBsim4/SIBsim4 ; 136.24% ; 2.5200 ; 5.9533 ; 0.2801
MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk ; 21.86% ; 0.5334 ; 0.6500 ; 0.0016
SingleSource/Regression/C++/2003-05-14-expr_stmt ; 16.67% ; 0.1200 ; 0.1400 ; -
MultiSource/Applications/oggenc/oggenc ; 16.34% ; 7.0400 ; 8.1900 ; 0.2633
SingleSource/Regression/C++/EH/dead_try_block ; 15.38% ; 0.1300 ; 0.1500 ; 0.0016
Also the execution time regressed by more than 15% a few times:
; Δ ; Previous; Current; σ SingleSource/Benchmarks/Polybench/stencils/fdtd-apml/fdtd-apml ; 50.00% ; 0.3400 ; 0.5100 ; 0.0016 SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trisolv/trisolv ; 35.27% ; 0.0567 ; 0.0767 ; - SingleSource/Benchmarks/Shootout-C++/strcat ; 33.33% ; 0.0300 ; 0.0400 ; - MultiSource/Benchmarks/Olden/treeadd/treeadd ; 24.01% ; 0.0833 ; 0.1033 ; 0.0050 SingleSource/Benchmarks/Polybench/linear-algebra/kernels/bicg/bicg ; 22.22% ; 0.0900 ; 0.1100 ; 0.0016 SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm ; 19.73% ; 12.7233 ; 15.2333 ; 0.2967 SingleSource/Benchmarks/Polybench/linear-algebra/kernels/cholesky/cholesky ; 18.05% ; 0.4433 ; 0.5233 ; 0.0050 SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gesummv/gesummv ; 16.70% ; 0.1000 ; 0.1167 ; 0.0016
However I am not sure how much, especially of the runtime results, is
real or only some noise.
All in all I think this is as good as it gets for now.
Hi Johannes,
I found one test case where modulo wrapping happens and where Polly currently, but also with your patch, produces incorrect results:
void __attribute__ ((noinline)) foo(float *A, long N) { for (long i = 0; i < N; i++) if ((signed char)i < 100) A[i] += i; }
Compiled with polly-clang test-wrapping-in-condition.c -O3 -mllvm -polly -mllvm -polly-no-early-exit Polly derives the domain:
[ N ] -> { Stmt_if_then[i0] : i0 >= 0 and i0 <= -1 + N and N >= 1 and i0 <= 99 };
and only runs the first 99 iterations. Without Polly, we run 1000 iterations, but only for those where "i" casted to char is smaller than 99 the values of A are actually updated.
With your patch the BoundaryContext is empty, Should your patch not have a constraint N <= 128 in the BoundaryContext to prevent this miscompilation?
Hi Johannes,
some first comments. I still need to understand the algorithm better, but probably just wait to read the writeup in the gitlab document.
include/polly/ScopInfo.h | ||
---|---|---|
825 | measure | |
include/polly/Support/SCEVAffinator.h | ||
61 | happening | |
lib/Analysis/ScopInfo.cpp | ||
1168 | Is this valid? We are simplifying our assumption context here with the parameter constraints coming from the domain which was generated assuming the assumptions | |
1494 | Maybe put this into a separate function with a comment what you are actually doing. Also, why are you intersecting with getContext() the second time. This adds context constraints to the BoundaryContext which we know hold (and which will likely be dropped in simplifyContext(). | |
1592 | This simplification is again fishy as we use the already simplified domain to simplify | |
lib/Support/SCEVAffinator.cpp | ||
81 | It seems worth describing what you are actually computing here. Also, it is unclear to me why intersecting here with the domain is save. Are you relying on the domain not | |
83 | It seems you only add modulo wrapping at the very end, but not right when we construct the expressions. This makes the impression you only add modulo wrapping at the outermost level. I think you already replied that the inner levels will also be in |
measure