In addition to controlling delinearization validity checks, the -da-disable-delinearization-checks option also controls whether fixed-size array delinearization happens at all. This patch renames the option to make it more meaningful and transparent about the assumption it implies.
As suggested in D73995, subscripts derived from GEPs might be valid with inrange modifiers, or when we can verify that the value range always falls into the dimension's range (e.g. with ScalarEvolution::getRangeRef).
I am not convinced the new name is better: "assume inrange subscripts" reads to me as if the subscripts are unambiguous (e.g. taken from the source code) and can be reasoned about, while they are derived heuristically, at least in the delinearization case. Alternative suggestions: "-da-allow-unsafe-derived(-multidimensional)-subscripts", "-da-disable-derived(-multidimensional)-index-range-checks", "-da-assume-nonoverlapping-of-heuristically-derived-multidimensional-array-structures" (or combinations thereof).
I'm not particular about the name, but I just don't fully understand the distinction being made here. Based on previous discussion on this (http://lists.llvm.org/pipermail/llvm-dev/2019-May/132372.html), the problem seems to be more related to the language specification than to the delinearization algorithm itself. In other words, it appears that the only reason we need these range checks is that (subject to some people's interpretation) the C language allows array indexes to overlap. If that's true, then the fact that our subscripts are derived heuristically is irrelevant, because had the user guaranteed that the indexes don't overlap, we wouldn't need to check the range of our derived subscripts.
Are you suggesting that, even if the indexes are in-range at the source-code level, the delinearization can come up with subscripts that are out-of-range? If that's the case I'd be very interested in an example where that happens.
Consider the example C source:
float A[n*n + n];
for (int i = 0; i < 2*n; i += 1)
  for (int j = 0; j < n; j += 1)
    A[j*n + i] = ...;
Delinearization might convert the access to A[j][i] for an array of size A[n]. However, in this example, elements accessed by the j-loop overlap with elements accessed by the (j+1)-loop. Assuming that A[j] and A[j+1] do not overlap ("assume-inrange") would be illegal, but from the source code's point of view, there is not even an inner array dimension that could be out-of-range. Since the loop bounds are derived heuristically, the user doesn't even know which indexes they are agreeing do not overlap (it could also be A[j/2][n*(j%2) + i] for an array of size A[2*n]).
I think A should be float A[n*n+2*n] for it to even be an in-bounds access, but I see what the concern is now. That is the most useful example I've seen for describing the issue. Thank you very much!
Mmmh maybe, haven't thought too much about it beforehand. Let's see.
The last element accessed is

  i = 2*n - 1, j = n - 1

which accesses element

  A[(n-1)*n + (2*n-1)] = A[n*n - n + 2*n - 1] = A[n*n + n - 1]

that is, the array needs to be at least A[n*n + n].
My bad. You are right!
Regarding the suggested names "-da-allow-unsafe-derived(-multidimensional)-subscripts", "-da-disable-derived(-multidimensional)-index-range-checks", and "-da-assume-nonoverlapping-of-heuristically-derived-multidimensional-array-structures": is there one you prefer over the existing -da-disable-delinearization-checks? If not, I can abandon this revision.
Not sure. If we count "derive multidimensional subscripts from GEP without inrange" as delinearization (GEP without inrange/inbounds is just syntactic sugar), then I think the current -da-disable-delinearization-checks is fine.