
[Delinearization] Keep array element byte offset.
Needs ReviewPublic

Authored by Meinersbur on Aug 28 2021, 9:34 PM.

Details

Summary

The offset into the array element cannot simply be discarded. In most cases it will be zero, but like every detected subscript it must be checked: each memory access must be verified to be contained in the byte range of its array element (or it may alias with neighboring array elements). This is because delinearization is just a heuristic; there is no guarantee that the byte offset is a constant, is non-negative, is less than the array element size, or that the memory access is entirely contained within the array element (delinearization does not even know the access size).

Fix by removing the special handling of the least significant dimension in ScalarEvolution::computeAccessFunctions. This makes the returned Subscripts array one element larger than the Sizes array. This is actually what one would expect: one subscript per size, plus one subscript for the division remainder representing the outermost dimension of unknown size.
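Viewed as repeated division by the sizes, the shape of the fix can be sketched in plain C. This is only an illustrative model with fixed dimensions, not the ScalarEvolution implementation; the function and its parameters are made up for this sketch:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model of delinearization by repeated division for
 * Sizes = {n, m, EltSize}: keep the final remainder (the byte offset
 * into the element) as an extra subscript, so Subscripts has one more
 * entry than Sizes. Not the actual LLVM API. */
static void delinearizeByteOffset(size_t Offset, size_t n, size_t m,
                                  size_t EltSize, size_t Subscripts[4]) {
  Subscripts[3] = Offset % EltSize; /* byte offset into the array element */
  Offset /= EltSize;
  Subscripts[2] = Offset % m;
  Offset /= m;
  Subscripts[1] = Offset % n;
  Subscripts[0] = Offset / n; /* outermost dimension of unknown size */
}
```

For a hypothetical float A[...][3][5], the byte offset of &A[2][1][4] is ((2*3+1)*5+4)*4 = 156, and the sketch recovers {2, 1, 4, 0}; an offset of 158 recovers the same subscripts with a byte offset of 2, exactly the component that the old code dropped.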

This bug caused Polly to miscompile blender (526.blender_r from SPEC CPU 2017) in -polly-process-unprofitable mode. The incorrectly delinearized SCEV expression has been reduced into the test case byte_offset.ll. The dropped offset into the array element of size 4 (a float) is ((sext i32 %mul7.i4534 to i64) + {(sext i32 %i1 to i64),+,((sext i32 (1 + ((1 + %shl.i.i) * (1 + %shl.i.i)) + %shl.i.i) to i64) * (sext i32 %i1 to i64))}<%for.body703>). This significant component was simply dropped, and the wrong pointer was computed when regenerating code from the remaining delinearized subscripts.

This occurred during blender's subsurface scattering implementation. As a result, blender's rendering diverged from the reference image.


Diff Detail

Event Timeline

Meinersbur created this revision. Aug 28 2021, 9:34 PM
Meinersbur requested review of this revision. Aug 28 2021, 9:34 PM
Herald added a project: Restricted Project. Aug 28 2021, 9:34 PM

[Polly] Also require offset to be zero.

This makes the returned Subscripts array one element larger than the Sizes array. This is actually what one would expect: one subscript per size, plus one subscript for the division remainder representing the outermost dimension of unknown size.

The Size array includes the element size, so it already makes up for the lack of a value for the outermost dimension.

Instead of treating the element offset as a subscript, could we make computeAccessFunctions return the offset (if non-zero) in a separate output parameter?
Or maybe we should consider delinearization unsuccessful when there is a non-zero remainder after the last real subscript is already computed.

llvm/include/llvm/Analysis/ScalarEvolution.h
1201

address of a memory access -> address of the above memory access

llvm/lib/Analysis/DependenceAnalysis.cpp
3454

I'm worried that having an extra subscript could confuse the analysis. Each separable subscript typically corresponds to a distinct loop level (unless the subscript is ZIV), but if we include the element offset as a "subscript" it would normally not have any corresponding loop levels.

llvm/test/Analysis/Delinearization/byte_offset.ll
10

Could we have the pseudo-C for this IR added here as a comment?

Meinersbur marked 2 inline comments as done.
  • Address review

The Size array includes the element size, so it already makes up for the lack of a value for the outermost dimension.

That's the issue: the element size is contained in the Size array, but its subscript (which ideally would satisfy 0 <= subscript < EltSize) is dropped, unlike the subscript for any other size.

Instead of treating the element offset as a subscript, could we make computeAccessFunctions return the offset (if non-zero) in a separate output parameter?

This would make handling of the element offset more complicated and error-prone, while there is no inherent difference between an array dimension of constant size and the byte offset, as illustrated by this bug and by the fact that DependenceAnalysis's subscript checking also just works for the array element offset.

Or maybe we should consider delinearization unsuccessful when there is a non-zero remainder after the last real subscript is already computed.

That would be a possibility, and computeAccessFunctions already did so when the remainder was a SCEVAddRecExpr, justified by it being "too complicated". But how can it be "too complicated" if it is simply forgotten about?
It would also make it impossible to use delinearization for arrays-of-structs. I would leave the decision of what to support to the caller; at least for Polly, I was considering supporting arrays-of-structs.

llvm/lib/Analysis/DependenceAnalysis.cpp
3454

Changed this to pop the last element and continue if it isZero(), and to bail out otherwise, as already done for Polly and LoopCacheAnalysis. We can check later how DA would have to be changed to support a non-zero offset. At least the DelinearizationChecks below would just work, and I would assume DA works fine as long as the accessed memory falls completely into the element's byte range, which is verified by the delinearization check.

llvm/test/Analysis/Delinearization/byte_offset.ll
10

Not sure how useful you find manually decompiled pseudo-C for IR that comes from llvm-reduce, but here it is.

You can also find the origin here: https://github.com/blender/blender/blob/765b842f9520843183bf0a3cdcd071f152bbbf9e/source/blender/blenkernel/intern/CCGSubSurf.c#L2131

Instead of treating the element offset as a subscript, could we make computeAccessFunctions return the offset (if non-zero) in a separate output parameter?

This would make handling of the element offset more complicated and error-prone, while there is no inherent difference between an array dimension of constant size and the byte offset, as illustrated by this bug and by the fact that DependenceAnalysis's subscript checking also just works for the array element offset.

The way the "element offset" is handled in this patch is to isolate it (from the rest of the subscripts) and ignore it everywhere except in ScopDetection.cpp, where it is still isolated but used to decide whether the access is affine. If it needs to be isolated to be handled, then I don't see how returning it as a separate parameter makes things more complicated. The original goal of the delinearization algorithm is to recover source-level subscripts. While the "element offset" can be thought of as a byte index into an imaginary innermost dimension, it is not something that corresponds to a source-level array subscript.

Or maybe we should consider delinearization unsuccessful when there is a non-zero remainder after the last real subscript is already computed.

That would be a possibility, and computeAccessFunctions already did so when the remainder was a SCEVAddRecExpr, justified by it being "too complicated". But how can it be "too complicated" if it is simply forgotten about?
It would also make it impossible to use delinearization for arrays-of-structs. I would leave the decision of what to support to the caller; at least for Polly, I was considering supporting arrays-of-structs.

Yeah, I'm not sure why it was considered "too complicated" only when the remainder was an AddRec. The paper says that the original polynomial expression (representing the linearized access function) is first divided by the element size, but it doesn't say what it means if there is a remainder or what to do with it. I would assume that an access function that is not evenly divisible by the element size cannot be safely delinearized.

I tried a simple example and it seems that the "element offset" is zero regardless of which member of a structure is being accessed, so not sure if it has any bearing on the ability to delinearize arrays of structs:

> cat delin_struct.c
struct MyS {
  float a, b;
};

void foo(int n, int m, struct MyS f1[][n][m]) {
  for (int i = 0; i < 1024; i++)
    for (int j = 0; j < n; j++)
      for (int k = 0; k < m; k++)
        f1[i][j][k].a = f1[i][j][k].b;
}

opt -passes='print<delinearization>' -disable-output delin_struct.simp.ll 2>&1 | grep ArrayRef
ArrayRef[{0,+,2}<%for.body>][{0,+,2}<%for.body4>][{1,+,2}<%for.body8>][0]
ArrayRef[{0,+,2}<%for.body>][{2,+,2}<%for.body4>][-1][0]
ArrayRef[0][{(-1 + (2 * (zext i32 %n to i64) * (zext i32 %m to i64))),+,(2 * (zext i32 %n to i64) * (zext i32 %m to i64))}<%for.body>][0]
ArrayRef[{0,+,2}<%for.body>][{0,+,2}<%for.body4>][{0,+,2}<%for.body8>][0]
ArrayRef[{0,+,2}<%for.body>][{2,+,2}<%for.body4>][-2][0]
ArrayRef[0][{(-2 + (2 * (zext i32 %n to i64) * (zext i32 %m to i64))),+,(2 * (zext i32 %n to i64) * (zext i32 %m to i64))}<%for.body>][0]

Note that the subscript in the last dimension (assumed byte array) is 0! IR attached:
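The zero subscript in the last dimension follows from the layout: both field offsets in MyS are multiples of the 4-byte element size that delinearization chose, so the remainder after dividing by it vanishes. A standalone check of that arithmetic (the helper function is made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* The two-float struct from the example above. */
struct MyS {
  float a, b;
};

/* Remainder of a field's byte offset modulo the element size used for
 * delinearization; a zero result means the byte-offset subscript is 0
 * and the field index is absorbed into the innermost subscript instead. */
static size_t fieldRemainder(size_t FieldOffset, size_t EltSize) {
  return FieldOffset % EltSize;
}
```

Both members of MyS yield a zero remainder against sizeof(float), which is why the printed ArrayRefs all end in [0].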

llvm/lib/Analysis/DependenceAnalysis.cpp
3453

[nit] the by offset -> the byte offset

llvm/test/Analysis/Delinearization/byte_offset.ll
10

I haven't looked at the original example yet, but this looks like a case where delinearization should be expected to fail. The recovered subscripts above don't look anything like what the source level subscripts are.

polly/lib/Analysis/ScopDetection.cpp
1029

[nit] ElementSize may be a better name than ArrSize?

If it needs to be isolated to be handled, then I don't see how returning it as a separate parameter makes it more complicated.

It is only handled differently because the legacy code does not expect an additional parameter. However, fixing the bug is a higher priority, and the individual passes could be improved to make proper use of the returned subscript at a later point.

The original goal of the delinearization algorithm is to recover source level subscripts.

[citation needed]

If this were true, then any delinearization that does not correspond to the original source subscripts would need to be considered wrong. However, I am pretty sure we also want to delinearize A[i + j*n] even though the source-code subscript has been linearized.

I tried a simple example and it seems that the "element offset" is zero regardless of which member of a structure is being accessed, so not sure if it has any bearing on the ability to delinearize arrays of structs:

It is delinearized with an array element size of sizeof(float), not sizeof(MyS); instead, one dimension is twice as large and the SCEVAddRecExpr starts at 1 instead of 0. This has to do with how ScalarEvolution tries to remove the struct index from the GEP, modeling it as a simple addition.
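That "simple addition" view can be demonstrated in standalone C: the address of a member access is the linearized index scaled by the struct size plus a constant field offset (the helper function and array bounds here are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

struct __attribute__((packed)) Pair {
  int x;
  long y;
};

/* Byte distance of &A[i][j].y from the array base for a Pair A[][N]:
 * the linearized index scaled by the struct size plus the constant
 * offsetof(Pair, y) that SCEV models as a plain addition. */
static size_t byteOffsetOfY(size_t i, size_t j, size_t N) {
  return (i * N + j) * sizeof(struct Pair) + offsetof(struct Pair, y);
}
```

The constant offsetof term is exactly the part that ends up in the byte-offset subscript when the element size does not absorb it.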

Use different elements of different sizes in the struct:

struct __attribute__((packed)) Pair { int x; long y; };
void foo(unsigned long N, struct Pair A[][N]) {
  for (unsigned long i = 0; i < N; i+=1)
      A[i][i].y = 0;
}
opt -delinearize
AccessFunction: {4,+,(12 + (12 * %N))}<%for.cond>
Base offset: %A
ArrayDecl[UnknownSize][%N][8]
ArrayRef[0][{0,+,1}<%for.cond>][{4,+,(4 + (12 * %N))}<%for.cond>]

This is obviously more complicated than it needs to be. Specifically, it still considers the element size to be sizeof(long), because that is the access it sees. If x were accessed instead of y, the element size would be 4. Polly tries to combine the shapes from multiple delinearization results into a common one, such that subscripts are comparable.

Yes, with the current implementation of delinearization, I don't think there are a lot of cases where the byte-offset subscript would improve the dependence analysis. Still, I think the right API, one that allows improving the delinearization, includes the byte offset.

Having derived this example, it is much easier to understand than the regression test I got from llvm-reduce; I could replace the test with it.


Thanks for the example. It does make it much easier to understand the problem. I also have concerns about the current implementation not being able to handle arrays of structures. To understand this better, I derived an example based on your example above:

struct __attribute__((packed)) MyS {
  float a;
  double b;
};

void foo(long long n, long long m, struct MyS f1[][n][m]) {
  for (int i = 0; i < 1024; i++)
    for (int j = 0; j < n; j++)
      for (int k = 0; k < m; k++)
        f1[i][j][k].b = f1[i-1][j][k].b;
}

Ideally we would want delinearization to recover the subscripts for the load to be [i-1][j][k] and the subscripts for the store to be [i][j][k], so that dependence analysis can produce [-1, 0, 0]. With the changes in this patch we get the following delinearizations for the load and store, respectively:

Inst:  %4 = load double, double* %b, align 1, !tbaa !3
In Loop with Header: for.body11
AccessFunction: {{{(4 + (-12 * %n * %m)),+,(12 * %n * %m)}<%for.body>,+,(12 * %m)}<%for.body5>,+,12}<%for.body11>
Base offset: %f1
ArrayDecl[UnknownSize][%n][%m][8]
ArrayRef[0][0][{0,+,1}<nuw><nsw><%for.body11>][{{{(4 + (-12 * %n * %m)),+,(12 * %n * %m)}<%for.body>,+,(12 * %m)}<%for.body5>,+,4}<%for.body11>]
Inst:  store double %4, double* %b22, align 1, !tbaa !3
In Loop with Header: for.body11
AccessFunction: {{{4,+,(12 * %n * %m)}<%for.body>,+,(12 * %m)}<%for.body5>,+,12}<%for.body11>
Base offset: %f1
ArrayDecl[UnknownSize][%n][%m][8]
ArrayRef[0][0][{0,+,1}<nuw><nsw><%for.body11>][{{{4,+,(12 * %n * %m)}<%for.body>,+,(12 * %m)}<%for.body5>,+,4}<%for.body11>]

Even though delinearization claims to be successful, most of the complexity of the original access function has now moved to the innermost dimension, leaving the outer subscripts as zeros. This won't help dependence analysis in any way, as it now has to further analyze the complicated subscript of the synthesized innermost dimension. It's not clear to me how that can be done.

Note that if the element size passed to the delinearize function were chosen to be 12 (the true element size of the array), then delinearization would have been able to recover more meaningful subscripts for the outer dimensions without requiring a "byte offset". I wonder if we can improve the results for arrays of structures by choosing a better element size.

Note that if the element size passed to the delinearize function were chosen to be 12 (the true element size of the array), then delinearization would have been able to recover more meaningful subscripts for the outer dimensions without requiring a "byte offset". I wonder if we can improve the results for arrays of structures by choosing a better element size.

Correction: the access functions in my example are not evenly divisible by 12, so choosing the "true element size" doesn't fix it :(

...but since delinearization is a heuristic, maybe we can heuristically pick something that does divide evenly (e.g. a GCD, or just choosing 1). If we choose 1, DA is able to produce the expected result (with -da-disable-delinearization-checks):

ArrayDecl[UnknownSize][%n][%m][1]
ArrayRef[{-12,+,12}<%for.body>][{0,+,12}<%for.body5>][{4,+,12}<%for.body11>][0]
...
ArrayRef[{0,+,12}<%for.body>][{0,+,12}<%for.body5>][{4,+,12}<%for.body11>][0]
Src:  %4 = load double, double* %b, align 1, !tbaa !3 --> Dst:  store double %4, double* %b22, align 1, !tbaa !3
  da analyze - consistent anti [-1 0 0]!
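The GCD idea above can be sketched as a small helper: take the GCD of the access size and all constant strides, so the chosen element size always divides the access function evenly, falling back to 1. This is a hypothetical heuristic from the discussion, not existing LLVM code, and all names are made up:

```c
#include <assert.h>
#include <stddef.h>

static size_t gcdSize(size_t a, size_t b) {
  while (b != 0) {
    size_t t = a % b;
    a = b;
    b = t;
  }
  return a;
}

/* Pick an element size that evenly divides the access size and every
 * constant stride; returns 1 when nothing larger divides them all. */
static size_t pickElementSize(const size_t *Strides, size_t NumStrides,
                              size_t AccessSize) {
  size_t G = AccessSize;
  for (size_t i = 0; i < NumStrides; i++)
    G = gcdSize(G, Strides[i]);
  return G ? G : 1;
}
```

For the example above (a 12-byte struct accessed with an 8-byte double), such a heuristic would pick 4 rather than 1, keeping a coarser byte dimension than an elementwise size of 1.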
Meinersbur added a comment. Edited Sep 9 2021, 10:53 AM

> I also have concerns about the current implementation not being able to handle arrays of structures.

D109527

maybe we can heuristically pick something that does divide it evenly (eg a GCD or just choosing 1). If we choose 1, DA is able to produce expected result (with -da-disable-delinearization-checks)

A dimension of size 1 can be omitted, since it is only valid with a subscript of 0 and multiplies by 1. What makes delinearization useful is that it allows assuming that memory accesses only alias if all subscripts are equal (and the arrays themselves do not overlap). For this to work, the subscripts must be between 0 and the dimension size. -da-disable-delinearization-checks skips this check and therefore may cause miscompilation.
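The range requirement stated here is easy to express as code; a sketch of the check with illustrative names, not the Polly or DependenceAnalysis implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* "Different subscripts imply no alias" is only justified when every
 * delinearized subscript lies within its dimension: 0 <= subscript < size.
 * -da-disable-delinearization-checks skips exactly this kind of check. */
static bool subscriptsInRange(const long *Subscripts, const size_t *Sizes,
                              size_t NumDims) {
  for (size_t i = 0; i < NumDims; i++)
    if (Subscripts[i] < 0 || (size_t)Subscripts[i] >= Sizes[i])
      return false;
  return true;
}
```

Any out-of-range subscript (negative, or at least the dimension size) means the access may wrap into a neighboring element, so the delinearization result cannot be trusted.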

> I also have concerns about the current implementation not being able to handle arrays of structures.

D109527

Thanks for that. It looks like using the "true element size" can significantly simplify the remainder expression for the added byte dimension. We may be converging on a solution here.

maybe we can heuristically pick something that does divide it evenly (eg a GCD or just choosing 1). If we choose 1, DA is able to produce expected result (with -da-disable-delinearization-checks)

A dimension of size 1 can be omitted, since it is only valid with a subscript of 0 and multiplies by 1. What makes delinearization useful is that it allows assuming that memory accesses only alias if all subscripts are equal (and the arrays themselves do not overlap). For this to work, the subscripts must be between 0 and the dimension size. -da-disable-delinearization-checks skips this check and therefore may cause miscompilation.

Rather than viewing it as an extra dimension, I simply view it as the size of the structure member. From the point of view of dependence analysis (after aliasing properties are computed), the dependence between A[i][j][k].b and A[i-1][j][k].b should yield the same result regardless of the size of b (be it 1 byte or otherwise). I used -da-disable-delinearization-checks to illustrate the idea, but it's possible that using this scheme requires different validity checks.

bmahjour added inline comments. Sep 9 2021, 3:45 PM
llvm/lib/Analysis/DependenceAnalysis.cpp
3458–3467

I think we should care more about the difference between the byte offsets of the source and destination than about their actual values. If the byte offsets are equal, then the rest of the subscripts can be analyzed. If they are different, then we don't know how to handle them yet.

Suggestion: replace it with the following and move it after the check that the subscript arrays are equal in size and contain at least 2 elements each (line 3457 of the base).

const SCEV *EltOffsetSrc = SrcSubscripts.pop_back_val();
const SCEV *EltOffsetDst = DstSubscripts.pop_back_val();
if (EltOffsetSrc != EltOffsetDst)
  return false;