This is an archive of the discontinued LLVM Phabricator instance.

[Polly] [WIP] Introduce ShapeInfo into polly for sizes and strides.
Needs ReviewPublic

Authored by cs15btech11044 on Jul 6 2018, 7:42 AM.

Details

Summary
  • Create a representation in Polly to disambiguate size based and stride based representations of arrays.
  • I have never taught Codegen about strided arrays, so that is something we would need to do as well.
  • Backport code to detect chapel/Fortran style indexing. TODO: Codegen
  • There is code in IslExprBuilder which is responsible for actually lowering the multidimensional index expression. This needs to be backported. This will be the next commit.
  • Attached a test case, which is causing when the false invalidate line is removed out in buildAccessPollyAbstractIndex()i
  • Teaching PPCG codegen about our indexing function
  • Fixed the memory allocation and index expression creation issue by modifying ShapeInfo class to take in only dimension sizes from Chapel Frontend and creating instruction in LLVM-IR pertaining to computation of block values from these dimension sizes
    • Modified getArrayOffset() to create proper extents for Chapel Arrays

Diff Detail

Event Timeline

cs15btech11044 created this revision.Jul 6 2018, 7:42 AM

Next Steps:

  • There was a need to introduce a new kind of value map for this to work, a map from const SCEV * -> Value *. I hugely dislike this design, and would love to change this.
  • Ideally the ScopArrayInfo should only save pw_aff. However, since we _do_ allow nonaffine indexing, this is not very sane. We need some point in the design space which is not so painful to work with.
  • We need to test the stride code path from the chapel source code.

I raised this concern already on Siddharth's version, but if this will succeed that I'll need to repeat it hear: ShapeInfo is pretty spaghetti, this should be fundamentally redesigned.

Why do we need to differentiate between strided and sized arrays? When does it matter? Physically these arrays are identical, so when are they treated differently?

Updated this patch according to Chapel's intrinsic conventions exclusively, thus it needs to be improved upon.
Only the ScopBuilder and ScopHelper are affected by this update.

cs15btech11044 added inline comments.Jul 8 2018, 3:15 AM
lib/Analysis/ScopBuilder.cpp
752

@bollu. I am talking about this line here

bollu added inline comments.Jul 8 2018, 3:22 AM
lib/Analysis/ScopBuilder.cpp
752

Yeah, this looks like dead code that was left out. If you remove the line, then what happens?

Regarding Invalidate Scop

You mentioned that you get this error:

Invalidate SCoP because of reason 9

So, the reasons for invalidation are listed in this enum

cpp
enum AssumptionKind {
  ALIASING,
  INBOUNDS,
  WRAPPING,
  UNSIGNED,
  PROFITABLE,
  ERRORBLOCK,
  COMPLEXITY,
  INFINITELOOP,
  INVARIANTLOAD,
  DELINEARIZATION,
};

this is in include/polly/ScopInfo.h:98

Since DELINEARIZATION has enum value 10, it means that it someone called

invalidateScop(..., DELINERALIZATION, ....)

Which is the line you are pointing at. Removing this line should solve the problem.

cs15btech11044 added inline comments.Jul 8 2018, 3:39 AM
lib/Analysis/ScopBuilder.cpp
752

Well, removing this line is causing an assertion failure in isl_map.c.
Whereas keeping it is still showing the SCoP and telling that it is invalidated.

bollu added inline comments.Jul 8 2018, 3:56 AM
lib/Analysis/ScopBuilder.cpp
752

That is interesting. Could you hunt down the assertion in isl_map.c?? That line is definitely wrong, because it just throws away our delineralisation.

cs15btech11044 added inline comments.Jul 8 2018, 4:21 AM
lib/Analysis/ScopBuilder.cpp
752

The assertion is failing in

for (unsigned i = 0; i < DimsMissing; i++)
    Map = Map.fix_si(isl::dim::out, i, 0);

in updateDimensionality of ScopInfo.cpp.
Assertion "pos < isl_map_dim(map, type)" has failed in isl_map.c

bollu added inline comments.Jul 8 2018, 4:30 AM
lib/Analysis/ScopBuilder.cpp
752

Could you please create a minimal test case where this occurs?

cs15btech11044 set the repository for this revision to rPLO Polly.

Added the minimal test case, which is causing isl_map assertion failure.

bollu added inline comments.Jul 9 2018, 8:34 AM
lib/Analysis/ScopBuilder.cpp
757

@cs15btech11044, Here is where we add an array access from the Subscripts that we have derived.

If we derive Subscripts incorrectly, we will record an incorrect array shape (and possibly an incorrect array index).

Here is where the "link" between our "intrinsic" and Polly's array modelling happens.

Updates made in this patch:

  • There were some places which were strictly using size representation for their purpose. These failed whenever we relied on stride based representation. Some changes were made to ShapeInfo to allow flexible usage of Sizes or Shapes vector depending on their availability.

Test case is not yet updated. I will update it in the next diff update.

Updated the test case for reference.

cs15btech11044 set the repository for this revision to rPLO Polly.
  • Added the changes related to IslNodeBuilder and IslExprBuilder which were using ShapeInfo related information
  • In PPCGCodeGeneration, handling of polly_array_index() has been taken care of.
  • An extra test case has been added to the GPGPU/ demonstrating the crash related to improper context generation in the SCoP.

Modified getFortranArrayIds() to check over all dimensions of the array.

cs15btech11044 edited the summary of this revision. (Show Details)
cs15btech11044 set the repository for this revision to rPLO Polly.

Changes made to this diff

  • Fixed the complementing memory allocation and index expression problems by making some design changes in ShapeInfo class. The main reason for this change is that the block values which chapel uses aren't exactly the array sizes per dimension. So there arises a need to include the parameters to ShapeInfo which corresponds to actual array sizes.
cs15btech11044 edited the summary of this revision. (Show Details)

Optimized the previous patch by passing only dimension Sizes and injecting LLVM-IR related to computation of block values from Dimension Sizes

As mentioned in the last phone call, I think we should not use 'Stride' as an alternative to row-major indexing. The primary reason is that there are no unique coordinates for a single memory location which means we cannot accurately compute dependencies. Indeed, the delinearization stuff is all about ensuring that there is no unpredictable aliasing.

If offsets are required, this can be added as padding between rows. The default row-major address computation for a 4-dimensional tensor of size n1×n2×n3×n4 at index (i1,i2,i3,i4) is:

address = base + ((i1*n2 + i2)*n3 + i3)*n4 + i4 = i1*n2*n3*n4 + i2*n3*n4 + i3*n4 + i4*1
                                                     ^~~~~~~^      ^~~~^      ^^      ^ 
                                                     These coefficients are what I interpret as 'strides'

(note that if the strides are precomputed, both representations have the same number of operations)
or

address = base + (((i1*n2 + i2)*n3 + i3)*n4 + i4)*sizeof(*address)

when we compute in byte units.

With padding of p1,p2,p3,p4 bytes between elements (i.e. p4 is the number of bytes we insert after every element) we get an expression such as

address = base + (sizeof(*address) + p4)*i4;
                 ^~~~~~~~~~~~~~~~~~~~~~^
                 size per element incl padding

in one dimension.

address = base + ((sizeof(*address) + p4)*n4 + p3)*i3 + (sizeof(*address) + p4)*i4;
                 ^~~~~~~~~~~~~~~~~~~~~~~~~~^            ^~~~~~~~~~~~~~~~~~~~~~~~~^
                 size of lower-dim line                 Index in that line

in two dimensions.

address = base + (((sizeof(*address) + p4)*n4 + p3)*n3 + p2)*i2 + ((sizeof(*address) + p4)*n4 + p3)*i3 + (sizeof(*address) + p4)*i4;

in three dimensions.

address = base + ((((p4 + sizeof(*address))*n4 + p3)*n3 + p2)*n2 + p1)*i1 + (((p4 + sizeof(*address))*n4 + p3)*n3 + p2)*i2 + ((p4 + sizeof(*address))*n4 + p3)*i3 + (p4 + sizeof(*address))*i4;

in four dimensions. Note that p1 unused (it would be added just once at the end of the tensor; it could be used if we add padding before each element/line/row/column in which case it would represent a constant offset of the first element to the base pointer).

Padding/dimension sizes should be known by higher level languages. If we only have the strides were are back to the delinerization problem.

include/polly/CodeGen/IslExprBuilder.h
280–281 ↗(On Diff #157873)

Please use doxygen comments for member documentation (triple slashes ////)

I am following this suggestion. However, I would like to clarify something.
Suppose we are considering a 3-dimensional array named arr (Consider C language for now).
Does the offset/padding solution suggested in the previous comment model the situation where each dimension of the array has its range from 0 to some dimension size value and while accessing the array, we use some arithmetic along with the respective dimension index ( arr[i+1][2*j-1][3*k] for instance)?

Also could you please elaborate on how would the byte-based indexing be more beneficial than the index-based approach?

Suppose we are considering a 3-dimensional array named arr (Consider C language for now).
Does the offset/padding solution suggested in the previous comment model the situation where each dimension of the array has its range from 0 to some dimension size value and while accessing the array, we use some arithmetic along with the respective dimension index ( arr[i+1][2*j-1][3*k] for instance)?

In that case i1/i2/i3 would be those expressions (i1=i+1,i2=2*j-1,'i3=3*k'). What matters is that behavior is only defined if each subscript evaluates to something within the size (0<=i1<n1) for its dimension.

Also could you please elaborate on how would the byte-based indexing be more beneficial than the index-based approach?

It would allow padding that is not a multiple of the element size. Take for instance the 24bit BMP image format. Each pixel takes 3 bytes, but each line must start at 4-byte boundaries. E.g. use a 3x5 image (RGB=color value, '.'=padding):

RGBRGBRGBRGBRGB.
RGBRGBRGBRGBRGB.
RGBRGBRGBRGBRGB

Each line has 5*3=15 bytes. To start the next line at a 4-byte boundary, a single byte of padding is added (so each line effectively takes 16 bytes; called getTypeAllocSize in LLVM).

On top of my generic concern regarding the unclean implementation of the ShapeInfo itself, I left you a bunch of inline comments, mostly concerning style.

include/polly/CodeGen/IslExprBuilder.h
97 ↗(On Diff #157873)

Why is this a MapVector instead of a DenseMap?

include/polly/ScopInfo.h
291

Make this an inline friend?

667

This and the comment should be removed.

include/polly/Support/ScopHelper.h
486

StringRef?

lib/Analysis/ScopBuilder.cpp
686

What's the point of this check?

700

Pointless IILE;

749

What's the point of these asserts?

1606

Why {nullptr}?

lib/Analysis/ScopDetection.cpp
546

Why count()?

574

Superfluous braces.

lib/Analysis/ScopInfo.cpp
1069–1070

Is the nullptr intentional?

4006

Should be static.

4024

There is no need to make this an IILE.

4048

Unrelated.

4080

This should really be an error.

lib/CodeGen/IslExprBuilder.cpp
292 ↗(On Diff #157873)

This entire thing should be broken up into at least three functions.

lib/CodeGen/IslNodeBuilder.cpp
1216 ↗(On Diff #157873)

Prefer using the c++ API.

lib/CodeGen/PPCGCodeGeneration.cpp
145

I really don't like IILE patterns. It doesn't even save you structure here.

807

Unrelated.

1105

Unrelated.

1112

Unrelated.

1209

Unrelated.

1211

Unrelated.

1465

Should be static.

1777

This seems unsound.

1782

Why 4? This needs documentation.

2022

What does this code even do?

3560

This seems like an unsound or at least unrelated change.

lib/Support/ScopHelper.cpp
708

You can just say None.

723

See above.

726

Why would the operand be null?

731

See above.

733

Why do you count() instead of equality compare?

738

Just return the make_pair.

cs15btech11044 marked 9 inline comments as done.Aug 4 2018, 2:55 AM
cs15btech11044 added inline comments.
lib/CodeGen/PPCGCodeGeneration.cpp
3560

Initially, the RTC's were not passing. So we had temporarily overridden the RTC to be always true, in order to check whether the generated kernel function had proper code.

cs15btech11044 edited the summary of this revision. (Show Details)
cs15btech11044 set the repository for this revision to rPLO Polly.

For Chapel Arrays with custom dimension ranges (like arr[1..4][1..4] for instance), there were two issues

  • For generating OffsetValue in the Kernel function for GPU, there wasn't any reference to the value counterpart given to expandCodeFor() in IslExprBuilder.cpp. It is fixed by passing the GlobalMap to the function.
  • While testing the code with arrays as above, there was no need to compute getArrayOffset() for Chapel case, since Chapel internally takes care of its indices to make it 0 based. PPCGCodegen is unaware of this fact and still generates offsets which results in out of bound accesses which are originally valid. For now, I have commented it out, but I would love to have a more concrete solution here. Would having an extra flag for Polly which toggles this change would suffice?
cs15btech11044 edited the summary of this revision. (Show Details)

Addressed @philip.pfaffe 's concern regarding disabling getArrayOffset().