- Create a representation in Polly to disambiguate size based and stride based representations of arrays.
- I have never taught Codegen about strided arrays, so that is something we would need to do as well.
- Backport code to detect chapel/Fortran style indexing. TODO: Codegen
- There is code in IslExprBuilder which is responsible for actually lowering the multidimensional index expression. This needs to be backported. This will be the next commit.
- Attached a test case, which is causing when the false invalidate line is removed out in buildAccessPollyAbstractIndex()i
- Teaching PPCG codegen about our indexing function
- Fixed the memory allocation and index expression creation issue by modifying ShapeInfo class to take in only dimension sizes from Chapel Frontend and creating instruction in LLVM-IR pertaining to computation of block values from these dimension sizes
- Modified getArrayOffset() to create proper extents for Chapel Arrays
Details
- Reviewers
bollu Meinersbur philip.pfaffe grosser
Diff Detail
Event Timeline
Next Steps:
- There was a need to introduce a new kind of value map for this to work, a map from const SCEV * -> Value *. I hugely dislike this design, and would love to change this.
- Ideally the ScopArrayInfo should only save pw_aff. However, since we _do_ allow nonaffine indexing, this is not very sane. We need some point in the design space which is not so painful to work with.
- We need to test the stride code path from the chapel source code.
I raised this concern already on Siddharth's version, but if this will succeed that I'll need to repeat it hear: ShapeInfo is pretty spaghetti, this should be fundamentally redesigned.
Why do we need to differentiate between strided and sized arrays? When does it matter? Physically these arrays are identical, so when are they treated differently?
Updated this patch according to Chapel's intrinsic conventions exclusively, thus it needs to be improved upon.
Only the ScopBuilder and ScopHelper are affected by this update.
lib/Analysis/ScopBuilder.cpp | ||
---|---|---|
752 | Yeah, this looks like dead code that was left out. If you remove the line, then what happens? Regarding Invalidate ScopYou mentioned that you get this error:
So, the reasons for invalidation are listed in this enum cpp enum AssumptionKind { ALIASING, INBOUNDS, WRAPPING, UNSIGNED, PROFITABLE, ERRORBLOCK, COMPLEXITY, INFINITELOOP, INVARIANTLOAD, DELINEARIZATION, }; this is in include/polly/ScopInfo.h:98 Since DELINEARIZATION has enum value 10, it means that it someone called invalidateScop(..., DELINERALIZATION, ....) Which is the line you are pointing at. Removing this line should solve the problem. |
lib/Analysis/ScopBuilder.cpp | ||
---|---|---|
752 | Well, removing this line is causing an assertion failure in isl_map.c. |
lib/Analysis/ScopBuilder.cpp | ||
---|---|---|
752 | That is interesting. Could you hunt down the assertion in isl_map.c?? That line is definitely wrong, because it just throws away our delineralisation. |
lib/Analysis/ScopBuilder.cpp | ||
---|---|---|
752 | The assertion is failing in for (unsigned i = 0; i < DimsMissing; i++) Map = Map.fix_si(isl::dim::out, i, 0); in updateDimensionality of ScopInfo.cpp. |
lib/Analysis/ScopBuilder.cpp | ||
---|---|---|
752 | Could you please create a minimal test case where this occurs? |
lib/Analysis/ScopBuilder.cpp | ||
---|---|---|
757 | @cs15btech11044, Here is where we add an array access from the Subscripts that we have derived. If we derive Subscripts incorrectly, we will record an incorrect array shape (and possibly an incorrect array index). Here is where the "link" between our "intrinsic" and Polly's array modelling happens. |
Updates made in this patch:
- There were some places which were strictly using size representation for their purpose. These failed whenever we relied on stride based representation. Some changes were made to ShapeInfo to allow flexible usage of Sizes or Shapes vector depending on their availability.
Test case is not yet updated. I will update it in the next diff update.
- Added the changes related to IslNodeBuilder and IslExprBuilder which were using ShapeInfo related information
- In PPCGCodeGeneration, handling of polly_array_index() has been taken care of.
- An extra test case has been added to the GPGPU/ demonstrating the crash related to improper context generation in the SCoP.
Changes made to this diff
- Fixed the complementing memory allocation and index expression problems by making some design changes in ShapeInfo class. The main reason for this change is that the block values which chapel uses aren't exactly the array sizes per dimension. So there arises a need to include the parameters to ShapeInfo which corresponds to actual array sizes.
Optimized the previous patch by passing only dimension Sizes and injecting LLVM-IR related to computation of block values from Dimension Sizes
As mentioned in the last phone call, I think we should not use 'Stride' as an alternative to row-major indexing. The primary reason is that there are no unique coordinates for a single memory location which means we cannot accurately compute dependencies. Indeed, the delinearization stuff is all about ensuring that there is no unpredictable aliasing.
If offsets are required, this can be added as padding between rows. The default row-major address computation for a 4-dimensional tensor of size n1×n2×n3×n4 at index (i1,i2,i3,i4) is:
address = base + ((i1*n2 + i2)*n3 + i3)*n4 + i4 = i1*n2*n3*n4 + i2*n3*n4 + i3*n4 + i4*1 ^~~~~~~^ ^~~~^ ^^ ^ These coefficients are what I interpret as 'strides'
(note that if the strides are precomputed, both representations have the same number of operations)
or
address = base + (((i1*n2 + i2)*n3 + i3)*n4 + i4)*sizeof(*address)
when we compute in byte units.
With padding of p1,p2,p3,p4 bytes between elements (i.e. p4 is the number of bytes we insert after every element) we get an expression such as
address = base + (sizeof(*address) + p4)*i4; ^~~~~~~~~~~~~~~~~~~~~~^ size per element incl padding
in one dimension.
address = base + ((sizeof(*address) + p4)*n4 + p3)*i3 + (sizeof(*address) + p4)*i4; ^~~~~~~~~~~~~~~~~~~~~~~~~~^ ^~~~~~~~~~~~~~~~~~~~~~~~~^ size of lower-dim line Index in that line
in two dimensions.
address = base + (((sizeof(*address) + p4)*n4 + p3)*n3 + p2)*i2 + ((sizeof(*address) + p4)*n4 + p3)*i3 + (sizeof(*address) + p4)*i4;
in three dimensions.
address = base + ((((p4 + sizeof(*address))*n4 + p3)*n3 + p2)*n2 + p1)*i1 + (((p4 + sizeof(*address))*n4 + p3)*n3 + p2)*i2 + ((p4 + sizeof(*address))*n4 + p3)*i3 + (p4 + sizeof(*address))*i4;
in four dimensions. Note that p1 unused (it would be added just once at the end of the tensor; it could be used if we add padding before each element/line/row/column in which case it would represent a constant offset of the first element to the base pointer).
Padding/dimension sizes should be known by higher level languages. If we only have the strides were are back to the delinerization problem.
include/polly/CodeGen/IslExprBuilder.h | ||
---|---|---|
279–280 | Please use doxygen comments for member documentation (triple slashes ////) |
I am following this suggestion. However, I would like to clarify something.
Suppose we are considering a 3-dimensional array named arr (Consider C language for now).
Does the offset/padding solution suggested in the previous comment model the situation where each dimension of the array has its range from 0 to some dimension size value and while accessing the array, we use some arithmetic along with the respective dimension index ( arr[i+1][2*j-1][3*k] for instance)?
Also could you please elaborate on how would the byte-based indexing be more beneficial than the index-based approach?
In that case i1/i2/i3 would be those expressions (i1=i+1,i2=2*j-1,'i3=3*k'). What matters is that behavior is only defined if each subscript evaluates to something within the size (0<=i1<n1) for its dimension.
Also could you please elaborate on how would the byte-based indexing be more beneficial than the index-based approach?
It would allow padding that is not a multiple of the element size. Take for instance the 24bit BMP image format. Each pixel takes 3 bytes, but each line must start at 4-byte boundaries. E.g. use a 3x5 image (RGB=color value, '.'=padding):
RGBRGBRGBRGBRGB. RGBRGBRGBRGBRGB. RGBRGBRGBRGBRGB
Each line has 5*3=15 bytes. To start the next line at a 4-byte boundary, a single byte of padding is added (so each line effectively takes 16 bytes; called getTypeAllocSize in LLVM).
On top of my generic concern regarding the unclean implementation of the ShapeInfo itself, I left you a bunch of inline comments, mostly concerning style.
include/polly/CodeGen/IslExprBuilder.h | ||
---|---|---|
97 | Why is this a MapVector instead of a DenseMap? | |
include/polly/ScopInfo.h | ||
291 | Make this an inline friend? | |
675 | This and the comment should be removed. | |
include/polly/Support/ScopHelper.h | ||
486 | StringRef? | |
lib/Analysis/ScopBuilder.cpp | ||
686 | What's the point of this check? | |
700 | Pointless IILE; | |
749 | What's the point of these asserts? | |
1606 | Why {nullptr}? | |
lib/Analysis/ScopDetection.cpp | ||
546 | Why count()? | |
574 | Superfluous braces. | |
lib/Analysis/ScopInfo.cpp | ||
1074–1075 | Is the nullptr intentional? | |
4011 | Should be static. | |
4029 | There is no need to make this an IILE. | |
4053 | Unrelated. | |
4085 | This should really be an error. | |
lib/CodeGen/IslExprBuilder.cpp | ||
291–292 | This entire thing should be broken up into at least three functions. | |
lib/CodeGen/IslNodeBuilder.cpp | ||
1216 | Prefer using the c++ API. | |
lib/CodeGen/PPCGCodeGeneration.cpp | ||
146 | I really don't like IILE patterns. It doesn't even save you structure here. | |
808 | Unrelated. | |
1106 | Unrelated. | |
1113 | Unrelated. | |
1210 | Unrelated. | |
1212 | Unrelated. | |
1466 | Should be static. | |
1796 | This seems unsound. | |
1801 | Why 4? This needs documentation. | |
2041 | What does this code even do? | |
3579 | This seems like an unsound or at least unrelated change. | |
lib/Support/ScopHelper.cpp | ||
708 | You can just say None. | |
723 | See above. | |
726 | Why would the operand be null? | |
731 | See above. | |
733 | Why do you count() instead of equality compare? | |
738 | Just return the make_pair. |
lib/CodeGen/PPCGCodeGeneration.cpp | ||
---|---|---|
3579 | Initially, the RTC's were not passing. So we had temporarily overridden the RTC to be always true, in order to check whether the generated kernel function had proper code. |
For Chapel Arrays with custom dimension ranges (like arr[1..4][1..4] for instance), there were two issues
- For generating OffsetValue in the Kernel function for GPU, there wasn't any reference to the value counterpart given to expandCodeFor() in IslExprBuilder.cpp. It is fixed by passing the GlobalMap to the function.
- While testing the code with arrays as above, there was no need to compute getArrayOffset() for Chapel case, since Chapel internally takes care of its indices to make it 0 based. PPCGCodegen is unaware of this fact and still generates offsets which results in out of bound accesses which are originally valid. For now, I have commented it out, but I would love to have a more concrete solution here. Would having an extra flag for Polly which toggles this change would suffice?
Why is this a MapVector instead of a DenseMap?