Page MenuHomePhabricator

[Polly][WIP] Use SCEV information for the second level aliasing
ClosedPublic

Authored by gareevroman on Jul 22 2017, 12:25 AM.

Details

Summary

We introduce another level of alias metadata to distinguish the individual non-aliasing accesses that have inter iteration alias-free base pointers marked with "Inter iteration alias-free" mark nodes. To distinguish two accesses, the comparison of raw pointers representing base pointers is used.

In case of, for example, ublas's prod function that implements GEMM, and DeLiCM we can get accesses to same location represented by different raw pointers. Consequently, we create different alias sets that can prevent accesses from, for example, being sinked or hoisted.

To avoid the issue, we compare the corresponding SCEV information instead of the corresponding raw pointers.

Diff Detail

Event Timeline

gareevroman created this revision.Jul 22 2017, 12:25 AM
grosser edited edge metadata.Jul 31 2017, 6:58 AM

Hi Roman,

thanks for the patch. I looked into it a couple of times, but need to have another deeper look. Two high-level questions already:

  1. Why do we need to use SCEV? Should we not be able to tell form our information which base pointers are expected to be identical?
  2. What exactly must be checked in the new test case here. What was generated before, why was this wrong, and what does the new code generate.

Also, I may have missed this earlier. Why are the number of elements in the check lines growing so much. Is this expected?

CHECK-NEXT: !43 = !{!3, !4, !0, !6, !7, !11, !14, !16, !18, !20, !22, !24, !26, !28, !30, !32, !34, !36, !38, !40}

grosser requested changes to this revision.Aug 3 2017, 11:54 PM

Marking this as requesting changes, to move this out of the review queue for now.

Roman, it would be great if you could answer my questions before i look deeper.

This revision now requires changes to proceed.Aug 3 2017, 11:54 PM
gareevroman edited edge metadata.
  1. Why do we need to use SCEV? Should we not be able to tell form our information which base pointers are expected to be identical?

I think we should be able to tell that. However, it seems we don't do it.

The test case swaps the following access relations (the original JSCoP can be found in the new version of the patch):

{

"kind" : "read",
"relation" : "{ Stmt_for_body6[i0, i1, i2] -> MemRef_C1[0] }"

}

{

"kind" : "read",
"relation" : "{ Stmt_for_body6[i0, i1, i2] -> MemRef_C[i0, i1] }"

}

Subsequently, for some reason, Polly generates different accesses to elements of the matrix C. For example, in case of C[3][7], Polly generates the following code:

  %459 = mul nsw i64 96, %polly.indvar98
  %460 = mul nsw i64 4, %polly.indvar134
  %461 = add nsw i64 %459, %460
  %462 = add nsw i64 %461, 3
  %463 = mul nsw i64 8, %polly.indvar128
  %464 = add nsw i64 %463, 7
  %465 = mul nsw i64 256, %polly.indvar
  %466 = add nsw i64 %465, %polly.indvar140
  br label %polly.stmt.for.body6940

polly.stmt.for.body6940:                          ; preds = %polly.stmt.for.body6914
  %polly.access.cast.C941 = bitcast [1024 x double]* %C to double*
  %467 = mul nsw i64 96, %polly.indvar98
  %468 = mul nsw i64 4, %polly.indvar134
  %469 = add nsw i64 %467, %468
  %470 = add nsw i64 %469, 3
  %polly.access.mul.C942 = mul nsw i64 %470, 1024
  %471 = mul nsw i64 8, %polly.indvar128
  %472 = add nsw i64 %471, 7
  %polly.access.add.C943 = add nsw i64 %polly.access.mul.C942, %472
  %polly.access.C944 = getelementptr double, double* %polly.access.cast.C941, i64 %polly.access.add.C943
  %tmp_p_scalar_945 = load double, double* %polly.access.C944, align 8, !alias.scope !136, !noalias !137
  %polly.access.cast.Packed_A946 = bitcast [24 x [256 x [4 x double]]]* %Packed_A to double*
  %polly.access.mul.Packed_A947 = mul nsw i64 %polly.indvar134, 256
  %polly.access.add.Packed_A948 = add nsw i64 %polly.access.mul.Packed_A947, %polly.indvar140
  %polly.access.mul.Packed_A949 = mul nsw i64 %polly.access.add.Packed_A948, 4
  %polly.access.add.Packed_A950 = add nsw i64 %polly.access.mul.Packed_A949, 3
  %polly.access.Packed_A951 = getelementptr double, double* %polly.access.cast.Packed_A946, i64 %polly.access.add.Packed_A950
  %tmp1_p_scalar_952 = load double, double* %polly.access.Packed_A951, align 8, !alias.scope !7, !noalias !10
  %polly.access.cast.Packed_B953 = bitcast [256 x [256 x [8 x double]]]* %Packed_B to double*
  %polly.access.mul.Packed_B954 = mul nsw i64 %polly.indvar128, 256
  %polly.access.add.Packed_B955 = add nsw i64 %polly.access.mul.Packed_B954, %polly.indvar140
  %polly.access.mul.Packed_B956 = mul nsw i64 %polly.access.add.Packed_B955, 8
  %polly.access.add.Packed_B957 = add nsw i64 %polly.access.mul.Packed_B956, 7
  %polly.access.Packed_B958 = getelementptr double, double* %polly.access.cast.Packed_B953, i64 %polly.access.add.Packed_B957
  %tmp2_p_scalar_959 = load double, double* %polly.access.Packed_B958, align 8, !alias.scope !6, !noalias !8
  %p_mul960 = fmul double %tmp1_p_scalar_952, %tmp2_p_scalar_959
  %p_add961 = fadd double %tmp_p_scalar_945, %p_mul960
  %polly.access.C1962 = getelementptr double, double* %C1, i64 0
  %tmp3_p_scalar_963 = load double, double* %polly.access.C1962, align 8, !alias.scope !3, !noalias !13
  %p_add18964 = fadd double %tmp3_p_scalar_963, %p_add961
  %scevgep965 = getelementptr [1024 x double], [1024 x double]* %C, i64 %462, i64 %464
  store double %p_add18964, double* %scevgep965, align 8, !alias.scope !138, !noalias !139
  %polly.indvar_next141 = add nsw i64 %polly.indvar140, 1
  %polly.loop_cond142 = icmp sle i64 %polly.indvar_next141, 255
  br i1 %polly.loop_cond142, label %polly.loop_header137, label %polly.loop_exit139

Let's consider how accesses to the matrix C are formed. Indices of the first dimension are equal:

%459 = mul nsw i64 96, %polly.indvar98
%460 = mul nsw i64 4, %polly.indvar134
%461 = add nsw i64 %459, %460
%462 = add nsw i64 %461, 3

%467 = mul nsw i64 96, %polly.indvar98
%468 = mul nsw i64 4, %polly.indvar134
%469 = add nsw i64 %467, %468
%470 = add nsw i64 %469, 3

Indices of the second dimension are equal too:

%463 = mul nsw i64 8, %polly.indvar128
%464 = add nsw i64 %463, 7

%471 = mul nsw i64 8, %polly.indvar128
%472 = add nsw i64 %471, 7

However, to store a value we access the two dimensional array:

%scevgep965 = getelementptr [1024 x double], [1024 x double]* %C, i64 %462, i64 %464

store double %p_add18964, double* %scevgep965, align 8

To read the value we access the one dimensional array:

%polly.access.cast.C941 = bitcast [1024 x double]* %C to double*

%polly.access.mul.C942 = mul nsw i64 %470, 1024

%polly.access.add.C943 = add nsw i64 %polly.access.mul.C942, %472
  %polly.access.C944 = getelementptr double, double* %polly.access.cast.C941, i64 %polly.access.add.C943
  %tmp_p_scalar_945 = load double, double* %polly.access.C944, align 8

Consequently, as far as I understand, we have two different base pointers that point to the same location. Since we compare raw pointers to determine a second level alias set, Polly generates different second level alias set for these read and write accesses to the matrix C. In case of DeLiCM, we have a similar situation. However, I haven't manage to get a reduced test case.

Probably, it'd be good to fix the problem of the redundant code generation of Polly. Unfortunately, I'm busy at the moment and can't do it. In any case, I think that it makes sense to make the second level aliasing use the SCEV information, since it makes it more robust.

  1. What exactly must be checked in the new test case here.

Check that we don't create different alias sets for locations represented by different raw pointers.

What was generated before, why was this wrong,

Previously, we generated 64 second level alias sets instead of 32 second level alias sets, since we comparee raw pointers to determine second level alias sets.

and what does the new code generate.

32 second level alias sets.

Also, I may have missed this earlier. Why are the number of elements in the check lines growing so much. Is this expected?

CHECK-NEXT: !43 = !{!3, !4, !0, !6, !7, !11, !14, !16, !18, !20, !22, !24, !26, !28, !30, !32, !34, !36, !38, !40}

I think this is expected. The innermost loop computes 32 different elements of the matrix C. Consequently, 32 different second level alias sets are generated.

P.S.: Sorry, I've found out that we check the wrong output. I've updated the test case.

grosser accepted this revision.Aug 5 2017, 2:11 PM

OK, that's fine for me than. Can you possibly add to the test case check lines for the load and store to illustrate to which alias data they refer to?

This revision is now accepted and ready to land.Aug 5 2017, 2:11 PM
This revision was automatically updated to reflect the committed changes.

OK, that's fine for me than. Can you possibly add to the test case check lines for the load and store to illustrate to which alias data they refer to?

AFAIU, it'd require to add IR and make it dependent on the generated code. I'm not sure that it's a good approach.

grosser reopened this revision.Aug 8 2017, 9:59 AM

I am slightly confused. Don't we add IR in all our test cases?

This revision is now accepted and ready to land.Aug 8 2017, 9:59 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptOct 7 2019, 5:41 AM
Herald added a subscriber: javed.absar. · View Herald Transcript