Page MenuHomePhabricator

Stripping invalid debug information before verification can prevent code-generation
AbandonedPublic

Authored by singam-sanjay on May 5 2017, 2:11 AM.

Details

Summary

Polly sometimes mishandles debug information inside SCoPs which eventually prevents generating optimised code. This patch strips this information the code before it's verified.

This behaviour was first observed when attempting to optimise gemm written in Julia using Polly-ACC. The code-generation of the NVPTX kernel used to fail silently until the error was revealed by passing &(llvm::errs()) to verifyModule in GPUNodebuilder::finalizeKernelFunction,

DICompileUnit not listed in llvm.dbg.cu
!11 = distinct !DICompileUnit(language: DW_LANG_C89, file: !3, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !12)

This was solved by calling "llvm::StripDebugInfo" right before verifyModule.

--- a/lib/CodeGen/PPCGCodeGeneration.cpp
+++ b/lib/CodeGen/PPCGCodeGeneration.cpp
@@ -28,6 +28,7 @@
 #include "llvm/Analysis/TargetTransformInfo.h"
+#include "llvm/IR/DebugInfo.h"
 #include "llvm/IR/LegacyPassManager.h"
@@ -1707,7 +1708,8 @@ std::string GPUNodeBuilder::createKernelASM() {
 }
 
 std::string GPUNodeBuilder::finalizeKernelFunction() {
-  if (verifyModule(*GPUModule)) {
+  //llvm::StripDebugInfo(*GPUModule);
+  if (verifyModule(*GPUModule,&(llvm::errs()))) {
     BuildSuccessful = false;
     return "";
   }

Julia then crashed with the following error,

Invalid user of intrinsic instruction!
  store void (metadata, i64, metadata, metadata)* @llvm.dbg.value, void (metadata, i64, metadata, metadata)** %polly_launch_0_param_7, align 8
LLVM ERROR: Broken module found, compilation aborted!

Since this looked like a debug-intrinsic, the solution (a guess) was to use polly::isIgnoredIntrinsic to identify such instructions and avoid adding them in findReferencesInBlock, in IslNodeBuilder.cpp.

--- a/lib/CodeGen/IslNodeBuilder.cpp
+++ b/lib/CodeGen/IslNodeBuilder.cpp
@@ -204,7 +204,9 @@ int IslNodeBuilder::getNumberOfIterations(__isl_keep isl_ast_node *For) {
 /// Extract the values and SCEVs needed to generate code for a block.
 static int findReferencesInBlock(struct SubtreeReferences &References,
                                  const ScopStmt *Stmt, const BasicBlock *BB) {
-  for (const Instruction &Inst : *BB)
+  for (const Instruction &Inst : *BB) {
+    if (polly::isIgnoredIntrinsic(&Inst))
+      continue;
     for (Value *SrcVal : Inst.operands()) {
       auto *Scope = References.LI.getLoopFor(BB);
       if (canSynthesize(SrcVal, References.S, &References.SE, Scope)) {
@@ -213,6 +215,7 @@ static int findReferencesInBlock(struct SubtreeReferences &References,
       } else if (Value *NewVal = References.GlobalMap.lookup(SrcVal))
         References.Values.insert(NewVal);
     }
+  }
   return 0;
 }

NVPTX code was then successfully generated.

Diff Detail

Repository
rL LLVM

Event Timeline

singam-sanjay created this revision.May 5 2017, 2:11 AM

@grosser This is a follow-up to this post here and is required when Polly-ACC is used with julia-debug.

Meinersbur edited edge metadata.May 5 2017, 7:14 AM

Do you have a test case?

lib/CodeGen/IslNodeBuilder.cpp
208–210

We usually do not have braces around a single statement in an if-condition.

lib/CodeGen/PPCGCodeGeneration.cpp
1711

Where do the debug intrinsics come from? As far as I know we just do not copy them over to generated code.

Do you have a test case?

Tobias and I had a lengthy discussion in this post on polly-dev. When Polly-ACC is invoked through julia-debug, which is the version of the Julia interpreter compiled with debug-symbols and also inserts debug information into IR.

Attached are two files generated by julia and julia-debug respectively in the form JULIA_LLVM_ARGS="-polly-target=gpu -polly-dump-before -polly-dump-after" usr/bin/julia(or julia-debug) --check-bounds=no

The following file when compiled by opt -O3 -polly -polly-dump-after -polly-target=gpu generates "kernel_0.ll", corresponding to NVPTX kernel, and the file "kernel_gemm-before-after.ll" contains the NVPTX kernel embedded as the string kernel_0.


The same command when executed on this file doesn't produce "kernerl_0.ll" and "kernel_gemm-before-after.ll" 's kernel_0 is an empty string.

With the suggested patch, opt is able to generate NVPTX code for this file as well.

Removed braces from single-statement 'then' block.

Could you add some form of kernel_gemm-before-DEBUG.ll as a test case to the patch? Ideally with a comment explaining what would be wrong if there was debug metadata attached.

Could you also clarify in the summary whether you mean debug intrinsics (such as @llvm.dbg.declare) and debug metadata (such as !DILocalVariable)? The part in IslNodeBuilder seems to be about intrinsics, the part in PPCGCodeGeneration about debug metadata.

grosser edited edge metadata.May 7 2017, 4:15 AM

Hi Sanjay,

thanks for the updated patch. Some comments:

  1. Please always upload to phabricator with maximal context
  1. Please add a _minimal_ test case to this patch. kernel_gemm-before-DEBUG.ll already nicely crashes for me, but it is likely larger than needed. It would be really good to make this a minimal test case.
  1. Please include the error message in the commit message:

Cannot invoke an intrinsic other than donothing, patchpoint, statepoint, coro_resume or coro_destroy

store void (metadata, i64, metadata, metadata)* @llvm.dbg.value, void (metadata, i64, metadata, metadata)** %polly_launch_0_param_7

LLVM ERROR: Broken function found, compilation aborted!

  1. Please describe the actual error and how it got resolved in the commit message. "Polly mishandles debug info" is very generic. After you understood the problem, please describe it and explain how you resolved it.
  1. Use a minimal set of passes to test this issue in your test case. Instead of using 'opt -O3 -polly -polly-target=gpu', which runs a large set of passes, please use only 'opt -polly-codegen-ppcg'.
lib/CodeGen/PPCGCodeGeneration.cpp
1711

Right. If I run "opt -polly-codegen-ppcg ~/Downloads/kernel_gemm-before-DEBUG.ll" and remove this line, these is no problem. Is this change really needed. In case it is, please add a test case that breaks without this change. Otherwise, please remove this change.

grosser requested changes to this revision.May 7 2017, 11:13 PM

Ah, officially marking this as requiring changes, such that I get notified on an update.

lib/CodeGen/IslNodeBuilder.cpp
208–210

This has been fixed. Feel invited to mark it as "done".

This revision now requires changes to proceed.May 7 2017, 11:13 PM
singam-sanjay marked an inline comment as done.May 7 2017, 11:39 PM

Hi Sanjay,

thanks for the updated patch. Some comments:

  1. Please always upload to phabricator with maximal context

I'm compiling an intro to give context to this patch.

  1. Please add a _minimal_ test case to this patch. kernel_gemm-before-DEBUG.ll already nicely crashes for me, but it is likely larger than needed. It would be really good to make this a minimal test case.

I would need some help here. Will get back to on that.

  1. Please include the error message in the commit message:

You meant the patch summary ?

Cannot invoke an intrinsic other than donothing, patchpoint, statepoint, coro_resume or coro_destroy

store void (metadata, i64, metadata, metadata)* @llvm.dbg.value, void (metadata, i64, metadata, metadata)** %polly_launch_0_param_7

LLVM ERROR: Broken function found, compilation aborted!

Actually, this is not the only problem. The intro I'll be adding to the summary will detail all the errors.

  1. Please describe the actual error and how it got resolved in the commit message. "Polly mishandles debug info" is very generic. After you understood the problem, please describe it and explain how you resolved it.

Sure.

  1. Use a minimal set of passes to test this issue in your test case. Instead of using 'opt -O3 -polly -polly-target=gpu', which runs a large set of passes, please use only 'opt -polly-codegen-ppcg'.

I thought Polly's called only at -O3. I'm assuming the latter doesn't include many optimisation passes.

lib/CodeGen/PPCGCodeGeneration.cpp
1711

They are generated by Julia when invoked through julia-debug

Hi Sanjay,

thanks for the updated patch. Some comments:

  1. Please always upload to phabricator with maximal context

I'm compiling an intro to give context to this patch.

Cool.

  1. Please add a _minimal_ test case to this patch. kernel_gemm-before-DEBUG.ll already nicely crashes for me, but it is likely larger than needed. It would be really good to make this a minimal test case.

I would need some help here. Will get back to on that.

Sure.

  1. Please include the error message in the commit message:

You meant the patch summary ?

Yes.

Cannot invoke an intrinsic other than donothing, patchpoint, statepoint, coro_resume or coro_destroy

store void (metadata, i64, metadata, metadata)* @llvm.dbg.value, void (metadata, i64, metadata, metadata)** %polly_launch_0_param_7

LLVM ERROR: Broken function found, compilation aborted!

Actually, this is not the only problem. The intro I'll be adding to the summary will detail all the errors.

Let's fix a problem at a time.

  1. Please describe the actual error and how it got resolved in the commit message. "Polly mishandles debug info" is very generic. After you understood the problem, please describe it and explain how you resolved it.

Sure.

  1. Use a minimal set of passes to test this issue in your test case. Instead of using 'opt -O3 -polly -polly-target=gpu', which runs a large set of passes, please use only 'opt -polly-codegen-ppcg'.

I thought Polly's called only at -O3. I'm assuming the latter doesn't include many optimisation passes.

Sure. You can either run the full -O3 pass sequence, or individual passes. For tests we commonly run individual passes.

Best,
Tobias

singam-sanjay retitled this revision from Stripping debug-intrinsics to prevent Polly from code-generating them incorrectly to Stripping debug information before Polly code-generates them incorrectly.May 8 2017, 2:54 PM
singam-sanjay edited the summary of this revision. (Show Details)
singam-sanjay edited the summary of this revision. (Show Details)
singam-sanjay retitled this revision from Stripping debug information before Polly code-generates them incorrectly to Stripping invalid debug information before verification fails code-generation.May 8 2017, 2:56 PM

I've been using Julia (@1eef0279982c3b8b923ff9cd78bea6cfc07fb22c) with Polly @r302276 and LLVM's stable branch (4.0 patch 1) since Julia is having an issue with the trunk. Please let me know the other configurations you've been trying the patch in.

The following are files generated by this configuration (previous ones where from a 2 month old trunk),


For some reason Julia crashes after dumping code with the invalid debug-intrinsic. This intrinsic can be found file along with the NVPTX kernel.

Hi Sanjay,

thanks for improving the summary. Can you now please add a minmal test case in the Polly "test/" folder and make sure it fails with "make check" before your patch and passed after your patch. The patch should be minimal, so at best you manually "by-hand" reduce the test case to a minimal example. It might help to start off from something simpler than gemm, e.g. a simple copy kernel.

Best,
Tobias

singam-sanjay added inline comments.May 9 2017, 12:35 PM
lib/CodeGen/PPCGCodeGeneration.cpp
1711

verifyModule doesn't report errors since the 2nd argument, raw_ostream *OS = nullptr, is left empty. When &(llvm::errs()) is passed as the 2nd argument you get the following output,

DICompileUnit not listed in llvm.dbg.cu
!11 = distinct !DICompileUnit(language: DW_LANG_C89, file: !3, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !12)

Along with the following which eventually crashes the program.

Cannot invoke an intrinsic other than donothing, patchpoint, statepoint, coro_resume or coro_destroy
  store void (metadata, i64, metadata, metadata)* @llvm.dbg.value, void (metadata, i64, metadata, metadata)** %polly_launch_0_param_7
LLVM ERROR: Broken function found, compilation aborted!

Hello Tobias,

As explained in the previous in-line comment, the DICompileUnit not listed in llvm.dbg.cu error is unearthed only when verifyModule is passed a valid raw_ostream as the second argument. hard coding the passing of &(llvm::errs()) as the raw_ostream might make the output verbose and wouldn't be understood without context (SCoP etc.). Can we conditionally pass &(llvm::errs()) to verifyModule based on an option ? something like "-polly-report". When this is the case, I can write a test which can fail by RUNning opt -polly-report -polly-ppcg-codegen %s.

As for the second error, on the intrinsic, we could leave opt to crash to reveal the error. Or we'd have to codegenerate the resulting output by preventing opt from verifying it (-disable-verify) and therefor crashing it, and let FileCheck find it through CHECK: store void.

I need your help with finding the minimal test case that'd fix this issue. For this I would have to find out exactly which part of Polly is mishandling this code. FYI,

  1. The DICompileUnit error is uncovered when the GPU module is verified.
  2. The invalid debug-intrinsic is found in the midst of instructions that prepare launch parameter for the PTX kernel (%polly_launch_0_param_7 is at the end of the instruction). You'll find the instruction in the file below.

I feel that in trying to find the exact cause of the error, we might actually understand why Polly's mishandling the debug information and therefore fix that instead of using this work around.

hard coding the passing of &(llvm::errs()) as the raw_ostream might make the output verbose and wouldn't be understood without context (SCoP etc.). Can we conditionally pass &(llvm::errs()) to verifyModule based on an option ? something like "-polly-report". When this is the case, I can write a test which can fail by RUNning opt -polly-report -polly-ppcg-codegen %s.

This would look something like this patch,

--- a/lib/CodeGen/PPCGCodeGeneration.cpp
+++ b/lib/CodeGen/PPCGCodeGeneration.cpp
@@ -1707,7 +1708,12 @@ std::string GPUNodeBuilder::createKernelASM() {
 }
 
 std::string GPUNodeBuilder::finalizeKernelFunction() {
-  if (verifyModule(*GPUModule)) {
+  llvm::StripDebugInfo(*GPUModule);
+  
+  raw_ostream *OS=nullptr;
+  if(ReportLevel)
+    OS = &(llvm::errs());
+  if (verifyModule(*GPUModule,OS)) {
     BuildSuccessful = false;
     return "";
   }

ReportLevel is currently local to ScopDetection.cpp, hence cannot be used. Do you recommend moving it to RegisterPasses.h ? Please recommend an equivalent option.

The following file is the Module pointed to by GPUModule, which is in turn verified by verifiyModule,


This file is clearly missing the llvm.dgb.cu. This is an error on the part Polly since this module is generated by Polly itself. We could modify Polly to insert llvm.dbg.cu into the module in case any DICompileUnits exist. If not, I'm not sure how to write a test case that can catch this error since I do not know which part of the code is causing the error.

The erroneous debug-intrinsic error is averted only when almost all debug-intrinsics inside the CurrentScop are commented away. I did this by printing the Scop name (if--L65) at PPCGCodeGeneration::runOnScop and commented the debug-intrinsics at the source .ll file. Currentlly, error affects polly_launch_0_param_7. When the first few debug-intrinsics are commented, the error then occurs with polly_launch_0_param_8. As more intrinsics are commented away, the error moves onto polly_launch_0_param_9 untill finally disappearing when all the intrinsics in a sub-SCoP (if7--if9) .


The SCoP detected in this file is if--L65. You will observe that all intrinsics from the block if7 till if9 have been commented and un-commenting any one of them will crash opt.

To avoid the apparent complexities, I suggest a workaround by (somehow) stripping all debug information from the Region associated with the CurrentScop given to PPCGCodeGeneration::runOnScop.

The following Julia functions were used to get Julia to generate missing DICompileUnit and invalid intrinsic errors,

@polly function copy_mat(a,b)
       ni, nj, nk = size(a)
       for i=1:ni
         for j=1:nj
           a[i,j] =  b[i,j]*b[i,j]
         end
       end
    end

@polly function copy_3D_mat(a,b)
       ni, nj, nk = size(a)
       for i=1:ni
         for j=1:nj
           for k=1:nk
             a[i,j,k] =  b[i,j,k]
           end
         end
       end
    end

@polly function square_mat(a,b)
       ni, nj, nk = size(a)
       for i=1:ni
         for j=1:nj
           a[i,j] =  b[i,j]*b[i,j]
         end
       end
    end

@polly function square_3D_mat(a,b)
       ni, nj, nk = size(a)
       for i=1:ni
         for j=1:nj
           for k=1:nk
             a[i,j,k] =  b[i,j,k]*b[i,j,k]
           end
         end
       end
    end

Note that the difference between with square and copy is that a[...] is set to b[...]*b[...] and a[...] is set to just b[...], respectively.

Both copy_mat and copy_3D_mat crashed Julia with the Invalid user of intrinsic instruction! error,

julia> copy_mat(a,b)
Invalid user of intrinsic instruction!
  store void (metadata, i64, metadata, metadata)* @llvm.dbg.value, void (metadata, i64, metadata, metadata)** %polly_launch_0_param_5, align 8
LLVM ERROR: Broken module found, compilation aborted!

Interestingly, copy_mat-after.ll neither has the store void (metadata,.. instruction nor the string polly_launch_0_param_5 whereas copy_3D_mat-after.ll has the store void ... polly_launch_0_param_7 intrinsic and the buggy allocation of the the 7th launch parameter %polly_launch_0_param_7 = alloca void (metadata, i64, metadata, metadata)*.

Both square_mat and square_3D_mat generated the DICompileUnit not listed in llvm.dbg.cu error,

julia> square_3D_mat(a,b)
DICompileUnit not listed in llvm.dbg.cu
!10 = distinct !DICompileUnit(language: DW_LANG_C89, file: !3, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !11)
julia>


The anomaly here is that both square_mat-after.ll and square_3D_after.ll contain the buggy intrinsic store void (metadat,.. on polly_launch_0_param_5 and polly_launch_0_param_7 respectively, yet goes unnoticed in the case of Julia. Both square_*mat-before.ll crash opt,

$ opt -polly-codegen-ppcg -polly-acc-dump-kernel-ir square_mat-before-DICompileUnit-error.ll
DICompileUnit not listed in llvm.dbg.cu
!10 = distinct !DICompileUnit(language: DW_LANG_C89, file: !3, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !11)
Cannot invoke an intrinsic other than donothing, patchpoint, statepoint, coro_resume or coro_destroy
store void (metadata, i64, metadata, metadata)* @llvm.dbg.value, void (metadata, i64, metadata, metadata)**%polly_launch_0_param_5
LLVM ERROR: Broken function found, compilation aborted!
$ opt -polly-codegen-ppcg -polly-acc-dump-kernel-ir square_3D_mat-before-DICompileUnit-error.ll
DICompileUnit not listed in llvm.dbg.cu
!10 = distinct !DICompileUnit(language: DW_LANG_C89, file: !3, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !11)
Cannot invoke an intrinsic other than donothing, patchpoint, statepoint, coro_resume or coro_destroy
store void (metadata, i64, metadata, metadata)* @llvm.dbg.value, void (metadata, i64, metadata, metadata)**%polly_launch_0_param_7
LLVM ERROR: Broken function found, compilation aborted!

I've tried similar add_mat, sub_mat, add_zero_mat and zero_mat functions, defined below, and all of them,

  1. Resulted in the "DICompileUnit not listed in llvm.dbg.cu" error inside Julia
  2. Resulted in "DICompileUnit not listed in llvm.dbg.cu" and "Cannot invoke an intrinsic other than donothing, patchpoint,..." error when run on opt, eventually crashing it.
@polly function add_mat(a,b)
  ni,nj = size(a)
  for i=1:ni
    for j=1:nj
      a[i,j] = a[i,j] + b[i,j]
    end
  end
end


@polly function sub_mat(a,b)
  ni,nj = size(a)
  for i=1:ni
    for j=1:nj
      a[i,j] = a[i,j] - b[i,j]
    end
  end
end

@polly function add_zero_mat(a,b)
  ni,nj = size(a)
  for i=1:ni
    for j=1:nj
      a[i,j] = a[i,j] + 0
    end
  end
end

@polly function zero_mat(a,b)
  ni,nj = size(a)
  for i=1:ni
    for j=1:nj
      a[i,j] = b[i,j] - b[i,j]
    end
  end
end

The pattern w.r.t. the "DICompileUnit not listed..." error reveals that all functions which perform mathematical operations on the matrices, result in the error. The copy_mat and copy_3D_mat kernels do not result in an error since only load and store instructions are required by a copy kernel. This was confirmed when I replaced,

  • %34 = fadd double %29, %33, !dbg !29 in add_mat
  • %34 = fsub double %29, %33, !dbg !29 in sub_mat
  • %25 = fadd double %24, 0.000000e+00, !dbg !29 in add_zero_mat
  • %30 = fsub double %29, %29, !dbg !29 in zero_mat

by %A_i_j = load double, double* %addr_B_i_j, align 8, essentially making it a copy kernel, and didn't receive the DICompileUnit... error when I run it through opt.

Please check it for yourself by running these files through opt. I've commented out the mathematical op and added a load op right after it.

In all cases, when the module for the NVPTX kernel is generated, the last argument happens to be meta data,
define ptx_kernel void @kernel_0(i8 addrspace(1)* %MemRef0, i8 addrspace(1)* %MemRef1, i64 %p_0, i64 %p_1, i64 %p_2, void (metadata, i64, metadata, metadata)* %llvm.dbg.value)
This happens along with the Cannot invoke an intrinsic other than donothing, patchpoint,... error. I see that everytime, one of the final polly_launch_$ID_param_$PAR is always allocacting as a void (metadata, i64, metadata, metadata)* which I've highlighted in my previous comment.

... copy_3D_mat-after.ll has the store void ... polly_launch_0_param_7 intrinsic and the buggy allocation of the the 7th launch parameter %polly_launch_0_param_7 = alloca void (metadata, i64, metadata, metadata)*.

Any guesses as to why this happens ? I recollect Tobias talking about the "values in the subtree getting in the way". @grosser, could you please elaborate on that if you can recollect ?

Interesting. As another data point, can you try a single-dimensional parallel loop, where you add two vectors and see if the polly openmp backend crashes here as well?

singam-sanjay retitled this revision from Stripping invalid debug information before verification fails code-generation to Stripping invalid debug information before verification can prevent code-generation.May 11 2017, 10:37 PM

Interesting. As another data point, can you try a single-dimensional parallel loop, where you add two vectors and see if the polly openmp backend crashes here as well?

Running opt -polly-parallel -polly-process-unprofitable -polly-force-parallel on add_vec, square_mat and square_3D_mat didn't change the code. ( Is it because these are merely options which are valid only when polly::registerPasses is called on the pass manager, whereas -polly-codegen-ppcg is an option tied to adding the PPCGCodeGeneration pass to the PM because of the INITIALIZE_PASS_* macros ? )

Adding a -O3 -polly at the first of the options produced vectorised code, but not parallel code even for square_3D_mat.

In all cases, when the module for the NVPTX kernel is generated, the last argument happens to be meta data,
define ptx_kernel void @kernel_0(i8 addrspace(1)* %MemRef0, i8 addrspace(1)* %MemRef1, i64 %p_0, i64 %p_1, i64 %p_2, void (metadata, i64, metadata, metadata)* %llvm.dbg.value)
This happens along with the Cannot invoke an intrinsic other than donothing, patchpoint,... error. I see that everytime, one of the final polly_launch_$ID_param_$PAR is always allocacting as a void (metadata, i64, metadata, metadata)* which I've highlighted in my previous comment.

... copy_3D_mat-after.ll has the store void ... polly_launch_0_param_7 intrinsic and the buggy allocation of the the 7th launch parameter %polly_launch_0_param_7 = alloca void (metadata, i64, metadata, metadata)*.

Any guesses as to why this happens ? I recollect Tobias talking about the "values in the subtree getting in the way". @grosser, could you please elaborate on that if you can recollect ?

@grosser did you have a look at the last part of the previous comment ? You had mentioned that the debug-intrinsic error may be because of "values in the subtree getting in the way". I'd like to understand what you meant by that, doing so would enable me to write a test case that would inject these erroneous values into the subtree.

The kernel params which are allocated as metadata (e.g. %polly_launch_0_param_7 = alloca void (metadata, i64, metadata, metadata)* in copy_3D_mat-before ) are added into SubtreeValues in the following part of the code of GPUNodeBuilder::createLaunchParameters,

1276 for (auto Val : SubtreeValues) {
1277 Instruction *Param =
1278 new AllocaInst(Val->getType(), AddressSpace,
1279 Launch + "_param_" + std::to_string(Index),
1280 EntryBlock->getTerminator());
1281 Builder.CreateStore(Val, Param);
1282 Value *Slot = Builder.CreateGEP(
1283 Parameters, {Builder.getInt64(0), Builder.getInt64(Index)});
1284 Value *ParamTyped =
1285 Builder.CreatePointerCast(Param, Builder.getInt8PtrTy());
1286 Builder.CreateStore(ParamTyped, Slot);
1287 Index++;
1288 }

Does this given any insight into why we're having the buggy debug-intrinsic issue ?

@sanyam: Sorry, I am a little busy today, but can you please try to generate parallel code from C (with debug info) . I am almost certain it will also crash the same way (at least for the first bug). I should reply to your other question tomorrow.

See: http://polly.llvm.org/docs/HowToManuallyUseTheIndividualPiecesOfPolly.html#execute-the-individual-polly-passes-manually

Best,
Tobias

Hello @grosser,

Sorry for disturbing you.

Thanks for this ! I forgot to use -polly-codegen.

Calls to the OpenMP runtime were inserted when opt -polly-parallel -polly-codegen %FILE was run on copy_mat, copy_3D_mat, square_mat and square_3D_mat. But, Polly modified neither add_vec nor sub_vec, even on -polly-parallel-force. I recollect you saying that only the outer loop is parallelised and also only when there's considerable amount of work in each iteration. IN that case, why did you expect something as simple as vector addition to be parallelised ?

I am almost certain it will also crash the same way (at least for the first bug). I should reply to your other question tomorrow.

I've been trying out square_3D_mat written in C.

Polly is not considering it as a SCoP since there are non-affine accesses, e.g. a[N*N+N+1], in the code.

singam-sanjay added a comment.EditedMay 12 2017, 4:04 AM

@grosser I might have uploaded the wrong IR file.

I got the following files by doing clang -g -O3 -mllvm -polly -mllvm -S -emit-llvm kernels.c

The scops.mat_3D_add.dot file seemed to match the code since it looked like there was a double-nested loop. But opt -dot-scops kernels.ll gave me something else, was when I realised I was looking at the wrong file.

Even the dot file I got by clang -g -S -emit-llvm kernels.c and opt -dot-scops kernel.ll didn't look like it was an unoptimised file. I'm looking into why this happened.

Hello @grosser

The code that we need to analyze through opt is the code that is given to Polly. This is obtained by adding the "clang -g -O3 -mllvm -polly -mllvm -polly-dump-before kernels.c", which is going to be kernel-before.ll. We must consider kernel-before.ll file and not the kernel.ll, which is code that has been transformed by passes after Polly.

opt -dot-scops kernel-before.ll reveals that Polly is unable to detect a SCoP because of the non-affine access,

Non affine access function: {{{0,+,((zext i32 %0 to i64) * (zext i32 %0 to i64))}<%8>,+,(zext i32 %0 to i64)}<%12>,+,1}<nw><%16>".

The function signature is void mat_3D_add( int N, char a[N][N][N], char b[N][N][N] ), thus mapping the first argument in the C signature, N, to the first in the IR signature, %0.

Polly is therefore having a problem with determining the dependences because of the non-affine access A[N*N+N+1]. I thought Polly would be able to reconstruct this as a multidimensional array access, acc to this paper On Recovering Multi-Dimensional Arrays in Polly. Note that the motivating example given in the paper has more parameters than mat_3D_add used here.

grosser added a comment.EditedMay 14 2017, 10:42 PM

Hi Sanjay,

I was thinking of the following experiment:

$cat /tmp/test.c
void foo(float A[], long p) {
  for (long i = 0; i < 1024; i++)
    for (long j = 0; j < 1024; j++)
      A[i] = A[i] * p;
}
$pc /tmp/test.c -O3 -mllvm -polly -c -mllvm -debug-only=polly-ast -mllvm -polly-process-unprofitable -mllvm -polly-parallel -mllvm -polly-tiling=false 
:: isl ast :: foo :: %for.cond1.preheader---%for.cond.cleanup
{  :  }
{ domain: "{ Stmt1[i0, i1] : 0 <= i0 <= 1023 and 0 <= i1 <= 1023 }", child: { schedule: "[{ Stmt1[i0, i1] -> [(i0)] }]", child: { schedule: "[{ Stmt1[i0, i1] -> [(i1)] }]" } } }
if (1)

    #pragma omp parallel for
    for (int c0 = 0; c0 <= 1023; c0 += 1)
      #pragma minimal dependence distance: 1
      for (int c1 = 0; c1 <= 1023; c1 += 1)
        Stmt1(c0, c1);

else
    {  /* original code */ }

I would have expected for this to break as well. As a last data point, it would be interesting to understand why this example does not break.

singam-sanjay abandoned this revision.Aug 8 2017, 5:29 AM