This is an archive of the discontinued LLVM Phabricator instance.

[Polly] [PPCGCodeGeneration] do not allow LLVM intrinsics to be tracked as Subtree Values.
ClosedPublic

Authored by bollu on May 22 2017, 8:35 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

bollu created this revision.May 22 2017, 8:35 AM
Meinersbur edited edge metadata.May 22 2017, 9:15 AM

Summary missing.

What is supposed to happen (after this patch) if the pointer to a function (or even to an intrisic) is used, e.g. store into an array?

lib/CodeGen/PPCGCodeGeneration.cpp
1112 ↗(On Diff #99770)

Drop "actually"

1115 ↗(On Diff #99770)

Function names start with lower case letters.

1157 ↗(On Diff #99770)

Don't use the name It for non-iterators. make_filter_range returns a range (pair of begin and end iterator).

test/GPGPU/intrinsic-not-copied-to-kernel.ll
1 ↗(On Diff #99770)

Nit: double space

bollu added a comment.May 23 2017, 1:51 AM

@Meinersbur: I don't think it was ever legal to take the address of an intrinsic. The LLVM IRVerifier actively checks against this.

IR/Verifier.cpp
Assert(
    !F->isIntrinsic() ||
        i == (isa<CallInst>(I) ? e - 1 : isa<InvokeInst>(I) ? e - 3 : 0),
    "Cannot take the address of an intrinsic!", &I);
Assert(
    !F->isIntrinsic() || isa<CallInst>(I) ||
        F->getIntrinsicID() == Intrinsic::donothing ||
        F->getIntrinsicID() == Intrinsic::coro_resume ||
        F->getIntrinsicID() == Intrinsic::coro_destroy ||
        F->getIntrinsicID() == Intrinsic::experimental_patchpoint_void ||
        F->getIntrinsicID() == Intrinsic::experimental_patchpoint_i64 ||
        F->getIntrinsicID() == Intrinsic::experimental_gc_statepoint,
    "Cannot invoke an intrinsic other than donothing, patchpoint, "
    "statepoint, coro_resume or coro_destroy",
    &I);
bollu updated this revision to Diff 99863.May 23 2017, 1:53 AM
  • [NFC wrt patch] Style fixes.
bollu updated this revision to Diff 99867.May 23 2017, 2:39 AM
[NFC wrt patch] fix double space
bollu added a comment.May 23 2017, 2:43 AM

@Meinersbur: Addressed comments.

Meinersbur accepted this revision.May 24 2017, 7:14 AM

I believe this patch covers only a special case of a more general problem. The pointer to the intrinsic is added in

static int findReferencesInBlock(struct SubtreeReferences &References,
                                 const ScopStmt *Stmt, const BasicBlock *BB) {
  for (const Instruction &Inst : *BB)
    for (Value *SrcVal : Inst.operands()) {
      auto *Scope = References.LI.getLoopFor(BB);
      if (canSynthesize(SrcVal, References.S, &References.SE, Scope)) {
        References.SCEVs.insert(References.SE.getSCEVAtScope(SrcVal, Scope));

because it is used as an operand of a call instruction. In general, we cannot call host functions from offloaded kernels. In this case, it might be ok because intrinsics are lowered by the backend, so it is ok. How does Polly-ACC determine whether ok call is ok and when it is not? Is there a whitelist? What about memcpy, memmove and memset?

If there is indeed a filter before, the patch LGTM.

Please use consistent case in your title: "Do not allow LLVM intrinsics to be tracked as subtree values."

lib/CodeGen/PPCGCodeGeneration.cpp
1112 ↗(On Diff #99867)

double space

This revision is now accepted and ready to land.May 24 2017, 7:14 AM
Meinersbur added inline comments.May 24 2017, 7:23 AM
lib/CodeGen/PPCGCodeGeneration.cpp
1114 ↗(On Diff #99867)

Please add some detail about how this can happen.

In this case: Each operand of call instruction, which includes the function to be called, added to references.

Does this remove debug intrinsics as well ? If it does, what happens to the meta data at the end of the IR output ( !DICompileUnits, !llvm.dbg.cu = {!11,...}, etc. ) ?

I believe this patch covers only a special case of a more general problem. The pointer to the intrinsic is added in

static int findReferencesInBlock(struct SubtreeReferences &References,
                                 const ScopStmt *Stmt, const BasicBlock *BB) {
  for (const Instruction &Inst : *BB)
    for (Value *SrcVal : Inst.operands()) {
      auto *Scope = References.LI.getLoopFor(BB);
      if (canSynthesize(SrcVal, References.S, &References.SE, Scope)) {
        References.SCEVs.insert(References.SE.getSCEVAtScope(SrcVal, Scope));

because it is used as an operand of a call instruction.

@Meinersbur Is the pointer added in the nested for ?

for (Value *SrcVal : Inst.operands()) {
  auto *Scope = References.LI.getLoopFor(BB);
  if (canSynthesize(SrcVal, References.S, &References.SE, Scope)) {
    References.SCEVs.insert(References.SE.getSCEVAtScope(SrcVal, Scope));  //<- This line right ?

How does adding this pointer affect the subsequent code-generation ?

@bollu @Meinersbur What was the motivation behind this patch ?

bollu added a comment.May 25 2017, 3:33 PM

@singam-sanjay: This was to initially to fix crashes in cases where intrinsics are called from a kernel. We currently try to take the function pointer of an intrinsic and offload it to the kernel (which obviously fails, because an intrinsic "does not really exist"). After discussing this with @Meinersbur, he pointed out that it doesn't make much sense to allow most kinds of function pointers as well, since a call from a function pointer in the kernel maybe to the host, will would not work.

Thanks for the comment @bollu !

So, the code below,

References.SCEVs.insert(References.SE.getSCEVAtScope(SrcVal, Scope));

Adds all references that the kernel IR makes, when it was a part of the original IR. Therefore, this data has to be made available to the kernel and is passed as a polly_launch_param_#. When a reference happens to be to an intrinsic, which carries function pointers, we get buggy IR. Is my understanding correct ?

Given that ppcg might modify the schedule of the IR, I'm curious as to where the intrinsic would be placed in the modified IR. Please share your thoughts. @bollu @grosser @Meinersbur

@Meinersbur: doesn't "value" refer to LLVM::Value? that's why I capitalised it. If that doesn't fit with the naming convention, I'll change it.

Given that ppcg might modify the schedule of the IR, I'm curious as to where the intrinsic would be placed in the modified IR. Please share your thoughts. @bollu @grosser @Meinersbur

Intrinsics are available globally everywhere. One can use them directly without "passing" them around. However, depending on the backend, it might not be able to lower it to the architecture (e.g. an AVX intrinisic on NVidia hardware).

@Meinersbur: doesn't "value" refer to LLVM::Value? that's why I capitalised it. If that doesn't fit with the naming convention, I'll change it.

Are you referring to the title?

  • "Subtree" is capitalized, but there is no type "Subtree"
  • Neither is there one named "Values".
  • The first letter ("do") is also not properly capitalized, giving me the impression that capitalization is rather random than intended.
  • It is unclear that you mean an actual type. Use "llvm::Value" for that (there is an alternative to put it into ticks: "Values", but it is just Phabricator which formats that in typewriter font then, but not the git command line tools). Or just use the English word "value" since it should be clear that values in LLVM are represented by the type "llvm::Value".
bollu added a comment.May 26 2017, 4:25 AM

@Meinersbur: So, as discussed, I'll skip Scops that try to refer to function pointers / intrinsics from being detected? This will need changes in ScopDetection, unless I misunderstood our discussion?

@singam-sanjay: I don't think I understand the question. Specifically, ".. I'm curious as to where the intrinsic would be placed in the modified IR".

For example, when given this:

program.c
static const int N = 1000;
void f(float A[N][N], int n, float B[N][N]) {
  for(int i = 0; i < n; i++) {
    for(int j = 0; j < n; j++) {
      B[i][j] = pow(A[i][j], 3);
    }

  }
}

in the generated IR, it tries to acquire a function pointer to the instruction llvm.pow.f64 in the kernel offload stuff.

@Meinersbur: So, as discussed, I'll skip Scops that try to refer to function pointers / intrinsics from being detected? This will need changes in ScopDetection, unless I misunderstood our discussion?

That's a good question. You can do it in ScopDetection, but then ScopDetection must know what the target will be for it to know which intrinsics/functions are available (Son't want to disallow functions/intriniscs when compiling to host CPU). It would be nice if ScopDetection stays target-independent.

The other possibility is it scan for called functions/intrinsics in PPCGCodeGeneration and bail out there if one is found that cannot be mapped to a GPU-equivalent.

@singam-sanjay: I don't think I understand the question. Specifically, ".. I'm curious as to where the intrinsic would be placed in the modified IR".

I asked this question with the debug intrinsics in mind (llvm.dbg.declare, llvm.dbg.value).

For example, when given this:

program.c
static const int N = 1000;
void f(float A[N][N], int n, float B[N][N]) {
  for(int i = 0; i < n; i++) {
    for(int j = 0; j < n; j++) {
      B[i][j] = pow(A[i][j], 3);
    }

  }
}

in the generated IR, it tries to acquire a function pointer to the instruction llvm.pow.f64 in the kernel offload stuff.

Unlike llvm.pow.f64, the debug intrinsics don't perform any useful operation. So I was curious about where it would be placed in the kernel IR.

If exponentiation is represented as an intrinsic, how can you afford to ignore such intrinsics which perform useful operations which could alter the data flow or the control flow of the program ?

@grosser @Meinersbur (Context D32894) This patch was able to remove the occurrences of debug intrinsics in the kernel code and the following error,

Cannot invoke an intrinsic other than donothing, patchpoint, statepoint, coro_resume or coro_destroy
  store void (metadata, i64, metadata, metadata)* @llvm.dbg.value, void (metadata, i64, metadata, metadata)** %polly_launch_0_param_7
LLVM ERROR: Broken function found, compilation aborted!

But, somehow debug meta data slipped through,

; ModuleID = 'kernel_0'
source_filename = "kernel_0"
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

define ptx_kernel void @kernel_0(i8 addrspace(1)* %MemRef0, i8 addrspace(1)* %MemRef1, i8 addrspace(1)* %MemRef2, i64 %p_0, i64 %p_1, i64 %p_2, i64 %p_3, i64, i64) #0 {
entry:
  %2 = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
  %b0 = zext i32 %2 to i64
  %3 = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.y()
  %b1 = zext i32 %3 to i64
  %4 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
  %t0 = zext i32 %4 to i64
  %5 = call i32 @llvm.nvvm.read.ptx.sreg.tid.y()
  %t1 = zext i32 %5 to i64
  %6 = mul nsw i64 32, %b0
  %7 = sub nsw i64 %p_2, %6
  %8 = sub nsw i64 %7, 1
  %pexp.p_div_q = udiv i64 %8, 8192
  br label %polly.loop_preheader

polly.loop_exit:                                  ; preds = %polly.loop_exit4
  ret void

polly.loop_header:                                ; preds = %polly.loop_exit4, %polly.loop_preheader
  %polly.indvar = phi i64 [ 0, %polly.loop_preheader ], [ %polly.indvar_next, %polly.loop_exit4 ]
  %9 = mul nsw i64 32, %b1
  %10 = sub nsw i64 %p_0, %9
  %11 = sub nsw i64 %10, 1
  %pexp.p_div_q1 = udiv i64 %11, 8192
  br label %polly.loop_preheader3

polly.loop_exit4:                                 ; preds = %polly.merge57
  %polly.indvar_next = add nsw i64 %polly.indvar, 1
  %polly.loop_cond = icmp sle i64 %polly.indvar_next, %pexp.p_div_q
  br i1 %polly.loop_cond, label %polly.loop_header, label %polly.loop_exit

polly.loop_preheader:                             ; preds = %entry
  %12 = icmp sgt i64 %p_0, 0
  %smax = select i1 %12, i64 %p_0, i64 0
  %13 = add i64 %smax, 1
  %14 = icmp sgt i64 %p_0, 0
  %smax54 = select i1 %14, i64 %p_0, i64 0
  %15 = add i64 %smax54, 1
  %16 = icmp sgt i64 %p_0, 0
  %smax84 = select i1 %16, i64 %p_0, i64 0
  %17 = add i64 %smax84, 1
  br label %polly.loop_header

polly.loop_header2:                               ; preds = %polly.merge57, %polly.loop_preheader3
  %polly.indvar5 = phi i64 [ 0, %polly.loop_preheader3 ], [ %polly.indvar_next6, %polly.merge57 ]
  %18 = sub nsw i64 %p_1, 1
  %polly.fdiv_q.shr = ashr i64 %18, 5
  br label %polly.loop_if

polly.loop_exit10:                                ; preds = %polly.merge, %polly.loop_if
  br label %polly.cond56

polly.cond56:                                     ; preds = %polly.loop_exit10
  %19 = icmp sle i64 %p_1, 0
  br i1 %19, label %polly.then58, label %polly.else59

polly.merge57:                                    ; preds = %polly.else59, %polly.merge61
  %polly.indvar_next6 = add nsw i64 %polly.indvar5, 1
  %polly.loop_cond7 = icmp sle i64 %polly.indvar_next6, %pexp.p_div_q1
  br i1 %polly.loop_cond7, label %polly.loop_header2, label %polly.loop_exit4

polly.loop_preheader3:                            ; preds = %polly.loop_header
  br label %polly.loop_header2

polly.loop_if:                                    ; preds = %polly.loop_header2
  %polly.loop_guard = icmp sle i64 0, %polly.fdiv_q.shr
  br i1 %polly.loop_guard, label %polly.loop_preheader9, label %polly.loop_exit10

polly.loop_header8:                               ; preds = %polly.merge, %polly.loop_preheader9
  %polly.indvar11 = phi i64 [ 0, %polly.loop_preheader9 ], [ %polly.indvar_next12, %polly.merge ]
  br label %polly.cond

polly.cond:                                       ; preds = %polly.loop_header8
  %20 = mul nsw i64 32, %b0
  %21 = add nsw i64 %20, %t0
  %22 = mul nsw i64 8192, %polly.indvar
  %23 = add nsw i64 %21, %22
  %24 = add nsw i64 %23, 1
  %25 = icmp sge i64 %p_2, %24
  br i1 %25, label %polly.then, label %polly.else

polly.merge:                                      ; preds = %polly.else, %polly.loop_exit18
  call void @llvm.nvvm.barrier0()
  %polly.indvar_next12 = add nsw i64 %polly.indvar11, 1
  %polly.loop_cond13 = icmp sle i64 %polly.indvar_next12, %polly.fdiv_q.shr
  br i1 %polly.loop_cond13, label %polly.loop_header8, label %polly.loop_exit10

polly.loop_preheader9:                            ; preds = %polly.loop_if
  br label %polly.loop_header8

polly.then:                                       ; preds = %polly.cond
  %26 = mul nsw i64 -2, %b1
  %27 = mul nsw i64 512, %polly.indvar5
  %28 = sub nsw i64 %26, %27
  %29 = sub nsw i64 %p_0, %t1
  %30 = sub nsw i64 %29, 1
  %polly.fdiv_q.shr14 = ashr i64 %30, 4
  %31 = add nsw i64 %28, %polly.fdiv_q.shr14
  %32 = icmp slt i64 1, %31
  %33 = select i1 %32, i64 1, i64 %31
  br label %polly.loop_if15

polly.loop_exit18:                                ; preds = %polly.loop_exit35, %polly.loop_if15
  br label %polly.merge

polly.else:                                       ; preds = %polly.cond
  br label %polly.merge

polly.loop_if15:                                  ; preds = %polly.then
  %polly.loop_guard19 = icmp sle i64 0, %33
  br i1 %polly.loop_guard19, label %polly.loop_preheader17, label %polly.loop_exit18

polly.loop_header16:                              ; preds = %polly.loop_exit35, %polly.loop_preheader17
  %polly.indvar20 = phi i64 [ 0, %polly.loop_preheader17 ], [ %polly.indvar_next21, %polly.loop_exit35 ]
  br label %polly.cond23

polly.cond23:                                     ; preds = %polly.loop_header16
  %34 = icmp eq i64 %polly.indvar11, 0
  br i1 %34, label %polly.then25, label %polly.else26

polly.merge24:                                    ; preds = %polly.else26, %polly.stmt.if7
  %35 = mul nsw i64 32, %polly.indvar11
  %36 = sub nsw i64 %p_1, %35
  %37 = sub nsw i64 %36, 1
  %38 = icmp slt i64 31, %37
  %39 = select i1 %38, i64 31, i64 %37
  br label %polly.loop_if32

polly.loop_exit35:                                ; preds = %polly.stmt.if9, %polly.loop_if32
  %polly.indvar_next21 = add nsw i64 %polly.indvar20, 1
  %polly.loop_cond22 = icmp sle i64 %polly.indvar_next21, %33
  br i1 %polly.loop_cond22, label %polly.loop_header16, label %polly.loop_exit18

polly.loop_preheader17:                           ; preds = %polly.loop_if15
  br label %polly.loop_header16

polly.then25:                                     ; preds = %polly.cond23
  %40 = mul nsw i64 32, %b0
  %41 = add nsw i64 %40, %t0
  %42 = mul nsw i64 8192, %polly.indvar
  %43 = add nsw i64 %41, %42
  %44 = mul nsw i64 32, %b1
  %45 = add nsw i64 %44, %t1
  %46 = mul nsw i64 8192, %polly.indvar5
  %47 = add nsw i64 %45, %46
  %48 = mul nsw i64 16, %polly.indvar20
  %49 = add nsw i64 %47, %48
  br label %polly.stmt.if7

polly.stmt.if7:                                   ; preds = %polly.then25
  %polly.access.cast.MemRef0 = bitcast i8 addrspace(1)* %MemRef0 to i64 addrspace(1)*
  %50 = mul nsw i64 32, %b1
  %51 = add nsw i64 %50, %t1
  %52 = mul nsw i64 8192, %polly.indvar5
  %53 = add nsw i64 %51, %52
  %54 = mul nsw i64 16, %polly.indvar20
  %55 = add nsw i64 %53, %54
  %polly.access.mul.MemRef0 = mul nsw i64 %55, %p_3
  %56 = mul nsw i64 32, %b0
  %57 = add nsw i64 %56, %t0
  %58 = mul nsw i64 8192, %polly.indvar
  %59 = add nsw i64 %57, %58
  %polly.access.add.MemRef0 = add nsw i64 %polly.access.mul.MemRef0, %59
  %polly.access.MemRef0 = getelementptr i64, i64 addrspace(1)* %polly.access.cast.MemRef0, i64 %polly.access.add.MemRef0
  %_p_scalar_ = load i64, i64 addrspace(1)* %polly.access.MemRef0, align 8
  %p_ = mul i64 %_p_scalar_, %0, !dbg !1
  %polly.access.cast.MemRef027 = bitcast i8 addrspace(1)* %MemRef0 to i64 addrspace(1)*
  %60 = mul nsw i64 32, %b1
  %61 = add nsw i64 %60, %t1
  %62 = mul nsw i64 8192, %polly.indvar5
  %63 = add nsw i64 %61, %62
  %64 = mul nsw i64 16, %polly.indvar20
  %65 = add nsw i64 %63, %64
  %polly.access.mul.MemRef028 = mul nsw i64 %65, %p_3
  %66 = mul nsw i64 32, %b0
  %67 = add nsw i64 %66, %t0
  %68 = mul nsw i64 8192, %polly.indvar
  %69 = add nsw i64 %67, %68
  %polly.access.add.MemRef029 = add nsw i64 %polly.access.mul.MemRef028, %69
  %polly.access.MemRef030 = getelementptr i64, i64 addrspace(1)* %polly.access.cast.MemRef027, i64 %polly.access.add.MemRef029
  store i64 %p_, i64 addrspace(1)* %polly.access.MemRef030, align 8
  br label %polly.merge24

polly.else26:                                     ; preds = %polly.cond23
  br label %polly.merge24

polly.loop_if32:                                  ; preds = %polly.merge24
  %polly.loop_guard36 = icmp sle i64 0, %39
  br i1 %polly.loop_guard36, label %polly.loop_preheader34, label %polly.loop_exit35

polly.loop_header33:                              ; preds = %polly.stmt.if9, %polly.loop_preheader34
  %polly.indvar37 = phi i64 [ 0, %polly.loop_preheader34 ], [ %polly.indvar_next38, %polly.stmt.if9 ]
  %70 = mul nsw i64 32, %b0
  %71 = add nsw i64 %70, %t0
  %72 = mul nsw i64 8192, %polly.indvar
  %73 = add nsw i64 %71, %72
  %74 = mul nsw i64 32, %polly.indvar11
  %75 = add nsw i64 %74, %polly.indvar37
  %76 = mul nsw i64 32, %b1
  %77 = add nsw i64 %76, %t1
  %78 = mul nsw i64 8192, %polly.indvar5
  %79 = add nsw i64 %77, %78
  %80 = mul nsw i64 16, %polly.indvar20
  %81 = add nsw i64 %79, %80
  br label %polly.stmt.if9

polly.stmt.if9:                                   ; preds = %polly.loop_header33
  %polly.access.cast.MemRef040 = bitcast i8 addrspace(1)* %MemRef0 to i64 addrspace(1)*
  %82 = mul nsw i64 32, %b1
  %83 = add nsw i64 %82, %t1
  %84 = mul nsw i64 8192, %polly.indvar5
  %85 = add nsw i64 %83, %84
  %86 = mul nsw i64 16, %polly.indvar20
  %87 = add nsw i64 %85, %86
  %polly.access.mul.MemRef041 = mul nsw i64 %87, %p_3
  %88 = mul nsw i64 32, %b0
  %89 = add nsw i64 %88, %t0
  %90 = mul nsw i64 8192, %polly.indvar
  %91 = add nsw i64 %89, %90
  %polly.access.add.MemRef042 = add nsw i64 %polly.access.mul.MemRef041, %91
  %polly.access.MemRef043 = getelementptr i64, i64 addrspace(1)* %polly.access.cast.MemRef040, i64 %polly.access.add.MemRef042
  %_p_scalar_44 = load i64, i64 addrspace(1)* %polly.access.MemRef043, align 8
  %polly.access.cast.MemRef1 = bitcast i8 addrspace(1)* %MemRef1 to i64 addrspace(1)*
  %92 = mul nsw i64 32, %polly.indvar11
  %93 = add nsw i64 %92, %polly.indvar37
  %polly.access.mul.MemRef1 = mul nsw i64 %93, %p_2
  %94 = mul nsw i64 32, %b0
  %95 = add nsw i64 %94, %t0
  %96 = mul nsw i64 8192, %polly.indvar
  %97 = add nsw i64 %95, %96
  %polly.access.add.MemRef1 = add nsw i64 %polly.access.mul.MemRef1, %97
  %polly.access.MemRef1 = getelementptr i64, i64 addrspace(1)* %polly.access.cast.MemRef1, i64 %polly.access.add.MemRef1
  %_p_scalar_45 = load i64, i64 addrspace(1)* %polly.access.MemRef1, align 8
  %p_46 = mul i64 %_p_scalar_45, %1, !dbg !27
  %polly.access.cast.MemRef2 = bitcast i8 addrspace(1)* %MemRef2 to i64 addrspace(1)*
  %98 = mul nsw i64 32, %b1
  %99 = add nsw i64 %98, %t1
  %100 = mul nsw i64 8192, %polly.indvar5
  %101 = add nsw i64 %99, %100
  %102 = mul nsw i64 16, %polly.indvar20
  %103 = add nsw i64 %101, %102
  %polly.access.mul.MemRef2 = mul nsw i64 %103, %p_1
  %104 = mul nsw i64 32, %polly.indvar11
  %105 = add nsw i64 %104, %polly.indvar37
  %polly.access.add.MemRef2 = add nsw i64 %polly.access.mul.MemRef2, %105
  %polly.access.MemRef2 = getelementptr i64, i64 addrspace(1)* %polly.access.cast.MemRef2, i64 %polly.access.add.MemRef2
  %_p_scalar_47 = load i64, i64 addrspace(1)* %polly.access.MemRef2, align 8
  %p_48 = mul i64 %p_46, %_p_scalar_47, !dbg !27
  %p_49 = add i64 %p_48, %_p_scalar_44, !dbg !27
  %polly.access.cast.MemRef050 = bitcast i8 addrspace(1)* %MemRef0 to i64 addrspace(1)*
  %106 = mul nsw i64 32, %b1
  %107 = add nsw i64 %106, %t1
  %108 = mul nsw i64 8192, %polly.indvar5
  %109 = add nsw i64 %107, %108
  %110 = mul nsw i64 16, %polly.indvar20
  %111 = add nsw i64 %109, %110
  %polly.access.mul.MemRef051 = mul nsw i64 %111, %p_3
  %112 = mul nsw i64 32, %b0
  %113 = add nsw i64 %112, %t0
  %114 = mul nsw i64 8192, %polly.indvar
  %115 = add nsw i64 %113, %114
  %polly.access.add.MemRef052 = add nsw i64 %polly.access.mul.MemRef051, %115
  %polly.access.MemRef053 = getelementptr i64, i64 addrspace(1)* %polly.access.cast.MemRef050, i64 %polly.access.add.MemRef052
  store i64 %p_49, i64 addrspace(1)* %polly.access.MemRef053, align 8
  %polly.indvar_next38 = add nsw i64 %polly.indvar37, 1
  %polly.loop_cond39 = icmp sle i64 %polly.indvar_next38, %39
  br i1 %polly.loop_cond39, label %polly.loop_header33, label %polly.loop_exit35

polly.loop_preheader34:                           ; preds = %polly.loop_if32
  br label %polly.loop_header33

polly.then58:                                     ; preds = %polly.cond56
  br label %polly.cond60

polly.cond60:                                     ; preds = %polly.then58
  %116 = mul nsw i64 32, %b0
  %117 = add nsw i64 %116, %t0
  %118 = mul nsw i64 8192, %polly.indvar
  %119 = add nsw i64 %117, %118
  %120 = add nsw i64 %119, 1
  %121 = icmp sge i64 %p_2, %120
  br i1 %121, label %polly.then62, label %polly.else63

polly.merge61:                                    ; preds = %polly.else63, %polly.loop_exit68
  call void @llvm.nvvm.barrier0()
  br label %polly.merge57

polly.else59:                                     ; preds = %polly.cond56
  br label %polly.merge57

polly.then62:                                     ; preds = %polly.cond60
  %122 = mul nsw i64 -2, %b1
  %123 = mul nsw i64 512, %polly.indvar5
  %124 = sub nsw i64 %122, %123
  %125 = sub nsw i64 %p_0, %t1
  %126 = sub nsw i64 %125, 1
  %polly.fdiv_q.shr64 = ashr i64 %126, 4
  %127 = add nsw i64 %124, %polly.fdiv_q.shr64
  %128 = icmp slt i64 1, %127
  %129 = select i1 %128, i64 1, i64 %127
  br label %polly.loop_if65

polly.loop_exit68:                                ; preds = %polly.stmt.if773, %polly.loop_if65
  br label %polly.merge61

polly.else63:                                     ; preds = %polly.cond60
  br label %polly.merge61

polly.loop_if65:                                  ; preds = %polly.then62
  %polly.loop_guard69 = icmp sle i64 0, %129
  br i1 %polly.loop_guard69, label %polly.loop_preheader67, label %polly.loop_exit68

polly.loop_header66:                              ; preds = %polly.stmt.if773, %polly.loop_preheader67
  %polly.indvar70 = phi i64 [ 0, %polly.loop_preheader67 ], [ %polly.indvar_next71, %polly.stmt.if773 ]
  %130 = mul nsw i64 32, %b0
  %131 = add nsw i64 %130, %t0
  %132 = mul nsw i64 8192, %polly.indvar
  %133 = add nsw i64 %131, %132
  %134 = mul nsw i64 32, %b1
  %135 = add nsw i64 %134, %t1
  %136 = mul nsw i64 8192, %polly.indvar5
  %137 = add nsw i64 %135, %136
  %138 = mul nsw i64 16, %polly.indvar70
  %139 = add nsw i64 %137, %138
  br label %polly.stmt.if773

polly.stmt.if773:                                 ; preds = %polly.loop_header66
  %polly.access.cast.MemRef074 = bitcast i8 addrspace(1)* %MemRef0 to i64 addrspace(1)*
  %140 = mul nsw i64 32, %b1
  %141 = add nsw i64 %140, %t1
  %142 = mul nsw i64 8192, %polly.indvar5
  %143 = add nsw i64 %141, %142
  %144 = mul nsw i64 16, %polly.indvar70
  %145 = add nsw i64 %143, %144
  %polly.access.mul.MemRef075 = mul nsw i64 %145, %p_3
  %146 = mul nsw i64 32, %b0
  %147 = add nsw i64 %146, %t0
  %148 = mul nsw i64 8192, %polly.indvar
  %149 = add nsw i64 %147, %148
  %polly.access.add.MemRef076 = add nsw i64 %polly.access.mul.MemRef075, %149
  %polly.access.MemRef077 = getelementptr i64, i64 addrspace(1)* %polly.access.cast.MemRef074, i64 %polly.access.add.MemRef076
  %_p_scalar_78 = load i64, i64 addrspace(1)* %polly.access.MemRef077, align 8
  %p_79 = mul i64 %_p_scalar_78, %0, !dbg !1
  %polly.access.cast.MemRef080 = bitcast i8 addrspace(1)* %MemRef0 to i64 addrspace(1)*
  %150 = mul nsw i64 32, %b1
  %151 = add nsw i64 %150, %t1
  %152 = mul nsw i64 8192, %polly.indvar5
  %153 = add nsw i64 %151, %152
  %154 = mul nsw i64 16, %polly.indvar70
  %155 = add nsw i64 %153, %154
  %polly.access.mul.MemRef081 = mul nsw i64 %155, %p_3
  %156 = mul nsw i64 32, %b0
  %157 = add nsw i64 %156, %t0
  %158 = mul nsw i64 8192, %polly.indvar
  %159 = add nsw i64 %157, %158
  %polly.access.add.MemRef082 = add nsw i64 %polly.access.mul.MemRef081, %159
  %polly.access.MemRef083 = getelementptr i64, i64 addrspace(1)* %polly.access.cast.MemRef080, i64 %polly.access.add.MemRef082
  store i64 %p_79, i64 addrspace(1)* %polly.access.MemRef083, align 8
  %polly.indvar_next71 = add nsw i64 %polly.indvar70, 1
  %polly.loop_cond72 = icmp sle i64 %polly.indvar_next71, %129
  br i1 %polly.loop_cond72, label %polly.loop_header66, label %polly.loop_exit68

polly.loop_preheader67:                           ; preds = %polly.loop_if65
  br label %polly.loop_header66
}

; Function Attrs: nounwind readnone
declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() #1

; Function Attrs: nounwind readnone
declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.y() #1

; Function Attrs: nounwind readnone
declare i32 @llvm.nvvm.read.ptx.sreg.tid.x() #1

; Function Attrs: nounwind readnone
declare i32 @llvm.nvvm.read.ptx.sreg.tid.y() #1

; Function Attrs: convergent nounwind
declare void @llvm.nvvm.barrier0() #2

attributes #0 = { "polly.skip.fn" }
attributes #1 = { nounwind readnone }
attributes #2 = { convergent nounwind }

!nvvm.annotations = !{!0}

!0 = !{void (i8 addrspace(1)*, i8 addrspace(1)*, i8 addrspace(1)*, i64, i64, i64, i64, i64, i64)* @kernel_0, !"maxntidx", i32 32, !"maxntidy", i32 16, !"maxntidz", i32 1}
!1 = !DILocation(line: 6, scope: !2)
!2 = distinct !DISubprogram(name: "kernel_gemm", linkageName: "julia_kernel_gemm_66058", scope: null, file: !3, type: !4, isLocal: false, isDefinition: true, isOptimized: true, unit: !11, variables: !13)
!3 = !DIFile(filename: "REPL[1]", directory: ".")
!4 = !DISubroutineType(types: !5)
!5 = !{!6, !6, !7, !7, !7}
!6 = !DIBasicType(name: "Int64", size: 64, encoding: DW_ATE_unsigned)
!7 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !8, size: 64, align: 64)
!8 = !DICompositeType(tag: DW_TAG_structure_type, name: "jl_value_t", file: !9, line: 71, align: 64, elements: !10)
!9 = !DIFile(filename: "julia.h", directory: "")
!10 = !{!7}
!11 = distinct !DICompileUnit(language: DW_LANG_C89, file: !3, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !12)
!12 = !{}
!13 = !{!14, !16, !17, !18, !19, !20, !21, !21, !22, !23, !24, !25, !26}
!14 = !DILocalVariable(name: "#self#", arg: 1, scope: !2, file: !3, line: 2, type: !15)
!15 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "#kernel_gemm", align: 8, elements: !12, runtimeLang: DW_LANG_Julia, identifier: "#kernel_gemm_65995")
!16 = !DILocalVariable(name: "alpha", arg: 2, scope: !2, file: !3, line: 2, type: !6)
!17 = !DILocalVariable(name: "beta", arg: 3, scope: !2, file: !3, line: 2, type: !6)
!18 = !DILocalVariable(name: "C", arg: 4, scope: !2, file: !3, line: 2, type: !7)
!19 = !DILocalVariable(name: "A", arg: 5, scope: !2, file: !3, line: 2, type: !7)
!20 = !DILocalVariable(name: "B", arg: 6, scope: !2, file: !3, line: 2, type: !7)
!21 = !DILocalVariable(name: "j", scope: !2, file: !3, line: 2, type: !6)
!22 = !DILocalVariable(name: "k", scope: !2, file: !3, line: 2, type: !6)
!23 = !DILocalVariable(name: "i", scope: !2, file: !3, line: 2, type: !6)
!24 = !DILocalVariable(name: "ni", scope: !2, file: !3, line: 2, type: !6)
!25 = !DILocalVariable(name: "nk", scope: !2, file: !3, line: 2, type: !6)
!26 = !DILocalVariable(name: "nj", scope: !2, file: !3, line: 2, type: !6)
!27 = !DILocation(line: 9, scope: !2)

The error message from verifyModule :
DICompileUnit not listed in llvm.dbg.cu
!11 = distinct !DICompileUnit(language: DW_LANG_C89, file: !3, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !12)

The meta data seems to about variables in the original function.


@bollu that's the file I've been using to test this patch and D32894.

lib/CodeGen/PPCGCodeGeneration.cpp
1117 ↗(On Diff #99867)

I tried both,

return  !(F && F->isIntrinsic()) || !polly::isIgnoredIntrinsic(V);

and

return  !polly::isIgnoredIntrinsic(V) || !(F && F->isIntrinsic());

but these didn't prevent the "store void (metadata,... polly_launch_0_param_7" instruction from the IR. @bollu @grosser @Meinersbur thoughts ?

bollu added inline comments.May 30 2017, 1:20 AM
lib/CodeGen/PPCGCodeGeneration.cpp
1117 ↗(On Diff #99867)

I don't believe either of those will help, since the function being called will be a store. You'll probably need to loop over the parameters and check if they are debug information or something like that. I don't know how debug metadata works in LLVM.

bollu updated this revision to Diff 100694.May 30 2017, 5:01 AM
  • Check if Scop has function pointers. If it does, bail out of PPCGCodeGen.
bollu added a comment.May 31 2017, 7:37 AM

@Meinersbur: I added the bail-out code in PPCGCodeGeneration. Could you take a look?

lib/CodeGen/PPCGCodeGeneration.cpp
1117 ↗(On Diff #99867)

Sorry, correction: ofc store is not a function, don't know what I was thinking when I wrote that.

grosser edited edge metadata.Jun 12 2017, 2:30 AM

Hi Siddharth,

your last patch makes perfect sense, but is in fact something different than what this patch review initially started off with. As it is safe and correct to always bail out, I suggest to commit your latest patch independently (with the change I request). We can then continue with white-listing functions.

Michael is ATM on a summer school and likely won't be able to reply. I OK this for now as this is trivial and likely good to go. Please provide (post-commit) comments if we missed something.

lib/CodeGen/PPCGCodeGeneration.cpp
2716 ↗(On Diff #100694)

try -> trying

test/GPGPU/intrinsic-not-copied-to-kernel.ll
13 ↗(On Diff #100694)

Hi Siddharth,

could you use a function that is not an intrinsic which we will soon whitelist anyhow? E.g., just an externally defined function where only a declaration is available?

bollu updated this revision to Diff 102164.Jun 12 2017, 2:59 AM
  • Change test case to use a declared global function and not a known intrinsic.
  • [NFC] grammar fix.
grosser added inline comments.Jun 12 2017, 3:00 AM
test/GPGPU/intrinsic-not-copied-to-kernel.ll
76 ↗(On Diff #102164)

Do we even create a scop if the function is not marked as 'readnone'?

bollu updated this revision to Diff 102166.Jun 12 2017, 3:02 AM
  • [NFC] Slightly simplify testcase, change name to better fit intent.
bollu updated this revision to Diff 102167.Jun 12 2017, 3:11 AM
  • Check that we model the kernel in the testcase as a scop.
bollu added a comment.Jun 12 2017, 3:12 AM

@grosser Thank you, I didn't notice that we don't model it if there is no readnone. Added a check to make sure that we model the scop.

test/GPGPU/unknown-fn-call-not-copied-into-kernel.ll
9 ↗(On Diff #102167)

This should check that we actually model the scop.

82 ↗(On Diff #102167)

The attributes should now mark it as readnone.

This is better.

bollu added a comment.Jun 12 2017, 3:53 AM

is that an LGTM? :)

bollu updated this revision to Diff 102171.Jun 12 2017, 4:19 AM
  • [NFC] Remove extra attributes.
This revision was automatically updated to reflect the committed changes.
singam-sanjay added inline comments.Jun 12 2017, 12:43 PM
polly/trunk/lib/CodeGen/PPCGCodeGeneration.cpp
2719

Does this mean PPCGCodeGen will bail out if there are debug intrinsics ?

bollu added inline comments.Jun 13 2017, 6:28 AM
polly/trunk/lib/CodeGen/PPCGCodeGeneration.cpp
2719

yes, it will. However, in D34145, I'm allowing intrinsics again. We should probably move the discussion about debug intrinsics there. Is that fair?