Details
- Reviewers
efriedma jdoerfert Meinersbur gareevroman sebpop • zinob huihuiz pollydev grosser - Commits
- rGbccaea57c00d: [Polly] [PPCGCodeGeneration] Skip Scops which contain function pointers.
rPLO305185: [Polly] [PPCGCodeGeneration] Skip Scops which contain function pointers.
rL305185: [Polly] [PPCGCodeGeneration] Skip Scops which contain function pointers.
Diff Detail
- Build Status
Buildable 7149 Build 7149: arc lint + arc unit
Event Timeline
Summary missing.
What is supposed to happen (after this patch) if the pointer to a function (or even to an intrisic) is used, e.g. store into an array?
lib/CodeGen/PPCGCodeGeneration.cpp | ||
---|---|---|
1112 | Drop "actually" | |
1115 | Function names start with lower case letters. | |
1149 | Don't use the name It for non-iterators. make_filter_range returns a range (pair of begin and end iterator). | |
test/GPGPU/intrinsic-not-copied-to-kernel.ll | ||
2 | Nit: double space |
@Meinersbur: I don't think it was ever legal to take the address of an intrinsic. The LLVM IRVerifier actively checks against this.
Assert( !F->isIntrinsic() || i == (isa<CallInst>(I) ? e - 1 : isa<InvokeInst>(I) ? e - 3 : 0), "Cannot take the address of an intrinsic!", &I); Assert( !F->isIntrinsic() || isa<CallInst>(I) || F->getIntrinsicID() == Intrinsic::donothing || F->getIntrinsicID() == Intrinsic::coro_resume || F->getIntrinsicID() == Intrinsic::coro_destroy || F->getIntrinsicID() == Intrinsic::experimental_patchpoint_void || F->getIntrinsicID() == Intrinsic::experimental_patchpoint_i64 || F->getIntrinsicID() == Intrinsic::experimental_gc_statepoint, "Cannot invoke an intrinsic other than donothing, patchpoint, " "statepoint, coro_resume or coro_destroy", &I);
I believe this patch covers only a special case of a more general problem. The pointer to the intrinsic is added in
static int findReferencesInBlock(struct SubtreeReferences &References, const ScopStmt *Stmt, const BasicBlock *BB) { for (const Instruction &Inst : *BB) for (Value *SrcVal : Inst.operands()) { auto *Scope = References.LI.getLoopFor(BB); if (canSynthesize(SrcVal, References.S, &References.SE, Scope)) { References.SCEVs.insert(References.SE.getSCEVAtScope(SrcVal, Scope));
because it is used as an operand of a call instruction. In general, we cannot call host functions from offloaded kernels. In this case, it might be ok because intrinsics are lowered by the backend, so it is ok. How does Polly-ACC determine whether ok call is ok and when it is not? Is there a whitelist? What about memcpy, memmove and memset?
If there is indeed a filter before, the patch LGTM.
Please use consistent case in your title: "Do not allow LLVM intrinsics to be tracked as subtree values."
lib/CodeGen/PPCGCodeGeneration.cpp | ||
---|---|---|
1112 | double space |
lib/CodeGen/PPCGCodeGeneration.cpp | ||
---|---|---|
1114 | Please add some detail about how this can happen. In this case: Each operand of call instruction, which includes the function to be called, added to references. |
Does this remove debug intrinsics as well ? If it does, what happens to the meta data at the end of the IR output ( !DICompileUnits, !llvm.dbg.cu = {!11,...}, etc. ) ?
@Meinersbur Is the pointer added in the nested for ?
for (Value *SrcVal : Inst.operands()) { auto *Scope = References.LI.getLoopFor(BB); if (canSynthesize(SrcVal, References.S, &References.SE, Scope)) { References.SCEVs.insert(References.SE.getSCEVAtScope(SrcVal, Scope)); //<- This line right ?
How does adding this pointer affect the subsequent code-generation ?
@bollu @Meinersbur What was the motivation behind this patch ?
@singam-sanjay: This was to initially to fix crashes in cases where intrinsics are called from a kernel. We currently try to take the function pointer of an intrinsic and offload it to the kernel (which obviously fails, because an intrinsic "does not really exist"). After discussing this with @Meinersbur, he pointed out that it doesn't make much sense to allow most kinds of function pointers as well, since a call from a function pointer in the kernel maybe to the host, will would not work.
Thanks for the comment @bollu !
So, the code below,
References.SCEVs.insert(References.SE.getSCEVAtScope(SrcVal, Scope));
Adds all references that the kernel IR makes, when it was a part of the original IR. Therefore, this data has to be made available to the kernel and is passed as a polly_launch_param_#. When a reference happens to be to an intrinsic, which carries function pointers, we get buggy IR. Is my understanding correct ?
Given that ppcg might modify the schedule of the IR, I'm curious as to where the intrinsic would be placed in the modified IR. Please share your thoughts. @bollu @grosser @Meinersbur
@Meinersbur: doesn't "value" refer to LLVM::Value? that's why I capitalised it. If that doesn't fit with the naming convention, I'll change it.
Intrinsics are available globally everywhere. One can use them directly without "passing" them around. However, depending on the backend, it might not be able to lower it to the architecture (e.g. an AVX intrinisic on NVidia hardware).
Are you referring to the title?
- "Subtree" is capitalized, but there is no type "Subtree"
- Neither is there one named "Values".
- The first letter ("do") is also not properly capitalized, giving me the impression that capitalization is rather random than intended.
- It is unclear that you mean an actual type. Use "llvm::Value" for that (there is an alternative to put it into ticks: "Values", but it is just Phabricator which formats that in typewriter font then, but not the git command line tools). Or just use the English word "value" since it should be clear that values in LLVM are represented by the type "llvm::Value".
@Meinersbur: So, as discussed, I'll skip Scops that try to refer to function pointers / intrinsics from being detected? This will need changes in ScopDetection, unless I misunderstood our discussion?
@singam-sanjay: I don't think I understand the question. Specifically, ".. I'm curious as to where the intrinsic would be placed in the modified IR".
For example, when given this:
static const int N = 1000; void f(float A[N][N], int n, float B[N][N]) { for(int i = 0; i < n; i++) { for(int j = 0; j < n; j++) { B[i][j] = pow(A[i][j], 3); } } }
in the generated IR, it tries to acquire a function pointer to the instruction llvm.pow.f64 in the kernel offload stuff.
That's a good question. You can do it in ScopDetection, but then ScopDetection must know what the target will be for it to know which intrinsics/functions are available (Son't want to disallow functions/intriniscs when compiling to host CPU). It would be nice if ScopDetection stays target-independent.
The other possibility is it scan for called functions/intrinsics in PPCGCodeGeneration and bail out there if one is found that cannot be mapped to a GPU-equivalent.
I asked this question with the debug intrinsics in mind (llvm.dbg.declare, llvm.dbg.value).
For example, when given this:
program.cstatic const int N = 1000; void f(float A[N][N], int n, float B[N][N]) { for(int i = 0; i < n; i++) { for(int j = 0; j < n; j++) { B[i][j] = pow(A[i][j], 3); } } }in the generated IR, it tries to acquire a function pointer to the instruction llvm.pow.f64 in the kernel offload stuff.
Unlike llvm.pow.f64, the debug intrinsics don't perform any useful operation. So I was curious about where it would be placed in the kernel IR.
If exponentiation is represented as an intrinsic, how can you afford to ignore such intrinsics which perform useful operations which could alter the data flow or the control flow of the program ?
@grosser @Meinersbur (Context D32894) This patch was able to remove the occurrences of debug intrinsics in the kernel code and the following error,
Cannot invoke an intrinsic other than donothing, patchpoint, statepoint, coro_resume or coro_destroy store void (metadata, i64, metadata, metadata)* @llvm.dbg.value, void (metadata, i64, metadata, metadata)** %polly_launch_0_param_7 LLVM ERROR: Broken function found, compilation aborted!
But, somehow debug meta data slipped through,
; ModuleID = 'kernel_0' source_filename = "kernel_0" target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64" target triple = "nvptx64-nvidia-cuda" define ptx_kernel void @kernel_0(i8 addrspace(1)* %MemRef0, i8 addrspace(1)* %MemRef1, i8 addrspace(1)* %MemRef2, i64 %p_0, i64 %p_1, i64 %p_2, i64 %p_3, i64, i64) #0 { entry: %2 = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() %b0 = zext i32 %2 to i64 %3 = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.y() %b1 = zext i32 %3 to i64 %4 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() %t0 = zext i32 %4 to i64 %5 = call i32 @llvm.nvvm.read.ptx.sreg.tid.y() %t1 = zext i32 %5 to i64 %6 = mul nsw i64 32, %b0 %7 = sub nsw i64 %p_2, %6 %8 = sub nsw i64 %7, 1 %pexp.p_div_q = udiv i64 %8, 8192 br label %polly.loop_preheader polly.loop_exit: ; preds = %polly.loop_exit4 ret void polly.loop_header: ; preds = %polly.loop_exit4, %polly.loop_preheader %polly.indvar = phi i64 [ 0, %polly.loop_preheader ], [ %polly.indvar_next, %polly.loop_exit4 ] %9 = mul nsw i64 32, %b1 %10 = sub nsw i64 %p_0, %9 %11 = sub nsw i64 %10, 1 %pexp.p_div_q1 = udiv i64 %11, 8192 br label %polly.loop_preheader3 polly.loop_exit4: ; preds = %polly.merge57 %polly.indvar_next = add nsw i64 %polly.indvar, 1 %polly.loop_cond = icmp sle i64 %polly.indvar_next, %pexp.p_div_q br i1 %polly.loop_cond, label %polly.loop_header, label %polly.loop_exit polly.loop_preheader: ; preds = %entry %12 = icmp sgt i64 %p_0, 0 %smax = select i1 %12, i64 %p_0, i64 0 %13 = add i64 %smax, 1 %14 = icmp sgt i64 %p_0, 0 %smax54 = select i1 %14, i64 %p_0, i64 0 %15 = add i64 %smax54, 1 %16 = icmp sgt i64 %p_0, 0 %smax84 = select i1 %16, i64 %p_0, i64 0 %17 = add i64 %smax84, 1 br label %polly.loop_header polly.loop_header2: ; preds = %polly.merge57, %polly.loop_preheader3 %polly.indvar5 = phi i64 [ 0, %polly.loop_preheader3 ], [ %polly.indvar_next6, %polly.merge57 ] %18 = sub nsw i64 %p_1, 1 %polly.fdiv_q.shr = ashr i64 %18, 5 br label %polly.loop_if polly.loop_exit10: ; preds = %polly.merge, %polly.loop_if br label %polly.cond56 polly.cond56: ; preds = %polly.loop_exit10 %19 = icmp sle i64 %p_1, 0 br i1 %19, label %polly.then58, label %polly.else59 polly.merge57: ; preds = %polly.else59, %polly.merge61 %polly.indvar_next6 = add nsw i64 %polly.indvar5, 1 %polly.loop_cond7 = icmp sle i64 %polly.indvar_next6, %pexp.p_div_q1 br i1 %polly.loop_cond7, label %polly.loop_header2, label %polly.loop_exit4 polly.loop_preheader3: ; preds = %polly.loop_header br label %polly.loop_header2 polly.loop_if: ; preds = %polly.loop_header2 %polly.loop_guard = icmp sle i64 0, %polly.fdiv_q.shr br i1 %polly.loop_guard, label %polly.loop_preheader9, label %polly.loop_exit10 polly.loop_header8: ; preds = %polly.merge, %polly.loop_preheader9 %polly.indvar11 = phi i64 [ 0, %polly.loop_preheader9 ], [ %polly.indvar_next12, %polly.merge ] br label %polly.cond polly.cond: ; preds = %polly.loop_header8 %20 = mul nsw i64 32, %b0 %21 = add nsw i64 %20, %t0 %22 = mul nsw i64 8192, %polly.indvar %23 = add nsw i64 %21, %22 %24 = add nsw i64 %23, 1 %25 = icmp sge i64 %p_2, %24 br i1 %25, label %polly.then, label %polly.else polly.merge: ; preds = %polly.else, %polly.loop_exit18 call void @llvm.nvvm.barrier0() %polly.indvar_next12 = add nsw i64 %polly.indvar11, 1 %polly.loop_cond13 = icmp sle i64 %polly.indvar_next12, %polly.fdiv_q.shr br i1 %polly.loop_cond13, label %polly.loop_header8, label %polly.loop_exit10 polly.loop_preheader9: ; preds = %polly.loop_if br label %polly.loop_header8 polly.then: ; preds = %polly.cond %26 = mul nsw i64 -2, %b1 %27 = mul nsw i64 512, %polly.indvar5 %28 = sub nsw i64 %26, %27 %29 = sub nsw i64 %p_0, %t1 %30 = sub nsw i64 %29, 1 %polly.fdiv_q.shr14 = ashr i64 %30, 4 %31 = add nsw i64 %28, %polly.fdiv_q.shr14 %32 = icmp slt i64 1, %31 %33 = select i1 %32, i64 1, i64 %31 br label %polly.loop_if15 polly.loop_exit18: ; preds = %polly.loop_exit35, %polly.loop_if15 br label %polly.merge polly.else: ; preds = %polly.cond br label %polly.merge polly.loop_if15: ; preds = %polly.then %polly.loop_guard19 = icmp sle i64 0, %33 br i1 %polly.loop_guard19, label %polly.loop_preheader17, label %polly.loop_exit18 polly.loop_header16: ; preds = %polly.loop_exit35, %polly.loop_preheader17 %polly.indvar20 = phi i64 [ 0, %polly.loop_preheader17 ], [ %polly.indvar_next21, %polly.loop_exit35 ] br label %polly.cond23 polly.cond23: ; preds = %polly.loop_header16 %34 = icmp eq i64 %polly.indvar11, 0 br i1 %34, label %polly.then25, label %polly.else26 polly.merge24: ; preds = %polly.else26, %polly.stmt.if7 %35 = mul nsw i64 32, %polly.indvar11 %36 = sub nsw i64 %p_1, %35 %37 = sub nsw i64 %36, 1 %38 = icmp slt i64 31, %37 %39 = select i1 %38, i64 31, i64 %37 br label %polly.loop_if32 polly.loop_exit35: ; preds = %polly.stmt.if9, %polly.loop_if32 %polly.indvar_next21 = add nsw i64 %polly.indvar20, 1 %polly.loop_cond22 = icmp sle i64 %polly.indvar_next21, %33 br i1 %polly.loop_cond22, label %polly.loop_header16, label %polly.loop_exit18 polly.loop_preheader17: ; preds = %polly.loop_if15 br label %polly.loop_header16 polly.then25: ; preds = %polly.cond23 %40 = mul nsw i64 32, %b0 %41 = add nsw i64 %40, %t0 %42 = mul nsw i64 8192, %polly.indvar %43 = add nsw i64 %41, %42 %44 = mul nsw i64 32, %b1 %45 = add nsw i64 %44, %t1 %46 = mul nsw i64 8192, %polly.indvar5 %47 = add nsw i64 %45, %46 %48 = mul nsw i64 16, %polly.indvar20 %49 = add nsw i64 %47, %48 br label %polly.stmt.if7 polly.stmt.if7: ; preds = %polly.then25 %polly.access.cast.MemRef0 = bitcast i8 addrspace(1)* %MemRef0 to i64 addrspace(1)* %50 = mul nsw i64 32, %b1 %51 = add nsw i64 %50, %t1 %52 = mul nsw i64 8192, %polly.indvar5 %53 = add nsw i64 %51, %52 %54 = mul nsw i64 16, %polly.indvar20 %55 = add nsw i64 %53, %54 %polly.access.mul.MemRef0 = mul nsw i64 %55, %p_3 %56 = mul nsw i64 32, %b0 %57 = add nsw i64 %56, %t0 %58 = mul nsw i64 8192, %polly.indvar %59 = add nsw i64 %57, %58 %polly.access.add.MemRef0 = add nsw i64 %polly.access.mul.MemRef0, %59 %polly.access.MemRef0 = getelementptr i64, i64 addrspace(1)* %polly.access.cast.MemRef0, i64 %polly.access.add.MemRef0 %_p_scalar_ = load i64, i64 addrspace(1)* %polly.access.MemRef0, align 8 %p_ = mul i64 %_p_scalar_, %0, !dbg !1 %polly.access.cast.MemRef027 = bitcast i8 addrspace(1)* %MemRef0 to i64 addrspace(1)* %60 = mul nsw i64 32, %b1 %61 = add nsw i64 %60, %t1 %62 = mul nsw i64 8192, %polly.indvar5 %63 = add nsw i64 %61, %62 %64 = mul nsw i64 16, %polly.indvar20 %65 = add nsw i64 %63, %64 %polly.access.mul.MemRef028 = mul nsw i64 %65, %p_3 %66 = mul nsw i64 32, %b0 %67 = add nsw i64 %66, %t0 %68 = mul nsw i64 8192, %polly.indvar %69 = add nsw i64 %67, %68 %polly.access.add.MemRef029 = add nsw i64 %polly.access.mul.MemRef028, %69 %polly.access.MemRef030 = getelementptr i64, i64 addrspace(1)* %polly.access.cast.MemRef027, i64 %polly.access.add.MemRef029 store i64 %p_, i64 addrspace(1)* %polly.access.MemRef030, align 8 br label %polly.merge24 polly.else26: ; preds = %polly.cond23 br label %polly.merge24 polly.loop_if32: ; preds = %polly.merge24 %polly.loop_guard36 = icmp sle i64 0, %39 br i1 %polly.loop_guard36, label %polly.loop_preheader34, label %polly.loop_exit35 polly.loop_header33: ; preds = %polly.stmt.if9, %polly.loop_preheader34 %polly.indvar37 = phi i64 [ 0, %polly.loop_preheader34 ], [ %polly.indvar_next38, %polly.stmt.if9 ] %70 = mul nsw i64 32, %b0 %71 = add nsw i64 %70, %t0 %72 = mul nsw i64 8192, %polly.indvar %73 = add nsw i64 %71, %72 %74 = mul nsw i64 32, %polly.indvar11 %75 = add nsw i64 %74, %polly.indvar37 %76 = mul nsw i64 32, %b1 %77 = add nsw i64 %76, %t1 %78 = mul nsw i64 8192, %polly.indvar5 %79 = add nsw i64 %77, %78 %80 = mul nsw i64 16, %polly.indvar20 %81 = add nsw i64 %79, %80 br label %polly.stmt.if9 polly.stmt.if9: ; preds = %polly.loop_header33 %polly.access.cast.MemRef040 = bitcast i8 addrspace(1)* %MemRef0 to i64 addrspace(1)* %82 = mul nsw i64 32, %b1 %83 = add nsw i64 %82, %t1 %84 = mul nsw i64 8192, %polly.indvar5 %85 = add nsw i64 %83, %84 %86 = mul nsw i64 16, %polly.indvar20 %87 = add nsw i64 %85, %86 %polly.access.mul.MemRef041 = mul nsw i64 %87, %p_3 %88 = mul nsw i64 32, %b0 %89 = add nsw i64 %88, %t0 %90 = mul nsw i64 8192, %polly.indvar %91 = add nsw i64 %89, %90 %polly.access.add.MemRef042 = add nsw i64 %polly.access.mul.MemRef041, %91 %polly.access.MemRef043 = getelementptr i64, i64 addrspace(1)* %polly.access.cast.MemRef040, i64 %polly.access.add.MemRef042 %_p_scalar_44 = load i64, i64 addrspace(1)* %polly.access.MemRef043, align 8 %polly.access.cast.MemRef1 = bitcast i8 addrspace(1)* %MemRef1 to i64 addrspace(1)* %92 = mul nsw i64 32, %polly.indvar11 %93 = add nsw i64 %92, %polly.indvar37 %polly.access.mul.MemRef1 = mul nsw i64 %93, %p_2 %94 = mul nsw i64 32, %b0 %95 = add nsw i64 %94, %t0 %96 = mul nsw i64 8192, %polly.indvar %97 = add nsw i64 %95, %96 %polly.access.add.MemRef1 = add nsw i64 %polly.access.mul.MemRef1, %97 %polly.access.MemRef1 = getelementptr i64, i64 addrspace(1)* %polly.access.cast.MemRef1, i64 %polly.access.add.MemRef1 %_p_scalar_45 = load i64, i64 addrspace(1)* %polly.access.MemRef1, align 8 %p_46 = mul i64 %_p_scalar_45, %1, !dbg !27 %polly.access.cast.MemRef2 = bitcast i8 addrspace(1)* %MemRef2 to i64 addrspace(1)* %98 = mul nsw i64 32, %b1 %99 = add nsw i64 %98, %t1 %100 = mul nsw i64 8192, %polly.indvar5 %101 = add nsw i64 %99, %100 %102 = mul nsw i64 16, %polly.indvar20 %103 = add nsw i64 %101, %102 %polly.access.mul.MemRef2 = mul nsw i64 %103, %p_1 %104 = mul nsw i64 32, %polly.indvar11 %105 = add nsw i64 %104, %polly.indvar37 %polly.access.add.MemRef2 = add nsw i64 %polly.access.mul.MemRef2, %105 %polly.access.MemRef2 = getelementptr i64, i64 addrspace(1)* %polly.access.cast.MemRef2, i64 %polly.access.add.MemRef2 %_p_scalar_47 = load i64, i64 addrspace(1)* %polly.access.MemRef2, align 8 %p_48 = mul i64 %p_46, %_p_scalar_47, !dbg !27 %p_49 = add i64 %p_48, %_p_scalar_44, !dbg !27 %polly.access.cast.MemRef050 = bitcast i8 addrspace(1)* %MemRef0 to i64 addrspace(1)* %106 = mul nsw i64 32, %b1 %107 = add nsw i64 %106, %t1 %108 = mul nsw i64 8192, %polly.indvar5 %109 = add nsw i64 %107, %108 %110 = mul nsw i64 16, %polly.indvar20 %111 = add nsw i64 %109, %110 %polly.access.mul.MemRef051 = mul nsw i64 %111, %p_3 %112 = mul nsw i64 32, %b0 %113 = add nsw i64 %112, %t0 %114 = mul nsw i64 8192, %polly.indvar %115 = add nsw i64 %113, %114 %polly.access.add.MemRef052 = add nsw i64 %polly.access.mul.MemRef051, %115 %polly.access.MemRef053 = getelementptr i64, i64 addrspace(1)* %polly.access.cast.MemRef050, i64 %polly.access.add.MemRef052 store i64 %p_49, i64 addrspace(1)* %polly.access.MemRef053, align 8 %polly.indvar_next38 = add nsw i64 %polly.indvar37, 1 %polly.loop_cond39 = icmp sle i64 %polly.indvar_next38, %39 br i1 %polly.loop_cond39, label %polly.loop_header33, label %polly.loop_exit35 polly.loop_preheader34: ; preds = %polly.loop_if32 br label %polly.loop_header33 polly.then58: ; preds = %polly.cond56 br label %polly.cond60 polly.cond60: ; preds = %polly.then58 %116 = mul nsw i64 32, %b0 %117 = add nsw i64 %116, %t0 %118 = mul nsw i64 8192, %polly.indvar %119 = add nsw i64 %117, %118 %120 = add nsw i64 %119, 1 %121 = icmp sge i64 %p_2, %120 br i1 %121, label %polly.then62, label %polly.else63 polly.merge61: ; preds = %polly.else63, %polly.loop_exit68 call void @llvm.nvvm.barrier0() br label %polly.merge57 polly.else59: ; preds = %polly.cond56 br label %polly.merge57 polly.then62: ; preds = %polly.cond60 %122 = mul nsw i64 -2, %b1 %123 = mul nsw i64 512, %polly.indvar5 %124 = sub nsw i64 %122, %123 %125 = sub nsw i64 %p_0, %t1 %126 = sub nsw i64 %125, 1 %polly.fdiv_q.shr64 = ashr i64 %126, 4 %127 = add nsw i64 %124, %polly.fdiv_q.shr64 %128 = icmp slt i64 1, %127 %129 = select i1 %128, i64 1, i64 %127 br label %polly.loop_if65 polly.loop_exit68: ; preds = %polly.stmt.if773, %polly.loop_if65 br label %polly.merge61 polly.else63: ; preds = %polly.cond60 br label %polly.merge61 polly.loop_if65: ; preds = %polly.then62 %polly.loop_guard69 = icmp sle i64 0, %129 br i1 %polly.loop_guard69, label %polly.loop_preheader67, label %polly.loop_exit68 polly.loop_header66: ; preds = %polly.stmt.if773, %polly.loop_preheader67 %polly.indvar70 = phi i64 [ 0, %polly.loop_preheader67 ], [ %polly.indvar_next71, %polly.stmt.if773 ] %130 = mul nsw i64 32, %b0 %131 = add nsw i64 %130, %t0 %132 = mul nsw i64 8192, %polly.indvar %133 = add nsw i64 %131, %132 %134 = mul nsw i64 32, %b1 %135 = add nsw i64 %134, %t1 %136 = mul nsw i64 8192, %polly.indvar5 %137 = add nsw i64 %135, %136 %138 = mul nsw i64 16, %polly.indvar70 %139 = add nsw i64 %137, %138 br label %polly.stmt.if773 polly.stmt.if773: ; preds = %polly.loop_header66 %polly.access.cast.MemRef074 = bitcast i8 addrspace(1)* %MemRef0 to i64 addrspace(1)* %140 = mul nsw i64 32, %b1 %141 = add nsw i64 %140, %t1 %142 = mul nsw i64 8192, %polly.indvar5 %143 = add nsw i64 %141, %142 %144 = mul nsw i64 16, %polly.indvar70 %145 = add nsw i64 %143, %144 %polly.access.mul.MemRef075 = mul nsw i64 %145, %p_3 %146 = mul nsw i64 32, %b0 %147 = add nsw i64 %146, %t0 %148 = mul nsw i64 8192, %polly.indvar %149 = add nsw i64 %147, %148 %polly.access.add.MemRef076 = add nsw i64 %polly.access.mul.MemRef075, %149 %polly.access.MemRef077 = getelementptr i64, i64 addrspace(1)* %polly.access.cast.MemRef074, i64 %polly.access.add.MemRef076 %_p_scalar_78 = load i64, i64 addrspace(1)* %polly.access.MemRef077, align 8 %p_79 = mul i64 %_p_scalar_78, %0, !dbg !1 %polly.access.cast.MemRef080 = bitcast i8 addrspace(1)* %MemRef0 to i64 addrspace(1)* %150 = mul nsw i64 32, %b1 %151 = add nsw i64 %150, %t1 %152 = mul nsw i64 8192, %polly.indvar5 %153 = add nsw i64 %151, %152 %154 = mul nsw i64 16, %polly.indvar70 %155 = add nsw i64 %153, %154 %polly.access.mul.MemRef081 = mul nsw i64 %155, %p_3 %156 = mul nsw i64 32, %b0 %157 = add nsw i64 %156, %t0 %158 = mul nsw i64 8192, %polly.indvar %159 = add nsw i64 %157, %158 %polly.access.add.MemRef082 = add nsw i64 %polly.access.mul.MemRef081, %159 %polly.access.MemRef083 = getelementptr i64, i64 addrspace(1)* %polly.access.cast.MemRef080, i64 %polly.access.add.MemRef082 store i64 %p_79, i64 addrspace(1)* %polly.access.MemRef083, align 8 %polly.indvar_next71 = add nsw i64 %polly.indvar70, 1 %polly.loop_cond72 = icmp sle i64 %polly.indvar_next71, %129 br i1 %polly.loop_cond72, label %polly.loop_header66, label %polly.loop_exit68 polly.loop_preheader67: ; preds = %polly.loop_if65 br label %polly.loop_header66 } ; Function Attrs: nounwind readnone declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() #1 ; Function Attrs: nounwind readnone declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.y() #1 ; Function Attrs: nounwind readnone declare i32 @llvm.nvvm.read.ptx.sreg.tid.x() #1 ; Function Attrs: nounwind readnone declare i32 @llvm.nvvm.read.ptx.sreg.tid.y() #1 ; Function Attrs: convergent nounwind declare void @llvm.nvvm.barrier0() #2 attributes #0 = { "polly.skip.fn" } attributes #1 = { nounwind readnone } attributes #2 = { convergent nounwind } !nvvm.annotations = !{!0} !0 = !{void (i8 addrspace(1)*, i8 addrspace(1)*, i8 addrspace(1)*, i64, i64, i64, i64, i64, i64)* @kernel_0, !"maxntidx", i32 32, !"maxntidy", i32 16, !"maxntidz", i32 1} !1 = !DILocation(line: 6, scope: !2) !2 = distinct !DISubprogram(name: "kernel_gemm", linkageName: "julia_kernel_gemm_66058", scope: null, file: !3, type: !4, isLocal: false, isDefinition: true, isOptimized: true, unit: !11, variables: !13) !3 = !DIFile(filename: "REPL[1]", directory: ".") !4 = !DISubroutineType(types: !5) !5 = !{!6, !6, !7, !7, !7} !6 = !DIBasicType(name: "Int64", size: 64, encoding: DW_ATE_unsigned) !7 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !8, size: 64, align: 64) !8 = !DICompositeType(tag: DW_TAG_structure_type, name: "jl_value_t", file: !9, line: 71, align: 64, elements: !10) !9 = !DIFile(filename: "julia.h", directory: "") !10 = !{!7} !11 = distinct !DICompileUnit(language: DW_LANG_C89, file: !3, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !12) !12 = !{} !13 = !{!14, !16, !17, !18, !19, !20, !21, !21, !22, !23, !24, !25, !26} !14 = !DILocalVariable(name: "#self#", arg: 1, scope: !2, file: !3, line: 2, type: !15) !15 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "#kernel_gemm", align: 8, elements: !12, runtimeLang: DW_LANG_Julia, identifier: "#kernel_gemm_65995") !16 = !DILocalVariable(name: "alpha", arg: 2, scope: !2, file: !3, line: 2, type: !6) !17 = !DILocalVariable(name: "beta", arg: 3, scope: !2, file: !3, line: 2, type: !6) !18 = !DILocalVariable(name: "C", arg: 4, scope: !2, file: !3, line: 2, type: !7) !19 = !DILocalVariable(name: "A", arg: 5, scope: !2, file: !3, line: 2, type: !7) !20 = !DILocalVariable(name: "B", arg: 6, scope: !2, file: !3, line: 2, type: !7) !21 = !DILocalVariable(name: "j", scope: !2, file: !3, line: 2, type: !6) !22 = !DILocalVariable(name: "k", scope: !2, file: !3, line: 2, type: !6) !23 = !DILocalVariable(name: "i", scope: !2, file: !3, line: 2, type: !6) !24 = !DILocalVariable(name: "ni", scope: !2, file: !3, line: 2, type: !6) !25 = !DILocalVariable(name: "nk", scope: !2, file: !3, line: 2, type: !6) !26 = !DILocalVariable(name: "nj", scope: !2, file: !3, line: 2, type: !6) !27 = !DILocation(line: 9, scope: !2)
The error message from verifyModule :
DICompileUnit not listed in llvm.dbg.cu
!11 = distinct !DICompileUnit(language: DW_LANG_C89, file: !3, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !12)
The meta data seems to about variables in the original function.
@bollu that's the file I've been using to test this patch and D32894.
lib/CodeGen/PPCGCodeGeneration.cpp | ||
---|---|---|
1117 | I tried both, return !(F && F->isIntrinsic()) || !polly::isIgnoredIntrinsic(V); and return !polly::isIgnoredIntrinsic(V) || !(F && F->isIntrinsic()); but these didn't prevent the "store void (metadata,... polly_launch_0_param_7" instruction from the IR. @bollu @grosser @Meinersbur thoughts ? |
lib/CodeGen/PPCGCodeGeneration.cpp | ||
---|---|---|
1117 | I don't believe either of those will help, since the function being called will be a store. You'll probably need to loop over the parameters and check if they are debug information or something like that. I don't know how debug metadata works in LLVM. |
@Meinersbur: I added the bail-out code in PPCGCodeGeneration. Could you take a look?
lib/CodeGen/PPCGCodeGeneration.cpp | ||
---|---|---|
1117 | Sorry, correction: ofc store is not a function, don't know what I was thinking when I wrote that. |
Hi Siddharth,
your last patch makes perfect sense, but is in fact something different than what this patch review initially started off with. As it is safe and correct to always bail out, I suggest to commit your latest patch independently (with the change I request). We can then continue with white-listing functions.
Michael is ATM on a summer school and likely won't be able to reply. I OK this for now as this is trivial and likely good to go. Please provide (post-commit) comments if we missed something.
lib/CodeGen/PPCGCodeGeneration.cpp | ||
---|---|---|
2716 | try -> trying | |
test/GPGPU/intrinsic-not-copied-to-kernel.ll | ||
14 | Hi Siddharth, could you use a function that is not an intrinsic which we will soon whitelist anyhow? E.g., just an externally defined function where only a declaration is available? |
- Change test case to use a declared global function and not a known intrinsic.
- [NFC] grammar fix.
test/GPGPU/intrinsic-not-copied-to-kernel.ll | ||
---|---|---|
76 | Do we even create a scop if the function is not marked as 'readnone'? |
@grosser Thank you, I didn't notice that we don't model it if there is no readnone. Added a check to make sure that we model the scop.
test/GPGPU/unknown-fn-call-not-copied-into-kernel.ll | ||
---|---|---|
9 ↗ | (On Diff #102167) | This should check that we actually model the scop. |
82 ↗ | (On Diff #102167) | The attributes should now mark it as readnone. |
polly/trunk/lib/CodeGen/PPCGCodeGeneration.cpp | ||
---|---|---|
2719 ↗ | (On Diff #102173) | Does this mean PPCGCodeGen will bail out if there are debug intrinsics ? |
Drop "actually"