This is an archive of the discontinued LLVM Phabricator instance.

Differential D36516

[WIP] [Polly] [ManagedMemoryRewrite] Rewrite global arrays with global pointers that are polly_mallocManage'd
ClosedPublic

Authored by bollu on Aug 9 2017, 6:06 AM.

Details

Reviewers

efriedma
jdoerfert
Meinersbur
gareevroman
sebpop
• zinob
huihuiz
pollydev
grosser
singam-sanjay
philip.pfaffe

Commits

rG8a2c07f6d42f: [ManagedMemoryRewrite] Learn how to rewrite global arrays, allocas.
rPLO311080: [ManagedMemoryRewrite] Learn how to rewrite global arrays, allocas.
rL311080: [ManagedMemoryRewrite] Learn how to rewrite global arrays, allocas.

Diff Detail

Build Status

Buildable 9173
Build 9173: arc lint + arc unit

Event Timeline

bollu created this revision. Aug 9 2017, 6:06 AM

LGTM.

This revision is now accepted and ready to land. Aug 9 2017, 7:16 AM

[WIP] [Polly] [ManagedMemoryRewrite] Rewrite global arrays to pointers that are allocated with managed memory.

grosser accepted this revision. Aug 9 2017, 10:44 AM

grosser added inline comments.

lib/CodeGen/ManagedMemoryRewrite.cpp
125	Format? Is this more than 80 lines? In general this does not seem to be run through clang format?
164	Unrelated change?

This is slightly harder than I initially thought, because mutateType on the original global is entirely the wrong way to go about this: it leaves the IR in an inconsistent state.

The right solution (I believe) is to create a replacement global (say A.repl), set it up correctly, visit each use site of the original (say A), and change it to the replacement (A.repl).

The problem is that we can't insert an IRBuilder at a Use to call bitcast. We need an Instruction. But the underlying Value could be a Constant or something, and I can't see a way to extract a location for a builder from a Value. I'm probably missing something in the API, any help appreciated.

Right now, this doesn't actually replace, though it does create a separate A.repl and initialises it (I can see the call to cudaMallocManaged with nvprof).

[WIP] [Polly] [ManagedMemoryRewrite] Rewrite global arrays to pointers that are allocated with managed memory.

@grosser I'm sorry, I seem to be hitting some Phab bug that updates D36516 rather than creating a new revision. The original D36516 was an NFC change. I tried running another arc patch, with the same effect (you can see the "bollu updated this revision to Diff 110436" entry). Something super weird is happening.

Anyway, let's discuss the proposed change here. I'll move the [NFC] patch to another revision.

lib/CodeGen/ManagedMemoryRewrite.cpp
125	Yes, hence `WIP`. I wished to discuss how to do this correctly. Could you `request changes`, because this patch should not be `LGTM`d. Please see my comment on the patch about the discussion I wish to have.

bollu retitled this revision from [NFC] [ManagedMemoryRewrite] [Polly] Erase original malloc and free to [WIP] [Polly] [ManagedMemoryRewrite] Rewrite global arrays with global pointers that are polly_mallocManage'd. Aug 9 2017, 10:52 AM

bollu edited the summary of this revision.

Do you have some other uncommitted phabricator patches in your branch?

@grosser: could you change your status to "Requesting changes"? That way, you'll see when I push.

grosser requested changes to this revision. Aug 10 2017, 1:40 AM

This revision now requires changes to proceed. Aug 10 2017, 1:40 AM

[WIP] not sure why this doesn't work, code seems reasonable

Example of changes induced by this patch on a .ll file

source.c

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <cuda.h>
#include <cuda_runtime.h>
int A[5];
void f(int *arr) {
    for(int i = 0; i < 5; i++) arr[i] = 10;
}


#define CHECK(r) {_check((r), __LINE__);}

void _check(cudaError_t r, int line) {
    if (r != cudaSuccess) {
        printf("CUDA error on line %d: %s\n", line, cudaGetErrorString(r));
        exit(0);
    }
}

void print_arr(int *A) {
    for(int i = 0; i < 5; i++)
        printf("%d := %d\n", i, A[i]);
}


int main() {
    for(int i = 0; i < 5; i++) { A[i] = -42; }
    printf("A before kernel:\n");
    print_arr(A);


    printf("launching kernel...\n");
    f(A);

    printf("printing A...\n");
    print_arr(A);
    return 0;
}

unoptimised.ll

; ModuleID = 'program.ll'
source_filename = "program.c"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

@.str = private unnamed_addr constant [27 x i8] c"CUDA error on line %d: %s\0A\00", align 1
@.str.1 = private unnamed_addr constant [10 x i8] c"%d := %d\0A\00", align 1
@A = common global [5 x i32] zeroinitializer, align 16
@.str.2 = private unnamed_addr constant [18 x i8] c"A before kernel:\0A\00", align 1
@.str.3 = private unnamed_addr constant [21 x i8] c"launching kernel...\0A\00", align 1
@.str.4 = private unnamed_addr constant [15 x i8] c"printing A...\0A\00", align 1
@str = private unnamed_addr constant [17 x i8] c"A before kernel:\00"
@str.1 = private unnamed_addr constant [20 x i8] c"launching kernel...\00"
@str.2 = private unnamed_addr constant [14 x i8] c"printing A...\00"

define void @f(i32* %arr) {
entry:
  br label %entry.split

entry.split:                                      ; preds = %entry
  br label %for.body

for.body:                                         ; preds = %entry.split, %for.body
  %indvars.iv = phi i64 [ 0, %entry.split ], [ %indvars.iv.next, %for.body ]
  %arrayidx = getelementptr inbounds i32, i32* %arr, i64 %indvars.iv
  store i32 10, i32* %arrayidx, align 4
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp ne i64 %indvars.iv.next, 5
  br i1 %exitcond, label %for.body, label %for.end

for.end:                                          ; preds = %for.body
  ret void
}

define void @_check(i32 %r, i32 %line) {
entry:
  br label %entry.split

entry.split:                                      ; preds = %entry
  %cmp = icmp eq i32 %r, 0
  br i1 %cmp, label %if.end, label %if.then

if.then:                                          ; preds = %entry.split
  %call = tail call i8* @cudaGetErrorString(i32 %r)
  %call1 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([27 x i8], [27 x i8]* @.str, i64 0, i64 0), i32 %line, i8* %call)
  tail call void @exit(i32 0)
  unreachable

if.end:                                           ; preds = %entry.split
  ret void
}

declare i32 @printf(i8*, ...)

declare i8* @cudaGetErrorString(i32)

declare void @exit(i32)

define void @print_arr(i32* %A) {
entry:
  br label %entry.split

entry.split:                                      ; preds = %entry
  br label %for.body

for.body:                                         ; preds = %entry.split, %for.body
  %indvars.iv = phi i64 [ 0, %entry.split ], [ %indvars.iv.next, %for.body ]
  %arrayidx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv
  %tmp4 = load i32, i32* %arrayidx, align 4
  %0 = trunc i64 %indvars.iv to i32
  %call = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.1, i64 0, i64 0), i32 %0, i32 %tmp4)
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp ne i64 %indvars.iv.next, 5
  br i1 %exitcond, label %for.body, label %for.end

for.end:                                          ; preds = %for.body
  ret void
}

define i32 @main() {
entry:
  br label %entry.split

entry.split:                                      ; preds = %entry
  br label %for.body

for.body:                                         ; preds = %entry.split, %for.body
  %indvars.iv = phi i64 [ 0, %entry.split ], [ %indvars.iv.next, %for.body ]
  %arrayidx = getelementptr inbounds [5 x i32], [5 x i32]* @A, i64 0, i64 %indvars.iv
  store i32 -42, i32* %arrayidx, align 4
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp ne i64 %indvars.iv.next, 5
  br i1 %exitcond, label %for.body, label %for.end

for.end:                                          ; preds = %for.body
  %puts = tail call i32 @puts(i8* getelementptr inbounds ([17 x i8], [17 x i8]* @str, i64 0, i64 0))
  tail call void @print_arr(i32* getelementptr inbounds ([5 x i32], [5 x i32]* @A, i64 0, i64 0))
  %puts1 = tail call i32 @puts(i8* getelementptr inbounds ([20 x i8], [20 x i8]* @str.1, i64 0, i64 0))
  tail call void @f(i32* getelementptr inbounds ([5 x i32], [5 x i32]* @A, i64 0, i64 0))
  %puts2 = tail call i32 @puts(i8* getelementptr inbounds ([14 x i8], [14 x i8]* @str.2, i64 0, i64 0))
  tail call void @print_arr(i32* getelementptr inbounds ([5 x i32], [5 x i32]* @A, i64 0, i64 0))
  ret i32 0
}

; Function Attrs: nounwind
declare i32 @puts(i8* nocapture readonly) #0

attributes #0 = { nounwind }

Optimised `.ll` file

Note that all uses of @A have now become @A.toptr.
Also notice that wherever we use @A.toptr, we bitcast it from i32** to [5 x i32]*, which is a legal bitcast (?)

You can run the output program through opt; it passes the module verifier, so there's nothing grossly wrong with it.
Is the bitcast incorrect? Am I missing something obvious?

optimised.ll

; ModuleID = 'program.canonical.ll'
source_filename = "program.c"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

@.str = private unnamed_addr constant [27 x i8] c"CUDA error on line %d: %s\0A\00", align 1
@.str.1 = private unnamed_addr constant [10 x i8] c"%d := %d\0A\00", align 1
@A = common global [5 x i32] zeroinitializer, align 16
@.str.2 = private unnamed_addr constant [18 x i8] c"A before kernel:\0A\00", align 1
@.str.3 = private unnamed_addr constant [21 x i8] c"launching kernel...\0A\00", align 1
@.str.4 = private unnamed_addr constant [15 x i8] c"printing A...\0A\00", align 1
@str = private unnamed_addr constant [17 x i8] c"A before kernel:\00"
@str.1 = private unnamed_addr constant [20 x i8] c"launching kernel...\00"
@str.2 = private unnamed_addr constant [14 x i8] c"printing A...\00"
@FUNC_f_SCOP_0_KERNEL_0 = private unnamed_addr constant [462 x i8] c"//\0A// Generated by LLVM NVPTX Back-End\0A//\0A\0A.version 3.2\0A.target sm_30\0A.address_size 64\0A\0A\09// .globl\09FUNC_f_SCOP_0_KERNEL_0\0A\0A.visible .entry FUNC_f_SCOP_0_KERNEL_0(\0A\09.param .u64 FUNC_f_SCOP_0_KERNEL_0_param_0\0A)\0A.maxntid 5, 1, 1\0A{\0A\09.reg .b32 \09%r<3>;\0A\09.reg .b64 \09%rd<4>;\0A\0A\09ld.param.u64 \09%rd1, [FUNC_f_SCOP_0_KERNEL_0_param_0];\0A\09mov.u32 \09%r1, %tid.x;\0A\09mul.wide.u32 \09%rd2, %r1, 4;\0A\09add.s64 \09%rd3, %rd1, %rd2;\0A\09mov.u32 \09%r2, 10;\0A\09st.global.u32 \09[%rd3], %r2;\0A\09ret;\0A}\0A\0A\0A\00"
@FUNC_f_SCOP_0_KERNEL_0_name = private unnamed_addr constant [23 x i8] c"FUNC_f_SCOP_0_KERNEL_0\00"
@A.toptr = global i32* null
@llvm.global_ctors = appending global [1 x { i32, void ()*, i8* }] [{ i32, void ()*, i8* } { i32 0, void ()* @A.constructor, i8* bitcast (i32** @A.toptr to i8*) }]

define void @f(i32* %arr) {
entry:
  %polly_launch_0_params = alloca [2 x i8*]
  %polly_launch_0_param_0 = alloca i8*
  %polly_launch_0_param_size_0 = alloca i32
  %polly_launch_0_params_i8ptr = bitcast [2 x i8*]* %polly_launch_0_params to i8*
  br label %entry.split

entry.split:                                      ; preds = %entry
  br label %polly.split_new_and_old

polly.split_new_and_old:                          ; preds = %entry.split
  %0 = call { i64, i1 } @llvm.smul.with.overflow.i64(i64 1, i64 5)
  %.obit = extractvalue { i64, i1 } %0, 1
  %polly.overflow.state = or i1 false, %.obit
  %.res = extractvalue { i64, i1 } %0, 0
  %1 = call { i64, i1 } @llvm.smul.with.overflow.i64(i64 6, i64 %.res)
  %.obit1 = extractvalue { i64, i1 } %1, 1
  %polly.overflow.state2 = or i1 %polly.overflow.state, %.obit1
  %.res3 = extractvalue { i64, i1 } %1, 0
  %2 = call { i64, i1 } @llvm.sadd.with.overflow.i64(i64 0, i64 %.res3)
  %.obit4 = extractvalue { i64, i1 } %2, 1
  %polly.overflow.state5 = or i1 %polly.overflow.state2, %.obit4
  %.res6 = extractvalue { i64, i1 } %2, 0
  %3 = icmp sge i64 %.res6, 0
  %4 = and i1 true, %3
  %polly.rtc.overflown = xor i1 %polly.overflow.state5, true
  %polly.rtc.result = and i1 %4, %polly.rtc.overflown
  br i1 %polly.rtc.result, label %polly.start, label %for.body.pre_entry_bb

for.body.pre_entry_bb:                            ; preds = %polly.split_new_and_old
  br label %for.body

for.body:                                         ; preds = %for.body.pre_entry_bb, %for.body
  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %for.body.pre_entry_bb ]
  %arrayidx = getelementptr inbounds i32, i32* %arr, i64 %indvars.iv
  store i32 10, i32* %arrayidx, align 4
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp ne i64 %indvars.iv.next, 5
  br i1 %exitcond, label %for.body, label %polly.merge_new_and_old

polly.merge_new_and_old:                          ; preds = %polly.exiting, %for.body
  br label %for.end

for.end:                                          ; preds = %polly.merge_new_and_old
  ret void

polly.start:                                      ; preds = %polly.split_new_and_old
  br label %polly.acc.initialize

polly.acc.initialize:                             ; preds = %polly.start
  %5 = call i8* @polly_initContextCUDA()
  %6 = bitcast i32* %arr to i8*
  %7 = getelementptr [2 x i8*], [2 x i8*]* %polly_launch_0_params, i64 0, i64 0
  store i8* %6, i8** %polly_launch_0_param_0
  %8 = bitcast i8** %polly_launch_0_param_0 to i8*
  store i8* %8, i8** %7
  store i32 4, i32* %polly_launch_0_param_size_0
  %9 = getelementptr [2 x i8*], [2 x i8*]* %polly_launch_0_params, i64 0, i64 1
  %10 = bitcast i32* %polly_launch_0_param_size_0 to i8*
  store i8* %10, i8** %9
  %11 = call i8* @polly_getKernel(i8* getelementptr inbounds ([462 x i8], [462 x i8]* @FUNC_f_SCOP_0_KERNEL_0, i32 0, i32 0), i8* getelementptr inbounds ([23 x i8], [23 x i8]* @FUNC_f_SCOP_0_KERNEL_0_name, i32 0, i32 0))
  call void @polly_launchKernel(i8* %11, i32 1, i32 1, i32 5, i32 1, i32 1, i8* %polly_launch_0_params_i8ptr)
  call void @polly_freeKernel(i8* %11)
  call void @polly_synchronizeDevice()
  call void @polly_freeContext(i8* %5)
  br label %polly.exiting

polly.exiting:                                    ; preds = %polly.acc.initialize
  br label %polly.merge_new_and_old
}

define void @_check(i32 %r, i32 %line) {
entry:
  br label %entry.split

entry.split:                                      ; preds = %entry
  %cmp = icmp eq i32 %r, 0
  br i1 %cmp, label %if.end, label %if.then

if.then:                                          ; preds = %entry.split
  %call = tail call i8* @cudaGetErrorString(i32 %r)
  %call1 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([27 x i8], [27 x i8]* @.str, i64 0, i64 0), i32 %line, i8* %call)
  tail call void @exit(i32 0)
  unreachable

if.end:                                           ; preds = %entry.split
  ret void
}

declare i32 @printf(i8*, ...)

declare i8* @cudaGetErrorString(i32)

declare void @exit(i32)

define void @print_arr(i32* %A) {
entry:
  br label %entry.split

entry.split:                                      ; preds = %entry
  br label %for.body

for.body:                                         ; preds = %for.body, %entry.split
  %indvars.iv = phi i64 [ 0, %entry.split ], [ %indvars.iv.next, %for.body ]
  %arrayidx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv
  %tmp4 = load i32, i32* %arrayidx, align 4
  %0 = trunc i64 %indvars.iv to i32
  %call = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.1, i64 0, i64 0), i32 %0, i32 %tmp4)
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp ne i64 %indvars.iv.next, 5
  br i1 %exitcond, label %for.body, label %for.end

for.end:                                          ; preds = %for.body
  ret void
}

define i32 @main() {
entry:
  br label %entry.split

entry.split:                                      ; preds = %entry
  br label %for.body

for.body:                                         ; preds = %for.body, %entry.split
  %indvars.iv = phi i64 [ 0, %entry.split ], [ %indvars.iv.next, %for.body ]
  %arrayidx = getelementptr inbounds [5 x i32], [5 x i32]* bitcast (i32** @A.toptr to [5 x i32]*), i64 0, i64 %indvars.iv
  store i32 -42, i32* %arrayidx, align 4
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp ne i64 %indvars.iv.next, 5
  br i1 %exitcond, label %for.body, label %for.end

for.end:                                          ; preds = %for.body
  %puts = tail call i32 @puts(i8* getelementptr inbounds ([17 x i8], [17 x i8]* @str, i64 0, i64 0))
  tail call void @print_arr(i32* getelementptr inbounds ([5 x i32], [5 x i32]* bitcast (i32** @A.toptr to [5 x i32]*), i64 0, i64 0))
  %puts1 = tail call i32 @puts(i8* getelementptr inbounds ([20 x i8], [20 x i8]* @str.1, i64 0, i64 0))
  tail call void @f(i32* getelementptr inbounds ([5 x i32], [5 x i32]* bitcast (i32** @A.toptr to [5 x i32]*), i64 0, i64 0))
  %puts2 = tail call i32 @puts(i8* getelementptr inbounds ([14 x i8], [14 x i8]* @str.2, i64 0, i64 0))
  tail call void @print_arr(i32* getelementptr inbounds ([5 x i32], [5 x i32]* bitcast (i32** @A.toptr to [5 x i32]*), i64 0, i64 0))
  ret i32 0
}

; Function Attrs: nounwind
declare i32 @puts(i8* nocapture readonly) #0

; Function Attrs: nounwind readnone speculatable
declare { i64, i1 } @llvm.smul.with.overflow.i64(i64, i64) #1

; Function Attrs: nounwind readnone speculatable
declare { i64, i1 } @llvm.sadd.with.overflow.i64(i64, i64) #1

declare i8* @polly_initContextCUDA()

declare i8* @polly_getKernel(i8*, i8*)

declare void @polly_launchKernel(i8*, i32, i32, i32, i32, i32, i8*)

declare void @polly_freeKernel(i8*)

declare void @polly_synchronizeDevice()

declare void @polly_freeContext(i8*)

declare i8* @polly_mallocManaged(i64)

define void @A.constructor() {
entry:
  %mem.raw = call i8* @polly_mallocManaged(i64 16000)
  %mem.typed = bitcast i8* %mem.raw to i32*
  store i32* %mem.typed, i32** @A.toptr
  ret void
}

attributes #0 = { nounwind }
attributes #1 = { nounwind readnone speculatable }
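
On the bitcast question: the verifier accepts the cast because one pointer type may be bitcast to another pointer type in the same address space. The getelementptr on the cast result, however, indexes from the address of the @A.toptr slot itself, not from the buffer that @A.constructor stores into that slot, so the uses need a load of @A.toptr first. The following is a minimal C++ sketch of the difference, not code from the patch. The names A_toptr, constructA, addressViaLoad and addressViaBitcast are hypothetical, and std::calloc stands in for polly_mallocManaged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for @A.toptr: a global slot that holds a pointer to
// the separately allocated buffer.
std::int32_t *A_toptr = nullptr;

// Hypothetical stand-in for @A.constructor. std::calloc is used here in place
// of polly_mallocManaged (an assumption made to keep the sketch standalone).
void constructA() {
  A_toptr = static_cast<std::int32_t *>(std::calloc(5, sizeof(std::int32_t)));
  assert(A_toptr && "allocation failed");
}

// Intended access: load the pointer stored in the slot, then index from it.
std::uintptr_t addressViaLoad(std::size_t I) {
  return reinterpret_cast<std::uintptr_t>(A_toptr + I);
}

// What the constant expression in the listing computes: the address of the
// slot itself, treated as the start of the array. Only the address is
// computed here; nothing is dereferenced.
std::uintptr_t addressViaBitcast(std::size_t I) {
  return reinterpret_cast<std::uintptr_t>(&A_toptr) + I * sizeof(std::int32_t);
}
```

Here addressViaBitcast(0) is the address of the pointer variable, so a store through it would overwrite the pointer (and, for larger indices, whatever follows it in memory) rather than element 0 of the buffer.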

lib/CodeGen/ManagedMemoryRewrite.cpp
137	I believe this bitcast to be correct. Am I doing something "obviously" wrong here?
140	For some reason, this fails. I suspect this indicates some deeper bug in my code.
tools/GPURuntime/GPUJIT.c
36 ↗	(On Diff #110543)	This is just a debugging change, I'll remove it at the final commit.

[WIP] version now works on test code, going to run it on COSMO to detect bugs in implementation.

Harbormaster completed remote builds in B9222: Diff 110693. Aug 11 2017, 5:12 AM

Meinersbur edited edge metadata. Aug 12 2017, 3:58 PM

The function editAllUses looks more complicated than necessary. Here is what I'd try:

For every function, insert an arrptr.load in the entry block (or lazily when needed); it might be beneficial to have just one such load per function.

Iterate over the instructions that contain a use of GlobalArray.

Recurse over the operands of the instruction. Pseudo code:

Value *recursiveReplace(Value *Replacee, GlobalArray *Old, LoadInst *New) {
  if (Replacee == Old)
    return New;

  // Replacee is only an instruction for the recursion root.
  Instruction *Replacement = dyn_cast<Instruction>(Replacee);
  for (Use &Op : Replacee->operands()) {
    auto Replacer = recursiveReplace(Op.get(), Old, New);
    if (Replacer == Op.get())
      continue;
    if (!Replacement) {
      Replacement = Replacee->getAsInstruction();
      // Insert Replacement somewhere, e.g. right after arrptr.load (it was
      // previously a constant, i.e. there are no other dependencies than to
      // arrptr.load).
    }
    Replacement->setOperand(Op.getOperandNo(), Replacer);
  }
  // If nothing below a constant changed, hand back the constant itself.
  return Replacement ? Replacement : Replacee;
}

Call

for (Instruction *UserOfArrayInst : ArrayUserInstructions) {
  auto New = recursiveReplace(UserOfArrayInst , Array, `arrptr.load`);
  if (New != UserOfArrayInst)
    UserOfArrayInst->replaceAllUsesWith(New);
}
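
The recursion above can be tried out on a toy expression tree without LLVM. This is a minimal sketch under stated assumptions: Node, NodePtr and makeNode are hypothetical stand-ins for llvm::Value and constant expressions, and the sketch copies a node when one of its operands changes, whereas the pseudocode mutates the root instruction in place. It only illustrates the lazy "replace on first changed operand" idea.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy expression node (hypothetical; stands in for a Value with operands).
struct Node {
  std::string Name;
  std::vector<std::shared_ptr<Node>> Operands;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr makeNode(std::string Name, std::vector<NodePtr> Operands = {}) {
  return std::make_shared<Node>(Node{std::move(Name), std::move(Operands)});
}

// Returns Root unchanged when Old does not occur below it. Otherwise returns
// a copy of Root in which every occurrence of Old is replaced by New.
NodePtr recursiveReplace(const NodePtr &Root, const NodePtr &Old,
                         const NodePtr &New) {
  assert(Root && "null node");
  if (Root == Old)
    return New;

  NodePtr Copy; // created lazily, on the first operand that changes
  for (std::size_t I = 0; I < Root->Operands.size(); ++I) {
    NodePtr Replaced = recursiveReplace(Root->Operands[I], Old, New);
    if (Replaced == Root->Operands[I])
      continue; // this operand does not mention Old
    if (!Copy)
      Copy = std::make_shared<Node>(*Root);
    Copy->Operands[I] = Replaced;
  }
  return Copy ? Copy : Root;
}
```

Subtrees that do not mention Old are shared between the original and the result, which mirrors the early `continue` in the pseudocode.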

lib/CodeGen/ManagedMemoryRewrite.cpp
99	[Suggestion] `SmallVector` instead of `std::vector`. Most instructions have only one or two operands, such that we don't need a memory location in the majority of cases.
119–121	[Style] Doxygen comment.
127	[Style] `GEP->getPointerOperand() != ArrToRewrite`
131	[Suggestion] `SmallVector` and `.reserve()`
142–144	[Suggestion] `std::set` is inefficient. Try `SmallPtrSet`?
150–151	Why does GEP need to be handled different than other instructions? There is also `GetElementPtrConstantExpr` which is not handled here.
158–164	An `assert` is enough.
221	[Style] if (auto *I = dyn_cast<Instruction>(Current))
229–234	AFAIK there is nothing else than Instruction and Constant deriving from User. An `assert(isa<Constant>(Current))` or `auto *C = cast<Constant>(Current)` would be enough.
238	Isn't it more replacing than removing? Also: functions start with lower-case letters
246	What does this return for multidimensional arrays?
257	Uninitialized data would be ok as well, no? Could handle them the same as zero-initialized data, which is a value uninitialized data can have.
266	[Style] `cast<>` instead of `dyn_cast<>` gives you a type check in assert-builds.
278	Where does the magic number `100` come from?
290–291	Is the priority even important? AFAIK it can be just 0 for everything.
296	unfinished sentence
test/GPGPU/simple-managed-memory-rewrite.ll
7 ↗	(On Diff #110693)	`REQUIRES: pollyacc` missing

[Code dump] Dump of all changes that now allow global arrays to be rewritten correctly.

We still need to rewrite allocas correctly.

[UNDEBUG] remove debug printing in GPUJIT
[NFC] check polly and fix test case to reduce size by x100

bollu marked 13 inline comments as done. Aug 15 2017, 2:00 AM

bollu added inline comments.

lib/CodeGen/ManagedMemoryRewrite.cpp
127	`rewriteGEP` has been removed.
131	`rewriteGEP` has been removed.
150–151	Yep, `GEP` special casing is removed.
246	It would return the `inner` array type. However, I don't believe this is a problem due to the way we allocate memory. We issue a single `cudaMallocManaged`, and then we convert the `[T]` to `T*`. This should still work, memory-layout wise unless I'm missing something.
257	Yes, it would. I set it to zero as a safe default. Do we gain / lose anything by switching to uninitialized?
278	Removed.
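
On the multidimensional-array reply (line 246): the claim is that a single allocation typed as a pointer to the inner array type still matches the row-major layout of the original array. A minimal C++ sketch of that layout argument follows. It is not code from the patch; Rows, Cols, Row, allocateRows and byteOffset are hypothetical names, and std::calloc stands in for cudaMallocManaged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

constexpr std::size_t Rows = 2, Cols = 3;
using Row = std::int32_t[Cols]; // the "inner" array type of int32_t[Rows][Cols]

// One allocation for the whole array, typed as a pointer to the inner array.
// std::calloc is an assumption standing in for the managed allocation.
Row *allocateRows() {
  void *Raw = std::calloc(Rows, sizeof(Row));
  assert(Raw && "allocation failed");
  return static_cast<Row *>(Raw);
}

// Byte offset of element [I][J] from the start of the allocation.
std::size_t byteOffset(Row *Base, std::size_t I, std::size_t J) {
  const char *Start = reinterpret_cast<const char *>(Base);
  const char *Elem = reinterpret_cast<const char *>(&Base[I][J]);
  return static_cast<std::size_t>(Elem - Start);
}
```

Indexing through the pointer-to-inner-array gives the same offsets as the flat row-major formula (I * Cols + J) * sizeof(element), which is the property the reply relies on.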

[WIP] Update according to Michael's comments.

[NFC] undo all changes to GPUJIT.

@Meinersbur - Second round of review, please :)

LGTM.

lib/CodeGen/ManagedMemoryRewrite.cpp
95	is populated WITH all the
120	No braces and no ";" Maybe also reduce casting.
124	You don't rewrite GEP, so remove the comment.
133	keeps
150	No braces for single statements
163	No braces for single-statements
215	Why is there a "false"?
307	Not needed any more.
lib/Support/RegisterPasses.cpp
48 ↗	(On Diff #111198)	Drop.
353 ↗	(On Diff #111198)	Unrelated change?

This revision is now accepted and ready to land. Aug 16 2017, 2:40 AM

The idea seems sensible. Some nits inline. Most importantly, the intent and algorithm of the pass should be documented much more explicitly.

lib/CodeGen/ManagedMemoryRewrite.cpp
37	Unused include?
49	Function names should be humbleCamelCase
97	This algorithm requires a much more extensive documentation.
103	This is unused?
131	InstsToBeDeleted is unused?
161	Why is this a member?
187	Use SmallVector instead of std::vector
193	Here and below: Superfluous braces
203	What's the purpose of this?
207	Use a SmallSet instead.
209	`ElemTy->getPointerTo(AddrSpace)`
234	No need to pass the empty array here.
241	Isn't this off by a factor of 8?
254	Use a SmallVector instead
260	Superfluous braces.
265	`InstsToBeDeleted` is never populated with anything.
281	static
300	static
313	Just `Builder.getInt64(Size)`
321	Is this actually ever relevant?

[NFC] Rewrite based on Philip & Tobias' comments

[Bugfix] Send bytes, not bits. Also, move the alloca function extraction to the correct place.
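
The "off by a factor of 8" comment and this bugfix concern the size passed to the allocator: the first diff computes it with DL.getTypeAllocSizeInBits(ArrayTy), while an allocator expects bytes (in LLVM, DataLayout::getTypeAllocSize returns the size in bytes). A minimal standalone C++ sketch of the arithmetic for the [5 x i32] array from the example follows; arrayAllocSizeInBits and bitsToBytes are hypothetical helper names, not LLVM API.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Size of an N-element array of T in bits, the unit the first diff passed on.
template <typename T, std::size_t N>
constexpr std::uint64_t arrayAllocSizeInBits() {
  return static_cast<std::uint64_t>(sizeof(T[N])) * CHAR_BIT;
}

// Convert a bit count to the byte count an allocator expects.
inline std::uint64_t bitsToBytes(std::uint64_t Bits) {
  assert(Bits % CHAR_BIT == 0 && "expected a whole number of bytes");
  return Bits / CHAR_BIT;
}
```

For five 32-bit integers on a platform with 8-bit bytes this gives 160 bits but only 20 bytes, so passing the bit count over-allocates by a factor of 8.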

Harbormaster completed remote builds in B9353: Diff 111475. Aug 17 2017, 1:28 AM

bollu added inline comments. Aug 17 2017, 1:28 AM

lib/CodeGen/ManagedMemoryRewrite.cpp
203	The `AddrSpace` is a parameter to all pointers. I believe the default address space is zero. As for why LLVM uses it, for example, the NVPTX backend uses address spaces to refer to where the pointer lives in memory.
215	I need some help with this one, actually. What is the correct linkage type?
321	Probably not, but I'd much rather be defensive about things like this.

@philip.pfaffe Another round of review, please?

[Re-upload] diff against the newest HEAD.

Harbormaster completed remote builds in B9354: Diff 111476. Aug 17 2017, 1:32 AM

[Merge] Merged with master, hoping that GPUJIT does not show up from arc diff this time.

Harbormaster completed remote builds in B9355: Diff 111478. Aug 17 2017, 1:36 AM

philip.pfaffe added inline comments. Aug 17 2017, 1:58 AM

lib/CodeGen/ManagedMemoryRewrite.cpp
124	Should you not assert before using I above?
151	Does this actually ever happen? You're recursively expanding ConstExpr operands, so after a single pass that shouldn't be necessary anymore. Does this mean that this loop's trip count is actually never greater than two?
203	The question was not what an `AddrSpace` is, but why you're actively passing the default everywhere, and even store this in an extra variable.
221	Internal or Common linkage should be used.

[NFC] Discuss algorithm with Philip offline, he helped to simplify it further.
[Linkage] Update linkage code to use the correct linker options as well as the ignore linkage flag. Update test case to match this change
[ReplaceUsesOfWith] remove double-loop that was not required.

Harbormaster completed remote builds in B9357: Diff 111484. Aug 17 2017, 2:34 AM

Besides the AddrSpace, LGTM!

[NFC] remove 0 address space because that is the default value.

Harbormaster completed remote builds in B9360: Diff 111488. Aug 17 2017, 3:41 AM

Closed by commit rL311080: [ManagedMemoryRewrite] Learn how to rewrite global arrays, allocas. (authored by bollu). Aug 17 2017, 4:23 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/CodeGen/ManagedMemoryRewrite.cpp	75 lines
test/GPGPU/managed-memory-rewrite-malloc-free.ll	4 lines

Diff 110436

lib/CodeGen/ManagedMemoryRewrite.cpp

(28 lines not shown)
  #include "polly/ScopInfo.h"
  #include "polly/Support/SCEVValidator.h"
  #include "llvm/Analysis/AliasAnalysis.h"
  #include "llvm/Analysis/BasicAliasAnalysis.h"
  #include "llvm/Analysis/GlobalsModRef.h"
  #include "llvm/Analysis/ScalarEvolutionAliasAnalysis.h"
  #include "llvm/Analysis/TargetLibraryInfo.h"
  #include "llvm/Analysis/TargetTransformInfo.h"
  #include "llvm/IR/LegacyPassManager.h"
      [inline comment, Not Done] philip.pfaffe: Unused include?
  #include "llvm/IR/Verifier.h"
  #include "llvm/IRReader/IRReader.h"
  #include "llvm/Linker/Linker.h"
  #include "llvm/Support/TargetRegistry.h"
  #include "llvm/Support/TargetSelect.h"
  #include "llvm/Target/TargetMachine.h"
  #include "llvm/Transforms/IPO/PassManagerBuilder.h"
  #include "llvm/Transforms/Utils/BasicBlockUtils.h"
+ #include "llvm/Transforms/Utils/ModuleUtils.h"
  namespace {

  static llvm::Function *GetOrCreatePollyMallocManaged(Module &M) {
      [inline comment, Done] philip.pfaffe: Function names should be humbleCamelCase
    // TODO: should I allow this pass to be a standalone pass that
    // doesn't care if PollyManagedMemory is enabled or not?
    assert(PollyManagedMemory &&
           "One should only rewrite malloc & free to"
           "polly_{malloc,free}Managed with managed memory enabled.");
    const char *Name = "polly_mallocManaged";
    Function *F = M.getFunction(Name);

(27 lines not shown)
    if (!F) {
      FunctionType *Ty =
          FunctionType::get(Builder.getVoidTy(), {Builder.getInt8PtrTy()}, false);
      F = Function::Create(Ty, Linkage, Name, &M);
    }

    return F;
  }

		static void RewriteGlobalArray(Module &M, const DataLayout &DL, GlobalVariable &Array) {
		static const unsigned AddrSpace = 0;
		// We only want arrays.
		grosserUnsubmitted Not Done Reply Inline Actions is populated WITH all the grosser: is populated WITH all the
		ArrayType *ArrayTy = dyn_cast<ArrayType>(Array.getType()->getElementType());
		if (!ArrayTy) return;
		philip.pfaffeUnsubmitted Not Done Reply Inline Actions This algorithm requires a much more extensive documentation. philip.pfaffe: This algorithm requires a much more extensive documentation.
		Type *ElemTy = ArrayTy->getElementType();
		PointerType *ElemPtrTy = PointerType::get(ElemTy, AddrSpace);
		MeinersburUnsubmitted Done Reply Inline Actions [Suggestion] `SmallVector` instead of `std::vector`. Most instructions have only one or two operands, such that we don't need a memory location in the majority of cases. Meinersbur: [Suggestion] `SmallVector` instead of `std::vector`. Most instructions have only one or two…

		// We only wish to replace stuff with internal linkage. Otherwise,
		// our type edit from [T] to T* would be illegal across modules.
		// It is interesting that most arrays don't seem to be tagged with internal linkage?
		philip.pfaffeUnsubmitted Done Reply Inline Actions This is unused? philip.pfaffe: This is unused?
		if (GlobalValue::isWeakForLinker(Array.getLinkage()) && false) {return; }

		if (!Array.hasInitializer()\|\| !isa<ConstantAggregateZero>(Array.getInitializer())) return;

		std::string NewName = (Array.getName() + Twine(".toptr")).str();
		GlobalVariable *ReplacementToArr = dyn_cast<GlobalVariable>(M.getOrInsertGlobal(NewName, ElemPtrTy));
		ReplacementToArr->setInitializer(ConstantPointerNull::get(ElemPtrTy));

		Function *PollyMallocManaged = GetOrCreatePollyMallocManaged(M);
		Twine FnName = Array.getName() + ".constructor";
		PollyIRBuilder Builder(M.getContext());
		FunctionType *Ty =
		FunctionType::get(Builder.getVoidTy(), {}, false);
		const GlobalValue::LinkageTypes Linkage = Function::ExternalLinkage;
		Function *F = Function::Create(Ty, Linkage, FnName, &M);
		BasicBlock *Start = BasicBlock::Create(M.getContext(), "entry", F);
		Builder.SetInsertPoint(Start);
		grosserUnsubmitted Done Reply Inline Actions No braces and no ";" Maybe also reduce casting. grosser: No braces and no ";" Maybe also reduce casting.

		MeinersburUnsubmitted Not Done Reply Inline Actions [Style] Doxygen comment. Meinersbur: [Style] Doxygen comment.

		int ArraySizeInt = DL.getTypeAllocSizeInBits(ArrayTy);
		Value *ArraySize = ConstantInt::get(Builder.getInt64Ty(), ArraySizeInt);
		grosserUnsubmitted Done Reply Inline Actions You don't rewrite GEP, so remove the comment. grosser: You don't rewrite GEP, so remove the comment.
		philip.pfaffeUnsubmitted Not Done Reply Inline Actions Should you not assert before using I above? philip.pfaffe: Should you not assert before using I above?
		ArraySize->setName("array.size");
		grosser [Done]: Format? Is this more than 80 lines? In general this does not seem to be run through clang format?
		bollu (Author) [Not Done]: Yes, hence `WIP`. I wished to discuss how to do this correctly. Could you `request changes`, because this patch should not be `LGTM`d. Please see my comment on the patch about the discussion I wish to have.

		Value *AllocatedMemRaw = Builder.CreateCall(PollyMallocManaged, { ArraySize }, "mem.raw");
		Meinersbur [Done]: [Style] `GEP->getPointerOperand() != ArrToRewrite`
		bollu (Author) [Not Done]: `rewriteGEP` has been removed.
		Value *AllocatedMemTyped = Builder.CreatePointerCast(AllocatedMemRaw, ElemPtrTy, "mem.typed");
		Builder.CreateStore(AllocatedMemTyped, ReplacementToArr);
		Builder.CreateRetVoid();

		Meinersbur [Done]: [Suggestion] `SmallVector` and `.reserve()`
		bollu (Author) [Not Done]: `rewriteGEP` has been removed.
		philip.pfaffe [Done]: `InstsToBeDeleted` is unused?
		// All the array's uses will be GEP
		for (Use &use: Array.uses()) {
		grosser [Done]: keeps
		//GEPOperator *GEP = dyn_cast<GEPOperator>(use->getUser());
		GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(use.getUser());
		if (!GEP) {
		errs() << "|User of global that is NOT a GEP: " << *use.getUser() << "\n";
		bollu (Author) [Not Done]: I believe this bitcast to be correct. Am I doing something "obviously" wrong here?
		continue;
		}
		// assert(GEP && "Global array is being used without a GEP!");
		bollu (Author) [Not Done]: For some reason, this fails. I suspect this indicates some deeper bug in my code.
		Builder.SetInsertPoint(GEP);
		Value *BitcastedNewArray = Builder.CreateBitCast(ReplacementToArr, PointerType::get(ArrayTy, AddrSpace));
		use.set(BitcastedNewArray);
		}
		Meinersbur [Done]: [Suggestion] `std::set` is inefficient. Try `SmallPtrSet`?



		// HACK: refactor this.
		static int priority = 0;
		appendToGlobalCtors(M, F, priority++, ReplacementToArr);
		grosser [Done]: No braces for single statements

		Meinersbur [Not Done]: Why does GEP need to be handled differently from other instructions? There is also `GetElementPtrConstantExpr`, which is not handled here.
		bollu (Author) [Not Done]: Yep, `GEP` special casing is removed.
		philip.pfaffe [Not Done]: Does this actually ever happen? You're recursively expanding ConstExpr operands, so after a single pass that shouldn't be necessary anymore. Does this mean that this loop's trip count is actually never greater than two?


		}

class ManagedMemoryRewritePass : public ModulePass {		class ManagedMemoryRewritePass : public ModulePass {
public:		public:
static char ID;		static char ID;
GPUArch Architecture;		GPUArch Architecture;
GPURuntime Runtime;		GPURuntime Runtime;
		const DataLayout *DL;
		philip.pfaffe [Done]: Why is this a member?

ManagedMemoryRewritePass() : ModulePass(ID) {}		ManagedMemoryRewritePass() : ModulePass(ID) {}
		grosser [Done]: No braces for single-statements

		grosser [Not Done]: Unrelated change?
		Meinersbur [Done]: An `assert` is enough.
virtual bool runOnModule(Module &M) {		virtual bool runOnModule(Module &M) {
		DL = &M.getDataLayout();

Function *Malloc = M.getFunction("malloc");		Function *Malloc = M.getFunction("malloc");

if (Malloc) {		if (Malloc) {
Function *PollyMallocManaged = GetOrCreatePollyMallocManaged(M);		Function *PollyMallocManaged = GetOrCreatePollyMallocManaged(M);
assert(PollyMallocManaged && "unable to create polly_mallocManaged");		assert(PollyMallocManaged && "unable to create polly_mallocManaged");
Malloc->replaceAllUsesWith(PollyMallocManaged);		Malloc->replaceAllUsesWith(PollyMallocManaged);
		Malloc->eraseFromParent();
}		}

Function *Free = M.getFunction("free");		Function *Free = M.getFunction("free");

if (Free) {		if (Free) {
Function *PollyFreeManaged = GetOrCreatePollyFreeManaged(M);		Function *PollyFreeManaged = GetOrCreatePollyFreeManaged(M);
assert(PollyFreeManaged && "unable to create polly_freeManaged");		assert(PollyFreeManaged && "unable to create polly_freeManaged");
Free->replaceAllUsesWith(PollyFreeManaged);		Free->replaceAllUsesWith(PollyFreeManaged);
		Free->eraseFromParent();
		}

		for(GlobalVariable &Global : M.globals()) {
		RewriteGlobalArray(M, *DL, Global);
		philip.pfaffe [Done]: Use `SmallVector` instead of `std::vector`
}		}

return true;		return true;
}		}
};		};

		philip.pfaffe [Done]: Here and below: Superfluous braces
} // namespace		} // namespace
char ManagedMemoryRewritePass::ID = 42;		char ManagedMemoryRewritePass::ID = 42;

Pass *polly::createManagedMemoryRewritePassPass(GPUArch Arch,		Pass *polly::createManagedMemoryRewritePassPass(GPUArch Arch,
GPURuntime Runtime) {		GPURuntime Runtime) {
ManagedMemoryRewritePass *pass = new ManagedMemoryRewritePass();		ManagedMemoryRewritePass *pass = new ManagedMemoryRewritePass();
pass->Runtime = Runtime;		pass->Runtime = Runtime;
pass->Architecture = Arch;		pass->Architecture = Arch;
return pass;		return pass;
}		}
		philip.pfaffe [Not Done]: What's the purpose of this?
		bollu (Author) [Not Done]: The `AddrSpace` is a parameter to all pointers. I believe the default address space is zero. As for why LLVM uses it, for example, the NVPTX backend uses address spaces to refer to where the pointer lives in memory.
		philip.pfaffe [Not Done]: The question was not what an `AddrSpace` is, but why you're actively passing the default everywhere, and even store this in an extra variable.

INITIALIZE_PASS_BEGIN(		INITIALIZE_PASS_BEGIN(
ManagedMemoryRewritePass, "polly-acc-rewrite-managed-memory",		ManagedMemoryRewritePass, "polly-acc-rewrite-managed-memory",
"Polly - Rewrite all allocations in heap & data section to managed memory",		"Polly - Rewrite all allocations in heap & data section to managed memory",
		philip.pfaffe [Done]: Use a SmallSet instead.
false, false)		false, false)
INITIALIZE_PASS_DEPENDENCY(PPCGCodeGeneration);		INITIALIZE_PASS_DEPENDENCY(PPCGCodeGeneration);
		philip.pfaffe [Done]: `ElemTy->getPointerTo(AddrSpace)`
INITIALIZE_PASS_DEPENDENCY(DependenceInfo);		INITIALIZE_PASS_DEPENDENCY(DependenceInfo);
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass);		INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass);
INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass);		INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass);
INITIALIZE_PASS_DEPENDENCY(RegionInfoPass);		INITIALIZE_PASS_DEPENDENCY(RegionInfoPass);
INITIALIZE_PASS_DEPENDENCY(ScalarEvolutionWrapperPass);		INITIALIZE_PASS_DEPENDENCY(ScalarEvolutionWrapperPass);
INITIALIZE_PASS_DEPENDENCY(ScopDetectionWrapperPass);		INITIALIZE_PASS_DEPENDENCY(ScopDetectionWrapperPass);
		grosser [Not Done]: Why is there a "false"?
		bollu (Author) [Not Done]: I need some help with this one, actually. What is the correct linkage type?
INITIALIZE_PASS_END(		INITIALIZE_PASS_END(
ManagedMemoryRewritePass, "polly-acc-rewrite-managed-memory",		ManagedMemoryRewritePass, "polly-acc-rewrite-managed-memory",
"Polly - Rewrite all allocations in heap & data section to managed memory",		"Polly - Rewrite all allocations in heap & data section to managed memory",
false, false)		false, false)
		Meinersbur [Not Done]: What does this return for multidimensional arrays?
		bollu (Author) [Not Done]: It would return the `inner` array type. However, I don't believe this is a problem due to the way we allocate memory. We issue a single `cudaMallocManaged`, and then we convert the `[T]` to `T*`. This should still work memory-layout-wise, unless I'm missing something.
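To illustrate the layout argument in the reply above: a multidimensional array occupies one contiguous, row-major block, which is why a single flat allocation can back it. The sketch below is plain C++ rather than LLVM IR, and `Rows`, `Cols`, `flatIndex`, and `layoutMatches` are hypothetical names introduced only for this illustration; it says nothing about how the pass itself rewrites the accesses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical dimensions, chosen only for this illustration.
constexpr std::size_t Rows = 3;
constexpr std::size_t Cols = 4;

// Row-major flattening: element (I, J) sits at offset I * Cols + J.
constexpr std::size_t flatIndex(std::size_t I, std::size_t J) {
  return I * Cols + J;
}

// Copies the bytes of a 2D array into one flat buffer and checks that every
// element is found at its flattened offset.
bool layoutMatches() {
  int A[Rows][Cols];
  for (std::size_t I = 0; I < Rows; ++I)
    for (std::size_t J = 0; J < Cols; ++J)
      A[I][J] = static_cast<int>(I * 100 + J);

  std::vector<int> Flat(Rows * Cols);
  // The 2D array and the flat buffer have the same size in bytes.
  assert(sizeof(A) == Flat.size() * sizeof(int));
  std::memcpy(Flat.data(), A, sizeof(A));

  for (std::size_t I = 0; I < Rows; ++I)
    for (std::size_t J = 0; J < Cols; ++J)
      if (Flat[flatIndex(I, J)] != A[I][J])
        return false;
  return true;
}
```

The flat buffer stands in for the single managed allocation; the index arithmetic is what the original array type would otherwise encode.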
		Meinersbur [Done]: Uninitialized data would be ok as well, no? Could handle them the same as zero-initialized data, which is a value uninitialized data can have.
		bollu (Author) [Not Done]: Yes, it would. I set it to zero as a safe default. Do we gain / lose anything by switching to uninitialized?
		Meinersbur [Done]: [Style] `cast<>` instead of `dyn_cast<>` gives you a type check in assert-builds.
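A toy model of the distinction raised in this comment, offered only as a sketch. `ToyValue`, `ToyGlobal`, `ToyFunction`, `checkedCast`, and `dynCast` are hypothetical names built on standard `dynamic_cast`; LLVM's own `cast<>` and `dyn_cast<>` use a custom RTTI scheme, so this mirrors the behavior being discussed, not the implementation.

```cpp
#include <cassert>

// Hypothetical class hierarchy standing in for LLVM's Value classes.
struct ToyValue {
  virtual ~ToyValue() = default;
};
struct ToyGlobal : ToyValue {
  int Id = 7;
};
struct ToyFunction : ToyValue {};

// Analogue of a checking cast: a type mismatch trips an assertion in
// assert-enabled builds instead of silently producing a null pointer.
template <typename To, typename From> To *checkedCast(From *V) {
  To *Result = dynamic_cast<To *>(V);
  assert(Result && "checkedCast<> argument of incompatible type!");
  return Result;
}

// Analogue of a dynamic cast: a type mismatch yields nullptr, which the
// caller is expected to test before dereferencing.
template <typename To, typename From> To *dynCast(From *V) {
  return dynamic_cast<To *>(V);
}
```

When the caller already knows the type, the checking form documents that expectation and fails loudly if it is wrong; the null-returning form only makes sense when the result is actually tested.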
		Meinersbur [Done]: Where does the magic number `100` come from?
		bollu (Author) [Not Done]: Removed.
		Meinersbur [Done]: unfinished sentence
		Meinersbur [Done]: Is the priority even important? AFAIK it can be just 0 for everything.
		Meinersbur [Done]: [Style] `if (auto *I = dyn_cast<Instruction>(Current))`
		Meinersbur [Done]: AFAIK there is nothing else than Instruction and Constant deriving from User. An `assert(isa<Constant>(Current))` or `auto *C = cast<Constant>(Current)` would be enough.
		Meinersbur [Not Done]: Isn't it more replacing than removing? Also: functions start with lower-case letters
		grosser [Done]: Not needed any more.
		philip.pfaffe [Not Done]: No need to pass the empty array here.
		philip.pfaffe [Done]: Isn't this off by a factor of 8?
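This comment presumably concerns the `getTypeAllocSizeInBits` result that the patch passes as the allocation size. A minimal sketch of the arithmetic, in plain C++ with hypothetical helpers `sizeInBytes` and `sizeInBits` (padding is ignored, and the element size and count are made-up values):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a size query that reports bytes.
constexpr std::uint64_t sizeInBytes(std::uint64_t ElemBytes,
                                    std::uint64_t NumElems) {
  return ElemBytes * NumElems;
}

// Hypothetical stand-in for a size query that reports bits.
constexpr std::uint64_t sizeInBits(std::uint64_t ElemBytes,
                                   std::uint64_t NumElems) {
  return 8 * sizeInBytes(ElemBytes, NumElems);
}

// A malloc-style allocator expects a byte count, so handing it the bit count
// requests eight times more memory than the array needs.
static_assert(sizeInBits(4, 100) == 8 * sizeInBytes(4, 100),
              "bit count is eight times the byte count");
```

In LLVM terms, `DataLayout::getTypeAllocSize` is the byte-valued counterpart of `getTypeAllocSizeInBits`.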
		philip.pfaffe [Done]: Use a SmallVector instead
		philip.pfaffe [Done]: Superfluous braces.
		philip.pfaffe [Done]: `InstsToBeDeleted` is never populated with anything.
		philip.pfaffe [Done]: static
		philip.pfaffe [Done]: static
		philip.pfaffe [Done]: Just `Builder.getInt64(Size)`
		philip.pfaffe [Done]: Is this actually ever relevant?
		bollu (Author) [Not Done]: Probably not, but I'd much rather be defensive about things like this.
		philip.pfaffe [Not Done]: Internal or Common linkage should be used.

test/GPGPU/managed-memory-rewrite-malloc-free.ll

	; HOST-IR: %call = tail call i8* @polly_mallocManaged(i64 400)			; HOST-IR: %call = tail call i8* @polly_mallocManaged(i64 400)
	; HOST-IR: declare i8* @polly_mallocManaged(i64)			; HOST-IR: declare i8* @polly_mallocManaged(i64)

	; // Check that polly_freeManaged is declared and used correctly.			; // Check that polly_freeManaged is declared and used correctly.
	; HOST-IR %toFreeBitcast = bitcast i32* %toFree to i8*			; HOST-IR %toFreeBitcast = bitcast i32* %toFree to i8*
	; HOST-IR call void @polly_freeManaged(i8* %toFreeBitcast)			; HOST-IR call void @polly_freeManaged(i8* %toFreeBitcast)
	; HOST-IR: declare void @polly_freeManaged(i8*)			; HOST-IR: declare void @polly_freeManaged(i8*)

				; // Check that we remove the original malloc,free
				; HOST-IR-NOT: declare i8* @malloc(i64)
				; HOST-IR-NOT: declare void @free(i8*)

	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.12.0"			target triple = "x86_64-apple-macosx10.12.0"

	define i32* @f(i32 *%toFree) {			define i32* @f(i32 *%toFree) {
	entry:			entry:
	%toFreeBitcast = bitcast i32* %toFree to i8*			%toFreeBitcast = bitcast i32* %toFree to i8*
	call void @free(i8* %toFreeBitcast)			call void @free(i8* %toFreeBitcast)
	br label %entry.split			br label %entry.split