Details
- Reviewers
efriedma jdoerfert Meinersbur gareevroman sebpop • zinob huihuiz pollydev grosser singam-sanjay philip.pfaffe
- Commits
- rG8a2c07f6d42f: [ManagedMemoryRewrite] Learn how to rewrite global arrays, allocas.
rPLO311080: [ManagedMemoryRewrite] Learn how to rewrite global arrays, allocas.
rL311080: [ManagedMemoryRewrite] Learn how to rewrite global arrays, allocas.
Diff Detail
- Repository
- rL LLVM
Event Timeline
[WIP] [Polly] [ManagedMemoryRewrite] Rewrite global arrays to pointers that are allocated with managed memory.
This is slightly harder than I initially thought, because:
- mutateType on the original global is entirely the wrong way to go about this: it leaves the IR in an inconsistent state.
- The right solution (I believe) is to create a replacement global (say A.repl), set it up correctly, visit each use site of the original (say A), and change it to the replacement (A.repl).
The problem is that we can't insert an IRBuilder at a Use to call bitcast. We need an Instruction, but the underlying Value could be a Constant or something, and I can't see a way to extract an insertion point for a builder from a Value. I'm probably missing something in the API; any help is appreciated.
Right now, this doesn't actually perform the replacement, though it does create a separate A.repl and initialise it (I can see the call to cudaMallocManaged with nvprof).
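To make the Use-of-a-Constant problem concrete, here is a minimal sketch of the direction I have in mind (not code from this patch; rewriteOneUse, NewPtr and the value names are invented, and I assume the usual llvm/IR headers): when the user is an Instruction we have an insertion point for the builder, and when it is a ConstantExpr we can expand it with getAsInstruction() in front of each of its instruction users and then recurse.

#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Sketch only: rewrite one use of the original global (e.g. @A) to use the
// managed pointer instead. NewPtr stands for a value holding the managed
// pointer (e.g. a load of @A.repl); it must dominate the rewritten use.
static void rewriteOneUse(Use &U, GlobalVariable *OldArray, Value *NewPtr) {
  User *Usr = U.getUser();

  if (auto *UserInst = dyn_cast<Instruction>(Usr)) {
    // An Instruction gives us a concrete insertion point for the IRBuilder.
    // (PHI users would need the cast in the incoming block instead.)
    IRBuilder<> Builder(UserInst);
    U.set(Builder.CreateBitCast(NewPtr, OldArray->getType(), "repl.cast"));
    return;
  }

  if (auto *CE = dyn_cast<ConstantExpr>(Usr)) {
    // A ConstantExpr has no location. Materialise a copy of it as an
    // Instruction in front of each Instruction that uses it; the use of the
    // old global then sits inside a real Instruction and can be rewritten.
    SmallVector<Use *, 4> CEUses;
    for (Use &CEUse : CE->uses())
      CEUses.push_back(&CEUse);
    for (Use *CEUse : CEUses) {
      auto *CEUserInst = dyn_cast<Instruction>(CEUse->getUser());
      if (!CEUserInst)
        continue; // Nested ConstantExpr users would need another level of this.
      Instruction *Expanded = CE->getAsInstruction();
      Expanded->insertBefore(CEUserInst);
      CEUse->set(Expanded);
      rewriteOneUse(Expanded->getOperandUse(U.getOperandNo()), OldArray,
                    NewPtr);
    }
  }
}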
[WIP] [Polly] [ManagedMemoryRewrite] Rewrite global arrays to pointers that are allocated with managed memory.
@grosser I'm sorry, I seem to be hitting some Phab bug that updates D36516 rather than creating a new revision. The original D36516 was an NFC change. I tried running arc patch again, with the same effect (you can see that bollu updated this revision to Diff 110436). Something super weird is happening.
Anyway, let's discuss the proposed change here. I'll move the [NFC] patch to another revision.
lib/CodeGen/ManagedMemoryRewrite.cpp | ||
---|---|---|
125 ↗ | (On Diff #110434) | Yes, hence WIP. I wanted to discuss how to do this correctly. Could you request changes? This patch should not be LGTM'd. Please see my comment on the patch about the discussion I wish to have. |
@grosser: could you change your status to "Requesting changes"? That way, you'll see when I push new updates.
Example of changes induced by this patch on a .ll file
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <cuda.h>
#include <cuda_runtime.h>

int A[5];

void f(int *arr) {
  for(int i = 0; i < 5; i++)
    arr[i] = 10;
}

#define CHECK(r) {_check((r), __LINE__);}

void _check(cudaError_t r, int line) {
  if (r != cudaSuccess) {
    printf("CUDA error on line %d: %s\n", line, cudaGetErrorString(r));
    exit(0);
  }
}

void print_arr(int *A) {
  for(int i = 0; i < 5; i++)
    printf("%d := %d\n", i, A[i]);
}

int main() {
  for(int i = 0; i < 5; i++) {
    A[i] = -42;
  }
  printf("A before kernel:\n");
  print_arr(A);
  printf("launching kernel...\n");
  f(A);
  printf("printing A...\n");
  print_arr(A);
  return 0;
}
; ModuleID = 'program.ll' source_filename = "program.c" target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" @.str = private unnamed_addr constant [27 x i8] c"CUDA error on line %d: %s\0A\00", align 1 @.str.1 = private unnamed_addr constant [10 x i8] c"%d := %d\0A\00", align 1 @A = common global [5 x i32] zeroinitializer, align 16 @.str.2 = private unnamed_addr constant [18 x i8] c"A before kernel:\0A\00", align 1 @.str.3 = private unnamed_addr constant [21 x i8] c"launching kernel...\0A\00", align 1 @.str.4 = private unnamed_addr constant [15 x i8] c"printing A...\0A\00", align 1 @str = private unnamed_addr constant [17 x i8] c"A before kernel:\00" @str.1 = private unnamed_addr constant [20 x i8] c"launching kernel...\00" @str.2 = private unnamed_addr constant [14 x i8] c"printing A...\00" define void @f(i32* %arr) { entry: br label %entry.split entry.split: ; preds = %entry br label %for.body for.body: ; preds = %entry.split, %for.body %indvars.iv = phi i64 [ 0, %entry.split ], [ %indvars.iv.next, %for.body ] %arrayidx = getelementptr inbounds i32, i32* %arr, i64 %indvars.iv store i32 10, i32* %arrayidx, align 4 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 %exitcond = icmp ne i64 %indvars.iv.next, 5 br i1 %exitcond, label %for.body, label %for.end for.end: ; preds = %for.body ret void } define void @_check(i32 %r, i32 %line) { entry: br label %entry.split entry.split: ; preds = %entry %cmp = icmp eq i32 %r, 0 br i1 %cmp, label %if.end, label %if.then if.then: ; preds = %entry.split %call = tail call i8* @cudaGetErrorString(i32 %r) %call1 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([27 x i8], [27 x i8]* @.str, i64 0, i64 0), i32 %line, i8* %call) tail call void @exit(i32 0) unreachable if.end: ; preds = %entry.split ret void } declare i32 @printf(i8*, ...) declare i8* @cudaGetErrorString(i32) declare void @exit(i32) define void @print_arr(i32* %A) { entry: br label %entry.split entry.split: ; preds = %entry br label %for.body for.body: ; preds = %entry.split, %for.body %indvars.iv = phi i64 [ 0, %entry.split ], [ %indvars.iv.next, %for.body ] %arrayidx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv %tmp4 = load i32, i32* %arrayidx, align 4 %0 = trunc i64 %indvars.iv to i32 %call = tail call i32 (i8*, ...) 
@printf(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.1, i64 0, i64 0), i32 %0, i32 %tmp4) %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 %exitcond = icmp ne i64 %indvars.iv.next, 5 br i1 %exitcond, label %for.body, label %for.end for.end: ; preds = %for.body ret void } define i32 @main() { entry: br label %entry.split entry.split: ; preds = %entry br label %for.body for.body: ; preds = %entry.split, %for.body %indvars.iv = phi i64 [ 0, %entry.split ], [ %indvars.iv.next, %for.body ] %arrayidx = getelementptr inbounds [5 x i32], [5 x i32]* @A, i64 0, i64 %indvars.iv store i32 -42, i32* %arrayidx, align 4 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 %exitcond = icmp ne i64 %indvars.iv.next, 5 br i1 %exitcond, label %for.body, label %for.end for.end: ; preds = %for.body %puts = tail call i32 @puts(i8* getelementptr inbounds ([17 x i8], [17 x i8]* @str, i64 0, i64 0)) tail call void @print_arr(i32* getelementptr inbounds ([5 x i32], [5 x i32]* @A, i64 0, i64 0)) %puts1 = tail call i32 @puts(i8* getelementptr inbounds ([20 x i8], [20 x i8]* @str.1, i64 0, i64 0)) tail call void @f(i32* getelementptr inbounds ([5 x i32], [5 x i32]* @A, i64 0, i64 0)) %puts2 = tail call i32 @puts(i8* getelementptr inbounds ([14 x i8], [14 x i8]* @str.2, i64 0, i64 0)) tail call void @print_arr(i32* getelementptr inbounds ([5 x i32], [5 x i32]* @A, i64 0, i64 0)) ret i32 0 } ; Function Attrs: nounwind declare i32 @puts(i8* nocapture readonly) #0 attributes #0 = { nounwind }
Optimised .ll file
- Note that all uses of @A have now become @A.toptr
- Also notice that wherever we use @A.toptr, we bitcast it from i32** to [5 x i32]*, which is a legal bitcast (?)
You can run the output program through opt; it passes the module verifier, so there's nothing grossly wrong with it.
Is the bitcast incorrect? Am I missing something obvious?
; ModuleID = 'program.canonical.ll' source_filename = "program.c" target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" @.str = private unnamed_addr constant [27 x i8] c"CUDA error on line %d: %s\0A\00", align 1 @.str.1 = private unnamed_addr constant [10 x i8] c"%d := %d\0A\00", align 1 @A = common global [5 x i32] zeroinitializer, align 16 @.str.2 = private unnamed_addr constant [18 x i8] c"A before kernel:\0A\00", align 1 @.str.3 = private unnamed_addr constant [21 x i8] c"launching kernel...\0A\00", align 1 @.str.4 = private unnamed_addr constant [15 x i8] c"printing A...\0A\00", align 1 @str = private unnamed_addr constant [17 x i8] c"A before kernel:\00" @str.1 = private unnamed_addr constant [20 x i8] c"launching kernel...\00" @str.2 = private unnamed_addr constant [14 x i8] c"printing A...\00" @FUNC_f_SCOP_0_KERNEL_0 = private unnamed_addr constant [462 x i8] c"//\0A// Generated by LLVM NVPTX Back-End\0A//\0A\0A.version 3.2\0A.target sm_30\0A.address_size 64\0A\0A\09// .globl\09FUNC_f_SCOP_0_KERNEL_0\0A\0A.visible .entry FUNC_f_SCOP_0_KERNEL_0(\0A\09.param .u64 FUNC_f_SCOP_0_KERNEL_0_param_0\0A)\0A.maxntid 5, 1, 1\0A{\0A\09.reg .b32 \09%r<3>;\0A\09.reg .b64 \09%rd<4>;\0A\0A\09ld.param.u64 \09%rd1, [FUNC_f_SCOP_0_KERNEL_0_param_0];\0A\09mov.u32 \09%r1, %tid.x;\0A\09mul.wide.u32 \09%rd2, %r1, 4;\0A\09add.s64 \09%rd3, %rd1, %rd2;\0A\09mov.u32 \09%r2, 10;\0A\09st.global.u32 \09[%rd3], %r2;\0A\09ret;\0A}\0A\0A\0A\00" @FUNC_f_SCOP_0_KERNEL_0_name = private unnamed_addr constant [23 x i8] c"FUNC_f_SCOP_0_KERNEL_0\00" @A.toptr = global i32* null @llvm.global_ctors = appending global [1 x { i32, void ()*, i8* }] [{ i32, void ()*, i8* } { i32 0, void ()* @A.constructor, i8* bitcast (i32** @A.toptr to i8*) }] define void @f(i32* %arr) { entry: %polly_launch_0_params = alloca [2 x i8*] %polly_launch_0_param_0 = alloca i8* %polly_launch_0_param_size_0 = alloca i32 %polly_launch_0_params_i8ptr = bitcast [2 x i8*]* %polly_launch_0_params to i8* br label %entry.split entry.split: ; preds = %entry br label %polly.split_new_and_old polly.split_new_and_old: ; preds = %entry.split %0 = call { i64, i1 } @llvm.smul.with.overflow.i64(i64 1, i64 5) %.obit = extractvalue { i64, i1 } %0, 1 %polly.overflow.state = or i1 false, %.obit %.res = extractvalue { i64, i1 } %0, 0 %1 = call { i64, i1 } @llvm.smul.with.overflow.i64(i64 6, i64 %.res) %.obit1 = extractvalue { i64, i1 } %1, 1 %polly.overflow.state2 = or i1 %polly.overflow.state, %.obit1 %.res3 = extractvalue { i64, i1 } %1, 0 %2 = call { i64, i1 } @llvm.sadd.with.overflow.i64(i64 0, i64 %.res3) %.obit4 = extractvalue { i64, i1 } %2, 1 %polly.overflow.state5 = or i1 %polly.overflow.state2, %.obit4 %.res6 = extractvalue { i64, i1 } %2, 0 %3 = icmp sge i64 %.res6, 0 %4 = and i1 true, %3 %polly.rtc.overflown = xor i1 %polly.overflow.state5, true %polly.rtc.result = and i1 %4, %polly.rtc.overflown br i1 %polly.rtc.result, label %polly.start, label %for.body.pre_entry_bb for.body.pre_entry_bb: ; preds = %polly.split_new_and_old br label %for.body for.body: ; preds = %for.body.pre_entry_bb, %for.body %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %for.body.pre_entry_bb ] %arrayidx = getelementptr inbounds i32, i32* %arr, i64 %indvars.iv store i32 10, i32* %arrayidx, align 4 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 %exitcond = icmp ne i64 %indvars.iv.next, 5 br i1 %exitcond, label %for.body, label %polly.merge_new_and_old polly.merge_new_and_old: ; preds = %polly.exiting, %for.body br label %for.end for.end: ; preds = 
%polly.merge_new_and_old ret void polly.start: ; preds = %polly.split_new_and_old br label %polly.acc.initialize polly.acc.initialize: ; preds = %polly.start %5 = call i8* @polly_initContextCUDA() %6 = bitcast i32* %arr to i8* %7 = getelementptr [2 x i8*], [2 x i8*]* %polly_launch_0_params, i64 0, i64 0 store i8* %6, i8** %polly_launch_0_param_0 %8 = bitcast i8** %polly_launch_0_param_0 to i8* store i8* %8, i8** %7 store i32 4, i32* %polly_launch_0_param_size_0 %9 = getelementptr [2 x i8*], [2 x i8*]* %polly_launch_0_params, i64 0, i64 1 %10 = bitcast i32* %polly_launch_0_param_size_0 to i8* store i8* %10, i8** %9 %11 = call i8* @polly_getKernel(i8* getelementptr inbounds ([462 x i8], [462 x i8]* @FUNC_f_SCOP_0_KERNEL_0, i32 0, i32 0), i8* getelementptr inbounds ([23 x i8], [23 x i8]* @FUNC_f_SCOP_0_KERNEL_0_name, i32 0, i32 0)) call void @polly_launchKernel(i8* %11, i32 1, i32 1, i32 5, i32 1, i32 1, i8* %polly_launch_0_params_i8ptr) call void @polly_freeKernel(i8* %11) call void @polly_synchronizeDevice() call void @polly_freeContext(i8* %5) br label %polly.exiting polly.exiting: ; preds = %polly.acc.initialize br label %polly.merge_new_and_old } define void @_check(i32 %r, i32 %line) { entry: br label %entry.split entry.split: ; preds = %entry %cmp = icmp eq i32 %r, 0 br i1 %cmp, label %if.end, label %if.then if.then: ; preds = %entry.split %call = tail call i8* @cudaGetErrorString(i32 %r) %call1 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([27 x i8], [27 x i8]* @.str, i64 0, i64 0), i32 %line, i8* %call) tail call void @exit(i32 0) unreachable if.end: ; preds = %entry.split ret void } declare i32 @printf(i8*, ...) declare i8* @cudaGetErrorString(i32) declare void @exit(i32) define void @print_arr(i32* %A) { entry: br label %entry.split entry.split: ; preds = %entry br label %for.body for.body: ; preds = %for.body, %entry.split %indvars.iv = phi i64 [ 0, %entry.split ], [ %indvars.iv.next, %for.body ] %arrayidx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv %tmp4 = load i32, i32* %arrayidx, align 4 %0 = trunc i64 %indvars.iv to i32 %call = tail call i32 (i8*, ...) 
@printf(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.1, i64 0, i64 0), i32 %0, i32 %tmp4) %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 %exitcond = icmp ne i64 %indvars.iv.next, 5 br i1 %exitcond, label %for.body, label %for.end for.end: ; preds = %for.body ret void } define i32 @main() { entry: br label %entry.split entry.split: ; preds = %entry br label %for.body for.body: ; preds = %for.body, %entry.split %indvars.iv = phi i64 [ 0, %entry.split ], [ %indvars.iv.next, %for.body ] %arrayidx = getelementptr inbounds [5 x i32], [5 x i32]* bitcast (i32** @A.toptr to [5 x i32]*), i64 0, i64 %indvars.iv store i32 -42, i32* %arrayidx, align 4 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 %exitcond = icmp ne i64 %indvars.iv.next, 5 br i1 %exitcond, label %for.body, label %for.end for.end: ; preds = %for.body %puts = tail call i32 @puts(i8* getelementptr inbounds ([17 x i8], [17 x i8]* @str, i64 0, i64 0)) tail call void @print_arr(i32* getelementptr inbounds ([5 x i32], [5 x i32]* bitcast (i32** @A.toptr to [5 x i32]*), i64 0, i64 0)) %puts1 = tail call i32 @puts(i8* getelementptr inbounds ([20 x i8], [20 x i8]* @str.1, i64 0, i64 0)) tail call void @f(i32* getelementptr inbounds ([5 x i32], [5 x i32]* bitcast (i32** @A.toptr to [5 x i32]*), i64 0, i64 0)) %puts2 = tail call i32 @puts(i8* getelementptr inbounds ([14 x i8], [14 x i8]* @str.2, i64 0, i64 0)) tail call void @print_arr(i32* getelementptr inbounds ([5 x i32], [5 x i32]* bitcast (i32** @A.toptr to [5 x i32]*), i64 0, i64 0)) ret i32 0 } ; Function Attrs: nounwind declare i32 @puts(i8* nocapture readonly) #0 ; Function Attrs: nounwind readnone speculatable declare { i64, i1 } @llvm.smul.with.overflow.i64(i64, i64) #1 ; Function Attrs: nounwind readnone speculatable declare { i64, i1 } @llvm.sadd.with.overflow.i64(i64, i64) #1 declare i8* @polly_initContextCUDA() declare i8* @polly_getKernel(i8*, i8*) declare void @polly_launchKernel(i8*, i32, i32, i32, i32, i32, i8*) declare void @polly_freeKernel(i8*) declare void @polly_synchronizeDevice() declare void @polly_freeContext(i8*) declare i8* @polly_mallocManaged(i64) define void @A.constructor() { entry: %mem.raw = call i8* @polly_mallocManaged(i64 16000) %mem.typed = bitcast i8* %mem.raw to i32* store i32* %mem.typed, i32** @A.toptr ret void } attributes #0 = { nounwind } attributes #1 = { nounwind readnone speculatable }
lib/CodeGen/ManagedMemoryRewrite.cpp | ||
---|---|---|
137 ↗ | (On Diff #110543) | I believe this bitcast to be correct. Am I doing something "obviously" wrong here? |
140 ↗ | (On Diff #110543) | For some reason, this fails. I suspect this indicates some deeper bug in my code. |
tools/GPURuntime/GPUJIT.c | ||
36 ↗ | (On Diff #110543) | This is just a debugging change, I'll remove it at the final commit. |
- [WIP] The version now works on test code; I'm going to run it on COSMO to detect bugs in the implementation.
The function editAllUses looks more complicated than necessary. Here is what I'd try:
- For every function, insert an arrptr.load in the entry block (or lazily, when needed); it might be beneficial to have just one such load per function.
- Iterate over the instructions that contain a use of the GlobalArray.
- Recurse over the operands of each such instruction. Pseudocode:
Value *recursiveReplace(Value *Replacee, GlobalVariable *Old, LoadInst *New) {
  if (Replacee == Old)
    return New;

  // Replacee is only an Instruction for the recursion root.
  Instruction *Replacement = dyn_cast<Instruction>(Replacee);

  auto *ReplaceeUser = dyn_cast<User>(Replacee);
  if (!ReplaceeUser)
    return Replacee; // Leaf values (e.g. arguments) cannot contain Old.

  for (Use &Op : ReplaceeUser->operands()) {
    Value *Replacer = recursiveReplace(Op.get(), Old, New);
    if (Replacer == Op.get())
      continue;
    if (!Replacement) {
      Replacement = cast<ConstantExpr>(Replacee)->getAsInstruction();
      // Insert Replacement somewhere, e.g. right after arrptr.load. (It was
      // previously a constant, i.e. it has no dependencies other than
      // arrptr.load.)
    }
    Replacement->setOperand(Op.getOperandNo(), Replacer);
  }
  return Replacement;
}
- Call it like this:
for (Instruction *UserOfArrayInst : ArrayUserInstructions) {
  Value *New = recursiveReplace(UserOfArrayInst, Array, ArrPtrLoad /* the arrptr.load */);
  if (New != UserOfArrayInst)
    UserOfArrayInst->replaceAllUsesWith(New);
}
lib/CodeGen/ManagedMemoryRewrite.cpp | ||
---|---|---|
100 ↗ | (On Diff #110693) | [Suggestion] SmallVector instead of std::vector. Most instructions have only one or two operands, such that we don't need a memory location in the majority of cases. |
120–122 ↗ | (On Diff #110693) | [Style] Doxygen comment. |
128 ↗ | (On Diff #110693) | [Style] GEP->getPointerOperand() != ArrToRewrite |
132 ↗ | (On Diff #110693) | [Suggestion] SmallVector and .reserve() |
143–145 ↗ | (On Diff #110693) | [Suggestion] std::set is inefficient. Try SmallPtrSet? |
151–152 ↗ | (On Diff #110693) | Why does GEP need to be handled differently from other instructions? There is also GetElementPtrConstantExpr, which is not handled here (see the sketch after these inline comments). |
159–165 ↗ | (On Diff #110693) | An assert is enough. |
222 ↗ | (On Diff #110693) | [Style] if (auto *I = dyn_cast<Instruction>(Current)) |
230–235 ↗ | (On Diff #110693) | AFAIK there is nothing other than Instruction and Constant deriving from User. An assert(isa<Constant>(Current)) or auto *C = cast<Constant>(Current) would be enough. |
239 ↗ | (On Diff #110693) | Isn't it more replacing than removing? Also: functions start with lower-case letters |
247 ↗ | (On Diff #110693) | What does this return for multidimension arrays? |
258 ↗ | (On Diff #110693) | Uninitialized data would be ok as well, no? Could handle them the same as zero-initialized data, which is a value uninitialized data can have. |
267 ↗ | (On Diff #110693) | [Style] cast<> instead of dyn_cast<> gives you a type check in assert-builds. |
279 ↗ | (On Diff #110693) | Where does the magic number 100 come from? |
291–292 ↗ | (On Diff #110693) | Is the priority even important? AFAIK it can be just 0 for everything. |
297 ↗ | (On Diff #110693) | unfinished sentence |
test/GPGPU/simple-managed-memory-rewrite.ll | ||
7 ↗ | (On Diff #110693) | REQUIRES: pollyacc missing |
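As a follow-up to the GEP comment above (rows 151–152), here is a sketch of how both forms could be matched uniformly. This is illustrative only; isGEPIntoArray is an invented helper, not code from the patch. GEPOperator matches both a GetElementPtrInst and a getelementptr ConstantExpr, so a single code path can cover either form.

#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Operator.h"
using namespace llvm;

// Sketch: returns true if U is a GEP (instruction or constant expression)
// whose pointer operand is the array we want to rewrite.
static bool isGEPIntoArray(User *U, GlobalVariable *ArrToRewrite) {
  if (auto *GEP = dyn_cast<GEPOperator>(U))
    return GEP->getPointerOperand() == ArrToRewrite;
  return false;
}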
- [Code dump] Dump of all changes that now allow global arrays to be rewritten correctly.
We still need to rewrite allocas correctly.
- [UNDEBUG] remove debug printing in GPUJIT
- [NFC] check polly and fix test case to reduce size by x100
lib/CodeGen/ManagedMemoryRewrite.cpp | ||
---|---|---|
128 ↗ | (On Diff #110693) | rewriteGEP has been removed. |
132 ↗ | (On Diff #110693) | rewriteGEP has been removed. |
151–152 ↗ | (On Diff #110693) | Yep, GEP special casing is removed. |
247 ↗ | (On Diff #110693) | It would return the inner array type. However, I don't believe this is a problem, due to the way we allocate memory: we issue a single cudaMallocManaged and then convert the [T] to T*. This should still work memory-layout-wise, unless I'm missing something (see the sketch after these replies). |
258 ↗ | (On Diff #110693) | Yes, it would. I set it to zero as a safe default. Do we gain / lose anything by switching to uninitialized? |
279 ↗ | (On Diff #110693) | Removed. |
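To make the multidimensional-array reply above concrete, a rough sketch of the flattening I have in mind (invented helper name, not the committed code): a [5 x [4 x i32]] global is one contiguous block of 20 i32s, so a single cudaMallocManaged of that many elements, accessed through an i32*, has the same memory layout as the original array.

#include <cstdint>
#include "llvm/IR/DerivedTypes.h"
using namespace llvm;

// Sketch: peel nested array types to find the innermost element type and the
// total number of elements, e.g. [5 x [4 x i32]] -> (i32, 20).
static uint64_t getFlattenedNumElements(ArrayType *ArrayTy, Type *&InnerTy) {
  uint64_t NumElts = 1;
  Type *Ty = ArrayTy;
  while (auto *AT = dyn_cast<ArrayType>(Ty)) {
    NumElts *= AT->getNumElements();
    Ty = AT->getElementType();
  }
  InnerTy = Ty;
  return NumElts;
}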
LGTM.
lib/CodeGen/ManagedMemoryRewrite.cpp | ||
---|---|---|
98 ↗ | (On Diff #111198) | is populated WITH all the |
123 ↗ | (On Diff #111198) | No braces and no ";" Maybe also reduce casting. |
127 ↗ | (On Diff #111198) | You don't rewrite GEP, so remove the comment. |
136 ↗ | (On Diff #111198) | keeps |
153 ↗ | (On Diff #111198) | No braces for single statements |
166 ↗ | (On Diff #111198) | No braces for single-statements |
218 ↗ | (On Diff #111198) | Why is there a "false"? |
310 ↗ | (On Diff #111198) | Not needed any more. |
lib/Support/RegisterPasses.cpp | ||
48 ↗ | (On Diff #111198) | Drop. |
353 ↗ | (On Diff #111198) | Unrelated change? |
The idea seems sensible. Some nits inline. Most importantly, the intent and algorithm of the pass should be documented much more explicitly.
lib/CodeGen/ManagedMemoryRewrite.cpp | ||
---|---|---|
37 ↗ | (On Diff #111198) | Unused include? |
58 ↗ | (On Diff #111198) | Function names should be humbleCamelCase |
100 ↗ | (On Diff #111198) | This algorithm requires a much more extensive documentation. |
106 ↗ | (On Diff #111198) | This is unused? |
134 ↗ | (On Diff #111198) | InstsToBeDeleted is unused? |
190 ↗ | (On Diff #111198) | Use SmallVector instead of std::vector |
206 ↗ | (On Diff #111198) | What's the purpose of this? |
212 ↗ | (On Diff #111198) | ElemTy->getPointerTo(AddrSpace) |
237 ↗ | (On Diff #111198) | No need to pass the empty array here. |
244 ↗ | (On Diff #111198) | Isn't this off by a factor of 8? |
257 ↗ | (On Diff #111198) | Use a SmallVector instead |
263 ↗ | (On Diff #111198) | Superfluous braces. |
268 ↗ | (On Diff #111198) | InstsToBeDeleted is never populated with anything. |
284 ↗ | (On Diff #111198) | static |
303 ↗ | (On Diff #111198) | static |
316 ↗ | (On Diff #111198) | Just Builder.getInt64(Size) |
324 ↗ | (On Diff #111198) | Is this actually ever relevant? |
341 ↗ | (On Diff #111198) | Why is this a member? |
369 ↗ | (On Diff #111198) | Here and below: Superfluous braces |
383 ↗ | (On Diff #111198) | Use a SmallSet instead. |
- [Bugfix] Send bytes, not bits. Also, move the alloca function extraction to the correct place.
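For reference, the bug was a bits-vs-bytes mix-up when computing the allocation size. A hedged sketch of the byte computation (helper name invented, not the exact patch code):

#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Type.h"
using namespace llvm;

// Sketch: compute the managed allocation size in *bytes*.
// getTypeSizeInBits() would be off by a factor of 8 (e.g. 160 bits for
// [5 x i32]); getTypeAllocSize() returns bytes and accounts for padding.
static uint64_t getManagedAllocSizeInBytes(Module &M, Type *ArrayTy) {
  const DataLayout &DL = M.getDataLayout();
  return DL.getTypeAllocSize(ArrayTy);
}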
lib/CodeGen/ManagedMemoryRewrite.cpp | ||
---|---|---|
206 ↗ | (On Diff #111198) | The AddrSpace is a parameter of every pointer type; I believe the default address space is zero. As for why LLVM has it: the NVPTX backend, for example, uses address spaces to describe where a pointer lives in memory (see the sketch after these replies). |
218 ↗ | (On Diff #111198) | I need some help with this one, actually. What is the correct linkage type? |
324 ↗ | (On Diff #111198) | Probably not, but I'd much rather be defensive about things like this. |
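A tiny illustrative sketch of the address-space point above (not from the patch): the two spellings below construct the same uniqued pointer type, with the address space defaulting to 0.

#include <cassert>
#include "llvm/IR/DerivedTypes.h"
using namespace llvm;

// Sketch: PointerType::get(ElemTy, AS) and ElemTy->getPointerTo(AS) yield the
// same type; AS = 0 is the generic address space, and backends such as NVPTX
// use other address spaces (global, shared, ...) to say where a pointer lives.
static PointerType *makePointerTy(Type *ElemTy, unsigned AddrSpace = 0) {
  PointerType *A = PointerType::get(ElemTy, AddrSpace);
  PointerType *B = ElemTy->getPointerTo(AddrSpace);
  assert(A == B && "pointer types are uniqued");
  (void)B;
  return A;
}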
- [Merge] Merged with master, hoping that GPUJIT does not show up from arc diff this time.
lib/CodeGen/ManagedMemoryRewrite.cpp | ||
---|---|---|
122 ↗ | (On Diff #111478) | Should you not assert before using I above? |
149 ↗ | (On Diff #111478) | Does this actually ever happen? You're recursively expanding ConstExpr operands, so after a single pass that shouldn't be necessary anymore. Does this mean that this loop's trip count is actually never greater than two? |
219 ↗ | (On Diff #111478) | Internal or Common linkage should be used (see the sketch below). |
206 ↗ | (On Diff #111198) | The question was not what an AddrSpace is, but why you're actively passing the default everywhere, and even storing it in an extra variable. |
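To make the linkage suggestion concrete, a minimal sketch (invented names, not code from the patch) of creating the replacement pointer global with internal linkage, one of the two options suggested above:

#include "llvm/ADT/StringRef.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Module.h"
using namespace llvm;

// Sketch: the replacement global (e.g. @A.toptr) is defined and initialised
// by the constructor generated in this module, so InternalLinkage keeps it
// local and avoids clashes with same-named symbols in other modules.
static GlobalVariable *createManagedPtrGlobal(Module &M, Type *ElemPtrTy,
                                              StringRef Name) {
  return new GlobalVariable(M, ElemPtrTy, /*isConstant=*/false,
                            GlobalValue::InternalLinkage,
                            Constant::getNullValue(ElemPtrTy), Name);
}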
- [NFC] Discussed the algorithm with Philip offline; he helped to simplify it further.
- [Linkage] Update linkage code to use the correct linker options as well as the ignore linkage flag. Update test case to match this change
- [ReplaceUsesOfWith] remove double-loop that was not required.