Details
- Reviewers
- None
Event Timeline
The changes to PtrUseVisitor make sense (it's a generic tool) but I'm curious why the correct fix isn't to have instcombine nuke all addrspacecasts of allocas? They don't really make any sense to me...
addrspacecasts of allocas are an important case in OpenCL 2.0. Your private allocations come from allocas in address space 0, which are then often cast to the non-zero generic address space for convenience of use. Both accessing private allocations and accessing memory through a flat pointer are expensive, so we really want to be able to eliminate the cast to generic and the alloca if possible. Right now SROA doesn't eliminate any of these common allocas because of the addrspacecast.
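To make the pattern concrete, here is a minimal sketch of the kind of IR described above, followed by the result one would hope for once the alloca can be promoted. The function names are made up, and address space 4 is only an assumed stand-in for the generic address space; the real number depends on the target.

```llvm
; Hypothetical example: address space 4 stands in for "generic" here.
define i32 @through_generic(i32 %x) {
entry:
  ; private allocation in address space 0
  %p = alloca i32, align 4
  ; cast to the (assumed) generic address space for convenience
  %g = addrspacecast i32* %p to i32 addrspace(4)*
  ; every access now goes through the flat pointer
  store i32 %x, i32 addrspace(4)* %g, align 4
  %v = load i32, i32 addrspace(4)* %g, align 4
  ret i32 %v
}

; What we would like to be left with after the alloca and cast are eliminated.
define i32 @through_generic_promoted(i32 %x) {
entry:
  ret i32 %x
}
```

The two functions compute the same value; the second simply has no memory traffic left.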
This patch is not in SROA yet. Do you mind if I check it in?
I was bitten by the same issue in the NVPTX backend. Code patterns such as
    %0 = alloca i32
    %1 = addrspacecast i32* %0 to i32 addrspace(4)* ; cast from generic to local so that later accesses can be much faster
    ... use %1 ...
will appear quite often after NVPTXFavorNonGenericAddrSpaces (http://llvm.org/docs/doxygen/html/NVPTXTargetMachine_8cpp_source.html#l00169) with some WIP checked in. It would be great if SROA could nuke these allocas across addrspacecasts.
So this is a case that we currently want to handle in NVPTX, which is not covered by instcombine/SROA right now.
    ; ModuleID = '<stdin>'
    target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
    target triple = "nvptx64-unknown-unknown"

    %struct.S = type { i32, i32, i32 }

    ; Function Attrs: nounwind
    define void @_Z11TakesStruct1SPi(%struct.S* byval nocapture readonly %input, i32* nocapture %output) #0 {
    entry:
      %input1 = alloca %struct.S, align 8
      %0 = addrspacecast %struct.S* %input1 to %struct.S addrspace(5)*
      %input2 = addrspacecast %struct.S* %input to %struct.S addrspace(101)*
      %input3 = load %struct.S, %struct.S addrspace(101)* %input2, align 4
      store %struct.S %input3, %struct.S addrspace(5)* %0, align 8
      %1 = getelementptr inbounds %struct.S, %struct.S addrspace(5)* %0, i64 0, i32 1
      %2 = load i32, i32 addrspace(5)* %1, align 4
      store i32 %2, i32* %output, align 4
      ret void
    }
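For reference, this is roughly what I would hope to see for that function once the alloca behind the addrspacecast can be eliminated. It is a hand-written sketch, not the output of any existing pass: the name %field1 is made up, and a real pass might well split the aggregate load into a scalar load of the one field instead.

```llvm
target triple = "nvptx64-unknown-unknown"

%struct.S = type { i32, i32, i32 }

; Hand-written sketch of a possible result; not actual pass output.
define void @_Z11TakesStruct1SPi(%struct.S* byval nocapture readonly %input, i32* nocapture %output) nounwind {
entry:
  ; the parameter is still read through the param address space
  %input2 = addrspacecast %struct.S* %input to %struct.S addrspace(101)*
  %input3 = load %struct.S, %struct.S addrspace(101)* %input2, align 4
  ; the local copy (%input1 and its addrspace(5) cast) is gone;
  ; field 1 is taken directly from the loaded value
  %field1 = extractvalue %struct.S %input3, 1
  store i32 %field1, i32* %output, align 4
  ret void
}
```

The store to %output receives the same value as before, but the round trip through the local-address-space copy of the struct disappears.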