
Teach SROA about addrspacecast.

Authored by arsenm on Jul 14 2014, 10:44 AM.



Event Timeline

arsenm updated this revision to Diff 11399. Jul 14 2014, 10:44 AM
arsenm retitled this revision to Teach SROA about addrspacecast.
arsenm updated this object.
arsenm edited the test plan for this revision.
arsenm added a reviewer: chandlerc.
arsenm added a subscriber: Unknown Object (MLST).
chandlerc edited edge metadata. Jul 17 2014, 12:58 PM

The changes to PtrUseVisitor make sense (it's a generic tool), but I'm curious why the correct fix isn't to have instcombine nuke all addrspacecasts of allocas. They don't really make any sense to me...

addrspacecasts of allocas are an important case in OpenCL 2.0. Private allocations come from allocas in address space 0, which are then often cast to the non-zero generic address space for convenience of use. Both accesses to private allocations and accesses through a flat pointer are expensive, so we really want to be able to eliminate the cast to generic and the alloca if possible. Right now SROA doesn't eliminate any of these common allocas because of the addrspacecast.
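To make the pattern concrete, here is a minimal hand-written sketch of the kind of IR being described, in typed-pointer syntax. The function names and the use of address space 4 for the generic address space are assumptions for illustration only (the comment above says only that it is non-zero), and the second function is the result one would hope for, not actual pass output.

```llvm
; Before: a private alloca (address space 0) is only ever accessed
; through a pointer cast to the generic address space.
; NOTE: addrspace(4) as "generic" is an assumption for this sketch.
define i32 @private_via_generic(i32 %x) {
entry:
  %p = alloca i32, align 4
  %g = addrspacecast i32* %p to i32 addrspace(4)*
  store i32 %x, i32 addrspace(4)* %g, align 4
  %v = load i32, i32 addrspace(4)* %g, align 4
  ret i32 %v
}

; Hoped-for result once SROA can look through the addrspacecast:
; the alloca, the cast, and both memory accesses are gone.
define i32 @private_via_generic_after(i32 %x) {
entry:
  ret i32 %x
}
```

Without looking through the cast, the alloca's only direct user is the addrspacecast, so the slot cannot be promoted even though every access goes to the same stack location.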

chandlerc resigned from this revision. Mar 29 2015, 12:50 PM
chandlerc removed a reviewer: chandlerc.

Pretty sure all the addrspace stuff got fixed, but let me know if not.

This patch is not in SROA yet. Do you mind if I check it in?

I was bitten by the same issue in the NVPTX backend. Code patterns such as

%0 = alloca i32
%1 = addrspacecast i32* %0 to i32 addrspace(5)* ; cast from generic to local so that later accesses can be much faster
... use %1 ...

will appear quite often after NVPTXFavorNonGenericAddrSpaces, with some WIP checked in. It would be great if SROA could nuke these allocas across addrspacecasts.

So this is a case that we currently want to handle in NVPTX, which is not covered by instcombine/SROA right now.

; ModuleID = '<stdin>'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
target triple = "nvptx64-unknown-unknown"

%struct.S = type { i32, i32, i32 }

; Function Attrs: nounwind
define void @_Z11TakesStruct1SPi(%struct.S* byval nocapture readonly %input, i32* nocapture %output) #0 {
  %input1 = alloca %struct.S, align 8
  %0 = addrspacecast %struct.S* %input1 to %struct.S addrspace(5)*
  %input2 = addrspacecast %struct.S* %input to %struct.S addrspace(101)*
  %input3 = load %struct.S, %struct.S addrspace(101)* %input2, align 4
  store %struct.S %input3, %struct.S addrspace(5)* %0, align 8
  %1 = getelementptr inbounds %struct.S, %struct.S addrspace(5)* %0, i64 0, i32 1
  %2 = load i32, i32 addrspace(5)* %1, align 4
  store i32 %2, i32* %output, align 4
  ret void
}
t-tye added a subscriber: t-tye. Apr 6 2017, 8:48 AM
arsenm abandoned this revision. Feb 21 2019, 5:42 PM