- In the previous change, https://reviews.llvm.org/D69826, generic pointers in struct/array types are also replaced with global pointers. But, as no additional addrspacecast is inserted, they are promoted with a ptrtoint/inttoptr pair in SROA/GVN. That breaks address space inference as well as other optimizations. For such cases, we need to recursively dive into these aggregate types and insert an addrspacecast where necessary.
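As a sketch of the problem, consider the following hypothetical, simplified IR (opaque-pointer syntax; the names `%coerced` and `@kern` are illustrative, not from the patch). The kernel argument is coerced so its pointer member is `addrspace(1)`, but since the frontend inserts no addrspacecast, SROA/GVN can only bridge the type mismatch with an integer round-trip:

```llvm
; Hypothetical, simplified sketch. A struct containing a generic pointer is
; coerced to one containing a global (addrspace(1)) pointer at the kernel
; boundary.
%coerced = type { ptr addrspace(1) }

define amdgpu_kernel void @kern(%coerced %arg) {
  %p1 = extractvalue %coerced %arg, 0
  ; Without an explicit addrspacecast from the frontend, promotion
  ; reinterprets the pointer bits through ptrtoint/inttoptr, which hides
  ; the address space from InferAddressSpaces:
  %i  = ptrtoint ptr addrspace(1) %p1 to i64
  %p0 = inttoptr i64 %i to ptr
  %v  = load i32, ptr %p0   ; stuck with the slower generic access
  ret void
}
```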
- Repository: rG LLVM Github Monorepo
Event Timeline
clang/lib/CodeGen/CGCall.cpp

Line 1339: Is there a limit on array size? We may end up here with a potentially unbounded number of addrspacecasts. Perhaps we need a loop which may later be unrolled, if feasible.

clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu

Line 17: Nit: relying on the parameter name `%x.coerce` may be rather fragile. Considering that we don't care about the name itself, it may be better to just trim the match after `%x` or even after `%`.

Lines 62–63: Nit: you could use `CHECK-COUNT-2`: https://llvm.org/docs/CommandGuide/FileCheck.html#the-check-count-directive
addrspacecast might be a real conversion. I feel like this is really going well beyond what argument coercion should be expected to do, and we need to step back and re-evaluate how we're doing this.
addrspacecast *must* be a no-op in terms of argument coercion. Since we change the parameter type but cannot control how an argument is prepared, coercion should only change how we interpret arguments, not how we encode them. If the target chooses to add an addrspacecast, it must ensure that cast is a no-op. On AMDGPU, global pointers have the same encoding as their corresponding generic pointers.
So what does this mean exactly? If the ABI lowering uses argument coercion in a way that changes address spaces, it must ensure that the representations are the same? So it's always *legal* to just do a memcpy here, and we're just trying really hard not to.
If addrspacecast is used in argument coercion, the target must ensure that the addrspacecast does NOT change the representation. Yes, it's always safe to memcpy or load/store directly, but, without target-specific knowledge, SROA/GVN won't create a no-op addrspacecast to help the address space inference pass. SROA/GVN only generates a ptrtoint/inttoptr pair to ensure the representation stays the same. Unfortunately, address space inference doesn't understand that without target-specific knowledge either. As a result, the address space information is not propagated into the final IR. We need a trigger to enable that. That's why we add an addrspacecast for a simple-type argument in lines 1314–1321. In this patch, we need to traverse the struct and add the proper addrspacecasts.
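The intended lowering can be sketched as follows (hypothetical IR; the kernel name and types are illustrative). The frontend unpacks the coerced member and inserts the addrspacecast itself, which is a no-op on AMDGPU:

```llvm
; Hypothetical sketch: the frontend emits an explicit addrspacecast when
; unpacking the coerced aggregate, instead of leaving SROA/GVN to emit a
; ptrtoint/inttoptr pair.
define amdgpu_kernel void @kern({ ptr addrspace(1) } %arg) {
  %p1 = extractvalue { ptr addrspace(1) } %arg, 0
  %p0 = addrspacecast ptr addrspace(1) %p1 to ptr
  ; InferAddressSpaces understands the cast and can rewrite users of %p0
  ; back to addrspace(1) accesses:
  %v  = load i32, ptr %p0
  ret void
}
```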
Okay. Can you explain why we need to coerce in the first place, though? Especially if the representation is the same, why is your target-lowering requiring parameters to be coerced to involve pointers in a different address space?
It's not a requirement, but an optimization. The pointer representation is the same, but there's a penalty to using the generic pointer. From the language context here, we know the flat pointer can never alias the non-global address spaces. InferAddressSpaces won't be able to infer this (maybe it could, but it might break a theoretical language that allows passing some valid 32-bit address in 64-bit flat pointers).
I think the real problem here is that the IR is missing a no-op cast between pointers in different address spaces. With the current promotion, GVN sees the bits of the addrspace(0) ptr get reinterpreted as addrspace(1), and doesn't have a better option than a ptrtoint/inttoptr pair. When addrspacecast was added, pointer bitcasts between different address spaces were disallowed (I think mostly for reasons that no longer apply). If we reallowed pointer bitcast between equivalent-sized address spaces as a known no-op cast, we wouldn't need to change the argument promotion in the frontend.
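Under that proposal, the reinterpretation would look something like this (NOT valid IR today; shown only to illustrate the idea):

```llvm
; NOT currently valid IR -- the verifier rejects pointer bitcasts that
; change the address space. For address spaces with identical pointer
; size and encoding, this would be a known no-op cast that passes like
; GVN could emit directly, instead of a ptrtoint/inttoptr pair:
%p0 = bitcast ptr addrspace(1) %p1 to ptr
```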
Is that because it's a kernel argument? I don't understand how the coercion can work in general for structs that just contain pointers.
> I think the real problem here is the IR is missing a no-op cast between pointers with different address spaces. [...] If we reallowed pointer bitcast between equivalent sized address spaces as known no-op cast, we wouldn't need to change the argument promotion in the frontend.
That seems like it would be a much cleaner way of handling this, yeah. Ideally, GVN would be able to find the target-specific information necessary to know that it can introduce an addrspacecast between two address spaces.
Yes, there's no valid way to pass a non-global address space pointer through a flat pointer into a kernel. This doesn't apply in other contexts.
Okay. And you can't just treat kernel parameters specially in your pass that un-promotes generic pointers back to global pointers? Or is that pass not quite so precise as that?