This is an archive of the discontinued LLVM Phabricator instance.

[hip] Ensure pointer in struct argument has proper `addrspacecast`.
AbandonedPublic

Authored by hliao on May 19 2020, 1:28 PM.

Details

Summary
  • In last https://reviews.llvm.org/D69826, generic pointers in struct/array types are also replaced with global pointers. But, as no additional addrspacecast is inserted, they are promoted with a ptrtoint/inttoptr pair in SROA/GVN. That breaks the address space inferring as well as other optimizations. For such case, we need to recursively dive into these aggregate types and insert addrspacecast when necessary.

Diff Detail

Event Timeline

hliao created this revision.May 19 2020, 1:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 19 2020, 1:28 PM
tra added inline comments.May 19 2020, 2:12 PM
clang/lib/CodeGen/CGCall.cpp
1339

Is there a limit on array size? We may end up here with potentially unbounded number of spacecasts. Perhaps we need a loop which may later be unrolled, if feasible.

clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu
17

Nit: relying on parameter name %x.coerce may be rather fragile. Considering that we don't care about the name itself, it may be better to just trim the match after %x or even after`%`.

62–63
hliao updated this revision to Diff 265148.May 19 2020, 11:32 PM

Revise following comments.

addrspacecast might be a real conversion. I feel like this is really going well beyond what argument coercion should be expected to do, and we need to step back and re-evaluate how we're doing this.

hliao marked an inline comment as done.EditedMay 22 2020, 8:12 PM

addrspacecast might be a real conversion. I feel like this is really going well beyond what argument coercion should be expected to do, and we need to step back and re-evaluate how we're doing this.

addrspacecast *must* be a no-op in terms of argument coercion. As we change the parameter type but cannot control how an argument is prepared, coercion should only change how we interpret them instead of how we encode them. If the target chooses to add an addrspacecast, it must ensure that's a no-op. In AMDGPU, global pointers have the same encoding as their corresponding generic pointers.

addrspacecast might be a real conversion. I feel like this is really going well beyond what argument coercion should be expected to do, and we need to step back and re-evaluate how we're doing this.

addrspacecast *must* be a no-op in terms of argument coercion.

So what does this mean exactly? If the ABI lowering uses argument coercion in a way that changes address spaces, it must ensure that the representations are different? So it's always *legal* to just do a memcpy here, we're just trying really hard to not do so.

hliao added a comment.May 22 2020, 9:06 PM

addrspacecast might be a real conversion. I feel like this is really going well beyond what argument coercion should be expected to do, and we need to step back and re-evaluate how we're doing this.

addrspacecast *must* be a no-op in terms of argument coercion.

So what does this mean exactly? If the ABI lowering uses argument coercion in a way that changes address spaces, it must ensure that the representations are different? So it's always *legal* to just do a memcpy here, we're just trying really hard to not do so.

If addrspacecast is used, the target must ensure that addrspacecast would NOT change the representation to use argument coercion. Yeah, it's always safe to memcpy or load/store directly but, without target specific knowledge, SROA/GVN won't create a no-op addrspacecast to help address space inferring pass. SROA/GVN only generates the pair of ptr2int/int2ptr to ensure the presentation is the same. Unfortunately, address space inferring doesn't understand that without target specific knowledge as well. That leaves the address space information won't be propagated to final IR. We need a trigger to enable that. That's why we need to add addrspacecast for a simple type argument in line 1314 - 1321. In this patch, we need to traverse the struct and add proper addrspacecast.

Okay. Can you explain why we need to coerce in the first place, though? Especially if the representation is the same, why is your target-lowering requiring parameters to be coerced to involve pointers in a different address space?

Okay. Can you explain why we need to coerce in the first place, though? Especially if the representation is the same, why is your target-lowering requiring parameters to be coerced to involve pointers in a different address space?

It's not a requirement, but an optimization. The pointer representation is the same, but there's a penalty to using the generic pointer. From the language context here, we know the flat pointer can never alias the non-global address spaces. InferAddressSpaces won't be able to infer this (maybe it could, but it might break a theoretical language that allows passing some valid 32-bit address in 64-bit flat pointers).

I think the real problem here is the IR is missing a no-op cast between pointers with different address spaces. With the current promotion, GVN sees the bits of the addrspace(0) ptr get reinterpreted as addrspace(1), and doesn't have a better option than a ptrtoint/inttoptr pair. When addrspacecast was added, pointer bitcasts between different address spaces were disallowed (I think mostly for reasons that no longer apply). If we reallowed pointer bitcast between equivalent sized address spaces as known no-op cast, we wouldn't need to change the argument promotion in the frontend.

Okay. Can you explain why we need to coerce in the first place, though? Especially if the representation is the same, why is your target-lowering requiring parameters to be coerced to involve pointers in a different address space?

It's not a requirement, but an optimization. The pointer representation is the same, but there's a penalty to using the generic pointer. From the language context here, we know the flat pointer can never alias the non-global address spaces.

Is that because it's a kernel argument? I don't understand how the coercion can work in general for structs that just contain pointers.

I think the real problem here is the IR is missing a no-op cast between pointers with different address spaces. With the current promotion, GVN sees the bits of the addrspace(0) ptr get reinterpreted as addrspace(1), and doesn't have a better option than a ptrtoint/inttoptr pair. When addrspacecast was added, pointer bitcasts between different address spaces were disallowed (I think mostly for reasons that no longer apply). If we reallowed pointer bitcast between equivalent sized address spaces as known no-op cast, we wouldn't need to change the argument promotion in the frontend.

That seems like it would be a much cleaner way of handling this, yeah. Ideally, GVN would be able to find the target-specific information necessary to know that it can introduce an addrspacecast between two address spaces.

Okay. Can you explain why we need to coerce in the first place, though? Especially if the representation is the same, why is your target-lowering requiring parameters to be coerced to involve pointers in a different address space?

It's not a requirement, but an optimization. The pointer representation is the same, but there's a penalty to using the generic pointer. From the language context here, we know the flat pointer can never alias the non-global address spaces.

Is that because it's a kernel argument? I don't understand how the coercion can work in general for structs that just contain pointers.

Yes, there's no valid way to pass a non-global address space pointer through a flat pointer into a kernel. This doesn't apply in other contexts

Okay. Can you explain why we need to coerce in the first place, though? Especially if the representation is the same, why is your target-lowering requiring parameters to be coerced to involve pointers in a different address space?

It's not a requirement, but an optimization. The pointer representation is the same, but there's a penalty to using the generic pointer. From the language context here, we know the flat pointer can never alias the non-global address spaces.

Is that because it's a kernel argument? I don't understand how the coercion can work in general for structs that just contain pointers.

Yes, there's no valid way to pass a non-global address space pointer through a flat pointer into a kernel. This doesn't apply in other contexts

Okay. And you can't just treat kernel parameters specially in your pass that un-promotes generic pointers back to global pointers? Or is that pass not quite so precise as that?

hliao abandoned this revision.Jun 25 2020, 8:43 PM