Add an experimental buffer fat pointer address space that is currently unhandled in the backend. This commit reserves address space 7 as a non-integral pointer representing the 160-bit fat pointer (a 128-bit buffer descriptor plus a 32-bit offset) that is heavily used in graphics workloads targeting the AMDGPU backend.
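As a rough sketch (not part of this change itself), IR using the reserved address space would look something like the following. The function and names are illustrative only, the datalayout fragment just shows the non-integral marking, and the backend cannot yet lower such accesses:

```
; Hypothetical example: a load through a buffer fat pointer in the reserved
; address space 7. The pointer value conceptually carries the 128-bit buffer
; descriptor plus the 32-bit offset; codegen support does not exist yet.
target datalayout = "ni:7"

define float @load_through_fat_pointer(float addrspace(7)* %p) {
  %v = load float, float addrspace(7)* %p
  ret float %v
}
```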
I think this will be accepted right now as a no-op addrspacecast to any of the global-like address spaces
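For illustration, this is the kind of cast in question; a hypothetical sketch, not taken from the patch:

```
; Hypothetical: an addrspacecast from the fat-pointer address space to the
; global address space (1). Whether this is accepted, and whether treating it
; as a no-op is correct, is what is being questioned here.
define i8 addrspace(1)* @fat_to_global(i8 addrspace(7)* %p) {
  %q = addrspacecast i8 addrspace(7)* %p to i8 addrspace(1)*
  ret i8 addrspace(1)* %q
}
```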
docs/AMDGPUUsage.rst
294–295: Might as well reserve more for 256-bit descriptors. We'll probably need several of these eventually.

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
297: Shouldn't this add p7:128?

lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
257: This probably shouldn't be included now, at least without a test.

lib/Target/AMDGPU/SIISelLowering.cpp
1050: Ditto.
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
297: It'd need to be p7:160 - but I'm entirely unsure whether LLVM will drop a lung on a non-power-of-2 pointer size.
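For reference, a sketch of how that might be spelled in the data layout; the alignment value below is a placeholder rather than anything from this patch, and whether the rest of LLVM copes with a 160-bit size is exactly the open question:

```
; Hypothetical datalayout fragment: address space 7 as a 160-bit pointer with a
; placeholder 256-bit ABI alignment, still marked non-integral via ni:7.
target datalayout = "p7:160:256-ni:7"
```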
lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
257: It's needed for us to be able to run the middle-end optimizations with fat pointers in the module, though. I'll work out how to test this.

lib/Target/AMDGPU/SIISelLowering.cpp
1050: TLI calls into this (which is why it is needed). I can work out some pass that queries TLI and write a test for it, though.
docs/AMDGPUUsage.rst
294–295: I'd rather not do that now - we'll need at least three more (one for image descriptors, one for structured buffer descriptors, one for samplers), and I don't want to go through all the steps of adding them everywhere when we've got no immediate need for them.
docs/AMDGPUUsage.rst
299–300: I don't understand why you would blend these. You just need the 128-bit pointer, and then the intrinsic accessing it will have a 32-bit offset operand that isn't part of the pointer.

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
297: Why does it need to be 160? It should be 128, like the descriptor.
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
297: So for non-swizzable buffer descriptors we want to model accesses using normal LLVM load/store/atomic instructions, so that no intrinsics are required for them at all. To model this we need a 160-bit pointer: the 128-bit descriptor plus the 32-bit offset. This is super important because it means these 160-bit pointers take part in all the normal load/store optimizations without us having to special-case whatever new intrinsics we'd otherwise have to introduce. We're trying our best to avoid the need for any new intrinsics (where at all possible; it won't always be possible, though).
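A sketch of the usage pattern being described, hypothetical and middle-end only:

```
; Hypothetical: the fat pointer participates in ordinary GEP/load/store, so the
; existing memory optimizations apply to it without buffer-specific intrinsics.
define void @store_element(float addrspace(7)* %buf, i32 %i, float %v) {
  %elt = getelementptr float, float addrspace(7)* %buf, i32 %i
  store float %v, float addrspace(7)* %elt
  ret void
}
```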
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
297: I still don't see how the offset is part of the pointer itself. You could always use an offset of 0, so it would be a matter of changing the GEP index type to be different from the bit width of the pointer, plus an optimization during codegen to fold the offset in.
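For what it's worth, the data layout already has a separate index-width field, so a sketch of this alternative might look like the fragment below; the values are placeholders, not something from this patch:

```
; Hypothetical: address space 7 as a 128-bit pointer whose GEP index width is
; 32 bits (the final field), matching the separate 32-bit offset operand.
target datalayout = "p7:128:128:128:32"
```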
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
297: So if we explicitly laid the pointer out as p7:128, what would happen when you store that pointer into memory (say, into an alloca)? That's fine if you are storing the original pointer (just the 128-bit buffer descriptor), but if you had GEP'd into the pointer, then storing it to memory loses the offset information as part of that store. It seems dangerous to me to pretend the pointer is 128 bits when, for all intermediate uses, it actually carries 160 bits of valid information. I think I'd rather just leave it as non-integral, which solves all of these issues perfectly well and has the added benefit of restricting the pointer's uses away from things we don't want to support.
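A sketch of the spill scenario being described; addrspace(5) is AMDGPU's private/stack space, and everything here is hypothetical and middle-end only:

```
; Hypothetical: a GEP'd fat pointer is stored to a stack slot and reloaded.
; With a declared 128-bit in-memory layout, this round trip would only preserve
; the descriptor and silently drop the 32-bit offset; as a 160-bit (or
; non-integral) pointer the whole value survives.
define float @spill_and_reload(float addrspace(7)* %buf) {
  %slot = alloca float addrspace(7)*, addrspace(5)
  %p = getelementptr float, float addrspace(7)* %buf, i32 4
  store float addrspace(7)* %p, float addrspace(7)* addrspace(5)* %slot
  %q = load float addrspace(7)*, float addrspace(7)* addrspace(5)* %slot
  %v = load float, float addrspace(7)* %q
  ret float %v
}
```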
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
297: I don't follow this. Of course it's invalid to store one value and expect a different one to be there afterwards. You seem to be implying getelementptr is invalid to use at all. The getelementptr produces a new 128-bit value; it isn't packing the 32-bit offset into some merged value.
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
297: So your pitch is that GEP produces a new 128-bit value, where within the GEP we've modified the base address and the num_records to record the new address and upper bound? That's really not what we want when we actually consume the fat pointer in an MUBUF instruction: there we want the original descriptor left unmodified and the offset passed in a separate VGPR.
lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
65: It may be more pragmatic to have NoAlias with Constant 32-bit. The intention and current practice is for Constant 32-bit to be used with descriptor tables, and those really shouldn't ever alias buffer fat pointers.
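For illustration, the pair of accesses such a rule would cover, using AMDGPU's numbering where address space 6 is the 32-bit constant space; this is a hypothetical sketch, not part of the patch:

```
; Hypothetical: a descriptor fetched through the 32-bit constant address space
; (6) and data reached through a buffer fat pointer (7). The suggestion is
; that alias analysis should report these two accesses as NoAlias.
define float @descriptor_and_buffer(<4 x i32> addrspace(6)* %desc_ptr, float addrspace(7)* %buf) {
  %desc = load <4 x i32>, <4 x i32> addrspace(6)* %desc_ptr
  %v = load float, float addrspace(7)* %buf
  ret float %v
}
```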
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
297: Definitely agree with @sheredom here, since the way I think about these pointers is similar to the "pointer + bounds" you could do on a CPU. Two very immediate practical problems that would come up with 128 bits are:
lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
65: Should I then actually change all the Constant 32-bit aliasing rules so that it only aliases with itself, do you think?
LGTM, whether you do the NoAlias between constant 32-bit and buffer address space here or separately. (Making NoAlias between constant 32-bit and the other existing address spaces should definitely be a different patch.)
lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
65: I think that's a good idea, but it should be a separate patch.
lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
65: It also seems to me like it should be renamed, since there's a more specific purpose in mind than just a 32-bit pointer into constant memory.
lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
65: Agreed on both your comments; I'll submit that change as a separate patch.