Event Timeline
Shouldn't these have been expanded into loads and stores already?
llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
- Line 508: Extra `;`
I think InstCombine and/or SROA do some too. Also, that threshold is arbitrary, and we could start expanding the small ones in IR too.
If I understand correctly, InstCombine focuses on simplifying mem transfer intrinsics rather than expanding them, and only works in very limited cases.
In fact, the change here shares a similar idea with SROA: it analyzes whether the alloca involved in the memmove/memcpy can be promoted to vector/scalar operations. If it is promotable, the mem transfer intrinsic is expanded as part of the promotion process. I would say the important thing here is to get optimal code generation for small allocas; expanding small mem transfers in general is less important. Does this sound reasonable to you? Or could you share how you would like small mem transfer intrinsics to be expanded?
In the long term, I hope we can enhance SROA to cover the optimization done in tryPromoteAllocaToVector(), but that needs much more time/effort.
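To make the idea concrete, here is a minimal sketch, assuming the alloca has already been judged promotable to a fixed vector type: a memcpy/memmove whose constant length covers the whole alloca is rewritten as one vector load plus one vector store. The helper name `expandMemTransferToVector` and all surrounding details are hypothetical illustrations, not the actual patch code.

```cpp
#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/IntrinsicInst.h"

using namespace llvm;

// Hypothetical helper: rewrite a mem transfer that copies the entire
// promotable alloca as a single vector-typed load/store pair.
static bool expandMemTransferToVector(MemTransferInst *MTI, AllocaInst *Alloca,
                                      FixedVectorType *VecTy,
                                      const DataLayout &DL) {
  // Only a transfer that touches the alloca and whose length is a known
  // constant covering exactly the whole object is handled here; everything
  // else is left alone (and would block the promotion).
  if (MTI->getRawDest()->stripPointerCasts() != Alloca &&
      MTI->getRawSource()->stripPointerCasts() != Alloca)
    return false;
  auto *Len = dyn_cast<ConstantInt>(MTI->getLength());
  if (!Len ||
      Len->getZExtValue() != DL.getTypeStoreSize(VecTy).getFixedValue())
    return false;

  // Load the whole object as one vector from the source and store it to the
  // destination; the promotion pass can then fold the side that touches the
  // alloca into its SSA vector value.
  IRBuilder<> B(MTI);
  Value *V = B.CreateAlignedLoad(VecTy, MTI->getRawSource(),
                                 MTI->getSourceAlign());
  B.CreateAlignedStore(V, MTI->getRawDest(), MTI->getDestAlign());
  MTI->eraseFromParent();
  return true;
}
```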
llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
- Line 508: I guess you might have misread here? I think we always put a semicolon at the end of a lambda expression.
llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
- Line 502: `[=]`?
- Lines 593–595: Probably should `dyn_cast` to `MemIntrinsicInst` here to defend against these becoming out of sync (see the sketch after this list).
- Line 611: No reason to care about typed pointers anymore.

llvm/test/CodeGen/AMDGPU/promote-alloca-array-aggregate.ll
- Line 219: Needs tests for the different address space cases you mentioned in the TODO.
- Line 219: Also memcpy_inline.
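As a side note on the 593–595 comment above, here is a minimal sketch of the two shapes being contrasted (hypothetical code, not the patch itself): a switch over specific intrinsic IDs is a second list that must be kept in sync with whatever the earlier legality check accepted, whereas a `cast`/`dyn_cast` to `MemIntrinsicInst` (or `MemTransferInst`) expresses that invariant directly.

```cpp
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/Support/Casting.h"
#include "llvm/Support/ErrorHandling.h"

using namespace llvm;

// Fragile: a second, hand-maintained list of intrinsic IDs that has to match
// whatever the earlier "is this user promotable?" check accepted
// (memcpy_inline, for instance, is easy to forget in one of the two places).
static void handleUserFragile(IntrinsicInst *II) {
  switch (II->getIntrinsicID()) {
  case Intrinsic::memcpy:
  case Intrinsic::memmove:
    // ... rewrite the transfer ...
    break;
  default:
    llvm_unreachable("unexpected intrinsic user");
  }
}

// More robust: let the cast state the invariant. cast<> asserts when the user
// is not a mem transfer, so the two places cannot silently drift apart;
// dyn_cast<> can be used instead to bail out gracefully.
static void handleUserRobust(IntrinsicInst *II) {
  auto *MTI = cast<MemTransferInst>(II);
  // ... rewrite the transfer via MTI->getRawSource() / MTI->getRawDest() ...
  (void)MTI;
}
```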
Thanks for the careful review @arsenm, I have fixed most of them. Please take a second look. Thanks!
llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
- Line 502: We are capturing GEPVectorIdx and Alloca, so I think capture by reference is preferred over capture by copy here (see the sketch after this list). Please correct me if I am wrong.
- Lines 593–595: Agreed, I have changed it to a `cast` to catch the possible broken case.

llvm/test/CodeGen/AMDGPU/promote-alloca-array-aggregate.ll
- Line 219: Good idea, I have added all of them.
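Regarding the capture question at line 502, a small self-contained sketch of the trade-off (the names GEPVectorIdx and Alloca mirror the discussion; the stand-in types are hypothetical): with `[=]` the map is copied into the closure, with `[&]` it is merely referenced, and for a raw pointer like Alloca the choice makes essentially no difference.

```cpp
#include <map>

struct GetElementPtrInst;  // stand-ins for the LLVM types in the discussion
struct AllocaInst;
struct Value;

void example(std::map<GetElementPtrInst *, Value *> &GEPVectorIdx,
             AllocaInst *Alloca) {
  // [&]: the lambda refers to the caller's map; no copy is made. Natural
  // choice when the lambda does not outlive the enclosing function.
  auto ByRef = [&](GetElementPtrInst *GEP) -> Value * {
    auto It = GEPVectorIdx.find(GEP);
    return It == GEPVectorIdx.end() ? nullptr : It->second;
  };

  // [=]: the whole map is copied into the closure (the pointer Alloca is
  // cheap either way). Safer if the lambda escapes the scope, but wasteful
  // for a short-lived local helper.
  auto ByCopy = [=](GetElementPtrInst *GEP) -> Value * {
    auto It = GEPVectorIdx.find(GEP);
    return It == GEPVectorIdx.end() ? nullptr : It->second;
  };

  (void)ByRef;
  (void)ByCopy;
  (void)Alloca;
}
```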
llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
- Lines 593–595: This still has the switch over specific IDs instead of the matching cast.
- Line 611: Well, that needs to be fixed soon, since that code is going to start getting ripped out very shortly.

llvm/test/CodeGen/AMDGPU/promote-alloca-array-aggregate.ll
- Line 307: Another edge case to maybe test is the identity copy of the full alloca to itself.
llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
- Line 611: We have top engineers working on it! https://github.com/GPUOpen-Drivers/llpc/pull/2210
llvm/test/CodeGen/AMDGPU/promote-alloca-array-aggregate.ll
- Line 307: I have added a case for it, but would not bother to optimize for it, as such code patterns are trivially optimized away early (a hypothetical detection sketch follows below).
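For completeness, here is one hypothetical way such an identity copy could be recognized and dropped; as noted above, the patch does not try to optimize it, since the pattern is normally cleaned up much earlier in the pipeline.

```cpp
#include "llvm/IR/IntrinsicInst.h"

using namespace llvm;

// Hypothetical check (not part of the patch): a memcpy/memmove of the alloca
// onto itself is a no-op, so it could simply be deleted instead of being
// expanded during promotion.
static bool removeIdentityCopy(MemTransferInst *MTI) {
  if (MTI->getRawSource()->stripPointerCasts() ==
      MTI->getRawDest()->stripPointerCasts()) {
    MTI->eraseFromParent();
    return true;
  }
  return false;
}
```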