Currently, raw_buffer_load_{i8,i16} and struct_buffer_load_{i8,i16}
intrinsics are lowered as buffer_load_{u8,u16}. This patch combines
buffer_load_{u8,u16} and sign extension instructions in order to
generate buffer_load_{i8,i16} instructions.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Updating D144313: [AMDGPU] Improve the lowering of raw_buffer_load_{i8,i16} and struct_buffer_load_{i8,i16} intrinsics
llvm/lib/Target/AMDGPU/AMDGPUCombinerHelper.cpp | ||
---|---|---|
399 | You need to check that SubwordBufferLoad has no other uses. Instead of creating a new instruction, you can modify it in-place using SubwordBufferLoad->setDesc and SubwordBufferLoad->getOperand(0).setReg. | |
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp | ||
375–376 ↗ | (On Diff #498544) | Instead of calling applyCombineSignExtendInReg here, please declare the new combine in AMDGPUCombine.td so it will be called from the autogenerated tryCombineAll. |
Updating D144313: [AMDGPU] Improve the lowering of raw_buffer_load_{i8,i16} and struct_buffer_load_{i8,i16} intrinsics
llvm/lib/Target/AMDGPU/AMDGPUCombinerHelper.cpp | ||
---|---|---|
393 | I checked the combineMachineInstrs() in Combiner.cpp and it does not set an insertion point for the builder. The same happens in combine() in AMDGPUPostLegalizerCombiner.cpp . So, I assume it is intentional in case you need to add instructions at different points. |
Looks good with a few suggested changes.
llvm/lib/Target/AMDGPU/AMDGPUCombinerHelper.cpp | ||
---|---|---|
400–404 | ||
411–412 | Don't need these two lines. SubwordBufferLoad is passed in as an argument now. | |
416 | I think all this new code should probably live in AMDGPUPostLegalizerCombinerHelper instead of AMDGPUCombinerHelper, and you should add TII as a member variable in AMDGPUPostLegalizerCombinerHelper following the example of AMDGPURegBankCombinerHelper. |
Updating D144313: [AMDGPU] Improve the lowering of raw_buffer_load_{i8,i16} and struct_buffer_load_{i8,i16} intrinsics
Updating D144313: [AMDGPU] Improve the lowering of raw_buffer_load_{i8,i16} and struct_buffer_load_{i8,i16} intrinsics
getVRegDef?