Widen 1 and 2 byte scalar loads to 4 bytes when sufficiently
aligned to avoid using a global load.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Needs some pure MIR tests too (also including negative tests)
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
---|---|---|
447 | There's no point in checking the size anymore | |
1158 | I don't understand this getSize check. It's illegal for the memory size to exceed the loaded type | |
1159 | You could also handle G_SEXTLOAD and G_ZEXTLOAD, but would require inserting some instructions to set the high bits appropriately. This also makes me think this should be done earlier | |
1186 | Needs to elaborate more on when this is valid | |
1186–1201 | Should use trunc. I'm trying to eliminate G_EXTRACT usage |
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
---|---|---|
1182 | Use B.buildZExtInReg. |
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
---|---|---|
1182 | I attempted to use B.buildZExtInReg but this function creates a new virtual register as the destination instead of using MI.getOperand(0)'s register when passing it as the first argument. This is unlike buildSExtInReg which uses the first argument as the destination. |
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
---|---|---|
1182 | Ah, that's a bug in the implementation: --- a/llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp +++ b/llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp @@ -492,7 +492,7 @@ MachineInstrBuilder MachineIRBuilder::buildZExtInReg(const DstOp &Res, LLT ResTy = Res.getLLTTy(*getMRI()); auto Mask = buildConstant( ResTy, APInt::getLowBitsSet(ResTy.getScalarSizeInBits(), ImmOp)); - return buildAnd(ResTy, Op, Mask); + return buildAnd(Res, Op, Mask); } MachineInstrBuilder MachineIRBuilder::buildCast(const DstOp &Dst, |
Fix destination register bug in buildZExtInReg and use it for widening scalar G_ZEXTLOAD memory access.
llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp | ||
---|---|---|
495 ↗ | (On Diff #338904) | Should split this into a separate patch |
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
1157 | Grammar, widen->widened | |
1171–1172 | This comment is worded confusingly so it doesn't explain what is going on. We are widening the memory access, not the result register type | |
1175 | Should just hoist the S32 definition instead of repeating it |
llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-widen-scalar-loads.mir | ||
---|---|---|
290 | Maybe add a negative case for the memory not being invariant |
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
---|---|---|
1180–1182 | I used this previously but there was an issue with the implementation in buildZExtInReg where it would create a new destination register instead of using the original destination register passed to it. Jay mentioned this bug in a previous comment. |
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
---|---|---|
1180–1182 | Yes, buildZExtInReg is broken and needs to be fixed. You should fix it and continue using it |
There's no point in checking the size anymore