This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][GlobalISel] Widen 1 and 2 byte scalar loads
ClosedPublic

Authored by vangthao on Apr 13 2021, 6:01 PM.

Details

Summary

Widen 1 and 2 byte scalar loads to 4 bytes when sufficiently
aligned to avoid using a global load.

Diff Detail

Event Timeline

vangthao created this revision.Apr 13 2021, 6:01 PM
vangthao requested review of this revision.Apr 13 2021, 6:01 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 13 2021, 6:01 PM
arsenm requested changes to this revision.Apr 14 2021, 2:23 PM

Needs some pure MIR tests too (also including negative tests)

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
448

There's no point in checking the size anymore

1159

I don't understand this getSize check. It's illegal for the memory size to exceed the loaded type

1160

You could also handle G_SEXTLOAD and G_ZEXTLOAD, but would require inserting some instructions to set the high bits appropriately. This also makes me think this should be done earlier

1171

Needs to elaborate more on when this is valid

1174

Should use trunc. I'm trying to eliminate G_EXTRACT usage

This revision now requires changes to proceed.Apr 14 2021, 2:23 PM
vangthao updated this revision to Diff 338663.Apr 19 2021, 4:05 PM

Handle G_SEXTLOAD and G_ZEXTLOAD and added more tests.

foad added inline comments.Apr 20 2021, 1:27 AM
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1181

Use B.buildZExtInReg.

vangthao added inline comments.Apr 20 2021, 7:32 AM
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1181

I attempted to use B.buildZExtInReg but this function creates a new virtual register as the destination instead of using MI.getOperand(0)'s register when passing it as the first argument. This is unlike buildSExtInReg which uses the first argument as the destination.

foad added inline comments.Apr 20 2021, 8:33 AM
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1181

Ah, that's a bug in the implementation:

--- a/llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp
@@ -492,7 +492,7 @@ MachineInstrBuilder MachineIRBuilder::buildZExtInReg(const DstOp &Res,
   LLT ResTy = Res.getLLTTy(*getMRI());
   auto Mask = buildConstant(
       ResTy, APInt::getLowBitsSet(ResTy.getScalarSizeInBits(), ImmOp));
-  return buildAnd(ResTy, Op, Mask);
+  return buildAnd(Res, Op, Mask);
 }
 
 MachineInstrBuilder MachineIRBuilder::buildCast(const DstOp &Dst,
vangthao updated this revision to Diff 338904.EditedApr 20 2021, 9:34 AM

Fix destination register bug in buildZExtInReg and use it for widening scalar G_ZEXTLOAD memory access.

arsenm added inline comments.Apr 20 2021, 6:01 PM
llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp
495 ↗(On Diff #338904)

Should split this into a separate patch

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1166

Grammar, widen->widened

1170–1171

This comment is worded confusingly so it doesn't explain what is going on. We are widening the memory access, not the result register type

1174

Should just hoist the S32 definition instead of repeating it

vangthao updated this revision to Diff 339248.Apr 21 2021, 8:30 AM

Reverted buildZExtInReg changes for later patch. Addressed comments.

arsenm added inline comments.Apr 21 2021, 5:18 PM
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
447–448

The old comment was more informative

llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-widen-scalar-loads.mir
245 ↗(On Diff #339248)

Should add some dummy uses, e.g. S_ENDPGM 0, implicit %1

Just in case regbankselect decides to start dropping dead uses someday

arsenm added inline comments.Apr 21 2021, 5:19 PM
llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-widen-scalar-loads.mir
289 ↗(On Diff #339248)

Maybe add a negative case for the memory not being invariant

arsenm requested changes to this revision.Apr 27 2021, 5:53 AM
This revision now requires changes to proceed.Apr 27 2021, 5:53 AM
vangthao updated this revision to Diff 341047.Apr 27 2021, 6:30 PM

Updated mir test and comment.

arsenm added inline comments.May 4 2021, 11:17 AM
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1179–1181

MachineIRBuilder has a buildZExtInReg helper to figure out the mask and create the and for you

1181

S32.getScalarSizeInBits() is a complicated way of just saying 32

vangthao added inline comments.May 4 2021, 11:24 AM
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1179–1181

I used this previously but there was an issue with the implementation in buildZExtInReg where it would create a new destination register instead of using the original destination register passed to it. Jay mentioned this bug in a previous comment.

arsenm added inline comments.May 4 2021, 11:54 AM
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1179–1181

Yes, buildZExtInReg is broken and needs to be fixed. You should fix it and continue using it

vangthao updated this revision to Diff 343079.May 5 2021, 9:13 AM

Rebase and use buildZExtInReg

arsenm accepted this revision.May 5 2021, 2:06 PM
This revision is now accepted and ready to land.May 5 2021, 2:06 PM
This revision was automatically updated to reflect the committed changes.