This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Propagate LDS align into to instructions
ClosedPublic

Authored by rampitec on Jun 15 2021, 12:14 PM.

Diff Detail

Event Timeline

rampitec created this revision.Jun 15 2021, 12:14 PM
rampitec requested review of this revision.Jun 15 2021, 12:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2021, 12:14 PM
Herald added a subscriber: wdng. · View Herald Transcript
arsenm added inline comments.Jun 15 2021, 12:18 PM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
336

This looks like you are reinventing getOrEnforceKnownAlignment

arsenm added inline comments.Jun 15 2021, 12:20 PM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
336

I'm not sure why you need to do this. SelectionDAG does already try to increase the alignment after this point

rampitec added inline comments.Jun 15 2021, 12:43 PM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
336

This looks like you are reinventing getOrEnforceKnownAlignment

It is not the same. I do not need to enforce alignment on an Value itself but propagate it down to loads and stores. I did not find such helper.

336

I'm not sure why you need to do this. SelectionDAG does already try to increase the alignment after this point

It actually does not, at least not in this scenario. You may notice there are actual codegen changes. I was checking -print-after-all and alignment stays the same all the way past selection. Besides there is also global isel. I assume the earlier we get better alignment the better.

rampitec added inline comments.Jun 17 2021, 10:50 AM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
336

I'm not sure why you need to do this. SelectionDAG does already try to increase the alignment after this point

It actually does not, at least not in this scenario. You may notice there are actual codegen changes. I was checking -print-after-all and alignment stays the same all the way past selection. Besides there is also global isel. I assume the earlier we get better alignment the better.

I have checked, SDag only calls refineAlignment when it folds a node or for a mem intrinsic, which does not happen with the tests I am using. I hit zero bps in refineAlignment.

rampitec updated this revision to Diff 352790.Jun 17 2021, 10:51 AM

Fixed formatting.

hsmhsm added inline comments.Jun 21 2021, 10:28 PM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
314

Did not understand the logic behind calling commonAlignment() here. I thought, alignment of GV and GEP are same, and we just need to propogate the alignment of GV? Because, we already have properly updated the alignment of GV?

364

In case of GEP instruction, I thought, we should be exploring the uses of GEP?

371

Same here, in case of bitcast instruction, I thought, we should be exploring uses of bitcast?

rampitec marked 3 inline comments as done.Jun 22 2021, 9:56 AM
rampitec added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
314

Did not understand the logic behind calling commonAlignment() here. I thought, alignment of GV and GEP are same, and we just need to propogate the alignment of GV? Because, we already have properly updated the alignment of GV?

Assume struct:

{ i64 x 16, i32 }

Second field has align 4 and that is the alignment of GV and GEP. But given the structure layout we can tell that actual efective alignment is 8. That is what commonAlignment() call is about.

364

In case of GEP instruction, I thought, we should be exploring the uses of GEP?

Yes, this is recursive call to refineUsesAlignment() with GEP as a Ptr.

371

Same here, in case of bitcast instruction, I thought, we should be exploring uses of bitcast?

Same here, recursive call explores uses of a cast.

rampitec marked 3 inline comments as done.Jun 22 2021, 9:57 AM
hsmhsm accepted this revision.Jun 22 2021, 10:35 PM

LGTM

This revision is now accepted and ready to land.Jun 22 2021, 10:35 PM
This revision was automatically updated to reflect the committed changes.
foad added inline comments.Jun 23 2021, 1:28 AM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
346

For StoreInst, AtomicRMWInst, AtomicCmpXchgInst and GetElementPtrInst you need to check that the use is actually the "address" operand of the instruction.

foad added inline comments.Jun 23 2021, 1:31 AM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
340

Could early-out here if Align==1 to save a lot of unnecessary work.

346

Oh, I see you are already checking it for GetElementPtrInst.

rampitec marked 3 inline comments as done.Jun 23 2021, 10:24 AM
rampitec added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
346

None of AtomicRMW operators can use pointers. AtomicCmpXchgInst can.Actually I cannot come up with a test for GEP too. I think it is only possible if we start to process ConstantExpr here which is not handled right now.

Anyway, D104796 addresses this and Align == 1 case.