This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Write LDS objects out as global symbols in code generation
ClosedPublic

Authored by nhaehnle on May 3 2019, 5:04 AM.

Details

Summary

The symbols are of the STT_AMDGPU_LDS type introduced with a previous
change. The linker is then expected to resolve relocations, which are
also emitted.

Initially disabled for HSA and PAL environments until they have caught up
in terms of linker and runtime loader.

Some notes:

  • The llvm.amdgcn.groupstaticsize intrinsics can no longer be lowered to a constant at compile times, which means some tests can no longer be applied.

    The current "solution" is a terrible hack, but the intrinsic isn't used by Mesa, so we can keep it for now.
  • We no longer know the full LDS size per kernel at compile time, which means that we can no longer generate a relevant error message at compile time. It would be possible to add a check for the size of individual variables, but ultimately the linker will have to perform the final check.

Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275

Diff Detail

Repository
rL LLVM

Event Timeline

nhaehnle created this revision.May 3 2019, 5:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 3 2019, 5:04 AM
arsenm added a comment.May 3 2019, 6:00 AM

Is it still possible to make use constant addresses in some cases? If there are no calls for example

include/llvm/CodeGen/AsmPrinter.h
623–628 ↗(On Diff #197962)

Separate patch

include/llvm/CodeGen/MachineOperand.h
716–718 ↗(On Diff #197962)

This should be a separate patch

nhaehnle marked 2 inline comments as done.May 7 2019, 1:18 PM

Is it still possible to make use constant addresses in some cases? If there are no calls for example

I'm changing the patch so that the symbols are only generated for globals with external linkage as discussed.

nhaehnle updated this revision to Diff 198518.May 7 2019, 1:21 PM

Address review comments.

arsenm added inline comments.May 8 2019, 5:46 AM
lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
310 ↗(On Diff #198518)

This uses the deprecated pointer type method. This should use getValueType() on the global instead

lib/Target/AMDGPU/SIISelLowering.cpp
4599–4603 ↗(On Diff #198518)

Shouldn't introduce machine nodes here. What is the problem with selecting the GlobalAddress directly in a tablegen pattern or manually in AMDGPUISelDAGToDAG?

5571–5572 ↗(On Diff #198518)

Maybe we can define the intrinsic to return -1 if the size can't be determined, similar to how llvm.objectsize works?

9994–10007 ↗(On Diff #198518)

I don't think there should be a need to handle machine nodes here, particularly moves. The GlobalAddress should be legal, and the select pattern should be inserting the move?

test/CodeGen/AMDGPU/constant-fold-mi-operands.ll
4–6 ↗(On Diff #198518)

I'm not sure these tests have been entirely replaced with MIR yet. Can you just set the triple to preserve them?

test/CodeGen/AMDGPU/sopk-compares.ll
4–8 ↗(On Diff #198518)

I'm 99% sure we don't have replacements for any of these tests either. The point of these isn't to use groupstaticsize, but to get some constant that isn't visible during selection. These shouldn't be removed

367–369 ↗(On Diff #198518)

Ditto

test/CodeGen/AMDGPU/sub.i16.ll
147 ↗(On Diff #198518)

Ditto

nhaehnle updated this revision to Diff 204976.Jun 16 2019, 4:37 PM
nhaehnle marked 5 inline comments as done.

Rebase, address some of the comments.

nhaehnle marked an inline comment as done.Jun 16 2019, 4:42 PM
nhaehnle added inline comments.
lib/Target/AMDGPU/SIISelLowering.cpp
5571–5572 ↗(On Diff #198518)

Makes sense, though that seems like a task for a separate change.

nhaehnle updated this revision to Diff 204982.Jun 16 2019, 6:02 PM
nhaehnle marked 3 inline comments as done.

Wrap LDS TargetGlobalAddress nodes in an AMDGPUISD::LDS node that is
selected to S_MOV/V_MOV. This should address the concerns about having
more machine nodes during SelectionDAG, and as a nice side bonus it
actually improves the quality of known-bits information by adding
alignment information as well.

jsji removed a subscriber: jsji.Jun 16 2019, 7:52 PM
arsenm added inline comments.Jun 17 2019, 10:16 AM
lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
304–306 ↗(On Diff #204982)

This breaks the initializer error. This should be moved after

nhaehnle updated this revision to Diff 205157.Jun 17 2019, 12:54 PM
nhaehnle marked an inline comment as done.

Address review comment

arsenm accepted this revision.Jun 18 2019, 4:38 PM

LGTM

lib/Target/AMDGPU/SIInstructions.td
1144–1147 ↗(On Diff #205157)

I would be surprised this is ever useful, but it's fine

This revision is now accepted and ready to land.Jun 18 2019, 4:38 PM
nhaehnle marked 2 inline comments as done.Jun 24 2019, 6:45 AM

Thanks.

lib/Target/AMDGPU/SIInstructions.td
1144–1147 ↗(On Diff #205157)

This is required to get rid of the SIlds node. Without wrapping the targetglobaladdress in a dedicated node, a lot of stuff just breaks or gets really messy, e.g.:

  1. Global addrs can't be address operands for DS instructions, so we'd need to pick them out there manually.
  2. At least one test case numerically compares an LDS address to a number and ends up breaking the constant bus limitations without this.

Besides, all the targets I've looked at de facto wrap targetglobaladdresses in other DAG nodes in one way or another. Selecting "naked" tgas seems to just be totally unexpected behavior.

This also simplifies the computeKnownBits integration. Long story short, the weird stuff I did previously were all things to work around the deficiencies of naked tgas and my thinking it would be nicer to avoid a dedicated SD node just for this. I still think it's silly to require a dedicated SD node, but that assumption is deeply embedded in a lot of places...

This revision was automatically updated to reflect the committed changes.

Hi Nicolai,

This introduced some CTS regressions with RADV, for example dEQP-VK.pipeline.spec_constant.compute.local_size.x.

Is this expected?

Thanks!

nhaehnle marked an inline comment as done.Jun 26 2019, 11:56 AM

Oh... radv declares compute LDS as an LLVM global variable but doesn't use rtld yet, right? Sorry, I missed that.

Do you plan to fix it? It breaks a bunch of games.

I'm currently looking into it.

llvm/trunk/test/CodeGen/AMDGPU/local-memory.ll