This is an archive of the discontinued LLVM Phabricator instance.

[amdgpu] Drop lowering of LDS used by global variables
ClosedPublic

Authored by JonChesterfield on Dec 8 2021, 5:23 PM.

Details

Summary

Approximately revert D103431.

LDS variables are allocated at kernel launch and deallocated at kernel exit.
The address is therefore kernel execution dependent. Global variables are
initialized by values written to .data, which can't be done for a LDS variable
as there is no kernel running, or by a global constructor. Initializing the
global to the address of some LDS allocated by a global constructor is possible
but indistinguishable from undef.

Assigning the address of a LDS variable to a global should be a sema error. It
isn't for openmp, haven't checked other languages. Failing that it could be set
to undef, perhaps in this pass.

Diff Detail

Event Timeline

JonChesterfield created this revision.Dec 8 2021, 5:23 PM
JonChesterfield requested review of this revision.Dec 8 2021, 5:23 PM
Herald added a project: Restricted Project. · View Herald Transcript

Passed internal CI 'psdb', ID 4593.

Deleting this obfuscation unblocks a patch to avoid allocating LDS from kernels that (sufficiently clearly) don't need it.

LGTM, but needs Matt or Stas or Brian ...

Ping. Would like to ship this by the weekend.

I can see how it is illegal to assign an address of an LDS object to a global pointer global variable. What about a global variable which is an LDS pointer? I.e.:

@gptr.3 = addrspace(3) global i64* addrspacecast (i64 addrspace(3)* @lds.3 to i64*), align 8

This should be legal I guess.

Yes, agreed. The address of an LDS variable is scoped to the kernel execution, thus some other LDS variable could be initialised with it. Fortunately, LDS have undef as an initialiser (which is also checked before this call), so that edge case can be postponed until implementing initializers for LDS.

llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp
123

LDS with non-undef init skipped over here

I'd rather keep the tests around. The backend needs some stressing of the IR patterns

The tests of nonsensical IR input being mapped to some other nonsensical IR output? This pass (once fixed) will leave them unchanged, so they'll show nothing

The tests of nonsensical IR input being mapped to some other nonsensical IR output? This pass (once fixed) will leave them unchanged, so they'll show nothing

By definition it's not nonsensical IR input since it passes the verifier, and the backend has to have some behavior on it

I can change the IR tests to ones that run to the backend (instead of just this pass) and see how that behaves. Might crash, might write nonsense to .data. That's strictly an increase in coverage to what we have now, with or without this patch.

It seems unfortunate that we don't have an amdgpu specific verifier pass that rejects invalid addrspace annotated variables.

I can change the IR tests to ones that run to the backend (instead of just this pass) and see how that behaves. Might crash, might write nonsense to .data. That's strictly an increase in coverage to what we have now, with or without this patch.

It seems unfortunate that we don't have an amdgpu specific verifier pass that rejects invalid addrspace annotated variables.

Perhaps these tests would be better if globals would have address space 3. When we support LDS initializers this is what we will want to transform.
That said, if we eventually allow LDS initializers isn't that a kind of code we would like to have? What is a motivation to remove it modulo the initializers?

I am doubtful that this would be the correct strategy for initializers of LDS - I suspect that's going to have to be handled by injecting initialisation code into the prologue of kernels and changing the initialisers to undef - but even if it is, we can retrieve it from git at will. Right now it makes callgraph crawling to avoid unnecessary allocation of LDS much more complicated (or imprecise).

I definitely agree that LDS handling is under-tested at present but there's no value I can see to these specific tests. They'll need to be changed to be more useful at which point we're better off deciding what it is we want to check and writing tests that hit that.

rampitec accepted this revision.Dec 13 2021, 3:08 PM

I am doubtful that this would be the correct strategy for initializers of LDS - I suspect that's going to have to be handled by injecting initialisation code into the prologue of kernels and changing the initialisers to undef - but even if it is, we can retrieve it from git at will. Right now it makes callgraph crawling to avoid unnecessary allocation of LDS much more complicated (or imprecise).

I definitely agree that LDS handling is under-tested at present but there's no value I can see to these specific tests. They'll need to be changed to be more useful at which point we're better off deciding what it is we want to check and writing tests that hit that.

OK, makes sense.

This revision is now accepted and ready to land.Dec 13 2021, 3:08 PM
JonChesterfield added a comment.EditedDec 13 2021, 3:40 PM

Will land this shortly.

I think a significant missing piece of LDS testing is code that runs on the GPU. We have some coarse coverage through the openmp tests (because the openmp runtime happens to make fairly heavy use of LDS) but I'm not aware of any targeted testing. I'm planning to add some through openmp, syntax comes out as

uint64_t var __attribute__((loader_uninitialized));
#pragma allocate(var) allocator(omp_pteam_mem_alloc)

but it's not going to be as precise as it could be through a runtime that doesn't use LDS, will see if I can find some hip or opencl tests that run in tree.

edit: openmp tests are under libomptarget, hip tests are in the llvm-test-suite, e.g. https://reviews.llvm.org/D99997

This revision was landed with ongoing or failed builds.Dec 14 2021, 1:59 PM
This revision was automatically updated to reflect the committed changes.
llvm/test/CodeGen/AMDGPU/lower-module-lds-global-alias.ll