This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Place global constructors in .init_array and .fini_array
ClosedPublic

Authored by jhuber6 on Apr 27 2023, 6:07 AM.

Details

Summary

For the GPU, we emit external kernels that call the initializers and
constructors, however if we had a persistent kernel like in the _start
kernel for the libc project, we could initialize the standard way of
calling constructors. This patch adds new global variables containing
pointers to the constructors to be called. If these are placed in the
.init_array and .fini_array sections, then the backend will handle
them specially. The linker will then provide the __init_array_ and
__fini_array_ sections to traverse them. An implementation would look
like this.

extern uintptr_t __init_array_start[];
extern uintptr_t __init_array_end[];
extern uintptr_t __fini_array_start[];
extern uintptr_t __fini_array_end[];

using InitCallback = void(int, char **, char **);
using FiniCallback = void(void);

extern "C" [[gnu::visibility("protected"), clang::amdgpu_kernel]] void
_start(int argc, char **argv, char **envp) {
  uint64_t init_array_size = __init_array_end - __init_array_start;
  for (uint64_t i = 0; i < init_array_size; ++i)
    reinterpret_cast<InitCallback *>(__init_array_start[i])(argc, argv, env);
  uint64_t fini_array_size = __fini_array_end - __fini_array_start;
  for (uint64_t i = 0; i < fini_array_size; ++i)
    reinterpret_cast<FiniCallback *>(__fini_array_start[i])();
}

Diff Detail

Event Timeline

jhuber6 created this revision.Apr 27 2023, 6:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2023, 6:07 AM
jhuber6 requested review of this revision.Apr 27 2023, 6:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2023, 6:07 AM
jhuber6 updated this revision to Diff 517534.Apr 27 2023, 6:11 AM

Add section check in test.

I am curious how other targets handle @llvm.global_ctors. Is there some generic LLVM pass to change them to .init_array ?

I am curious how other targets handle @llvm.global_ctors. Is there some generic LLVM pass to change them to .init_array ?

I don't think we could fold this AMDGPU lowering pass since it need to do its own ctor / dtor handling for making the kernels. I actually don't know where the lowering happens for generic targets, I could look if it's important.

jhuber6 updated this revision to Diff 517556.Apr 27 2023, 7:48 AM

Add priority

jhuber6 updated this revision to Diff 517634.Apr 27 2023, 10:35 AM

Remove leftoever debug prints.

Does this work for non-AMD hardware?

Is this just adding the globals and not actually using this mechanism?

Does this work for non-AMD hardware?

Is this just adding the globals and not actually using this mechanism?

This is all that's required on a platform with a functioning linker, Nvidia need not apply.

Does this work for non-AMD hardware?

Is this just adding the globals and not actually using this mechanism?

If you're talking about the expected usage, it's in the commit header. I'm planning on adding that to libc in a separate patch.

So we have different schemes for AMD and NVIDIA? That does not sound good.

So we have different schemes for AMD and NVIDIA? That does not sound good.

There's no choice, Nvidia offers no way to export variables to sections and an incomplete ELF linker. I'd rather have AMDGPU do what every other target does and have Nvidia be the black sheep.

I know little about GPU. The generic code emittings constructors to assembly is AsmPrinter::emitSpecialLLVMGlobal and AsmPrinter::emitXXStructorList. It will use .init_array.

The priority works this way. Note that in the absence of a .N suffix, .init_array has the highest priority. https://maskray.me/blog/2021-11-07-init-ctors-init-array

a.o:(.init_array.1) b.o:(.init_array.1)
a.o:(.init_array.2) b.o:(.init_array.2)
...
a.o:(.init_array.65533) b.o:(.init_array.65533)
a.o:(.init_array.65534) b.o:(.init_array.65534)
a.o:(.init_array) b.o:(.init_array)

I know little about GPU. The generic code emittings constructors to assembly is AsmPrinter::emitSpecialLLVMGlobal and AsmPrinter::emitXXStructorList. It will use .init_array.

This pass currently deletes the global list before making it to the assembly printer, maybe we will get some of this behaviour if we don't do that?

The priority works this way. Note that in the absence of a .N suffix, .init_array has the highest priority. https://maskray.me/blog/2021-11-07-init-ctors-init-array

a.o:(.init_array.1) b.o:(.init_array.1)
a.o:(.init_array.2) b.o:(.init_array.2)
...
a.o:(.init_array.65533) b.o:(.init_array.65533)
a.o:(.init_array.65534) b.o:(.init_array.65534)
a.o:(.init_array) b.o:(.init_array)

AMD's linker is lld and targets ELF, so it should follow the exact same rules as you are familiar with.

jhuber6 updated this revision to Diff 517761.Apr 27 2023, 5:48 PM

Update, turns out that if we don't delete the ctor / dtor here it does the
default approach which is exactly what we want. The only changes required are
making sure we don't create duplicate kernels.

yaxunl accepted this revision.Apr 28 2023, 8:53 AM

LGTM. Thanks.

llvm/lib/Target/AMDGPU/AMDGPUCtorDtorLowering.cpp
76

I noticed these functions are called not following the priority.

However, I guess that is out of the scope of this patch.

This revision is now accepted and ready to land.Apr 28 2023, 8:53 AM
jhuber6 added inline comments.Apr 28 2023, 9:13 AM
llvm/lib/Target/AMDGPU/AMDGPUCtorDtorLowering.cpp
76

True, it was like that when I got here. Do you know who the current user is for this feature?

yaxunl added inline comments.Apr 28 2023, 9:22 AM
llvm/lib/Target/AMDGPU/AMDGPUCtorDtorLowering.cpp
76

Currently, all HIP programs use this feature when -fsanitize=addr is used.

But they do not care about priority yet.

jhuber6 added inline comments.Apr 28 2023, 9:23 AM
llvm/lib/Target/AMDGPU/AMDGPUCtorDtorLowering.cpp
76

FWIW we could change this to be a single kernel that calls an ASAN library function, which implements the method in the commit header to traverse the list in priority order.

yaxunl added inline comments.Apr 28 2023, 9:34 AM
llvm/lib/Target/AMDGPU/AMDGPUCtorDtorLowering.cpp
76

We want to keep this feature as a generic approach to support dynamic initialization.

jhuber6 added inline comments.Apr 28 2023, 9:35 AM
llvm/lib/Target/AMDGPU/AMDGPUCtorDtorLowering.cpp
76

This will also cause a duplicate symbol if someone compiles without monolithic LTO. I'm assuming that's the expected behavior.

yaxunl added inline comments.Apr 28 2023, 9:44 AM
llvm/lib/Target/AMDGPU/AMDGPUCtorDtorLowering.cpp
76

That is fine. HIP runtime will launch all of them as long as they have init kernel metadata. It is not based on the kernel name.

This revision was landed with ongoing or failed builds.Apr 29 2023, 6:40 AM
This revision was automatically updated to reflect the committed changes.