This is an archive of the discontinued LLVM Phabricator instance.

Differential D137470

[Offloading] Initial support for registering offloading entries on COFF targets
Needs Review · Public

Authored by jhuber6 on Nov 4 2022, 3:50 PM.

Details

Reviewers

tra
jdoerfert
tianshilei1992
JonChesterfield
yaxunl
Meinersbur
mstorsjo
Bigcheese
rnk

Summary

The new driver registers all the offloading entries by first storing a
structure containing the necessary information in a special section
and then iterating over that section at runtime. On ELF targets this is
done using the linker-defined __start and __stop symbols. However, for
COFF targets these are not provided. This is instead done by generating
sections as described here.

This patch adds the initial support required to offload on COFF targets
by implementing this for the new driver. We use the .<kind>$Ox section
for COFF now.

NOTE: I have not tested the runtime functionality of this patch as I do not have a Windows machine set up yet.
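
For readers unfamiliar with the ELF mechanism the summary refers to, here is a minimal sketch, not the code from this patch. It assumes a GCC- or Clang-compatible compiler and an ELF linker that defines `__start_<section>` and `__stop_<section>` for sections whose names are valid C identifiers. `DemoEntry`, the section name `demo_entries`, and the helper functions are hypothetical names chosen for illustration; on non-ELF targets the sketch falls back to a plain array so that it still builds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 16-byte record. Its size equals its alignment, so entries
// emitted as separate globals sit back to back inside the section.
struct alignas(16) DemoEntry {
  char Name[8];
  std::uint64_t Size;
};
static_assert(sizeof(DemoEntry) == 16, "entries must pack without padding");

#if defined(__ELF__) && defined(__GNUC__)
// ELF path: each entry is its own global placed in "demo_entries". The linker
// is assumed to define the two boundary symbols declared below.
extern "C" const DemoEntry __start_demo_entries[];
extern "C" const DemoEntry __stop_demo_entries[];

static const DemoEntry Foo
    __attribute__((section("demo_entries"), used)) = {"foo", 0};
static const DemoEntry Bar
    __attribute__((section("demo_entries"), used)) = {"bar", 0};
static const DemoEntry X
    __attribute__((section("demo_entries"), used)) = {"x", 4};

static const DemoEntry *entriesBegin() { return __start_demo_entries; }
static const DemoEntry *entriesEnd() { return __stop_demo_entries; }
#else
// Fallback for other object formats: an ordinary array with the same
// contents, so only the discovery mechanism differs.
static const DemoEntry Table[] = {{"foo", 0}, {"bar", 0}, {"x", 4}};
static const DemoEntry *entriesBegin() { return Table; }
static const DemoEntry *entriesEnd() { return Table + 3; }
#endif

// Walks the section the way a registration routine would walk it at startup.
std::size_t countEntries() {
  assert(entriesBegin() <= entriesEnd() && "section bounds are inverted");
  std::size_t Count = 0;
  for (const DemoEntry *E = entriesBegin(); E < entriesEnd(); ++E)
    ++Count;
  return Count;
}

// Sums the Size field of every entry found between the two bounds.
std::uint64_t totalEntrySize() {
  std::uint64_t Total = 0;
  for (const DemoEntry *E = entriesBegin(); E < entriesEnd(); ++E)
    Total += E->Size;
  return Total;
}
```

The order of entries inside the section is not guaranteed, which is why the sketch only counts and sums them. The point is that no central table has to be written by hand: any translation unit can contribute an entry, and the bounds come from the linker.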

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jhuber6 created this revision. · Nov 4 2022, 3:50 PM

Herald added a project: Restricted Project. · Nov 4 2022, 3:50 PM

Herald added a subscriber: hiraditya.

jhuber6 requested review of this revision. · Nov 4 2022, 3:50 PM

Herald added projects: Restricted Project, Restricted Project. · Nov 4 2022, 3:50 PM

Herald added subscribers: llvm-commits, cfe-commits, sstefan1.

Harbormaster completed remote builds in B196234: Diff 473355. · Nov 4 2022, 4:32 PM

jhuber6 added a reviewer: Meinersbur. · Nov 10 2022, 6:57 PM

Meinersbur added reviewers: mstorsjo, Bigcheese. · Nov 11 2022, 2:44 PM

This looks reasonable to me, overall. I didn't quite try to follow all the wall-of-text changes in the tests though, but overall fine. Just a couple comments and one case where I didn't find where code went in the refactoring.

clang/test/CodeGenCUDA/offloading-entries.cu
92	Is this identical to the one above? Should the lines be shared with `--check-prefix=COMMON,COFF` etc? (The number of lines is rather small here so it's maybe not strictly necessary, but I saw that done in the other testcase.)
clang/tools/clang-linker-wrapper/OffloadWrapper.cpp
145	FWIW, this comment doesn't feel entirely accurate: Regardless of the length of the section name, all sections with names of the form `name$suffix` will get merged into the same section `name` (sorted by the suffix). Then if `name` is 8 chars or less, the name is kept in the section table (so that it can easily be looked up at runtime), while if it is longer, the full name is kept in the string table (which is not mapped at runtime). Also as an extra side note; we added an exception into lld for `.eh_frame` - this is 9 chars, but libunwind wants to locate the section at runtime. So for that case, lld truncates it to `.eh_fram`. (This behaviour is lld specific, to appease libunwind - binutils doesn't do that, and libgcc locates that section differently.)
337	I don't quite see where the corresponding GlobalVariable for this case is created after the refactoring?
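
The `name$suffix` behaviour described in the comment on line 145 can be modelled in a few lines. This is only a sketch of the rule as stated above (contributions sorted by suffix and merged into `name`, with names of 8 characters or fewer kept in the section table), not a description of how lld or link.exe implement it. `InputSection`, `splitGroupedName`, `mergeGroupedSections`, and `fitsInSectionHeader` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One section contribution from an object file (hypothetical model type).
struct InputSection {
  std::string Name;     // for example ".omp$OE"
  std::string Contents; // stand-in for the section's bytes
};

// Splits "name$suffix" at the first '$'. The suffix is empty if there is none.
std::pair<std::string, std::string> splitGroupedName(const std::string &Full) {
  assert(!Full.empty() && "section names must not be empty");
  const auto Pos = Full.find('$');
  if (Pos == std::string::npos)
    return {Full, std::string()};
  return {Full.substr(0, Pos), Full.substr(Pos + 1)};
}

// Models the merge: contributions are ordered by suffix, then concatenated
// into one output section per base name. stable_sort keeps the input order
// for contributions that share a suffix.
std::map<std::string, std::string>
mergeGroupedSections(std::vector<InputSection> Inputs) {
  std::stable_sort(Inputs.begin(), Inputs.end(),
                   [](const InputSection &A, const InputSection &B) {
                     return splitGroupedName(A.Name).second <
                            splitGroupedName(B.Name).second;
                   });
  std::map<std::string, std::string> Output;
  for (const InputSection &S : Inputs)
    Output[splitGroupedName(S.Name).first] += S.Contents;
  return Output;
}

// True if the merged name fits the 8-byte name field of a section header.
bool fitsInSectionHeader(const std::string &BaseName) {
  return BaseName.size() <= 8;
}
```

With contributions named `.omp$OZ`, `.omp$OE`, and `.omp$OA`, the model places the `$OA` bytes first and the `$OZ` bytes last in the merged `.omp` section, which is what lets zero-sized marker variables in those two sections bracket the entries.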

Thanks for the feedback.

Another significant portion of getting this workflow to work for Windows / COFF is parsing the linker arguments. I should be able to look at lld-link and add the necessary aliases to what ld.lld takes, I assume? E.g. we use values like -o and -L in the linker wrapper to set the output and find libraries.

clang/test/CodeGenCUDA/offloading-entries.cu
92	Yeah it might be valid to collapse it further, this test is mostly just copy-pasted directly from the output so we should probably try to keep it common.
clang/tools/clang-linker-wrapper/OffloadWrapper.cpp
145	I see, I'm not that familiar with the inner workings of the COFF linking process. All that matters for this use-case is whether or not we can get a pointer to the array. In that case we shouldn't need to worry about the eight character limit right?
337	The CUDA / HIP cases did this separately. This patch merged it into a common method `getELFEntriesArray`. Functionally this just changed the order in the output slightly. The `dummy` variable is only necessary for the ELF linkers to generate the begin / end section. For COFF we make the `$a` and `$z` variables which perform a similar role.

In D137470#3928731, @jhuber6 wrote:

Another significant portion of getting this workflow to work for Windows / COFF is parsing the linker arguments. I should be able to look at lld-link and add the necessary aliases to what ld.lld takes, I assume? E.g. we use values like -o and -L in the linker wrapper to set the output and find libraries.

Sorry, I'm not quite up to speed with exactly what is being done linker-wise here - can you give a more detailed overview? Keep in mind that there's two separate interfaces to lld for COFF; when used in mingw mode, it invokes the ld.lld frontend, but with a -m option which directs lld to the mingw frontend, which parses ld.lld style options and rewrites them to lld-link style options and invokes that interface. And when Clang is operating in msvc/clang-cl mode, lld-link is invoked (or called directly by the build system).

clang/tools/clang-linker-wrapper/OffloadWrapper.cpp
145	If you locate the contents at runtime by using specific symbols that point to the start and end of the data, then yes, you don't need to worry about keeping it below the 8 char limit. The 8 char limit is relevant if you enumerate and iterate over the sections of a DLL/EXE at runtime, and try to locate the section dynamically that way.
337	Ah, now I see - this is the second half of what's being merged into the cuda/hip call below in createRegisterGlobalsFunction.

In D137470#3928828, @mstorsjo wrote:

Sorry, I'm not quite up to speed with exactly what is being done linker-wise here - can you give a more detailed overview? Keep in mind that there's two separate interfaces to lld for COFF; when used in mingw mode, it invokes the ld.lld frontend, but with a -m option which directs lld to the mingw frontend, which parses ld.lld style options and rewrites them to lld-link style options and invokes that interface. And when Clang is operating in msvc/clang-cl mode, lld-link is invoked (or called directly by the build system).

Sure, there's a bit of documentation for what's going on here, but I may need to update it a bit.

Basically, for offloading languages (CUDA, HIP, OpenMP, etc) we compile the source code twice, once for the host and once for the target device. We embed the device relocatable object inside the host so we follow a standard compilation pipeline. This linker-wrapper then fishes those relocatable objects out and performs the device-linking phase. The linked output is then put into a global along with some runtime calls to register the image and kernels. That new file gets passed to the wrapped linker job and we get a final executable.

My concern is that the linker wrapper keys off of certain arguments to the linker to do its job since it's invoked something like clang-linker-wrapper <linker-args>. I understand these are fundamentally different for lld-link so I was wondering if this approach in general would work there.

clang/tools/clang-linker-wrapper/OffloadWrapper.cpp
145	Good to know, I may change the section names to be more verbose then, something like `cuda.entries$OE`.
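
To make the concern about linker arguments concrete, here is a rough sketch of what keying off the output and library-path options could look like for the two command-line styles. It is not the wrapper's actual parsing code. It assumes the ld.lld-style spellings `-o <file>`, `-L<dir>`, and `-L <dir>`, and the link.exe-style spellings `/out:<file>` and `/libpath:<dir>`, accepted with either `/` or `-` and matched case-insensitively. `LinkerFlavor`, `LinkerArgs`, `matchLinkOption`, and `scanLinkerArgs` are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class LinkerFlavor { Gnu, Link };

struct LinkerArgs {
  std::string Output;                    // empty if no output option was seen
  std::vector<std::string> LibraryPaths; // in command-line order
};

// Matches "/name:value" or "-name:value", comparing the option name
// case-insensitively. Name must be lowercase. On success stores the value.
static bool matchLinkOption(const std::string &Arg, const std::string &Name,
                            std::string &Value) {
  assert(!Name.empty() && "option name must not be empty");
  if (Arg.size() < Name.size() + 2 || (Arg[0] != '/' && Arg[0] != '-'))
    return false;
  for (std::size_t I = 0; I < Name.size(); ++I)
    if (std::tolower(static_cast<unsigned char>(Arg[I + 1])) != Name[I])
      return false;
  if (Arg[Name.size() + 1] != ':')
    return false;
  Value = Arg.substr(Name.size() + 2);
  return true;
}

// Collects the output file and library search paths for the given flavor.
// Every other argument is ignored.
LinkerArgs scanLinkerArgs(LinkerFlavor Flavor,
                          const std::vector<std::string> &Args) {
  LinkerArgs Result;
  for (std::size_t I = 0; I < Args.size(); ++I) {
    const std::string &Arg = Args[I];
    std::string Value;
    if (Flavor == LinkerFlavor::Link) {
      if (matchLinkOption(Arg, "out", Value))
        Result.Output = Value;
      else if (matchLinkOption(Arg, "libpath", Value))
        Result.LibraryPaths.push_back(Value);
      continue;
    }
    if (Arg == "-o" || Arg == "-L") {
      if (I + 1 == Args.size())
        break; // the option is missing its value
      const std::string &Next = Args[++I];
      if (Arg == "-o")
        Result.Output = Next;
      else
        Result.LibraryPaths.push_back(Next);
    } else if (Arg.size() > 2 && Arg.compare(0, 2, "-L") == 0) {
      Result.LibraryPaths.push_back(Arg.substr(2));
    }
  }
  return Result;
}
```

The flavor is passed in explicitly because the two styles cannot be told apart reliably from a single argument: `-out:a.exe` is an output option in one style, while a leading `-o` introduces the output file in the other.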

rnk resigned from this revision. · Nov 18 2022, 12:00 PM

In D137470#3928870, @jhuber6 wrote:

In D137470#3928828, @mstorsjo wrote:

Sorry, I'm not quite up to speed with exactly what is being done linker-wise here - can you give a more detailed overview? Keep in mind that there's two separate interfaces to lld for COFF; when used in mingw mode, it invokes the ld.lld frontend, but with a -m option which directs lld to the mingw frontend, which parses ld.lld style options and rewrites them to lld-link style options and invokes that interface. And when Clang is operating in msvc/clang-cl mode, lld-link is invoked (or called directly by the build system).

Sure, there's a bit of documentation for what's going on here, but I may need to update it a bit.

Basically, for offloading languages (CUDA, HIP, OpenMP, etc) we compile the source code twice, once for the host and once for the target device. We embed the device relocatable object inside the host so we follow a standard compilation pipeline. This linker-wrapper then fishes those relocatable objects out and performs the device-linking phase. The linked output is then put into a global along with some runtime calls to register the image and kernels. That new file gets passed to the wrapped linker job and we get a final executable.

My concern is that the linker wrapper keys off of certain arguments to the linker to do its job since it's invoked something like clang-linker-wrapper <linker-args>. I understand these are fundamentally different for lld-link so I was wondering if this approach in general would work there.

Right, yes, lld-link uses different arguments than ld.lld. (Also note that for mingw targets, the regular ld.lld interface is used on Windows too, while link.exe or lld-link is used for msvc targets.)

Anyway, I don't have many other comments on this. Other than what I commented on before, this seems mostly reasonable, but I'm not familiar enough with the whole OpenMP pipeline to give any authoritative review really. Feel free to ask specifically about other details relating to PE/COFF though!

GitHub <noreply@github.com> mentioned this in rG52204a29aba2: [Offload] Initial support for registering offloading entries on COFF targets…. · Tue, Nov 21, 4:48 AM

Revision Contents

Path                                                   Size
clang/lib/CodeGen/CGCUDANV.cpp                         9 lines
clang/test/CodeGenCUDA/offloading-entries.cu           48 lines
clang/test/Driver/linker-wrapper-image.c               96 lines
clang/test/OpenMP/declare_target_link_codegen.cpp      9 lines
clang/tools/clang-linker-wrapper/OffloadWrapper.cpp    120 lines
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h       3 lines
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp              5 lines

Diff 473355

clang/lib/CodeGen/CGCUDANV.cpp

	// Creates offloading entries for all the kernels and globals that must be			// Creates offloading entries for all the kernels and globals that must be
	// registered. The linker will provide a pointer to this section so we can			// registered. The linker will provide a pointer to this section so we can
	// register the symbols with the linked device image.			// register the symbols with the linked device image.
	void CGNVCUDARuntime::createOffloadingEntries() {			void CGNVCUDARuntime::createOffloadingEntries() {
	llvm::OpenMPIRBuilder OMPBuilder(CGM.getModule());			llvm::OpenMPIRBuilder OMPBuilder(CGM.getModule());
	OMPBuilder.initialize();			OMPBuilder.initialize();

	StringRef Section = CGM.getLangOpts().HIP ? "hip_offloading_entries"			StringRef Section;
				if (CGM.getTriple().isOSBinFormatCOFF())
				Section = CGM.getLangOpts().HIP ? ".hip$OE" : ".cuda$OE";
				else
				Section = CGM.getLangOpts().HIP ? "hip_offloading_entries"
	: "cuda_offloading_entries";			: "cuda_offloading_entries";

	for (KernelInfo &I : EmittedKernels)			for (KernelInfo &I : EmittedKernels)
	OMPBuilder.emitOffloadingEntry(KernelHandles[I.Kernel],			OMPBuilder.emitOffloadingEntry(KernelHandles[I.Kernel],
	getDeviceSideName(cast<NamedDecl>(I.D)), 0,			getDeviceSideName(cast<NamedDecl>(I.D)), 0,
	DeviceVarFlags::OffloadGlobalEntry, Section);			DeviceVarFlags::OffloadGlobalEntry, Section);

	for (VarInfo &I : DeviceVars) {			for (VarInfo &I : DeviceVars) {
	uint64_t VarSize =			uint64_t VarSize =
	CGM.getDataLayout().getTypeAllocSize(I.Var->getValueType());			CGM.getDataLayout().getTypeAllocSize(I.Var->getValueType());
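
The section-name choice in the hunk above can be read in isolation as a small pure function. The following is a standalone sketch that reproduces the names visible in this diff (`.cuda$OE` and `.hip$OE` for COFF, `cuda_offloading_entries` and `hip_offloading_entries` otherwise). `offloadEntrySection` and its parameters are hypothetical and do not exist in Clang.

```cpp
#include <cassert>
#include <string>

// Returns the section that offloading entries of the given kind ("cuda",
// "hip") are placed in. COFF uses a grouped ".<kind>$OE" section. Other
// formats use "<kind>_offloading_entries", a valid C identifier, so that ELF
// linkers can provide start and stop symbols for it.
std::string offloadEntrySection(const std::string &Kind, bool IsCOFF) {
  assert(!Kind.empty() && "kind must name an offloading language");
  if (IsCOFF)
    return "." + Kind + "$OE";
  return Kind + "_offloading_entries";
}
```

For example, `offloadEntrySection("hip", true)` yields `.hip$OE`, matching the `HIP-COFF` check lines in offloading-entries.cu.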

clang/test/CodeGenCUDA/offloading-entries.cu

	// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals --global-value-regex ".omp_offloading.entry.*"			// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals --global-value-regex ".omp_offloading.entry.*"
	// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-linux-gnu -fgpu-rdc \			// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-linux-gnu -fgpu-rdc \
	// RUN: --offload-new-driver -emit-llvm -o - -x cuda %s \| FileCheck \			// RUN: --offload-new-driver -emit-llvm -o - -x cuda %s \| FileCheck \
	// RUN: --check-prefix=CUDA %s			// RUN: --check-prefix=CUDA %s
	// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-linux-gnu -fgpu-rdc \			// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-linux-gnu -fgpu-rdc \
	// RUN: --offload-new-driver -emit-llvm -o - -x hip %s \| FileCheck \			// RUN: --offload-new-driver -emit-llvm -o - -x hip %s \| FileCheck \
	// RUN: --check-prefix=HIP %s			// RUN: --check-prefix=HIP %s
				// RUN: %clang_cc1 -std=c++11 -triple x86_64-win32-gnu -fgpu-rdc \
				// RUN: --offload-new-driver -emit-llvm -o - -x cuda %s \| FileCheck \
				// RUN: --check-prefix=CUDA-COFF %s
				// RUN: %clang_cc1 -std=c++11 -triple x86_64-win32-gnu -fgpu-rdc \
				// RUN: --offload-new-driver -emit-llvm -o - -x hip %s \| FileCheck \
				// RUN: --check-prefix=HIP-COFF %s

	#include "Inputs/cuda.h"			#include "Inputs/cuda.h"

	//.			//.
	// CUDA: @.omp_offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00"			// CUDA: @.omp_offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00"
	// CUDA: @.omp_offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__foov, ptr @.omp_offloading.entry_name, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries", align 1			// CUDA: @.omp_offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__foov, ptr @.omp_offloading.entry_name, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries", align 1
	// CUDA: @.omp_offloading.entry_name.1 = internal unnamed_addr constant [8 x i8] c"_Z3barv\00"			// CUDA: @.omp_offloading.entry_name.1 = internal unnamed_addr constant [8 x i8] c"_Z3barv\00"
	// CUDA: @.omp_offloading.entry._Z3barv = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__barv, ptr @.omp_offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries", align 1			// CUDA: @.omp_offloading.entry._Z3barv = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__barv, ptr @.omp_offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries", align 1
	// CUDA: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00"			// CUDA: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00"
	// CUDA: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "cuda_offloading_entries", align 1			// CUDA: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "cuda_offloading_entries", align 1
	//.			//.
	// HIP: @.omp_offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00"			// HIP: @.omp_offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00"
	// HIP: @.omp_offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z3foov, ptr @.omp_offloading.entry_name, i64 0, i32 0, i32 0 }, section "hip_offloading_entries", align 1			// HIP: @.omp_offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z3foov, ptr @.omp_offloading.entry_name, i64 0, i32 0, i32 0 }, section "hip_offloading_entries", align 1
	// HIP: @.omp_offloading.entry_name.1 = internal unnamed_addr constant [8 x i8] c"_Z3barv\00"			// HIP: @.omp_offloading.entry_name.1 = internal unnamed_addr constant [8 x i8] c"_Z3barv\00"
	// HIP: @.omp_offloading.entry._Z3barv = weak constant %struct.__tgt_offload_entry { ptr @_Z3barv, ptr @.omp_offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "hip_offloading_entries", align 1			// HIP: @.omp_offloading.entry._Z3barv = weak constant %struct.__tgt_offload_entry { ptr @_Z3barv, ptr @.omp_offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "hip_offloading_entries", align 1
	// HIP: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00"			// HIP: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00"
	// HIP: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "hip_offloading_entries", align 1			// HIP: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "hip_offloading_entries", align 1
	//.			//.
				// CUDA-COFF: @.omp_offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00"
				// CUDA-COFF: @.omp_offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__foov, ptr @.omp_offloading.entry_name, i64 0, i32 0, i32 0 }, section ".cuda$OE", align 1
				// CUDA-COFF: @.omp_offloading.entry_name.1 = internal unnamed_addr constant [8 x i8] c"_Z3barv\00"
				// CUDA-COFF: @.omp_offloading.entry._Z3barv = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__barv, ptr @.omp_offloading.entry_name.1, i64 0, i32 0, i32 0 }, section ".cuda$OE", align 1
				// CUDA-COFF: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00"
				// CUDA-COFF: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section ".cuda$OE", align 1
				//.
				// HIP-COFF: @.omp_offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00"
				// HIP-COFF: @.omp_offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z3foov, ptr @.omp_offloading.entry_name, i64 0, i32 0, i32 0 }, section ".hip$OE", align 1
				// HIP-COFF: @.omp_offloading.entry_name.1 = internal unnamed_addr constant [8 x i8] c"_Z3barv\00"
				// HIP-COFF: @.omp_offloading.entry._Z3barv = weak constant %struct.__tgt_offload_entry { ptr @_Z3barv, ptr @.omp_offloading.entry_name.1, i64 0, i32 0, i32 0 }, section ".hip$OE", align 1
				// HIP-COFF: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00"
				// HIP-COFF: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section ".hip$OE", align 1
				//.
	// CUDA-LABEL: @_Z18__device_stub__foov(			// CUDA-LABEL: @_Z18__device_stub__foov(
	// CUDA-NEXT: entry:			// CUDA-NEXT: entry:
	// CUDA-NEXT: [[TMP0:%.*]] = call i32 @cudaLaunch(ptr @_Z18__device_stub__foov)			// CUDA-NEXT: [[TMP0:%.*]] = call i32 @cudaLaunch(ptr @_Z18__device_stub__foov)
	// CUDA-NEXT: br label [[SETUP_END:%.*]]			// CUDA-NEXT: br label [[SETUP_END:%.*]]
	// CUDA: setup.end:			// CUDA: setup.end:
	// CUDA-NEXT: ret void			// CUDA-NEXT: ret void
	//			//
	// HIP-LABEL: @_Z18__device_stub__foov(			// HIP-LABEL: @_Z18__device_stub__foov(
	// HIP-NEXT: entry:			// HIP-NEXT: entry:
	// HIP-NEXT: [[TMP0:%.*]] = call i32 @hipLaunchByPtr(ptr @_Z3foov)			// HIP-NEXT: [[TMP0:%.*]] = call i32 @hipLaunchByPtr(ptr @_Z3foov)
	// HIP-NEXT: br label [[SETUP_END:%.*]]			// HIP-NEXT: br label [[SETUP_END:%.*]]
	// HIP: setup.end:			// HIP: setup.end:
	// HIP-NEXT: ret void			// HIP-NEXT: ret void
	//			//
				// CUDA-COFF-LABEL: @_Z18__device_stub__foov(
				// CUDA-COFF-NEXT: entry:
				// CUDA-COFF-NEXT: [[TMP0:%.*]] = call i32 @cudaLaunch(ptr @_Z18__device_stub__foov)
				// CUDA-COFF-NEXT: br label [[SETUP_END:%.*]]
				// CUDA-COFF: setup.end:
				// CUDA-COFF-NEXT: ret void
				//
				// HIP-COFF-LABEL: @_Z18__device_stub__foov(
				// HIP-COFF-NEXT: entry:
				// HIP-COFF-NEXT: [[TMP0:%.*]] = call i32 @hipLaunchByPtr(ptr @_Z3foov)
				// HIP-COFF-NEXT: br label [[SETUP_END:%.*]]
				// HIP-COFF: setup.end:
				// HIP-COFF-NEXT: ret void
				//
	__global__ void foo() {}			__global__ void foo() {}

	// CUDA-LABEL: @_Z18__device_stub__barv(			// CUDA-LABEL: @_Z18__device_stub__barv(
	// CUDA-NEXT: entry:			// CUDA-NEXT: entry:
	// CUDA-NEXT: [[TMP0:%.*]] = call i32 @cudaLaunch(ptr @_Z18__device_stub__barv)			// CUDA-NEXT: [[TMP0:%.*]] = call i32 @cudaLaunch(ptr @_Z18__device_stub__barv)
	// CUDA-NEXT: br label [[SETUP_END:%.*]]			// CUDA-NEXT: br label [[SETUP_END:%.*]]
	// CUDA: setup.end:			// CUDA: setup.end:
	// CUDA-NEXT: ret void			// CUDA-NEXT: ret void
	//			//
	// HIP-LABEL: @_Z18__device_stub__barv(			// HIP-LABEL: @_Z18__device_stub__barv(
	// HIP-NEXT: entry:			// HIP-NEXT: entry:
	// HIP-NEXT: [[TMP0:%.*]] = call i32 @hipLaunchByPtr(ptr @_Z3barv)			// HIP-NEXT: [[TMP0:%.*]] = call i32 @hipLaunchByPtr(ptr @_Z3barv)
	// HIP-NEXT: br label [[SETUP_END:%.*]]			// HIP-NEXT: br label [[SETUP_END:%.*]]
	// HIP: setup.end:			// HIP: setup.end:
	// HIP-NEXT: ret void			// HIP-NEXT: ret void
	//			//
				// CUDA-COFF-LABEL: @_Z18__device_stub__barv(
				// CUDA-COFF-NEXT: entry:
				// CUDA-COFF-NEXT: [[TMP0:%.*]] = call i32 @cudaLaunch(ptr @_Z18__device_stub__barv)
				// CUDA-COFF-NEXT: br label [[SETUP_END:%.*]]
				// CUDA-COFF: setup.end:
				// CUDA-COFF-NEXT: ret void
				//
				// HIP-COFF-LABEL: @_Z18__device_stub__barv(
				// HIP-COFF-NEXT: entry:
				// HIP-COFF-NEXT: [[TMP0:%.*]] = call i32 @hipLaunchByPtr(ptr @_Z3barv)
				// HIP-COFF-NEXT: br label [[SETUP_END:%.*]]
				// HIP-COFF: setup.end:
				// HIP-COFF-NEXT: ret void
				//
	__global__ void bar() {}			__global__ void bar() {}
	__device__ int x = 1;			__device__ int x = 1;

clang/test/Driver/linker-wrapper-image.c

	// REQUIRES: x86-registered-target			// REQUIRES: x86-registered-target
	// REQUIRES: nvptx-registered-target			// REQUIRES: nvptx-registered-target
	// REQUIRES: amdgpu-registered-target			// REQUIRES: amdgpu-registered-target

	// RUN: clang-offload-packager -o %t.out --image=file=%S/Inputs/dummy-elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70			// RUN: clang-offload-packager -o %t.out --image=file=%S/Inputs/dummy-elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70
	// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o \			// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o \
	// RUN: -fembed-offload-object=%t.out			// RUN: -fembed-offload-object=%t.out
	// RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple=x86_64-unknown-linux-gnu \			// RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple=x86_64-unknown-linux-gnu \
	// RUN: --linker-path=/usr/bin/ld -- %t.o -o a.out 2>&1 \| FileCheck %s --check-prefix=OPENMP			// RUN: --linker-path=/usr/bin/ld -- %t.o -o a.out 2>&1 \| FileCheck %s --check-prefixes=OPENMP,OPENMP-ELF
				// RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple=x86_64-unknown-windows-gnu \
	// OPENMP: @__start_omp_offloading_entries = external hidden constant %__tgt_offload_entry			// RUN: --linker-path=/usr/bin/ld -- %t.o -o a.out 2>&1 \| FileCheck %s --check-prefixes=OPENMP,OPENMP-COFF
	// OPENMP-NEXT: @__stop_omp_offloading_entries = external hidden constant %__tgt_offload_entry
	// OPENMP-NEXT: @__dummy.omp_offloading.entry = hidden constant [0 x %__tgt_offload_entry] zeroinitializer, section "omp_offloading_entries"			// OPENMP-ELF: @__start_omp_offloading_entries = external hidden constant [0 x %__tgt_offload_entry]
	// OPENMP-NEXT: @.omp_offloading.device_image = internal unnamed_addr constant [[[SIZE:[0-9]+]] x i8] c"\10\FF\10\AD{{.*}}"			// OPENMP-ELF-NEXT: @__stop_omp_offloading_entries = external hidden constant [0 x %__tgt_offload_entry]
	// OPENMP-NEXT: @.omp_offloading.device_images = internal unnamed_addr constant [1 x %__tgt_device_image] [%__tgt_device_image { ptr @.omp_offloading.device_image, ptr getelementptr inbounds ([[[SIZE]] x i8], ptr @.omp_offloading.device_image, i64 1, i64 0), ptr @__start_omp_offloading_entries, ptr @__stop_omp_offloading_entries }]			// OPENMP-ELF-NEXT: @__dummy.omp_offloading.entry = hidden constant [0 x %__tgt_offload_entry] zeroinitializer, section "omp_offloading_entries"
	// OPENMP-NEXT: @.omp_offloading.descriptor = internal constant %__tgt_bin_desc { i32 1, ptr @.omp_offloading.device_images, ptr @__start_omp_offloading_entries, ptr @__stop_omp_offloading_entries }			// OPENMP-ELF-NEXT: @.omp_offloading.device_image = internal unnamed_addr constant [[[SIZE:[0-9]+]] x i8] c"\10\FF\10\AD{{.*}}"
	// OPENMP-NEXT: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @.omp_offloading.descriptor_reg, ptr null }]			// OPENMP-ELF-NEXT: @.omp_offloading.device_images = internal unnamed_addr constant [1 x %__tgt_device_image] [%__tgt_device_image { ptr @.omp_offloading.device_image, ptr getelementptr inbounds ([[[SIZE]] x i8], ptr @.omp_offloading.device_image, i64 1, i64 0), ptr @__start_omp_offloading_entries, ptr @__stop_omp_offloading_entries }]
	// OPENMP-NEXT: @llvm.global_dtors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @.omp_offloading.descriptor_unreg, ptr null }]			// OPENMP-ELF-NEXT: @.omp_offloading.descriptor = internal constant %__tgt_bin_desc { i32 1, ptr @.omp_offloading.device_images, ptr @__start_omp_offloading_entries, ptr @__stop_omp_offloading_entries }
				// OPENMP-ELF-NEXT: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @.omp_offloading.descriptor_reg, ptr null }]
				// OPENMP-ELF-NEXT: @llvm.global_dtors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @.omp_offloading.descriptor_unreg, ptr null }]

				// OPENMP-COFF: @__start.omp_offloading.entry = hidden constant [0 x %__tgt_offload_entry] zeroinitializer, section ".omp$OA"
				// OPENMP-COFF-NEXT: @__stop.omp_offloading.entry = hidden constant [0 x %__tgt_offload_entry] zeroinitializer, section ".omp$OZ"
				// OPENMP-COFF-NEXT: @.omp_offloading.device_image = internal unnamed_addr constant [[[SIZE:[0-9]+]] x i8] c"\10\FF\10\AD{{.*}}"
				// OPENMP-COFF-NEXT: @.omp_offloading.device_images = internal unnamed_addr constant [1 x %__tgt_device_image] [%__tgt_device_image { ptr @.omp_offloading.device_image, ptr getelementptr inbounds ([[[SIZE]] x i8], ptr @.omp_offloading.device_image, i64 1, i64 0), ptr getelementptr inbounds ([0 x %__tgt_offload_entry], ptr @__start.omp_offloading.entry, i64 0, i64 1), ptr @__stop.omp_offloading.entry }]
				// OPENMP-COFF-NEXT: @.omp_offloading.descriptor = internal constant %__tgt_bin_desc { i32 1, ptr @.omp_offloading.device_images, ptr getelementptr inbounds ([0 x %__tgt_offload_entry], ptr @__start.omp_offloading.entry, i64 0, i64 1), ptr @__stop.omp_offloading.entry }
				// OPENMP-COFF-NEXT: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @.omp_offloading.descriptor_reg, ptr null }]
				// OPENMP-COFF-NEXT: @llvm.global_dtors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @.omp_offloading.descriptor_unreg, ptr null }]

	// OPENMP: define internal void @.omp_offloading.descriptor_reg() section ".text.startup" {			// OPENMP: define internal void @.omp_offloading.descriptor_reg() section ".text.startup" {
	// OPENMP-NEXT: entry:			// OPENMP-NEXT: entry:
	// OPENMP-NEXT: call void @__tgt_register_lib(ptr @.omp_offloading.descriptor)			// OPENMP-NEXT: call void @__tgt_register_lib(ptr @.omp_offloading.descriptor)
	// OPENMP-NEXT: ret void			// OPENMP-NEXT: ret void
	// OPENMP-NEXT: }			// OPENMP-NEXT: }

	// OPENMP: define internal void @.omp_offloading.descriptor_unreg() section ".text.startup" {			// OPENMP: define internal void @.omp_offloading.descriptor_unreg() section ".text.startup" {
	// OPENMP-NEXT: entry:			// OPENMP-NEXT: entry:
	// OPENMP-NEXT: call void @__tgt_unregister_lib(ptr @.omp_offloading.descriptor)			// OPENMP-NEXT: call void @__tgt_unregister_lib(ptr @.omp_offloading.descriptor)
	// OPENMP-NEXT: ret void			// OPENMP-NEXT: ret void
	// OPENMP-NEXT: }			// OPENMP-NEXT: }

	// RUN: clang-offload-packager -o %t.out --image=file=%S/Inputs/dummy-elf.o,kind=cuda,triple=nvptx64-nvidia-cuda,arch=sm_70			// RUN: clang-offload-packager -o %t.out --image=file=%S/Inputs/dummy-elf.o,kind=cuda,triple=nvptx64-nvidia-cuda,arch=sm_70
	// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o \			// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o \
	// RUN: -fembed-offload-object=%t.out			// RUN: -fembed-offload-object=%t.out
	// RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple=x86_64-unknown-linux-gnu \			// RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple=x86_64-unknown-linux-gnu \
	// RUN: --linker-path=/usr/bin/ld -- %t.o -o a.out 2>&1 \| FileCheck %s --check-prefix=CUDA			// RUN: --linker-path=/usr/bin/ld -- %t.o -o a.out 2>&1 \| FileCheck %s --check-prefixes=CUDA,CUDA-ELF
				// RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple=x86_64-unknown-windows-gnu \
	// CUDA: @.fatbin_image = internal constant [0 x i8] zeroinitializer, section ".nv_fatbin"			// RUN: --linker-path=/usr/bin/ld -- %t.o -o a.out 2>&1 \| FileCheck %s --check-prefixes=CUDA,CUDA-COFF
	// CUDA-NEXT: @.fatbin_wrapper = internal constant %fatbin_wrapper { i32 1180844977, i32 1, ptr @.fatbin_image, ptr null }, section ".nvFatBinSegment", align 8
	// CUDA-NEXT: @__dummy.cuda_offloading.entry = hidden constant [0 x %__tgt_offload_entry] zeroinitializer, section "cuda_offloading_entries"			// CUDA-ELF: @.fatbin_image = internal constant [0 x i8] zeroinitializer, section ".nv_fatbin"
	// CUDA-NEXT: @.cuda.binary_handle = internal global ptr null			// CUDA-ELF-NEXT: @.fatbin_wrapper = internal constant %fatbin_wrapper { i32 1180844977, i32 1, ptr @.fatbin_image, ptr null }, section ".nvFatBinSegment", align 8
	// CUDA-NEXT: @__start_cuda_offloading_entries = external hidden constant [0 x %__tgt_offload_entry]			// CUDA-ELF-NEXT: @.cuda.binary_handle = internal global ptr null
	// CUDA-NEXT: @__stop_cuda_offloading_entries = external hidden constant [0 x %__tgt_offload_entry]			// CUDA-ELF-NEXT: @__start_cuda_offloading_entries = external hidden constant [0 x %__tgt_offload_entry]
	// CUDA-NEXT: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @.cuda.fatbin_reg, ptr null }]			// CUDA-ELF-NEXT: @__stop_cuda_offloading_entries = external hidden constant [0 x %__tgt_offload_entry]
				// CUDA-ELF-NEXT: @__dummy.cuda_offloading.entry = hidden constant [0 x %__tgt_offload_entry] zeroinitializer, section "cuda_offloading_entries"
				// CUDA-ELF-NEXT: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @.cuda.fatbin_reg, ptr null }]

				// CUDA-COFF: @.fatbin_image = internal constant [0 x i8] zeroinitializer, section ".nv_fatbin"
				// CUDA-COFF-NEXT: @.fatbin_wrapper = internal constant %fatbin_wrapper { i32 1180844977, i32 1, ptr @.fatbin_image, ptr null }, section ".nvFatBinSegment", align 8
				// CUDA-COFF-NEXT: @.cuda.binary_handle = internal global ptr null
				// CUDA-COFF-NEXT: @__start.cuda_offloading.entry = hidden constant [0 x %__tgt_offload_entry] zeroinitializer, section ".cuda$OA"
				// CUDA-COFF-NEXT: @__stop.cuda_offloading.entry = hidden constant [0 x %__tgt_offload_entry] zeroinitializer, section ".cuda$OZ"
				// CUDA-COFF-NEXT: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @.cuda.fatbin_reg, ptr null }]

	// CUDA: define internal void @.cuda.fatbin_reg() section ".text.startup" {			// CUDA: define internal void @.cuda.fatbin_reg() section ".text.startup" {
	// CUDA-NEXT: entry:			// CUDA-NEXT: entry:
	// CUDA-NEXT: %0 = call ptr @__cudaRegisterFatBinary(ptr @.fatbin_wrapper)			// CUDA-NEXT: %0 = call ptr @__cudaRegisterFatBinary(ptr @.fatbin_wrapper)
	// CUDA-NEXT: store ptr %0, ptr @.cuda.binary_handle, align 8			// CUDA-NEXT: store ptr %0, ptr @.cuda.binary_handle, align 8
	// CUDA-NEXT: call void @.cuda.globals_reg(ptr %0)			// CUDA-NEXT: call void @.cuda.globals_reg(ptr %0)
	// CUDA-NEXT: call void @__cudaRegisterFatBinaryEnd(ptr %0)			// CUDA-NEXT: call void @__cudaRegisterFatBinaryEnd(ptr %0)
	// CUDA-NEXT: %1 = call i32 @atexit(ptr @.cuda.fatbin_unreg)			// CUDA-NEXT: %1 = call i32 @atexit(ptr @.cuda.fatbin_unreg)
	// CUDA-NEXT: ret void			// CUDA-NEXT: ret void
	// CUDA-NEXT: }			// CUDA-NEXT: }

	// CUDA: define internal void @.cuda.fatbin_unreg() section ".text.startup" {			// CUDA: define internal void @.cuda.fatbin_unreg() section ".text.startup" {
	// CUDA-NEXT: entry:			// CUDA-NEXT: entry:
	// CUDA-NEXT: %0 = load ptr, ptr @.cuda.binary_handle, align 8			// CUDA-NEXT: %0 = load ptr, ptr @.cuda.binary_handle, align 8
	// CUDA-NEXT: call void @__cudaUnregisterFatBinary(ptr %0)			// CUDA-NEXT: call void @__cudaUnregisterFatBinary(ptr %0)
	// CUDA-NEXT: ret void			// CUDA-NEXT: ret void
	// CUDA-NEXT: }			// CUDA-NEXT: }

	// CUDA: define internal void @.cuda.globals_reg(ptr %0) section ".text.startup" {			// CUDA: define internal void @.cuda.globals_reg(ptr %0) section ".text.startup" {
	// CUDA-NEXT: entry:			// CUDA-NEXT: entry:
	// CUDA-NEXT: br i1 icmp ne (ptr @__start_cuda_offloading_entries, ptr @__stop_cuda_offloading_entries), label %while.entry, label %while.end			// CUDA-NEXT: br i1 icmp ne (ptr [[START_ENTRIES:.+]], ptr [[STOP_ENTRIES:.+]]), label %while.entry, label %while.end

	// CUDA: while.entry:			// CUDA: while.entry:
	// CUDA-NEXT: %entry1 = phi ptr [ @__start_cuda_offloading_entries, %entry ], [ %7, %if.end ]			// CUDA-NEXT: %entry1 = phi ptr [ [[START_ENTRIES]], %entry ], [ %7, %if.end ]
	// CUDA-NEXT: %1 = getelementptr inbounds %__tgt_offload_entry, ptr %entry1, i64 0, i32 0			// CUDA-NEXT: %1 = getelementptr inbounds %__tgt_offload_entry, ptr %entry1, i64 0, i32 0
	// CUDA-NEXT: %addr = load ptr, ptr %1, align 8			// CUDA-NEXT: %addr = load ptr, ptr %1, align 8
	// CUDA-NEXT: %2 = getelementptr inbounds %__tgt_offload_entry, ptr %entry1, i64 0, i32 1			// CUDA-NEXT: %2 = getelementptr inbounds %__tgt_offload_entry, ptr %entry1, i64 0, i32 1
	// CUDA-NEXT: %name = load ptr, ptr %2, align 8			// CUDA-NEXT: %name = load ptr, ptr %2, align 8
	// CUDA-NEXT: %3 = getelementptr inbounds %__tgt_offload_entry, ptr %entry1, i64 0, i32 2			// CUDA-NEXT: %3 = getelementptr inbounds %__tgt_offload_entry, ptr %entry1, i64 0, i32 2
	// CUDA-NEXT: %size = load i64, ptr %3, align 4			// CUDA-NEXT: %size = load i64, ptr %3, align 4
	// CUDA-NEXT: %4 = getelementptr inbounds %__tgt_offload_entry, ptr %entry1, i64 0, i32 3			// CUDA-NEXT: %4 = getelementptr inbounds %__tgt_offload_entry, ptr %entry1, i64 0, i32 3
	// CUDA-NEXT: %flag = load i32, ptr %4, align 4			// CUDA-NEXT: %flag = load i32, ptr %4, align 4
	[22 lines not shown]
	// CUDA: sw.surface:			// CUDA: sw.surface:
	// CUDA-NEXT: br label %if.end			// CUDA-NEXT: br label %if.end

	// CUDA: sw.texture:			// CUDA: sw.texture:
	// CUDA-NEXT: br label %if.end			// CUDA-NEXT: br label %if.end

	// CUDA: if.end:			// CUDA: if.end:
	// CUDA-NEXT: %7 = getelementptr inbounds %__tgt_offload_entry, ptr %entry1, i64 1			// CUDA-NEXT: %7 = getelementptr inbounds %__tgt_offload_entry, ptr %entry1, i64 1
	// CUDA-NEXT: %8 = icmp eq ptr %7, @__stop_cuda_offloading_entries			// CUDA-NEXT: %8 = icmp eq ptr %7, [[STOP_ENTRIES]]
	// CUDA-NEXT: br i1 %8, label %while.end, label %while.entry			// CUDA-NEXT: br i1 %8, label %while.end, label %while.entry

	// CUDA: while.end:			// CUDA: while.end:
	// CUDA-NEXT: ret void			// CUDA-NEXT: ret void
	// CUDA-NEXT: }			// CUDA-NEXT: }
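
The `.cuda.globals_reg` checks above describe a guarded walk over the half-open range between the start and stop symbols. A minimal C++ sketch of that control flow follows, under stated assumptions: `OffloadEntry` mirrors the five-field `%__tgt_offload_entry` layout seen in these tests, the field name `reserved` and the helper `collectEntryNames` are hypothetical, and the per-entry switch on the flag value (hidden in the collapsed lines) is deliberately left out.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Assumed mirror of %__tgt_offload_entry: { ptr, ptr, i64, i32, i32 }.
// The name of the last field is a placeholder, not taken from the patch.
struct OffloadEntry {
  void *addr;
  const char *name;
  std::uint64_t size;
  std::int32_t flags;
  std::int32_t reserved;
};

// Hypothetical helper: visits every entry in [begin, end) and records its
// name. The shape follows the checked IR: a begin != end guard, then a loop
// that advances by one entry and exits once the pointer equals `end`.
std::vector<std::string> collectEntryNames(const OffloadEntry *begin,
                                           const OffloadEntry *end) {
  std::vector<std::string> names;
  if (begin == end) // corresponds to the `icmp ne` guard in `entry:`
    return names;
  const OffloadEntry *entry = begin;
  do {
    names.emplace_back(entry->name); // stand-in for the real registration call
    ++entry;                         // `getelementptr ... i64 1` in `if.end:`
  } while (entry != end);            // `icmp eq ptr %7, <stop>`
  return names;
}
```

Because the loop only compares pointers, it does not matter whether the bounds come from linker-defined `__start_`/`__stop_` symbols (ELF) or from dummy variables placed in the `$OA` and `$OZ` sections (COFF), which is presumably why the checks can use the `[[START_ENTRIES]]` and `[[STOP_ENTRIES]]` captures for both targets.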

	// RUN: clang-offload-packager -o %t.out --image=file=%S/Inputs/dummy-elf.o,kind=hip,triple=amdgcn-amd-amdhsa,arch=gfx908			// RUN: clang-offload-packager -o %t.out --image=file=%S/Inputs/dummy-elf.o,kind=hip,triple=amdgcn-amd-amdhsa,arch=gfx908
	// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o \			// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o \
	// RUN: -fembed-offload-object=%t.out			// RUN: -fembed-offload-object=%t.out
	// RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple=x86_64-unknown-linux-gnu \			// RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple=x86_64-unknown-linux-gnu \
	// RUN: --linker-path=/usr/bin/ld -- %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=HIP			// RUN: --linker-path=/usr/bin/ld -- %t.o -o a.out 2>&1 | FileCheck %s --check-prefixes=HIP,HIP-ELF
				// RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple=x86_64-unknown-windows-gnu \
	// HIP: @.fatbin_image = internal constant [0 x i8] zeroinitializer, section ".hip_fatbin"			// RUN: --linker-path=/usr/bin/ld -- %t.o -o a.out 2>&1 | FileCheck %s --check-prefixes=HIP,HIP-COFF
	// HIP-NEXT: @.fatbin_wrapper = internal constant %fatbin_wrapper { i32 1212764230, i32 1, ptr @.fatbin_image, ptr null }, section ".hipFatBinSegment", align 8
	// HIP-NEXT: @__dummy.hip_offloading.entry = hidden constant [0 x %__tgt_offload_entry] zeroinitializer, section "hip_offloading_entries"			// HIP-ELF: @.fatbin_image = internal constant [0 x i8] zeroinitializer, section ".hip_fatbin"
	// HIP-NEXT: @.hip.binary_handle = internal global ptr null			// HIP-ELF-NEXT: @.fatbin_wrapper = internal constant %fatbin_wrapper { i32 1212764230, i32 1, ptr @.fatbin_image, ptr null }, section ".hipFatBinSegment", align 8
	// HIP-NEXT: @__start_hip_offloading_entries = external hidden constant [0 x %__tgt_offload_entry]			// HIP-ELF-NEXT: @.hip.binary_handle = internal global ptr null
	// HIP-NEXT: @__stop_hip_offloading_entries = external hidden constant [0 x %__tgt_offload_entry]			// HIP-ELF-NEXT: @__start_hip_offloading_entries = external hidden constant [0 x %__tgt_offload_entry]
	// HIP-NEXT: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @.hip.fatbin_reg, ptr null }]			// HIP-ELF-NEXT: @__stop_hip_offloading_entries = external hidden constant [0 x %__tgt_offload_entry]
				// HIP-ELF-NEXT: @__dummy.hip_offloading.entry = hidden constant [0 x %__tgt_offload_entry] zeroinitializer, section "hip_offloading_entries"
				// HIP-ELF-NEXT: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @.hip.fatbin_reg, ptr null }]

				// HIP-COFF: @.fatbin_image = internal constant [0 x i8] zeroinitializer, section ".hip_fatbin"
				// HIP-COFF-NEXT: @.fatbin_wrapper = internal constant %fatbin_wrapper { i32 1212764230, i32 1, ptr @.fatbin_image, ptr null }, section ".hipFatBinSegment", align 8
				// HIP-COFF-NEXT: @.hip.binary_handle = internal global ptr null
				// HIP-COFF-NEXT: @__start.hip_offloading.entry = hidden constant [0 x %__tgt_offload_entry] zeroinitializer, section ".hip$OA"
				// HIP-COFF-NEXT: @__stop.hip_offloading.entry = hidden constant [0 x %__tgt_offload_entry] zeroinitializer, section ".hip$OZ"
				// HIP-COFF-NEXT: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @.hip.fatbin_reg, ptr null }]

	// HIP: define internal void @.hip.fatbin_reg() section ".text.startup" {			// HIP: define internal void @.hip.fatbin_reg() section ".text.startup" {
	// HIP-NEXT: entry:			// HIP-NEXT: entry:
	// HIP-NEXT: %0 = call ptr @__hipRegisterFatBinary(ptr @.fatbin_wrapper)			// HIP-NEXT: %0 = call ptr @__hipRegisterFatBinary(ptr @.fatbin_wrapper)
	// HIP-NEXT: store ptr %0, ptr @.hip.binary_handle, align 8			// HIP-NEXT: store ptr %0, ptr @.hip.binary_handle, align 8
	// HIP-NEXT: call void @.hip.globals_reg(ptr %0)			// HIP-NEXT: call void @.hip.globals_reg(ptr %0)
	// HIP-NEXT: %1 = call i32 @atexit(ptr @.hip.fatbin_unreg)			// HIP-NEXT: %1 = call i32 @atexit(ptr @.hip.fatbin_unreg)
	// HIP-NEXT: ret void			// HIP-NEXT: ret void
	// HIP-NEXT: }			// HIP-NEXT: }

	// HIP: define internal void @.hip.fatbin_unreg() section ".text.startup" {			// HIP: define internal void @.hip.fatbin_unreg() section ".text.startup" {
	// HIP-NEXT: entry:			// HIP-NEXT: entry:
	// HIP-NEXT: %0 = load ptr, ptr @.hip.binary_handle, align 8			// HIP-NEXT: %0 = load ptr, ptr @.hip.binary_handle, align 8
	// HIP-NEXT: call void @__hipUnregisterFatBinary(ptr %0)			// HIP-NEXT: call void @__hipUnregisterFatBinary(ptr %0)
	// HIP-NEXT: ret void			// HIP-NEXT: ret void
	// HIP-NEXT: }			// HIP-NEXT: }

	// HIP: define internal void @.hip.globals_reg(ptr %0) section ".text.startup" {			// HIP: define internal void @.hip.globals_reg(ptr %0) section ".text.startup" {
	// HIP-NEXT: entry:			// HIP-NEXT: entry:
	// HIP-NEXT: br i1 icmp ne (ptr @__start_hip_offloading_entries, ptr @__stop_hip_offloading_entries), label %while.entry, label %while.end			// HIP-NEXT: br i1 icmp ne (ptr [[START_ENTRIES:.+]], ptr [[STOP_ENTRIES:.+]]), label %while.entry, label %while.end

	// HIP: while.entry:			// HIP: while.entry:
	// HIP-NEXT: %entry1 = phi ptr [ @__start_hip_offloading_entries, %entry ], [ %7, %if.end ]			// HIP-NEXT: %entry1 = phi ptr [ [[START_ENTRIES]], %entry ], [ %7, %if.end ]
	// HIP-NEXT: %1 = getelementptr inbounds %__tgt_offload_entry, ptr %entry1, i64 0, i32 0			// HIP-NEXT: %1 = getelementptr inbounds %__tgt_offload_entry, ptr %entry1, i64 0, i32 0
	// HIP-NEXT: %addr = load ptr, ptr %1, align 8			// HIP-NEXT: %addr = load ptr, ptr %1, align 8
	// HIP-NEXT: %2 = getelementptr inbounds %__tgt_offload_entry, ptr %entry1, i64 0, i32 1			// HIP-NEXT: %2 = getelementptr inbounds %__tgt_offload_entry, ptr %entry1, i64 0, i32 1
	// HIP-NEXT: %name = load ptr, ptr %2, align 8			// HIP-NEXT: %name = load ptr, ptr %2, align 8
	// HIP-NEXT: %3 = getelementptr inbounds %__tgt_offload_entry, ptr %entry1, i64 0, i32 2			// HIP-NEXT: %3 = getelementptr inbounds %__tgt_offload_entry, ptr %entry1, i64 0, i32 2
	// HIP-NEXT: %size = load i64, ptr %3, align 4			// HIP-NEXT: %size = load i64, ptr %3, align 4
	// HIP-NEXT: %4 = getelementptr inbounds %__tgt_offload_entry, ptr %entry1, i64 0, i32 3			// HIP-NEXT: %4 = getelementptr inbounds %__tgt_offload_entry, ptr %entry1, i64 0, i32 3
	// HIP-NEXT: %flag = load i32, ptr %4, align 4			// HIP-NEXT: %flag = load i32, ptr %4, align 4
	[22 lines not shown]
	// HIP: sw.surface:			// HIP: sw.surface:
	// HIP-NEXT: br label %if.end			// HIP-NEXT: br label %if.end

	// HIP: sw.texture:			// HIP: sw.texture:
	// HIP-NEXT: br label %if.end			// HIP-NEXT: br label %if.end

	// HIP: if.end:			// HIP: if.end:
	// HIP-NEXT: %7 = getelementptr inbounds %__tgt_offload_entry, ptr %entry1, i64 1			// HIP-NEXT: %7 = getelementptr inbounds %__tgt_offload_entry, ptr %entry1, i64 1
	// HIP-NEXT: %8 = icmp eq ptr %7, @__stop_hip_offloading_entries			// HIP-NEXT: %8 = icmp eq ptr %7, [[STOP_ENTRIES]]
	// HIP-NEXT: br i1 %8, label %while.end, label %while.entry			// HIP-NEXT: br i1 %8, label %while.end, label %while.entry

	// HIP: while.end:			// HIP: while.end:
	// HIP-NEXT: ret void			// HIP-NEXT: ret void
	// HIP-NEXT: }			// HIP-NEXT: }

clang/test/OpenMP/declare_target_link_codegen.cpp

	// RUN: %clang_cc1 -no-opaque-pointers -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm %s -o - | FileCheck %s --check-prefix HOST --check-prefix CHECK			// RUN: %clang_cc1 -no-opaque-pointers -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm %s -o - | FileCheck %s --check-prefix HOST --check-prefix CHECK
	// RUN: %clang_cc1 -no-opaque-pointers -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm-bc %s -o %t-ppc-host.bc			// RUN: %clang_cc1 -no-opaque-pointers -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm-bc %s -o %t-ppc-host.bc
	// RUN: %clang_cc1 -no-opaque-pointers -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s --check-prefix DEVICE --check-prefix CHECK			// RUN: %clang_cc1 -no-opaque-pointers -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s --check-prefix DEVICE --check-prefix CHECK
	// RUN: %clang_cc1 -no-opaque-pointers -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -emit-pch -o %t			// RUN: %clang_cc1 -no-opaque-pointers -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -emit-pch -o %t
	// RUN: %clang_cc1 -no-opaque-pointers -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -include-pch %t -o - | FileCheck %s --check-prefix DEVICE --check-prefix CHECK			// RUN: %clang_cc1 -no-opaque-pointers -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -include-pch %t -o - | FileCheck %s --check-prefix DEVICE --check-prefix CHECK

	// RUN: %clang_cc1 -no-opaque-pointers -verify -fopenmp-simd -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm %s -o - | FileCheck %s --check-prefix SIMD-ONLY			// RUN: %clang_cc1 -no-opaque-pointers -verify -fopenmp-simd -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm %s -o - | FileCheck %s --check-prefix SIMD-ONLY
	// RUN: %clang_cc1 -no-opaque-pointers -verify -fopenmp-simd -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm-bc %s -o %t-ppc-host.bc			// RUN: %clang_cc1 -no-opaque-pointers -verify -fopenmp-simd -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm-bc %s -o %t-ppc-host.bc
	// RUN: %clang_cc1 -no-opaque-pointers -verify -fopenmp-simd -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o -| FileCheck %s --check-prefix SIMD-ONLY			// RUN: %clang_cc1 -no-opaque-pointers -verify -fopenmp-simd -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o -| FileCheck %s --check-prefix SIMD-ONLY
	// RUN: %clang_cc1 -no-opaque-pointers -verify -fopenmp-simd -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -emit-pch -o %t			// RUN: %clang_cc1 -no-opaque-pointers -verify -fopenmp-simd -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -emit-pch -o %t
	// RUN: %clang_cc1 -no-opaque-pointers -verify -fopenmp-simd -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -include-pch %t -verify -o - | FileCheck %s --check-prefix SIMD-ONLY			// RUN: %clang_cc1 -no-opaque-pointers -verify -fopenmp-simd -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -include-pch %t -verify -o - | FileCheck %s --check-prefix SIMD-ONLY

				// RUN: %clang_cc1 -no-opaque-pointers -verify -fopenmp -x c++ -triple x86_64-win32-gnu -fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm %s -o - | FileCheck %s --check-prefix HOST-COFF --check-prefix CHECK

	// expected-no-diagnostics			// expected-no-diagnostics

	// SIMD-ONLY-NOT: {{__kmpc|__tgt}}			// SIMD-ONLY-NOT: {{__kmpc|__tgt}}

	#ifndef HEADER			#ifndef HEADER
	#define HEADER			#define HEADER

	// HOST-DAG: @c = external global i32,			// HOST-DAG: @c = external global i32,
	// HOST-DAG: @c_decl_tgt_ref_ptr = weak global i32* @c			// HOST-DAG: @c_decl_tgt_ref_ptr = weak global i32* @c
	// HOST-DAG: @[[D:.+]] = internal global i32 2			// HOST-DAG: @[[D:.+]] = internal global i32 2
	// HOST-DAG: @[[D_PTR:.+]] = weak global i32* @[[D]]			// HOST-DAG: @[[D_PTR:.+]] = weak global i32* @[[D]]
	// DEVICE-NOT: @c =			// DEVICE-NOT: @c =
	// DEVICE: @c_decl_tgt_ref_ptr = weak global i32* null			// DEVICE: @c_decl_tgt_ref_ptr = weak global i32* null
	// HOST: [[SIZES:@.+]] = private unnamed_addr constant [3 x i64] [i64 4, i64 4, i64 4]			// HOST: [[SIZES:@.+]] = private unnamed_addr constant [3 x i64] [i64 4, i64 4, i64 4]
	// HOST: [[MAPTYPES:@.+]] = private unnamed_addr constant [3 x i64] [i64 35, i64 531, i64 531]			// HOST: [[MAPTYPES:@.+]] = private unnamed_addr constant [3 x i64] [i64 35, i64 531, i64 531]
	// HOST: @.omp_offloading.entry_name{{.*}} = internal unnamed_addr constant [{{[0-9]+}} x i8] c"c_decl_tgt_ref_ptr\00"			// HOST: @.omp_offloading.entry_name{{.*}} = internal unnamed_addr constant [{{[0-9]+}} x i8] c"c_decl_tgt_ref_ptr\00"
	// HOST: @.omp_offloading.entry.c_decl_tgt_ref_ptr = weak constant %struct.__tgt_offload_entry { i8* bitcast (i32** @c_decl_tgt_ref_ptr to i8*), i8* getelementptr inbounds ([{{[0-9]+}} x i8], [{{[0-9]+}} x i8]* @.omp_offloading.entry_name, i32 0, i32 0), i64 8, i32 1, i32 0 }, section "omp_offloading_entries", align 1			// HOST: @.omp_offloading.entry.c_decl_tgt_ref_ptr = weak constant %struct.__tgt_offload_entry { i8* bitcast (i32** @c_decl_tgt_ref_ptr to i8*), i8* getelementptr inbounds ([{{[0-9]+}} x i8], [{{[0-9]+}} x i8]* @.omp_offloading.entry_name, i32 0, i32 0), i64 8, i32 1, i32 0 }, section "omp_offloading_entries", align 1
				// HOST-COFF: @.omp_offloading.entry.c_decl_tgt_ref_ptr = weak constant %struct.__tgt_offload_entry { i8* bitcast (i32** @c_decl_tgt_ref_ptr to i8*), i8* getelementptr inbounds ([{{[0-9]+}} x i8], [{{[0-9]+}} x i8]* @.omp_offloading.entry_name, i32 0, i32 0), i64 8, i32 1, i32 0 }, section ".omp$OE", align 1
	// DEVICE-NOT: internal unnamed_addr constant [{{[0-9]+}} x i8] c"c_{{.*}}_decl_tgt_ref_ptr\00"			// DEVICE-NOT: internal unnamed_addr constant [{{[0-9]+}} x i8] c"c_{{.*}}_decl_tgt_ref_ptr\00"
	// HOST: @.omp_offloading.entry_name{{.}} = internal unnamed_addr constant [{{[0-9]+}} x i8] c"_{{.}}d_{{.*}}_decl_tgt_ref_ptr\00"			// HOST: @.omp_offloading.entry_name{{.}} = internal unnamed_addr constant [{{[0-9]+}} x i8] c"_{{.}}d_{{.*}}_decl_tgt_ref_ptr\00"
	// HOST: @.omp_offloading.entry.[[D_PTR]] = weak constant %struct.__tgt_offload_entry { i8* bitcast (i32** @[[D_PTR]] to i8), i8 getelementptr inbounds ([{{[0-9]+}} x i8], [{{[0-9]+}} x i8]* @.omp_offloading.entry_name{{.*}}, i32 0, i32 0			// HOST: @.omp_offloading.entry.[[D_PTR]] = weak constant %struct.__tgt_offload_entry { i8* bitcast (i32** @[[D_PTR]] to i8), i8 getelementptr inbounds ([{{[0-9]+}} x i8], [{{[0-9]+}} x i8]* @.omp_offloading.entry_name{{.*}}, i32 0, i32 0

	extern int c;			extern int c;
	#pragma omp declare target link(c)			#pragma omp declare target link(c)

	static int d = 2;			static int d = 2;
	#pragma omp declare target link(d)			#pragma omp declare target link(d)

	int maini1() {			int maini1() {
	int a;			int a;
	#pragma omp target map(tofrom : a)			#pragma omp target map(tofrom : a)
	{			{
	a = c;			a = c;
	d++;			d++;
	}			}
	#pragma omp target			#pragma omp target
	#pragma omp teams			#pragma omp teams
	c = a;			c = a;
	return 0;			return 0;
	}			}

	// DEVICE: define weak_odr protected void @__omp_offloading_{{.}}_{{.}}maini1{{.}}_l42(i32 noundef nonnull align {{[0-9]+}} dereferenceable{{[^,]*}}			// DEVICE: define weak_odr protected void @__omp_offloading_{{.}}_{{.}}maini1{{.}}_l45(i32 noundef nonnull align {{[0-9]+}} dereferenceable{{[^,]*}}
	// DEVICE: [[C_REF:%.+]] = load i32, i32* @c_decl_tgt_ref_ptr,			// DEVICE: [[C_REF:%.+]] = load i32, i32* @c_decl_tgt_ref_ptr,
	// DEVICE: [[C:%.+]] = load i32, i32* [[C_REF]],			// DEVICE: [[C:%.+]] = load i32, i32* [[C_REF]],
	// DEVICE: store i32 [[C]], i32* %			// DEVICE: store i32 [[C]], i32* %

	// HOST: define {{.}}i32 @{{.}}maini1{{.*}}()			// HOST: define {{.}}i32 @{{.}}maini1{{.*}}()
	// HOST: [[BASEPTRS:%.+]] = alloca [3 x i8*],			// HOST: [[BASEPTRS:%.+]] = alloca [3 x i8*],
	// HOST: [[PTRS:%.+]] = alloca [3 x i8*],			// HOST: [[PTRS:%.+]] = alloca [3 x i8*],
	// HOST: getelementptr inbounds [3 x i8], [3 x i8]* [[BASEPTRS]], i{{[0-9]+}} 0, i{{[0-9]+}} 0			// HOST: getelementptr inbounds [3 x i8], [3 x i8]* [[BASEPTRS]], i{{[0-9]+}} 0, i{{[0-9]+}} 0
	[11 lines not shown]
	// HOST: store i32 @[[D_PTR]], i32* [[BP2_CAST]],			// HOST: store i32 @[[D_PTR]], i32* [[BP2_CAST]],
	// HOST: [[P2:%.+]] = getelementptr inbounds [3 x i8], [3 x i8]* [[PTRS]], i{{[0-9]+}} 0, i{{[0-9]+}} 2			// HOST: [[P2:%.+]] = getelementptr inbounds [3 x i8], [3 x i8]* [[PTRS]], i{{[0-9]+}} 0, i{{[0-9]+}} 2
	// HOST: [[P2_CAST:%.+]] = bitcast i8 [[P2]] to i32			// HOST: [[P2_CAST:%.+]] = bitcast i8 [[P2]] to i32
	// HOST: store i32* @[[D]], i32** [[P2_CAST]],			// HOST: store i32* @[[D]], i32** [[P2_CAST]],

	// HOST: [[BP0:%.+]] = getelementptr inbounds [3 x i8], [3 x i8]* [[BASEPTRS]], i{{[0-9]+}} 0, i{{[0-9]+}} 0			// HOST: [[BP0:%.+]] = getelementptr inbounds [3 x i8], [3 x i8]* [[BASEPTRS]], i{{[0-9]+}} 0, i{{[0-9]+}} 0
	// HOST: [[P0:%.+]] = getelementptr inbounds [3 x i8], [3 x i8]* [[PTRS]], i{{[0-9]+}} 0, i{{[0-9]+}} 0			// HOST: [[P0:%.+]] = getelementptr inbounds [3 x i8], [3 x i8]* [[PTRS]], i{{[0-9]+}} 0, i{{[0-9]+}} 0
	// HOST: call i32 @__tgt_target_kernel(%struct.ident_t* @{{.+}}, i64 -1, i32 -1, i32 0, i8* @.{{.+}}.region_id, %struct.__tgt_kernel_arguments* %{{.+}})			// HOST: call i32 @__tgt_target_kernel(%struct.ident_t* @{{.+}}, i64 -1, i32 -1, i32 0, i8* @.{{.+}}.region_id, %struct.__tgt_kernel_arguments* %{{.+}})
	// HOST: call void @__omp_offloading_{{.}}_{{.}}_{{.}}maini1{{.}}_l42(i32* %{{[^,]+}})			// HOST: call void @__omp_offloading_{{.}}_{{.}}_{{.}}maini1{{.}}_l45(i32* %{{[^,]+}})
	// HOST: call i32 @__tgt_target_kernel(%struct.ident_t* @{{.+}}, i64 -1, i32 0, i32 0, i8* @.{{.+}}.region_id, %struct.__tgt_kernel_arguments* %{{.+}})			// HOST: call i32 @__tgt_target_kernel(%struct.ident_t* @{{.+}}, i64 -1, i32 0, i32 0, i8* @.{{.+}}.region_id, %struct.__tgt_kernel_arguments* %{{.+}})

	// HOST: define internal void @__omp_offloading_{{.}}_{{.}}maini1{{.}}_l42(i32 noundef nonnull align {{[0-9]+}} dereferenceable{{.*}})			// HOST: define internal void @__omp_offloading_{{.}}_{{.}}maini1{{.}}_l45(i32 noundef nonnull align {{[0-9]+}} dereferenceable{{.*}})
	// HOST: [[C:%.]] = load i32, i32 @c,			// HOST: [[C:%.]] = load i32, i32 @c,
	// HOST: store i32 [[C]], i32* %			// HOST: store i32 [[C]], i32* %

	// CHECK: !{i32 1, !"c_decl_tgt_ref_ptr", i32 1, i32 {{[0-9]+}}}			// CHECK: !{i32 1, !"c_decl_tgt_ref_ptr", i32 1, i32 {{[0-9]+}}}
	#endif // HEADER			#endif // HEADER

clang/tools/clang-linker-wrapper/OffloadWrapper.cpp

Show First 20 Lines • Show All 104 Lines • ▼ Show 20 Lines	DescTy = StructType::create("__tgt_bin_desc", Type::getInt32Ty(C),
getEntryPtrTy(M));		getEntryPtrTy(M));
return DescTy;		return DescTy;
}		}

PointerType *getBinDescPtrTy(Module &M) {		PointerType *getBinDescPtrTy(Module &M) {
return PointerType::getUnqual(getBinDescTy(M));		return PointerType::getUnqual(getBinDescTy(M));
}		}

		std::pair<Constant *, Constant *> getELFEntriesArray(Module &M,
		StringRef Kind) {
		auto *EntriesB = new GlobalVariable(
		M, ArrayType::get(getEntryTy(M), 0), /*isConstant*/ true,
		GlobalValue::ExternalLinkage,
		/*Initializer*/ nullptr, "__start_" + Kind + "_offloading_entries");
		EntriesB->setVisibility(GlobalValue::HiddenVisibility);
		auto *EntriesE = new GlobalVariable(
		M, ArrayType::get(getEntryTy(M), 0), /*isConstant*/ true,
		GlobalValue::ExternalLinkage,
		/*Initializer*/ nullptr, "__stop_" + Kind + "_offloading_entries");
		EntriesE->setVisibility(GlobalValue::HiddenVisibility);

		// We assume that external begin/end symbols that we have created above will
		// be defined by the linker. But linker will do that only if linker inputs
		// have section with "omp_offloading_entries" name which is not guaranteed.
		// So, we just create dummy zero sized object in the offload entries section
		// to force linker to define those symbols.
		auto *DummyInit =
		ConstantAggregateZero::get(ArrayType::get(getEntryTy(M), 0u));
		auto *DummyEntry = new GlobalVariable(
		M, DummyInit->getType(), true, GlobalVariable::ExternalLinkage, DummyInit,
		"__dummy." + Kind + "_offloading.entry");
		DummyEntry->setSection((Kind + "_offloading_entries").str());
		DummyEntry->setVisibility(GlobalValue::HiddenVisibility);

		return std::make_pair(EntriesB, EntriesE);
		}

		std::pair<Constant *, Constant *> getCOFFEntriesArray(Module &M,
		StringRef Kind) {
		// For COFF targets, sections with 8 or fewer characters containing a '$' will
		// be merged into the same section at runtime. The order is determined by the
		mstorsjo: FWIW, this comment doesn't feel entirely accurate: regardless of the length of the section name, all sections with names of the form `name$suffix` will get merged into the same section `name` (sorted by the suffix). Then if `name` is 8 chars or less, the name is kept in the section table (so that it can easily be looked up at runtime), while if it is longer, the full name is kept in the string table (which is not mapped at runtime). As an extra side note, we added an exception into lld for `.eh_frame`: this is 9 chars, but libunwind wants to locate the section at runtime, so for that case lld truncates it to `.eh_fram`. (This behaviour is lld-specific, to appease libunwind; binutils doesn't do that, and libgcc locates that section differently.)
		jhuber6 (author): I see, I'm not that familiar with the inner workings of the COFF linking process. All that matters for this use case is whether or not we can get a pointer to the array. In that case we shouldn't need to worry about the eight-character limit, right?
		mstorsjo: If you locate the contents at runtime by using specific symbols that point to the start and end of the data, then yes, you don't need to worry about keeping it below the 8 char limit. The 8 char limit is relevant if you enumerate and iterate over the sections of a DLL/EXE at runtime, and try to locate the section dynamically that way.
		jhuber6 (author): Good to know, I may change the section names to be more verbose then, something like `cuda.entries$OE`.
		// alphabetical ordering of the text after the '$' character. Here we generate
		// two dummy variables that will be placed at the start and end of that
		// section respectively that can be used to iterate the section at runtime.
		auto *EntriesInit =
		ConstantAggregateZero::get(ArrayType::get(getEntryTy(M), 0u));
		auto *EntriesB =
		new GlobalVariable(M, ArrayType::get(getEntryTy(M), 0), true,
		GlobalVariable::ExternalLinkage, EntriesInit,
		"__start." + Kind + "_offloading.entry");
		EntriesB->setSection(("." + Kind + "$OA").str());
		EntriesB->setVisibility(GlobalValue::HiddenVisibility);
		auto *EntriesE =
		new GlobalVariable(M, ArrayType::get(getEntryTy(M), 0), true,
		GlobalVariable::ExternalLinkage, EntriesInit,
		"__stop." + Kind + "_offloading.entry");
		EntriesE->setSection(("." + Kind + "$OZ").str());
		EntriesE->setVisibility(GlobalValue::HiddenVisibility);

		Constant *ZeroOne[] = {ConstantInt::get(getSizeTTy(M), 0u),
		ConstantInt::get(getSizeTTy(M), 1u)};
		return std::make_pair(ConstantExpr::getGetElementPtr(EntriesB->getValueType(),
		EntriesB, ZeroOne),
		EntriesE);
		}
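
The grouping rule discussed in the inline comments above (every `name$suffix` section is merged into `name`, with contributions ordered by suffix) can be illustrated with a small simulation. This is only a sketch of the ordering rule, not linker code: the helper `groupSections` is hypothetical, and it uses plain byte-wise string comparison as a stand-in for the linker's suffix ordering.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: groups section names of the form `name$suffix` by
// `name` and orders each group's suffixes, imitating how the pieces would be
// laid out inside the merged section.
std::map<std::string, std::vector<std::string>>
groupSections(const std::vector<std::string> &sections) {
  std::map<std::string, std::vector<std::string>> merged;
  for (const std::string &section : sections) {
    const std::size_t dollar = section.find('$');
    // Text before '$' is the output section; a name without '$' has an
    // empty suffix and therefore sorts first.
    const std::string base = section.substr(0, dollar);
    const std::string suffix =
        dollar == std::string::npos ? "" : section.substr(dollar + 1);
    merged[base].push_back(suffix);
  }
  for (auto &group : merged)
    std::stable_sort(group.second.begin(), group.second.end());
  return merged;
}
```

With the names used in this revision, `.omp$OA`, `.omp$OE`, and `.omp$OZ` all land in `.omp` in the order `OA`, `OE`, `OZ`, so the `$OA` and `$OZ` dummies bracket the real entries emitted into `$OE` regardless of the order in which object files contribute them.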

/// Creates binary descriptor for the given device images. Binary descriptor		/// Creates binary descriptor for the given device images. Binary descriptor
/// is an object that is passed to the offloading runtime at program startup		/// is an object that is passed to the offloading runtime at program startup
/// and it describes all device images available in the executable or shared		/// and it describes all device images available in the executable or shared
/// library. It is defined as follows		/// library. It is defined as follows
///		///
/// __attribute__((visibility("hidden")))		/// __attribute__((visibility("hidden")))
/// extern __tgt_offload_entry *__start_omp_offloading_entries;		/// extern __tgt_offload_entry *__start_omp_offloading_entries;
/// __attribute__((visibility("hidden")))		/// __attribute__((visibility("hidden")))
[24 lines not shown]
/// Images, /*DeviceImages*/		/// Images, /*DeviceImages*/
/// __start_omp_offloading_entries, /*HostEntriesBegin*/		/// __start_omp_offloading_entries, /*HostEntriesBegin*/
/// __stop_omp_offloading_entries /*HostEntriesEnd*/		/// __stop_omp_offloading_entries /*HostEntriesEnd*/
/// };		/// };
///		///
/// Global variable that represents BinDesc is returned.		/// Global variable that represents BinDesc is returned.
GlobalVariable *createBinDesc(Module &M, ArrayRef<ArrayRef<char>> Bufs) {		GlobalVariable *createBinDesc(Module &M, ArrayRef<ArrayRef<char>> Bufs) {
LLVMContext &C = M.getContext();		LLVMContext &C = M.getContext();
// Create external begin/end symbols for the offload entries table.		llvm::Triple Triple(M.getTargetTriple());
auto *EntriesB = new GlobalVariable(
M, getEntryTy(M), /*isConstant*/ true, GlobalValue::ExternalLinkage,
/*Initializer*/ nullptr, "__start_omp_offloading_entries");
EntriesB->setVisibility(GlobalValue::HiddenVisibility);
auto *EntriesE = new GlobalVariable(
M, getEntryTy(M), /*isConstant*/ true, GlobalValue::ExternalLinkage,
/*Initializer*/ nullptr, "__stop_omp_offloading_entries");
EntriesE->setVisibility(GlobalValue::HiddenVisibility);

// We assume that external begin/end symbols that we have created above will		Constant *EntriesB, *EntriesE;
// be defined by the linker. But linker will do that only if linker inputs		if (Triple.isOSBinFormatCOFF())
// have section with "omp_offloading_entries" name which is not guaranteed.		std::tie(EntriesB, EntriesE) = getCOFFEntriesArray(M, "omp");
// So, we just create dummy zero sized object in the offload entries section		else
// to force linker to define those symbols.		std::tie(EntriesB, EntriesE) = getELFEntriesArray(M, "omp");
auto *DummyInit =
ConstantAggregateZero::get(ArrayType::get(getEntryTy(M), 0u));
auto *DummyEntry = new GlobalVariable(
M, DummyInit->getType(), true, GlobalVariable::ExternalLinkage, DummyInit,
"__dummy.omp_offloading.entry");
DummyEntry->setSection("omp_offloading_entries");
DummyEntry->setVisibility(GlobalValue::HiddenVisibility);

auto *Zero = ConstantInt::get(getSizeTTy(M), 0u);		auto *Zero = ConstantInt::get(getSizeTTy(M), 0u);
Constant *ZeroZero[] = {Zero, Zero};		Constant *ZeroZero[] = {Zero, Zero};

// Create initializer for the images array.		// Create initializer for the images array.
SmallVector<Constant *, 4u> ImagesInits;		SmallVector<Constant *, 4u> ImagesInits;
ImagesInits.reserve(Bufs.size());		ImagesInits.reserve(Bufs.size());
for (ArrayRef<char> Buf : Bufs) {		for (ArrayRef<char> Buf : Bufs) {
[... 139 lines not shown ...]
		GlobalVariable *createFatbinDesc(Module &M, ArrayRef<char> Image, bool IsHIP) {

auto *FatbinDesc =		auto *FatbinDesc =
new GlobalVariable(M, getFatbinWrapperTy(M),		new GlobalVariable(M, getFatbinWrapperTy(M),
/*isConstant*/ true, GlobalValue::InternalLinkage,		/*isConstant*/ true, GlobalValue::InternalLinkage,
FatbinInitializer, ".fatbin_wrapper");		FatbinInitializer, ".fatbin_wrapper");
FatbinDesc->setSection(FatbinWrapperSection);		FatbinDesc->setSection(FatbinWrapperSection);
FatbinDesc->setAlignment(Align(8));		FatbinDesc->setAlignment(Align(8));

// We create a dummy entry to ensure the linker will define the begin / end
// symbols. The CUDA runtime should ignore the null address if we attempt to
// register it.
auto *DummyInit =
ConstantAggregateZero::get(ArrayType::get(getEntryTy(M), 0u));
auto *DummyEntry = new GlobalVariable(
M, DummyInit->getType(), true, GlobalVariable::ExternalLinkage, DummyInit,
IsHIP ? "__dummy.hip_offloading.entry" : "__dummy.cuda_offloading.entry");
[Inline comment] mstorsjo: I don't quite see where the corresponding GlobalVariable for this case is created after the refactoring?
[Inline comment] jhuber6 (author): The CUDA / HIP cases did this separately. This patch merged it into a common method `getELFEntriesArray`. Functionally this just changed the order in the output slightly. The `dummy` variable is only necessary for the ELF linkers to generate the begin / end section. For COFF we make the `$a` and `$z` variables which perform a similar role.
[Inline comment] mstorsjo: Ah, now I see - this is the second half of what's being merged into the cuda/hip call below in createRegisterGlobalsFunction.
DummyEntry->setVisibility(GlobalValue::HiddenVisibility);
DummyEntry->setSection(IsHIP ? "hip_offloading_entries"
: "cuda_offloading_entries");

return FatbinDesc;		return FatbinDesc;
}		}

/// Create the register globals function. We will iterate all of the offloading		/// Create the register globals function. We will iterate all of the offloading
/// entries stored at the begin / end symbols and register them according to		/// entries stored at the begin / end symbols and register them according to
/// their type. This creates the following function in IR:		/// their type. This creates the following function in IR:
///		///
/// extern struct __tgt_offload_entry __start_cuda_offloading_entries;		/// extern struct __tgt_offload_entry __start_cuda_offloading_entries;
[... 12 lines not shown ...]
/// entry->name, -1, 0, 0, 0, 0, 0);		/// entry->name, -1, 0, 0, 0, 0, 0);
/// else		/// else
/// __cudaRegisterVar(fatbinHandle, entry->addr, entry->name, entry->name,		/// __cudaRegisterVar(fatbinHandle, entry->addr, entry->name, entry->name,
/// 0, entry->size, 0, 0);		/// 0, entry->size, 0, 0);
/// }		/// }
/// }		/// }
Function *createRegisterGlobalsFunction(Module &M, bool IsHIP) {		Function *createRegisterGlobalsFunction(Module &M, bool IsHIP) {
LLVMContext &C = M.getContext();		LLVMContext &C = M.getContext();
		llvm::Triple Triple(M.getTargetTriple());
// Get the __cudaRegisterFunction function declaration.		// Get the __cudaRegisterFunction function declaration.
auto *RegFuncTy = FunctionType::get(		auto *RegFuncTy = FunctionType::get(
Type::getInt32Ty(C),		Type::getInt32Ty(C),
{Type::getInt8PtrTy(C)->getPointerTo(), Type::getInt8PtrTy(C),		{Type::getInt8PtrTy(C)->getPointerTo(), Type::getInt8PtrTy(C),
Type::getInt8PtrTy(C), Type::getInt8PtrTy(C), Type::getInt32Ty(C),		Type::getInt8PtrTy(C), Type::getInt8PtrTy(C), Type::getInt32Ty(C),
Type::getInt8PtrTy(C), Type::getInt8PtrTy(C), Type::getInt8PtrTy(C),		Type::getInt8PtrTy(C), Type::getInt8PtrTy(C), Type::getInt8PtrTy(C),
Type::getInt8PtrTy(C), Type::getInt32PtrTy(C)},		Type::getInt8PtrTy(C), Type::getInt32PtrTy(C)},
/*isVarArg*/ false);		/*isVarArg*/ false);
FunctionCallee RegFunc = M.getOrInsertFunction(		FunctionCallee RegFunc = M.getOrInsertFunction(
IsHIP ? "__hipRegisterFunction" : "__cudaRegisterFunction", RegFuncTy);		IsHIP ? "__hipRegisterFunction" : "__cudaRegisterFunction", RegFuncTy);

// Get the __cudaRegisterVar function declaration.		// Get the __cudaRegisterVar function declaration.
auto *RegVarTy = FunctionType::get(		auto *RegVarTy = FunctionType::get(
Type::getVoidTy(C),		Type::getVoidTy(C),
{Type::getInt8PtrTy(C)->getPointerTo(), Type::getInt8PtrTy(C),		{Type::getInt8PtrTy(C)->getPointerTo(), Type::getInt8PtrTy(C),
Type::getInt8PtrTy(C), Type::getInt8PtrTy(C), Type::getInt32Ty(C),		Type::getInt8PtrTy(C), Type::getInt8PtrTy(C), Type::getInt32Ty(C),
getSizeTTy(M), Type::getInt32Ty(C), Type::getInt32Ty(C)},		getSizeTTy(M), Type::getInt32Ty(C), Type::getInt32Ty(C)},
/*isVarArg*/ false);		/*isVarArg*/ false);
FunctionCallee RegVar = M.getOrInsertFunction(		FunctionCallee RegVar = M.getOrInsertFunction(
IsHIP ? "__hipRegisterVar" : "__cudaRegisterVar", RegVarTy);		IsHIP ? "__hipRegisterVar" : "__cudaRegisterVar", RegVarTy);

// Create the references to the start / stop symbols defined by the linker.		Constant *EntriesB, *EntriesE;
auto *EntriesB =		if (Triple.isOSBinFormatCOFF())
new GlobalVariable(M, ArrayType::get(getEntryTy(M), 0),		std::tie(EntriesB, EntriesE) =
/*isConstant*/ true, GlobalValue::ExternalLinkage,		getCOFFEntriesArray(M, IsHIP ? "hip" : "cuda");
/*Initializer*/ nullptr,		else
IsHIP ? "__start_hip_offloading_entries"		std::tie(EntriesB, EntriesE) =
: "__start_cuda_offloading_entries");		getELFEntriesArray(M, IsHIP ? "hip" : "cuda");
EntriesB->setVisibility(GlobalValue::HiddenVisibility);
auto *EntriesE =
new GlobalVariable(M, ArrayType::get(getEntryTy(M), 0),
/*isConstant*/ true, GlobalValue::ExternalLinkage,
/*Initializer*/ nullptr,
IsHIP ? "__stop_hip_offloading_entries"
: "__stop_cuda_offloading_entries");
EntriesE->setVisibility(GlobalValue::HiddenVisibility);

auto *RegGlobalsTy = FunctionType::get(Type::getVoidTy(C),		auto *RegGlobalsTy = FunctionType::get(Type::getVoidTy(C),
Type::getInt8PtrTy(C)->getPointerTo(),		Type::getInt8PtrTy(C)->getPointerTo(),
/*isVarArg*/ false);		/*isVarArg*/ false);
auto *RegGlobalsFn =		auto *RegGlobalsFn =
Function::Create(RegGlobalsTy, GlobalValue::InternalLinkage,		Function::Create(RegGlobalsTy, GlobalValue::InternalLinkage,
IsHIP ? ".hip.globals_reg" : ".cuda.globals_reg", &M);		IsHIP ? ".hip.globals_reg" : ".cuda.globals_reg", &M);
RegGlobalsFn->setSection(".text.startup");		RegGlobalsFn->setSection(".text.startup");
[... 209 lines not shown ...]
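
As a rough illustration of the registration scheme described in the comments above (not code from this patch), the sketch below walks a half-open range of entries between a begin and an end pointer and classifies each one by its size, treating size 0 as a function. The `OffloadEntry` struct and `registerEntries` helper are hypothetical names; a plain array stands in for the linker-provided section bounds, and skipping entries with a null address is an assumption modelled on the dummy / sentinel entries discussed in the inline comments.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the offload entry record; the field order is an
// assumption for illustration only.
struct OffloadEntry {
  void *addr;            // address of the kernel stub or global
  const char *name;      // symbol name
  std::size_t size;      // 0 for functions, byte size for variables
  std::int32_t flags;
  std::int32_t reserved;
};

// Walk the half-open range [begin, end) the way a registration function
// would walk the entries section, and record what would be registered.
std::vector<std::string> registerEntries(const OffloadEntry *begin,
                                         const OffloadEntry *end) {
  std::vector<std::string> log;
  for (const OffloadEntry *entry = begin; entry != end; ++entry) {
    // Assumption: zero-filled sentinel or dummy entries carry a null address
    // and are skipped rather than registered.
    if (entry->addr == nullptr)
      continue;
    if (entry->size == 0)
      log.push_back(std::string("function:") + entry->name);
    else
      log.push_back(std::string("variable:") + entry->name);
  }
  return log;
}
```

In a real build the `begin` and `end` pointers would come from the linker-defined `__start_` / `__stop_` symbols on ELF, or from the bracketing variables on COFF, rather than from an array.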

llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h

[... 861 lines not shown ...]
		public:
/// };		/// };
///		///
/// \param Addr The pointer to the global being registered.		/// \param Addr The pointer to the global being registered.
/// \param Name The symbol name associated with the global.		/// \param Name The symbol name associated with the global.
/// \param Size The size in bytes of the global (0 for functions).		/// \param Size The size in bytes of the global (0 for functions).
/// \param Flags Flags associated with the entry.		/// \param Flags Flags associated with the entry.
/// \param SectionName The section this entry will be placed at.		/// \param SectionName The section this entry will be placed at.
void emitOffloadingEntry(Constant *Addr, StringRef Name, uint64_t Size,		void emitOffloadingEntry(Constant *Addr, StringRef Name, uint64_t Size,
int32_t Flags,		int32_t Flags, StringRef SectionName);
StringRef SectionName = "omp_offloading_entries");

/// Generate control flow and cleanup for cancellation.		/// Generate control flow and cleanup for cancellation.
///		///
/// \param CancelFlag Flag indicating if the cancellation is performed.		/// \param CancelFlag Flag indicating if the cancellation is performed.
/// \param CanceledDirective The kind of directive that is cancled.		/// \param CanceledDirective The kind of directive that is cancled.
/// \param ExitCB Extra code to be generated in the exit block.		/// \param ExitCB Extra code to be generated in the exit block.
void emitCancelationCheckImpl(Value *CancelFlag,		void emitCancelationCheckImpl(Value *CancelFlag,
omp::Directive CanceledDirective,		omp::Directive CanceledDirective,
[... 1,331 lines not shown ...]

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp

[... 4,691 lines not shown ...]
		void OpenMPIRBuilder::OutlineInfo::collectBlocks(
}		}
}		}

void OpenMPIRBuilder::createOffloadEntry(bool IsTargetCodegen, Constant *ID,		void OpenMPIRBuilder::createOffloadEntry(bool IsTargetCodegen, Constant *ID,
Constant *Addr, uint64_t Size,		Constant *Addr, uint64_t Size,
int32_t Flags,		int32_t Flags,
GlobalValue::LinkageTypes) {		GlobalValue::LinkageTypes) {
if (!IsTargetCodegen) {		if (!IsTargetCodegen) {
emitOffloadingEntry(ID, Addr->getName(), Size, Flags);		llvm::Triple Triple(M.getTargetTriple());
		emitOffloadingEntry(ID, Addr->getName(), Size, Flags,
		Triple.isOSBinFormatCOFF() ? ".omp$OE"
		: "omp_offloading_entries");
return;		return;
}		}
// TODO: Add support for global variables on the device after declare target		// TODO: Add support for global variables on the device after declare target
// support.		// support.
Function *Fn = dyn_cast<Function>(Addr);		Function *Fn = dyn_cast<Function>(Addr);
if (!Fn)		if (!Fn)
return;		return;

[... 495 lines not shown ...]
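
The section-name choice in `createOffloadEntry` above can be sketched in isolation. The helper below is hypothetical and not part of the patch: it returns `.<kind>$OE` for COFF and `<kind>_offloading_entries` otherwise. Only the `omp` COFF name and the ELF names appear in the diff; applying the same `$OE` suffix to `cuda` and `hip` is an assumption based on the `.<kind>$Ox` wording in the summary.

```cpp
#include <string>

// Hypothetical enum for illustration; the patch itself queries
// llvm::Triple::isOSBinFormatCOFF() instead.
enum class ObjectFormat { ELF, COFF };

// Pick the section that offloading entries of the given kind are placed in.
std::string entrySectionName(ObjectFormat format, const std::string &kind) {
  if (format == ObjectFormat::COFF)
    return "." + kind + "$OE"; // grouped section, ordered by the "$" suffix
  return kind + "_offloading_entries"; // ELF: linker defines __start_/__stop_
}
```

COFF linkers order the contributions to a grouped section by the text after the `$` and merge them into one output section, which is why entries placed in a `$OE` group can be bracketed by variables placed in groups that sort before and after it.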
