This is an archive of the discontinued LLVM Phabricator instance.

Differential D147756

[Libomptarget] Load an image if it is compatible with at least one device
Needs Review · Public

Authored by jhuber6 on Apr 6 2023, 7:14 PM.

Details

Reviewers

estewart08
jdoerfert
JonChesterfield
ronlieb
ye-luo
tianshilei1992

Summary

Currently, we only load an image if its supported architecture matches
all of the devices found. This prevents us from supporting a system with
multiple GPUs from the same vendor but different architectures. This
patch makes a very simple change that returns that the image is
incompatible only once we've searched every device.
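
The change described in the summary amounts to swapping an all-of test for an any-of test over a plugin's devices. A minimal sketch, assuming a plain string comparison stands in for the real per-device check (`isArchCompatible` and both function names are hypothetical and not part of libomptarget):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-device test. The real plugins compare
// target IDs (AMDGPU) or compute capabilities (CUDA) instead.
bool isArchCompatible(const std::string &ImageArch,
                      const std::string &DeviceArch) {
  return ImageArch == DeviceArch;
}

// Behaviour before the patch: every device of the plugin must accept the image.
bool compatibleWithAllDevices(const std::string &ImageArch,
                              const std::vector<std::string> &DeviceArchs) {
  return std::all_of(DeviceArchs.begin(), DeviceArchs.end(),
                     [&](const std::string &DeviceArch) {
                       return isArchCompatible(ImageArch, DeviceArch);
                     });
}

// Behaviour after the patch: a single accepting device is enough.
bool compatibleWithAnyDevice(const std::string &ImageArch,
                             const std::vector<std::string> &DeviceArchs) {
  return std::any_of(DeviceArchs.begin(), DeviceArchs.end(),
                     [&](const std::string &DeviceArch) {
                       return isArchCompatible(ImageArch, DeviceArch);
                     });
}
```

With devices {gfx90a, gfx908} and a gfx908 image, the first function returns false and the second returns true. The two forms also differ when a plugin reports no devices: the all-of form accepts the image, while the any-of form rejects it.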

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jhuber6 created this revision. Apr 6 2023, 7:14 PM

Herald added a project: Restricted Project. Apr 6 2023, 7:14 PM

Herald added subscribers: kosarev, kerbowa, jvesely.

jhuber6 requested review of this revision. Apr 6 2023, 7:14 PM

Herald added a project: Restricted Project. Apr 6 2023, 7:14 PM

Herald added a subscriber: openmp-commits.

Harbormaster completed remote builds in B224146: Diff 511590. Apr 6 2023, 7:17 PM

Herald added subscribers: jplehr, sstefan1. Apr 6 2023, 7:17 PM

> Currently, we only load an image if its supported architecture matches all of the devices found.

I cannot understand this.

My machine has one gfx906 GPU and one sm_86 GPU. When I build with offload to both GPUs, neither image supports both GPUs, yet both are loaded fine.

clang++ -fopenmp --offload-arch=gfx906,sm_80 main.cpp
LIBOMPTARGET_DEBUG=1 ./a.out
Libomptarget --> Init target library!
Libomptarget --> Loading RTLs...
Libomptarget --> Loading library 'libomptarget.rtl.ppc64.nextgen.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.ppc64.nextgen.so': libomptarget.rtl.ppc64.nextgen.so: cannot open shared object file: No such file or directory!
Libomptarget --> Falling back to original plugin...
Libomptarget --> Loading library 'libomptarget.rtl.ppc64.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.ppc64.so': libomptarget.rtl.ppc64.so: cannot open shared object file: No such file or directory!
Libomptarget --> Loading library 'libomptarget.rtl.x86_64.nextgen.so'...
Libomptarget --> Successfully loaded library 'libomptarget.rtl.x86_64.nextgen.so'!
Libomptarget --> Registering RTL libomptarget.rtl.x86_64.nextgen.so supporting 4 devices!
Libomptarget --> Loading library 'libomptarget.rtl.cuda.nextgen.so'...
Libomptarget --> Successfully loaded library 'libomptarget.rtl.cuda.nextgen.so'!
Libomptarget --> Registering RTL libomptarget.rtl.cuda.nextgen.so supporting 1 devices!
Libomptarget --> Loading library 'libomptarget.rtl.aarch64.nextgen.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.aarch64.nextgen.so': libomptarget.rtl.aarch64.nextgen.so: cannot open shared object file: No such file or directory!
Libomptarget --> Falling back to original plugin...
Libomptarget --> Loading library 'libomptarget.rtl.aarch64.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.aarch64.so': libomptarget.rtl.aarch64.so: cannot open shared object file: No such file or directory!
Libomptarget --> Loading library 'libomptarget.rtl.ve.nextgen.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.ve.nextgen.so': libomptarget.rtl.ve.nextgen.so: cannot open shared object file: No such file or directory!
Libomptarget --> Falling back to original plugin...
Libomptarget --> Loading library 'libomptarget.rtl.ve.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.ve.so': libomptarget.rtl.ve.so: cannot open shared object file: No such file or directory!
Libomptarget --> Loading library 'libomptarget.rtl.amdgpu.nextgen.so'...
Libomptarget --> Successfully loaded library 'libomptarget.rtl.amdgpu.nextgen.so'!
Libomptarget --> Registering RTL libomptarget.rtl.amdgpu.nextgen.so supporting 1 devices!
Libomptarget --> Loading library 'libomptarget.rtl.rpc.nextgen.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.rpc.nextgen.so': libomptarget.rtl.rpc.nextgen.so: cannot open shared object file: No such file or directory!
Libomptarget --> Falling back to original plugin...
Libomptarget --> Loading library 'libomptarget.rtl.rpc.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.rpc.so': libomptarget.rtl.rpc.so: cannot open shared object file: No such file or directory!
Libomptarget --> RTLs loaded!
Libomptarget --> OMPT: Enter ompt_init
OMPT --> OMPT: Trying to load library libomp.so
OMPT --> OMPT: Trying to get address of connection routine ompt_libomp_connect
OMPT --> OMPT: Library connection handle = 0x7fb8c650f5d0
Libomptarget --> OMPT: Exit ompt_init
Libomptarget --> Image 0x00005635224dc0d8 is NOT compatible with RTL !
PluginInterface --> Image is compatible with current environment: sm_80
Libomptarget --> Image 0x00005635224dc0d8 is compatible with RTL !
Libomptarget --> RTL 0x000056352392e0a0 has index 0!
Libomptarget --> Registering image 0x00005635224dc0d8 with RTL !
Libomptarget --> Image 0x00005635224fbb80 is NOT compatible with RTL !
Libomptarget --> Image 0x00005635224fbb80 is NOT compatible with RTL !
TARGET AMDGPU RTL --> Compatible: Target IDs are compatible 	[Image: gfx906]	:	[Env: gfx906:sramecc+:xnack-]
PluginInterface --> Image is compatible with current environment: gfx906
Libomptarget --> Image 0x00005635224fbb80 is compatible with RTL !
Libomptarget --> RTL 0x0000563523963d40 has index 1!
Libomptarget --> Registering image 0x00005635224fbb80 with RTL !
Libomptarget --> Done registering entries!
Libomptarget --> Entering target region for device -1 with entry point 0x00005635224dc004
Libomptarget --> Call to omp_get_num_devices returning 2
Libomptarget --> Default TARGET OFFLOAD policy is now mandatory (devices were found)
Libomptarget --> Use default device id 0
Libomptarget --> Call to omp_get_num_devices returning 2
Libomptarget --> Call to omp_get_num_devices returning 2
Libomptarget --> Call to omp_get_initial_device returning 2
Libomptarget --> Checking whether device 0 is ready.
Libomptarget --> Is the device 0 (local ID 0) initialized? 0
TARGET CUDA RTL --> The primary context is inactive, set its flags to CU_CTX_SCHED_BLOCKING_SYNC
Libomptarget --> Device 0 is ready to use.
PluginInterface --> Load data from image 0x00005635224dc0d8
PluginInterface --> Succesfully write 16 bytes associated with global symbol '__omp_rtl_device_environment' to the device (0x7fb78be00000 -> 0x7ffd028aec10).
TARGET CUDA RTL --> Entry point 0x00005635224ff048 maps to __omp_offloading_10307_18c6198_main_l4 (0x000056352458f130)
PluginInterface --> Global symbol '__omp_offloading_10307_18c6198_main_l4_exec_mode' was found in the ELF image and 1 bytes will copied from 0x5635224fa770 to 0x7ffd028aeaa0.
PluginInterface --> Entry point 0x0000000000000000 maps to __omp_offloading_10307_18c6198_main_l4 (0x0000563524558980)
Libomptarget --> loop trip count is 0.
Libomptarget --> Launching target execution __omp_offloading_10307_18c6198_main_l4 with pointer 0x0000563524558980 (index=0).
PluginInterface --> Launching kernel __omp_offloading_10307_18c6198_main_l4 with 1 blocks and 128 threads in Generic mode
Libomptarget --> Unloading target library!
Libomptarget --> Unregistered image 0x00005635224dc0d8 from RTL 0x000056352392e0a0!
Libomptarget --> Unregistered image 0x00005635224fbb80 from RTL 0x0000563523963d40!
Libomptarget --> Done unregistering images!
Libomptarget --> Removing translation table for descriptor 0x00005635224ff048
Libomptarget --> Done unregistering library!
Libomptarget --> Deinit target library!

In D147756#4250392, @ye-luo wrote:

> > Currently, we only load an image if its supported architecture matches all of the devices found.
>
> I cannot understand this.
>
> My machine has one gfx906 GPU and one sm_86 GPU. When I build with offload to both GPUs, neither image supports both GPUs, yet both are loaded fine.

I should have specified "if its architecture matches all of the devices found for a single plugin". For example, we currently will not initialize two devices on a machine with both gfx90a and gfx908.

What we currently have conservatively determines that a plugin is compatible with the image only if all devices managed by the plugin are compatible. This is because when we initialize the device vector, we use a one-for-all style: if a plugin is compatible, we assume we can use all devices managed by the plugin. This patch will break that assumption.

In D147756#4251325, @tianshilei1992 wrote:

> What we currently have conservatively determines that a plugin is compatible with the image only if all devices managed by the plugin are compatible. This is because when we initialize the device vector, we use a one-for-all style: if a plugin is compatible, we assume we can use all devices managed by the plugin. This patch will break that assumption.

So if the system has a gfx908 and a gfx90a and we build with --offload-arch=gfx908,gfx90a, it will report both images as incompatible, because neither image supports both devices.

In D147756#4251372, @estewart08 wrote:

> In D147756#4251325, @tianshilei1992 wrote:
>
> > What we currently have conservatively determines that a plugin is compatible with the image only if all devices managed by the plugin are compatible. This is because when we initialize the device vector, we use a one-for-all style: if a plugin is compatible, we assume we can use all devices managed by the plugin. This patch will break that assumption.
>
> So if the system has a gfx908 and a gfx90a and we build with --offload-arch=gfx908,gfx90a, it will report both images as incompatible, because neither image supports both devices.

Yep, unfortunately that's the issue with the current implementation. We have to fix it.

It sounds like this might break some invariants we have about plugins, devices, and images.

If this does not work, some alternative ideas:
What if we make the plugins detect that they have different architectures and create one instance per arch?
Alternatively, we could force the plugin to choose: if device 0 works with an image and device 1 does not, we ignore device 1 from now on.
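
The first alternative, one plugin instance per architecture, needs the devices partitioned by architecture first. A minimal sketch of that grouping step only, assuming each device is described by an architecture string (`groupDevicesByArch` is a hypothetical helper and does not exist in libomptarget):

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: map each architecture name to the physical device
// indices that report it. Each map entry would back one plugin instance.
std::map<std::string, std::vector<int>>
groupDevicesByArch(const std::vector<std::string> &DeviceArchs) {
  std::map<std::string, std::vector<int>> Groups;
  for (int Id = 0; Id < static_cast<int>(DeviceArchs.size()); ++Id)
    Groups[DeviceArchs[Id]].push_back(Id);
  return Groups;
}
```

Within each group the existing one-for-all assumption would hold again, since every device in a group shares the same architecture.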

In D147756#4251909, @jdoerfert wrote:

> It sounds like this might break some invariants we have about plugins, devices, and images.
>
> If this does not work, some alternative ideas:
> What if we make the plugins detect that they have different architectures and create one instance per arch?
> Alternatively, we could force the plugin to choose: if device 0 works with an image and device 1 does not, we ignore device 1 from now on.

Yes, the problem right now is that we assign device IDs based on the logical devices found, not on whether they loaded an image. To make this work, we would need to initialize the devices in the order of the images we find them compatible with. So on a system with gfx90a and gfx908 devices, if we had a gfx908 image, we would find that it matches the gfx908 device and assign that device ID zero. This is definitely a break from the current logic, where we simply initialize all the devices found.
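
The ordering described here can be sketched as follows. This is only an illustration of the idea under simplifying assumptions: architectures are compared as plain strings, and `assignLogicalIds` is a hypothetical name, not an existing libomptarget function.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: walk the images in registration order and hand out
// logical device IDs only to physical devices that match some image.
// The result maps logical ID -> physical device index.
std::vector<int> assignLogicalIds(const std::vector<std::string> &ImageArchs,
                                  const std::vector<std::string> &DeviceArchs) {
  std::vector<int> LogicalToPhysical;
  std::vector<bool> Assigned(DeviceArchs.size(), false);
  for (const std::string &ImageArch : ImageArchs) {
    for (std::size_t Dev = 0; Dev < DeviceArchs.size(); ++Dev) {
      if (!Assigned[Dev] && DeviceArchs[Dev] == ImageArch) {
        Assigned[Dev] = true;
        LogicalToPhysical.push_back(static_cast<int>(Dev));
      }
    }
  }
  return LogicalToPhysical;
}
```

For physical devices {gfx90a, gfx908} and a single gfx908 image, the result is {1}: the gfx908 device becomes logical device 0 and the gfx90a device receives no ID.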

In D147756#4252032, @jhuber6 wrote:

> In D147756#4251909, @jdoerfert wrote:
>
> > It sounds like this might break some invariants we have about plugins, devices, and images.
> >
> > If this does not work, some alternative ideas:
> > What if we make the plugins detect that they have different architectures and create one instance per arch?
> > Alternatively, we could force the plugin to choose: if device 0 works with an image and device 1 does not, we ignore device 1 from now on.
>
> Yes, the problem right now is that we assign device IDs based on the logical devices found, not on whether they loaded an image. To make this work, we would need to initialize the devices in the order of the images we find them compatible with. So on a system with gfx90a and gfx908 devices, if we had a gfx908 image, we would find that it matches the gfx908 device and assign that device ID zero. This is definitely a break from the current logic, where we simply initialize all the devices found.

This is not part of this patch though, or am I missing something here?

In D147756#4329242, @jplehr wrote:

> This is not part of this patch though, or am I missing something here?

It's not, I realized that after the fact and haven't invested the time to fix it.

Revision Contents

Path                                                                              Size
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp                            11 lines
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp    2 lines
openmp/libomptarget/plugins-nextgen/cuda/src/rtl.cpp                              6 lines
Diff 511590

openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp

Enclosing context: for (hsa_agent_t Agent : KernelAgents) {

       std::string Target;
       auto Err = utils::iterateAgentISAs(Agent, [&](hsa_isa_t ISA) {
         uint32_t Length;
         hsa_status_t Status;
         Status = hsa_isa_get_info_alt(ISA, HSA_ISA_INFO_NAME_LENGTH, &Length);
         if (Status != HSA_STATUS_SUCCESS)
           return Status;

-        // TODO: This is not allowed by the standard.
-        char ISAName[Length];
-        Status = hsa_isa_get_info_alt(ISA, HSA_ISA_INFO_NAME, ISAName);
+        std::string ISAName(Length, '\0');
+        Status = hsa_isa_get_info_alt(ISA, HSA_ISA_INFO_NAME, ISAName.data());
         if (Status != HSA_STATUS_SUCCESS)
           return Status;

         llvm::StringRef TripleTarget(ISAName);
         if (TripleTarget.consume_front("amdgcn-amd-amdhsa"))
           Target = TripleTarget.ltrim('-').str();
         return HSA_STATUS_SUCCESS;
       });
       if (Err)
         return std::move(Err);

-      if (!utils::isImageCompatibleWithEnv(Info, Target))
-        return false;
+      if (utils::isImageCompatibleWithEnv(Info, Target))
+        return true;
     }
-    return true;
+    return false;
   }

   /// This plugin does not support exchanging data between two devices.
   bool isDataExchangable(int32_t SrcDeviceId, int32_t DstDeviceId) override {
     return false;
   }

   /// Get the host device instance.
   AMDHostDeviceTy &getHostDevice() {
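
Besides the compatibility logic, the AMDGPU hunk replaces a variable-length `char` array (which the removed TODO notes is not allowed by the standard) with a `std::string` used as the output buffer. A minimal sketch of that pattern, assuming a made-up C-style API in place of `hsa_isa_get_info_alt`; `queryNameLength`, `queryName`, and the sample name are hypothetical:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical C-style API standing in for the HSA query: one call reports
// the buffer length (including the terminating NUL), one fills the buffer.
const char SampleName[] = "amdgcn-amd-amdhsa--gfx906";
std::size_t queryNameLength() { return sizeof(SampleName); }
void queryName(char *Out) { std::memcpy(Out, SampleName, sizeof(SampleName)); }

std::string readName() {
  // Size the string up front instead of declaring `char Name[Length]`.
  std::string Name(queryNameLength(), '\0');
  // The non-const data() overload requires C++17.
  queryName(Name.data());
  // Drop the trailing NUL bytes so size() matches the visible text.
  Name.resize(std::strlen(Name.c_str()));
  return Name;
}
```

The final `resize` is a choice made for this sketch and is not part of the patch, which passes the buffer straight to `llvm::StringRef`.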

openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp

Enclosing context: if (!CompatibleOrErr) {

     std::string ErrString = toString(CompatibleOrErr.takeError());
     DP("Failure to check whether image %p is valid: %s\n", TgtImage,
        ErrString.data());
     return false;
   }

   bool Compatible = *CompatibleOrErr;
   DP("Image is %scompatible with current environment: %s\n",
-     (Compatible) ? "" : "not", Info->Arch);
+     (Compatible) ? "" : "not ", Info->Arch);

   return Compatible;
 }

 int32_t __tgt_rtl_supports_empty_images() {
   return Plugin::get().supportsEmptyImages();
 }

openmp/libomptarget/plugins-nextgen/cuda/src/rtl.cpp

Enclosing context: for (int32_t DevId = 0; DevId < getNumDevices(); ++DevId) {

         return Plugin::error("Unrecognized image arch %s", ArchStr.data());

       int32_t ImageMajor = ArchStr[PrefixStr.size() + 0] - '0';
       int32_t ImageMinor = ArchStr[PrefixStr.size() + 1] - '0';

       // A cubin generated for a certain compute capability is supported to run
       // on any GPU with the same major revision and same or higher minor
       // revision.
-      if (Major != ImageMajor || Minor < ImageMinor)
-        return false;
+      if (Major == ImageMajor && Minor >= ImageMinor)
+        return true;
     }
-    return true;
+    return false;
   }
 };

 Error CUDADeviceTy::dataExchangeImpl(const void *SrcPtr,
                                      GenericDeviceTy &DstGenericDevice,
                                      void *DstPtr, int64_t Size,
                                      AsyncInfoWrapperTy &AsyncInfoWrapper) {
   if (auto Err = setContext())
     return Err;
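
The cubin rule quoted in the CUDA hunk (same major revision, same or higher minor revision) can be tried out in isolation. A minimal standalone sketch, assuming a two-digit `sm_XY` architecture string as read by the code above; `isCubinCompatible` is a hypothetical helper, not the plugin's actual function:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: can an image built for ImageArch (e.g. "sm_80") run
// on a device with compute capability Major.Minor? Only two-digit "sm_XY"
// names are handled, matching the single-digit parsing in the diff.
bool isCubinCompatible(const std::string &ImageArch, int Major, int Minor) {
  const std::string Prefix = "sm_";
  if (ImageArch.size() != Prefix.size() + 2 ||
      ImageArch.compare(0, Prefix.size(), Prefix) != 0)
    return false;

  unsigned char MajorCh = ImageArch[Prefix.size() + 0];
  unsigned char MinorCh = ImageArch[Prefix.size() + 1];
  if (!std::isdigit(MajorCh) || !std::isdigit(MinorCh))
    return false;

  int ImageMajor = MajorCh - '0';
  int ImageMinor = MinorCh - '0';
  // Same major revision, and the device minor revision is at least the image's.
  return Major == ImageMajor && Minor >= ImageMinor;
}
```

This is consistent with the log earlier in the thread, where an sm_80 image is reported as compatible on a machine with an sm_86 GPU: the major revisions match and 6 >= 0.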