This is an archive of the discontinued LLVM Phabricator instance.

[Libomptarget] Load an image if it is compatible with at least one device
Needs Review · Public

Authored by jhuber6 on Apr 6 2023, 7:14 PM.

Details

Summary

Currently, we only load an image if its supported architecture matches
all of the devices found. This prevents us from supporting a system with
multiple GPUs from the same vendor but different architectures. This
patch makes a very simple change: the image is reported as
incompatible only after we have searched every device.
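
To illustrate the difference between the two checks, here is a minimal sketch. It is not the patch itself: `Device`, `isCompatibleWithAll`, and `isCompatibleWithAny` are hypothetical names, and architecture matching is reduced to a string comparison.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a device managed by one plugin; only the
// architecture string matters for this sketch.
struct Device {
  std::string Arch;
};

// Behaviour described as current: the image is accepted only if every
// device of the plugin matches the image architecture.
bool isCompatibleWithAll(const std::vector<Device> &Devices,
                         const std::string &ImageArch) {
  return !Devices.empty() &&
         std::all_of(Devices.begin(), Devices.end(),
                     [&](const Device &D) { return D.Arch == ImageArch; });
}

// Behaviour proposed in the summary: report the image as incompatible only
// after every device has been searched without finding a match.
bool isCompatibleWithAny(const std::vector<Device> &Devices,
                         const std::string &ImageArch) {
  for (const Device &D : Devices)
    if (D.Arch == ImageArch)
      return true;
  return false;
}
```

With one gfx908 and one gfx90a device, a gfx908 image fails the first check and passes the second.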

Event Timeline

jhuber6 created this revision. Apr 6 2023, 7:14 PM
Herald added a project: Restricted Project. Apr 6 2023, 7:14 PM
jhuber6 requested review of this revision. Apr 6 2023, 7:14 PM
ye-luo added a comment. Apr 6 2023, 7:27 PM

Currently, we only load an image if its supported architecture matches all of the devices found.

I cannot understand this.

My machine has one gfx906 GPU and one sm_86 GPU. When I build with offload to both GPUs,
neither image supports both GPUs, yet both are loaded fine.

clang++ -fopenmp --offload-arch=gfx906,sm_80 main.cpp
LIBOMPTARGET_DEBUG=1 ./a.out
Libomptarget --> Init target library!
Libomptarget --> Loading RTLs...
Libomptarget --> Loading library 'libomptarget.rtl.ppc64.nextgen.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.ppc64.nextgen.so': libomptarget.rtl.ppc64.nextgen.so: cannot open shared object file: No such file or directory!
Libomptarget --> Falling back to original plugin...
Libomptarget --> Loading library 'libomptarget.rtl.ppc64.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.ppc64.so': libomptarget.rtl.ppc64.so: cannot open shared object file: No such file or directory!
Libomptarget --> Loading library 'libomptarget.rtl.x86_64.nextgen.so'...
Libomptarget --> Successfully loaded library 'libomptarget.rtl.x86_64.nextgen.so'!
Libomptarget --> Registering RTL libomptarget.rtl.x86_64.nextgen.so supporting 4 devices!
Libomptarget --> Loading library 'libomptarget.rtl.cuda.nextgen.so'...
Libomptarget --> Successfully loaded library 'libomptarget.rtl.cuda.nextgen.so'!
Libomptarget --> Registering RTL libomptarget.rtl.cuda.nextgen.so supporting 1 devices!
Libomptarget --> Loading library 'libomptarget.rtl.aarch64.nextgen.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.aarch64.nextgen.so': libomptarget.rtl.aarch64.nextgen.so: cannot open shared object file: No such file or directory!
Libomptarget --> Falling back to original plugin...
Libomptarget --> Loading library 'libomptarget.rtl.aarch64.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.aarch64.so': libomptarget.rtl.aarch64.so: cannot open shared object file: No such file or directory!
Libomptarget --> Loading library 'libomptarget.rtl.ve.nextgen.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.ve.nextgen.so': libomptarget.rtl.ve.nextgen.so: cannot open shared object file: No such file or directory!
Libomptarget --> Falling back to original plugin...
Libomptarget --> Loading library 'libomptarget.rtl.ve.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.ve.so': libomptarget.rtl.ve.so: cannot open shared object file: No such file or directory!
Libomptarget --> Loading library 'libomptarget.rtl.amdgpu.nextgen.so'...
Libomptarget --> Successfully loaded library 'libomptarget.rtl.amdgpu.nextgen.so'!
Libomptarget --> Registering RTL libomptarget.rtl.amdgpu.nextgen.so supporting 1 devices!
Libomptarget --> Loading library 'libomptarget.rtl.rpc.nextgen.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.rpc.nextgen.so': libomptarget.rtl.rpc.nextgen.so: cannot open shared object file: No such file or directory!
Libomptarget --> Falling back to original plugin...
Libomptarget --> Loading library 'libomptarget.rtl.rpc.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.rpc.so': libomptarget.rtl.rpc.so: cannot open shared object file: No such file or directory!
Libomptarget --> RTLs loaded!
Libomptarget --> OMPT: Enter ompt_init
OMPT --> OMPT: Trying to load library libomp.so
OMPT --> OMPT: Trying to get address of connection routine ompt_libomp_connect
OMPT --> OMPT: Library connection handle = 0x7fb8c650f5d0
Libomptarget --> OMPT: Exit ompt_init
Libomptarget --> Image 0x00005635224dc0d8 is NOT compatible with RTL !
PluginInterface --> Image is compatible with current environment: sm_80
Libomptarget --> Image 0x00005635224dc0d8 is compatible with RTL !
Libomptarget --> RTL 0x000056352392e0a0 has index 0!
Libomptarget --> Registering image 0x00005635224dc0d8 with RTL !
Libomptarget --> Image 0x00005635224fbb80 is NOT compatible with RTL !
Libomptarget --> Image 0x00005635224fbb80 is NOT compatible with RTL !
TARGET AMDGPU RTL --> Compatible: Target IDs are compatible 	[Image: gfx906]	:	[Env: gfx906:sramecc+:xnack-]
PluginInterface --> Image is compatible with current environment: gfx906
Libomptarget --> Image 0x00005635224fbb80 is compatible with RTL !
Libomptarget --> RTL 0x0000563523963d40 has index 1!
Libomptarget --> Registering image 0x00005635224fbb80 with RTL !
Libomptarget --> Done registering entries!
Libomptarget --> Entering target region for device -1 with entry point 0x00005635224dc004
Libomptarget --> Call to omp_get_num_devices returning 2
Libomptarget --> Default TARGET OFFLOAD policy is now mandatory (devices were found)
Libomptarget --> Use default device id 0
Libomptarget --> Call to omp_get_num_devices returning 2
Libomptarget --> Call to omp_get_num_devices returning 2
Libomptarget --> Call to omp_get_initial_device returning 2
Libomptarget --> Checking whether device 0 is ready.
Libomptarget --> Is the device 0 (local ID 0) initialized? 0
TARGET CUDA RTL --> The primary context is inactive, set its flags to CU_CTX_SCHED_BLOCKING_SYNC
Libomptarget --> Device 0 is ready to use.
PluginInterface --> Load data from image 0x00005635224dc0d8
PluginInterface --> Succesfully write 16 bytes associated with global symbol '__omp_rtl_device_environment' to the device (0x7fb78be00000 -> 0x7ffd028aec10).
TARGET CUDA RTL --> Entry point 0x00005635224ff048 maps to __omp_offloading_10307_18c6198_main_l4 (0x000056352458f130)
PluginInterface --> Global symbol '__omp_offloading_10307_18c6198_main_l4_exec_mode' was found in the ELF image and 1 bytes will copied from 0x5635224fa770 to 0x7ffd028aeaa0.
PluginInterface --> Entry point 0x0000000000000000 maps to __omp_offloading_10307_18c6198_main_l4 (0x0000563524558980)
Libomptarget --> loop trip count is 0.
Libomptarget --> Launching target execution __omp_offloading_10307_18c6198_main_l4 with pointer 0x0000563524558980 (index=0).
PluginInterface --> Launching kernel __omp_offloading_10307_18c6198_main_l4 with 1 blocks and 128 threads in Generic mode
Libomptarget --> Unloading target library!
Libomptarget --> Unregistered image 0x00005635224dc0d8 from RTL 0x000056352392e0a0!
Libomptarget --> Unregistered image 0x00005635224fbb80 from RTL 0x0000563523963d40!
Libomptarget --> Done unregistering images!
Libomptarget --> Removing translation table for descriptor 0x00005635224ff048
Libomptarget --> Done unregistering library!
Libomptarget --> Deinit target library!

I should have specified: "if its architecture matches all of the devices found for a single plugin". For example, we currently will not initialize two devices on a machine with both gfx90a and gfx908.

What we currently have conservatively determines that a plugin is compatible with the image only if all devices managed by the plugin are compatible. This is because when we initialize the device vector, we use a one-for-all style: if a plugin is compatible, we assume we can use all devices managed by the plugin. This patch will break that assumption.

So if the system has a gfx908 and a gfx90a, and we build with --offload-arch=gfx908,gfx90a, then it will say both images are not compatible because neither image supports both devices.
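
A minimal sketch of the one-for-all device vector initialization discussed above, and of how it interacts with a relaxed compatibility check. The names `Plugin`, `initDeviceVector`, and `countMismatched` are hypothetical and do not come from libomptarget.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical plugin model: one architecture string per managed device.
struct Plugin {
  std::vector<std::string> DeviceArchs;
};

// One-for-all style: once the plugin is judged compatible, every device it
// manages is appended to the device vector, whatever its architecture.
std::vector<std::string> initDeviceVector(const Plugin &P,
                                          bool PluginCompatible) {
  std::vector<std::string> Devices;
  if (PluginCompatible)
    Devices.insert(Devices.end(), P.DeviceArchs.begin(),
                   P.DeviceArchs.end());
  return Devices;
}

// Number of devices in the vector that cannot actually run the image.
std::size_t countMismatched(const std::vector<std::string> &Devices,
                            const std::string &ImageArch) {
  std::size_t Count = 0;
  for (const std::string &Arch : Devices)
    if (Arch != ImageArch)
      ++Count;
  return Count;
}
```

Under these assumptions, a plugin managing a gfx908 and a gfx90a that is declared compatible on the strength of a single match still exposes both devices, one of which has no usable image.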

Yep, unfortunately that's the issue with the current implementation. We have to fix it.

It sounds like this might break some invariants we have about plugins, devices, and images.

If this does not work, some alternative ideas:
What if we make the plugins detect that they have different architectures and they create one instance per arch?
Though, we could also force the plugin to choose. If device 0 works with an image and device 1 does not, we ignore device 1 from now on.
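
Both alternatives can be sketched roughly as follows. This is only an illustration under simplifying assumptions: devices are identified by their plugin-local index, architectures are plain strings, and `groupByArch` and `keepCompatible` are hypothetical helpers.

```cpp
#include <map>
#include <string>
#include <vector>

// First idea: split the devices of one plugin by architecture, so that each
// group could back a separate plugin instance.
std::map<std::string, std::vector<int>>
groupByArch(const std::vector<std::string> &DeviceArchs) {
  std::map<std::string, std::vector<int>> Instances;
  for (int I = 0; I < static_cast<int>(DeviceArchs.size()); ++I)
    Instances[DeviceArchs[I]].push_back(I);
  return Instances;
}

// Second idea: the plugin chooses. Only devices that work with the image
// are kept; the others are ignored from then on.
std::vector<int> keepCompatible(const std::vector<std::string> &DeviceArchs,
                                const std::string &ImageArch) {
  std::vector<int> Kept;
  for (int I = 0; I < static_cast<int>(DeviceArchs.size()); ++I)
    if (DeviceArchs[I] == ImageArch)
      Kept.push_back(I);
  return Kept;
}
```

The first approach keeps the one-for-all assumption intact within each instance; the second keeps a single instance but shrinks the set of usable devices.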

Yes, the problem right now is that we assign device IDs based on the logical devices found, not on whether or not they loaded an image. So to make this work we would need to initialize the devices in the order of the images we find them compatible with. So on a system offloading to gfx90a and gfx908, if we had a gfx908 image, we would find it matches gfx908 and assign that device ID zero. This is definitely a break from the current logic where we simply initialize all the devices found.
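
The ID assignment described above could look roughly like this. It is a sketch only: `assignDeviceIds` is a hypothetical helper, and exact string equality stands in for the real compatibility check.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the plugin-local device index for each user-visible device ID.
// IDs are handed out in the order images are found to be compatible, and
// devices without any compatible image receive no ID at all.
std::vector<int> assignDeviceIds(const std::vector<std::string> &DeviceArchs,
                                 const std::vector<std::string> &ImageArchs) {
  std::vector<int> IdToLocal;
  std::vector<bool> Assigned(DeviceArchs.size(), false);
  for (const std::string &Image : ImageArchs) {
    for (std::size_t I = 0; I < DeviceArchs.size(); ++I) {
      if (!Assigned[I] && DeviceArchs[I] == Image) {
        Assigned[I] = true;
        IdToLocal.push_back(static_cast<int>(I));
      }
    }
  }
  return IdToLocal;
}
```

For devices {gfx90a, gfx908} and a single gfx908 image, the result contains only local index 1, so the gfx908 device becomes device ID 0 and the gfx90a device is never initialized.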

jplehr added a comment. May 9 2023, 4:05 AM

This is not part of this patch though, or am I missing something here?

It's not. I realized that after the fact and haven't invested the time to fix it.