diff --git a/openmp/libomptarget/docs/declare_target_indirect.md b/openmp/libomptarget/docs/declare_target_indirect.md
new file mode 100644
--- /dev/null
+++ b/openmp/libomptarget/docs/declare_target_indirect.md
@@ -0,0 +1,51 @@
# Overview
The **indirect** clause enables **indirect device invocation** for a procedure:
> An indirect call to the device version of a procedure on a device other than the host
> device, through a function pointer (C/C++), a pointer to a member function (C++) or
> a procedure pointer (Fortran) that refers to the host version of the procedure.

# Compiler/runtime support
## Offload entries table

The offload entries table that is created for the host and for each device image currently has entries for **declare target** global variables, **omp target** outlined functions, and constructor/destructor thunks for **declare target** global variables.

The compiler will also produce an entry for each procedure that appears in a **declare target** directive with the **indirect** clause:

```C++
#include <cstddef>
#include <cstdint>

struct __tgt_offload_entry {
  void *addr;       // Pointer to the function
  char *name;       // Name of the function
  size_t size;      // 0 for functions
  int32_t flags;    // OpenMPOffloadingDeclareTargetFlags::OMP_DECLARE_TARGET_FPTR
  int32_t reserved; // Reserved
};
```

## Run-time dispatch in device code

When the compiler front end (FE) generates an indirect function call in **device code**, it also defines the following global variable in the translation unit:

```C++
#include <cstdint>

__attribute__((weak)) struct __openmp_offload_function_ptr_map_ty {
  int64_t host_ptr; // key
  int64_t tgt_ptr;  // value
} *__openmp_offload_function_ptr_map = 0;
```

The FE generates run-time lookup code that matches the address of the called function against the `host_ptr` keys and yields the corresponding device function address `tgt_ptr`, which is then used for the indirect function call.

### Optimization for non-unified_shared_memory

When a program does not use **requires unified_shared_memory**, all pointers passed to the device are expected to be translated (mapped), so the FE can avoid generating the run-time dispatch code for indirect function calls. The mapping between the host and device addresses of an indirect function is established by `libomptarget` while it processes the offload entries table.

## Runtime handling of function pointers

`OpenMPOffloadingDeclareTargetFlags::OMP_DECLARE_TARGET_FPTR` is a new flag that distinguishes offload entries for function pointers from other function entries.
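As a rough illustration, the sketch below shows how a runtime could use `size` together with the new flag to tell global variable entries, function pointer entries, and all other function entries apart. It is a sketch under stated assumptions, not libomptarget code: the numeric value `0x08` for `OMP_DECLARE_TARGET_FPTR`, the `EntryKind` enum, and `classifyEntry` are hypothetical, and only the entry layout is taken from this document.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Offload entry layout as described in this document.
struct __tgt_offload_entry {
  void *addr;       // Pointer to the function or variable
  char *name;       // Symbol name
  size_t size;      // 0 for functions
  int32_t flags;    // OpenMPOffloadingDeclareTargetFlags bits
  int32_t reserved; // Reserved
};

// ASSUMPTION: only the proposed flag is shown; its value is hypothetical.
enum OpenMPOffloadingDeclareTargetFlags : int32_t {
  OMP_DECLARE_TARGET_FPTR = 0x08,
};

// Hypothetical classification used only for this sketch.
enum class EntryKind {
  GlobalVariable,  // size != 0
  FunctionPointer, // size == 0 and OMP_DECLARE_TARGET_FPTR set
  OtherFunction,   // outlined target region or ctor/dtor thunk
};

EntryKind classifyEntry(const __tgt_offload_entry &Entry) {
  // Sketch assumption: every entry carries a non-null address.
  assert(Entry.addr != nullptr);
  if (Entry.size != 0)
    return EntryKind::GlobalVariable;
  if (Entry.flags & OpenMPOffloadingDeclareTargetFlags::OMP_DECLARE_TARGET_FPTR)
    return EntryKind::FunctionPointer;
  return EntryKind::OtherFunction;
}
```

Because outlined regions, constructor/destructor thunks, and function pointers all have `size` equal to 0, the flag test is the only thing that separates function pointer entries from the other function entries.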
Unlike other function entries (which also have `size` equal to 0), function pointer entries get a mapping in `Device.HostDataToTargetMap`, established by `omptarget::InitLibrary()`.

Once `Device.HostDataToTargetMap` is populated, `libomptarget` walks the host offload entries table and, for each `OMP_DECLARE_TARGET_FPTR` entry, creates an entry in the host version of `__openmp_offload_function_ptr_map`; the device pointer is taken from `Device.HostDataToTargetMap`. `libomptarget` then sorts the host version of `__openmp_offload_function_ptr_map` by its key (`host_ptr`) and transfers the table to device memory, which implies allocating device memory via `omp_target_alloc`.

The device address of the transferred table is then assigned to `__openmp_offload_function_ptr_map` on the device. Because this assignment can be done in different ways, it is the plugin's responsibility to perform it in a target-dependent way. Plugins provide an optional interface that implements the assignment, e.g. `__tgt_rtl_set_function_ptr_map(int32_t device_id, void *device_addr)`, where `device_addr` has the same characteristics as an address returned by `omp_target_alloc` for the device identified by `device_id`.

### Optimization for non-unified_shared_memory

For programs that do not use **requires unified_shared_memory**, only the `Device.HostDataToTargetMap` mapping is necessary.
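For programs that do use **requires unified_shared_memory**, the table construction and the FE-generated device lookup described in this document can be sketched as follows. This is a minimal sketch under assumptions, not the libomptarget implementation: `HostToTargetMap` is a hypothetical `std::map` stand-in for `Device.HostDataToTargetMap`, `buildFunctionPtrMap` and `lookupDeviceFnPtr` are hypothetical names, the device is assumed to know the number of table entries (this document does not say how it learns it), and an address with no table entry is assumed to be returned unchanged. The binary search relies on the table having been sorted by `host_ptr`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Entry layout of __openmp_offload_function_ptr_map, as described in this document.
struct __openmp_offload_function_ptr_map_ty {
  int64_t host_ptr; // key
  int64_t tgt_ptr;  // value
};

// HYPOTHETICAL stand-in for Device.HostDataToTargetMap: host address -> device address.
using HostToTargetMap = std::map<int64_t, int64_t>;

// Host side: one table entry per OMP_DECLARE_TARGET_FPTR entry, with the
// device address taken from the host-to-target map, then sorted by key.
std::vector<__openmp_offload_function_ptr_map_ty>
buildFunctionPtrMap(const std::vector<int64_t> &FptrHostAddrs,
                    const HostToTargetMap &HostDataToTarget) {
  std::vector<__openmp_offload_function_ptr_map_ty> Table;
  Table.reserve(FptrHostAddrs.size());
  for (int64_t HostAddr : FptrHostAddrs) {
    auto It = HostDataToTarget.find(HostAddr);
    assert(It != HostDataToTarget.end() && "FPTR entry must already be mapped");
    Table.push_back({HostAddr, It->second});
  }
  std::sort(Table.begin(), Table.end(),
            [](const __openmp_offload_function_ptr_map_ty &A,
               const __openmp_offload_function_ptr_map_ty &B) {
              return A.host_ptr < B.host_ptr;
            });
  return Table;
}

// Device side: the kind of lookup the FE could emit before an indirect call.
// Plain binary search without the standard library, as device code often is.
// ASSUMPTIONS: the entry count is known, and unknown addresses pass through.
int64_t lookupDeviceFnPtr(const __openmp_offload_function_ptr_map_ty *Map,
                          size_t NumEntries, int64_t HostPtr) {
  if (Map == nullptr) // table not installed yet (initial value is 0)
    return HostPtr;
  size_t Lo = 0, Hi = NumEntries;
  while (Lo < Hi) {
    size_t Mid = Lo + (Hi - Lo) / 2;
    if (Map[Mid].host_ptr < HostPtr)
      Lo = Mid + 1;
    else
      Hi = Mid;
  }
  if (Lo < NumEntries && Map[Lo].host_ptr == HostPtr)
    return Map[Lo].tgt_ptr;
  return HostPtr;
}
```

The sketch leaves out the device allocation, the copy to device memory, and the plugin-specific assignment of `__openmp_offload_function_ptr_map`; it only shows why sorting on the host lets each indirect call on the device resolve its target with a logarithmic search instead of a linear scan.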