Emit IR metadata with list of all functions found during host CodeGen.
Details
- Reviewers
- None
Diff Detail
Event Timeline
This doesn't work the way I hoped :-(
The problem is that implicit declare target isn't determined during Sema and not present in AST. Instead this is handled in CodeGen (using addDeferredDeclToEmit if I understand rL327636 correctly).
So unless we reconsider something like D38798 the host doesn't know which functions are implicitly declare target and consequently can't propagate that information to the device compilation.
D38798 does not work correctly and there is no stable Sema-based solution to do this absolutely correctly. We need to adapt the codegen so that it emits metadata for the implicit declare target functions.
Best regards,
Alexey Bataev
18 авг. 2018 г., в 10:26, Jonas Hahnfeld via Phabricator <reviews@reviews.llvm.org> написал(а):
Hahnfeld added a comment.
This doesn't work the way I hoped :-(
The problem is that implicit declare target isn't determined during Sema and not present in AST. Instead this is handled in CodeGen (using addDeferredDeclToEmit if I understand https://reviews.llvm.org/rL327636 correctly).
So unless we reconsider something like https://reviews.llvm.org/D38798 the host doesn't know which functions are implicitly declare target and consequently can't propagate that information to the device compilation.
This revision correctly compiles the following two examples iff BREAK is not defined:
#include <stdint.h> static void cpuid(int i, uint32_t *regs) { asm volatile("cpuid" : "=a" (regs[0]), "=b" (regs[1]), "=c" (regs[2]), "=d" (regs[3]) : "a" (i), "c" (0)); } void host() { uint32_t regs[4]; cpuid(0, regs); } #pragma omp declare target void target() { #ifdef BREAK uint32_t regs[4]; cpuid(0, regs); #endif } #pragma omp end declare target
static void vla(int n) { int a[n]; } void target() { vla(123); #pragma omp target { #ifdef BREAK vla(123); #endif } }
It doesn't work for C++ yet because the LLVM function names are already mangled, so we need to do the same when checking in Sema. I think the tricky thing will be templates: Because we don't know which specialization will eventually be emitted we might need to check all of them if a template is referenced in device code. Thoughts?
I expected a lot of problems with C++ and I don't think we will be able to resolve this, especially with the templates. We need to try some other solutions.
I tend to agree.
My next idea was to "abuse" -fdelayed-template-parsing:
#include <stdint.h> template <typename T> void cpuid(int i, uint32_t *regs) { asm volatile("cpuid" : "=a" (regs[0]), "=b" (regs[1]), "=c" (regs[2]), "=d" (regs[3]) : "a" (i), "c" (0)); } template <typename T> void host() { uint32_t regs[4]; cpuid<T>(0, regs); } #pragma omp declare target void target() { #ifdef BREAK uint32_t regs[4]; cpuid<bool>(0, regs); #endif } #pragma omp end declare target
Clang will delay checking templates (= functions outside of declare target) until they are instantiated during CodeGen. However we also need to find all target directives to outline the offload entries, but we can't know this without actually parsing (and checking) all functions. So I don't think this will work either.