This is an archive of the discontinued LLVM Phabricator instance.

[WIP][OpenMP] Propagate 'declare target' information to device
AbandonedPublic

Authored by Hahnfeld on Aug 18 2018, 7:11 AM.

Details

Reviewers
None
Summary

Emit IR metadata with list of all functions found during host CodeGen.

Diff Detail

Event Timeline

Hahnfeld created this revision.Aug 18 2018, 7:11 AM

This doesn't work the way I hoped :-(

The problem is that implicit declare target isn't determined during Sema and not present in AST. Instead this is handled in CodeGen (using addDeferredDeclToEmit if I understand rL327636 correctly).
So unless we reconsider something like D38798 the host doesn't know which functions are implicitly declare target and consequently can't propagate that information to the device compilation.

D38798 does not work correctly and there is no stable Sema-based solution to do this absolutely correctly. We need to adapt the codegen so that it emits metadata for the implicit declare target functions.

Best regards,
Alexey Bataev

18 авг. 2018 г., в 10:26, Jonas Hahnfeld via Phabricator <reviews@reviews.llvm.org> написал(а):

Hahnfeld added a comment.

This doesn't work the way I hoped :-(

The problem is that implicit declare target isn't determined during Sema and not present in AST. Instead this is handled in CodeGen (using addDeferredDeclToEmit if I understand https://reviews.llvm.org/rL327636 correctly).
So unless we reconsider something like https://reviews.llvm.org/D38798 the host doesn't know which functions are implicitly declare target and consequently can't propagate that information to the device compilation.

https://reviews.llvm.org/D50939

Hahnfeld planned changes to this revision.Aug 20 2018, 2:55 AM

Okay, looks promising so far. I'll update this revision once ready for testing.

Hahnfeld updated this revision to Diff 162244.Aug 23 2018, 11:52 AM
Hahnfeld edited the summary of this revision. (Show Details)
Hahnfeld planned changes to this revision.Aug 23 2018, 11:59 AM

This revision correctly compiles the following two examples iff BREAK is not defined:

#include <stdint.h>

static void cpuid(int i, uint32_t *regs) {
  asm volatile("cpuid" : "=a" (regs[0]), "=b" (regs[1]), "=c" (regs[2]), "=d" (regs[3])
               : "a" (i), "c" (0));
}

void host() {
  uint32_t regs[4];
  cpuid(0, regs);
}

#pragma omp declare target
void target() {
#ifdef BREAK
  uint32_t regs[4];
  cpuid(0, regs);
#endif
}
#pragma omp end declare target
static void vla(int n) {
  int a[n];
}

void target() {
  vla(123);

#pragma omp target
{
#ifdef BREAK
  vla(123);
#endif
}
}

It doesn't work for C++ yet because the LLVM function names are already mangled, so we need to do the same when checking in Sema. I think the tricky thing will be templates: Because we don't know which specialization will eventually be emitted we might need to check all of them if a template is referenced in device code. Thoughts?

It doesn't work for C++ yet because the LLVM function names are already mangled, so we need to do the same when checking in Sema. I think the tricky thing will be templates: Because we don't know which specialization will eventually be emitted we might need to check all of them if a template is referenced in device code. Thoughts?

I expected a lot of problems with C++ and I don't think we will be able to resolve this, especially with the templates. We need to try some other solutions.

Hahnfeld abandoned this revision.Aug 26 2018, 6:16 AM

It doesn't work for C++ yet because the LLVM function names are already mangled, so we need to do the same when checking in Sema. I think the tricky thing will be templates: Because we don't know which specialization will eventually be emitted we might need to check all of them if a template is referenced in device code. Thoughts?

I expected a lot of problems with C++ and I don't think we will be able to resolve this, especially with the templates. We need to try some other solutions.

I tend to agree.

My next idea was to "abuse" -fdelayed-template-parsing:

#include <stdint.h>

template <typename T>
void cpuid(int i, uint32_t *regs) {
  asm volatile("cpuid" : "=a" (regs[0]), "=b" (regs[1]), "=c" (regs[2]), "=d" (regs[3])
               : "a" (i), "c" (0));
}

template <typename T>
void host() {
  uint32_t regs[4];
  cpuid<T>(0, regs);
}

#pragma omp declare target
void target() {
#ifdef BREAK
  uint32_t regs[4];
  cpuid<bool>(0, regs);
#endif
}
#pragma omp end declare target

Clang will delay checking templates (= functions outside of declare target) until they are instantiated during CodeGen. However we also need to find all target directives to outline the offload entries, but we can't know this without actually parsing (and checking) all functions. So I don't think this will work either.