This is an archive of the discontinued LLVM Phabricator instance.

[libomptarget] Implement hsa plugin
AbandonedPublic

Authored by JonChesterfield on Dec 11 2019, 6:26 PM.

Details

Summary

[libomptarget] Implement hsa plugin

This is a close approximation to the plugin we are using in production.
Differs in that a header containing constants is inlined and the file clang-formatted.
There are some todos in the source - Ron has volunteered to look at them.

It occurred to me that this doesn't have to be especially polished code as
it's only bridging the gap between amdgcn and atmi, so can't really be reused
by another architecture. On those grounds, it seems better to fix the remaining
todos in tree. Failing that, feedback is till appreciated.

This patch doesn't connect it to the top level cmake so it can't break the
build for other plugins. Early adopters can of course add the subdirectory
in the meantime.

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptDec 11 2019, 6:26 PM
JonChesterfield edited the summary of this revision. (Show Details)Dec 11 2019, 6:27 PM
JonChesterfield marked an inline comment as done.Dec 11 2019, 6:31 PM
JonChesterfield added inline comments.
openmp/libomptarget/plugins/hsa/src/rtl.cpp
12

These headers are part of why it isn't enabled by default - I'm not totally sure which dev libraries can be assumed to be installed when building llvm. ffi.h and gelf.h may not be on the list, in which case extra cmake logic or dropping the dependencies would be needed.

I'll go over this later. Added information wrt. the dependences below.

openmp/libomptarget/plugins/hsa/src/rtl.cpp
12

ffi should already have CMAKE flag: FFI_INCLUDE_DIR and FFI_LIBRARY_DIR, the other one might also, there is an include of gelf.h in openmp/libomptarget/plugins/generic-elf-64bit/src/rtl.cpp (the only one in llvm).

JonChesterfield marked an inline comment as done.Dec 13 2019, 6:39 AM
JonChesterfield added inline comments.
openmp/libomptarget/plugins/hsa/src/rtl.cpp
12

That's good, thanks.

I think the detection logic in the cmake will avoid trying to build this on systems that don't have the atmi library available, in which case enabling this by default will be safe.

jdoerfert added inline comments.Dec 13 2019, 9:53 AM
openmp/libomptarget/plugins/hsa/CMakeLists.txt
134

What are all the reasons we need AOMP here and not the LLVM/Clang we ship this with?

openmp/libomptarget/plugins/hsa/src/rtl.cpp
12

I would try it on such a system if possible ;)

162

For the record, no need to fix this now: As with the team limit env variable (see below), I think this has to be handled differently in 5.1. We need to deal with arbitrary team sizes, if the hardware doesn't we need to have a software loop.

620

Temporary fixes are the most enduring ones usually.

655

Assuming this is the same for all targets, I would think this stuff should live in a plugins/helper file. I mean, a way to iterate over all offload entries. What do you think?

1180

I'm fairly certain we have to get rid of this EnvMaxTeamsDefault. Btw. I cannot find it in my llvm version.

1186

Can we have a macro for this if (print_kernel_trace > 1) { fprint ... } scheme? Less control flow will make this easier to read.

JonChesterfield marked 7 inline comments as done.Dec 13 2019, 10:42 AM

Nice feedback, thanks!

openmp/libomptarget/plugins/hsa/CMakeLists.txt
134

The fundamental reason is the library depends on an ATMI library which ships with aomp, and I don't have a great handle on under what conditions one can bundle runtime libraries with llvm.

Some incidental reasons are that aomp clang has patches related to making OpenMP work on amdgcn that aren't upsteam, e.g. some constant handling that is partly inlined here.

openmp/libomptarget/plugins/hsa/src/rtl.cpp
12

Yes, will definitely do so. I have an intel/nvidia system behind a ssh server that is currently misbehaving, but that's a transient problem.

162

Do you happen to know what arbitrary is in this context? UINT32_MAX, UINT64_MAX? It's hopefully documented in the spec.

620

Quite. @ronlieb the links here are dead, do you happen to know what they referred to?

655

Sure - there's probably a number of common patterns between this and some other plugins that should be extracted. This particular one looks OK to me - it's only just shy of iterators.

1180

Ah, nice catch. The aomp branch has an abstraction over nvptx and amdgcn constants which gets used by clang and this plugin. This is one of the pieces I've missed by not actually compiling against trunk. Will fix.

1186

Reluctant to go for a macro but agree with the sentiment. How do you feel about

int errprint(const char *fmt, ...) {
  int rc = 0;
  if (print_kernel_trace > 1) {
    va_list args;
    va_start(args, fmt);
    rc = vfprintf(stderr, fmt, args);
    va_end(args);
  }
  return rc;
}
jdoerfert added inline comments.Dec 13 2019, 11:19 AM
openmp/libomptarget/plugins/hsa/CMakeLists.txt
134

OK, but people that have ATMI have aomp, correct? I mean, either they can build upstream LLVM with amdgcn offloading and everything is there or they can't build the offloading part.

Long term, as you mentioned at some point earlier, we have to minimize such LLVM duplication ;)

openmp/libomptarget/plugins/hsa/src/rtl.cpp
162

The spec is generally vague on the types of expressions but anything that fits in a uint64_t should be acceptable for us.

655

I am mostly interested in the ones that have a cross dependence on the driver/linker/embedding part. Full refactoring might be a long term goal but nothing required right now.

1186

Sure. Macro, variadic function, variadic template (if this wasn't C), all fine with me.

I'd call it dbg_trace, kernel_trace, or sth similar, errprint sounds like it is an error once it is hit.

JonChesterfield marked an inline comment as done.Dec 17 2019, 5:16 AM

I think next steps from here are:

  • get the file building against trunk
  • work out what the requirements on ATMI are - I think ROCm and AOMP can use the same version, but haven't tested carefully enough to be sure yet
  • clean up the fprintf(stderr...) noise
  • try to simplify the cmake searching logic

I think next steps from here are:

  • get the file building against trunk

Most important (IMHO).

  • work out what the requirements on ATMI are - I think ROCm and AOMP can use the same version, but haven't tested carefully enough to be sure yet

Important and should be on the web page at some point (https://clang.llvm.org/docs/OpenMPSupport.html)

  • clean up the fprintf(stderr...) noise

Not too important but should also not take too long.

  • try to simplify the cmake searching logic

Not too important right now (IMHO).

JonChesterfield marked an inline comment as done.EditedDec 17 2019, 6:57 PM

A fprintf simplification is at https://github.com/ROCm-Developer-Tools/llvm-project/pull/63, I'll update this diff once that lands.

openmp/libomptarget/plugins/hsa/src/rtl.cpp
1180

OK, so this one isn't a constant - it's another hook for pulling a value from the environment. Cuda/rtl.cpp does a lot of this too.

Comparing the two, both use a class named RTLDeviceInfoTy, but with different fields and different implementations. The cuda and hsa implementations look broadly similar in general - same functions, doing roughly the same things - but there's a lot of differences in the detail. I don't think the code size savings from pulling out common sequences would be worth coupling the two together.

A fprintf simplification is at https://github.com/ROCm-Developer-Tools/llvm-project/pull/63, I'll update this diff once that lands.

In case it helps (performance) we can wrap the body of the function in an ifdef #OMP_TARGET_DEBUG (I forgot the actual macro).

openmp/libomptarget/plugins/hsa/src/rtl.cpp
1180

Fair. We need to make sure this builds though, so the members are there.

jdoerfert added inline comments.Jan 8 2020, 7:37 AM
openmp/libomptarget/plugins/hsa/src/rtl.cpp
49

Maybe enum { GV_Warp_Size = 64, GV_Default_WG_Size = 256 ,... } ?

sri added a subscriber: sri.Feb 12 2020, 11:11 AM
JonChesterfield abandoned this revision.Aug 11 2020, 8:51 AM