This is an archive of the discontinued LLVM Phabricator instance.

Introduce an interface for target_impl that supports default implementations
AbandonedPublic

Authored by JonChesterfield on Oct 1 2019, 5:37 PM.

Details

Summary

Introduce an interface for target_impl that supports default implementations

Design goals:

  • No runtime indirection
  • Permit header-only implementations for inlining under nvcc
  • Permit default implementations
  • Catch various errors at compile time
  • Syntactically reasonable
  • Familiar to C++ developers

The API is an adaption of the curiously recuring template pattern, modified to
use static calls throughout. This gives the impl::Bits::pack syntax that matches
free functions in a namespace, which is essentially what the static functions in
a non-template class are.

Marking methods as = delete provides better diagnostics than missing symbols
at link time. The friend/private annotations are analogous to the non-virtual
interface of runtime dispatch.

Event Timeline

JonChesterfield created this revision.Oct 1 2019, 5:37 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 1 2019, 5:37 PM

nvptx/src/target_impl.h currently defines an implicit interface comprised of various free functions that abstracts over low level aspects of nvptx. I'd prefer a more explicit interface before introducing another target.

This is a big design space. I think this diff is a reasonable point, but failing that it gives us a starting point for discussion.

Could you motivate this is more? The template solution is for sure more complex than the old proposal/solution and still requires the same cmake tricks so I'd like to hear why it is worth it.

I'll also go through the bullet points you have (in order) and leave some feedback:

  • I don't see why there should be "runtime indirection" if you don't use the template stuff.
  • I also do not see why the "old design" would not allow header only (it was actually proposed that way after all).
  • I get the default implementations but (1) I doubt there are many (where do you expect default impls that are in the target_impl part?) and (2) we could deal with them with different headers, the same way as I proposed to work with different targets.
  • Unclear why declarations would not do the same, actually declarations would probably catch more than templates do.
  • Unsure what "syntactically reasonable" means.
  • Familiar to C++ developers, is not a good argument when the C++ stuff actually doesn't solve the problem (still cmake magic involved)
openmp/libomptarget/deviceRTLs/nvptx/src/target_api.h
34

It is odd to provide a default for pack but not for unpack, when would this ever be useful?

It also seems overly complicated to redirect here in the first place. Subclasses could as well provide pack/unpack directly, couldn't thy?

JonChesterfield marked an inline comment as done.Oct 2 2019, 3:24 AM

Could you motivate this is more? The template solution is for sure more complex than the old proposal/solution and still requires the same cmake tricks so I'd like to hear why it is worth it.

Thanks! If the end result of the review is we stick with free functions that's fine with me. I can imagine using the types to move some complexity out of cmake but haven't thought that through. I think the primary drawbacks to the current system are relying on cmake to work with multiple architectures and the requirement to implement every function for every architecture.

I'd like the target_impl layer to have default implementations where they're meaningful. It would mean a target can introduce a new function, along with a default, and every other target would continue to work as before, without changing any of the other target code.

  • popc can be implemented as a call to __builtin_popcount if one doesn't want to call the cuda function
  • Lanemask, smid probably don't have good defaults
  • Fences might do, in that localised ones could call through to more global ones
  • The dozen or so atomic operations used in deviceRTL could all have default implementations that call the standard C++ functions, while using the cuda calls when preferred

It's worth noting that the current list of free functions hits most of the bullet points in the diff, except that I can't see a way to provide compile time default implementations without some extra plumbing. Above template is a suggestion for said plumbing.

I'll also go through the bullet points you have (in order) and leave some feedback:

  • Thanks! I'll try the same format.
  • I don't see why there should be "runtime indirection" if you don't use the template stuff.
    • The most popular interface scheme in C++ involves virtual functions (usually with heap allocation). I don't trust devirtualisation, and eliding the heap allocation is messy.
  • I also do not see why the "old design" would not allow header only (it was actually proposed that way after all).
  • - It does. My preferred option is declarations in a header and implementations at link time, but that degraded codegen under nvcc.
  • I get the default implementations but (1) I doubt there are many (where do you expect default impls that are in the target_impl part?) and (2) we could deal with them with different headers, the same way as I proposed to work with different targets.
    • (1) All the bit level functions, all atomics. Partly papering over cuda as opposed to nvptx, but they're closely related. (2) is interesting - deferred to below
  • Unclear why declarations would not do the same, actually declarations would probably catch more than templates do.
    • I think the crtp approach moves some errors from link time to compile time. Mostly though I was referring to the private constructor / friend stuff - trying to make the interface harder to implement wrong. The = delete syntax was nice in that regard.
  • Unsure what "syntactically reasonable" means.
    • Mostly that the interface shouldn't be woven out of macros and code generators
  • Familiar to C++ developers, is not a good argument when the C++ stuff actually doesn't solve the problem (still cmake magic involved)
    • Not so much that the code should look like C++, more that the code shouldn't look totally alien to C++. E.g. there could be variable fields that map to arbitrary function calls, theadIdx.x style, but that would be a bad thing.

Above you said

we could deal with them with different headers, the same way as I proposed to work with different targets.

Please could you expand on that? I think the current multitarget plan is a common folder containing as much code as we can manage, with a target_impl.h in each target, where #include paths are set by cmake to look in the target subdir when compiling things in the common subdir. I can see a path to default functions that involves a separate header per function, where the existence of files on disk and some cmake determines which set are pulled into an aggregate header. That's not necessarily what you have in mind though.

openmp/libomptarget/deviceRTLs/nvptx/src/target_api.h
34

Sure. We'd either want defaults for both, or defaults for neither until a target is introduced that would use the defaults. Providing one of each is good for discussion but probably not how it should be committed.

The redirect costs some code in the base class but none in the subclass (well, other than the Impl suffix). I like the separation between the interface to the client and the interface to the target, but sure - there's lots of ways to wire things up.

Above you said

we could deal with them with different headers, the same way as I proposed to work with different targets.

Please could you expand on that? I think the current multitarget plan is a common folder containing as much code as we can manage, with a target_impl.h in each target, where #include paths are set by cmake to look in the target subdir when compiling things in the common subdir. I can see a path to default functions that involves a separate header per function, where the existence of files on disk and some cmake determines which set are pulled into an aggregate header. That's not necessarily what you have in mind though.

A header file per function, or set of function that belong together, was what I meant. I'm still unsure how much "target specific code" we want to provide as a default without it becoming completely target independent. If there is code to be shared now, I mean with hopefully soon two targets, we could just call it common code. Once we get to the situation with >2 targets and not all but some share some code, we can reevaluate and determine the best solution for the actual use case at hand. I'm not strictly against templates or overloading but these do not solve all problems but basically only the ones we do not face yet. Designing something for a future use case is generally to be avoided (IMO).

A header file per function, or set of function that belong together, was what I meant.

Cool. So something like kmpc_atomics.h in common, that can be #included or can be ignored based on the target's desires.

I'm still unsure how much "target specific code" we want to provide as a default without it becoming completely target independent.

The "target specific" idea can be refined a little. I think we have the following categories:

  • Operations that have to be done in asm or with target builtins (e.g. lanemask)
  • Operations that can be done in C, but the target wants to use target builtins or asm anyway (e.g. pack)
  • Operations that are done in C, but different architectures want different C (don't have any yet)

If there is code to be shared now, I mean with hopefully soon two targets, we could just call it common code. Once we get to the situation with >2 targets and not all but some share some code, we can reevaluate and determine the best solution for the actual use case at hand. I'm not strictly against templates or overloading but these do not solve all problems but basically only the ones we do not face yet. Designing something for a future use case is generally to be avoided (IMO).

That's fair. The 'default' I have in mind is essentially what amdgcn uses, as it could be used by other architectures without changes. However until such point as a third architecture is imminent (and indeed we don't have two yet), it's difficult to reliably distinguish common from target specific.

I'll leave this diff open for a while to see if it attracts more comments.

JonChesterfield abandoned this revision.Oct 24 2019, 11:33 PM