This is an archive of the discontinued LLVM Phabricator instance.

Factor architecture dependent code out of loop.cu
ClosedPublic

Authored by JonChesterfield on Aug 6 2019, 5:14 PM.

Details

Summary

[libomptarget] Factor architecture dependent code out of loop.cu

Related to the patch series starting D64217. Added subscribers to said series as reviewers. This effort is smaller in scope.

This patch factors out just enough architecture dependent code from loop.cu to allow the same source to be used with amdgcn, given a different target_impl.h. Testing is that the same bitcode (modulo variable names) is generated for libomptarget before and after the refactor, for nvptx and the out of tree amdgcn.

Event Timeline

JonChesterfield created this revision.Aug 6 2019, 5:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 6 2019, 5:14 PM
JonChesterfield edited the summary of this revision. (Show Details)Aug 6 2019, 5:15 PM
JonChesterfield marked 2 inline comments as done.Aug 6 2019, 5:45 PM

Couple of comments from me inline.

This is working from the branch at https://github.com/ROCm-Developer-Tools/llvm-project. I'm hoping to move the openmp repo incrementally towards a point where it makes few enough nvptx-specific assumptions that adding the amdgcn target only involves a different version of target_impl.h and a few lines of CMake. Currently our repo has six identical files, fourteen different between the src directories. I've tried to pick a representative starting point with loop.cu.

Feedback very welcome.

openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h
20

I would prefer to have declarations in this file and implementations in target_impl.cu. That works for amdgcn but the CMake for nvptx doesn't allow these to be inlined across translation units. This has the advantage that the bitcode library is unchanged. __inline__ was not sufficient for that with nvptx.

32

Several differences between nvptx and amdgcn follow from warp size. The wrapper around __ffs allows the source to call an overloaded function instead of #ifdef between __ffs and __ffsll. __SHFL_SYNC is currently defined in omptarget-nvptx.hand similarly needs different implementations, deferred for a future diff.

  • drop omptarget-nvptx include

I would suggest at first to come to an agreement on the design of this reworked library at first.

openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h
17–18

Better to use original INLINE macro defined in the project rather than to define the new one.

19

Why pointers? Use references.

32

uitn32_t->__kmpc_impl_lanemask_t?

  • address review comments
JonChesterfield marked 4 inline comments as done.Aug 7 2019, 7:37 AM

I would suggest at first to come to an agreement on the design of this reworked library at first.

Part of the motivation behind this change is that smaller diffs are easier to discuss. Hopefully this contributes to reaching said agreement. I think moving inline nvptx behind an interface is prerequisite for any movement towards sharing code between architectures.

openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h
17–18

I'd prefer that too, but INLINE maps to __inline__, rather than __forceinline__, and that leaves calls to these functions in the bitcode library for nvptx.

19

Habit. References look like pass by value at the call site so I tend to write out parameters as pointers. Changed.

ABataev added inline comments.Aug 7 2019, 7:42 AM
openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h
17–18

Then better to fix original INLINE macro and replace __inline__ with __forceinline__. I assume we'd like to inline all the functions.

JonChesterfield marked 2 inline comments as done.Aug 7 2019, 7:55 AM
JonChesterfield added inline comments.
openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h
17–18

That works for me. I suspect everything marked INLINE was intended to be inlined. Diff at D65876.

  • drop omptarget-nvptx include
  • address review comments
  • rebase, use INLINE from D65876

I'm fine with this, @ABataev ?

openmp/libomptarget/deviceRTLs/nvptx/src/loop.cu
392

I would prefer __SHFL_SYNC __ACTIVEMASK etc. also to be function calls to __kmpc_XXXX functions but I won't require it.

openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h
17–18

We can even have different definitions of INLINE if that becomes necessary.

40

int -> (u)int32_t ?

JonChesterfield marked 2 inline comments as done.Aug 7 2019, 1:00 PM
JonChesterfield added inline comments.
openmp/libomptarget/deviceRTLs/nvptx/src/loop.cu
392

Likewise. They're currently defined in omptarget-nvptx.h. I'd like to move them into target_impl and replace the macros with inline functions.

That'll raise the question of how to handle implementations for different cuda versions, which I'd like to avoid for this first patch.

openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h
40

The cuda functions (plus __ffsll etc) return int. I'd slightly prefer uint32_t (on the basis that these can't return a negative integer). I'm happy either way.

jdoerfert added inline comments.Aug 7 2019, 1:40 PM
openmp/libomptarget/deviceRTLs/nvptx/src/loop.cu
392

Agreed, let's do it later but eventually.

We can have the #ifdef cascade for cuda versions but that should be in the cuda subfolder (target_impl.h for example) and the "general logic" contains proper calls.

openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h
40

I think we should make it clear what we expect wrt. bit-width if we have an expectation. Given the (u)int32 floating around I'd say we do have some expectations/restrictions.

JonChesterfield marked an inline comment as done.Aug 8 2019, 3:15 AM
JonChesterfield added inline comments.
openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h
40

I think unsigned is clearer for bit manipulation functions. Width is mostly determined by warp/wavefront width, at least for these functions. Return width, inclined to follow IR intrinsics and match the argument.

There's a mix of signed and unsigned in the existing code, with implicit conversions between them. Amdgcn happens to use a slightly different mix. Considering that NextIter returns a uint64_t, what would you think of using uint32/64 for the functions in loop as well as in this header?

@ABataev, others, any concerns? If not, let's go ahead with this.

openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h
40

what would you think of using uint32/64 for the functions in loop as well as in this header?

I would prefer that. We might need a device specific type but (unsigned) int does make me worry every time.

Let's table this discussion for now to make progress on this.

I like the general approach,

@ABataev, others, any concerns? If not, let's go ahead with this.

Did you come to an agreement about the design of the new universal library? I would suggest starting with this.
We need to find a better way to files layout, design of the target-specific functions (template class with the specialization implementation for each particular target or just good old plain set of target-specific functions, controlled by the condition compilation), etc.
Then, I would suggest committing this new structure at first.

@ABataev, others, any concerns? If not, let's go ahead with this.

Did you come to an agreement about the design of the new universal library? I would suggest starting with this.

We have a proposal and no major complaints, I count that as agreement.

We need to find a better way to files layout,

See above.

design of the target-specific functions (template class with the specialization implementation for each particular target or just good old plain set of target-specific functions, controlled by the condition compilation), etc.

The design chosen here seems fine to me. Others didn't disagree. It is for sure a step in the right direction.

Then, I would suggest committing this new structure at first.

I think extracting the code makes more sense first. Also mentioned in the proposal and not objected.
The problem is we cannot really restructure as it is still interleaved with target dependent code.

@ABataev, others, any concerns? If not, let's go ahead with this.

Did you come to an agreement about the design of the new universal library? I would suggest starting with this.

We have a proposal and no major complaints, I count that as agreement.

I would suggest to ping one more time, maybe someone just missed it.

We need to find a better way to files layout,

See above.

design of the target-specific functions (template class with the specialization implementation for each particular target or just good old plain set of target-specific functions, controlled by the condition compilation), etc.

The design chosen here seems fine to me. Others didn't disagree. It is for sure a step in the right direction.

Then, I would suggest committing this new structure at first.

I think extracting the code makes more sense first. Also mentioned in the proposal and not objected.
The problem is we cannot really restructure as it is still interleaved with target dependent code.

JonChesterfield added a comment.EditedAug 8 2019, 9:35 AM

I'm wary of refactoring to support multiple architectures while there is only one in tree. It's too difficult to see where the abstractions should be, and it's usually easier to introduce said abstractions than to move them later.

^ obviously I'm proposing a refactor despite that - hoping to minimise risk by keeping the change small and the abstraction very thin.

Ping. I'd like to land this - Alexey?

jdoerfert accepted this revision.Aug 13 2019, 10:10 AM

No complaints on the list, none here. LGTM.

This revision is now accepted and ready to land.Aug 13 2019, 10:10 AM

What about namong convention here? Shall we use capital letters for the var names or it is fine as is? Also, did you come to an agreement about design, directory layout etc.?

Naming convention

What about namong convention here? Shall we use capital letters for the var names or it is fine as is?

Long term it's probably worth moving to the LLVM conventions throughout. Short term, I'd rather keep roughly the same style as the surrounding code. I believe that's what this patch does.

Also, did you come to an agreement about design, directory layout etc.?

Overall agreement on the final design is a work in progress. I believe there is agreement that we'll need at least one target specific header, so this patch follows the suggestion in D64217 and calls it target_impl.

At the moment, nvptx/src is the most sensible directory for it (as that's the only directory!). I think this is an uncontentious step in the right direction.

ABataev accepted this revision.Aug 13 2019, 12:03 PM

What about namong convention here? Shall we use capital letters for the var names or it is fine as is?

Long term it's probably worth moving to the LLVM conventions throughout. Short term, I'd rather keep roughly the same style as the surrounding code. I believe that's what this patch does.

Also, did you come to an agreement about design, directory layout etc.?

Overall agreement on the final design is a work in progress. I believe there is agreement that we'll need at least one target specific header, so this patch follows the suggestion in D64217 and calls it target_impl.

At the moment, nvptx/src is the most sensible directory for it (as that's the only directory!). I think this is an uncontentious step in the right direction.

Ok, LG

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 13 2019, 2:41 PM

Jon, please subscribe to openmp-commits so that commit emails get through immediately. Thanks!

Jon, please subscribe to openmp-commits so that commit emails get through immediately. Thanks!

Ah, so that's why I'm getting the rejection emails. Thanks. Subscribed.

The bounce message recommends emailing openmp-commits-owner@lists.llvm.org. It should probably mention subscribing to commits as well.