This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Refined CUDA plugin to put all CUDA operations into class
ClosedPublic

Authored by tianshilei1992 on Apr 11 2020, 11:20 AM.

Details

Summary

Current implementation mixed everything up so that there is almost no encapsulation. In this patch, all CUDA related operations are put into a new class DeviceRTLTy and only necessary functions are exposed. In addition, all C++ code now conforms with LLVM code standard, keeping those API functions following C style.

Diff Detail

Event Timeline

tianshilei1992 edited the summary of this revision. (Show Details)Apr 11 2020, 11:23 AM

Removed useless headers

First set of comments. We need to split this. I suggested some split points and follow ups.

openmp/libomptarget/include/omptarget.h
21 ↗(On Diff #256788)

Split this off, LGTM on this part

openmp/libomptarget/plugins/cuda/src/rtl.cpp
67

I imagine the above is all clang-formated, if so split it off, LGTM on that part. (keep the namsepace as part of the other patch)

85

I don't know where this should go right now but a global list sounds wrong. Follow up.

87

The above is defined in a different header. We should not redefine it here but include the appropriate header. Follow up.

114
  1. Can we replace the list with vectors please. We do not add or delete elements once created.
  2. Can we make it a template class parametric in the Context type. e.g., struct NVIDIADeviceDataTy : public DeviceDataTy<CUcontext> {}.
  3. These should all be unsigned types, right? We also should add explicit documentation for each. I mean NumThreads is some maximum I guess.
  4. These types need to go into a common header (see also 2)). Follow up though
163–164

These three lines can be removed if you move the getenv stuff into init and call resizeStreamPool there.

248

Once we have the common header, the StreamManager interface has to go there as well. We need generic functions for stream create/destory and set context can be done via a DeviceDataTy method.

278–279

Nit: It or better Entry

292

(Off topic: Why do we have a table member that needs to be updated when we use it, that seems wrong.)

336–337

No braces around return also above.

343

Nice

355

Style: I'd move the getenv calls into the condition if (const ... = getenv)

892

Nit: LLVM style "generally" avoids trailing comments. Maybe inline it: , /* EM_CUDA */ 190

1032

Add comments for the constants or define hem as before. /* ThreadNum */ or something is fine.

1035

I like the validation in the user facing functions. Makes the assertion location better too.

tianshilei1992 added inline comments.Apr 11 2020, 7:23 PM
openmp/libomptarget/plugins/cuda/src/rtl.cpp
67

Okay, got your point.

85

This is what it is originally. Basically I didn't change any logic in this patch. From its usage, looks like it is only for storage. Items will only be emplace_back to it and then get the pointer. That's it.

114
  1. Can we replace the list with vectors please. We do not add or delete elements once created.

Yes, we can do that.

  1. Can we make it a template class parametric in the Context type. e.g., struct NVIDIADeviceDataTy : public DeviceDataTy<CUcontext> {}.

I don't think so. Now we only know that CUDA has this context thing. We have no idea whether other platforms do. Other things like warp as well.

  1. These should all be unsigned types, right? We also should add explicit documentation for each. I mean NumThreads is some maximum I guess.

Yeah, they should be. Will change them correspondingly.

  1. These types need to go into a common header (see also 2)). Follow up though.

We should only put the most common part into a common header, but it seems that we only have three known common members: FuncGblEntries, NumTeams, and NumThreads.

163–164

Correct. Actually I would leave the getenv stuff here because the env has already been determined during the library is initialized. I mean, the global variable is constructed.

292

Well, that is what it is. We could probably think how to improve it later.

355

But here EnvStr are used twice...You would like to have two local variables? :-)

1035

You mean take the validation from isValidDeviceId to this function?

Update patch accordingly

tianshilei1992 marked 9 inline comments as done.Apr 11 2020, 7:31 PM

Use back omptarget_device_environmentTy to align with deviceRTLs

jdoerfert added inline comments.Apr 12 2020, 12:33 AM
openmp/libomptarget/plugins/cuda/src/rtl.cpp
85

I figured it was not you, just that it seems weird. I should have made that clearer.

114

If we need a "common/gpu" header we can do that as well. AMD will at least look similar enough to reuse the same code. They have some "context" even if it is a class we define in the AMD headers. You can even go as far and make

common/Device.h with the entries you mentioned and then common/GPUDevice.h with the template class template<typename CtxTy> GPUDeviceDataTy : public DeviceDataTy.

163–164

OK

355

If you do what I described the lifetime is limited to the conditional. There will be two variables, distinct but with the same name. That is for different reasons preferable, e.g., the variable is not available outside the conditional and it is not reused. Overall it is less complex.

1035

I mean it is good to have the asserts here as that is where user provide input which we can validate. I'd leave it as it is in this version.

tianshilei1992 marked 13 inline comments as done.Apr 12 2020, 10:25 AM
tianshilei1992 added inline comments.
openmp/libomptarget/plugins/cuda/src/rtl.cpp
114

This part is worth a new patch. Let's do it in next step.

Updated patch accordingly

jdoerfert accepted this revision.Apr 12 2020, 5:21 PM

I assume you tested this locally and it works as expected? If so, I think we can go ahead. One comment to address now below.

openmp/libomptarget/plugins/cuda/src/rtl.cpp
247

We check already in resizeStreamPool. Just return if it worked or not there.

870–871

You can make checkResult variadic in a separate commit to reuse it here. It could even be a macro if that makes it easier.

This revision is now accepted and ready to land.Apr 12 2020, 5:21 PM
tianshilei1992 marked 2 inline comments as done.Apr 12 2020, 5:45 PM
tianshilei1992 added inline comments.
openmp/libomptarget/plugins/cuda/src/rtl.cpp
247

Actually the failure in resizeStreamPool will not impact its return. Those checks would only print out some messages but not aborting the program. Actually, I'm very not sure whether we need to abort the whole program as long as one CUDA operation returns error. From my point of view, I think we should. WDYT?

870–871

Actually I'm thinking to improve it. Will do in another patch.

tianshilei1992 marked an inline comment as done.

Fixed a tiny compilation error

I assume you tested this locally and it works as expected? If so, I think we can go ahead. One comment to address now below.

Yep. I tested all unit tests and two benchmarks. Worked as expected.

This revision was automatically updated to reflect the committed changes.