Current implementation mixed everything up so that there is almost no encapsulation. In this patch, all CUDA related operations are put into a new class DeviceRTLTy and only necessary functions are exposed. In addition, all C++ code now conforms with LLVM code standard, keeping those API functions following C style.
First set of comments. We need to split this. I suggested some split points and follow ups.
|21 ↗||(On Diff #256788)|
Split this off, LGTM on this part
I imagine the above is all clang-formated, if so split it off, LGTM on that part. (keep the namsepace as part of the other patch)
I don't know where this should go right now but a global list sounds wrong. Follow up.
The above is defined in a different header. We should not redefine it here but include the appropriate header. Follow up.
These three lines can be removed if you move the getenv stuff into init and call resizeStreamPool there.
Once we have the common header, the StreamManager interface has to go there as well. We need generic functions for stream create/destory and set context can be done via a DeviceDataTy method.
Nit: It or better Entry
(Off topic: Why do we have a table member that needs to be updated when we use it, that seems wrong.)
No braces around return also above.
Style: I'd move the getenv calls into the condition if (const ... = getenv)
Nit: LLVM style "generally" avoids trailing comments. Maybe inline it: , /* EM_CUDA */ 190
Add comments for the constants or define hem as before. /* ThreadNum */ or something is fine.
I like the validation in the user facing functions. Makes the assertion location better too.
Okay, got your point.
This is what it is originally. Basically I didn't change any logic in this patch. From its usage, looks like it is only for storage. Items will only be emplace_back to it and then get the pointer. That's it.
Yes, we can do that.
I don't think so. Now we only know that CUDA has this context thing. We have no idea whether other platforms do. Other things like warp as well.
Yeah, they should be. Will change them correspondingly.
We should only put the most common part into a common header, but it seems that we only have three known common members: FuncGblEntries, NumTeams, and NumThreads.
Correct. Actually I would leave the getenv stuff here because the env has already been determined during the library is initialized. I mean, the global variable is constructed.
Well, that is what it is. We could probably think how to improve it later.
But here EnvStr are used twice...You would like to have two local variables? :-)
You mean take the validation from isValidDeviceId to this function?
I figured it was not you, just that it seems weird. I should have made that clearer.
If we need a "common/gpu" header we can do that as well. AMD will at least look similar enough to reuse the same code. They have some "context" even if it is a class we define in the AMD headers. You can even go as far and make
common/Device.h with the entries you mentioned and then common/GPUDevice.h with the template class template<typename CtxTy> GPUDeviceDataTy : public DeviceDataTy.
If you do what I described the lifetime is limited to the conditional. There will be two variables, distinct but with the same name. That is for different reasons preferable, e.g., the variable is not available outside the conditional and it is not reused. Overall it is less complex.
I mean it is good to have the asserts here as that is where user provide input which we can validate. I'd leave it as it is in this version.
I assume you tested this locally and it works as expected? If so, I think we can go ahead. One comment to address now below.
We check already in resizeStreamPool. Just return if it worked or not there.
You can make checkResult variadic in a separate commit to reuse it here. It could even be a macro if that makes it easier.
Actually the failure in resizeStreamPool will not impact its return. Those checks would only print out some messages but not aborting the program. Actually, I'm very not sure whether we need to abort the whole program as long as one CUDA operation returns error. From my point of view, I think we should. WDYT?
Actually I'm thinking to improve it. Will do in another patch.