This is an archive of the discontinued LLVM Phabricator instance.

[SE] Pack global dev handle addresses
ClosedPublic

Authored by jhen on Sep 13 2016, 4:16 PM.

Details

Summary

We were packing global device memory handles in
PackedKernelArgumentArray, but as I was implementing the CUDA
platform, I realized that CUDA wants the address of the handle, not the
handle itself. So this patch switches to packing the address of the
handle.

Diff Detail

Repository
rL LLVM

Event Timeline

jhen updated this revision to Diff 71256.Sep 13 2016, 4:16 PM
jhen retitled this revision from to [SE] Pack global dev handle addresses.
jhen updated this object.
jhen added a reviewer: jlebar.
jhen added subscribers: parallel_libs-commits, jprice.
jlebar added inline comments.Sep 13 2016, 4:26 PM
streamexecutor/include/streamexecutor/DeviceMemory.h
170 ↗(On Diff #71256)

Same question as last patch: What happens if this guy is moved?

Specifically, would it be a problem if it were moved after calling thenLaunch but before the driver actually launches the kernel?

jhen updated this revision to Diff 71268.Sep 13 2016, 4:48 PM
  • Warning about kernel launch args
streamexecutor/include/streamexecutor/DeviceMemory.h
170 ↗(On Diff #71256)

Yes, it would be a problem if the argument was moved at the wrong time. This was a choice we made internally to reduce the kernel launch overhead. Apparently it could make up 5% of some applications' run-time.

In response to your question, I added a warning message to the Stream::thenLaunch method. This message tells users not to touch kernel launch arguments from other threads.

jlebar added inline comments.Sep 13 2016, 4:54 PM
streamexecutor/include/streamexecutor/DeviceMemory.h
170 ↗(On Diff #71268)

Clearly in the multithreaded case you can't modify a GlobalDeviceMemoryBase concurrently with a call to thenLaunch(). That's true of any parameters to any function call, so I am not sure that's even worth warning about.

What I was worried about was a single-threaded case, something like this:

GlobalDeviceMemory<int> other_mem;
{  
  GlobalDeviceMemory<int> mem;
  Stream->thenLaunch(foo_kernel, mem);
  other_mem = std::move(mem);
}
Stream->block();

Is this safe? That is, do we use &mem.Handle only within thenLaunch? (We don't actually have to have a separate scope and so on for us to hit this same problem.)

(I have to admit, if any of this is 5% of some applications' runtime, it seems like we could do a lot better even than what we have here. I'm not sure how, but hot code is hot...)

jhen updated this revision to Diff 71273.Sep 13 2016, 5:00 PM
  • Remove warning
streamexecutor/include/streamexecutor/DeviceMemory.h
170 ↗(On Diff #71268)

Oh, that's right. I think we spoke about this before. There should be no problem with your example, the platform should be responsible for copying the arguments (using memcpy) before returning from a launch call. This will match the guarantee currently provided by the CUDA driver library.

I removed the unnecessary warning.

jlebar accepted this revision.Sep 13 2016, 5:01 PM
jlebar edited edge metadata.

I think we spoke about this before.

Oh, I remember this now. :) Okay, sgtm.

This revision is now accepted and ready to land.Sep 13 2016, 5:01 PM
This revision was automatically updated to reflect the committed changes.