This is an archive of the discontinued LLVM Phabricator instance.

[SE] Cache CUDA modules
Abandoned, Public

Authored by jhen on Sep 15 2016, 12:42 PM.

Details

Reviewers
jlebar
Summary

Instead of reloading a module if the same kernel is requested multiple times,
cache the loaded module and return the cached value.

The CUDAPlatformDevice now also keeps handles to all its modules so they can be
unloaded if the device is cleared.
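
The caching described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `ModuleCache`, `ModuleHandle`, `loadModule`, and `unloadModule` are hypothetical names, and the load and unload steps only bump counters instead of calling into the CUDA driver.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a driver module handle.
using ModuleHandle = int;

class ModuleCache {
public:
  // Returns the cached handle for Code, loading the module only on the
  // first request for that code.
  ModuleHandle getOrLoad(const std::string &Code) {
    auto It = Cache.find(Code);
    if (It != Cache.end())
      return It->second;
    ModuleHandle Handle = loadModule(Code);
    Cache.emplace(Code, Handle); // Copies Code into the map as the key.
    return Handle;
  }

  // Unloads every tracked module, as when the device is cleared.
  void clear() {
    for (const auto &Entry : Cache)
      unloadModule(Entry.second);
    Cache.clear();
  }

  std::size_t loadCount() const { return Loads; }
  std::size_t unloadCount() const { return Unloads; }

private:
  // Placeholder for the real load call (assumption: one handle per load).
  ModuleHandle loadModule(const std::string &) {
    ++Loads;
    return NextHandle++;
  }

  // Placeholder for the real unload call.
  void unloadModule(ModuleHandle) { ++Unloads; }

  std::unordered_map<std::string, ModuleHandle> Cache;
  ModuleHandle NextHandle = 1;
  std::size_t Loads = 0;
  std::size_t Unloads = 0;
};
```

Requesting the same code twice through `getOrLoad` triggers a single load, and `clear` unloads each cached module exactly once.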

Event Timeline

jhen updated this revision to Diff 71536. Sep 15 2016, 12:42 PM
jhen retitled this revision to [SE] Cache CUDA modules.
jhen updated this object.
jhen added a reviewer: jlebar.
jhen added subscribers: parallel_libs-commits, jprice.
jlebar added inline comments. Sep 15 2016, 1:22 PM
streamexecutor/lib/platforms/cuda/CUDAPlatformDevice.cpp, line 130

Hm. This makes a copy of "Code" in the map. Also, every time we do a lookup, we're going to have to compare whole PTX strings, which are potentially very long.

Is there no other identifier we could use as the map key?

jhen added inline comments. Sep 15 2016, 2:10 PM
streamexecutor/lib/platforms/cuda/CUDAPlatformDevice.cpp, line 130

Unfortunately, I don't think there is currently any other identifier that won't ever lead to false matches. Using the whole string as the key is what the original developers did because they couldn't find a better solution, so I'm mostly just following their lead here.

There are some things we could do with randomly generated UUIDs that would work for all practical purposes, but I don't want to worry about correctly generating UUIDs.

I also have the following idea, though it seems a little complex. Does it seem too complex to you?

Use a static integer with atomic increments (or mutex or whatever) to give a unique ID to each MultiKernelLoaderSpec instance in the process and then have each instance assign a unique ID to each piece of code that is registered with it. The pair of MultiKernelLoaderSpec ID and code ID will uniquely identify a piece of code and can be used as a key in the module cache.
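
A rough sketch of that ID scheme might look like the following. It is only an illustration under assumptions: `LoaderSpec` is a hypothetical stand-in for MultiKernelLoaderSpec reduced to its ID bookkeeping, and `CodeKey`, `registerCode`, and `KeyedModuleCache` are made-up names rather than anything in StreamExecutor.

```cpp
#include <atomic>
#include <cstdint>
#include <map>
#include <utility>

// Key for one piece of code: (spec ID, code ID within that spec).
using CodeKey = std::pair<std::uint64_t, std::uint64_t>;

// Hypothetical stand-in for MultiKernelLoaderSpec.
class LoaderSpec {
public:
  LoaderSpec() : SpecID(nextSpecID()) {}

  // Copying would duplicate the ID, so copies are disallowed in this sketch.
  LoaderSpec(const LoaderSpec &) = delete;
  LoaderSpec &operator=(const LoaderSpec &) = delete;

  // Registers a piece of code and returns its key. The per-instance
  // counter is not synchronized; this assumes one thread registers code
  // on a given instance.
  CodeKey registerCode() { return {SpecID, NextCodeID++}; }

  std::uint64_t id() const { return SpecID; }

private:
  // Process-wide counter with atomic increments, one ID per instance.
  static std::uint64_t nextSpecID() {
    static std::atomic<std::uint64_t> Counter{0};
    return Counter.fetch_add(1, std::memory_order_relaxed);
  }

  std::uint64_t SpecID;
  std::uint64_t NextCodeID = 0;
};

// Hypothetical module handle and a cache keyed by the small ID pair
// instead of the full PTX text.
using ModuleHandle = int;
using KeyedModuleCache = std::map<CodeKey, ModuleHandle>;
```

With a key like this, a cache lookup compares two integers rather than potentially long PTX strings, and the map no longer stores a copy of the code.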

jhen abandoned this revision. Oct 17 2016, 12:46 PM

We've decided to come at this problem from a different angle, so I'm abandoning this revision.