Added ptxwrap utility to help incorporating PTX into host-side object file.
AbandonedPublic

Authored by tra on Mar 17 2015, 2:26 PM.

Details

Reviewers
eliben
echristo
Summary

Added ptxwrap utility to help incorporating PTX into host-side object file.

Device-side CUDA compilation produces a text file containing PTX
assembly. For the GPU code to be usable, it must be passed to the GPU
driver, which then JIT-compiles it for the appropriate GPU hardware.

Currently we rely on the CUDA runtime to launch kernels from the host
side. The cudaLaunch() function takes the host-side address of the
kernel we want to launch and expects the corresponding GPU kernel to
have been registered with the CUDA runtime by the time the launch is
attempted.

Before we can register kernels, we have to load the GPU code, which is
expected to be in a 'fatbin' container.

ptxwrap takes a file with PTX assembly and encapsulates it in a 'fatbin'
container. If the -fatbin flag is passed, it produces a fatbin binary.
If the -stub argument is passed (the default), ptxwrap generates kernel
registration code which incorporates the fatbin bits as a string, loads
it, and registers all the kernels it finds in the PTX. The output can be
included in the host-side compilation or compiled and linked
separately.
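
To make the -stub description concrete, here is a minimal sketch of two steps it implies: finding kernel names in PTX text and turning raw bytes into a C string literal. This is not ptxwrap's actual code. The helper names findKernelNames and toCStringLiteral are hypothetical, and the scan is a plain token search for the PTX `.entry` directive rather than a real PTX parser, so a `.entry` inside a comment would be picked up by mistake.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from ptxwrap): collect the identifier that
// follows each ".entry" directive in a PTX string.
std::vector<std::string> findKernelNames(const std::string &ptx) {
  std::vector<std::string> names;
  const std::string directive = ".entry";
  std::size_t pos = 0;
  while ((pos = ptx.find(directive, pos)) != std::string::npos) {
    std::size_t i = pos + directive.size();
    pos = i; // continue the search after this match
    // The directive must be followed by whitespace, then the kernel name.
    if (i >= ptx.size() || !std::isspace(static_cast<unsigned char>(ptx[i])))
      continue;
    while (i < ptx.size() && std::isspace(static_cast<unsigned char>(ptx[i])))
      ++i;
    std::size_t start = i;
    while (i < ptx.size() &&
           (std::isalnum(static_cast<unsigned char>(ptx[i])) ||
            ptx[i] == '_' || ptx[i] == '$'))
      ++i;
    if (i > start)
      names.push_back(ptx.substr(start, i - start));
  }
  return names;
}

// Hypothetical helper (not from ptxwrap): render arbitrary bytes as a
// quoted C string literal. Bytes that are not plain printable characters
// become three-digit octal escapes, so a following digit can never be
// absorbed into the escape.
std::string toCStringLiteral(const std::string &bytes) {
  std::string out = "\"";
  for (unsigned char c : bytes) {
    bool plain = std::isprint(c) && c != '"' && c != '\\' && c != '?';
    if (plain) {
      out += static_cast<char>(c);
    } else {
      out += '\\';
      out += static_cast<char>('0' + ((c >> 6) & 7));
      out += static_cast<char>('0' + ((c >> 3) & 7));
      out += static_cast<char>('0' + (c & 7));
    }
  }
  out += "\"";
  return out;
}
```

A generated stub would then emit one registration call per name returned by findKernelNames and embed the literal produced by toCStringLiteral. The registration calls themselves are left out because they depend on CUDA runtime internals.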

Caveats: most fatbin parameters are currently hardcoded and were only
tested to work with CUDA-7.0 on sm_35 hardware.

Event Timeline

tra updated this revision to Diff 22130. Mar 17 2015, 2:26 PM
tra retitled this revision to "Added ptxwrap utility to help incorporating PTX into host-side object file."
tra updated this object.
tra edited the test plan for this revision.
tra added reviewers: eliben, echristo.
tra added a subscriber: Unknown Object (MLST).
rnk added a subscriber: rnk. Mar 17 2015, 2:47 PM

What do you think about busy-boxing this functionality into the clang binary? That way we don't have to copy around more statically linked binaries.

rnk added a comment. Mar 17 2015, 2:51 PM

I meant to add that you can see how we've done this for -cc1 and -cc1as in ExecuteCC1Tool of clang/tools/driver/driver.cpp.
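
For readers unfamiliar with the busybox approach, the sketch below shows one way a single binary can dispatch on its first argument. It is not the code in driver.cpp. The `-ptxwrap` mode flag, the function names, and the placeholder return codes are all hypothetical; only `-cc1` and `-cc1as` come from the comment above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-process tool entry points. The return values are
// placeholders that only identify which tool ran.
int runCC1(const std::vector<std::string> &) { return 1; }
int runCC1As(const std::vector<std::string> &) { return 2; }
int runPtxWrap(const std::vector<std::string> &) { return 3; }
int runDriver(const std::vector<std::string> &) { return 0; }

// Pick a tool based on the first argument after the program name.
// "-ptxwrap" is an assumed flag used for illustration only.
int dispatchTool(const std::vector<std::string> &args) {
  if (args.size() > 1) {
    const std::string &mode = args[1];
    // Pass the remaining arguments to the selected tool.
    std::vector<std::string> rest(args.begin() + 2, args.end());
    if (mode == "-cc1")
      return runCC1(rest);
    if (mode == "-cc1as")
      return runCC1As(rest);
    if (mode == "-ptxwrap")
      return runPtxWrap(rest);
  }
  // No recognized mode flag: behave as the regular driver.
  return runDriver(args);
}
```

With this layout the extra tool ships inside the existing binary, so no additional statically linked executable has to be distributed.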

echristo edited edge metadata. Mar 17 2015, 2:54 PM

I think I'm with Reid here. This seems a little unfortunate. Alternately have it as an output option for the object file for nvptx?

-eric

eliben edited edge metadata. Mar 17 2015, 6:47 PM

Agreed with Reid & Eric but I want to prod further - why is this a separate executable at all? Seems kinda wasteful to fork a tool if we can have a library call?

I may be completely misguided if the driver logic is absolutely against such things, but I think it's at least interesting to consider.

tra added a comment. Mar 18 2015, 1:50 PM

After discussing various options with echristo@ we may have a better way of bundling PTX with the host code.
Relevant bits of ptxwrap will be incorporated into clang.

tra abandoned this revision. Mar 18 2015, 1:52 PM

Scrapped.