This is an archive of the discontinued LLVM Phabricator instance.

[StreamExecutor] Add KernelLoaderSpec
ClosedPublic

Authored by jhen on Aug 1 2016, 3:02 PM.

Details

Summary

Add definitions for the KernelLoaderSpec and MultiKernelLoaderSpec
classes to StreamExecutor. Instances of these classes are generated by the
compiler in order to provide host code with a handle to device code.

Diff Detail

Event Timeline

jhen updated this revision to Diff 66390.Aug 1 2016, 3:02 PM
jhen retitled this revision from to [StreamExecutor] Add KernelLoaderSpec.
jhen updated this object.
jhen added reviewers: jlebar, tra.
jhen added a subscriber: parallel_libs-commits.
jlebar added inline comments.Aug 1 2016, 5:00 PM
streamexecutor/include/streamexecutor/KernelSpec.h
12

Nit: "A Foo object" or "Foo is a class".

13

suggest s/single/particular/

16

Oh, hm, I see why you said "A Foo class". I think maybe you want to say "FileLoaderSpec is the base class for types that know where to find the code."

29

I think this would be much easier to read if it were in second person. As-is the subject of the first phrase is MultiKernelLoaderSpec, which isn't right.

43

Would it be helpful to see how this function is implemented?

51

UserKernelLoaderSpec()?

52

This won't compile because MultiKernelLoaderSpec doesn't have this function.

92

I'm not sure we need this comment; it's pretty clear.

99

operator= usually returns the KernelLoaderSpec&. I think it will work as you've written it here [1], but maybe changing it will let others avoid having to look it up.

[1] http://en.cppreference.com/w/cpp/language/copy_assignment

102

Maybe "resides in memory as a null-terminated string."

103

Should this have "Spec" in the name?

107

A tuple as opposed to std::pair? pair also defines operator<, and seems simpler.

122

I don't really understand this -- we're only giving one PTX string, so of course it's going to be the default?

126

Since PTX source code usually (always?) has a directive at the beginning specifying the compute capability, it doesn't seem unreasonable for someone to assume that if they call this constructor, we will infer the CC from the given string.

It also seems weird that this constructor is the only way to specify a minimum-CC PTX string.

I wonder if we we need this constructor at all? If we had a public way to get the minimum CC, then we could just have the one constructor?

135

I wonder if it would make more sense to put the name first.

135

Using an initializer_list as opposed to, say, a vector, seems kind of limiting, since you can't dynamically construct an initializer_list.

(llvm::ArrayRef is really what we want, but I don't know if you want to force the compiler to pull that in.)

150

Since PTX is backwards-compatible, we want to return the PTX with the largest version <= the given compute-capability, right? Not clear from the comment if that's what we're doing.

155

Can we write this as (cc_major, cc_minor) and (2, 0), etc?

158

Not sure this comment is necessary here. It's kind of obvious, but if necessary maybe we can place it next to the code that uses this assumption?

178

Do you think it would make sense to take the size of the cubin data? I guess we don't ever need it so it doesn't matter.

Maybe I was mislead by the comments saying that we don't make a copy -- clearly we can't if we don't even know the length of the thing!

188

It seems kind of inconsistent that the PTX spec encompasses multiple PTX strings compiled for different archs, while the cubin one doesn't.

Unless you mean s/cubin/fatbin/? AIUI a cubin encompasses SASS code for a particular sm, while a fatbin is a combination of one or more cubins and ptx strings.

190

If this is null-terminated, maybe we should say that?

205

It looks like we do make a copy? That seems inconsistent though.

223

It seems like we could at least assert() that they have the same name?

264

extern "C" doesn't need a scope:

extern "C" void foo()

so maybe this is better said as 'if it is not declared extern "C".'

284

Same comments as on the constructors for those types.

294

Not sure these comments are necessary; seems obvious.

streamexecutor/lib/KernelSpec.cpp
28

Seems obvious from the code?

72

For chaining purposes it's more common to return *this (see e.g. operator=).

jhen updated this revision to Diff 66559.Aug 2 2016, 2:30 PM
jhen marked 28 inline comments as done.
  • Responding to comments from jlebar
streamexecutor/include/streamexecutor/KernelSpec.h
29

I didn't switch to second person, but I do think I cleaned up the language to make it much more understandable.

43

Oops, I shouldn't have put the parens here. This is supposed to be instantiating a MultiKernelLoaderSpec called UserKernelLoaderSpec.

51

I think the confusion here is the same as the point above. UserKernelLoaderSpec is a class instance, not a function to create a class instance.

52

I'm not sure what's causing the confusion here. It looks to me like I did declare MultiKernelLoaderSpec::addCUDACubinInMemory. Is that the function you are talking about?

122

Yes, this whole method doesn't really make sense. I removed it. I also removed the method for returning the "default" code. We can add that in later if we have a use case, but it's not needed for now.

126

We could add code to make this interface safer at the expense of more complexity, but I think the original design is the right trade-off because this is only meant to be called by the compiler.

150

I added to the comment to explain that there is no fallback logic in this function. If the exact compute capability is not found, it returns an error value. If we need that other functionality, we can add in another method later, but this is the only functionality that is present right now.

158

I removed the comment because the code that used it is gone now.

178

I don't think it would be good to take the size because we would never use it. I'm not sure what's best to put in the comment to reduce confusion. If the user thought it was a null-terminated string (which it isn't, but they might think it was), then they might be imagining that a copy is made even though no length is provided.

188

Good catch! What this code has been calling "cubin" is really "fatbin". That's how it is treated when it is loaded. So I changed the names to fatbin. Now they are accurate.

205

Oops, thanks, I missed that. I removed the copy to make it consistent with the others.

jhen added a comment.Aug 2 2016, 2:31 PM

In addition to responding to jlebar's comments, I also added unit tests.

jlebar added inline comments.Aug 2 2016, 2:55 PM
streamexecutor/include/streamexecutor/KernelSpec.h
29

lgtm

52

I have no idea what that comment was about, sorry. I think I was just very confused by that most vexing parse.

106

Not sure we need this line anymore.

120

Here and elsewhere, I think we should change "holds a reference to". For example, if I have a std::shared_ptr<Foo>, I "hold a (strong) reference to Foo", which is the opposite of what you mean.

Maybe just "Does not take ownership of the PTXCode pointers in SpecList's elements." or something would make it clear what we're going after.

126

I think the original design is the right trade-off because this is only meant to be called by the compiler.

sgtm

144

It's not a null-terminated string, I think. I thought it was at first, but I think I was being mislead by "char" -- it's more likely that the first 4 bytes are the length (or something).

178

Hm...what if we used char* for text strings and uint8_t* for byte data?

streamexecutor/lib/unittests/KernelSpecTest.cpp
26

ASSERT has an implicit "return", so we usually call EXPECT unless the test is going to crash if the condition fails.

74

EXPECT_DEBUG_DEATH and you don't need the ifdefs. Same below.

jhen updated this revision to Diff 66596.Aug 2 2016, 4:35 PM
jhen marked 8 inline comments as done.
  • Respond to jlebar's comments, round 2
streamexecutor/include/streamexecutor/KernelSpec.h
144

Oops. I just wrote that comment in the wrong place. You're right that the char types are making things confusing.

178

Yes, that is much better. How do you feel about void* rather than uint8_t? I think that would be good because the data is a formless blob.

streamexecutor/lib/unittests/KernelSpecTest.cpp
74

Ooooooo, that's nice. Thanks!

jlebar accepted this revision.Aug 2 2016, 4:39 PM
jlebar edited edge metadata.

All right!

streamexecutor/include/streamexecutor/KernelSpec.h
180

Even better.

This revision is now accepted and ready to land.Aug 2 2016, 4:39 PM
This revision was automatically updated to reflect the committed changes.