This is an archive of the discontinued LLVM Phabricator instance.

[cuda] Driver changes to build and stitch together host and device-side CUDA code.
ClosedPublic

Authored by tra on May 5 2015, 3:58 PM.

Details

Reviewers
echristo
Summary

The changes were extracted from D8463 to separate driver part from the rest.

  • Changed driver pipeline to compile host and device sides of CUDA files and pack device-side results into host object file.
  • Added tests for cuda pipeline creation in driver.

Differential Revision: http://reviews.llvm.org/D8463

Diff Detail

Event Timeline

tra updated this revision to Diff 24991.May 5 2015, 3:58 PM
tra retitled this revision from to [cuda] Driver changes to build and stitch together host and device-side CUDA code..
tra updated this object.
tra edited the test plan for this revision. (Show Details)
tra added a reviewer: echristo.
tra added a subscriber: Unknown Object (MLST).
echristo edited edge metadata.May 6 2015, 11:36 AM

Mostly requests for comments etc.

This is terrible, I just don't have any better ideas. :)

-eric

include/clang/Driver/Action.h
142

AtTopLevel needs to be documented more. It's not clear. Also the usage below.

lib/Driver/Driver.cpp
1146

"For each"

1174–1215

Needs some overarching documentation.

tools/libclang/CIndex.cpp
3096

Comment.

tra updated this revision to Diff 25082.May 6 2015, 1:37 PM
tra edited edge metadata.

More comments...

tra added a comment.May 6 2015, 1:38 PM

Added more comments.

include/clang/Driver/Action.h
142

Commented.

lib/Driver/Driver.cpp
1146

Fixed.

1174–1215

Added more comments describing what's happening in this function.

tools/libclang/CIndex.cpp
3096

Done.

Not really happy with the getTargetToolChain changes. Can you go ahead and revisit the necessity of those (and probably the naming of the StringRef argument as well)?

Thanks!

-eric

tra updated this revision to Diff 25209.May 7 2015, 10:46 AM

Added more comments.
Cleaned up job creation for CudaDeviceAction so it no longer uses computeTargetTriple directly.
Removed -nocudainc argument (which does not exist yet) from the test case RUN lines.

tra added a comment.May 7 2015, 11:04 AM

Not really happy with the getTargetToolChain changes. Can you go ahead and revisit the necessity of those (and probably the naming of the StringRef argument as well)?

getToolChain does two things -- calculates a triple based on default triple, command line args and optional DarwinArch and then selects a toolchain based on the triple. I needed toolchain selection by triple, so I've extracted it into getTargetToolChain.

Alternative approach would be to extend computeTargetTriple so that it can figure out that I need to get a NVPTX triple based on DarwinArchName (renamed to ArchName?). IMO it's not as clean as using getTargetToolchain(Triple) considering that we already know the triple.

I've changed the code so it no longer uses computeTargetTriple() directly (and removed forward declaration), but kept getTargetToolchain().

tra updated this revision to Diff 25225.May 7 2015, 1:32 PM

Fixed a problem with device-side inputs being added to host's cc1 command line twice.

tra added a comment.May 11 2015, 11:49 AM

Eric,

Did you get a chance to take a look at cleaned up getTargetToolChain() and ?

--Artem

tra added a comment.May 11 2015, 11:51 AM

Oops. Hit "Sent" accidentally..

Couple of inline comments, and I'm still not a huge fan of the getTargetToolChain bits. It seems like we should be able to have a single interface to getting a target toolchain that works for both cases. Maybe compute the triple up front for use in the rest of the driver for the host?

Good lord cuda is terrible. :)

-eric

include/clang/Driver/Driver.h
412–414

The comments no longer match the code here.

lib/Driver/Driver.cpp
806–807

Can you rearrange the code so that it doesn't need the forward declaration?

tra added a comment.Jul 7 2015, 1:27 PM

Couple of inline comments, and I'm still not a huge fan of the getTargetToolChain bits. It seems like we should be able to have a single interface to getting a target toolchain that works for both cases. Maybe compute the triple up front for use in the rest of the driver for the host?

Existing code commingles two independent things -- figuring out target triple value and retrieving corresponding ToolChain for a given Triple. My change just refactored out Triple->ToolChain mapping.

Good lord cuda is terrible. :)

'Quirky' would be a better word, perhaps.

include/clang/Driver/Driver.h
412–414

I believe it still does. My change just split StringRef -> Triple -> Toolchain conversion into StringRaf->Triple and Tuple->ToolChain without change in functionality.

lib/Driver/Driver.cpp
806–807

Nope. PrintActions1 and PrintActionList call each other recursively. At least one of them must have forward declaration.

dougk added a subscriber: dougk.Jul 8 2015, 9:44 AM
dougk added inline comments.
lib/Driver/Driver.cpp
806–807

you could move PrintActionList back into PrintActions1 so that the "gpu binaries" clause and final clause each fall into some common code that performs the equivalent of what is now your PrintActionList function. (i.e. don't use mutually recursive functions)

If nothing else, consider renaming PrintActionList so not to imply that it performs the printing, perhaps "ActionListAsString". And use a range-based for loop.

lib/Driver/ToolChains.h
702

CudaToolChain would be are more readily understandable name.

tra updated this revision to Diff 29269.Jul 8 2015, 10:36 AM
tra marked 3 inline comments as done.

Addressed dougk@ and echristo@ comments:

  • folded PrintActionList back into PrintActions1 and got rid of forward declaration.
  • gave cuda toolchain more descriptive name
  • minor formatting and test cases fixes.
tra added inline comments.Jul 8 2015, 10:40 AM
lib/Driver/Driver.cpp
806–807

Good point. Folded PrintActionList back into PrintActions1 as you suggested.

tra updated this revision to Diff 29477.Jul 10 2015, 1:00 PM

Removed getToolCahin changes that were reviewed/committed in D11105.

echristo accepted this revision.Jul 10 2015, 1:27 PM
echristo edited edge metadata.

This is a good start. We'll probably want to do some more refactoring, but this gets it in so we can look at it and make those decisions.

Make sure everything is formatted with clang-format, then LGTM.

-eric

lib/Driver/Driver.cpp
806–807

Bleh. OK.

lib/Driver/ToolChains.cpp
3651–3653

Formatting?

This revision is now accepted and ready to land.Jul 10 2015, 1:27 PM
tra updated this revision to Diff 29487.Jul 10 2015, 1:49 PM
tra edited edge metadata.

clang-formatted the patch. No functional changes.

tra closed this revision.Jul 20 2015, 11:43 AM

Committed in r242085.