Extracted from D8463 to separate codegen and driver changes.
- added the -fcuda-include-gpubinary option to incorporate the results of device-side compilation into the host-side compilation.
- added code generation to register GPU binaries and their associated kernels with the CUDA runtime and to clean up on exit.
- added test cases for init/deinit code generation.
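
The init/deinit pattern described above can be sketched roughly as follows. This is a simplified illustration, not the code emitted by the patch: MockRuntime, mockRegisterBinary, mockRegisterKernel, mockUnregisterBinary, moduleCtor, moduleDtor and the kernel names are hypothetical stand-ins, and no real CUDA runtime entry point is called. How the two functions get hooked into program startup and exit is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the CUDA runtime's bookkeeping (not a real API).
struct MockRuntime {
  std::vector<std::string> kernels; // names of kernels registered so far
  int liveBinaries = 0;             // GPU binaries currently registered
};

static MockRuntime &runtime() {
  static MockRuntime R;
  return R;
}

// Hypothetical registration entry points; the real runtime calls differ.
static int mockRegisterBinary(const char *Blob) {
  (void)Blob;                       // the mock does not inspect the blob
  return ++runtime().liveBinaries;  // non-zero handle for this binary
}

static void mockRegisterKernel(int Handle, const char *Name) {
  (void)Handle;
  runtime().kernels.push_back(Name);
}

static void mockUnregisterBinary(int Handle) {
  (void)Handle;
  --runtime().liveBinaries;
}

// Placeholder for the device-side compilation result that the option
// embeds into the host-side object.
static const char GpuBinary[] = "placeholder device code";
static int BinaryHandle = 0;

// Init: register the embedded GPU binary, then each kernel it contains.
static void moduleCtor() {
  BinaryHandle = mockRegisterBinary(GpuBinary);
  mockRegisterKernel(BinaryHandle, "kernelA"); // hypothetical kernel names
  mockRegisterKernel(BinaryHandle, "kernelB");
}

// Deinit: undo the registration on exit.
static void moduleDtor() {
  assert(BinaryHandle != 0 && "moduleCtor must run first");
  mockUnregisterBinary(BinaryHandle);
  BinaryHandle = 0;
}
```

Calling moduleCtor() once at startup and moduleDtor() once at exit leaves the mock runtime with no live binaries, which is the property the init/deinit tests are meant to cover.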
The complete patch and review are at http://reviews.llvm.org/D8463