The current implementation tries to guess which Action will result in a job that needs to incorporate device-side GPU binaries. The guessing was an attempt to work around the fact that multiple actions may be combined into a single compiler invocation. If CudaHostAction ended up being combined (and thus bypassed during action list traversal), none of the device-side actions it pointed to were processed. The guessing worked for most of the usual cases, but fell apart when an external assembler was used.
This change removes the guessing and makes sure we create and pass device-side jobs regardless of how the jobs get combined.
- CudaHostAction is always inserted either at the Compile phase or at the FinalPhase of the current compilation, whichever happens first.
- If selectToolForJob combines CudaHostAction with other actions, it passes information about the combined CudaHostAction up to the caller.
- When BuildJobsForActions sees that CudaHostAction got combined with other actions (and hence will never be passed to BuildJobsForActions itself), it creates device-side jobs the same way they would be created if CudaHostAction were passed to it directly.
- Added two more test cases to make sure GPU binaries are passed to the correct jobs.
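
For illustration only, the idea behind the list above can be sketched as a toy model. This is not the driver code: the Action and Job structs, selectInput, buildJobs, runToyPipeline, and the action names are all hypothetical stand-ins, and the chain shape (assemble, backend, CUDA host node, compile, input) is an assumption chosen to keep the example small. The sketch only shows that reporting a collapsed CUDA host node to the caller lets the combined job receive the same device binaries as the non-combined one.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a driver action (hypothetical, not the real class).
struct Action {
  enum Kind { Input, Compile, Backend, Assemble, CudaHost, CudaDevice };
  Kind kind;
  std::string name;
  const Action *input;                       // single host-side input, or nullptr
  std::vector<const Action *> deviceActions; // only meaningful for CudaHost
};

struct Job {
  std::string tool;
  std::vector<std::string> deviceBinaries;
};

// Toy analogue of tool selection: when combining, walk down to the Compile
// action and report any CudaHost node that was skipped on the way.
static const Action *selectInput(const Action *top, bool combine,
                                 const Action *&collapsedCHA) {
  collapsedCHA = nullptr;
  const Action *cur = top;
  if (!combine)
    return cur;
  while (cur->input && cur->kind != Action::Compile) {
    cur = cur->input;
    if (cur->kind == Action::CudaHost)
      collapsedCHA = cur; // remember it so the caller can still handle it
  }
  return cur;
}

static void collectDevice(const Action *cha, std::vector<std::string> &out) {
  for (const Action *dev : cha->deviceActions)
    out.push_back(dev->name);
}

static void buildJobs(const Action *a, bool combine, std::vector<Job> &jobs) {
  if (!a)
    return;
  std::vector<std::string> deviceBinaries;
  if (a->kind == Action::CudaHost) { // visited directly
    collectDevice(a, deviceBinaries);
    a = a->input;                    // continue with the real host action
  }
  if (!a || a->kind == Action::Input)
    return;
  const Action *collapsedCHA = nullptr;
  const Action *bottom = selectInput(a, combine, collapsedCHA);
  assert(bottom && "chain must end in a real action");
  if (collapsedCHA)                  // skipped during traversal: handle it here
    collectDevice(collapsedCHA, deviceBinaries);
  buildJobs(bottom->input, combine, jobs); // jobs for the inputs come first
  std::string tool = bottom == a ? a->name : bottom->name + ".." + a->name;
  jobs.push_back(Job{tool, deviceBinaries});
}

// Builds the assumed chain and returns the jobs produced for it.
static std::vector<Job> runToyPipeline(bool combine) {
  Action device{Action::CudaDevice, "device-gpu-binary", nullptr, {}};
  Action input{Action::Input, "input", nullptr, {}};
  Action compile{Action::Compile, "compile", &input, {}};
  Action cudaHost{Action::CudaHost, "cuda-host", &compile, {&device}};
  Action backend{Action::Backend, "backend", &cudaHost, {}};
  Action assemble{Action::Assemble, "assemble", &backend, {}};
  std::vector<Job> jobs;
  buildJobs(&assemble, combine, jobs);
  return jobs;
}
```

In this model, runToyPipeline(false) yields three jobs with the device binary attached to the compile job, while runToyPipeline(true) yields a single combined job carrying the same device binary. Dropping the collapsedCHA handling in buildJobs reproduces the kind of loss described above: the combined job would end up with no device binaries.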