When compiling CUDA, we run the frontend N times, once for each device
arch. This means that if you have a compile error in your file, you'll
see that error N times.
Relatedly, if ptxas fails, we print its error and then still try to
pass its output to fatbinary, which in turn fails because its input
file doesn't exist.
With this patch, a compilation run with -stop-on-failure stops as soon
as we encounter the first error. -stop-on-failure is turned on by
default for CUDA compilations.
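
As a rough illustration of the behavior described above (this is not the
driver's code), the Python sketch below models a compilation as an ordered
list of jobs and stops at the first failing job when stop_on_failure is
set. The Job type, run_jobs, and the sm_35/sm_50 job names are all
hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Job:
    """One step of a hypothetical compilation pipeline."""
    name: str
    run: Callable[[], int]  # returns an exit code; 0 means success


def run_jobs(jobs: List[Job], stop_on_failure: bool) -> Tuple[List[str], List[str]]:
    """Run jobs in order and return (names executed, names that failed).

    With stop_on_failure set, no job after the first failing one is run.
    """
    executed: List[str] = []
    failed: List[str] = []
    for job in jobs:
        executed.append(job.name)
        if job.run() != 0:
            failed.append(job.name)
            if stop_on_failure:
                break
    return executed, failed


# Hypothetical pipeline in which the frontend fails for every device arch.
pipeline = [
    Job("frontend sm_35", lambda: 1),
    Job("frontend sm_50", lambda: 1),
    Job("ptxas", lambda: 0),
    Job("fatbinary", lambda: 0),
]
```

Calling run_jobs(pipeline, stop_on_failure=False) reports the same
frontend failure once per device arch and still runs the later jobs,
while stop_on_failure=True ends the run after the first failing job.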