By convention the default output file for -E is "-" (stdout).
Tools like ccache rely on this: ccache uses the output
of -E to determine whether a file or any of its dependencies have changed.
Currently clang does not use stdout as the default output file for -E
for HIP, which breaks ccache.
This patch fixes that.
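
For context, a rough sketch of why the stdout default matters to a caching tool. This is not ccache's actual implementation, only a simplified illustration of the idea: run the preprocessor, capture whatever it writes to stdout, and hash it. The function name preprocessed_digest and the example command line are hypothetical.

```python
import hashlib
import subprocess


def preprocessed_digest(cmd):
    """Run cmd and return a SHA-256 digest of everything it writes to stdout.

    cmd is assumed to be a preprocessor invocation given as a list, for
    example ["clang", "-E", "foo.hip"] (hypothetical file name). If the
    compiler writes the preprocessed source to a file instead of stdout,
    the captured output is empty and the digest no longer reflects the
    source or its headers.
    """
    result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    return hashlib.sha256(result.stdout).hexdigest()
```

Two runs that produce identical preprocessed output yield the same digest, so a cached object can be reused; any change in the source or an included header changes the digest.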
What does it mean for -E to be used when we compile for the host and multiple devices? I believe that for CUDA, clang errors out unless there's only one sub-compilation. What does HIP do when it's run with -E -o - ?
Looks like CUDA (and maybe HIP, too) has a bug there. -E will run the preprocessor on all sub-compilations. -E -o - will error out, claiming that you can't use -o with multiple output files, even though -### shows the same -o - in all sub-compilations in both cases.