Introduce -fgpu-default-stream={legacy|per-thread} option to
support per-thread default stream for HIP runtime.
When -fgpu-default-stream=per-thread is specified, HIP kernels
are launched through hipLaunchKernel_spt instead of
hipLaunchKernel. In addition, -DHIP_API_PER_THREAD_DEFAULT_STREAM
is passed to clang -cc1 to enable the other per-thread stream
APIs.
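
To illustrate how a macro such as HIP_API_PER_THREAD_DEFAULT_STREAM can switch API calls to their per-thread variants, here is a minimal sketch. It is an assumption about the general technique, not a copy of the HIP headers: demoLaunchKernel, demoLaunchKernel_spt, and launch are hypothetical stand-ins that only return a label, so the example compiles without a HIP installation.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the legacy and per-thread entry points.
// The real declarations live in the HIP runtime headers.
inline std::string demoLaunchKernel() { return "legacy"; }
inline std::string demoLaunchKernel_spt() { return "per-thread"; }

// Assumed redirection pattern: when the driver passes
// -DHIP_API_PER_THREAD_DEFAULT_STREAM, calls that spell the legacy
// name are rewritten to the _spt variant by the preprocessor.
#if defined(HIP_API_PER_THREAD_DEFAULT_STREAM)
#define demoLaunchKernel demoLaunchKernel_spt
#endif

// User code keeps the legacy spelling; the macro decides which
// entry point is actually called.
inline std::string launch() { return demoLaunchKernel(); }
```

Compiled without the macro, launch() resolves to the legacy stand-in. Compiled with -DHIP_API_PER_THREAD_DEFAULT_STREAM, the same source resolves to the _spt stand-in, which mirrors the intent of -fgpu-default-stream=per-thread for the non-kernel-launch APIs.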
Naming, as usual, is hard. :-)
Considering that the option tweaks codegen, it may be appropriate to use the -f prefix.
On the other hand, --default-stream is the option used by NVCC, so it may be more familiar to end users.
The third option would be to use --gpu-default-stream or -fgpu-default-stream.
I'm leaning towards -fgpu-default-stream.
WDYT?