Currently, the HIP toolchain calls clang to emit bitcode and then calls opt/llc for device compilation in the default
-fno-gpu-rdc case. This is unnecessary, since clang is able to compile a single source file to ISA.
This patch fixes the HIP action builder and toolchain so that the default -fno-gpu-rdc case is handled like a canonical
toolchain, i.e. one clang -cc1 invocation compiles source code to ISA.
This avoids spawning unnecessary processes, which speeds up compilation, and avoids redundant LLVM passes that are
performed in both clang -cc1 and opt.
This patch does not remove opt/llc in the -fgpu-rdc case, since device linking is still needed and the
amdgpu backend does not support ISA linking for now.
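
As a rough illustration of the change, the device-side steps for one source file can be modeled as below. This is only a sketch: the function name, the `patched` parameter, and the step labels are hypothetical and are not driver code or real command lines. The placement of the device link step before opt/llc in the -fgpu-rdc case is also an assumption.

```python
from typing import List


def device_compile_steps(gpu_rdc: bool = False, patched: bool = True) -> List[str]:
    """Return an illustrative list of device-side steps for one source file.

    Hypothetical model only: the labels describe what each step does,
    they are not actual tool invocations.
    """
    if not gpu_rdc and patched:
        # Default -fno-gpu-rdc after this patch: a single clang -cc1
        # invocation compiles the source straight to ISA.
        return ["clang -cc1: source -> ISA"]

    # Otherwise clang stops at bitcode and opt/llc finish the job.
    steps = ["clang -cc1: source -> bitcode"]
    if gpu_rdc:
        # -fgpu-rdc still needs device linking, done on bitcode because
        # the backend cannot link ISA (ordering here is an assumption).
        steps.append("device link: bitcode -> linked bitcode")
    steps += ["opt: optimize bitcode", "llc: bitcode -> ISA"]
    return steps


if __name__ == "__main__":
    for rdc in (False, True):
        for patched in (False, True):
            print(f"gpu_rdc={rdc}, patched={patched}: {device_compile_steps(rdc, patched)}")
```

In this model the patch only changes the -fno-gpu-rdc path, where three steps collapse into one; the -fgpu-rdc path is the same with or without the patch.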
The comment about "create link action" should probably be moved down to where the link action is now constructed.