Enable -amdgpu-internalize-symbols to eliminate unused functions and global variables
for whole program to speed up compilation and improve performance.
For -fno-gpu-rdc, -amdgpu-internalize-symbols is passed to clang -cc1.
For -fgpu-rdc, -amdgpu-internalize-symbols is passed to lld.
We should probably add an lld driver flag for this rather than relying on the backend option? Is there some regular way to specify internalization that could add the target pass instead?