[NVPTX] make load on global readonly memory to use ldg
Summary:
As describe in [1], ld.global.nc may be used to load memory by nvcc when
restrict is used and compiler can detect whether read-only data cache
is safe to use.
This patch will try to check whether ldg is safe to use and use them to
replace ld.global when possible. This change can improve the performance
by 18~29% on affected kernels (ratt*_kernel and rwdot*_kernel) in
S3D benchmark of shoc [2].
Patched by Xuetian Weng.
[1] http://docs.nvidia.com/cuda/kepler-tuning-guide/#read-only-data-cache
[2] https://github.com/vetter/shoc
Test Plan: test/CodeGen/NVPTX/load-with-non-coherent-cache.ll