The CUDA driver can unroll loops when JIT-compiling PTX. To ensure that
a loop marked with llvm.loop.unroll.disable is not unrolled by the CUDA
driver, we need to emit .pragma "nounroll" at the header of that loop.
This patch also factors the lookup of unroll metadata in the loop ID
metadata out into a shared helper function.
FYI, the loop unrolling pass should replace instances of
"llvm.loop.unroll.count 1" (generated from "#pragma unroll 1") with
llvm.loop.unroll.disable.
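
For illustration only, here is a minimal sketch of the kind of source
loop this change is about. The function name and loop body are
hypothetical and do not come from the patch. Compilers that do not
recognize the pragma simply ignore it, so the snippet also builds as
plain host C++.

```cpp
// Hypothetical example: sums the first n elements of data.
int sum_no_unroll(const int *data, int n) {
  int total = 0;
  // Request that this loop not be unrolled. Per the description above,
  // once the loop carries llvm.loop.unroll.disable, the NVPTX backend is
  // expected to emit
  //     .pragma "nounroll";
  // at the loop header, so the CUDA driver's JIT leaves the loop rolled.
#pragma unroll 1
  for (int i = 0; i < n; ++i)
    total += data[i];
  return total;
}
```

Inspecting the generated PTX for a device version of such a loop is one
way to check that the directive appears at the loop header.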