Add the int_amdgcn_s_buffer_load_imm intrinsic for gfx9+, similar
to the existing int_amdgcn_s_buffer_load, but with an immediate
instruction offset.
This exposes an immediate field of the instruction to the front-ends
and can potentially help generate better code, especially for
complex address expressions where the offset is computed in
a different block than the load.
Basic handling is also added in the global-isel path, with a fall-back
to the old intrinsic. It is not clear at this point whether
the new intrinsic will help global-isel.
Document whether there is an upper bound to the offset argument, or whether codegen will handle it.