[AMDGPU] Implement hardware bug workaround for image instructions

This implements a workaround for a hardware bug in gfx8 and gfx9,
where register usage is not estimated correctly for image_store and
image_gather4 instructions when D16 is used.

Change-Id: I4e30744da6796acac53a9b5ad37ac1c2035c8899
I know clang-format really wants to pack these onto a single line, but that's a terrible idea and you shouldn't listen to it.