InstCombineSimplifyDemanded: Allow v3 results for AMDGCN buffer and image intrinsics
This helps to avoid the situation where RA spots that only 3 of the
v4f32 result of a load are used, and immediately reallocates the 4th
register for something else, requiring a stall waiting for the load.
Differential Revision: https://reviews.llvm.org/D58906