[AMDGPU] Unroll preferences improvements
Exit loop analysis early if suitable private access found.
Do not account for GEPs which are invariant to loop induction variable.
Do not account for Allocas which are too big to fit into register file anyway.
Add option for tuning: -amdgpu-unroll-threshold-private.
Differential Revision: https://reviews.llvm.org/D29473