Exit loop analysis early if a suitable private access is found.
Do not account for GEPs that are invariant with respect to the loop induction variable.
Do not account for allocas that are too big to fit into the register file anyway.
Add two options for tuning: -amdgpu-unroll-threshold and -amdgpu-unroll-threshold-private.
Fix the AMDGPUTTIImpl::getNumberOfRegisters() query to return the correct number of registers on pre-VI targets.
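
The first three items can be illustrated with a toy model. This is only a sketch of the filtering logic as described above, not the code in this patch: PrivateAccess, UnrollDecision, pickThreshold and all of their fields are hypothetical names, and the real analysis works on LLVM IR rather than on a flat list.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one GEP into private memory found in a loop body.
struct PrivateAccess {
    bool dependsOnInductionVar;     // false: the address is loop-invariant
    std::uint64_t allocaSizeBytes;  // size of the underlying alloca
};

// Hypothetical result: the chosen threshold plus how many accesses were
// inspected, so the early exit is observable.
struct UnrollDecision {
    unsigned threshold;
    unsigned accessesInspected;
};

// Pick an unroll threshold for a loop (assumed logic, for illustration only).
UnrollDecision pickThreshold(const std::vector<PrivateAccess>& accesses,
                             unsigned defaultThreshold,
                             unsigned privateThreshold,
                             std::uint64_t registerFileBytes) {
    UnrollDecision decision{defaultThreshold, 0};
    for (const PrivateAccess& access : accesses) {
        ++decision.accessesInspected;
        // Skip GEPs that do not vary with the loop induction variable.
        if (!access.dependsOnInductionVar)
            continue;
        // Skip allocas that cannot fit into the register file anyway.
        if (access.allocaSizeBytes > registerFileBytes)
            continue;
        // A suitable private access was found: raise the threshold and
        // stop analyzing the rest of the loop.
        decision.threshold = privateThreshold;
        break;
    }
    return decision;
}
```

In this model the two thresholds play the role of the values set by -amdgpu-unroll-threshold and -amdgpu-unroll-threshold-private. Any concrete numbers passed in are arbitrary and are not the defaults of those options.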