Move it up with the other module passes. It's a higher-level optimization
that should probably run before we start hacking up the IR for codegen,
and ideally it would run even earlier than this. We could possibly move it
in with the other IPO passes, but then we'd have to stop inferring the
absence of lds.kernel.id calls and instead have the LDS module pass mark
the functions which don't need the ID.
The one test change is because that pass relies on the backend run
of SROA (which we ideally wouldn't have).
nit: Do you know why there are more registers being used?
We go up to s31 for the call at the end, so I guess the SGPR usage is the same overall; I'm just wondering.