Function calls are now well supported. We also used to need this to
handle LDS uses in functions, but that should work now too.
This also removes the stress calls option; keeping it would require
moving the flag to the TargetMachine, and I no longer think it's useful.
We should probably drop the amdgpu-function-calls option too, but I
think hipcc is still using it.
This trims the pass list nicely (I'm surprised this saved a dominator
and alias analysis run).
Currently the inliner boosts the inline threshold for functions that are passed allocas as arguments (ArgAllocaCost in AMDGPUTargetTransformInfo.cpp).
Should that inline bonus be removed too?