AMDGPU only supports direct calls, but at lower optimization levels it fails to lower calls that are statically direct yet appear indirect because the callee is wrapped in a bitcast, e.g. calls of the form call ... bitcast (... @func to ...)(...).
Update CallPromotionUtils to handle both bitcasts and pointer casts of argument and return types, and use it in an AMDGPU pass to promote all possible calls before inlining.
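
As a rough illustration of the "statically direct but appears indirect" case, the sketch below models it with a toy value type. Everything in it (Value, stripCasts, getDirectCallee) is a hypothetical stand-in invented for this example, not LLVM's classes and not the code in this patch. It only shows the idea of looking through a chain of bitcasts to find a known function.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for an IR value. NOT LLVM's real classes.
struct Value {
    enum class Kind { Function, BitCast, Argument };
    Kind kind;
    std::string name;      // used by Function and Argument
    const Value* operand;  // used by BitCast only, nullptr otherwise
};

// Peel off any chain of bitcasts and return the underlying value.
const Value* stripCasts(const Value* v) {
    while (v->kind == Value::Kind::BitCast) {
        assert(v->operand && "a bitcast must wrap another value");
        v = v->operand;
    }
    return v;
}

// Return the statically known callee, or nullptr if the call is
// genuinely indirect (e.g. the callee is a function pointer argument).
const Value* getDirectCallee(const Value* calledOperand) {
    const Value* v = stripCasts(calledOperand);
    return v->kind == Value::Kind::Function ? v : nullptr;
}
```

The sketch covers only callee detection. Promoting the call also requires casting the arguments and the return value to the callee's actual types, which is the part CallPromotionUtils is updated to handle.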