The previous patch added an argument to the __tgt_target_kernel
runtime function that carries the trip count used for the loop clause.
The trip count was originally passed in through a separate call to
__kmpc_push_target_tripcount. This patch moves that logic into the
kernel launch itself, which removes the need for the push function.
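
For illustration, the difference between the two schemes can be sketched roughly as below. This is a simplified model and not the libomptarget code: KernelArgsSketch, pushTripcountSketch, launchOldSketch and launchNewSketch are hypothetical stand-ins, and keying the side table by device id is an assumption made only to keep the example short.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for the kernel arguments passed at launch.
// The real structure carries more fields; only the trip count matters here.
struct KernelArgsSketch {
  int32_t NumArgs = 0;
  uint64_t Tripcount = 0; // 0 is treated as "unknown" in this sketch
};

// Old scheme (sketch): a separate push call stores the trip count in a side
// table, and the launch has to look it up and clear it afterwards.
static std::map<int64_t, uint64_t> PendingTripcounts;

void pushTripcountSketch(int64_t DeviceId, uint64_t Tripcount) {
  PendingTripcounts[DeviceId] = Tripcount;
}

uint64_t launchOldSketch(int64_t DeviceId) {
  uint64_t Tripcount = 0;
  auto It = PendingTripcounts.find(DeviceId);
  if (It != PendingTripcounts.end()) {
    Tripcount = It->second;
    PendingTripcounts.erase(It); // consume the pushed value
  }
  return Tripcount; // value the launch would hand to the device plugin
}

// New scheme (sketch): the trip count travels with the launch arguments, so
// no side table and no extra runtime call are needed.
uint64_t launchNewSketch(int64_t DeviceId, const KernelArgsSketch &Args) {
  (void)DeviceId; // unused in this sketch
  return Args.Tripcount;
}
```

In this model the old path needs two calls (push, then launch) and shared state between them, while the new path is a single call whose arguments already contain the trip count.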
Depends on D128816
Doesn't target already do this? And we should get rid of this in a follow-up.