The previous patch added an argument to the __tgt_target_kernel
runtime function that includes the tripcount used for the loop clause.
The tripcount was previously passed in through a separate call to
__kmpc_push_target_tripcount. This patch moves that logic into the
kernel launch itself, which removes the need for the push function.
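
As a rough illustration of the change, the self-contained C sketch below
contrasts the two schemes. Every name in it (mock_kernel_args,
mock_push_tripcount, mock_launch_old, mock_launch_new) is a hypothetical
stand-in. It does not reproduce the real libomptarget signatures or the
real kernel-arguments layout.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel-arguments struct.
   This is NOT the real libomptarget layout. */
typedef struct {
  int32_t num_args;   /* number of mapped arguments (unused here) */
  uint64_t tripcount; /* loop tripcount, 0 if unknown */
} mock_kernel_args;

/* Old scheme: the tripcount is stashed in side state by a separate
   call and picked up by the next launch. */
static uint64_t pending_tripcount = 0;

static void mock_push_tripcount(uint64_t tripcount) {
  pending_tripcount = tripcount;
}

static uint64_t mock_launch_old(void) {
  uint64_t tripcount = pending_tripcount;
  pending_tripcount = 0; /* consumed by this launch */
  return tripcount;      /* value the launch would use */
}

/* New scheme: the tripcount travels with the launch arguments, so no
   side state and no extra call are needed. */
static uint64_t mock_launch_new(const mock_kernel_args *args) {
  return args->tripcount;
}
```

In the old scheme a caller has to pair mock_push_tripcount with the
launch that follows it. In the new scheme the value is a field of the
arguments handed to the launch, so the pairing cannot be missed.
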
Depends on D128816