The optional argument is needed for CUDA-11+ headers when we're compiling for sm_80+ GPUs.
For the intrinsics, the src_size argument is now required. Old calls without src_size can be upgraded by passing src_size equal to the intrinsic's transfer size.
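
The upgrade rule can be illustrated with a small host-side model. This is only a sketch, not the real intrinsic or builtin: the names model_cp_async and model_cp_async_legacy are hypothetical. It assumes the PTX cp.async behaviour in which only src_size bytes are read from the source and the rest of the destination is zero-filled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical host-side model of a cp.async-style copy (NOT the real intrinsic).
// Assumption: the first src_size bytes come from src and the remaining
// cp_size - src_size bytes of the destination are filled with zeros.
inline void model_cp_async(void* dst, const void* src,
                           std::size_t cp_size, std::uint32_t src_size) {
    assert(src_size <= cp_size);
    std::memcpy(dst, src, src_size);
    std::memset(static_cast<unsigned char*>(dst) + src_size, 0,
                cp_size - src_size);
}

// Model of an old call without src_size. The upgrade passes
// src_size == transfer size, so the full cp_size bytes are copied.
inline void model_cp_async_legacy(void* dst, const void* src,
                                  std::size_t cp_size) {
    model_cp_async(dst, src, cp_size, static_cast<std::uint32_t>(cp_size));
}
```

Under this model, a legacy 4-byte copy and a call with src_size=4 leave identical bytes in the destination, while src_size=2 copies two bytes and zeroes the other two.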