Lower byval arguments of device functions the same way
we lower them for kernels, and ensure that each argument can be
accessed via its symbol.
This allows loading the argument's value using [symbol+offset]
instead of converting the argument to a general-space pointer and
indexing through it (which also implicitly converts the param-space
pointer to a local-space one at the SASS level and triggers a copy of
the argument into local space in the process).
This reduces call overhead, uses fewer registers, and reduces overall
SASS size by 2-4% on thrust tests.
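
For illustration only, a minimal sketch of the kind of source this
change is expected to affect: a non-inlined device function that takes
an aggregate by value, which the CUDA front end typically passes as a
byval pointer on NVPTX. The names Params and scaled_element are
hypothetical and not taken from the patch or its tests; the fallback
macros exist only so the snippet also builds as plain host C++.

```cpp
// Fallbacks so the sketch also compiles without a CUDA compiler.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Keep the callee out of line so the call (and its by-value argument)
// survives; assumes a GCC/Clang-compatible compiler for the attribute.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_NOINLINE __attribute__((noinline))
#else
#define SKETCH_NOINLINE
#endif

// Hypothetical aggregate, large enough to be passed in memory
// rather than in registers.
struct Params {
    int values[8];
    int scale;
};

// The by-value aggregate parameter is the case the commit message
// describes: reads of p.values[i] and p.scale are loads at an offset
// from the parameter itself.
__host__ __device__ SKETCH_NOINLINE int scaled_element(Params p, int i) {
    return p.values[i] * p.scale;
}
```

With the lowering described above, reads such as p.scale should be
expressible as loads from [symbol+offset] in param space rather than
going through a generic pointer to a local copy; the exact PTX depends
on the compiler version and optimization level.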