Lower byval arguments of device functions the same way
we lower them for kernels, and ensure that they can be accessed
via the argument's symbol.
This allows loading the argument's value using [symbol+offset]
instead of converting the argument to a generic-space pointer and
indexing through it (which, at the SASS level, also implicitly converts
the param-space pointer to a local-space one and triggers a copy of the
argument into local space in the process).
This reduces call overhead, uses fewer registers, and reduces overall
SASS size by 2-4% on thrust tests.
I'm not sure this is the right place to do this transformation, as opposed to (say) at the machine instruction level.
My first-approximation mental model for SelectionDAG is that it does a reasonably direct translation of IR to machine instructions. It's responsible for mapping constructs in IR to fast machine instructions (e.g. x86's various addressing modes), but optimizations that recognize that, e.g., f(f^-1(x)) == x are less common in isel. (Maybe someone will tell me this is wrong.)
So I'm not sure whether this belongs here, especially if it's a correctness transformation: what if the source isn't directly a MoveParam? For example, we move the param to one reg, then move it to another reg, then convert *that*. Are we clear of the bug we're working around in that case? If this is just an optimization, it should at least be guarded so that we don't run it at -O0.
I would also like us to be explicit in our source (somewhere) about exactly which nvptx bug we're working around, so that this change doesn't just sit in our backend and confuse all the hackers who come after us. :)