Previously, indirect arguments assumed assumed a stack passed object
in the alloca address space. A stack pointer is unsuitable for kernel
arguments, which are passed in a separate, constant buffer with a
different address space.
Start using byval for aggregate kernel arguments. Previously these
were emitted as raw struct arguments, and turned into loads in the
backend. These will lower identically, although with byval you now
have the option of applying an explicit alignment. In the future, a
reasonable implementation would use byval for all kernel arguments
(this would be a practical problem at the moment due to losing things
like noalias on pointer arguments).
This is mostly to avoid fighting the optimizer's treatment of
aggregate load/store. SROA and instcombine both turn aggregate loads
and stores into a long sequence of element loads and stores, rather
than the optimizable memcpy I would expect in this situation. Now an
explicit memcpy will be introduced up-front which is better understood
and helps eliminate the alloca in more situations.
Most of the language surrounding byval involves the stack, however I
can't find any real reason this needs to limited to a stack address
space. My main concern is that nothing seems to explicitly disallow
writing to a byval address, but it is illegal to write to this read
only pointer. A theoretical stack space reusing pass might try to
reuse some byval bytes. The only one we have now is the stack coloring
machine pass, which would never see a frame index for these byval
arguments. I could go a step further and add the readonly attribute to
these to make sure writes should never be introduced, although this
patch doesn't do that yet. There is code in the clang handling
explicitly removing readnone from byval arguments for some reason.