This patch adds lowering for function calls with a variable number of arguments
and enables support for the following instructions/intrinsics:
- va_arg
- va_start
- va_end
- va_copy
Note that this patch is not intended to add clang's support for
variadic functions for CUDA.
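
For readers less familiar with the four operations listed above, here is a minimal sketch in plain host C showing the <stdarg.h> macros that correspond to them. It is an illustration only, not taken from the patch or its tests; the function name sum_twice is hypothetical, and it is ordinary C rather than CUDA, since clang's CUDA support is out of scope here.

```c
#include <stdarg.h>

/* Hypothetical example: sums `count` int arguments twice, once through the
   original va_list and once through a copy of it, so that va_start,
   va_copy, va_arg and va_end are all exercised. */
int sum_twice(int count, ...) {
    va_list args, copy;
    int total = 0;

    va_start(args, count);   /* begin reading after the last named parameter */
    va_copy(copy, args);     /* independent cursor over the same arguments */

    for (int i = 0; i < count; ++i)
        total += va_arg(args, int);   /* first pass */
    for (int i = 0; i < count; ++i)
        total += va_arg(copy, int);   /* second pass through the copy */

    va_end(copy);
    va_end(args);
    return total;
}
```

Calling sum_twice(3, 1, 2, 3) walks the three arguments twice, which is only possible because va_copy gives the second loop its own cursor.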
According to the docs:
PTX version 6.0 supports passing unsized array parameter to a function which can be used to implement variadic functions. [0]
The last parameter in the parameter list may be a .param array of type .b8 with no size specified. It is used to pass an arbitrary number of parameters to the function packed into a single array object. When calling a function with such an unsized last argument, the last argument may be omitted from the call instruction if no parameter is passed through it. Accesses to this array parameter must be within the bounds of the array. The result of an access is undefined if no array was passed, or if the access was outside the bounds of the actual array being passed. [1]
Note that aggregates passed by value as variadic arguments are not currently
supported.
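
To make the "packed into a single array object" wording concrete, the following host-side C sketch models the idea: the caller writes each scalar argument into a byte buffer, and the callee reads the values back in order, much as va_arg would. This is a conceptual model under stated assumptions, not the layout this patch emits. In particular, aligning each value to a caller-supplied power-of-two alignment is an assumption, and the helper names align_up, pack_arg and unpack_arg are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round `offset` up to the next multiple of `align`.
   Assumption: `align` is a power of two. */
static size_t align_up(size_t offset, size_t align) {
    return (offset + align - 1) & ~(align - 1);
}

/* Caller side (hypothetical helper): append one scalar to the byte array
   and return the offset just past it. */
static size_t pack_arg(uint8_t *buf, size_t offset,
                       const void *value, size_t size, size_t align) {
    offset = align_up(offset, align);
    memcpy(buf + offset, value, size);
    return offset + size;
}

/* Callee side (hypothetical helper): read one scalar at the current cursor
   and return the advanced cursor, analogous to one va_arg step. */
static size_t unpack_arg(const uint8_t *buf, size_t offset,
                         void *out, size_t size, size_t align) {
    offset = align_up(offset, align);
    memcpy(out, buf + offset, size);
    return offset + size;
}
```

Under these assumptions, packing a 4-byte integer followed by an 8-byte double places the double at offset 8 rather than 4, so the reader must apply the same alignment rule as the writer. As the quoted text says, reading past the bytes that were actually passed is undefined.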
[0] https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#variadic-functions
[1] https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#kernel-and-function-directives-func
What determines the alignment here?
NVIDIA does not seem to specify anything regarding alignment here and their example shows align 4:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#kernel-and-function-directives-func