Parallel regions are outlined as functions whose captured variables are explicitly generated as distinct parameters in the function's argument list. This complicates the fork_call interface in the OpenMP runtime: (1) fork_call is variadic, since a variable number of arguments must be forwarded to the outlined function; (2) arguments are wrapped and unwrapped inside the OpenMP runtime, which is suboptimal, has been a source of ABI bugs, and imposes a hardcoded limit of 16 arguments; (3) forwarded arguments must be cast to pointer types, which complicates debugging. This patch avoids these issues by aggregating the captured arguments into a struct that is passed to fork_call.
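To illustrate the idea, here is a minimal C sketch of the aggregate calling convention, assuming a simplified runtime. Every name in it (`captured_args`, `outlined_agg`, `toy_fork_call_agg`, `microtask_agg_t`) is hypothetical and does not match the real libomp entry points (the existing `__kmpc_fork_call` is variadic). The toy fork_call runs the microtask once per simulated thread, sequentially, instead of spawning a team.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical microtask signature after the change: the runtime forwards
   exactly one opaque pointer instead of a variadic list of pointers. */
typedef void (*microtask_agg_t)(int *gtid, int *btid, void *args);

/* Hypothetical aggregate for a region like
     #pragma omp parallel
     { a[omp_get_thread_num()] = n + omp_get_thread_num(); }
   where `a` is captured by reference and `n` by value. */
struct captured_args {
    int *a;
    int n;
};

/* Outlined region body: unpacks the captures from the aggregate.
   Each field keeps its real type, so `n` is never cast to a pointer. */
static void outlined_agg(int *gtid, int *btid, void *args) {
    (void)btid; /* unused in this sketch */
    assert(args != NULL);
    struct captured_args *ca = (struct captured_args *)args;
    ca->a[*gtid] = ca->n + *gtid;
}

/* Toy stand-in for fork_call: runs the microtask once per simulated
   thread, sequentially. No va_list and no fixed argument-count limit. */
static void toy_fork_call_agg(int nthreads, microtask_agg_t fn, void *args) {
    for (int tid = 0; tid < nthreads; ++tid) {
        int gtid = tid;
        int btid = tid;
        fn(&gtid, &btid, args);
    }
}
```

For example, with `int a[4]` and `struct captured_args args = { .a = a, .n = 10 }`, calling `toy_fork_call_agg(4, outlined_agg, &args)` fills `a` with 10, 11, 12, 13. Adding another captured variable only changes the struct layout that the compiler emits; the runtime interface stays the same.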
Additional changes by Dhruva Chakrabarti <Dhruva.Chakrabarti@amd.com>:
- Fixed opaque pointer miscompile.
- Added alloc_aggregate_arg entry point to OpenMPOpt SPMD list.
- Fixed nocapture attribute of kmpc_alloc_aggregate_arg.
- Added align attribute for call to kmpc_alloc_shared.
This doesn't work anymore with opaque pointers, IIRC. We should remember the type and pass it here.