This is a portion of the work to implement OpenMP's "simd" loop directive for GPUs. For now, this patch only upstreams the code-generation portion; the runtime currently just executes the loop sequentially.
Style-wise, we follow the same approach as other directives in libomptarget: the "parallel region" is outlined into a separate function, which is passed as an argument to the appropriate runtime function. These changes live in the OMPIRBuilder and are currently active only when the OMPIRBuilder is enabled. The code also depends on the OMPCanonicalLoop class being present in the AST, which currently happens only when the OMPIRBuilder is enabled.
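To illustrate the outlining approach at the source level, here is a minimal C++ sketch. The runtime entry point `simd_loop_runtime`, the `OutlinedSimdBody` signature, and the `SimdCaptures` struct are hypothetical names chosen for illustration, not the actual libomptarget or OMPIRBuilder interfaces. Matching the current runtime behavior described above, the sketch's runtime function simply runs the loop sequentially.

```cpp
#include <cstdint>

// Hypothetical signature for the outlined loop body: it receives the current
// logical iteration number and an opaque pointer to the captured variables.
using OutlinedSimdBody = void (*)(std::uint64_t iv, void *captures);

// Hypothetical runtime entry point (not the real libomptarget name). It runs
// the outlined body once per iteration, sequentially, with no vectorization.
void simd_loop_runtime(OutlinedSimdBody body, void *captures,
                       std::uint64_t tripCount) {
  for (std::uint64_t iv = 0; iv < tripCount; ++iv)
    body(iv, captures);
}

// Conceptually what is generated for:
//   #pragma omp simd
//   for (uint64_t i = 0; i < n; ++i) a[i] = b[i] + c[i];
// Variables used inside the loop are packed into a capture struct.
struct SimdCaptures {
  float *a;
  const float *b;
  const float *c;
};

// The loop body, outlined into its own function.
static void outlined_body(std::uint64_t iv, void *captures) {
  auto *cap = static_cast<SimdCaptures *>(captures);
  cap->a[iv] = cap->b[iv] + cap->c[iv];
}

// The original call site: pack the captures and hand the outlined body,
// together with the trip count, to the runtime.
void vector_add(float *a, const float *b, const float *c, std::uint64_t n) {
  SimdCaptures captures{a, b, c};
  simd_loop_runtime(outlined_body, &captures, n);
}
```

Because the runtime only sees a function pointer, a trip count, and an opaque capture pointer, a later runtime implementation could distribute iterations across GPU lanes without changing the generated call site.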
This is in large part copied from existing code. Can we extract the shared parts into helper functions instead of duplicating them?