Currently, Clang emits a call to __read_pipe_2 or __read_pipe_4 for the OpenCL read_pipe builtin,
appending the type size and alignment as extra arguments. The suffix 2 or 4 is the number of
arguments in the original read_pipe call.
For certain targets (e.g. amdgpu), there are optimized versions of __read_pipe_2/__read_pipe_4
for the case where the type size and alignment are equal and a power of 2. It is desirable for
Clang to emit a call to a different function in these cases.
This patch lets Clang emit __read_pipe_2_N for such cases, where N is the size of the type in
bytes (N = 1,2,4,8,...,128), so that the target runtime can use an optimized version of
read_pipe.
The same applies to __read_pipe_4, __write_pipe_2 and __write_pipe_4.
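The naming rule above can be sketched as a small standalone function. This is an illustration
under stated assumptions, not the code in this patch: pipeBuiltinName and isSupportedPipeSize
are hypothetical helpers, and the sketch assumes the suffix is added only when size equals
alignment and is a power of 2 between 1 and 128.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: true if V is a power of 2 in the range [1, 128].
static bool isSupportedPipeSize(uint64_t V) {
  return V >= 1 && V <= 128 && (V & (V - 1)) == 0;
}

// Hypothetical helper (not Clang's actual code): given the generic runtime
// name (e.g. "__read_pipe_2"), the type size and the type alignment in bytes,
// return the name of the function to call.
std::string pipeBuiltinName(const std::string &Base, uint64_t Size,
                            uint64_t Align) {
  // Only the "size == alignment == power of 2" case gets the _N suffix.
  if (Size == Align && isSupportedPipeSize(Size))
    return Base + "_" + std::to_string(Size);
  // Everything else keeps the generic entry point with size/align arguments.
  return Base;
}
```

For example, an int (size 4, alignment 4) read through the 2-argument form maps to
__read_pipe_2_4, while a 12-byte struct with 4-byte alignment keeps the generic __read_pipe_4.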
This optimization is controlled by TargetCodeGenInfo::hasOptimizedPipeBuiltin, which returns
false by default. Each target can override this function to turn on this optimization.
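The opt-in pattern can be illustrated with a mock of the hook. PipeTargetInfo and
OptInTargetInfo are hypothetical stand-ins, not Clang's TargetCodeGenInfo declaration (the real
class has many more members, and its exact hook signature may differ). The sketch only shows a
virtual default of false, a target override that enables the suffixed names, and the same
size/alignment check as above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the per-target codegen hook.
struct PipeTargetInfo {
  virtual ~PipeTargetInfo() = default;
  // Off by default: the generic __read_pipe_2 etc. are always used.
  virtual bool hasOptimizedPipeBuiltin() const { return false; }
};

// Hypothetical target that opts in (in the spirit of amdgpu).
struct OptInTargetInfo : PipeTargetInfo {
  bool hasOptimizedPipeBuiltin() const override { return true; }
};

// Hypothetical selection logic: the _N suffix is used only when the target
// opts in and size == alignment is a power of 2 no larger than 128.
std::string selectPipeFunction(const PipeTargetInfo &TI,
                               const std::string &Base, uint64_t Size,
                               uint64_t Align) {
  bool PowerOf2 = Size != 0 && (Size & (Size - 1)) == 0;
  if (TI.hasOptimizedPipeBuiltin() && Size == Align && PowerOf2 && Size <= 128)
    return Base + "_" + std::to_string(Size);
  return Base;
}
```

Keeping the default false means targets whose runtime does not provide the _N variants continue
to link against the generic functions unchanged.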