Supersedes https://reviews.llvm.org/D146556
The above PR introduces a new op called gpu.create_queue and adds queue as an optional arg to some gpu dialect ops.
Some GPU runtimes (OpenCL/SYCL) are using queue as explicit argument to all ops.
Explicit queue allows more flexibility on scheduling kernels, e.g interleave execution on multiple devices, allow to create separate queues for kernel execution and data copying, etc.
Queues doesn't introduce any additional sychronization semantics to ops, aync dependencies are still controlled by async tokens.
The link to the discussion is:
https://discourse.llvm.org/t/proposal-to-add-stream-queue-as-an-optional-argument-to-few-gpu-dialect-ops/67920/13