Supersedes https://reviews.llvm.org/D146556
The above PR introduces a new op, gpu.create_queue, and adds a queue as an optional argument to some gpu dialect ops.
Some GPU runtimes (OpenCL, SYCL) pass a queue as an explicit argument to all operations.
An explicit queue allows more flexibility in scheduling kernels, e.g., interleaving execution across multiple devices, or creating separate queues for kernel execution and data copies.
Queues don't introduce any additional synchronization semantics to ops; async dependencies are still controlled by async tokens.
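A rough sketch of what the proposed IR could look like. This is hypothetical syntax: the op name gpu.create_queue comes from the patch, but the queue type, the queue(...) clause on gpu.launch_func, and the kernel/operand names are illustrative assumptions, not the exact form in D146556.

```mlir
// Hypothetical: two queues, possibly targeting different devices.
%q0 = gpu.create_queue : !gpu.queue
%q1 = gpu.create_queue : !gpu.queue

// Launches on different queues may interleave; ordering is still
// expressed only through async tokens, not through the queues.
%t0 = gpu.launch_func async queue(%q0) @kernels::@foo
          blocks in (%gx, %gy, %gz) threads in (%bx, %by, %bz)
%t1 = gpu.launch_func async queue(%q1) @kernels::@bar
          blocks in (%gx, %gy, %gz) threads in (%bx, %by, %bz)

// Both launches are independent until explicitly joined.
gpu.wait [%t0, %t1]
```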
The link to the discussion is:
https://discourse.llvm.org/t/proposal-to-add-stream-queue-as-an-optional-argument-to-few-gpu-dialect-ops/67920/13
This circles back to the discussion in the thread, where I don't think I got clear answers, and I'd rather see it addressed there.
Right now it is claimed that the queues "allow more fine-grained control over kernel scheduling", but I don't quite get this argument: since they are entirely out-of-order, it's not clear to me where the "fine-grained control" is.
Why is it useful? Since queues are out-of-order, other than mapping to a given device, I don't see the semantic difference between one queue and two here.