mbarrier is a barrier object created in shared memory. It supports more flexible ways of synchronizing threads than __syncthreads. For more information, see the PTX documentation:
https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-mbarrier
This work adds an initial set of mbarrier Ops to the nvgpu dialect.
First, it introduces two types:
mbarrier.barrier, the barrier object in shared memory
mbarrier.barrier.token, the token returned by the arrive Ops and consumed by mbarrier.test_wait
It introduces the following Ops:
mbarrier.create creates an mbarrier.barrier
mbarrier.init initializes an mbarrier.barrier
mbarrier.arrive performs an arrive-on operation on an mbarrier.barrier and returns an mbarrier.barrier.token
mbarrier.arrive.nocomplete performs a non-blocking arrive-on operation on an mbarrier.barrier and returns an mbarrier.barrier.token
mbarrier.test_wait takes an mbarrier.barrier and an mbarrier.barrier.token and tests whether the phase identified by the token has completed
Doc, please: this is the type for a barrier in shared memory that is used to synchronize a variable number of threads. Conceptually, it behaves similarly to C++ std::barrier<T>::arrive_and_wait, etc.