This work adds two Ops:
mbarrier.arrive.expect_tx performs expect_tx mbarrier.barrier returns mbarrier.barrier.token
mbarrier.try_wait.parity waits on mbarrier.barrier and mbarrier.barrier.token
mbarrier.arrive.expect_tx is one of the requirement to enable H100 TMA support.
Could you add some description here to better explain what this is doing?
Essentially what it means to "perform arrive with expect_tx".
From what I understand, when the thread arrives to this point, we set the expect count of the barrier to txcount and produce a token. Later we'll wait on the produced token until the barrier is released.
Feel free to reword/fix, but we need some kind of explanation :).