- Move ThreadPool ownership to the runtime, and wait for async tasks to complete in the destructor.
- Remove MLIR_ASYNCRUNTIME_EXPORT from method definitions in .cpp files: only function declarations need to be exported, not their definitions.
- Fix concurrency bugs in group emplace and potential use-after-free in token emplace.
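
The ownership change in the first bullet can be illustrated with a minimal sketch. `SketchRuntime` and its members are hypothetical names, not the actual MLIR async runtime classes, and a hand-rolled worker pool stands in for ThreadPool so the example stays self-contained. It only shows the general idea: the runtime owns the workers, and its destructor waits for all submitted tasks before any runtime state is torn down.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the runtime; not the real MLIR class.
class SketchRuntime {
public:
  explicit SketchRuntime(unsigned numThreads) {
    for (unsigned i = 0; i < numThreads; ++i)
      workers.emplace_back([this] { workerLoop(); });
  }

  // Drain the queue and join the workers, so that no task can outlive
  // the runtime that owns the state it touches.
  ~SketchRuntime() {
    {
      std::lock_guard<std::mutex> lock(mu);
      shuttingDown = true;
    }
    cv.notify_all();
    for (std::thread &t : workers)
      t.join();
  }

  // Enqueue a task for execution on one of the owned worker threads.
  void async(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mu);
      tasks.push(std::move(task));
    }
    cv.notify_one();
  }

private:
  void workerLoop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu);
        cv.wait(lock, [this] { return shuttingDown || !tasks.empty(); });
        if (tasks.empty())
          return; // Only reached during shutdown with no pending work.
        task = std::move(tasks.front());
        tasks.pop();
      }
      task(); // Run outside the lock so other workers can make progress.
    }
  }

  std::mutex mu;
  std::condition_variable cv;
  std::queue<std::function<void()>> tasks;
  bool shuttingDown = false;
  std::vector<std::thread> workers;
};
```

With this shape, a caller can submit tasks that capture references to longer-lived state and rely on the end of the runtime's scope as a synchronization point; whether the real runtime drains or cancels pending work at shutdown is not stated here and is an assumption of the sketch.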
Tested internally with 10k runs of async.mlir and async-group.mlir.
Extra S here and below.