Address some feedback from the initial version:
- factor out the allocator padding calculations
- use anonymous unions instead of char buffers and reinterpret_cast
- make co_await'ing an invalid task have undefined behaviour
- run clang-format over the file to fix some formatting