The computed number of hardware threads can change over the life of the process based on affinity changes. Since we need a data structure that is at least as large as the maximum parallelism, it is important to use the value that was actually latched for the thread pool we will be dispatching work to.
Also adds an assert specifically for if it doesn't line up (I was getting a crash on an index into the vector).
Patch LGTM, but I don't quite get this comment right now.