This fixes various races and cleans up the thread pool logic. In particular:
- The ThreadPoolExecutor destructor now waits for the pool to fully exit, so instance variables that worker threads may still be using are not implicitly destroyed underneath them.
- Threads are now joined explicitly, which ensures that destruction of their thread-specific data is complete.
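
For illustration only, the destructor ordering described above can be sketched with standard C++ threads. This is not the ThreadPoolExecutor code from this change: `SketchPool` and `run_sketch` are hypothetical names, and the sketch assumes a simple mutex-protected task queue.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool, used only to illustrate the destructor ordering.
class SketchPool {
 public:
  explicit SketchPool(std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
      workers_.emplace_back([this] { run(); });
    }
  }

  ~SketchPool() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_all();
    // Join inside the destructor body. Members are implicitly destroyed only
    // after this body returns, so no worker can still be using mutex_, cv_,
    // or tasks_ at that point. A joined thread has also finished its own
    // exit path, which is where its thread-local data is torn down.
    for (std::thread& t : workers_) {
      t.join();
    }
  }

  SketchPool(const SketchPool&) = delete;
  SketchPool& operator=(const SketchPool&) = delete;

  void submit(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      assert(!stopping_ && "submit after shutdown");
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (tasks_.empty()) {
          return;  // stopping was requested and the queue is drained
        }
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // run outside the lock
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool stopping_ = false;
  std::vector<std::thread> workers_;
};

// Hypothetical helper: runs `tasks` increments on `threads` workers and
// returns the count observed after the pool has been destroyed.
int run_sketch(std::size_t threads, int tasks) {
  std::atomic<int> done{0};
  {
    SketchPool pool(threads);
    for (int i = 0; i < tasks; ++i) {
      pool.submit([&done] { ++done; });
    }
  }  // ~SketchPool joins here, so every queued task has finished
  return done.load();
}
```

Without the explicit join, a worker could still be inside `run()` while `mutex_` or `tasks_` is being destroyed, which is the kind of race this change is meant to close.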