This patch fixes the issue where the built-in profiler doesn't produce data for multiple threads:
Before creating a TimeScope object, the code checks whether a profiler instance has
already been created for the current thread. If not, it creates one and saves a pointer
to it so that the instance can be finished at shutdown.
At shutdown, all the saved profiler instances are finished before the data is written out.
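
For reviewers, the approach described above can be sketched roughly as follows. This is a minimal illustration, not the code in the patch: Profiler, ProfilerRegistry, forCurrentThread, finishAll, and runDemo are hypothetical names, and the real TimeScope and profiler classes may look quite different. The sketch assumes a thread_local pointer for the per-thread check and a mutex-protected list for the saved instances.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical per-thread profiler; stands in for the real profiler class.
class Profiler {
public:
    void record(std::chrono::nanoseconds elapsed) { total_ += elapsed; ++samples_; }
    void finish() { finished_ = true; }
    bool finished() const { return finished_; }
    std::size_t samples() const { return samples_; }
private:
    std::chrono::nanoseconds total_{0};
    std::size_t samples_ = 0;
    bool finished_ = false;
};

// Hypothetical registry that owns every profiler created so far.
class ProfilerRegistry {
public:
    static ProfilerRegistry& instance() {
        static ProfilerRegistry registry;
        return registry;
    }

    // Returns the calling thread's profiler, creating and saving it on first use.
    Profiler* forCurrentThread() {
        thread_local Profiler* mine = nullptr;
        if (mine == nullptr) {
            std::lock_guard<std::mutex> lock(mutex_);
            profilers_.push_back(std::make_unique<Profiler>());
            mine = profilers_.back().get();
        }
        return mine;
    }

    // Called at shutdown, before the data is written out.
    // Returns the number of profiler instances that were finished.
    std::size_t finishAll() {
        std::lock_guard<std::mutex> lock(mutex_);
        for (auto& profiler : profilers_) {
            profiler->finish();
        }
        return profilers_.size();
    }

private:
    std::mutex mutex_;
    std::vector<std::unique_ptr<Profiler>> profilers_;
};

// Simplified stand-in for TimeScope: looks up the thread's profiler on construction
// and records the elapsed time on destruction.
class TimeScope {
public:
    TimeScope()
        : profiler_(ProfilerRegistry::instance().forCurrentThread()),
          start_(std::chrono::steady_clock::now()) {
        assert(profiler_ != nullptr);
    }
    ~TimeScope() {
        auto elapsed = std::chrono::steady_clock::now() - start_;
        profiler_->record(std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed));
    }
private:
    Profiler* profiler_;
    std::chrono::steady_clock::time_point start_;
};

// Times one scope on the calling thread and one on each of two worker threads,
// then finishes every saved profiler, as the shutdown path would.
std::size_t runDemo() {
    { TimeScope scope; }
    std::thread a([] { TimeScope scope; });
    std::thread b([] { TimeScope scope; });
    a.join();
    b.join();
    return ProfilerRegistry::instance().finishAll();
}
```

In this sketch the registry owns the profiler objects, so the saved pointers stay valid after the worker threads exit, and finishAll is only called once the workers have been joined.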