[Support] Extend TimeProfiler to support multiple threads
This makes TimeTraceProfilerInstance thread-local and adds
timeTraceProfilerFinishThread(), which moves the calling thread's
instance into a global vector of instances. timeTraceProfilerWrite()
then writes the data recorded by all instances.
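A minimal sketch of the ownership pattern described above, assuming a simplified profiler that records only event names. ProfilerSketch, initializeSketch(), recordSketch(), finishThreadSketch() and writeSketch() are hypothetical stand-ins, not the LLVM API. Each thread records into its own thread_local unique_ptr without locking. Finishing a thread moves that pointer into a mutex-guarded global vector, and the write step walks the current thread's instance plus every finished one.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for the real profiler class.
struct ProfilerSketch {
  std::thread::id Tid = std::this_thread::get_id(); // owning thread
  std::vector<std::string> Events;                   // recorded event names
};

// One instance per thread: recording needs no synchronization.
thread_local std::unique_ptr<ProfilerSketch> ThreadInstance;

// Instances handed over by finished threads, guarded by a mutex.
std::mutex InstancesMutex;
std::vector<std::unique_ptr<ProfilerSketch>> FinishedInstances;

void initializeSketch() { ThreadInstance = std::make_unique<ProfilerSketch>(); }

void recordSketch(const std::string &Name) {
  if (ThreadInstance)
    ThreadInstance->Events.push_back(Name);
}

// Analogue of timeTraceProfilerFinishThread(): transfer ownership of this
// thread's instance to the global vector before the thread exits.
void finishThreadSketch() {
  assert(ThreadInstance && "finishThreadSketch() called without an instance");
  std::lock_guard<std::mutex> Lock(InstancesMutex);
  FinishedInstances.push_back(std::move(ThreadInstance)); // leaves it null
}

// Analogue of timeTraceProfilerWrite(): emit the data of every instance.
std::string writeSketch() {
  std::lock_guard<std::mutex> Lock(InstancesMutex);
  std::ostringstream OS;
  auto Emit = [&OS](const ProfilerSketch &P) {
    for (const std::string &E : P.Events)
      OS << E << '\n';
  };
  if (ThreadInstance)
    Emit(*ThreadInstance); // the calling thread's own data first
  for (const auto &P : FinishedInstances)
    Emit(*P);
  return OS.str();
}
```

In use, a worker calls initializeSketch(), records, and calls finishThreadSketch() before returning. After joining it, the main thread calls writeSketch() to get both threads' events. Moving the unique_ptr keeps each instance's data alive past its thread's exit without a manual delete.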
Threads are identified by their thread IDs. Totals are reported on
artificial thread IDs chosen to be higher than any real one.
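One way to pick those artificial IDs, sketched under the assumption that each total gets its own ID, counting up from one past the largest real thread ID. TotalSketch, AssignedTotal and assignTotalTids() are hypothetical names, and the "Total " name prefix is only illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hypothetical accumulated total: event name and summed duration (us).
struct TotalSketch {
  std::string Name;
  std::int64_t DurationUs;
};

// A total placed on its own artificial thread ID.
struct AssignedTotal {
  std::uint64_t Tid;
  std::string Name;
  std::int64_t DurationUs;
};

// Gives each total a distinct thread ID above every real one, so totals
// never share a track with a real thread in the trace viewer.
std::vector<AssignedTotal>
assignTotalTids(const std::vector<std::uint64_t> &RealTids,
                const std::vector<TotalSketch> &Totals) {
  std::uint64_t MaxTid = 0;
  for (std::uint64_t Tid : RealTids)
    MaxTid = std::max(MaxTid, Tid);
  assert(MaxTid <= std::numeric_limits<std::uint64_t>::max() - Totals.size() &&
         "artificial thread IDs would overflow");

  std::vector<AssignedTotal> Out;
  std::uint64_t NextTid = MaxTid + 1;
  for (const TotalSketch &T : Totals)
    Out.push_back({NextTid++, "Total " + T.Name, T.DurationUs});
  return Out;
}
```

With real IDs {7, 42, 13}, two totals land on IDs 43 and 44.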
Also replaces the raw TimeTraceProfilerInstance pointer with a unique_ptr.
Differential Revision: https://reviews.llvm.org/D71059