This metric was missing. We were only measuring in per-thread mode, and
this completes the work.
For a sample trace I have, the dump info command shows:

  Timing for this thread:
    Decoding instructions: 0.12s

I also improved the TaskTime function a bit so that callers don't need to
specify the template argument.
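
To illustrate the template argument point, here is a minimal sketch of how a
timing helper can deduce the task's return type from the callable. It is not
the code in this patch: the class name TaskTimerSketch, its members, and the
GetSeconds accessor are all hypothetical and only meant to show the idea.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical timer used only for illustration; not the class from the patch.
class TaskTimerSketch {
public:
  // The return type is deduced from the callable, so callers can write
  // timer.TimeTask("name", [&] { ... }) without spelling out <ReturnType>.
  template <typename Callable>
  auto TimeTask(const std::string &name, Callable &&task) -> decltype(task()) {
    // The guard adds the elapsed time when it goes out of scope, which
    // happens after the task has run. This also works for void tasks.
    ScopedRecord record{m_totals[name], std::chrono::steady_clock::now()};
    return std::forward<Callable>(task)();
  }

  // Total time accumulated under `name`, in seconds (0.0 if never timed).
  double GetSeconds(const std::string &name) const {
    auto it = m_totals.find(name);
    if (it == m_totals.end())
      return 0.0;
    return std::chrono::duration<double>(it->second).count();
  }

private:
  struct ScopedRecord {
    std::chrono::steady_clock::duration &total;
    std::chrono::steady_clock::time_point start;
    ~ScopedRecord() { total += std::chrono::steady_clock::now() - start; }
  };

  std::map<std::string, std::chrono::steady_clock::duration> m_totals;
};
```

With a helper shaped like this, a call site such as
timer.TimeTask("Decoding instructions", [&] { return Decode(); }) needs no
explicit template argument (Decode is a placeholder name here).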