This metric was missing. We were only measuring in per-thread mode, and
this completes the work.

For a sample trace I have, the dump info command shows:

  Timing for this thread:
    Decoding instructions: 0.12s

I also slightly improved the TaskTime function so that callers don't need to
specify the template argument.
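
To illustrate the template-argument point, here is a minimal sketch of how a
timing helper can deduce the task's return type from the callable it is given.
It is not the actual code from this change: TaskTimer, TimeTask, Seconds and
DecodeWithTiming are hypothetical names chosen for the example, and the sketch
assumes C++14 or later for decltype(auto).

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical timer used only for this sketch; not the real LLDB class.
class TaskTimer {
public:
  // Runs `task`, adds its wall-clock duration to the total stored under
  // `name`, and returns whatever `task` returns. The return type is deduced
  // from the callable, so callers never spell out a template argument.
  template <typename Callable>
  decltype(auto) TimeTask(const std::string &name, Callable &&task) {
    // The recorder adds the elapsed time when it goes out of scope, which
    // also covers tasks that return void.
    struct Recorder {
      std::chrono::nanoseconds &total;
      std::chrono::steady_clock::time_point start;
      ~Recorder() {
        total += std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now() - start);
      }
    } recorder{m_totals[name], std::chrono::steady_clock::now()};
    return std::forward<Callable>(task)();
  }

  // Total time recorded under `name`, in seconds (0 if never recorded).
  double Seconds(const std::string &name) const {
    auto it = m_totals.find(name);
    if (it == m_totals.end())
      return 0.0;
    return std::chrono::duration<double>(it->second).count();
  }

private:
  std::map<std::string, std::chrono::nanoseconds> m_totals;
};

// Hypothetical caller: the lambda returns int, so TimeTask returns int
// without the caller writing TimeTask<int>(...).
inline int DecodeWithTiming(TaskTimer &timer) {
  return timer.TimeTask("Decoding instructions", [] {
    int decoded = 0;
    for (int i = 0; i < 1000; ++i)
      decoded += 1; // stand-in for real decoding work
    return decoded;
  });
}
```

With a helper shaped like this, the call site stays the same whether the task
returns a value or nothing, and the recorded total can later be printed by
whatever command reports the timings.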