This change implements the basic mode filtering similar to what we do in
FDR mode. The implementation is slightly simpler in basic-mode filtering
because we have less details to remember, but the idea is the same. At a
high level, we do the following to decide when to filter function call
records:
- We maintain a per-thread "shadow stack" which keeps track of the XRay instrumented functions we've encountered in a thread's execution.
- We push an entry onto the stack when we enter an XRay instrumented function, and note the CPU, TSC, and type of entry (whether we have payload or not when entering).
- When we encounter an exit event, we determine whether the function being exited is the same function we've entered recently, was executing in the same CPU, and the delta of the recent TSC and the recorded TSC at the top of the stack is less than the equivalent amount of microseconds we're configured to ignore -- then we un-wind the record offset an appropriate number of times (so we can overwrite the records later).
We also support limiting the stack depth of the recorded functions,
so that we don't arbitrarily write deep function call stacks.
is 8 bits enough for a CPU number much longer? E.g., getcpu() on Linux returns an unsigned integer for a cpu number.