Prerequisite change:
Previously, JIT notifiers were called before relocations were
performed (leading to an ominous function call to "0") and before
memory was marked executable (confusing some profilers).
Move the notifications to finalizeLoadedModules().
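The ordering issue can be illustrated with a small standalone model. This is only a sketch, not the LLVM code: ToyListener, ToyObject and ToyJIT are hypothetical names made up for the illustration, and the only thing borrowed from the patch is the idea that listeners are notified from the finalize step, after relocations are applied and memory is made executable.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a JIT event listener (not the LLVM interface).
// It just records what it could see at notification time.
struct ToyListener {
  struct Event {
    std::string Symbol;
    std::uint64_t CallTarget; // call target visible to the listener
    bool Executable;          // was the memory executable when notified?
  };
  std::vector<Event> Events;
};

// Hypothetical loaded object: one function containing one call site.
struct ToyObject {
  std::string Symbol = "stupid_isprime";
  std::uint64_t CallTarget = 0; // stays 0 until relocations are applied
  bool Executable = false;
};

class ToyJIT {
public:
  explicit ToyJIT(ToyListener &L) : Listener(L) {}

  // Loading only queues the object; no listener is notified here.
  void addObject(ToyObject Obj) { Pending.push_back(std::move(Obj)); }

  // Relocate, mark executable, and only then notify.
  void finalizeLoadedModules(std::uint64_t ResolvedTarget) {
    for (ToyObject &Obj : Pending) {
      Obj.CallTarget = ResolvedTarget; // 1. apply relocations
      Obj.Executable = true;           // 2. mark memory executable
      ToyListener::Event E{Obj.Symbol, Obj.CallTarget, Obj.Executable};
      Listener.Events.push_back(E);    // 3. notify listeners
    }
    Pending.clear();
  }

private:
  ToyListener &Listener;
  std::vector<ToyObject> Pending;
};
```

In this model a listener never observes a call target of 0 or non-executable memory, because notification is the last step of finalization rather than part of loading.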
Main change:
Add PerfJITEventListener for perf profiling support.
Given that the listener has no dependencies, it might be sensible to
enable it by default when running on Linux.
I followed existing precedent in registering the listener by default
in lli, but I'm not sure that's a great idea. I've had to disable it
for the remote MCJIT tests: the computed address is from the remote,
and I didn't see a way to get access to the actual code.
Example:
$ cat /tmp/expensive_loop.c
#include <stdbool.h>
#include <stdint.h>

bool stupid_isprime(uint64_t num)
{
        if (num == 2)
                return true;
        if (num < 1 || num % 2 == 0)
                return false;
        for(uint64_t i = 3; i < num / 2; i+= 2) {
                if (num % i == 0)
                        return false;
        }
        return true;
}

int main(int argc, char **argv)
{
        int numprimes = 0;

        for (uint64_t num = argc; num < 100000; num++)
        {
                if (stupid_isprime(num))
                        numprimes++;
        }
        return numprimes;
}
$ clang -ggdb -S -c -emit-llvm /tmp/expensive_loop.c -o /tmp/expensive_loop.ll
$ perf record -o /tmp/perf.data -g -k 1 ./bin/lli -jit-kind=mcjit /tmp/expensive_loop.ll 1
$ perf inject --jit -i /tmp/perf.data -o /tmp/perf.jit.data
$ perf report -i /tmp/perf.jit.data
- 92.59% lli jitted-5881-2.so [.] stupid_isprime
     stupid_isprime
     main
     llvm::MCJIT::runFunction
     llvm::ExecutionEngine::runFunctionAsMain
     main
     __libc_start_main
     0x4bf6258d4c544155
+ 0.85% lli ld-2.27.so [.] do_lookup_x
And line-level annotations also work:
     │              for(uint64_t i = 3; i < num / 2; i+= 2) {
     │1 30:   movq   $0x3,-0x18(%rbp)
0.03 │1 38:   mov    -0x18(%rbp),%rax
0.03 │        mov    -0x10(%rbp),%rcx
     │        shr    $0x1,%rcx
3.63 │     ┌──cmp    %rcx,%rax
     │     ├──jae    6f
     │     │                if (num % i == 0)
0.03 │     │  mov    -0x10(%rbp),%rax
     │     │  xor    %edx,%edx
89.00 │     │  divq   -0x18(%rbp)
     │     │  cmp    $0x0,%rdx
0.22 │     │↓ jne    5f
     │     │                        return false;
     │     │  movb   $0x0,-0x1(%rbp)
     │     │↓ jmp    73
     │     │        }
3.22 │1 5f:│↓ jmp    61
     │     │        for(uint64_t i = 3; i < num / 2; i+= 2) {
Should this be enabled by default on Linux?