This is an archive of the discontinued LLVM Phabricator instance.

Add PerfJITEventListener for perf profiling support.
ClosedPublic

Authored by anarazel on Mar 25 2018, 11:20 PM.

Details

Summary

Prerequisite change:

Previously JIT notifiers were called before relocations were
performed (leading to ominious function call of "0"), and before
memory marked executable (confusing some profilers).

Move notifications to finalizeLoadedModules().

Main change:
Add PerfJITEventListener for perf profiling support.

Given that the listener has no dependencies, it might be sensible to
enable it by default when running on linux.

I followed existing precedent in registering the listener by default
in lli, but I'm not sure that's a great idea? I've had to disable it
for the remote MCJIT tests - the computed address is from the remote,
and I didn't see a way to get access to the actual code.

Example:

$ cat /tmp/expensive_loop.c

bool stupid_isprime(uint64_t num)
{

if (num == 2)
        return true;
if (num < 1 || num % 2 == 0)
        return false;
for(uint64_t i = 3; i < num / 2; i+= 2) {
        if (num % i == 0)
                return false;
}
return true;

}

int main(int argc, char **argv)
{

int numprimes = 0;

for (uint64_t num = argc; num < 100000; num++)
{
        if (stupid_isprime(num))
                numprimes++;
}

return numprimes;

}

$ clang -ggdb -S -c -emit-llvm /tmp/expensive_loop.c -o
/tmp/expensive_loop.ll

$ perf record -o /tmp/perf.data -g -k 1 ./bin/lli -jit-kind=mcjit /tmp/expensive_loop.ll 1

$ perf inject --jit -i perf.data -o perf.jit.data

$ perf report -i perf.jit.data

  • 92.59% lli jitted-5881-2.so [.] stupid_isprime stupid_isprime main llvm::MCJIT::runFunction llvm::ExecutionEngine::runFunctionAsMain main __libc_start_main 0x4bf6258d4c544155

+ 0.85% lli ld-2.27.so [.] do_lookup_x

And line-level annotations also work:

     │              for(uint64_t i = 3; i < num / 2; i+= 2) {
     │1 30:   movq   $0x3,-0x18(%rbp)
0.03 │1 38:   mov    -0x18(%rbp),%rax
0.03 │        mov    -0x10(%rbp),%rcx
     │        shr    $0x1,%rcx
3.63 │     ┌──cmp    %rcx,%rax
     │     ├──jae    6f
     │     │                if (num % i == 0)
0.03 │     │  mov    -0x10(%rbp),%rax
     │     │  xor    %edx,%edx

89.00 │ │ divq -0x18(%rbp)

     │     │  cmp    $0x0,%rdx
0.22 │     │↓ jne    5f
     │     │                        return false;
     │     │  movb   $0x0,-0x1(%rbp)
     │     │↓ jmp    73
     │     │        }
3.22 │1 5f:│↓ jmp    61
     │     │        for(uint64_t i = 3; i < num / 2; i+= 2) {

Diff Detail

Repository
rL LLVM

Event Timeline

anarazel created this revision.Mar 25 2018, 11:20 PM
anarazel retitled this revision from Call JIT notifiers only after code sections are ready. to Add PerfJITEventListener for perf profiling support..Mar 25 2018, 11:22 PM
anarazel edited the summary of this revision. (Show Details)
anarazel added a reviewer: reames.

Ping? Any suggestions for other reviewers?

CMakeLists.txt
448 ↗(On Diff #139763)

Should this be enable this by default on linux?

lib/ExecutionEngine/MCJIT/MCJIT.cpp
224 ↗(On Diff #139763)

Arguably this should be merged separately?

lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp
160 ↗(On Diff #139763)

Using the structs and reinterpret cast seems easiest, but then I'm a C programmer at hard. Does that look ok?

479 ↗(On Diff #139763)

This should be optimized to avoid re-emitting the filename if previous filename is the same.

zkrx added a subscriber: zkrx.Apr 26 2018, 12:11 AM
reames accepted this revision.May 9 2018, 4:39 PM

LGTM w/comments addressed before submission.

I didn't review all of the perf API usage closely, but I'm assuming this at least mostly works. If it does, it's better to have something checked in we can improve than nothing.

CMakeLists.txt
448 ↗(On Diff #139763)

No. Or at least, not in the first patch.

659 ↗(On Diff #139763)

You're dropping the other optional components. Please do what OPROFILE does just above.

lib/ExecutionEngine/MCJIT/MCJIT.cpp
224 ↗(On Diff #139763)

Indeed, this needs to be separated and reviewed separately. I fine with that being done in either order w.r.t. this patch landing.

lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp
225 ↗(On Diff #139763)

Probably better to just report a fatal error. Error handling for event handlers like this is not well specified at the moment.

377 ↗(On Diff #139763)

is there a cleaner way to check for this?

389 ↗(On Diff #139763)

Instead of the goto error idiom, please use a RAII pattern with a destructor which closes it.

479 ↗(On Diff #139763)

I think this is fine for the first version, but yes, a good followup.

This revision is now accepted and ready to land.May 9 2018, 4:39 PM
gipri added a subscriber: gipri.May 14 2018, 1:07 AM
anarazel marked 2 inline comments as done.May 24 2018, 12:09 PM
anarazel added inline comments.
lib/ExecutionEngine/MCJIT/MCJIT.cpp
224 ↗(On Diff #139763)
lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp
225 ↗(On Diff #139763)

Hm. It seems not unrealistic for this to fail in some environments, e.g. due to running as a user with few permissions. It seems nicer to just not emit profiling data in that case, without fataling out.

377 ↗(On Diff #139763)

I don't know of any way that's meaningfully cleaner. We could just skip the check and hope for the best I guess?

anarazel updated this revision to Diff 148520.May 24 2018, 5:28 PM

Address reames' comments and rebase onto newer code. As
https://reviews.llvm.org/D44890 has landed, add the corresponding
C-API function as well.

anarazel marked 5 inline comments as done.May 24 2018, 5:31 PM
anarazel added inline comments.
lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp
389 ↗(On Diff #139763)

Replaced with MemoryBuffer::getFileSlice(). That reduces the amount of error checking a bit, because it doesn't report "short reads", but that seems ok, given the ELF signature verification.

vchuravy requested changes to this revision.Jul 10 2018, 9:23 AM
vchuravy added inline comments.
lib/ExecutionEngine/Orc/LLVMBuild.txt
22 ↗(On Diff #148520)

I don't think you should add "PerfJITEvents" here, I see a "missing LLVMPerfJITEvents" with -DLLVM_USE_PERF=0

This revision now requires changes to proceed.Jul 10 2018, 9:23 AM

Given that the listener has no dependencies, it might be sensible to
enable it by default when running on linux.

Fwiw, I agree with this except I would add an option to lli so that generating the perf jitdump data was optional - otherwise you could end up filling a disk, .. unexpectedly.

sanjoy added a subscriber: sanjoy.Jul 21 2018, 10:26 PM
anarazel updated this revision to Diff 156717.Jul 22 2018, 7:27 PM
anarazel marked an inline comment as done.

Fix build without perf enabled, fix rebasing issues.

@vchuravy Yea, that was an unnecessary, and as you notice wrong, dependency. I confirmed that just removing it is sufficient. Does that resolve your blocker?

Given that the listener has no dependencies, it might be sensible to
enable it by default when running on linux.

Fwiw, I agree with this except I would add an option to lli so that generating the perf jitdump data was optional - otherwise you could end up filling a disk, .. unexpectedly.

Noted. I'd like to go without that for the initial revision, but will look towards doing something like that once the initial issues have shaken out. Ok?

vchuravy accepted this revision.Jul 23 2018, 9:48 AM

It indeed does! Thanks!

This revision is now accepted and ready to land.Jul 23 2018, 9:48 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 18 2019, 1:06 AM