This is an archive of the discontinued LLVM Phabricator instance.

[RuntimeDyld] Fix performance problem in resolveRelocations with many sections
ClosedPublic

Authored by loladiro on Oct 7 2015, 10:44 PM.

Details

Summary

Rather than iterating over all sections and checking whether each one has relocations, iterate over the relocation map instead. This showed up heavily in an artificial julia benchmark that does lots of compilation. On that particular benchmark, this patch gives a ~15% performance improvement. As far as I can tell, the primary reason the original loop was so expensive is that Relocations[i] actually constructs a RelocationList (allocating memory and doing lots of other unnecessary work) if none is found.
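
To make the difference concrete, here is a minimal sketch of the two loop shapes. It is not the patch itself: std::unordered_map and std::vector stand in for the LLVM containers, and RelocationEntry, RelocationMap and both function names are hypothetical. What it relies on is that operator[] on a map inserts a default-constructed value when the key is missing, which is true of std::unordered_map and of DenseMap.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration only; not the RuntimeDyld types.
struct RelocationEntry {
  unsigned Offset;
};
using RelocationList = std::vector<RelocationEntry>;
using RelocationMap = std::unordered_map<unsigned, RelocationList>;

// Old loop shape: probe every section index. operator[] inserts an empty
// list for each index that has no entry yet, so the map grows to hold one
// entry per section even when almost none of them have relocations.
std::size_t resolveByScanningSections(RelocationMap &Relocations,
                                      unsigned NumSections) {
  std::size_t Resolved = 0;
  for (unsigned I = 0; I != NumSections; ++I)
    Resolved += Relocations[I].size();
  return Resolved;
}

// New loop shape: visit only the sections that actually have an entry in
// the map. Nothing is inserted, and the work is proportional to the number
// of sections with pending relocations.
std::size_t resolveByWalkingMap(const RelocationMap &Relocations) {
  std::size_t Resolved = 0;
  for (const auto &KV : Relocations)
    Resolved += KV.second.size();
  return Resolved;
}
```

Starting from a map with entries for two sections, scanning 5000 section indices leaves 5000 entries in the map, while walking the map leaves it at two. Both functions count the same relocations.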

Diff Detail

Repository
rL LLVM

Event Timeline

loladiro updated this revision to Diff 36826. Oct 7 2015, 10:44 PM
loladiro retitled this revision from to [RuntimeDyld] Fix performance problem in resolveRelocations with many sections.
loladiro updated this object.
loladiro added a reviewer: lhames.
loladiro set the repository for this revision to rL LLVM.
loladiro added a subscriber: llvm-commits.

@lhames (& others interested). This is part of my push to make MCJIT performance acceptable enough to be able to switch julia over to MCJIT by default (it's still stuck on 3.3 by default due to that). After this, one of the major performance drains is that creating a new pass manager for every module is pretty expensive (see graph at the end of https://github.com/JuliaLang/julia/issues/9336). Is it possible to only construct the pass manager once (I tried this briefly, but couldn't immediately make it work)? Does ORC maybe do this already?

lhames edited edge metadata. Oct 7 2015, 11:26 PM

This showed up heavily in an artificial julia benchmark that does lots of compilation. On that particular benchmark, this patch gives a ~15% performance improvement.

O_o

I'm fine with this change, but that result is surprising. Relocations is a:

DenseMap<unsigned, SmallVector<RelocationEntry, 64>>

Constructing entries should be cheap relative to the rest of the compilation.

After this, one of the major performance drains is that creating a new pass manager for every module is pretty expensive (see graph at the end of https://github.com/JuliaLang/julia/issues/9336). Is it possible to only construct the pass manager once (I tried this briefly, but couldn't immediately make it work)? Does ORC maybe do this already?

ORC doesn't do this yet, but will definitely suffer the same performance issues, especially when people use lazy compilation (which involves creating or breaking up modules).

A while back I actually tried persisting the pass manager in the ORC SimpleCompiler utility, and blew it up immediately on Darwin (and Linux too from memory). I'd be happy to turn that feature back on conditionally and add a test for it though: We should start tackling this. Plenty of people would love to be able to persist the pass manager.

RE switching Julia to MCJIT: Sounds cool. Shameless plug: I'd love to help you port to ORC if you're interested. If you're using the C++ APIs it's a trivial change (can be conditionally enabled to test it out), should be as fast or faster, and gets you some new features. :)

Re performance, I was quite surprised too, but this is (kN)^2 (since we do it every time for all sections, for all modules) for k the number of sections, with N being about 5000 and the modules being very tiny (as I said, it's an artificial benchmark - just compiling all of complex number support over and over).
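
A rough way to see why the cost grows faster than linearly is to count section probes. This is a simplified model, not a measurement, and it does not try to reproduce the (kN)^2 estimate exactly. It assumes that every module load triggers one resolve pass, that each module contributes the same number of sections, and that only the newly loaded module has pending relocations. Both function names and the per-module section count are hypothetical.

```cpp
#include <cstdint>

// Model of the old behaviour: after each module load, the resolve pass
// probes every section loaded so far.
std::uint64_t probesWhenScanningAllSections(std::uint64_t NumModules,
                                            std::uint64_t SectionsPerModule) {
  std::uint64_t Probes = 0;
  std::uint64_t LoadedSections = 0;
  for (std::uint64_t M = 0; M != NumModules; ++M) {
    LoadedSections += SectionsPerModule; // the new module's sections
    Probes += LoadedSections;            // one pass over all sections
  }
  return Probes;
}

// Model of the new behaviour: each pass touches only the sections that
// have pending relocations, assumed here to be the new module's sections.
std::uint64_t probesWhenWalkingPending(std::uint64_t NumModules,
                                       std::uint64_t SectionsPerModule) {
  return NumModules * SectionsPerModule;
}
```

Under these assumptions, 5000 modules with 4 sections each cost 50,010,000 probes when every pass scans all sections, against 20,000 when each pass only visits the sections with pending relocations. The first count grows with the square of the number of modules.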

Re switching to ORC, I think the initial switch should be pretty simple, but I want to replace some of the homegrown functionality for splitting modules, which will be a slightly harder task. If you'll be at llvmdev later this month we can try to find some time to sit down then.

lhames accepted this revision. Oct 9 2015, 11:31 AM
lhames edited edge metadata.

Ok - LGTM.

And yep - I'll be at the Dev Meeting. I look forward to chatting about this.

This revision is now accepted and ready to land. Oct 9 2015, 11:31 AM
This revision was automatically updated to reflect the committed changes.