[ORC] Add a LLJITWithThinLTOSummaries example in OrcV2Examples
Closed, Public

Authored by sgraenitz on Aug 14 2020, 7:47 AM.

Details

Summary

The example demonstrates how to use a module summary index file produced for ThinLTO to:

  • find the module that defines the main entry point
  • find all extra modules that are required for the build

A LIT test runs the example as part of the LLVM test suite [1] and shows how to create a module summary index file.
The code also provides two Error types that can be useful when working with ThinLTO summaries.

[1] if LLVM_BUILD_EXAMPLES=ON and platform is not Windows
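
The two lookups listed in the summary can be sketched without any LLVM dependency. The block below is only an illustration under stated assumptions: `ModuleInfo`, `SummaryIndex`, `findDefiningModule` and `findRequiredModules` are hypothetical names, and the map is a simplified stand-in for a real module summary index, not the data structure or API the patch uses.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>

// Hypothetical, simplified stand-in for a module summary index: for each
// module path, the symbols it defines and the symbols it references.
struct ModuleInfo {
  std::set<std::string> Defines;
  std::set<std::string> References;
};
using SummaryIndex = std::map<std::string, ModuleInfo>;

// Return the path of a module that defines Symbol, or an empty string if no
// module in the index defines it.
std::string findDefiningModule(const SummaryIndex &Index,
                               const std::string &Symbol) {
  for (const auto &Entry : Index)
    if (Entry.second.Defines.count(Symbol))
      return Entry.first;
  return std::string();
}

// Collect every module reachable from Root by following symbol references
// transitively. Root itself is not part of the result.
std::set<std::string> findRequiredModules(const SummaryIndex &Index,
                                          const std::string &Root) {
  assert(Index.count(Root) && "root module must be present in the index");
  std::set<std::string> Visited;
  Visited.insert(Root);
  std::queue<std::string> Worklist;
  Worklist.push(Root);
  while (!Worklist.empty()) {
    std::string Current = Worklist.front();
    Worklist.pop();
    const ModuleInfo &Info = Index.at(Current);
    for (const std::string &Sym : Info.References) {
      std::string Def = findDefiningModule(Index, Sym);
      // Skip unresolved symbols and modules that were already queued.
      if (!Def.empty() && Visited.insert(Def).second)
        Worklist.push(Def);
    }
  }
  Visited.erase(Root);
  return Visited;
}
```

Calling `findDefiningModule(Index, "main")` gives the entry module, and passing that path to `findRequiredModules` gives the extra modules to load. Symbols that no module defines are skipped rather than reported as errors, which a real implementation would probably want to handle differently.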

Event Timeline

sgraenitz created this revision. Aug 14 2020, 7:47 AM
Herald added a project: Restricted Project. Aug 14 2020, 7:47 AM
sgraenitz requested review of this revision. Aug 14 2020, 7:47 AM

I would like to remove the ThinLtoJIT example. It needs a more capable threading library to speed up multithreaded compile times, and that's easier to do out-of-tree.
This might be a useful (minimal) portion to keep in-tree. What do you think?

sgraenitz updated this revision to Diff 285655. Aug 14 2020, 8:05 AM

Test discovery should ignore subdirectories that contain test inputs.

LGTM.

What performance issues did you run into with threading? I haven't had a chance to look into that yet.

Performance gains depended on smart handling of fine-grained async tasks; otherwise the runtime cost of handling concurrency quickly eats up the gains. I have the impression that the LLVM ThreadPool implementation is too limited here, e.g. there is no mechanism for priority-based scheduling. Trying to work around the limitations added a lot of complexity that I didn't manage to handle. It might be easier with a decent threading library at hand or maybe a Rust-like async/await. More experiments to come out-of-tree :)
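
To make "priority-based scheduling" concrete, here is a minimal sketch of a task queue that always runs the highest-priority pending task first. It is an assumption-laden illustration, not LLVM's ThreadPool and not the ThinLtoJIT code: `PriorityTaskQueue`, `push` and `runOne` are hypothetical names, and worker threads are left out so that the ordering logic stays visible.

```cpp
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical priority-ordered task queue. Worker threads would call
// runOne() in a loop; they are omitted here to keep the sketch short.
class PriorityTaskQueue {
public:
  using Task = std::function<void()>;

  // Enqueue a task. A larger Priority value means the task runs earlier.
  void push(int Priority, Task T) {
    std::lock_guard<std::mutex> Lock(Mutex);
    Queue.push(Entry{Priority, NextSeq++, std::move(T)});
  }

  // Pop and run the highest-priority task outside the lock.
  // Returns false if the queue was empty.
  bool runOne() {
    Task T;
    {
      std::lock_guard<std::mutex> Lock(Mutex);
      if (Queue.empty())
        return false;
      T = Queue.top().Work; // top() is const, so the task is copied
      Queue.pop();
    }
    T();
    return true;
  }

private:
  struct Entry {
    int Priority;
    std::uint64_t Seq; // submission order, used to break ties
    Task Work;
  };
  struct Compare {
    // Returns true if A should run after B.
    bool operator()(const Entry &A, const Entry &B) const {
      if (A.Priority != B.Priority)
        return A.Priority < B.Priority;
      return A.Seq > B.Seq;
    }
  };

  std::mutex Mutex;
  std::priority_queue<Entry, std::vector<Entry>, Compare> Queue;
  std::uint64_t NextSeq = 0;
};
```

The sequence number keeps tasks of equal priority in submission order. A single mutex around the queue is the simplest choice and is itself a possible contention point for very fine-grained tasks, which is the kind of overhead the comment above describes.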

Fix clang-format issue and rebase

sgraenitz updated this revision to Diff 285838. Aug 15 2020, 5:12 AM

Fix clang-tidy warnings

What performance issues did you run into with threading? I haven't had a chance to look into that yet.

Performance gains depended on smart handling of fine-grained async tasks; otherwise the runtime cost of handling concurrency quickly eats up the gains. I have the impression that the LLVM ThreadPool implementation is too limited here, e.g. there is no mechanism for priority-based scheduling. Trying to work around the limitations added a lot of complexity that I didn't manage to handle. It might be easier with a decent threading library at hand or maybe a Rust-like async/await. More experiments to come out-of-tree :)

Yep. We have the ExtensibleRTTI system available now, but I haven't had time to hook it up to MaterializationUnit -- doing so might give you some of the prioritization information that you need.

I also want to generalize the ExecutionSession dispatch API to handle arbitrary tasks, rather than just MaterializationUnits. If query handlers were dispatched rather than running on the thread that satisfies the last query dependence it should expose some new opportunities for concurrency.
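
The difference between running a query handler on the satisfying thread and dispatching it as a task can be sketched as follows. This is a hypothetical, single-threaded model, not the ExecutionSession API: `Query`, `Task` and `Dispatcher` are names invented for the illustration, and a real implementation would need synchronization around the dependence counter.

```cpp
#include <functional>
#include <utility>
#include <vector>

using Task = std::function<void()>;
using Dispatcher = std::function<void(Task)>;

// Hypothetical query that hands its handler to a dispatcher once the last
// outstanding dependence has been satisfied. Not thread-safe.
class Query {
public:
  Query(unsigned NumDeps, Task H, Dispatcher D)
      : Remaining(NumDeps), Handler(std::move(H)), Dispatch(std::move(D)) {}

  // Called by whoever satisfies one dependence. The handler is passed to
  // the dispatcher exactly once, when the counter reaches zero.
  void satisfyOne() {
    if (Remaining > 0 && --Remaining == 0)
      Dispatch(std::move(Handler));
  }

private:
  unsigned Remaining;
  Task Handler;
  Dispatcher Dispatch;
};
```

A dispatcher of the form `[](Task T) { T(); }` reproduces the inline behaviour, where the handler runs on the thread that satisfied the last dependence. A dispatcher that pushes the task onto a queue lets that thread return immediately and leaves the handler to be picked up by another worker.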

I will be very interested to hear how your experiments go -- I'd love to get all this tuned to improve performance.

I will get back to performance evaluation, maybe in a few weeks, and I am happy to share my progress.

If query handlers were dispatched rather than running on the thread that satisfies the last query dependence it should expose some new opportunities for concurrency.

Indeed, that sounds promising. I hope it doesn't require adding more locking to the engine? In general, performance analysis only works in combination with comprehensive benchmark data. Maybe a good opportunity to create a benchmark suite for tracking both single- and multi-threaded performance over time?

Back to the review: I'd like to keep the test for the example and see how the build servers behave. Generally it might be useful to have tests for all the "LLJITWith..." examples, right?
Do you think it makes sense to land the patch on the weekend in order to keep the number of people getting annoyed by me breaking their builds at a minimum? :)

lhames accepted this revision. Aug 21 2020, 2:26 PM
This revision is now accepted and ready to land. Aug 21 2020, 2:26 PM

I will get back to performance evaluation, maybe in a few weeks, and I am happy to share my progress.

If query handlers were dispatched rather than running on the thread that satisfies the last query dependence it should expose some new opportunities for concurrency.

Indeed, that sounds promising. I hope it doesn't require adding more locking to the engine? In general, performance analysis only works in combination with comprehensive benchmark data. Maybe a good opportunity to create a benchmark suite for tracking both single- and multi-threaded performance over time?

It wouldn't introduce new static locking points. To the extent that it enables extra concurrency there are more opportunities for lock contention, but that's a good thing. :)

I 100% agree on the benchmarking idea -- we definitely want one of those.

Back to the review: I'd like to keep the test for the example and see how the build servers behave. Generally it might be useful to have tests for all the "LLJITWith..." examples, right?

Sorry -- I LGTM'd earlier but forgot to hit "accept". I think LLJITWith... is a good home for this.

Do you think it makes sense to land the patch on the weekend in order to keep the number of people getting annoyed by me breaking their builds at a minimum? :)

Up to you, but reverts are cheap -- if it works for you locally I'd say land away.