Page MenuHomePhabricator

[Support] allow -stats/-time-passes reporting into a custom stream
AbandonedPublic

Authored by fedor.sergeev on Feb 22 2019, 11:44 PM.

Details

Summary

Extending the notion of info-output-file by replacing it with
a generic getInfoOutputStream. By default it is still populated with
the file stream created from a managed-static LibSupportInfoOutputFilename.

New interface introduced:

llvm::setInfoOutputStreamProvider

which allows to specify a custom callback which provides an output stream
when needed.

Current intended use is to get "info" output into separate output streams per
each compilation, allowing to, say, get independent non-overlapping pass-times
reports for parallel independent compilations. It can be implemented by installing
a custom stream provider that manages thread-local copies of output streams.

That "thread-aware" provider is not supplied with this fix.

Diff Detail

Event Timeline

fedor.sergeev created this revision.Feb 22 2019, 11:44 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 22 2019, 11:44 PM

clang-formatted.
still figuring out how to unit-test this thing.

I don't think this is sufficient or goes into the right direction to support parallel info collection and output.

For the new PassManager, reporting per-thread timings is trivial already and requires no (additional) global state. You can just create a TimePassesHandler instance per thread or per pipeline.
For Statistics on the other hand the approach in this change is insufficient, because the statistics registry is already a singleton. In order to support parallel statistics collection, can we get rid of the Statistics singleton?

fedor.sergeev added a comment.EditedFeb 25 2019, 4:02 PM

I don't think this is sufficient or goes into the right direction to support parallel info collection and output.

For the new PassManager, reporting per-thread timings is trivial already and requires no (additional) global state. You can just create a TimePassesHandler instance per thread or per pipeline.

A global state of the data being collected is indeed a problem, yet I'm not trying to address it here.

This change is focused purely on the stream that LLVM uses to report the data being collected.
No matter how many TimePassesHandlers you create (and yes, we do create TimePassesHandler one per thread-per-pipeline) they all emit into the same CreateInfoOutputFile() - either stderr or a single file.
I'm trying to solve a problem of a badly interleaved output, thats all.

For Statistics on the other hand the approach in this change is insufficient, because the statistics registry is already a singleton. In order to support parallel statistics collection, can we get rid of the Statistics singleton?

Yes, unfortunately for now the only case that will work for me is -time-passes with new pass manager (and that means - just opt, and not codegen :( ).
But that will be a separate fight (which I perhaps will dash into).

This change is focused purely on the stream that LLVM uses to report the data being collected.
No matter how many TimePassesHandlers you create (and yes, we do create TimePassesHandler one per thread-per-pipeline) they all emit into the same CreateInfoOutputFile() - either stderr or a single file.
I'm trying to solve a problem of a badly interleaved output, thats all.

You could set the output directly on the handler.

This change is focused purely on the stream that LLVM uses to report the data being collected.
No matter how many TimePassesHandlers you create (and yes, we do create TimePassesHandler one per thread-per-pipeline) they all emit into the same CreateInfoOutputFile() - either stderr or a single file.
I'm trying to solve a problem of a badly interleaved output, thats all.

You could set the output directly on the handler.

Do you suggest to get rid of the current scheme where we have a single output file for all kinds of "info data"?
Say, now w/o these changes -info-output-file is controlling the output file for both -stats and -time-passes.
Do you think it was a wrong design and we need to introduce separate output controls for each of those features?

Perhaps you can expand a bit on what exactly in *this* change do you see as problematic?
Your previous comment was mostly about overall -stats/-time-passes design, which I consider to be somewhat orthogonal here.

Do you suggest to get rid of the current scheme where we have a single output file for all kinds of "info data"?

No I don't, that should certainly be the default.

Perhaps you can expand a bit on what exactly in *this* change do you see as problematic?

This patch adds to global state and it solves having multiple output streams in a very convoluted way. Since there's only a single callback the callee has to figure out who the caller is to give them the right file. And that's not as simple as looking at the thread id or something like that. For Statistics, it makes no sense to output to different files, because the statistics are singleton.

Your previous comment was mostly about overall -stats/-time-passes design, which I consider to be somewhat orthogonal here.

My comment was about decentralizing the info output file setting. I'm thinking the current CreateInfoOutputFile should be the default for things printing out infos. But in those cases where we want multiple info output streams, the override should be controlled by the caller (in this particular instance, TimePassesHandler, e.g.).

I do generally feel like it would be a useful and generically good direction to support the timing system having a pre-bound output stream so that it completely avoids the flag-based info output stream we currently use. It moves everything into a proper API instead of a flags based system that seems inherently hostile to more advanced users.

It also seems good to separate this at the *pass manager* level with a specifically bound pass because it is timing passes.

Statistics are, IMO, a separate question. There, I would suggest that the correct way to have useful parallel access to statistics is to follow-through on a long-standing idea of making statistics be owned by the LLVMContext. But unsure how necessary this is... I thought our statistics were already atomic updated. So maybe it would be enough to just have a single stream for those?

fedor.sergeev abandoned this revision.Mar 14 2019, 7:56 AM

Abandoning this in favor of D59366.