This is an archive of the discontinued LLVM Phabricator instance.

[Driver] Add option -fproc-stat-report
ClosedPublic

Authored by sepavloff on Apr 27 2020, 12:36 AM.

Details

Summary

The new option -fproc-stat-info=<file> can be used to generate report
about used memory and execution tile of each stage of compilation.
Documentation for this option can be found in UserManual.rst. The
option can be used in parallel builds.

Diff Detail

Event Timeline

sepavloff created this revision.Apr 27 2020, 12:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2020, 12:36 AM
aganea added inline comments.Apr 27 2020, 11:09 AM
clang/docs/UsersManual.rst
770

Please add a header to the output .CSV, specifying the units of measure where relevant.

786

Why not just -fproc-stat-report in this case?

787

I think it is better if the units are specified along (and locale-formatted, if possible):

clang-11: output=/tmp/foo-123456.o  total=84,000 ms  user=76,000 ms  mem=87,496 kb
clang/include/clang/Driver/Job.h
78

mutable Optional<llvm::sys::ProcessStatistics> ProcStat; and then you won't need .isSet().

clang/lib/Driver/Driver.cpp
3774

There is a large piece of code in the if statement, I would add curly brackets:
if (!StatReportFile.empty()) { ... }

3776

Use Optional<> like state above, then the condition changes to:
if (ProcStat)

and then usage below becomes:
<< ", total=" << ProcStat->TotalTime.count()

3778

if (StatReportFile.equals("-") || StatReportFile.empty()) {

MaskRay added inline comments.Apr 28 2020, 11:27 PM
clang/docs/UsersManual.rst
786

It can save an Option..

787

Sorry, I tend to disagree with the argument for decimal separators and locale differences. They make behaviors divergent and make the output difficult to parse by a script. (Scripts may have to use LANG=C clang -fproc-stat-report=- to cancel the locale effect)

aganea added inline comments.Apr 29 2020, 7:26 AM
clang/docs/UsersManual.rst
786

I was thinking in terms of user experience, I find it a bit awkward. If you want stdin, you say mytool < file, if you want stdout you say mytool and pipe the output down the line. Is there a need to state the output when we're not writing to a file?

There is a precedent for this, other tools like xray have llvm-xray account %s -o -, but it's there for completness, you could just say llvm-xray account %s.

787

In my sense, CSV or YAML are for machine parsing, TXT is for human consumption. A script should not parse a human-targetted output.
Maybe an option is missing here to set the format? Again xray has -f=text or -f=csv.

sepavloff marked 3 inline comments as done.Apr 29 2020, 9:29 AM
sepavloff added inline comments.
clang/docs/UsersManual.rst
770

CSV was chosen because such file is a simple union of records. It is convenient if multiple processes write to it. A process locks the file, writes to it and unlock it.

If header is required, thing get more complicated. A process locks file, then moves to the end and get current position. If it is the beginning of the file, this process if the first writer, so it writes header. Then it writes line of data and unlock file.

On the other hand, CSV file is proposed for parsing by scripts. Does such complication really makes sense?

786

Agree, it looks awkward. I will add variant without value.

787

I don't think an option for output format is needed here (but it could be useful if we supported JSON in addition to CSV). However the proposed format indeed is more readable. I am going to change printing accordingly, but without locale specifics. For users of this feature C locale is convenient.

aganea added inline comments.Apr 29 2020, 10:33 AM
clang/docs/UsersManual.rst
770

Thanks for the explanation! Yes, it would be more complicated indeed if locking is required. Let's forget the header if it's not practical.

However that raises a question: is it up to the user/build system to delete the log file on startup? Otherwise each build would indefintely append, you wouldn't know where your last build started in the log? How do you address that currently?

787

Sounds good!

sepavloff updated this revision to Diff 261621.May 2 2020, 12:11 AM

Updated patch

  • added variant of the option without file, it is used to print statistics on stdout.
  • Updated file locking mechanism.
  • Updated test and documentation accordingly.
sepavloff marked an inline comment as done.May 2 2020, 6:34 AM
sepavloff added inline comments.
clang/docs/UsersManual.rst
770

However that raises a question: is it up to the user/build system to delete the log file on startup? Otherwise each build would indefintely append, you wouldn't know where your last build started in the log? How do you address that currently?

Yes, the log file is only appended. Users need to delete it if they want it contains results of one run only. However this feature is also used to collect statistics. User runs the same build several times and average needed metric for the same output file.

sepavloff updated this revision to Diff 266506.May 27 2020, 5:53 AM

Updated patch

sepavloff marked 4 inline comments as done.May 27 2020, 5:58 AM
sepavloff added inline comments.
clang/lib/Driver/Driver.cpp
83

Thank you for the catch!

3776

Indeed, changed accordingly.

aganea added inline comments.Jun 16 2020, 1:28 PM
clang/lib/Driver/Driver.cpp
3778

In the case where !ProcStat, I am wondering if we shouldn't emit zero values, in the report file at least. Otherwise if an invocation fails, it won't be there in the report and we might wonder why. Emitting 0 might indicate that something went wrong.

3810

If the goal is to report accurate information, maybe it's worth looping here a bit in case of an error, to give the chance to other clang.exe instances to release the file lock? What do you think? (same for tryToLock)

sepavloff marked 2 inline comments as done.Jun 17 2020, 9:29 AM
sepavloff added inline comments.
clang/lib/Driver/Driver.cpp
3778

The feature allows getting statistics using conventional tools like make. In this case the absence of numbers for a particular file is not an issue, maybe the file has already been built. If something goes wrong, clang must handle it in Compilation::ExecuteCommand and react properly. The fact of failed invocation would be handled by build tool. For the purpose of monitoring performance numbers it is more convenient to record successful compilations only.

3810

Actually this method makes blocking request and waits infinitely.

There is long discussion of how this lock helper must be implemented: https://reviews.llvm.org/D79066. Finally the variant with timeout was removed, only blocking one remained. The method has prefix try because it requires handling result value.

If you think the name or interface might be better, you can participate in the discussion in D79066, that patch isn't committed yet.

aganea added inline comments.Jun 17 2020, 10:41 AM
clang/lib/Driver/Driver.cpp
3778

Ok, thanks for clarifying! Could you please do an early exit here? ie.

if (!ProcStat)
  return;
3810

Thanks!

In the same way as above, I think the code will read better if it did early exits. Also, I don't think you need handleAllErrors, you could replace it with toString and report the error code the user:

if (EC)
  return;
auto L=OS.tryToLock();
if (!L) {
  llvm::errs() ... << toString(L.takeError());
  return;
}
OS << Buffer;
sepavloff updated this revision to Diff 271648.Jun 18 2020, 3:50 AM

Updated patch according to reviewer's notes

sepavloff updated this revision to Diff 271665.Jun 18 2020, 4:59 AM

Updated patch

sepavloff marked 2 inline comments as done.Jun 18 2020, 6:58 AM
sepavloff added inline comments.
clang/lib/Driver/Driver.cpp
3810

It becomes better. Thanks!

aganea accepted this revision.EditedJun 18 2020, 7:10 AM

Thanks again for the changes @sepavloff! Looks good to me!
You might wait a bit before commiting, if there are further comments.

clang/lib/Driver/Driver.cpp
3803

Please note that I've just moved this function to llvm::sys::printArg, you will need to update before commit.

This revision is now accepted and ready to land.Jun 18 2020, 7:10 AM
This revision was landed with ongoing or failed builds.Nov 12 2020, 11:16 PM
This revision was automatically updated to reflect the committed changes.

Could you add quotation marks around the executable name to make the CSV file easier to parse?

Could you add quotation marks around the executable name to make the CSV file easier to parse?

Implemented: rG9faedb2d7146.

ZarkoCA added inline comments.
clang/test/Driver/report-stat.c
2

Hi @sepavloff, is it possible to add -fintegrated-as to the clang invocation?

As it is, the test fails on our AIX target because we don't use the integrated assembler by default.

ZarkoCA added inline comments.Nov 27 2020, 2:57 PM
clang/test/Driver/report-stat.c
2

fyi, I added the option in 979bcbd3a6f7ea784f2098ad4cf613fbd6b09e38