This is an archive of the discontinued LLVM Phabricator instance.

Make STATISTIC() values available programmatically
ClosedPublic

Authored by dsanders on Feb 28 2018, 1:34 PM.

Details

Summary

It can be useful for tools to be able to retrieve the values of variables
declared via STATISTIC() directly without having to emit them and parse
them back. Use cases include:

  • Needing to report specific statistics to a test harness
  • Wanting to post-process statistics. For example, to produce a percentage of functions that were fully selected by GlobalISel

Make this possible by adding llvm::GetStatistics() which returns an
iterator_range that can be used to inspect the statistics that have been
touched during execution. When statistics are disabled (NDEBUG and not
LLVM_ENABLE_STATISTICS) this method will return an empty range.

This patch doesn't address the effect of multiple compilations within the same
process. In such situations, the statistics will be cumulative for all
compilations up to the GetStatistics() call.

Diff Detail

Event Timeline

dsanders created this revision.Feb 28 2018, 1:34 PM
dsanders updated this revision to Diff 136574.Mar 1 2018, 10:45 AM

During further testing, I noticed that the pre-existing LLVM_ENABLE_STATS option
isn't actually available to CMake and therefore the -DLLVM_ENABLE_STATS option
that PrintStatistics() suggests in its error message doesn't do anything.

Updated the patch to fix that at the same time

dsanders updated this revision to Diff 136577.Mar 1 2018, 10:52 AM

Remove unintentional change. ResetStatistics() is something I'm considering implementing later

rtereshin accepted this revision.Mar 1 2018, 10:35 PM

LGTM

My only concern here is that I don't quite understand why only the ManagedStatic<StatisticInfo> StatInfo's writer is guarded by a mutex, but non of the readers. As the newly added GetStatistics() method leaks a read-access to StatInfo into the wild losing any control over it, it might be a problem.

Could you please clarify that a little bit?

Thanks! I really like the direction this is going =)

This revision is now accepted and ready to land.Mar 1 2018, 10:35 PM

Ah, yeah, also, as the range returned is a couple of vector iterators, it looks like bumping any counter anywhere would potentially invalidate all the ranges existing. I suspect that might be a little error-prone and fragile even in a single-threaded environment. Maybe replace that vector with a list? And add a test that will check we don't regress on this little detail.

Alternatively, we could just copy the whole thing instead of returning iterators to it. Potentially protect that copy with that mutex and then we're rock solid.
But again, why do we even worry about this being thread-safe in any way one more time?

Thanks

LGTM

My only concern here is that I don't quite understand why only the ManagedStatic<StatisticInfo> StatInfo's writer is guarded by a mutex, but non of the readers. As the newly added GetStatistics() method leaks a read-access to StatInfo into the wild losing any control over it, it might be a problem.

Could you please clarify that a little bit?

The value itself isn't protected by a mutex but rather ensures atomic updates with std::atomic. Do you mean the StatLock mutex referenced by RegisterStatistic()? That's protecting against a race that can occur on the first modification to the value. The StatisticInfo object records a list of all Statistic objects that were modified at some point. The first time Statistic modified it registers itself with the StatisticInfo object and we need a lock to avoid double-registration (or worse).

Ah, yeah, also, as the range returned is a couple of vector iterators, it looks like bumping any counter anywhere would potentially invalidate all the ranges existing. I suspect that might be a little error-prone and fragile even in a single-threaded environment. Maybe replace that vector with a list? And add a test that will check we don't regress on this little detail.
Alternatively, we could just copy the whole thing instead of returning iterators to it. Potentially protect that copy with that mutex and then we're rock solid.
But again, why do we even worry about this being thread-safe in any way one more time?

For a lot of tools, returning iterators is going to be fine. The process would typically be:

  • compile, possibly multiple compilations on multiple threads
  • wait for compilations
  • extract statistics

That's certainly good enough for my aims. However, you're right that there's cases where races can happen. For the most part, they're not an issue since to use this as a measuring tool you already need to manage your threads such that you don't have noise in the statistics from things you didn't want to measure. With those requirements, there's no new registrations while the results are retrieved. However, if it's used to capture a snapshot of the current values then stopping the world is a burden and it's possible for a new statistic to be registered while you're reading them, invalidating the iterators.

I'll switch it to return a std::vector<std::pair<StringRef(), unsigned>>. That will cover the latter use case too

dsanders updated this revision to Diff 136796.Mar 2 2018, 10:36 AM

Fix a race between GetStatistics() and addStatistic()
Documented caveats that are the callers responsibility to handle if it matters to them.

Thanks! That's looking great!

As discussed, a separate patch could add a lock on that mutex in PrintStatistics and PrintStatisticsJSON to avoid the race.

I'm amazed that C++ standard library doesn't appear to have a Writer/Reader Lock, only the regular mutex. That's a shame. That thing could be useful here.

LGTM (again =))

bogner added inline comments.Mar 2 2018, 11:14 AM
include/llvm/ADT/Statistic.h
63 ↗(On Diff #136796)

This assumes that every file that includes llvm/ADT/Statistic.h included llvm/Config/config.h first, which is actually never true due to our rules about include order. OTOH, we can't include config.h in this header, because it's llvm internal. I think if we want to do this we need to publish LLVM_ENABLE_STATS into llvm-config.h as API.

include/llvm/ADT/StatisticInfo.h
24–54

Why move this to the header / make this type public? AFAICT nobody uses it since the only interesting API for users is GetStatistics() below.

63

Don't capitalize function names. This should probably be getStatistics().

unittests/ADT/StatisticTest.cpp
29

Related to the Config.h comment above, I'm pretty sure this is always false in the test, as is.

dsanders added inline comments.Mar 2 2018, 2:27 PM
include/llvm/ADT/Statistic.h
63 ↗(On Diff #136796)

I've moved it to llvm-config.h and made Statistics.h include it.

As for whether it belongs in llvm-config.h or Config.h. At the moment, we ship Statistics.h and it needs to have access to the macro so llvm-config.h makes sense right now.

I do want to look into making STATISTIC() more fine-grained though. I haven't thought much about how I'm going to achieve that but if I can move the #if into the cpp file instead then it will make more sense to use Config.h. That said, Statistic will still be public and anyone using it will be quite surprised if it doesn't count and they can't see that in the header.

include/llvm/ADT/StatisticInfo.h
24–54

I'll move this back. An earlier version of the patch had GetStatistics() return this object itself

63

This one is capitalized to match the existing PrintStatistics() and PrintStatisticsJSON(). It would be strange to have one function in this family with a lower case first letter

unittests/ADT/StatisticTest.cpp
29

Well spotted.

dsanders updated this revision to Diff 136853.Mar 2 2018, 2:28 PM

Statistics can actually be gathered (LLVM_ENABLE_STATS isn't always false anymore). Tested out of the patch using #error in the unit test.
Moved StatisticInfo back

bogner accepted this revision.Mar 2 2018, 5:15 PM

One last comment, then I think this is good to go.

lib/Support/Statistic.cpp
54

I think you lost the anonymous namespace here by accident.

This revision was automatically updated to reflect the committed changes.