This is an archive of the discontinued LLVM Phabricator instance.

[llvm-exegesis] Add option to check the hardware support for a given feature before benchmarking.
ClosedPublic

Authored by oontvoo on Aug 4 2020, 3:36 PM.

Details

Summary

This is mostly for the benefit of the LBR latency mode.
Right now, it performs no checking. If this is run on a non-supported hardware, it will produce all zeroes for latency.

Diff Detail

Event Timeline

oontvoo created this revision.Aug 4 2020, 3:36 PM
oontvoo requested review of this revision.Aug 4 2020, 3:36 PM
oontvoo updated this revision to Diff 283295.Aug 5 2020, 10:30 AM

Fixed clangtidy warnings

Thanks Vy for working on this!

My main comments are about the feature flag + feature strings: rather than specifying the requested features through the command-line flag, we should auto-detect which features are required and then check for them appropriately (in part, we're already doing this with the uops counters, which are available only on certain platforms). Just this - and perhaps being able to show which features are supported by the host should be sufficient & give a better user experience.

llvm/tools/llvm-exegesis/lib/Target.h
148

We probably want to reverse this - make all features disabled by default and whitelist them only when we know they are available. We'll typically be adding a check when some feature is generally not available, so it would make more sense to disable it unless we can prove that it is available.

llvm/tools/llvm-exegesis/lib/X86/Target.cpp
677

I'd prefer to avoid passing around raw strings for feature checks - this would make it difficult to e.g. put together a list of all features that are available across all platforms (i.e. the list of valid inputs to this function), or the list of features that are available on the current host.

If we want to keep this generic function, I'd prefer having an enum type that enumerates all the possible features, and functions for parsing and string formatting that could be used in the command-line interface (more on the interface in the comments).

Then, for each target, we could list all the features (enum values) that are always supported, and a list of those that need a host-specific check (along with a function/object that checks for their presence).

llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp
48

This is not portable - the inline assembly is compiler-specific, and this would not work e.g. with Microsoft's C++ compiler. Sadly, a portable implementation would include a lot of #ifdefs and different implementation - see for example DoNotOptimize() in the benchmark library.

The code looks quite complex and fragile, so I'd prefer not duplicating it. I'm wondering if we should depend on the benchmark library (which would mean some unexpected and possibly dangerous dependencies), or use another way to prevent the compiler from fully inlining this, e.g. make the number of iterations somehow depend on the host/the environment, e.g. the current time.

154

Style nit: checkLbrSupport

163

Style nit: Sum

164

Style nit: I

171

Can this be const auto?

180

I'd prefer being more concrete here, e.g. "LBR format with cycles is not supported on the host", since we'll also return this error when the LBR is there, but it doesn't have the timing info (e.g. on Haswell).

llvm/tools/llvm-exegesis/llvm-exegesis.cpp
166

I'd prefer removing the flag altogether. With this flag, we're effectively asking the user to tell us the CPU features that will be used by llvm-exegesis given the values of the other command-line flags, which is something we should be able to determine ourselves in the code from the other flags.

Rather than have this flag, we should recognize that the user is trying to use the LBR (or some other future feature) from other command-line flags, and run the check automatically for them. From the viewpoint of the user, this will be a significant reduction of complexity (they will not need to know if they have to check for some features), but the overall behavior of the program will remain the same (there will be a non-zero exit code if the feature is not supported).

What might make sense instead (though it might be dangerous too) is a flag that would make llvm-exegesis exit peacefully if the required feature is not there. We could use this to make tests no-ops on targets where the feature is not supported.

oontvoo updated this revision to Diff 286626.Aug 19 2020, 11:52 AM
oontvoo marked 7 inline comments as done.

Updated diff

llvm/tools/llvm-exegesis/lib/Target.h
148

Done. refactored it so that the counter check for the features depending on which mode is requested .
Specifically, if the target sees that lbr is requested, then it'll check for LBR otherwise, assume it's fine to run benchmarks

llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp
48

Hmm, can we not assume llvm-exegesis will always be built with clang?

llvm/tools/llvm-exegesis/llvm-exegesis.cpp
166

How about this? We change it to a bool flag to say whether a smoke check should be done prior to running the benchmarks
If the flag is set, then then ask the target and respective counter(s) to do a check of hardware/software.
If flag is NOT set, then assume support is present.

oontvoo updated this revision to Diff 288033.Aug 26 2020, 10:22 AM

Use a more portable way to prevent loop unrolling

ondrasej added inline comments.Aug 28 2020, 3:45 AM
llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp
48

Normally, it would be built by the C++ compiler available on the host and selected during the configuration of LLVM build. Depending on the host system, it might be Clang, GCC, MSVC++ or others.

166

time_t is usually the number of seconds since the Unix epoch, so this might be (in terms of CPU time) a very long wait. Something between 1-10 milliseconds should be sufficient).

This precision is not possible with time(), so we'd have to use the <chrono> library from the C++ standard, possibly the TimePoint type from llvm/include/llvm/Support/Chrono.h.

214

A more C++14 way would be
auto DataBuf = std::make_unique<char[]>(kDataBufferSize);

llvm/tools/llvm-exegesis/llvm-exegesis.cpp
166

If the check that all the requires features are supported isn't too expensive (which it shouldn't be with the millisecond time limit), I'd prefer running it every time as a safety measure.

As for the flag - I think the main problem we need to solve here is that LBR tests will fail if we run them on hardware that doesn't support LBR timings (i.e. if they run on anything beyond somewhat recent Intel CPUs) or on a system that doesn't have LBR support.

I'm not sure that the flag would really help us - what we probably need one of the following:

  • a flag that tests for the presence of the required features on the host, and that fails if and only if the needed features are not available (but it would never fail for any other reason). We'd also have to instrument the tests to first run this feature autodetection.
  • a flag that makes llvm-exegesis "succeed" (i.e. end with exit code 0, though they might produce a message that informs the user that the test was skipped) on hosts where the features are not supported, so that the tests turn into a no-op.

There might be other ways to do this, but these two look like the simplest ones to implement.

oontvoo updated this revision to Diff 291994.Sep 15 2020, 11:54 AM
oontvoo marked 2 inline comments as done.

Removed the check-support flag and always performs the check.

ondrasej added inline comments.Sep 16 2020, 1:13 PM
llvm/tools/llvm-exegesis/lib/X86/Target.cpp
679

Nit: LBR

oontvoo updated this revision to Diff 292327.Sep 16 2020, 1:27 PM
oontvoo marked an inline comment as done.

Updated comment

oontvoo updated this revision to Diff 293934.Sep 23 2020, 8:46 PM

reduce wait time

oontvoo updated this revision to Diff 294069.Sep 24 2020, 8:21 AM

Reduce wait time to 5ms. (not really meaningful - we just use this as a way to prevent unrolling)

oontvoo updated this revision to Diff 295317.Sep 30 2020, 9:12 AM

More clean up.

ondrasej accepted this revision.Sep 30 2020, 9:20 AM

Thanks Vy!

This revision is now accepted and ready to land.Sep 30 2020, 9:20 AM
oontvoo marked 4 inline comments as done.Sep 30 2020, 9:25 AM
hliao added a subscriber: hliao.Sep 30 2020, 8:26 PM

This change is reverted as, on hosts without LBR supported but with LIBPFM installed and used, this change makes llvm/test/tools/llvm-exegesis/X86/lbr/mov-add.s failed. On that host, perf_event_open fails with EOPNOTSUPP on LBR config. That change's basic assumption

If this is run on a non-supported hardware, it will produce all zeroes for latency.

could not stand as perf_event_open system call will fail if the underlying hardware really doesn't have LBR supported.

Here's the output from

llvm-lit -v llvm/test/tools/llvm-exegesis/X86/lbr/mov-add.s
-- Testing: 1 tests, 1 workers --
FAIL: LLVM :: tools/llvm-exegesis/X86/lbr/mov-add.s (1 of 1)
******************** TEST 'LLVM :: tools/llvm-exegesis/X86/lbr/mov-add.s' FAILED ********************
Script:
--
: 'RUN: at line 1';   llvm-exegesis -mode=latency --repetition-mode=loop --x86-lbr-sample-period=521 --snippets-file=llvm/test/tools/llvm-exegesis/X86/lbr/Inputs/mov_add.att
--
Exit Code: 134

Command Output (stderr):
--
Unable to open event. ERRNO: Operation not supported. Make sure your kernel allows user space perf monitoring.
You may want to try:
$ sudo sh -c 'echo -1 > /proc/sys/kernel/perf_event_paranoid'
llvm-exegesis: llvm/tools/llvm-exegesis/lib/PerfHelper.cpp:111: llvm::exegesis::pfm::Counter::Counter(llvm::exegesis::pfm::PerfEvent&&): Assertion `FileDescriptor != -1 && "Unable to open event"' failed.
test/tools/llvm-exegesis/X86/lbr/Output/mov-add.s.script: line 1: 34962 Aborted                 llvm-exegesis -mode=latency --repetition-mode=loop --x86-lbr-sample-period=521 --snippets-file=llvm/test/tools/llvm-exegesis/X86/lbr/Inputs/mov_add.att

--

********************
********************
Failed Tests (1):
  LLVM :: tools/llvm-exegesis/X86/lbr/mov-add.s


Testing Time: 0.11s
  Failed: 1

Please note perf_event_open returns ERRNO: Operation not supported.

This change is reverted as, on hosts without LBR supported but with LIBPFM installed and used, this change makes llvm/test/tools/llvm-exegesis/X86/lbr/mov-add.s failed. On that host, perf_event_open fails with EOPNOTSUPP on LBR config. That change's basic assumption

If this is run on a non-supported hardware, it will produce all zeroes for latency.

could not stand as perf_event_open system call will fail if the underlying hardware really doesn't have LBR supported.

tl;dr: This patch is still needed because on certain microarchitectures, the LBR is present but it doesn't provide timing information. This patch detects these cases and adds an appropriate error message.

The issue here is that the LBR is not either available or not, but it has multiple versions that have different features and different output formats (see the Intel SDM, vol 3B, section 17.4, if you're interested in the details). The Linux kernel takes care of most of it, abstracts away the details, and transforms the data from the hardware format into a unified format defined in perf_event.h.
Earlier versions of the LBR in Haswell and Broadwell had only the addresses of the branches, but no timing information. On these hosts, the kernel returns the addresses as usual, but it puts zero cycles for all branches. Moreover, the LBR format and feature support is readable only in kernel mode (it's provided through a MSR, not in CPUID) and the Linux kernel doesn't expose it to user mode.

So, depending on the microarchitecture of the host, one of the following may happen:

  • if the host doesn't have LBR support at all, perf_event_open will fail as you said.
  • if the host supports LBR but it's an older microarchitecture (Haswell, Broadwell, and maybe others), perf_event_open and reading from the LBR will succeed, but it will contain zero cycles for all branches, as provided by the Linux kernel. The goal of this patch is to detect this situation and report an error rather than return invalid numbers.
  • only if the host supports LBR and LBR timings (i.e. it is a Skylake or newer), will we get the number of cycles between the branches.

We need to find a way to make the mov-add.s test behave in all three cases mentioned above/regardless of the microarchitecture it runs on. But this patch still handles valid situations.

Ah, the issue here was that lit.local.cfg , which was supposed to detect if lbr-tests are runnable, didn't use the right arguments . I've updated it and the test should now be skipped on unsupported kernels/architectures..