This is an archive of the discontinued LLVM Phabricator instance.

[libcxx][dsl] Reduce number of feature checks
AbandonedPublic

Authored by arichardson on Oct 9 2020, 2:29 AM.

Details

Reviewers
ldionne
Group Reviewers
Restricted Project
Summary

Even with caching of certain function calls, the lit test suite still takes
a while to start up. It turns out that most cache lookups are caused by
assertions inside the Feature class. This patch introduces a new
tryEnableIn() function, which allows us to call isSupported() only once
rather than the current two or three times inside newconfig.py's configure().

Another way of reducing the number of (potentially) expensive calls would
be to cache the value of the first isSupported() call (see D84055).
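
The idea can be sketched as follows. This is a minimal illustration, not the actual libc++ DSL code: the class and method names mirror those mentioned above, but the bodies (and the `when` predicate) are hypothetical stand-ins for the real support checks.

```python
class Feature:
    def __init__(self, name, when=lambda config: True):
        self.name = name
        self._when = when  # stand-in for a potentially expensive check

    def isSupported(self, config):
        # Potentially expensive: the real check may invoke the compiler
        # or run a command on a remote target.
        return self._when(config)

    def tryEnableIn(self, config):
        # Check support exactly once and enable the feature if available,
        # instead of asserting isSupported() before and after enabling.
        if not self.isSupported(config):
            return False
        config.available_features.add(self.name)
        return True
```

With this shape, configure() can do `feature.tryEnableIn(config)` in a single call instead of guarding a separate enable step with repeated isSupported() checks.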

Diff Detail

Event Timeline

arichardson created this revision. · Oct 9 2020, 2:29 AM
Herald added a project: Restricted Project. · View Herald Transcript · Oct 9 2020, 2:29 AM
Herald added 1 blocking reviewer(s): Restricted Project. · View Herald Transcript
arichardson requested review of this revision. · Oct 9 2020, 2:29 AM

What's the time gain like for doing this?

> What's the time gain like for doing this?

Without this change:

Benchmark #1: ./bin/llvm-lit -v --no-execute projects/libcxx/test/libcxx/selftest/dsl/
  Time (mean ± σ):      2.621 s ±  0.029 s    [User: 1.455 s, System: 0.847 s]
  Range (min … max):    2.593 s …  2.688 s    10 runs

With the patch applied:

Benchmark #1: ./bin/llvm-lit -v --no-execute projects/libcxx/test/libcxx/selftest/dsl/
  Time (mean ± σ):      2.466 s ±  0.019 s    [User: 1.387 s, System: 0.779 s]
  Range (min … max):    2.435 s …  2.500 s    10 runs

So it appears to save ~150ms or about 6% of the total run time.

> What's the time gain like for doing this?

Measuring with perf on a Linux machine, I get the following without the patch (at rG0db08e59c9d2d3b004ea61f96d823edff283ed25):

Performance counter stats for './bin/llvm-lit -v --no-execute projects/libcxx/test/libcxx/selftest/dsl/' (10 runs):

      3001.646813      task-clock (msec)         #    0.971 CPUs utilized            ( +-  0.68% )
              530      context-switches          #    0.177 K/sec                    ( +-  0.20% )
               37      cpu-migrations            #    0.012 K/sec                    ( +-  8.62% )
          223,059      page-faults               #    0.074 M/sec                    ( +-  0.01% )
    5,421,406,209      cycles                    #    1.806 GHz                      ( +-  0.26% )
    5,856,892,943      instructions              #    1.08  insn per cycle           ( +-  0.01% )
    1,168,779,209      branches                  #  389.379 M/sec                    ( +-  0.01% )
       30,692,873      branch-misses             #    2.63% of all branches          ( +-  0.02% )

      3.091204859 seconds time elapsed                                          ( +-  0.72% )

And the following with the patch:

Performance counter stats for './bin/llvm-lit -v --no-execute projects/libcxx/test/libcxx/selftest/dsl/' (10 runs):

      2833.725264      task-clock (msec)         #    0.968 CPUs utilized            ( +-  0.62% )
              503      context-switches          #    0.178 K/sec                    ( +-  0.42% )
               29      cpu-migrations            #    0.010 K/sec                    ( +- 12.05% )
          208,416      page-faults               #    0.074 M/sec                    ( +-  0.01% )
    5,135,758,720      cycles                    #    1.812 GHz                      ( +-  0.20% )
    5,564,092,379      instructions              #    1.08  insn per cycle           ( +-  0.01% )
    1,112,432,802      branches                  #  392.569 M/sec                    ( +-  0.01% )
       29,604,154      branch-misses             #    2.66% of all branches          ( +-  0.04% )

      2.928603087 seconds time elapsed                                          ( +-  0.70% )

So almost 200ms saved here. The difference is probably larger because the first measurements had my locale-detection changes applied and this one did not.

Thanks a lot for the perf numbers and due diligence.

However, in most contexts, we run the test suite on at least a few tests, where a ~200ms speed up is literally unobservable because the tests are significantly slower than that anyway. I believe this isn't worth the increased complexity. Do you disagree? Do you have a use case where this speed up is observable?

> Thanks a lot for the perf numbers and due diligence.
>
> However, in most contexts, we run the test suite on at least a few tests, where a ~200ms speed up is literally unobservable because the tests are significantly slower than that anyway. I believe this isn't worth the increased complexity. Do you disagree? Do you have a use case where this speed up is observable?

The reason I noticed is that I have a shortcut set up in my IDE to run a single test, which made the high lit startup time apparent. I agree that 200ms is not much (and doesn't matter at all for the full test suite), but it is a noticeable delay when running single tests.

And obviously the big win here is the lower-level caching of feature checks; this change is just a small (but noticeable) improvement on top of that. It made much more of a difference before caching was added: since I run most tests on QEMU emulating CHERI, performing checks over SSH takes much longer than on the local machine.

I would rather keep the test harness code as straightforward as possible unless there is a sizeable performance benefit, which doesn't seem to be the case anymore since we do lower-level caching. So I would rather not move forward with this patch, if you think that's a reasonable argument.

Note that I think this is superseded by the refactor in https://reviews.llvm.org/D90429. That refactor will also allow defining straight flags in the DSL, without any feature being tied to them.
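
The action-based approach from that refactor could look roughly like the following. The names here are hypothetical illustrations, not the actual D90429 API: the point is only that a flag can be applied to the configuration directly, with no lit feature attached to it.

```python
class AddFlag:
    """A configuration action that adds a compile flag unconditionally,
    without defining a corresponding lit feature."""

    def __init__(self, flag):
        self._flag = flag

    def applyTo(self, config):
        # Mutates the configuration directly; no feature name is
        # registered and no isSupported()-style probe is needed.
        config.compile_flags.append(self._flag)
```

A parameter or feature can then simply return a list of such actions, and configure() applies them in one pass.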

arichardson abandoned this revision. · Oct 30 2020, 7:58 AM

Thanks! D90429 does indeed look much better.