This is an archive of the discontinued LLVM Phabricator instance.

[libcxx][lit] Cache the value of the feature lambda
AbandonedPublic

Authored by arichardson on Jul 17 2020, 10:40 AM.

Details

Reviewers
ldionne
Group Reviewers
Restricted Project
Summary

This avoids running configuration tests multiple times. I discovered this
using the collect binaries executor (D84045) since the output directory
contained two folders for each locale check since isSupported() is called
multiple times:

Before:
check_locale_cs_CZ.ISO8859-22l9izs95.cpp.dir
check_locale_cs_CZ.ISO8859-2kai5chqg.cpp.dir
check_locale_en_US.UTF-8av355dei.cpp.dir
check_locale_en_US.UTF-8bh_vxteu.cpp.dir
check_locale_fr_CA.ISO8859-16iqfdts7.cpp.dir
check_locale_fr_CA.ISO8859-1jytt_wqw.cpp.dir
check_locale_fr_FR.UTF-8jxintigp.cpp.dir
check_locale_fr_FR.UTF-8lp3_p554.cpp.dir
check_locale_ru_RU.UTF-84wd2uc3q.cpp.dir
check_locale_ru_RU.UTF-8pjuyogc7.cpp.dir
check_locale_zh_CN.UTF-8eekw_qhg.cpp.dir
check_locale_zh_CN.UTF-8zfw71vzu.cpp.dir

After:
check_locale_cs_CZ.ISO8859-2ldoy7luu.cpp.dir
check_locale_en_US.UTF-824v0soux.cpp.dir
check_locale_fr_CA.ISO8859-18nhriebb.cpp.dir
check_locale_fr_FR.UTF-82nn3grpg.cpp.dir
check_locale_ru_RU.UTF-8k4aduyv1.cpp.dir
check_locale_zh_CN.UTF-8pr4ax8yr.cpp.dir

This noticeably speeds up running a single test.
Adding some debugging printfs to _executeScriptInternal()
show that with this change it is only called 36 times
(with D88878 and D88884 applied) instead of 120 times.
Even with caching at a lower level, this reduces the lit startup time
(using a single test and --no-execute) from 5.7seconds to 2.3 seconds.

Diff Detail

Event Timeline

arichardson created this revision.Jul 17 2020, 10:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 17 2020, 10:40 AM
Herald added 1 blocking reviewer(s): Restricted Project. · View Herald Transcript
ldionne requested changes to this revision.Jul 17 2020, 10:44 AM

This doesn't work, since a feature could be supported in one configuration but not in another one. The caching would have to take into account the config.

Instead, I believe caching should be implemented at a lower level, i.e. where we run the actual locale tests. That's where we know the command-line we use to compile the test and the locale we're testing. Note that we originally cached subprocess.call calls, but this was removed when we moved to creating mini-Lit tests for configuration checks. I believe implementing caching of some sort at that lower level would be more sound and would yield potentially even larger benefits.

This revision now requires changes to proceed.Jul 17 2020, 10:44 AM
arichardson planned changes to this revision.Jul 17 2020, 10:51 AM

Naive workaround for different configs.

I am not sure if this is sufficient but it works for my standalone libunwind+libcxx testing

arichardson edited the summary of this revision. (Show Details)Oct 6 2020, 3:13 AM

ping? Even with caching at the lower levels this still appears to make a noticeable difference.

Additionally, I feel like isSupported() potentially returning different values depending on later changes to the configuration object is confusing and it should always use the initially detected value.

arichardson abandoned this revision.Oct 10 2020, 5:49 AM

The startup improvement is very similar but slightly higher than D89110: 5,048,426,054 cycles instead of 5,421,406,209 baseline. D89110 reports 5,135,758,720. Not sure if this is worth it since this seems more fragile.

Performance counter stats for './bin/llvm-lit -v --no-execute projects/libcxx/test/libcxx/selftest/dsl/' (10 runs):

      2842.501038      task-clock (msec)         #    0.968 CPUs utilized            ( +-  0.47% )
              503      context-switches          #    0.177 K/sec                    ( +-  0.31% )
               38      cpu-migrations            #    0.013 K/sec                    ( +-  6.02% )
          204,788      page-faults               #    0.072 M/sec                    ( +-  0.01% )
    5,048,426,054      cycles                    #    1.776 GHz                      ( +-  0.25% )
    5,494,868,028      instructions              #    1.09  insn per cycle           ( +-  0.01% )
    1,099,097,312      branches                  #  386.666 M/sec                    ( +-  0.01% )
       29,349,374      branch-misses             #    2.67% of all branches          ( +-  0.02% )

      2.935697590 seconds time elapsed                                          ( +-  0.45% )