This is an archive of the discontinued LLVM Phabricator instance.

[SpecialCaseList] Remove TrigramIndex
ClosedPublic

Authored by ellis on Jun 16 2023, 2:32 PM.

Details

Reviewers
phosek
MaskRay
vitalybuka
Group Reviewers
Restricted Project
Commits
rG139067554276: [SpecialCaseList] Remove TrigramIndex
Summary

TrigramIndex was added back in https://reviews.llvm.org/D27188 as an optimization to make SpecialCaseList::match() faster. I've found that TrigramIndex actually makes the function slower and it has no functional use, so we can remove it.

I grabbed the list of queries passed to SpecialCaseList::match() on a random very large file (AArch64ISelLowering.cpp) and measured the runtime to call match() on all of them with this line disabled and then enabled.

$ hyperfine --warmup 3 'GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=1 build/unittests/Support/SupportTests' 'GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=0 build/unittests/Support/SupportTests'
Benchmark 1: GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=1 build/unittests/Support/SupportTests
  Time (mean ± σ):     575.9 ms ±  20.3 ms    [User: 573.1 ms, System: 2.7 ms]
  Range (min … max):   555.5 ms … 620.0 ms    10 runs

Benchmark 2: GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=0 build/unittests/Support/SupportTests
  Time (mean ± σ):     283.4 ms ±   6.7 ms    [User: 280.3 ms, System: 3.0 ms]
  Range (min … max):   277.0 ms … 294.9 ms    10 runs

Summary
  'GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=0 build/unittests/Support/SupportTests' ran
    2.03 ± 0.09 times faster than 'GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=1 build/unittests/Support/SupportTests'

Using perf I found that most of the runtime in TrigramIndex::isDefinitelyOut() comes from a division operation that seems to come from std::unordered_map: https://github.com/llvm/llvm-project/blob/8e1f820bb4eadf5c0704818f6063e0db1006e32d/llvm/include/llvm/Support/TrigramIndex.h#L62

Removing TrigramIndex will make it easier to potentially switch to using GlobPattern instead of a full regex for SpecialCaseList. See discussion in https://reviews.llvm.org/D152762 for details.

Diff Detail

Event Timeline

ellis created this revision.Jun 16 2023, 2:32 PM
ellis published this revision for review.Jun 16 2023, 7:35 PM
ellis edited the summary of this revision. (Show Details)
ellis added reviewers: phosek, MaskRay.
Herald added a project: Restricted Project. · View Herald TranscriptJun 16 2023, 7:36 PM
MaskRay accepted this revision.Jun 16 2023, 11:22 PM

Using perf I found that most of the runtime in TrigramIndex::isDefinitelyOut() comes from a division operation that seems to come from std::unordered_map: https://github.com/llvm/llvm-project/blob/8e1f820bb4eadf5c0704818f6063e0db1006e32d/llvm/include/llvm/Support/TrigramIndex.h#L62

I guess it doesn't use DenseMap to protect the case that the key collides with the empty/tombstone keys, but std::unordered_map has very bad performance...
Another problem with TrigramIndex is that we need to iterate over the full query, while with a regex engine I think we can often bail out early.

I have tested TrigramIndex removal with a large ubsan_ignorelist.txt and the performance is better (1.10+ x as fast). We should give some time for #sanitizers folks to respond.

This revision is now accepted and ready to land.Jun 16 2023, 11:22 PM
MaskRay added a reviewer: Restricted Project.Jun 16 2023, 11:23 PM
vitalybuka added a subscriber: vitalybuka.

I'd like to test on our ignore lists next.

This revision now requires review to proceed.Jun 16 2023, 11:28 PM
vitalybuka accepted this revision.EditedJun 18 2023, 9:20 PM

Using perf I found that most of the runtime in TrigramIndex::isDefinitelyOut() comes from a division operation that seems to come from std::unordered_map: https://github.com/llvm/llvm-project/blob/8e1f820bb4eadf5c0704818f6063e0db1006e32d/llvm/include/llvm/Support/TrigramIndex.h#L62

I guess it doesn't use DenseMap to protect the case that the key collides with the empty/tombstone keys, but std::unordered_map has very bad performance...
Another problem with TrigramIndex is that we need to iterate over the full query, while with a regex engine I think we can often bail out early.

I have tested TrigramIndex removal with a large ubsan_ignorelist.txt and the performance is better (1.10+ x as fast). We should give some time for #sanitizers folks to respond.

I tried with artificially complex list for msan of ~1000 entry, and no-TrigramIndex is faster. By my estimate, regex matches costs ~2% of total compilation time. But it was artificial list. On the real list, about the same size, most entries isLiteralERE, so I can see no regex cost at all. But If we have to optimize, maybe easy way is to split regexps into groups with fixed prefix, suffix, the rest... or even do not use regex at all.

This revision is now accepted and ready to land.Jun 18 2023, 9:20 PM

Using perf I found that most of the runtime in TrigramIndex::isDefinitelyOut() comes from a division operation that seems to come from std::unordered_map: https://github.com/llvm/llvm-project/blob/8e1f820bb4eadf5c0704818f6063e0db1006e32d/llvm/include/llvm/Support/TrigramIndex.h#L62

I guess it doesn't use DenseMap to protect the case that the key collides with the empty/tombstone keys, but std::unordered_map has very bad performance...
Another problem with TrigramIndex is that we need to iterate over the full query, while with a regex engine I think we can often bail out early.

I have tested TrigramIndex removal with a large ubsan_ignorelist.txt and the performance is better (1.10+ x as fast). We should give some time for #sanitizers folks to respond.

I tried with artificially complex list for msan of ~1000 entry, and no-TrigramIndex is faster. By my estimate, regex matches costs ~2% of total compilation time. But it was artificial list. On the real list, about the same size, most entries isLiteralERE, so I can see no regex cost at all. But If we have to optimize, maybe easy way is to split regexps into groups with fixed prefix, suffix, the rest... or even do not use regex at all.

Thanks for testing!

This revision was landed with ongoing or failed builds.Jun 19 2023, 10:41 AM
This revision was automatically updated to reflect the committed changes.
llvm/lib/Support/TrigramIndex.cpp