This is an archive of the discontinued LLVM Phabricator instance.

[GlobalISel] NFCI, Getting GlobalISel ~5% faster
ClosedPublic

Authored by rtereshin on May 13 2018, 12:11 PM.

Details

Summary

by replacing DenseMap with IndexedMap for LLTs within MRI, as
benchmarked by cross-compiling sqlite3 amalgamation for AArch64
on x86 machine.

Per-pass diffs follow:

IRTranslator: +0.1%
Legalizer: -4.5%
RegBankSelect: -4.4%
Localizer: -2.5%
InstructionSelect: -8.8%

Total GlobalISel: -5.5%

More data below:

llc -O0 -mtriple arm64-ios-apple sqlite3.bc -time-passes -time-compilations=20

BeforeAfterDiff
GlobalISel 5 Passes7.78477.3563
7.79277.3586
7.78967.3576
7.78767.3538
7.77037.3615
Min7.7707.354-5.4%
Avg7.7857.358-5.5%
Err0.3%0.1%
IRTranslator1.76601.7682
1.77371.7678
1.76821.7674
1.76261.7702
1.75951.7667
Min1.7601.7670.4%
Avg1.7661.7680.1%
Err0.8%0.2%
Legalizer0.80890.7729
0.80730.7729
0.80880.7732
0.80990.7705
0.80740.7720
Min0.8070.771-4.6%
Avg0.8080.772-4.5%
Err0.3%0.4%
RegBankSelect0.80340.7638
0.79880.7669
0.80110.7635
0.80040.7640
0.79850.7670
Min0.7990.764-4.4%
Avg0.8000.765-4.4%
Err0.6%0.5%
Localizer0.48000.4693
0.48000.4684
0.48040.4680
0.48030.4676
0.47970.4679
Min0.4800.468-2.5%
Avg0.4800.468-2.5%
Err0.1%0.4%
InstructionSelect3.92643.5821
3.93293.5826
3.93113.5855
3.93443.5815
3.92523.5879
Min3.9253.582-8.8%
Avg3.9303.584-8.8%
Err0.2%0.2%
Notes:Err = (Max - Min) / Min
Diff = (After - Before) / Before

Diff Detail

Repository
rL LLVM

Event Timeline

rtereshin created this revision.May 13 2018, 12:11 PM
rtereshin edited the summary of this revision. (Show Details)May 14 2018, 9:50 AM
rtereshin edited the summary of this revision. (Show Details)

This is pretty non-invasive, so code LGTM aside from one minor stylistic comment on an assertion. I would wait for comments from another reviewer though, since I'm not super familiar with this particular bit of code.

It'd also be nice to just see the high level results of a CTMark run if possible? I love the in-depth data analysis here, but I think it would be nice to have a couple more data points just to make sure there are no regressions.

lib/CodeGen/GlobalISel/InstructionSelect.cpp
205

Maybe something like "VReg's low-level type and register class have different sizes" would be a bit more concise?

rtereshin marked an inline comment as done.

Updating a GISelFailure message so it sounds more like English ;-)

rtereshin added inline comments.May 15 2018, 10:33 AM
lib/CodeGen/GlobalISel/InstructionSelect.cpp
205

Absolutely, thanks for the suggestion. I'm updating the patch.

Nice finding Roman.

I second Jessica, we should have more comprehensive testing to make sure we are not optimizing for one benchmark.

Otherwise LGTM.

Nice finding Roman.

I second Jessica, we should have more comprehensive testing to make sure we are not optimizing for one benchmark.

Otherwise LGTM.

Thanks!

Here we go:

NamePrevCurrent%ΔMADN
CTMark/7zip/7zip-benchmark15.390315.2923-0.64%-0.09800.004730
CTMark/7zip/7zip-benchmark-link0.07500.0745-0.67%-0.00050.002430
CTMark/Bullet/bullet12.026411.9626-0.53%-0.06380.008130
CTMark/Bullet/bullet-link0.06660.0662-0.60%-0.00040.000530
CTMark/ClamAV/clamscan3.28513.2562-0.88%-0.02890.003430
CTMark/ClamAV/clamscan-link0.02750.02750.00%0.00000.000330
CTMark/SPASS/SPASS2.69502.6815-0.50%-0.01350.002930
CTMark/SPASS/SPASS-link0.03030.03030.00%0.00000.000230
CTMark/consumer-typeset/consumer-typeset2.59222.5650-1.05%-0.02720.004730
CTMark/consumer-typeset/consumer-typeset-link0.02520.02520.00%0.00000.000230
CTMark/kimwitu++/kc5.84385.8180-0.44%-0.02580.007530
CTMark/kimwitu++/kc-link0.06520.0649-0.46%-0.00030.000330
CTMark/lencod/lencod2.44052.4152-1.04%-0.02530.002830
CTMark/lencod/lencod-link0.02640.02640.00%0.00000.000130
CTMark/mafft/pairlocalalign1.20781.1960-0.98%-0.01180.002730
CTMark/mafft/pairlocalalign-link0.01890.01890.00%0.00000.000130
CTMark/sqlite3/sqlite30.63030.6200-1.63%-0.01030.002630
CTMark/sqlite3/sqlite3-link0.01940.0193-0.52%-0.00010.000030
CTMark/tramp3d-v4/tramp3d-v43.62543.5962-0.81%-0.02920.011330
CTMark/tramp3d-v4/tramp3d-v4-link0.11760.1173-0.26%-0.00030.000430

Both are LLVM + Clang Release builds, assertions off, targets enabled are x86 and AArch64.

lnt is ran the following way:

lnt runtest test-suite --sandbox tmp --cc ./llvm/build/obj/bin/clang \
  --cxx ./llvm/build/obj/bin/clang++ --use-cmake /usr/local/bin/cmake \
  --use-lit ./llvm/build/obj/bin/llvm-lit --test-suite $SRC/llvm-test-suite \
  --cflags '-O0' --cxxflags '-O0' --only-compile --build-threads 1 \
  --cmake-define TEST_SUITE_SUBDIRS=CTMark \
  --cmake-cache ./llvm-test-suite/cmake/caches/target-arm64-iphoneos.cmake \
  --benchmarking-only --compile-multisample 30 --succinct-compile-output

The hardware is iMac (SSD) running macOS.

With the same config, but comparing pre- and post-https://reviews.llvm.org/D44700 (partially committed) + this patch:

NamePrevCurrent%ΔMAD
CTMark/7zip/7zip-benchmark15.427215.2943-0.86%-0.13290.0096
CTMark/7zip/7zip-benchmark-link0.07790.07790.00%00.0009
CTMark/Bullet/bullet12.04811.9608-0.72%-0.08720.0069
CTMark/Bullet/bullet-link0.0670.0670.00%00.0004
CTMark/ClamAV/clamscan3.34723.2349-3.36%-0.11230.0037
CTMark/ClamAV/clamscan-link0.0280.0280.00%00.0002
CTMark/SPASS/SPASS2.74732.6692-2.84%-0.07810.0027
CTMark/SPASS/SPASS-link0.03050.03050.00%00.0001
CTMark/consumer-typeset/consumer-typeset2.65142.5437-4.06%-0.10770.0049
CTMark/consumer-typeset/consumer-typeset-link0.02550.02550.00%00.0001
CTMark/kimwitu++/kc5.94775.8094-2.33%-0.13830.0057
CTMark/kimwitu++/kc-link0.06530.0653-0.08%00.0003
CTMark/lencod/lencod2.51052.3974-4.51%-0.11320.0026
CTMark/lencod/lencod-link0.02680.02680.00%00.0002
CTMark/mafft/pairlocalalign1.23831.1894-3.94%-0.04890.0034
CTMark/mafft/pairlocalalign-link0.0190.0190.00%00.0001
CTMark/sqlite3/sqlite30.67880.6089-10.30%-0.06990.0027
CTMark/sqlite3/sqlite3-link0.01950.01950.00%00.0001
CTMark/tramp3d-v4/tramp3d-v43.71553.5861-3.48%-0.12940.0091
CTMark/tramp3d-v4/tramp3d-v4-link0.11850.1184-0.08%-0.00010.0005
qcolombet accepted this revision.May 23 2018, 12:49 PM
This revision is now accepted and ready to land.May 23 2018, 12:49 PM
This revision was automatically updated to reflect the committed changes.