Page MenuHomePhabricator

[NFC] Remove waymarking because it improves performances
ClosedPublic

Authored by Tyker on Mar 31 2020, 7:29 AM.

Details

Summary

This patch remove waymarking and replaces it with storing a pointer to the User in the Use.
here are the results on the measurements for the CTMark tests of the test suite.

Metric: instructions_count

Program                                                      baseline      patched       diff 
 test-suite :: CTMark/ClamAV/clamscan.test                    72557942065   71733653521  -1.1%
 test-suite :: CTMark/sqlite3/sqlite3.test                    76281422939   75484840636  -1.0%
 test-suite :: CTMark/consumer-typeset/consumer-typeset.test  51364676366   50862185614  -1.0%
 test-suite :: CTMark/SPASS/SPASS.test                        60476106505   59908437767  -0.9%
 test-suite :: CTMark/tramp3d-v4/tramp3d-v4.test              112578442329  111725050856 -0.8%
 test-suite :: CTMark/mafft/pairlocalalign.test               50846133013   50473644539  -0.7%
 test-suite :: CTMark/kimwitu++/kc.test                       54692641250   54349070299  -0.6%
 test-suite :: CTMark/7zip/7zip-benchmark.test                182216614747  181216091230 -0.5%
 test-suite :: CTMark/Bullet/bullet.test                      123459210616  122905866767 -0.4%
 Geomean difference                                                                      -0.8%

Metric: peak_memory_use

Program                                                      baseline  patched   diff 
 test-suite :: CTMark/tramp3d-v4/tramp3d-v4.test              326864    338524    3.6%
 test-suite :: CTMark/sqlite3/sqlite3.test                    216412    221240    2.2%
 test-suite :: CTMark/7zip/7zip-benchmark.test                11808284  12022604  1.8%
 test-suite :: CTMark/Bullet/bullet.test                      6831752   6945988   1.7%
 test-suite :: CTMark/SPASS/SPASS.test                        2682552   2721820   1.5%
 test-suite :: CTMark/ClamAV/clamscan.test                    5037256   5107936   1.4%
 test-suite :: CTMark/consumer-typeset/consumer-typeset.test  2752728   2790768   1.4%
 test-suite :: CTMark/mafft/pairlocalalign.test               1517676   1537244   1.3%
 test-suite :: CTMark/kimwitu++/kc.test                       1090748   1103448   1.2%
 Geomean difference                                                               1.8%

Metric: compile_time

Program                                                      baseline patched diff 
 test-suite :: CTMark/consumer-typeset/consumer-typeset.test  14.71    14.38  -2.2%
 test-suite :: CTMark/sqlite3/sqlite3.test                    23.18    22.73  -2.0%
 test-suite :: CTMark/7zip/7zip-benchmark.test                57.96    56.99  -1.7%
 test-suite :: CTMark/ClamAV/clamscan.test                    20.75    20.49  -1.2%
 test-suite :: CTMark/kimwitu++/kc.test                       18.35    18.15  -1.1%
 test-suite :: CTMark/SPASS/SPASS.test                        18.72    18.57  -0.8%
 test-suite :: CTMark/mafft/pairlocalalign.test               14.09    14.00  -0.6%
 test-suite :: CTMark/Bullet/bullet.test                      37.38    37.19  -0.5%
 test-suite :: CTMark/tramp3d-v4/tramp3d-v4.test              33.81    33.76  -0.2%
 Geomean difference                                                           -1.1%

i believe that it is worth trading +1.8% peak memory use for -1.1% compile time.
also this patch removes waymarking which simplifies the Use and User classes.

Diff Detail

Event Timeline

Tyker created this revision.Mar 31 2020, 7:29 AM

Why did instructions_count metric change?

Tyker added a comment.EditedMar 31 2020, 8:42 AM

Why did instructions_count metric change?

the instructions_count metric is the number of instruction the compiler executed when building the code not the number of instruction in the resulting binary.
i added it because it is usually close to compile time but has less noise.

fhahn added a subscriber: fhahn.Mar 31 2020, 8:55 AM
nikic added a comment.Mar 31 2020, 2:34 PM

I also ran the numbers, results are pretty similar to what you're seeing: instructions, max-rss.

I suspect that we might not be seeing the whole picture here though. I couldn't find the original discussion on this, but https://github.com/ggreif/EuroLLVM-Waymarking mentions that at the time it was introduced, waymarking was a 2.5% compile-time regression, but saved 12% memory on some large C++ workloads. We're clearly not seeing anything close to a 12% memory regression here, but it's not clear whether that's because memory usage characteristics in LLVM changed significantly over the last 12 years (which doesn't seem unplausible), or whether waymarking needs heavier workloads to really shine.

Huh, interesting. I think that taking this could make a lot of sense, I like the simplicity and performance benefits. Could you please start a discussion about it on llvm-dev?

@ekatz and @rriddle it would be good for you two to be aware of this.

ekatz added a subscriber: ggreif.Apr 1 2020, 12:31 AM

Let me shed a little light on the Waymarking algorithm:

It all began with @ggreif, in 2008.
He proposed the the Waymarking as a solution in LLVM Data Structures, and Putting Use on Diet.

As a solution, he opened a branch called use-diet in the LLVM SVN repository, which was merged in rL50943 : rGf6caff66a1bf.

According to the his blog, in http://heisenbug.blogspot.com/2008/05/merged.html; I quote:
"... the SPEC benchmark 447.dealII had a 13% reduced memory footprint ..."
and
"... observed slight speedup on kimwitu++ benchmark's compilation time."

In addition, according to @lattner's response in https://lists.llvm.org/pipermail/llvm-dev/2008-April/014151.html, the Waymarking seems to go up to even 15% improvement in memory usage.

Regarding the removal of the Waymarking, don't get me wrong, I am all in for its simplicity, and performance improvements, but I'm afraid more investigation needs to be done.
Please try to test using the benchmarks mentioned above, and maybe even larger workloads (like the linux kernel, if you can; and I know, at least it part, if not in full, the AOSP is using clang).

foad added a subscriber: foad.Apr 2 2020, 1:07 AM
foad added a comment.Apr 2 2020, 1:20 AM

I like this. It's a bit sad that waymarking isn't paying for itself, but if it's not then it should definitely go.

llvm/include/llvm/IR/Use.h
60

Maybe name the argument Parent instead of U? Does it actually need a default of nullptr?

llvm/lib/IR/User.cpp
45–46

Comment needs updating :-)

foad added a comment.Apr 2 2020, 1:24 AM

[...] at the time it was introduced, waymarking was a 2.5% compile-time regression, but saved 12% memory on some large C++ workloads. We're clearly not seeing anything close to a 12% memory regression here, but it's not clear whether that's because memory usage characteristics in LLVM changed significantly over the last 12 years (which doesn't seem unplausible) [...]

Can we easily collect statistics on how much memory is allocated for what kind of objects? E.g. how much of the peak usage is Uses, how much is Instructions, how much is BasicBlocks and so on?

Tyker added a comment.Apr 2 2020, 2:53 AM

Let me shed a little light on the Waymarking algorithm:

thanks for the insight

According to the his blog, in http://heisenbug.blogspot.com/2008/05/merged.html; I quote:
"... the SPEC benchmark 447.dealII had a 13% reduced memory footprint ..."

i don't have access to SPEC since it isn't open source.

and
"... observed slight speedup on kimwitu++ benchmark's compilation time."

from my an nikic's measurement this is not true anymore.

Regarding the removal of the Waymarking, don't get me wrong, I am all in for its simplicity, and performance improvements, but I'm afraid more investigation needs to be done.
Please try to test using the benchmarks mentioned above, and maybe even larger workloads (like the linux kernel, if you can; and I know, at least it part, if not in full, the AOSP is using clang).

i ran a build of clang while tracking max memory use of every translation units (2687).
the highest max memory use difference i saw was +7.27% on llvm/lib/DebugInfo/CodeView/EnumTables.cpp.
but it is a outlier since the the second highest was at +1.80% and there are 7 other TU above 1.5%
the average over all TU was a +0.25%.
i didn't measure compile-time during this build because it measuring it accurately would have taken a very long time.

[...] at the time it was introduced, waymarking was a 2.5% compile-time regression, but saved 12% memory on some large C++ workloads. We're clearly not seeing anything close to a 12% memory regression here, but it's not clear whether that's because memory usage characteristics in LLVM changed significantly over the last 12 years (which doesn't seem unplausible) [...]

Can we easily collect statistics on how much memory is allocated for what kind of objects? E.g. how much of the peak usage is Uses, how much is Instructions, how much is BasicBlocks and so on?

since allocation of Use is grouped with the User (and all derived classes of User) i don't think a tool could figure it out.
we could easily instrument manually every allocation of a Use to see the difference in allocation size but this wouldn't give us the difference at its peak.

Tyker updated this revision to Diff 254487.Apr 2 2020, 4:36 AM
Tyker marked an inline comment as done.

fixed comments

ekatz added a comment.Apr 3 2020, 9:23 PM

"llvm/docs/ProgrammersManual.rst" needs to be updated as well.

Tyker updated this revision to Diff 255072.Apr 4 2020, 12:13 PM

adapt ProgrammersManual
remove some waymarking specific comments.

lattner accepted this revision.Apr 15 2020, 9:36 AM

Sorry, I thought this was approved on llvm-dev. Looks good, thank you for simplifying the compiler!

This revision is now accepted and ready to land.Apr 15 2020, 9:36 AM
This revision was automatically updated to reflect the committed changes.