Page MenuHomePhabricator

[CodeGen] Speedup stack slot sharing during stack coloring (interval overlapping test).
Needs ReviewPublic

Authored by vpykhtin on Tue, Mar 14, 8:51 AM.

Details

Summary

AMDGPU code with enabled address sanitizer generates tons of stack objects (> 200000 in my testcase) and
takes forever to compile due to the time spent on stack slot sharing.

While LiveRange::overlaps method has logarithmic complexity on the number of segments in the involved
liveranges the problem is that when a new interval is assigned to a used color it's tested against
overlapping every other assigned interval for that color.

Instead I decided to join all assigned intervals for a color into a single interval and this allows to
have logarithmic complexity on the number of segments for the joined interval.

This patch reduced time spent on stack slot coloring pass from 628 to 3 seconds on my testcase.

Diff Detail

Unit TestsFailed

TimeTest
130 msx64 debian > Flang.Driver::code-gen-rv64.f90
Script: -- : 'RUN: at line 5'; rm -f /var/lib/buildkite-agent/builds/llvm-project/build/tools/flang/test/Driver/Output/code-gen-rv64.f90.tmp.o
60,060 msx64 debian > MLIR.Examples/standalone::test.toy
Script: -- : 'RUN: at line 1'; /usr/bin/cmake /var/lib/buildkite-agent/builds/llvm-project/mlir/examples/standalone -G "Ninja" -DCMAKE_CXX_COMPILER=/usr/bin/clang++ -DCMAKE_C_COMPILER=/usr/bin/clang -DLLVM_ENABLE_LIBCXX=OFF -DMLIR_DIR=/var/lib/buildkite-agent/builds/llvm-project/build/lib/cmake/mlir -DLLVM_USE_LINKER=lld -DPython3_EXECUTABLE="/usr/bin/python3.9"

Event Timeline

vpykhtin created this revision.Tue, Mar 14, 8:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptTue, Mar 14, 8:51 AM
vpykhtin requested review of this revision.Tue, Mar 14, 8:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptTue, Mar 14, 8:51 AM
arsenm added inline comments.Tue, Mar 14, 8:59 AM
llvm/lib/CodeGen/StackSlotColoring.cpp
99

Typo sigle

124

Don't understand why this is the case? Don't these need to be renumbered?

vpykhtin added inline comments.Tue, Mar 14, 9:09 AM
llvm/lib/CodeGen/StackSlotColoring.cpp
124

Joined interval isn't used for anything else than range overlapping test, I don't need value numbers.

Nicola resigned from this revision.Tue, Mar 14, 9:11 AM
arsenm added inline comments.Tue, Mar 14, 9:14 AM
llvm/lib/CodeGen/StackSlotColoring.cpp
124

You might not need it, but will it fail the verifier?

vpykhtin added inline comments.Tue, Mar 14, 9:17 AM
llvm/lib/CodeGen/StackSlotColoring.cpp
124

It never reaches verifier I believe, it lives locally and dies after the overlapping test is done. It's not a fully valid object I agree.

vpykhtin updated this revision to Diff 505145.Tue, Mar 14, 9:49 AM
  • fixed typo, debug build, added comment.
vpykhtin marked an inline comment as done.Tue, Mar 14, 9:50 AM

Hmm the main RAGreedy allocator is using the LiveRegMatrix and within that the LiveIntervalUnion classes to keep things performant. Maybe we can use those here too? In principle they should have even better runtime behavior than merging here...

Hmm the main RAGreedy allocator is using the LiveRegMatrix and within that the LiveIntervalUnion classes to keep things performant. Maybe we can use those here too? In principle they should have even better runtime behavior than merging here...

Thank you for the clue, I'll take a look.

vpykhtin updated this revision to Diff 506355.Sun, Mar 19, 12:51 AM

Tried version with LiveIntervalUnion but it works slightly slower than first:

~4.2 vs ~3 seconds per pass on my testcase.

There're 48005704 overlap tests and 277607 interval joins in my testcase so it
looks like LiveRange::overlaps works faster than LiveIntervalUnion::Query::checkInterference.

Not sure which version to choose.