This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Speedup stack slot sharing during stack coloring (interval overlapping test).
ClosedPublic

Authored by vpykhtin on Mar 14 2023, 8:51 AM.

Details

Summary

AMDGPU code with enabled address sanitizer generates tons of stack objects (> 200000 in my testcase) and
takes forever to compile due to the time spent on stack slot sharing.

While LiveRange::overlaps method has logarithmic complexity on the number of segments in the involved
liveranges the problem is that when a new interval is assigned to a used color it's tested against
overlapping every other assigned interval for that color.

Instead I decided to join all assigned intervals for a color into a single interval and this allows to
have logarithmic complexity on the number of segments for the joined interval.

This patch reduced time spent on stack slot coloring pass from 628 to 3 seconds on my testcase.

Diff Detail

Event Timeline

vpykhtin created this revision.Mar 14 2023, 8:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 14 2023, 8:51 AM
vpykhtin requested review of this revision.Mar 14 2023, 8:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 14 2023, 8:51 AM
arsenm added inline comments.Mar 14 2023, 8:59 AM
llvm/lib/CodeGen/StackSlotColoring.cpp
99

Typo sigle

124

Don't understand why this is the case? Don't these need to be renumbered?

vpykhtin added inline comments.Mar 14 2023, 9:09 AM
llvm/lib/CodeGen/StackSlotColoring.cpp
124

Joined interval isn't used for anything else than range overlapping test, I don't need value numbers.

Nicola resigned from this revision.Mar 14 2023, 9:11 AM
arsenm added inline comments.Mar 14 2023, 9:14 AM
llvm/lib/CodeGen/StackSlotColoring.cpp
124

You might not need it, but will it fail the verifier?

vpykhtin added inline comments.Mar 14 2023, 9:17 AM
llvm/lib/CodeGen/StackSlotColoring.cpp
124

It never reaches verifier I believe, it lives locally and dies after the overlapping test is done. It's not a fully valid object I agree.

vpykhtin updated this revision to Diff 505145.Mar 14 2023, 9:49 AM
  • fixed typo, debug build, added comment.
vpykhtin marked an inline comment as done.Mar 14 2023, 9:50 AM

Hmm the main RAGreedy allocator is using the LiveRegMatrix and within that the LiveIntervalUnion classes to keep things performant. Maybe we can use those here too? In principle they should have even better runtime behavior than merging here...

Hmm the main RAGreedy allocator is using the LiveRegMatrix and within that the LiveIntervalUnion classes to keep things performant. Maybe we can use those here too? In principle they should have even better runtime behavior than merging here...

Thank you for the clue, I'll take a look.

vpykhtin updated this revision to Diff 506355.Mar 19 2023, 12:51 AM

Tried version with LiveIntervalUnion but it works slightly slower than first:

~4.2 vs ~3 seconds per pass on my testcase.

There're 48005704 overlap tests and 277607 interval joins in my testcase so it
looks like LiveRange::overlaps works faster than LiveIntervalUnion::Query::checkInterference.

Not sure which version to choose.

vpykhtin updated this revision to Diff 508913.Mar 28 2023, 12:39 AM
  • Rebased.
arsenm added inline comments.Mar 31 2023, 2:24 PM
llvm/lib/CodeGen/StackSlotColoring.cpp
101–103

I don't understand what this is doing, the parameter is just the allocator? Why can't you just normally construct a LiveIntervalUnion without placement new?

vpykhtin added inline comments.Apr 1 2023, 12:13 AM
llvm/lib/CodeGen/StackSlotColoring.cpp
101–103

ColorAssignmentInfo objects are placed in SmallVector so they need default constructor.

arsenm accepted this revision.Apr 4 2023, 6:52 AM

LGTM

This revision is now accepted and ready to land.Apr 4 2023, 6:52 AM
vpykhtin updated this revision to Diff 511265.Apr 5 2023, 9:29 PM

Rebased before landing.