This is an archive of the discontinued LLVM Phabricator instance.

[MachineSink] Compile time improvement for large testcases which has many kill flags
ClosedPublic

Authored by yubing on Oct 12 2021, 7:29 PM.

Details

Summary

We did a experiment and observed dramatic decrease on compilation time which spent on clearing kill flags.
Before:
Number of BasicBlocks:33357
Number of Instructions:162067
Number of Cleared Kill Flags:32869
Time of handling kill flags(ms):1.607509e+05

After:
Number of BasicBlocks:33357
Number of Instructions:162067
Number of Cleared Kill Flags:32869
Time of handling kill flags:3.987371e+03

Diff Detail

Event Timeline

yubing created this revision.Oct 12 2021, 7:29 PM
yubing requested review of this revision.Oct 12 2021, 7:29 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 12 2021, 7:29 PM
yubing updated this revision to Diff 379255.Oct 12 2021, 7:40 PM

use DenseSet<unsigned> instead, please ignore previous patch

How do you calculate the kill flags handling time?

How do you calculate the kill flags handling time?

I am using std::chrono::high_resolution_clock::now()

std::chrono::high_resolution_clock::time_point start = std::chrono::high_resolution_clock::now();
RegsToClearKillFlags.insert(MO.getReg()); // Remember to clear kill flags.
std::chrono::duration<double, std::milli> duration = std::chrono::high_resolution_clock::now() - start;

How do you calculate the kill flags handling time?

I am using std::chrono::high_resolution_clock::now()

std::chrono::high_resolution_clock::time_point start = std::chrono::high_resolution_clock::now();
RegsToClearKillFlags.insert(MO.getReg()); // Remember to clear kill flags.
std::chrono::duration<double, std::milli> duration = std::chrono::high_resolution_clock::now() - start;

Do you also calculate the time of all the access to RegsToClearKillFlags?

How do you calculate the kill flags handling time?

I am using std::chrono::high_resolution_clock::now()

std::chrono::high_resolution_clock::time_point start = std::chrono::high_resolution_clock::now();
RegsToClearKillFlags.insert(MO.getReg()); // Remember to clear kill flags.
std::chrono::duration<double, std::milli> duration = std::chrono::high_resolution_clock::now() - start;

Do you also calculate the time of all the access to RegsToClearKillFlags?

The rest code of maintaining RegsToClearKillFlags contributes minority of the compilation time.

huh... I guess the problem here was that we interpreted the register numbers as plain unsigned... And virtual registers always have bit 31 set, so I guess the bitset could indeed grow to unreasonable sizes.

  • Please try if DenseSet<Register> works too.
  • If you can please find a shorter more succinct title.

Then we should be good to land this.

llvm/lib/CodeGen/MachineSink.cpp
134

Does DenseSet<Register> work too? (if it does not then I'm fine with this).

yubing added inline comments.Oct 13 2021, 7:28 PM
llvm/lib/CodeGen/MachineSink.cpp
134

Thanks for suggestions, DenseSet<Register> works for me.

yubing updated this revision to Diff 379585.Oct 13 2021, 7:31 PM

Replace DenseSet<unsigned> with DenseSet<Register>

MatzeB accepted this revision.Oct 14 2021, 2:45 PM

Please shorten the title of this change!

LGTM

This revision is now accepted and ready to land.Oct 14 2021, 2:45 PM
yubing retitled this revision from [MachineSink] Compile time improvement for large testcases which has many kill flags We did a experiment and observed dramatic decrease on compilation time which spent on clearing kill flags. to [MachineSink] Compile time improvement for large testcases which has many kill flags.Oct 18 2021, 12:40 AM
yubing edited the summary of this revision. (Show Details)

huh... I guess the problem here was that we interpreted the register numbers as plain unsigned... And virtual registers always have bit 31 set, so I guess the bitset could indeed grow to unreasonable sizes.

  • Please try if DenseSet<Register> works too.
  • If you can please find a shorter more succinct title.

Then we should be good to land this.

SparseBitVector shouldn't be effected by bit 31 being set. It stores 128 bit chunks of bits in a linked list. Insertion does a linear scan forward or backward from the most recently accessed chunk to trying to find the chunk to insert in. Were we accessing it in some pathologically bad way that caused long linear scans?

If the issue is with the inserting function, then the title of this patch is misleading. The number of kill flags in is irrelevant. The place where the insert happens doesn't know how many kill flags exist. Only the later call to MRI->clearKillFlags(I) would know that.

huh... I guess the problem here was that we interpreted the register numbers as plain unsigned... And virtual registers always have bit 31 set, so I guess the bitset could indeed grow to unreasonable sizes.

  • Please try if DenseSet<Register> works too.
  • If you can please find a shorter more succinct title.

Then we should be good to land this.

SparseBitVector shouldn't be effected by bit 31 being set. It stores 128 bit chunks of bits in a linked list. Insertion does a linear scan forward or backward from the most recently accessed chunk to trying to find the chunk to insert in. Were we accessing it in some pathologically bad way that caused long linear scans?

If the issue is with the inserting function, then the title of this patch is misleading. The number of kill flags in is irrelevant. The place where the insert happens doesn't know how many kill flags exist. Only the later call to MRI->clearKillFlags(I) would know that.

Hi, Craig.
Yeah, title is misleading. There are many regs whose kill flags need to be cleared, and machinesink visit these regs such randomly that SparseBitVector is not fit for storing info of these regs.
So I guess a better title is "[MachineSink] Compile time improvement for large testcases which has many regs whose kill flags need to be cleared"?