This is an archive of the discontinued LLVM Phabricator instance.

Stop using isPhysRegModified() in RegUsageInfoCollector
ClosedPublic

Authored by kariddi on Mar 12 2017, 11:49 PM.

Details

Summary

The current way of computing the RegMask in RegUsageInfoCollector is very slow on architectures with certain register configurations.

That is particularly true on architectures with many registers that are aliasing a lot one another. An example of this in the LLVM source is the AMDGPU backend , that has almost 3500 registers and where the VGPR_512, VGPR_256, VGPR_* register are composed of consecutive instances of VGPR_32 registers. All these classes end up aliasing one another.

The current algorithm based on isPhysRegModified() is iterating over all the registers and calling isPhysRegModified() for each one.
isPhysRegModified() itself will scan over all the aliasing registers and if any of the aliased register is defined it will return true.
In a backend like AMDGPU where most registers alias with other 30-40 registers it results in a very high amount of iterations and is very slow.

The algorithm proposed in this patch instead of using isPhysRegModified checks that the register is defined and if it is it will iterate over the aliased registers and set them as defined as well.
I believe this should have the same effect as the previous algorithm, but in this case we'll only iterate over the aliased registers only if the register is defined.
Because most of the time only a subset of the whole set of LLVM registers is actually defined directly by an instruction in a function this results in an average much lower number of iterations.

Diff Detail

Repository
rL LLVM

Event Timeline

kariddi created this revision.Mar 12 2017, 11:49 PM

This change is supposed to be NFC

mehdi_amini added inline comments.Mar 13 2017, 10:38 AM
lib/CodeGen/RegUsageInfoCollector.cpp
107 ↗(On Diff #91517)

physical registers

115 ↗(On Diff #91517)

defined

118 ↗(On Diff #91517)

I implemented almost the same thing on Friday, but didn't have the time to benchmark it, how much did it improve Maxime's test-case? (this was taking ~37% of the total time IIRC).

Also I wrote something that looked like:

if (!MRI->def_empty(PReg) || UsedPhysRegsMask.test(PReg)) {
    for (MCRegAliasIterator AI(PReg, TRI, true); AI.isValid(); ++AI)
      RegMask[*AI / 32] &= ~(1u << *AI % 32);
    continue;
  }

Why don't you need to add the aliases for UsedPhysRegsMask?

Another approach that I talked about with @MatzeB was to perform two pass: one to collect which register units are modified and the second to mark the physical register that include a modified reg units. (I had planned to bench the two approaches).

MatzeB edited edge metadata.Mar 13 2017, 10:47 AM

In principle we can go with this. This is not completely NFC as it lacks the hack that ignores the defs from noreturn tail calls in MachineRegisterInfo.

lib/CodeGen/RegUsageInfoCollector.cpp
106 ↗(On Diff #91517)

Please no auto when the type isn't immediately visible.

Another approach that I talked about with @MatzeB was to perform two pass: one to collect which register units are modified and the second to mark the physical register that include a modified reg units. (I had planned to bench the two approaches).

  • I would expect the regunit approach to perform better especially when a lot of registers are in use.
  • For targets with thousands of registers it would probably also be more performant to simply walk the function searching for register operands instead of walking the list of registers.
kariddi added a comment.EditedMar 13 2017, 12:28 PM

@MatzeB I tried also doing the funciton scanning thing and is very fast in this case too. I think the approach in this patch is slightly faster here for that function (because was a very large one with a few thousands of instructions) , but both should be fine for our case. If you think this algorithm could be slower on more standard architectures we could have both maybe and decide which one to use depending on the number of registers?

What exactly is the hack you are talking about there.
Do you mean the one related to this code?

if (!SkipNoReturnDef && isNoReturnDef(MO))
  continue;

It seemed that it was ignored as isPhysRegModified() is called with "true" as an argument for "SkipNoReturnDef" , which would make this if always evaluate to false and consider any def of an aliasing register as fine for the purpose of "isPhysRegModified()"

lib/CodeGen/RegUsageInfoCollector.cpp
106 ↗(On Diff #91517)

Thanks, will do

115 ↗(On Diff #91517)

Whoops for this and the one above

118 ↗(On Diff #91517)

On that test went from 37% of compile time to about 0.1%

I don't add the aliases because that seems to be matching what the original code was doing.
The original code checks for each register if they are in the Used set (through isPhysRegModified) and adds them if they are.
This should be matching what that code was doing. In the case the register is defined it will be added to the set anyway (together with all the aliased ones), so I skip it in that case

kariddi updated this revision to Diff 91605.Mar 13 2017, 12:32 PM

Addressing comments from Mehdi and Matthias

kariddi marked 3 inline comments as done.Mar 13 2017, 12:32 PM
mehdi_amini added inline comments.Mar 13 2017, 12:36 PM
lib/CodeGen/RegUsageInfoCollector.cpp
118 ↗(On Diff #91517)

On that test went from 37% of compile time to about 0.1%

Great!

I don't add the aliases because that seems to be matching what the original code was doing.

I think you're right. It seemed incorrect to me but I just looked the comment for this and it says "UsedPhysRegMask - Additional used physregs including aliases". The fact that it includes alias makes the check redundant.

I suspect that we should first perform the if (UsedPhysRegsMask.test(PReg)) case before if (!MRI->def_empty(PReg)) {. That would save us the loop on the aliases (which are covered in UsedPhysRegsMask.test).

kariddi added inline comments.Mar 13 2017, 12:43 PM
lib/CodeGen/RegUsageInfoCollector.cpp
118 ↗(On Diff #91517)

You mean something like:

if (UsedPhysRegsMask.test(PReg) || !MRI->def_empty(PReg)) {

// Add aliases

}

This seems like a good idea.

mehdi_amini added inline comments.Mar 13 2017, 12:48 PM
lib/CodeGen/RegUsageInfoCollector.cpp
118 ↗(On Diff #91517)

This seems to be what I wrote in my local patch and asked above? But aliases are already part of UsedPhysRegsMask, so what I had in mind was rather:

 if (UsedPhysRegsMask.test(PReg)) {
   // mark used
   ...
   continue;
}
if (!MRI->def_empty(PReg)) {
  // Add aliases and current reg.
}

That way we don't process aliases ever when a reg is in UsedPhysRegsMask

112 ↗(On Diff #91605)

Could we not duplicate this bit twingling logic? Maybe a lambda can do it.

kariddi added inline comments.Mar 13 2017, 1:11 PM
lib/CodeGen/RegUsageInfoCollector.cpp
118 ↗(On Diff #91517)

I see what you mean now.

Sounds good.

kariddi updated this revision to Diff 91607.Mar 13 2017, 1:12 PM

Updated to reflect Mehdi's last comments

kariddi marked an inline comment as done.Mar 13 2017, 1:12 PM

You did not address the isNoReturnDef() part yet. Maybe you can make that a public function in MachineRegisterInfo so it can be re-used.

On a side note: I would love to get rid of that hack for good by simply not adding defs to tailcalls, but that is probably out of the scope for this patch.

MatzeB accepted this revision.Mar 13 2017, 1:37 PM

You did not address the isNoReturnDef() part yet. Maybe you can make that a public function in MachineRegisterInfo so it can be re-used.

On a side note: I would love to get rid of that hack for good by simply not adding defs to tailcalls, but that is probably out of the scope for this patch.

Sorry I missed one mail in between. You are right this is indeed not used in this context (though maybe it should be). In that case it is indeed NFC and LGTM.

This revision is now accepted and ready to land.Mar 13 2017, 1:37 PM
mehdi_amini accepted this revision.Mar 13 2017, 1:56 PM
This revision was automatically updated to reflect the committed changes.