This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lib/CodeGen/
-
CodeGen/
4/12
RegUsageInfoCollector.cpp

Differential D30880

Stop using isPhysRegModified() in RegUsageInfoCollector
ClosedPublic

Authored by kariddi on Mar 12 2017, 11:49 PM.

Download Raw Diff

Details

Reviewers

MatzeB
mehdi_amini

Commits

rG598d89a3f494: [IPRA] Change algorithm for RegUsageInfoCollector.
rL297673: [IPRA] Change algorithm for RegUsageInfoCollector.

Summary

The current way of computing the RegMask in RegUsageInfoCollector is very slow on architectures with certain register configurations.

That is particularly true on architectures with many registers that are aliasing a lot one another. An example of this in the LLVM source is the AMDGPU backend , that has almost 3500 registers and where the VGPR_512, VGPR_256, VGPR_* register are composed of consecutive instances of VGPR_32 registers. All these classes end up aliasing one another.

The current algorithm based on isPhysRegModified() is iterating over all the registers and calling isPhysRegModified() for each one.
isPhysRegModified() itself will scan over all the aliasing registers and if any of the aliased register is defined it will return true.
In a backend like AMDGPU where most registers alias with other 30-40 registers it results in a very high amount of iterations and is very slow.

The algorithm proposed in this patch instead of using isPhysRegModified checks that the register is defined and if it is it will iterate over the aliased registers and set them as defined as well.
I believe this should have the same effect as the previous algorithm, but in this case we'll only iterate over the aliased registers only if the register is defined.
Because most of the time only a subset of the whole set of LLVM registers is actually defined directly by an instruction in a function this results in an average much lower number of iterations.

Diff Detail

Repository: rL LLVM

Event Timeline

kariddi created this revision.Mar 12 2017, 11:49 PM

Herald added a subscriber: tpr. · View Herald TranscriptMar 12 2017, 11:49 PM

kariddi added reviewers: mehdi_amini, MatzeB.Mar 12 2017, 11:50 PM

This change is supposed to be NFC

mehdi_amini added inline comments.Mar 13 2017, 10:38 AM

lib/CodeGen/RegUsageInfoCollector.cpp
107	`physical registers`
115	`defined`
118	I implemented almost the same thing on Friday, but didn't have the time to benchmark it, how much did it improve Maxime's test-case? (this was taking ~37% of the total time IIRC). Also I wrote something that looked like: if (!MRI->def_empty(PReg) \|\| UsedPhysRegsMask.test(PReg)) { for (MCRegAliasIterator AI(PReg, TRI, true); AI.isValid(); ++AI) RegMask[AI / 32] &= ~(1u << AI % 32); continue; } Why don't you need to add the aliases for `UsedPhysRegsMask`? Another approach that I talked about with @MatzeB was to perform two pass: one to collect which register units are modified and the second to mark the physical register that include a modified reg units. (I had planned to bench the two approaches).

In principle we can go with this. This is not completely NFC as it lacks the hack that ignores the defs from noreturn tail calls in MachineRegisterInfo.

lib/CodeGen/RegUsageInfoCollector.cpp
106	Please no `auto` when the type isn't immediately visible.

Another approach that I talked about with @MatzeB was to perform two pass: one to collect which register units are modified and the second to mark the physical register that include a modified reg units. (I had planned to bench the two approaches).

I would expect the regunit approach to perform better especially when a lot of registers are in use.
For targets with thousands of registers it would probably also be more performant to simply walk the function searching for register operands instead of walking the list of registers.

@MatzeB I tried also doing the funciton scanning thing and is very fast in this case too. I think the approach in this patch is slightly faster here for that function (because was a very large one with a few thousands of instructions) , but both should be fine for our case. If you think this algorithm could be slower on more standard architectures we could have both maybe and decide which one to use depending on the number of registers?

What exactly is the hack you are talking about there.
Do you mean the one related to this code?

if (!SkipNoReturnDef && isNoReturnDef(MO))
  continue;

It seemed that it was ignored as isPhysRegModified() is called with "true" as an argument for "SkipNoReturnDef" , which would make this if always evaluate to false and consider any def of an aliasing register as fine for the purpose of "isPhysRegModified()"

lib/CodeGen/RegUsageInfoCollector.cpp
106	Thanks, will do
115	Whoops for this and the one above
118	On that test went from 37% of compile time to about 0.1% I don't add the aliases because that seems to be matching what the original code was doing. The original code checks for each register if they are in the Used set (through isPhysRegModified) and adds them if they are. This should be matching what that code was doing. In the case the register is defined it will be added to the set anyway (together with all the aliased ones), so I skip it in that case

Addressing comments from Mehdi and Matthias

kariddi marked 3 inline comments as done.Mar 13 2017, 12:32 PM

mehdi_amini added inline comments.Mar 13 2017, 12:36 PM

lib/CodeGen/RegUsageInfoCollector.cpp
118	On that test went from 37% of compile time to about 0.1% Great! I don't add the aliases because that seems to be matching what the original code was doing. I think you're right. It seemed incorrect to me but I just looked the comment for this and it says "UsedPhysRegMask - Additional used physregs including aliases". The fact that it includes alias makes the check redundant. I suspect that we should first perform the `if (UsedPhysRegsMask.test(PReg))` case before `if (!MRI->def_empty(PReg)) {`. That would save us the loop on the aliases (which are covered in UsedPhysRegsMask.test).

kariddi added inline comments.Mar 13 2017, 12:43 PM

lib/CodeGen/RegUsageInfoCollector.cpp
118	You mean something like: if (UsedPhysRegsMask.test(PReg) \|\| !MRI->def_empty(PReg)) { // Add aliases } This seems like a good idea.

mehdi_amini added inline comments.Mar 13 2017, 12:48 PM

lib/CodeGen/RegUsageInfoCollector.cpp
112	Could we not duplicate this bit twingling logic? Maybe a lambda can do it.
118	This seems to be what I wrote in my local patch and asked above? But aliases are already part of `UsedPhysRegsMask`, so what I had in mind was rather: if (UsedPhysRegsMask.test(PReg)) { // mark used ... continue; } if (!MRI->def_empty(PReg)) { // Add aliases and current reg. } That way we don't process aliases ever when a reg is in `UsedPhysRegsMask`

kariddi added inline comments.Mar 13 2017, 1:11 PM

lib/CodeGen/RegUsageInfoCollector.cpp
118	I see what you mean now. Sounds good.

Updated to reflect Mehdi's last comments

kariddi marked an inline comment as done.Mar 13 2017, 1:12 PM

You did not address the isNoReturnDef() part yet. Maybe you can make that a public function in MachineRegisterInfo so it can be re-used.

On a side note: I would love to get rid of that hack for good by simply not adding defs to tailcalls, but that is probably out of the scope for this patch.

In D30880#699705, @MatzeB wrote:

You did not address the isNoReturnDef() part yet. Maybe you can make that a public function in MachineRegisterInfo so it can be re-used.

On a side note: I would love to get rid of that hack for good by simply not adding defs to tailcalls, but that is probably out of the scope for this patch.

Sorry I missed one mail in between. You are right this is indeed not used in this context (though maybe it should be). In that case it is indeed NFC and LGTM.

This revision is now accepted and ready to land.Mar 13 2017, 1:37 PM

mehdi_amini accepted this revision.Mar 13 2017, 1:56 PM

Closed by commit rL297673: [IPRA] Change algorithm for RegUsageInfoCollector. (authored by mggm). · Explain WhyMar 13 2017, 2:54 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

lib/

CodeGen/

RegUsageInfoCollector.cpp

15 lines

Diff 91517

lib/CodeGen/RegUsageInfoCollector.cpp

Show First 20 Lines • Show All 97 Lines • ▼ Show 20 Lines	bool RegUsageInfoCollector::runOnMachineFunction(MachineFunction &MF) {
const Function *F = MF.getFunction();		const Function *F = MF.getFunction();

PhysicalRegisterUsageInfo *PRUI = &getAnalysis<PhysicalRegisterUsageInfo>();		PhysicalRegisterUsageInfo *PRUI = &getAnalysis<PhysicalRegisterUsageInfo>();

PRUI->setTargetMachine(&TM);		PRUI->setTargetMachine(&TM);

DEBUG(dbgs() << "Clobbered Registers: ");		DEBUG(dbgs() << "Clobbered Registers: ");

for (unsigned PReg = 1, PRegE = TRI->getNumRegs(); PReg < PRegE; ++PReg)		auto &UsedPhysRegsMask = MRI->getUsedPhysRegsMask();
		MatzeBUnsubmitted Done Reply Inline Actions Please no `auto` when the type isn't immediately visible. MatzeB: Please no `auto` when the type isn't immediately visible.
		kariddiAuthorUnsubmitted Not Done Reply Inline Actions Thanks, will do kariddi: Thanks, will do
if (MRI->isPhysRegModified(PReg, true))		// Scan all the physical register. When a register is defined in the current
		mehdi_aminiUnsubmitted Done Reply Inline Actions `physical registers` mehdi_amini: `physical registers`
		// function set it and all the aliasing registers as defined in the regmask.
		for (unsigned PReg = 1, PRegE = TRI->getNumRegs(); PReg < PRegE; ++PReg) {
		if (!MRI->def_empty(PReg)) {
		for (MCRegAliasIterator AI(PReg, TRI, true); AI.isValid(); ++AI)
		RegMask[AI / 32] &= ~(1u << AI % 32);
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions Could we not duplicate this bit twingling logic? Maybe a lambda can do it. mehdi_amini: Could we not duplicate this bit twingling logic? Maybe a lambda can do it.
		continue;
		}
		// If a register is not defned but is in the UsedPhysRegsMask set then
		mehdi_aminiUnsubmitted Done Reply Inline Actions `defined` mehdi_amini: `defined`
		kariddiAuthorUnsubmitted Not Done Reply Inline Actions Whoops for this and the one above kariddi: Whoops for this and the one above
		// set it to defined in the mask anyway.
		if (UsedPhysRegsMask.test(PReg))
RegMask[PReg / 32] &= ~(1u << PReg % 32);		RegMask[PReg / 32] &= ~(1u << PReg % 32);
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions I implemented almost the same thing on Friday, but didn't have the time to benchmark it, how much did it improve Maxime's test-case? (this was taking ~37% of the total time IIRC). Also I wrote something that looked like: if (!MRI->def_empty(PReg) \|\| UsedPhysRegsMask.test(PReg)) { for (MCRegAliasIterator AI(PReg, TRI, true); AI.isValid(); ++AI) RegMask[AI / 32] &= ~(1u << AI % 32); continue; } Why don't you need to add the aliases for `UsedPhysRegsMask`? Another approach that I talked about with @MatzeB was to perform two pass: one to collect which register units are modified and the second to mark the physical register that include a modified reg units. (I had planned to bench the two approaches). mehdi_amini: I implemented almost the same thing on Friday, but didn't have the time to benchmark it, how…
		kariddiAuthorUnsubmitted Not Done Reply Inline Actions On that test went from 37% of compile time to about 0.1% I don't add the aliases because that seems to be matching what the original code was doing. The original code checks for each register if they are in the Used set (through isPhysRegModified) and adds them if they are. This should be matching what that code was doing. In the case the register is defined it will be added to the set anyway (together with all the aliased ones), so I skip it in that case kariddi: On that test went from 37% of compile time to about 0.1% I don't add the aliases because that…
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions On that test went from 37% of compile time to about 0.1% Great! I don't add the aliases because that seems to be matching what the original code was doing. I think you're right. It seemed incorrect to me but I just looked the comment for this and it says "UsedPhysRegMask - Additional used physregs including aliases". The fact that it includes alias makes the check redundant. I suspect that we should first perform the `if (UsedPhysRegsMask.test(PReg))` case before `if (!MRI->def_empty(PReg)) {`. That would save us the loop on the aliases (which are covered in UsedPhysRegsMask.test). mehdi_amini: > On that test went from 37% of compile time to about 0.1% Great! > I don't add the aliases…
		kariddiAuthorUnsubmitted Not Done Reply Inline Actions You mean something like: if (UsedPhysRegsMask.test(PReg) \|\| !MRI->def_empty(PReg)) { // Add aliases } This seems like a good idea. kariddi: You mean something like: if (UsedPhysRegsMask.test(PReg) \|\| !MRI->def_empty(PReg)) { // Add…
		mehdi_aminiUnsubmitted Done Reply Inline Actions This seems to be what I wrote in my local patch and asked above? But aliases are already part of `UsedPhysRegsMask`, so what I had in mind was rather: if (UsedPhysRegsMask.test(PReg)) { // mark used ... continue; } if (!MRI->def_empty(PReg)) { // Add aliases and current reg. } That way we don't process aliases ever when a reg is in `UsedPhysRegsMask` mehdi_amini: This seems to be what I wrote in my local patch and asked above? But aliases are already part…
		kariddiAuthorUnsubmitted Not Done Reply Inline Actions I see what you mean now. Sounds good. kariddi: I see what you mean now. Sounds good.
		}

if (!TargetFrameLowering::isSafeForNoCSROpt(F)) {		if (!TargetFrameLowering::isSafeForNoCSROpt(F)) {
const uint32_t *CallPreservedMask =		const uint32_t *CallPreservedMask =
TRI->getCallPreservedMask(MF, F->getCallingConv());		TRI->getCallPreservedMask(MF, F->getCallingConv());
// Set callee saved register as preserved.		// Set callee saved register as preserved.
for (unsigned i = 0; i < RegMaskSize; ++i)		for (unsigned i = 0; i < RegMaskSize; ++i)
RegMask[i] = RegMask[i] \| CallPreservedMask[i];		RegMask[i] = RegMask[i] \| CallPreservedMask[i];
} else {		} else {
Show All 15 Lines