This is an archive of the discontinued LLVM Phabricator instance.

[MachineCombiner] [NFC]Add MustReduceRegisterPressure goal
ClosedPublic

Authored by shchenz on Nov 24 2020, 7:40 PM.

Details

Summary

add a new goal MustReduceRegisterPressure for machine combiner pass. This is an NFC patch.

PowerPC will use this new goal to do some register pressure related optimization. https://reviews.llvm.org/D92071

Diff Detail

Event Timeline

shchenz created this revision.Nov 24 2020, 7:40 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 24 2020, 7:40 PM
shchenz requested review of this revision.Nov 24 2020, 7:40 PM
shchenz edited reviewers, added: Restricted Project, efriedma; removed: eli.friedman.Nov 26 2020, 9:46 PM
shchenz updated this revision to Diff 309192.Dec 3 2020, 2:06 AM
shchenz added a reviewer: MatzeB.

1: fix lint warning messages
2: fix x86 CodeGen/X86/coalescer-dce.ll verify error after liveintervals introduced in machine combiner pass

and gentle ping...

spatel added a comment.Dec 3 2020, 9:24 AM

I'm not too familiar with MIR and I have not worked on MachineCombiner in a long time, so someone else should probably have a look too.

  1. Why are the killed and dead flags changing in the tests?
  2. Is there compile-time increase from having the extra passes (Slot index numbering, Live Interval Analysis, Machine Trace Metrics) even for targets that do not enable the MustReduceRegisterPressure objective?
shchenz added a comment.EditedDec 4 2020, 12:49 AM

@spatel Very much appreciate your comments. It is a good start for this patch to move on.

  1. Why are the killed and dead flags changing in the tests?

For the dead flag in case test/DebugInfo/X86/machinecse-wrongdebug-hoist.ll, I think now it is more accurate. It should be an improvement.
For the kill flags in other cases, it caused by Liveintervals pass cleaning the killed flag for uses. See https://github.com/llvm/llvm-project/blob/5b9fc44d8128717ef2f219b061c018abb85c717f/llvm/lib/CodeGen/LiveIntervalCalc.cpp#L157-L160.
I am guessing this code can be improved because now Liveintervals is used in many places except before register allocation, such as MachineScheduler, Machinepipeliner, and some other target-specific passes. Fortunately, before RA, we have data flow analysis, so it is easy for us to check if a register/operand is killed without depending on the kill flag. For example in MachineLICM pass, we check kill operand like:

static bool isOperandKill(const MachineOperand &MO, MachineRegisterInfo *MRI) {
  return MO.isKill() || MRI->hasOneNonDBGUse(MO.getReg());
}

yeah, sure, if this is a must fix degradation, I will fix this in follow up patches.

  1. Is there compile-time increase from having the extra passes (Slot index numbering, Live Interval Analysis, Machine Trace Metrics) even for targets that do not enable the MustReduceRegisterPressure objective?

Compile-time diff: https://llvm-compile-time-tracker.com/compare.php?from=1bb79875e4b8f9018142a5155ca3f7df37778419&to=16f8136e47fdf13a05a27bbd0829d670f3baacb1&stat=instructions
I am not sure which target is for above testing, but I think it must be not PowerPC ^^.
GEO increases 0.53%, and biggest increase is 0.7%, do you think this is an acceptable increase after I add these new passes to machine combiner?
I think LiveIntervals pass can not only be used for this patch but it can also be used for later register pressure track for ILP related reassociations. I knew we used to plan a register estimation model in this pass to prevent too many reassociations from causing too big register pressures.

Thanks again.

llvm/test/DebugInfo/X86/machinecse-wrongdebug-hoist.ll
3 ↗(On Diff #309192)

This should be a positive change. register %5 has no user below.

  1. Why are the killed and dead flags changing in the tests?

For the dead flag in case test/DebugInfo/X86/machinecse-wrongdebug-hoist.ll, I think now it is more accurate. It should be an improvement.
For the kill flags in other cases, it caused by Liveintervals pass cleaning the killed flag for uses. See https://github.com/llvm/llvm-project/blob/5b9fc44d8128717ef2f219b061c018abb85c717f/llvm/lib/CodeGen/LiveIntervalCalc.cpp#L157-L160.
I am guessing this code can be improved because now Liveintervals is used in many places except before register allocation, such as MachineScheduler, Machinepipeliner, and some other target-specific passes. Fortunately, before RA, we have data flow analysis, so it is easy for us to check if a register/operand is killed without depending on the kill flag. For example in MachineLICM pass, we check kill operand like:

static bool isOperandKill(const MachineOperand &MO, MachineRegisterInfo *MRI) {
  return MO.isKill() || MRI->hasOneNonDBGUse(MO.getReg());
}

yeah, sure, if this is a must fix degradation, I will fix this in follow up patches.

I don't know the answer to that question - does it potentially harm other passes if the kill flag was dropped, or can we assume it does not matter? Adding more potential reviewers.

  1. Is there compile-time increase from having the extra passes (Slot index numbering, Live Interval Analysis, Machine Trace Metrics) even for targets that do not enable the MustReduceRegisterPressure objective?

Compile-time diff: https://llvm-compile-time-tracker.com/compare.php?from=1bb79875e4b8f9018142a5155ca3f7df37778419&to=16f8136e47fdf13a05a27bbd0829d670f3baacb1&stat=instructions
I am not sure which target is for above testing, but I think it must be not PowerPC ^^.

I assume it's an x86 machine, but @nikic can confirm.

GEO increases 0.53%, and biggest increase is 0.7%, do you think this is an acceptable increase after I add these new passes to machine combiner?
I think LiveIntervals pass can not only be used for this patch but it can also be used for later register pressure track for ILP related reassociations. I knew we used to plan a register estimation model in this pass to prevent too many reassociations from causing too big register pressures.

We have tried to avoid changes that can cost that much compile-time without any benefit. If every target will have that loss even if there is no chance of run-time improvement, we should look for ways to hide it. Again, I'm not very familiar with how things work here - is it possible to make the extra analysis conditional on some flags internal to the machine combiner pass?

I agree, this shouldn't incur compile-time cost for targets on which this won't even be run.
I'm going to guess that pretty much all extra cost comes from requiring LiveIntervals,
in which case i'd guess you'd want to compute it lazily only when actually needed.

Thanks for your comments @spatel @lebedev.ri

Do you have any idea to run the LiveIntervals pass only on the target that MustReduceRegisterPressure is enabled?
Some considerations:
1: PowerPC uses LiveIntervals in class PPCInstrInfo which is not inherited from class Pass, so I can not call getAnalysis<LiveIntervals>() there.
2: I tried to guard getAnalysis<LiveIntervals>() in MachineCombiner pass under a new condition like enableReduceRegisterPressureGoal(), it also does not work. If we add AU.addRequired<LiveIntervals>() in MachineCombiner::getAnalysisUsage, we will need to add the required passes before machine combiner pass.

shchenz updated this revision to Diff 310828.Dec 10 2020, 3:52 AM
shchenz retitled this revision from [MachineCombiner] Add MustReduceRegisterPressure goal to [MachineCombiner] [NFC]Add MustReduceRegisterPressure goal.
shchenz edited the summary of this revision. (Show Details)

1: don't require LiveIntervals analysis
2: revert the test case change as now this should be NFC

Hi @spatel @lebedev.ri , I changed the patch not requiring LiveIntervals pass. Now this patch is NFC.

New compiling time diff is: https://llvm-compile-time-tracker.com/compare.php?from=5bb28fa0f51e94522644fe5633877b441b9ad8d3&to=73b8523c4c149866690257f58bc678be20a4991b&stat=instructions
In summary, at -O3 GEO increases 0.01%; at ReleaseThinLTO config, GEO decreases 0.01%. I assume this patch now has no compiling time regression.

Could you help to have another look? Thanks.

If this is NFC now, no objections from me - I just noted minor errors.
So we do not need LiveIntervals analysis at all now to reduce pressure, or that will be dealt with separately?

llvm/include/llvm/CodeGen/TargetInstrInfo.h
1085

typo: "if target supports reassociation of instructions"

1093

Make a complete sentence for code comments:

/// Fix up the placeholder we may add in genAlternativeCodeSequence().
1095

typo: "finalize"

llvm/lib/CodeGen/MachineCombiner.cpp
317

typo: PressureTracker

478

typo:

// even if InsInstrs is not the better pattern.
636

typo (also in original code):
"threshold"

641

InInstrs -> InsInstrs

646

back -> Go back ?

shchenz updated this revision to Diff 311120.Dec 10 2020, 11:28 PM

1: address @spatel comments

If this is NFC now, no objections from me - I just noted minor errors.
So we do not need LiveIntervals analysis at all now to reduce pressure, or that will be dealt with separately?

Thanks for your careful review. Updated accordingly.
Now I only handle two patterns in PowerPC patch https://reviews.llvm.org/D92071. The left patterns hit in our internal tests don't need live intervals analysis to decide the specific blocks are in high register pressure.
And yes, I will handle the other two deleted patterns which require live intervals analysis in following patches. I will keep in mind live intervals analysis should only be run on targets that use this newly added goal. ^^

Thanks again for making this patch go so far.

spatel accepted this revision.Dec 12 2020, 9:05 AM

LGTM

llvm/include/llvm/CodeGen/TargetInstrInfo.h
1085

Spelling is still wrong here:
ressociation -> reassociation
instrucitons -> instructions

llvm/lib/CodeGen/MachineCombiner.cpp
478

bettern -> better

This revision is now accepted and ready to land.Dec 12 2020, 9:05 AM
This revision was landed with ongoing or failed builds.Dec 13 2020, 9:03 PM
This revision was automatically updated to reflect the committed changes.