This is an archive of the discontinued LLVM Phabricator instance.

Differential D122118

[MachineCopyPropagation] Eliminate spillage copies that might be caused by eviction chain
ClosedPublic

Authored by lkail on Mar 21 2022, 2:21 AM.

Details

Reviewers

aditya_nandakumar
qcolombet

Group Reviewers

Restricted Project

Commits

rG96aaebd12e73: [MachineCopyPropagation] Eliminate spillage copies that might be caused by…

Summary

Remove spill-reload-like copy chains. For example,

r0 = COPY r1
r1 = COPY r2
r2 = COPY r3
r3 = COPY r4
<def-use r4>
r4 = COPY r3
r3 = COPY r2
r2 = COPY r1
r1 = COPY r0

will be folded into

r0 = COPY r1
r1 = COPY r4
<def-use r4>
r4 = COPY r1
r1 = COPY r0
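
The folding in the summary can be modeled with a small standalone sketch. It is not the code from the patch: `Copy` and `foldSpillChain` are hypothetical names, registers are plain strings, and the checks the real pass needs (renamable operands, register class constraints, clobbers between the copies) are not modeled. It assumes the reload chain is an exact mirror of the spill chain and, following the review discussion, only folds chains of at least three pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for "Dst = COPY Src"; not llvm::MachineInstr.
struct Copy {
  std::string Dst;
  std::string Src;
  bool operator==(const Copy &O) const { return Dst == O.Dst && Src == O.Src; }
};

// Spills and Reloads are both given in program order.
// Spills[i] is "r_i = COPY r_{i+1}"; Reloads undo them in reverse order.
// Returns false and leaves the inputs untouched if the chains are shorter
// than three pairs or are not mirror images of each other.
bool foldSpillChain(std::vector<Copy> &Spills, std::vector<Copy> &Reloads) {
  const std::size_t N = Spills.size();
  if (N < 3 || Reloads.size() != N)
    return false;
  for (std::size_t I = 0; I < N; ++I) {
    const Copy &S = Spills[I];
    const Copy &R = Reloads[N - 1 - I];
    if (R.Dst != S.Src || R.Src != S.Dst)
      return false;
  }
  // Keep the outermost pair as is, since nothing outside the chain is recolored.
  const Copy OuterSpill = Spills.front();
  const Copy OuterReload = Reloads.back();
  // Connect the outermost source register directly to the innermost one.
  const Copy ShortSpill{Spills.front().Src, Spills.back().Src};
  const Copy ShortReload{Spills.back().Src, Spills.front().Src};
  Spills = {OuterSpill, ShortSpill};
  Reloads = {ShortReload, OuterReload};
  return true;
}
```

Applied to the four-pair chain above, the sketch keeps `r0 = COPY r1` / `r1 = COPY r0` and replaces the three inner pairs with `r1 = COPY r4` / `r4 = COPY r1`.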

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lkail created this revision.Mar 21 2022, 2:21 AM

Herald added a project: Restricted Project.Mar 21 2022, 2:21 AM

Herald added subscribers: dmgreen, hiraditya, nemanjai, qcolombet.

lkail requested review of this revision.Mar 21 2022, 2:21 AM

Herald added a project: Restricted Project.Mar 21 2022, 2:21 AM

Herald added a subscriber: llvm-commits.

lkail retitled this revision from [MachineCopyPropagaion][WIP] Eliminate spillage copies that might caused by eviction chain to [MachineCopyPropagation][WIP] Eliminate spillage copies that might caused by eviction chain.Mar 21 2022, 2:21 AM

lkail updated this revision to Diff 416891.Mar 21 2022, 4:23 AM

lkail updated this revision to Diff 416893.Mar 21 2022, 4:32 AM

Harbormaster completed remote builds in B155352: Diff 416893.Mar 21 2022, 5:29 AM

lkail updated this revision to Diff 417201.Mar 22 2022, 12:35 AM

Harbormaster completed remote builds in B155570: Diff 417201.Mar 22 2022, 1:24 AM

At first glance it looks similar to what @aditya_nandakumar implemented internally to get rid of copies produced by eviction chains.

@aditya_nandakumar could you take a look?

qcolombet added a reviewer: qcolombet.Sep 19 2022, 4:21 PM

Talked to Aditya offline (a while back actually) and he told me he doesn't expect to have time to look at this any time soon.
Adding myself as a reviewer.

Hi @lkail ,

Thanks for your patience.
This goes in the right direction.

I think we are missing a few comments and some cleanups (clang-format, remove stale comments, use proper LLVM_DEBUG macros, etc.), and then we're good to go!

Cheers,
-Quentin

llvm/lib/CodeGen/MachineCopyPropagation.cpp
924	Add comments. What this does: explain the algorithm at a high level.
932	Not for this patch, but you may want to use `isCopyInstr` instead of `isCopy` to catch more cases. That said, it probably won't make much of a difference, since in particular most copies you're trying to remove here come from splitting in regalloc (i.e., we'll have plain `COPY`).
932	To be on the safe side, you may want to check that the operation has no implicit operand.
939	Instead of copying the chain, can we hold on to the reference until we're done with the processing?
946	Maybe worth splitting the reload from the spill chain as this assert is strange at first glance. Perhaps it wouldn't be as problematic when the method is properly documented. Put differently, let's leave it like this for now, add a bunch of comments and we'll see if it still feels weird after that.
949	We'll need to expand on that comment because, naively, a two-pair chain would be beneficial to remove, but this is not something this code can do since we don't recolor outside of the spill chain. I'd suggest putting something like: We need at least 3 pairs of copies for the transformation to apply, because the first outermost pair cannot be removed since we don't recolor outside of the chain and we need at least one temporary spill slot to shorten the chain. If we only have a chain of two pairs, we already have the shortest sequence this code can handle: the outermost pair for the temporary spill slot, and the pair that uses that temporary spill slot for the other end of the chain.
961	Here and other places where you have "debug" statements: Put this in `LLVM_DEBUG` macros. Or remove completely.
966	Move that into its own helper function and call it from an assert:
	static bool LLVM_ATTRIBUTE_UNUSED isValidChain(const SmallVectorImpl<MachineInstr *> &Chain) {
	  // your checks here.
	}
	...
	assert(isValidChain(Chain) && "Invalid chain to process");
971	I feel that this is a bit late to check that. We should not put a copy in the "candidate" chains if the copy is not foldable. I would suggest to handle that in the main loop.
978	By construction `Chain[Len - 4]->getOperand(0) == Chain[Len - 3]->getOperand(1)`, so I would instead put that in a variable and use it in both places. E.g., something like:
	// Pull the last spill slot used only within the chain as the final spill slot.
	MCRegister LastReusableRegSpillSlot = Chain[Len - 4]->getOperand(0).getReg();
	// Update the chain to skip all the intermediate register spill slots:
	// Spilling:
	Chain[0]->getOperand(0).setReg(LastReusableRegSpillSlot);
	// Reload:
	Chain[1]->getOperand(1).setReg(LastReusableRegSpillSlot);
980	Maybe worth adding a comment here that although the variable is called `MaybeDeadCopies`, we really are going to remove the related instructions. The fact that we use `MaybeDeadCopies` to do our code cleanup is slightly confusing because if we don't actually delete the intermediate copies (a.k.a. what remains of the chain at this point) the resulting code would be incorrect.
1045	At first it is strange to see that we look for a copy when `Reg` is a def, but I guess it makes sense because:
	- We are not going to recolor `Reg`
	- We need to consider this chain before it gets clobbered later in that same loop
	Assuming I understood that correctly, it deserves its comment here.
1053	Use `LeadRegs.find` and avoid the double lookups (one in `count` and one in `operator[]`).
1063	I think this statement deserves its own comment. IIUC here we unconditionally clobber all the registers (as opposed to only clobbering the definitions) because we only rewrite the chain itself (i.e., we don't attempt to rewrite uses after the chain). BTW, you need to take into account regmasks too.
1063	Shouldn't we clear the `SpillChains` here for defs and not-preserved-by-regmask regs at this point?
llvm/test/CodeGen/PowerPC/mcp-elim-eviction-chain.mir
135	Add a test with regmasks.
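
The "helper called from an assert" suggestion on line 966 can be illustrated outside of LLVM with a minimal sketch. The standard `[[maybe_unused]]` attribute is swapped in for `LLVM_ATTRIBUTE_UNUSED`, `ToyCopy` and `processChain` are hypothetical names, and the invariant checked here (each copy reads the register written by the next one) is only an assumed placeholder for the "your checks here" part.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a COPY instruction; registers are plain integers here.
struct ToyCopy {
  unsigned Dst;
  unsigned Src;
};

// Hypothetical invariant: the chain is non-empty and each copy reads the
// register written by the next one (r0 = r1, r1 = r2, ...).
// [[maybe_unused]] keeps release builds warning-free, because the only
// caller is an assert() that compiles away under NDEBUG.
[[maybe_unused]] static bool isValidChain(const std::vector<ToyCopy> &Chain) {
  if (Chain.empty())
    return false;
  for (std::size_t I = 0; I + 1 < Chain.size(); ++I)
    if (Chain[I].Src != Chain[I + 1].Dst)
      return false;
  return true;
}

// Returns the number of copies it would process; the folding itself is elided.
std::size_t processChain(const std::vector<ToyCopy> &Chain) {
  assert(isValidChain(Chain) && "Invalid chain to process");
  return Chain.size();
}
```

Keeping the checks in a named predicate means the validation logic reads as one unit and costs nothing in builds where asserts are disabled.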

Hi @qcolombet, thanks for your detailed comments. I have uploaded another patch, which is very different from the previous one. I'm not sure I have addressed all your comments.

lkail marked 7 inline comments as done.Oct 27 2022, 9:04 AM

lkail retitled this revision from [MachineCopyPropagation] Eliminate spillage copies that might caused by eviction chain to [MachineCopyPropagation] Eliminate spillage copies that might be caused by eviction chain.Oct 27 2022, 9:13 AM

Harbormaster completed remote builds in B194670: Diff 471178.Oct 27 2022, 9:59 AM

Ping.

Gentle ping.

Hi @lkail,

I am halfway through.

I'm sharing my comments so far if you want to get started with some of the nitpicks.

Cheers,
-Quentin

llvm/lib/CodeGen/MachineCopyPropagation.cpp
89	Maybe rename to `LastSeenUseInCopy`. Essentially, I would avoid `LastUse` alone as it carries a lot of expected semantics that I don't think apply here.
166	Use `MI` directly here instead of adding it line 200.
246	Could `Current` be const here?
265	If `Def` is clobbered between `DefCopy` and `Current` I would have expected that `DefCopy` would have been removed from `Copies`. Put differently, I feel that if we have to check that here, the `Copies` map is holding hard-to-reason-about information. Are we missing some call to `clobberRegister`?
931	I think you can simplify this with a range based loop with `rbegin`/`rend`.
932	That should be a simple range based loop:
	for (const MachineInstr *MI : RC)
	  MI->dump();
951	Technically we could collapse this sequence to:
	// r0 = COPY r4
	// <def-use r4>
	// r4 = COPY r0
	I.e., I think it is worth explaining that the propagation doesn't check whether or not `r0` can be altered outside of the chain and that's why we conservatively keep its value as it was before the rewrite. (Would be a nice follow-up fix BTW :)).
954	Typo: until
954	typo: chain uses
967	typo: encountered
972	That sounds weird. Would you mind sharing the assertion?
974	Could you add a comment on what the mappings hold (all three of them)? I haven't read the code past this point yet, but for instance I would have expected that the key in these maps would be a `Register` not `MachineInstr`.
975	typo: until
977	Instead of tracking that, should we just invalidate the chain / stop it before that point?
990	Maybe add a todo that if the outermost pair of copies modifies a register that is dead outside of that pair, we could eliminate one more pair.

lkail added inline comments.Nov 18 2022, 12:07 AM

llvm/lib/CodeGen/MachineCopyPropagation.cpp
166	Correct me if I'm wrong, but `Copies.insert` inserts successfully only when the key doesn't already exist. Suppose we have
	L0: R0 = COPY R1
	L1: R2 = COPY R1
	`LastUseSeenInCopy` should track the MI in `L1` rather than `L0`.
265	I can imagine there might be some `RegMask`s that don't implicit-def any registers and just clobber them. When we are traversing the MBB and encounter a `RegMask` without any other implicit-def, we don't know directly which registers to clobber. I'm not sure we have a way to enumerate the registers a `RegMask` clobbers. If there is such a way, I think we should clobber registers while traversing the MBB, rather than checking `RegMask` clobbers here.
972	It hits `DenseMapIterator`'s
	pointer operator->() const { assert(isHandleInSync() && "invalid iterator access!"); ... }
	I dug into it a bit; it looks like it's checking the validity of the iterator, i.e., if the container is updated, an iterator constructed before the update is invalid. Code like
	auto Leader = ChainLeader.find(MaybePrevReload);
	...
	ChainLeader.insert({MaybeSpill, Leader->second});
	ChainLeader.insert({&MI, Leader->second});
	should be avoided.
974	> I haven't read the code past this point yet, but for instance I would have expected that the key in these maps would be a Register not MachineInstr.
	I separate the algorithm implementation into two stages. Stage 1: collect spill-reload chains. Stage 2: fold the chains. If we used `Register` as the key, we would be unable to track different spill-reload chains that share the same registers.
977	The implementation doesn't invalidate any chain in stage 1. Compared to the previous implementation, I think the current one is easier to reason about and easier to maintain. When we are iterating over the MIs inside the MBB, we don't know which `COPY` might belong to the innermost spill-reload pair, and we don't want to lose track of that pair. The source of the innermost spill is allowed to be reused and redefined between the innermost spill-reload pair.
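
The point about `Copies.insert` on line 166 can be shown with a minimal standalone sketch. `std::unordered_map` is used here as a stand-in for `DenseMap` (an assumption made only to keep the example self-contained; both leave an existing entry untouched on `insert`), and `SrcInfo` and `trackCopySource` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Toy per-source record; the copy is identified by a plain integer id.
struct SrcInfo {
  int LastSeenUseInCopy = -1;
};

// Record that copy CopyId reads register Src.
void trackCopySource(std::unordered_map<std::string, SrcInfo> &Copies,
                     const std::string &Src, int CopyId) {
  // insert() does nothing if Src is already tracked, so a value passed to
  // insert() would only stick for the first copy that reads Src.
  auto Result = Copies.insert({Src, SrcInfo{}});
  // Assigning through the returned iterator updates the entry in both cases.
  Result.first->second.LastSeenUseInCopy = CopyId;
}
```

With `L0: R0 = COPY R1` followed by `L1: R2 = COPY R1`, calling the function for copy 0 and then copy 1 leaves `R1` associated with copy 1, which is the behavior described in the comment.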

Address comments.

lkail marked 10 inline comments as done.Nov 18 2022, 12:09 AM

lkail updated this revision to Diff 476369.Nov 18 2022, 12:17 AM

lkail marked an inline comment as done.

lkail updated this revision to Diff 476371.Nov 18 2022, 12:30 AM

lkail added inline comments.Nov 18 2022, 12:57 AM

llvm/lib/CodeGen/MachineCopyPropagation.cpp
951	Maybe we can check if `r0` is killed to remove one more COPY.

Harbormaster completed remote builds in B198396: Diff 476371.Nov 18 2022, 1:13 AM

qcolombet added inline comments.Nov 22 2022, 7:20 PM

llvm/lib/CodeGen/MachineCopyPropagation.cpp
166	Ah good point!
265	I think I see our misunderstanding. Given the name of the function, I would expect it to only run queries on the tracker, but you're actually using it to do some bookkeeping as well. So the conclusion is either to rename this function to more accurately represent what it does (I don't have a good name for now) or to move the bookkeeping into the main loop (i.e., I thought we were calling clobberRegister from the main loop already). Regarding your comment on `RegMask`s, I am not sure I follow:
	- `RegMask`s always list all the registers they preserve/clobber
	- Liveness sets at basic block boundaries are not represented with `RegMask`s, but anyway we don't care because the tracking is always purely local to a basic block in that pass. (Unless you've changed that and I missed it :)).
931	You should be able to use an even more compact form:
	for (auto I : make_range(SC.rbegin(), SC.rend()))
951	Yep, but for that to be accurate we would need to flip the direction of the analysis (from top-down, to bottom-up) to get proper liveness construction. (Or use the kill flag, but generally speaking we try to avoid relying on this.)
972	Yep, every time you insert something in the dense map, you may invalidate the iterators.
974	I see, makes sense.
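
The iterator invalidation discussed on line 972 can be avoided by copying the looked-up value before inserting. The sketch below is not the patch's code: `std::unordered_map` stands in for `DenseMap` (its iterators can likewise be invalidated when an insert triggers a rehash), instructions are represented by integer ids, and `joinChain` is a hypothetical name.

```cpp
#include <cassert>
#include <unordered_map>

using InstrId = int; // hypothetical stand-in for MachineInstr *
using LeaderMap = std::unordered_map<InstrId, InstrId>;

// Attach Spill and Reload to the chain that PrevReload belongs to.
// Returns false if PrevReload has no recorded leader.
bool joinChain(LeaderMap &ChainLeader, InstrId PrevReload, InstrId Spill,
               InstrId Reload) {
  auto It = ChainLeader.find(PrevReload);
  if (It == ChainLeader.end())
    return false;
  // Copy the value out first: the inserts below may grow the table, after
  // which It must no longer be dereferenced.
  const InstrId Leader = It->second;
  ChainLeader.insert({Spill, Leader});
  ChainLeader.insert({Reload, Leader});
  return true;
}
```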

lkail added inline comments.Nov 23 2022, 1:36 AM

llvm/lib/CodeGen/MachineCopyPropagation.cpp
265	> RegMasks always list all the registers they preserve/clobber
	Ah, I see. Is it a good idea to enumerate them via `TRI->getNumRegs()` and check the `RegMask` to see whether they are preserved or clobbered? As for the original question:
	> If Def is clobbered between DefCopy and Current I would have expected that DefCopy would have been removed from Copies.
	Does it imply that we should neither check the `RegMask` nor update the bookkeeping by calling `Tracker::clobberRegister` here, because checking whether a `RegMask` clobbers registers should already have been done in the main loop? Correct me if I still fail to get your point.

qcolombet added inline comments.Nov 24 2022, 12:15 PM

llvm/lib/CodeGen/MachineCopyPropagation.cpp
265	> Ah, I see. Is it a good idea to enumerate them via TRI->getNumRegs() and check RegMask to see if they are preserve/clobber?
	That's the idea. Though we wouldn't need to enumerate all the registers, only the ones that you care about. What you're doing here is already fine. The thing that bothers me in the current code is the call to `clobberRegister`. This modifies the state of the tracker. Usually the `find` methods only query the `RegMask`s directly (with `MachineOperand::clobbersPhysReg`).
	> Correct me if I still fail to get your point.
	You got the point. Let's keep the code as is for now and let me do a full pass on the code so that I have a better model of how the whole thing works. I probably won't have time to do it before next week though.
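
A pure query against a register mask, checking only the registers of interest and leaving the tracker state alone, might look like the sketch below. It is standalone code, not LLVM's API: the mask is a `std::vector<std::uint32_t>`, registers are plain unsigned numbers, and `clobberedAmong` is a hypothetical helper. The bit convention (a set bit means the register is preserved) follows my understanding of how LLVM register masks are encoded and should be treated as an assumption here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One bit per physical register: a set bit means "preserved", a clear bit
// means "clobbered".
bool clobbersPhysReg(const std::vector<std::uint32_t> &RegMask,
                     unsigned PhysReg) {
  assert(PhysReg / 32 < RegMask.size() && "register outside the mask");
  return !(RegMask[PhysReg / 32] & (1u << (PhysReg % 32)));
}

// Query only the registers the caller cares about instead of walking all
// registers of the target. Nothing is modified.
std::vector<unsigned>
clobberedAmong(const std::vector<std::uint32_t> &RegMask,
               const std::vector<unsigned> &RegsOfInterest) {
  std::vector<unsigned> Clobbered;
  for (unsigned Reg : RegsOfInterest)
    if (clobbersPhysReg(RegMask, Reg))
      Clobbered.push_back(Reg);
  return Clobbered;
}
```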

Hi @lkail,

Thanks for your patience.

Looks mostly good to me.

The only thing that makes me uneasy is the potential impact on compile time of CheckCopyConstraint.
Do you know on average how many chains we see per function?

I know we have a bot tracking compile time. If that doesn't show anything significant, I guess we could enable it by default. I just don't know how extensive the tests are.

Alternatively, to avoid surprising everybody, we could add a way to enable/disable the new folding.
Either like what was done with UseCopyInstr or with a target hook (you can add a command line option too that would override what the target asked for for testing purposes).

Then only enable it for your target and send an RFC asking people to try the new folding for their targets.

What do you think?

Cheers,
-Quentin

llvm/lib/CodeGen/MachineCopyPropagation.cpp
985	Nit: `const` on `MachineInstr*`
1005	You should be able to use a range loop:
	for (const MachineInstr *Spill : SC) {
	  if (CopySourceInvalid.count(Spill))
	    return;
	}
1010	Nit: range loop
1016	That's going to be pretty expensive to walk all the register classes. I'm guessing you're trying to check if the resulting copy is legal and unfortunately there's no good way to do that. Did you see that showing up in compile time profile?
1046	Nit: range loop
1055	Nit: Here and other places where you use `isCopyInstr`: use the explicit type instead of `auto`. (The return type is hard to infer.)

lkail updated this revision to Diff 485466.Dec 28 2022, 12:18 AM

Herald added subscribers: pcwang-thead, frasercrmck, luismarques and 20 others.Dec 28 2022, 12:18 AM

Add target hook and command line option.
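
The "target hook plus command line override" arrangement suggested in the review can be sketched in a few lines. This is a standalone illustration, not the code from the patch: `ToySubtarget`, `enableSpillageCopyElimination`, and `shouldEliminateSpillageCopies` are hypothetical names, and a `std::optional<bool>` stands in for a tri-state command line option.

```cpp
#include <cassert>
#include <optional>

// Hypothetical target description: each target decides its own default.
struct ToySubtarget {
  bool EnableByDefault = false;
  bool enableSpillageCopyElimination() const { return EnableByDefault; }
};

// An explicit command line value wins; otherwise the target hook decides.
bool shouldEliminateSpillageCopies(const ToySubtarget &ST,
                                   std::optional<bool> CommandLineOverride) {
  return CommandLineOverride.value_or(ST.enableSpillageCopyElimination());
}
```

This keeps the new folding off for targets that have not opted in, while still letting tests force it on or off.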

Herald added a subscriber: kbarton.Dec 28 2022, 12:23 AM

Harbormaster completed remote builds in B205056: Diff 485469.Dec 28 2022, 1:16 AM

Do you know on average how many chains we see per function?

I have run llvm-test-suite, here's the stats

	functions	spill chain length	avg spill chain length per function	max spill chain length in one CU	number of spill chains	avg number of spill chains per function	max number of spill chains in one CU
powerpc64-ibm-aix	60426	308	0.005097143613676	52	27	0.000446827524576	3
x86_64-linux-gnu	185938	646	0.003474276371694	130	80	0.000430250943863	14

Compile time on powerpc64-ibm-aix

Tests: 1039
Metric: compile_time

Program                                       compile_time           
                                              baseline     experiment
SingleSour...sts/2002-10-09-ArrayResolution     0.07         0.50    
MultiSourc...marks/Trimaran/enc-pc1/enc-pc1     0.14         0.86    
SingleSour...e/UnitTests/2002-10-13-BadLoad     0.06         0.38    
SingleSour...UnitTests/2002-08-02-CastTest2     0.06         0.37    
SingleSour...nitTests/2002-04-17-PrintfChar     0.06         0.36    
SingleSour.../UnitTests/conditional-gnu-ext     0.07         0.23    
MultiSource/Benchmarks/llubenchmark/llu         0.10         0.35    
SingleSour...tTests/2003-08-05-CastFPToUint     0.06         0.22    
SingleSour...nitTests/2003-05-31-LongShifts     0.07         0.22    
SingleSour...e/UnitTests/Vector/Altivec/lde     0.21         0.68    
SingleSource/UnitTests/blockstret               0.06         0.18    
SingleSour...e/UnitTests/2002-05-03-NotTest     0.07         0.21    
SingleSour...UnitTests/2002-05-02-CastTest2     0.07         0.20    
SingleSour...UnitTests/2003-08-11-VaListArg     0.15         0.43    
SingleSource/UnitTests/StructModifyTest         0.07         0.19    
      compile_time             
run       baseline   experiment
count  1039.000000  1039.000000
mean   1.108578     1.102223   
std    4.496574     4.384102   
min    0.000000     0.000000   
25%    0.000000     0.000000   
50%    0.000000     0.000000   
75%    0.276350     0.278450   
max    80.927800    77.030800

Compile time on x86_64-linux-gnu

Tests: 2991
Metric: compile_time

Program                                       compile_time           
                                              baseline     experiment
UnitTests/...9-04-16-BitfieldInitialization     0.00         0.02    
UnitTests/2002-04-17-PrintfChar                 0.00         0.01    
UnitTests/2002-10-13-BadLoad                    0.00         0.01    
UnitTests/block-copied-in-cxxobj-1              0.00         0.01    
UnitTests/testcase-ExprConstant-1               0.01         0.02    
UnitTests/2010-05-24-BitfieldTest               0.00         0.01    
UnitTests/block-byref-test                      0.00         0.01    
UnitTests/testcase-Expr-1                       0.01         0.02    
UnitTests/byval-alignment                       0.01         0.02    
Benchmarks/Misc/pi                              0.01         0.02    
UnitTests/block-byref-cxxobj-test               0.01         0.01    
UnitTests/2020-01-06-coverage-008               0.01         0.02    
UnitTests/2002-08-02-CastTest2                  0.01         0.01    
UnitTests/2002-05-03-NotTest                    0.02         0.02    
UnitTests/2005-05-13-SDivTwo                    0.01         0.02    
      compile_time             
l/r       baseline   experiment
count  2991.000000  2991.000000
mean   0.291675     0.291211   
std    2.642554     2.642948   
min    0.000000     0.000000   
25%    0.000000     0.000000   
50%    0.000000     0.000000   
75%    0.000000     0.000000   
max    99.370200    99.510200

I don't see significant compile time regressions.

I just don't know how extensive the tests are.

I have tried bootstrapping stage3 and running llvm-test-suite on powerpc64-ibm-aix and x86_64-linux-gnu; no regressions were found. I'll follow your advice and send an RFC on Discourse. For now I only enable it by default on PowerPC. https://reviews.llvm.org/D122118?id=485466 shows the changes in other targets.

lkail updated this revision to Diff 485479.Dec 28 2022, 2:41 AM

Harbormaster completed remote builds in B205063: Diff 485479.Dec 28 2022, 3:37 AM

lkail updated this revision to Diff 485578.Dec 28 2022, 10:00 PM

Harbormaster completed remote builds in B205131: Diff 485578.Dec 28 2022, 11:01 PM

Compile-time: http://llvm-compile-time-tracker.com/compare.php?from=781eabeb40b8e47e3a46b0b927784e63f0aad9ab&to=0af2744a89bf0ed05e83ac1ed9d21d6d74cdfeca&stat=instructions%3Au

bjope added a subscriber: bjope.Dec 30 2022, 4:39 AM

Use range loop.

In D122118#4019373, @nikic wrote:

Compile-time: http://llvm-compile-time-tracker.com/compare.php?from=781eabeb40b8e47e3a46b0b927784e63f0aad9ab&to=0af2744a89bf0ed05e83ac1ed9d21d6d74cdfeca&stat=instructions%3Au

Thanks for the profiling, much appreciated!

Harbormaster completed remote builds in B205389: Diff 485899.Jan 2 2023, 7:39 PM

qcolombet accepted this revision.Jan 24 2023, 6:00 AM

This revision is now accepted and ready to land.Jan 24 2023, 6:00 AM

Matt added a subscriber: Matt.Jan 25 2023, 9:10 AM

This revision was landed with ongoing or failed builds.Feb 7 2023, 7:34 PM

Closed by commit rG96aaebd12e73: [MachineCopyPropagation] Eliminate spillage copies that might be caused by… (authored by lkail). · Explain Why

This revision was automatically updated to reflect the committed changes.

lkail added a commit: rG96aaebd12e73: [MachineCopyPropagation] Eliminate spillage copies that might be caused by….

Revision Contents

Path	Size
llvm/lib/CodeGen/MachineCopyPropagation.cpp	161 lines
llvm/test/CodeGen/PowerPC/mcp-elim-eviction-chain.mir	39 lines
llvm/test/CodeGen/Thumb2/LowOverheadLoops/spillingmove.ll	16 lines
llvm/test/CodeGen/Thumb2/mve-postinc-dct.ll	44 lines

Diff 416864

llvm/lib/CodeGen/MachineCopyPropagation.cpp

[80 lines not shown]
 STATISTIC(NumCopyBackwardPropagated, "Number of copy defs backward propagated");
 DEBUG_COUNTER(FwdCounter, "machine-cp-fwd",
               "Controls which register COPYs are forwarded");

 namespace {

 class CopyTracker {
   struct CopyInfo {
     MachineInstr *MI;
     SmallVector<MCRegister, 4> DefRegs;
     bool Avail;
   };

   DenseMap<MCRegister, CopyInfo> Copies;

 public:
   /// Mark all of the given registers and their subregisters as unavailable for
[60 lines not shown; enclosing context: void trackCopy(MachineInstr *MI, const TargetRegisterInfo &TRI) {]

     // Remember Def is defined by the copy.
     for (MCRegUnitIterator RUI(Def, &TRI); RUI.isValid(); ++RUI)
       Copies[*RUI] = {MI, {}, true};

     // Remember source that's copied to Def. Once it's clobbered, then
     // it's no longer available for copy propagation.
     for (MCRegUnitIterator RUI(Src, &TRI); RUI.isValid(); ++RUI) {
       auto I = Copies.insert({*RUI, {nullptr, {}, false}});
       auto &Copy = I.first->second;
       if (!is_contained(Copy.DefRegs, Def))
         Copy.DefRegs.push_back(Def);
     }
   }

   bool hasAnyCopies() {
     return !Copies.empty();
[63 lines not shown; enclosing context: for (const MachineInstr &MI :]
         if (MO.isRegMask())
           if (MO.clobbersPhysReg(AvailSrc) || MO.clobbersPhysReg(AvailDef))
             return nullptr;

     return AvailCopy;
   }

   void clear() {
     Copies.clear();
   }
 };

 class MachineCopyPropagation : public MachineFunctionPass {
   const TargetRegisterInfo *TRI;
   const TargetInstrInfo *TII;
   const MachineRegisterInfo *MRI;

 public:
   static char ID; // Pass identification, replacement for typeid

   MachineCopyPropagation() : MachineFunctionPass(ID) {
     initializeMachineCopyPropagationPass(*PassRegistry::getPassRegistry());
   }

   void getAnalysisUsage(AnalysisUsage &AU) const override {
     AU.setPreservesCFG();
     MachineFunctionPass::getAnalysisUsage(AU);
   }

   bool runOnMachineFunction(MachineFunction &MF) override;

   MachineFunctionProperties getRequiredProperties() const override {
     return MachineFunctionProperties().set(
         MachineFunctionProperties::Property::NoVRegs);
   }

 private:
   typedef enum { DebugUse = false, RegularUse = true } DebugType;

   void ReadRegister(MCRegister Reg, MachineInstr &Reader, DebugType DT);
   void ForwardCopyPropagateBlock(MachineBasicBlock &MBB);
   void BackwardCopyPropagateBlock(MachineBasicBlock &MBB);
+  void EliminateSpillageCopies(MachineBasicBlock &MBB);
   bool eraseIfRedundant(MachineInstr &Copy, MCRegister Src, MCRegister Def);
   void forwardUses(MachineInstr &MI);
   void propagateDefs(MachineInstr &MI);
   bool isForwardableRegClassCopy(const MachineInstr &Copy,
                                  const MachineInstr &UseI, unsigned UseIdx);
   bool isBackwardPropagatableRegClassCopy(const MachineInstr &Copy,
                                           const MachineInstr &UseI,
                                           unsigned UseIdx);
[627 unchanged lines hidden]
for (auto *Copy : MaybeDeadCopies) {		for (auto *Copy : MaybeDeadCopies) {
++NumDeletes;		++NumDeletes;
}		}

MaybeDeadCopies.clear();		MaybeDeadCopies.clear();
CopyDbgUsers.clear();		CopyDbgUsers.clear();
Tracker.clear();		Tracker.clear();
}		}

		void MachineCopyPropagation::EliminateSpillageCopies(MachineBasicBlock &MBB) {
		qcolombet (Done): Add comments. What this does: explain the algorithm at a high level.
		LLVM_DEBUG(dbgs() << "MCP: Eliminating spillage copies in " << MBB.getName()
		<< "\n");
		DenseMap<MCRegister, MCRegister> LeadRegs;
		DenseMap<MCRegister, SmallVector<MachineInstr *>> SpillChains;
		auto IsFoldableCopy = [](const MachineInstr *Copy) {
		assert(Copy->isCopy() && "MI is expected to be COPY");
		return Copy->getOperand(0).isRenamable() &&
		qcolombet (Done): I think you can simplify this with a range-based loop with `rbegin`/`rend`.
		qcolombet (Not Done): You should be able to use an even more compact form: `for (auto I : make_range(SC.rbegin(), SC.rend()))`
		Copy->getOperand(1).isRenamable();
		qcolombet (Done): Not for this patch, but you may want to use `isCopyInstr` instead of `isCopy` to catch more cases. That said, it probably won't make much of a difference, since most of the copies you're trying to remove here come from splitting in regalloc (i.e., we'll have plain `COPY`).
		qcolombet (Done): To be on the safe side, you may want to check that the operation has no implicit operand.
		qcolombet (Done): That should be a simple range-based loop: `for (const MachineInstr *MI : RC) MI->dump();`
		};
		auto TryFoldSpillageCopies = [&, this](MCRegister LeadReg) {
		assert(SpillChains.count(LeadReg) &&
		"Corresponding chain should be tracked");
		SmallVector<MachineInstr *> Chain = SpillChains[LeadReg];
		// We don't need this chain anymore.
		SpillChains.erase(LeadReg);
		qcolombet (Not Done): Instead of copying the chain, can we hold on to the reference until we're done with the processing?
		// Clear LeadReg.
		for (MachineInstr *Copy : Chain) {
		LeadRegs.erase(Copy->getOperand(0).getReg().asMCReg());
		LeadRegs.erase(Copy->getOperand(1).getReg().asMCReg());
		}
		const size_t Len = Chain.size();
		assert(Len % 2 == 0 && "Must be paired");
		qcolombet (Done): Maybe worth splitting the reload from the spill chain as this assert is strange at first glance. Perhaps it wouldn't be as problematic when the method is properly documented. Put differently, let's leave it like this for now, add a bunch of comments and we'll see if it still feels weird after that.

		// Bail out if fewer than 3 pairs.
		if (Len < 6)
		qcolombet (Done): We'll need to expand on that comment because naively, a 2-pair chain would be beneficial to remove, but this is not something this code can do since we don't recolor outside of the spill chain. I'd suggest putting something like:
		> We need at least 3 pairs of copies for the transformation to apply, because the first outermost pair cannot be removed since we don't recolor outside of the chain and we need at least one temporary spill slot to shorten the chain. If we only have a chain of two pairs, we already have the shortest sequence this code can handle: the outermost pair for the temporary spill slot, and the pair that uses that temporary spill slot for the other end of the chain.
		return;
		// Confirm we have a valid chain.
		qcolombet (Done): Technically we could collapse this sequence to:
		  // r0 = COPY r4
		  // <def-use r4>
		  // r4 = COPY r0
		I.e., I think it is worth explaining that the propagation doesn't check whether or not `r0` can be altered outside of the chain and that's why we conservatively keep its value as it was before the rewrite. (Would be a nice follow-up fix BTW :)).
		lkail (author, Done): Maybe we can check if `r0` is killed to remove one more COPY.
		qcolombet (Not Done): Yep, but for that to be accurate we would need to flip the direction of the analysis (from top-down to bottom-up) to get proper liveness construction. (Or use the kill flag, but generally speaking we try to avoid relying on this.)
		for (size_t I = 2; I != Len; I += 2) {
		MachineInstr *SpillMI = Chain[I];
		MachineInstr *ReloadMI = Chain[I + 1];
		qcolombet (Done): Typo: until
		qcolombet (Done): Typo: chain uses
		MachineInstr *PrevSpillMI = Chain[I - 2];
		MachineInstr *PrevReloadMI = Chain[I - 1];
		// dbgs() << "Checking:\n";
		// SpillMI->dump();
		// PrevSpillMI->dump();
		// PrevReloadMI->dump();
		// ReloadMI->dump();
		qcolombet (Done): Here and other places where you have "debug" statements: put this in `LLVM_DEBUG` macros, or remove completely.
		assert(PrevSpillMI->getOperand(0).getReg().asMCReg() ==
		SpillMI->getOperand(1).getReg().asMCReg());
		assert(PrevReloadMI->getOperand(1).getReg().asMCReg() ==
		ReloadMI->getOperand(0).getReg().asMCReg());
		}
		qcolombet (Not Done): Move that into its own helper function and call it from an assert.
		  static bool LLVM_ATTRIBUTE_UNUSED isValidChain(const SmallVectorImpl<MachineInstr *> &Chain) {
		    // your checks here.
		  }
		  ...
		  assert(isValidChain(Chain) && "Invalid chain to process");
		// If one in the chain is not foldable, give up.
		qcolombet (Done): Typo: encountered
		for (size_t I = 0; I != Len; I += 2) {
		MachineInstr *SpillMI = Chain[I];
		MachineInstr *ReloadMI = Chain[I + 1];
		if (!IsFoldableCopy(SpillMI) || !IsFoldableCopy(ReloadMI))
		qcolombet (Not Done): I feel that this is a bit late to check that. We should not put a copy in the "candidate" chains if the copy is not foldable. I would suggest handling that in the main loop.
		return;
		qcolombet (Done): That sounds weird. Would you mind sharing the assertion?
		lkail (author, Done): It hits `DenseMapIterator`'s `pointer operator->() const { assert(isHandleInSync() && "invalid iterator access!"); ... }`. I dug into it a bit; it looks like it checks the validity of the iterator, i.e., if the container is updated, an iterator constructed before the update is invalid. Code like `auto Leader = ChainLeader.find(MaybePrevReload); ... ChainLeader.insert({MaybeSpill, Leader->second}); ChainLeader.insert({&MI, Leader->second});` should be avoided.
		qcolombet (Not Done): Yep, every time you insert something in the dense map, you may invalidate the iterators.
		}
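The iterator-invalidation problem discussed in the thread above can be sketched outside LLVM. This is a minimal illustration, not the patch's code: `std::unordered_map` stands in for `llvm::DenseMap` (it may also invalidate iterators on insertion, namely when a rehash occurs), and `LeaderMap`, `addPairToChain`, and the string keys are hypothetical names chosen for the example. The idea is simply to copy the mapped value out of the iterator before inserting anything.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the ChainLeader map mentioned in the thread:
// maps a copy (named by a string here) to the leader of its chain.
using LeaderMap = std::unordered_map<std::string, std::string>;

// Registers a new spill/reload pair under the same leader as PrevReload.
void addPairToChain(LeaderMap &ChainLeader, const std::string &PrevReload,
                    const std::string &Spill, const std::string &Reload) {
  auto It = ChainLeader.find(PrevReload);
  assert(It != ChainLeader.end() && "previous reload must be tracked");
  // Copy the mapped value out first: the insertions below may rehash the
  // table, after which dereferencing It would be invalid.
  const std::string Leader = It->second;
  ChainLeader.insert({Spill, Leader});
  ChainLeader.insert({Reload, Leader});
}
```

The same pattern should carry over to `DenseMap`: read what is needed from the iterator, then mutate the map.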

		qcolombet (Done): Could you add a comment on what the mappings hold (all three of them)? I haven't read the code past this point yet, but for instance I would have expected that the key in these maps would be a `Register`, not a `MachineInstr`.
		lkail (author, Done):
		> I haven't read the code past this point yet, but for instance I would have expected that the key in these maps would be a Register not MachineInstr.
		I separate the algorithm implementation into two stages. Stage 1: collect spill-reload chains. Stage 2: fold the chains. If using `Register`, we are unable to track different spill-reload chains that share the same registers.
		qcolombet (Not Done): I see, makes sense.
		// TODO: If the last reload has kill flag on src, we can fold into the last
		qcolombet (Not Done): Typo: until
		// spill/reload pair.
		Chain[0]->getOperand(0).setReg(Chain[Len - 3]->getOperand(0).getReg());
		qcolombet (Not Done): Instead of tracking that, should we just invalidate the chain / stop it before that point?
		lkail (author, Done): The implementation doesn't invalidate any chain in stage 1. Compared to the previous implementation, I think the current one is easier to reason about and easier to maintain. When we are iterating over the MIs inside the MBB, we don't know which `COPY` might be part of the innermost spill-reload pair, and we don't want to lose track of the innermost spill-reload pair. The source of the innermost spill is allowed to be reused and redefined between the innermost spill-reload pair.
		Chain[1]->getOperand(1).setReg(Chain[Len - 4]->getOperand(1).getReg());
		qcolombet (Not Done): By construction `Chain[Len - 4]->getOperand(0) == Chain[Len - 3]->getOperand(1)`, so I would instead put that in a variable and use it in both places. E.g., something like:
		  // Pull the last spill slot used only within the chain as the final spill slot.
		  MCRegister LastReusableRegSpillSlot = Chain[Len - 4]->getOperand(0).getReg();
		  // Update the chain to skip all the intermediate register spill slots:
		  // Spilling:
		  Chain[0]->getOperand(0).setReg(LastReusableRegSpillSlot);
		  // Reload:
		  Chain[1]->getOperand(1).setReg(LastReusableRegSpillSlot);
		for (size_t I = 2; I != Len - 3; ++I)
		MaybeDeadCopies.insert(Chain[I]);
		qcolombet (Not Done): Maybe worth adding a comment here that although the variable is called `MaybeDeadCopies`, we really are going to remove the related instructions. The fact that we use `MaybeDeadCopies` to do our code cleanup is slightly confusing because if we don't actually delete the intermediate copies (a.k.a. what remains of the chain at this point) the resulting code would be incorrect.
		};
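The folding step can be modelled in isolation. The sketch below is a simplified model, not the code in this diff: a `COPY` is reduced to a `(Dst, Src)` pair of integers, the chain is assumed to be laid out innermost pair first as `[spill, reload, spill, reload, ...]`, and `Copy` and `foldSpillChain` are hypothetical names. It follows the `LastReusableRegSpillSlot` suggestion from the review and the before/after example in the revision summary, so its index arithmetic may differ from the lines above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified model of "Dst = COPY Src".
struct Copy {
  int Dst;
  int Src;
};

// Folds a spill/reload chain in place. Assumed layout: innermost pair first,
// i.e. [spill0, reload0, spill1, reload1, ..., spillN, reloadN].
// Returns true if the chain was shortened.
bool foldSpillChain(std::vector<Copy> &Chain) {
  const std::size_t Len = Chain.size();
  assert(Len % 2 == 0 && "spills and reloads must be paired");
  // With fewer than 3 pairs there is nothing to shorten: the outermost pair
  // is kept as is and one more pair is needed to carry the value.
  if (Len < 6)
    return false;
  // Register written by the second outermost spill; it is only used inside
  // the chain, so it can serve as the single remaining spill slot.
  const int Slot = Chain[Len - 4].Dst;
  assert(Slot == Chain[Len - 3].Src && "pair must mirror each other");
  Chain[0].Dst = Slot; // innermost spill now writes the slot
  Chain[1].Src = Slot; // innermost reload now reads the slot
  // Drop every pair between the innermost and the outermost one.
  Chain.erase(Chain.begin() + 2, Chain.end() - 2);
  return true;
}
```

For the eight-copy example in the summary (`r0 = COPY r1` through `r1 = COPY r0`), this model keeps four copies: `r1 = COPY r4`, `r4 = COPY r1`, and the untouched outermost pair on `r0`/`r1`.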

		for (MachineInstr &MI : llvm::make_early_inc_range(MBB)) {
		// IntEqClasses not feasible here.
		if (MI.isCopy() && MI.getNumOperands() == 2 &&
		qcolombet (Not Done): Nit: `const` on `MachineInstr *`
		!TRI->regsOverlap(MI.getOperand(0).getReg(),
		MI.getOperand(1).getReg())) {
		MCRegister DefReg = MI.getOperand(0).getReg().asMCReg();
		MCRegister SrcReg = MI.getOperand(1).getReg().asMCReg();
		MachineInstr *MaybeSpill = Tracker.findAvailCopy(MI, SrcReg, *TRI);
		qcolombet (Done): Maybe add a TODO that if the outermost pair of copies modifies a register that is dead outside of that pair, we could eliminate one more pair.
		auto IsSpillReload = [](const MachineInstr &MI0,
		const MachineInstr &MI1) {
		return MI0.getOperand(0).getReg().asMCReg() ==
		MI1.getOperand(1).getReg().asMCReg() &&
		MI0.getOperand(1).getReg().asMCReg() ==
		MI1.getOperand(0).getReg().asMCReg();
		};
		if (MaybeSpill) {
		// dbgs() << "MaybeSpill:\n";
		// MaybeSpill->dump();
		if (IsSpillReload(*MaybeSpill, MI)) {
		// dbgs() << "Found spill/reload pair:\n";
		// MaybeSpill->dump();
		// MI.dump();
		// We have found a spill/reload pair. Now look for which chain this
		qcolombet (Not Done): You should be able to use a range loop: `for (const MachineInstr *Spill : SC) { if (CopySourceInvalid.count(Spill)) return; }`
		// pair belongs to.
		auto LeadRegI = LeadRegs.find(DefReg);
		if (LeadRegI == LeadRegs.end()) {
		assert(!SpillChains.count(DefReg) &&
		"Chain for DefReg should not exist without a LeadReg");
		qcolombet (Not Done): Nit: range loop
		SpillChains.insert({DefReg, {MaybeSpill, &MI}});
		LeadRegs[DefReg] = DefReg;
		LeadRegs[SrcReg] = DefReg;
		} else {
		MCRegister LeadReg = LeadRegI->second;
		assert(SpillChains.count(LeadReg) &&
		qcolombet (Not Done): That's going to be pretty expensive to walk all the register classes. I'm guessing you're trying to check if the resulting copy is legal and unfortunately there's no good way to do that. Did you see that showing up in compile time profile?
		"Chain with a LeadReg should exist");
		SmallVector<MachineInstr *> &Chain = SpillChains[LeadRegI->second];
		Chain.push_back(MaybeSpill);
		Chain.push_back(&MI);
		LeadRegs[DefReg] = LeadReg;
		LeadRegs[SrcReg] = LeadReg;
		}
		} else if (LeadRegs.count(DefReg)) {
		// Clobbered by a non-paired copy and we have an available chain. Now
		// perform transformation before invalidating it.
		MCRegister LeadReg = LeadRegs[DefReg];
		TryFoldSpillageCopies(LeadReg);
		}
		}
		// dbgs() << "Tracking:\n";
		// MI.dump();
		Tracker.trackCopy(&MI, *TRI);
		continue;
		}
		// For Non-copy instruction MI.
		for (const MachineOperand &MO : MI.operands()) {
		if (!MO.isReg())
		continue;
		MCRegister Reg = MO.getReg().asMCReg();
		if (!Reg)
		continue;
		MachineInstr *MaybeCopy = Tracker.findAvailCopy(MI, Reg, *TRI);
		if (!MaybeCopy)
		continue;
		qcolombet (Not Done): At first it is strange to see that we look for a copy when `Reg` is a def, but I guess it makes sense because (1) we are not going to recolor `Reg`, and (2) we need to consider this chain before it gets clobbered later in that same loop. Assuming I understood that correctly, it deserves its own comment here.
		MCRegister DefReg = MaybeCopy->getOperand(0).getReg().asMCReg();
		qcolombet (Not Done): Nit: range loop
		assert(TRI->regsOverlap(Reg, DefReg) &&
		"Tracker should have tracked the COPY writing to Reg");
		MCRegister SrcReg = MaybeCopy->getOperand(1).getReg().asMCReg();
		if (LeadRegs.count(SrcReg)) {
		MCRegister LeadReg = LeadRegs[SrcReg];
		if (SpillChains.count(LeadReg)) {
		// We have an available chain. Now perform transformation.
		qcolombet (Not Done): Use `LeadRegs.find` and avoid the double lookups (one in `count` and one in `operator[]`).
		// We don't need this chain anymore.
		// dbgs() << "SpillChain size for " << printReg(LeadReg, TRI) << ": "
		qcolombet (Not Done): Nit: Here and other places where you use `isCopyInstr`: use the explicit type instead of `auto`. (The return type is hard to infer.)
		// << SpillChains[LeadReg].size() << "\n";
		TryFoldSpillageCopies(LeadReg);
		}
		}
		// dbgs() << "Clobbering " << printReg(Reg, TRI) << "\n";
		Tracker.clobberRegister(Reg, *TRI);
		}
		}
		qcolombet (Not Done): I think this statement deserves its own comment. IIUC here we unconditionally clobber all the registers (as opposed to only clobbering the definitions) because we only rewrite the chain itself (i.e., we don't attempt to rewrite uses after the chain). BTW, you need to take into account regmasks too.
		qcolombet (Not Done): Shouldn't we clear the `SpillChains` here for defs and not-preserved-by-regmask regs at this point?
		// Handle remaining chains.
		for (auto I : SpillChains)
		TryFoldSpillageCopies(I.first);

		for (MachineInstr *Copy : MaybeDeadCopies) {
		Register Src = Copy->getOperand(1).getReg();
		Register Def = Copy->getOperand(0).getReg();
		SmallVector<MachineInstr *> MaybeDeadDbgUsers(CopyDbgUsers[Copy].begin(),
		CopyDbgUsers[Copy].end());
		MRI->updateDbgUsersToReg(Src.asMCReg(), Def.asMCReg(), MaybeDeadDbgUsers);
		Copy->eraseFromParent();
		++NumDeletes;
		}

		MaybeDeadCopies.clear();
		CopyDbgUsers.clear();
		Tracker.clear();
		}
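The review's point about avoiding a double lookup (`count` followed by `operator[]`) can be illustrated with a small standalone sketch. It uses `std::unordered_map` in place of `DenseMap`, and `LeadRegMap` and `lookupLeadReg` are hypothetical names introduced only for this example; the `find`-then-compare pattern itself is the same for both containers.

```cpp
#include <unordered_map>

// Hypothetical stand-in for LeadRegs: register number -> lead register.
using LeadRegMap = std::unordered_map<unsigned, unsigned>;

// Returns a pointer to the lead register recorded for Reg, or nullptr if Reg
// does not belong to any tracked chain. Only one hash lookup is performed,
// instead of one in count() and another in operator[].
const unsigned *lookupLeadReg(const LeadRegMap &LeadRegs, unsigned Reg) {
  auto It = LeadRegs.find(Reg);
  if (It == LeadRegs.end())
    return nullptr;
  return &It->second;
}
```

A side benefit of `find` over `operator[]` is that a miss does not insert a default-constructed entry into the map.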

bool MachineCopyPropagation::runOnMachineFunction(MachineFunction &MF) {		bool MachineCopyPropagation::runOnMachineFunction(MachineFunction &MF) {
if (skipFunction(MF.getFunction()))		if (skipFunction(MF.getFunction()))
return false;		return false;

Changed = false;		Changed = false;

TRI = MF.getSubtarget().getRegisterInfo();		TRI = MF.getSubtarget().getRegisterInfo();
TII = MF.getSubtarget().getInstrInfo();		TII = MF.getSubtarget().getInstrInfo();
MRI = &MF.getRegInfo();		MRI = &MF.getRegInfo();

for (MachineBasicBlock &MBB : MF) {		for (MachineBasicBlock &MBB : MF) {
		EliminateSpillageCopies(MBB);
BackwardCopyPropagateBlock(MBB);		BackwardCopyPropagateBlock(MBB);
ForwardCopyPropagateBlock(MBB);		ForwardCopyPropagateBlock(MBB);
}		}

return Changed;		return Changed;
}		}

llvm/test/CodeGen/PowerPC/mcp-elim-eviction-chain.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -O3 -verify-machineinstrs -mtriple=powerpc64-unknown-unknown \
				# RUN: -simplify-mir -run-pass=machine-cp %s -o - | FileCheck %s

				---
				name: test0
				alignment: 4
				tracksRegLiveness: true
				body: |
				bb.0.entry:
				liveins: $x4, $x5, $x20, $x21, $x22
				; CHECK-LABEL: name: test0
				; CHECK: liveins: $x4, $x5, $x20, $x21, $x22
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: renamable $x23 = COPY $x4
				; CHECK-NEXT: renamable $x24 = COPY $x4
				; CHECK-NEXT: $x22 = COPY renamable $x20
				; CHECK-NEXT: renamable $x20 = ADD8 $x4, $x5
				; CHECK-NEXT: renamable $x4 = COPY renamable $x20
				; CHECK-NEXT: renamable $x20 = COPY $x22
				; CHECK-NEXT: renamable $x22 = COPY renamable $x23
				; CHECK-NEXT: renamable $x23 = COPY renamable $x24
				; CHECK-NEXT: $x3 = COPY $x4
				; CHECK-NEXT: BLR8 implicit $lr8, implicit undef $rm, implicit $x3, implicit $x20, implicit $x21, implicit $x22, implicit $x23
				renamable $x23 = COPY $x4
				renamable $x24 = COPY renamable $x23
				renamable $x23 = COPY renamable $x22
				renamable $x22 = COPY renamable $x21
				renamable $x21 = COPY renamable $x20
				renamable $x20 = ADD8 $x4, $x5
				renamable $x4 = COPY renamable $x20
				renamable $x20 = COPY renamable $x21
				renamable $x21 = COPY renamable $x22
				renamable $x22 = COPY renamable $x23
				renamable $x23 = COPY renamable $x24
				$x3 = COPY $x4
				BLR8 implicit $lr8, implicit undef $rm, implicit $x3, implicit $x20, implicit $x21, implicit $x22, implicit $x23

				...
				qcolombet (Done): Add a test with regmasks.

llvm/test/CodeGen/Thumb2/LowOverheadLoops/spillingmove.ll

	[60 unchanged lines hidden]
	; CHECK-NEXT: vpst			; CHECK-NEXT: vpst
	; CHECK-NEXT: vldrht.u16 q0, [r5]			; CHECK-NEXT: vldrht.u16 q0, [r5]
	; CHECK-NEXT: vshr.u16 q1, q0, #3			; CHECK-NEXT: vshr.u16 q1, q0, #3
	; CHECK-NEXT: vand q1, q1, q2			; CHECK-NEXT: vand q1, q1, q2
	; CHECK-NEXT: vmov q2, q4			; CHECK-NEXT: vmov q2, q4
	; CHECK-NEXT: vmla.u16 q2, q1, r2			; CHECK-NEXT: vmla.u16 q2, q1, r2
	; CHECK-NEXT: vshr.u16 q1, q2, #5			; CHECK-NEXT: vshr.u16 q1, q2, #5
	; CHECK-NEXT: vshl.i16 q2, q0, #3			; CHECK-NEXT: vshl.i16 q2, q0, #3
	; CHECK-NEXT: vand q3, q1, q5
	; CHECK-NEXT: vmov q1, q7
	; CHECK-NEXT: vand q2, q2, q6			; CHECK-NEXT: vand q2, q2, q6
	; CHECK-NEXT: vmov q7, q6			; CHECK-NEXT: vmov q6, q4
	; CHECK-NEXT: vmov q6, q5
	; CHECK-NEXT: vmov q5, q4
	; CHECK-NEXT: vldrw.u32 q4, [sp, #48] @ 16-byte Reload			; CHECK-NEXT: vldrw.u32 q4, [sp, #48] @ 16-byte Reload
				; CHECK-NEXT: vand q3, q1, q5
				; CHECK-NEXT: vmov q1, q7
	; CHECK-NEXT: vshr.u16 q0, q0, #9			; CHECK-NEXT: vshr.u16 q0, q0, #9
	; CHECK-NEXT: vmla.u16 q4, q2, r2			; CHECK-NEXT: vmla.u16 q4, q2, r2
				; CHECK-NEXT: vand q0, q0, q7
	; CHECK-NEXT: vshr.u16 q2, q4, #11			; CHECK-NEXT: vshr.u16 q2, q4, #11
	; CHECK-NEXT: vmov q4, q5			; CHECK-NEXT: vmov q4, q6
	; CHECK-NEXT: vmov q5, q6
	; CHECK-NEXT: vmov q6, q7
	; CHECK-NEXT: vmov q7, q1
	; CHECK-NEXT: vorr q1, q3, q2			; CHECK-NEXT: vorr q1, q3, q2
	; CHECK-NEXT: vldrw.u32 q2, [sp, #16] @ 16-byte Reload			; CHECK-NEXT: vldrw.u32 q2, [sp, #16] @ 16-byte Reload
	; CHECK-NEXT: vand q0, q0, q7			; CHECK-NEXT: vmov q6, q7
	; CHECK-NEXT: vmla.u16 q2, q0, r2			; CHECK-NEXT: vmla.u16 q2, q0, r2
	; CHECK-NEXT: vldrw.u32 q0, [sp] @ 16-byte Reload			; CHECK-NEXT: vldrw.u32 q0, [sp] @ 16-byte Reload
	; CHECK-NEXT: vand q0, q2, q0			; CHECK-NEXT: vand q0, q2, q0
	; CHECK-NEXT: vldrw.u32 q2, [sp, #32] @ 16-byte Reload			; CHECK-NEXT: vldrw.u32 q2, [sp, #32] @ 16-byte Reload
	; CHECK-NEXT: vorr q0, q1, q0			; CHECK-NEXT: vorr q0, q1, q0
	; CHECK-NEXT: vpst			; CHECK-NEXT: vpst
	; CHECK-NEXT: vstrht.16 q0, [r5], #16			; CHECK-NEXT: vstrht.16 q0, [r5], #16
	; CHECK-NEXT: le lr, .LBB0_3			; CHECK-NEXT: le lr, .LBB0_3
	[278 unchanged lines hidden]

llvm/test/CodeGen/Thumb2/mve-postinc-dct.ll

	[1,115 unchanged lines hidden]
	; CHECK-NEXT: sub.w r12, r12, #4			; CHECK-NEXT: sub.w r12, r12, #4
	; CHECK-NEXT: vpstt			; CHECK-NEXT: vpstt
	; CHECK-NEXT: vfmat.f32 q5, q0, q7			; CHECK-NEXT: vfmat.f32 q5, q0, q7
	; CHECK-NEXT: vldrwt.u32 q0, [r10]			; CHECK-NEXT: vldrwt.u32 q0, [r10]
	; CHECK-NEXT: add.w r6, r11, r5			; CHECK-NEXT: add.w r6, r11, r5
	; CHECK-NEXT: vpstt			; CHECK-NEXT: vpstt
	; CHECK-NEXT: vfmat.f32 q6, q0, q7			; CHECK-NEXT: vfmat.f32 q6, q0, q7
	; CHECK-NEXT: vldrwt.u32 q0, [r11]			; CHECK-NEXT: vldrwt.u32 q0, [r11]
				; CHECK-NEXT: adds r7, r6, r5
	; CHECK-NEXT: vstrw.32 q6, [sp, #40] @ 16-byte Spill			; CHECK-NEXT: vstrw.32 q6, [sp, #40] @ 16-byte Spill
	; CHECK-NEXT: vmov q6, q5			; CHECK-NEXT: vpstt
	; CHECK-NEXT: vpst
	; CHECK-NEXT: vfmat.f32 q1, q0, q7			; CHECK-NEXT: vfmat.f32 q1, q0, q7
	; CHECK-NEXT: vmov q5, q4
	; CHECK-NEXT: vmov q4, q3
	; CHECK-NEXT: vmov q3, q1
	; CHECK-NEXT: vpst
	; CHECK-NEXT: vldrwt.u32 q0, [r6]			; CHECK-NEXT: vldrwt.u32 q0, [r6]
				; CHECK-NEXT: vmov q4, q1
	; CHECK-NEXT: vldrw.u32 q1, [sp, #56] @ 16-byte Reload			; CHECK-NEXT: vldrw.u32 q1, [sp, #56] @ 16-byte Reload
	; CHECK-NEXT: adds r7, r6, r5			; CHECK-NEXT: adds r6, r7, r5
				; CHECK-NEXT: vmov q6, q5
	; CHECK-NEXT: vpstt			; CHECK-NEXT: vpstt
	; CHECK-NEXT: vfmat.f32 q1, q0, q7			; CHECK-NEXT: vfmat.f32 q1, q0, q7
	; CHECK-NEXT: vldrwt.u32 q0, [r7]			; CHECK-NEXT: vldrwt.u32 q0, [r7]
	; CHECK-NEXT: adds r6, r7, r5
	; CHECK-NEXT: vstrw.32 q1, [sp, #56] @ 16-byte Spill			; CHECK-NEXT: vstrw.32 q1, [sp, #56] @ 16-byte Spill
	; CHECK-NEXT: vmov q1, q3			; CHECK-NEXT: vmov q1, q4
	; CHECK-NEXT: vmov q3, q4
	; CHECK-NEXT: vpstt			; CHECK-NEXT: vpstt
	; CHECK-NEXT: vfmat.f32 q3, q0, q7			; CHECK-NEXT: vfmat.f32 q3, q0, q7
	; CHECK-NEXT: vldrwt.u32 q0, [r6]			; CHECK-NEXT: vldrwt.u32 q0, [r6]
	; CHECK-NEXT: vmov q4, q5			; CHECK-NEXT: vmov q4, q5
	; CHECK-NEXT: adds r7, r6, r5			; CHECK-NEXT: adds r7, r6, r5
	; CHECK-NEXT: vpstt			; CHECK-NEXT: vpstt
	; CHECK-NEXT: vfmat.f32 q4, q0, q7			; CHECK-NEXT: vfmat.f32 q4, q0, q7
	; CHECK-NEXT: vldrwt.u32 q0, [r7]			; CHECK-NEXT: vldrwt.u32 q0, [r7]
	; CHECK-NEXT: vmov q5, q6
	; CHECK-NEXT: vldrw.u32 q6, [sp, #40] @ 16-byte Reload			; CHECK-NEXT: vldrw.u32 q6, [sp, #40] @ 16-byte Reload
	; CHECK-NEXT: vpst			; CHECK-NEXT: vpst
	; CHECK-NEXT: vfmat.f32 q2, q0, q7			; CHECK-NEXT: vfmat.f32 q2, q0, q7
	; CHECK-NEXT: le lr, .LBB6_3			; CHECK-NEXT: le lr, .LBB6_3
	; CHECK-NEXT: @ %bb.4: @ %middle.block			; CHECK-NEXT: @ %bb.4: @ %middle.block
	; CHECK-NEXT: @ in Loop: Header=BB6_2 Depth=1			; CHECK-NEXT: @ in Loop: Header=BB6_2 Depth=1
	; CHECK-NEXT: vadd.f32 s0, s26, s27			; CHECK-NEXT: vadd.f32 s0, s26, s27
	; CHECK-NEXT: add.w r1, r2, r8, lsl #2			; CHECK-NEXT: add.w r1, r2, r8, lsl #2
	[240 unchanged lines hidden]
	; CHECK-NEXT: @ Parent Loop BB7_2 Depth=1			; CHECK-NEXT: @ Parent Loop BB7_2 Depth=1
	; CHECK-NEXT: @ => This Inner Loop Header: Depth=2			; CHECK-NEXT: @ => This Inner Loop Header: Depth=2
	; CHECK-NEXT: vctp.32 r10			; CHECK-NEXT: vctp.32 r10
	; CHECK-NEXT: add.w r11, r3, r5			; CHECK-NEXT: add.w r11, r3, r5
	; CHECK-NEXT: vpstt			; CHECK-NEXT: vpstt
	; CHECK-NEXT: vldrwt.u32 q0, [r9], #16			; CHECK-NEXT: vldrwt.u32 q0, [r9], #16
	; CHECK-NEXT: vldrwt.u32 q1, [r3], #16			; CHECK-NEXT: vldrwt.u32 q1, [r3], #16
	; CHECK-NEXT: add.w r6, r11, r5			; CHECK-NEXT: add.w r6, r11, r5
	; CHECK-NEXT: sub.w r10, r10, #4			; CHECK-NEXT: vmov q3, q2
	; CHECK-NEXT: vpstt			; CHECK-NEXT: vpstt
	; CHECK-NEXT: vfmat.f32 q6, q1, q0			; CHECK-NEXT: vfmat.f32 q6, q1, q0
	; CHECK-NEXT: vldrwt.u32 q1, [r11]			; CHECK-NEXT: vldrwt.u32 q1, [r11]
	; CHECK-NEXT: vstrw.32 q6, [sp, #40] @ 16-byte Spill
	; CHECK-NEXT: vmov q6, q5
	; CHECK-NEXT: vpst
	; CHECK-NEXT: vfmat.f32 q7, q1, q0
	; CHECK-NEXT: vmov q5, q3
	; CHECK-NEXT: vmov q3, q4
	; CHECK-NEXT: vmov q4, q2
	; CHECK-NEXT: vpst
	; CHECK-NEXT: vldrwt.u32 q1, [r6]
	; CHECK-NEXT: vldrw.u32 q2, [sp, #56] @ 16-byte Reload			; CHECK-NEXT: vldrw.u32 q2, [sp, #56] @ 16-byte Reload
	; CHECK-NEXT: adds r7, r6, r5			; CHECK-NEXT: adds r7, r6, r5
	; CHECK-NEXT: vpstt			; CHECK-NEXT: vpstt
				; CHECK-NEXT: vfmat.f32 q7, q1, q0
				; CHECK-NEXT: vldrwt.u32 q1, [r6]
				; CHECK-NEXT: adds r6, r7, r5
				; CHECK-NEXT: vstrw.32 q6, [sp, #40] @ 16-byte Spill
				; CHECK-NEXT: vpstt
	; CHECK-NEXT: vfmat.f32 q2, q1, q0			; CHECK-NEXT: vfmat.f32 q2, q1, q0
	; CHECK-NEXT: vldrwt.u32 q1, [r7]			; CHECK-NEXT: vldrwt.u32 q1, [r7]
	; CHECK-NEXT: vstrw.32 q2, [sp, #56] @ 16-byte Spill			; CHECK-NEXT: vstrw.32 q2, [sp, #56] @ 16-byte Spill
	; CHECK-NEXT: vldrw.u32 q2, [sp, #72] @ 16-byte Reload			; CHECK-NEXT: vldrw.u32 q2, [sp, #72] @ 16-byte Reload
	; CHECK-NEXT: adds r6, r7, r5			; CHECK-NEXT: adds r7, r6, r5
				; CHECK-NEXT: vmov q6, q5
	; CHECK-NEXT: vpstt			; CHECK-NEXT: vpstt
	; CHECK-NEXT: vfmat.f32 q2, q1, q0			; CHECK-NEXT: vfmat.f32 q2, q1, q0
	; CHECK-NEXT: vldrwt.u32 q1, [r6]			; CHECK-NEXT: vldrwt.u32 q1, [r6]
	; CHECK-NEXT: adds r7, r6, r5
	; CHECK-NEXT: vstrw.32 q2, [sp, #72] @ 16-byte Spill			; CHECK-NEXT: vstrw.32 q2, [sp, #72] @ 16-byte Spill
	; CHECK-NEXT: vmov q2, q4			; CHECK-NEXT: vmov q2, q3
	; CHECK-NEXT: vmov q4, q3
	; CHECK-NEXT: vpstt			; CHECK-NEXT: vpstt
	; CHECK-NEXT: vfmat.f32 q2, q1, q0			; CHECK-NEXT: vfmat.f32 q2, q1, q0
	; CHECK-NEXT: vldrwt.u32 q1, [r7]			; CHECK-NEXT: vldrwt.u32 q1, [r7]
	; CHECK-NEXT: adds r6, r7, r5			; CHECK-NEXT: adds r6, r7, r5
	; CHECK-NEXT: vmov q3, q5			; CHECK-NEXT: vmov q3, q5
	; CHECK-NEXT: vpstt			; CHECK-NEXT: vpstt
	; CHECK-NEXT: vfmat.f32 q4, q1, q0			; CHECK-NEXT: vfmat.f32 q4, q1, q0
	; CHECK-NEXT: vldrwt.u32 q1, [r6]			; CHECK-NEXT: vldrwt.u32 q1, [r6]
	; CHECK-NEXT: vmov q5, q6
	; CHECK-NEXT: add r6, r5			; CHECK-NEXT: add r6, r5
				; CHECK-NEXT: vldrw.u32 q6, [sp, #40] @ 16-byte Reload
	; CHECK-NEXT: vpstt			; CHECK-NEXT: vpstt
	; CHECK-NEXT: vfmat.f32 q5, q1, q0			; CHECK-NEXT: vfmat.f32 q5, q1, q0
	; CHECK-NEXT: vldrwt.u32 q1, [r6]			; CHECK-NEXT: vldrwt.u32 q1, [r6]
	; CHECK-NEXT: vldrw.u32 q6, [sp, #40] @ 16-byte Reload			; CHECK-NEXT: sub.w r10, r10, #4
	; CHECK-NEXT: vpst			; CHECK-NEXT: vpst
	; CHECK-NEXT: vfmat.f32 q3, q1, q0			; CHECK-NEXT: vfmat.f32 q3, q1, q0
	; CHECK-NEXT: le lr, .LBB7_3			; CHECK-NEXT: le lr, .LBB7_3
	; CHECK-NEXT: @ %bb.4: @ %middle.block			; CHECK-NEXT: @ %bb.4: @ %middle.block
	; CHECK-NEXT: @ in Loop: Header=BB7_2 Depth=1			; CHECK-NEXT: @ in Loop: Header=BB7_2 Depth=1
	; CHECK-NEXT: vadd.f32 s0, s30, s31			; CHECK-NEXT: vadd.f32 s0, s30, s31
	; CHECK-NEXT: add.w r1, r2, r1, lsl #2			; CHECK-NEXT: add.w r1, r2, r1, lsl #2
	; CHECK-NEXT: vadd.f32 s2, s28, s29			; CHECK-NEXT: vadd.f32 s2, s28, s29
	[202 unchanged lines hidden]


Revision Contents

Diff 416864

llvm/lib/CodeGen/MachineCopyPropagation.cpp

llvm/test/CodeGen/PowerPC/mcp-elim-eviction-chain.mir

llvm/test/CodeGen/Thumb2/LowOverheadLoops/spillingmove.ll

llvm/test/CodeGen/Thumb2/mve-postinc-dct.ll
