This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Add a new pass for PostRA sink
ClosedPublic

Authored by junbuml on Dec 20 2017, 1:57 PM.

Details

Summary

This pass sinks COPY instructions into a successor block if the COPY is not
used in the current block and the COPY is live-in to a single successor
(i.e., doesn't require the COPY to be duplicated). This avoids executing the
copy on paths where its result isn't needed. This also exposes
additional opportunities for dead copy elimination and shrink wrapping.

These copies either were not handled by the MachineSink pass or are inserted
after it runs. As an example of the former case, the MachineSink pass cannot sink
COPY instructions with allocatable source registers; for AArch64, these types
of copy instructions are frequently used to move function parameters (PhyReg)
into virtual registers in the entry block.

For the machine IR below, this pass will sink the COPY defining %w19 in the
entry block into its successor (%bb.1), because %w19 is live-in only in %bb.1.

%bb.0:
   %wzr = SUBSWri %w1, 1
   %w19 = COPY %w0
   Bcc 11, %bb.2
 %bb.1:
   Live Ins: %w19
   BL @fun
   %w0 = ADDWrr %w0, %w19
   RET %w0
 %bb.2:
   %w0 = COPY %wzr
   RET %w0

As we sink %w19 (CSR in AArch64) into %bb.1, the shrink-wrapping pass will be
able to see %bb.0 as a candidate.

With this change, I observed 12% more shrink-wrapping candidates and 13% more dead copies deleted in spec2000/2006/2017 on AArch64.

Diff Detail

Event Timeline

junbuml created this revision.Dec 20 2017, 1:57 PM

I'd like to suggest that this implementation be included in MachineSink.cpp, but continue to live as a separate pass. The two passes have the same intent, but must be independent passes due to the previously mentioned constraints; I'd almost want to refer to this as the PostRAMachineSink pass... The point being, I'd like to have one place where I can see what's sunk pre-RA and what's sunk post-RA.

lib/CodeGen/MachineCopySink.cpp
10 ↗(On Diff #127782)

Perhaps something like:

// This pass sinks COPY instructions into a successor block, if the COPY is not
// used in the current block and the COPY is live-in to a single successor
// (i.e., doesn't require the COPY to be duplicated).  This avoids executing the
// copy on paths where its result isn't needed.  This also exposes
// additional opportunities for dead copy elimination and shrink wrapping.
//
// These copies either were not handled by the MachineSink pass or are inserted
// after it runs.  As an example of the former case, the MachineSink pass cannot
// sink COPY instructions with allocatable source registers; for AArch64, these
// types of copy instructions are frequently used to move function parameters
// (PhyReg) into virtual registers in the entry block.
//
// For the machine IR below, this pass will sink the COPY defining %w19 in the
// entry block into its successor (%bb.1), because %w19 is live-in only in %bb.1.
// %bb.0:
//   %wzr = SUBSWri %w1, 1
//   %w19 = COPY %w0
//   Bcc 11, %bb.2
// %bb.1:
//   Live Ins: %w19
//   BL @fun
//   %w0 = ADDWrr %w0, %w19
//   RET %w0
// %bb.2:
//   %w0 = COPY %wzr
//   RET %w0
// As we sink %w19 (CSR in AArch64) into %bb.1, the shrink-wrapping pass will be
// able to see %bb.0 as a candidate.
85 ↗(On Diff #127782)

AFAICT, this is a straight copy from the AArch64LoadStoreOptimizer.cpp pass. Can we refactor this into AArch64InstrInfo, so we don't duplicate code?

120 ↗(On Diff #127782)
// If BB is set here, Reg is live-in to at least two sinkable successors, so quit.
126 ↗(On Diff #127782)
// Reg is not live-in to any sinkable successors.
130 ↗(On Diff #127782)

(A suggestion that can be ignored if you disagree, but) I think this might be easier to understand if this loop and the loop on line 118 were combined.

Something like:

MachineBasicBlock *BB = nullptr;
for (auto *SI : CurBB.successors()) {
  // Try to find a single sinkable successor in which Reg is live-in.
  if (SinkableBBs.count(SI) && SI->isLiveIn(Reg)) {
    if (BB)
      return nullptr;
    BB = SI;
    continue;
  }
  // Check if Reg or any register aliased with Reg is live-in in other successors.
  if (SI->isLiveIn(Reg))
    return nullptr;

  for (const auto LI : SI->liveins())
    if (AliasedRegs.count(LI.PhysReg))
      return nullptr;
}

Of course, this means that you'll have to change SinkableBBs to a Set/SmallSet, which I believe you can iterate over.

175 ↗(On Diff #127782)

Perhaps add a comment.

// Don't sink the COPY if it would violate a register dependency.
junbuml updated this revision to Diff 128532.Jan 3 2018, 9:38 AM
junbuml marked 5 inline comments as done.Jan 3 2018, 9:38 AM
junbuml edited the summary of this revision. (Show Details)

Moved this pass into MachineSink.cpp and named it PostRAMachineSink pass as Chad commented.

junbuml added inline comments.Jan 3 2018, 9:39 AM
lib/CodeGen/MachineCopySink.cpp
130 ↗(On Diff #127782)

I agree that merging these two loops is easier to read, but I intentionally kept these two loops separate because I wanted to take the second loop only when we found a single sinkable successor.

junbuml updated this revision to Diff 128546.Jan 3 2018, 12:09 PM

Fixed failures in :
test/DebugInfo/X86/dbg-value-transfer-order.ll
test/CodeGen/Hexagon/vect/vect-v4i16.ll

junbuml updated this revision to Diff 128624.Jan 4 2018, 9:38 AM

Fixed typo.

I've tested this out on Power with spec2006 and several open-source applications. With spec I saw a pretty similar increase in shrink wrapping opportunities (~11% with -O3 pgo+thinlto, ~7% with just -O3). I've noticed in some instances we do a lot of sinking without enabling new shrink-wrap opportunities though. For example with xalan stats showed we sunk about 4000 copies with this pass, but only enabled 5 new shrink-wrap opportunities and I see a consistent ~2% degradation with ref data. My understanding was enabling more shrink-wrap candidates was the original motivation. Have you considered breaking this up into an analysis that collects what copies are sinkable, and then only sink if doing so is likely to make the block viable for shrink-wrapping?

I've tested this out on Power with spec2006 and several open-source applications. With spec I saw a pretty similar increase in shrink wrapping opportunities (~11% with -O3 pgo+thinlto, ~7% with just -O3). I've noticed in some instances we do a lot of sinking without enabling new shrink-wrap opportunities though. For example with xalan stats showed we sunk about 4000 copies with this pass, but only enabled 5 new shrink-wrap opportunities and I see a consistent ~2% degradation with ref data. My understanding was enabling more shrink-wrap candidates was the original motivation. Have you considered breaking this up into an analysis that collects what copies are sinkable, and then only sink if doing so is likely to make the block viable for shrink-wrapping?

Very initially, this was motivated by enabling more shrink wrapping, but we don't have to limit it to just that, because it can reduce the number of dynamic instructions by sinking copies into the paths where their results are really needed. It also helps enable more dead copy eliminations. Can you please share a little more detail about Xalan? AFAIK, the Xalan score often varies across top of trunk in the range of 3~4%.

Sure, I used -O3 -fexperimental-new-pass-manager as the compiler options, the only difference between the compilers was this patch and I ran the binaries 7 times. With each executable run-to-run variation was small.

In my environment, Xalan is sensitive to code alignment. I often see that some small change, not even applied in the hot functions, can impact the score in the range of 3~4%. I'm not sure if this is also your case. Can you please share how this change impacts the hot functions (AFAIK, Xalan has two hot functions)? It would be great if you could share any suspicious sinking, or transformations caused by sinking copies.

In my environment, Xalan is sensitive to code alignment. I often see that some small change, not even applied in the hot functions, can impact the score in the range of 3~4%. I'm not sure if this is also your case. Can you please share how this change impacts the hot functions (AFAIK, Xalan has two hot functions)? It would be great if you could share any suspicious sinking, or transformations caused by sinking copies.

I spent some more time profiling this. I think you're right about the code alignment causing the degradation. I've seen this before with Xalan, but never with this big of a difference. I can trace the slowdown to 2 functions that didn't change with this patch.

Thank you very much for confirming this. Kindly ping.

Please let me know if you have any other comments about this patch.

Thanks for working on this! I've seen a few 1-2% improvements on arm64 with this enabled, and I think it solves one of the issues that have been frequently raised with shrink-wrapping.

Few questions / concerns:

  • Is there anything preventing us from sinking this even deeper than "one of the successors"? I think we should go further with this instead of special-casing this particular issue.
  • If we end up doing that, I think this pass should sink more than just COPYs. Is going further with this and having a generic Post-RA Sink pass what you're planning to do?
  • I wonder if we could improve MachineSink to be scheduled both pre and post RA.
  • If that's not suitable, should we build some kind of infrastructure where we can merge both pre and post RA Sink passes and re-use the algorithms while only changing the constraints?

I'm just throwing ideas out here, since this feels a little bit special cased to the shrink-wrapping issue, while it could (and from what I understand, it already does) catch some even more interesting opportunities.

lib/CodeGen/MachineSink.cpp
970

You can use an ArrayRef<MachineBasicBlock *> here.

1002

You can use a reference for things that are never null.

test/DebugInfo/X86/dbg-value-transfer-order.ll
1

Just curious: what happened here? Is some debug info missing after this pass?

Is there anything preventing us from sinking this even deeper than "one of the successors"? I think we should go further with this instead of special-casing this particular issue.

As long as there is no register dependency, we can continue sinking a COPY deeper. I believe sinkcopy5() in post-ra-machine-sink.mir shows this case.

If we end up doing that, I think this pass should sink more than just COPYs. Is going further with this and having a generic Post-RA Sink pass what you're planning to do?
I wonder if we could improve MachineSink to be scheduled both pre and post RA.
If that's not suitable, should we build some kind of infrastructure where we can merge both pre and post RA Sink passes and re-use the algorithms while only changing the constraints?
I'm just throwing ideas out here, since this feels a little bit special cased to the shrink-wrapping issue, while it could (and from what I understand, it already does) catch some even more interesting opportunities.

I believe we can make this a generic Post-RA Sink pass, but I didn't see any motivating cases other than sinking COPYs for now. Considering the current scope of this post-RA sink, I thought separating the pre/post-RA passes makes the code much simpler. If there are good enough motivating cases that require the post-RA sink pass to do pretty much the same job as the pre-RA one, I will be happy to extend it in the way you mention here.

Thanks Francis for reviewing and testing this. I will fix your other comments soon.

Is there anything preventing us from sinking this even deeper than "one of the successors"? I think we should go further with this instead of special-casing this particular issue.

As long as there is no register dependency, we can continue sinking a COPY deeper. I believe sinkcopy5() in post-ra-machine-sink.mir shows this case.

Great! Sorry, I didn't see that all the potential successors were added to the list.

junbuml updated this revision to Diff 131640.Jan 26 2018, 12:35 PM
junbuml marked 2 inline comments as done.

Addressed Francis' comments.

On top of the COPY source forwarding (D41835), this will enable even more shrink-wrapping. With D41835, I observed about 60% more shrink-wrapping in the spec2000/2006/2017 benchmarks.

test/DebugInfo/X86/dbg-value-transfer-order.ll
1

There was a minor block-structure change caused by sinking two COPYs into an empty block. Instead of disabling the pass, I changed the block names accordingly.

Kindly ping ?

Kindly ping ?

For me this is generally ok, but I would wait for the other reviewers to see if there are any concerns.

LGTM, but formal approval should probably come from someone outside of our group (ping! :).

lib/CodeGen/MachineSink.cpp
914

the the -> the

junbuml updated this revision to Diff 134631.Feb 16 2018, 8:15 AM

Updated comments and some test cases.

junbuml added a comment.EditedFeb 16 2018, 8:20 AM

Added reviewers (Taewook Oh and John Brawn) due to the changes in :
. test/CodeGen/X86/branchfolding-debugloc.ll
. test/CodeGen/Thumb2/ifcvt-no-branch-predictor.ll

Kindly ping.

The change to ifcvt-no-branch-predictor.ll looks OK to me, though this patch doesn't apply cleanly to latest trunk (and also has test failures, including in ifcvt-no-branch-predictor.ll). It looks like the cause of that is that D41835 has been reverted in r325421.

The change to ifcvt-no-branch-predictor.ll looks OK to me, though this patch doesn't apply cleanly to latest trunk (and also has test failures, including in ifcvt-no-branch-predictor.ll). It looks like the cause of that is that D41835 has been reverted in r325421.

Thanks John for the review. As I expect D41835 to be recommitted soon, I will defer rebasing for now.

junbuml updated this revision to Diff 136152.Feb 27 2018, 2:10 PM
junbuml added a reviewer: RKSimon.

Rebased.
Added Simon Pilgrim as reviewer due to the change in test/CodeGen/X86/i128-mul.ll

Rebased.
Added Simon Pilgrim as reviewer due to the change in test/CodeGen/X86/i128-mul.ll

That change is fine, thanks.

Thanks Simon ! I will be happy to hear any comment, suggestion, or objection.

junbuml retitled this revision from [CodeGen] Add a new pass to sink Copy instructions after RA to [CodeGen] Add a new pass for PostRA sink.Mar 7 2018, 1:52 PM

Can anybody take a look at this? I will be happy to get any feedback.

sebpop added a subscriber: sebpop.Mar 12 2018, 2:40 PM
sebpop added inline comments.
lib/Target/AArch64/AArch64InstrInfo.cpp
4620 ↗(On Diff #136152)

This check for the zero registers seems to be the only difference from the generic TII implementation.
I am thinking that this may be the case for targets other than AArch64.
You could avoid duplicating all the code in this function by checking the result of a function like
TII->canModifyRegister(Reg) or
TII->isReadOnly(Reg)

test/CodeGen/Thumb2/ifcvt-no-branch-predictor.ll
121

Why do you need to change the test here?

sebpop added a subscriber: evandro.Mar 13 2018, 8:55 AM
sebpop added inline comments.
lib/CodeGen/MachineSink.cpp
995

So at this point we are at a loop depth 4:

for (BB : MF)
  for (MI: BB)
    for (SI : BB.successors)
      for (LI : SI->liveins)

I think you could do the last two loops above the loop (MI:BB) by asking first whether there is an opportunity to perform the sinking from BB into one of the BB.successors. The liveins are stored as a bitmap, and you could efficiently compute the difference of liveins in the successors. The registers in the diff are candidates for sinking.

Also please post compile time numbers with this change.

junbuml updated this revision to Diff 138223.Mar 13 2018, 10:25 AM

Thanks Sebastian for the review.
I added inlined reply.

lib/CodeGen/MachineSink.cpp
995

I think you could do the last two loops above the loop (MI:BB) by asking first whether there is an opportunity to perform the sinking from BB into one of the BB.successors.

We perform this loop only when we have found a single successor to sink into, in the loop right above at line 978. Are you asking to move the check done in this loop outside of getSingleLiveInSuccBB(), before finding the single sinkable successor?

The liveins are stored as a bitmap, and you could efficiently compute the difference of liveins in the successors. The registers in the diff are candidates for sinking.

I'm not clear about this. As far as I checked, LiveIns in a MachineBasicBlock is a vector. I use a bitmap, but that is to track def/use of registers.

Also please post compile time numbers with this change.

I will share compile time info soon.

lib/Target/AArch64/AArch64InstrInfo.cpp
4620 ↗(On Diff #136152)

I cannot find either TII->canModifyRegister(Reg) or TII->isReadOnly(Reg), so instead I used TRI->isConstantPhysReg(Reg) in the generic TII.

test/CodeGen/Thumb2/ifcvt-no-branch-predictor.ll
122

I simply wanted to make r0 (%n) used in both successors so that we can keep the MOV in the entry block. I believe this is the easiest/right way to keep the original intention of this test with this new pass.

sebpop added inline comments.Mar 13 2018, 11:37 AM
lib/CodeGen/MachineSink.cpp
995

My reasoning is that we could compute the same information independently of an MI.
By only looking at a BB and the liveins in its successors, we can compute whether there is an opportunity to sink.

lib/CodeGen/TargetInstrInfo.cpp
901

Looks good, thanks!

junbuml added inline comments.Mar 13 2018, 4:25 PM
lib/CodeGen/MachineSink.cpp
995

I think you meant that we should first try to find sinkable Regs in advance, independently of any MI, and iterate over MIs only when there are candidates, right?

Assuming that this is what you meant, I doubt that doing so in advance is beneficial for compile time. To be independent of any MI, we would need to do this aliased-register check for all live-ins in the successors of every BB.

In the current implementation, we do this check only after finishing the register-dependency check of the MI (line 1040), and only when we know that the DestReg of the copy is potentially sinkable to a single successor (line 978). I think doing this check only when we need to is, in practice, cheaper than doing it in advance for all live-ins of the successors, independently of any MI.

The worst case I can think of is when a BB contains many redundant sinkable copies writing to the same DestReg. For that, we might be able to cache the result for already-visited Regs in a BB. Then, in the worst case, we would perform this loop at most once per target register for a BB. Please let me know your thoughts.

sebpop accepted this revision.Mar 13 2018, 4:35 PM

I think the current implementation is good: please commit.
Thanks for the explanations.

This revision is now accepted and ready to land.Mar 13 2018, 4:35 PM

Thanks Sebastian for the review. I will commit it tomorrow.

junbuml updated this revision to Diff 138402.Mar 14 2018, 10:35 AM
junbuml added a reviewer: kparzysz.

Added Krzysztof as reviewer due to the change in hexagon test (test/CodeGen/Hexagon/noreturn-noepilog.ll).

I tried this patch on aarch64 A72 firefly linux on a set of benchmarks.
Overall the performance improved by 11% cumulatively (sum of all speedups and slowdowns.)
7 benchmarks improved by more than 1% and 4 degraded by >1%, one degraded by 4% and another by 3%.
I will investigate the 4% and 3% regressions.

Thanks for testing this. Please share details if any regression turns out to be real. I will also rerun the performance tests on my side.

There was no obvious speedup or slowdown in my performance tests on spec2000/2006/2017 on AArch64 Falkor.

Sebastian,
Please let me know if you found something on your regressions.

Krzysztof,
Can you please confirm if the change in test/CodeGen/Hexagon/noreturn-noepilog.ll is okay?

Thanks,
Jun

The Hexagon change is fine, although a bit surprising. The two instructions are the same, but the :raw form is generally emitted for newer CPUs. I'm not sure what changed to get the older form printed instead.

I got more data for the benchmarks that slowed down, and I see that the variation is within the noise level.
Thanks for checking the performance on your side.

The Hexagon change is fine, although a bit surprising. The two instructions are the same, but the :raw form is generally emitted for newer CPUs. I'm not sure what changed to get the older form printed instead.

Thanks for reviewing this. The only change from this patch is that a MOV in the entry block is sunk into %b1. Honestly, I don't have much idea about the :raw form or how the MOV impacts allocframe on Hexagon. Do you think this exposes some unexpected behavior on Hexagon?

Krzysztof, here is the assembly before this patch:

f1:                                     // @f1
// %bb.0:                               // %b0
        {
                        p0 = cmp.gtu(r0,#3); if (p0.new) jump:nt .LBB0_2
                        r2 = r0
        }
// %bb.1:                               // %b2
        {
                        r0 = +mpyi(r1,#7)
                        r1 = #0
                        jumpr r31
        }
.LBB0_2:                                // %b1
        {
                        call f0
                        r1:0 = combine(r2,##g0)
                        allocframe(r29,#0):raw
        }

after the patch:

f1:                                     // @f1
// %bb.0:                               // %b0
        {
                        p0 = cmp.gtu(r0,#3); if (p0.new) jump:nt .LBB0_2
        }
// %bb.1:                               // %b2
        {
                        r0 = +mpyi(r1,#7)
                        r1 = #0
                        jumpr r31
        }
.LBB0_2:                                // %b1
        {
                        r2 = r0
                        allocframe(#0)
        }
        {
                        call f0
                        r1:0 = combine(r2,##g0)
        }

The result takes one extra packet, which is a perf regression on Hexagon.
I think this is because the sinking of copies happens post-RA, and
there doesn't seem to be a propagation pass to remove the extra transfer r2 = r0.

The result takes one extra packet, which is a perf regression on Hexagon.
I think this is because the sinking of copies happens post-RA, and
there doesn't seem to be a propagation pass to remove the extra transfer r2 = r0.

Due to my lack of knowledge of Hexagon, I don't quite understand why this is a regression. If this is a Hexagon-specific issue, I can turn the pass off for Hexagon.

Each "{ instructions }" represents a packet of instructions.
Each packet executes in 1 cycle.
Overall, before the patch there were 3 packets, after the patch 4 packets.
On one of the paths we go from 2 cycles to 3 cycles.

Can this be disabled via TargetPassConfig::disablePass? If so, please XFAIL the Hexagon test with a comment stating that it started failing after post-RA machine sinking.

Can this be disabled via TargetPassConfig::disablePass? If so, please XFAIL the Hexagon test with a comment stating that it started failing after post-RA machine sinking.

Yes, it can be disabled via TargetPassConfig::disablePass. I will go ahead and XFAIL the hexagon test with a comment.
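For reference, a target opting out of the pass would be a small change in its pass config, along these lines. This fragment is illustrative and not taken from the patch; it assumes the pass ID that MachineSink.cpp exports (PostRAMachineSinkingID) and abbreviates the constructor:

```
// In the target's TargetPassConfig subclass, e.g. HexagonPassConfig:
HexagonPassConfig(...) : TargetPassConfig(...) {
  ...
  // Opt this target out of post-RA copy sinking.
  disablePass(&PostRAMachineSinkingID);
}
```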

junbuml updated this revision to Diff 138774.Mar 16 2018, 2:44 PM

XFAILed test/CodeGen/Hexagon/noreturn-noepilog.ll

Thanks for this @junbuml! I did some runs on arm64, and I see a regression on 176.gcc averaging 1.5%.

I'm comparing

-Os -mllvm -disable-postra-machine-sink
vs
-Os

I did 6 runs for both configs and compared all of them:

176.gcc   1.4%
176.gcc   1.3%
176.gcc   1.3%
176.gcc   0.9%
176.gcc   0.5%
176.gcc   0.6%
176.gcc   1.2%
176.gcc   1.1%
176.gcc   1.1%
176.gcc   0.7%
176.gcc   0.3%
176.gcc   0.4%
176.gcc   1.4%
176.gcc   1.2%
176.gcc   1.2%
176.gcc   0.8%
176.gcc   0.4%
176.gcc   0.6%
176.gcc   2.4%
176.gcc   2.3%
176.gcc   2.3%
176.gcc   1.9%
176.gcc   1.5%
176.gcc   1.6%
176.gcc   2.5%
176.gcc   2.4%
176.gcc   2.4%
176.gcc   1.9%
176.gcc   1.6%
176.gcc   1.7%
176.gcc   2.6%
176.gcc   2.5%
176.gcc   2.5%
176.gcc   2.1%
176.gcc   1.7%
176.gcc   1.8%

I will investigate to see if I can find anything, and whether I'm the only one seeing this.

lib/CodeGen/MachineSink.cpp
970

Should be ArrayRef<MachineBasicBlock *> SinkableBBs

Thanks Francis for your test. In my previous test (O3) on aarch64, I didn't observe any noticeable change in spec 2000/2006/2017 score. I just kicked off perf tests for Os, and I will share the results.

In my test with Os, I can see -0.6% on spec2000/gcc, but I couldn't see any change in the hot function (propagate_block), which takes just 11% of the profile. As far as I can check in my environment, 0.6% seems like a reasonable performance swing. I'm not sure if your regression on 176.gcc can be explained as swing or not. Please let me know if you see any negative impact from this.

junbuml updated this revision to Diff 139123.Mar 20 2018, 7:55 AM
junbuml marked an inline comment as done.

minor formatting change.

Francis,
Just curious if your regression on 176.gcc is really caused by this change, or if it was possibly performance swing. Did you by chance see any improvement in your tests?

In my test with Os, I can see -0.6% on spec2000/gcc, but I couldn't see any change in the hot function (propagate_block), which takes just 11% of the profile. As far as I can check in my environment, 0.6% seems like a reasonable performance swing. I'm not sure if your regression on 176.gcc can be explained as swing or not. Please let me know if you see any negative impact from this.

I took a look at the diffs. Other than very few block placement diffs and some loads using different base registers, everything looks great, so I think this should be fine (I hope I'm not missing anything obvious here). Thank you for the work!

Thanks Francis for looking at this.

junbuml updated this revision to Diff 139327.Mar 21 2018, 10:37 AM

Found a new failure in a recently added hexagon test.

Krzysztof, can you please confirm if my change in your test (test/CodeGen/Hexagon/swp-phi-ref.ll) is okay?

junbuml updated this revision to Diff 139443.Mar 22 2018, 7:54 AM

Assuming this pass will be disabled on Hexagon, XFAILed swp-phi-ref.ll just like noreturn-noepilog.ll.

This revision was automatically updated to reflect the committed changes.

This pass destroys DominatorInfo and we have to recompute it right after the pass from scratch. Is it possible to preserve it? Also, have you measured compile time impact of the patch?

Thanks,
Michael

This pass destroys DominatorInfo and we have to recompute it right after the pass from scratch. Is it possible to preserve it? Also, have you measured compile time impact of the patch?

I measured the compile-time impact of this patch on spec2000/2006/2017. Overall, I wasn't able to see any reproducible regression; all ups and downs are in the noise range. Since there is no CFG change in this pass, preserving the DT should be fine, and I will submit a follow-up patch for it.

I measured the compile-time impact of this patch on spec2000/2006/2017. Overall, I wasn't able to see any reproducible regression; all ups and downs are in the noise range. Since there is no CFG change in this pass, preserving the DT should be fine, and I will submit a follow-up patch for it.

Sounds good, thank you!

Michael

MatzeB added inline comments.Apr 16 2018, 5:36 PM
llvm/trunk/include/llvm/CodeGen/TargetInstrInfo.h
961 ↗(On Diff #139501)

Why did you put this function into TargetInstrInfo? There is nothing target specific about it. In fact I would heavily object if a target would actually override this!