This is an archive of the discontinued LLVM Phabricator instance.

MIScheduler improved handling of copied physregs
Needs ReviewPublic

Authored by jonpa on Sep 28 2017, 4:45 AM.

Details

Summary

When trying to enable the mischeduler on SystemZ, a regression was discovered that was due to how instructions using copies of physregs were scheduled.

The attached test case is the input to the mischeduler. The observation here is that %R3D is copied into %vreg1, while %R3D is also written to later from %vreg7. We would want the ADJDYNALLOCs using %vreg1 to be scheduled before the one defining %vreg7, so that regalloc will be able to give %r3d to %1 and later to %7 without overlap.

The isel scheduler handled this by seeing, when comparing the ADJDYNALLOCs, that the ones using %1 have one more live range, which it wants to minimize.

This patch adds a simple heuristic to tryCandidate() which handles this test case. It also seems to generally reduce the number of COPYs across benchmarks. Basically, it extends the handling of physregs by looking at virtual registers that are defined by a single COPY of a physreg (%vreg1 in this example). A preference is then given to the SU that uses such a vreg, in order to minimize that live range.

I have tried different positions for this new heuristic in tryCandidate(). Anywhere above the RPDelta.CurrentMax check seems bad, as it increases spilling. After that point it seems to make nearly no difference where it is placed, so I put it as the last heuristic, just before original instruction order.

A lot of test cases fail with this patch:
X86: 321
AMDGPU: 36
AArch64: 13
PowerPC, SPARC: 3 each
Lanai: 2
ARM: 1

So before going any further, I would like feedback on the feasibility of inserting this heuristic in the mischeduler.

Is this the right handling of this test case in mischeduler (I assume it is a scheduling problem)?

Diff Detail

Event Timeline

jonpa created this revision.Sep 28 2017, 4:45 AM
fhahn added a comment.Sep 28 2017, 5:35 AM

Interesting, I'll give it a try on ARM and AArch64.

test/CodeGen/SystemZ/args-11.mir
6

Indent in the comment off by 1?

29

You could use -debug-only=machine-scheduler and check the schedule emitted there, which would map more directly to the MachineInstrs used in the test case.

MatzeB edited edge metadata.Sep 28 2017, 6:52 PM

Hmm interesting observation. So to recap (and for further comments in the sourcecode :) you seem to deal with this situation:

%1 = OP %0
// ...
%physregX = COPY %1

and moving OP downwards increases the chance that %1 doesn't interfere with physregX and the COPY can be coalesced. Or in the inverse situation

%0 = COPY %physregX
// ...
%1 = OP %0

moving OP upwards minimizes the chance of %0 interfering with physregX and increases the chance of the COPY being coalesced.

  • If I'm reading this correctly you only implement the first case, but the 2nd case should work as well.
  • You should exclude reserved registers and registers not allocatable to the vreg in question from the rule.
lib/CodeGen/MachineScheduler.cpp
2909

This is a non-obvious rule and needs more comments and a better function name!

2913–2922

I would perform the check on the ScheduleDAG rather than the MIs/operands. That should catch more cases and also gives you a sensible way to implement the reverse pattern.

2915

Please use MachineOperand & instead of auto.

test/CodeGen/SystemZ/args-11.mir
29

or -run-pass=machine-scheduler for that matter, so we only run the one pass we are interested in.

jonpa marked 3 inline comments as done.Oct 3 2017, 2:26 AM
jonpa added inline comments.
lib/CodeGen/MachineScheduler.cpp
2913–2922

I tried

-  for (const MachineOperand &MO : MI->uses()) {
-    if (!MO.isReg())
+  for (const SDep &Pred : SU->Preds) {
+    if (Pred.getKind() != SDep::Data)
       continue;
-    MachineInstr *DefMI = MRI->getUniqueVRegDef(MO.getReg());
+    unsigned Reg = Pred.getReg();
+    MachineInstr *DefMI = MRI->getUniqueVRegDef(Reg);

This seemed to give more copy coalescing, but for some reason also a bit more spilling, which is why it doesn't look as good. Why would this be better? Basically, checking SU->Preds (which is what I guessed you meant by "on the ScheduleDAG") should only check dependencies within the scheduling region. Since coalescing works globally, wouldn't it be better to find the defining MI even if it is in another block? What more cases are you thinking of?

test/CodeGen/SystemZ/args-11.mir
29

I am not sure - wouldn't it be better to confirm that the output in the end is actually good? I mean, that way it tests that both mischeduler *and* regalloc are smart, and no matter what, we know this test case should not have that extra copy. I am just curious about your motivation for changing this...

jonpa updated this revision to Diff 117486.Oct 3 2017, 2:33 AM

Patch updated with experiments as suggested in review response.

If I'm reading this correctly you only implement the first case, but the 2nd case should work as well.

Actually, I was checking for the use-ops of MI that come from a PReg copy, so this would be case 2, I'd say.

You should exclude reserved registers and registers not allocatable to the vreg in question from the rule.

Checks added. These cause nearly no change currently, but I still think it is good to limit this heuristic if it doesn't hurt, in case it gets moved around in tryCandidate(). The downside is that it adds a few more lines of code...

Adding the inverse check per your suggestion did not help as much as hoped - no change in spilling, and just a few fewer sign-extending COPYs. Note that SystemZ will initially (as during these experiments) only do bottom-up scheduling.

I tried to compare the effect of activating the second heuristic during bidirectional scheduling, and found that it looked to be about the same difference (merely enabling bidirectional scheduling currently gives a notable increase in spilling on SystemZ).

To summarize:

  • doing the inverse check seems to help only very marginally if at all on SystemZ.
  • checking for reserved regs / reg-class of copied to/from virtreg has very little effect.

Note this is for SystemZ. I updated the patch to include the new experiments so that someone else could try this on other targets to see if we e.g. need the inverse check. For SystemZ, I think it is not needed, and would be happy to remove it and simplify the patch. Also, the check for reserved/regclass could be removed, leaving the patch as it was in the previous version...

MatzeB added inline comments.Oct 11 2017, 4:35 PM
lib/CodeGen/MachineScheduler.cpp
2912

isCoalescablePRegCopy? Use references for parameters that cannot be nullptr.

2913–2922

Hmm right, scheduling regions are limiting as well.

The case I was thinking of is when you have multiple definitions of a vreg in different basic blocks; then MachineRegisterInfo::getUniqueVRegDef() will return null, while the scheduling region gets you a reaching definition for free. But indeed this thinking is only true if the definition is actually part of the same scheduling region.

Sounds like the region limits are a bigger problem than multiple definitions so keep the code as is.

2928
  • Could use MCPhysReg instead of unsigned here.
  • Subregister indexes are only used for virtual registers; getSubReg() is always 0 when a physreg is assigned.
2968–2971
  • This would benefit from some refactoring to avoid the repeated getOperand(0) expressions.
  • isImplicit() should only be set on uses and I don't see why it matters here. (Technically you could see the flag being set on a subregister definition which also acts as a use, but again I don't see why it matters)
2973–2977
  • I think we should be conservative here and restrict the check to cases where the vreg has one use only. I could imagine this showing strange effects and less benefits when there are multiple uses or the use is in a different block. This is another case where it may or may not be better to look at the schedule graph instead...
  • It's enough to check use_nodbg_operands()
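
The conservative variant suggested here might look roughly like this (pseudocode over the LLVM API of that era, not the actual diff; the surrounding loop over MI's use operands and the operand name MO are assumptions):

```cpp
// Per the review suggestion: only apply the rule when the copied vreg
// has exactly one non-debug use. hasOneNonDBGUse() matches the
// "it's enough to check use_nodbg_operands()" remark, since it ignores
// DBG_VALUE users when counting.
if (!MRI->hasOneNonDBGUse(MO.getReg()))
  continue; // multiple uses (possibly in other blocks): be conservative
```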
2980

No need for braces.

test/CodeGen/SystemZ/args-11.mir
29

There is a place for end-to-end testing and for directed testing of exactly the feature/change you implemented. We tend to use llvm/test for the latter and the llvm test-suite for the former. Admittedly the test-suite is only checked for performance and not for generated assembly at the moment, so there may be room for some middle ground, but I don't see why we should start that here and with this commit...

jonpa updated this revision to Diff 118792.Oct 12 2017, 8:20 AM
jonpa marked 7 inline comments as done.

Thanks for review!
Updated.

lib/CodeGen/MachineScheduler.cpp
2913–2922

OK.

It also makes sense to me, as it is handling cases where the vreg is defined by a COPY of a physreg, and I think this should typically mean that the vreg only has one definition.

2928

right

3148

Is it OK to use the same CandReason (PhysRegCopy) as biasPhysRegCopy(), or should it be different, e.g. PhysRegCopyUses or PhysRegCopy2 ?

test/CodeGen/SystemZ/args-11.mir
29

I see.

fhahn added a comment.Oct 30 2017, 6:22 AM

Do you have any benchmark results on SystemZ? I can provide some benchmark numbers on Cortex-A57, AArch64: test-suite & SPEC2000 show a 0.79% geomean speedup with this patch. There is one bigger regression (reeBench/analyzer/analyzer, +7% runtime).

jonpa added a comment.Oct 31 2017, 1:17 AM

Do you have any benchmark results on SystemZ? I can provide some benchmark numbers on Cortex-A57, AArch64: test-suite & SPEC2000 show a 0.79% geomean speedup with this patch. There is one bigger regression (reeBench/analyzer/analyzer, +7% runtime).

That sounds promising.

I have not yet made a serious benchmarking effort since the patch may still change (some preliminary results show some modest improvements).

Is there anything left that might change at this point?

jonpa updated this revision to Diff 122825.Nov 14 2017, 5:23 AM

Ping!

While waiting for review, I updated the patch to do this only if the COPY is in the same MBB, to check for the difference. This actually seemed to make the total number of COPYs even smaller, and also seemed to have fewer other side effects. At least currently, this looks best to me. Florian, how does this seem to you? Matthias?

BTW, this question seems still unanswered: "Is it OK to use the same CandReason (PhysRegCopy) as biasPhysRegCopy(), or should it be different, e.g. PhysRegCopyUses or PhysRegCopy2 ?"

Are there any volunteers for reviewing test changes if I update them all?

MatzeB accepted this revision.Nov 15 2017, 9:42 AM

LGTM with nitpicks.

Ping!

While waiting for review, I updated the patch to do this only if the COPY is in the same MBB, to check for the difference. This actually seemed to make the total number of COPYs even smaller, and also seemed to have fewer other side effects. At least currently, this looks best to me. Florian, how does this seem to you? Matthias?

Seems reasonable as that is closer to what a scheduling region covers and the scheduler can influence.

BTW, this question seems still unanswered: "Is it OK to use the same CandReason (PhysRegCopy) as biasPhysRegCopy(), or should it be different, e.g. PhysRegCopyUses or PhysRegCopy2 ?"

I'd introduce a new name.

Are there any volunteers for reviewing test changes if I update them all?

I think generally you can use best judgement and just update tests.
If you are unsure, you could upload an update here so that people's phabricator/herald rules match, and see if someone complains within a few days.
If you worry about a specific test then it's usually best to look at the version control history of the test and CC the person.

lib/CodeGen/MachineScheduler.cpp
2912

Could also do MachineInstr &CopyMI.

2944

Could use const SUnit &

3148

Introducing a new enum entry here for this is cheap and helps debugging.

This revision is now accepted and ready to land.Nov 15 2017, 9:42 AM
jonpa updated this revision to Diff 124129.Nov 24 2017, 12:43 AM
jonpa marked 3 inline comments as done.

Unfortunately, I realized there were still some issues with this patch, which I have now corrected:

  • There were some regressions that wouldn't go away, and I found that by making the patch less aggressive those regressions disappeared, and I even noticed a slight overall improvement. I added the check for "one use" in addition to "same MBB".
  • Even with good overall results, the SystemZ test args-11.mir now failed, since it does have multiple uses of the copied preg. To handle this I had to add a special check, which this test case actually reflects. If the two candidates specifically are connected to the same preg, where one is using it and the other is defining it, then the ordering should be based on that. This saves some additional COPYs, and also gives some overall improvement compared to without it (but with "one use" per above).
  • I also had to avoid moving COPYs of subregs of a GR128 away from the defining instruction. There was a case where regalloc ran out of registers when this was done.

Please review again:

  • MachineScheduler patch.
  • SystemZ tests.

(Other test updates are still needed, but I'll wait with that for now).

jonpa requested review of this revision.Nov 24 2017, 12:45 AM
jonpa edited edge metadata.
lib/CodeGen/MachineScheduler.cpp
3148

All right - I added a new entry for this with the name PhysRegCp2 / PREG-CP-2 (feel free to modify), with the next-lowest priority, just before NodeOrder. The priority change is NFC on SystemZ (since it is bottom-up, I presume).

jonpa added a comment.Dec 4 2017, 2:23 AM

ping

Matthias, do the changes to mischeduler look good to you still with the latest additions?

Would it be acceptable to add a temporary target hook for this until all targets have enabled this and gone through their tests? I have found in a similar situation (Copy hints patch) that even when I took the time to manually update 100+ tests, there was very little hope of getting all those test changes reviewed. I would rather not update tests without a proper review, so this seems like a sensible solution to me. Of course, I would prefer to update tests for one or two targets if there were reviewer(s) anticipating this, since this is a good way to make sure that this works well generally.

Would it be acceptable to add a temporary target hook for this until all targets have enabled this and gone through their tests? I have found in a similar situation (Copy hints patch) that even when I took the time to manually update 100+ tests, there was very little hope of getting all those test changes reviewed. I would rather not update tests without a proper review, so this seems like a sensible solution to me. Of course, I would prefer to update tests for one or two targets if there were reviewer(s) anticipating this, since this is a good way to make sure that this works well generally.

Generally I would prefer not to; the "temporary" often becomes months/years/never. I'd rather see you updating the test cases; target maintainers should have phabricator herald rules set up to catch the review (or email filters or ...) and then react to the changes. If there are no complaints in time I'd move ahead. Testcases should not be a reason to stop progress. (Of course this all assumes none of the maintainers actively rejects or comments on the changes, and that you don't see an obvious problem doing the changes yourself.)

lib/CodeGen/MachineScheduler.cpp
2912

I assume CopyMI can be const?

2915

No space after assert.

2917

Split into two lines.

2931

We don't align equal signs in llvm (unfortunately IMO).

2934

No braces necessary around the whole returned expression.

2935–2936

Should we restrict the subregister case to VRegMO->isUndef()? It is hard to know whether the coalescing can actually succeed in the non-undef case; it could if all other defs copy the relevant subregisters as well, but I wonder how typical that case is, given that it corresponds to simply copying the whole register in a single instruction.

2950

Could use the \p SU doxygen syntax to reference parameters.

2969

Could use if (DefMI == nullptr || !DefMI->isCopy()) continue; etc. to reduce indentation of the following code.

2983–3002

I think this would look better if it used more early exit (i.e. `if (!getNumOperands()) return; if (skipWideCopy) return; ...`). Maybe you can rewrite the code that changes NumCopiedPhysRegUses to `PRegs.NumCopied += isTop ? 1 : -1` (or `-1 : 1`) to avoid the need for the last line.
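
Written out, the suggested shape might look roughly like this (a pseudocode sketch over the names used in this comment; PRegs, IsTop, SkipWideCopy and the surrounding function are all assumptions, not the actual patch):

```cpp
// Early exits instead of nested conditions:
if (!MI->getNumOperands())
  return;
if (SkipWideCopy)
  return;
// ... main loop over operands ...
// Folding the scheduling direction into each update means no final
// "negate the count when scheduling bottom-up" line is needed:
PRegs.NumCopied += IsTop ? 1 : -1;
```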

2984–2988

Restricting this to 128bit COPYs is artifical. Would it also work when skipping all subregister COPYs?

Also unnecessary outer brace.

2990

unnecessary brace outer brace around expression.

3012

No space after assert.

3016

Split into two lines.

3020–3047

Shouldn't the NumCopied comparisons already pull the %2 = OP upwards and the %1 = OP downwards in the example? If there is a reason to handle this case separately anyway, explain it in the comment.

jonpa marked 14 inline comments as done.Dec 6 2017, 9:38 AM
jonpa added inline comments.
lib/CodeGen/MachineScheduler.cpp
2935–2936

I added it and as you suspected, it gives very little change. On spec, just ~5 instructions in total got rescheduled...

And oh yeah, there is also actually 1 more COPY now... ;-) That copy is from a vector register into a floating point argument register. This is a bit special: on SystemZ an FP reg lives in the low part of any of the 32 vector registers. So the FP register is a subreg of the vector register, but it is the only subreg that is interesting here.

No change at all in spilling or the like, so this doesn't matter. It doesn't seem to save much compile time either, I think.

Due to the vector/fp registers on SystemZ, I will leave it as it was, just for good manners' sake, unless you insist on it.

2984–2988

I was surprised to find that removing the check for the width of the super-register was a complete NFC :-) Removed.

3020–3047

While developing this patch, I noticed regressions that I could only (at least so far) get rid of by being less aggressive, which made sense: "keep the good and get rid of the bad". One thing that seemed to do the trick then was to add

if (MRI->hasOneUse(MO.getReg()))
  // Make this a bit less aggressive by checking for one use.
  NumCopiedPhysRegUses++;

So the reason that this general heuristic fails, at least in this test case, is that the use of %0 is ignored.

This seems unfortunate, but it is still true that I so far cannot get rid of regressions without limiting the general heuristic... At the same time, this case really should be handled.

jonpa updated this revision to Diff 125754.Dec 6 2017, 9:40 AM
jonpa marked 3 inline comments as done.

Thank you for review, Matthias!

I tried to follow and evaluate your suggestions - see the inline comments for the conclusions. What do you think?

This is NFC to the previous version.

jonpa updated this revision to Diff 132987.Feb 6 2018, 6:53 AM

Patch reworked into a state with experimental options. Only SystemZ tests
are passing (updated), for now.

I experimented further with the new phys-reg heuristic, trying to get rid of
the lines that do the explicit check on def/use regs (UsedPReg ==
DefedPReg). As Matthias pointed out, this should not be necessary, but
was added to catch specific and beneficial cases while keeping the general
heuristic less aggressive (to avoid regressions). I wanted to find another,
simpler way of achieving this.

Stepping back a bit to before I added the 'UsedPReg == DefedPReg' checks: I got
regressions and reasoned that this heuristic should only handle the simple
cases we were seeing, so I wanted to make it less aggressive. I found that
limiting the number of users of the COPYed phys-reg to 1 did the trick quite
well. This however didn't handle the new SystemZ test case (args-11.mir). Then
I added the checks (which I am now trying to remove...), which did the trick
and also seemed clearly beneficial in the cases they did catch.

There is no real problem with this, except that it adds more code to an
otherwise simple heuristic. So I experimented further with many variations
while checking the impact on the number of register moves and spills on SPEC
(see the tables below for numbers).

Also, a secondary goal of this patch is to handle two SystemZ test cases that
have been identified and marked as temporary failing:

  • The COPY regression that slipped in with the guessInstructionProperties=0 (risbg-01.ll):
%bb.0: derived from LLVM BB %0
    Live Ins: %r2l %r3d
        %1<def> = COPY %r3d; ADDR64Bit:%1
        %3:subreg_l32<def,read-undef> = COPY %r2l; GR64Bit:%3
(2)     %2<def> = COPY %3:subreg_l32; GR32Bit:%2 GR64Bit:%3
        %2<def,tied1> = SRA %2<tied0>, %noreg, 28, %cc<imp-def,dead>; GR32Bit:%2
        ST %2, %1, 0, %noreg; mem:ST4[%dest] GR32Bit:%2 ADDR64Bit:%1
(5)     %5<def,tied1> = RISBG %5<undef,tied0>, %3, 60, 190, 36, %cc<imp-def,dead>; GR64Bit:%5,%3
        %r2l<def> = COPY %5:subreg_l32; GR64Bit:%5
        Return %r2l<imp-use>

Before fixing the instruction flags and setting guessInstructionProperties to
0 (in a completely different patch), the RISBG was incorrectly treated as a
global memory object and a new barrier chain. When the instruction flag was
*fixed*, the CopyConstrain DAG mutator now adds a weak edge:

Constraining copy SU(2)
  Local use SU(5) -> SU(2)

This forces the RISBG to be scheduled before the COPY, which is unfortunate,
as then the RISBG cannot write to the connected %r2l phys-reg. The
regression becomes:

f21:                                    f21:
        .cfi_startproc                          .cfi_startproc
# %bb.0:                                # %bb.0:

        lr      %r0, %r2              |         risbg   %r0, %r2, 60, 190, 36
        sra     %r0, 28               |         lr      %r1, %r2
        st      %r0, 0(%r3)           |         sra     %r1, 28
        risbg   %r2, %r2, 60, 190, 36 |         st      %r1, 0(%r3)
                                      |         lr      %r2, %r0
        br      %r14                            br      %r14
.Lfunc_end0:                            .Lfunc_end0:

One remedy for this is to run the new tryPhysRegCopies2() heuristic before the
weak edges check. I am a bit reluctant to do that, as it seems safest to keep
the new heuristic as late as possible, after the last reg-pressure check etc.

Alternatively, CopyConstrain could be fixed to consider the phys-reg deps,
since this seems natural, at least in this case. For this I first tried to add
checks inside the loops that add the edges in constrainLocalCopy(), but then
changed this to a general check at the beginning of the loop instead. This
seemed to work fairly well, and is part of the new patch.

  • args-11.mir:

This test case has proven tricky to handle without making the heuristic too aggressive.

  • One way is to add extra checks for the actual used / def:ed reg(s). This is (1) below, from Dec 6.
  • This has the users in two regions, due to an SP-def. I tried to redefine the SystemZ scheduling boundary to ignore SP-defs pre-RA so as to be able to require all users to be in the region. It turned out, however, that it was enough to require all users to be local to the MBB, which is simpler.
  • I experimented with other ways to handle this without causing regressions, tuning for the numbers in the tables. The current patch handles it without the extra specific checks used before in the first attempt.

While experimenting with variations on the cost function
(findConnectedPhysRegs), I took guidance from the number of COPYs (reg moves)
and the number of Spill|Reload instructions in the output. These are compared
to master (unpatched) in the tables below:

(1) "patch Dec 6": This is the previous version of the patch, which was
limited to "one use" and also had the extra lengthy checks involving the
UsedPReg and DefedPReg comparisons. This does not handle risbg-01.ll.

Reg moves                   unpatched            patch Dec 6
403.gcc        :                62353                  62238     -115
435.gromacs    :                14381                  14484     +103
481.wrf        :                 7252                   7342      +90
445.gobmk      :                14556                  14495      -61
447.dealII     :                60812                  60767      -45
464.h264ref    :                 7006                   6972      -34
454.calculix   :                18640                  18666      +26
453.povray     :                14028                  14007      -21
450.soplex     :                 6807                   6820      +13
483.xalancbmk  :                97079                  97070       -9
471.omnetpp    :                18051                  18042       -9
482.sphinx3    :                 2931                   2937       +6
... (<= 5)
Sum            :               371149                 371101      -48

Spill|Reload                unpatched            patch Dec 6
435.gromacs    :                13085                  13024      -61
445.gobmk      :                 6370                   6344      -26
400.perlbench  :                 5396                   5370      -26
481.wrf        :                 2534                   2550      +16
403.gcc        :                15789                  15777      -12
453.povray     :                 7223                   7212      -11
454.calculix   :                19859                  19851       -8
483.xalancbmk  :                 9598                   9604       +6
464.h264ref    :                12263                  12269       +6
... (<= 5)
Sum            :               165013                 164898     -115

(2) Current patch without any options:

  • Do the check of connected phys regs in CopyConstrain.
  • Run tryPhysRegCopies2() last in tryCandidate(). In particular:
    • If there is any use other than a coalescable COPY to a phys-reg, don't do anything. If there are one or more such COPYs, add 1 to the cost function.
    • For any use of a register that is defined by a local coalescable COPY and that also only has local users, add 1 to the cost function.

Reg moves                   unpatched          current patch
403.gcc        :                62353                  62079     -274
483.xalancbmk  :                97079                  96879     -200
445.gobmk      :                14556                  14399     -157
435.gromacs    :                14381                  14481     +100
481.wrf        :                 7252                   7343      +91
464.h264ref    :                 7006                   6964      -42
400.perlbench  :                18428                  18389      -39
454.calculix   :                18640                  18667      +27
436.cactusADM  :                10367                  10348      -19
471.omnetpp    :                18051                  18035      -16
450.soplex     :                 6807                   6823      +16
456.hmmer      :                 5122                   5132      +10
453.povray     :                14028                  14020       -8
447.dealII     :                60812                  60806       -6
... (<= 5)
Sum            :               371149                 370625     -524

Spill|Reload                unpatched          current patch
435.gromacs    :                13085                  13024      -61
400.perlbench  :                 5396                   5364      -32
445.gobmk      :                 6370                   6344      -26
403.gcc        :                15789                  15764      -25
447.dealII     :                33182                  33200      +18
464.h264ref    :                12263                  12247      -16
481.wrf        :                 2534                   2550      +16
483.xalancbmk  :                 9598                   9606       +8
454.calculix   :                19859                  19852       -7
... (<= 5)
Sum            :               165013                 164876     -137

(3) Current patch with COPY_CONSTRAIN_CHECK = false. Same as (2), but without
the check in CopyConstrain. (does not handle risbg-01.ll)

Reg moves                   unpatched  !COPY_CONSTRAIN_CHECK
403.gcc        :                62353                  62076     -277
483.xalancbmk  :                97079                  96876     -203
445.gobmk      :                14556                  14410     -146
435.gromacs    :                14381                  14478      +97
481.wrf        :                 7252                   7343      +91
464.h264ref    :                 7006                   6964      -42
400.perlbench  :                18428                  18388      -40
454.calculix   :                18640                  18667      +27
436.cactusADM  :                10367                  10348      -19
471.omnetpp    :                18051                  18035      -16
450.soplex     :                 6807                   6821      +14
453.povray     :                14028                  14017      -11
456.hmmer      :                 5122                   5132      +10
447.dealII     :                60812                  60804       -8
... (<= 5)
Sum            :               371149                 370619     -530

Spill|Reload                unpatched  !COPY_CONSTRAIN_CHECK
435.gromacs    :                13085                  13024      -61
400.perlbench  :                 5396                   5364      -32
445.gobmk      :                 6370                   6344      -26
403.gcc        :                15789                  15764      -25
447.dealII     :                33182                  33200      +18
481.wrf        :                 2534                   2550      +16
464.h264ref    :                12263                  12247      -16
483.xalancbmk  :                 9598                   9606       +8
454.calculix   :                19859                  19852       -7
... (<= 5)
Sum            :               165013                 164874     -139

(4) Current patch with COPY_CONSTRAIN_CHECK = false and BEFORE_WEAK =
true. Same as (2), but without the check in CopyConstrain, and with the
tryPhysRegCopies2() run before checking the weak edges instead.

Reg moves                   unpatched   !COPY_CONSTRAIN_CHECK + BEFORE_WEAK
403.gcc        :                62353                  62076     -277
483.xalancbmk  :                97079                  96880     -199
445.gobmk      :                14556                  14411     -145
435.gromacs    :                14381                  14482     +101
481.wrf        :                 7252                   7343      +91
464.h264ref    :                 7006                   6965      -41
400.perlbench  :                18428                  18389      -39
454.calculix   :                18640                  18667      +27
436.cactusADM  :                10367                  10348      -19
450.soplex     :                 6807                   6822      +15
471.omnetpp    :                18051                  18036      -15
453.povray     :                14028                  14017      -11
456.hmmer      :                 5122                   5132      +10
447.dealII     :                60812                  60806       -6
... (<= 5)
Sum            :               371149                 370635     -514

Spill|Reload                unpatched   !COPY_CONSTRAIN_CHECK + BEFORE_WEAK
435.gromacs    :                13085                  13024      -61
400.perlbench  :                 5396                   5364      -32
445.gobmk      :                 6370                   6344      -26
403.gcc        :                15789                  15764      -25
447.dealII     :                33182                  33200      +18
481.wrf        :                 2534                   2550      +16
464.h264ref    :                12263                  12248      -15
483.xalancbmk  :                 9598                   9606       +8
454.calculix   :                19859                  19852       -7
... (<= 5)
Sum            :               165013                 164875     -138

(5) Current patch with COUNT_DEF = false. This ignores the def (inverse)
case, which seems to have an interesting effect on the number of register
moves, while the improvement in spilling disappears.

Reg moves                   unpatched             !COUNT_DEF
445.gobmk      :                14556                  14279     -277
403.gcc        :                62353                  62114     -239
483.xalancbmk  :                97079                  96908     -171
400.perlbench  :                18428                  18397      -31
436.cactusADM  :                10367                  10344      -23
481.wrf        :                 7252                   7234      -18
456.hmmer      :                 5122                   5112      -10
453.povray     :                14028                  14019       -9
433.milc       :                 2402                   2393       -9
447.dealII     :                60812                  60820       +8
471.omnetpp    :                18051                  18044       -7
450.soplex     :                 6807                   6801       -6
... (<= 5)
Sum            :               371149                 370351     -798

Spill|Reload                unpatched             !COUNT_DEF
481.wrf        :                 2534                   2546      +12
445.gobmk      :                 6370                   6378       +8
403.gcc        :                15789                  15782       -7
464.h264ref    :                12263                  12256       -7
400.perlbench  :                 5396                   5390       -6
... (<= 5)
Sum            :               165013                 165025      +12

(6) Current patch with COUNT_DEF = false and COPY_CONSTRAIN_CHECK = false and
BEFORE_WEAK. Same as (3), but instead of doing the check in CopyConstrain,
tryPhysRegCopies2() is run before checking the weak edges.

Reg moves                   unpatched !COUNT_DEF + !COPY_CONSTRAIN_CHECK + BEFORE_WEAK
445.gobmk      :                14556                  14283     -273
403.gcc        :                62353                  62111     -242
483.xalancbmk  :                97079                  96905     -174
400.perlbench  :                18428                  18396      -32
436.cactusADM  :                10367                  10344      -23
481.wrf        :                 7252                   7234      -18
456.hmmer      :                 5122                   5112      -10
450.soplex     :                 6807                   6799       -8
471.omnetpp    :                18051                  18044       -7
447.dealII     :                60812                  60819       +7
433.milc       :                 2402                   2395       -7
453.povray     :                14028                  14021       -7
... (<= 5)
Sum            :               371149                 370345     -804

Spill|Reload                unpatched !COUNT_DEF + !COPY_CONSTRAIN_CHECK + BEFORE_WEAK
481.wrf        :                 2534                   2546      +12
445.gobmk      :                 6370                   6378       +8
403.gcc        :                15789                  15782       -7
464.h264ref    :                12263                  12257       -6
400.perlbench  :                 5396                   5390       -6
... (<= 5)
Sum            :               165013                 165024      +11

(7) Trying just the CopyConstrain check, *without* tryPhysRegCopies2. Not much change --
without tryPhysRegCopies2 running after it, there seems to be no use for this.

Reg moves                   unpatched              spec-llvm
... (<= 5)
Sum            :               371149                 371164      +15

Spill|Reload                unpatched              spec-llvm
... (<= 5)
Sum            :               165013                 165015       +2

These are the variants which seem to improve. Other variants, which applied
just one of the cases from before, i.e. "just the uses" or "just the def" ("inverse case"),
gave worse numbers, somewhat to my surprise.

I admit that small changes to the patch influence the numbers, and there is
no real guarantee that this version of the patch will remain the best over
time. Also, the number of eliminated COPYs / spills is quite marginal.

Looking at SPEC, I have benchmarked this a few times overnight and find
that while last week there seemed to be a slight improvement, this week it
is mostly minor regressions on average. These effects seem to be quite
random, as I have not been able to find any obvious explanations.

Today, it looks like (1) is showing about unchanged average impact over
benchmarks, while (3), (4) and (5) show 0.15 % regression on average. (2)
and (6) show 0.3 % / 0.5 % regression on average.

Again, those regressions do not really make sense, and I would guess that
they might very well be reversed next week or so, given that the number of
COPYs and spills is reduced on average.

I have no idea how this affects other targets; I can only hope for the better.
It would be nice if someone could confirm this.

The patch might be finalized into one of the settings above, or even reverted
back to the previous version, if that's better for some reason.

At this point I would appreciate some feedback from the reviewers!

jonpa added a comment.Feb 8 2018, 8:53 AM

At least for the last two days, it seems this patch is a slight improvement on SystemZ.

I also cross-built to check impact on other targets. The results look acceptable to me, even though not as good as on SystemZ:

-target x86_64-linux-gnu (no cpu specified):

Reg moves                      master                patched
454.calculix   :                21958                  22000      +42
483.xalancbmk  :                70109                  70074      -35
435.gromacs    :                14019                  13985      -34
403.gcc        :                71829                  71860      +31
436.cactusADM  :                13496                  13516      +20
456.hmmer      :                 5857                   5865       +8
433.milc       :                 3079                   3086       +7
445.gobmk      :                17615                  17609       -6
... (<= 5)
Sum            :               264267                 264300      +33

Spill|Reload                   master                patched
403.gcc        :                21588                  21529      -59
483.xalancbmk  :                 8248                   8267      +19
454.calculix   :                16587                  16569      -18
456.hmmer      :                 3858                   3842      -16
433.milc       :                 1766                   1781      +15
435.gromacs    :                12109                  12119      +10
... (<= 5)
Sum            :               115627                 115572      -55
-target aarch64-linux-gnu (no cpu specified):

Reg moves                      master                patched
483.xalancbmk  :                73855                  73904      +49
403.gcc        :                82487                  82462      -25
454.calculix   :                22894                  22880      -14
436.cactusADM  :                14834                  14844      +10
435.gromacs    :                16489                  16498       +9
... (<= 5)
Sum            :               295182                 295204      +22

Spill|Reload                   master                patched
403.gcc        :                43066                  43005      -61
454.calculix   :                14818                  14810       -8
400.perlbench  :                14536                  14543       +7
445.gobmk      :                17120                  17126       +6
... (<= 5)
Sum            :               183350                 183288      -62

Matthias, can I proceed with removing experimental options and moving on to test updates for the other targets?

jonpa added a comment.Feb 20 2018, 9:41 AM

PING!

Does anyone have any comments on this?

Thanks for CC'ing me, but I can't comment on whether this is the right heuristic since I haven't done this level of tuning in a long time. Deferring to @MatzeB.

I tried this patch on aarch64 A72 firefly linux on a set of benchmarks.
Overall the performance degraded by 35% cumulatively (sum of all speedups and slowdowns).
There were 5 benchmarks that sped up by more than 1% and 12 that slowed down by >1%.
One benchmark slowed down by >10% and three by >5%.
I will investigate these slowdowns.

I just collected more data on the slowdowns and they are all within noise level.
When taking the best scores over 5 or more runs, the patch shows a small speedup overall.
Green light to commit from my side.

jonpa added a comment.Mar 19 2018, 8:43 AM

Thanks for checking this! Good to know that this also works on your target.