This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] Favor 3-address instructions during instruction selection.
Closed, Public

Authored by jonpa on Apr 18 2019, 1:47 PM.

Details

Reviewers

uweigand
qcolombet

Summary

This patch aims to reduce spilling and register moves by selecting the 3-address versions of instructions by default instead of their 2-address equivalents. Both spilling and register moves appear to be significantly improved overall.

Instructions / Tablegen:

The ...AndK instruction classes are modified to select the K instruction instead of the 2-address equivalent one.
The isConvertibleToThreeAddress flags on the 2-address instructions have been removed, since they did not seem to have any use (not sure whether they should be kept for possible future use?).
The getThreeOperandOpcode instruction mapping has been replaced by the inverse getTwoOperandOpcode mapping. This is wrapped by SystemZInstrInfo::get2AddrOpcode() so that it can be accessed outside of SystemZInstrInfo.cpp. The generated function returns -1 if there is no mapped instruction, which I hope is future-safe to rely on the way it is done now.
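
For readers unfamiliar with the generated mapping, here is a minimal stand-alone sketch of the "returns -1 if there is no mapped instruction" contract described above. The enum and both function names are hypothetical stand-ins, not the TableGen-generated code; the only point is that callers compare against -1, since 0 can be a valid opcode value.

```cpp
// Hypothetical opcode enum standing in for the generated SystemZ opcodes.
// AR deliberately has the value 0 to show why "-1" is the sentinel.
enum Opcode : int { AR, ARK, AGR, AGRK, SLL, SLLK, LA };

// Sketch of a 3-address -> 2-address mapping (assumed shape, not the real
// generated function): returns the 2-address opcode, or -1 if none exists.
int getTwoOperandOpcodeSketch(int Opc) {
  switch (Opc) {
  case ARK:
    return AR;
  case AGRK:
    return AGR;
  case SLLK:
    return SLL;
  default:
    return -1; // no mapped instruction
  }
}

// Callers must test against -1 rather than treating the result as a boolean.
bool hasTwoOperandForm(int Opc) { return getTwoOperandOpcodeSketch(Opc) != -1; }
```

Note that ARK maps to opcode value 0 in this sketch, so a check like `if (getTwoOperandOpcodeSketch(Opc))` would wrongly reject it.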

convertToThreeAddress()

Code that used getThreeOperandOpcode() removed.
finishConvertToThreeAddress() inlined.
The And -> RISBG conversions are not yet modified. Would this be worth handling as well (it seems the instructions are of the same cost)?

RegAlloc

Hinting the potentially tied register, which helps a bit.

SystemZShortenInst

Handling added to convert to 2-address instruction when possible.
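
The shortening step can be pictured roughly as follows. This is only a sketch under the assumption that, after register allocation, a 3-address instruction whose destination equals its first source can be rewritten to the 2-address opcode. The struct, the enum values and the helper names are hypothetical and do not come from SystemZShortenInst.cpp.

```cpp
// Hypothetical, simplified view of a 3-address instruction after regalloc.
struct Inst3 {
  int Opcode;
  unsigned Dst, Lhs, Rhs; // physical register numbers
};

// Hypothetical opcode values for illustration only.
enum : int { NoOpcode = -1, AGR = 1, AGRK = 2 };

// Assumed mapping helper: 3-address opcode -> 2-address opcode or NoOpcode.
int twoAddrOpcodeFor(int Opc) { return Opc == AGRK ? AGR : NoOpcode; }

// Rewrite "Dst = Lhs op Rhs" to the 2-address form when Dst == Lhs.
// Returns true if the instruction was shortened.
bool shortenToTwoAddress(Inst3 &MI) {
  int TwoAddr = twoAddrOpcodeFor(MI.Opcode);
  if (TwoAddr == NoOpcode || MI.Dst != MI.Lhs)
    return false;
  MI.Opcode = TwoAddr; // Dst and Lhs are now the tied operand
  return true;
}
```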

Opcode counts Impact (truncated after topmost 20 opcodes):

master <> "*only* changing isel to select 3-address"

agrk           :                 7117                42856   +35739
ahi            :                34952                  485   -34467
ahik           :                26316                59979   +33663
agr            :                33596                    0   -33596
ark            :                 4972                24636   +19664
ar             :                18918                    0   -18918
aghik          :                 4980                23689   +18709
sgrk           :                 9026                27288   +18262
sgr            :                18059                    0   -18059
sll            :                15965                  241   -15724
sllk           :                 4607                20324   +15717
aghi           :                37856                28011    -9845
jne            :                13952                22838    +8886
brctg          :                 8727                  124    -8603
srk            :                 3719                 9855    +6136
sr             :                 6008                    0    -6008
lgr            :               354991               350453    -4538
srl            :                 4318                   44    -4274
srlk           :                 3097                 7366    +4269
Spill|Reload   :               188117               186340    -1777

"isel 3-address" <> "isel 3-address + handling to shorten instructions whenever possible"

ahik           :                59979                31669   -28310
ahi            :                  485                27587   +27102
aghik          :                23689                 7215   -16474
sllk           :                20324                 6717   -13607
sll            :                  241                13848   +13607
agrk           :                42856                32331   -10525
agr            :                    0                10525   +10525
sgr            :                    0                 9643    +9643
sgrk           :                27288                17645    -9643
ar             :                    0                 9182    +9182
ark            :                24636                15454    -9182
jne            :                22838                13985    -8853
brctg          :                  124                 8715    +8591
aghi           :                28011                35894    +7883
srk            :                 9855                 5566    -4289
sr             :                    0                 4289    +4289
srl            :                   44                 3475    +3431
srlk           :                 7366                 3935    -3431
ngr            :                  668                 2662    +1994

"isel 3-address + shortening" <> "isel 3-address + shortening + regalloc hints"

sgr            :                 9643                12424    +2781
sgrk           :                17645                14864    -2781
agrk           :                32331                30205    -2126
agr            :                10525                12648    +2123
ahi            :                27587                28435     +848
ahik           :                31669                30826     -843
aghik          :                 7215                 6550     -665
sr             :                 4289                 4825     +536
srk            :                 5566                 5032     -534
ar             :                 9182                 9695     +513
ark            :                15454                14945     -509
aghi           :                35894                36359     +465
sll            :                13848                14079     +231
sllk           :                 6717                 6490     -227
srak           :                 1407                 1199     -208
sra            :                 1943                 2151     +208
srl            :                 3475                 3652     +177
srlk           :                 3935                 3760     -175
lgr            :               350453               350599     +146
Spill|Reload   :               186340               186303      -37

Not sure exactly why, but the BRCTX conversions really suffered when only 3-address selection was done. This problem seems to disappear with shortening in place.

Diff Detail

Event Timeline

jonpa created this revision. Apr 18 2019, 1:47 PM

This looks really promising, in particular the reduction in spills and copies. Can you check that this also addresses the problem described here: https://reviews.llvm.org/D22011 ?

The ...AndK instruction classes are modified to select the K instruction instead of the 2-address equivalent one.

There is one problem here: the K instructions are only available with the DistinctOps facility (i.e. starting with z196). So if that facility isn't available, we have to be able to emit the original two-address forms only.

The isConvertibleToThreeAddress flags on the 2-address instructions have been removed, since it did not seem to have any use (not sure if it should actually be kept for any future possible use?).

This flag is just a signal to the generic TwoAddressInstructionPass that this is an instruction it should consider (it ignores everything without the flag). So since we no longer want that pass to consider these instructions, we actually should remove the flag.

The getThreeOperandOpcode instruction mapping has been replaced by the inverse getTwoOperandOpcode mapping. This is wrapped by SystemZInstrInfo::get2AddrOpcode() so that it can be accessed outside of SystemZInstrInfo.cpp. The generated function returns -1 if there is no mapped instruction, and this is I hope future-safe to use like done now.

Not sure why this needs to be a wrapper; wouldn't it be enough to just add a prototype for SystemZ::getTwoOperandOpcode to a header file (SystemZInstrInfo.h would make sense)? In any case, if it has to be a wrapper it should at least be static; it doesn't actually use any concrete TII instance.

The And -> RISBG conversions are not yet modified. Would this be worth handling as well (it seems the instructions are of the same cost)?

Possibly. On the other hand I'm not sure it would make much of a difference ....

Not sure exactly why, but the BRCTX conversions really suffered for just doing three-address, but this problem seems to disappear with shortening in place.

Well, because SystemZElimCompare::convertToBRCT only triggers if it detects an A(G)HI on a loop counter. If there are only A(G)HIK instructions, it can never do anything.

lib/Target/SystemZ/SystemZRegisterInfo.cpp
163	Just a tiny nit: can't we check for reserved registers and already hinted registers in the loop above? Then we could eliminate the CopyHints variable and get rid of one copy of the whole array.
lib/Target/SystemZ/SystemZShortenInst.cpp
310	So all shifts ignore everything but the low 6 bits of the shift count anyway. This means we can always convert a SLLK to SLL, we just may have to truncate the constant.
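
A small sketch of the truncation point, assuming (as the comment above states) that only the low 6 bits of the shift count matter. The 32-bit shift model and the function names are illustrative assumptions, not code from the patch.

```cpp
#include <cstdint>

// Keep only the low 6 bits of a shift count, which is all the shift uses.
constexpr int64_t truncateShiftCount(int64_t Imm) { return Imm & 63; }

// Simplified model of a 32-bit logical left shift that only looks at the
// low 6 bits of the count; counts of 32..63 shift every bit out.
uint32_t shiftLeft32(uint32_t Value, int64_t Count) {
  const unsigned N = static_cast<unsigned>(Count & 63);
  return N >= 32 ? 0u : Value << N;
}
```

Under this model a count of 70 behaves exactly like a count of 6, so the constant can be replaced by its truncated value when converting to the 2-address form.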

Most points in the review incorporated, except for getRegAllocationHints() -- see inlined comment.

This looks really promising, in particular the reduction in spills and copies. Can you check that this also addresses the problem described here: https://reviews.llvm.org/D22011 ?

I believe so - at least the two test functions there are now improved to use lghi/lgfi + sgrk :-)

There is one problem here: the K instructions are only available with the DistinctOps facility (i.e. starting with z196). So if that facility isn't available, we have to be able to emit the original two-address forms only.

I added back the pattern for the 2-address instruction, which did not affect codegen on z13. I think this should be guaranteed by the order of the instruction defs in the Tablegen file, but I am not quite sure (The "K" is defined before "" in the multidef).

Not sure why this needs to be a wrapper; wouldn't it be enough to just add a prototype for SystemZ::getTwoOperandOpcode to a header file (SystemZInstrInfo.h would make sense)? In any case, if it has to be a wrapper it should at least be static; it doesn't actually use any concrete TII instance.

Ah, yes you are right. I was under the delusion that this function was generated as part of SystemZInstrInfo, while it is actually just a function in the SystemZ namespace.

Possibly. On the other hand I'm not sure it would make much of a difference ....

OK, I'll wait with the And / RISBG instructions then (added a TODO comment in convertToThreeAddress).

jonpa added inline comments. Apr 22 2019, 12:17 PM

lib/Target/SystemZ/SystemZRegisterInfo.cpp
163	Not sure how we could eliminate CopyHints... Perhaps it would be better to use is_contained(Hints, PhysReg) as is done in AllocationOrder::isHint(), but if we do that maybe we should also change addHints(). The TwoAddrHints set is needed in the second loop to iterate over Order so that the hinted regs are sorted by it (Order), which I think is expected. Not sure what "copy of the whole array" we could get rid of...
lib/Target/SystemZ/SystemZShortenInst.cpp
310	Ah, yes, forgot that. This gave ~100 more 2-address instructions. At least the assembler complains if we do not truncate the immediates.

Whitespace fixes.

In D60888#1474556, @jonpa wrote:

This looks really promising, in particular the reduction in spills and copies. Can you check that this also addresses the problem described here: https://reviews.llvm.org/D22011 ?

I believe so - at least the two test functions there are now improved to use lghi/lgfi + sgrk :-)

Excellent. It would be good to add those as new test cases for this patch then.

There is one problem here: the K instructions are only available with the DistinctOps facility (i.e. starting with z196). So if that facility isn't available, we have to be able to emit the original two-address forms only.

I added back the pattern for the 2-address instruction, which did not affect codegen on z13. I think this should be guaranteed by the order of the instruction defs in the Tablegen file, but I am not quite sure (The "K" is defined before "" in the multidef).

I think it should indeed work by order in the Tablegen file. We already rely on that e.g. to match (scalar) FP vector instructions before FP instructions, I believe.
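
To make the ordering argument concrete, here is a toy model of "first listed pattern whose predicate holds wins". It is only a sketch of the assumption discussed above. The real instruction selector is generated by TableGen and is more involved, and the struct and function names here are hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical pattern record: a name and whether it needs DistinctOps.
struct Pattern {
  std::string Name;
  bool RequiresDistinctOps;
};

// Patterns are tried in definition order; the K form is listed first, so it
// is chosen whenever the facility is available, else the 2-address form.
std::string selectAdd(bool HasDistinctOps) {
  static const std::vector<Pattern> Patterns = {{"ARK", true}, {"AR", false}};
  for (const Pattern &P : Patterns)
    if (!P.RequiresDistinctOps || HasDistinctOps)
      return P.Name;
  return ""; // no pattern matched
}
```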

lib/Target/SystemZ/SystemZRegisterInfo.cpp
163	I mean this statement: CopyHints.insert(Hints.begin(), Hints.end()); which does copy the whole Hints array into a set. This may not be a big deal since the array is typically small, I just thought it could be easily avoided by indeed using something like an is_contained(Hints, PhysReg) check in the first loop. I agree we need the second loop in any case. (addHints() seems different since here we need to clear/change the existing Hints array anyway and therefore we have to make a copy somewhere.)
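
A rough sketch of the two-loop structure being discussed, with the reserved-register and already-hinted checks done in the first loop via a linear search instead of copying Hints into a set. Everything here (types, names, the plain std::vector containers) is a simplified assumption for illustration and not the code in SystemZRegisterInfo.cpp.

```cpp
#include <algorithm>
#include <vector>

using Reg = unsigned;

// Linear membership test, similar in spirit to llvm::is_contained().
static bool contains(const std::vector<Reg> &V, Reg R) {
  return std::find(V.begin(), V.end(), R) != V.end();
}

// Hints:      hints already present (e.g. COPY hints), extended in place.
// Candidates: physregs of the potentially tied operands (hypothetical input).
// Reserved:   registers that must not be hinted.
// Order:      the allocation order used to sort the new hints.
void addTwoAddrHints(std::vector<Reg> &Hints, const std::vector<Reg> &Candidates,
                     const std::vector<Reg> &Reserved,
                     const std::vector<Reg> &Order) {
  std::vector<Reg> TwoAddrHints;
  // First loop: filter out reserved, already hinted and duplicate registers.
  for (Reg R : Candidates)
    if (!contains(Reserved, R) && !contains(Hints, R) &&
        !contains(TwoAddrHints, R))
      TwoAddrHints.push_back(R);
  // Second loop: append the new hints sorted by allocation order.
  for (Reg R : Order)
    if (contains(TwoAddrHints, R))
      Hints.push_back(R);
}
```

Since the hint arrays are typically small, the linear searches should be cheap compared to building a separate set.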

@Quentin: This has the same common-code change as in D58923, with the added VRM to foldMemoryOperand(). You seemed fine with this change, right?

@uweigand:

Excellent. It would be good to add those as new test cases for this patch then.

Added as test/CodeGen/SystemZ/int-sub-11.ll

All failing tests updated to pass -- Comments:

int-add-05.ll / f9()

The MBB looks like:

256B      %12:gr64bit = LA %11:addr64bit, 0, %1:addr64bit
272B      %13:gr64bit = AGRK %12:gr64bit, %2:gr64bit, implicit-def dead $cc
288B      %14:gr64bit = AGRK %13:gr64bit, %3:gr64bit, implicit-def dead $cc
304B      %15:gr64bit = AGRK %14:gr64bit, %4:gr64bit, implicit-def dead $cc
320B      %16:gr64bit = AGRK %15:gr64bit, %5:gr64bit, implicit-def dead $cc
336B      %17:gr64bit = AGRK %16:gr64bit, %6:gr64bit, implicit-def dead $cc
352B      %18:gr64bit = AGRK %17:gr64bit, %7:gr64bit, implicit-def dead $cc
368B      %19:gr64bit = AGRK %18:gr64bit, %8:gr64bit, implicit-def dead $cc
384B      %20:gr64bit = AGRK %19:gr64bit, %9:gr64bit, implicit-def dead $cc
400B      %21:gr64bit = AGRK %20:gr64bit, %10:gr64bit, implicit-def dead $cc
416B      $r2d = COPY %21:gr64bit
432B      Return implicit $r2d

Greedy:

%21 -> $r2d
%12 -> $r0d // does not see that $r2d would be better
%13-%19 hinted $r0d
%20 hinted $r0d, $r2d
%21 hinted $r2d

%10 gets spilled, which is a loss compared to trunk.

On trunk, this looks like

256B      %14:gr64bit = LA %11:addr64bit, 0, %1:addr64bit
288B      %14:gr64bit = AGR %14:gr64bit(tied-def 0), %2:gr64bit, implicit-def dead $cc
320B      %14:gr64bit = AGR %14:gr64bit(tied-def 0), %3:gr64bit, implicit-def dead $cc
352B      %14:gr64bit = AGR %14:gr64bit(tied-def 0), %4:gr64bit, implicit-def dead $cc
384B      %14:gr64bit = AGR %14:gr64bit(tied-def 0), %5:gr64bit, implicit-def dead $cc
416B      %14:gr64bit = AGR %14:gr64bit(tied-def 0), %6:gr64bit, implicit-def dead $cc
448B      %14:gr64bit = AGR %14:gr64bit(tied-def 0), %7:gr64bit, implicit-def dead $cc
480B      %14:gr64bit = AGR %14:gr64bit(tied-def 0), %8:gr64bit, implicit-def dead $cc
512B      %14:gr64bit = AGR %14:gr64bit(tied-def 0), %9:gr64bit, implicit-def dead $cc
544B      %14:gr64bit = AGR %14:gr64bit(tied-def 0), %10:gr64bit, implicit-def dead $cc
560B      $r2d = COPY %14:gr64bit
576B      Return implicit $r2d

%14 hinted $r2d...
It seems that on trunk we get the COPY-hint ($r2d) "for free", since all those instructions operate on the same virtreg. With this patch, we fail to recognize the benefit of using $r2d for %12, since we are not searching through the def/use chains to detect the COPY-hint. I have not tried to handle this and do not know how practical/beneficial it would be, but my feeling is that it does not seem quite natural to attempt it.

ctpop-01.ll/f2() is similar, although the effect is just a K instruction that could have been avoided.
store_nonbytesized_vecs.ll: Updated by adding register variables and DAG matches, but this test case is tough to maintain... simplify?

Fixing the failing tests proved a healthy exercise, as several needed improvements were spotted and added:

foldMemoryOperandImpl() extended to handle the case where the K instruction has the same dst and LHS operands, and can therefore be converted to use a mem op.
The instructions that are commutable can be shortened after commutation if the RHS matches the dst reg. Regalloc hints are also passed for RHS in these cases.
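
The commutation case can be sketched like this, assuming a commutable 3-address instruction may have its sources swapped so that the destination matches the first source. The struct, opcode values and helper names are hypothetical, chosen only to illustrate the idea.

```cpp
#include <utility>

// Hypothetical, simplified 3-address instruction after regalloc.
struct Inst3 {
  int Opcode;
  unsigned Dst, Lhs, Rhs; // physical register numbers
  bool Commutable;
};

// Hypothetical opcode values for illustration only.
enum : int { NoOpcode = -1, AGR = 1, AGRK = 2, SGR = 3, SGRK = 4 };

// Assumed mapping helper: 3-address opcode -> 2-address opcode or NoOpcode.
int twoAddrOpcodeFor(int Opc) {
  switch (Opc) {
  case AGRK:
    return AGR;
  case SGRK:
    return SGR;
  default:
    return NoOpcode;
  }
}

// Shorten directly if Dst == Lhs, or after swapping the sources if the
// instruction is commutable and Dst == Rhs.
bool shortenWithCommute(Inst3 &MI) {
  int TwoAddr = twoAddrOpcodeFor(MI.Opcode);
  if (TwoAddr == NoOpcode)
    return false;
  if (MI.Dst != MI.Lhs) {
    if (!MI.Commutable || MI.Dst != MI.Rhs)
      return false;
    std::swap(MI.Lhs, MI.Rhs); // commute so that Dst == Lhs
  }
  MI.Opcode = TwoAddr;
  return true;
}
```

A subtract with the destination equal to the RHS is left alone in this sketch, since swapping its operands would change the result.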

These new improvements give this further change against the last version of the patch:

agrk           :                30105                 8818   -21287
agr            :                12637                33635   +20998
ark            :                14931                 5049    -9882
ar             :                 9657                19257    +9600
sgrk           :                14854                11443    -3411
ahi            :                28649                31971    +3322
sgr            :                12530                15819    +3289
ahik           :                30574                27412    -3162
nrk            :                 2198                  427    -1771
nr             :                 1270                 2978    +1708
ork            :                 2767                 1316    -1451
or             :                 1512                 2848    +1336
sll            :                14120                15228    +1108
sllk           :                 6439                 5334    -1105
srk            :                 4981                 3956    -1025
sr             :                 4877                 5784     +907
lg             :               374694               373819     -875
l              :                73998                73368     -630
srl            :                 3715                 4149     +434
Spill|Reload   :               186524               185994     -530

Against master it now looks like:

lgr            :               353422               348929    -4493
lr             :                31515                28410    -3105
ahi            :                35006                31971    -3035
sgrk           :                 8800                11443    +2643
sgr            :                18364                15819    -2545
risbhg         :                 1968                 4347    +2379
ag             :                11873                10021    -1852
aghik          :                 4616                 6454    +1838
aghi           :                39738                37970    -1768
agrk           :                 7070                 8818    +1748
aih            :                  477                 2108    +1631
ahik           :                26178                27412    +1234
risblg         :                 6803                 7749     +946
stg            :               140290               139357     -933
stfh           :                 2538                 3386     +848
sll            :                15961                15228     -733
sllk           :                 4605                 5334     +729
st             :                60347                59642     -705
lfh            :                 2215                 2723     +508
...
Spill|Reload   :               188303               185994    -2309

:-)

Herald added subscribers: eraman, javed.absar. Apr 26 2019, 11:28 AM

jonpa added inline comments. Apr 26 2019, 11:31 AM

lib/Target/SystemZ/SystemZRegisterInfo.cpp
163	Changed to use is_contained() instead.

> %12 -> $r0d // does not see that $r2d would be better

Is this a hint, or is this a choice made by the register allocator in the absence of all hints? I would have expected that there would be no hint due to the LA, but the hints propagate backwards from the end of the AGRK chain ...

But in any case, this doesn't seem to be a big deal. Otherwise, this all looks quite good to me ...

In D60888#1480819, @uweigand wrote:
> %12 -> $r0d // does not see that $r2d would be better
Is this a hint, or is this a choice made by the register allocator in the absence of all hints? I would have expected that there would be no hint due to the LA, but the hints propagate backwards from the end of the AGRK chain ...

But in any case, this doesn't seem to be a big deal. Otherwise, this all looks quite good to me ...

This is just the choice made without any hints.

As explained previously, there is no propagation of hints from the COPY because that type of search is not performed. Not sure if I should attempt that...

In D60888#1480854, @jonpa wrote:

As explained previously, there is no propagation of hints from the COPY because that type of search is not performed. Not sure if I should attempt that...

Well, I wasn't sure how the hinting mechanism iterates. I'd have thought the following might be possible:

First, because of this instruction:

560B      $r2d = COPY %21:gr64bit

register %21 is hinted as $r2d.

Because of that, and this instruction:

400B      %21:gr64bit = AGRK %20:gr64bit, %10:gr64bit, implicit-def dead $cc

we now get a new hint for register %20 as $r2d

... and so forth backwards through the AGRK chain.
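
The backward propagation described here could look roughly like the following. To be clear, this is a hypothetical illustration of the idea only. Nothing in this thread says the register allocator does this, and the map-based representation and function name are assumptions made for the example.

```cpp
#include <map>

// DefToLhs maps the vreg defined by each 3-address instruction to the vreg
// used as its LHS operand (hypothetical, simplified view of the AGRK chain).
// Starting from the COPY source, walk the chain backwards and give every
// vreg on it the same physreg hint. Returns vreg -> hinted physreg.
std::map<unsigned, unsigned>
propagateHintBackwards(const std::map<unsigned, unsigned> &DefToLhs,
                       unsigned CopySrc, unsigned PhysReg) {
  std::map<unsigned, unsigned> Hints;
  unsigned V = CopySrc;
  Hints[V] = PhysReg;
  for (auto It = DefToLhs.find(V); It != DefToLhs.end();
       It = DefToLhs.find(V)) {
    V = It->second;
    // Stop if this vreg was already hinted (guards against cycles).
    if (!Hints.emplace(V, PhysReg).second)
      break;
  }
  return Hints;
}
```

With the chain from the example (%13 defined from %12, up to %21 defined from %20) and the COPY of %21 into $r2d, every vreg from %12 to %21 would end up with the $r2d hint.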

Well, I wasn't sure how the hinting mechanism iterates. I'd have thought the following might be possible:

First, because of this instruction:
560B      $r2d = COPY %21:gr64bit
register %21 is hinted as $r2d.

Because of that, and this instruction:
400B      %21:gr64bit = AGRK %20:gr64bit, %10:gr64bit, implicit-def dead $cc
we now get a new hint for register %20 as $r2d

... and so forth backwards through the AGRK chain.

Sorry if I was not clear in my description, but what I meant to illustrate is that the RA allocates the VRegs in the order I listed them. So first %21 is allocated $r2d, and then the next VReg assigned is %12, and then %13, %14, ..., %20, %21. So it seems to me that in order for %12 to be hinted $r2d, getRegAllocationHints() would have to try to find that COPY hint for %21, presumably by following the chain of 3->2 address convertible instruction uses that lead to it. In this case the allocation order was such that it gave "bad luck" and did not propagate this naturally.

Sorry if I was not clear in my description, but what I meant to illustrate is that the RA allocates the VRegs in the order I listed them. So first %21 is allocated $r2d, and then the next VReg assigned is %12, and then %13, %14, ..., %20, %21.

I guess my question is, why is RA allocating VRegs in this particular order? If it is first allocating %21, it seems there is some understanding that it makes sense to allocate it since we have a hint. But why is then %12 assigned next? The allocation of %21 should have made a new hint available for %20, so wouldn't it make more sense to now attempt to allocate %20 next?

In D60888#1480961, @uweigand wrote:

Sorry if I was not clear in my description, but what I meant to illustrate is that the RA allocates the VRegs in the order I listed them. So first %21 is allocated $r2d, and then the next VReg assigned is %12, and then %13, %14, ..., %20, %21.

I guess my question is, why is RA allocating VRegs in this particular order? If it is first allocating %21, it seems there is some understanding that it makes sense to allocate it since we have a hint. But why is then %12 assigned next? The allocation of %21 should have made a new hint available for %20, so wouldn't it make more sense to now attempt to allocate %20 next?

This makes sense to me, except I can't find anything in RegAllocGreedy that does this. What I see is that

1. calculateSpillWeightsAndHints() finds the (multiple) COPY hints.
2. seedLiveRegs() calls enqueue() on each LiveInterval which has

void RAGreedy::enqueue(PQueue &CurQueue, LiveInterval *LI) {
...
    // Boost ranges that have a physical register hint.
    if (VRM->hasKnownPreference(Reg))
      Prio |= (1u << 30);
...
}

, so any VirtReg with a hint (mapped to a physreg) gets a higher priority and is allocated earlier.

2b. SystemZ::getRegAllocationHints() is called when finding the AllocationOrder for the VirtReg being allocated by selectOrSplit(). This is done by common code looking at the previously added COPY/Target hints. Then the SystemZ method also adds hints for LOCR etc...

3. I was looking for something that, when a VirtReg_A has been allocated, recomputes the priorities of the VirtRegs that are hinting VirtReg_A (by calling dequeue() + enqueue() on them). This is however not done. Not really sure how feasible this would be beyond the simple test case...

4. *If* (3) would be done, it would be (at least in this simple example) enough to add a hint for VirtReg_A at some point before allocatePhysRegs(). In our test case:

256B      %12:gr64bit = LA %11:addr64bit, 0, %1:addr64bit
272B      %13:gr64bit = AGRK %12:gr64bit, %2:gr64bit, implicit-def dead $cc
288B      %14:gr64bit = AGRK %13:gr64bit, %3:gr64bit, implicit-def dead $cc
304B      %15:gr64bit = AGRK %14:gr64bit, %4:gr64bit, implicit-def dead $cc
320B      %16:gr64bit = AGRK %15:gr64bit, %5:gr64bit, implicit-def dead $cc
336B      %17:gr64bit = AGRK %16:gr64bit, %6:gr64bit, implicit-def dead $cc
352B      %18:gr64bit = AGRK %17:gr64bit, %7:gr64bit, implicit-def dead $cc
368B      %19:gr64bit = AGRK %18:gr64bit, %8:gr64bit, implicit-def dead $cc
384B      %20:gr64bit = AGRK %19:gr64bit, %9:gr64bit, implicit-def dead $cc
400B      %21:gr64bit = AGRK %20:gr64bit, %10:gr64bit, implicit-def dead $cc
416B      $r2d = COPY %21:gr64bit

, we could add a simple hint of %21 for %20, and I think it would resolve. This would not be done by SystemZRegisterInfo::getRegAllocationHints(), but as done by other targets before the allocation actually begins somehow.

I am however not sure this is truly satisfactory. In the general case we could hint %21, %19 and %9 for %20, but VirtRegMap::hasKnownPreference() only checks for the first hint:

bool VirtRegMap::hasKnownPreference(unsigned VirtReg) {
  std::pair<unsigned, unsigned> Hint = MRI->getRegAllocationHint(VirtReg);
...
  if (TargetRegisterInfo::isVirtualRegister(Hint.second))
    return hasPhys(Hint.second);
...
}

(3) + (4) seems like a potential improvement to me, but I am not sure about the best way to proceed. Should we try just a single target hint for these instructions, and if so, which register?

Should this perhaps wait a while as a follow-up patch, since this seems like non-trivial common-code changes may be involved?
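
To summarize the queueing behavior described above, here is a simplified model. It assumes the priority is computed once at enqueue time and is boosted only if the first hint already maps to a physreg. The struct, the base priorities and the function names are hypothetical and much simpler than RAGreedy.

```cpp
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical view of a virtual register at enqueue time.
struct VReg {
  unsigned Id;
  uint32_t BasePrio;         // e.g. derived from the live range size
  bool FirstHintIsAssigned;  // models hasKnownPreference() on the first hint
};

// Boost ranges that have a known physical register preference.
uint32_t priorityOf(const VReg &R) {
  uint32_t Prio = R.BasePrio;
  if (R.FirstHintIsAssigned)
    Prio |= (1u << 30);
  return Prio;
}

// Priorities are computed once at enqueue time and never refreshed, so a
// vreg whose hint becomes available later keeps its old position.
std::vector<unsigned> allocationOrder(const std::vector<VReg> &Regs) {
  std::priority_queue<std::pair<uint32_t, unsigned>> Queue;
  for (const VReg &R : Regs)
    Queue.push({priorityOf(R), R.Id});
  std::vector<unsigned> Order;
  while (!Queue.empty()) {
    Order.push_back(Queue.top().second);
    Queue.pop();
  }
  return Order;
}
```

In this model %21 (hinted by the COPY) is allocated first, but %20 is not moved forward afterwards, which matches the "bad luck" ordering seen in the test case.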

Removing the improvement in foldMemoryOperandImpl().

I saw that the machine verifier reported "Bad machine code" when two tied register operands did not have the same register. These were different virtual registers that were to be allocated to the same physreg. I then wrote a fix to make the operands legal: add the LHS register as an implicit use, and then set the LHS MachineOperand register to that of the Dst operand.

I then found that one (!) instruction on SPEC had been scheduled differently, while all the instructions and registers were the same.

It was actually the case that even though the two virtual registers were allocated to the same physreg at the point of foldMemoryOperandImpl(), these registers seemed to not be the same after regalloc. After some consideration it seemed obvious to me that we actually can't trust the VRM mapping to remain the same after spilling, since registers can be evicted. In my small example all spilling happened after all other allocations, but that's not true with bigger functions.

The difference on spec seems to be:

lg             :               373819               374447     +628
l              :                73368                73988     +620
ag             :                10021                 9723     -298
agr            :                33635                33933     +298
a              :                13531                13245     -286
ar             :                19257                19541     +284
ng             :                 3115                 2929     -186
ngr            :                 2794                 2980     +186
sg             :                 6722                 6591     -131
sgr            :                15819                15949     +130
or             :                 2848                 2975     +127
o              :                 1883                 1756     -127
sr             :                 5784                 5904     +120
s              :                 1389                 1270     -119
nr             :                 2978                 3041      +63
n              :                 1995                 1932      -63
st             :                59642                59666      +24
lr             :                28410                28394      -16
lfh            :                 2723                 2734      +11
...

From this list I would estimate that some ~1250 loads remain unfolded due to this change. Currently ~10 tests are failing, but I am not sure if I should try to fold these loads in some later pass (SystemZShortenInst.cpp?), or if I should update the tests and wait with this.

...it is unlikely we can fix this later, since RA will have allocated an additional register to load the spilled value into, and this will have pessimized the whole function (since we must already have register pressure, otherwise we wouldn't have a spilled value in the first place)...

I got an idea of one way to handle this which may work:

Insert a COPY before the new reg/mem instruction:

%20:gr64bit = AG %19:gr64bit(tied-def 0), %stack.0, 0, $noreg, implicit-def dead $cc
=>
%20:gr64bit = COPY %19:gr64bit
%20:gr64bit = AG %20:gr64bit(tied-def 0), %stack.0, 0, $noreg, implicit-def dead $cc

If %19 and %20 do end up in the same physreg (which should be the very common case), VirtRegMap will remove the Identity COPY. If not, there will be a COPY instead of a reload, which would then hopefully also be an improvement.
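
The expectation that identity COPYs disappear can be illustrated with a small model of rewriting virtual to physical registers. This is only a sketch of the idea with hypothetical types and names, not the VirtRegMap rewriter itself.

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical COPY between two virtual registers.
struct CopyInst {
  unsigned DstVReg;
  unsigned SrcVReg;
};

// Rewrite each COPY to physical registers and drop it when source and
// destination end up in the same physreg. Returns the surviving copies as
// (dst physreg, src physreg) pairs.
std::vector<std::pair<unsigned, unsigned>>
rewriteCopies(const std::vector<CopyInst> &Copies,
              const std::map<unsigned, unsigned> &VirtToPhys) {
  std::vector<std::pair<unsigned, unsigned>> Remaining;
  for (const CopyInst &C : Copies) {
    unsigned Dst = VirtToPhys.at(C.DstVReg);
    unsigned Src = VirtToPhys.at(C.SrcVReg);
    if (Dst != Src) // identity copies are removed
      Remaining.push_back({Dst, Src});
  }
  return Remaining;
}
```

If %19 and %20 share a physreg, the inserted COPY vanishes and only the reg/mem instruction remains. Otherwise a register copy is left in place of the reload.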

However, when trying this, I got machine verifier errors. The LiveIntervals need to be updated, but I did not find a simple way to do that. InlineSpiller is the caller here, and it is taking care to insert any newly created instructions into the SlotIndexes maps. I therefore find it reasonable to have it also update the LiveIntervals since otherwise the backend would have to first insert them, update LIS, and then remove them, which doesn't seem right. I tried:

diff --git a/lib/CodeGen/InlineSpiller.cpp b/lib/CodeGen/InlineSpiller.cpp
index 9ed524e..ee8bf2d 100644
--- a/lib/CodeGen/InlineSpiller.cpp
+++ b/lib/CodeGen/InlineSpiller.cpp
@@ -872,9 +872,19 @@ foldMemoryOperand(ArrayRef<std::pair<MachineInstr *, unsigned>> Ops,
 
   // Insert any new instructions other than FoldMI into the LIS maps.
   assert(!MIS.empty() && "Unexpected empty span of instructions!");
+  SmallVector<unsigned, 4> NewSpanRegs;
+  std::set<unsigned> SeenRegs;
   for (MachineInstr &MI : MIS)
-    if (&MI != FoldMI)
+    if (&MI != FoldMI) {
       LIS.InsertMachineInstrInMaps(MI);
+      for (const MachineOperand &MO : MI.operands()) {
+        if (MO.isReg() && SeenRegs.insert(MO.getReg()).second)
+          NewSpanRegs.push_back(MO.getReg());
+      }
+    }
+
+  LIS.repairIntervalsInRange(FoldMI->getParent(), MIS.begin(), MIS.end(),
+                             NewSpanRegs);

LIS.repairIntervalsInRange() seems to be under development, so I suspect this can be improved to help us here.

When trying to build SPEC without the verifier, all but two failed, which may indicate that most of the COPYs are removed...

Currently, it seems this might have to wait then as a later improvement - this patch is looking like an improvement also without it.

@Quentin: This has the same common-code change as in D58923, with the added VRM to foldMemoryOperand(). You seemed fine with this change, right?

Correct!

Thanks Jonas.

In D60888#1489952, @qcolombet wrote:

@Quentin: This has the same common-code change as in D58923, with the added VRM to foldMemoryOperand(). You seemed fine with this change, right?

Correct!

Thanks Jonas.

Thanks for review, Quentin. Do you have any comment / suggestion with regards to my previous thoughts on improving InlineSpiller to repair live intervals (see my previous comment)?

I think this is good for now. We can work on further improving folded reloads as a follow-on. LGTM.

test/CodeGen/SystemZ/asm-18.ll
606	Should update this comment now.

This revision is now accepted and ready to land. May 31 2019, 12:21 PM

jonpa mentioned this in D62803: [SystemZ] Handle 3-address instructions in foldMemoryOperandImpl(). Jun 3 2019, 1:33 AM

D62803 merged into this patch.

I'm a bit confused about the mapping logic. For the case of e.g. ADD, we today have
AR ---> gets mapped to A by getMemOpcode
and
ARK --> no mapping via getMemOpcode
I would have expected the first mapping to stay as is, and a new second mapping of ARK to some pseudo A_MemFoldPseudo.
Instead, it seems you're redirecting the mapping of AR to A_MemFoldPseudo, if I'm reading the patch correctly. Why is this? If you already have an AR, you already have a two-operand form, so it can just be modified to A. The problem is as long as you have an ARK ...

I think you are right - that seems to be a better way to do it, so I changed it. I guess I was generally aiming for a uniform way of handling all INSN<R> -> INSN transformations, but I see now that there is no point in doing so.
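To make the agreed mapping concrete, here is a minimal sketch of a lookup table with the two entries discussed above. It is only an illustration: the real mappings are generated by TableGen, and `memOpcodeTable` and `getMemOpcodeName` are hypothetical names that work on opcode strings rather than opcode enums.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical hand-written table mirroring the mapping discussed above.
const std::map<std::string, std::string> &memOpcodeTable() {
  static const std::map<std::string, std::string> Table = {
      {"AR", "A"},                // two-address reg form -> real mem form
      {"ARK", "A_MemFoldPseudo"}, // three-address reg form -> pseudo
  };
  return Table;
}

// Return the memory-form opcode name, or an empty string if there is none.
std::string getMemOpcodeName(const std::string &Opcode) {
  auto It = memOpcodeTable().find(Opcode);
  return It == memOpcodeTable().end() ? std::string() : It->second;
}
```

With this shape, an existing AR still folds directly to A, and only ARK goes through the pseudo.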

I may have forgotten, but could you explain again why we need (yet another) new pass for this, instead of just expanding the pseudo in one of the existing pseudo-expansion passes?

The reason this is needed is that the MachineCopyPropagation pass will be free to replace any physical registers in the pseudo instructions, and it is run before any of the later pseudo-expansion passes.

This is the new pass that will also handle the Add/Sub/Compare "high" pseudos. When that patch lands the later pseudo-expansion pass will be removed (see D58923).
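The ordering constraint can be sketched with a toy pipeline builder. This is an assumption-laden illustration, not TargetPassConfig: `ToyPassConfig`, `ToySystemZPassConfig` and the pass-name strings are hypothetical, and only the relative order (hook after rewriting, before copy propagation) is taken from the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a pass pipeline builder; names are illustrative only.
class ToyPassConfig {
public:
  virtual ~ToyPassConfig() = default;

  std::vector<std::string> build() {
    Passes.clear();
    Passes.push_back("RegAllocAndRewrite");
    addPostRewrite(); // target hook: runs before copy propagation
    Passes.push_back("MachineCopyPropagation");
    Passes.push_back("MachineLICM");
    return Passes;
  }

protected:
  // Default: no extra passes.
  virtual void addPostRewrite() {}
  std::vector<std::string> Passes;
};

// A target that expands its pseudos right after rewriting.
class ToySystemZPassConfig : public ToyPassConfig {
protected:
  void addPostRewrite() override { Passes.push_back("SystemZPostRewrite"); }
};
```

Because the hook sits before copy propagation, the pseudos are already expanded by the time physical registers may be replaced.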

Herald added a subscriber: mgorny. Jun 3 2019, 11:41 PM

jonpa requested review of this revision. Jun 3 2019, 11:42 PM

Looking mostly good now, thanks! Just a few remaining questions/comments inline ...

lib/Target/SystemZ/SystemZInstrFormats.td
110	I'd prefer if you let that stay where it is, and move the new pseudos to the appropriate places in the pseudo section.
3417	Shouldn't the Y variant also be mapped to a pseudo?
test/CodeGen/SystemZ/int-add-05.ll
100	Just to clarify: even with the new MemFoldPseudos this is still suboptimal? Why is that?

I did SPEC runs overnight on both 2006 and 2017, see summary-files:

results.190605.z14 (13 KB)

results.190605.z13 (13 KB)

On z14, this looks good.

I see two regressions on z13 (i541.leela_r and i557.xz_r). I checked quickly without the latest foldMemoryOperandImpl() changes and found that the regression was still there just the same.

I tried rebuilding with one file at a time taken from master:

i541.leela_r:

Matcher.s: regression mostly disappears (5% -> 1.4%)

Opcodes:
master <> patched
lgr            :                   53                   51       -2
sgrk           :                    1                    3       +2
sgr            :                    5                    3       -2
agrk           :                    0                    1       +1
agr            :                    3                    2       -1

FastBoard.s regression mostly disappears (5% -> 1.7%)

Both of these files have two of the hotter functions (~11% of ticks), but since this is resolved by replacing just *one* of them with unpatched version, it's hard to get a hold of this. Also, this benchmark only runs for half a second...

i557.xz_r:

lz_encoder_mf.s: regression entirely disappears (1.052% -> 0.99.615%)

Opcodes:
master <> patched
ahi            :                   64                   45      -19
ahik           :                   67                   85      +18
lr             :                   97                   79      -18
risbhg         :                    0                   14      +14
lg             :                  112                  125      +13      <<<
agr            :                   34                   24      -10
l              :                  213                  203      -10
srk            :                   30                   21       -9
sr             :                   33                   41       +8
ag             :                    6                   14       +8
llihl          :                   10                    3       -7
sgrk           :                    0                    5       +5
sg             :                    4                    0       -4
ar             :                   24                   28       +4
st             :                  188                  184       -4
j              :                   59                   55       -4
clrjl          :                   20                   16       -4
ark            :                   22                   18       -4
chi            :                    3                    0       -3
...

This looks to be a file that has increased reloads (+8), so this could be one of the files that suffers from this, as per my mail of 6th of May. Back then xz_r was a ~2% regression with this patch. Right now it seems to be a 4-5% regression. I don't think the patch itself has changed since then.

lib/Target/SystemZ/SystemZInstrFormats.td
110	I can't just move down MemFoldPseudo and keep the new multiclasses above it (won't compile), so I am not sure what to do other than moving up the Pseudo definition. Alternatives are to move down the new multiclasses (but then we would have a multiclass with a target instruction in the Pseudo section), or to define all those MemFoldPseudos in the InstrInfo.td file, which seems less clean.
3417	I thought the "Y" reg/mem mapping was dead since there is no "YR" opcode. So in BinaryRXPair, the instruction A (unpatched) creates an AR->"mem" entry with BinaryRX and an AYR->"mem" entry with BinaryRXY. There is however no AYR->"reg" entry, right? I think that the reason only the 12-bit displacement instructions are needed is that at this point a FrameIndex operand is added. This is then later handled in SystemZRegisterInfo::eliminateFrameIndex(), where the actual offset is checked.
test/CodeGen/SystemZ/int-add-05.ll
100	The MemFoldPseudos are not improving anything compared to before, they are just making the IR legal, as well as handling a few rare cases by inserting COPYs before when needed. In the case where a MemFoldPseudo actually ended up getting the dst and LHS regs to be the same after regalloc, all that is needed is a lowering to the target instruction. In the case where a MemFoldPseudo had dst and LHS allocated to the same physreg at the point of foldMemoryOperandImpl(), but this was later changed by an eviction or so, a COPY of LHS to dst is also needed during lowering. In the case where dst and LHS were allocated different regs to begin with, the folding cannot occur, which is why this test case fails (discussed here on 26th of April).
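The two lowering cases described in the last comment can be sketched as follows. This is a simplified model under stated assumptions, not the SystemZPostRewrite code: `ToyInst` and `lowerMemFoldPseudo` are hypothetical, registers are plain integers, and the memory operand is left out. The third case (dst and LHS in different registers at fold time, where no fold happens) is not modelled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified instruction record: an opcode name plus a
// destination register and one register source.
struct ToyInst {
  std::string Opcode;
  unsigned Dst;
  unsigned Src;
};

// Lower "Dst = Pseudo LHS, [mem]" to a two-address memory instruction.
// The target instruction reads and writes Dst, so LHS must be in Dst first.
std::vector<ToyInst> lowerMemFoldPseudo(const std::string &TargetOpcode,
                                        unsigned Dst, unsigned LHS) {
  std::vector<ToyInst> Out;
  if (Dst != LHS)
    Out.push_back({"COPY", Dst, LHS}); // assignments diverged after folding
  Out.push_back({TargetOpcode, Dst, Dst});
  return Out;
}
```

For example, `lowerMemFoldPseudo("AG", 2, 2)` yields a single AG, while `lowerMemFoldPseudo("AG", 2, 3)` yields a COPY followed by AG.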

uweigand added inline comments. Jun 5 2019, 12:44 PM

lib/Target/SystemZ/SystemZInstrFormats.td
110	OK, then please move them to the very end, in a new section entitled something like "Multiclasses that emit both real and pseudo instructions"
3417	I see. This looks fine then.
test/CodeGen/SystemZ/int-add-05.ll
100	Ah, I thought the MemFoldPseudo classes would allow folding in the case where dst and LHS were allocated different regs to begin with! Why don't they? I thought this would fold to a pseudo three-operand add-from-memory, which later gets lowered to a COPY of the register LHS to dst followed by the real two-operand add-from-memory?

Hi Jonas,

Do you have any comment / suggestion with regards to my previous thoughts on improving InlineSpiller to repair live intervals (see my previous comment)?

Thanks for your patience, I missed that one.
Those are good ideas.

Cheers,
-Quentin

Move down new multiclasses to new section.

jonpa added inline comments. Jun 6 2019, 12:30 AM

lib/Target/SystemZ/SystemZInstrFormats.td
110	OK. Perhaps we should also move down StringRRE to that section?
test/CodeGen/SystemZ/int-add-05.ll
100	I guess I was expecting "Load + Op(Reg)" be better than "COPY + Op(Mem)", but I really don't know. I tried to remove this restriction on SPEC 2006 and found these opcode differences:

lg             :               371908               370637    -1271
lgr            :               349230               350477    +1247
ag             :                11716                12786    +1070
agr            :                32345                31346     -999
l              :                72751                71972     -779
a              :                13718                14396     +678
ar             :                19560                18886     -674
lr             :                28345                28960     +615
sg             :                 6715                 6862     +147
sgrk           :                11729                11635      -94
agrk           :                 8730                 8658      -72
sgr            :                15302                15249      -53
s              :                 1370                 1416      +46
o              :                 1924                 1964      +40
or             :                 2780                 2742      -38
srk            :                 3730                 3693      -37
ngr            :                 2811                 2780      -31
ng             :                 3117                 3148      +31
nr             :                 2933                 2917      -16
...

This would also handle this test case to do the fold while requiring one more lgr. Would this be better?

uweigand added inline comments. Jun 6 2019, 12:39 PM

lib/Target/SystemZ/SystemZInstrFormats.td
110	Good catch; yes, StringRRE as well as MemorySS / CompareMemorySS ought to be moved there as well.
test/CodeGen/SystemZ/int-add-05.ll
100	It is an interesting question whether LGR/AG is in general better or worse (or the same) than LG/AGR. Even if they are the same hardware-wise, I guess there might still be differences w.r.t. LLVM register allocation ... When you make that change, do you see any performance differences / changes to those regressions you mention above?

More multiclasses moved down to new section in SystemZInstrFormats.td

jonpa added inline comments. Jun 7 2019, 8:49 AM

test/CodeGen/SystemZ/int-add-05.ll
100	During quick preliminary benchmarking (14 x 3 runs per benchmark during the day):

z13: (leela_r was down again to a 2.5% regression (without any rebase, and with same build). xz_r was helped ~1%, see below.

Effects of not requiring Dst/LHS to be the same, compared to patch with dst/lhs required to be the same:

2017 (Average: 99.969%):
Improvements:
0.992: i557.xz_r
0.996: i525.x264_r
0.997: f507.cactuBSSN_r
0.997: i541.leela_r
Regressions:
1.004: i531.deepsjeng_r
1.003: f511.povray_r
1.003: i500.perlbench_r

2006 (Average: 99.983%):
Improvements
0.975: f436.cactusADM
0.992: i456.hmmer
0.996: f453.povray
Regressions
1.011: f470.lbm
1.009: i464.h264ref
1.005: f454.calculix
1.004: i473.astar
1.003: f435.gromacs

z14:

2017 (Average: 100.126%):
Improvements
0.991: f519.lbm_r
Regressions
1.010: f511.povray_r
1.009: i500.perlbench_r
1.008: i523.xalancbmk_r
1.005: f507.cactuBSSN_r
1.005: f508.namd_r
1.005: i541.leela_r

2006 (Average: 99.820%)
Improvements
0.988: f470.lbm
0.989: f435.gromacs
0.992: f447.dealII
0.994: f481.wrf
0.996: f436.cactusADM
0.997: i401.bzip2
Regressions
1.008: i400.perlbench
1.003: i458.sjeng

The regalloc effects of this seem to be very marginal according to some stats (z13):

2006 / ThreeAddr
43688 regalloc - Number of spill slots allocated
57280 regalloc - Number of spilled live ranges
1794 regalloc - Number of spilled snippets
52901 regalloc - Number of spills inserted
1577 regalloc - Number of spills removed

2006 / ThreeAddr + disable_dstlhs_check
43686 regalloc - Number of spill slots allocated
57278 regalloc - Number of spilled live ranges
1795 regalloc - Number of spilled snippets
52899 regalloc - Number of spills inserted
1575 regalloc - Number of spills removed

2017 / ThreeAddr
138380 regalloc - Number of spill slots allocated
182928 regalloc - Number of spilled live ranges
9532 regalloc - Number of spilled snippets
177398 regalloc - Number of spills inserted
5050 regalloc - Number of spills removed

2017 / ThreeAddr + disable_dstlhs_check
138382 regalloc - Number of spill slots allocated
182928 regalloc - Number of spilled live ranges
9509 regalloc - Number of spilled snippets
177387 regalloc - Number of spills inserted
5057 regalloc - Number of spills removed

All in all, it doesn't seem to matter much, but possibly it is better to skip this requirement as you expected. I can see the point of not needing that extra register to do the reload with...

At this point, this looks good to me. Thanks!

test/CodeGen/SystemZ/int-add-05.ll
100	> All in all, it doesn't seem to matter much, but possibly it is better to skip this requirement as you expected. I can see the point of not needing that extra register to do the reload with...

You shouldn't really need an extra register, since the original destination register will always be free at this point, so the allocator should be able to choose it. I was just wondering whether the additional allocation has any secondary effects in the allocator, but apparently not. Given the information I received from our hardware folks that LG/AGR is preferable from their point of view over LGR/AG, and your performance results that show not much difference, I now think we should leave the patch as-is.

This revision is now accepted and ready to land. Jun 7 2019, 1:36 PM

Thanks for review! Committed as r362868.

Note: test/CodeGen/SystemZ/codegenprepare-splitstore.ll was no longer changed by this patch after r362471, which changed that test to work on the IR instead of MIR.
Note: r362869 was committed immediately after to fix the CMake file to have the new SystemZPostRewrite.cpp file in alphabetical order.

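Revision Contents

Path	Size
include/llvm/CodeGen/TargetInstrInfo.h	9 lines
include/llvm/CodeGen/TargetPassConfig.h	5 lines
lib/CodeGen/InlineSpiller.cpp	2 lines
lib/CodeGen/TargetInstrInfo.cpp	5 lines
lib/CodeGen/TargetPassConfig.cpp	4 lines
lib/Target/AArch64/AArch64InstrInfo.h	3 lines
lib/Target/AArch64/AArch64InstrInfo.cpp	2 lines
lib/Target/SystemZ/CMakeLists.txt	1 line
lib/Target/SystemZ/SystemZ.h	1 line
lib/Target/SystemZ/SystemZInstrFormats.td	108 lines
lib/Target/SystemZ/SystemZInstrInfo.h	8 lines
lib/Target/SystemZ/SystemZInstrInfo.cpp	118 lines
lib/Target/SystemZ/SystemZInstrInfo.td	28 lines
lib/Target/SystemZ/SystemZPostRewrite.cpp	124 lines
lib/Target/SystemZ/SystemZRegisterInfo.cpp	48 lines
lib/Target/SystemZ/SystemZShortenInst.cpp	25 lines
lib/Target/SystemZ/SystemZTargetMachine.cpp	10 lines
lib/Target/X86/X86InstrInfo.h	3 lines
lib/Target/X86/X86InstrInfo.cpp	3 lines
test/CodeGen/SystemZ/asm-18.ll	6 lines
test/CodeGen/SystemZ/codegenprepare-splitstore.ll	4 lines
(file name missing)	26 lines
(file name missing)	8 lines
(file name missing)	22 lines
(file name missing)	28 lines
test/CodeGen/SystemZ/store_nonbytesized_vecs.ll	38 lines
test/CodeGen/SystemZ/vec-combine-02.ll	2 lines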

Diff 202861

include/llvm/CodeGen/TargetInstrInfo.h

[...]
#include "llvm/CodeGen/MachineBasicBlock.h"		#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineCombinerPattern.h"		#include "llvm/CodeGen/MachineCombinerPattern.h"
#include "llvm/CodeGen/MachineFunction.h"		#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstr.h"		#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineLoopInfo.h"		#include "llvm/CodeGen/MachineLoopInfo.h"
#include "llvm/CodeGen/MachineOperand.h"		#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/CodeGen/MachineOutliner.h"		#include "llvm/CodeGen/MachineOutliner.h"
#include "llvm/CodeGen/PseudoSourceValue.h"		#include "llvm/CodeGen/PseudoSourceValue.h"
		#include "llvm/CodeGen/VirtRegMap.h"
#include "llvm/MC/MCInstrInfo.h"		#include "llvm/MC/MCInstrInfo.h"
#include "llvm/Support/BranchProbability.h"		#include "llvm/Support/BranchProbability.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include <cassert>		#include <cassert>
#include <cstddef>		#include <cstddef>
#include <cstdint>		#include <cstdint>
#include <utility>		#include <utility>
#include <vector>		#include <vector>
[...]	public:
virtual bool isSubregFoldable() const { return false; }		virtual bool isSubregFoldable() const { return false; }

/// Attempt to fold a load or store of the specified stack		/// Attempt to fold a load or store of the specified stack
/// slot into the specified machine instruction for the specified operand(s).		/// slot into the specified machine instruction for the specified operand(s).
/// If this is possible, a new instruction is returned with the specified		/// If this is possible, a new instruction is returned with the specified
/// operand folded, otherwise NULL is returned.		/// operand folded, otherwise NULL is returned.
/// The new instruction is inserted before MI, and the client is responsible		/// The new instruction is inserted before MI, and the client is responsible
/// for removing the old instruction.		/// for removing the old instruction.
		/// If VRM is passed, the assigned physregs can be inspected by target to
		/// decide on using an opcode (note that those assignments can still change).
MachineInstr *foldMemoryOperand(MachineInstr &MI, ArrayRef<unsigned> Ops,		MachineInstr *foldMemoryOperand(MachineInstr &MI, ArrayRef<unsigned> Ops,
int FI,		int FI,
LiveIntervals *LIS = nullptr) const;		LiveIntervals *LIS = nullptr,
		VirtRegMap *VRM = nullptr) const;

/// Same as the previous version except it allows folding of any load and		/// Same as the previous version except it allows folding of any load and
/// store from / to any address, not just from a specific stack slot.		/// store from / to any address, not just from a specific stack slot.
MachineInstr *foldMemoryOperand(MachineInstr &MI, ArrayRef<unsigned> Ops,		MachineInstr *foldMemoryOperand(MachineInstr &MI, ArrayRef<unsigned> Ops,
MachineInstr &LoadMI,		MachineInstr &LoadMI,
LiveIntervals *LIS = nullptr) const;		LiveIntervals *LIS = nullptr) const;

/// Return true when there is potentially a faster code sequence		/// Return true when there is potentially a faster code sequence
[...]	protected:
/// Target-independent code in foldMemoryOperand will		/// Target-independent code in foldMemoryOperand will
/// take care of adding a MachineMemOperand to the newly created instruction.		/// take care of adding a MachineMemOperand to the newly created instruction.
/// The instruction and any auxiliary instructions necessary will be inserted		/// The instruction and any auxiliary instructions necessary will be inserted
/// at InsertPt.		/// at InsertPt.
virtual MachineInstr *		virtual MachineInstr *
foldMemoryOperandImpl(MachineFunction &MF, MachineInstr &MI,		foldMemoryOperandImpl(MachineFunction &MF, MachineInstr &MI,
ArrayRef<unsigned> Ops,		ArrayRef<unsigned> Ops,
MachineBasicBlock::iterator InsertPt, int FrameIndex,		MachineBasicBlock::iterator InsertPt, int FrameIndex,
LiveIntervals *LIS = nullptr) const {		LiveIntervals *LIS = nullptr,
		VirtRegMap *VRM = nullptr) const {
return nullptr;		return nullptr;
}		}

/// Target-dependent implementation for foldMemoryOperand.		/// Target-dependent implementation for foldMemoryOperand.
/// Target-independent code in foldMemoryOperand will		/// Target-independent code in foldMemoryOperand will
/// take care of adding a MachineMemOperand to the newly created instruction.		/// take care of adding a MachineMemOperand to the newly created instruction.
/// The instruction and any auxiliary instructions necessary will be inserted		/// The instruction and any auxiliary instructions necessary will be inserted
/// at InsertPt.		/// at InsertPt.
[...]

include/llvm/CodeGen/TargetPassConfig.h

[...]	protected:
///		///
/// Note if the target overloads addRegAssignAndRewriteOptimized, this may not		/// Note if the target overloads addRegAssignAndRewriteOptimized, this may not
/// be honored. This is also not generally used for the the fast variant,		/// be honored. This is also not generally used for the the fast variant,
/// where the allocation and rewriting are done in one pass.		/// where the allocation and rewriting are done in one pass.
virtual bool addPreRewrite() {		virtual bool addPreRewrite() {
return false;		return false;
}		}

		/// Add passes to be run immediately after virtual registers are rewritten
		/// to physical registers. These passes may replace an MI with a new one,
		/// but should preserve SlotIndexes while doing so.
		virtual void addPostRewrite() { }

/// This method may be implemented by targets that want to run passes after		/// This method may be implemented by targets that want to run passes after
/// register allocation pass pipeline but before prolog-epilog insertion.		/// register allocation pass pipeline but before prolog-epilog insertion.
virtual void addPostRegAlloc() { }		virtual void addPostRegAlloc() { }

/// Add passes that optimize machine instructions after register allocation.		/// Add passes that optimize machine instructions after register allocation.
virtual void addMachineLateOptimization();		virtual void addMachineLateOptimization();

/// This method may be implemented by targets that want to run passes after		/// This method may be implemented by targets that want to run passes after
[...]

lib/CodeGen/InlineSpiller.cpp

[...]	foldMemoryOperand(ArrayRef<std::pair<MachineInstr *, unsigned>> Ops,
// Moreover, TargetInstrInfo::foldMemoryOperand will assert if we try!		// Moreover, TargetInstrInfo::foldMemoryOperand will assert if we try!
if (FoldOps.empty())		if (FoldOps.empty())
return false;		return false;

MachineInstrSpan MIS(MI);		MachineInstrSpan MIS(MI);

MachineInstr *FoldMI =		MachineInstr *FoldMI =
LoadMI ? TII.foldMemoryOperand(*MI, FoldOps, *LoadMI, &LIS)		LoadMI ? TII.foldMemoryOperand(*MI, FoldOps, *LoadMI, &LIS)
: TII.foldMemoryOperand(*MI, FoldOps, StackSlot, &LIS);		: TII.foldMemoryOperand(*MI, FoldOps, StackSlot, &LIS, &VRM);
if (!FoldMI)		if (!FoldMI)
return false;		return false;

// Remove LIS for any dead defs in the original MI not in FoldMI.		// Remove LIS for any dead defs in the original MI not in FoldMI.
for (MIBundleOperands MO(*MI); MO.isValid(); ++MO) {		for (MIBundleOperands MO(*MI); MO.isValid(); ++MO) {
if (!MO->isReg())		if (!MO->isReg())
continue;		continue;
unsigned Reg = MO->getReg();		unsigned Reg = MO->getReg();
[...]

lib/CodeGen/TargetInstrInfo.cpp

[...]	for (unsigned i = StartIdx; i < MI.getNumOperands(); ++i) {
else		else
MIB.add(MO);		MIB.add(MO);
}		}
return NewMI;		return NewMI;
}		}

MachineInstr *TargetInstrInfo::foldMemoryOperand(MachineInstr &MI,		MachineInstr *TargetInstrInfo::foldMemoryOperand(MachineInstr &MI,
ArrayRef<unsigned> Ops, int FI,		ArrayRef<unsigned> Ops, int FI,
LiveIntervals *LIS) const {		LiveIntervals *LIS,
		VirtRegMap *VRM) const {
auto Flags = MachineMemOperand::MONone;		auto Flags = MachineMemOperand::MONone;
for (unsigned OpIdx : Ops)		for (unsigned OpIdx : Ops)
Flags |= MI.getOperand(OpIdx).isDef() ? MachineMemOperand::MOStore		Flags |= MI.getOperand(OpIdx).isDef() ? MachineMemOperand::MOStore
: MachineMemOperand::MOLoad;		: MachineMemOperand::MOLoad;

MachineBasicBlock *MBB = MI.getParent();		MachineBasicBlock *MBB = MI.getParent();
assert(MBB && "foldMemoryOperand needs an inserted instruction");		assert(MBB && "foldMemoryOperand needs an inserted instruction");
MachineFunction &MF = *MBB->getParent();		MachineFunction &MF = *MBB->getParent();
[...]	if (MI.getOpcode() == TargetOpcode::STACKMAP ||
MI.getOpcode() == TargetOpcode::PATCHPOINT ||		MI.getOpcode() == TargetOpcode::PATCHPOINT ||
MI.getOpcode() == TargetOpcode::STATEPOINT) {		MI.getOpcode() == TargetOpcode::STATEPOINT) {
// Fold stackmap/patchpoint.		// Fold stackmap/patchpoint.
NewMI = foldPatchpoint(MF, MI, Ops, FI, *this);		NewMI = foldPatchpoint(MF, MI, Ops, FI, *this);
if (NewMI)		if (NewMI)
MBB->insert(MI, NewMI);		MBB->insert(MI, NewMI);
} else {		} else {
// Ask the target to do the actual folding.		// Ask the target to do the actual folding.
NewMI = foldMemoryOperandImpl(MF, MI, Ops, MI, FI, LIS);		NewMI = foldMemoryOperandImpl(MF, MI, Ops, MI, FI, LIS, VRM);
}		}

if (NewMI) {		if (NewMI) {
NewMI->setMemRefs(MF, MI.memoperands());		NewMI->setMemRefs(MF, MI.memoperands());
// Add a memory operand, foldMemoryOperandImpl doesn't do that.		// Add a memory operand, foldMemoryOperandImpl doesn't do that.
assert((!(Flags & MachineMemOperand::MOStore) ||		assert((!(Flags & MachineMemOperand::MOStore) ||
NewMI->mayStore()) &&		NewMI->mayStore()) &&
"Folded a def to a non-store!");		"Folded a def to a non-store!");
[...]

lib/CodeGen/TargetPassConfig.cpp

[...]	void TargetPassConfig::addOptimizedRegAlloc() {
// when moving subregister definitions around, avoid this by splitting them to		// when moving subregister definitions around, avoid this by splitting them to
// separate vregs before. Splitting can also improve reg. allocation quality.		// separate vregs before. Splitting can also improve reg. allocation quality.
addPass(&RenameIndependentSubregsID);		addPass(&RenameIndependentSubregsID);

// PreRA instruction scheduling.		// PreRA instruction scheduling.
addPass(&MachineSchedulerID);		addPass(&MachineSchedulerID);

if (addRegAssignmentOptimized()) {		if (addRegAssignmentOptimized()) {
		// Allow targets to expand pseudo instructions depending on the choice of
		// registers before MachineCopyPropagation.
		addPostRewrite();

// Copy propagate to forward register uses and try to eliminate COPYs that		// Copy propagate to forward register uses and try to eliminate COPYs that
// were not coalesced.		// were not coalesced.
addPass(&MachineCopyPropagationID);		addPass(&MachineCopyPropagationID);

// Run post-ra machine LICM to hoist reloads / remats.		// Run post-ra machine LICM to hoist reloads / remats.
//		//
// FIXME: can this move into MachineLateOptimization?		// FIXME: can this move into MachineLateOptimization?
addPass(&MachineLICMID);		addPass(&MachineLICMID);
[...]

lib/Target/AArch64/AArch64InstrInfo.h

[...]	public:
// with subreg operands to foldMemoryOperandImpl.		// with subreg operands to foldMemoryOperandImpl.
bool isSubregFoldable() const override { return true; }		bool isSubregFoldable() const override { return true; }

using TargetInstrInfo::foldMemoryOperandImpl;		using TargetInstrInfo::foldMemoryOperandImpl;
MachineInstr *		MachineInstr *
foldMemoryOperandImpl(MachineFunction &MF, MachineInstr &MI,		foldMemoryOperandImpl(MachineFunction &MF, MachineInstr &MI,
ArrayRef<unsigned> Ops,		ArrayRef<unsigned> Ops,
MachineBasicBlock::iterator InsertPt, int FrameIndex,		MachineBasicBlock::iterator InsertPt, int FrameIndex,
LiveIntervals *LIS = nullptr) const override;		LiveIntervals *LIS = nullptr,
		VirtRegMap *VRM = nullptr) const override;

/// \returns true if a branch from an instruction with opcode \p BranchOpc		/// \returns true if a branch from an instruction with opcode \p BranchOpc
/// bytes is capable of jumping to a position \p BrOffset bytes away.		/// bytes is capable of jumping to a position \p BrOffset bytes away.
bool isBranchOffsetInRange(unsigned BranchOpc,		bool isBranchOffsetInRange(unsigned BranchOpc,
int64_t BrOffset) const override;		int64_t BrOffset) const override;

MachineBasicBlock *getBranchDestBlock(const MachineInstr &MI) const override;		MachineBasicBlock *getBranchDestBlock(const MachineInstr &MI) const override;

[...]

lib/Target/AArch64/AArch64InstrInfo.cpp

[...]	if ((DestReg == AArch64::FP && SrcReg == AArch64::SP) ||
addImm(Offset).setMIFlag(Flag);		addImm(Offset).setMIFlag(Flag);
}		}
}		}
}		}

MachineInstr *AArch64InstrInfo::foldMemoryOperandImpl(		MachineInstr *AArch64InstrInfo::foldMemoryOperandImpl(
MachineFunction &MF, MachineInstr &MI, ArrayRef<unsigned> Ops,		MachineFunction &MF, MachineInstr &MI, ArrayRef<unsigned> Ops,
MachineBasicBlock::iterator InsertPt, int FrameIndex,		MachineBasicBlock::iterator InsertPt, int FrameIndex,
LiveIntervals *LIS) const {		LiveIntervals *LIS, VirtRegMap *VRM) const {
// This is a bit of a hack. Consider this instruction:		// This is a bit of a hack. Consider this instruction:
//		//
// %0 = COPY %sp; GPR64all:%0		// %0 = COPY %sp; GPR64all:%0
//		//
// We explicitly chose GPR64all for the virtual register so such a copy might		// We explicitly chose GPR64all for the virtual register so such a copy might
// be eliminated by RegisterCoalescer. However, that may not be possible, and		// be eliminated by RegisterCoalescer. However, that may not be possible, and
// %0 may even spill. We can't spill %sp, and since it is in the GPR64all		// %0 may even spill. We can't spill %sp, and since it is in the GPR64all
// register class, TargetInstrInfo::foldMemoryOperand() is going to try.		// register class, TargetInstrInfo::foldMemoryOperand() is going to try.
[...]

lib/Target/SystemZ/CMakeLists.txt

[...]	add_llvm_target(SystemZCodeGen
SystemZInstrInfo.cpp		SystemZInstrInfo.cpp
SystemZLDCleanup.cpp		SystemZLDCleanup.cpp
SystemZLongBranch.cpp		SystemZLongBranch.cpp
SystemZMachineFunctionInfo.cpp		SystemZMachineFunctionInfo.cpp
SystemZMachineScheduler.cpp		SystemZMachineScheduler.cpp
SystemZMCInstLower.cpp		SystemZMCInstLower.cpp
SystemZRegisterInfo.cpp		SystemZRegisterInfo.cpp
SystemZSelectionDAGInfo.cpp		SystemZSelectionDAGInfo.cpp
		SystemZPostRewrite.cpp
SystemZShortenInst.cpp		SystemZShortenInst.cpp
SystemZSubtarget.cpp		SystemZSubtarget.cpp
SystemZTargetMachine.cpp		SystemZTargetMachine.cpp
SystemZTargetTransformInfo.cpp		SystemZTargetTransformInfo.cpp
SystemZTDC.cpp		SystemZTDC.cpp
)		)

add_subdirectory(AsmParser)		add_subdirectory(AsmParser)
add_subdirectory(Disassembler)		add_subdirectory(Disassembler)
add_subdirectory(MCTargetDesc)		add_subdirectory(MCTargetDesc)
add_subdirectory(TargetInfo)		add_subdirectory(TargetInfo)

lib/Target/SystemZ/SystemZ.h

	[...]

	FunctionPass *createSystemZISelDag(SystemZTargetMachine &TM,			FunctionPass *createSystemZISelDag(SystemZTargetMachine &TM,
	CodeGenOpt::Level OptLevel);			CodeGenOpt::Level OptLevel);
	FunctionPass *createSystemZElimComparePass(SystemZTargetMachine &TM);			FunctionPass *createSystemZElimComparePass(SystemZTargetMachine &TM);
	FunctionPass *createSystemZExpandPseudoPass(SystemZTargetMachine &TM);			FunctionPass *createSystemZExpandPseudoPass(SystemZTargetMachine &TM);
	FunctionPass *createSystemZShortenInstPass(SystemZTargetMachine &TM);			FunctionPass *createSystemZShortenInstPass(SystemZTargetMachine &TM);
	FunctionPass *createSystemZLongBranchPass(SystemZTargetMachine &TM);			FunctionPass *createSystemZLongBranchPass(SystemZTargetMachine &TM);
	FunctionPass *createSystemZLDCleanupPass(SystemZTargetMachine &TM);			FunctionPass *createSystemZLDCleanupPass(SystemZTargetMachine &TM);
				FunctionPass *createSystemZPostRewritePass(SystemZTargetMachine &TM);
	FunctionPass *createSystemZTDCPass();			FunctionPass *createSystemZTDCPass();
	} // end namespace llvm			} // end namespace llvm

	#endif			#endif

lib/Target/SystemZ/SystemZInstrFormats.td

[...]	class InstSystemZ<int size, dag outs, dag ins, string asmstr,
string DispSize = "none";		string DispSize = "none";

// Many register-based <INSN>R instructions have a memory-based <INSN>		// Many register-based <INSN>R instructions have a memory-based <INSN>
// counterpart. OpKey uniquely identifies <INSN>R, while OpType is		// counterpart. OpKey uniquely identifies <INSN>R, while OpType is
// "reg" for <INSN>R and "mem" for <INSN>.		// "reg" for <INSN>R and "mem" for <INSN>.
string OpKey = "";		string OpKey = "";
string OpType = "none";		string OpType = "none";

		// MemKey identifies a targe reg-mem opcode, while MemType can be either
		// "pseudo" or "target". This is used to map a pseduo memory instruction to
		// its corresponding target opcode. See comment at MemFoldPseudo.
		string MemKey = "";
		string MemType = "none";

// Many distinct-operands instructions have older 2-operand equivalents.		// Many distinct-operands instructions have older 2-operand equivalents.
// NumOpsKey uniquely identifies one of these 2-operand and 3-operand pairs,		// NumOpsKey uniquely identifies one of these 2-operand and 3-operand pairs,
// with NumOpsValue being "2" or "3" as appropriate.		// with NumOpsValue being "2" or "3" as appropriate.
string NumOpsKey = "";		string NumOpsKey = "";
string NumOpsValue = "none";		string NumOpsValue = "none";

// True if this instruction is a simple D(X,B) load of a register		// True if this instruction is a simple D(X,B) load of a register
// (with no sign or zero extension).		// (with no sign or zero extension).
(44 lines not shown)	class InstSystemZ<int size, dag outs, dag ins, string asmstr,
let TSFlags{9-5} = AccessBytes;		let TSFlags{9-5} = AccessBytes;
let TSFlags{13-10} = CCValues;		let TSFlags{13-10} = CCValues;
let TSFlags{17-14} = CompareZeroCCMask;		let TSFlags{17-14} = CompareZeroCCMask;
let TSFlags{18} = CCMaskFirst;		let TSFlags{18} = CCMaskFirst;
let TSFlags{19} = CCMaskLast;		let TSFlags{19} = CCMaskLast;
let TSFlags{20} = IsLogical;		let TSFlags{20} = IsLogical;
}		}

		class Pseudo<dag outs, dag ins, list<dag> pattern>
		: InstSystemZ<0, outs, ins, "", pattern> {
		let isPseudo = 1;
		let isCodeGenOnly = 1;
		}
		uweigand: I'd prefer if you let that stay where it is, and move the new pseudos to the appropriate places in the pseudo section.
		jonpa (author): I can't just move down MemFoldPseudo and keep the new multiclasses above it (won't compile), so I am not sure what to do other than moving up the Pseudo definition. Alternatives are to move down the new multiclasses (but then we would have a multiclass with a target instruction in the Pseudo section), or to define all those MemFoldPseudos in the InstrInfo.td file, which seems less clean.
		uweigand: OK, then please move them to the very end, in a new section entitled something like "Multiclasses that emit both real and pseudo instructions".
		jonpa (author): OK. Perhaps we should also move down StringRRE to that section?
		uweigand: Good catch; yes, StringRRE as well as MemorySS / CompareMemorySS ought to be moved there as well.

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Mappings between instructions		// Mappings between instructions
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

// Return the version of an instruction that has an unsigned 12-bit		// Return the version of an instruction that has an unsigned 12-bit
// displacement.		// displacement.
def getDisp12Opcode : InstrMapping {		def getDisp12Opcode : InstrMapping {
let FilterClass = "InstSystemZ";		let FilterClass = "InstSystemZ";
let RowFields = ["DispKey"];		let RowFields = ["DispKey"];
let ColFields = ["DispSize"];		let ColFields = ["DispSize"];
let KeyCol = ["20"];		let KeyCol = ["20"];
let ValueCols = [["12"]];		let ValueCols = [["12"]];
}		}

// Return the version of an instruction that has a signed 20-bit displacement.		// Return the version of an instruction that has a signed 20-bit displacement.
def getDisp20Opcode : InstrMapping {		def getDisp20Opcode : InstrMapping {
let FilterClass = "InstSystemZ";		let FilterClass = "InstSystemZ";
let RowFields = ["DispKey"];		let RowFields = ["DispKey"];
let ColFields = ["DispSize"];		let ColFields = ["DispSize"];
let KeyCol = ["12"];		let KeyCol = ["12"];
let ValueCols = [["20"]];		let ValueCols = [["20"]];
}		}

// Return the memory form of a register instruction.		// Return the memory form of a register instruction. Note that this may
		// return a MemFoldPseudo instruction (see below).
def getMemOpcode : InstrMapping {		def getMemOpcode : InstrMapping {
let FilterClass = "InstSystemZ";		let FilterClass = "InstSystemZ";
let RowFields = ["OpKey"];		let RowFields = ["OpKey"];
let ColFields = ["OpType"];		let ColFields = ["OpType"];
let KeyCol = ["reg"];		let KeyCol = ["reg"];
let ValueCols = [["mem"]];		let ValueCols = [["mem"]];
}		}

// Return the 3-operand form of a 2-operand instruction.		// Return the target memory instruction for a MemFoldPseudo.
def getThreeOperandOpcode : InstrMapping {		def getTargetMemOpcode : InstrMapping {
		let FilterClass = "InstSystemZ";
		let RowFields = ["MemKey"];
		let ColFields = ["MemType"];
		let KeyCol = ["pseudo"];
		let ValueCols = [["target"]];
		}

		// Return the 2-operand form of a 3-operand instruction.
		def getTwoOperandOpcode : InstrMapping {
let FilterClass = "InstSystemZ";		let FilterClass = "InstSystemZ";
let RowFields = ["NumOpsKey"];		let RowFields = ["NumOpsKey"];
let ColFields = ["NumOpsValue"];		let ColFields = ["NumOpsValue"];
let KeyCol = ["2"];		let KeyCol = ["3"];
let ValueCols = [["3"]];		let ValueCols = [["2"]];
}		}
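
Editorial sketch (not part of the patch): the reversed mapping above can be modelled as a small table lookup. This is a toy model under stated assumptions; the enum values and the table contents are hypothetical stand-ins for what TableGen generates, and only the behavior described in the summary (key column "3", value column "2", -1 when there is no mapped instruction) is taken from the review.

```cpp
#include <map>
#include <string>

// Hypothetical stand-ins for the generated opcode enum; the real values
// come from TableGen and are not reproduced here.
enum Opcode : int { AR, ARK, AGR, AGRK, LR };

struct NumOpsInfo {
  std::string Key;   // models NumOpsKey (the row)
  std::string Value; // models NumOpsValue (the column): "2" or "3"
};

// Toy table listing which instructions take part in a 2-/3-operand pair.
static const std::map<int, NumOpsInfo> NumOpsTable = {
    {AR, {"ar", "2"}},
    {ARK, {"ar", "3"}},
    {AGR, {"agr", "2"}},
    {AGRK, {"agr", "3"}},
};

// Models getTwoOperandOpcode with KeyCol = "3" and ValueCols = "2".
// Returns -1 if Opc is not a 3-operand key or has no 2-operand partner.
int getTwoOperandOpcode(int Opc) {
  auto It = NumOpsTable.find(Opc);
  if (It == NumOpsTable.end() || It->second.Value != "3")
    return -1;
  for (const auto &Entry : NumOpsTable)
    if (Entry.second.Key == It->second.Key && Entry.second.Value == "2")
      return Entry.first;
  return -1;
}
```

In this model, calling `getTwoOperandOpcode(AGRK)` yields `AGR`, while a 2-operand opcode such as `AGR` or an unpaired opcode such as `LR` yields -1, which is why callers check the result against -1 before using it.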

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Instruction formats		// Instruction formats
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// Formats are specified using operand field declarations of the form:		// Formats are specified using operand field declarations of the form:
//		//
(2,914 lines not shown)

class BinaryRRFa<string mnemonic, bits<16> opcode, SDPatternOperator operator,		class BinaryRRFa<string mnemonic, bits<16> opcode, SDPatternOperator operator,
RegisterOperand cls1, RegisterOperand cls2,		RegisterOperand cls1, RegisterOperand cls2,
RegisterOperand cls3>		RegisterOperand cls3>
: InstRRFa<opcode, (outs cls1:$R1), (ins cls2:$R2, cls3:$R3),		: InstRRFa<opcode, (outs cls1:$R1), (ins cls2:$R2, cls3:$R3),
mnemonic#"\t$R1, $R2, $R3",		mnemonic#"\t$R1, $R2, $R3",
[(set cls1:$R1, (operator cls2:$R2, cls3:$R3))]> {		[(set cls1:$R1, (operator cls2:$R2, cls3:$R3))]> {
let M4 = 0;		let M4 = 0;
		let OpKey = mnemonic#cls1;
		let OpType = "reg";
}		}

multiclass BinaryRRAndK<string mnemonic, bits<8> opcode1, bits<16> opcode2,		multiclass BinaryRRAndK<string mnemonic, bits<8> opcode1, bits<16> opcode2,
SDPatternOperator operator, RegisterOperand cls1,		SDPatternOperator operator, RegisterOperand cls1,
RegisterOperand cls2> {		RegisterOperand cls2> {
let NumOpsKey = mnemonic in {		let NumOpsKey = mnemonic in {
let NumOpsValue = "3" in		let NumOpsValue = "3" in
def K : BinaryRRFa<mnemonic#"k", opcode2, null_frag, cls1, cls1, cls2>,		def K : BinaryRRFa<mnemonic#"k", opcode2, operator, cls1, cls1, cls2>,
Requires<[FeatureDistinctOps]>;		Requires<[FeatureDistinctOps]>;
let NumOpsValue = "2", isConvertibleToThreeAddress = 1 in		let NumOpsValue = "2" in
def "" : BinaryRR<mnemonic, opcode1, operator, cls1, cls2>;		def "" : BinaryRR<mnemonic, opcode1, operator, cls1, cls2>;
}		}
}		}

multiclass BinaryRREAndK<string mnemonic, bits<16> opcode1, bits<16> opcode2,		multiclass BinaryRREAndK<string mnemonic, bits<16> opcode1, bits<16> opcode2,
SDPatternOperator operator, RegisterOperand cls1,		SDPatternOperator operator, RegisterOperand cls1,
RegisterOperand cls2> {		RegisterOperand cls2> {
let NumOpsKey = mnemonic in {		let NumOpsKey = mnemonic in {
let NumOpsValue = "3" in		let NumOpsValue = "3" in
def K : BinaryRRFa<mnemonic#"k", opcode2, null_frag, cls1, cls1, cls2>,		def K : BinaryRRFa<mnemonic#"k", opcode2, operator, cls1, cls1, cls2>,
Requires<[FeatureDistinctOps]>;		Requires<[FeatureDistinctOps]>;
let NumOpsValue = "2", isConvertibleToThreeAddress = 1 in		let NumOpsValue = "2" in
def "" : BinaryRRE<mnemonic, opcode1, operator, cls1, cls2>;		def "" : BinaryRRE<mnemonic, opcode1, operator, cls1, cls2>;
}		}
}		}

class BinaryRRFb<string mnemonic, bits<16> opcode, SDPatternOperator operator,		class BinaryRRFb<string mnemonic, bits<16> opcode, SDPatternOperator operator,
RegisterOperand cls1, RegisterOperand cls2,		RegisterOperand cls1, RegisterOperand cls2,
RegisterOperand cls3>		RegisterOperand cls3>
: InstRRFb<opcode, (outs cls1:$R1), (ins cls2:$R2, cls3:$R3),		: InstRRFb<opcode, (outs cls1:$R1), (ins cls2:$R2, cls3:$R3),
(84 lines not shown)	: InstRIEd<opcode, (outs cls:$R1), (ins cls:$R3, imm:$I2),
mnemonic#"\t$R1, $R3, $I2",		mnemonic#"\t$R1, $R3, $I2",
[(set cls:$R1, (operator cls:$R3, imm:$I2))]>;		[(set cls:$R1, (operator cls:$R3, imm:$I2))]>;

multiclass BinaryRIAndK<string mnemonic, bits<12> opcode1, bits<16> opcode2,		multiclass BinaryRIAndK<string mnemonic, bits<12> opcode1, bits<16> opcode2,
SDPatternOperator operator, RegisterOperand cls,		SDPatternOperator operator, RegisterOperand cls,
Immediate imm> {		Immediate imm> {
let NumOpsKey = mnemonic in {		let NumOpsKey = mnemonic in {
let NumOpsValue = "3" in		let NumOpsValue = "3" in
def K : BinaryRIE<mnemonic##"k", opcode2, null_frag, cls, imm>,		def K : BinaryRIE<mnemonic##"k", opcode2, operator, cls, imm>,
Requires<[FeatureDistinctOps]>;		Requires<[FeatureDistinctOps]>;
let NumOpsValue = "2", isConvertibleToThreeAddress = 1 in		let NumOpsValue = "2" in
def "" : BinaryRI<mnemonic, opcode1, operator, cls, imm>;		def "" : BinaryRI<mnemonic, opcode1, operator, cls, imm>;
}		}
}		}

class CondBinaryRIE<string mnemonic, bits<16> opcode, RegisterOperand cls,		class CondBinaryRIE<string mnemonic, bits<16> opcode, RegisterOperand cls,
Immediate imm>		Immediate imm>
: InstRIEg<opcode, (outs cls:$R1),		: InstRIEg<opcode, (outs cls:$R1),
(ins cls:$R1src, imm:$I2, cond4:$valid, cond4:$M3),		(ins cls:$R1src, imm:$I2, cond4:$valid, cond4:$M3),
(58 lines not shown)	class BinaryRSY<string mnemonic, bits<16> opcode, SDPatternOperator operator,
: InstRSYa<opcode, (outs cls:$R1), (ins cls:$R3, shift20only:$BD2),		: InstRSYa<opcode, (outs cls:$R1), (ins cls:$R3, shift20only:$BD2),
mnemonic#"\t$R1, $R3, $BD2",		mnemonic#"\t$R1, $R3, $BD2",
[(set cls:$R1, (operator cls:$R3, shift20only:$BD2))]>;		[(set cls:$R1, (operator cls:$R3, shift20only:$BD2))]>;

multiclass BinaryRSAndK<string mnemonic, bits<8> opcode1, bits<16> opcode2,		multiclass BinaryRSAndK<string mnemonic, bits<8> opcode1, bits<16> opcode2,
SDPatternOperator operator, RegisterOperand cls> {		SDPatternOperator operator, RegisterOperand cls> {
let NumOpsKey = mnemonic in {		let NumOpsKey = mnemonic in {
let NumOpsValue = "3" in		let NumOpsValue = "3" in
def K : BinaryRSY<mnemonic##"k", opcode2, null_frag, cls>,		def K : BinaryRSY<mnemonic##"k", opcode2, operator, cls>,
Requires<[FeatureDistinctOps]>;		Requires<[FeatureDistinctOps]>;
let NumOpsValue = "2", isConvertibleToThreeAddress = 1 in		let NumOpsValue = "2" in
def "" : BinaryRS<mnemonic, opcode1, operator, cls>;		def "" : BinaryRS<mnemonic, opcode1, operator, cls>;
}		}
}		}

class BinaryRSL<string mnemonic, bits<16> opcode, RegisterOperand cls>		class BinaryRSL<string mnemonic, bits<16> opcode, RegisterOperand cls>
: InstRSLb<opcode, (outs cls:$R1),		: InstRSLb<opcode, (outs cls:$R1),
(ins bdladdr12onlylen8:$BDL2, imm32zx4:$M3),		(ins bdladdr12onlylen8:$BDL2, imm32zx4:$M3),
mnemonic#"\t$R1, $BDL2, $M3", []> {		mnemonic#"\t$R1, $BDL2, $M3", []> {
(50 lines not shown)	class BinaryRXY<string mnemonic, bits<16> opcode, SDPatternOperator operator,
let OpKey = mnemonic#"r"#cls;		let OpKey = mnemonic#"r"#cls;
let OpType = "mem";		let OpType = "mem";
let Constraints = "$R1 = $R1src";		let Constraints = "$R1 = $R1src";
let DisableEncoding = "$R1src";		let DisableEncoding = "$R1src";
let mayLoad = 1;		let mayLoad = 1;
let AccessBytes = bytes;		let AccessBytes = bytes;
}		}

		// A pseudo that is used during register allocation when folding a memory
		// operand. The 3-address register instruction with a spilled source cannot
		// be converted directly to a target 2-address reg/mem instruction.
		// Mapping: <INSN>R -> MemFoldPseudo -> <INSN>
		class MemFoldPseudo<string mnemonic, RegisterOperand cls, bits<5> bytes,
		AddressingMode mode>
		: Pseudo<(outs cls:$R1), (ins cls:$R2, mode:$XBD2), []> {
		let OpKey = mnemonic#"rk"#cls;
		let OpType = "mem";
		let MemKey = mnemonic#cls;
		let MemType = "pseudo";
		let mayLoad = 1;
		let AccessBytes = bytes;
		let HasIndex = 1;
		let hasNoSchedulingInfo = 1;
		}
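
Editorial sketch (not part of the patch): the chain `<INSN>R -> MemFoldPseudo -> <INSN>` described in the comment above can be pictured as two successive lookups. The code below is a toy model only; the enum and the hard-coded table entries are hypothetical, `resolveFoldedOpcode` is an illustrative helper that does not exist in LLVM, and where exactly the second lookup is performed in the real patch is not shown in this excerpt.

```cpp
// Hypothetical opcode names; AG_MemFoldPseudo mirrors the "_MemFoldPseudo"
// suffix used by the multiclasses in this diff.
enum Opcode : int { AGRK, AG_MemFoldPseudo, AG, LGR };

// Models getMemOpcode: register form -> memory form, which may be a pseudo.
int getMemOpcode(int Opc) { return Opc == AGRK ? AG_MemFoldPseudo : -1; }

// Models getTargetMemOpcode: MemFoldPseudo -> real target memory opcode.
int getTargetMemOpcode(int Opc) { return Opc == AG_MemFoldPseudo ? AG : -1; }

// Illustrative helper (an assumption, not an LLVM function): follow both
// mappings and return the final target opcode, or -1 if folding is not
// possible for this register opcode.
int resolveFoldedOpcode(int RegOpc) {
  int Mem = getMemOpcode(RegOpc);
  if (Mem < 0)
    return -1;
  int Target = getTargetMemOpcode(Mem);
  // If there is no pseudo step, the memory opcode is already the target.
  return Target >= 0 ? Target : Mem;
}
```

The pseudo step exists because, as the comment says, a 3-address register instruction with a spilled source cannot be turned directly into the 2-address reg/mem instruction, which ties its destination to its register source.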

		multiclass BinaryRXYAndPseudo<string mnemonic, bits<16> opcode,
		SDPatternOperator operator, RegisterOperand cls,
		SDPatternOperator load, bits<5> bytes,
		AddressingMode mode = bdxaddr20only> {

		def "" : BinaryRXY<mnemonic, opcode, operator, cls, load, bytes, mode> {
		let MemKey = mnemonic#cls;
		let MemType = "target";
		}
		let Has20BitOffset = 1 in
		def _MemFoldPseudo : MemFoldPseudo<mnemonic, cls, bytes, mode>;
		}

multiclass BinaryRXPair<string mnemonic, bits<8> rxOpcode, bits<16> rxyOpcode,		multiclass BinaryRXPair<string mnemonic, bits<8> rxOpcode, bits<16> rxyOpcode,
SDPatternOperator operator, RegisterOperand cls,		SDPatternOperator operator, RegisterOperand cls,
SDPatternOperator load, bits<5> bytes> {		SDPatternOperator load, bits<5> bytes> {
let DispKey = mnemonic ## #cls in {		let DispKey = mnemonic ## #cls in {
let DispSize = "12" in		let DispSize = "12" in
def "" : BinaryRX<mnemonic, rxOpcode, operator, cls, load, bytes,		def "" : BinaryRX<mnemonic, rxOpcode, operator, cls, load, bytes,
bdxaddr12pair>;		bdxaddr12pair>;
let DispSize = "20" in		let DispSize = "20" in
def Y : BinaryRXY<mnemonic#"y", rxyOpcode, operator, cls, load, bytes,		def Y : BinaryRXY<mnemonic#"y", rxyOpcode, operator, cls, load, bytes,
bdxaddr20pair>;		bdxaddr20pair>;
}		}
}		}

		multiclass BinaryRXPairAndPseudo<string mnemonic, bits<8> rxOpcode,
		bits<16> rxyOpcode, SDPatternOperator operator,
		RegisterOperand cls,
		SDPatternOperator load, bits<5> bytes> {
		let DispKey = mnemonic ## #cls in {
		def "" : BinaryRX<mnemonic, rxOpcode, operator, cls, load, bytes,
		bdxaddr12pair> {
		let DispSize = "12";
		let MemKey = mnemonic#cls;
		let MemType = "target";
		}
		let DispSize = "20" in
		def Y : BinaryRXY<mnemonic#"y", rxyOpcode, operator, cls, load,
		bytes, bdxaddr20pair>;
		uweigand: Shouldn't the Y variant also be mapped to a pseudo?
		jonpa (author): I thought the "Y" reg/mem mapping was dead since there is no "YR" opcode. In the unpatched BinaryRXPair, the instruction A creates an AR->"mem" entry with BinaryRX and an AYR->"mem" entry with BinaryRXY. There is however no AYR->"reg" entry, right? I think the reason only the 12-bit displacement instructions are needed is that at this point a FrameIndex operand is added. This is then handled later in SystemZRegisterInfo::eliminateFrameIndex(), where the actual offset is checked.
		uweigand: I see. This looks fine then.
		}
		def _MemFoldPseudo : MemFoldPseudo<mnemonic, cls, bytes, bdxaddr12pair>;
		}

class BinarySI<string mnemonic, bits<8> opcode, SDPatternOperator operator,		class BinarySI<string mnemonic, bits<8> opcode, SDPatternOperator operator,
Operand imm, AddressingMode mode = bdaddr12only>		Operand imm, AddressingMode mode = bdaddr12only>
: InstSI<opcode, (outs), (ins mode:$BD1, imm:$I2),		: InstSI<opcode, (outs), (ins mode:$BD1, imm:$I2),
mnemonic#"\t$BD1, $I2",		mnemonic#"\t$BD1, $I2",
[(store (operator (load mode:$BD1), imm:$I2), mode:$BD1)]> {		[(store (operator (load mode:$BD1), imm:$I2), mode:$BD1)]> {
let mayLoad = 1;		let mayLoad = 1;
let mayStore = 1;		let mayStore = 1;
}		}
(1,176 lines not shown)
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// Convenience instructions that get lowered to real instructions		// Convenience instructions that get lowered to real instructions
// by either SystemZTargetLowering::EmitInstrWithCustomInserter()		// by either SystemZTargetLowering::EmitInstrWithCustomInserter()
// or SystemZInstrInfo::expandPostRAPseudo().		// or SystemZInstrInfo::expandPostRAPseudo().
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

class Pseudo<dag outs, dag ins, list<dag> pattern>
: InstSystemZ<0, outs, ins, "", pattern> {
let isPseudo = 1;
let isCodeGenOnly = 1;
}

// Like UnaryRI, but expanded after RA depending on the choice of register.		// Like UnaryRI, but expanded after RA depending on the choice of register.
class UnaryRIPseudo<SDPatternOperator operator, RegisterOperand cls,		class UnaryRIPseudo<SDPatternOperator operator, RegisterOperand cls,
Immediate imm>		Immediate imm>
: Pseudo<(outs cls:$R1), (ins imm:$I2),		: Pseudo<(outs cls:$R1), (ins imm:$I2),
[(set cls:$R1, (operator imm:$I2))]>;		[(set cls:$R1, (operator imm:$I2))]>;

// Like UnaryRXY, but expanded after RA depending on the choice of register.		// Like UnaryRXY, but expanded after RA depending on the choice of register.
class UnaryRXYPseudo<string key, SDPatternOperator operator,		class UnaryRXYPseudo<string key, SDPatternOperator operator,
(32 lines not shown)	class BinaryRIEPseudo<SDPatternOperator operator, RegisterOperand cls,
: Pseudo<(outs cls:$R1), (ins cls:$R3, imm:$I2),		: Pseudo<(outs cls:$R1), (ins cls:$R3, imm:$I2),
[(set cls:$R1, (operator cls:$R3, imm:$I2))]>;		[(set cls:$R1, (operator cls:$R3, imm:$I2))]>;

// Like BinaryRIAndK, but expanded after RA depending on the choice of register.		// Like BinaryRIAndK, but expanded after RA depending on the choice of register.
multiclass BinaryRIAndKPseudo<string key, SDPatternOperator operator,		multiclass BinaryRIAndKPseudo<string key, SDPatternOperator operator,
RegisterOperand cls, Immediate imm> {		RegisterOperand cls, Immediate imm> {
let NumOpsKey = key in {		let NumOpsKey = key in {
let NumOpsValue = "3" in		let NumOpsValue = "3" in
def K : BinaryRIEPseudo<null_frag, cls, imm>,		def K : BinaryRIEPseudo<operator, cls, imm>,
Requires<[FeatureHighWord, FeatureDistinctOps]>;		Requires<[FeatureHighWord, FeatureDistinctOps]>;
let NumOpsValue = "2", isConvertibleToThreeAddress = 1 in		let NumOpsValue = "2" in
def "" : BinaryRIPseudo<operator, cls, imm>,		def "" : BinaryRIPseudo<operator, cls, imm>,
Requires<[FeatureHighWord]>;		Requires<[FeatureHighWord]>;
}		}
}		}

// Like CompareRI, but expanded after RA depending on the choice of register.		// Like CompareRI, but expanded after RA depending on the choice of register.
class CompareRIPseudo<SDPatternOperator operator, RegisterOperand cls,		class CompareRIPseudo<SDPatternOperator operator, RegisterOperand cls,
Immediate imm>		Immediate imm>
(288 lines not shown)

lib/Target/SystemZ/SystemZInstrInfo.h

(135 lines not shown)	enum FusedCompareType {
CompareAndSibcall,		CompareAndSibcall,

// Trap		// Trap
CompareAndTrap		CompareAndTrap
};		};

} // end namespace SystemZII		} // end namespace SystemZII

		namespace SystemZ {
		int getTwoOperandOpcode(uint16_t Opcode);
		int getTargetMemOpcode(uint16_t Opcode);
		}

class SystemZInstrInfo : public SystemZGenInstrInfo {		class SystemZInstrInfo : public SystemZGenInstrInfo {
const SystemZRegisterInfo RI;		const SystemZRegisterInfo RI;
SystemZSubtarget &STI;		SystemZSubtarget &STI;

void splitMove(MachineBasicBlock::iterator MI, unsigned NewOpcode) const;		void splitMove(MachineBasicBlock::iterator MI, unsigned NewOpcode) const;
void splitAdjDynAlloc(MachineBasicBlock::iterator MI) const;		void splitAdjDynAlloc(MachineBasicBlock::iterator MI) const;
void expandRIPseudo(MachineInstr &MI, unsigned LowOpcode, unsigned HighOpcode,		void expandRIPseudo(MachineInstr &MI, unsigned LowOpcode, unsigned HighOpcode,
bool ConvertHigh) const;		bool ConvertHigh) const;
(91 lines not shown)	void loadRegFromStackSlot(MachineBasicBlock &MBB,
const TargetRegisterInfo *TRI) const override;		const TargetRegisterInfo *TRI) const override;
MachineInstr *convertToThreeAddress(MachineFunction::iterator &MFI,		MachineInstr *convertToThreeAddress(MachineFunction::iterator &MFI,
MachineInstr &MI,		MachineInstr &MI,
LiveVariables *LV) const override;		LiveVariables *LV) const override;
MachineInstr *		MachineInstr *
foldMemoryOperandImpl(MachineFunction &MF, MachineInstr &MI,		foldMemoryOperandImpl(MachineFunction &MF, MachineInstr &MI,
ArrayRef<unsigned> Ops,		ArrayRef<unsigned> Ops,
MachineBasicBlock::iterator InsertPt, int FrameIndex,		MachineBasicBlock::iterator InsertPt, int FrameIndex,
LiveIntervals *LIS = nullptr) const override;		LiveIntervals *LIS = nullptr,
		VirtRegMap *VRM = nullptr) const override;
MachineInstr *foldMemoryOperandImpl(		MachineInstr *foldMemoryOperandImpl(
MachineFunction &MF, MachineInstr &MI, ArrayRef<unsigned> Ops,		MachineFunction &MF, MachineInstr &MI, ArrayRef<unsigned> Ops,
MachineBasicBlock::iterator InsertPt, MachineInstr &LoadMI,		MachineBasicBlock::iterator InsertPt, MachineInstr &LoadMI,
LiveIntervals *LIS = nullptr) const override;		LiveIntervals *LIS = nullptr) const override;
bool expandPostRAPseudo(MachineInstr &MBBI) const override;		bool expandPostRAPseudo(MachineInstr &MBBI) const override;
bool reverseBranchCondition(SmallVectorImpl<MachineOperand> &Cond) const		bool reverseBranchCondition(SmallVectorImpl<MachineOperand> &Cond) const
override;		override;

(64 lines not shown)

lib/Target/SystemZ/SystemZInstrInfo.cpp

(951 lines not shown)
static void transferDeadCC(MachineInstr *OldMI, MachineInstr *NewMI) {		static void transferDeadCC(MachineInstr *OldMI, MachineInstr *NewMI) {
if (OldMI->registerDefIsDead(SystemZ::CC)) {		if (OldMI->registerDefIsDead(SystemZ::CC)) {
MachineOperand *CCDef = NewMI->findRegisterDefOperand(SystemZ::CC);		MachineOperand *CCDef = NewMI->findRegisterDefOperand(SystemZ::CC);
if (CCDef != nullptr)		if (CCDef != nullptr)
CCDef->setIsDead(true);		CCDef->setIsDead(true);
}		}
}		}

// Used to return from convertToThreeAddress after replacing two-address
// instruction OldMI with three-address instruction NewMI.
static MachineInstr *finishConvertToThreeAddress(MachineInstr *OldMI,
MachineInstr *NewMI,
LiveVariables *LV) {
if (LV) {
unsigned NumOps = OldMI->getNumOperands();
for (unsigned I = 1; I < NumOps; ++I) {
MachineOperand &Op = OldMI->getOperand(I);
if (Op.isReg() && Op.isKill())
LV->replaceKillInstruction(Op.getReg(), OldMI, NewMI);
}
}
transferDeadCC(OldMI, NewMI);
return NewMI;
}

MachineInstr *SystemZInstrInfo::convertToThreeAddress(		MachineInstr *SystemZInstrInfo::convertToThreeAddress(
MachineFunction::iterator &MFI, MachineInstr &MI, LiveVariables *LV) const {		MachineFunction::iterator &MFI, MachineInstr &MI, LiveVariables *LV) const {
MachineBasicBlock *MBB = MI.getParent();		MachineBasicBlock *MBB = MI.getParent();
MachineFunction *MF = MBB->getParent();
MachineRegisterInfo &MRI = MF->getRegInfo();

unsigned Opcode = MI.getOpcode();
unsigned NumOps = MI.getNumOperands();

// Try to convert something like SLL into SLLK, if supported.
// We prefer to keep the two-operand form where possible both
// because it tends to be shorter and because some instructions
// have memory forms that can be used during spilling.
if (STI.hasDistinctOps()) {
MachineOperand &Dest = MI.getOperand(0);
MachineOperand &Src = MI.getOperand(1);
unsigned DestReg = Dest.getReg();
unsigned SrcReg = Src.getReg();
// AHIMux is only really a three-operand instruction when both operands
// are low registers. Try to constrain both operands to be low if
// possible.
if (Opcode == SystemZ::AHIMux &&
TargetRegisterInfo::isVirtualRegister(DestReg) &&
TargetRegisterInfo::isVirtualRegister(SrcReg) &&
MRI.getRegClass(DestReg)->contains(SystemZ::R1L) &&
MRI.getRegClass(SrcReg)->contains(SystemZ::R1L)) {
MRI.constrainRegClass(DestReg, &SystemZ::GR32BitRegClass);
MRI.constrainRegClass(SrcReg, &SystemZ::GR32BitRegClass);
}
int ThreeOperandOpcode = SystemZ::getThreeOperandOpcode(Opcode);
if (ThreeOperandOpcode >= 0) {
// Create three address instruction without adding the implicit
// operands. Those will instead be copied over from the original
// instruction by the loop below.
MachineInstrBuilder MIB(
*MF, MF->CreateMachineInstr(get(ThreeOperandOpcode), MI.getDebugLoc(),
/*NoImplicit=*/true));
MIB.add(Dest);
// Keep the kill state, but drop the tied flag.
MIB.addReg(Src.getReg(), getKillRegState(Src.isKill()), Src.getSubReg());
// Keep the remaining operands as-is.
for (unsigned I = 2; I < NumOps; ++I)
MIB.add(MI.getOperand(I));
MBB->insert(MI, MIB);
return finishConvertToThreeAddress(&MI, MIB, LV);
}
}

// Try to convert an AND into an RISBG-type instruction.		// Try to convert an AND into an RISBG-type instruction.
if (LogicOp And = interpretAndImmediate(Opcode)) {		// TODO: It might be beneficial to select RISBG and shorten to AND instead.
		if (LogicOp And = interpretAndImmediate(MI.getOpcode())) {
uint64_t Imm = MI.getOperand(2).getImm() << And.ImmLSB;		uint64_t Imm = MI.getOperand(2).getImm() << And.ImmLSB;
// AND IMMEDIATE leaves the other bits of the register unchanged.		// AND IMMEDIATE leaves the other bits of the register unchanged.
Imm |= allOnes(And.RegSize) & ~(allOnes(And.ImmSize) << And.ImmLSB);		Imm |= allOnes(And.RegSize) & ~(allOnes(And.ImmSize) << And.ImmLSB);
unsigned Start, End;		unsigned Start, End;
if (isRxSBGMask(Imm, And.RegSize, Start, End)) {		if (isRxSBGMask(Imm, And.RegSize, Start, End)) {
unsigned NewOpcode;		unsigned NewOpcode;
if (And.RegSize == 64) {		if (And.RegSize == 64) {
NewOpcode = SystemZ::RISBG;		NewOpcode = SystemZ::RISBG;
(11 lines not shown)	if (isRxSBGMask(Imm, And.RegSize, Start, End)) {
BuildMI(*MBB, MI, MI.getDebugLoc(), get(NewOpcode))		BuildMI(*MBB, MI, MI.getDebugLoc(), get(NewOpcode))
.add(Dest)		.add(Dest)
.addReg(0)		.addReg(0)
.addReg(Src.getReg(), getKillRegState(Src.isKill()),		.addReg(Src.getReg(), getKillRegState(Src.isKill()),
Src.getSubReg())		Src.getSubReg())
.addImm(Start)		.addImm(Start)
.addImm(End + 128)		.addImm(End + 128)
.addImm(0);		.addImm(0);
return finishConvertToThreeAddress(&MI, MIB, LV);		if (LV) {
		unsigned NumOps = MI.getNumOperands();
		for (unsigned I = 1; I < NumOps; ++I) {
		MachineOperand &Op = MI.getOperand(I);
		if (Op.isReg() && Op.isKill())
		LV->replaceKillInstruction(Op.getReg(), MI, *MIB);
		}
		}
		transferDeadCC(&MI, MIB);
		return MIB;
}		}
}		}
return nullptr;		return nullptr;
}		}

MachineInstr *SystemZInstrInfo::foldMemoryOperandImpl(		MachineInstr *SystemZInstrInfo::foldMemoryOperandImpl(
MachineFunction &MF, MachineInstr &MI, ArrayRef<unsigned> Ops,		MachineFunction &MF, MachineInstr &MI, ArrayRef<unsigned> Ops,
MachineBasicBlock::iterator InsertPt, int FrameIndex,		MachineBasicBlock::iterator InsertPt, int FrameIndex,
LiveIntervals *LIS) const {		LiveIntervals *LIS, VirtRegMap *VRM) const {
const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();		const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
const MachineFrameInfo &MFI = MF.getFrameInfo();		const MachineFrameInfo &MFI = MF.getFrameInfo();
unsigned Size = MFI.getObjectSize(FrameIndex);		unsigned Size = MFI.getObjectSize(FrameIndex);
unsigned Opcode = MI.getOpcode();		unsigned Opcode = MI.getOpcode();

if (Ops.size() == 2 && Ops[0] == 0 && Ops[1] == 1) {		if (Ops.size() == 2 && Ops[0] == 0 && Ops[1] == 1) {
if (LIS != nullptr && (Opcode == SystemZ::LA || Opcode == SystemZ::LAY) &&		if (LIS != nullptr && (Opcode == SystemZ::LA || Opcode == SystemZ::LAY) &&
isInt<8>(MI.getOperand(2).getImm()) && !MI.getOperand(3).getReg()) {		isInt<8>(MI.getOperand(2).getImm()) && !MI.getOperand(3).getReg()) {
(137 lines not shown)	if (MMO->getSize() == Size && !MMO->isVolatile() && !MMO->isAtomic()) {
.addImm(Size)		.addImm(Size)
.addFrameIndex(FrameIndex)		.addFrameIndex(FrameIndex)
.addImm(0)		.addImm(0)
.addMemOperand(MMO);		.addMemOperand(MMO);
}		}
}		}
}		}

// If the spilled operand is the final one, try to change <INSN>R		// If the spilled operand is the final one or the instruction is
// into <INSN>.		// commutable, try to change <INSN>R into <INSN>.
		unsigned NumOps = MI.getNumExplicitOperands();
int MemOpcode = SystemZ::getMemOpcode(Opcode);		int MemOpcode = SystemZ::getMemOpcode(Opcode);

		// See if this is a 3-address instruction that is convertible to 2-address
		// and suitable for folding below. Only try this with virtual registers
		// and a provided VRM (during regalloc).
		bool NeedsCommute = false;
		if (SystemZ::getTwoOperandOpcode(Opcode) != -1 && MemOpcode != -1) {
		if (VRM == nullptr)
		MemOpcode = -1;
		else {
		assert(NumOps == 3 && "Expected two source registers.");
		unsigned DstReg = MI.getOperand(0).getReg();
		unsigned DstPhys =
		(TRI->isVirtualRegister(DstReg) ? VRM->getPhys(DstReg) : DstReg);
		unsigned SrcReg = (OpNum == 2 ? MI.getOperand(1).getReg()
		: ((OpNum == 1 && MI.isCommutable())
		? MI.getOperand(2).getReg()
		: 0));
		if (DstPhys && !SystemZ::GRH32BitRegClass.contains(DstPhys) && SrcReg &&
		TRI->isVirtualRegister(SrcReg) && DstPhys == VRM->getPhys(SrcReg))
		NeedsCommute = (OpNum == 1);
		else
		MemOpcode = -1;
		}
		}

if (MemOpcode >= 0) {		if (MemOpcode >= 0) {
unsigned NumOps = MI.getNumExplicitOperands();		if ((OpNum == NumOps - 1) || NeedsCommute) {
if (OpNum == NumOps - 1) {
const MCInstrDesc &MemDesc = get(MemOpcode);		const MCInstrDesc &MemDesc = get(MemOpcode);
uint64_t AccessBytes = SystemZII::getAccessSize(MemDesc.TSFlags);		uint64_t AccessBytes = SystemZII::getAccessSize(MemDesc.TSFlags);
assert(AccessBytes != 0 && "Size of access should be known");		assert(AccessBytes != 0 && "Size of access should be known");
assert(AccessBytes <= Size && "Access outside the frame index");		assert(AccessBytes <= Size && "Access outside the frame index");
uint64_t Offset = Size - AccessBytes;		uint64_t Offset = Size - AccessBytes;
MachineInstrBuilder MIB = BuildMI(*InsertPt->getParent(), InsertPt,		MachineInstrBuilder MIB = BuildMI(*InsertPt->getParent(), InsertPt,
MI.getDebugLoc(), get(MemOpcode));		MI.getDebugLoc(), get(MemOpcode));
for (unsigned I = 0; I < OpNum; ++I)		MIB.add(MI.getOperand(0));
		if (NeedsCommute)
		MIB.add(MI.getOperand(2));
		else
		for (unsigned I = 1; I < OpNum; ++I)
MIB.add(MI.getOperand(I));		MIB.add(MI.getOperand(I));
MIB.addFrameIndex(FrameIndex).addImm(Offset);		MIB.addFrameIndex(FrameIndex).addImm(Offset);
if (MemDesc.TSFlags & SystemZII::HasIndex)		if (MemDesc.TSFlags & SystemZII::HasIndex)
MIB.addReg(0);		MIB.addReg(0);
transferDeadCC(&MI, MIB);		transferDeadCC(&MI, MIB);
return MIB;		return MIB;
}		}
}		}
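
Editorial sketch (not part of the patch): the new check in foldMemoryOperandImpl boils down to one question. After spilling one source of a 3-address instruction, has the other source been assigned the same physical register as the destination? Only then can the tied 2-address reg/mem form be used. The function below is a simplified standalone model; `FoldDecision` and `decideFold` are hypothetical names, physical registers are plain integers with 0 meaning "none", and the virtual-register and VirtRegMap queries of the real code are collapsed into the parameters.

```cpp
#include <cassert>

// Result of the simplified decision: whether the fold may proceed and
// whether the sources must be swapped first.
struct FoldDecision {
  bool CanFold;
  bool NeedsCommute;
};

// OpNum        : index of the spilled source operand (1 or 2).
// IsCommutable : whether the two sources may be swapped.
// DstPhys      : physical register assigned to the destination (0 if none).
// Op1Phys/Op2Phys : physical registers assigned to operands 1 and 2.
// DstIsHighReg : models the GRH32 exclusion in the patch.
FoldDecision decideFold(unsigned OpNum, bool IsCommutable, unsigned DstPhys,
                        unsigned Op1Phys, unsigned Op2Phys,
                        bool DstIsHighReg) {
  assert((OpNum == 1 || OpNum == 2) && "expected a spilled source operand");
  // Pick the source that stays in a register after folding.
  unsigned RemainingPhys = 0;
  if (OpNum == 2)
    RemainingPhys = Op1Phys;
  else if (IsCommutable)
    RemainingPhys = Op2Phys;
  // The remaining source must share the destination's physical register.
  if (DstPhys != 0 && !DstIsHighReg && RemainingPhys != 0 &&
      DstPhys == RemainingPhys)
    return {true, OpNum == 1};
  return {false, false};
}
```

Spilling operand 2 folds directly when operand 1 shares the destination register. Spilling operand 1 folds only for commutable instructions, and then with the sources swapped, which corresponds to the `NeedsCommute` path in the patch.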

(579 lines not shown)

lib/Target/SystemZ/SystemZInstrInfo.td

(910 lines not shown)	def AFIMux : BinaryRIPseudo<z_sadd, GRX32, simm32>,
Requires<[FeatureHighWord]>;		Requires<[FeatureHighWord]>;
def AFI : BinaryRIL<"afi", 0xC29, z_sadd, GR32, simm32>;		def AFI : BinaryRIL<"afi", 0xC29, z_sadd, GR32, simm32>;
def AIH : BinaryRIL<"aih", 0xCC8, z_sadd, GRH32, simm32>,		def AIH : BinaryRIL<"aih", 0xCC8, z_sadd, GRH32, simm32>,
Requires<[FeatureHighWord]>;		Requires<[FeatureHighWord]>;
def AGFI : BinaryRIL<"agfi", 0xC28, z_sadd, GR64, imm64sx32>;		def AGFI : BinaryRIL<"agfi", 0xC28, z_sadd, GR64, imm64sx32>;

// Addition of memory.		// Addition of memory.
defm AH : BinaryRXPair<"ah", 0x4A, 0xE37A, z_sadd, GR32, asextloadi16, 2>;		defm AH : BinaryRXPair<"ah", 0x4A, 0xE37A, z_sadd, GR32, asextloadi16, 2>;
defm A : BinaryRXPair<"a", 0x5A, 0xE35A, z_sadd, GR32, load, 4>;		defm A : BinaryRXPairAndPseudo<"a", 0x5A, 0xE35A, z_sadd, GR32, load, 4>;
def AGH : BinaryRXY<"agh", 0xE338, z_sadd, GR64, asextloadi16, 2>,		def AGH : BinaryRXY<"agh", 0xE338, z_sadd, GR64, asextloadi16, 2>,
Requires<[FeatureMiscellaneousExtensions2]>;		Requires<[FeatureMiscellaneousExtensions2]>;
def AGF : BinaryRXY<"agf", 0xE318, z_sadd, GR64, asextloadi32, 4>;		def AGF : BinaryRXY<"agf", 0xE318, z_sadd, GR64, asextloadi32, 4>;
def AG : BinaryRXY<"ag", 0xE308, z_sadd, GR64, load, 8>;		defm AG : BinaryRXYAndPseudo<"ag", 0xE308, z_sadd, GR64, load, 8>;

// Addition to memory.		// Addition to memory.
def ASI : BinarySIY<"asi", 0xEB6A, add, imm32sx8>;		def ASI : BinarySIY<"asi", 0xEB6A, add, imm32sx8>;
def AGSI : BinarySIY<"agsi", 0xEB7A, add, imm64sx8>;		def AGSI : BinarySIY<"agsi", 0xEB7A, add, imm64sx8>;
}		}
defm : SXB<z_sadd, GR64, AGFR>;		defm : SXB<z_sadd, GR64, AGFR>;

// Addition producing a carry.		// Addition producing a carry.
Show All 21 Lines	let Defs = [CC] in {
def ALFI : BinaryRIL<"alfi", 0xC2B, z_uadd, GR32, uimm32>;		def ALFI : BinaryRIL<"alfi", 0xC2B, z_uadd, GR32, uimm32>;
def ALGFI : BinaryRIL<"algfi", 0xC2A, z_uadd, GR64, imm64zx32>;		def ALGFI : BinaryRIL<"algfi", 0xC2A, z_uadd, GR64, imm64zx32>;

// Addition of signed 32-bit immediates.		// Addition of signed 32-bit immediates.
def ALSIH : BinaryRIL<"alsih", 0xCCA, null_frag, GRH32, simm32>,		def ALSIH : BinaryRIL<"alsih", 0xCCA, null_frag, GRH32, simm32>,
Requires<[FeatureHighWord]>;		Requires<[FeatureHighWord]>;

// Addition of memory.		// Addition of memory.
defm AL : BinaryRXPair<"al", 0x5E, 0xE35E, z_uadd, GR32, load, 4>;		defm AL : BinaryRXPairAndPseudo<"al", 0x5E, 0xE35E, z_uadd, GR32, load, 4>;
def ALGF : BinaryRXY<"algf", 0xE31A, z_uadd, GR64, azextloadi32, 4>;		def ALGF : BinaryRXY<"algf", 0xE31A, z_uadd, GR64, azextloadi32, 4>;
def ALG : BinaryRXY<"alg", 0xE30A, z_uadd, GR64, load, 8>;		defm ALG : BinaryRXYAndPseudo<"alg", 0xE30A, z_uadd, GR64, load, 8>;

// Addition to memory.		// Addition to memory.
def ALSI : BinarySIY<"alsi", 0xEB6E, null_frag, imm32sx8>;		def ALSI : BinarySIY<"alsi", 0xEB6E, null_frag, imm32sx8>;
def ALGSI : BinarySIY<"algsi", 0xEB7E, null_frag, imm64sx8>;		def ALGSI : BinarySIY<"algsi", 0xEB7E, null_frag, imm64sx8>;
}		}
defm : ZXB<z_uadd, GR64, ALGFR>;		defm : ZXB<z_uadd, GR64, ALGFR>;

// Addition producing and using a carry.		// Addition producing and using a carry.
Show All 26 Lines	let Defs = [CC], CCValues = 0xF, CompareZeroCCMask = 0x8 in {
// Subtraction from a high register.		// Subtraction from a high register.
def SHHHR : BinaryRRFa<"shhhr", 0xB9C9, null_frag, GRH32, GRH32, GRH32>,		def SHHHR : BinaryRRFa<"shhhr", 0xB9C9, null_frag, GRH32, GRH32, GRH32>,
Requires<[FeatureHighWord]>;		Requires<[FeatureHighWord]>;
def SHHLR : BinaryRRFa<"shhlr", 0xB9D9, null_frag, GRH32, GRH32, GR32>,		def SHHLR : BinaryRRFa<"shhlr", 0xB9D9, null_frag, GRH32, GRH32, GR32>,
Requires<[FeatureHighWord]>;		Requires<[FeatureHighWord]>;

// Subtraction of memory.		// Subtraction of memory.
defm SH : BinaryRXPair<"sh", 0x4B, 0xE37B, z_ssub, GR32, asextloadi16, 2>;		defm SH : BinaryRXPair<"sh", 0x4B, 0xE37B, z_ssub, GR32, asextloadi16, 2>;
defm S : BinaryRXPair<"s", 0x5B, 0xE35B, z_ssub, GR32, load, 4>;		defm S : BinaryRXPairAndPseudo<"s", 0x5B, 0xE35B, z_ssub, GR32, load, 4>;
def SGH : BinaryRXY<"sgh", 0xE339, z_ssub, GR64, asextloadi16, 2>,		def SGH : BinaryRXY<"sgh", 0xE339, z_ssub, GR64, asextloadi16, 2>,
Requires<[FeatureMiscellaneousExtensions2]>;		Requires<[FeatureMiscellaneousExtensions2]>;
def SGF : BinaryRXY<"sgf", 0xE319, z_ssub, GR64, asextloadi32, 4>;		def SGF : BinaryRXY<"sgf", 0xE319, z_ssub, GR64, asextloadi32, 4>;
def SG : BinaryRXY<"sg", 0xE309, z_ssub, GR64, load, 8>;		defm SG : BinaryRXYAndPseudo<"sg", 0xE309, z_ssub, GR64, load, 8>;
}		}
defm : SXB<z_ssub, GR64, SGFR>;		defm : SXB<z_ssub, GR64, SGFR>;

// Subtracting an immediate is the same as adding the negated immediate.		// Subtracting an immediate is the same as adding the negated immediate.
let AddedComplexity = 1 in {		let AddedComplexity = 1 in {
def : Pat<(z_ssub GR32:$src1, imm32sx16n:$src2),		def : Pat<(z_ssub GR32:$src1, imm32sx16n:$src2),
(AHIMux GR32:$src1, imm32sx16n:$src2)>,		(AHIMux GR32:$src1, imm32sx16n:$src2)>,
Requires<[FeatureHighWord]>;		Requires<[FeatureHighWord]>;
Show All 31 Lines	let Defs = [CC] in {
def SLHHLR : BinaryRRFa<"slhhlr", 0xB9DB, null_frag, GRH32, GRH32, GR32>,		def SLHHLR : BinaryRRFa<"slhhlr", 0xB9DB, null_frag, GRH32, GRH32, GR32>,
Requires<[FeatureHighWord]>;		Requires<[FeatureHighWord]>;

// Subtraction of unsigned 32-bit immediates.		// Subtraction of unsigned 32-bit immediates.
def SLFI : BinaryRIL<"slfi", 0xC25, z_usub, GR32, uimm32>;		def SLFI : BinaryRIL<"slfi", 0xC25, z_usub, GR32, uimm32>;
def SLGFI : BinaryRIL<"slgfi", 0xC24, z_usub, GR64, imm64zx32>;		def SLGFI : BinaryRIL<"slgfi", 0xC24, z_usub, GR64, imm64zx32>;

// Subtraction of memory.		// Subtraction of memory.
defm SL : BinaryRXPair<"sl", 0x5F, 0xE35F, z_usub, GR32, load, 4>;		defm SL : BinaryRXPairAndPseudo<"sl", 0x5F, 0xE35F, z_usub, GR32, load, 4>;
def SLGF : BinaryRXY<"slgf", 0xE31B, z_usub, GR64, azextloadi32, 4>;		def SLGF : BinaryRXY<"slgf", 0xE31B, z_usub, GR64, azextloadi32, 4>;
def SLG : BinaryRXY<"slg", 0xE30B, z_usub, GR64, load, 8>;		defm SLG : BinaryRXYAndPseudo<"slg", 0xE30B, z_usub, GR64, load, 8>;
}		}
defm : ZXB<z_usub, GR64, SLGFR>;		defm : ZXB<z_usub, GR64, SLGFR>;

// Subtracting an immediate is the same as adding the negated immediate.		// Subtracting an immediate is the same as adding the negated immediate.
let AddedComplexity = 1 in {		let AddedComplexity = 1 in {
def : Pat<(z_usub GR32:$src1, imm32sx16n:$src2),		def : Pat<(z_usub GR32:$src1, imm32sx16n:$src2),
(ALHSIK GR32:$src1, imm32sx16n:$src2)>,		(ALHSIK GR32:$src1, imm32sx16n:$src2)>,
Requires<[FeatureDistinctOps]>;		Requires<[FeatureDistinctOps]>;
▲ Show 20 Lines • Show All 58 Lines • ▼ Show 20 Lines	let CCValues = 0xC, CompareZeroCCMask = 0x8 in {
def NIHF : BinaryRIL<"nihf", 0xC0A, and, GRH32, uimm32>;		def NIHF : BinaryRIL<"nihf", 0xC0A, and, GRH32, uimm32>;
}		}
def NILF64 : BinaryAliasRIL<and, GR64, imm64lf32c>;		def NILF64 : BinaryAliasRIL<and, GR64, imm64lf32c>;
def NIHF64 : BinaryAliasRIL<and, GR64, imm64hf32c>;		def NIHF64 : BinaryAliasRIL<and, GR64, imm64hf32c>;
}		}

// ANDs of memory.		// ANDs of memory.
let CCValues = 0xC, CompareZeroCCMask = 0x8 in {		let CCValues = 0xC, CompareZeroCCMask = 0x8 in {
defm N : BinaryRXPair<"n", 0x54, 0xE354, and, GR32, load, 4>;		defm N : BinaryRXPairAndPseudo<"n", 0x54, 0xE354, and, GR32, load, 4>;
def NG : BinaryRXY<"ng", 0xE380, and, GR64, load, 8>;		defm NG : BinaryRXYAndPseudo<"ng", 0xE380, and, GR64, load, 8>;
}		}

// AND to memory		// AND to memory
defm NI : BinarySIPair<"ni", 0x94, 0xEB54, null_frag, imm32zx8>;		defm NI : BinarySIPair<"ni", 0x94, 0xEB54, null_frag, imm32zx8>;

// Block AND.		// Block AND.
let mayLoad = 1, mayStore = 1 in		let mayLoad = 1, mayStore = 1 in
defm NC : MemorySS<"nc", 0xD4, z_nc, z_nc_loop>;		defm NC : MemorySS<"nc", 0xD4, z_nc, z_nc_loop>;
Show All 39 Lines	let CCValues = 0xC, CompareZeroCCMask = 0x8 in {
def OILF : BinaryRIL<"oilf", 0xC0D, or, GR32, uimm32>;		def OILF : BinaryRIL<"oilf", 0xC0D, or, GR32, uimm32>;
def OIHF : BinaryRIL<"oihf", 0xC0C, or, GRH32, uimm32>;		def OIHF : BinaryRIL<"oihf", 0xC0C, or, GRH32, uimm32>;
}		}
def OILF64 : BinaryAliasRIL<or, GR64, imm64lf32>;		def OILF64 : BinaryAliasRIL<or, GR64, imm64lf32>;
def OIHF64 : BinaryAliasRIL<or, GR64, imm64hf32>;		def OIHF64 : BinaryAliasRIL<or, GR64, imm64hf32>;

// ORs of memory.		// ORs of memory.
let CCValues = 0xC, CompareZeroCCMask = 0x8 in {		let CCValues = 0xC, CompareZeroCCMask = 0x8 in {
defm O : BinaryRXPair<"o", 0x56, 0xE356, or, GR32, load, 4>;		defm O : BinaryRXPairAndPseudo<"o", 0x56, 0xE356, or, GR32, load, 4>;
def OG : BinaryRXY<"og", 0xE381, or, GR64, load, 8>;		defm OG : BinaryRXYAndPseudo<"og", 0xE381, or, GR64, load, 8>;
}		}

// OR to memory		// OR to memory
defm OI : BinarySIPair<"oi", 0x96, 0xEB56, null_frag, imm32zx8>;		defm OI : BinarySIPair<"oi", 0x96, 0xEB56, null_frag, imm32zx8>;

// Block OR.		// Block OR.
let mayLoad = 1, mayStore = 1 in		let mayLoad = 1, mayStore = 1 in
defm OC : MemorySS<"oc", 0xD6, z_oc, z_oc_loop>;		defm OC : MemorySS<"oc", 0xD6, z_oc, z_oc_loop>;
Show All 22 Lines	let CCValues = 0xC, CompareZeroCCMask = 0x8 in {
def XILF : BinaryRIL<"xilf", 0xC07, xor, GR32, uimm32>;		def XILF : BinaryRIL<"xilf", 0xC07, xor, GR32, uimm32>;
def XIHF : BinaryRIL<"xihf", 0xC06, xor, GRH32, uimm32>;		def XIHF : BinaryRIL<"xihf", 0xC06, xor, GRH32, uimm32>;
}		}
def XILF64 : BinaryAliasRIL<xor, GR64, imm64lf32>;		def XILF64 : BinaryAliasRIL<xor, GR64, imm64lf32>;
def XIHF64 : BinaryAliasRIL<xor, GR64, imm64hf32>;		def XIHF64 : BinaryAliasRIL<xor, GR64, imm64hf32>;

// XORs of memory.		// XORs of memory.
let CCValues = 0xC, CompareZeroCCMask = 0x8 in {		let CCValues = 0xC, CompareZeroCCMask = 0x8 in {
defm X : BinaryRXPair<"x",0x57, 0xE357, xor, GR32, load, 4>;		defm X : BinaryRXPairAndPseudo<"x",0x57, 0xE357, xor, GR32, load, 4>;
def XG : BinaryRXY<"xg", 0xE382, xor, GR64, load, 8>;		defm XG : BinaryRXYAndPseudo<"xg", 0xE382, xor, GR64, load, 8>;
}		}

// XOR to memory		// XOR to memory
defm XI : BinarySIPair<"xi", 0x97, 0xEB57, null_frag, imm32zx8>;		defm XI : BinarySIPair<"xi", 0x97, 0xEB57, null_frag, imm32zx8>;

// Block XOR.		// Block XOR.
let mayLoad = 1, mayStore = 1 in		let mayLoad = 1, mayStore = 1 in
defm XC : MemorySS<"xc", 0xD7, z_xc, z_xc_loop>;		defm XC : MemorySS<"xc", 0xD7, z_xc, z_xc_loop>;

lib/Target/SystemZ/SystemZPostRewrite.cpp

This file was added.

				//==---- SystemZPostRewrite.cpp - Select pseudos after RegAlloc ---*- C++ -*-=//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains a pass that is run immediately after VirtRegRewriter
				// but before MachineCopyPropagation. The purpose is to lower pseudos to
				// target instructions before any later pass might substitute a register for
				// another.
				//
				//===----------------------------------------------------------------------===//

				#include "SystemZ.h"
				#include "SystemZInstrInfo.h"
				#include "SystemZSubtarget.h"
				#include "llvm/ADT/Statistic.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				using namespace llvm;

				#define SYSTEMZ_POSTREWRITE_NAME "SystemZ Post Rewrite pass"

				#define DEBUG_TYPE "systemz-postrewrite"
				STATISTIC(MemFoldCopies, "Number of copies inserted before folded mem ops.");

				namespace llvm {
				void initializeSystemZPostRewritePass(PassRegistry&);
				}

				namespace {

				class SystemZPostRewrite : public MachineFunctionPass {
				public:
				static char ID;
				SystemZPostRewrite() : MachineFunctionPass(ID) {
				initializeSystemZPostRewritePass(*PassRegistry::getPassRegistry());
				}

				const SystemZInstrInfo *TII;

				bool runOnMachineFunction(MachineFunction &Fn) override;

				StringRef getPassName() const override { return SYSTEMZ_POSTREWRITE_NAME; }

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.setPreservesAll();
				MachineFunctionPass::getAnalysisUsage(AU);
				}

				private:
				bool selectMI(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
				MachineBasicBlock::iterator &NextMBBI);
				bool selectMBB(MachineBasicBlock &MBB);
				};

				char SystemZPostRewrite::ID = 0;

				} // end anonymous namespace

				INITIALIZE_PASS(SystemZPostRewrite, "systemz-post-rewrite",
				SYSTEMZ_POSTREWRITE_NAME, false, false)

				/// Returns an instance of the Post Rewrite pass.
				FunctionPass *llvm::createSystemZPostRewritePass(SystemZTargetMachine &TM) {
				return new SystemZPostRewrite();
				}

				/// If MBBI references a pseudo instruction that should be selected here,
				/// do it and return true. Otherwise return false.
				bool SystemZPostRewrite::selectMI(MachineBasicBlock &MBB,
				MachineBasicBlock::iterator MBBI,
				MachineBasicBlock::iterator &NextMBBI) {
				MachineInstr &MI = *MBBI;
				unsigned Opcode = MI.getOpcode();

				// Note: If this could be done during regalloc in foldMemoryOperandImpl()
				// while also updating the LiveIntervals, there would be no need for the
				// MemFoldPseudo to begin with.
				int TargetMemOpcode = SystemZ::getTargetMemOpcode(Opcode);
				if (TargetMemOpcode != -1) {
				MI.setDesc(TII->get(TargetMemOpcode));
				MI.tieOperands(0, 1);
				unsigned DstReg = MI.getOperand(0).getReg();
				MachineOperand &SrcMO = MI.getOperand(1);
				if (DstReg != SrcMO.getReg()) {
				BuildMI(MBB, &MI, MI.getDebugLoc(), TII->get(SystemZ::COPY), DstReg)
				.addReg(SrcMO.getReg());
				SrcMO.setReg(DstReg);
				MemFoldCopies++;
				}
				return true;
				}

				return false;
				}

				/// Iterate over the instructions in basic block MBB and select any
				/// pseudo instructions. Return true if anything was modified.
				bool SystemZPostRewrite::selectMBB(MachineBasicBlock &MBB) {
				bool Modified = false;

				MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end();
				while (MBBI != E) {
				MachineBasicBlock::iterator NMBBI = std::next(MBBI);
				Modified |= selectMI(MBB, MBBI, NMBBI);
				MBBI = NMBBI;
				}

				return Modified;
				}

				bool SystemZPostRewrite::runOnMachineFunction(MachineFunction &MF) {
				TII = static_cast<const SystemZInstrInfo *>(MF.getSubtarget().getInstrInfo());

				bool Modified = false;
				for (auto &MBB : MF)
				Modified |= selectMBB(MBB);

				return Modified;
				}
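
The tie-and-copy step in selectMI() above can be sketched on a toy instruction model. This is only an illustration under stated assumptions: Kind, Inst and lowerMemFoldPseudos() are hypothetical names made up for this sketch and are not LLVM APIs, and the memory operands of the real instruction are left out.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical toy model, not the LLVM MachineInstr API.
enum class Kind { Copy, MemFoldPseudo, TargetMemOp };

struct Inst {
  Kind K;
  unsigned Dst; // destination register
  unsigned Src; // register source operand (memory operand omitted)
};

// Lowers every MemFoldPseudo to a TargetMemOp whose source is tied to its
// destination. If the two registers differ, a copy Dst <- Src is emitted
// first. Returns the number of copies inserted.
unsigned lowerMemFoldPseudos(std::vector<Inst> &Block) {
  unsigned Copies = 0;
  std::vector<Inst> Out;
  for (Inst I : Block) {
    if (I.K == Kind::MemFoldPseudo) {
      if (I.Dst != I.Src) {
        Out.push_back(Inst{Kind::Copy, I.Dst, I.Src});
        I.Src = I.Dst; // the source now reads the freshly copied value
        ++Copies;
      }
      I.K = Kind::TargetMemOp;
      assert(I.Dst == I.Src && "operands must be tied after lowering");
    }
    Out.push_back(I);
  }
  Block = std::move(Out);
  return Copies;
}
```

In this model a pseudo whose registers already match is simply retagged, while a mismatching one costs one extra copy, which is what the MemFoldCopies statistic counts in the pass.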

lib/Target/SystemZ/SystemZRegisterInfo.cpp

Show First 20 Lines • Show All 75 Lines • ▼ Show 20 Lines
bool		bool
SystemZRegisterInfo::getRegAllocationHints(unsigned VirtReg,		SystemZRegisterInfo::getRegAllocationHints(unsigned VirtReg,
ArrayRef<MCPhysReg> Order,		ArrayRef<MCPhysReg> Order,
SmallVectorImpl<MCPhysReg> &Hints,		SmallVectorImpl<MCPhysReg> &Hints,
const MachineFunction &MF,		const MachineFunction &MF,
const VirtRegMap *VRM,		const VirtRegMap *VRM,
const LiveRegMatrix *Matrix) const {		const LiveRegMatrix *Matrix) const {
const MachineRegisterInfo *MRI = &MF.getRegInfo();		const MachineRegisterInfo *MRI = &MF.getRegInfo();
const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();		const SystemZSubtarget &Subtarget = MF.getSubtarget<SystemZSubtarget>();
		const TargetRegisterInfo *TRI = Subtarget.getRegisterInfo();

bool BaseImplRetVal = TargetRegisterInfo::getRegAllocationHints(		bool BaseImplRetVal = TargetRegisterInfo::getRegAllocationHints(
VirtReg, Order, Hints, MF, VRM, Matrix);		VirtReg, Order, Hints, MF, VRM, Matrix);

if (MRI->getRegClass(VirtReg) == &SystemZ::GRX32BitRegClass) {		if (MRI->getRegClass(VirtReg) == &SystemZ::GRX32BitRegClass) {
SmallVector<unsigned, 8> Worklist;		SmallVector<unsigned, 8> Worklist;
SmallSet<unsigned, 4> DoneRegs;		SmallSet<unsigned, 4> DoneRegs;
Worklist.push_back(VirtReg);		Worklist.push_back(VirtReg);
Show All 40 Lines	while (Worklist.size()) {
return false;		return false;
}		}
}		}
} // end CHIMux / CFIMux		} // end CHIMux / CFIMux
}		}
}		}
}		}

		if (VRM == nullptr)
		return BaseImplRetVal;

		// Add any two address hints after any copy hints.
		SmallSet<unsigned, 4> TwoAddrHints;
		for (auto &Use : MRI->reg_nodbg_instructions(VirtReg))
		if (SystemZ::getTwoOperandOpcode(Use.getOpcode()) != -1) {
		const MachineOperand *VRRegMO = nullptr;
		const MachineOperand *OtherMO = nullptr;
		const MachineOperand *CommuMO = nullptr;
		if (VirtReg == Use.getOperand(0).getReg()) {
		VRRegMO = &Use.getOperand(0);
		OtherMO = &Use.getOperand(1);
		if (Use.isCommutable())
		CommuMO = &Use.getOperand(2);
		} else if (VirtReg == Use.getOperand(1).getReg()) {
		VRRegMO = &Use.getOperand(1);
		OtherMO = &Use.getOperand(0);
		} else if (VirtReg == Use.getOperand(2).getReg() && Use.isCommutable()) {
		VRRegMO = &Use.getOperand(2);
		OtherMO = &Use.getOperand(0);
		} else
		uweigand: Just a tiny nit: can't we check for reserved registers and already hinted registers in the loop above? Then we could eliminate the CopyHints variable and get rid of one copy of the whole array.
		jonpa (author): Not sure how we could eliminate CopyHints... Perhaps it would be better to use is_contained(Hints, PhysReg) as is done in AllocationOrder::isHint(), but if we do that maybe we should also change addHints(). The TwoAddrHints set is needed in the second loop to iterate over Order so that the hinted regs are sorted by it (Order), which I think is expected. Not sure what "copy of the whole array" we could get rid of...
		uweigand: I mean this statement: CopyHints.insert(Hints.begin(), Hints.end()); which does copy the whole Hints array into a set. This may not be a big deal since the array is typically small, I just thought it could be easily avoided by indeed using something like an is_contained(Hints, PhysReg) check in the first loop. I agree we need the second loop in any case. (addHints() seems different since here we need to clear/change the existing Hints array anyway and therefore we have to make a copy somewhere.)
		jonpa (author): Changed to use is_contained() instead.
		continue;

		auto tryAddHint = [&](const MachineOperand *MO) -> void {
		unsigned Reg = MO->getReg();
		unsigned PhysReg = isPhysicalRegister(Reg) ? Reg : VRM->getPhys(Reg);
		if (PhysReg) {
		if (MO->getSubReg())
		PhysReg = getSubReg(PhysReg, MO->getSubReg());
		if (VRRegMO->getSubReg())
		PhysReg = getMatchingSuperReg(PhysReg, VRRegMO->getSubReg(),
		MRI->getRegClass(VirtReg));
		if (!MRI->isReserved(PhysReg) && !is_contained(Hints, PhysReg))
		TwoAddrHints.insert(PhysReg);
		}
		};
		tryAddHint(OtherMO);
		if (CommuMO)
		tryAddHint(CommuMO);
		}
		for (MCPhysReg OrderReg : Order)
		if (TwoAddrHints.count(OrderReg))
		Hints.push_back(OrderReg);

return BaseImplRetVal;		return BaseImplRetVal;
}		}
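
The hint ordering discussed in the inline comments above can be sketched in standalone C++. This is a minimal illustration, assuming plain unsigned values stand in for MCPhysReg. The name appendTwoAddrHints() and the Candidates/Reserved parameters are hypothetical and only model the two loops of the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

using PhysReg = unsigned; // hypothetical stand-in for MCPhysReg

// Appends two-address hint candidates to Hints. Reserved registers and
// registers already present in Hints are skipped, and the new hints are
// appended in allocation order (Order), after any existing copy hints.
void appendTwoAddrHints(const std::vector<PhysReg> &Order,
                        const std::vector<PhysReg> &Candidates,
                        const std::set<PhysReg> &Reserved,
                        std::vector<PhysReg> &Hints) {
  std::set<PhysReg> TwoAddrHints;
  // First loop: filter the candidates (the is_contained() style check).
  for (PhysReg Reg : Candidates) {
    bool AlreadyHinted =
        std::find(Hints.begin(), Hints.end(), Reg) != Hints.end();
    if (!Reserved.count(Reg) && !AlreadyHinted)
      TwoAddrHints.insert(Reg);
  }
  // Second loop: emit the surviving hints sorted by allocation order.
  for (PhysReg Reg : Order)
    if (TwoAddrHints.count(Reg))
      Hints.push_back(Reg);
  assert(Hints.size() <= Order.size() + Candidates.size());
}
```

Checking membership directly against Hints in the first loop is what removes the need for a separate CopyHints set, while the second loop is still needed to keep the added hints sorted by Order.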

const MCPhysReg *		const MCPhysReg *
SystemZRegisterInfo::getCalleeSavedRegs(const MachineFunction *MF) const {		SystemZRegisterInfo::getCalleeSavedRegs(const MachineFunction *MF) const {
const SystemZSubtarget &Subtarget = MF->getSubtarget<SystemZSubtarget>();		const SystemZSubtarget &Subtarget = MF->getSubtarget<SystemZSubtarget>();
if (MF->getFunction().getCallingConv() == CallingConv::AnyReg)		if (MF->getFunction().getCallingConv() == CallingConv::AnyReg)
return Subtarget.hasVector()? CSR_SystemZ_AllRegs_Vector_SaveList		return Subtarget.hasVector()? CSR_SystemZ_AllRegs_Vector_SaveList

lib/Target/SystemZ/SystemZShortenInst.cpp

Show First 20 Lines • Show All 293 Lines • ▼ Show 20 Lines	for (auto MBBI = MBB.rbegin(), MBBE = MBB.rend(); MBBI != MBBE; ++MBBI) {

case SystemZ::VL64:		case SystemZ::VL64:
Changed |= shortenOn0(MI, SystemZ::LD);		Changed |= shortenOn0(MI, SystemZ::LD);
break;		break;

case SystemZ::VST64:		case SystemZ::VST64:
Changed |= shortenOn0(MI, SystemZ::STD);		Changed |= shortenOn0(MI, SystemZ::STD);
break;		break;

		default: {
		int TwoOperandOpcode = SystemZ::getTwoOperandOpcode(MI.getOpcode());
		if (TwoOperandOpcode == -1)
		break;

		if ((MI.getOperand(0).getReg() != MI.getOperand(1).getReg()) &&
		(!MI.isCommutable() ||
		MI.getOperand(0).getReg() != MI.getOperand(2).getReg() ||
		uweigand: So all shifts ignore everything but the low 6 bits of the shift count anyway. This means we can always convert a SLLK to SLL, we just may have to truncate the constant.
		jonpa (author): Ah, yes, forgot that. This gave ~100 more 2-address instructions. At least the assembler complains if we do not truncate the immediates.
		!TII->commuteInstruction(MI, false, 1, 2)))
		break;

		MI.setDesc(TII->get(TwoOperandOpcode));
		MI.tieOperands(0, 1);
		if (TwoOperandOpcode == SystemZ::SLL ||
		TwoOperandOpcode == SystemZ::SLA ||
		TwoOperandOpcode == SystemZ::SRL ||
		TwoOperandOpcode == SystemZ::SRA) {
		// These shifts only use the low 6 bits of the shift count.
		MachineOperand &ImmMO = MI.getOperand(3);
		ImmMO.setImm(ImmMO.getImm() & 0xfff);
		}
		Changed = true;
		break;
		}
}		}

LiveRegs.stepBackward(MI);		LiveRegs.stepBackward(MI);
}		}

return Changed;		return Changed;
}		}
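
The shortening rule in the default case above can be sketched on a toy model. This is a hedged illustration only: ThreeAddrInst, tryShorten() and truncateShiftImm() are hypothetical names, and the comment about a 12-bit displacement field for the 2-address shifts is an assumption of this sketch rather than something stated in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical toy model of a three-address instruction: Dst = Src1 op Src2.
struct ThreeAddrInst {
  unsigned Dst, Src1, Src2;
  bool Commutable;
};

// Returns true if the instruction can use the two-address form, i.e. Dst can
// be tied to Src1. For commutable operations the sources are swapped when
// that makes the tie possible.
bool tryShorten(ThreeAddrInst &I) {
  if (I.Dst == I.Src1)
    return true;
  if (I.Commutable && I.Dst == I.Src2) {
    std::swap(I.Src1, I.Src2);
    assert(I.Dst == I.Src1);
    return true;
  }
  return false;
}

// Masks a shift amount as the patch does. The low 6 bits, which are the only
// ones the shift uses, are unchanged by this mask. The 0xfff mask is assumed
// here to match a 12-bit displacement field in the 2-address encoding.
int64_t truncateShiftImm(int64_t Imm) { return Imm & 0xfff; }
```

Because 0x3f is a subset of 0xfff, the masked immediate shifts by the same amount as the original one, which is why the conversion from SLLK to SLL is always possible once the constant is truncated.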


lib/Target/SystemZ/SystemZTargetMachine.cpp

Show First 20 Lines • Show All 177 Lines • ▼ Show 20 Lines	createPostMachineScheduler(MachineSchedContext *C) const override {
return new ScheduleDAGMI(C,		return new ScheduleDAGMI(C,
llvm::make_unique<SystemZPostRASchedStrategy>(C),		llvm::make_unique<SystemZPostRASchedStrategy>(C),
/*RemoveKillFlags=*/true);		/*RemoveKillFlags=*/true);
}		}

void addIRPasses() override;		void addIRPasses() override;
bool addInstSelector() override;		bool addInstSelector() override;
bool addILPOpts() override;		bool addILPOpts() override;
		void addPostRewrite() override;
void addPreSched2() override;		void addPreSched2() override;
void addPreEmitPass() override;		void addPreEmitPass() override;
};		};

} // end anonymous namespace		} // end anonymous namespace

void SystemZPassConfig::addIRPasses() {		void SystemZPassConfig::addIRPasses() {
if (getOptLevel() != CodeGenOpt::None) {		if (getOptLevel() != CodeGenOpt::None) {
Show All 13 Lines	if (getOptLevel() != CodeGenOpt::None)
return false;		return false;
}		}

bool SystemZPassConfig::addILPOpts() {		bool SystemZPassConfig::addILPOpts() {
addPass(&EarlyIfConverterID);		addPass(&EarlyIfConverterID);
return true;		return true;
}		}

		void SystemZPassConfig::addPostRewrite() {
		addPass(createSystemZPostRewritePass(getSystemZTargetMachine()));
		}

void SystemZPassConfig::addPreSched2() {		void SystemZPassConfig::addPreSched2() {
		// PostRewrite needs to be run at -O0 also (in which case addPostRewrite()
		// is not called).
		if (getOptLevel() == CodeGenOpt::None)
		addPass(createSystemZPostRewritePass(getSystemZTargetMachine()));

addPass(createSystemZExpandPseudoPass(getSystemZTargetMachine()));		addPass(createSystemZExpandPseudoPass(getSystemZTargetMachine()));

if (getOptLevel() != CodeGenOpt::None)		if (getOptLevel() != CodeGenOpt::None)
addPass(&IfConverterID);		addPass(&IfConverterID);
}		}

void SystemZPassConfig::addPreEmitPass() {		void SystemZPassConfig::addPreEmitPass() {
// Do instruction shortening before compare elimination because some		// Do instruction shortening before compare elimination because some

lib/Target/X86/X86InstrInfo.h

Show First 20 Lines • Show All 344 Lines • ▼ Show 20 Lines	public:
/// specified operand(s). If this is possible, the target should perform the		/// specified operand(s). If this is possible, the target should perform the
/// folding and return true, otherwise it should return false. If it folds		/// folding and return true, otherwise it should return false. If it folds
/// the instruction, it is likely that the MachineInstruction the iterator		/// the instruction, it is likely that the MachineInstruction the iterator
/// references has been changed.		/// references has been changed.
MachineInstr *		MachineInstr *
foldMemoryOperandImpl(MachineFunction &MF, MachineInstr &MI,		foldMemoryOperandImpl(MachineFunction &MF, MachineInstr &MI,
ArrayRef<unsigned> Ops,		ArrayRef<unsigned> Ops,
MachineBasicBlock::iterator InsertPt, int FrameIndex,		MachineBasicBlock::iterator InsertPt, int FrameIndex,
LiveIntervals *LIS = nullptr) const override;		LiveIntervals *LIS = nullptr,
		VirtRegMap *VRM = nullptr) const override;

/// foldMemoryOperand - Same as the previous version except it allows folding		/// foldMemoryOperand - Same as the previous version except it allows folding
/// of any load and store from / to any address, not just from a specific		/// of any load and store from / to any address, not just from a specific
/// stack slot.		/// stack slot.
MachineInstr *foldMemoryOperandImpl(		MachineInstr *foldMemoryOperandImpl(
MachineFunction &MF, MachineInstr &MI, ArrayRef<unsigned> Ops,		MachineFunction &MF, MachineInstr &MI, ArrayRef<unsigned> Ops,
MachineBasicBlock::iterator InsertPt, MachineInstr &LoadMI,		MachineBasicBlock::iterator InsertPt, MachineInstr &LoadMI,
LiveIntervals *LIS = nullptr) const override;		LiveIntervals *LIS = nullptr) const override;

lib/Target/X86/X86InstrInfo.cpp


Show First 20 Lines • Show All 4,767 Lines • ▼ Show 20 Lines	if (PrintFailedFusing && !MI.isCopy())
dbgs() << "We failed to fuse operand " << OpNum << " in " << MI;		dbgs() << "We failed to fuse operand " << OpNum << " in " << MI;
return nullptr;		return nullptr;
}		}

MachineInstr *		MachineInstr *
X86InstrInfo::foldMemoryOperandImpl(MachineFunction &MF, MachineInstr &MI,		X86InstrInfo::foldMemoryOperandImpl(MachineFunction &MF, MachineInstr &MI,
ArrayRef<unsigned> Ops,		ArrayRef<unsigned> Ops,
MachineBasicBlock::iterator InsertPt,		MachineBasicBlock::iterator InsertPt,
int FrameIndex, LiveIntervals *LIS) const {		int FrameIndex, LiveIntervals *LIS,
		VirtRegMap *VRM) const {
// Check switch flag		// Check switch flag
if (NoFusing)		if (NoFusing)
return nullptr;		return nullptr;

// Avoid partial and undef register update stalls unless optimizing for size.		// Avoid partial and undef register update stalls unless optimizing for size.
if (!MF.getFunction().hasOptSize() &&		if (!MF.getFunction().hasOptSize() &&
(hasPartialRegUpdate(MI.getOpcode(), Subtarget, /*ForLoadFold*/true) ||		(hasPartialRegUpdate(MI.getOpcode(), Subtarget, /*ForLoadFold*/true) ||
shouldPreventUndefRegUpdateMemFold(MF, MI)))		shouldPreventUndefRegUpdateMemFold(MF, MI)))

test/CodeGen/SystemZ/asm-18.ll

Show First 20 Lines • Show All 597 Lines • ▼ Show 20 Lines	; CHECK: br %r14
%add2 = add i32 %res2, 1		%add2 = add i32 %res2, 1
%res3 = call i32 asm "stepc $0, $1", "=r,r"(i32 %add2)		%res3 = call i32 asm "stepc $0, $1", "=r,r"(i32 %add2)
%add3 = add i32 %res3, 32767		%add3 = add i32 %res3, 32767
call void asm sideeffect "stepd $0", "r"(i32 %add3)		call void asm sideeffect "stepd $0", "r"(i32 %add3)
ret void		ret void
}		}

; Test three-operand halfword immediate addition involving mixtures of low		; Test three-operand halfword immediate addition involving mixtures of low
; and high registers. RISBHG/AIH would be OK too, instead of AHIK/RISBHG.		; and high registers. AHIK/RISBHG would be OK too, instead of RISBHG/AIH.
		uweigand: Should update this comment now.
define i32 @f28(i32 %old) {		define i32 @f28(i32 %old) {
; CHECK-LABEL: f28:		; CHECK-LABEL: f28:
; CHECK: ahik [[REG1:%r[0-5]]], %r2, 14		; CHECK: ahik [[REG1:%r[0-5]]], %r2, 14
; CHECK: stepa %r2, [[REG1]]		; CHECK: stepa %r2, [[REG1]]
; CHECK: ahik [[TMP:%r[0-5]]], [[REG1]], 254		; CHECK: risbhg [[REG1]], [[REG1]], 0, 159, 32
; CHECK: risbhg [[REG2:%r[0-5]]], [[TMP]], 0, 159, 32		; CHECK: aih [[REG1]], 254
; CHECK: stepb [[REG1]], [[REG2]]		; CHECK: stepb [[REG1]], [[REG2]]
; CHECK: risbhg [[REG3:%r[0-5]]], [[REG2]], 0, 159, 0		; CHECK: risbhg [[REG3:%r[0-5]]], [[REG2]], 0, 159, 0
; CHECK: aih [[REG3]], 127		; CHECK: aih [[REG3]], 127
; CHECK: stepc [[REG2]], [[REG3]]		; CHECK: stepc [[REG2]], [[REG3]]
; CHECK: risblg %r2, [[REG3]], 0, 159, 32		; CHECK: risblg %r2, [[REG3]], 0, 159, 32
; CHECK: ahi %r2, 128		; CHECK: ahi %r2, 128
; CHECK: stepd [[REG3]], %r2		; CHECK: stepd [[REG3]], %r2
; CHECK: br %r14		; CHECK: br %r14

test/CodeGen/SystemZ/codegenprepare-splitstore.ll

	; Test that CodeGenPrepare respects endianness when splitting a store.			; Test that CodeGenPrepare respects endianness when splitting a store.
	;			;
	; RUN: llc -mtriple=s390x-linux-gnu -mcpu=z13 -force-split-store < %s | FileCheck %s			; RUN: llc -mtriple=s390x-linux-gnu -mcpu=z13 -force-split-store < %s | FileCheck %s

	define void @fun(i16* %Src, i16* %Dst) {			define void @fun(i16* %Src, i16* %Dst) {
	; CHECK-LABEL: # %bb.0:			; CHECK-LABEL: # %bb.0:
	; CHECK: lh %r0, 0(%r2)			; CHECK: lh %r0, 0(%r2)
				; CHECK-NEXT: srlk %r1, %r0, 8
	; CHECK-NEXT: stc %r0, 1(%r3)			; CHECK-NEXT: stc %r0, 1(%r3)
	; CHECK-NEXT: srl %r0, 8			; CHECK-NEXT: stc %r1, 0(%r3)
	; CHECK-NEXT: stc %r0, 0(%r3)
	; CHECK-NEXT: br %r14			; CHECK-NEXT: br %r14
	%1 = load i16, i16* %Src			%1 = load i16, i16* %Src
	%2 = trunc i16 %1 to i8			%2 = trunc i16 %1 to i8
	%3 = lshr i16 %1, 8			%3 = lshr i16 %1, 8
	%4 = trunc i16 %3 to i8			%4 = trunc i16 %3 to i8
	%5 = zext i8 %2 to i16			%5 = zext i8 %2 to i16
	%6 = zext i8 %4 to i16			%6 = zext i8 %4 to i16
	%7 = shl nuw i16 %6, 8			%7 = shl nuw i16 %6, 8
	%8 = or i16 %7, %5			%8 = or i16 %7, %5
	store i16 %8, i16* %Dst			store i16 %8, i16* %Dst
	ret void			ret void
	}			}

test/CodeGen/SystemZ/ctpop-01.ll

	; Test population-count instruction			; Test population-count instruction
	;			;
	; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z196 | FileCheck %s			; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z196 | FileCheck %s

	declare i32 @llvm.ctpop.i32(i32 %a)			declare i32 @llvm.ctpop.i32(i32 %a)
	declare i64 @llvm.ctpop.i64(i64 %a)			declare i64 @llvm.ctpop.i64(i64 %a)

	define i32 @f1(i32 %a) {			define i32 @f1(i32 %a) {
	; CHECK-LABEL: f1:			; CHECK-LABEL: f1:
	; CHECK: popcnt %r0, %r2			; CHECK: popcnt %r0, %r2
	; CHECK: sllk %r1, %r0, 16			; CHECK: sllk %r1, %r0, 16
	; CHECK: ar %r1, %r0			; CHECK: ar %r0, %r1
	; CHECK: sllk %r2, %r1, 8			; CHECK: sllk %r1, %r0, 8
	; CHECK: ar %r2, %r1			; CHECK: ar %r0, %r1
	; CHECK: srl %r2, 24			; CHECK: srlk %r2, %r0, 24
	; CHECK: br %r14			; CHECK: br %r14

	%popcnt = call i32 @llvm.ctpop.i32(i32 %a)			%popcnt = call i32 @llvm.ctpop.i32(i32 %a)
	ret i32 %popcnt			ret i32 %popcnt
	}			}

	define i32 @f2(i32 %a) {			define i32 @f2(i32 %a) {
	; CHECK-LABEL: f2:			; CHECK-LABEL: f2:
	; CHECK: llhr %r0, %r2			; CHECK: llhr %r0, %r2
	; CHECK: popcnt %r0, %r0			; CHECK: popcnt %r0, %r0
	; CHECK: risblg %r2, %r0, 16, 151, 8			; CHECK: risblg %r1, %r0, 16, 151, 8
	; CHECK: ar %r2, %r0			; CHECK: ar %r0, %r1
	; CHECK: srl %r2, 8			; CHECK: srlk %r2, %r0, 8
	; CHECK: br %r14			; CHECK: br %r14
	%and = and i32 %a, 65535			%and = and i32 %a, 65535
	%popcnt = call i32 @llvm.ctpop.i32(i32 %and)			%popcnt = call i32 @llvm.ctpop.i32(i32 %and)
	ret i32 %popcnt			ret i32 %popcnt
	}			}

	define i32 @f3(i32 %a) {			define i32 @f3(i32 %a) {
	; CHECK-LABEL: f3:			; CHECK-LABEL: f3:
	; CHECK: llcr %r0, %r2			; CHECK: llcr %r0, %r2
	; CHECK: popcnt %r2, %r0			; CHECK: popcnt %r2, %r0
	; CHECK: br %r14			; CHECK: br %r14
	%and = and i32 %a, 255			%and = and i32 %a, 255
	%popcnt = call i32 @llvm.ctpop.i32(i32 %and)			%popcnt = call i32 @llvm.ctpop.i32(i32 %and)
	ret i32 %popcnt			ret i32 %popcnt
	}			}

	define i64 @f4(i64 %a) {			define i64 @f4(i64 %a) {
	; CHECK-LABEL: f4:			; CHECK-LABEL: f4:
	; CHECK: popcnt %r0, %r2			; CHECK: popcnt %r0, %r2
	; CHECK: sllg %r1, %r0, 32			; CHECK: sllg %r1, %r0, 32
	; CHECK: agr %r1, %r0			; CHECK: agr %r0, %r1
	; CHECK: sllg %r0, %r1, 16			; CHECK: sllg %r1, %r0, 16
	; CHECK: agr %r0, %r1			; CHECK: agr %r0, %r1
	; CHECK: sllg %r1, %r0, 8			; CHECK: sllg %r1, %r0, 8
	; CHECK: agr %r1, %r0			; CHECK: agr %r0, %r1
	; CHECK: srlg %r2, %r1, 56			; CHECK: srlg %r2, %r0, 56
	; CHECK: br %r14			; CHECK: br %r14
	%popcnt = call i64 @llvm.ctpop.i64(i64 %a)			%popcnt = call i64 @llvm.ctpop.i64(i64 %a)
	ret i64 %popcnt			ret i64 %popcnt
	}			}

	define i64 @f5(i64 %a) {			define i64 @f5(i64 %a) {
	; CHECK-LABEL: f5:			; CHECK-LABEL: f5:
	; CHECK: llgfr %r0, %r2			; CHECK: llgfr %r0, %r2
	; CHECK: popcnt %r0, %r0			; CHECK: popcnt %r0, %r0
	; CHECK: sllg %r1, %r0, 16			; CHECK: sllg %r1, %r0, 16
	; CHECK: algfr %r0, %r1			; CHECK: algfr %r0, %r1
	; CHECK: sllg %r1, %r0, 8			; CHECK: sllg %r1, %r0, 8
	; CHECK: algfr %r0, %r1			; CHECK: algfr %r0, %r1
	; CHECK: srlg %r2, %r0, 24			; CHECK: srlg %r2, %r0, 24
	%and = and i64 %a, 4294967295			%and = and i64 %a, 4294967295
	%popcnt = call i64 @llvm.ctpop.i64(i64 %and)			%popcnt = call i64 @llvm.ctpop.i64(i64 %and)
	ret i64 %popcnt			ret i64 %popcnt
	}			}

	define i64 @f6(i64 %a) {			define i64 @f6(i64 %a) {
	; CHECK-LABEL: f6:			; CHECK-LABEL: f6:
	; CHECK: llghr %r0, %r2			; CHECK: llghr %r0, %r2
	; CHECK: popcnt %r0, %r0			; CHECK: popcnt %r0, %r0
	; CHECK: risbg %r1, %r0, 48, 183, 8			; CHECK: risbg %r1, %r0, 48, 183, 8
	; CHECK: agr %r1, %r0			; CHECK: agr %r0, %r1
	; CHECK: srlg %r2, %r1, 8			; CHECK: srlg %r2, %r0, 8
	; CHECK: br %r14			; CHECK: br %r14
	%and = and i64 %a, 65535			%and = and i64 %a, 65535
	%popcnt = call i64 @llvm.ctpop.i64(i64 %and)			%popcnt = call i64 @llvm.ctpop.i64(i64 %and)
	ret i64 %popcnt			ret i64 %popcnt
	}			}

	define i64 @f7(i64 %a) {			define i64 @f7(i64 %a) {
	; CHECK-LABEL: f7:			; CHECK-LABEL: f7:
	; CHECK: llgcr %r0, %r2			; CHECK: llgcr %r0, %r2
	; CHECK: popcnt %r2, %r0			; CHECK: popcnt %r2, %r0
	; CHECK: br %r14			; CHECK: br %r14
	%and = and i64 %a, 255			%and = and i64 %a, 255
	%popcnt = call i64 @llvm.ctpop.i64(i64 %and)			%popcnt = call i64 @llvm.ctpop.i64(i64 %and)
	ret i64 %popcnt			ret i64 %popcnt
	}			}
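
The `f1` check lines above rely on POPCNT producing a separate bit count for each byte, which the shift/add pairs then sum into the top byte before the final right shift. A minimal Python sketch of that arithmetic follows; the function names are made up for illustration, and it models only the 32-bit case, not the compiler's actual lowering code.

```python
MASK32 = 0xFFFFFFFF

def popcnt_bytes(x, width=32):
    """Toy model of POPCNT: each result byte holds the bit count of the matching input byte."""
    out = 0
    for shift in range(0, width, 8):
        out |= bin((x >> shift) & 0xFF).count("1") << shift
    return out

def ctpop_i32(a):
    """Mirror the f1 sequence: popcnt, two shift-and-add steps, final right shift."""
    r0 = popcnt_bytes(a & MASK32)      # popcnt
    r0 = (r0 + (r0 << 16)) & MASK32    # sllk by 16, then ar
    r0 = (r0 + (r0 << 8)) & MASK32     # sllk by 8, then ar
    return r0 >> 24                    # srl / srlk by 24: total is in the top byte
```

Each per-byte count is at most 8, so the partial sums never overflow a byte and the top byte ends up holding the full count.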

test/CodeGen/SystemZ/int-add-05.ll

; Test 64-bit addition in which the second operand is variable.		; Test 64-bit addition in which the second operand is variable.
;		;
; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z10 | FileCheck %s		; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z10 | FileCheck %s --check-prefixes=CHECK,Z10
; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z196 | FileCheck %s		; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z196 | FileCheck %s --check-prefixes=CHECK,Z196

declare i64 @foo()		declare i64 @foo()

; Check AGR.		; Check AGR.
define i64 @f1(i64 %a, i64 %b) {		define i64 @f1(i64 %a, i64 %b) {
; CHECK-LABEL: f1:		; CHECK-LABEL: f1:
; CHECK: agr %r2, %r3		; CHECK: agr %r2, %r3
; CHECK: br %r14		; CHECK: br %r14
[79 lines not shown]
%add2 = add i64 %add1, 524280		%add2 = add i64 %add1, 524280
%ptr = inttoptr i64 %add2 to i64 *		%ptr = inttoptr i64 %add2 to i64 *
%b = load i64, i64 *%ptr		%b = load i64, i64 *%ptr
%add = add i64 %a, %b		%add = add i64 %a, %b
ret i64 %add		ret i64 %add
}		}

; Check that additions of spilled values can use AG rather than AGR.		; Check that additions of spilled values can use AG rather than AGR.
		; Note: Z196 is suboptimal with one unfolded reload.
uweigand: Just to clarify: even with the new MemFoldPseudos this is still suboptimal? Why is that?
jonpa: The MemFoldPseudos are not improving anything compared to before, they are just making the IR legal, as well as handling a few rare cases by inserting COPYs before when needed.
In the case where a MemFoldPseudo actually ended up getting the dst and LHS regs to be the same after regalloc, all that is needed is a lowering to the target instruction.
In the case where a MemFoldPseudo had dst and LHS allocated to the same physreg at the point of foldMemoryOperandImpl(), but this was later changed by an eviction or so, a COPY of LHS to dst is also needed during lowering.
In the case where dst and LHS were allocated different regs to begin with, the folding cannot occur, which is why this test case fails (discussed here on 26th of April).
uweigand: Ah, I thought the MemFoldPseudo classes would allow folding in the case where dst and LHS were allocated different regs to begin with! Why don't they? I thought this would fold to a pseudo three-operand add-from-memory, which later gets lowered to a COPY of the register LHS to dst followed by the real two-operand add-from-memory?
jonpa: I guess I was expecting "Load + Op(Reg)" to be better than "COPY + Op(Mem)", but I really don't know. I tried to remove this restriction on SPEC 2006 and found these opcode differences:
lg             :               371908               370637    -1271
lgr            :               349230               350477    +1247
ag             :                11716                12786    +1070
agr            :                32345                31346     -999
l              :                72751                71972     -779
a              :                13718                14396     +678
ar             :                19560                18886     -674
lr             :                28345                28960     +615
sg             :                 6715                 6862     +147
sgrk           :                11729                11635      -94
agrk           :                 8730                 8658      -72
sgr            :                15302                15249      -53
s              :                 1370                 1416      +46
o              :                 1924                 1964      +40
or             :                 2780                 2742      -38
srk            :                 3730                 3693      -37
ngr            :                 2811                 2780      -31
ng             :                 3117                 3148      +31
nr             :                 2933                 2917      -16
...
This would also handle this test case to do the fold while requiring one more lgr. Would this be better?
uweigand: It is an interesting question whether LGR/AG is in general better or worse (or the same) than LG/AGR. Even if they are the same hardware-wise, I guess there might still be differences w.r.t. LLVM register allocation ... When you make that change, do you see any performance differences / changes to those regressions you mention above?
jonpa: During quick preliminary benchmarking (14 x 3 runs per benchmark during the day):

z13:
(leela_r was down again to a 2.5% regression, without any rebase and with the same build. xz_r was helped ~1%, see below.)

Effects of not requiring Dst/LHS to be the same, compared to the patch with dst/lhs required to be the same:

2017 (Average: 99.969%):
Improvements:
0.992: i557.xz_r
0.996: i525.x264_r
0.997: f507.cactuBSSN_r
0.997: i541.leela_r
Regressions:
1.004: i531.deepsjeng_r
1.003: f511.povray_r
1.003: i500.perlbench_r

2006 (Average: 99.983%):
Improvements:
0.975: f436.cactusADM
0.992: i456.hmmer
0.996: f453.povray
Regressions:
1.011: f470.lbm
1.009: i464.h264ref
1.005: f454.calculix
1.004: i473.astar
1.003: f435.gromacs

z14:

2017 (Average: 100.126%):
Improvements:
0.991: f519.lbm_r
Regressions:
1.010: f511.povray_r
1.009: i500.perlbench_r
1.008: i523.xalancbmk_r
1.005: f507.cactuBSSN_r
1.005: f508.namd_r
1.005: i541.leela_r

2006 (Average: 99.820%):
Improvements:
0.988: f470.lbm
0.989: f435.gromacs
0.992: f447.dealII
0.994: f481.wrf
0.996: f436.cactusADM
0.997: i401.bzip2
Regressions:
1.008: i400.perlbench
1.003: i458.sjeng

The regalloc effects of this seem to be very marginal according to some stats (z13):

2006 / ThreeAddr
 43688 regalloc - Number of spill slots allocated
 57280 regalloc - Number of spilled live ranges
  1794 regalloc - Number of spilled snippets
 52901 regalloc - Number of spills inserted
  1577 regalloc - Number of spills removed

2006 / ThreeAddr + disable_dstlhs_check
 43686 regalloc - Number of spill slots allocated
 57278 regalloc - Number of spilled live ranges
  1795 regalloc - Number of spilled snippets
 52899 regalloc - Number of spills inserted
  1575 regalloc - Number of spills removed

2017 / ThreeAddr
138380 regalloc - Number of spill slots allocated
182928 regalloc - Number of spilled live ranges
  9532 regalloc - Number of spilled snippets
177398 regalloc - Number of spills inserted
  5050 regalloc - Number of spills removed

2017 / ThreeAddr + disable_dstlhs_check
138382 regalloc - Number of spill slots allocated
182928 regalloc - Number of spilled live ranges
  9509 regalloc - Number of spilled snippets
177387 regalloc - Number of spills inserted
  5057 regalloc - Number of spills removed

All in all, it doesn't seem to matter much, but possibly it is better to skip this requirement as you expected. I can see the point of not needing that extra register to do the reload with...
uweigand:
> All in all, it doesn't seem to matter much, but possibly it is better to skip this requirement as you expected. I can see the point of not needing that extra register to do the reload with...

You shouldn't really need an extra register, since the original destination register will always be free at this point, so the allocator should be able to choose it. I was just wondering whether the additional allocation has any secondary effects in the allocator, but apparently not.
Given the information I received from our hardware folks that LG/AGR is preferable from their point of view over LGR/AG, and your performance results that show not much difference, I now think we should leave the patch as-is.
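
The three cases jonpa describes in this thread can be summarized in a small toy model. This is a sketch only: `can_fold_reload` and `lower_memfold_pseudo` are hypothetical names invented here, registers are plain strings, and the real logic lives in foldMemoryOperandImpl() and the later pseudo lowering, which this does not reproduce.

```python
def can_fold_reload(dst, lhs):
    """Restriction kept in the patch as discussed: fold the reload into a
    MemFoldPseudo only if dst and LHS share a physreg at fold time."""
    return dst == lhs

def lower_memfold_pseudo(op, dst, lhs, mem):
    """Lower a folded pseudo after regalloc (hypothetical helper).

    If an eviction later separated dst and LHS, a COPY of LHS to dst is
    emitted before the two-address memory instruction.
    """
    insts = []
    if dst != lhs:
        insts.append(("COPY", dst, lhs))
    insts.append((op, dst, mem))
    return insts
```

For example, `lower_memfold_pseudo("ag", "%r2", "%r2", "168(%r15)")` yields just the `ag`, while differing registers add one COPY in front of it. With `can_fold_reload` returning false, the reload stays a separate load followed by the register form, which is the LG/AGR shape the thread settles on.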
define i64 @f9(i64 *%ptr0) {		define i64 @f9(i64 *%ptr0) {
; CHECK-LABEL: f9:		; CHECK-LABEL: f9:
; CHECK: brasl %r14, foo@PLT		; CHECK: brasl %r14, foo@PLT
; CHECK: ag %r2, 160(%r15)		; Z10: ag %r2, 168(%r15)
		; Z196: ag %r0, 168(%r15)
; CHECK: br %r14		; CHECK: br %r14
%ptr1 = getelementptr i64, i64 *%ptr0, i64 2		%ptr1 = getelementptr i64, i64 *%ptr0, i64 2
%ptr2 = getelementptr i64, i64 *%ptr0, i64 4		%ptr2 = getelementptr i64, i64 *%ptr0, i64 4
%ptr3 = getelementptr i64, i64 *%ptr0, i64 6		%ptr3 = getelementptr i64, i64 *%ptr0, i64 6
%ptr4 = getelementptr i64, i64 *%ptr0, i64 8		%ptr4 = getelementptr i64, i64 *%ptr0, i64 8
%ptr5 = getelementptr i64, i64 *%ptr0, i64 10		%ptr5 = getelementptr i64, i64 *%ptr0, i64 10
%ptr6 = getelementptr i64, i64 *%ptr0, i64 12		%ptr6 = getelementptr i64, i64 *%ptr0, i64 12
%ptr7 = getelementptr i64, i64 *%ptr0, i64 14		%ptr7 = getelementptr i64, i64 *%ptr0, i64 14
[29 lines not shown]

test/CodeGen/SystemZ/int-sub-11.ll

This file was added.

				; Test of subtraction that involves a constant as the first operand
				;
				; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z196 | FileCheck %s

				; Check highest 16-bit signed int immediate value.
				define i64 @f1(i64 %a) {
				; CHECK-LABEL: f1:
				; CHECK: lghi %r0, 32767
				; CHECK: sgrk %r2, %r0, %r2
				; CHECK: br %r14
				%sub = sub i64 32767, %a
				ret i64 %sub
				}
				; Check highest 32-bit signed int immediate value.
				define i64 @f2(i64 %a) {
				; CHECK-LABEL: f2:
				; CHECK: lgfi %r0, 2147483647
				; CHECK: sgrk %r2, %r0, %r2
				; CHECK: br %r14
				%sub = sub i64 2147483647, %a
				ret i64 %sub
				}
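
The two functions in this new test pick the largest constants that fit LGHI (signed 16-bit immediate) and LGFI (signed 32-bit immediate), then subtract with the three-address SGRK. A rough Python sketch of that range split follows; `load_imm_mnemonic` and `sub_const_minus_reg` are hypothetical helpers, and the sketch models only the immediate ranges of these two instructions, not what instruction selection would actually choose for other constants.

```python
def load_imm_mnemonic(value):
    """Return which of the two load-immediate forms can encode `value`, if any."""
    if -2**15 <= value < 2**15:
        return "lghi"   # signed 16-bit immediate
    if -2**31 <= value < 2**31:
        return "lgfi"   # signed 32-bit immediate
    return None         # outside both ranges; not modeled here

def sub_const_minus_reg(const):
    """Assembly lines in the shape of f1/f2: materialize const, then %r2 = const - %r2."""
    mnemonic = load_imm_mnemonic(const)
    if mnemonic is None:
        raise ValueError("constant not covered by this sketch")
    return [f"{mnemonic} %r0, {const}", "sgrk %r2, %r0, %r2"]
```

Because the constant is the first operand of the subtraction, the three-address form lets the result land in %r2 without first copying the constant's register.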

test/CodeGen/SystemZ/scalar-ctlz.ll

[49 lines not shown]
ret i32 %1		ret i32 %1
}		}

define i16 @f4(i16 %arg) {		define i16 @f4(i16 %arg) {
; CHECK-LABEL: f4:		; CHECK-LABEL: f4:
; CHECK-LABEL: %bb.0:		; CHECK-LABEL: %bb.0:
; CHECK-NEXT: # kill		; CHECK-NEXT: # kill
; CHECK-NEXT: llghr %r0, %r2		; CHECK-NEXT: llghr %r0, %r2
; CHECK-NEXT: flogr %r2, %r0		; CHECK-NEXT: flogr %r0, %r0
; CHECK-NEXT: aghi %r2, -32		; CHECK-NEXT: aghi %r0, -32
; CHECK-NEXT: ahi %r2, -16		; CHECK-NEXT: ahik %r2, %r0, -16
; CHECK-NEXT: # kill
; CHECK-NEXT: br %r14		; CHECK-NEXT: br %r14
%1 = tail call i16 @llvm.ctlz.i16(i16 %arg, i1 false)		%1 = tail call i16 @llvm.ctlz.i16(i16 %arg, i1 false)
ret i16 %1		ret i16 %1
}		}

define i16 @f5(i16 %arg) {		define i16 @f5(i16 %arg) {
; CHECK-LABEL: f5:		; CHECK-LABEL: f5:
; CHECK-LABEL: %bb.0:		; CHECK-LABEL: %bb.0:
; CHECK-NEXT: # kill		; CHECK-NEXT: # kill
; CHECK-NEXT: llghr %r0, %r2		; CHECK-NEXT: llghr %r0, %r2
; CHECK-NEXT: flogr %r2, %r0		; CHECK-NEXT: flogr %r0, %r0
; CHECK-NEXT: aghi %r2, -32		; CHECK-NEXT: aghi %r0, -32
; CHECK-NEXT: ahi %r2, -16		; CHECK-NEXT: ahik %r2, %r0, -16
; CHECK-NEXT: # kill
; CHECK-NEXT: br %r14		; CHECK-NEXT: br %r14
%1 = tail call i16 @llvm.ctlz.i16(i16 %arg, i1 true)		%1 = tail call i16 @llvm.ctlz.i16(i16 %arg, i1 true)
ret i16 %1		ret i16 %1
}		}

define i8 @f6(i8 %arg) {		define i8 @f6(i8 %arg) {
; CHECK-LABEL: f6:		; CHECK-LABEL: f6:
; CHECK-LABEL: %bb.0:		; CHECK-LABEL: %bb.0:
; CHECK-NEXT: # kill		; CHECK-NEXT: # kill
; CHECK-NEXT: llgcr %r0, %r2		; CHECK-NEXT: llgcr %r0, %r2
; CHECK-NEXT: flogr %r2, %r0		; CHECK-NEXT: flogr %r0, %r0
; CHECK-NEXT: aghi %r2, -32		; CHECK-NEXT: aghi %r0, -32
; CHECK-NEXT: ahi %r2, -24		; CHECK-NEXT: ahik %r2, %r0, -24
; CHECK-NEXT: # kill
; CHECK-NEXT: br %r14		; CHECK-NEXT: br %r14
%1 = tail call i8 @llvm.ctlz.i8(i8 %arg, i1 false)		%1 = tail call i8 @llvm.ctlz.i8(i8 %arg, i1 false)
ret i8 %1		ret i8 %1
}		}

define i8 @f7(i8 %arg) {		define i8 @f7(i8 %arg) {
; CHECK-LABEL: f7:		; CHECK-LABEL: f7:
; CHECK-LABEL: %bb.0:		; CHECK-LABEL: %bb.0:
; CHECK-NEXT: # kill		; CHECK-NEXT: # kill
; CHECK-NEXT: llgcr %r0, %r2		; CHECK-NEXT: llgcr %r0, %r2
; CHECK-NEXT: flogr %r2, %r0		; CHECK-NEXT: flogr %r0, %r0
; CHECK-NEXT: aghi %r2, -32		; CHECK-NEXT: aghi %r0, -32
; CHECK-NEXT: ahi %r2, -24		; CHECK-NEXT: ahik %r2, %r0, -24
; CHECK-NEXT: # kill
; CHECK-NEXT: br %r14		; CHECK-NEXT: br %r14
%1 = tail call i8 @llvm.ctlz.i8(i8 %arg, i1 true)		%1 = tail call i8 @llvm.ctlz.i8(i8 %arg, i1 true)
ret i8 %1		ret i8 %1
}		}
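
The `f4` to `f7` sequences compute a narrow count-leading-zeros from the 64-bit FLOGR result by subtracting the extra zero bits introduced by zero extension (32 and then 16 for i16, 32 and then 24 for i8). A minimal Python sketch of that arithmetic, with illustrative function names and modeling only FLOGR's count result:

```python
def flogr(x):
    """Toy model of FLOGR's count: leading zeros of a 64-bit value, 64 if the value is zero."""
    x &= (1 << 64) - 1
    return 64 - x.bit_length()

def ctlz_i16(arg):
    r0 = arg & 0xFFFF   # llghr: zero-extend 16 bits to 64
    r0 = flogr(r0)      # flogr
    r0 -= 32            # aghi -32
    return r0 - 16      # ahi / ahik -16

def ctlz_i8(arg):
    r0 = arg & 0xFF     # llgcr: zero-extend 8 bits to 64
    r0 = flogr(r0)      # flogr
    r0 -= 32            # aghi -32
    return r0 - 24      # ahi / ahik -24
```

A zero input gives 64 from the model, so the results are 16 and 8 respectively, matching the defined-at-zero variants of the intrinsic.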

test/CodeGen/SystemZ/store_nonbytesized_vecs.ll

	[69 lines not shown]

	; Truncate a <8 x i32> vector to <8 x i31> and store it (test splitting).			; Truncate a <8 x i32> vector to <8 x i31> and store it (test splitting).
	define void @fun2(<8 x i32> %src, <8 x i31>* %p)			define void @fun2(<8 x i32> %src, <8 x i31>* %p)
	; CHECK-LABEL: fun2:			; CHECK-LABEL: fun2:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: stmg %r14, %r15, 112(%r15)			; CHECK-NEXT: stmg %r14, %r15, 112(%r15)
	; CHECK-NEXT: .cfi_offset %r14, -48			; CHECK-NEXT: .cfi_offset %r14, -48
	; CHECK-NEXT: .cfi_offset %r15, -40			; CHECK-NEXT: .cfi_offset %r15, -40
	; CHECK-NEXT: vlgvf %r3, %v26, 1			; CHECK-DAG: vlgvf [[REG11:%r[0-9]+]], %v26, 1
	; CHECK-NEXT: vlgvf %r1, %v26, 2			; CHECK-DAG: vlgvf [[REG12:%r[0-9]+]], %v26, 2
	; CHECK-NEXT: risbgn %r4, %r3, 0, 129, 62			; CHECK-DAG: risbgn [[REG13:%r[0-9]+]], [[REG11]], 0, 129, 62
	; CHECK-NEXT: rosbg %r4, %r1, 2, 32, 31			; CHECK-DAG: rosbg [[REG13]], [[REG12]], 2, 32, 31
	; CHECK-DAG: vlgvf %r0, %v26, 3			; CHECK-DAG: vlgvf %r0, %v26, 3
	; CHECK-DAG: rosbg %r4, %r0, 33, 63, 0			; CHECK-DAG: rosbg [[REG13]], %r0, 33, 63, 0
	; CHECK-DAG: stc %r0, 30(%r2)			; CHECK-DAG: stc %r0, 30(%r2)
	; CHECK-DAG: srl %r0, 8			; CHECK-DAG: srlk %r1, %r0, 8
	; CHECK-DAG: vlgvf [[REG0:%r[0-9]+]], %v24, 1			; CHECK-DAG: vlgvf [[REG0:%r[0-9]+]], %v24, 1
	; CHECK-DAG: vlgvf [[REG1:%r[0-9]+]], %v24, 0			; CHECK-DAG: vlgvf [[REG1:%r[0-9]+]], %v24, 0
	; CHECK-DAG: sth %r0, 28(%r2)			; CHECK-DAG: sth %r1, 28(%r2)
	; CHECK-DAG: vlgvf [[REG2:%r[0-9]+]], %v24, 2			; CHECK-DAG: vlgvf [[REG2:%r[0-9]+]], %v24, 2
	; CHECK-DAG: risbgn [[REG3:%r[0-9]+]], [[REG0]], 0, 133, 58			; CHECK-DAG: risbgn [[REG3:%r[0-9]+]], [[REG0]], 0, 133, 58
	; CHECK-DAG: rosbg [[REG3]], [[REG2]], 6, 36, 27			; CHECK-DAG: rosbg [[REG3]], [[REG2]], 6, 36, 27
	; CHECK-DAG: sllg [[REG4:%r[0-9]+]], [[REG1]], 25			; CHECK-DAG: sllg [[REG4:%r[0-9]+]], [[REG1]], 25
	; CHECK-DAG: rosbg [[REG4]], [[REG0]], 39, 63, 58			; CHECK-DAG: rosbg [[REG4]], [[REG0]], 39, 63, 58
	; CHECK-DAG: vlgvf [[REG5:%r[0-9]+]], %v24, 3			; CHECK-DAG: vlgvf [[REG5:%r[0-9]+]], %v24, 3
	; CHECK-DAG: rosbg [[REG3]], [[REG5]], 37, 63, 60			; CHECK-DAG: rosbg [[REG3]], [[REG5]], 37, 63, 60
	; CHECK-DAG: sllg [[REG6:%r[0-9]+]], [[REG4]], 8			; CHECK-DAG: sllg [[REG6:%r[0-9]+]], [[REG4]], 8
	; CHECK-DAG: rosbg [[REG6]], [[REG3]], 56, 63, 8			; CHECK-DAG: rosbg [[REG6]], [[REG3]], 56, 63, 8
	; CHECK-NEXT: stg [[REG6]], 0(%r2)			; CHECK-DAG: stg [[REG6]], 0(%r2)
	; CHECK-NEXT: srlg [[REG7:%r[0-9]+]], %r4, 24			; CHECK-DAG: srlg [[REG7:%r[0-9]+]], [[REG13]], 24
	; CHECK-NEXT: st [[REG7]], 24(%r2)			; CHECK-DAG: st [[REG7]], 24(%r2)
	; CHECK-NEXT: vlgvf [[REG8:%r[0-9]+]], %v26, 0			; CHECK-DAG: vlgvf [[REG8:%r[0-9]+]], %v26, 0
	; CHECK-NEXT: risbgn [[REG10:%r[0-9]+]], [[REG5]], 0, 131, 60			; CHECK-DAG: risbgn [[REG10:%r[0-9]+]], [[REG5]], 0, 131, 60
	; CHECK-NEXT: rosbg [[REG10]], [[REG8]], 4, 34, 29			; CHECK-DAG: rosbg [[REG10]], [[REG8]], 4, 34, 29
	; CHECK-NEXT: sllg [[REG9:%r[0-9]+]], [[REG3]], 8			; CHECK-DAG: sllg [[REG9:%r[0-9]+]], [[REG3]], 8
	; CHECK-NEXT: rosbg [[REG10]], %r3, 35, 63, 62			; CHECK-DAG: rosbg [[REG10]], [[REG11]], 35, 63, 62
	; CHECK-NEXT: rosbg [[REG9]], [[REG10]], 56, 63, 8			; CHECK-DAG: rosbg [[REG9]], [[REG10]], 56, 63, 8
	; CHECK-NEXT: stg [[REG9]], 8(%r2)			; CHECK-DAG: stg [[REG9]], 8(%r2)
	; CHECK-NEXT: sllg %r0, [[REG10]], 8			; CHECK-DAG: sllg %r0, [[REG10]], 8
	; CHECK-NEXT: rosbg %r0, %r4, 56, 63, 8			; CHECK-DAG: rosbg %r0, [[REG13]], 56, 63, 8
	; CHECK-NEXT: stg %r0, 16(%r2)			; CHECK-NEXT: stg %r0, 16(%r2)
	; CHECK-NEXT: lmg %r14, %r15, 112(%r15)			; CHECK-NEXT: lmg %r14, %r15, 112(%r15)
	; CHECK-NEXT: br %r14			; CHECK-NEXT: br %r14
	{			{
	%tmp = trunc <8 x i32> %src to <8 x i31>			%tmp = trunc <8 x i32> %src to <8 x i31>
	store <8 x i31> %tmp, <8 x i31>* %p			store <8 x i31> %tmp, <8 x i31>* %p
	ret void			ret void
	}			}
	[23 lines not shown]

test/CodeGen/SystemZ/vec-combine-02.ll

	[402 lines not shown]
	define i32 @f9(double %scalar0, double %scalar1, double %scalar2,			define i32 @f9(double %scalar0, double %scalar1, double %scalar2,
	double %scalar3) {			double %scalar3) {
	; CHECK-LABEL: f9:			; CHECK-LABEL: f9:
	; CHECK-NOT: vperm			; CHECK-NOT: vperm
	; CHECK-NOT: vpk			; CHECK-NOT: vpk
	; CHECK-NOT: vmrh			; CHECK-NOT: vmrh
	; CHECK: ar {{%r[0-5]}},			; CHECK: ar {{%r[0-5]}},
	; CHECK: ar {{%r[0-5]}},			; CHECK: ar {{%r[0-5]}},
	; CHECK: or %r2,			; CHECK: ork %r2,
	; CHECK: br %r14			; CHECK: br %r14
	%vec0 = insertelement <2 x double> undef, double %scalar0, i32 0			%vec0 = insertelement <2 x double> undef, double %scalar0, i32 0
	%vec1 = insertelement <2 x double> undef, double %scalar1, i32 0			%vec1 = insertelement <2 x double> undef, double %scalar1, i32 0
	%vec2 = insertelement <2 x double> undef, double %scalar2, i32 0			%vec2 = insertelement <2 x double> undef, double %scalar2, i32 0
	%vec3 = insertelement <2 x double> undef, double %scalar3, i32 0			%vec3 = insertelement <2 x double> undef, double %scalar3, i32 0
	%join0 = shufflevector <2 x double> %vec0, <2 x double> %vec1,			%join0 = shufflevector <2 x double> %vec0, <2 x double> %vec1,
	<2 x i32> <i32 0, i32 2>			<2 x i32> <i32 0, i32 2>
	%join1 = shufflevector <2 x double> %vec2, <2 x double> %vec3,			%join1 = shufflevector <2 x double> %vec2, <2 x double> %vec3,
	[14 lines not shown]

Revision Contents

Diff 202861

include/llvm/CodeGen/TargetInstrInfo.h

include/llvm/CodeGen/TargetPassConfig.h

lib/CodeGen/InlineSpiller.cpp

lib/CodeGen/TargetInstrInfo.cpp

lib/CodeGen/TargetPassConfig.cpp

lib/Target/AArch64/AArch64InstrInfo.h

lib/Target/AArch64/AArch64InstrInfo.cpp

lib/Target/SystemZ/CMakeLists.txt

lib/Target/SystemZ/SystemZ.h

lib/Target/SystemZ/SystemZInstrFormats.td

lib/Target/SystemZ/SystemZInstrInfo.h

lib/Target/SystemZ/SystemZInstrInfo.cpp

lib/Target/SystemZ/SystemZInstrInfo.td

lib/Target/SystemZ/SystemZPostRewrite.cpp

lib/Target/SystemZ/SystemZRegisterInfo.cpp

lib/Target/SystemZ/SystemZShortenInst.cpp

lib/Target/SystemZ/SystemZTargetMachine.cpp

lib/Target/X86/X86InstrInfo.h

lib/Target/X86/X86InstrInfo.cpp

test/CodeGen/SystemZ/asm-18.ll

test/CodeGen/SystemZ/codegenprepare-splitstore.ll

test/CodeGen/SystemZ/ctpop-01.ll

test/CodeGen/SystemZ/int-add-05.ll

test/CodeGen/SystemZ/int-sub-11.ll

test/CodeGen/SystemZ/scalar-ctlz.ll

test/CodeGen/SystemZ/store_nonbytesized_vecs.ll

test/CodeGen/SystemZ/vec-combine-02.ll
