This is an archive of the discontinued LLVM Phabricator instance.

Differential D103865

[SystemZ] Generate XC loop for memset 0 of variable length.
Closed, Public

Authored by jonpa on Jun 7 2021, 5:45 PM.


Details

Reviewers

uweigand

Commits

rG37a92f3b03bf: [SystemZ] Generate XC loop for memset 0 of variable length.

Summary

Tried first inserting new MBBs for the EXRL and target instruction, but that has some problems:

Need a guarantee that the target instruction is not moved by later passes. Setting the hasSideEffects flag and surrounding the target instruction with MemBarriers should do the trick, but that is a bit clumsy. Adding hasSideEffects to all MemorySS instructions would also cause many files to change, since it causes a new scheduler region to be created.
The MBB of the target instruction should be reachable, or I think it might be simply removed by some optimizer. In order to maintain CFG edges between the EXRL MBB and TargetIns MBB, analyzeBranch() needs to detect these blocks and return them as unanalyzable, since EXRL is not really a branch or terminator. This is also a bit of extra work...

I instead tried using an EXRL_Pseudo that is kept intact all the way to AsmPrinter. My newbie approach of creating symbols and MCInsts and then emitting them at the end of the function seems to work fine both for assembly and obj streaming. Not sure if there is another more correct way...

This adds a new method emitMemMemLoopVarLen(), which shares code with emitMemMemWrapper() via a new class MemMemBuilder (since there is no public default constructor for MachineOperand, the DestBase and SrcBase must be passed to its constructor rather than via a single MI argument).

The target instruction will never be out-of-range for the EXRL instruction with a 32 bit signed range, right?

Pardon the typo-fixes mixed in here (MMB -> MBB). Maybe pre-commit them?

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jonpa created this revision. Jun 7 2021, 5:45 PM

Herald added a subscriber: hiraditya. Jun 7 2021, 5:45 PM

jonpa requested review of this revision. Jun 7 2021, 5:45 PM

Herald added a project: Restricted Project. Jun 7 2021, 5:45 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B108113: Diff 350463. Jun 7 2021, 6:31 PM

Something is wrong somewhere here: some benchmarks are seg-faulting. Investigating.

I think the problem was that EXRL cannot use %r0, as that means no OR-ing will take place.
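
The %r0 restriction can be illustrated with a small model of how EXRL modifies its target. This is a simplified sketch, not code from the patch: the function name and parameters are hypothetical, and only the length byte (bits 8-15) of the target instruction is modeled. In the z/Architecture EXECUTE instructions, an R1 field of zero means that nothing is ORed into the target, rather than "use the contents of %r0".

```cpp
#include <cstdint>

// Simplified model of EXECUTE RELATIVE LONG (EXRL), for illustration only.
// r1Field:        register number encoded in the EXRL instruction (0-15).
// r1Value:        current contents of that register.
// targetLenField: bits 8-15 of the target instruction (the L field of XC).
// Returns the length field that the target is actually executed with.
inline uint8_t executedLengthField(unsigned r1Field, uint64_t r1Value,
                                   uint8_t targetLenField) {
  // An R1 field of zero means "no modification": the register is not read.
  if (r1Field == 0)
    return targetLenField;
  // Otherwise the low 8 bits of the register are ORed into bits 8-15.
  return static_cast<uint8_t>(targetLenField | (r1Value & 0xFF));
}
```

With a target length field of 0 and length-1 = 41 held in %r2, the target runs with a length field of 41. If the register allocator picks %r0 instead, the length field stays 0, which would be consistent with the crashes described above.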

Harbormaster completed remote builds in B108983: Diff 351673. Jun 12 2021, 10:55 AM

Benchmarking:

Z14:

reduced:

Improvements:
0.984: f511.povray_r 
0.988: f510.parest_r 
0.992: i523.xalancbmk_r 
0.993: i531.deepsjeng_r 
0.993: f508.namd_r 
...

Regressions:
1.027: i557.xz_r 
1.006: f507.cactuBSSN_r 
...

full (4):

Improvements:
0.990: f511.povray_r 
0.991: i557.xz_r 
0.991: f510.parest_r 
0.998: i523.xalancbmk_r

Z15:

reduced:

Improvements:
0.823: i531.deepsjeng_r 
0.985: f538.imagick_r 
0.990: i500.perlbench_r 
0.993: f510.parest_r 

Regressions:
(none)

full (4):

Improvements:
0.994: i531.deepsjeng_r 
0.995: f510.parest_r 
0.997: f538.imagick_r 

Regressions:
1.002: i500.perlbench_r

This looks pretty good to me. One thing that is odd is the extra always-zero parameter to the LoopVarLen insns. This really should be removed.

As a suggestion that might remove a bit of the duplication between emitMemMemWrapper and emitMemMemLoopVarLen: the (old) routine is currently called for both the Sequence and Loop insns, and it distinguishes between them based on the operand count. However, for the LoopVarLen case, your patch now introduces a separate new routine. I think it might be simpler to continue to use the same routine, and just internally distinguish between the cases based on whether the length operand is an immediate or a register.

Finally, I'm wondering if it wouldn't make sense to extend support for the variable-length case to the other block operations, now that we already have the EXRL_Pseudo infrastructure. (I guess that could also be done in a separate patch.)

Pardon the typo-fixes mixed in here (MMB -> MBB). Maybe pre-commit them?

Yes, please pre-commit those.

llvm/lib/Target/SystemZ/SystemZAsmPrinter.h
14	The above clang-format check looks valid (includes should be kept sorted).
29	This as well.

One thing that is odd is the extra always-zero parameter to the LoopVarLen insns. This really should be removed.

Done - it was easy to remove on the MachineInstr, but on the SystemZISD node it seems reasonable to keep a dummy zero there and reuse the existing opcode, or?

As a suggestion that might remove a bit of the duplication between emitMemMemWrapper and emitMemMemLoopVarLen: the (old) routine is currently called for both the Sequence and Loop insns, and it distinguishes between them based on the operand count. However, for the LoopVarLen case, your patch now introduces a separate new routine. I think it might be simpler to continue to use the same routine, and just internally distinguish between the cases based on whether the length operand is an immediate or a register.

OK - I merged the two functions instead.

Finally, I'm wondering if it wouldn't make sense to extend support for the variable-length case to the other block operations, now that we already have the EXRL_Pseudo infrastructure. (I guess that could also be done in a separate patch.)

I would like to do that afterwards then with some more benchmarking as well...

Pardon the typo-fixes mixed in here (MMB -> MBB). Maybe pre-commit them?

Yes, please pre-commit those.

b2cd98d

llvm/lib/Target/SystemZ/SystemZAsmPrinter.h
The above clang-format check looks valid (includes should be kept sorted).
This as well.

done

Also added handling/test for i32 case (test/CodeGen/Generic/ForceStackAlign.ll was actually failing) - if the type of length is <64 bits, it needs zero-extension.

On spec, I see now 8869 exrl instructions, each with its own xc target instruction. I wonder if we should try to avoid duplicating target instructions. I see in some files hundreds of identical xc targets (worst case is 845 identical ones in one file: f510.parest_r/build/vectors.s !). 67 files have more than 20 identical xc targets...

Harbormaster completed remote builds in B110288: Diff 353483. Jun 21 2021, 3:08 PM

In D103865#2831600, @jonpa wrote:

One thing that is odd is the extra always-zero parameter to the LoopVarLen insns. This really should be removed.

Done - it was easy to remove on the MachineInstr, but on the SystemZISD node it seems reasonable to keep a dummy zero there and reuse the existing opcode, or?

Aren't there already two opcodes, one with 4 operands and one with 5? Can't you just use the one with 4 operands instead?

That said, maybe it actually is better to use the version with 5 operands, and actually pass in two values: the length (minus one), and the loop trip count (length / 256). These two values can easily be computed at the SelectionDAG level already, and if they are, that might open opportunities to optimize / reuse the values at that level. (Also, then it is more obvious to re-use the same opcode, since the operands do in fact have the same semantics then, it's just that in one case they're immediates and in the other they are registers.)

As a suggestion that might remove a bit of the duplication between emitMemMemWrapper and emitMemMemLoopVarLen: the (old) routine is currently called for both the Sequence and Loop insns, and it distinguishes between them based on the operand count. However, for the LoopVarLen case, your patch now introduces a separate new routine. I think it might be simpler to continue to use the same routine, and just internally distinguish between the cases based on whether the length operand is an immediate or a register.

OK - I merged the two functions instead.

Hmm, that's not quite what I had in mind, sorry. This code still has two completely separate code paths, just now in one function. What I had been thinking of is to actually share the same code path that e.g. emits the loop -- they should be identical except that the trip count is an immediate vs. a register.

The point is to merge the two paths to the extent where there is no longer any point in extracting buildMemMemLoop as a subroutine as it actually is used only once.

On spec, I see now 8869 exrl instructions, each with its own xc target instruction. I wonder if we should try to avoid duplicating target instructions. I see in some files hundreds of identical xc targets (worst case is 845 identical ones in one file: f510.parest_r/build/vectors.s !). 67 files have more than 20 identical xc targets...

That makes sense. It should be straightforward to sort and de-duplicate the target instructions at final emission time.

Aren't there already two opcodes, one with 4 operands and one with 5? Can't you just use the one with 4 operands instead?

Ah, yes, I guess I was looking at the names: 'Length' vs 'Loop'...

That said, maybe it actually is better to use the version with 5 operands, and actually pass in two values: the length (minus one), and the loop trip count (length / 256). These two values can easily be computed at the SelectionDAG level already, and if they are, that might open opportunities to optimize / reuse the values at that level. (Also, then it is more obvious to re-use the same opcode, since the operands do in fact have the same semantics then, it's just that in one case they're immediates and in the other they are registers.)

I updated the patch to create the length-1 and trip count on the DAG instead.

I see that this means that they are initially placed before the zero-length check, while before they were put in the preheader of the loop. The SRLG is actually moved to the preheader by MachineSink later, and this could probably be done for the AGHI as well if the comparison would be made with the AGHI source against 0.

To keep things simple, I just changed the comparison to be against -1 instead for now. What do you think about that - is the zero-length case rare enough to ignore where the AGHI is placed? We could do the extra work of finding the AGHI/-1 and comparing with 0 of that source reg instead, and expect the AGHI will be sunk later if there are no other users I would hope.
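
The split into length-minus-one and trip count, together with the zero-length check against -1, can be modeled in plain C++. This is a hedged sketch rather than the patch's code: `LoopOperands`, `splitLength` and `clearBytes` are hypothetical names, and computing the trip count as (length - 1) >> 8 is an assumption of the sketch (the review text only says "length / 256").

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the two values computed for the variable-length case.
struct LoopOperands {
  uint64_t lenMinus1; // models the AGHI result (length + -1)
  uint64_t tripCount; // models the SRLG result (assumed: lenMinus1 >> 8)
};

inline LoopOperands splitLength(uint64_t length) {
  uint64_t lenMinus1 = length - 1; // wraps to all-ones (-1) when length == 0
  return {lenMinus1, lenMinus1 >> 8};
}

// Clears `length` bytes at the start of `buf` the way the loop plus the
// EXRL remainder would, and returns how many bytes were cleared.
inline size_t clearBytes(std::vector<uint8_t> &buf, uint64_t length) {
  assert(length <= buf.size() && "buffer too small");
  LoopOperands ops = splitLength(length);
  // Zero-length check, done as a comparison of length-1 against -1.
  if (ops.lenMinus1 == UINT64_MAX)
    return 0;
  size_t pos = 0;
  // Loop body: one full 256-byte block per iteration.
  for (uint64_t i = 0; i < ops.tripCount; ++i)
    for (size_t j = 0; j < 256; ++j)
      buf[pos++] = 0;
  // Remainder: the low 8 bits of length-1 act as the L field, which
  // covers (L + 1) bytes.
  size_t rest = static_cast<size_t>(ops.lenMinus1 & 0xFF) + 1;
  for (size_t j = 0; j < rest; ++j)
    buf[pos++] = 0;
  return pos;
}
```

Under that assumption the two parts always add up to the requested length: 256 * (lenMinus1 >> 8) + (lenMinus1 & 0xFF) + 1 equals lenMinus1 + 1.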

Hmm, that's not quite what I had in mind, sorry. This code still has two completely separate code paths, just now in one function. What I had been thinking of is to actually share the same code path that e.g. emits the loop -- they should be identical except that the trip count is an immediate vs. a register.

The point is to merge the two paths to the extent where there is no longer any point in extracting buildMemMemLoop as a subroutine as it actually is used only once.

I gave it another try: First let the reg/immediate cases set up the needed MBBs and then build the loop afterwards for both cases in one place. I guess that seems more simple than having a separate class for creating the loop...

That makes sense. It should be straightforward to sort and de-duplicate the target instructions at final emission time.

AsmPrinter::EmitToStreamer() calls getSubtargetInfo() which needs an MF, so it did not seem quite simple to emit all the EXRL targets in emitEndOfAsmFile(). That method seems to be intended for other things than emitting instructions. Does it seem right to try to create a new section at that point after all else and emit the target instructions?

For now, I tried just improving the output per function. This saves 4110 XC target instructions (prior count was ~9000). Worst case is now 168 in one file, and 38 files have >20, so it is an improvement, but still some room for improvement if reusing per file...

Harbormaster completed remote builds in B110734: Diff 354114. Jun 23 2021, 5:50 PM

In D103865#2837530, @jonpa wrote:

I updated the patch to create the length-1 and trip count on the DAG instead.

Thanks!

I see that this means that they are initially placed before the zero-length check, while before they were put in the preheader of the loop. The SRLG is actually moved to the preheader by MachineSink later, and this could probably be done for the AGHI as well if the comparison would be made with the AGHI source against 0.

To keep things simple, I just changed the comparison to be against -1 instead for now. What do you think about that - is the zero-length case rare enough to ignore where the AGHI is placed? We could do the extra work of finding the AGHI/-1 and comparing with 0 of that source reg instead, and expect the AGHI will be sunk later if there are no other users I would hope.

I think this is probably OK. GCC does the compare with 0, but there's probably not much difference. It's a pity we cannot move that check up to SystemZSelectionDAGInfo::EmitTargetCodeForMemset, but I believe we cannot create new basic blocks on the DAG level.

Hmm, that's not quite what I had in mind, sorry. This code still has two completely separate code paths, just now in one function. What I had been thinking of is to actually share the same code path that e.g. emits the loop -- they should be identical except that the trip count is an immediate vs. a register.

The point is to merge the two paths to the extent where there is no longer any point in extracting buildMemMemLoop as a subroutine as it actually is used only once.

I gave it another try: First let the reg/immediate cases set up the needed MBBs and then build the loop afterwards for both cases in one place. I guess that seems more simple than having a separate class for creating the loop...

That's better, thanks! I think readability might be even better if you continue to create the MBBs in order, i.e. emit the EXRL at the end instead of at the beginning. (That means a duplicated LengthMO.isReg() test, but that should still be better.)

That makes sense. It should be straightforward to sort and de-duplicate the target instructions at final emission time.

AsmPrinter::EmitToStreamer() calls getSubtargetInfo() which needs an MF, so it did not seem quite simple to emit all the EXRL targets in emitEndOfAsmFile(). That method seems to be intended for other things than emitting instructions. Does it seem right to try to create a new section at that point after all else and emit the target instructions?

Ah, that's unfortunate, but I guess understandable. The assumption is that code emission needs subtarget-specific info, and the subtarget is defined by attributes (like target CPU) that change between functions. At the end of the asm file, we're outside of all functions ...

There are ways around that. We do have a TargetMachine in the AsmPrinter, and there used to be a function getSubtargetImpl() that returns a "default" subtarget for the compilation unit. This has been deprecated (for the above reasons), but could probably be re-activated, and then be used as the subtarget to pass to emitInstruction during emitEndOfAsmFile.

Another option would be to simply "assemble" those instructions by hand, i.e. compute the 6-byte integer value from the constituent fields. Something like:

Opcode =  TargetInsOpc << 40 | LenMinus1Reg << 32 | DestReg << 28 | DestDisp << 16 | SrcReg << 12 | SrcDisp;

and then simply emit that value as 6 bytes of "data" (but still into the .text section, of course). This is also somewhat ugly, but maybe less ugly than the default Subtarget ...
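
As a worked illustration of the hand-assembly idea, the following sketch packs the fields of a 6-byte SS-format instruction using the shifts from the expression above and splits the result into three halfwords. The function names are hypothetical, the register arguments are raw 4-bit encoding numbers (not LLVM `Register` values), and the length field is passed as an already encoded 8-bit value.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Packs the fields of a 6-byte SS-format instruction (such as XC) into the
// low 48 bits of an integer, using the field positions quoted above.
inline uint64_t encodeSS(uint8_t opcode, uint8_t lenField, unsigned destBase,
                         unsigned destDisp, unsigned srcBase,
                         unsigned srcDisp) {
  assert(destBase < 16 && srcBase < 16 && "base registers are 4-bit fields");
  assert(destDisp < 4096 && srcDisp < 4096 && "displacements are 12-bit");
  return (uint64_t(opcode) << 40) | (uint64_t(lenField) << 32) |
         (uint64_t(destBase) << 28) | (uint64_t(destDisp) << 16) |
         (uint64_t(srcBase) << 12) | uint64_t(srcDisp);
}

// Splits the 48-bit encoding into three halfwords, most significant first,
// which is the shape in which it could be emitted as data.
inline std::array<uint16_t, 3> toHalfwords(uint64_t insn) {
  return {static_cast<uint16_t>(insn >> 32),
          static_cast<uint16_t>(insn >> 16),
          static_cast<uint16_t>(insn)};
}
```

For example, with the XC opcode 0xD7, a zero length field, and base register 1 with displacement 0 for both operands, `encodeSS(0xD7, 0, 1, 0, 1, 0)` yields 0xD70010001000, i.e. the halfwords 0xD700, 0x1000, 0x1000.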

llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp
568	Ah, if you do it that way, you don't actually have to have multiple labels for the same instruction. Just compute the instruction (or just its opcode, see discussion in the main comment) first, and look it up in the table. If found, simply return its (one) associated symbol; if not, create a new symbol and add the pair of symbol, instruction to the table. Actually, that table can then just be a std::map (from instruction to symbol) instead of a vector, this implements the lookup in a more efficient way ...
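
The lookup-or-create scheme suggested in this comment can be sketched with standard containers. This is a simplified stand-in, not the patch's code: `TargetInsn`, `TargetTable` and the generated label names are hypothetical, and plain strings take the place of `MCSymbol` pointers.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// Hypothetical stand-in for the operands of an EXRL target instruction.
struct TargetInsn {
  unsigned opcode;
  unsigned destBase;
  int64_t destDisp;
  unsigned srcBase;
  int64_t srcDisp;
  // Lexicographic ordering over all fields, so the struct can be a map key.
  bool operator<(const TargetInsn &o) const {
    return std::tie(opcode, destBase, destDisp, srcBase, srcDisp) <
           std::tie(o.opcode, o.destBase, o.destDisp, o.srcBase, o.srcDisp);
  }
};

// Maps each distinct target instruction to a single label name.
class TargetTable {
  std::map<TargetInsn, std::string> labels;

public:
  // Returns the existing label for `insn`, or creates a fresh one.
  const std::string &getOrCreateLabel(const TargetInsn &insn) {
    auto it = labels.find(insn);
    if (it == labels.end())
      it = labels.emplace(insn, ".Ltmp" + std::to_string(labels.size())).first;
    return it->second;
  }
  std::size_t size() const { return labels.size(); }
};
```

Every EXRL whose target has the same opcode and operands then receives the same label, so only one copy of each distinct target needs to be emitted.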

I think readability might be even better if you continue to create the MBBs in order, i.e. emit the EXRL at the end instead of at the beginning. (That means a duplicated LengthMO.isReg() test, but that should still be better.)

Makes sense... fixed.

Ah, if you do it that way, you don't actually have to have multiple labels for the same instruction. Just compute the instruction (or just its opcode, see discussion in the main comment) first, and look it up in the table. If found, simply return its (one) associated symbol; if not, create a new symbol and add the pair of symbol, instruction to the table.

Actually, that table can then just be a std::map (from instruction to symbol) instead of a vector, this implements the lookup in a more efficient way ...

Ah, of course - only one label is emitted now. The targets thereby get sorted by their operands via the comparator of the std::map object; one small detail is that the labels are now emitted in arbitrary (non-sorted) order - but I suppose that's OK, right?

There are ways around that. We do have a TargetMachine in the AsmPrinter, and there used to be a function getSubtargetImpl() that returns a "default" subtarget for the compilation unit. This has been deprecated (for the above reasons), but could probably be re-activated, and then be used as the subtarget to pass to emitInstruction during emitEndOfAsmFile.

Another option would be to simply "assemble" those instructions by hand, i.e. compute the 6-byte integer value from the constituent fields.

I started out with trying to "assemble" by hand per your suggestion, but it seemed like quite some work, especially considering that we also have to be able to print it as text. So I tried instead with adding back the GenericSubtarget, which was simple enough.

This then eliminates another 3786 XC targets: there are no duplicates in any file, and at most I see 13 different XC targets in a single file :-)

SPEC master <> patch

cgije          :               115009               132379   +17370
xc             :                13334                23198    +9864
exrl           :                    0                 8873    +8873
brctg          :                 8356                17228    +8872
srlg           :                54407                63229    +8822
brasl          :               640463               631658    -8805
...

Harbormaster completed remote builds in B111647: Diff 355400. Jun 29 2021, 5:39 PM

In D103865#2848831, @jonpa wrote:

Ah, of course - only one label is emitted now. The targets thereby get sorted by their operands via the comparator of the std::map object; one small detail is that the labels are now emitted in arbitrary (non-sorted) order - but I suppose that's OK, right?

That shouldn't matter.

I started out with trying to "assemble" by hand per your suggestion, but it seemed like quite some work, especially considering that we also have to be able to print it as text.

I was thinking of just using emitInt16 for the three words, that will show as .word in the textual assembler output, which may not be pretty but is certainly correct. (You could in addition use an emitRawComment to print a textual representation to make the file more readable).

So I tried instead with adding back the GenericSubtarget, which was simple enough.

See the comments inline on that option.

In the meantime I thought of yet another option: the instructions used via EXRL are really part of the function containing the EXRL, and should therefore be emitted using the same Subtarget that is in effect for that function (this is not *currently* a problem, but it might be a potential issue if we want to use an EXRL target instruction that is only available in some ISA levels).

To correctly model this, one option might be to not have a single EXRLT2SymMap, but one per Subtarget, and sort the target instructions into the appropriate subtarget for the current function as they're added to the map. (Or, even simpler, just make the Subtarget part of the "key" for the map.)

In the common case where all functions use the same Subtarget, this would actually lead to the same result as now.

This then eliminates another 3786 XC targets: there are no duplicates in any file, and at most I see 13 different XC targets in a single file :-)

Excellent!

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
7895	All these changes seem unnecessary now.
llvm/lib/Target/SystemZ/SystemZSelectionDAGInfo.cpp
142	The above suggestion actually makes sense, we don't need an else here.
llvm/lib/Target/SystemZ/SystemZTargetMachine.h
48 ↗	(On Diff #355400)	So your new "getGenericSubtarget" does exactly the same that the no-argument getSubtargetImpl used to do. This is a bit odd, in particular given the comment here. If we do want to go that route, I think we should actually use the getSubtargetImpl name, but change the comment to explain what the "default" subtarget is, and what we use it for.

Updated per review.

I was thinking of just using emitInt16 for the three words, that will show as .word in the textual assembler output, which may not be pretty but is certainly correct. (You could in addition use an emitRawComment to print a textual representation to make the file more readable).

Ah, ok, I did not realize at the time I could just print the instruction as numbers... I tried this instead now, since we don't really want to have a generic subtarget around. Is there any reason to print 3 x Int16? Would it be better to print it as one (or three) hex if possible?

In the meantime I thought of yet another option: the instructions used via EXRL are really part of the function containing the EXRL, and should therefore be emitted using the same Subtarget that is in effect for that function (this is not *currently* a problem, but it might be a potential issue if we want to use an EXRL target instruction that is only available in some ISA levels).

This seems like the logical solution, but it is a bit unfortunate to have to go through all of that trouble just for this case where it is actually not really needed at all.

Not sure if the best thing is to do it the manual way, or to have an extra subtarget map on the side in the SystemZAsmPrinter...

I wonder if perhaps the MC layer could have a very basic emitInstruction() / assembleInstruction() method without the STI argument that would work in cases like this... It would be nice to reuse the bit encodings already available... On the other hand, if we only have EXRL targets of this one format, maybe it's OK to do it on the side...

Harbormaster completed remote builds in B111835: Diff 355670. Jun 30 2021, 2:18 PM

In D103865#2851145, @jonpa wrote:

Updated per review.

I was thinking of just using emitInt16 for the three words, that will show as .word in the textual assembler output, which may not be pretty but is certainly correct. (You could in addition use an emitRawComment to print a textual representation to make the file more readable).

Ah, ok, I did not realize at the time I could just print the instruction as numbers... I tried this instead now, since we don't really want to have a generic subtarget around. Is there any reason to print 3 x Int16? Would it be better to print it as one (or three) hex if possible?

Well, there is no emitInt48 ... You could use one emitInt16 and one emitInt32, but that's not really better I think.

I must admit that the manual encoding code is longer and uglier than I had initially thought ...

In the meantime I thought of yet another option: the instructions used via EXRL are really part of the function containing the EXRL, and should therefore be emitted using the same Subtarget that is in effect for that function (this is not *currently* a problem, but it might be a potential issue if we want to use an EXRL target instruction that is only available in some ISA levels).

This seems like the logical solution, but it is a bit unfortunate to have to go through all of that trouble just for this case where it is actually not really needed at all.

I think it wouldn't be all that much effort. You should simply change EXRLT2SymMap to be a map from a pair of (const SystemZSubtarget *, MCInst) to MCSymbol *, add the current subtarget pointer to the key when processing SystemZ::EXRL_Pseudo in the AsmPrinter, and in emitEXRLTargetInstructions just iterate over the map (as now) -- you'll get the pairs of (const SystemZSubtarget *, MCInst) back as key, and simply emit that instruction using that subtarget.

Otherwise, this is now looking good to me.

Well, there is no emitInt48 ...

(I saw emitIntValueInHex(uint64_t Value, unsigned Size), but haven't tried it...)

I must admit that the manual encoding code is longer and uglier than I had initially thought ...
I think it wouldn't be all that much effort. You should simply change EXRLT2SymMap to be a map from a pair of (const SystemZSubtarget *, MCInst) to MCSymbol *, add the current subtarget pointer to the key when processing SystemZ::EXRL_Pseudo in the AsmPrinter, and in emitEXRLTargetInstructions just iterate over the map (as now) -- you'll get the pairs of (const SystemZSubtarget *, MCInst) back as key, and simply emit that instruction using that subtarget.

OK, I tried this now. It in fact gives the identical number of target instructions as before, so using different Subtargets does not seem to play a role on SPEC...

Harbormaster completed remote builds in B112435: Diff 356488. Jul 5 2021, 6:28 AM

See the minor comment inside. Otherwise, this LGTM now. Thanks!

llvm/lib/Target/SystemZ/SystemZAsmPrinter.h
35	This comparison might involve undefined behavior given that these aren't pointers into the same array. Probably best to convert to uintptr_t before comparing.
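
The suggestion to compare through uintptr_t can be sketched as follows. `Subtarget` is a hypothetical stand-in type and both helper names are made up for illustration. The second helper shows an alternative that is not discussed in the review: `std::less` specialized for pointers is specified to give a strict total order even for pointers into unrelated objects.

```cpp
#include <cstdint>
#include <functional>

struct Subtarget {}; // hypothetical stand-in for a subtarget object

// Orders two unrelated pointers by their integer representation instead of
// applying the built-in `<` to the pointers themselves.
inline bool lessByAddress(const Subtarget *a, const Subtarget *b) {
  return reinterpret_cast<uintptr_t>(a) < reinterpret_cast<uintptr_t>(b);
}

// Alternative: std::less on pointers yields a strict total order, even if
// the pointers do not point into the same array.
inline bool lessByStdLess(const Subtarget *a, const Subtarget *b) {
  return std::less<const Subtarget *>()(a, b);
}
```

Either helper can serve as the pointer part of a map comparator; the resulting order is consistent within one run but has no meaning beyond that.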

This revision is now accepted and ready to land. Jul 6 2021, 4:52 AM

This revision was landed with ongoing or failed builds. Jul 6 2021, 9:07 AM

Closed by commit rG37a92f3b03bf: [SystemZ] Generate XC loop for memset 0 of variable length. (authored by jonpa).

This revision was automatically updated to reflect the committed changes.

jonpa marked an inline comment as done.

jonpa added a commit: rG37a92f3b03bf: [SystemZ] Generate XC loop for memset 0 of variable length..

@jonpa @uweigand
It seems that this change causes clang to crash on s390x on Ubuntu bionic:
https://bugs.llvm.org/show_bug.cgi?id=51026

Revision Contents

Path                                                          Size
llvm/lib/Target/SystemZ/SystemZAsmPrinter.h                   31 lines
llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp                 38 lines
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp               136 lines
llvm/lib/Target/SystemZ/SystemZInstrFormats.td                5 lines
llvm/lib/Target/SystemZ/SystemZInstrInfo.td                   8 lines
llvm/lib/Target/SystemZ/SystemZSelectionDAGInfo.cpp           16 lines
llvm/test/CodeGen/SystemZ/memset-05.ll                        101 lines

Diff 356749

llvm/lib/Target/SystemZ/SystemZAsmPrinter.h

 //===-- SystemZAsmPrinter.h - SystemZ LLVM assembly printer ----*- C++ -*--===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//

 #ifndef LLVM_LIB_TARGET_SYSTEMZ_SYSTEMZASMPRINTER_H
 #define LLVM_LIB_TARGET_SYSTEMZ_SYSTEMZASMPRINTER_H

-#include "SystemZTargetMachine.h"
 #include "SystemZMCInstLower.h"
+#include "SystemZTargetMachine.h"
 #include "llvm/CodeGen/AsmPrinter.h"
 #include "llvm/CodeGen/StackMaps.h"
+#include "llvm/MC/MCInstBuilder.h"
 #include "llvm/Support/Compiler.h"

 namespace llvm {
 class MCStreamer;
 class MachineBasicBlock;
 class MachineInstr;
 class Module;
 class raw_ostream;

 class LLVM_LIBRARY_VISIBILITY SystemZAsmPrinter : public AsmPrinter {
 private:
   StackMaps SM;

+  typedef std::pair<MCInst, const MCSubtargetInfo *> MCInstSTIPair;
+  struct CmpMCInst {
+    bool operator()(const MCInstSTIPair &MCI_STI_A,
+                    const MCInstSTIPair &MCI_STI_B) const {
+      if (MCI_STI_A.second != MCI_STI_B.second)
+        return uintptr_t(MCI_STI_A.second) < uintptr_t(MCI_STI_B.second);
+      const MCInst &A = MCI_STI_A.first;
+      const MCInst &B = MCI_STI_B.first;
+      assert(A.getNumOperands() == B.getNumOperands() &&
+             A.getNumOperands() == 5 && A.getOperand(2).getImm() == 1 &&
+             B.getOperand(2).getImm() == 1 && "Unexpected EXRL target MCInst");
+      if (A.getOpcode() != B.getOpcode())
+        return A.getOpcode() < B.getOpcode();
+      if (A.getOperand(0).getReg() != B.getOperand(0).getReg())
+        return A.getOperand(0).getReg() < B.getOperand(0).getReg();
+      if (A.getOperand(1).getImm() != B.getOperand(1).getImm())
+        return A.getOperand(1).getImm() < B.getOperand(1).getImm();
+      if (A.getOperand(3).getReg() != B.getOperand(3).getReg())
+        return A.getOperand(3).getReg() < B.getOperand(3).getReg();
+      if (A.getOperand(4).getImm() != B.getOperand(4).getImm())
+        return A.getOperand(4).getImm() < B.getOperand(4).getImm();
+      return false;
+    }
+  };
+  typedef std::map<MCInstSTIPair, MCSymbol *, CmpMCInst> EXRLT2SymMap;
+  EXRLT2SymMap EXRLTargets2Sym;
+
 public:
   SystemZAsmPrinter(TargetMachine &TM, std::unique_ptr<MCStreamer> Streamer)
       : AsmPrinter(TM, std::move(Streamer)), SM(*this) {}

   // Override AsmPrinter.
   StringRef getPassName() const override { return "SystemZ Assembly Printer"; }
   void emitInstruction(const MachineInstr *MI) override;
   void emitMachineConstantPoolValue(MachineConstantPoolValue *MCPV) override;
   void emitEndOfAsmFile(Module &M) override;
   bool PrintAsmOperand(const MachineInstr *MI, unsigned OpNo,
                        const char *ExtraCode, raw_ostream &OS) override;
   bool PrintAsmMemoryOperand(const MachineInstr *MI, unsigned OpNo,
                              const char *ExtraCode, raw_ostream &OS) override;

   bool doInitialization(Module &M) override {
     SM.reset();
     return AsmPrinter::doInitialization(M);
   }

 private:
   void LowerFENTRY_CALL(const MachineInstr &MI, SystemZMCInstLower &MCIL);
   void LowerSTACKMAP(const MachineInstr &MI);
   void LowerPATCHPOINT(const MachineInstr &MI, SystemZMCInstLower &Lower);
+  void emitEXRLTargetInstructions();
 };
 } // end namespace llvm

 #endif

llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp

[...]	#undef LOWER_HIGH
case TargetOpcode::STACKMAP:		case TargetOpcode::STACKMAP:
LowerSTACKMAP(*MI);		LowerSTACKMAP(*MI);
return;		return;

case TargetOpcode::PATCHPOINT:		case TargetOpcode::PATCHPOINT:
LowerPATCHPOINT(*MI, Lower);		LowerPATCHPOINT(*MI, Lower);
return;		return;

		case SystemZ::EXRL_Pseudo: {
		unsigned TargetInsOpc = MI->getOperand(0).getImm();
		Register LenMinus1Reg = MI->getOperand(1).getReg();
		Register DestReg = MI->getOperand(2).getReg();
		int64_t DestDisp = MI->getOperand(3).getImm();
		Register SrcReg = MI->getOperand(4).getReg();
		int64_t SrcDisp = MI->getOperand(5).getImm();

		MCSymbol *DotSym = nullptr;
		MCInst ET = MCInstBuilder(TargetInsOpc).addReg(DestReg)
		.addImm(DestDisp).addImm(1).addReg(SrcReg).addImm(SrcDisp);
		MCInstSTIPair ET_STI(ET, &MF->getSubtarget());
		EXRLT2SymMap::iterator I = EXRLTargets2Sym.find(ET_STI);
		if (I != EXRLTargets2Sym.end())
		DotSym = I->second;
		else
		EXRLTargets2Sym[ET_STI] = DotSym = OutContext.createTempSymbol();
		const MCSymbolRefExpr *Dot = MCSymbolRefExpr::create(DotSym, OutContext);
		EmitToStreamer(
		*OutStreamer,
		MCInstBuilder(SystemZ::EXRL).addReg(LenMinus1Reg).addExpr(Dot));
		return;
		}

default:		default:
		uweigand (inline comment, marked Done): Ah, if you do it that way, you don't actually have to have multiple labels for the same instruction. Just compute the instruction (or just its opcode, see discussion in the main comment) first, and look it up in the table. If found, simply return its (one) associated symbol; if not, create a new symbol and add the (symbol, instruction) pair to the table. Actually, that table can then just be a std::map (from instruction to symbol) instead of a vector, which implements the lookup in a more efficient way ...
Lower.lower(MI, LoweredMI);		Lower.lower(MI, LoweredMI);
break;		break;
}		}
EmitToStreamer(*OutStreamer, LoweredMI);		EmitToStreamer(*OutStreamer, LoweredMI);
}		}

// Emit the largest nop instruction smaller than or equal to NumBytes		// Emit the largest nop instruction smaller than or equal to NumBytes
// bytes. Return the size of nop emitted.		// bytes. Return the size of nop emitted.
[...]	assert(NumBytes >= EncodedBytes &&
"Patchpoint can't request size less than the length of a call.");		"Patchpoint can't request size less than the length of a call.");
assert((NumBytes - EncodedBytes) % 2 == 0 &&		assert((NumBytes - EncodedBytes) % 2 == 0 &&
"Invalid number of NOP bytes requested!");		"Invalid number of NOP bytes requested!");
while (EncodedBytes < NumBytes)		while (EncodedBytes < NumBytes)
EncodedBytes += EmitNop(OutContext, *OutStreamer, NumBytes - EncodedBytes,		EncodedBytes += EmitNop(OutContext, *OutStreamer, NumBytes - EncodedBytes,
getSubtargetInfo());		getSubtargetInfo());
}		}

		void SystemZAsmPrinter::emitEXRLTargetInstructions() {
		if (EXRLTargets2Sym.empty())
		return;
		// Switch to the .text section.
		OutStreamer->SwitchSection(getObjFileLowering().getTextSection());
		for (auto &I : EXRLTargets2Sym) {
		OutStreamer->emitLabel(I.second);
		const MCInstSTIPair &MCI_STI = I.first;
		OutStreamer->emitInstruction(MCI_STI.first, *MCI_STI.second);
		}
		EXRLTargets2Sym.clear();
		}
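
The lookup-or-create scheme used for EXRL_Pseudo, together with the deferred emission in emitEXRLTargetInstructions(), can be sketched in isolation roughly as follows. This is only a minimal stand-alone illustration: TargetInst, CmpTargetInst, TargetTable, getOrCreateSymbol() and flush() are hypothetical names that do not appear in the patch, and plain strings stand in for MCSymbol pointers.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for the operands that distinguish one EXRL target
// instruction from another (opcode, base registers, displacements).
struct TargetInst {
  unsigned Opcode;
  unsigned DestReg;
  long DestDisp;
  unsigned SrcReg;
  long SrcDisp;
};

// Strict weak ordering over all fields, in the spirit of CmpMCInst.
struct CmpTargetInst {
  bool operator()(const TargetInst &A, const TargetInst &B) const {
    return std::tie(A.Opcode, A.DestReg, A.DestDisp, A.SrcReg, A.SrcDisp) <
           std::tie(B.Opcode, B.DestReg, B.DestDisp, B.SrcReg, B.SrcDisp);
  }
};

class TargetTable {
  std::map<TargetInst, std::string, CmpTargetInst> Targets2Sym;
  unsigned NextId = 0;

public:
  // Return the label already associated with TI, or create and record one.
  std::string getOrCreateSymbol(const TargetInst &TI) {
    auto It = Targets2Sym.find(TI);
    if (It != Targets2Sym.end())
      return It->second;
    auto Res = Targets2Sym.emplace(TI, ".Ltmp" + std::to_string(NextId++));
    assert(Res.second && "key was not present, so insertion must succeed");
    return Res.first->second;
  }

  // Return every recorded label in map order and empty the table, as the
  // end-of-file emission does with its labels and instructions.
  std::vector<std::string> flush() {
    std::vector<std::string> Labels;
    for (const auto &Entry : Targets2Sym)
      Labels.push_back(Entry.second);
    Targets2Sym.clear();
    return Labels;
  }
};
```

With such a table, two requests for an identical target instruction yield the same label, so only one copy of the instruction needs to be emitted.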

// Convert a SystemZ-specific constant pool modifier into the associated		// Convert a SystemZ-specific constant pool modifier into the associated
// MCSymbolRefExpr variant kind.		// MCSymbolRefExpr variant kind.
static MCSymbolRefExpr::VariantKind		static MCSymbolRefExpr::VariantKind
getModifierVariantKind(SystemZCP::SystemZCPModifier Modifier) {		getModifierVariantKind(SystemZCP::SystemZCPModifier Modifier) {
switch (Modifier) {		switch (Modifier) {
case SystemZCP::TLSGD: return MCSymbolRefExpr::VK_TLSGD;		case SystemZCP::TLSGD: return MCSymbolRefExpr::VK_TLSGD;
case SystemZCP::TLSLDM: return MCSymbolRefExpr::VK_TLSLDM;		case SystemZCP::TLSLDM: return MCSymbolRefExpr::VK_TLSLDM;
case SystemZCP::DTPOFF: return MCSymbolRefExpr::VK_DTPOFF;		case SystemZCP::DTPOFF: return MCSymbolRefExpr::VK_DTPOFF;
[...]	bool SystemZAsmPrinter::PrintAsmMemoryOperand(const MachineInstr *MI,
raw_ostream &OS) {		raw_ostream &OS) {
SystemZInstPrinter::printAddress(MAI, MI->getOperand(OpNo).getReg(),		SystemZInstPrinter::printAddress(MAI, MI->getOperand(OpNo).getReg(),
MI->getOperand(OpNo + 1).getImm(),		MI->getOperand(OpNo + 1).getImm(),
MI->getOperand(OpNo + 2).getReg(), OS);		MI->getOperand(OpNo + 2).getReg(), OS);
return false;		return false;
}		}

void SystemZAsmPrinter::emitEndOfAsmFile(Module &M) {		void SystemZAsmPrinter::emitEndOfAsmFile(Module &M) {
		emitEXRLTargetInstructions();
emitStackMaps(SM);		emitStackMaps(SM);
}		}

// Force static initialization.		// Force static initialization.
extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeSystemZAsmPrinter() {		extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeSystemZAsmPrinter() {
RegisterAsmPrinter<SystemZAsmPrinter> X(getTheSystemZTarget());		RegisterAsmPrinter<SystemZAsmPrinter> X(getTheSystemZTarget());
}		}

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

[...]	const SystemZInstrInfo *TII =
static_cast<const SystemZInstrInfo *>(Subtarget.getInstrInfo());		static_cast<const SystemZInstrInfo *>(Subtarget.getInstrInfo());
MachineRegisterInfo &MRI = MF.getRegInfo();		MachineRegisterInfo &MRI = MF.getRegInfo();
DebugLoc DL = MI.getDebugLoc();		DebugLoc DL = MI.getDebugLoc();

MachineOperand DestBase = earlyUseOperand(MI.getOperand(0));		MachineOperand DestBase = earlyUseOperand(MI.getOperand(0));
uint64_t DestDisp = MI.getOperand(1).getImm();		uint64_t DestDisp = MI.getOperand(1).getImm();
MachineOperand SrcBase = earlyUseOperand(MI.getOperand(2));		MachineOperand SrcBase = earlyUseOperand(MI.getOperand(2));
uint64_t SrcDisp = MI.getOperand(3).getImm();		uint64_t SrcDisp = MI.getOperand(3).getImm();
uint64_t Length = MI.getOperand(4).getImm();		MachineOperand &LengthMO = MI.getOperand(4);
		uint64_t ImmLength = LengthMO.isImm() ? LengthMO.getImm() : 0;
		Register LenMinus1Reg =
		LengthMO.isReg() ? LengthMO.getReg() : SystemZ::NoRegister;

// When generating more than one CLC, all but the last will need to		// When generating more than one CLC, all but the last will need to
// branch to the end when a difference is found.		// branch to the end when a difference is found.
MachineBasicBlock *EndMBB = (Length > 256 && Opcode == SystemZ::CLC ?		MachineBasicBlock *EndMBB = (ImmLength > 256 && Opcode == SystemZ::CLC
SystemZ::splitBlockAfter(MI, MBB) : nullptr);		? SystemZ::splitBlockAfter(MI, MBB)
		: nullptr);

// Check for the loop form, in which operand 5 is the trip count.		// Check for the loop form, in which operand 5 is the trip count.
if (MI.getNumExplicitOperands() > 5) {		if (MI.getNumExplicitOperands() > 5) {
bool HaveSingleBase = DestBase.isIdenticalTo(SrcBase);

Register StartCountReg = MI.getOperand(5).getReg();		Register StartCountReg = MI.getOperand(5).getReg();

		MachineBasicBlock *StartMBB = nullptr;
		MachineBasicBlock *LoopMBB = nullptr;
		MachineBasicBlock *NextMBB = nullptr;
		MachineBasicBlock *DoneMBB = nullptr;
		MachineBasicBlock *AllDoneMBB = nullptr;

		bool HaveSingleBase = DestBase.isIdenticalTo(SrcBase);
Register StartSrcReg = forceReg(MI, SrcBase, TII);		Register StartSrcReg = forceReg(MI, SrcBase, TII);
Register StartDestReg = (HaveSingleBase ? StartSrcReg :		Register StartDestReg =
forceReg(MI, DestBase, TII));		(HaveSingleBase ? StartSrcReg : forceReg(MI, DestBase, TII));

const TargetRegisterClass *RC = &SystemZ::ADDR64BitRegClass;		const TargetRegisterClass *RC = &SystemZ::ADDR64BitRegClass;
Register ThisSrcReg = MRI.createVirtualRegister(RC);		Register ThisSrcReg = MRI.createVirtualRegister(RC);
Register ThisDestReg = (HaveSingleBase ? ThisSrcReg :		Register ThisDestReg =
MRI.createVirtualRegister(RC));		(HaveSingleBase ? ThisSrcReg : MRI.createVirtualRegister(RC));
Register NextSrcReg = MRI.createVirtualRegister(RC);		Register NextSrcReg = MRI.createVirtualRegister(RC);
Register NextDestReg = (HaveSingleBase ? NextSrcReg :		Register NextDestReg =
MRI.createVirtualRegister(RC));		(HaveSingleBase ? NextSrcReg : MRI.createVirtualRegister(RC));

RC = &SystemZ::GR64BitRegClass;		RC = &SystemZ::GR64BitRegClass;
Register ThisCountReg = MRI.createVirtualRegister(RC);		Register ThisCountReg = MRI.createVirtualRegister(RC);
Register NextCountReg = MRI.createVirtualRegister(RC);		Register NextCountReg = MRI.createVirtualRegister(RC);

MachineBasicBlock *StartMBB = MBB;		if (LengthMO.isReg()) {
MachineBasicBlock *DoneMBB = SystemZ::splitBlockBefore(MI, MBB);		AllDoneMBB = SystemZ::splitBlockBefore(MI, MBB);
MachineBasicBlock *LoopMBB = SystemZ::emitBlockAfter(StartMBB);		StartMBB = SystemZ::emitBlockAfter(MBB);
MachineBasicBlock *NextMBB =		LoopMBB = SystemZ::emitBlockAfter(StartMBB);
(EndMBB ? SystemZ::emitBlockAfter(LoopMBB) : LoopMBB);		NextMBB = LoopMBB;
		DoneMBB = SystemZ::emitBlockAfter(LoopMBB);

		// MBB:
		// # Jump to AllDoneMBB if LenMinus1Reg is -1, or fall thru to StartMBB.
		BuildMI(MBB, DL, TII->get(SystemZ::CGHI))
		.addReg(LenMinus1Reg).addImm(-1);
		BuildMI(MBB, DL, TII->get(SystemZ::BRC))
		.addImm(SystemZ::CCMASK_ICMP).addImm(SystemZ::CCMASK_CMP_EQ)
		.addMBB(AllDoneMBB);
		MBB->addSuccessor(AllDoneMBB);
		MBB->addSuccessor(StartMBB);

		// StartMBB:
		// # Jump to DoneMBB if %StartCountReg is zero, or fall through to LoopMBB.
		MBB = StartMBB;
		BuildMI(MBB, DL, TII->get(SystemZ::CGHI))
		.addReg(StartCountReg).addImm(0);
		BuildMI(MBB, DL, TII->get(SystemZ::BRC))
		.addImm(SystemZ::CCMASK_ICMP).addImm(SystemZ::CCMASK_CMP_EQ)
		.addMBB(DoneMBB);
		MBB->addSuccessor(DoneMBB);
		MBB->addSuccessor(LoopMBB);
		}
		else {
		StartMBB = MBB;
		DoneMBB = SystemZ::splitBlockBefore(MI, MBB);
		LoopMBB = SystemZ::emitBlockAfter(StartMBB);
		NextMBB = (EndMBB ? SystemZ::emitBlockAfter(LoopMBB) : LoopMBB);

// StartMBB:		// StartMBB:
// # fall through to LoopMMB		// # fall through to LoopMBB
MBB->addSuccessor(LoopMBB);		MBB->addSuccessor(LoopMBB);

		DestBase = MachineOperand::CreateReg(NextDestReg, false);
		SrcBase = MachineOperand::CreateReg(NextSrcReg, false);
		ImmLength &= 255;
		if (EndMBB && !ImmLength)
		// If the loop handled the whole CLC range, DoneMBB will be empty with
		// CC live-through into EndMBB, so add it as live-in.
		DoneMBB->addLiveIn(SystemZ::CC);
		}

// LoopMBB:		// LoopMBB:
// %ThisDestReg = phi [ %StartDestReg, StartMBB ],		// %ThisDestReg = phi [ %StartDestReg, StartMBB ],
// [ %NextDestReg, NextMBB ]		// [ %NextDestReg, NextMBB ]
// %ThisSrcReg = phi [ %StartSrcReg, StartMBB ],		// %ThisSrcReg = phi [ %StartSrcReg, StartMBB ],
// [ %NextSrcReg, NextMBB ]		// [ %NextSrcReg, NextMBB ]
// %ThisCountReg = phi [ %StartCountReg, StartMBB ],		// %ThisCountReg = phi [ %StartCountReg, StartMBB ],
// [ %NextCountReg, NextMBB ]		// [ %NextCountReg, NextMBB ]
// ( PFD 2, 768+DestDisp(%ThisDestReg) )		// ( PFD 2, 768+DestDisp(%ThisDestReg) )
// Opcode DestDisp(256,%ThisDestReg), SrcDisp(%ThisSrcReg)		// Opcode DestDisp(256,%ThisDestReg), SrcDisp(%ThisSrcReg)
// ( JLH EndMBB )		// ( JLH EndMBB )
//		//
// The prefetch is used only for MVC. The JLH is used only for CLC.		// The prefetch is used only for MVC. The JLH is used only for CLC.
MBB = LoopMBB;		MBB = LoopMBB;

BuildMI(MBB, DL, TII->get(SystemZ::PHI), ThisDestReg)		BuildMI(MBB, DL, TII->get(SystemZ::PHI), ThisDestReg)
		uweigand (inline comment, marked Done): All these changes seem unnecessary now.
.addReg(StartDestReg).addMBB(StartMBB)		.addReg(StartDestReg).addMBB(StartMBB)
.addReg(NextDestReg).addMBB(NextMBB);		.addReg(NextDestReg).addMBB(NextMBB);
if (!HaveSingleBase)		if (!HaveSingleBase)
BuildMI(MBB, DL, TII->get(SystemZ::PHI), ThisSrcReg)		BuildMI(MBB, DL, TII->get(SystemZ::PHI), ThisSrcReg)
.addReg(StartSrcReg).addMBB(StartMBB)		.addReg(StartSrcReg).addMBB(StartMBB)
.addReg(NextSrcReg).addMBB(NextMBB);		.addReg(NextSrcReg).addMBB(NextMBB);
BuildMI(MBB, DL, TII->get(SystemZ::PHI), ThisCountReg)		BuildMI(MBB, DL, TII->get(SystemZ::PHI), ThisCountReg)
.addReg(StartCountReg).addMBB(StartMBB)		.addReg(StartCountReg).addMBB(StartMBB)
[...]	if (MI.getNumExplicitOperands() > 5) {
// %NextSrcReg = LA 256(%ThisSrcReg)		// %NextSrcReg = LA 256(%ThisSrcReg)
// %NextCountReg = AGHI %ThisCountReg, -1		// %NextCountReg = AGHI %ThisCountReg, -1
// CGHI %NextCountReg, 0		// CGHI %NextCountReg, 0
// JLH LoopMBB		// JLH LoopMBB
// # fall through to DoneMBB		// # fall through to DoneMBB
//		//
// The AGHI, CGHI and JLH should be converted to BRCTG by later passes.		// The AGHI, CGHI and JLH should be converted to BRCTG by later passes.
MBB = NextMBB;		MBB = NextMBB;

BuildMI(MBB, DL, TII->get(SystemZ::LA), NextDestReg)		BuildMI(MBB, DL, TII->get(SystemZ::LA), NextDestReg)
.addReg(ThisDestReg).addImm(256).addReg(0);		.addReg(ThisDestReg).addImm(256).addReg(0);
if (!HaveSingleBase)		if (!HaveSingleBase)
BuildMI(MBB, DL, TII->get(SystemZ::LA), NextSrcReg)		BuildMI(MBB, DL, TII->get(SystemZ::LA), NextSrcReg)
.addReg(ThisSrcReg).addImm(256).addReg(0);		.addReg(ThisSrcReg).addImm(256).addReg(0);
BuildMI(MBB, DL, TII->get(SystemZ::AGHI), NextCountReg)		BuildMI(MBB, DL, TII->get(SystemZ::AGHI), NextCountReg)
.addReg(ThisCountReg).addImm(-1);		.addReg(ThisCountReg).addImm(-1);
BuildMI(MBB, DL, TII->get(SystemZ::CGHI))		BuildMI(MBB, DL, TII->get(SystemZ::CGHI))
.addReg(NextCountReg).addImm(0);		.addReg(NextCountReg).addImm(0);
BuildMI(MBB, DL, TII->get(SystemZ::BRC))		BuildMI(MBB, DL, TII->get(SystemZ::BRC))
.addImm(SystemZ::CCMASK_ICMP).addImm(SystemZ::CCMASK_CMP_NE)		.addImm(SystemZ::CCMASK_ICMP).addImm(SystemZ::CCMASK_CMP_NE)
.addMBB(LoopMBB);		.addMBB(LoopMBB);
MBB->addSuccessor(LoopMBB);		MBB->addSuccessor(LoopMBB);
MBB->addSuccessor(DoneMBB);		MBB->addSuccessor(DoneMBB);

DestBase = MachineOperand::CreateReg(NextDestReg, false);
SrcBase = MachineOperand::CreateReg(NextSrcReg, false);
Length &= 255;
if (EndMBB && !Length)
// If the loop handled the whole CLC range, DoneMBB will be empty with
// CC live-through into EndMBB, so add it as live-in.
DoneMBB->addLiveIn(SystemZ::CC);
MBB = DoneMBB;		MBB = DoneMBB;
		if (LengthMO.isReg()) {
		// DoneMBB:
		// # Make PHIs for RemDestReg/RemSrcReg as the loop may or may not run.
		// # Use EXecute Relative Long for the remainder of the bytes. The target
		// instruction of the EXRL will have a length field of 1 since 0 is an
		// illegal value. The number of bytes processed becomes (%LenMinus1Reg &
		// 0xff) + 1.
		// # Fall through to AllDoneMBB.
		Register RemSrcReg = MRI.createVirtualRegister(&SystemZ::ADDR64BitRegClass);
		Register RemDestReg = HaveSingleBase ? RemSrcReg
		: MRI.createVirtualRegister(&SystemZ::ADDR64BitRegClass);
		BuildMI(MBB, DL, TII->get(SystemZ::PHI), RemDestReg)
		.addReg(StartDestReg).addMBB(StartMBB)
		.addReg(NextDestReg).addMBB(LoopMBB);
		if (!HaveSingleBase)
		BuildMI(MBB, DL, TII->get(SystemZ::PHI), RemSrcReg)
		.addReg(StartSrcReg).addMBB(StartMBB)
		.addReg(NextSrcReg).addMBB(LoopMBB);
		MRI.constrainRegClass(LenMinus1Reg, &SystemZ::ADDR64BitRegClass);
		BuildMI(MBB, DL, TII->get(SystemZ::EXRL_Pseudo))
		.addImm(Opcode)
		.addReg(LenMinus1Reg)
		.addReg(RemDestReg).addImm(DestDisp)
		.addReg(RemSrcReg).addImm(SrcDisp);
		MBB->addSuccessor(AllDoneMBB);
		MBB = AllDoneMBB;
}		}
		}

// Handle any remaining bytes with straight-line code.		// Handle any remaining bytes with straight-line code.
while (Length > 0) {		while (ImmLength > 0) {
uint64_t ThisLength = std::min(Length, uint64_t(256));		uint64_t ThisLength = std::min(ImmLength, uint64_t(256));
// The previous iteration might have created out-of-range displacements.		// The previous iteration might have created out-of-range displacements.
// Apply them using LAY if so.		// Apply them using LAY if so.
if (!isUInt<12>(DestDisp)) {		if (!isUInt<12>(DestDisp)) {
Register Reg = MRI.createVirtualRegister(&SystemZ::ADDR64BitRegClass);		Register Reg = MRI.createVirtualRegister(&SystemZ::ADDR64BitRegClass);
BuildMI(*MBB, MI, MI.getDebugLoc(), TII->get(SystemZ::LAY), Reg)		BuildMI(*MBB, MI, MI.getDebugLoc(), TII->get(SystemZ::LAY), Reg)
.add(DestBase)		.add(DestBase)
.addImm(DestDisp)		.addImm(DestDisp)
.addReg(0);		.addReg(0);
[...]	BuildMI(*MBB, MI, DL, TII->get(Opcode))
.add(DestBase)		.add(DestBase)
.addImm(DestDisp)		.addImm(DestDisp)
.addImm(ThisLength)		.addImm(ThisLength)
.add(SrcBase)		.add(SrcBase)
.addImm(SrcDisp)		.addImm(SrcDisp)
.setMemRefs(MI.memoperands());		.setMemRefs(MI.memoperands());
DestDisp += ThisLength;		DestDisp += ThisLength;
SrcDisp += ThisLength;		SrcDisp += ThisLength;
Length -= ThisLength;		ImmLength -= ThisLength;
// If there's another CLC to go, branch to the end if a difference		// If there's another CLC to go, branch to the end if a difference
// was found.		// was found.
if (EndMBB && Length > 0) {		if (EndMBB && ImmLength > 0) {
MachineBasicBlock *NextMBB = SystemZ::splitBlockBefore(MI, MBB);		MachineBasicBlock *NextMBB = SystemZ::splitBlockBefore(MI, MBB);
BuildMI(MBB, DL, TII->get(SystemZ::BRC))		BuildMI(MBB, DL, TII->get(SystemZ::BRC))
.addImm(SystemZ::CCMASK_ICMP).addImm(SystemZ::CCMASK_CMP_NE)		.addImm(SystemZ::CCMASK_ICMP).addImm(SystemZ::CCMASK_CMP_NE)
.addMBB(EndMBB);		.addMBB(EndMBB);
MBB->addSuccessor(EndMBB);		MBB->addSuccessor(EndMBB);
MBB->addSuccessor(NextMBB);		MBB->addSuccessor(NextMBB);
MBB = NextMBB;		MBB = NextMBB;
}		}
[...]	MachineBasicBlock *SystemZTargetLowering::EmitInstrWithCustomInserter(
case SystemZ::NCSequence:		case SystemZ::NCSequence:
case SystemZ::NCLoop:		case SystemZ::NCLoop:
return emitMemMemWrapper(MI, MBB, SystemZ::NC);		return emitMemMemWrapper(MI, MBB, SystemZ::NC);
case SystemZ::OCSequence:		case SystemZ::OCSequence:
case SystemZ::OCLoop:		case SystemZ::OCLoop:
return emitMemMemWrapper(MI, MBB, SystemZ::OC);		return emitMemMemWrapper(MI, MBB, SystemZ::OC);
case SystemZ::XCSequence:		case SystemZ::XCSequence:
case SystemZ::XCLoop:		case SystemZ::XCLoop:
		case SystemZ::XCLoopVarLen:
return emitMemMemWrapper(MI, MBB, SystemZ::XC);		return emitMemMemWrapper(MI, MBB, SystemZ::XC);
case SystemZ::CLCSequence:		case SystemZ::CLCSequence:
case SystemZ::CLCLoop:		case SystemZ::CLCLoop:
return emitMemMemWrapper(MI, MBB, SystemZ::CLC);		return emitMemMemWrapper(MI, MBB, SystemZ::CLC);
case SystemZ::CLSTLoop:		case SystemZ::CLSTLoop:
return emitStringWrapper(MI, MBB, SystemZ::CLST);		return emitStringWrapper(MI, MBB, SystemZ::CLST);
case SystemZ::MVSTLoop:		case SystemZ::MVSTLoop:
return emitStringWrapper(MI, MBB, SystemZ::MVST);		return emitStringWrapper(MI, MBB, SystemZ::MVST);
[...]

llvm/lib/Target/SystemZ/SystemZInstrFormats.td

[...]	multiclass CondUnaryRSYPseudoAndMemFold<string mnemonic,
def _MemFoldPseudo : MemFoldPseudo_CondMove<mnemonic, cls, bytes, mode>;		def _MemFoldPseudo : MemFoldPseudo_CondMove<mnemonic, cls, bytes, mode>;
}		}

// Define an instruction that operates on two fixed-length blocks of memory,		// Define an instruction that operates on two fixed-length blocks of memory,
// and associated pseudo instructions for operating on blocks of any size.		// and associated pseudo instructions for operating on blocks of any size.
// The Sequence form uses a straight-line sequence of instructions and		// The Sequence form uses a straight-line sequence of instructions and
// the Loop form uses a loop of length-256 instructions followed by		// the Loop form uses a loop of length-256 instructions followed by
// another instruction to handle the excess.		// another instruction to handle the excess.
		// The LoopVarLen form is for a loop with a non-constant length parameter.
multiclass MemorySS<string mnemonic, bits<8> opcode,		multiclass MemorySS<string mnemonic, bits<8> opcode,
SDPatternOperator sequence, SDPatternOperator loop> {		SDPatternOperator sequence, SDPatternOperator loop> {
def "" : SideEffectBinarySSa<mnemonic, opcode>;		def "" : SideEffectBinarySSa<mnemonic, opcode>;
let usesCustomInserter = 1, hasNoSchedulingInfo = 1, Defs = [CC] in {		let usesCustomInserter = 1, hasNoSchedulingInfo = 1, Defs = [CC] in {
def Sequence : Pseudo<(outs), (ins bdaddr12only:$dest, bdaddr12only:$src,		def Sequence : Pseudo<(outs), (ins bdaddr12only:$dest, bdaddr12only:$src,
imm64:$length),		imm64:$length),
[(sequence bdaddr12only:$dest, bdaddr12only:$src,		[(sequence bdaddr12only:$dest, bdaddr12only:$src,
imm64:$length)]>;		imm64:$length)]>;
def Loop : Pseudo<(outs), (ins bdaddr12only:$dest, bdaddr12only:$src,		def Loop : Pseudo<(outs), (ins bdaddr12only:$dest, bdaddr12only:$src,
imm64:$length, GR64:$count256),		imm64:$length, GR64:$count256),
[(loop bdaddr12only:$dest, bdaddr12only:$src,		[(loop bdaddr12only:$dest, bdaddr12only:$src,
imm64:$length, GR64:$count256)]>;		imm64:$length, GR64:$count256)]>;
		def LoopVarLen : Pseudo<(outs), (ins bdaddr12only:$dest, bdaddr12only:$src,
		GR64:$length, GR64:$count256),
		[(loop bdaddr12only:$dest, bdaddr12only:$src,
		GR64:$length, GR64:$count256)]>;
}		}
}		}

// The same, but setting a CC result as comparison operator.		// The same, but setting a CC result as comparison operator.
multiclass CompareMemorySS<string mnemonic, bits<8> opcode,		multiclass CompareMemorySS<string mnemonic, bits<8> opcode,
SDPatternOperator sequence, SDPatternOperator loop> {		SDPatternOperator sequence, SDPatternOperator loop> {
def "" : SideEffectBinarySSa<mnemonic, opcode>;		def "" : SideEffectBinarySSa<mnemonic, opcode>;
let usesCustomInserter = 1, hasNoSchedulingInfo = 1 in {		let usesCustomInserter = 1, hasNoSchedulingInfo = 1 in {
[...]

llvm/lib/Target/SystemZ/SystemZInstrInfo.td

	[...]
	// Deflate conversion call.			// Deflate conversion call.
	let Predicates = [FeatureDeflateConversion],			let Predicates = [FeatureDeflateConversion],
	mayLoad = 1, mayStore = 1, Defs = [CC], Uses = [R0L, R1D] in			mayLoad = 1, mayStore = 1, Defs = [CC], Uses = [R0L, R1D] in
	def DFLTCC : SideEffectTernaryMemMemRRFa<"dfltcc", 0xB939,			def DFLTCC : SideEffectTernaryMemMemRRFa<"dfltcc", 0xB939,
	GR128, GR128, GR64>;			GR128, GR128, GR64>;

	// Execute.			// Execute.
	let hasSideEffects = 1 in {			let hasSideEffects = 1 in {
	def EX : SideEffectBinaryRX<"ex", 0x44, GR64>;			def EX : SideEffectBinaryRX<"ex", 0x44, ADDR64>;
	def EXRL : SideEffectBinaryRILPC<"exrl", 0xC60, GR64>;			def EXRL : SideEffectBinaryRILPC<"exrl", 0xC60, ADDR64>;
				let hasNoSchedulingInfo = 1 in
				def EXRL_Pseudo : Pseudo<(outs), (ins i64imm:$TargetOpc, ADDR64:$lenMinus1,
				bdaddr12only:$bdl1, bdaddr12only:$bd2),
				[]>;
	}			}

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// .insn directive instructions			// .insn directive instructions
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	let isCodeGenOnly = 1, hasSideEffects = 1 in {			let isCodeGenOnly = 1, hasSideEffects = 1 in {
	def InsnE : DirectiveInsnE<(outs), (ins imm64zx16:$enc), ".insn e,$enc", []>;			def InsnE : DirectiveInsnE<(outs), (ins imm64zx16:$enc), ".insn e,$enc", []>;
	[...]

llvm/lib/Target/SystemZ/SystemZSelectionDAGInfo.cpp

[...]	SDValue SystemZSelectionDAGInfo::EmitTargetCodeForMemset(
SelectionDAG &DAG, const SDLoc &DL, SDValue Chain, SDValue Dst,		SelectionDAG &DAG, const SDLoc &DL, SDValue Chain, SDValue Dst,
SDValue Byte, SDValue Size, Align Alignment, bool IsVolatile,		SDValue Byte, SDValue Size, Align Alignment, bool IsVolatile,
MachinePointerInfo DstPtrInfo) const {		MachinePointerInfo DstPtrInfo) const {
EVT PtrVT = Dst.getValueType();		EVT PtrVT = Dst.getValueType();

if (IsVolatile)		if (IsVolatile)
return SDValue();		return SDValue();

		auto *CByte = dyn_cast<ConstantSDNode>(Byte);
if (auto *CSize = dyn_cast<ConstantSDNode>(Size)) {		if (auto *CSize = dyn_cast<ConstantSDNode>(Size)) {
uint64_t Bytes = CSize->getZExtValue();		uint64_t Bytes = CSize->getZExtValue();
if (Bytes == 0)		if (Bytes == 0)
return SDValue();		return SDValue();
if (auto *CByte = dyn_cast<ConstantSDNode>(Byte)) {		if (CByte) {
// Handle cases that can be done using at most two of		// Handle cases that can be done using at most two of
// MVI, MVHI, MVHHI and MVGHI. The latter two can only be		// MVI, MVHI, MVHHI and MVGHI. The latter two can only be
// used if ByteVal is all zeros or all ones; in other cases,		// used if ByteVal is all zeros or all ones; in other cases,
// we can move at most 2 halfwords.		// we can move at most 2 halfwords.
uint64_t ByteVal = CByte->getZExtValue();		uint64_t ByteVal = CByte->getZExtValue();
if (ByteVal == 0 || ByteVal == 255 ?		if (ByteVal == 0 || ByteVal == 255 ?
Bytes <= 16 && countPopulation(Bytes) <= 2 :		Bytes <= 16 && countPopulation(Bytes) <= 2 :
Bytes <= 4) {		Bytes <= 4) {
[...]	if (CByte) {
SDValue Chain2 = DAG.getStore(Chain, DL, Byte, Dst2,		SDValue Chain2 = DAG.getStore(Chain, DL, Byte, Dst2,
DstPtrInfo.getWithOffset(1), Align(1));		DstPtrInfo.getWithOffset(1), Align(1));
return DAG.getNode(ISD::TokenFactor, DL, MVT::Other, Chain1, Chain2);		return DAG.getNode(ISD::TokenFactor, DL, MVT::Other, Chain1, Chain2);
}		}
}		}
assert(Bytes >= 2 && "Should have dealt with 0- and 1-byte cases already");		assert(Bytes >= 2 && "Should have dealt with 0- and 1-byte cases already");

// Handle the special case of a memset of 0, which can use XC.		// Handle the special case of a memset of 0, which can use XC.
auto *CByte = dyn_cast<ConstantSDNode>(Byte);
if (CByte && CByte->getZExtValue() == 0)		if (CByte && CByte->getZExtValue() == 0)
return emitMemMem(DAG, DL, SystemZISD::XC, SystemZISD::XC_LOOP,		return emitMemMem(DAG, DL, SystemZISD::XC, SystemZISD::XC_LOOP,
Chain, Dst, Dst, Bytes);		Chain, Dst, Dst, Bytes);

// Copy the byte to the first location and then use MVC to copy		// Copy the byte to the first location and then use MVC to copy
// it to the rest.		// it to the rest.
Chain = DAG.getStore(Chain, DL, Byte, Dst, DstPtrInfo, Alignment);		Chain = DAG.getStore(Chain, DL, Byte, Dst, DstPtrInfo, Alignment);
SDValue DstPlus1 = DAG.getNode(ISD::ADD, DL, PtrVT, Dst,		SDValue DstPlus1 = DAG.getNode(ISD::ADD, DL, PtrVT, Dst,
DAG.getConstant(1, DL, PtrVT));		DAG.getConstant(1, DL, PtrVT));
return emitMemMem(DAG, DL, SystemZISD::MVC, SystemZISD::MVC_LOOP,		return emitMemMem(DAG, DL, SystemZISD::MVC, SystemZISD::MVC_LOOP,
Chain, DstPlus1, Dst, Bytes - 1);		Chain, DstPlus1, Dst, Bytes - 1);
}		}

		// Variable length
		uweigand (inline comment, marked Done): The above suggestion actually makes sense, we don't need an else here.
		if (CByte && CByte->getZExtValue() == 0) {
		// Handle the special case of a variable length memset of 0 with XC.
		SDValue LenMinus1 = DAG.getNode(ISD::ADD, DL, MVT::i64,
		DAG.getZExtOrTrunc(Size, DL, MVT::i64),
		DAG.getConstant(-1, DL, MVT::i64));
		SDValue TripC = DAG.getNode(ISD::SRL, DL, MVT::i64, LenMinus1,
		DAG.getConstant(8, DL, MVT::i64));
		return DAG.getNode(SystemZISD::XC_LOOP, DL, MVT::Other, Chain, Dst, Dst,
		LenMinus1, TripC);
		}
return SDValue();		return SDValue();
}		}
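
The variable-length path above computes LenMinus1 = Size - 1 and a trip count of LenMinus1 >> 8; per the comment in the lowering code, the EXRL then handles (LenMinus1 & 0xff) + 1 bytes. The following stand-alone sketch checks that this split covers exactly Size bytes. It is an illustration of the arithmetic only: XCSplit, splitVarLen() and bytesCleared() are hypothetical names, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative result of splitting a variable length into loop iterations
// plus an EXRL-executed remainder.
struct XCSplit {
  bool Skip;               // true when Len == 0 (LenMinus1 is -1): do nothing
  uint64_t TripCount;      // number of full 256-byte XC iterations
  uint64_t RemainderBytes; // bytes handled by the EXRL target instruction
};

XCSplit splitVarLen(uint64_t Len) {
  uint64_t LenMinus1 = Len - 1; // wraps to all ones when Len == 0
  if (LenMinus1 == UINT64_MAX)
    return {true, 0, 0};
  // The loop runs LenMinus1 >> 8 times; the remainder is always 1..256 bytes.
  return {false, LenMinus1 >> 8, (LenMinus1 & 0xff) + 1};
}

uint64_t bytesCleared(const XCSplit &S) {
  if (S.Skip)
    return 0;
  assert(S.RemainderBytes >= 1 && S.RemainderBytes <= 256);
  return S.TripCount * 256 + S.RemainderBytes;
}
```

For example, a length of 257 gives one loop iteration and a 1-byte remainder, while a length of 256 gives no loop iterations and a 256-byte remainder.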

// Use CLC to compare [Src1, Src1 + Size) with [Src2, Src2 + Size),		// Use CLC to compare [Src1, Src1 + Size) with [Src2, Src2 + Size),
// deciding whether to use a loop or straight-line code.		// deciding whether to use a loop or straight-line code.
static SDValue emitCLC(SelectionDAG &DAG, const SDLoc &DL, SDValue Chain,		static SDValue emitCLC(SelectionDAG &DAG, const SDLoc &DL, SDValue Chain,
SDValue Src1, SDValue Src2, uint64_t Size) {		SDValue Src1, SDValue Src2, uint64_t Size) {
SDVTList VTs = DAG.getVTList(MVT::i32, MVT::Other);		SDVTList VTs = DAG.getVTList(MVT::i32, MVT::Other);
[...]

llvm/test/CodeGen/SystemZ/memset-05.ll

This file was added.

				; Test memset 0 with variable length
				;
				; RUN: llc < %s -mtriple=s390x-linux-gnu | FileCheck %s

				define void @fun0(i8* %Addr, i64 %Len) {
				; CHECK-LABEL: fun0:
				; CHECK: # %bb.0:
				; CHECK-NEXT: aghi %r3, -1
				; CHECK-NEXT: cgibe %r3, -1, 0(%r14)
				; CHECK-NEXT: .LBB0_1:
				; CHECK-NEXT: srlg %r0, %r3, 8
				; CHECK-NEXT: cgije %r0, 0, .LBB0_3
				; CHECK-NEXT: .LBB0_2: # =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: xc 0(256,%r2), 0(%r2)
				; CHECK-NEXT: la %r2, 256(%r2)
				; CHECK-NEXT: brctg %r0, .LBB0_2
				; CHECK-NEXT: .LBB0_3:
				; CHECK-NEXT: exrl %r3, .Ltmp0
				; CHECK-NEXT: br %r14
				tail call void @llvm.memset.p0i8.i64(i8* %Addr, i8 0, i64 %Len, i1 false)
				ret void
				}

				define void @fun1(i8* %Addr, i32 %Len) {
				; CHECK-LABEL: fun1:
				; CHECK: # %bb.0:
				; CHECK-NEXT: llgfr %r1, %r3
				; CHECK-NEXT: aghi %r1, -1
				; CHECK-NEXT: cgibe %r1, -1, 0(%r14)
				; CHECK-NEXT: .LBB1_1:
				; CHECK-NEXT: srlg %r0, %r1, 8
				; CHECK-NEXT: cgije %r0, 0, .LBB1_3
				; CHECK-NEXT: .LBB1_2: # =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: xc 0(256,%r2), 0(%r2)
				; CHECK-NEXT: la %r2, 256(%r2)
				; CHECK-NEXT: brctg %r0, .LBB1_2
				; CHECK-NEXT: .LBB1_3:
				; CHECK-NEXT: exrl %r1, .Ltmp0
				; CHECK-NEXT: br %r14
				tail call void @llvm.memset.p0i8.i32(i8* %Addr, i8 0, i32 %Len, i1 false)
				ret void
				}

				; Test that identical target instructions get reused.
				define void @fun2(i8* %Addr, i32 %Len) {
				; CHECK-LABEL: fun2:
				; CHECK: # %bb.0:
				; CHECK-NEXT: llgfr %r1, %r3
				; CHECK-NEXT: aghi %r1, -1
				; CHECK-NEXT: srlg %r0, %r1, 8
				; CHECK-NEXT: cgije %r1, -1, .LBB2_5
				; CHECK-NEXT: # %bb.1:
				; CHECK-NEXT: lgr %r3, %r2
				; CHECK-NEXT: cgije %r0, 0, .LBB2_4
				; CHECK-NEXT: # %bb.2:
				; CHECK-NEXT: lgr %r3, %r2
				; CHECK-NEXT: lgr %r4, %r0
				; CHECK-NEXT: .LBB2_3: # =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: xc 0(256,%r3), 0(%r3)
				; CHECK-NEXT: la %r3, 256(%r3)
				; CHECK-NEXT: brctg %r4, .LBB2_3
				; CHECK-NEXT: .LBB2_4:
				; CHECK-NEXT: exrl %r1, .Ltmp1
				; CHECK-NEXT: .LBB2_5:
				; CHECK-NEXT: cgije %r1, -1, .LBB2_10
				; CHECK-NEXT: # %bb.6:
				; CHECK-NEXT: lgr %r3, %r2
				; CHECK-NEXT: cgije %r0, 0, .LBB2_9
				; CHECK-NEXT: # %bb.7:
				; CHECK-NEXT: lgr %r3, %r2
				; CHECK-NEXT: lgr %r4, %r0
				; CHECK-NEXT: .LBB2_8: # =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: xc 0(256,%r3), 0(%r3)
				; CHECK-NEXT: la %r3, 256(%r3)
				; CHECK-NEXT: brctg %r4, .LBB2_8
				; CHECK-NEXT: .LBB2_9:
				; CHECK-NEXT: exrl %r1, .Ltmp1
				; CHECK-NEXT: .LBB2_10:
				; CHECK-NEXT: cgibe %r1, -1, 0(%r14)
				; CHECK-NEXT: .LBB2_11:
				; CHECK-NEXT: cgije %r0, 0, .LBB2_13
				; CHECK-NEXT: .LBB2_12: # =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: xc 0(256,%r2), 0(%r2)
				; CHECK-NEXT: la %r2, 256(%r2)
				; CHECK-NEXT: brctg %r0, .LBB2_12
				; CHECK-NEXT: .LBB2_13:
				; CHECK-NEXT: exrl %r1, .Ltmp0
				; CHECK-NEXT: br %r14
				tail call void @llvm.memset.p0i8.i32(i8* %Addr, i8 0, i32 %Len, i1 false)
				tail call void @llvm.memset.p0i8.i32(i8* %Addr, i8 0, i32 %Len, i1 false)
				tail call void @llvm.memset.p0i8.i32(i8* %Addr, i8 0, i32 %Len, i1 false)
				ret void
				}

				; CHECK: .Ltmp0:
				; CHECK-NEXT: xc 0(1,%r2), 0(%r2)
				; CHECK-NEXT: .Ltmp1:
				; CHECK-NEXT: xc 0(1,%r3), 0(%r3)

				declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i1 immarg)
				declare void @llvm.memset.p0i8.i32(i8* nocapture writeonly, i8, i32, i1 immarg)