This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Try to select SMEM opcodes for llvm.amdgcn.buffer.load
Abandoned (Public)

Authored by mareko on Jan 22 2017, 2:16 PM.

Details

Reviewers

tstellarAMD
nhaehnle
arsenm

Summary

SMEM opcodes are faster, so we want to use them if possible.

Deus Ex performance: +13%
(with on-demand shader compilation enabled in Deus Ex, so the real
improvement should be higher)

Diff Detail

Build Status

Buildable 3148
Build 3148: arc lint + arc unit

Event Timeline

mareko created this revision. Jan 22 2017, 2:16 PM

Herald edited edge metadata. Jan 22 2017, 2:16 PM

Herald added subscribers: tony-tye, yaxunl, wdng, kzhuravl.

I wonder if the improvement comes from the fact that the intrinsics can use SMEM now, or the fact I fixed smrd#_SGPR to accept a non-constant offset.

arsenm added inline comments. Jan 23 2017, 11:37 AM

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1190	I'm not sure I understand what the point of the AnyReg parameter is
1311–1312	Can you move this getGeneration check into a new Subtarget->hasGLCForSMEM()?
lib/Target/AMDGPU/SIInstrInfo.cpp
2973	Do you also need to handle x8/x16? (Those can probably be a separate patch, but for now an assert/unreachable would work.)
2995–2996	These should use getNamedOperand
2997–2998	These look backwards from the order (it's also called offset, not inst_offset).
test/CodeGen/AMDGPU/llvm.amdgcn.buffer.load.ll
144	Tests seem to be missing for the moveToVALU path (or should those all be covered by the existing intrinsic tests?)
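
For reference, the predicate requested in the 1311–1312 comment could look roughly like the sketch below. It is self-contained and uses a hypothetical `SubtargetSketch` type in place of the real `AMDGPUSubtarget`; the generation names follow the ones used in the diff, and the "only VI supports GLC=1 for SMRD" rule is taken from the comment in the patch. `canSelectSMEMWithGLC` is likewise an illustrative name, not something from the tree.

```cpp
#include <cassert>

// Hypothetical stand-in for AMDGPUSubtarget. Only the relative order of the
// generations matters for this sketch.
enum Generation { SOUTHERN_ISLANDS, SEA_ISLANDS, VOLCANIC_ISLANDS };

struct SubtargetSketch {
  Generation Gen;

  Generation getGeneration() const { return Gen; }

  // The predicate suggested in review: true when SMEM loads accept GLC=1.
  // Per the comment in the patch, only VI supports this.
  bool hasGLCForSMEM() const { return getGeneration() >= VOLCANIC_ISLANDS; }
};

// Mirrors the check in SelectSMRDBufferGLC: a GLC=1 request can only be
// selected to SMEM when the subtarget supports it; GLC=0 is always accepted.
bool canSelectSMEMWithGLC(const SubtargetSketch &ST, bool GLC) {
  return !GLC || ST.hasGLCForSMEM();
}
```

With this shape, `canSelectSMEMWithGLC` rejects GLC=1 on SI and CI, which matches the updated test expecting `buffer_load_dwordx4 ... glc` under the SICI prefix and `s_buffer_load_dwordx4 ... glc` under VI.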

mareko mentioned this in D27586: AMDGPU/SI: Add llvm.amdgcn.s.buffer.load intrinsic. Jan 31 2017, 4:01 PM

What if something else has written to the buffer in the same shader? That would make using smem instructions illegal.

I see the same problem as Tom here. Do those shaders use read-only SSBOs? If so, this could perhaps be done at the Mesa level. But even then, there'd be a problem if the same memory is bound to two different SSBOs, and one of them is written to, unless the SSBO is marked 'restrict'.

In D28993#662978, @nhaehnle wrote:

I see the same problem as Tom here. Do those shaders use read-only SSBOs? If so, this could perhaps be done at the Mesa level. But even then, there'd be a problem if the same memory is bound to two different SSBOs, and one of them is written to, unless the SSBO is marked 'restrict'.

Can you be more specific about why it's incorrect? I only see an issue with L1 coherency (GLC=0).

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1190	I'll change that to "AllowNonConst". It should be obvious if you look below.
1311–1312	Sure.
lib/Target/AMDGPU/SIInstrInfo.cpp
2973	No. They can't be selected for amdgcn.buffer.load.
2997–2998	The order is the same as in the .inc files.
test/CodeGen/AMDGPU/llvm.amdgcn.buffer.load.ll
144	They are all covered by existing tests.
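
To make the AnyReg/AllowNonConst discussion easier to follow, here is a standalone sketch of the decision the patched `SelectSMRDOffset` appears to make, reconstructed from the right-hand side of the diff. The types `Gen`, `OffsetKind` and `OffsetOperand` and the function `classifyOffset` are hypothetical names for illustration only; the real code works on `SDValue` nodes and the `AMDGPUSubtarget` generation enum.

```cpp
#include <cassert>
#include <cstdint>

enum class Gen { SI, CI, VI };  // hypothetical stand-in for the subtarget generation

enum class OffsetKind {
  Reject,     // cannot be selected to SMEM
  Immediate,  // encoded directly in the instruction
  Literal32,  // 32-bit literal constant (Sea Islands only)
  Sgpr        // offset lives in a scalar register
};

// Hypothetical operand description: a constant byte offset, or a
// non-constant value (in which case Bytes is ignored).
struct OffsetOperand {
  bool IsConstant;
  int64_t Bytes;
};

// Same limits as isLegalSMRDImmOffset in the diff: 8 bits before VI,
// 20 bits on VI.
bool isLegalImm(Gen G, int64_t Encoded) {
  const int64_t Limit = (G == Gen::VI) ? (int64_t{1} << 20) : (int64_t{1} << 8);
  return Encoded >= 0 && Encoded < Limit;
}

bool fitsUInt32(int64_t V) { return V >= 0 && V <= 0xFFFFFFFFLL; }

OffsetKind classifyOffset(Gen G, OffsetOperand Off, bool AllowNonConst) {
  // Non-constant offsets are only accepted when the caller opts in.
  if (!Off.IsConstant)
    return AllowNonConst ? OffsetKind::Sgpr : OffsetKind::Reject;

  // SI/CI encode the immediate in dwords, so it must be 4-byte aligned;
  // VI encodes bytes.
  const bool DwordUnits = (G != Gen::VI);
  const bool Aligned = !DwordUnits || (Off.Bytes % 4 == 0);
  const int64_t Encoded = DwordUnits ? (Off.Bytes >> 2) : Off.Bytes;

  if (Aligned && isLegalImm(G, Encoded))
    return OffsetKind::Immediate;
  if (!fitsUInt32(Encoded) || !fitsUInt32(Off.Bytes))
    return OffsetKind::Reject;
  if (G == Gen::CI && Aligned)
    return OffsetKind::Literal32;
  return OffsetKind::Sgpr;  // the patch materializes this with S_MOV_B32
}
```

Under this reading, a byte offset of 42 is unaligned on SI/CI and therefore goes through an SGPR, while on VI it is a plain immediate. That agrees with the `buffer_load_immoffs` expectations in the updated test (`s_mov_b32 s4, 42` for SICI, `0x2a` for VI).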

In D28993#662995, @mareko wrote:

In D28993#662978, @nhaehnle wrote:

I see the same problem as Tom here. Do those shaders use read-only SSBOs? If so, this could perhaps be done at the Mesa level. But even then, there'd be a problem if the same memory is bound to two different SSBOs, and one of them is written to, unless the SSBO is marked 'restrict'.

Can you be more specific about why it's incorrect? I only see an issue with L1 coherency (GLC=0).

That's what I meant, yes.
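
The coherency concern can be illustrated with a deliberately simplified toy model. This is not a description of the hardware: it assumes a scalar cache that stores issued through the vector path never invalidate, and it assumes that a GLC=1 scalar load bypasses the cached copy. `ToyMemory`, `vectorStore` and `scalarLoad` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model only: one backing memory plus a scalar cache that is never
// invalidated by stores issued through the vector path.
struct ToyMemory {
  std::map<uint32_t, uint32_t> Backing;      // what a coherent read would see
  std::map<uint32_t, uint32_t> ScalarCache;  // values cached by scalar loads

  // Vector store: updates backing memory, leaves the scalar cache untouched.
  void vectorStore(uint32_t Addr, uint32_t Value) { Backing[Addr] = Value; }

  // Scalar load. With GLC=false a cached value is reused even if it is
  // stale; with GLC=true the cached copy is bypassed and refreshed
  // (an assumption of this model).
  uint32_t scalarLoad(uint32_t Addr, bool GLC) {
    auto It = ScalarCache.find(Addr);
    if (!GLC && It != ScalarCache.end())
      return It->second;
    uint32_t V = Backing[Addr];
    ScalarCache[Addr] = V;
    return V;
  }
};
```

In this model, a store, a scalar load, a second store, and then another GLC=0 scalar load returns the value of the first store. That is the kind of stale read a shader could observe if it writes to the same buffer it later reads through SMEM.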

Another possible issue is that SMEM instructions ignore bits of the resource descriptor. So you would need some way to tell the compiler that it wouldn't be ignoring some relevant resource bits by selecting to SMEM.

t-tye added a subscriber: t-tye. Mar 22 2017, 6:38 PM

tony-tye removed a subscriber: tony-tye. Mar 22 2017, 6:50 PM

In D28993#663458, @tstellarAMD wrote:

Another possible issue is that SMEM instructions ignore bits of the resource descriptor. So you would need some way to tell the compiler that it wouldn't be ignoring some relevant resource bits by selecting to SMEM.

Doesn't this make this change unworkable? Presumably the front-end would need to annotate in some way to indicate that this is a legitimate transformation, in which case you might as well use a different intrinsic anyway. Are there any circumstances where you can determine if this is definitely the case?

I've got a situation that benefits from this change, but equally could use the solution in D27586. Perhaps that change could be enhanced with the non-const offset change in this review?

In D28993#744583, @dstuttard wrote:

In D28993#663458, @tstellarAMD wrote:

Another possible issue is that SMEM instructions ignore bits of the resource descriptor. So you would need some way to tell the compiler that it wouldn't be ignoring some relevant resource bits by selecting to SMEM.

Doesn't this make this change unworkable? Presumably the front-end would need to annotate in some way to indicate that this is a legitimate transformation, in which case you might as well use a different intrinsic anyway. Are there any circumstances where you can determine if this is definitely the case?

I've got a situation that benefits from this change, but equally could use the solution in D27586. Perhaps that change could be enhanced with the non-const offset change in this review?

Yes, having separate intrinsics like D27586 is preferable not just because of the sL1 vs vL1 coherency stuff, but also because SMEM instructions have many differences compared to VMEM and sometimes even the same looking VMEM and SMEM instructions have different behavior. It's also important that s.load intrinsics support non-constant offsets and are lowered to VMEM when the address comes from a VGPR.
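
As a rough illustration of "lowered to VMEM when the address comes from a VGPR", the sketch below reuses the opcode mapping from the moveToVALU change in this revision's SIInstrInfo.cpp diff. Opcode names are plain strings here purely for illustration (the real code uses `AMDGPU::` opcode enumerators), and `RegBank`, `vmemReplacementFor` and `opcodeForOffsetBank` are hypothetical names.

```cpp
#include <cassert>
#include <string>

enum class RegBank { SGPR, VGPR };  // where the offset value lives

// Returns the VMEM opcode the patch substitutes when an SGPR-offset scalar
// buffer load has to move to the VALU, or an empty string if the opcode is
// not one of the three the patch handles (x8/x16 are not covered).
std::string vmemReplacementFor(const std::string &ScalarOpcode) {
  if (ScalarOpcode == "S_BUFFER_LOAD_DWORD_SGPR")
    return "BUFFER_LOAD_DWORD_OFFEN";
  if (ScalarOpcode == "S_BUFFER_LOAD_DWORDX2_SGPR")
    return "BUFFER_LOAD_DWORDX2_OFFEN";
  if (ScalarOpcode == "S_BUFFER_LOAD_DWORDX4_SGPR")
    return "BUFFER_LOAD_DWORDX4_OFFEN";
  return "";
}

// Keeps the scalar form when the offset is in an SGPR and falls back to the
// vector form when it is in a VGPR.
std::string opcodeForOffsetBank(const std::string &ScalarOpcode,
                                RegBank OffsetBank) {
  if (OffsetBank == RegBank::SGPR)
    return ScalarOpcode;
  return vmemReplacementFor(ScalarOpcode);
}
```

The empty-string result for unhandled opcodes corresponds to the x8/x16 case raised in the inline comments, where the patch has no replacement.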

Revision Contents

Path	Size
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp	57 lines
lib/Target/AMDGPU/SIInstrInfo.cpp	41 lines
lib/Target/AMDGPU/SIRegisterInfo.td	10 lines
lib/Target/AMDGPU/SMInstructions.td	23 lines
test/CodeGen/AMDGPU/llvm.amdgcn.buffer.load.ll	60 lines

Diff 85298

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

[126 lines not shown; enclosing context: private:]
bool SelectMUBUFIntrinsicOffset(SDValue Offset, SDValue &SOffset,		bool SelectMUBUFIntrinsicOffset(SDValue Offset, SDValue &SOffset,
SDValue &ImmOffset) const;		SDValue &ImmOffset) const;
bool SelectMUBUFIntrinsicVOffset(SDValue Offset, SDValue &SOffset,		bool SelectMUBUFIntrinsicVOffset(SDValue Offset, SDValue &SOffset,
SDValue &ImmOffset, SDValue &VOffset) const;		SDValue &ImmOffset, SDValue &VOffset) const;

bool SelectFlat(SDValue Addr, SDValue &VAddr,		bool SelectFlat(SDValue Addr, SDValue &VAddr,
SDValue &SLC, SDValue &TFE) const;		SDValue &SLC, SDValue &TFE) const;

bool SelectSMRDOffset(SDValue ByteOffsetNode, SDValue &Offset,		bool SelectSMRDOffset(SDValue ByteOffsetNode, bool AnyReg, SDValue &Offset,
bool &Imm) const;		bool &Imm) const;
bool SelectSMRD(SDValue Addr, SDValue &SBase, SDValue &Offset,		bool SelectSMRD(SDValue Addr, SDValue &SBase, SDValue &Offset,
bool &Imm) const;		bool &Imm) const;
bool SelectSMRDImm(SDValue Addr, SDValue &SBase, SDValue &Offset) const;		bool SelectSMRDImm(SDValue Addr, SDValue &SBase, SDValue &Offset) const;
bool SelectSMRDImm32(SDValue Addr, SDValue &SBase, SDValue &Offset) const;		bool SelectSMRDImm32(SDValue Addr, SDValue &SBase, SDValue &Offset) const;
bool SelectSMRDSgpr(SDValue Addr, SDValue &SBase, SDValue &Offset) const;		bool SelectSMRDSgpr(SDValue Addr, SDValue &SBase, SDValue &Offset) const;
bool SelectSMRDBufferImm(SDValue Addr, SDValue &Offset) const;		bool SelectSMRDBufferImm(SDValue Addr, SDValue &Offset) const;
bool SelectSMRDBufferImm32(SDValue Addr, SDValue &Offset) const;		bool SelectSMRDBufferImm32(SDValue Addr, SDValue &Offset) const;
bool SelectSMRDBufferSgpr(SDValue Addr, SDValue &Offset) const;		bool SelectSMRDBufferSgpr(SDValue Addr, SDValue &Offset) const;
		bool SelectSMRDBufferGLC(SDValue GLC, SDValue &Out) const;
bool SelectMOVRELOffset(SDValue Index, SDValue &Base, SDValue &Offset) const;		bool SelectMOVRELOffset(SDValue Index, SDValue &Base, SDValue &Offset) const;
bool SelectVOP3Mods(SDValue In, SDValue &Src, SDValue &SrcMods) const;		bool SelectVOP3Mods(SDValue In, SDValue &Src, SDValue &SrcMods) const;
bool SelectVOP3NoMods(SDValue In, SDValue &Src, SDValue &SrcMods) const;		bool SelectVOP3NoMods(SDValue In, SDValue &Src, SDValue &SrcMods) const;
bool SelectVOP3Mods0(SDValue In, SDValue &Src, SDValue &SrcMods,		bool SelectVOP3Mods0(SDValue In, SDValue &Src, SDValue &SrcMods,
SDValue &Clamp, SDValue &Omod) const;		SDValue &Clamp, SDValue &Omod) const;
bool SelectVOP3NoMods0(SDValue In, SDValue &Src, SDValue &SrcMods,		bool SelectVOP3NoMods0(SDValue In, SDValue &Src, SDValue &SrcMods,
SDValue &Clamp, SDValue &Omod) const;		SDValue &Clamp, SDValue &Omod) const;

[1,028 lines not shown]
/// directly into the instruction. On SI/CI the \p EncodedOffset		/// directly into the instruction. On SI/CI the \p EncodedOffset
/// will be in units of dwords and on VI+ it will be units of bytes.		/// will be in units of dwords and on VI+ it will be units of bytes.
static bool isLegalSMRDImmOffset(const AMDGPUSubtarget *ST,		static bool isLegalSMRDImmOffset(const AMDGPUSubtarget *ST,
int64_t EncodedOffset) {		int64_t EncodedOffset) {
return ST->getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS ?		return ST->getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS ?
isUInt<8>(EncodedOffset) : isUInt<20>(EncodedOffset);		isUInt<8>(EncodedOffset) : isUInt<20>(EncodedOffset);
}		}

bool AMDGPUDAGToDAGISel::SelectSMRDOffset(SDValue ByteOffsetNode,		bool AMDGPUDAGToDAGISel::SelectSMRDOffset(SDValue ByteOffsetNode, bool AnyReg,
		[inline comment] arsenm: I'm not sure I understand what the point of the AnyReg parameter is
		[inline comment] mareko: I'll change that to "AllowNonConst". It should be obvious if you look below.
SDValue &Offset, bool &Imm) const {		SDValue &Offset, bool &Imm) const {

// FIXME: Handle non-constant offsets.
ConstantSDNode *C = dyn_cast<ConstantSDNode>(ByteOffsetNode);		ConstantSDNode *C = dyn_cast<ConstantSDNode>(ByteOffsetNode);
if (!C)		if (!C) {
return false;		Offset = ByteOffsetNode;
		Imm = false;
		return AnyReg;
		}

SDLoc SL(ByteOffsetNode);		SDLoc SL(ByteOffsetNode);
AMDGPUSubtarget::Generation Gen = Subtarget->getGeneration();		AMDGPUSubtarget::Generation Gen = Subtarget->getGeneration();
int64_t ByteOffset = C->getSExtValue();		int64_t ByteOffset = C->getSExtValue();
int64_t EncodedOffset = Gen < AMDGPUSubtarget::VOLCANIC_ISLANDS ?
ByteOffset >> 2 : ByteOffset;

if (isLegalSMRDImmOffset(Subtarget, EncodedOffset)) {		bool Aligned;
		int64_t EncodedOffset;

		if (Gen <= AMDGPUSubtarget::SEA_ISLANDS) {
		Aligned = ByteOffset % 4 == 0;
		EncodedOffset = ByteOffset >> 2;
		} else {
		Aligned = true;
		EncodedOffset = ByteOffset;
		}

		if (Aligned && isLegalSMRDImmOffset(Subtarget, EncodedOffset)) {
Offset = CurDAG->getTargetConstant(EncodedOffset, SL, MVT::i32);		Offset = CurDAG->getTargetConstant(EncodedOffset, SL, MVT::i32);
Imm = true;		Imm = true;
return true;		return true;
}		}

if (!isUInt<32>(EncodedOffset) || !isUInt<32>(ByteOffset))		if (!isUInt<32>(EncodedOffset) || !isUInt<32>(ByteOffset))
return false;		return false;

if (Gen == AMDGPUSubtarget::SEA_ISLANDS && isUInt<32>(EncodedOffset)) {		if (Gen == AMDGPUSubtarget::SEA_ISLANDS &&
		Aligned && isUInt<32>(EncodedOffset)) {
// 32-bit Immediates are supported on Sea Islands.		// 32-bit Immediates are supported on Sea Islands.
Offset = CurDAG->getTargetConstant(EncodedOffset, SL, MVT::i32);		Offset = CurDAG->getTargetConstant(EncodedOffset, SL, MVT::i32);
} else {		} else {
SDValue C32Bit = CurDAG->getTargetConstant(ByteOffset, SL, MVT::i32);		SDValue C32Bit = CurDAG->getTargetConstant(ByteOffset, SL, MVT::i32);
Offset = SDValue(CurDAG->getMachineNode(AMDGPU::S_MOV_B32, SL, MVT::i32,		Offset = SDValue(CurDAG->getMachineNode(AMDGPU::S_MOV_B32, SL, MVT::i32,
C32Bit), 0);		C32Bit), 0);
}		}
Imm = false;		Imm = false;
return true;		return true;
}		}

bool AMDGPUDAGToDAGISel::SelectSMRD(SDValue Addr, SDValue &SBase,		bool AMDGPUDAGToDAGISel::SelectSMRD(SDValue Addr, SDValue &SBase,
SDValue &Offset, bool &Imm) const {		SDValue &Offset, bool &Imm) const {
SDLoc SL(Addr);		SDLoc SL(Addr);
if (CurDAG->isBaseWithConstantOffset(Addr)) {		if (CurDAG->isBaseWithConstantOffset(Addr)) {
SDValue N0 = Addr.getOperand(0);		SDValue N0 = Addr.getOperand(0);
SDValue N1 = Addr.getOperand(1);		SDValue N1 = Addr.getOperand(1);

if (SelectSMRDOffset(N1, Offset, Imm)) {		if (SelectSMRDOffset(N1, false, Offset, Imm)) {
SBase = N0;		SBase = N0;
return true;		return true;
}		}
}		}
SBase = Addr;		SBase = Addr;
Offset = CurDAG->getTargetConstant(0, SL, MVT::i32);		Offset = CurDAG->getTargetConstant(0, SL, MVT::i32);
Imm = true;		Imm = true;
return true;		return true;
[23 lines not shown; enclosing function: AMDGPUDAGToDAGISel::SelectSMRDSgpr]
bool Imm;		bool Imm;
return SelectSMRD(Addr, SBase, Offset, Imm) && !Imm &&		return SelectSMRD(Addr, SBase, Offset, Imm) && !Imm &&
!isa<ConstantSDNode>(Offset);		!isa<ConstantSDNode>(Offset);
}		}

bool AMDGPUDAGToDAGISel::SelectSMRDBufferImm(SDValue Addr,		bool AMDGPUDAGToDAGISel::SelectSMRDBufferImm(SDValue Addr,
SDValue &Offset) const {		SDValue &Offset) const {
bool Imm;		bool Imm;
return SelectSMRDOffset(Addr, Offset, Imm) && Imm;		return SelectSMRDOffset(Addr, true, Offset, Imm) && Imm;
}		}

bool AMDGPUDAGToDAGISel::SelectSMRDBufferImm32(SDValue Addr,		bool AMDGPUDAGToDAGISel::SelectSMRDBufferImm32(SDValue Addr,
SDValue &Offset) const {		SDValue &Offset) const {
if (Subtarget->getGeneration() != AMDGPUSubtarget::SEA_ISLANDS)		if (Subtarget->getGeneration() != AMDGPUSubtarget::SEA_ISLANDS)
return false;		return false;

bool Imm;		bool Imm;
if (!SelectSMRDOffset(Addr, Offset, Imm))		if (!SelectSMRDOffset(Addr, true, Offset, Imm))
return false;		return false;

return !Imm && isa<ConstantSDNode>(Offset);		return !Imm && isa<ConstantSDNode>(Offset);
}		}

bool AMDGPUDAGToDAGISel::SelectSMRDBufferSgpr(SDValue Addr,		bool AMDGPUDAGToDAGISel::SelectSMRDBufferSgpr(SDValue Addr,
SDValue &Offset) const {		SDValue &Offset) const {
bool Imm;		bool Imm;
return SelectSMRDOffset(Addr, Offset, Imm) && !Imm &&		return SelectSMRDOffset(Addr, true, Offset, Imm) && !Imm &&
!isa<ConstantSDNode>(Offset);		!isa<ConstantSDNode>(Offset);
}		}

		bool AMDGPUDAGToDAGISel::SelectSMRDBufferGLC(SDValue GLC,
		SDValue &Out) const {
		ConstantSDNode *C = dyn_cast<ConstantSDNode>(GLC);
		if (!C)
		return false;

		// Only VI supports GLC=1 for SMRD.
		if (Subtarget->getGeneration() <= AMDGPUSubtarget::SEA_ISLANDS &&
		[inline comment] arsenm: Can you move this getGeneration check into a new Subtarget->hasGLCForSMEM()?
		[inline comment] mareko: Sure.
		C->getZExtValue())
		return false;

		Out = GLC;
		return true;
		}

bool AMDGPUDAGToDAGISel::SelectMOVRELOffset(SDValue Index,		bool AMDGPUDAGToDAGISel::SelectMOVRELOffset(SDValue Index,
SDValue &Base,		SDValue &Base,
SDValue &Offset) const {		SDValue &Offset) const {
SDLoc DL(Index);		SDLoc DL(Index);

if (CurDAG->isBaseWithConstantOffset(Index)) {		if (CurDAG->isBaseWithConstantOffset(Index)) {
SDValue N0 = Index.getOperand(0);		SDValue N0 = Index.getOperand(0);
SDValue N1 = Index.getOperand(1);		SDValue N1 = Index.getOperand(1);
[330 lines not shown]

lib/Target/AMDGPU/SIInstrInfo.cpp

[2,962 lines not shown; enclosing context: while (!Worklist.empty()) {]
case AMDGPU::S_CBRANCH_SCC1:		case AMDGPU::S_CBRANCH_SCC1:
// Clear unused bits of vcc		// Clear unused bits of vcc
BuildMI(*MBB, Inst, Inst.getDebugLoc(), get(AMDGPU::S_AND_B64),		BuildMI(*MBB, Inst, Inst.getDebugLoc(), get(AMDGPU::S_AND_B64),
AMDGPU::VCC)		AMDGPU::VCC)
.addReg(AMDGPU::EXEC)		.addReg(AMDGPU::EXEC)
.addReg(AMDGPU::VCC);		.addReg(AMDGPU::VCC);
break;		break;

		case AMDGPU::S_BUFFER_LOAD_DWORD_SGPR:
		case AMDGPU::S_BUFFER_LOAD_DWORDX2_SGPR:
		case AMDGPU::S_BUFFER_LOAD_DWORDX4_SGPR: {
		[inline comment] arsenm: Do you also need to handle x8/x16? (Those can probably be a separate patch, but for now an assert/unreachable would work.)
		[inline comment] mareko: No. They can't be selected for amdgcn.buffer.load.
		unsigned ResultReg;
		unsigned NewOpcode;

		switch (Opcode) {
		case AMDGPU::S_BUFFER_LOAD_DWORD_SGPR:
		NewOpcode = AMDGPU::BUFFER_LOAD_DWORD_OFFEN;
		ResultReg = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
		break;
		case AMDGPU::S_BUFFER_LOAD_DWORDX2_SGPR:
		NewOpcode = AMDGPU::BUFFER_LOAD_DWORDX2_OFFEN;
		ResultReg = MRI.createVirtualRegister(&AMDGPU::VReg_64RegClass);
		break;
		case AMDGPU::S_BUFFER_LOAD_DWORDX4_SGPR:
		NewOpcode = AMDGPU::BUFFER_LOAD_DWORDX4_OFFEN;
		ResultReg = MRI.createVirtualRegister(&AMDGPU::VReg_128RegClass);
		break;
		}

		MachineOperand &Dest = Inst.getOperand(0);

		BuildMI(*MBB, Inst, Inst.getDebugLoc(), get(NewOpcode), ResultReg)
		.addReg(Inst.getOperand(2).getReg()) // offset
		.addReg(Inst.getOperand(1).getReg()) // rsrc
		[inline comment] arsenm: These should use getNamedOperand
		.addImm(0) // soffset
		.addImm(0) // inst_offset
		[inline comment] arsenm: These look backwards from the order (it's also called offset, not inst_offset).
		[inline comment] mareko: The order is the same as in the .inc files.
		.addImm(Inst.getOperand(3).getImm()) // glc
		.addImm(0) // slc
		.addImm(0) // tfe
		.addMemOperand(*Inst.memoperands_begin());

		MRI.replaceRegWith(Dest.getReg(), ResultReg);
		addUsersToMoveToVALUWorklist(ResultReg, MRI, Worklist);
		Inst.eraseFromParent();
		continue;
		}

case AMDGPU::S_BFE_U64:		case AMDGPU::S_BFE_U64:
case AMDGPU::S_BFM_B64:		case AMDGPU::S_BFM_B64:
llvm_unreachable("Moving this op to VALU not implemented");		llvm_unreachable("moveToVALU: S_BFE_U64 and S_BFM_B64 not implemented");
}		}

if (NewOpcode == AMDGPU::INSTRUCTION_LIST_END) {		if (NewOpcode == AMDGPU::INSTRUCTION_LIST_END) {
// We cannot move this instruction to the VALU, so we should try to		// We cannot move this instruction to the VALU, so we should try to
// legalize its operands instead.		// legalize its operands instead.
legalizeOperands(Inst);		legalizeOperands(Inst);
continue;		continue;
}		}
[661 lines not shown]

lib/Target/AMDGPU/SIRegisterInfo.td

	[269 lines not shown]
	}			}

	// Register class for all scalar registers (SGPRs + Special Registers)			// Register class for all scalar registers (SGPRs + Special Registers)
	def SReg_32 : RegisterClass<"AMDGPU", [i32, f32, i16, f16], 32,			def SReg_32 : RegisterClass<"AMDGPU", [i32, f32, i16, f16], 32,
	(add SReg_32_XM0, M0_CLASS, EXEC_LO, EXEC_HI)> {			(add SReg_32_XM0, M0_CLASS, EXEC_LO, EXEC_HI)> {
	let AllocationPriority = 7;			let AllocationPriority = 7;
	}			}

	def SGPR_64 : RegisterClass<"AMDGPU", [v2i32, i64, f64], 32, (add SGPR_64Regs)> {			def SGPR_64 : RegisterClass<"AMDGPU", [v2i32, v2f32, i64, f64], 32, (add SGPR_64Regs)> {
	let CopyCost = 1;			let CopyCost = 1;
	let AllocationPriority = 8;			let AllocationPriority = 8;
	}			}

	def TTMP_64 : RegisterClass<"AMDGPU", [v2i32, i64, f64], 32, (add TTMP_64Regs)> {			def TTMP_64 : RegisterClass<"AMDGPU", [v2i32, i64, f64], 32, (add TTMP_64Regs)> {
	let isAllocatable = 0;			let isAllocatable = 0;
	}			}

	def SReg_64_XEXEC : RegisterClass<"AMDGPU", [v2i32, i64, f64, i1], 32,			def SReg_64_XEXEC : RegisterClass<"AMDGPU", [v2i32, v2f32, i64, f64, i1], 32,
	(add SGPR_64, VCC, FLAT_SCR, TTMP_64, TBA, TMA)> {			(add SGPR_64, VCC, FLAT_SCR, TTMP_64, TBA, TMA)> {
	let CopyCost = 1;			let CopyCost = 1;
	let AllocationPriority = 8;			let AllocationPriority = 8;
	}			}

	def SReg_64 : RegisterClass<"AMDGPU", [v2i32, i64, f64, i1], 32,			def SReg_64 : RegisterClass<"AMDGPU", [v2i32, v2f32, i64, f64, i1], 32,
	(add SReg_64_XEXEC, EXEC)> {			(add SReg_64_XEXEC, EXEC)> {
	let CopyCost = 1;			let CopyCost = 1;
	let AllocationPriority = 8;			let AllocationPriority = 8;
	}			}

	// Requires 2 s_mov_b64 to copy			// Requires 2 s_mov_b64 to copy
	let CopyCost = 2 in {			let CopyCost = 2 in {

	def SGPR_128 : RegisterClass<"AMDGPU", [v4i32, v16i8, v2i64], 32, (add SGPR_128Regs)> {			def SGPR_128 : RegisterClass<"AMDGPU", [v4i32, v4f32, v16i8, v2i64], 32, (add SGPR_128Regs)> {
	let AllocationPriority = 10;			let AllocationPriority = 10;
	}			}

	def TTMP_128 : RegisterClass<"AMDGPU", [v4i32, v16i8, v2i64], 32, (add TTMP_128Regs)> {			def TTMP_128 : RegisterClass<"AMDGPU", [v4i32, v16i8, v2i64], 32, (add TTMP_128Regs)> {
	let isAllocatable = 0;			let isAllocatable = 0;
	}			}

	def SReg_128 : RegisterClass<"AMDGPU", [v4i32, v16i8, v2i64], 32, (add SGPR_128, TTMP_128)> {			def SReg_128 : RegisterClass<"AMDGPU", [v4i32, v4f32, v16i8, v2i64], 32, (add SGPR_128, TTMP_128)> {
	let AllocationPriority = 10;			let AllocationPriority = 10;
	}			}

	} // End CopyCost = 2			} // End CopyCost = 2

	def SReg_256 : RegisterClass<"AMDGPU", [v8i32, v8f32], 32, (add SGPR_256)> {			def SReg_256 : RegisterClass<"AMDGPU", [v8i32, v8f32], 32, (add SGPR_256)> {
	// Requires 4 s_mov_b64 to copy			// Requires 4 s_mov_b64 to copy
	let CopyCost = 4;			let CopyCost = 4;
	[147 lines not shown]

lib/Target/AMDGPU/SMInstructions.td

[230 lines not shown; enclosing context: return Ld->getAlignment() >= 4 &&]
(Subtarget->getScalarizeGlobalBehavior() && Ld->getAddressSpace() == AMDGPUAS::GLOBAL_ADDRESS &&		(Subtarget->getScalarizeGlobalBehavior() && Ld->getAddressSpace() == AMDGPUAS::GLOBAL_ADDRESS &&
static_cast<const SITargetLowering *>(getTargetLowering())->isMemOpUniform(N) &&		static_cast<const SITargetLowering *>(getTargetLowering())->isMemOpUniform(N) &&
static_cast<const SITargetLowering *>(getTargetLowering())->isMemOpHasNoClobberedMemOperand(N)));		static_cast<const SITargetLowering *>(getTargetLowering())->isMemOpHasNoClobberedMemOperand(N)));
}]>;		}]>;

def SMRDImm : ComplexPattern<i64, 2, "SelectSMRDImm">;		def SMRDImm : ComplexPattern<i64, 2, "SelectSMRDImm">;
def SMRDImm32 : ComplexPattern<i64, 2, "SelectSMRDImm32">;		def SMRDImm32 : ComplexPattern<i64, 2, "SelectSMRDImm32">;
def SMRDSgpr : ComplexPattern<i64, 2, "SelectSMRDSgpr">;		def SMRDSgpr : ComplexPattern<i64, 2, "SelectSMRDSgpr">;
		def SMRDSgprConst : ComplexPattern<i64, 2, "SelectSMRDSgprConst">;
def SMRDBufferImm : ComplexPattern<i32, 1, "SelectSMRDBufferImm">;		def SMRDBufferImm : ComplexPattern<i32, 1, "SelectSMRDBufferImm">;
def SMRDBufferImm32 : ComplexPattern<i32, 1, "SelectSMRDBufferImm32">;		def SMRDBufferImm32 : ComplexPattern<i32, 1, "SelectSMRDBufferImm32">;
def SMRDBufferSgpr : ComplexPattern<i32, 1, "SelectSMRDBufferSgpr">;		def SMRDBufferSgpr : ComplexPattern<i32, 1, "SelectSMRDBufferSgpr">;
		def SMRDBufferGLC : ComplexPattern<i32, 1, "SelectSMRDBufferGLC">;

let Predicates = [isGCN] in {		let Predicates = [isGCN] in {

multiclass SMRD_Pattern <string Instr, ValueType vt> {		multiclass SMRD_Pattern <string Instr, ValueType vt> {

// 1. IMM offset		// 1. IMM offset
def : Pat <		def : Pat <
(smrd_load (SMRDImm i64:$sbase, i32:$offset)),		(smrd_load (SMRDImm i64:$sbase, i32:$offset)),
(vt (!cast<SM_Pseudo>(Instr#"_IMM") $sbase, $offset, 0))		(vt (!cast<SM_Pseudo>(Instr#"_IMM") $sbase, $offset, 0))
>;		>;

// 2. SGPR offset		// 2. SGPR offset
def : Pat <		def : Pat <
(smrd_load (SMRDSgpr i64:$sbase, i32:$offset)),		(smrd_load (SMRDSgpr i64:$sbase, i32:$offset)),
(vt (!cast<SM_Pseudo>(Instr#"_SGPR") $sbase, $offset, 0))		(vt (!cast<SM_Pseudo>(Instr#"_SGPR") $sbase, $offset, 0))
>;		>;
}		}

		multiclass SMRD_LoadIntrinsicPat<SDPatternOperator name, ValueType vt,
		string opcode> {
		def : Pat<
		(vt (name v4i32:$rsrc, 0,
		(SMRDBufferImm i32:$offset),
		(SMRDBufferGLC i32:$glc), 0)),
		(!cast<SM_Load_Pseudo>(opcode # _IMM) $rsrc, $offset, (as_i1imm $glc))
		>;

		def : Pat<
		(vt (name v4i32:$rsrc, 0,
		(SMRDBufferSgpr i32:$offset),
		(SMRDBufferGLC i32:$glc), 0)),
		(!cast<SM_Load_Pseudo>(opcode # _SGPR) $rsrc, $offset, (as_i1imm $glc))
		>;
		}

let Predicates = [isSICI] in {		let Predicates = [isSICI] in {
def : Pat <		def : Pat <
(i64 (readcyclecounter)),		(i64 (readcyclecounter)),
(S_MEMTIME)		(S_MEMTIME)
>;		>;
}		}

// Global and constant loads can be selected to either MUBUF or SMRD		// Global and constant loads can be selected to either MUBUF or SMRD
// instructions, but SMRD instructions are faster so we want the instruction		// instructions, but SMRD instructions are faster so we want the instruction
// selector to prefer those.		// selector to prefer those.
let AddedComplexity = 100 in {		let AddedComplexity = 100 in {

defm : SMRD_Pattern <"S_LOAD_DWORD", i32>;		defm : SMRD_Pattern <"S_LOAD_DWORD", i32>;
defm : SMRD_Pattern <"S_LOAD_DWORDX2", v2i32>;		defm : SMRD_Pattern <"S_LOAD_DWORDX2", v2i32>;
defm : SMRD_Pattern <"S_LOAD_DWORDX4", v4i32>;		defm : SMRD_Pattern <"S_LOAD_DWORDX4", v4i32>;
defm : SMRD_Pattern <"S_LOAD_DWORDX8", v8i32>;		defm : SMRD_Pattern <"S_LOAD_DWORDX8", v8i32>;
defm : SMRD_Pattern <"S_LOAD_DWORDX16", v16i32>;		defm : SMRD_Pattern <"S_LOAD_DWORDX16", v16i32>;

		defm : SMRD_LoadIntrinsicPat<SIbuffer_load, f32, "S_BUFFER_LOAD_DWORD">;
		defm : SMRD_LoadIntrinsicPat<SIbuffer_load, v2f32, "S_BUFFER_LOAD_DWORDX2">;
		defm : SMRD_LoadIntrinsicPat<SIbuffer_load, v4f32, "S_BUFFER_LOAD_DWORDX4">;

// 1. Offset as an immediate		// 1. Offset as an immediate
def SM_LOAD_PATTERN : Pat < // name this pattern to reuse AddedComplexity on CI		def SM_LOAD_PATTERN : Pat < // name this pattern to reuse AddedComplexity on CI
(SIload_constant v4i32:$sbase, (SMRDBufferImm i32:$offset)),		(SIload_constant v4i32:$sbase, (SMRDBufferImm i32:$offset)),
(S_BUFFER_LOAD_DWORD_IMM $sbase, $offset, 0)		(S_BUFFER_LOAD_DWORD_IMM $sbase, $offset, 0)
>;		>;

// 2. Offset loaded in an 32bit SGPR		// 2. Offset loaded in an 32bit SGPR
def : Pat <		def : Pat <
[250 lines not shown]

test/CodeGen/AMDGPU/llvm.amdgcn.buffer.load.ll

	;RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck %s -check-prefix=CHECK -check-prefix=SICI			;RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck %s -check-prefix=CHECK -check-prefix=SI -check-prefix=SICI -check-prefix=SIVI
	;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck %s -check-prefix=CHECK -check-prefix=VI			;RUN: llc < %s -march=amdgcn -mcpu=bonaire -verify-machineinstrs | FileCheck %s -check-prefix=CHECK -check-prefix=CI -check-prefix=SICI -check-prefix=CIVI
				;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck %s -check-prefix=CHECK -check-prefix=VI -check-prefix=CIVI -check-prefix=SIVI

	;CHECK-LABEL: {{^}}buffer_load:			;CHECK-LABEL: {{^}}buffer_load:
	;CHECK: buffer_load_dwordx4 v[0:3], off, s[0:3], 0			;CHECK-DAG: s_buffer_load_dwordx4 s[{{[0-9:]+}}], s[0:3], 0x0
	;CHECK: buffer_load_dwordx4 v[4:7], off, s[0:3], 0 glc			;VI-DAG: s_buffer_load_dwordx4 s[{{[0-9:]+}}], s[0:3], 0x0 glc
	;CHECK: buffer_load_dwordx4 v[8:11], off, s[0:3], 0 slc			;SICI-DAG: buffer_load_dwordx4 v[{{[0-9:]+}}], off, s[0:3], 0 glc
				;CHECK-DAG: buffer_load_dwordx4 v[{{[0-9:]+}}], off, s[0:3], 0 slc
	;CHECK: s_waitcnt			;CHECK: s_waitcnt
	define amdgpu_ps {<4 x float>, <4 x float>, <4 x float>} @buffer_load(<4 x i32> inreg) {			define amdgpu_ps {<4 x float>, <4 x float>, <4 x float>} @buffer_load(<4 x i32> inreg) {
	main_body:			main_body:
	%data = call <4 x float> @llvm.amdgcn.buffer.load.v4f32(<4 x i32> %0, i32 0, i32 0, i1 0, i1 0)			%data = call <4 x float> @llvm.amdgcn.buffer.load.v4f32(<4 x i32> %0, i32 0, i32 0, i1 0, i1 0)
	%data_glc = call <4 x float> @llvm.amdgcn.buffer.load.v4f32(<4 x i32> %0, i32 0, i32 0, i1 1, i1 0)			%data_glc = call <4 x float> @llvm.amdgcn.buffer.load.v4f32(<4 x i32> %0, i32 0, i32 0, i1 1, i1 0)
	%data_slc = call <4 x float> @llvm.amdgcn.buffer.load.v4f32(<4 x i32> %0, i32 0, i32 0, i1 0, i1 1)			%data_slc = call <4 x float> @llvm.amdgcn.buffer.load.v4f32(<4 x i32> %0, i32 0, i32 0, i1 0, i1 1)
	%r0 = insertvalue {<4 x float>, <4 x float>, <4 x float>} undef, <4 x float> %data, 0			%r0 = insertvalue {<4 x float>, <4 x float>, <4 x float>} undef, <4 x float> %data, 0
	%r1 = insertvalue {<4 x float>, <4 x float>, <4 x float>} %r0, <4 x float> %data_glc, 1			%r1 = insertvalue {<4 x float>, <4 x float>, <4 x float>} %r0, <4 x float> %data_glc, 1
	%r2 = insertvalue {<4 x float>, <4 x float>, <4 x float>} %r1, <4 x float> %data_slc, 2			%r2 = insertvalue {<4 x float>, <4 x float>, <4 x float>} %r1, <4 x float> %data_slc, 2
	ret {<4 x float>, <4 x float>, <4 x float>} %r2			ret {<4 x float>, <4 x float>, <4 x float>} %r2
	}			}

	;CHECK-LABEL: {{^}}buffer_load_immoffs:			;CHECK-LABEL: {{^}}buffer_load_immoffs:
	;CHECK: buffer_load_dwordx4 v[0:3], off, s[0:3], 0 offset:42			;SICI: s_mov_b32 s4, 42
				;SICI: s_buffer_load_dwordx4 s[0:3], s[0:3], s4
				;VI: s_buffer_load_dwordx4 s[0:3], s[0:3], 0x2a
	;CHECK: s_waitcnt			;CHECK: s_waitcnt
	define amdgpu_ps <4 x float> @buffer_load_immoffs(<4 x i32> inreg) {			define amdgpu_ps <4 x float> @buffer_load_immoffs(<4 x i32> inreg) {
	main_body:			main_body:
	%data = call <4 x float> @llvm.amdgcn.buffer.load.v4f32(<4 x i32> %0, i32 0, i32 42, i1 0, i1 0)			%data = call <4 x float> @llvm.amdgcn.buffer.load.v4f32(<4 x i32> %0, i32 0, i32 42, i1 0, i1 0)
	ret <4 x float> %data			ret <4 x float> %data
	}			}

	;CHECK-LABEL: {{^}}buffer_load_immoffs_large:			;CHECK-LABEL: {{^}}buffer_load_immoffs_large:
	;SICI: buffer_load_dwordx4 v[0:3], {{v[0-9]+}}, s[0:3], 0 offen			;SI: s_movk_i32 s4, 0x2000
	;VI: s_movk_i32 [[OFFSET:s[0-9]+]], 0x1fff			;SI: s_buffer_load_dwordx4 s[0:3], s[0:3], s4
	;VI: buffer_load_dwordx4 v[0:3], off, s[0:3], [[OFFSET]] offset:1			;TODO: this should use SMEM:
				;CI: v_mov_b32_e32 [[OFFSET:v[0-9]+]], 0x2000
				;CI: buffer_load_dwordx4 v[0:3], v0, s[0:3], 0 offen
				;VI: s_buffer_load_dwordx4 s[0:3], s[0:3], 0x2000
	;CHECK: s_waitcnt			;CHECK: s_waitcnt
	define amdgpu_ps <4 x float> @buffer_load_immoffs_large(<4 x i32> inreg) {			define amdgpu_ps <4 x float> @buffer_load_immoffs_large(<4 x i32> inreg) {
	main_body:			main_body:
	%data = call <4 x float> @llvm.amdgcn.buffer.load.v4f32(<4 x i32> %0, i32 0, i32 8192, i1 0, i1 0)			%data = call <4 x float> @llvm.amdgcn.buffer.load.v4f32(<4 x i32> %0, i32 0, i32 8192, i1 0, i1 0)
	ret <4 x float> %data			ret <4 x float> %data
	}			}

	;CHECK-LABEL: {{^}}buffer_load_idx:			;CHECK-LABEL: {{^}}buffer_load_idx:
	;CHECK: buffer_load_dwordx4 v[0:3], v0, s[0:3], 0 idxen			;CHECK: buffer_load_dwordx4 v[0:3], v0, s[0:3], 0 idxen
	;CHECK: s_waitcnt			;CHECK: s_waitcnt
	define amdgpu_ps <4 x float> @buffer_load_idx(<4 x i32> inreg, i32) {			define amdgpu_ps <4 x float> @buffer_load_idx(<4 x i32> inreg, i32) {
	main_body:			main_body:
	%data = call <4 x float> @llvm.amdgcn.buffer.load.v4f32(<4 x i32> %0, i32 %1, i32 0, i1 0, i1 0)			%data = call <4 x float> @llvm.amdgcn.buffer.load.v4f32(<4 x i32> %0, i32 %1, i32 0, i1 0, i1 0)
	ret <4 x float> %data			ret <4 x float> %data
	}			}

				;CHECK-LABEL: {{^}}buffer_load_ofs_smem:
				;CHECK: s_buffer_load_dwordx4 s[0:3], s[0:3], s4
				;CHECK: s_waitcnt
				define amdgpu_ps <4 x float> @buffer_load_ofs_smem(<4 x i32> inreg, i32 inreg) {
				main_body:
				%data = call <4 x float> @llvm.amdgcn.buffer.load.v4f32(<4 x i32> %0, i32 0, i32 %1, i1 0, i1 0)
				ret <4 x float> %data
				}

	;CHECK-LABEL: {{^}}buffer_load_ofs:			;CHECK-LABEL: {{^}}buffer_load_ofs:
	;CHECK: buffer_load_dwordx4 v[0:3], v0, s[0:3], 0 offen			;CHECK: buffer_load_dwordx4 v[0:3], v0, s[0:3], 0 offen
	;CHECK: s_waitcnt			;CHECK: s_waitcnt
	define amdgpu_ps <4 x float> @buffer_load_ofs(<4 x i32> inreg, i32) {			define amdgpu_ps <4 x float> @buffer_load_ofs(<4 x i32> inreg, i32) {
	main_body:			main_body:
	%data = call <4 x float> @llvm.amdgcn.buffer.load.v4f32(<4 x i32> %0, i32 0, i32 %1, i1 0, i1 0)			%data = call <4 x float> @llvm.amdgcn.buffer.load.v4f32(<4 x i32> %0, i32 0, i32 %1, i1 0, i1 0)
	ret <4 x float> %data			ret <4 x float> %data
	}			}

				;CHECK-LABEL: {{^}}buffer_load_ofs_imm_smem:
				;CHECK: s_add_i32 s4, s4, 58
				;CHECK: s_buffer_load_dwordx4 s[0:3], s[0:3], s4
				;CHECK: s_waitcnt
				define amdgpu_ps <4 x float> @buffer_load_ofs_imm_smem(<4 x i32> inreg, i32 inreg) {
				main_body:
				%ofs = add i32 %1, 58
				%data = call <4 x float> @llvm.amdgcn.buffer.load.v4f32(<4 x i32> %0, i32 0, i32 %ofs, i1 0, i1 0)
				ret <4 x float> %data
				}

	;CHECK-LABEL: {{^}}buffer_load_ofs_imm:			;CHECK-LABEL: {{^}}buffer_load_ofs_imm:
	;CHECK: buffer_load_dwordx4 v[0:3], v0, s[0:3], 0 offen offset:58			;TODO: v_add could be folded into VMEM:
				;CHECK: v_add_i32_e32 v0, vcc, 58, v0
				;CHECK: buffer_load_dwordx4 v[0:3], v0, s[0:3], 0 offen
				;XCHECK: buffer_load_dwordx4 v[0:3], v0, s[0:3], 0 offen offset:58
	;CHECK: s_waitcnt			;CHECK: s_waitcnt
	define amdgpu_ps <4 x float> @buffer_load_ofs_imm(<4 x i32> inreg, i32) {			define amdgpu_ps <4 x float> @buffer_load_ofs_imm(<4 x i32> inreg, i32) {
	main_body:			main_body:
	%ofs = add i32 %1, 58			%ofs = add i32 %1, 58
	%data = call <4 x float> @llvm.amdgcn.buffer.load.v4f32(<4 x i32> %0, i32 0, i32 %ofs, i1 0, i1 0)			%data = call <4 x float> @llvm.amdgcn.buffer.load.v4f32(<4 x i32> %0, i32 0, i32 %ofs, i1 0, i1 0)
	ret <4 x float> %data			ret <4 x float> %data
	}			}

	(29 unchanged lines not shown)
	;CHECK: buffer_load_dwordx2 v[0:1], v[0:1], s[0:3], 0 idxen offen			;CHECK: buffer_load_dwordx2 v[0:1], v[0:1], s[0:3], 0 idxen offen
	;CHECK: s_waitcnt			;CHECK: s_waitcnt
	define amdgpu_ps <2 x float> @buffer_load_x2(<4 x i32> inreg %rsrc, i32 %idx, i32 %ofs) {			define amdgpu_ps <2 x float> @buffer_load_x2(<4 x i32> inreg %rsrc, i32 %idx, i32 %ofs) {
	main_body:			main_body:
	%data = call <2 x float> @llvm.amdgcn.buffer.load.v2f32(<4 x i32> %rsrc, i32 %idx, i32 %ofs, i1 0, i1 0)			%data = call <2 x float> @llvm.amdgcn.buffer.load.v2f32(<4 x i32> %rsrc, i32 %idx, i32 %ofs, i1 0, i1 0)
	ret <2 x float> %data			ret <2 x float> %data
	}			}

				;CHECK-LABEL: {{^}}buffer_load_negative_offset_smem:
				;CHECK: s_add_i32 s4, s4, -16
				;CHECK: s_buffer_load_dwordx4 s[0:3], s[0:3], s4
				define amdgpu_ps <4 x float> @buffer_load_negative_offset_smem(<4 x i32> inreg, i32 inreg %ofs) {
				main_body:
				%ofs.1 = add i32 %ofs, -16
				%data = call <4 x float> @llvm.amdgcn.buffer.load.v4f32(<4 x i32> %0, i32 0, i32 %ofs.1, i1 0, i1 0)
				ret <4 x float> %data
				}

				arsenm (inline comment): Tests seem to be missing for the moveToVALU path (or should those all be covered by the existing intrinsic tests?)
				mareko (author, inline reply): They are all covered by existing tests.
	;CHECK-LABEL: {{^}}buffer_load_negative_offset:			;CHECK-LABEL: {{^}}buffer_load_negative_offset:
	;CHECK: v_add_i32_e32 [[VOFS:v[0-9]+]], vcc, -16, v0			;CHECK: v_add_i32_e32 [[VOFS:v[0-9]+]], vcc, -16, v0
	;CHECK: buffer_load_dwordx4 v[0:3], [[VOFS]], s[0:3], 0 offen			;CHECK: buffer_load_dwordx4 v[0:3], [[VOFS]], s[0:3], 0 offen
	define amdgpu_ps <4 x float> @buffer_load_negative_offset(<4 x i32> inreg, i32 %ofs) {			define amdgpu_ps <4 x float> @buffer_load_negative_offset(<4 x i32> inreg, i32 %ofs) {
	main_body:			main_body:
	%ofs.1 = add i32 %ofs, -16			%ofs.1 = add i32 %ofs, -16
	%data = call <4 x float> @llvm.amdgcn.buffer.load.v4f32(<4 x i32> %0, i32 0, i32 %ofs.1, i1 0, i1 0)			%data = call <4 x float> @llvm.amdgcn.buffer.load.v4f32(<4 x i32> %0, i32 0, i32 %ofs.1, i1 0, i1 0)
	ret <4 x float> %data			ret <4 x float> %data
	(21 unchanged lines not shown)