This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix pointer info for pseudo source for r600
ClosedPublic

Authored by yaxunl on Nov 5 2017, 6:23 PM.

Details

Reviewers

arsenm
rampitec
jsjodin

Commits

rG920cc2f813c6: [AMDGPU] Fix pointer info for pseudo source for r600
rL317861: [AMDGPU] Fix pointer info for pseudo source for r600

Summary

The pointer info for pseudo sources on r600 is incorrect when the
alloca address space is not 0, which causes an invalid SDNode for the r600---amdgiz triple.

This patch fixes that.
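
To illustrate the kind of hook this patch overrides, here is a minimal, self-contained sketch. It is not the LLVM code: `PSVKind`, `InstrInfoBase`, `R600LikeInstrInfo`, and the numeric address spaces are stand-ins chosen for illustration, and the generic default of address space 0 is an assumption. The point is only that a target whose stack lives in a non-zero address space has to override the default so that stack pseudo sources report that address space.

```cpp
// Illustrative stand-in for PseudoSourceValue::PSVKind (hypothetical subset).
enum class PSVKind { Stack, FixedStack, ConstantPool };

// Hypothetical base class; assumes the generic default is address space 0.
struct InstrInfoBase {
  virtual ~InstrInfoBase() = default;
  virtual unsigned addressSpaceForPseudoSource(PSVKind) const { return 0; }
};

// Hypothetical target class that knows its private (alloca) and constant
// address spaces and reports them for the matching pseudo sources.
struct R600LikeInstrInfo : InstrInfoBase {
  unsigned PrivateAS;
  unsigned ConstantAS;

  R600LikeInstrInfo(unsigned Private, unsigned Constant)
      : PrivateAS(Private), ConstantAS(Constant) {}

  unsigned addressSpaceForPseudoSource(PSVKind Kind) const override {
    // Stack-like sources live in the private address space; everything
    // else in this sketch is treated as constant memory.
    const bool IsStack =
        Kind == PSVKind::Stack || Kind == PSVKind::FixedStack;
    return IsStack ? PrivateAS : ConstantAS;
  }
};
```

With a mapping whose alloca address space is 5, `R600LikeInstrInfo(5, 2)` reports 5 for stack accesses, where the base class in this sketch would report 0.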

Event Timeline

yaxunl created this revision. Nov 5 2017, 6:23 PM

Herald added subscribers: t-tye, tpr, dstuttard and 3 others. Nov 5 2017, 6:23 PM

Remove unnecessary changes to CodeGen since the PointerInfo can be inferred.

Ping. D39698 depends on this.

Why are we trying to use this alternative address space mapping for r600?

jsjodin added inline comments. Nov 7 2017, 9:38 AM

lib/Target/AMDGPU/R600InstrInfo.cpp
1511	I think you will need a return at the end of the function because I remember seeing some warnings on some buildbots. You can double check. Otherwise it looks good to me.

In D39670#917974, @tstellar wrote:

Why are we trying to use this alternative address space mapping for r600?

We need to simplify the backend to support only one address space mapping. Supporting multiple address space mappings causes unnecessary complexity in the backend, e.g. intrinsic functions returning different pointer types depending on the triple.

Which intrinsic functions are you referring to? To me it seems easier to not touch r600 at all, but I guess I'm not as deep into the code as you. Can you give me some specific examples for how adding support to r600 for this alternative mapping simplifies the backend?

Revised per Jan's comment.

In D39670#918042, @tstellar wrote:

Which intrinsic functions are you referring to? To me it seems easier to not touch r600 at all, but I guess I'm not as deep into the code as you. Can you give me some specific examples for how adding support to r600 for this alternative mapping simplifies the backend?

For example, if we want to define an intrinsic function returning a generic pointer, we have to define two versions: one for the amdgiz environment and one for the other environments, because they have different address space values for the generic pointer. We then have to let the backend choose which version to use based on the target environment.

Also, because the address space values depend on the target triple, we cannot define them as constants. Instead, they have to be fields of a structure returned by the function getAMDGPUAS(triple). If we only support one address space mapping, we can define the values as an enum and use that directly.
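
The trade-off described in this comment can be sketched roughly as below. This is an illustration under stated assumptions, not the backend's code: `AddrSpaceMap`, `getAddrSpaceMap`, `StaticAddrSpace`, and the helper functions are hypothetical names, and the numeric values are assumed for the example (only the private address space of 5 in the new mapping is suggested by the `addrspace(5)` allocas in this revision's tests).

```cpp
#include <string>

// Triple-dependent mapping: the values are only known at run time, so every
// user has to carry the structure around and read fields from it.
struct AddrSpaceMap {
  unsigned Generic;
  unsigned Private;
};

// Hypothetical counterpart of a getAMDGPUAS(triple)-style lookup.
AddrSpaceMap getAddrSpaceMap(const std::string &Environment) {
  if (Environment == "amdgiz")
    return {0, 5}; // assumed values for the new mapping
  return {4, 0};   // assumed values for the old mapping
}

bool isPrivateDynamic(unsigned AS, const AddrSpaceMap &Map) {
  return AS == Map.Private;
}

// Single static mapping: the values are compile-time constants, so they can
// appear in case labels, static_asserts, and constexpr functions.
enum StaticAddrSpace : unsigned { GenericAS = 0, PrivateAS = 5 };

constexpr bool isPrivateStatic(unsigned AS) { return AS == PrivateAS; }

static_assert(isPrivateStatic(5), "usable in constant expressions");
```

In the dynamic form the same question ("is this the private address space?") needs the map as an extra argument at every call site, which is the complexity the comment refers to.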

In D39670#918093, @yaxunl wrote:

In D39670#918042, @tstellar wrote:

Which intrinsic functions are you referring to? To me it seems easier to not touch r600 at all, but I guess I'm not as deep into the code as you. Can you give me some specific examples for how adding support to r600 for this alternative mapping simplifies the backend?

For example, if we want to define an intrinsic function returning a generic pointer, we have to define two versions: one for the amdgiz environment and one for the other environments, because they have different address space values for the generic pointer. We then have to let the backend choose which version to use based on the target environment.

Also, because the address space values depend on the target triple, we cannot define them as constants. Instead, they have to be fields of a structure returned by the function getAMDGPUAS(triple). If we only support one address space mapping, we can define the values as an enum and use that directly.

Sorry I forgot to mention: the plan is to let amdgcn and r600 only support the new address space mapping, then make it default and remove the old address space mapping.

r600 doesn't support generic address space, so in that case, I would recommend only defining an intrinsic to be used by amdgcn. I understand the problem with non-constant address spaces, but I think the best solution here would be to try to make more of a separation between the amdgcn code and the r600 code in the backend rather than trying to change the address space mapping for r600. r600 should really be mostly read-only at this point.

In D39670#918112, @tstellar wrote:

r600 doesn't support generic address space, so in that case, I would recommend only defining an intrinsic to be used by amdgcn. I understand the problem with non-constant address spaces, but I think the best solution here would be to try to make more of a separation between the amdgcn code and the r600 code in the backend rather than trying to change the address space mapping for r600. r600 should really be mostly read-only at this point.

We discussed this internally and concluded that having a static address space mapping is more important. Most of the issues are due to the use of dummy pointer info. Such places should be few in the r600 code, so we will continue fixing these issues unless we find that doing so requires excessive effort.

In D39670#918887, @yaxunl wrote:

We discussed this internally and concluded that having a static address space mapping is more important. Most of the issues are due to the use of dummy pointer info. Such places should be few in the r600 code, so we will continue fixing these issues unless we find that doing so requires excessive effort.

It would be nice to have these discussions on the mailing list so more people could participate, and it would be helpful for convincing people like me that this is the right approach, but as of right now I don't see why it is so important to have a static address space mapping that is the same for r600 and amdgcn. I think the better approach would be to share less code between the two subtargets such that it's possible for each to have their own static mapping. I think this kind of code separation is something that should be done anyway, independent of the address space mapping work.

I think the chance of R600 breaking with a different address space mapping is much higher. I think it is important to have the one static mapping that is consistent.

LGTM with the switch fix. Also could just move the SI one to AMDGPUInstrInfo

lib/Target/AMDGPU/R600InstrInfo.cpp
1511	Move unreachable into default and remove the final return

This revision is now accepted and ready to land. Nov 8 2017, 12:58 AM

In D39670#918904, @tstellar wrote:

In D39670#918887, @yaxunl wrote:

We discussed this internally and concluded that having a static address space mapping is more important. Most of the issues are due to the use of dummy pointer info. Such places should be few in the r600 code, so we will continue fixing these issues unless we find that doing so requires excessive effort.

It would be nice to have these discussions on the mailing list so more people could participate, and it would be helpful for convincing people like me that this is the right approach, but as of right now I don't see why it is so important to have a static address space mapping that is the same for r600 and amdgcn. I think the better approach would be to share less code between the two subtargets such that it's possible for each to have their own static mapping. I think this kind of code separation is something that should be done anyway, independent of the address space mapping work.

As Matt said, the risk of having two sets of address space mappings is high. Also, I estimate the workload to switch r600 to the new address space mapping to be moderate. On the other hand, the workload for separating r600 from amdgcn is high.

In D39670#918904, @tstellar wrote:

In D39670#918887, @yaxunl wrote:

We discussed this internally and concluded that having a static address space mapping is more important. Most of the issues are due to the use of dummy pointer info. Such places should be few in the r600 code, so we will continue fixing these issues unless we find that doing so requires excessive effort.

It would be nice to have these discussions on the mailing list so more people could participate, and it would be helpful for convincing people like me that this is the right approach, but as of right now I don't see why it is so important to have a static address space mapping that is the same for r600 and amdgcn. I think the better approach would be to share less code between the two subtargets such that it's possible for each to have their own static mapping. I think this kind of code separation is something that should be done anyway, independent of the address space mapping work.

r600 and amdgcn share lots of passes. It is not practical to duplicate these passes and maintain them separately. Even if we were able to do that, maintaining two sets of static address spaces is error-prone. On the other hand, switching r600 to the new address space mapping takes only moderate effort, since it only changes address space numbers. We will fix the regressions arising from this change. After this change, we will get a more consistent and bug-free AMDGPU backend.

In D39670#919711, @yaxunl wrote:

As Matt said, the risk of having two sets of address space mappings is high. Also, I estimate the workload to switch r600 to the new address space mapping to be moderate. On the other hand, the workload for separating r600 from amdgcn is high.

I disagree with this, but it's also possible I'm wrong. If you are willing to do the work and deal with the fallout, then I'm OK with this change. Just make sure @jvesely is aware so he can help test/check for regressions.

One problem I have with this specific test is you are essentially removing a test for the current mapping and replacing it with a test for the new mapping, even though the new mapping isn't actually being used anywhere outside of LLVM. What is your timeframe for migrating r600 to the new mapping?

r600 and amdgcn share lots of passes. It is not practical to duplicate these passes and maintain them separately. Even if we were able to do that, maintaining two sets of static address spaces is error-prone. On the other hand, switching r600 to the new address space mapping takes only moderate effort, since it only changes address space numbers. We will fix the regressions arising from this change. After this change, we will get a more consistent and bug-free AMDGPU backend.

I started a branch to split these two apart and it wasn't as bad as I thought it would be. I think long term this is where we want to go with the backend. Maybe at some point I'll get a chance to clean this branch up and send out the patches.

In D39670#920596, @tstellar wrote:

In D39670#919711, @yaxunl wrote:

As Matt said, the risk of having two sets of address space mappings is high. Also, I estimate the workload to switch r600 to the new address space mapping to be moderate. On the other hand, the workload for separating r600 from amdgcn is high.

I disagree with this, but it's also possible I'm wrong. If you are willing to do the work and deal with the fallout, then I'm OK with this change. Just make sure @jvesely is aware so he can help test/check for regressions.

Sure. I will keep Jan updated.

One problem I have with this specific test is you are essentially removing a test for the current mapping and replacing it with a test for the new mapping, even though the new mapping isn't actually being used anywhere outside of LLVM. What is your timeframe for migrating r600 to the new mapping?

I plan to get it done ASAP, hopefully within one month. I think the new and old address space mappings share most of the code paths in llvm/clang, so any regression in the old address space mapping is likely to cause a regression in the new one. On the other hand, since the new address space mapping has a non-zero alloca address space and therefore some extra code paths, regressions in the new address space mapping may not cause regressions in the old one.

yaxunl added inline comments. Nov 9 2017, 12:22 PM

lib/Target/AMDGPU/R600InstrInfo.cpp
1511	Doing this causes an error: error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default]
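
The switch shape these inline comments converge on can be sketched in isolation. The snippet below is a simplified stand-in, not the committed function: `PSVKind` is a hypothetical subset of the real enumeration, the address-space constants are assumed values, and `std::abort()` takes the place of `llvm_unreachable`. It shows why the unreachable marker goes after the switch rather than under a `default:` label.

```cpp
#include <cstdlib>

// Hypothetical subset of the pseudo-source kinds, for illustration only.
enum class PSVKind { Stack, FixedStack, ConstantPool, GOT, JumpTable };

constexpr unsigned PrivateAddress = 5;  // assumed value
constexpr unsigned ConstantAddress = 2; // assumed value

unsigned addressSpaceFor(PSVKind Kind) {
  // Every enumerator is listed and there is no `default:` label, so
  // -Wcovered-switch-default has nothing to report, and -Wswitch can still
  // flag this switch if a new enumerator is added later.
  switch (Kind) {
  case PSVKind::Stack:
  case PSVKind::FixedStack:
    return PrivateAddress;
  case PSVKind::ConstantPool:
  case PSVKind::GOT:
  case PSVKind::JumpTable:
    return ConstantAddress;
  }
  // Only reached if Kind holds a value outside the enumeration.
  // std::abort() is a stand-in for llvm_unreachable in this sketch.
  std::abort();
}
```

Because `std::abort()` is declared `[[noreturn]]`, compilers that honor the attribute do not warn about a missing return here. Whether a trailing `return` is still needed after `llvm_unreachable` depends on the compilers in use, which is the buildbot concern raised in the first inline comment.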

Closed by commit rL317861: [AMDGPU] Fix pointer info for pseudo source for r600 (authored by yaxunl). Nov 9 2017, 5:53 PM

This revision was automatically updated to reflect the committed changes.

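Revision Contents

Path	Size
lib/Target/AMDGPU/R600InstrInfo.h	3 lines
lib/Target/AMDGPU/R600InstrInfo.cpp	18 lines
test/CodeGen/AMDGPU/load-constant-i1.ll	6 lines
test/CodeGen/AMDGPU/load-global-i1.ll	6 lines
test/CodeGen/AMDGPU/load-local-i1.ll	6 lines
test/CodeGen/AMDGPU/vector-alloca.ll	157 lines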

Diff 121927

lib/Target/AMDGPU/R600InstrInfo.h

public:
// Helper functions that check the opcode for status information		// Helper functions that check the opcode for status information
bool isRegisterStore(const MachineInstr &MI) const {		bool isRegisterStore(const MachineInstr &MI) const {
return get(MI.getOpcode()).TSFlags & R600InstrFlags::REGISTER_STORE;		return get(MI.getOpcode()).TSFlags & R600InstrFlags::REGISTER_STORE;
}		}

bool isRegisterLoad(const MachineInstr &MI) const {		bool isRegisterLoad(const MachineInstr &MI) const {
return get(MI.getOpcode()).TSFlags & R600InstrFlags::REGISTER_LOAD;		return get(MI.getOpcode()).TSFlags & R600InstrFlags::REGISTER_LOAD;
}		}

		unsigned getAddressSpaceForPseudoSourceKind(
		PseudoSourceValue::PSVKind Kind) const override;
};		};

namespace AMDGPU {		namespace AMDGPU {

int getLDSNoRetOp(uint16_t Opcode);		int getLDSNoRetOp(uint16_t Opcode);

} //End namespace AMDGPU		} //End namespace AMDGPU

} // End llvm namespace		} // End llvm namespace

#endif		#endif

lib/Target/AMDGPU/R600InstrInfo.cpp

	FlagOp.setImm(0);	FlagOp.setImm(0);
	} else {	} else {
	MachineOperand &FlagOp = getFlagOp(MI);	MachineOperand &FlagOp = getFlagOp(MI);
	unsigned InstFlags = FlagOp.getImm();	unsigned InstFlags = FlagOp.getImm();
	InstFlags &= ~(Flag << (NUM_MO_FLAGS * Operand));	InstFlags &= ~(Flag << (NUM_MO_FLAGS * Operand));
	FlagOp.setImm(InstFlags);	FlagOp.setImm(InstFlags);
	}	}
	}	}

		unsigned R600InstrInfo::getAddressSpaceForPseudoSourceKind(
		PseudoSourceValue::PSVKind Kind) const {
		switch (Kind) {
		case PseudoSourceValue::Stack:
		case PseudoSourceValue::FixedStack:
		return AMDGPUASI.PRIVATE_ADDRESS;
		case PseudoSourceValue::ConstantPool:
		case PseudoSourceValue::GOT:
		case PseudoSourceValue::JumpTable:
		case PseudoSourceValue::GlobalValueCallEntry:
		case PseudoSourceValue::ExternalSymbolCallEntry:
		case PseudoSourceValue::TargetCustom:
		return AMDGPUASI.CONSTANT_ADDRESS;
		}
		llvm_unreachable("Invalid pseudo source kind");
		jsjodin: I think you will need a return at the end of the function because I remember seeing some warnings on some buildbots. You can double check. Otherwise it looks good to me.
		arsenm: Move unreachable into default and remove the final return
		yaxunl (author): Doing this causes an error: error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default]
		return AMDGPUASI.PRIVATE_ADDRESS;
		}

test/CodeGen/AMDGPU/load-constant-i1.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=FUNC %s
	; RUN: llc -march=r600 -mcpu=cypress < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s			; RUN: llc -march=r600 -mtriple=r600---amdgiz -mcpu=cypress < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s

	; FUNC-LABEL: {{^}}constant_load_i1:			; FUNC-LABEL: {{^}}constant_load_i1:
	; GCN: buffer_load_ubyte			; GCN: buffer_load_ubyte
	; GCN: v_and_b32_e32 v{{[0-9]+}}, 1			; GCN: v_and_b32_e32 v{{[0-9]+}}, 1
	; GCN: buffer_store_byte			; GCN: buffer_store_byte

	; EG: VTX_READ_8			; EG: VTX_READ_8
	; EG: AND_INT			; EG: AND_INT

test/CodeGen/AMDGPU/load-global-i1.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=FUNC %s
	; RUN: llc -march=r600 -mcpu=cypress < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s			; RUN: llc -march=r600 -mtriple=r600---amdgiz -mcpu=cypress < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s

	; FUNC-LABEL: {{^}}global_load_i1:			; FUNC-LABEL: {{^}}global_load_i1:
	; GCN: buffer_load_ubyte			; GCN: buffer_load_ubyte
	; GCN: v_and_b32_e32 v{{[0-9]+}}, 1			; GCN: v_and_b32_e32 v{{[0-9]+}}, 1
	; GCN: buffer_store_byte			; GCN: buffer_store_byte

	; EG: VTX_READ_8			; EG: VTX_READ_8
	; EG: AND_INT			; EG: AND_INT

test/CodeGen/AMDGPU/load-local-i1.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=FUNC %s
	; RUN: llc -march=r600 -mcpu=cypress < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s			; RUN: llc -march=r600 -mtriple=r600---amdgiz -mcpu=cypress < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s

	; FUNC-LABEL: {{^}}local_load_i1:			; FUNC-LABEL: {{^}}local_load_i1:
	; GCN: ds_read_u8			; GCN: ds_read_u8
	; GCN: v_and_b32_e32 v{{[0-9]+}}, 1			; GCN: v_and_b32_e32 v{{[0-9]+}}, 1
	; GCN: ds_write_b8			; GCN: ds_write_b8

	; EG: LDS_UBYTE_READ_RET			; EG: LDS_UBYTE_READ_RET
	; EG: AND_INT			; EG: AND_INT

test/CodeGen/AMDGPU/vector-alloca.ll

	; RUN: llc -march=amdgcn -mcpu=verde -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -check-prefix=SI-ALLOCA -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=verde -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -check-prefix=SI-ALLOCA -check-prefix=SI -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=verde -mattr=+promote-alloca -verify-machineinstrs < %s | FileCheck -check-prefix=SI-PROMOTE -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=verde -mattr=+promote-alloca -verify-machineinstrs < %s | FileCheck -check-prefix=SI-PROMOTE -check-prefix=SI -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -check-prefix=SI-ALLOCA -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=tonga -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -check-prefix=SI-ALLOCA -check-prefix=SI -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=tonga -mattr=+promote-alloca -verify-machineinstrs < %s | FileCheck -check-prefix=SI-PROMOTE -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=tonga -mattr=+promote-alloca -verify-machineinstrs < %s | FileCheck -check-prefix=SI-PROMOTE -check-prefix=SI -check-prefix=FUNC %s
	; RUN: llc -march=r600 -mcpu=redwood < %s | FileCheck --check-prefix=EG -check-prefix=FUNC %s			; RUN: llc -march=r600 -mtriple=r600---amdgiz -mcpu=redwood < %s | FileCheck --check-prefix=EG -check-prefix=FUNC %s
	; RUN: opt -S -mtriple=amdgcn-- -amdgpu-promote-alloca -sroa -instcombine < %s | FileCheck -check-prefix=OPT %s			; RUN: opt -S -mtriple=amdgcn---amdgiz -amdgpu-promote-alloca -sroa -instcombine < %s | FileCheck -check-prefix=OPT %s
				target datalayout = "A5"

	; OPT-LABEL: @vector_read(			; OPT-LABEL: @vector_read(
	; OPT: %0 = extractelement <4 x i32> <i32 0, i32 1, i32 2, i32 3>, i32 %index			; OPT: %0 = extractelement <4 x i32> <i32 0, i32 1, i32 2, i32 3>, i32 %index
	; OPT: store i32 %0, i32 addrspace(1)* %out, align 4			; OPT: store i32 %0, i32 addrspace(1)* %out, align 4

	; FUNC-LABEL: {{^}}vector_read:			; FUNC-LABEL: {{^}}vector_read:
	; EG: MOV			; EG: MOV
	; EG: MOV			; EG: MOV
	; EG: MOV			; EG: MOV
	; EG: MOV			; EG: MOV
	; EG: MOVA_INT			; EG: MOVA_INT
	define amdgpu_kernel void @vector_read(i32 addrspace(1)* %out, i32 %index) {			define amdgpu_kernel void @vector_read(i32 addrspace(1)* %out, i32 %index) {
	entry:			entry:
	%tmp = alloca [4 x i32]			%tmp = alloca [4 x i32], addrspace(5)
	%x = getelementptr [4 x i32], [4 x i32]* %tmp, i32 0, i32 0			%x = getelementptr [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 0
	%y = getelementptr [4 x i32], [4 x i32]* %tmp, i32 0, i32 1			%y = getelementptr [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 1
	%z = getelementptr [4 x i32], [4 x i32]* %tmp, i32 0, i32 2			%z = getelementptr [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 2
	%w = getelementptr [4 x i32], [4 x i32]* %tmp, i32 0, i32 3			%w = getelementptr [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 3
	store i32 0, i32* %x			store i32 0, i32 addrspace(5)* %x
	store i32 1, i32* %y			store i32 1, i32 addrspace(5)* %y
	store i32 2, i32* %z			store i32 2, i32 addrspace(5)* %z
	store i32 3, i32* %w			store i32 3, i32 addrspace(5)* %w
	%tmp1 = getelementptr [4 x i32], [4 x i32]* %tmp, i32 0, i32 %index			%tmp1 = getelementptr [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 %index
	%tmp2 = load i32, i32* %tmp1			%tmp2 = load i32, i32 addrspace(5)* %tmp1
	store i32 %tmp2, i32 addrspace(1)* %out			store i32 %tmp2, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; OPT-LABEL: @vector_write(			; OPT-LABEL: @vector_write(
	; OPT: %0 = insertelement <4 x i32> zeroinitializer, i32 1, i32 %w_index			; OPT: %0 = insertelement <4 x i32> zeroinitializer, i32 1, i32 %w_index
	; OPT: %1 = extractelement <4 x i32> %0, i32 %r_index			; OPT: %1 = extractelement <4 x i32> %0, i32 %r_index
	; OPT: store i32 %1, i32 addrspace(1)* %out, align 4			; OPT: store i32 %1, i32 addrspace(1)* %out, align 4

	; FUNC-LABEL: {{^}}vector_write:			; FUNC-LABEL: {{^}}vector_write:
	; EG: MOV			; EG: MOV
	; EG: MOV			; EG: MOV
	; EG: MOV			; EG: MOV
	; EG: MOV			; EG: MOV
	; EG: MOVA_INT			; EG: MOVA_INT
	; EG: MOVA_INT			; EG: MOVA_INT
	define amdgpu_kernel void @vector_write(i32 addrspace(1)* %out, i32 %w_index, i32 %r_index) {			define amdgpu_kernel void @vector_write(i32 addrspace(1)* %out, i32 %w_index, i32 %r_index) {
	entry:			entry:
	%tmp = alloca [4 x i32]			%tmp = alloca [4 x i32], addrspace(5)
	%x = getelementptr [4 x i32], [4 x i32]* %tmp, i32 0, i32 0			%x = getelementptr [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 0
	%y = getelementptr [4 x i32], [4 x i32]* %tmp, i32 0, i32 1			%y = getelementptr [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 1
	%z = getelementptr [4 x i32], [4 x i32]* %tmp, i32 0, i32 2			%z = getelementptr [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 2
	%w = getelementptr [4 x i32], [4 x i32]* %tmp, i32 0, i32 3			%w = getelementptr [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 3
	store i32 0, i32* %x			store i32 0, i32 addrspace(5)* %x
	store i32 0, i32* %y			store i32 0, i32 addrspace(5)* %y
	store i32 0, i32* %z			store i32 0, i32 addrspace(5)* %z
	store i32 0, i32* %w			store i32 0, i32 addrspace(5)* %w
	%tmp1 = getelementptr [4 x i32], [4 x i32]* %tmp, i32 0, i32 %w_index			%tmp1 = getelementptr [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 %w_index
	store i32 1, i32* %tmp1			store i32 1, i32 addrspace(5)* %tmp1
	%tmp2 = getelementptr [4 x i32], [4 x i32]* %tmp, i32 0, i32 %r_index			%tmp2 = getelementptr [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 %r_index
	%tmp3 = load i32, i32* %tmp2			%tmp3 = load i32, i32 addrspace(5)* %tmp2
	store i32 %tmp3, i32 addrspace(1)* %out			store i32 %tmp3, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; This test should be optimize to:			; This test should be optimize to:
	; store i32 0, i32 addrspace(1)* %out			; store i32 0, i32 addrspace(1)* %out

	; OPT-LABEL: @bitcast_gep(			; OPT-LABEL: @bitcast_gep(
	; OPT-LABEL: store i32 0, i32 addrspace(1)* %out, align 4			; OPT-LABEL: store i32 0, i32 addrspace(1)* %out, align 4

	; FUNC-LABEL: {{^}}bitcast_gep:			; FUNC-LABEL: {{^}}bitcast_gep:
	; EG: STORE_RAW			; EG: STORE_RAW
	define amdgpu_kernel void @bitcast_gep(i32 addrspace(1)* %out, i32 %w_index, i32 %r_index) {			define amdgpu_kernel void @bitcast_gep(i32 addrspace(1)* %out, i32 %w_index, i32 %r_index) {
	entry:			entry:
	%tmp = alloca [4 x i32]			%tmp = alloca [4 x i32], addrspace(5)
	%x = getelementptr [4 x i32], [4 x i32]* %tmp, i32 0, i32 0			%x = getelementptr [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 0
	%y = getelementptr [4 x i32], [4 x i32]* %tmp, i32 0, i32 1			%y = getelementptr [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 1
	%z = getelementptr [4 x i32], [4 x i32]* %tmp, i32 0, i32 2			%z = getelementptr [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 2
	%w = getelementptr [4 x i32], [4 x i32]* %tmp, i32 0, i32 3			%w = getelementptr [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 3
	store i32 0, i32* %x			store i32 0, i32 addrspace(5)* %x
	store i32 0, i32* %y			store i32 0, i32 addrspace(5)* %y
	store i32 0, i32* %z			store i32 0, i32 addrspace(5)* %z
	store i32 0, i32* %w			store i32 0, i32 addrspace(5)* %w
	%tmp1 = getelementptr [4 x i32], [4 x i32]* %tmp, i32 0, i32 1			%tmp1 = getelementptr [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 1
	%tmp2 = bitcast i32* %tmp1 to [4 x i32]*			%tmp2 = bitcast i32 addrspace(5)* %tmp1 to [4 x i32] addrspace(5)*
	%tmp3 = getelementptr [4 x i32], [4 x i32]* %tmp2, i32 0, i32 0			%tmp3 = getelementptr [4 x i32], [4 x i32] addrspace(5)* %tmp2, i32 0, i32 0
	%tmp4 = load i32, i32* %tmp3			%tmp4 = load i32, i32 addrspace(5)* %tmp3
	store i32 %tmp4, i32 addrspace(1)* %out			store i32 %tmp4, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; OPT-LABEL: @vector_read_bitcast_gep(			; OPT-LABEL: @vector_read_bitcast_gep(
	; OPT: %0 = extractelement <4 x i32> <i32 1065353216, i32 1, i32 2, i32 3>, i32 %index			; OPT: %0 = extractelement <4 x i32> <i32 1065353216, i32 1, i32 2, i32 3>, i32 %index
	; OPT: store i32 %0, i32 addrspace(1)* %out, align 4			; OPT: store i32 %0, i32 addrspace(1)* %out, align 4
	define amdgpu_kernel void @vector_read_bitcast_gep(i32 addrspace(1)* %out, i32 %index) {			define amdgpu_kernel void @vector_read_bitcast_gep(i32 addrspace(1)* %out, i32 %index) {
	entry:			entry:
	%tmp = alloca [4 x i32]			%tmp = alloca [4 x i32], addrspace(5)
	%x = getelementptr inbounds [4 x i32], [4 x i32]* %tmp, i32 0, i32 0			%x = getelementptr inbounds [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 0
	%y = getelementptr inbounds [4 x i32], [4 x i32]* %tmp, i32 0, i32 1			%y = getelementptr inbounds [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 1
	%z = getelementptr inbounds [4 x i32], [4 x i32]* %tmp, i32 0, i32 2			%z = getelementptr inbounds [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 2
	%w = getelementptr inbounds [4 x i32], [4 x i32]* %tmp, i32 0, i32 3			%w = getelementptr inbounds [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 3
	%bc = bitcast i32* %x to float*			%bc = bitcast i32 addrspace(5)* %x to float addrspace(5)*
	store float 1.0, float* %bc			store float 1.0, float addrspace(5)* %bc
	store i32 1, i32* %y			store i32 1, i32 addrspace(5)* %y
	store i32 2, i32* %z			store i32 2, i32 addrspace(5)* %z
	store i32 3, i32* %w			store i32 3, i32 addrspace(5)* %w
	%tmp1 = getelementptr inbounds [4 x i32], [4 x i32]* %tmp, i32 0, i32 %index			%tmp1 = getelementptr inbounds [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 %index
	%tmp2 = load i32, i32* %tmp1			%tmp2 = load i32, i32 addrspace(5)* %tmp1
	store i32 %tmp2, i32 addrspace(1)* %out			store i32 %tmp2, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; FIXME: Should be able to promote this. Instcombine should fold the			; FIXME: Should be able to promote this. Instcombine should fold the
	; cast in the hasOneUse case so it might not matter in practice			; cast in the hasOneUse case so it might not matter in practice

	; OPT-LABEL: @vector_read_bitcast_alloca(			; OPT-LABEL: @vector_read_bitcast_alloca(
	; OPT: alloca [4 x float]			; OPT: alloca [4 x float]
	; OPT: store float			; OPT: store float
	; OPT: store float			; OPT: store float
	; OPT: store float			; OPT: store float
	; OPT: store float			; OPT: store float
	; OPT: load float			; OPT: load float
	define amdgpu_kernel void @vector_read_bitcast_alloca(float addrspace(1)* %out, i32 %index) {			define amdgpu_kernel void @vector_read_bitcast_alloca(float addrspace(1)* %out, i32 %index) {
	entry:			entry:
	%tmp = alloca [4 x i32]			%tmp = alloca [4 x i32], addrspace(5)
	%tmp.bc = bitcast [4 x i32]* %tmp to [4 x float]*			%tmp.bc = bitcast [4 x i32] addrspace(5)* %tmp to [4 x float] addrspace(5)*
	%x = getelementptr inbounds [4 x float], [4 x float]* %tmp.bc, i32 0, i32 0			%x = getelementptr inbounds [4 x float], [4 x float] addrspace(5)* %tmp.bc, i32 0, i32 0
	%y = getelementptr inbounds [4 x float], [4 x float]* %tmp.bc, i32 0, i32 1			%y = getelementptr inbounds [4 x float], [4 x float] addrspace(5)* %tmp.bc, i32 0, i32 1
	%z = getelementptr inbounds [4 x float], [4 x float]* %tmp.bc, i32 0, i32 2			%z = getelementptr inbounds [4 x float], [4 x float] addrspace(5)* %tmp.bc, i32 0, i32 2
	%w = getelementptr inbounds [4 x float], [4 x float]* %tmp.bc, i32 0, i32 3			%w = getelementptr inbounds [4 x float], [4 x float] addrspace(5)* %tmp.bc, i32 0, i32 3
	store float 0.0, float* %x			store float 0.0, float addrspace(5)* %x
	store float 1.0, float* %y			store float 1.0, float addrspace(5)* %y
	store float 2.0, float* %z			store float 2.0, float addrspace(5)* %z
	store float 4.0, float* %w			store float 4.0, float addrspace(5)* %w
	%tmp1 = getelementptr inbounds [4 x float], [4 x float]* %tmp.bc, i32 0, i32 %index			%tmp1 = getelementptr inbounds [4 x float], [4 x float] addrspace(5)* %tmp.bc, i32 0, i32 %index
	%tmp2 = load float, float* %tmp1			%tmp2 = load float, float addrspace(5)* %tmp1
	store float %tmp2, float addrspace(1)* %out			store float %tmp2, float addrspace(1)* %out
	ret void			ret void
	}			}

	; The pointer arguments in local address space should not affect promotion to vector.			; The pointer arguments in local address space should not affect promotion to vector.

	; OPT-LABEL: @vector_read_with_local_arg(			; OPT-LABEL: @vector_read_with_local_arg(
	; OPT: %0 = extractelement <4 x i32> <i32 0, i32 1, i32 2, i32 3>, i32 %index			; OPT: %0 = extractelement <4 x i32> <i32 0, i32 1, i32 2, i32 3>, i32 %index
	; OPT: store i32 %0, i32 addrspace(1)* %out, align 4			; OPT: store i32 %0, i32 addrspace(1)* %out, align 4
	define amdgpu_kernel void @vector_read_with_local_arg(i32 addrspace(3)* %stopper, i32 addrspace(1)* %out, i32 %index) {			define amdgpu_kernel void @vector_read_with_local_arg(i32 addrspace(3)* %stopper, i32 addrspace(1)* %out, i32 %index) {
	entry:			entry:
	%tmp = alloca [4 x i32]			%tmp = alloca [4 x i32], addrspace(5)
	%x = getelementptr [4 x i32], [4 x i32]* %tmp, i32 0, i32 0			%x = getelementptr [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 0
	%y = getelementptr [4 x i32], [4 x i32]* %tmp, i32 0, i32 1			%y = getelementptr [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 1
	%z = getelementptr [4 x i32], [4 x i32]* %tmp, i32 0, i32 2			%z = getelementptr [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 2
	%w = getelementptr [4 x i32], [4 x i32]* %tmp, i32 0, i32 3			%w = getelementptr [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 3
	store i32 0, i32* %x			store i32 0, i32 addrspace(5)* %x
	store i32 1, i32* %y			store i32 1, i32 addrspace(5)* %y
	store i32 2, i32* %z			store i32 2, i32 addrspace(5)* %z
	store i32 3, i32* %w			store i32 3, i32 addrspace(5)* %w
	%tmp1 = getelementptr [4 x i32], [4 x i32]* %tmp, i32 0, i32 %index			%tmp1 = getelementptr [4 x i32], [4 x i32] addrspace(5)* %tmp, i32 0, i32 %index
	%tmp2 = load i32, i32* %tmp1			%tmp2 = load i32, i32 addrspace(5)* %tmp1
	store i32 %tmp2, i32 addrspace(1)* %out			store i32 %tmp2, i32 addrspace(1)* %out
	ret void			ret void
	}			}
