This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: do not fold clamp instructions when sources are different
Closed · Public

Authored by hakzsam on Sep 22 2017, 5:04 AM.


Details

Reviewers

arsenm
nhaehnle

Summary

The given sequence of instructions:

a = fadd(x, y)
b = fadd(x, z)
r = clamp(max(a, b), 0.0, 1.0)

is definitely not equivalent to:
r = fadd(x, y) with clamp enabled

This happens because we don't check that the two source operands of max()
are the same register before folding the clamp into the instruction that defines its source.

This has been reported by Alex Smith (Feral).
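The mismatch described in the summary can be checked numerically. The following is a minimal C++ sketch, not code from the patch. `clamp01`, `originalSequence` and `buggyFold` are hypothetical helpers. The clamp output modifier is modeled as a plain saturate to [0.0, 1.0]; NaN handling is not modeled.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model of the clamp output modifier: saturate to [0.0, 1.0].
// NaN handling of the real hardware modifier is not modeled here.
float clamp01(float v) { return std::min(std::max(v, 0.0f), 1.0f); }

// The original sequence: r = clamp(max(fadd(x, y), fadd(x, z)), 0.0, 1.0).
float originalSequence(float x, float y, float z) {
  float a = x + y;
  float b = x + z;
  return clamp01(std::max(a, b));
}

// What the buggy fold produced: r = fadd(x, y) with clamp enabled.
// b = fadd(x, z) no longer contributes to the result at all.
float buggyFold(float x, float y, float /*z*/) {
  return clamp01(x + y);
}
```

With x = 0.25, y = 0.125 and z = 0.5 (all exactly representable), `originalSequence` returns 0.75 while `buggyFold` returns 0.375, because the fold silently drops the second fadd.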

Diff Detail

Event Timeline

hakzsam created this revision. Sep 22 2017, 5:04 AM

Herald added subscribers: t-tye, tpr, dstuttard and 3 others. Sep 22 2017, 5:04 AM

aejsmith added a subscriber: aejsmith. Sep 22 2017, 5:46 AM

Maybe you could add a helpful comment on isClamp? Either way, LGTM.

This revision is now accepted and ready to land. Sep 29 2017, 8:14 AM

In D38173#884464, @nhaehnle wrote:

Maybe you could add a helpful comment on isClamp? Either way, LGTM.

Thanks for accepting this review. What kind of comment do you think would be useful?
Also, could you push the patch for me? I don't have LLVM commit rights.

Mostly it took me a while to grok why the function was looking at MAX instructions. As far as I understand now, it's because the code that deduces the clamp output modifier happens to always produce it on a MAX(x,x) instruction, and not a MIN.
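The reasoning above can be sketched as a standalone predicate: a clamped MAX only stands for clamp(x) when it has the shape MAX(x, x). The `Operand` struct, `NoSubRegister` constant and `isClampOfSingleValue` function below are hypothetical simplifications of `MachineOperand`, `AMDGPU::NoSubRegister` and the checks inside `isClamp`, not the real LLVM API. The sketch only models the operand comparisons shown in the diff, including the `getReg()` comparison this revision adds.

```cpp
#include <cassert>

// Hypothetical stand-in for the MachineOperand fields the check reads.
struct Operand {
  bool IsReg;      // models MachineOperand::isReg()
  unsigned Reg;    // models MachineOperand::getReg()
  unsigned SubReg; // models MachineOperand::getSubReg()
};

// Assumption: 0 plays the role of AMDGPU::NoSubRegister in this model.
constexpr unsigned NoSubRegister = 0;

// A MAX with the clamp bit set is only a clamp of a single value when it is
// MAX(x, x): both sources must be the same full (non-subregister) register.
bool isClampOfSingleValue(const Operand &Src0, const Operand &Src1) {
  if (!Src0.IsReg || !Src1.IsReg ||
      Src0.Reg != Src1.Reg ||          // the comparison added by this revision
      Src0.SubReg != Src1.SubReg ||
      Src0.SubReg != NoSubRegister)
    return false;
  return true;
}
```

With the register comparison in place, a clamped MAX(a, b) with different sources is rejected and left alone. The real function returns `nullptr` on failure rather than `false`.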

r314951

Revision Contents

Path                                     Size
lib/Target/AMDGPU/SIFoldOperands.cpp     1 line
test/CodeGen/AMDGPU/clamp.ll             22 lines

Diff 116332

lib/Target/AMDGPU/SIFoldOperands.cpp

Context not available.
   const MachineOperand *Src0 = TII->getNamedOperand(MI, AMDGPU::OpName::src0);
   const MachineOperand *Src1 = TII->getNamedOperand(MI, AMDGPU::OpName::src1);
   if (!Src0->isReg() || !Src1->isReg() ||
+      Src0->getReg() != Src1->getReg() ||
       Src0->getSubReg() != Src1->getSubReg() ||
       Src0->getSubReg() != AMDGPU::NoSubRegister)
     return nullptr;
Context not available.

test/CodeGen/AMDGPU/clamp.ll

Context not available.
   ret void
 }
 
+; GCN-LABEL: {{^}}v_clamp_diff_source_f32:
+; GCN: v_add_f32_e32 [[A:v[0-9]+]]
+; GCN: v_add_f32_e32 [[B:v[0-9]+]]
+; GCN: v_max_f32_e64 v{{[0-9]+}}, [[A]], [[B]] clamp{{$}}
+define amdgpu_kernel void @v_clamp_diff_source_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0
+{
+  %gep0 = getelementptr float, float addrspace(1)* %aptr, i32 0
+  %gep1 = getelementptr float, float addrspace(1)* %aptr, i32 1
+  %gep2 = getelementptr float, float addrspace(1)* %aptr, i32 2
+  %l0 = load float, float addrspace(1)* %gep0
+  %l1 = load float, float addrspace(1)* %gep1
+  %l2 = load float, float addrspace(1)* %gep2
+  %a = fadd nsz float %l0, %l1
+  %b = fadd nsz float %l0, %l2
+  %res = call nsz float @llvm.maxnum.f32(float %a, float %b)
+  %max = call nsz float @llvm.maxnum.f32(float %res, float 0.0)
+  %min = call nsz float @llvm.minnum.f32(float %max, float 1.0)
+  %out.gep = getelementptr float, float addrspace(1)* %out, i32 3
+  store float %min, float addrspace(1)* %out.gep
+  ret void
+}
+
 declare i32 @llvm.amdgcn.workitem.id.x() #1
 declare float @llvm.fabs.f32(float) #1
 declare float @llvm.minnum.f32(float, float) #1
Context not available.