This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Split 64-Bit XNOR to 64-Bit NOT/XOR
ClosedPublic

Authored by grahamsellers on Nov 29 2018, 12:12 PM.

Details

Reviewers

arsenm
nhaehnle

Summary

The identity ~(x ^ y) == (~x ^ y) == (x ^ ~y) allows XNOR (an XOR followed by a NOT) to be rewritten as a NOT followed by an XOR. Handling this case with its own split lets the NOT remain on the scalar unit. Previously, we split a 64-bit XNOR into two 32-bit XNORs and then lowered them. Now we get three instructions (s_not, v_xor, v_xor) rather than four when either source is a 64-bit scalar.

Add test cases to xnor.ll that exercise XNOR Vx, Sy and XNOR Sx, Vy. Also add tests that use the identity in the opposite direction, so that (~x ^ y) on the scalar unit (or the vector unit for gfx906) can generate XNOR. This already worked, but I didn't see a test for it.
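
As a hedged illustration of the identity in the summary, the following C++ sketch models both forms on plain 64-bit integers. The function names (xnorReference, xnorNotFirst, xor64ViaHalves, xnorSplit) are hypothetical and are not part of the patch; the half-splitting helper only mimics, under that assumption, how a 64-bit XOR ends up as two 32-bit operations.

```cpp
#include <cassert>
#include <cstdint>

// Reference XNOR: XOR both inputs, then invert the result.
inline uint64_t xnorReference(uint64_t X, uint64_t Y) { return ~(X ^ Y); }

// NOT/XOR form: invert one source first, then XOR with the other.
inline uint64_t xnorNotFirst(uint64_t X, uint64_t Y) { return ~X ^ Y; }

// Model of a 64-bit XOR performed as two independent 32-bit XORs
// (low half and high half), then recombined.
inline uint64_t xor64ViaHalves(uint64_t A, uint64_t B) {
  uint32_t Lo = static_cast<uint32_t>(A) ^ static_cast<uint32_t>(B);
  uint32_t Hi = static_cast<uint32_t>(A >> 32) ^ static_cast<uint32_t>(B >> 32);
  return (static_cast<uint64_t>(Hi) << 32) | Lo;
}

// Model of the new split: one 64-bit NOT on the scalar source,
// followed by two 32-bit XORs against the vector source.
inline uint64_t xnorSplit(uint64_t Scalar, uint64_t Vector) {
  return xor64ViaHalves(~Scalar, Vector);
}
```

Because XOR is commutative, inverting either source gives the same result, which is why the split is free to pick whichever operand is scalar for the NOT.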

Event Timeline

grahamsellers created this revision. Nov 29 2018, 12:12 PM

Herald added subscribers: llvm-commits, t-tye, tpr and 5 others. Nov 29 2018, 12:12 PM

arsenm added inline comments. Nov 29 2018, 12:24 PM

lib/Target/AMDGPU/SIInstrInfo.cpp
4908	const reference
4927	.add on next line
test/CodeGen/AMDGPU/xnor.ll
128	It would be good if you could include some other use that requires SALU->VALU changes, to stress that the worklist insert happened

Addressing review.

grahamsellers marked 3 inline comments as done. Nov 29 2018, 12:43 PM

grahamsellers added inline comments.

lib/Target/AMDGPU/SIInstrInfo.cpp
4908	I stole this from elsewhere in this file. I went ahead and fixed other places where this was not const reference.
test/CodeGen/AMDGPU/xnor.ll
128	Not sure I follow. The new split turns S_XNOR_B64 into S_NOT_B64 followed by S_XOR_B64, adding the S_XOR_B64 to the worklist. The only way the two V_XOR_B32 instructions can be generated is when SIInstrInfo::moveToVALU iterates again, converting S_XOR_B64 -> 2 x S_XOR_B32, which re-adds to the worklist, causing moveToVALU to turn S_XOR_B32 to V_XOR_B32_E32 (which the test expects). If the worklist insert wasn't happening, none of these tests would work. What am I missing?
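
The sequence described in this reply can be modeled with a toy worklist. This is only a sketch: the Op enum and lowerXnor64 are hypothetical names, not LLVM's API, and the model assumes the NOT operand is scalar, so s_not_b64 is emitted directly and never revisited. The ReinsertXor flag is an illustrative knob that shows what would happen if the split did not put the new S_XOR_B64 back on the worklist.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical opcode tags; only the three that the toy lowering revisits.
enum class Op { S_XNOR_B64, S_XOR_B64, S_XOR_B32 };

// Toy model of a worklist-driven scalar-to-vector lowering loop.
// Returns the names of the instructions that survive, in emission order.
inline std::vector<std::string> lowerXnor64(bool ReinsertXor) {
  std::deque<Op> Worklist{Op::S_XNOR_B64};
  std::vector<std::string> Emitted;
  while (!Worklist.empty()) {
    Op Cur = Worklist.front();
    Worklist.pop_front();
    switch (Cur) {
    case Op::S_XNOR_B64:
      // Split into NOT + XOR; the NOT stays scalar in this model.
      Emitted.push_back("s_not_b64");
      if (ReinsertXor)
        Worklist.push_back(Op::S_XOR_B64); // revisit on a later iteration
      else
        Emitted.push_back("s_xor_b64");    // left behind as a scalar op
      break;
    case Op::S_XOR_B64:
      // A 64-bit XOR is split into two 32-bit halves, both re-queued.
      Worklist.push_back(Op::S_XOR_B32);
      Worklist.push_back(Op::S_XOR_B32);
      break;
    case Op::S_XOR_B32:
      // Each 32-bit scalar XOR becomes its vector counterpart.
      Emitted.push_back("v_xor_b32");
      break;
    }
  }
  return Emitted;
}
```

In this model, only the run with ReinsertXor set produces the s_not_b64, v_xor_b32, v_xor_b32 sequence that the new tests check for; without the re-insert, a scalar XOR is left in place.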

LGTM

lib/Target/AMDGPU/SIInstrInfo.cpp
4908	The guidance used to be to use values, but that changed at some point and nobody ever went back and changed them all

This revision is now accepted and ready to land. Nov 29 2018, 2:21 PM

This was committed long ago in ba559ac0584900531a12a47e410fd7fe841be3e5

Herald added a subscriber: kerbowa. Apr 7 2020, 11:43 AM

Revision Contents

Path                                 Size
lib/Target/AMDGPU/SIInstrInfo.h      3 lines
lib/Target/AMDGPU/SIInstrInfo.cpp    46 lines
test/CodeGen/AMDGPU/xnor.ll          86 lines

Diff 175923

lib/Target/AMDGPU/SIInstrInfo.h

Context not available.
                                 unsigned Opcode,
                                 MachineDominatorTree *MDT = nullptr) const;
 
+  void splitScalar64BitXnor(SetVectorType &Worklist, MachineInstr &Inst,
+                            MachineDominatorTree *MDT = nullptr) const;
+
   void splitScalar64BitBCNT(SetVectorType &Worklist,
                             MachineInstr &Inst) const;
   void splitScalar64BitBFE(SetVectorType &Worklist,
Context not available.

lib/Target/AMDGPU/SIInstrInfo.cpp

Context not available.
       continue;
 
     case AMDGPU::S_XNOR_B64:
-      splitScalar64BitBinaryOp(Worklist, Inst, AMDGPU::S_XNOR_B32, MDT);
+      if (ST.hasDLInsts())
+        splitScalar64BitBinaryOp(Worklist, Inst, AMDGPU::S_XNOR_B32, MDT);
+      else
+        splitScalar64BitXnor(Worklist, Inst, MDT);
       Inst.eraseFromParent();
       continue;

Context not available.
   addUsersToMoveToVALUWorklist(FullDestReg, MRI, Worklist);
 }
 
+void SIInstrInfo::splitScalar64BitXnor(SetVectorType &Worklist,
+                                       MachineInstr &Inst,
+                                       MachineDominatorTree *MDT) const {
+  MachineBasicBlock &MBB = *Inst.getParent();
+  MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();
+
+  MachineOperand &Dest = Inst.getOperand(0);
+  MachineOperand &Src0 = Inst.getOperand(1);
+  MachineOperand &Src1 = Inst.getOperand(2);
+  DebugLoc DL = Inst.getDebugLoc();
+
+  MachineBasicBlock::iterator MII = Inst;
+
+  const TargetRegisterClass *DestRC = MRI.getRegClass(Dest.getReg());
+
+  unsigned Interm = MRI.createVirtualRegister(&AMDGPU::SReg_64RegClass);
+
+  MachineOperand* Op0;
+  MachineOperand* Op1;
+
+  if (Src0.isReg() && RI.isSGPRReg(MRI, Src0.getReg())) {
+    Op0 = &Src0;
+    Op1 = &Src1;
+  } else {
+    Op0 = &Src1;
+    Op1 = &Src0;
+  }
+
+  BuildMI(MBB, MII, DL, get(AMDGPU::S_NOT_B64), Interm).add(*Op0);
+
+  unsigned NewDest = MRI.createVirtualRegister(DestRC);
+
+  MachineInstr &Xor = *BuildMI(MBB, MII, DL, get(AMDGPU::S_XOR_B64), NewDest)
+    .addReg(Interm)
+    .add(*Op1);
+
+  MRI.replaceRegWith(Dest.getReg(), NewDest);
+
+  Worklist.insert(&Xor);
+}
+
 void SIInstrInfo::splitScalar64BitBCNT(
     SetVectorType &Worklist, MachineInstr &Inst) const {
   MachineBasicBlock &MBB = *Inst.getParent();
Context not available.

test/CodeGen/AMDGPU/xnor.ll

Context not available.
 ; GCN-LABEL: {{^}}vector_xnor_i64_one_use
 ; GCN-NOT: s_xnor_b64
 ; GCN: v_not_b32
-; GCN: v_xor_b32
 ; GCN: v_not_b32
 ; GCN: v_xor_b32
+; GCN: v_xor_b32
 ; GCN-DL: v_xnor_b32
 ; GCN-DL: v_xnor_b32
 define i64 @vector_xnor_i64_one_use(i64 %a, i64 %b) {
Context not available.
   ret void
 }
 
+; GCN-LABEL: {{^}}xnor_i64_s_v_one_use
+; GCN-NOT: s_xnor_b64
+; GCN: s_not_b64
+; GCN: v_xor_b32
+; GCN: v_xor_b32
+; GCN-DL: v_xnor_b32
+; GCN-DL: v_xnor_b32
+define amdgpu_kernel void @xnor_i64_s_v_one_use(
+  i64 addrspace(1)* %r0, i64 %a) {
+entry:
+  %b32 = call i32 @llvm.amdgcn.workitem.id.x() #1
+  %b64 = zext i32 %b32 to i64
+  %b = shl i64 %b64, 29
+  %xor = xor i64 %a, %b
+  %r0.val = xor i64 %xor, -1
+  store i64 %r0.val, i64 addrspace(1)* %r0
+  ret void
+}
+
+; GCN-LABEL: {{^}}xnor_i64_v_s_one_use
+; GCN-NOT: s_xnor_b64
+; GCN: s_not_b64
+; GCN: v_xor_b32
+; GCN: v_xor_b32
+; GCN-DL: v_xnor_b32
+; GCN-DL: v_xnor_b32
+define amdgpu_kernel void @xnor_i64_v_s_one_use(
+  i64 addrspace(1)* %r0, i64 %a) {
+entry:
+  %b32 = call i32 @llvm.amdgcn.workitem.id.x() #1
+  %b64 = zext i32 %b32 to i64
+  %b = shl i64 %b64, 29
+  %xor = xor i64 %b, %a
+  %r0.val = xor i64 %xor, -1
+  store i64 %r0.val, i64 addrspace(1)* %r0
+  ret void
+}
+
+; GCN-LABEL: {{^}}vector_xor_na_b_i32_one_use
+; GCN-NOT: s_xnor_b32
+; GCN: v_not_b32
+; GCN: v_xor_b32
+; GCN-DL: v_xnor_b32
+define i32 @vector_xor_na_b_i32_one_use(i32 %a, i32 %b) {
+entry:
+  %na = xor i32 %a, -1
+  %r = xor i32 %na, %b
+  ret i32 %r
+}
+
+; GCN-LABEL: {{^}}vector_xor_a_nb_i32_one_use
+; GCN-NOT: s_xnor_b32
+; GCN: v_not_b32
+; GCN: v_xor_b32
+; GCN-DL: v_xnor_b32
+define i32 @vector_xor_a_nb_i32_one_use(i32 %a, i32 %b) {
+entry:
+  %nb = xor i32 %b, -1
+  %r = xor i32 %a, %nb
+  ret i32 %r
+}
+
+; GCN-LABEL: {{^}}scalar_xor_a_nb_i64_one_use
+; GCN: s_xnor_b64
+define amdgpu_kernel void @scalar_xor_a_nb_i64_one_use(
+  i64 addrspace(1)* %r0, i64 %a, i64 %b) {
+entry:
+  %nb = xor i64 %b, -1
+  %r0.val = xor i64 %a, %nb
+  store i64 %r0.val, i64 addrspace(1)* %r0
+  ret void
+}
+
+; GCN-LABEL: {{^}}scalar_xor_na_b_i64_one_use
+; GCN: s_xnor_b64
+define amdgpu_kernel void @scalar_xor_na_b_i64_one_use(
+  i64 addrspace(1)* %r0, i64 %a, i64 %b) {
+entry:
+  %na = xor i64 %a, -1
+  %r0.val = xor i64 %na, %b
+  store i64 %r0.val, i64 addrspace(1)* %r0
+  ret void
+}
+
 ; Function Attrs: nounwind readnone
 declare i32 @llvm.amdgcn.workitem.id.x() #0
Context not available.