This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Avoid using s_cmpk when src0 is not register
Closed · Public

Authored by ruiling on Jul 1 2020, 8:00 PM.


Details

Reviewers

arsenm
nhaehnle

Summary

The hardware spec requires src0 of s_cmpk to be a register, so we
should not optimize s_cmp to s_cmpk if src0 is not a register.
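
As a rough illustration of the rule in the summary, here is a standalone toy model of the guard. It is only a sketch: `Operand`, `Compare`, and `tryShrinkToCmpk` are hypothetical names invented for this example, not LLVM types or APIs, and the model leaves out everything else the real pass does (for instance, choosing the SOPK opcode). It assumes C++17 for `std::optional`.

```cpp
#include <cstdint>
#include <optional>
#include <utility>

// Toy stand-in for a machine operand (hypothetical, not llvm::MachineOperand).
struct Operand {
  bool IsReg;     // true: register operand, false: immediate
  int64_t Value;  // register number or immediate value
};

// Toy stand-in for a two-operand scalar compare (hypothetical).
struct Compare {
  Operand Src0;
  Operand Src1;
};

// Returns the operands in cmpk form (register, immediate), or std::nullopt
// when the compare has to stay a plain s_cmp.
// Note: a real commute may also have to change the comparison opcode;
// this model only tracks the operand kinds.
std::optional<Compare> tryShrinkToCmpk(Compare C) {
  // Try to get the constant onto the right-hand side.
  if (!C.Src0.IsReg)
    std::swap(C.Src0, C.Src1);

  // cmpk requires src0 to be a register. If both operands were
  // immediates, the swap above cannot help, so give up.
  if (!C.Src0.IsReg)
    return std::nullopt;

  // The right-hand side has to be an immediate.
  if (C.Src1.IsReg)
    return std::nullopt;

  return C;
}
```

With two immediates, as in the `S_CMP_GT_I32 1, 65` test case of this revision, the model returns `std::nullopt`, which corresponds to leaving the instruction unshrunk.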

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ruiling created this revision. Jul 1 2020, 8:00 PM

Herald added a project: Restricted Project. Jul 1 2020, 8:00 PM

Herald added subscribers: llvm-commits, kerbowa, hiraditya and 9 others.

arsenm added inline comments. Jul 1 2020, 8:14 PM

llvm/test/CodeGen/AMDGPU/cmp_shrink.mir
6	positive checks are more useful. Also you can just generate these checks. Can you reproduce this with an IR test too?

ruiling marked an inline comment as done. Jul 1 2020, 8:59 PM

ruiling added inline comments.

llvm/test/CodeGen/AMDGPU/cmp_shrink.mir
6	Will try a positive check. How do I generate the checks? Could you give a little more info? The original test case that hit the issue is overly complex, I think. Normally, a constant expression at the IR level is easily optimized away by the middle end, so I think a .mir test is enough for this issue.

Harbormaster failed remote builds in B62620: Diff 274996! Jul 1 2020, 9:06 PM

auto-generate the checks in test

Harbormaster completed remote builds in B62627: Diff 275003. Jul 1 2020, 10:43 PM

ping?

arsenm accepted this revision. Jul 6 2020, 7:05 AM

arsenm added inline comments.

llvm/test/CodeGen/AMDGPU/cmp_shrink.mir
6	So in what context does this appear? Why wasn't it optimized out?

This revision is now accepted and ready to land. Jul 6 2020, 7:05 AM

Did you see @arsenm's comment?

ruiling marked an inline comment as done. Jul 6 2020, 5:28 PM

ruiling added inline comments.

llvm/test/CodeGen/AMDGPU/cmp_shrink.mir
6	Well, I haven't yet checked the program carefully enough to understand why the optimization passes in LLVM fail to optimize it. I think that is a separate problem that is worth a careful investigation; I will investigate and try to get it optimized away later. But I think this patch can be merged, right? Can anyone help to merge? I don't have commit access.

ruiling marked an inline comment as done. Jul 7 2020, 2:21 AM

ruiling added inline comments.

llvm/test/CodeGen/AMDGPU/cmp_shrink.mir
6	@arsenm The issue occurs when running vulkancts against the AMD open-source Vulkan driver. I checked a little more: the test case has ~40 BBs and lots of phi instructions, which are later simplified and proved to be constant. The problem may be that LLPC chooses a subset of LLVM optimization passes out of concern for compilation time (https://github.com/GPUOpen-Drivers/llpc/blob/dev/lgc/patch/Patch.cpp#L223). When I tried using the LLVM standard set of passes, the constant was optimized away. I think the optimization passes for Vulkan may need further tuning to reach a better trade-off between compile time and quality of generated code.

ping. can you help push the patch? @arsenm @nhaehnle

In D83020#2146471, @ruiling wrote:

ping. can you help push the patch? @arsenm @nhaehnle

Do you have commit access? If not, I can commit this. See https://llvm.org/docs/DeveloperPolicy.html#id16 if you want to request access.

@foad I don't have commit access, so please go ahead and push the patch if you think it is OK. I will apply for commit access after some of my patches are accepted. Thanks for pointing out the useful link :)

Committed in 1658b8d7ddb65eb78e1304b009f1043ab6d9463f (sorry I forgot to put a link to this review in the commit message).

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp	5 lines
llvm/test/CodeGen/AMDGPU/cmp_shrink.mir	11 lines

Diff 275003

llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp

  }

  static void shrinkScalarCompare(const SIInstrInfo *TII, MachineInstr &MI) {
    // cmpk instructions do scc = dst <cc op> imm16, so commute the instruction to
    // get constants on the RHS.
    if (!MI.getOperand(0).isReg())
      TII->commuteInstruction(MI, false, 0, 1);

+   // cmpk requires src0 to be a register
+   const MachineOperand &Src0 = MI.getOperand(0);
+   if (!Src0.isReg())
+     return;
+
    const MachineOperand &Src1 = MI.getOperand(1);
    if (!Src1.isImm())
      return;

    int SOPKOpc = AMDGPU::getSOPKOp(MI.getOpcode());
    if (SOPKOpc == -1)
      return;

llvm/test/CodeGen/AMDGPU/cmp_shrink.mir

This file was added.

# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -march=amdgcn -mcpu=gfx900 -run-pass si-shrink-instructions -verify-machineinstrs %s -o - | FileCheck -check-prefix=GCN %s

---
name: not_shrink_icmp
body: |
  bb.0:
    ; GCN-LABEL: name: not_shrink_icmp
    ; GCN: S_CMP_GT_I32 1, 65, implicit-def $scc
    S_CMP_GT_I32 1, 65, implicit-def $scc
...
