Download Raw Diff

Details

Reviewers

Commits

rG46513965842f: [AMDGPU] Avoid predicated execution of the basic blocks containing scalar…
rL314828: [AMDGPU] Avoid predicated execution of the basic blocks containing scalar

Summary

V_READFIRSTLANE and V_READLANE produce scalar result from vector operand. Scalar instructions consuming the result should not be executed if exec mask is zero.

Tests passed : make llvm-check

Diff Detail

Repository: rL LLVM

Event Timeline

alex-t created this revision.Sep 26 2017, 2:03 PM

Herald added subscribers: nhaehnle, arsenm. · View Herald TranscriptSep 26 2017, 2:03 PM

alex-t added a reviewer: rampitec.Sep 26 2017, 2:14 PM

alex-t edited the summary of this revision. (Show Details)Sep 26 2017, 2:18 PM

Test is needed.

lib/Target/AMDGPU/SIInsertSkips.cpp
137 ↗	(On Diff #116708)	The comment is misleading. Scalar instructions executed even if exec = 0 (contrarily to the comment). That is unclear if there must be a scalar instruction consuming result of readlane too, since SGPR can be an operand of a vector instruction.

Added the code that checks that scalar register produced by V_READFIRSTLANE/V_READLANE

Is really used by some SALU instruction. This is necessary to avoid de-optimization of those cases when the scalar register is distinctly the scalar operand of vector instruction.

MIR test added

rampitec added inline comments.Oct 2 2017, 10:22 AM

lib/Target/AMDGPU/SIInsertSkips.cpp
145 ↗	(On Diff #117363)	What if an user is actually hoisted out the block? What if that is terminator uses it? In both cases users should then read undefined data.
148 ↗	(On Diff #117363)	What is interesting, an user can be also another v_readlane_b32's lane select operand. Subsequent v_readlane_b32 would be executed even with exec = 0 and will read an undefined data.
test/CodeGen/AMDGPU/readlane_exec0.mir
1 ↗	(On Diff #117363)	Please use -check-label=GCN and GCN labels instead of CHECK. This seems to be our new convention. Also add -verify-machineinstrs.
3 ↗	(On Diff #117363)	Label in this case needs to be function's name. That is to add new function to the test later. bb.0 then becomes a regular check.

It is really make sense to take care of the V_READFIRSTLANE/V_READLANE destination register under exec == 0 condition in case their source VGPR is re-defined in SI_MASK_BRANCH target block. Otherwise we assume that source VGPR is defined in one of the dominating blocks and contain correct value.

rampitec added inline comments.Oct 2 2017, 2:08 PM

lib/Target/AMDGPU/SIInsertSkips.cpp
145 ↗	(On Diff #117416)	It is named operand "src0", using getOperand(1) is not desirable. Also VReg is misleading, it reads like "virtual register".
156 ↗	(On Diff #117416)	Post increment is broken here.
157 ↗	(On Diff #117416)	Please follow the general and surrounding style: brace on the same line with expression.
175 ↗	(On Diff #117416)	That is broken. Remember that register is physical. Not speaking that is whole function scan is expensive.
test/CodeGen/AMDGPU/readlane_exec0.mir
3 ↗	(On Diff #117416)	You still need to add GCN-LABEL

I would just bail on any of these instructions not trying to optimize the case, just like you did in the beginning.

I want to replace this pass by always inserting the branches on execz, and have a new pass which optimizes out short jumps. Would that be easier than trying to analyze this?

Reverted back to the simplest approach.

LGTM

This revision is now accepted and ready to land.Oct 3 2017, 10:20 AM

Closed by commit rL314828: [AMDGPU] Avoid predicated execution of the basic blocks containing scalar (authored by alex-t). · Explain WhyOct 3 2017, 11:57 AM

This revision was automatically updated to reflect the committed changes.

alex-t reopened this revision.Oct 17 2017, 12:03 PM

This comment was removed by alex-t.

This revision is now accepted and ready to land.Oct 17 2017, 12:03 PM

r314828

This is an archive of the discontinued LLVM Phabricator instance.

Avoid predicated execution of the basic blocks containing scalar instructions
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 117559

llvm/trunk/lib/Target/AMDGPU/SIInsertSkips.cpp

llvm/trunk/test/CodeGen/AMDGPU/readlane_exec0.mir

This is an archive of the discontinued LLVM Phabricator instance.

Avoid predicated execution of the basic blocks containing scalar instructionsClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 117559

llvm/trunk/lib/Target/AMDGPU/SIInsertSkips.cpp

llvm/trunk/test/CodeGen/AMDGPU/readlane_exec0.mir

Avoid predicated execution of the basic blocks containing scalar instructions
ClosedPublic