This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Fold readlane from copy of SGPR or imm
ClosedPublic

Authored by arsenm on Jun 12 2019, 12:49 PM.

Details

Reviewers
rampitec
nhaehnle
Summary

These may be inserted to assert uniformity somewhere.

Diff Detail

Event Timeline

arsenm created this revision.Jun 12 2019, 12:49 PM
rampitec added inline comments.Jun 12 2019, 12:58 PM
lib/Target/AMDGPU/SIFoldOperands.cpp
525

I do not think it is correct. If first lane is off you will fold wrong value. In both cases of readfirstlane and readlane you need to check a specific lane, which I am not sure even possible.

arsenm marked an inline comment as done.Jun 12 2019, 1:07 PM
arsenm added inline comments.
lib/Target/AMDGPU/SIFoldOperands.cpp
525

readlane ignores exec. readfirstlane only reads exec to find the first lane, and still returns a result when exec is 0. The source value is a constant, so it doesn't matter what the lane is

rampitec added inline comments.Jun 12 2019, 1:13 PM
lib/Target/AMDGPU/SIFoldOperands.cpp
525

Readlane ignores exec, but v_mov_b32 does not. So you may have an old value of vgpr instead of immediate.
For readfirstlane you will get wrong value on the sgpr if exec=0 I believe.

arsenm marked an inline comment as done.Jun 12 2019, 1:15 PM
arsenm added inline comments.
lib/Target/AMDGPU/SIFoldOperands.cpp
525

It would matter if there was an exec def between the copy and the use, which I want to ban within a block

rampitec added inline comments.Jun 12 2019, 1:19 PM
lib/Target/AMDGPU/SIFoldOperands.cpp
525

But it is not banned yet, right? Are we sure that only happens within a block? Why block cannot be executed with a zero exec? And then at the very least we need to be sure register is virtual.

arsenm marked an inline comment as done.Jun 12 2019, 1:24 PM
arsenm added inline comments.
lib/Target/AMDGPU/SIFoldOperands.cpp
525

We force execz skip branches around blocks that have readlane/readfirstlane in them. exec = 0 should also be illegal on function entry

rampitec added inline comments.Jun 12 2019, 2:27 PM
lib/Target/AMDGPU/SIFoldOperands.cpp
525

I thought about it. From the ISA perspective it is illegal if you are reading an inactive lane. This can be legal here on the grounds that register is virtual and we are in SSA. Are both conditions true?

arsenm updated this revision to Diff 204380.Jun 12 2019, 3:50 PM

Check for exec def

arsenm marked an inline comment as done.Jun 12 2019, 3:51 PM
arsenm added inline comments.
lib/Target/AMDGPU/SIFoldOperands.cpp
525

SGPR spilling does read/write VGPRs for inactive lanes, but the use of an SSA value for an invalid lane doesn't make sense. I'm not sure if the exec check is strictly necessary, but I've added it.

rampitec added inline comments.Jun 12 2019, 5:33 PM
lib/Target/AMDGPU/SIFoldOperands.cpp
525

I do not think this check really changes anything. Look: you have lane 8 disabled, both at def and use. Then you do readlane for the lane 8. It will pass the check and yield wrong value. So this check is not necessary and does not help too.

There are just two conditions which may make it legal: 1) register is virtual 2) we are in SSA. If both a satisfied you can do it, because there can be no other valid def anyway.

rampitec requested changes to this revision.Jun 14 2019, 5:43 PM

This cannot work as written, at least readlane part. Maybe readfirstlane can, providing skip is always inserted around and def is in the same block. Otherwise it needs to be a virtual register and SSA.

This revision now requires changes to proceed.Jun 14 2019, 5:43 PM

This cannot work as written, at least readlane part. Maybe readfirstlane can, providing skip is always inserted around and def is in the same block. Otherwise it needs to be a virtual register and SSA.

SIFoldOperands is SSA only, and physical registers are filtered out already

This cannot work as written, at least readlane part. Maybe readfirstlane can, providing skip is always inserted around and def is in the same block. Otherwise it needs to be a virtual register and SSA.

SIFoldOperands is SSA only, and physical registers are filtered out already

Why do we check for virtual registers all over the place place then, including the few lines above?

Also note, if virtual and SSA, then check for exec modification is not needed because that must be the only def anyway. Otherwise it reads undef that way or another.

Thanks, this is very helpful!

Does this collapse redundant readfirstlanes? I.e., the case where the result of a readfirstlane is immediately fed to another readfirstlane. I think it might fix that -- could you please add a corresponding test?

lib/Target/AMDGPU/SIFoldOperands.cpp
525

You do need the EXEC mask check either way because the use could be observing a def inside a divergent loop.

For the readfirstlane case, that check should be sufficient. For the readlane case, I agree that you need to know the register is virtual and in SSA. But with that knowledge, I believe the code should be good as-is.

rampitec added inline comments.Jun 16 2019, 12:00 PM
lib/Target/AMDGPU/SIFoldOperands.cpp
525

Right. Exec check probably needed, at any rate it doesn't hurt. But readlane case needs either be removed or additional check needed.

arsenm updated this revision to Diff 205102.Jun 17 2019, 9:32 AM
arsenm marked an inline comment as done.

Add test for multiple folds

lib/Target/AMDGPU/SIFoldOperands.cpp
525

Both of those are known. This pass is only used SSA, and skips physical registers before this point

rampitec added inline comments.Jun 17 2019, 11:22 AM
lib/Target/AMDGPU/SIFoldOperands.cpp
525

It is SSA, granted. But I do not see that we check for virtual register at all possible foldOperand() invocations, only in one of the cases.

arsenm marked an inline comment as done.Jun 17 2019, 11:44 AM
arsenm added inline comments.
lib/Target/AMDGPU/SIFoldOperands.cpp
525

The search for users begins from a virtual register def. foldInstOperand is never called

arsenm updated this revision to Diff 205138.Jun 17 2019, 11:45 AM

Add a few more examples of physical registers not folding

rampitec added inline comments.Jun 17 2019, 12:59 PM
lib/Target/AMDGPU/SIFoldOperands.cpp
525

There are quite a few places where we call foldOperand(), including recursively. Only foldInstOperand() is guarded with the check. It is not immediately obvious none of that calls will end up with a virtual register only. Moreover, we are checking for virtual destination registers in the very same function. It will be safe to add the check here, otherwise I cannot prove to myself it is safe.

rampitec accepted this revision.Jun 17 2019, 1:41 PM

LGTM if precheckin will succeed for this and for D63456.

This revision is now accepted and ready to land.Jun 17 2019, 1:42 PM
arsenm closed this revision.Jun 18 2019, 5:34 AM

r363670