This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix DPP combiner check for exec modification
ClosedPublic

Authored by foad on Jul 9 2019, 1:49 AM.

Details

Summary

r363675 changed the exec modification helper function, now called
execMayBeModifiedBeforeUse, so that if no UseMI is specified it checks
all instructions in the basic block, even beyond the last use. That
meant that the DPP combiner no longer worked in any basic block that
ended with a control flow instruction, and in particular it didn't work
on code sequences generated by the atomic optimizer.

Fix it by changing execMayBeModifiedBeforeUse to require UseMI and
changing the DPP combiner to check each use individually.

Event Timeline

foad created this revision.Jul 9 2019, 1:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 9 2019, 1:50 AM

I missed r363675 commit and I think the original semantics of isEXECMaskConstantBetweenDefAndUses should be restored: there should be no scan after the last use.

execMayBeModifiedBeforeUse should be called as it was originally to check all uses at once to prevent rollback.

arsenm added a comment.Jul 9 2019, 3:18 PM

Checking all uses is unreasonable. There needs to be some search bounds to manage compile time

llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll
203–204

Can you add some cases with multiple uses independent of the atomic optimizer?

foad updated this revision to Diff 208900.Jul 10 2019, 2:26 AM
foad marked an inline comment as done.

Try a different approach.

Reinstate the old behaviour but in a separate helper function with
limits on the number of instructions scanned, and update the atomic
optimizer tests to more consistently check that the DPP combiner is
working on them.

llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll
203–204

See existing dpp_seq test in dpp_combine.mir.

Matt, I'm not against using scan limiter, just about aborting the scan after the last use.

Jay, I'm ok with your patch, just a note that the prt set can be avoided if we count the number of use instructions and use the count to limit the scan.

foad added a comment.Jul 10 2019, 5:37 AM

Jay, I'm ok with your patch, just a note that the prt set can be avoided if we count the number of use instructions and use the count to limit the scan.

I did think of that but I wasn't confident that I could implement it correctly. Does MRI.use_nodbg_instructions() guarantee not to repeat instructions, even if one instruction has multiple uses of the register?

Yea, I thought there is a guaranty on unique instructions, but looking into defusechain_instr_iterator I don't see how this can be true, so let's continue using the set.

Sorry, there is a code for returnining unique instr, so we can use the count:

defusechain_instr_iterator &operator++() {          // Preincrement
      assert(Op && "Cannot increment end iterator!");
      if (ByOperand)
        advance();
      else if (ByInstr) {
        MachineInstr *P = Op->getParent();
        do {
          advance();
        } while (Op && Op->getParent() == P); // <- removes duplicates
foad added a comment.Jul 10 2019, 6:07 AM

Sorry, there is a code for returnining unique instr, so we can use the count:

defusechain_instr_iterator &operator++() {          // Preincrement
      assert(Op && "Cannot increment end iterator!");
      if (ByOperand)
        advance();
      else if (ByInstr) {
        MachineInstr *P = Op->getParent();
        do {
          advance();
        } while (Op && Op->getParent() == P); // <- removes duplicates

But is it guaranteed that multiple uses by a single instruction will be adjacent in the list?

But is it guaranteed that multiple uses by a single instruction will be adjacent in the list?

I can't prove it but I think there would be no sense otherwise.

arsenm added inline comments.Jul 10 2019, 1:03 PM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6088

This should now be mandatory, and a reference

6102–6105

This should probably be a backwards scan, but that can be a separate change

6104–6105

Why is this erasing the use as it iterates?

llvm/lib/Target/AMDGPU/SIInstrInfo.h
1014 ↗(On Diff #208900)

Might as well update this to use Register

1022 ↗(On Diff #208900)

Should use Register

llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll
203–204

They weren't catching this issue

foad marked 2 inline comments as done.Jul 10 2019, 1:51 PM
foad added inline comments.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6102–6105

What's the advantage?

6104–6105

So we can exit the loop when Uses.empty().

foad updated this revision to Diff 209061.Jul 10 2019, 1:53 PM

Address some review comments.

foad marked 3 inline comments as done.Jul 10 2019, 1:55 PM

I'll add tests tomorrow.

arsenm added inline comments.Jul 10 2019, 2:06 PM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6102–6105

I was thinking about kills, but it doesn't matter in SSA

6104–6105

I mean, this usage of a set doesn't make sense to me. Why can't you just increment a seen use count?

foad updated this revision to Diff 209142.Jul 11 2019, 1:40 AM

Count non-debug use operands of non-debug instructions instead of using
a SmallPtrSet.

foad marked an inline comment as done.Jul 11 2019, 1:42 AM
foad updated this revision to Diff 209147.Jul 11 2019, 2:35 AM

Add a DPP combiner test that would have caught the problem.

foad marked 2 inline comments as done.Jul 11 2019, 2:35 AM
vpykhtin added inline comments.Jul 11 2019, 4:34 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6116

Now I recall why I used the set originally - to detect use instruction fast. Matt, is there other way to detect whether the instruction use a register without scanning its operands?

vpykhtin added inline comments.Jul 11 2019, 4:38 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6116

and without scanning whether an instruction is in the use list for a register.

foad marked an inline comment as done.Jul 11 2019, 4:58 AM
foad added inline comments.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6116

We have to scan the operands of every instruction anyway, in order to test I->modifiesRegister(AMDGPU::EXEC, TRI). Your original code did that too.

vpykhtin added inline comments.Jul 11 2019, 5:42 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6116

Good point. BTW you can probably use readsRegister instead of the loop.

foad marked an inline comment as done.Jul 11 2019, 6:01 AM
foad added inline comments.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6116

readsRegister doesn't appear to ignore debug operands.

vpykhtin accepted this revision.Jul 12 2019, 7:06 AM

I think we can submit this.

This revision is now accepted and ready to land.Jul 12 2019, 7:06 AM
arsenm added inline comments.Jul 12 2019, 7:14 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6116

debug uses should only appear on the debug pseudo-instructions, which you already skipped with the ->isDebugInstr check

foad marked an inline comment as done.Jul 12 2019, 8:13 AM
foad added inline comments.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6116

OK, if that is guaranteed (I wasn't sure) I could remove the !Op.isDebug() test. Do you want me to? But I still can't use readsRegister because I need to decrement NumUse multiple times if there are multiple operands that read the register.

I could also fold the check for modifying EXEC into the same loop, so that we don't have to do two separate loops over the operands of each instruction. But I don't know if it's worth it.

vpykhtin added inline comments.Jul 12 2019, 8:20 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6116

You can calculate NumUse as a number of use_nodbg_instructions in the first loop

foad marked an inline comment as done.Jul 12 2019, 8:29 AM
foad added inline comments.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6116

We've discussed that before. It only works if use_nodbg_instructions doesn't repeat instructions, which is only true if multiple uses by the same instruction are adjacent in the use list, which I'm not sure is necessarily true.

foad updated this revision to Diff 209495.Jul 12 2019, 8:35 AM

Remove unnecessary check for debug operands.

foad marked an inline comment as done.Jul 12 2019, 8:36 AM
vpykhtin added inline comments.Jul 12 2019, 8:38 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6116

Again, there would be no sence if use_nodbg_instructions repeat instructions. Trust it and use :)

foad updated this revision to Diff 209498.Jul 12 2019, 8:50 AM

Count instructions instead of uses.

foad marked an inline comment as done.Jul 12 2019, 8:51 AM

Looks good, thank you!

This revision was automatically updated to reflect the committed changes.

I've found the case when execMayBeModifiedBeforeAnyUse randomly leads to a coredump, which is hard to debug. Most likely it's because an instruction beyond the end of a basic block is accessed. This means that the first loop calculates some instructions twice and I was wrong assuming use_nodbg_instructions doesn't repeat them. In fact there is no code in MachineRegisterInfo::verifyUseList that ensures that uses belonging to one instruction should be sequent in the use list nor the traces of such ordering can be found in MachineRegisterInfo::addRegOperandToUseList. I'm going to fix this code.

foad added a comment.Oct 12 2020, 1:24 AM

I've found the case when execMayBeModifiedBeforeAnyUse randomly leads to a coredump, which is hard to debug. Most likely it's because an instruction beyond the end of a basic block is accessed. This means that the first loop calculates some instructions twice and I was wrong assuming use_nodbg_instructions doesn't repeat them. In fact there is no code in MachineRegisterInfo::verifyUseList that ensures that uses belonging to one instruction should be sequent in the use list nor the traces of such ordering can be found in MachineRegisterInfo::addRegOperandToUseList. I'm going to fix this code.

:)

I think "Diff 6" of this patch should handle that case correctly.