Page MenuHomePhabricator

[AMDGPU] Fix DPP Combiner:
AbandonedPublic

Authored by vpykhtin on Jun 24 2020, 4:31 AM.

Details

Summary
  1. skip multiple per instruction DPP register usage.
  2. don't combine when DPP register is used as part of superreg/supersubreg.

I had to refactor the code to fix the first issue, so that the check for
multiple uses of DPP register can be done for all uses. In the process
I discovered that the combiner can try to combine DPP registers that
are part of superreg (REG_SEQUENCE) and aren't used as independent lanes.

execMayBeModifiedBeforeAnyUse is now called once for all uses. Number of
uses is constrained by the caller.

I removed code patching REG_SEQUENCE but instead stopped deleting original
DPP mov instruction, it will be deleted in the DCE.

Diff Detail

Unit TestsFailed

TimeTest
10,210 mslinux > libomp.env::Unknown Unit Message ("")
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -I /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -I /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/ompt /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/env/kmp_set_dispatch_buf.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/env/Output/kmp_set_dispatch_buf.c.tmp -lm -latomic && env KMP_DISP_NUM_BUFFERS=0 /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/env/Output/kmp_set_dispatch_buf.c.tmp
1,340 mslinux > libomp.worksharing/for::Unknown Unit Message ("")
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -I /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -I /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/ompt /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/worksharing/for/kmp_set_dispatch_buf.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/worksharing/for/Output/kmp_set_dispatch_buf.c.tmp -lm -latomic && /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/worksharing/for/Output/kmp_set_dispatch_buf.c.tmp 7

Event Timeline

vpykhtin created this revision.Jun 24 2020, 4:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 24 2020, 4:31 AM
arsenm added inline comments.Jun 24 2020, 6:58 AM
llvm/lib/Target/AMDGPU/GCNDPPCombine.cpp
364

Usually for an output small vector, you pass SmallVectorImpl<Type>& rather than needing a template parameter for the size

365

Independ looks like a weird abbrevation to me. Shorten to Indep or lengthen to Independent?

367

Should cast to const SIRegisterInfo

370

It's weird to limit this based on the size of the small vector. It might make sense to add a maximum use to check, but disconnected from the vector size

382

Same as previous function

407

Why is this an rvalue reference? I also don't see why this needs to be a template

410–416

Why does this need to be a separate function? Can't you determine this over the initial use walk in collectRegUse?

504

s/much/many and end with newline

581

Typo orphane

vpykhtin marked 3 inline comments as done.Jun 24 2020, 8:11 AM
vpykhtin added inline comments.
llvm/lib/Target/AMDGPU/GCNDPPCombine.cpp
364

This is the way to pass the constraint on Uses. Initially the constraint was performed by execMayBeModifiedBeforeAnyUse but it's a bit late there. The constraint was introduced to reduce compile time so its of the same nature as a preallocation for Uses array and has similar value, so I decied to join them. Actually I don't need SmallVector here, but I haven't found vector with fixed size.

410–416

There're two places where Uses are collected: collectRegUse and collectIndependSubRegUse - I would need to add this code twice. Number of uses is limited to be small and the walk is cheap.

arsenm added inline comments.Jun 24 2020, 8:50 AM
llvm/lib/Target/AMDGPU/GCNDPPCombine.cpp
410–416

You could early exit out of the whole search if you check up front, it's not really duplication there

vpykhtin updated this revision to Diff 273294.Jun 25 2020, 3:46 AM

Rebased, per review issues fixed:

  1. Use separate value to constrain use count
  2. Check multiple DPP operand during use collection
  3. Misc
vpykhtin marked 10 inline comments as done.Jun 25 2020, 3:48 AM
foad added a comment.Jun 25 2020, 4:15 AM
  1. skip multiple per instruction DPP register usage.
  2. don't combine when DPP register is used as part of superreg/supersubreg.

I had to refactor the code to fix the first issue,

Would it make sense to split this patch into two or three? Refactoring, fix bug 1, fix bug 2?

  1. skip multiple per instruction DPP register usage.
  2. don't combine when DPP register is used as part of superreg/supersubreg.

I had to refactor the code to fix the first issue,

Would it make sense to split this patch into two or three? Refactoring, fix bug 1, fix bug 2?

Yes, if there're no other objections.

vpykhtin abandoned this revision.Jun 25 2020, 7:31 AM

split to parts