kanarayan (Kannan Narayanan)
User

Projects

User does not belong to any projects.
User Since
Jan 12 2016, 1:12 PM (62 w, 4 d)

Recent Activity

Yesterday

kanarayan updated the diff for D31161: [AMDGPU] New Waitcnt Insertion Pass.

Fix the last patch that inadvertently deleted "amdgpu_kernel" added recently.

  1. Rename SI_RETURN to SI_RETURN_TO_EPILOG to reflect recent change.
  2. Condition the nop insertion to break soft clauses on isXNACKEnabled(). Add a note to also condition this code on hasSoftClauses when that code is put back.
  3. Fix tests added since last patch to reflect the new pass.
  4. Remove experimental code.
Sat, Mar 25, 6:22 PM
kanarayan updated the diff for D31161: [AMDGPU] New Waitcnt Insertion Pass.
  1. Rename SI_RETURN to SI_RETURN_TO_EPILOG to reflect recent change.
  2. Condition the nop insertion to break soft clauses on isXNACKEnabled(). Add a note to also condition this code on hasSoftClauses when that code is put back.
  3. Fix tests added since last patch to reflect the new pass.
  4. Remove experimental code.
Sat, Mar 25, 5:28 PM

Thu, Mar 23

kanarayan updated the diff for D31161: [AMDGPU] New Waitcnt Insertion Pass.

This addresses two issues:

  1. SQ_MAX_PGM_VGPRS and other constants are used to map the LLVM register map to a data structure internal to this algorithm. For the register mapping used by the algorithm, see the comments before the enum. The constants are maximum values across all targets. Ideally, the pass would use dynamically sized arrays that fit the particular target architecture. As an interim step, I have changed these constants to enum values and asserted, in the main entry to this pass, that no target has a larger register file. (In response to comments from Tony and Konstantin) I have also updated the getRegInterval call. Please note that the previous version was identical to the one used by the old pass, except that I make the necessary adjustments for the mapping used by this algorithm. As a next step, I will remove the assert and the associated code.
  2. Barriers no longer force a zero waitcnt. For GFX9 and above, a barrier needs no additional waitcnt. For earlier targets, waitcnts are added only if needed. The following tests already exercise barriers:
       LLVM :: CodeGen/AMDGPU/addrspacecast.ll
       LLVM :: CodeGen/AMDGPU/array-ptr-calc-i32.ll
       LLVM :: CodeGen/AMDGPU/ds-negative-offset-addressing-mode-loop.ll
       LLVM :: CodeGen/AMDGPU/ds_read2.ll
       LLVM :: CodeGen/AMDGPU/indirect-private-64.ll
       LLVM :: CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
       LLVM :: CodeGen/AMDGPU/local-memory.amdgcn.ll
       LLVM :: CodeGen/AMDGPU/merge-stores.ll
       LLVM :: CodeGen/AMDGPU/schedule-vs-if-nested-loop-failure.ll
       LLVM :: CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll
       LLVM :: CodeGen/AMDGPU/store-barrier.ll
       LLVM :: CodeGen/AMDGPU/wait.ll

There is also a CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
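
The interim register-mapping scheme described in item 1 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from D31161: the enum values, the HypotheticalTarget struct, and the fitsScoreboard and sketchRegInterval helpers are invented here, and only the name SQ_MAX_PGM_VGPRS comes from the update note. The idea is that fixed upper bounds size a flat table, an entry check confirms that the target fits, and each register is mapped to a half-open slot interval.

```cpp
#include <cassert>
#include <utility>

// Upper bounds across all targets. The numeric values are assumptions
// chosen for illustration, not values taken from the patch.
enum RegisterMapping : unsigned {
  SQ_MAX_PGM_VGPRS = 256, // assumed maximum number of vector registers
  SQ_MAX_PGM_SGPRS = 256, // assumed maximum number of scalar registers
  NUM_ALL_SLOTS = SQ_MAX_PGM_VGPRS + SQ_MAX_PGM_SGPRS
};

// Hypothetical description of one target's register file sizes.
struct HypotheticalTarget {
  unsigned NumVGPRs;
  unsigned NumSGPRs;
};

// The kind of check that could be asserted once at pass entry:
// no target may exceed the fixed table sizes.
inline bool fitsScoreboard(const HypotheticalTarget &T) {
  return T.NumVGPRs <= SQ_MAX_PGM_VGPRS && T.NumSGPRs <= SQ_MAX_PGM_SGPRS;
}

// Map a register (kind, index, width in registers) to a half-open
// interval [first, second) of slots in one flat table. VGPRs occupy
// the low slots and SGPRs follow them.
inline std::pair<unsigned, unsigned>
sketchRegInterval(bool IsVGPR, unsigned Index, unsigned Width) {
  unsigned Base = IsVGPR ? 0u : SQ_MAX_PGM_VGPRS;
  unsigned Limit = IsVGPR ? SQ_MAX_PGM_VGPRS : SQ_MAX_PGM_SGPRS;
  assert(Index + Width <= Limit && "register lies outside the table");
  return {Base + Index, Base + Index + Width};
}
```

With dynamically sized arrays, as the note proposes for a later step, the bounds would come from the target itself and the entry check would no longer be needed.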

Thu, Mar 23, 5:28 PM

Mon, Mar 20

kanarayan created D31161: [AMDGPU] New Waitcnt Insertion Pass.
Mon, Mar 20, 4:49 PM
kanarayan created D31159: New Waitcnt Insertion Pass.
Mon, Mar 20, 4:20 PM

May 5 2016

kanarayan updated subscribers of D12414: [NVPTX] add an NVPTX-specific alias analysis.
May 5 2016, 10:55 AM

Apr 21 2016

kanarayan added a comment to D19371: [AMDGPU] prohibit >4 bytes private memory access.

If this fixes a correctness issue, please commit. Optimization can come afterwards, if not alongside.

Apr 21 2016, 10:51 AM

Feb 23 2016

kanarayan added a comment to D17280: AMDGPU: Implement {BUFFER,FLAT}_ATOMIC_CMPSWAP{,_X2}.

Hi, any update?

Feb 23 2016, 1:58 PM