kanarayan (Kannan Narayanan)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 12 2016, 1:12 PM (84 w, 14 h)

Recent Activity

May 16 2017

kanarayan added a comment to D33114: [AMDGPU] Fixes for the new waitcnt insertion pass. Add test..

Any update on the radv testing/timeline? Assuming all is well, I will check in this change today.

May 16 2017, 9:56 AM

May 11 2017

kanarayan added a comment to D33114: [AMDGPU] Fixes for the new waitcnt insertion pass. Add test..

Can you hold off on merging this until people have had a chance to test with radv and mesa?

May 11 2017, 8:47 PM
kanarayan created D33114: [AMDGPU] Fixes for the new waitcnt insertion pass. Add test..
May 11 2017, 7:47 PM

May 5 2017

kanarayan updated the diff for D32831: [AMDGPU] In the new waitcnt insertion pass, use getHeader instead of getTopBlock to find the loop header..

Remove more of the metadata manually.

May 5 2017, 1:19 PM
kanarayan updated the diff for D32831: [AMDGPU] In the new waitcnt insertion pass, use getHeader instead of getTopBlock to find the loop header..

Removed the opencl metadata manually and reran the instnamer.

May 5 2017, 12:48 PM
kanarayan added reviewers for D32831: [AMDGPU] In the new waitcnt insertion pass, use getHeader instead of getTopBlock to find the loop header.: arsenm, t-tye.
May 5 2017, 11:24 AM
kanarayan added inline comments to D32831: [AMDGPU] In the new waitcnt insertion pass, use getHeader instead of getTopBlock to find the loop header..
May 5 2017, 11:23 AM

May 4 2017

kanarayan updated the diff for D32831: [AMDGPU] In the new waitcnt insertion pass, use getHeader instead of getTopBlock to find the loop header..

Run instnamer on the test file.

May 4 2017, 9:33 PM

May 3 2017

kanarayan updated the diff for D32831: [AMDGPU] In the new waitcnt insertion pass, use getHeader instead of getTopBlock to find the loop header..

Updated to the correct test with loop.

May 3 2017, 4:25 PM
kanarayan created D32831: [AMDGPU] In the new waitcnt insertion pass, use getHeader instead of getTopBlock to find the loop header..
May 3 2017, 2:57 PM

Apr 19 2017

kanarayan added a comment to D32254: Revert earlier change that flagged ds permute operations as not affecting lgkm counter. .

I will add the test from the bug report as a lit test separately.

Apr 19 2017, 4:20 PM
kanarayan created D32254: Revert earlier change that flagged ds permute operations as not affecting lgkm counter. .
Apr 19 2017, 4:02 PM

Apr 11 2017

kanarayan updated the diff for D31161: [AMDGPU] New Waitcnt Insertion Pass.

This update includes all the fixes so far, but keeps the old pass as the default. We would like to get broader test coverage under the option first, and then add the tests and turn the new pass on by default.

Apr 11 2017, 6:59 PM
kanarayan updated the diff for D31161: [AMDGPU] New Waitcnt Insertion Pass.

Fix the looping issue found with the GL input. The bug only triggers for some targets under some conditions. This is due to an earlier fix for a VCCZ bug; that code needs to go away once the scheduler is fixed to handle this scenario. I need to add a lit test.

Apr 11 2017, 12:20 PM

Apr 10 2017

kanarayan updated the diff for D31161: [AMDGPU] New Waitcnt Insertion Pass.

Fix the issue related to barrier.

Apr 10 2017, 8:36 PM

Apr 5 2017

kanarayan updated the diff for D31161: [AMDGPU] New Waitcnt Insertion Pass.

clang-format

Apr 5 2017, 6:29 PM
kanarayan updated the diff for D31161: [AMDGPU] New Waitcnt Insertion Pass.

The update includes the following after rebasing:

Apr 5 2017, 6:20 PM

Mar 29 2017

kanarayan updated the diff for D31161: [AMDGPU] New Waitcnt Insertion Pass.

In response to review comments from arsenm, the following updates/responses:

Mar 29 2017, 3:17 PM

Mar 28 2017

kanarayan updated the diff for D31161: [AMDGPU] New Waitcnt Insertion Pass.

Use the architecture neutral interfaces to decode wait counts.
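
To illustrate the idea of decoding wait counts through one layout-driven helper rather than hard-coded masks, here is a minimal sketch. It is not the code from the patch and does not use LLVM's interfaces: the names `Field`, `WaitcntLayout`, `AssumedPreGfx9` and `decode` are hypothetical, and the bit layout (vmcnt in bits 3:0, expcnt in bits 6:4, lgkmcnt in bits 11:8) is an assumption about the pre-GFX9 `s_waitcnt` encoding.

```cpp
#include <cassert>
#include <cstdint>

// Position and width of one counter inside the s_waitcnt immediate.
struct Field {
  unsigned Shift;
  unsigned Width;
};

// Hypothetical per-target description of where each counter lives.
struct WaitcntLayout {
  Field Vmcnt;
  Field Expcnt;
  Field Lgkmcnt;
};

// ASSUMPTION: pre-GFX9 layout; other targets would supply their own table.
constexpr WaitcntLayout AssumedPreGfx9{{0, 4}, {4, 3}, {8, 4}};

// Extract one field from the immediate.
constexpr unsigned unpack(unsigned Imm, Field F) {
  return (Imm >> F.Shift) & ((1u << F.Width) - 1u);
}

struct Waitcnt {
  unsigned Vmcnt;
  unsigned Expcnt;
  unsigned Lgkmcnt;
};

// Callers pass a layout instead of hard-coding masks and shifts.
constexpr Waitcnt decode(const WaitcntLayout &L, unsigned Imm) {
  return Waitcnt{unpack(Imm, L.Vmcnt), unpack(Imm, L.Expcnt),
                 unpack(Imm, L.Lgkmcnt)};
}
```

With this shape, supporting a target with a different encoding would mean adding another layout table rather than touching the callers.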

Mar 28 2017, 2:25 PM

Mar 25 2017

kanarayan updated the diff for D31161: [AMDGPU] New Waitcnt Insertion Pass.

Fix the last patch that inadvertently deleted "amdgpu_kernel" added recently.

  1. Rename SI_RETURN to SI_RETURN_TO_EPILOG to reflect recent change.
  2. Condition the nop insertion to break soft clauses on isXNACKEnabled(). Add a note to also condition this code on hasSoftClauses when that code is put back.
  3. Fix tests added since last patch to reflect the new pass.
  4. Remove experimental code.
Mar 25 2017, 6:22 PM
kanarayan updated the diff for D31161: [AMDGPU] New Waitcnt Insertion Pass.
  1. Rename SI_RETURN to SI_RETURN_TO_EPILOG to reflect recent change.
  2. Condition the nop insertion to break soft clauses on isXNACKEnabled(). Add a note to also condition this code on hasSoftClauses when that code is put back.
  3. Fix tests added since last patch to reflect the new pass.
  4. Remove experimental code.
Mar 25 2017, 5:28 PM

Mar 23 2017

kanarayan updated the diff for D31161: [AMDGPU] New Waitcnt Insertion Pass.

This addresses two issues:

  1. SQ_MAX_PGM_VGPRS and other constants are used to map the LLVM register numbers to a data structure internal to this algorithm. For the register mapping used by the algorithm, see the comments before the enum. The constants are maximum values across all targets. Ideally, the pass would use dynamically sized arrays that fit the particular target architecture. As an interim step, I have changed these constants to enum values and added an assert, in the main entry to this pass, that no target has a larger register file. (In response to comments from Tony and Konstantin) I have also updated the getRegInterval call. Please note that the previous version was identical to the one used by the old pass, except that I make the necessary adjustments for the mapping used by this algorithm. In a next step, I will remove the assert and the associated code.
  2. Barriers no longer force a zero waitcnt. For GFX9 and above, barrier needs no additional waitcnt. For lesser targets, waitcnts are added only if needed. The following tests already test barrier:
     LLVM :: CodeGen/AMDGPU/addrspacecast.ll
     LLVM :: CodeGen/AMDGPU/array-ptr-calc-i32.ll
     LLVM :: CodeGen/AMDGPU/ds-negative-offset-addressing-mode-loop.ll
     LLVM :: CodeGen/AMDGPU/ds_read2.ll
     LLVM :: CodeGen/AMDGPU/indirect-private-64.ll
     LLVM :: CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
     LLVM :: CodeGen/AMDGPU/local-memory.amdgcn.ll
     LLVM :: CodeGen/AMDGPU/merge-stores.ll
     LLVM :: CodeGen/AMDGPU/schedule-vs-if-nested-loop-failure.ll
     LLVM :: CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll
     LLVM :: CodeGen/AMDGPU/store-barrier.ll
     LLVM :: CodeGen/AMDGPU/wait.ll

There is also a CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
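
The interim approach described in item 1 above (fixed upper bounds expressed as enum values, plus a check that the target's register file fits) could look roughly like the sketch below. This is not the code from the patch: every name here (`MaxVGPRs`, `MaxSGPRs`, `TargetRegs`, `fitsInTable`, `slotForReg`, `ScoreTable`) and both numeric limits are hypothetical, chosen only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical upper bounds across all targets (illustrative values only).
enum RegisterLimits : unsigned {
  MaxVGPRs = 256,
  MaxSGPRs = 128,
  NumSlots = MaxVGPRs + MaxSGPRs
};

enum class RegKind { VGPR, SGPR };

// Hypothetical description of one target's register file.
struct TargetRegs {
  unsigned NumVGPRs;
  unsigned NumSGPRs;
};

// Checked once at pass entry: the fixed-size table must cover the target.
inline bool fitsInTable(const TargetRegs &T) {
  return T.NumVGPRs <= MaxVGPRs && T.NumSGPRs <= MaxSGPRs;
}

// Map (kind, index) to a slot in the table: VGPRs first, then SGPRs.
inline unsigned slotForReg(RegKind Kind, unsigned Index) {
  if (Kind == RegKind::VGPR) {
    assert(Index < MaxVGPRs && "VGPR index out of range");
    return Index;
  }
  assert(Index < MaxSGPRs && "SGPR index out of range");
  return MaxVGPRs + Index;
}

// Per-register bookkeeping, sized by the enum bounds rather than the target.
using ScoreTable = std::array<uint32_t, NumSlots>;
```

A dynamically sized table, as the comment proposes for a later step, would replace `ScoreTable` with a container sized from the target's actual register counts and make the entry check unnecessary.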

Mar 23 2017, 5:28 PM

Mar 20 2017

kanarayan created D31161: [AMDGPU] New Waitcnt Insertion Pass.
Mar 20 2017, 4:49 PM
kanarayan created D31159: New Waitcnt Insertion Pass.
Mar 20 2017, 4:20 PM

May 5 2016

kanarayan updated subscribers of D12414: [NVPTX] add an NVPTX-specific alias analysis.
May 5 2016, 10:55 AM

Apr 21 2016

kanarayan added a comment to D19371: [AMDGPU] prohibit >4 bytes private memory access. .

If this fixes a correctness issue, please commit. Optimization can come afterwards, if not along with it.

Apr 21 2016, 10:51 AM

Feb 23 2016

kanarayan added a comment to D17280: AMDGPU: Implement {BUFFER,FLAT}_ATOMIC_CMPSWAP{,_X2}.

Hi, any update?

Feb 23 2016, 1:58 PM