rampitec (Stanislav Mekhanoshin)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 4 2014, 4:14 AM (288 w, 6 d)

Recent Activity

Today

rampitec committed rL375209: Request commit access for rampitec.
Request commit access for rampitec
Fri, Oct 18, 1:22 AM

Yesterday

rampitec committed rGbefab66a2c8f: [AMDGPU] drop getIsFP td helper (authored by rampitec).
[AMDGPU] drop getIsFP td helper
Thu, Oct 17, 2:48 PM
rampitec closed D69138: [AMDGPU] drop getIsFP td helper.
Thu, Oct 17, 2:47 PM · Restricted Project
rampitec committed rL375175: [AMDGPU] drop getIsFP td helper.
[AMDGPU] drop getIsFP td helper
Thu, Oct 17, 2:47 PM
rampitec created D69138: [AMDGPU] drop getIsFP td helper.
Thu, Oct 17, 2:19 PM · Restricted Project
rampitec accepted D69096: [AMDGPU][MC][GFX10] Added sdwa/dpp versions of v_cndmask_b32.

LGTM

Thu, Oct 17, 1:04 PM
rampitec added inline comments to D69095: [AMDGPU][MC][GFX9] Corrected parsing of v_cndmask_b32_sdwa.
Thu, Oct 17, 12:35 PM
rampitec added a comment to D69096: [AMDGPU][MC][GFX10] Added sdwa/dpp versions of v_cndmask_b32.
In D69096#1713450, @dp wrote:

Still missing foreach around dpp.

I added them but found no differences in *.inc files.
Are they necessary as a reserve for the future?

Thu, Oct 17, 12:35 PM
rampitec added a comment to D69096: [AMDGPU][MC][GFX10] Added sdwa/dpp versions of v_cndmask_b32.

Still missing foreach around dpp.

Thu, Oct 17, 12:17 PM
rampitec added inline comments to D69096: [AMDGPU][MC][GFX10] Added sdwa/dpp versions of v_cndmask_b32.
Thu, Oct 17, 9:27 AM
rampitec accepted D69095: [AMDGPU][MC][GFX9] Corrected parsing of v_cndmask_b32_sdwa.

LGTM

Thu, Oct 17, 9:18 AM
rampitec added a comment to D68873: [AMDGPU] Amend target loop unroll defaults.

In addition, it may be unrelated to the threshold altogether. It may be a flaw in the cost model for specific instructions. Please also see D68881, which started to address cost model issues.

Thu, Oct 17, 9:09 AM · Restricted Project
rampitec added a comment to D68873: [AMDGPU] Amend target loop unroll defaults.

I disagree with the idea of having different thresholds based on the runtime. The runtime has nothing to do with it. For example, compute can work on top of ROCm or PAL. Can you justify different results for the same programs?

Thu, Oct 17, 9:08 AM · Restricted Project

Wed, Oct 16

rampitec committed rGedcd5815ced6: [AMDGPU] Do not combine dpp mov reading physregs (authored by rampitec).
[AMDGPU] Do not combine dpp mov reading physregs
Wed, Oct 16, 12:31 PM
rampitec committed rL375033: [AMDGPU] Do not combine dpp mov reading physregs.
[AMDGPU] Do not combine dpp mov reading physregs
Wed, Oct 16, 12:31 PM
rampitec closed D69065: [AMDGPU] Do not combine dpp mov reading physregs.
Wed, Oct 16, 12:31 PM · Restricted Project
rampitec created D69065: [AMDGPU] Do not combine dpp mov reading physregs.
Wed, Oct 16, 12:21 PM · Restricted Project
rampitec committed rG3d99310c15e4: [AMDGPU] Do not combine dpp with physreg def (authored by rampitec).
[AMDGPU] Do not combine dpp with physreg def
Wed, Oct 16, 11:53 AM
rampitec closed D69063: [AMDGPU] Do not combine dpp with physreg def.
Wed, Oct 16, 11:53 AM · Restricted Project
rampitec committed rL375030: [AMDGPU] Do not combine dpp with physreg def.
[AMDGPU] Do not combine dpp with physreg def
Wed, Oct 16, 11:53 AM
rampitec added inline comments to D69063: [AMDGPU] Do not combine dpp with physreg def.
Wed, Oct 16, 11:52 AM · Restricted Project
rampitec created D69063: [AMDGPU] Do not combine dpp with physreg def.
Wed, Oct 16, 11:33 AM · Restricted Project
rampitec committed rGd4ab74ee0b37: [AMDGPU] Supress unused sdwa insts generation (authored by rampitec).
[AMDGPU] Supress unused sdwa insts generation
Wed, Oct 16, 10:00 AM
rampitec closed D69010: [AMDGPU] Supress unused sdwa insts generation.
Wed, Oct 16, 10:00 AM · Restricted Project
rampitec committed rL375016: [AMDGPU] Supress unused sdwa insts generation.
[AMDGPU] Supress unused sdwa insts generation
Wed, Oct 16, 10:00 AM
rampitec added a comment to D69010: [AMDGPU] Supress unused sdwa insts generation.

I haven't looked into the set of instructions that are affected by this, but the TableGen LGTM.

Wed, Oct 16, 8:18 AM · Restricted Project

Tue, Oct 15

rampitec updated the diff for D69010: [AMDGPU] Supress unused sdwa insts generation.

Added VOPC. This has erased 175 instructions in total.

Tue, Oct 15, 5:08 PM · Restricted Project
rampitec created D69010: [AMDGPU] Supress unused sdwa insts generation.
Tue, Oct 15, 3:37 PM · Restricted Project
rampitec accepted D68970: AMDGPU: Fix infinite searches in SIFixSGPRCopies.

LGTM

Tue, Oct 15, 11:17 AM · Restricted Project
rampitec committed rG1184c27fa586: [AMDGPU] Support mov dpp with 64 bit operands (authored by rampitec).
[AMDGPU] Support mov dpp with 64 bit operands
Tue, Oct 15, 9:43 AM
rampitec committed rL374910: [AMDGPU] Support mov dpp with 64 bit operands.
[AMDGPU] Support mov dpp with 64 bit operands
Tue, Oct 15, 9:43 AM
rampitec closed D68673: [AMDGPU] Support mov dpp with 64 bit operands.
Tue, Oct 15, 9:42 AM · Restricted Project
rampitec committed rG6e8599d93979: [AMDGPU] Allow DPP combiner to work with REG_SEQUENCE (authored by rampitec).
[AMDGPU] Allow DPP combiner to work with REG_SEQUENCE
Tue, Oct 15, 9:24 AM
rampitec committed rL374908: [AMDGPU] Allow DPP combiner to work with REG_SEQUENCE.
[AMDGPU] Allow DPP combiner to work with REG_SEQUENCE
Tue, Oct 15, 9:24 AM
rampitec closed D68828: [AMDGPU] Allow DPP combiner to work with REG_SEQUENCE.
Tue, Oct 15, 9:24 AM · Restricted Project
rampitec added inline comments to D68970: AMDGPU: Fix infinite searches in SIFixSGPRCopies.
Tue, Oct 15, 1:58 AM · Restricted Project

Mon, Oct 14

rampitec updated the diff for D68673: [AMDGPU] Support mov dpp with 64 bit operands.

Rebased.

Mon, Oct 14, 1:25 PM · Restricted Project
rampitec updated the diff for D68828: [AMDGPU] Allow DPP combiner to work with REG_SEQUENCE.

Full accounting for undefs.

Mon, Oct 14, 12:47 PM · Restricted Project
rampitec updated the diff for D68673: [AMDGPU] Support mov dpp with 64 bit operands.

Addressed comments.

Mon, Oct 14, 10:52 AM · Restricted Project
rampitec updated the diff for D68828: [AMDGPU] Allow DPP combiner to work with REG_SEQUENCE.

Removed dpp subreg handling, a subreg cannot be defined in SSA.

Mon, Oct 14, 10:34 AM · Restricted Project
rampitec added inline comments to D68828: [AMDGPU] Allow DPP combiner to work with REG_SEQUENCE.
Mon, Oct 14, 10:24 AM · Restricted Project
rampitec added a reviewer for D68673: [AMDGPU] Support mov dpp with 64 bit operands: vpykhtin.
Mon, Oct 14, 9:47 AM · Restricted Project

Fri, Oct 11

rampitec added inline comments to D68828: [AMDGPU] Allow DPP combiner to work with REG_SEQUENCE.
Fri, Oct 11, 4:14 PM · Restricted Project
rampitec updated the diff for D68673: [AMDGPU] Support mov dpp with 64 bit operands.

Rebased.
Removed special handling of gfx10, it uses the same pseudo now.

Fri, Oct 11, 3:56 PM · Restricted Project
rampitec updated the diff for D68828: [AMDGPU] Allow DPP combiner to work with REG_SEQUENCE.

Rebased.

Fri, Oct 11, 3:38 PM · Restricted Project
rampitec committed rGf87fe45d5c3f: [AMDGPU] Use GCN prefix in dpp_combine.mir. NFC. (authored by rampitec).
[AMDGPU] Use GCN prefix in dpp_combine.mir. NFC.
Fri, Oct 11, 3:29 PM
rampitec committed rL374607: [AMDGPU] Use GCN prefix in dpp_combine.mir. NFC.
[AMDGPU] Use GCN prefix in dpp_combine.mir. NFC.
Fri, Oct 11, 3:29 PM
rampitec committed rGe2d104f64ca8: [AMDGPU] link dpp pseudos and real instructions on gfx10 (authored by rampitec).
[AMDGPU] link dpp pseudos and real instructions on gfx10
Fri, Oct 11, 3:10 PM
rampitec closed D68888: [AMDGPU] link dpp pseudos and real instructions on gfx10.
Fri, Oct 11, 3:10 PM · Restricted Project
rampitec accepted D68894: AMDGPU: Increase vcc liveness scan threshold.

LGTM

Fri, Oct 11, 3:10 PM
rampitec committed rL374604: [AMDGPU] link dpp pseudos and real instructions on gfx10.
[AMDGPU] link dpp pseudos and real instructions on gfx10
Fri, Oct 11, 3:01 PM
rampitec updated the diff for D68888: [AMDGPU] link dpp pseudos and real instructions on gfx10.

Added test.

Fri, Oct 11, 2:50 PM · Restricted Project
rampitec created D68888: [AMDGPU] link dpp pseudos and real instructions on gfx10.
Fri, Oct 11, 2:04 PM · Restricted Project
rampitec requested changes to D68873: [AMDGPU] Amend target loop unroll defaults.

How big was the performance testing?

Fri, Oct 11, 11:10 AM · Restricted Project

Thu, Oct 10

rampitec added a comment to D68338: [AMDGPU] Remove dubious logic in bidirectional list scheduler.

Reviewers: any advice on handling lots of test updates like this? I could pre-commit some of the tests, where I've made them strictly more lenient. I could also add -enable-misched=false to any tests that aren't specifically testing the scheduler, update them and pre-commit that, in order to protect them from this and future scheduler tweaks.

Thu, Oct 10, 6:17 PM · Restricted Project
rampitec added a reviewer for D68673: [AMDGPU] Support mov dpp with 64 bit operands: mjbedy.
Thu, Oct 10, 5:49 PM · Restricted Project
rampitec added a reviewer for D68828: [AMDGPU] Allow DPP combiner to work with REG_SEQUENCE: mjbedy.
Thu, Oct 10, 5:49 PM · Restricted Project
rampitec added a parent revision for D68673: [AMDGPU] Support mov dpp with 64 bit operands: D68828: [AMDGPU] Allow DPP combiner to work with REG_SEQUENCE.
Thu, Oct 10, 5:31 PM · Restricted Project
rampitec updated the diff for D68673: [AMDGPU] Support mov dpp with 64 bit operands.

GCNDPPCombiner can split the new pseudo and then handle the split.
Post-RA split is needed anyway since combining is an optimization.
Tests are updated to handle case w/o optimization.

Thu, Oct 10, 5:31 PM · Restricted Project
rampitec added a child revision for D68828: [AMDGPU] Allow DPP combiner to work with REG_SEQUENCE: D68673: [AMDGPU] Support mov dpp with 64 bit operands.
Thu, Oct 10, 5:31 PM · Restricted Project
rampitec updated the diff for D68828: [AMDGPU] Allow DPP combiner to work with REG_SEQUENCE.

Addressed comments.

Thu, Oct 10, 3:58 PM · Restricted Project
rampitec added inline comments to D68828: [AMDGPU] Allow DPP combiner to work with REG_SEQUENCE.
Thu, Oct 10, 3:58 PM · Restricted Project
rampitec added inline comments to D68828: [AMDGPU] Allow DPP combiner to work with REG_SEQUENCE.
Thu, Oct 10, 3:30 PM · Restricted Project
rampitec committed rG19a1a739b15d: [AMDGPU] Handle undef old operand in DPP combine (authored by rampitec).
[AMDGPU] Handle undef old operand in DPP combine
Thu, Oct 10, 2:34 PM
rampitec closed D68813: [AMDGPU] Handle undef old operand in DPP combine.
Thu, Oct 10, 2:34 PM · Restricted Project
rampitec committed rL374455: [AMDGPU] Handle undef old operand in DPP combine.
[AMDGPU] Handle undef old operand in DPP combine
Thu, Oct 10, 2:34 PM
rampitec added a parent revision for D68828: [AMDGPU] Allow DPP combiner to work with REG_SEQUENCE: D68813: [AMDGPU] Handle undef old operand in DPP combine.
Thu, Oct 10, 1:27 PM · Restricted Project
rampitec added a child revision for D68813: [AMDGPU] Handle undef old operand in DPP combine: D68828: [AMDGPU] Allow DPP combiner to work with REG_SEQUENCE.
Thu, Oct 10, 1:26 PM · Restricted Project
rampitec created D68828: [AMDGPU] Allow DPP combiner to work with REG_SEQUENCE.
Thu, Oct 10, 1:25 PM · Restricted Project
rampitec accepted D68826: AMDGPU: Fix redundant setting of m0 for atomic load/store.

LGTM

Thu, Oct 10, 1:07 PM
rampitec accepted D68821: AMDGPU: Relax 32-bit SGPR register class.

LGTM

Thu, Oct 10, 11:41 AM
rampitec added a comment to D68338: [AMDGPU] Remove dubious logic in bidirectional list scheduler.

Given the numbers I tend to agree with the change.

Thu, Oct 10, 10:42 AM · Restricted Project
rampitec created D68813: [AMDGPU] Handle undef old operand in DPP combine.
Thu, Oct 10, 9:47 AM · Restricted Project
rampitec accepted D68788: [AMDGPU][MC][GFX6][GFX7][GFX10] Added instructions buffer_atomic_[fcmpswap/fmin/fmax]*.

LGTM

Thu, Oct 10, 9:38 AM · Restricted Project
rampitec committed rGcbe55c7caf4c: [AMDGPU] Fixed dpp_combine.mir with expensive checks. NFC. (authored by rampitec).
[AMDGPU] Fixed dpp_combine.mir with expensive checks. NFC.
Thu, Oct 10, 8:31 AM
rampitec committed rL374365: [AMDGPU] Fixed dpp_combine.mir with expensive checks. NFC.
[AMDGPU] Fixed dpp_combine.mir with expensive checks. NFC.
Thu, Oct 10, 8:31 AM
rampitec added inline comments to D68788: [AMDGPU][MC][GFX6][GFX7][GFX10] Added instructions buffer_atomic_[fcmpswap/fmin/fmax]*.
Thu, Oct 10, 8:12 AM · Restricted Project
rampitec accepted D68787: [AMDGPU][MC][GFX9][GFX10] Corrected number of src operands for ds_[read/write]_addtid_b32.

LGTM

Thu, Oct 10, 8:03 AM · Restricted Project
rampitec accepted D68785: [AMDGPU][MC][GFX10] Enabled null for 64-bit dst operands.

LGTM, thanks!

Thu, Oct 10, 8:03 AM · Restricted Project

Wed, Oct 9

rampitec accepted D68742: AMDGPU: Use SGPR_128 instead of SReg_128 for vregs.

LGTM

Wed, Oct 9, 4:49 PM
rampitec committed rGc6dec1d8288c: [AMDGPU] Fixed dpp combine of VOP1 (authored by rampitec).
[AMDGPU] Fixed dpp combine of VOP1
Wed, Oct 9, 3:05 PM
rampitec closed D68729: [AMDGPU] Fixed dpp combine of VOP1.
Wed, Oct 9, 3:04 PM · Restricted Project
rampitec committed rL374241: [AMDGPU] Fixed dpp combine of VOP1.
[AMDGPU] Fixed dpp combine of VOP1
Wed, Oct 9, 3:04 PM
rampitec accepted D68735: AMDGPU: Don't fold copies to physregs.

LGTM

Wed, Oct 9, 3:04 PM
rampitec updated the diff for D68729: [AMDGPU] Fixed dpp combine of VOP1.

Updated test.

Wed, Oct 9, 2:52 PM · Restricted Project
rampitec added inline comments to D68729: [AMDGPU] Fixed dpp combine of VOP1.
Wed, Oct 9, 2:52 PM · Restricted Project
rampitec added inline comments to D68729: [AMDGPU] Fixed dpp combine of VOP1.
Wed, Oct 9, 2:25 PM · Restricted Project
rampitec created D68729: [AMDGPU] Fixed dpp combine of VOP1.
Wed, Oct 9, 2:12 PM · Restricted Project
rampitec accepted D68635: [AMDGPU] Come back patch for the 'Assign register class for cross block values according to the divergence.'.

LGTM

Wed, Oct 9, 11:26 AM · Restricted Project

Tue, Oct 8

rampitec created D68673: [AMDGPU] Support mov dpp with 64 bit operands.
Tue, Oct 8, 5:05 PM · Restricted Project
rampitec added inline comments to D68635: [AMDGPU] Come back patch for the 'Assign register class for cross block values according to the divergence.'.
Tue, Oct 8, 10:06 AM · Restricted Project
rampitec committed rG8f002193bf49: [AMDGPU] Disable unused gfx10 dpp instructions (authored by rampitec).
[AMDGPU] Disable unused gfx10 dpp instructions
Tue, Oct 8, 9:58 AM
rampitec closed D68607: [AMDGPU] Disable unused gfx10 dpp instructions.
Tue, Oct 8, 9:57 AM · Restricted Project
rampitec committed rL374083: [AMDGPU] Disable unused gfx10 dpp instructions.
[AMDGPU] Disable unused gfx10 dpp instructions
Tue, Oct 8, 9:57 AM

Mon, Oct 7

rampitec created D68607: [AMDGPU] Disable unused gfx10 dpp instructions.
Mon, Oct 7, 4:07 PM · Restricted Project
rampitec accepted D68583: AMDGPU: Fix i16 arithmetic pattern redundancy.

LGTM

Mon, Oct 7, 10:35 AM

Thu, Oct 3

rampitec accepted D68409: AMDGPU/GlobalISel: Support wave32 waterfall loops.

LGTM

Thu, Oct 3, 11:51 AM
rampitec added a comment to D68338: [AMDGPU] Remove dubious logic in bidirectional list scheduler.

I tend to agree with the reasoning. We can only judge after benchmarking though. You say you see more gains than losses. Can you elaborate on the magnitude of those?

I benchmarked on gfx10 with 324 shaders from 14 different games, which I don't think I can share publicly.
10 of them slowed down by more than 2%. The worst slow-down was 8.5%.
38 of them sped up by more than 2%. The best speed-up was 45%.

In addition to the overall speed-up, my hope is that it will behave a bit more consistently, because it won't be able to get stuck in the odd state where a good candidate is consistently ignored.

Thu, Oct 3, 11:44 AM · Restricted Project

Wed, Oct 2

rampitec committed rG1384c3a5b896: [AMDGPU] Fix illegal agpr use by VALU (authored by rampitec).
[AMDGPU] Fix illegal agpr use by VALU
Wed, Oct 2, 4:23 PM
rampitec committed rL373544: [AMDGPU] Fix illegal agpr use by VALU.
[AMDGPU] Fix illegal agpr use by VALU
Wed, Oct 2, 4:21 PM