rampitec (Stanislav Mekhanoshin)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 4 2014, 4:14 AM (181 w, 1 d)

Recent Activity

Yesterday

rampitec accepted D38166: AMDGPU: Select d16 loads into low component of register.

LGTM

Fri, Sep 22, 7:58 AM

Wed, Sep 20

rampitec accepted D38116: AMDGPU: Add option to stress calls.

LGTM

Wed, Sep 20, 11:02 PM
rampitec accepted D38103: AMDGPU: Fix crash on immediate operand.

LGTM

Wed, Sep 20, 4:26 PM
rampitec added a comment to D37817: [AMDGPU] SDWA: add support for PRESERVE into SDWA peephole. Add new merge SDWA preserve pass.

In any case, independent of the subregister questions, I think this would be better done in the existing pass by adding variants with tied operands. This is how I am currently handling this problem in D38070/D38071 for mad_mix.

Wed, Sep 20, 11:52 AM
rampitec accepted D38071: AMDGPU: Start selecting v_mad_mixhi_f16.

LGTM

Wed, Sep 20, 11:50 AM
rampitec accepted D38070: AMDGPU: Add tied operands to v_mad_mix{lo|hi}_f16.

LGTM

Wed, Sep 20, 11:38 AM
rampitec accepted D38069: AMDGPU: Start selecting v_mad_mixlo_f16.

LGTM

Wed, Sep 20, 11:34 AM

Tue, Sep 19

rampitec committed rL313723: [AMDGPU] Fixed memory leak with inliner replaced.
Tue, Sep 19, 11:36 PM
rampitec committed rL313718: [AMDGPU] Fix regression in test clang/test/CodeGen/backend-unsupported-error.ll.
Tue, Sep 19, 11:11 PM
rampitec committed rL313714: [AMDGPU] Port of HSAIL inliner.
Tue, Sep 19, 9:27 PM
rampitec closed D36849: [AMDGPU] Port of HSAIL inliner by committing rL313714: [AMDGPU] Port of HSAIL inliner.
Tue, Sep 19, 9:27 PM
rampitec updated the diff for D36849: [AMDGPU] Port of HSAIL inliner.

Rebase to master.

Tue, Sep 19, 9:23 PM
rampitec committed rL313670: [AMDGPU] Prevent post-RA scheduler from breaking memory clauses.
Tue, Sep 19, 1:56 PM
rampitec closed D38014: [AMDGPU] Prevent post-RA scheduler from breaking memory clauses by committing rL313670: [AMDGPU] Prevent post-RA scheduler from breaking memory clauses.
Tue, Sep 19, 1:56 PM
rampitec updated the diff for D38014: [AMDGPU] Prevent post-RA scheduler from breaking memory clauses.

Added comment.

Tue, Sep 19, 1:40 PM

Mon, Sep 18

rampitec updated the diff for D38014: [AMDGPU] Prevent post-RA scheduler from breaking memory clauses.

Extracted the struct from getPostRAMutations().

Mon, Sep 18, 10:25 PM
rampitec accepted D37887: AMDGPU: Run internalize symbols at -O0.

LGTM

Mon, Sep 18, 10:23 PM
rampitec created D38014: [AMDGPU] Prevent post-RA scheduler from breaking memory clauses.
Mon, Sep 18, 5:38 PM
rampitec added inline comments to D37887: AMDGPU: Run internalize symbols at -O0.
Mon, Sep 18, 5:19 PM
rampitec added a comment to D37887: AMDGPU: Run internalize symbols at -O0.

Can you please restore foo_used and just fix the calling convention? This case is not covered now.

That's the new func_used I added, which is the same thing.

Mon, Sep 18, 5:04 PM
rampitec added a comment to D37887: AMDGPU: Run internalize symbols at -O0.

Can you please restore foo_used and just fix the calling convention? This case is not covered now.

Mon, Sep 18, 4:58 PM
rampitec added inline comments to D37887: AMDGPU: Run internalize symbols at -O0.
Mon, Sep 18, 4:43 PM
rampitec added inline comments to D37887: AMDGPU: Run internalize symbols at -O0.
Mon, Sep 18, 4:16 PM
rampitec added inline comments to D37887: AMDGPU: Run internalize symbols at -O0.
Mon, Sep 18, 3:01 PM
rampitec added inline comments to D37887: AMDGPU: Run internalize symbols at -O0.
Mon, Sep 18, 2:44 PM
rampitec added inline comments to D37887: AMDGPU: Run internalize symbols at -O0.
Mon, Sep 18, 2:30 PM
rampitec accepted D37985: [AMDGPU] add LDS f32 intrinsics.

LGTM

Mon, Sep 18, 11:23 AM
rampitec added inline comments to D37887: AMDGPU: Run internalize symbols at -O0.
Mon, Sep 18, 11:23 AM

Fri, Sep 15

rampitec added a comment to D37918: AMDGPU: Don't run redundant GlobalDCE.

It is not redundant. It is here to get rid of unused library functions before we start optimizing them.

Fri, Sep 15, 12:19 PM
rampitec added inline comments to D36849: [AMDGPU] Port of HSAIL inliner.
Fri, Sep 15, 11:23 AM
rampitec updated the diff for D36849: [AMDGPU] Port of HSAIL inliner.

Removed MaxBB limit.

Fri, Sep 15, 11:23 AM
rampitec accepted D37605: AMDGPU: Match load d16 hi instructions.

LGTM

Fri, Sep 15, 11:10 AM
rampitec added inline comments to D37887: AMDGPU: Run internalize symbols at -O0.
Fri, Sep 15, 11:05 AM

Thu, Sep 14

rampitec accepted D37857: AMDGPU: Fix violating constant bus restriction.

LGTM

Thu, Sep 14, 1:51 PM
rampitec accepted D37836: AMDGPU: Make frame register caller preserved.

LGTM

Thu, Sep 14, 9:59 AM
rampitec accepted D37839: AMDGPU: Stop modifying SP in call sequences.

LGTM

Thu, Sep 14, 8:49 AM
rampitec added a comment to D37836: AMDGPU: Make frame register caller preserved.

Please add a comment to the source explaining why it was needed.

Thu, Sep 14, 8:48 AM

Wed, Sep 13

rampitec added inline comments to D37817: [AMDGPU] SDWA: add support for PRESERVE into SDWA peephole. Add new merge SDWA preserve pass.
Wed, Sep 13, 3:40 PM
rampitec added a comment to D37817: [AMDGPU] SDWA: add support for PRESERVE into SDWA peephole. Add new merge SDWA preserve pass.

I think we need to decide on an overall strategy for dealing with instructions that only partially update the registers. GFX9 really complicated this issue by changing new instructions to preserve the high bits, and by adding a control bit to some instructions to change the high-bit behavior.

I started dealing with this to add the d16 loads and stores. We can still do this SSA by adding variants of the instructions with tied operands that preserve one half of the instructions, which would probably be less painful than adding another post-SSA pass that needs to deal with liveness. One issue is we still get suboptimal regalloc in some cases, so I'm debating adding new 16-bit subregister classes so subrange liveness tracking works.

Wed, Sep 13, 3:25 PM
rampitec committed rL313208: Allow target to decide when to cluster loads/stores in misched.
Wed, Sep 13, 3:22 PM
rampitec closed D37698: Allow target to decide when to cluster loads/stores in misched by committing rL313208: Allow target to decide when to cluster loads/stores in misched.
Wed, Sep 13, 3:22 PM
rampitec accepted D37780: AMDGPU: Don't spill SP reg like a normal CSR.

LGTM

Wed, Sep 13, 3:17 PM
rampitec updated the diff for D37698: Allow target to decide when to cluster loads/stores in misched.

Changed to let shouldClusterMemOps decide as discussed.

Wed, Sep 13, 2:39 PM
rampitec planned changes to D37698: Allow target to decide when to cluster loads/stores in misched.

As an alternative I can just drop this portion of code from BaseMemOpClusterMutation::clusterNeighboringMemOps() completely:

if (MemOpRecords[Idx].BaseReg != MemOpRecords[Idx+1].BaseReg) {
  ClusterLength = 1;
  continue;
}

and let TII->shouldClusterMemOps() decide. Maybe it is better. Thoughts?

Indeed, that sounds like the best/easiest solution so far!
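
The alternative discussed above can be sketched outside of LLVM as follows. This is only an illustration under stated assumptions: MemOpRecord, ShouldClusterFn and clusterNeighbors are hypothetical stand-ins, not the actual MachineScheduler types or the real TII->shouldClusterMemOps() signature. The point it shows is that, once the generic BaseReg comparison is dropped, the target callback alone decides whether two neighboring memory operations are clustered.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the scheduler's per-instruction record.
struct MemOpRecord {
  unsigned BaseReg; // base register of the address
  long Offset;      // byte offset from the base
};

// Hypothetical stand-in for the target hook: it sees both memory ops and the
// current cluster length, and makes the whole clustering decision.
using ShouldClusterFn =
    std::function<bool(const MemOpRecord &, const MemOpRecord &, unsigned)>;

// Returns the (Idx, Idx + 1) pairs of neighbors the target chose to cluster.
std::vector<std::pair<std::size_t, std::size_t>>
clusterNeighbors(const std::vector<MemOpRecord> &Records,
                 const ShouldClusterFn &ShouldCluster) {
  std::vector<std::pair<std::size_t, std::size_t>> Edges;
  unsigned ClusterLength = 1;
  for (std::size_t Idx = 0; Idx + 1 < Records.size(); ++Idx) {
    // No generic BaseReg comparison here: the hook alone decides.
    if (ShouldCluster(Records[Idx], Records[Idx + 1], ClusterLength)) {
      Edges.emplace_back(Idx, Idx + 1);
      ++ClusterLength;
    } else {
      ClusterLength = 1; // start a new cluster at the next record
    }
  }
  return Edges;
}
```

With this shape, a target that wants the old behavior can still compare BaseReg inside its own callback, while a target that knows two different base registers address the same object can return true.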

Wed, Sep 13, 1:57 PM
rampitec abandoned D37755: Change default implementation of doMemOpsHaveSameBasePtr.
Wed, Sep 13, 11:51 AM
rampitec added a comment to D37698: Allow target to decide when to cluster loads/stores in misched.

When seeing the name doMemOpsHaveSameBasePtr I would expect this to do exactly BaseReg1 == BaseReg2. Can you explain how you happen to have cases with different base pointers that still reference the same object?

Wed, Sep 13, 11:50 AM
rampitec added a comment to D37755: Change default implementation of doMemOpsHaveSameBasePtr.

Please see my comments in D37698. If the primary purpose of this on AArch64 is to enable the AArch64LoadStoreOptimizer, I'm not sure that this makes sense.

Wed, Sep 13, 9:06 AM
rampitec added inline comments to D37698: Allow target to decide when to cluster loads/stores in misched.
Wed, Sep 13, 8:57 AM

Tue, Sep 12

rampitec added inline comments to D36849: [AMDGPU] Port of HSAIL inliner.
Tue, Sep 12, 9:40 PM
rampitec added inline comments to D37698: Allow target to decide when to cluster loads/stores in misched.
Tue, Sep 12, 12:12 PM
rampitec added a dependency for D37755: Change default implementation of doMemOpsHaveSameBasePtr: D37698: Allow target to decide when to cluster loads/stores in misched.
Tue, Sep 12, 12:11 PM
rampitec added a dependent revision for D37698: Allow target to decide when to cluster loads/stores in misched: D37755: Change default implementation of doMemOpsHaveSameBasePtr.
Tue, Sep 12, 12:11 PM
rampitec created D37755: Change default implementation of doMemOpsHaveSameBasePtr.
Tue, Sep 12, 12:10 PM
rampitec added a comment to D37698: Allow target to decide when to cluster loads/stores in misched.

SIInstrInfo::doMemOpsHaveSameBasePtr does look generic enough to be the default, but should we commit this first and then make doMemOpsHaveSameBasePtr the default, so we could roll back to this one in case of severe regressions? LGTM, by the way.

Tue, Sep 12, 9:21 AM

Mon, Sep 11

rampitec added inline comments to D37698: Allow target to decide when to cluster loads/stores in misched.
Mon, Sep 11, 9:33 PM
rampitec added inline comments to D37698: Allow target to decide when to cluster loads/stores in misched.
Mon, Sep 11, 8:44 PM
rampitec added a comment to D36849: [AMDGPU] Port of HSAIL inliner.

There should be an explanation of what this pass does and why it is better than LLVM's default inliner and some benchmark data showing which applications / games this helps.

Mon, Sep 11, 11:55 AM
rampitec updated the diff for D36849: [AMDGPU] Port of HSAIL inliner.

Added file brief and more comments for thresholds.

Mon, Sep 11, 11:52 AM
rampitec added inline comments to D36849: [AMDGPU] Port of HSAIL inliner.
Mon, Sep 11, 11:32 AM
rampitec accepted D37701: AMDGPU: Allow coldcc calls.

LGTM

Mon, Sep 11, 11:19 AM
rampitec added inline comments to D36849: [AMDGPU] Port of HSAIL inliner.
Mon, Sep 11, 11:14 AM
rampitec updated the diff for D37698: Allow target to decide when to cluster loads/stores in misched.

Updated comment as suggested.

Mon, Sep 11, 11:12 AM
rampitec updated the diff for D37698: Allow target to decide when to cluster loads/stores in misched.

Renamed the callback as suggested by Brian.
Cleaned up the test.

Mon, Sep 11, 10:54 AM
rampitec created D37698: Allow target to decide when to cluster loads/stores in misched.
Mon, Sep 11, 10:30 AM
rampitec committed rL312928: [AMDGPU] Produce madak and madmk from the two-address pass.
Mon, Sep 11, 10:15 AM
rampitec closed D37389: [AMDGPU] Produce madak and madmk from the two-address pass by committing rL312928: [AMDGPU] Produce madak and madmk from the two-address pass.
Mon, Sep 11, 10:15 AM
rampitec added a comment to D37389: [AMDGPU] Produce madak and madmk from the two-address pass.

LGTM with the minor cleanup

Mon, Sep 11, 9:47 AM

Fri, Sep 8

rampitec accepted D37595: AMDGPU: Recompute scc liveness.

LGTM

Fri, Sep 8, 11:47 AM

Thu, Sep 7

rampitec updated the diff for D37389: [AMDGPU] Produce madak and madmk from the two-address pass.

Added f16 case.

Thu, Sep 7, 3:52 PM
rampitec added inline comments to D37389: [AMDGPU] Produce madak and madmk from the two-address pass.
Thu, Sep 7, 3:30 PM
rampitec added a comment to D37595: AMDGPU: Recompute scc liveness.

It should be caught by the operands analysis below and added only if needed. Do you have a test showing the problem?

Yes, but it's pretty big. It seems to only remove one s_or_b64 in it, though.

Thu, Sep 7, 3:30 PM
rampitec added a comment to D37595: AMDGPU: Recompute scc liveness.

It should be caught by the operands analysis below and added only if needed. Do you have a test showing the problem?

Thu, Sep 7, 2:58 PM

Wed, Sep 6

rampitec added inline comments to D37389: [AMDGPU] Produce madak and madmk from the two-address pass.
Wed, Sep 6, 7:30 PM
rampitec updated the diff for D37389: [AMDGPU] Produce madak and madmk from the two-address pass.
Wed, Sep 6, 7:30 PM
rampitec committed rL312676: [AMDGPU] Use v_pk_max_f16 for fcanonicalize.
Wed, Sep 6, 3:29 PM
rampitec closed D37325: [AMDGPU] Use v_pk_max_f16 for fcanonicalize by committing rL312676: [AMDGPU] Use v_pk_max_f16 for fcanonicalize.
Wed, Sep 6, 3:28 PM
rampitec added inline comments to D37389: [AMDGPU] Produce madak and madmk from the two-address pass.
Wed, Sep 6, 3:26 PM
rampitec abandoned D37535: [AMDGPU] Fix legalization of VOP3P.

In fact, none of this is needed. We just cannot produce such instructions, and the testcase was artificial, relying on a bogus td file change.

Wed, Sep 6, 3:17 PM
rampitec planned changes to D37535: [AMDGPU] Fix legalization of VOP3P.

Actually, it seems I have misread the documentation. VOP3P still cannot have literals other than inline constants.
I will remove the part about the legalization, but need to keep the isOperandLegal part.

Wed, Sep 6, 2:50 PM
rampitec created D37535: [AMDGPU] Fix legalization of VOP3P.
Wed, Sep 6, 2:18 PM
rampitec added inline comments to D37522: [AMDGPU] Fixed encoding of v_pk_mul_f16 in fcanonicalize.
Wed, Sep 6, 2:07 PM
rampitec updated the diff for D37325: [AMDGPU] Use v_pk_max_f16 for fcanonicalize.

Rebased.

Wed, Sep 6, 11:54 AM
rampitec committed rL312660: [AMDGPU] Fixed encoding of v_pk_mul_f16 in fcanonicalize.
Wed, Sep 6, 11:31 AM
rampitec closed D37522: [AMDGPU] Fixed encoding of v_pk_mul_f16 in fcanonicalize by committing rL312660: [AMDGPU] Fixed encoding of v_pk_mul_f16 in fcanonicalize.
Wed, Sep 6, 11:31 AM
rampitec added inline comments to D37522: [AMDGPU] Fixed encoding of v_pk_mul_f16 in fcanonicalize.
Wed, Sep 6, 11:29 AM
rampitec added inline comments to D37325: [AMDGPU] Use v_pk_max_f16 for fcanonicalize.
Wed, Sep 6, 11:00 AM
rampitec retitled D37325: [AMDGPU] Use v_pk_max_f16 for fcanonicalize from [AMDGPU] Use v_pm_max_f16 for fcanonicalize to [AMDGPU] Use v_pk_max_f16 for fcanonicalize.
Wed, Sep 6, 11:00 AM
rampitec created D37522: [AMDGPU] Fixed encoding of v_pk_mul_f16 in fcanonicalize.
Wed, Sep 6, 10:59 AM
rampitec added inline comments to D37325: [AMDGPU] Use v_pk_max_f16 for fcanonicalize.
Wed, Sep 6, 8:37 AM
rampitec committed rL312640: [AMDGPU] Fix shouldClusterMemOps to process flat loads.
Wed, Sep 6, 8:33 AM
rampitec closed D37502: [AMDGPU] Fix shouldClusterMemOps to process flat loads by committing rL312640: [AMDGPU] Fix shouldClusterMemOps to process flat loads.
Wed, Sep 6, 8:33 AM
rampitec added a comment to D37502: [AMDGPU] Fix shouldClusterMemOps to process flat loads.

LGTM. However, it looks like we should fix the TD files to follow a single naming convention.

Wed, Sep 6, 8:10 AM

Tue, Sep 5

rampitec created D37502: [AMDGPU] Fix shouldClusterMemOps to process flat loads.
Tue, Sep 5, 9:43 PM
rampitec accepted D36831: [AMDGPU] Transform __read_pipe_* and __write_pipe_*.

Thanks!

Tue, Sep 5, 2:32 PM
rampitec added inline comments to D36831: [AMDGPU] Transform __read_pipe_* and __write_pipe_*.
Tue, Sep 5, 2:09 PM
rampitec added inline comments to D36831: [AMDGPU] Transform __read_pipe_* and __write_pipe_*.
Tue, Sep 5, 2:00 PM
rampitec added inline comments to D37325: [AMDGPU] Use v_pk_max_f16 for fcanonicalize.
Tue, Sep 5, 11:01 AM
rampitec accepted D37486: AMDGPU: Cleanup load/store PatFrags.

LGTM

Tue, Sep 5, 10:49 AM
rampitec accepted D37485: AMDGPU: Match store d16_hi instructions.

LGTM

Tue, Sep 5, 10:48 AM
rampitec accepted D37411: AMDGPU: Fix not accounting for tail call resource usage.

LGTM

Tue, Sep 5, 10:46 AM