Page MenuHomePhabricator

rampitec (Stanislav Mekhanoshin)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 4 2014, 4:14 AM (338 w, 2 d)

Recent Activity

Thu, Sep 24

rampitec committed rG43804364e2bc: [AMDGPU] Fixes typo in the test. NFC. (authored by rampitec).
[AMDGPU] Fixes typo in the test. NFC.
Thu, Sep 24, 4:08 PM
rampitec accepted D88211: [AMDGPU][MC] Added detection of unsupported instructions.

LGTM

Thu, Sep 24, 11:34 AM · Restricted Project
rampitec committed rG27a62f6317f3: [AMDGPU] global-isel support for RT (authored by rampitec).
[AMDGPU] global-isel support for RT
Thu, Sep 24, 10:30 AM
rampitec closed D87847: [AMDGPU] global-isel support for RT.
Thu, Sep 24, 10:30 AM · Restricted Project

Wed, Sep 23

rampitec committed rG59691dc8740c: [AMDGPU] Make ds fp atomics overloadable (authored by rampitec).
[AMDGPU] Make ds fp atomics overloadable
Wed, Sep 23, 11:40 AM
rampitec closed D87947: [AMDGPU] Make ds fp atomics overloadable.
Wed, Sep 23, 11:40 AM · Restricted Project, Restricted Project

Tue, Sep 22

rampitec accepted D87882: [AMDGPU] Fix merging m0 inits.

OK, thanks Alex.

Tue, Sep 22, 12:45 PM · Restricted Project
rampitec added a reviewer for D87882: [AMDGPU] Fix merging m0 inits: alex-t.
Tue, Sep 22, 10:19 AM · Restricted Project
rampitec accepted D87864: AMDGPU: Check global FP atomics match default FP mode.

LGTM

Tue, Sep 22, 9:56 AM · Restricted Project
rampitec accepted D87748: [AMDGPU] Consider all SGPR uses as unique in constant bus verify.

LGTM

Tue, Sep 22, 9:43 AM · Restricted Project
rampitec added a comment to D87882: [AMDGPU] Fix merging m0 inits.
bb.1:
  ..
  TO_inst   (inits m0)
  ...       (uses m0)
  FROM_inst (clobbers m0)
  ...
  S_CBRANCH_VCCZ %bb.1, undef
  S_BRANCH %bb.2

I guess I am missing something here, the dominance usually refers to blocks so it can be confusing if applied for instructions.

In this case I expect MDT.dominates(FROM_inst, TO_inst) to return false, because it is possible for a program invocation to execute TO_inst, without executing FROM_inst first (e.g. single loop iteration).

Anyway, the spirit of the code I am editing was that before my patch it was assumed that if FROM_inst comes after TO_inst in a basic block (as in the case described above) then there was no path from FROM_inst to TO_inst, so TO_inst could be removed. This is not true for loops, where FROM_inst will clober m0 used in the next iteration.

Tue, Sep 22, 9:40 AM · Restricted Project

Mon, Sep 21

rampitec updated the diff for D87847: [AMDGPU] global-isel support for RT.

Added legalization. Code is expectedly worse though.

Mon, Sep 21, 3:08 PM · Restricted Project
rampitec accepted D88028: [AMDGPU] More codegen patterns for v2i16/v2f16 build_vector.

LGTM. Looks like Matt has no concerns about Global ISel as well.

Mon, Sep 21, 12:26 PM · Restricted Project
rampitec added a reviewer for D87947: [AMDGPU] Make ds fp atomics overloadable: yaxunl.
Mon, Sep 21, 12:09 PM · Restricted Project, Restricted Project
rampitec added a comment to D87972: [OldPM] Pass manager: run SROA after (simple) loop unrolling.

This is obviously LGTM from the AMDGPU BE point of view, we did it ourselves.

Mon, Sep 21, 12:01 PM · Restricted Project, Restricted Project
rampitec committed rGe8951474b194: [AMDGPU] Fixed typo in intrinsic comment. NFC. (authored by rampitec).
[AMDGPU] Fixed typo in intrinsic comment. NFC.
Mon, Sep 21, 11:53 AM
rampitec added a comment to D87882: [AMDGPU] Fix merging m0 inits.

Doesn't loop block self dominate?

You mean, why the condition "MDT.dominates(From, To)" returns false if From and To are in the same BB? Inside that function there is a bb dominance check if the instructions are in different blocks, but if the instructions are in the same block the code checks whether From comes before To.

The bug I am trying to fix is simplified in the test m0-in-loop-0 where before my patch SI_INIT from line 333 was incorrectly removed. Although "m0 = COPY" comes after SI_INIT in that bb, this is a loop and in the second iteration DS_WRITE would see a clobbered m0, so SI_INIT has to be preserved.

That sounds like a bug in the dominate() to me.

MDT.dominates() gives correct answer. In the failing case "From" (m0 copy, line 335 in the test) is inside the same block as "To" (SI_INIT, line 333). "From" comes after "To", and "From" does not dominate "To" (so MDT.dominates() return false as expected), but "To" is reachable from "From" from previous iteration, so "To" cannot be removed.

Mon, Sep 21, 11:51 AM · Restricted Project
rampitec added a reviewer for D87882: [AMDGPU] Fix merging m0 inits: resistor.
Mon, Sep 21, 11:47 AM · Restricted Project
rampitec added a comment to D88028: [AMDGPU] More codegen patterns for v2i16/v2f16 build_vector.

Can this appear later in the codegen? It also does not cover global isel, so part in the operand folding probably needs to remain in addition to patterns.

Mon, Sep 21, 9:59 AM · Restricted Project

Fri, Sep 18

rampitec added inline comments to D87947: [AMDGPU] Make ds fp atomics overloadable.
Fri, Sep 18, 4:20 PM · Restricted Project, Restricted Project
rampitec added inline comments to D87947: [AMDGPU] Make ds fp atomics overloadable.
Fri, Sep 18, 4:13 PM · Restricted Project, Restricted Project
rampitec added inline comments to D87947: [AMDGPU] Make ds fp atomics overloadable.
Fri, Sep 18, 4:09 PM · Restricted Project, Restricted Project
rampitec requested review of D87947: [AMDGPU] Make ds fp atomics overloadable.
Fri, Sep 18, 2:43 PM · Restricted Project, Restricted Project
rampitec added a comment to D87882: [AMDGPU] Fix merging m0 inits.

Doesn't loop block self dominate?

You mean, why the condition "MDT.dominates(From, To)" returns false if From and To are in the same BB? Inside that function there is a bb dominance check if the instructions are in different blocks, but if the instructions are in the same block the code checks whether From comes before To.

The bug I am trying to fix is simplified in the test m0-in-loop-0 where before my patch SI_INIT from line 333 was incorrectly removed. Although "m0 = COPY" comes after SI_INIT in that bb, this is a loop and in the second iteration DS_WRITE would see a clobbered m0, so SI_INIT has to be preserved.

Fri, Sep 18, 2:00 PM · Restricted Project
rampitec accepted D87934: AMDGPU: Don't add frame register to frame pseudos.
Fri, Sep 18, 1:09 PM · Restricted Project
rampitec added a comment to D87882: [AMDGPU] Fix merging m0 inits.

Doesn't loop block self dominate?

Fri, Sep 18, 10:32 AM · Restricted Project
rampitec accepted D87542: AMDGPU: Don't sometimes allow instructions before lowered si_end_cf.
Fri, Sep 18, 10:30 AM · Restricted Project
rampitec added a comment to D87864: AMDGPU: Check global FP atomics match default FP mode.

I think you need to drop denorm checks and move the check outside of the address space check.

But the DS atomics do respect the denormal mode

Fri, Sep 18, 8:56 AM · Restricted Project
rampitec added a comment to D87864: AMDGPU: Check global FP atomics match default FP mode.

I think you need to drop denorm checks and move the check outside of the address space check.

Fri, Sep 18, 8:25 AM · Restricted Project

Thu, Sep 17

rampitec added inline comments to D87864: AMDGPU: Check global FP atomics match default FP mode.
Thu, Sep 17, 10:16 PM · Restricted Project
rampitec added inline comments to D87847: [AMDGPU] global-isel support for RT.
Thu, Sep 17, 6:00 PM · Restricted Project
rampitec updated the diff for D87847: [AMDGPU] global-isel support for RT.

Actually it can be done with a singe v_perm_b32.

Thu, Sep 17, 5:43 PM · Restricted Project
rampitec updated the diff for D87847: [AMDGPU] global-isel support for RT.

Replaced v_pack with two instructions.

Thu, Sep 17, 5:20 PM · Restricted Project
rampitec updated the diff for D87847: [AMDGPU] global-isel support for RT.
Thu, Sep 17, 4:50 PM · Restricted Project
rampitec added inline comments to D87847: [AMDGPU] global-isel support for RT.
Thu, Sep 17, 4:36 PM · Restricted Project
rampitec added inline comments to D87847: [AMDGPU] global-isel support for RT.
Thu, Sep 17, 4:32 PM · Restricted Project
rampitec added inline comments to D87864: AMDGPU: Check global FP atomics match default FP mode.
Thu, Sep 17, 4:19 PM · Restricted Project
rampitec added inline comments to D87847: [AMDGPU] global-isel support for RT.
Thu, Sep 17, 3:56 PM · Restricted Project
rampitec added inline comments to D87864: AMDGPU: Check global FP atomics match default FP mode.
Thu, Sep 17, 3:48 PM · Restricted Project
rampitec added a reviewer for D87847: [AMDGPU] global-isel support for RT: arsenm.
Thu, Sep 17, 11:50 AM · Restricted Project
rampitec added a comment to D87782: [AMDGPU] gfx1030 RT support.

Are you planning on adding the globalisel support?

Thu, Sep 17, 11:50 AM · Restricted Project
rampitec requested review of D87847: [AMDGPU] global-isel support for RT.
Thu, Sep 17, 11:49 AM · Restricted Project

Wed, Sep 16

rampitec committed rGa45cdb311f6e: [AMDGPU] gfx1030 test update. NFC. (authored by rampitec).
[AMDGPU] gfx1030 test update. NFC.
Wed, Sep 16, 2:00 PM
rampitec added a comment to D87757: [SplitKit] Only copy live lanes.

@rampitec this patch causes CodeGen/AMDGPU/splitkit-copy-bundle.mir to fail because it no longer generates any bundles. Could you help find a way to update the test case? I have pushed this patch to github in case you want to pull from there: https://github.com/jayfoad/llvm-project/tree/only-copy-live-lanes

There is very high chance this test in unfixable with your changes as these prevent the very situation where the KIILs were formed. I'd say you may just generate checks there. At the end this code snippet did not compile and asserted, so we need to make sure it compiles. Bundles themselves are just an implementation detail.

How about this? I dropped the ASM RUN line because it's awkward to use update_llc_test_checks and update_mir_test_checks in the same file.

Wed, Sep 16, 12:45 PM · Restricted Project
rampitec added a comment to D87782: [AMDGPU] gfx1030 RT support.

Are you planning on adding the globalisel support?

Wed, Sep 16, 11:42 AM · Restricted Project
rampitec committed rG91f503c3af19: [AMDGPU] gfx1030 RT support (authored by rampitec).
[AMDGPU] gfx1030 RT support
Wed, Sep 16, 11:41 AM
rampitec closed D87782: [AMDGPU] gfx1030 RT support.
Wed, Sep 16, 11:41 AM · Restricted Project
rampitec requested review of D87782: [AMDGPU] gfx1030 RT support.
Wed, Sep 16, 11:35 AM · Restricted Project
rampitec added a comment to D87757: [SplitKit] Only copy live lanes.

@rampitec this patch causes CodeGen/AMDGPU/splitkit-copy-bundle.mir to fail because it no longer generates any bundles. Could you help find a way to update the test case? I have pushed this patch to github in case you want to pull from there: https://github.com/jayfoad/llvm-project/tree/only-copy-live-lanes

Wed, Sep 16, 10:34 AM · Restricted Project
rampitec accepted D87472: AMDGPU: Clear offset register when using local stack area.
Wed, Sep 16, 8:32 AM · Restricted Project
rampitec added inline comments to D87472: AMDGPU: Clear offset register when using local stack area.
Wed, Sep 16, 8:30 AM · Restricted Project

Tue, Sep 15

rampitec committed rG277de43d88c9: [AMDGPU] Unify intrinsic ret/nortn interface (authored by rampitec).
[AMDGPU] Unify intrinsic ret/nortn interface
Tue, Sep 15, 3:37 PM
rampitec closed D87719: [AMDGPU] Unify intrinsic ret/nortn interface.
Tue, Sep 15, 3:36 PM · Restricted Project
rampitec added a reviewer for D87719: [AMDGPU] Unify intrinsic ret/nortn interface: kzhuravl.
Tue, Sep 15, 2:13 PM · Restricted Project
rampitec requested review of D87719: [AMDGPU] Unify intrinsic ret/nortn interface.
Tue, Sep 15, 1:18 PM · Restricted Project
rampitec accepted D87716: AMDGPU: Improve <2 x i24> arguments and return value handling.

LGTM

Tue, Sep 15, 1:14 PM · Restricted Project
rampitec accepted D87709: InferAddressSpaces: Fix assert with unreachable code.

LGTM

Tue, Sep 15, 11:24 AM · Restricted Project
rampitec accepted D87618: [AMDGPU] should expand ROTL i16 to shifts..

LGTM

Tue, Sep 15, 10:57 AM · Restricted Project
rampitec added a comment to D87585: [AMDGPU] Dynamically clear renamable to avoid constant bus errors.

The numbers for this change are not vastly compelling.
I looked at 11598 game shaders and compiled these for GFX7, GFX9 and GFX10.
On GFX7, 1 shader lost 1 instruction.
On GFX9, 1 shader lost 1 instruction, but 64 shaders gained 1 instruction.
On GFX10, 1 shader lost 1 instruction, but 2 shaders gained 1 instruction.

I started this change as I am looking at moving WQM after MI scheduling, and this potentially leaves some additional copies around.
But I could addressed these with very limited special case copy elimination in the WQM pass itself.

Does anyone have an opinion on whether I should continue pushing this?

Tue, Sep 15, 8:06 AM · Restricted Project

Mon, Sep 14

rampitec accepted D87556: [amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel..

LGTM, but please also wait for Matt's review.

Mon, Sep 14, 11:28 AM · Restricted Project
rampitec accepted D87621: [AMDGPU] Add XDL resource to scheduling model.

LGTM

Mon, Sep 14, 11:19 AM · Restricted Project
rampitec added inline comments to D87585: [AMDGPU] Dynamically clear renamable to avoid constant bus errors.
Mon, Sep 14, 10:29 AM · Restricted Project
rampitec added a comment to D87618: [AMDGPU] should expand ROTL i16 to shifts..

Needs test.

Mon, Sep 14, 10:09 AM · Restricted Project

Wed, Sep 9

rampitec accepted D87419: AMDGPU: Skip all meta instructions in hazard recognizer.
Wed, Sep 9, 3:31 PM · Restricted Project
rampitec accepted D87398: AMDGPU: Fix inserting waitcnts before kill uses.
Wed, Sep 9, 10:48 AM · Restricted Project
rampitec accepted D87158: [AMDGPU] Fix for folding v2.16 literals..

OK, thanks!

Wed, Sep 9, 9:58 AM · Restricted Project

Tue, Sep 8

rampitec added inline comments to D87158: [AMDGPU] Fix for folding v2.16 literals..
Tue, Sep 8, 12:06 PM · Restricted Project
rampitec added a comment to D87107: [AMDGPU] Target hook to apply target specific split constraint.

The idea is:

For the block that is queried

  1. Look for it's predecessors that can pass control through the S_EXECZ/EXECNZ
  2. If found one, look for exec restoring code starting the beginning of the block being queried.
  3. Since exec restoring code always belong to the block prologue, search the prologue and if not found return false.

Considering your comment that exec == 0 does not matter, we'd rather search upwards before the immediate dominator block in encountered to check what we met first - exec modify or exec restore. The problem here is that XOR can be both.

Tue, Sep 8, 10:28 AM · Restricted Project
rampitec added inline comments to D87158: [AMDGPU] Fix for folding v2.16 literals..
Tue, Sep 8, 10:09 AM · Restricted Project
rampitec accepted D87198: [AMDGPU] Correct gfx1031 XNACK setting documentation.

LGTM

Tue, Sep 8, 9:47 AM · Restricted Project

Fri, Sep 4

rampitec added a comment to D87107: [AMDGPU] Target hook to apply target specific split constraint.

Also I still think that disabling a whole "endif" block is an overkill.

It only is disabled if S_OR_B64 exec, ... is in the middle of the block that should never happen.

while (isBasicBlockPrologue(*J)) {
  if (IsExecRestore(&*J))
    return true;

assumes that if exec is restored in the block prologue it is valid

So practically it never happens and split is effectively only disabled in an empty block? I said it already: it does not matter that exec is zero, what matters is that it does not match. It does not matter that a block is empty as well, it is enough to split before s_or to hit the bug.

Let's forget about exec == 0...
The code snippet above looks for exec restoring instruction in the block prologue and return true if it exists. So, false positive can only happen if there is no exec restoring instruction in the beginning of the block but it is placed in the middle of the block. In this case I report the block is invalid for splitting despite it is valid to split after the exec restoring instruction in the middle of the block. Do we really have exec restored in the middle of the block?

Fri, Sep 4, 2:13 PM · Restricted Project
rampitec added inline comments to D87158: [AMDGPU] Fix for folding v2.16 literals..
Fri, Sep 4, 2:09 PM · Restricted Project
rampitec added a comment to D87107: [AMDGPU] Target hook to apply target specific split constraint.

Also I still think that disabling a whole "endif" block is an overkill.

It only is disabled if S_OR_B64 exec, ... is in the middle of the block that should never happen.

while (isBasicBlockPrologue(*J)) {
  if (IsExecRestore(&*J))
    return true;

assumes that if exec is restored in the block prologue it is valid

Fri, Sep 4, 12:23 PM · Restricted Project
rampitec added a comment to D87107: [AMDGPU] Target hook to apply target specific split constraint.

I still do not believe the problem is specific to EXEC = 0 case. In fact the problem could occur with any EXEC value, it is sufficient to have it different from what is expected. That is not valid to insert a split into any block before exec is restored, not just if previous value was zero.

Fri, Sep 4, 10:55 AM · Restricted Project

Thu, Sep 3

rampitec added a comment to D87107: [AMDGPU] Target hook to apply target specific split constraint.

I do not think it shall be at a block level. You can allow splitting after all exec modifications are done. For example:

Thu, Sep 3, 12:47 PM · Restricted Project

Wed, Sep 2

rampitec accepted D87039: AMDGPU: Remove code to handle tied si_else operands.

LGTM, thanks. I never really understood why are we doing this copy.

Wed, Sep 2, 10:09 AM · Restricted Project

Tue, Sep 1

rampitec accepted D84252: [amdgpu] Run SROA after loop unrolling..

LGTM with a nit: please run opt -instnamer on the test before submission.

Tue, Sep 1, 12:54 PM · Restricted Project
rampitec added inline comments to D84252: [amdgpu] Run SROA after loop unrolling..
Tue, Sep 1, 12:28 PM · Restricted Project
rampitec accepted D86940: [AMDGPU][MC] Corrected parser to avoid excessive error messages.

LGTM

Tue, Sep 1, 11:05 AM · Restricted Project
rampitec accepted D86938: [AMDGPU] Fix offset for REL32_HI relocs.

LGTM even if we need to rework it as Matt suggests.

Tue, Sep 1, 10:51 AM · Restricted Project
rampitec accepted D86634: [AMDGPU] SILowerControlFlow::optimizeEndCF should remove empty basic block.

LGTM, but please run PSDB before submission. This stuff is quite nontrivial.

Tue, Sep 1, 10:46 AM · Restricted Project

Mon, Aug 31

rampitec added a comment to D86634: [AMDGPU] SILowerControlFlow::optimizeEndCF should remove empty basic block.

The only difference is that now these redundant branch is inserted by MachineBasicBlock::updateTerminator() as Matt suggested.

Mon, Aug 31, 1:46 PM · Restricted Project
rampitec added a comment to D86833: [AMDGPU] AMDGPUAAResult::pointsToConstantMemory should not use the default MaxLookup (i.e., 6) to limit getUnderlyingObject.

Unlimited is really too aggressive. It can slow down compilation dramatically in some cases.
Also it would be nice to see a relevant test.

Mon, Aug 31, 11:43 AM · Restricted Project
rampitec accepted D86884: AMDGPU: Convert test to MIR.
Mon, Aug 31, 11:40 AM · Restricted Project

Aug 28 2020

rampitec added inline comments to D86634: [AMDGPU] SILowerControlFlow::optimizeEndCF should remove empty basic block.
Aug 28 2020, 2:01 PM · Restricted Project
rampitec accepted D86806: AMDGPU: Fix incorrectly deleting copies after spilling SGPR tuples.
Aug 28 2020, 1:18 PM · Restricted Project
rampitec added inline comments to D86634: [AMDGPU] SILowerControlFlow::optimizeEndCF should remove empty basic block.
Aug 28 2020, 11:39 AM · Restricted Project

Aug 27 2020

rampitec accepted D86738: AMDGPU/GlobalISel: Implement computeKnownBits for groupstaticsize.
Aug 27 2020, 3:08 PM · Restricted Project
rampitec added inline comments to D86634: [AMDGPU] SILowerControlFlow::optimizeEndCF should remove empty basic block.
Aug 27 2020, 12:57 PM · Restricted Project
rampitec added a reviewer for D86684: [Refactor] Add the SchedHeuristic for Scheduler to allow platform customizing the heuristics: kerbowa.
Aug 27 2020, 10:28 AM · Restricted Project
rampitec accepted D86717: AMDGPU: Use caller subtarget, not intrinsic declaration.

LGTM

Aug 27 2020, 10:25 AM · Restricted Project

Aug 26 2020

rampitec added inline comments to D86634: [AMDGPU] SILowerControlFlow::optimizeEndCF should remove empty basic block.
Aug 26 2020, 10:25 AM · Restricted Project
rampitec accepted D86627: AMDGPU: Don't assert on misaligned DS read2/write2 offsets.
Aug 26 2020, 9:46 AM · Restricted Project

Aug 25 2020

rampitec committed rGb7760c3e5d00: [AMDGPU] Remove unsound dependency on ISA version in waitcnt (authored by rampitec).
[AMDGPU] Remove unsound dependency on ISA version in waitcnt
Aug 25 2020, 2:02 PM
rampitec closed D86566: [AMDGPU] Remove unsound dependency on ISA version in waitcnt.
Aug 25 2020, 2:02 PM · Restricted Project
rampitec committed rG817c831f023a: [AMDGPU] Switch to named simm16 in vscnt insertion (authored by rampitec).
[AMDGPU] Switch to named simm16 in vscnt insertion
Aug 25 2020, 1:05 PM
rampitec closed D86568: [AMDGPU] Switch to named simm16 in vscnt insertion.
Aug 25 2020, 1:05 PM · Restricted Project
rampitec requested review of D86568: [AMDGPU] Switch to named simm16 in vscnt insertion.
Aug 25 2020, 12:15 PM · Restricted Project
rampitec requested review of D86566: [AMDGPU] Remove unsound dependency on ISA version in waitcnt.
Aug 25 2020, 12:01 PM · Restricted Project

Aug 24 2020

rampitec accepted D85846: AMDGPU/GlobalISel: Handle AGPRs used for SGPR operands..
Aug 24 2020, 12:12 PM · Restricted Project