Page MenuHomePhabricator

rampitec (Stanislav Mekhanoshin)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 4 2014, 4:14 AM (254 w, 5 d)

Recent Activity

Thu, Feb 14

rampitec committed rG871821f78680: [AMDGPU] Ressociate 'add (add x, y), z' to use SALU (authored by rampitec).
[AMDGPU] Ressociate 'add (add x, y), z' to use SALU
Thu, Feb 14, 2:11 PM
rampitec committed rL354066: [AMDGPU] Ressociate 'add (add x, y), z' to use SALU.
[AMDGPU] Ressociate 'add (add x, y), z' to use SALU
Thu, Feb 14, 2:11 PM
rampitec closed D58220: [AMDGPU] Ressociate 'add (add x, y), z' to use SALU.
Thu, Feb 14, 2:11 PM · Restricted Project
rampitec added a reviewer for D58220: [AMDGPU] Ressociate 'add (add x, y), z' to use SALU: arsenm.
Thu, Feb 14, 12:58 PM · Restricted Project
rampitec updated the diff for D58220: [AMDGPU] Ressociate 'add (add x, y), z' to use SALU.

Added v2i32 test.

Thu, Feb 14, 11:46 AM · Restricted Project
rampitec updated the diff for D58220: [AMDGPU] Ressociate 'add (add x, y), z' to use SALU.

Moved VT check right into function.

Thu, Feb 14, 11:31 AM · Restricted Project
rampitec updated the diff for D58220: [AMDGPU] Ressociate 'add (add x, y), z' to use SALU.

Factored out reassociateScalarOps().

Thu, Feb 14, 11:15 AM · Restricted Project

Wed, Feb 13

rampitec added inline comments to D58220: [AMDGPU] Ressociate 'add (add x, y), z' to use SALU.
Wed, Feb 13, 9:15 PM · Restricted Project
rampitec updated the diff for D58220: [AMDGPU] Ressociate 'add (add x, y), z' to use SALU.

Removed flags.
Added gfx9 run line.

Wed, Feb 13, 9:14 PM · Restricted Project
rampitec added inline comments to D58220: [AMDGPU] Ressociate 'add (add x, y), z' to use SALU.
Wed, Feb 13, 5:45 PM · Restricted Project
rampitec created D58220: [AMDGPU] Ressociate 'add (add x, y), z' to use SALU.
Wed, Feb 13, 4:51 PM · Restricted Project

Tue, Feb 12

rampitec accepted D58151: AMDGPU: Ignore CodeObjectV3 when inlining.

LGTM

Tue, Feb 12, 3:23 PM
rampitec accepted D58143: AMDGPU: Try to use function specific ST.

LGTM

Tue, Feb 12, 12:32 PM

Mon, Feb 11

rampitec added a comment to D55301: RegAlloc: Allow targets to split register allocation.

ping

Mon, Feb 11, 9:58 AM

Fri, Feb 8

rampitec committed rG344968fdb4d9: [AMDGPU] Split idot4/8 signed and unsigned tests. NFC. (authored by rampitec).
[AMDGPU] Split idot4/8 signed and unsigned tests. NFC.
Fri, Feb 8, 5:03 PM
rampitec committed rL353593: [AMDGPU] Split idot4/8 signed and unsigned tests. NFC..
[AMDGPU] Split idot4/8 signed and unsigned tests. NFC.
Fri, Feb 8, 5:02 PM
rampitec committed rG0e858b028db6: [AMDGPU] Split dot-insts feature (authored by rampitec).
[AMDGPU] Split dot-insts feature
Fri, Feb 8, 4:36 PM
rampitec committed rG1607a37308cc: [AMDGPU] Split dot-insts feature (authored by rampitec).
[AMDGPU] Split dot-insts feature
Fri, Feb 8, 4:36 PM
rampitec committed rC353588: [AMDGPU] Split dot-insts feature.
[AMDGPU] Split dot-insts feature
Fri, Feb 8, 4:36 PM
rampitec committed rL353588: [AMDGPU] Split dot-insts feature.
[AMDGPU] Split dot-insts feature
Fri, Feb 8, 4:36 PM
rampitec closed D57972: [AMDGPU] Split dot-insts feature.
Fri, Feb 8, 4:36 PM · Restricted Project
rampitec committed rL353587: [AMDGPU] Split dot-insts feature.
[AMDGPU] Split dot-insts feature
Fri, Feb 8, 4:36 PM
rampitec closed D57971: [AMDGPU] Split dot-insts feature.
Fri, Feb 8, 4:35 PM · Restricted Project
rampitec updated the diff for D57971: [AMDGPU] Split dot-insts feature.
Fri, Feb 8, 12:38 PM · Restricted Project
rampitec created D57972: [AMDGPU] Split dot-insts feature.
Fri, Feb 8, 12:34 PM · Restricted Project
rampitec created D57971: [AMDGPU] Split dot-insts feature.
Fri, Feb 8, 12:32 PM · Restricted Project
rampitec accepted D57886: AMDGPU: Eliminate GPU specific SubtargetFeatures.

LGTM

Fri, Feb 8, 11:22 AM
rampitec accepted D57885: AMDGPU: Remove GCN features and predicates.

LGTM

Fri, Feb 8, 9:57 AM

Thu, Feb 7

rampitec accepted D57889: [AMDGPU][MC] Added support of lds_direct operand.

LGTM

Thu, Feb 7, 11:36 AM · Restricted Project

Mon, Feb 4

rampitec accepted D57708: AMDGPU: Don't rematerialize mov with implicit operands.

LGTM

Mon, Feb 4, 1:11 PM
rampitec added inline comments to D57708: AMDGPU: Don't rematerialize mov with implicit operands.
Mon, Feb 4, 1:08 PM
rampitec accepted D57707: Add Triple::isAMDGPU.

LGTM

Mon, Feb 4, 12:30 PM

Mon, Jan 28

rampitec accepted D57349: AMDGPU: Add ds append/consume builtins.

LGTM

Mon, Jan 28, 12:01 PM
rampitec accepted D57340: AMDGPU: Add DS append/consume intrinsics.

LGTM

Mon, Jan 28, 11:51 AM
rampitec added inline comments to D57340: AMDGPU: Add DS append/consume intrinsics.
Mon, Jan 28, 10:06 AM

Tue, Jan 22

rampitec accepted D57008: [AMDGPU] With XNACK, cannot clause a load with result coalesced with operand.

LGTM

Tue, Jan 22, 10:03 AM
rampitec accepted D53966: Codegen support for atomicrmw fadd/fsub.

LGTM

Tue, Jan 22, 10:01 AM

Jan 21 2019

rampitec committed rL351759: [AMDGPU] Fixed hazard recognizer to walk predecessors.
[AMDGPU] Fixed hazard recognizer to walk predecessors
Jan 21 2019, 11:11 AM
rampitec closed D56923: [AMDGPU] Fixed hazard recognizer to walk predecessors.
Jan 21 2019, 11:11 AM
rampitec added inline comments to D57008: [AMDGPU] With XNACK, cannot clause a load with result coalesced with operand.
Jan 21 2019, 10:05 AM
rampitec added inline comments to D57008: [AMDGPU] With XNACK, cannot clause a load with result coalesced with operand.
Jan 21 2019, 10:03 AM

Jan 19 2019

rampitec added inline comments to D56923: [AMDGPU] Fixed hazard recognizer to walk predecessors.
Jan 19 2019, 12:18 AM

Jan 18 2019

rampitec updated the diff for D56923: [AMDGPU] Fixed hazard recognizer to walk predecessors.

Addressed review comments.

Jan 18 2019, 12:38 PM
rampitec added inline comments to D56923: [AMDGPU] Fixed hazard recognizer to walk predecessors.
Jan 18 2019, 12:20 PM
rampitec created D56923: [AMDGPU] Fixed hazard recognizer to walk predecessors.
Jan 18 2019, 11:10 AM

Jan 17 2019

rampitec accepted D56847: [AMDGPU][MC] Disabled use of 2 different literals with SOP2/SOPC instructions.

LGTM

Jan 17 2019, 10:14 AM

Jan 16 2019

rampitec accepted D56454: AMDGPU: Adjust the chain for loads writing to the HI part of a register..

LGTM

Jan 16 2019, 1:27 PM
rampitec added inline comments to D56454: AMDGPU: Adjust the chain for loads writing to the HI part of a register..
Jan 16 2019, 1:06 PM

Jan 15 2019

rampitec accepted D56745: AMDGPU: Raise the priority of MAD24 in instruction selection..

LGTM

Jan 15 2019, 3:09 PM
rampitec added inline comments to D56745: AMDGPU: Raise the priority of MAD24 in instruction selection..
Jan 15 2019, 2:50 PM

Jan 9 2019

rampitec committed rC350794: [AMDGPU] Separate feature dot-insts.
[AMDGPU] Separate feature dot-insts
Jan 9 2019, 7:29 PM
rampitec committed rL350794: [AMDGPU] Separate feature dot-insts.
[AMDGPU] Separate feature dot-insts
Jan 9 2019, 7:29 PM
rampitec closed D56525: [AMDGPU] Separate feature dot-insts.
Jan 9 2019, 7:29 PM
rampitec committed rL350793: [AMDGPU] Separate feature dot-insts.
[AMDGPU] Separate feature dot-insts
Jan 9 2019, 7:29 PM
rampitec closed D56524: [AMDGPU] Separate feature dot-insts.
Jan 9 2019, 7:29 PM
rampitec created D56525: [AMDGPU] Separate feature dot-insts.
Jan 9 2019, 5:00 PM
rampitec created D56524: [AMDGPU] Separate feature dot-insts.
Jan 9 2019, 5:00 PM

Jan 8 2019

rampitec committed rL350684: Remove check for single use in ShrinkDemandedConstant.
Remove check for single use in ShrinkDemandedConstant
Jan 8 2019, 6:28 PM
rampitec closed D56406: Remove check for single use in ShrinkDemandedConstant.
Jan 8 2019, 6:28 PM
rampitec added a comment to D56454: AMDGPU: Adjust the chain for loads writing to the HI part of a register..

Can you add a test where low half does not produce a chain? An arithmetic operation and an undef.

Jan 8 2019, 2:37 PM
rampitec created D56451: [AMDGPU] Expand fdot2 intrinsic if unsupported.
Jan 8 2019, 1:11 PM
rampitec updated the diff for D56406: Remove check for single use in ShrinkDemandedConstant.

Removed call to ShrinkDemandedConstant completely.

Jan 8 2019, 9:59 AM
rampitec added a comment to D56406: Remove check for single use in ShrinkDemandedConstant.

Doesn't SimplifyDemandedBits already call ShrinkDemandedConstant for AND/OR/XOR? And without an override of targetShrinkDemandedConstant, ShrinkDemandedConstant only works on those 3 nodes. So is an explicit call to ShrinkDemandedConstant even needed here?

Jan 8 2019, 9:56 AM
rampitec added a comment to D56406: Remove check for single use in ShrinkDemandedConstant.

Something seems wrong here. How/why is this valid on AArch64 in some situation? A test is really needed for whatever this regression is

Jan 8 2019, 8:23 AM

Jan 7 2019

rampitec added inline comments to D56291: ScheduleDAG: Don't break the dependence in clustering neighboring loads..
Jan 7 2019, 10:57 PM
rampitec added a comment to D56289: Added single use check to ShrinkDemandedConstant.

I am seeing a regression with this change on AArch64, where we do not use shrinked constants, e.g. for and/or if they have multiple users. I am not entirely sure what the problem is this patch is solving, as I am not familiar with AMDGPU. Is it a case where shrinking the constant leads to shrinking the related Op and in the end this introduces unnecessary conversion code, as multiple users require different bitwidths?

If that is the case, I am not sure if this is the best place to fix the issue. IIUC, shrinkDemandedBits just shrinks the constant of the passed in Op, but not the width of the result type of Op. So it seems to me that shrinking the constant should be independent of the users of Op.

Here is DAG exposing the problem:

t33: i32 = or t42, Constant:i32<-2147483647>
        t37: f32 = bitcast t33
        t66: f32 = CVT_F32_UBYTE0 t33
      t38: f32 = fadd t37, t66

The node t66 only needs a low byte from the t33 "or" node. The other use of "or" is bitcast and subsequently t38 fadd which needs full dword.
The "Op" argument of ShrinkDemandedConstant() is t33 "or" and Demanded = 0xff as needed for t66.

If optimization succeeds it will replace "or t42, 0x80000001" with "or t42, 1". However, this "or" has another use which needs the full range of the value.
That is a bug not to take other uses into account.

It is unfortunate this lead to regression on AArch64 side. Are you sure a similar issue can never happen with AArch64?
Theoretically that is possible to move a single use check past targetShrinkDemandedConstant() call which actually do the job for AArch I presume.
I think it should correct lack of optimization in that specific case, but I am not sure there will be no correctness issues elsewhere. Here is what I am speaking about:

--- a/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -350,13 +350,13 @@ bool TargetLowering::ShrinkDemandedConstant(SDValue Op, const APInt &Demanded,
   SDLoc DL(Op);
   unsigned Opcode = Op.getOpcode();

-  if (!Op.hasOneUse())
-    return false;
-
   // Do target-specific constant optimization.
   if (targetShrinkDemandedConstant(Op, Demanded, TLO))
     return TLO.New.getNode();

+  if (!Op.hasOneUse())
+    return false;
+
   // FIXME: ISD::SELECT, ISD::SELECT_CC
   switch (Opcode) {
   default:

In fact the bit width of all operations is the same. It just t66 only known to read low byte from its operand. I believe it worth nothing t66 is custom AMDGPU node.
Consider t66 is a shift left by 24:

t33: i32 = or t42, Constant:i32<-2147483647>
        t66: i32 = shl t33, 24
        t38: i32 = add t33, t66

There is nothing target specific here. t66 also only known to read low 8 bits of t33 and that is not unreasonable to call ShrinkDemandedConstant on t33 with Demanded = 0xff. It will expose the same bug regardless of target.

Jan 7 2019, 1:14 PM
rampitec created D56406: Remove check for single use in ShrinkDemandedConstant.
Jan 7 2019, 1:12 PM
rampitec added a comment to D56289: Added single use check to ShrinkDemandedConstant.

I am seeing a regression with this change on AArch64, where we do not use shrinked constants, e.g. for and/or if they have multiple users. I am not entirely sure what the problem is this patch is solving, as I am not familiar with AMDGPU. Is it a case where shrinking the constant leads to shrinking the related Op and in the end this introduces unnecessary conversion code, as multiple users require different bitwidths?

If that is the case, I am not sure if this is the best place to fix the issue. IIUC, shrinkDemandedBits just shrinks the constant of the passed in Op, but not the width of the result type of Op. So it seems to me that shrinking the constant should be independent of the users of Op.

Here is DAG exposing the problem:

t33: i32 = or t42, Constant:i32<-2147483647>
        t37: f32 = bitcast t33
        t66: f32 = CVT_F32_UBYTE0 t33
      t38: f32 = fadd t37, t66

The node t66 only needs a low byte from the t33 "or" node. The other use of "or" is bitcast and subsequently t38 fadd which needs full dword.
The "Op" argument of ShrinkDemandedConstant() is t33 "or" and Demanded = 0xff as needed for t66.

If optimization succeeds it will replace "or t42, 0x80000001" with "or t42, 1". However, this "or" has another use which needs the full range of the value.
That is a bug not to take other uses into account.

It is unfortunate this lead to regression on AArch64 side. Are you sure a similar issue can never happen with AArch64?
Theoretically that is possible to move a single use check past targetShrinkDemandedConstant() call which actually do the job for AArch I presume.
I think it should correct lack of optimization in that specific case, but I am not sure there will be no correctness issues elsewhere. Here is what I am speaking about:

--- a/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -350,13 +350,13 @@ bool TargetLowering::ShrinkDemandedConstant(SDValue Op, const APInt &Demanded,
   SDLoc DL(Op);
   unsigned Opcode = Op.getOpcode();

-  if (!Op.hasOneUse())
-    return false;
-
   // Do target-specific constant optimization.
   if (targetShrinkDemandedConstant(Op, Demanded, TLO))
     return TLO.New.getNode();

+  if (!Op.hasOneUse())
+    return false;
+
   // FIXME: ISD::SELECT, ISD::SELECT_CC
   switch (Opcode) {
   default:
Jan 7 2019, 10:39 AM
rampitec added a comment to D56289: Added single use check to ShrinkDemandedConstant.

I am seeing a regression with this change on AArch64, where we do not use shrinked constants, e.g. for and/or if they have multiple users. I am not entirely sure what the problem is this patch is solving, as I am not familiar with AMDGPU. Is it a case where shrinking the constant leads to shrinking the related Op and in the end this introduces unnecessary conversion code, as multiple users require different bitwidths?

If that is the case, I am not sure if this is the best place to fix the issue. IIUC, shrinkDemandedBits just shrinks the constant of the passed in Op, but not the width of the result type of Op. So it seems to me that shrinking the constant should be independent of the users of Op.

Jan 7 2019, 10:26 AM

Jan 6 2019

rampitec accepted D55985: AMDGPU: Remove VS/SV mappings from select.

LGTM

Jan 6 2019, 10:54 AM

Jan 5 2019

rampitec committed rL350475: Added single use check to ShrinkDemandedConstant.
Added single use check to ShrinkDemandedConstant
Jan 5 2019, 11:23 AM
rampitec closed D56289: Added single use check to ShrinkDemandedConstant.
Jan 5 2019, 11:23 AM

Jan 4 2019

rampitec added inline comments to D56291: ScheduleDAG: Don't break the dependence in clustering neighboring loads..
Jan 4 2019, 2:20 PM
rampitec updated the diff for D56289: Added single use check to ShrinkDemandedConstant.

Moved single use check into ShrinkDemandedConstant.

Jan 4 2019, 12:28 PM
rampitec added a comment to D56289: Added single use check to ShrinkDemandedConstant.

Probably should try fixing ShrinkDemandedConstant first

Jan 4 2019, 12:27 PM
rampitec added inline comments to D56291: ScheduleDAG: Don't break the dependence in clustering neighboring loads..
Jan 4 2019, 11:35 AM
rampitec added inline comments to D56291: ScheduleDAG: Don't break the dependence in clustering neighboring loads..
Jan 4 2019, 11:13 AM

Jan 3 2019

rampitec added inline comments to D56291: ScheduleDAG: Don't break the dependence in clustering neighboring loads..
Jan 3 2019, 2:50 PM
rampitec added a comment to D56289: Added single use check to ShrinkDemandedConstant.

Probably should try fixing ShrinkDemandedConstant first

Jan 3 2019, 2:26 PM
rampitec added a reviewer for D56289: Added single use check to ShrinkDemandedConstant: msearles.
Jan 3 2019, 1:26 PM
rampitec created D56289: Added single use check to ShrinkDemandedConstant.
Jan 3 2019, 12:40 PM

Dec 31 2018

rampitec accepted D56161: [AMDGPU] Fix scalar operand folding bug that causes SHOC performance regression.

LGTM

Dec 31 2018, 10:31 AM

Dec 20 2018

rampitec accepted D55908: AMDGPU/GlobalISel: Redo legality for build_vector.

LGTM

Dec 20 2018, 12:43 AM

Dec 19 2018

rampitec accepted D55880: AMDGPU: Make i1/i64/v2i32 and/or/xor legal.

LGTM

Dec 19 2018, 12:45 AM

Dec 18 2018

rampitec accepted D55808: AMDGPU/GlobalISel: Legality/regbankselect for atomicrmw.

LGTM

Dec 18 2018, 10:04 AM

Dec 13 2018

rampitec accepted D55539: [AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. .

LGTM

Dec 13 2018, 11:13 PM
rampitec added inline comments to D55539: [AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. .
Dec 13 2018, 10:47 PM
rampitec added inline comments to D55539: [AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. .
Dec 13 2018, 3:53 PM
rampitec accepted D55652: [RegAllocGreedy] IMPLICIT_DEF values shouldn't prefer registers.

LGTM

Dec 13 2018, 1:26 PM
rampitec added a comment to D55652: [RegAllocGreedy] IMPLICIT_DEF values shouldn't prefer registers.

It looks good to me, but @arsenm could you also take a look?

Dec 13 2018, 1:00 PM
rampitec added a reviewer for D55652: [RegAllocGreedy] IMPLICIT_DEF values shouldn't prefer registers: arsenm.
Dec 13 2018, 12:59 PM
rampitec accepted D55634: AMDGPU/GlobalISel: Legalize/regbankselect fneg/fabs/fsub.

LGTM

Dec 13 2018, 12:22 PM
rampitec accepted D55645: AMDGPU/GlobalISel: Handle legality/regbanks for 32/64-bit shifts.

LGTM

Dec 13 2018, 10:56 AM

Dec 12 2018

rampitec committed rL349006: [AMDGPU] Fix build failure, second attempt.
[AMDGPU] Fix build failure, second attempt
Dec 12 2018, 9:55 PM
rampitec committed rL349005: [AMDGPU] Fix build failure.
[AMDGPU] Fix build failure
Dec 12 2018, 9:24 PM
rampitec committed rL349003: [AMDGPU] Simplify negated condition.
[AMDGPU] Simplify negated condition
Dec 12 2018, 7:22 PM
rampitec closed D55402: [AMDGPU] Simplify negated condition.
Dec 12 2018, 7:22 PM
rampitec accepted D55637: AMDGPU: Legalize/regbankselect frame_index.

LGTM

Dec 12 2018, 6:51 PM
rampitec accepted D55636: AMDGPU/GlobalISel: Legalize/regbankselect block_addr.

LGTM

Dec 12 2018, 6:51 PM
rampitec accepted D55635: AMDGPU: Legalize/regbankselect fma.

LGTM

Dec 12 2018, 6:50 PM