rampitec (Stanislav Mekhanoshin)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 4 2014, 4:14 AM (197 w, 6 d)

Recent Activity

Tue, Jan 16

rampitec added inline comments to D42124: SI Load Store Optimizer: When merging with offset, use V_ADD_{I|U}32_e64.
Tue, Jan 16, 1:01 PM · Restricted Project
rampitec added a comment to D41617: [AMDGPU] Add HW_REG_SH_MEM_BASES symbolic name for s_getreg_b32.

I believe the source of the inconsistency is the original definition of ID_SYMBOLIC_LAST_, which does not point to the last value in the enum but to last + 1. This can be changed in a separate patch.
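
A minimal sketch of the last + 1 convention, assuming a simplified enum. The enumerator values and the helpers isSymbolicId and numSymbolicIds are illustrative and are not copied from the AMDGPU sources:

```cpp
// Hypothetical, shortened enum: the "last" sentinel is one past the final
// real value, so it is an end marker rather than a valid ID.
enum HwRegId : int {
  ID_UNKNOWN_ = -1,
  ID_SYMBOLIC_FIRST_ = 1,
  ID_MODE = 1,
  ID_STATUS = 2,
  ID_TRAPSTS = 3,
  ID_SYMBOLIC_LAST_ = 4 // last + 1, not the last real value
};

// With a last + 1 sentinel the valid range is half-open: [FIRST, LAST).
constexpr bool isSymbolicId(int Id) {
  return Id >= ID_SYMBOLIC_FIRST_ && Id < ID_SYMBOLIC_LAST_;
}

// The number of symbolic names falls out of the half-open range directly.
constexpr int numSymbolicIds() {
  return ID_SYMBOLIC_LAST_ - ID_SYMBOLIC_FIRST_;
}
```

Any range check written as `Id <= ID_SYMBOLIC_LAST_` would accept one value too many under this convention, which is the kind of inconsistency a follow-up patch would have to resolve one way or the other.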

Tue, Jan 16, 9:40 AM

Mon, Jan 15

rampitec committed rL322500: [AMDGPU] Add HW_REG_SH_MEM_BASES symbolic name for s_getreg_b32.
Mon, Jan 15, 10:50 AM
rampitec closed D41617: [AMDGPU] Add HW_REG_SH_MEM_BASES symbolic name for s_getreg_b32.
Mon, Jan 15, 10:50 AM
rampitec added inline comments to D41617: [AMDGPU] Add HW_REG_SH_MEM_BASES symbolic name for s_getreg_b32.
Mon, Jan 15, 10:06 AM
rampitec committed rL322496: [AMDGPU] Copy impdefs from pseudo to real instructions.
Mon, Jan 15, 9:57 AM
rampitec closed D41783: [AMDGPU] Copy impdefs from pseudo to real instructions.
Mon, Jan 15, 9:57 AM

Fri, Jan 5

rampitec created D41783: [AMDGPU] Copy impdefs from pseudo to real instructions.
Fri, Jan 5, 12:53 PM

Tue, Jan 2

rampitec updated the diff for D41617: [AMDGPU] Add HW_REG_SH_MEM_BASES symbolic name for s_getreg_b32.

Check for encoding in positive test.

Tue, Jan 2, 1:45 PM
rampitec updated the diff for D41617: [AMDGPU] Add HW_REG_SH_MEM_BASES symbolic name for s_getreg_b32.

Added ID_SYMBOLIC_FIRST_GFX9_ to enum.

Tue, Jan 2, 12:48 PM
rampitec updated the diff for D41617: [AMDGPU] Add HW_REG_SH_MEM_BASES symbolic name for s_getreg_b32.

Updated per review comments.

Tue, Jan 2, 10:39 AM

Thu, Dec 28

rampitec created D41617: [AMDGPU] Add HW_REG_SH_MEM_BASES symbolic name for s_getreg_b32.
Thu, Dec 28, 2:21 PM

Tue, Dec 19

rampitec added inline comments to D41377: [AMDGPU] Turn off MergeConsecutiveStores() before Instruction Selection for AMDGPU.
Tue, Dec 19, 9:30 AM

Dec 12 2017

rampitec accepted D41132: CodeGen: Fix assertion in machine inst sheduler due to llvm.dbg.value.

LGTM

Dec 12 2017, 1:55 PM
rampitec added a comment to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

There can actually be a problem with folding the node if we patch it after creation. At least this needs to be checked.

That's true. The problem is that in SelectionDAG::getNode (where the CSEMap insertion is) we have no Value and no chance to check its divergence.
And this is correct: SelectionDAG is for selection and we should not expose the IR Values to it.

The only way I see is to pass the Divergence parameter to getNode from all the SelectionDAGBuilder visitors. This will be correct but requires changing each of the 109 visitors and getNode().

In fact we have no chance to have 2 SDNodes that differ by the Divergence flag only.
Please note that the selection operates per block. SelectionDAGBuilder constructs the DAG for one block at a time.
Then it selects and emits the code. Then all the data, including the CSE map, gets cleared.
FoldingSetNodeID creates the hash from the node and its operands.
Thus we hit the hash only if there is the same node with the same operands.
From the data dependency point of view it must have the same divergence. So literally it is the same node, and setting the same value of the divergence flag does no harm.
The only case when we could have 2 nodes that differ by the divergence only is if both have the same operands but one is control-dependent on the divergent branch.
That immediately means that the 2 nodes belong to different basic blocks and hence cannot be folded.
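
The argument above can be sketched with a toy per-block CSE map. This is a simplified model, not the SelectionDAG implementation: BlockDAG, addLeaf, and the integer node IDs are hypothetical, and a std::map keyed by (opcode, operands) stands in for the FoldingSet.

```cpp
#include <map>
#include <utility>
#include <vector>

struct Node {
  int Opcode;
  std::vector<unsigned> Operands; // indices into BlockDAG's node list
  bool Divergent;
};

// Toy DAG for a single basic block.
class BlockDAG {
  std::vector<Node> Nodes;
  std::map<std::pair<int, std::vector<unsigned>>, unsigned> CSEMap;

public:
  // Leaves (e.g. a lane ID or a kernel argument) carry their own flag.
  unsigned addLeaf(int Opcode, bool Divergent) {
    Nodes.push_back(Node{Opcode, {}, Divergent});
    return static_cast<unsigned>(Nodes.size() - 1);
  }

  // Same opcode and same operands always yield the same node, so the
  // divergence derived from the operands is necessarily the same too.
  unsigned getNode(int Opcode, const std::vector<unsigned> &Ops) {
    auto Key = std::make_pair(Opcode, Ops);
    auto It = CSEMap.find(Key);
    if (It != CSEMap.end())
      return It->second;
    bool Div = false;
    for (unsigned Op : Ops)
      Div = Div || Nodes[Op].Divergent;
    Nodes.push_back(Node{Opcode, Ops, Div});
    unsigned Id = static_cast<unsigned>(Nodes.size() - 1);
    CSEMap.emplace(Key, Id);
    return Id;
  }

  bool isDivergent(unsigned Id) const { return Nodes[Id].Divergent; }

  // Everything is dropped between blocks, so nodes from different blocks
  // can never be folded together.
  void clear() {
    Nodes.clear();
    CSEMap.clear();
  }
};
```

In this model a CSE hit can only return a node whose data-dependent divergence matches the request. The control-dependent case is excluded by clear() being called between blocks.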

Dec 12 2017, 1:18 AM

Dec 11 2017

rampitec added a comment to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

Any DAG transformation that changes a divergent pattern to a non-divergent one, or vice versa, is illegal.

Transforming "x*0 -> 0" is illegal if x is divergent? That seems surprising.

Okay, I was unclear. Except for the constants. Your example is a corner case that turns the variable into a constant.
In this case, w/o bit propagation we're still correct but sub-optimal.
I can imagine, though, a case where a long sequence of constant folding ends up with pure zero. If in addition the operand that becomes constant was the only divergent operand, we'd like to propagate.
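
A small sketch of the constant corner case, under the assumption that divergence is recomputed from the operands after each fold. Val, makeConst, makeVar, mul, and add are hypothetical helpers, not LLVM APIs:

```cpp
struct Val {
  bool IsConst;
  long Const;     // meaningful only when IsConst is true
  bool Divergent; // constants are always uniform
};

Val makeConst(long C) { return {true, C, false}; }
Val makeVar(bool Divergent) { return {false, 0, Divergent}; }

// Multiply with the "x * 0 -> 0" fold. The folded result is a constant,
// so it is uniform even if x was divergent.
Val mul(const Val &A, const Val &B) {
  if ((A.IsConst && A.Const == 0) || (B.IsConst && B.Const == 0))
    return makeConst(0);
  if (A.IsConst && B.IsConst)
    return makeConst(A.Const * B.Const);
  return {false, 0, A.Divergent || B.Divergent};
}

// Add with divergence recomputed from the (possibly folded) operands.
Val add(const Val &A, const Val &B) {
  if (A.IsConst && B.IsConst)
    return makeConst(A.Const + B.Const);
  return {false, 0, A.Divergent || B.Divergent};
}
```

With a divergent x and a uniform y, `add(mul(x, makeConst(0)), y)` comes out uniform here. Keeping the stale divergent flag on the users instead would still be correct, only sub-optimal, which is the trade-off described above.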

Dec 11 2017, 7:00 AM

Dec 8 2017

rampitec accepted D41028: AMDGPU: Set IntrReadMem on memtime intrinsics.

LGTM

Dec 8 2017, 11:15 AM

Dec 7 2017

rampitec added a comment to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

There can actually be a problem with folding the node if we patch it after creation. At least this needs to be checked.

Dec 7 2017, 12:38 PM

Dec 6 2017

rampitec accepted D40924: AMDGPU: Report Arg's Value name in metadata if kernel_arg_name metadata is not available.

LGTM

Dec 6 2017, 4:47 PM

Dec 5 2017

rampitec added inline comments to D40851: [AMDGPU] Improve verifier wrt vcc subregs.
Dec 5 2017, 1:12 PM
rampitec created D40851: [AMDGPU] Improve verifier wrt vcc subregs.
Dec 5 2017, 12:35 PM
rampitec accepted D40848: AMDGPU: Fix SDWA crash on inline asm.

LGTM

Dec 5 2017, 12:08 PM
rampitec added a comment to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

In general, adding "custom" code to SelectionDAGBuilder::setValue looks odd. Instead I would add a target-customizable postprocessing loop over pairs of Value <-> SDNode into SelectionDAGISel::SelectBasicBlock right after the DAG is created. The target hook should be able to get whatever LLVM IR analysis it requires and annotate SDNodes.

Dec 5 2017, 11:10 AM

Dec 4 2017

rampitec accepted D40822: AMDGPU: Fix infinite loop with dbg_value.

LGTM

Dec 4 2017, 9:39 PM
rampitec accepted D40113: AMDGPU: Fix crash when scheduling DBG_VALUE.

LGTM

Dec 4 2017, 1:44 PM

Nov 30 2017

rampitec accepted D40670: Let Alloca treated as nonnull for any alloca addr space value.

LGTM

Nov 30 2017, 1:45 PM
rampitec added inline comments to D40670: Let Alloca treated as nonnull for any alloca addr space value.
Nov 30 2017, 1:36 PM

Nov 29 2017

rampitec accepted D40628: AMDGPU: Use carry-less adds in FI elimination.

LGTM

Nov 29 2017, 11:30 PM
rampitec accepted D40556: SIFixSGPRCopies should not change non-divergent PHI.

LGTM

Nov 29 2017, 11:07 AM

Nov 28 2017

rampitec accepted D40585: AMDGPU: Allow negative MUBUF vaddr for gfx9.

LGTM

Nov 28 2017, 3:32 PM
rampitec accepted D37173: AMDGPU: Enable IPRA.

LGTM

Nov 28 2017, 3:30 PM
rampitec accepted D40578: AMDGPU: Make hazard recognizer aware of maximum clause sizes.

LGTM

Nov 28 2017, 3:27 PM
rampitec added a comment to D40343: AMDGPU: Do not combine loads/store across physreg defs.

This pass should probably ignore Subtarget->ldsRequiresM0Init(). The instructions are selected to a set without m0, so this should be able to just check the register uses.

Nov 28 2017, 12:05 PM
rampitec added a reviewer for D40556: SIFixSGPRCopies should not change non-divergent PHI: nhaehnle.
Nov 28 2017, 12:01 PM
rampitec added inline comments to D40556: SIFixSGPRCopies should not change non-divergent PHI.
Nov 28 2017, 12:00 PM
rampitec added a reviewer for D40547: AMDGPU: Fix copying i1 value out of loop with non-uniform exit: alex-t.
Nov 28 2017, 11:55 AM
rampitec added a reviewer for D40546: StructurizeCFG: Test for branch divergence correctly: alex-t.
Nov 28 2017, 11:53 AM
rampitec added a comment to D40343: AMDGPU: Do not combine loads/store across physreg defs.

As far as I understand, that is only a concern if the defined register is M0, since it is read by the ds_* instructions. I.e. it would be better to check for M0, not just any physreg.
Also, this should not be a concern on GFX9, since there we have LDS instructions which do not read M0 (check Subtarget->ldsRequiresM0Init()).

The LDS use here is a bit of a red herring; it's really not about that. The original case where I found the bug had no "proper" LDS instructions at all, only v_interp, and the merged memory instructions were buffer instructions. You could probably construct a similar case also with e.g. only relative indexing with different indices, or with relative indexing and s_sendmsg (which uses M0).

The real problem is that the pass assumes machine-SSA form, and this assumption is broken with physreg-defs. Here's what happens:

%vreg0 = mem-instruction-1
M0 = def-1
use(%vreg0, M0)
M0 = def-2
...
mem-instruction-2

Without this fix, this gets changed to:

M0 = def-1
M0 = def-2
...
%vreg0, ... = merged-mem-instruction
use(%vreg0, M0)

So the %vreg0-use gets moved (because it depends on mem-instruction-1), without regard for the fact that the instruction reads a register that is later overwritten by a different value.

This possible write-after-read hazard on registers is something that the pass simply hasn't tracked before, because for virtual registers in machine-SSA it's unnecessary. It's only with physical registers that it becomes necessary; we don't have many of those at this stage in the flow in practice, but M0 is not a special case.
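
The missing check can be sketched as follows. This is a simplified model of the hazard, not the code from the patch: Inst, canSinkPast, and exampleBlock are hypothetical, registers are plain strings, and only the write-after-read direction described above is covered.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

struct Inst {
  std::string Name;
  std::set<std::string> PhysUses; // physical registers read
  std::set<std::string> PhysDefs; // physical registers written
};

// Returns true if Block[From] may be moved down to just after Block[To]:
// no instruction in (From, To] may redefine a physreg that Block[From] reads.
bool canSinkPast(const std::vector<Inst> &Block, std::size_t From,
                 std::size_t To) {
  for (std::size_t I = From + 1; I <= To && I < Block.size(); ++I)
    for (const std::string &Reg : Block[From].PhysUses)
      if (Block[I].PhysDefs.count(Reg))
        return false;
  return true;
}

// The sequence from the example above, before merging.
std::vector<Inst> exampleBlock() {
  return {
      {"mem-instruction-1", {}, {}},
      {"def-1", {}, {"M0"}},
      {"use", {"M0"}, {}},
      {"def-2", {}, {"M0"}},
      {"mem-instruction-2", {}, {}},
  };
}
```

For the example block, `canSinkPast(exampleBlock(), 2, 4)` is false: the use at index 2 reads M0, and def-2 at index 3 overwrites it before the merged instruction's position.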

Nov 28 2017, 11:51 AM

Nov 22 2017

rampitec accepted D40344: AMDGPU: Re-organize the outer loop of SILoadStoreOptimizer.

LGTM

Nov 22 2017, 11:18 AM
rampitec added a comment to D40343: AMDGPU: Do not combine loads/store across physreg defs.

As far as I understand, that is only a concern if the defined register is M0, since it is read by the ds_* instructions. I.e. it would be better to check for M0, not just any physreg.
Also, this should not be a concern on GFX9, since there we have LDS instructions which do not read M0 (check Subtarget->ldsRequiresM0Init()).

Nov 22 2017, 11:15 AM
rampitec accepted D40342: AMDGPU: Consistently check for immediates in SIInstrInfo::FoldImmediate.

LGTM

Nov 22 2017, 11:06 AM

Nov 21 2017

rampitec accepted D40303: AMDGPU: Consider memory dependencies with moved instructions in SILoadStoreOptimizer.

LGTM

Nov 21 2017, 10:06 AM

Nov 20 2017

rampitec accepted D40255: CodeGen: Fix SelectionDAGISel::LowerArguments for sret addr space.

LGTM

Nov 20 2017, 9:17 AM
rampitec accepted D40088: [AMDGPU][MC][GFX8][GFX9] Corrected names of integer v_{add/addc/sub/subrev/subb/subbrev}.

LGTM

Nov 20 2017, 9:16 AM

Nov 17 2017

rampitec added inline comments to D40113: AMDGPU: Fix crash when scheduling DBG_VALUE.
Nov 17 2017, 4:52 PM
rampitec accepted D40158: AMDGPU: Use gfx9 carry-less add/sub instructions.
Nov 17 2017, 2:25 PM
rampitec added a comment to D40158: AMDGPU: Use gfx9 carry-less add/sub instructions.

OK, thanks.

Nov 17 2017, 2:24 PM
rampitec requested changes to D40158: AMDGPU: Use gfx9 carry-less add/sub instructions.
Nov 17 2017, 1:02 PM
rampitec accepted D40158: AMDGPU: Use gfx9 carry-less add/sub instructions.

LGTM with the assertion added to moveScalarAddSub.

Nov 17 2017, 12:21 PM
rampitec added inline comments to D39897: AMDGPU: Partial ILP scheduler port from SelectionDAG to SchedulingDAG (experimental).
Nov 17 2017, 9:42 AM
rampitec accepted D39897: AMDGPU: Partial ILP scheduler port from SelectionDAG to SchedulingDAG (experimental).

LGTM

Nov 17 2017, 8:50 AM
rampitec accepted D40163: AMDGPU: Move hazard avoidance out of waitcnt pass..

LGTM

Nov 17 2017, 8:43 AM
rampitec accepted D40172: [AMDGPU] SDWA: remove omod src operand for VOP2b instructions.

Thank you!

Nov 17 2017, 8:10 AM

Nov 16 2017

rampitec added inline comments to D40155: AMDGPU: Fix breaking SMEM clauses.
Nov 16 2017, 7:21 PM
rampitec added inline comments to D40158: AMDGPU: Use gfx9 carry-less add/sub instructions.
Nov 16 2017, 4:51 PM
rampitec accepted D40155: AMDGPU: Fix breaking SMEM clauses.

LGTM

Nov 16 2017, 4:02 PM
rampitec added inline comments to D40113: AMDGPU: Fix crash when scheduling DBG_VALUE.
Nov 16 2017, 3:54 PM
rampitec accepted D40153: AMDGPU: Replace list of SMEM buffer opcodes.

LGTM

Nov 16 2017, 3:43 PM
rampitec added inline comments to D40088: [AMDGPU][MC][GFX8][GFX9] Corrected names of integer v_{add/addc/sub/subrev/subb/subbrev}.
Nov 16 2017, 3:41 PM
rampitec accepted D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

LGTM

Nov 16 2017, 1:29 PM

Nov 15 2017

rampitec accepted D40095: Fix pointer EVT in SelectionDAGBuilder::visitAlloca.

LGTM

Nov 15 2017, 6:56 PM
rampitec added inline comments to D40113: AMDGPU: Fix crash when scheduling DBG_VALUE.
Nov 15 2017, 4:56 PM
rampitec added inline comments to D40095: Fix pointer EVT in SelectionDAGBuilder::visitAlloca.
Nov 15 2017, 12:54 PM
rampitec accepted D39740: CodeGen: Fix pointer info and index type when splitting vector.

LGTM

Nov 15 2017, 11:50 AM
rampitec accepted D40085: Fix APInt bit size in processDbgDeclares.

LGTM

Nov 15 2017, 11:47 AM

Nov 14 2017

rampitec accepted D40059: AMDGPU: Select DS insts without m0 initialization.

LGTM

Nov 14 2017, 8:17 PM
rampitec accepted D39731: AMDGPU: Don't use MUBUF vaddr if address may overflow.

LGTM

Nov 14 2017, 3:03 PM
rampitec accepted D39685: AMDGPU: Handle or in multi-use shl ptr combi.

LGTM

Nov 14 2017, 2:59 PM
rampitec accepted D40040: [AMDGPU] Fix SITargetLowering::LowerCall for pointer info of byval argument.

LGTM

Nov 14 2017, 10:33 AM
rampitec added inline comments to D39998: AMDGPU: Directly implement computeKnownBits for workitem intrinsics .
Nov 14 2017, 9:52 AM
rampitec accepted D40000: AMDGPU: Error on stack size overflow.

LGTM

Nov 14 2017, 9:52 AM
rampitec added inline comments to D39685: AMDGPU: Handle or in multi-use shl ptr combi.
Nov 14 2017, 9:37 AM

Nov 13 2017

rampitec accepted D39983: AMDGPU: Add separate definitions for DS insts without m0 use.

LGTM

Nov 13 2017, 3:04 PM
rampitec accepted D39973: Let llvm.invariant.group.barrier accepts pointer to any address space.

LGTM

Nov 13 2017, 2:44 PM
rampitec added inline comments to D39973: Let llvm.invariant.group.barrier accepts pointer to any address space.
Nov 13 2017, 1:58 PM
rampitec added inline comments to D39973: Let llvm.invariant.group.barrier accepts pointer to any address space.
Nov 13 2017, 1:53 PM
rampitec added inline comments to D39973: Let llvm.invariant.group.barrier accepts pointer to any address space.
Nov 13 2017, 1:00 PM
rampitec accepted D39970: AMDGPU: Fix producing saveexec when the copy is spilled.

LGTM

Nov 13 2017, 12:12 PM
rampitec accepted D39945: AMDGPU: Fix not converting d16 load/stores to offset.

LGTM

Nov 13 2017, 9:23 AM
rampitec accepted D39951: AMDGPU: Implement computeKnownBitsForTargetNode for mbcnt.

LGTM with spacing fixed.

Nov 13 2017, 9:19 AM
rampitec added inline comments to D39897: AMDGPU: Partial ILP scheduler port from SelectionDAG to SchedulingDAG (experimental).
Nov 13 2017, 9:17 AM

Nov 10 2017

rampitec added inline comments to D39897: AMDGPU: Partial ILP scheduler port from SelectionDAG to SchedulingDAG (experimental).
Nov 10 2017, 12:30 PM
rampitec added a comment to D39897: AMDGPU: Partial ILP scheduler port from SelectionDAG to SchedulingDAG (experimental).

Can you add some tests just to show it does not crash? Maybe add run-lines to schedule-regpressure-limit.ll, schedule-regpressure-limit2.ll

Nov 10 2017, 12:24 PM

Nov 9 2017

rampitec added a comment to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

This actually looks clean to me, thank you!

Nov 9 2017, 12:06 PM

Nov 7 2017

rampitec added inline comments to D39758: CodeGen: Fix pointer info in SplitVecOp_EXTRACT_VECTOR_ELT.
Nov 7 2017, 6:07 PM
rampitec added inline comments to D39758: CodeGen: Fix pointer info in SplitVecOp_EXTRACT_VECTOR_ELT.
Nov 7 2017, 6:01 PM
rampitec added inline comments to D39758: CodeGen: Fix pointer info in SplitVecOp_EXTRACT_VECTOR_ELT.
Nov 7 2017, 5:48 PM
rampitec accepted D39758: CodeGen: Fix pointer info in SplitVecOp_EXTRACT_VECTOR_ELT.

LGTM

Nov 7 2017, 2:01 PM
rampitec added inline comments to D39731: AMDGPU: Don't use MUBUF vaddr if address may overflow.
Nov 7 2017, 10:00 AM

Nov 6 2017

rampitec accepted D39674: AMDGPU: Remove redundant combine.

LGTM

Nov 6 2017, 10:52 AM
rampitec accepted D39677: AMDGPU: Fix multi-use shl/add combine.

LGTM

Nov 6 2017, 10:33 AM
rampitec added inline comments to D39685: AMDGPU: Handle or in multi-use shl ptr combi.
Nov 6 2017, 10:29 AM
rampitec accepted D39686: AMDGPU: Preserve nuw in shl add ptr combine.

LGTM

Nov 6 2017, 10:23 AM

Nov 3 2017

rampitec accepted D39616: [AMDGPU] Remove hardcoded address space value from AMDGPULibFunc.

LGTM

Nov 3 2017, 9:02 PM
rampitec added inline comments to D39616: [AMDGPU] Remove hardcoded address space value from AMDGPULibFunc.
Nov 3 2017, 2:10 PM

Oct 30 2017

rampitec accepted D39413: AMDGPU: Select v_mad_u64_u32 and v_mad_i64_i32.

LGTM

Oct 30 2017, 8:02 PM
rampitec accepted D39432: InferAddressSpaces: Fix bug about replacing addrspacecast.

LGTM

Oct 30 2017, 2:01 PM
rampitec added inline comments to D39413: AMDGPU: Select v_mad_u64_u32 and v_mad_i64_i32.
Oct 30 2017, 9:49 AM

Oct 25 2017

rampitec accepted D39306: Fix CodeGen/AMDGPU/fcanonicalize-elimination.ll on FreeBSD 11.0.

LGTM

Oct 25 2017, 2:41 PM

Oct 23 2017

rampitec accepted D39205: AMDGPU: Initialize WavefrontSize from TD files.

LGTM

Oct 23 2017, 2:08 PM