
rampitec (Stanislav Mekhanoshin)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 4 2014, 4:14 AM (303 w, 11 h)

Recent Activity

Today

rampitec committed rGbe8e38cbd978: Correct NumLoads in clustering (authored by rampitec).
Correct NumLoads in clustering
Fri, Jan 24, 12:46 PM
rampitec closed D73292: [AMDGPU] Correct NumLoads in clustering.
Fri, Jan 24, 12:46 PM · Restricted Project
rampitec updated the diff for D73292: [AMDGPU] Correct NumLoads in clustering.

Added operand description to TargetInstrInfo::shouldClusterMemOps()

Fri, Jan 24, 12:46 PM · Restricted Project
rampitec accepted D72258: AMDGPU: Don't error on ds.ordered intrinsic in function.
Fri, Jan 24, 12:45 PM · Restricted Project
rampitec added inline comments to D72258: AMDGPU: Don't error on ds.ordered intrinsic in function.
Fri, Jan 24, 12:16 PM · Restricted Project
rampitec requested review of D73292: [AMDGPU] Correct NumLoads in clustering.
Fri, Jan 24, 12:08 PM · Restricted Project
rampitec updated the diff for D73292: [AMDGPU] Correct NumLoads in clustering.

Changed patch to only correct clusterNeighboringMemOps. Threshold logic adjusted accordingly.
We will need to retune our threshold in AMDGPU separately, likely after the scheduler gains the ability to break clustering under high register pressure.

Fri, Jan 24, 12:07 PM · Restricted Project
rampitec accepted D73375: AMDGPU/GlobalISel: Fix tablegen selection for scalar bin ops.

LGTM

Fri, Jan 24, 11:57 AM · Restricted Project
rampitec committed rG555d8f4ef5eb: [AMDGPU] Bundle loads before post-RA scheduler (authored by rampitec).
[AMDGPU] Bundle loads before post-RA scheduler
Fri, Jan 24, 11:39 AM
rampitec closed D72737: [AMDGPU] Bundle loads before post-RA scheduler.
Fri, Jan 24, 11:38 AM · Restricted Project
rampitec committed rG44b865fa7fea: [AMDGPU] Allow narrowing muti-dword loads (authored by rampitec).
[AMDGPU] Allow narrowing muti-dword loads
Fri, Jan 24, 11:20 AM
rampitec closed D73133: [AMDGPU] Allow narrowing muti-dword loads.
Fri, Jan 24, 11:19 AM · Restricted Project
rampitec accepted D73372: Revert "AMDGPU: Temporary drop s_mul_hi_i/u32 patterns".

LGTM

Fri, Jan 24, 11:10 AM · Restricted Project
rampitec accepted D73371: AMDGPU/GlobalISel: Eliminate SelectVOP3Mods_f32.

LGTM

Fri, Jan 24, 11:10 AM · Restricted Project
rampitec committed rG7a94d4f4ee43: Allow combining of extract_subvector to extract element (authored by rampitec).
Allow combining of extract_subvector to extract element
Fri, Jan 24, 11:04 AM
rampitec closed D73132: Allow combining of extract_subvector to extract element.
Fri, Jan 24, 11:03 AM · Restricted Project
rampitec accepted D73364: AMDGPU: Don't check constant address space for atomic stores.

LGTM

Fri, Jan 24, 11:01 AM · Restricted Project
rampitec accepted D73365: AMDGPU/GlobalISel: Fix not using global atomics on gfx9+.

LGTM

Fri, Jan 24, 11:01 AM · Restricted Project
rampitec accepted D73338: [AMDGPU] Fix GCN regpressure trackers for INLINEASM instructions.

LGTM. Thanks.

Fri, Jan 24, 10:53 AM · Restricted Project
rampitec added inline comments to D72737: [AMDGPU] Bundle loads before post-RA scheduler.
Fri, Jan 24, 8:04 AM · Restricted Project
rampitec accepted D73323: [DA] Don't propagate from unreachable blocks.

LGTM

Fri, Jan 24, 1:41 AM · Restricted Project
rampitec accepted D73315: Resubmit: [DA][TTI][AMDGPU] Add option to select GPUDA with TTI.

LGTM as long as parent is submitted.

Fri, Jan 24, 1:41 AM · Restricted Project
rampitec added a comment to D73292: [AMDGPU] Correct NumLoads in clustering.

I tried something similar in D72325.

Comments there argue about how much we should cluster, but regardless I do not think we should use wrong data. If we want more clustering we need to increase the thresholds, but still rely on correct input.

I agree. I also think we should fix this properly in MachineScheduler:

--- a/llvm/lib/CodeGen/MachineScheduler.cpp
+++ b/llvm/lib/CodeGen/MachineScheduler.cpp
@@ -1584,7 +1584,7 @@ void BaseMemOpClusterMutation::clusterNeighboringMemOps(
     SUnit *SUb = MemOpRecords[Idx+1].SU;
     if (TII->shouldClusterMemOps(MemOpRecords[Idx].BaseOps,
                                  MemOpRecords[Idx + 1].BaseOps,
-                                 ClusterLength)) {
+                                 ClusterLength + 1)) {
       if (SUa->NodeNum > SUb->NodeNum)
         std::swap(SUa, SUb);
       if (DAG->addEdge(SUb, SDep(SUa, SDep::Cluster))) {

... and adjust any other target implementations of shouldClusterMemOps accordingly.
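
To make the off-by-one concrete, here is a minimal standalone sketch. It is not the LLVM code: `shouldCluster`, `largestCluster`, and the `MaxLoads` threshold are hypothetical stand-ins, and it assumes `ClusterLength` counts the memory operations already in the current cluster. Under that assumption, passing `ClusterLength` lets clusters grow one operation past the threshold, while passing `ClusterLength + 1` (the count including the candidate) does not.

```cpp
#include <cassert>

// Hypothetical stand-in for a target hook: allow clustering only while the
// reported number of loads does not exceed MaxLoads.
static bool shouldCluster(unsigned NumLoads, unsigned MaxLoads) {
  return NumLoads <= MaxLoads;
}

// Walks NumOps adjacent memory operations and returns the size of the largest
// cluster formed. When PassFullCount is false the hook sees only the ops
// already in the cluster; when true it sees the count including the candidate.
static unsigned largestCluster(unsigned NumOps, unsigned MaxLoads,
                               bool PassFullCount) {
  assert(MaxLoads > 0 && "threshold must allow at least one load");
  unsigned ClusterLength = 1;          // ops in the current cluster (assumed)
  unsigned Largest = NumOps ? 1 : 0;
  for (unsigned Idx = 0; Idx + 1 < NumOps; ++Idx) {
    unsigned Reported = PassFullCount ? ClusterLength + 1 : ClusterLength;
    if (shouldCluster(Reported, MaxLoads)) {
      ++ClusterLength;                 // candidate joins the cluster
      if (ClusterLength > Largest)
        Largest = ClusterLength;
    } else {
      ClusterLength = 1;               // candidate starts a new cluster
    }
  }
  return Largest;
}
```

With eight adjacent operations and a threshold of four, the first variant forms a cluster of five and the second a cluster of four, which is why a threshold tuned against the old count may need retuning after the correction.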

Fri, Jan 24, 1:32 AM · Restricted Project

Yesterday

rampitec added a comment to D73315: Resubmit: [DA][TTI][AMDGPU] Add option to select GPUDA with TTI.

What has caused the failure? Will it happen on Windows in real life?

Thu, Jan 23, 5:29 PM · Restricted Project
rampitec added a comment to D73292: [AMDGPU] Correct NumLoads in clustering.

I also have an early version of a patch which can reschedule without clustering if no optimal schedule was found for a region.

Thu, Jan 23, 4:40 PM · Restricted Project
rampitec added a comment to D73292: [AMDGPU] Correct NumLoads in clustering.

I tried something similar in D72325.

Thu, Jan 23, 2:31 PM · Restricted Project
rampitec created D73292: [AMDGPU] Correct NumLoads in clustering.
Thu, Jan 23, 1:33 PM · Restricted Project

Wed, Jan 22

rampitec accepted D73247: AMDGPU/GlobalISel: Select V_ADD3_U32/V_XOR3_B32.

LGTM

Wed, Jan 22, 10:11 PM · Restricted Project
rampitec updated the diff for D73132: Allow combining of extract_subvector to extract element.

Pre-committed NFC changes and rebased.

Wed, Jan 22, 9:27 AM · Restricted Project
rampitec committed rG2d0fcf786c5c: Precommit NFC part of DAGCombiner change. NFC. (authored by rampitec).
Precommit NFC part of DAGCombiner change. NFC.
Wed, Jan 22, 9:09 AM
rampitec accepted D73192: [tests] Use host-based XFAIL for test/MC/AMDGPU/hsa-gfx10-v3.s.

LGTM

Wed, Jan 22, 9:09 AM · Restricted Project
rampitec committed rGfb8a3d18340e: Regenerate test/CodeGen/ARM/vext.ll. NFC. (authored by rampitec).
Regenerate test/CodeGen/ARM/vext.ll. NFC.
Wed, Jan 22, 9:00 AM
rampitec added inline comments to D73132: Allow combining of extract_subvector to extract element.
Wed, Jan 22, 8:12 AM · Restricted Project
rampitec added inline comments to D73132: Allow combining of extract_subvector to extract element.
Wed, Jan 22, 12:33 AM · Restricted Project

Tue, Jan 21

rampitec created D73133: [AMDGPU] Allow narrowing muti-dword loads.
Tue, Jan 21, 1:00 PM · Restricted Project
rampitec added a parent revision for D73133: [AMDGPU] Allow narrowing muti-dword loads: D73132: Allow combining of extract_subvector to extract element.
Tue, Jan 21, 1:00 PM · Restricted Project
rampitec added a child revision for D73132: Allow combining of extract_subvector to extract element: D73133: [AMDGPU] Allow narrowing muti-dword loads.
Tue, Jan 21, 1:00 PM · Restricted Project
rampitec created D73132: Allow combining of extract_subvector to extract element.
Tue, Jan 21, 12:42 PM · Restricted Project
rampitec added inline comments to D72737: [AMDGPU] Bundle loads before post-RA scheduler.
Tue, Jan 21, 10:39 AM · Restricted Project

Mon, Jan 20

rampitec accepted D73069: AMDGPU: Look through casted selects to constant fold bin ops.

LGTM

Mon, Jan 20, 5:22 PM · Restricted Project
rampitec accepted D72187: AMDGPU: Prepare to use scalar register indexing.

LGTM. A verifier update is desirable though.

Mon, Jan 20, 1:04 PM · Restricted Project
rampitec accepted D72185: AMDGPU: Partially merge indirect register write handling.

LGTM

Mon, Jan 20, 12:55 PM · Restricted Project
rampitec accepted D73049: [DA][TTI][AMDGPU] Add option to select GPUDA with TTI.

LGTM

Mon, Jan 20, 11:50 AM · Restricted Project
rampitec added a comment to D72185: AMDGPU: Partially merge indirect register write handling.

The change seems to be missing the parent revision that introduces V_INDIRECT_REG_WRITE_*.

Mon, Jan 20, 11:31 AM · Restricted Project
rampitec added inline comments to D73049: [DA][TTI][AMDGPU] Add option to select GPUDA with TTI.
Mon, Jan 20, 11:31 AM · Restricted Project
rampitec accepted D73033: AMDGPU: Cleanup and generate 64-bit div tests.

LGTM

Mon, Jan 20, 11:31 AM · Restricted Project
rampitec accepted D73008: AMDGPU: Do binop of select of constant fold in AMDGPUCodeGenPrepare.

LGTM

Mon, Jan 20, 11:22 AM · Restricted Project
rampitec accepted D73009: AMDGPU: Don't create weird sized integers.

LGTM

Mon, Jan 20, 11:22 AM · Restricted Project
rampitec added inline comments to D72187: AMDGPU: Prepare to use scalar register indexing.
Mon, Jan 20, 11:22 AM · Restricted Project

Fri, Jan 17

rampitec updated the diff for D72737: [AMDGPU] Bundle loads before post-RA scheduler.

Prevent bundling of loads with overlapping destinations. They are dependent.
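
As an illustration of the overlap rule, here is a minimal standalone sketch. It is not the code from the patch: `RegRange`, `overlaps`, and `canJoinBundle` are hypothetical names, and destinations are modeled as half-open ranges of consecutive 32-bit register indices, which is an assumption. A load may join a bundle only if its destination range intersects none of the destinations already defined in that bundle.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a load destination: Count consecutive 32-bit
// registers starting at index First, i.e. the range [First, First + Count).
struct RegRange {
  unsigned First;
  unsigned Count;
};

// Two half-open ranges intersect when each starts before the other ends.
static bool overlaps(const RegRange &A, const RegRange &B) {
  return A.First < B.First + B.Count && B.First < A.First + A.Count;
}

// Returns true if Candidate's destination is disjoint from every destination
// already written inside the bundle, so no load depends on another one.
static bool canJoinBundle(const std::vector<RegRange> &BundleDefs,
                          const RegRange &Candidate) {
  assert(Candidate.Count > 0 && "a load must define at least one register");
  for (const RegRange &Def : BundleDefs)
    if (overlaps(Def, Candidate))
      return false;
  return true;
}
```

For example, a bundle that already writes registers 0 through 3 can accept a load into registers 4 through 7, but not one into registers 2 and 3.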

Fri, Jan 17, 1:36 PM · Restricted Project
rampitec committed rGeebdd85e7df4: [AMDGPU] allow multi-dword flat scratch access since GFX9 (authored by rampitec).
[AMDGPU] allow multi-dword flat scratch access since GFX9
Fri, Jan 17, 10:52 AM
rampitec closed D72865: [AMDGPU] allow multi-dword flat scratch access since GFX9.
Fri, Jan 17, 10:52 AM · Restricted Project
rampitec accepted D72927: AMDGPU/GlobalISel: Select llvm.amdgcn.mov.dpp.

LGTM

Fri, Jan 17, 10:51 AM · Restricted Project
rampitec accepted D72925: AMDGPU/GlobalISel: Select llvm.amdgcn.update.dpp.

LGTM

Fri, Jan 17, 10:51 AM · Restricted Project
rampitec added inline comments to D72931: AMDGPU: Add a16 feature to gfx10.
Fri, Jan 17, 10:42 AM · Restricted Project
rampitec added inline comments to D72865: [AMDGPU] allow multi-dword flat scratch access since GFX9.
Fri, Jan 17, 10:41 AM · Restricted Project

Thu, Jan 16

rampitec accepted D72866: AMDGPU: Don't assert on a16 images on targets without FeatureR128A16.

LGTM

Thu, Jan 16, 1:26 PM · Restricted Project
rampitec added a comment to D72866: AMDGPU: Don't assert on a16 images on targets without FeatureR128A16.

I think we need to add FeatureR128A16 to GFX10. ISA spec says it does support it.

Thu, Jan 16, 1:17 PM · Restricted Project
rampitec updated the diff for D72865: [AMDGPU] allow multi-dword flat scratch access since GFX9.

Added subtarget predicate.

Thu, Jan 16, 1:16 PM · Restricted Project
rampitec added inline comments to D72865: [AMDGPU] allow multi-dword flat scratch access since GFX9.
Thu, Jan 16, 1:07 PM · Restricted Project
rampitec created D72865: [AMDGPU] allow multi-dword flat scratch access since GFX9.
Thu, Jan 16, 12:47 PM · Restricted Project
rampitec accepted D72844: AMDGPU: Move permlane discard vdst_in optimization.

LGTM

Thu, Jan 16, 12:37 PM · Restricted Project
rampitec accepted D72845: AMDGPU: Do permlane16 vdst_in discard optimization in InstCombine.

LGTM

Thu, Jan 16, 12:18 PM · Restricted Project
rampitec added inline comments to D72844: AMDGPU: Move permlane discard vdst_in optimization.
Thu, Jan 16, 12:18 PM · Restricted Project

Wed, Jan 15

rampitec committed rG8b417dd3d6c6: Process BUNDLE in tail duplication (authored by rampitec).
Process BUNDLE in tail duplication
Wed, Jan 15, 4:03 PM
rampitec closed D72783: Process BUNDLE in tail duplication.
Wed, Jan 15, 4:03 PM · Restricted Project
rampitec accepted D72800: [MachineScheduler] Don't swap when we can't cluster.

LGTM.

Wed, Jan 15, 1:10 PM · Restricted Project
rampitec added inline comments to D72783: Process BUNDLE in tail duplication.
Wed, Jan 15, 1:00 PM · Restricted Project
rampitec added a parent revision for D72737: [AMDGPU] Bundle loads before post-RA scheduler: D72783: Process BUNDLE in tail duplication.
Wed, Jan 15, 10:12 AM · Restricted Project
rampitec added inline comments to D72783: Process BUNDLE in tail duplication.
Wed, Jan 15, 10:12 AM · Restricted Project
rampitec added a child revision for D72783: Process BUNDLE in tail duplication: D72737: [AMDGPU] Bundle loads before post-RA scheduler.
Wed, Jan 15, 10:12 AM · Restricted Project
rampitec updated the diff for D72737: [AMDGPU] Bundle loads before post-RA scheduler.

Rebased on top of D72783 to prevent tail duplication of a bundle.

Wed, Jan 15, 10:12 AM · Restricted Project
rampitec added inline comments to D72783: Process BUNDLE in tail duplication.
Wed, Jan 15, 10:03 AM · Restricted Project
rampitec added inline comments to D72737: [AMDGPU] Bundle loads before post-RA scheduler.
Wed, Jan 15, 9:44 AM · Restricted Project
rampitec created D72783: Process BUNDLE in tail duplication.
Wed, Jan 15, 9:44 AM · Restricted Project
rampitec added inline comments to D72737: [AMDGPU] Bundle loads before post-RA scheduler.
Wed, Jan 15, 8:45 AM · Restricted Project
rampitec added a comment to D72737: [AMDGPU] Bundle loads before post-RA scheduler.

There are nice changes in a bunch of tests, where we're preserving clusters instead of breaking them apart.

But there are also strange changes in some other tests, where the clustering hasn't changed, but some instructions that use the result of a load have moved around. Does this mean we're getting the latency of the load wrong now? (Or were we getting it wrong before?) For example:
insert_vector_elt
llvm.maxnum.f16.ll
saddo.ll
sign_extend.ll

Wed, Jan 15, 8:16 AM · Restricted Project

Tue, Jan 14

rampitec added a comment to D72737: [AMDGPU] Bundle loads before post-RA scheduler.

By the way I think we actually do try to insert waitcnt into bundles. I wonder if it would be worth it to just update the memory legalizer to work with bundles as well.

I saw a waitcnt inserted after the bundle when I do not unpack them, so we probably do not always do it, if we do it at all.

Yeah maybe. I remember it was added here: c04aab9c0646461bc187808920b3d5ee7f5cc5ab
Was the waitcnt after the bundle that you saw in the correct place?

Tue, Jan 14, 5:05 PM · Restricted Project
rampitec updated the diff for D72737: [AMDGPU] Bundle loads before post-RA scheduler.

Rebased.

Tue, Jan 14, 4:55 PM · Restricted Project
rampitec updated the diff for D72737: [AMDGPU] Bundle loads before post-RA scheduler.

Fixed a bug in the bundling logic and added a test for the produced bundles and for when they break.

Tue, Jan 14, 4:43 PM · Restricted Project
rampitec added a comment to D72737: [AMDGPU] Bundle loads before post-RA scheduler.

By the way I think we actually do try to insert waitcnt into bundles. I wonder if it would be worth it to just update the memory legalizer to work with bundles as well.

Tue, Jan 14, 4:33 PM · Restricted Project
rampitec added a comment to D72737: [AMDGPU] Bundle loads before post-RA scheduler.

Actually I misunderstood the change because of the title. I see now that the loads are bundled before post-RA scheduler.

Tue, Jan 14, 4:23 PM · Restricted Project
rampitec retitled D72737: [AMDGPU] Bundle loads before post-RA scheduler from [AMDGPU] Bundle loads before pre-RA scheduler to [AMDGPU] Bundle loads before post-RA scheduler.
Tue, Jan 14, 4:23 PM · Restricted Project
rampitec planned changes to D72737: [AMDGPU] Bundle loads before post-RA scheduler.

I have found a bug in the logic. Will update the review.

Tue, Jan 14, 4:02 PM · Restricted Project
rampitec added a comment to D72737: [AMDGPU] Bundle loads before post-RA scheduler.

Is this just working around the post-RA scheduler? Can we finally replace the post-RA scheduler with misched?

Tue, Jan 14, 3:33 PM · Restricted Project
rampitec accepted D72709: [codegen,amdgpu] Enhance MIR DIE and re-arrange it for AMDGPU..

LGTM

Tue, Jan 14, 2:55 PM · Restricted Project
rampitec created D72737: [AMDGPU] Bundle loads before post-RA scheduler.
Tue, Jan 14, 2:55 PM · Restricted Project
rampitec added inline comments to D72709: [codegen,amdgpu] Enhance MIR DIE and re-arrange it for AMDGPU..
Tue, Jan 14, 1:38 PM · Restricted Project
rampitec added a comment to D72709: [codegen,amdgpu] Enhance MIR DIE and re-arrange it for AMDGPU..

You have skipped the dead MO, but was pass reordering really necessary? It seems we have higher register pressure with this change.

Tue, Jan 14, 11:02 AM · Restricted Project
rampitec accepted D72706: [MachineScheduler] Reduce reordering due to mem op clustering.

LGTM

Tue, Jan 14, 10:32 AM · Restricted Project
rampitec committed rGad741853c388: [AMDGPU] Model distance to instruction in bundle (authored by rampitec).
[AMDGPU] Model distance to instruction in bundle
Tue, Jan 14, 1:23 AM
rampitec closed D72669: [AMDGPU] Model distance to instruction in bundle.
Tue, Jan 14, 1:23 AM · Restricted Project
rampitec committed rGeca44745871b: [AMDGPU] Fix getInstrLatency() always returning 1 (authored by rampitec).
[AMDGPU] Fix getInstrLatency() always returning 1
Tue, Jan 14, 1:14 AM
rampitec closed D72655: [AMDGPU] Fix getInstrLatency() always returning 1.
Tue, Jan 14, 1:13 AM · Restricted Project

Mon, Jan 13

rampitec added reviewers for D72669: [AMDGPU] Model distance to instruction in bundle: kerbowa, vpykhtin.
Mon, Jan 13, 5:55 PM · Restricted Project
rampitec added reviewers for D72655: [AMDGPU] Fix getInstrLatency() always returning 1: kerbowa, vpykhtin.
Mon, Jan 13, 5:55 PM · Restricted Project
rampitec added a parent revision for D72669: [AMDGPU] Model distance to instruction in bundle: D72655: [AMDGPU] Fix getInstrLatency() always returning 1.
Mon, Jan 13, 5:09 PM · Restricted Project
rampitec added a child revision for D72655: [AMDGPU] Fix getInstrLatency() always returning 1: D72669: [AMDGPU] Model distance to instruction in bundle.
Mon, Jan 13, 5:09 PM · Restricted Project
rampitec created D72669: [AMDGPU] Model distance to instruction in bundle.
Mon, Jan 13, 5:09 PM · Restricted Project
rampitec created D72655: [AMDGPU] Fix getInstrLatency() always returning 1.
Mon, Jan 13, 2:48 PM · Restricted Project