Page MenuHomePhabricator

rampitec (Stanislav Mekhanoshin)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 4 2014, 4:14 AM (263 w, 1 d)

Recent Activity

Wed, Apr 10

rampitec added inline comments to D60459: SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl.
Wed, Apr 10, 9:41 AM · Restricted Project

Tue, Apr 9

rampitec added inline comments to D60459: SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl.
Tue, Apr 9, 5:12 PM · Restricted Project
rampitec added a comment to D60459: SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl.

The current problem i am trying to resolve in somewhat analogous to hoisting 1/2 of the 64 bit add instruction pair. Although in this particular situation we are actually sinking 1/2 of the instruction pair into a later position within the same block. And yes, i can see how in the future a new machine instruction pass might choose to hoist one of the instructions into a pred BB. I realize i can write additional code to scan a previous block. However i think its better that passes not hoist part of an instruction pair, especially ones such as these. To that end i would rather see my patch assert so that we are forced to deal with such a situation should it arise.
Your example, btw, is a good one for why we should have an IR test for the current problem, rather than an MIR test. An MIR test that runs just before SILoadStoreOptimizer will not detect the affects of a new pass. Whereas the IR test attached to this patch stands a better chance of detecting the issue.

Tue, Apr 9, 4:20 PM · Restricted Project
rampitec added inline comments to D60459: SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl.
Tue, Apr 9, 1:33 PM · Restricted Project
rampitec added a comment to D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.

For the record, I think the "multiple connected components" problem could have been solved by something like this after the call to "createAndComputeVirtRegInterval"

LiveInterval &LI = LIS->getInterval(Reg);
assert(Reg == LI.reg && "Invalid reg to interval mapping");
if (TargetRegisterInfo::isVirtualRegister(LI.reg)) {
  // Check whether or not LI is composed by multiple connected
  // components and if that is the case, fix that.
  SmallVector<LiveInterval*, 8> SplitLIs;
  LIS->splitSeparateComponents(LI, SplitLIs);
}

However, that still leaves the questions about if the pass should do less optimizations related to physregs when LIS is available, or how to preserve LIS if optimizing on physregs.
And I'm not sure about the cost of using splitSeparateComponents compared to a full recompute.

For the record, this code works if applied after createAndComputeVirtRegInterval() call.

Tue, Apr 9, 12:37 PM · Restricted Project
rampitec added a comment to D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.

For the record, I think the "multiple connected components" problem could have been solved by something like this after the call to "createAndComputeVirtRegInterval"

LiveInterval &LI = LIS->getInterval(Reg);
assert(Reg == LI.reg && "Invalid reg to interval mapping");
if (TargetRegisterInfo::isVirtualRegister(LI.reg)) {
  // Check whether or not LI is composed by multiple connected
  // components and if that is the case, fix that.
  SmallVector<LiveInterval*, 8> SplitLIs;
  LIS->splitSeparateComponents(LI, SplitLIs);
}

However, that still leaves the questions about if the pass should do less optimizations related to physregs when LIS is available, or how to preserve LIS if optimizing on physregs.
And I'm not sure about the cost of using splitSeparateComponents compared to a full recompute.

Tue, Apr 9, 11:00 AM · Restricted Project
rampitec added a comment to D60459: SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl.

correction: the original input test did NOT have the instructions separated by more than 1 or 2 instructions The resultant output showed the large separation.
The default neighborhood of 10 is probably more than enough.

Tue, Apr 9, 10:40 AM · Restricted Project
rampitec added a comment to D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.

For the record, I think the "multiple connected components" problem could have been solved by something like this after the call to "createAndComputeVirtRegInterval"

LiveInterval &LI = LIS->getInterval(Reg);
assert(Reg == LI.reg && "Invalid reg to interval mapping");
if (TargetRegisterInfo::isVirtualRegister(LI.reg)) {
  // Check whether or not LI is composed by multiple connected
  // components and if that is the case, fix that.
  SmallVector<LiveInterval*, 8> SplitLIs;
  LIS->splitSeparateComponents(LI, SplitLIs);
}

However, that still leaves the questions about if the pass should do less optimizations related to physregs when LIS is available, or how to preserve LIS if optimizing on physregs.
And I'm not sure about the cost of using splitSeparateComponents compared to a full recompute.

Tue, Apr 9, 9:27 AM · Restricted Project
rampitec committed rG913ba8eeb414: Revert LIS handling in MachineDCE (authored by rampitec).
Revert LIS handling in MachineDCE
Tue, Apr 9, 9:13 AM
rampitec committed rL358015: Revert LIS handling in MachineDCE.
Revert LIS handling in MachineDCE
Tue, Apr 9, 9:13 AM
rampitec closed D60466: Revert LIS handling in MachineDCE.
Tue, Apr 9, 9:13 AM · Restricted Project
rampitec updated the diff for D60466: Revert LIS handling in MachineDCE.

Added regression test.

Tue, Apr 9, 9:10 AM · Restricted Project
rampitec added a comment to D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.

@bjope could you please check if D60419 solves your problem with physreg use removal?

Isn't it weird that this pass now is doing less optimizations when an analysis pass is provided? It might be OK for your use case, but it seems weird from a more general perspective.

FWIW, I'm not sure exactly why we are running this pass in a lot of places in our OOT target (after ISel, before coalescing, post-coalescing, etc). But we have done it for years (since before I started working with this target), and now some of those runs won't do as much cleanup as they used to do (when a LIS is found). Not sure, but maybe it is exactly these physreg removals we aim for, so I'm not sure that solving the LIS preservation problem by not eliminating certain dead instructions is good enough for us. It will take some time to analyse that, so at the moment it is easier for me to just make a work around patch downstream.

Bjorn, thank you for the testcase. It doesn't seem to be easy to quick fix, while I am going for vacation tomorrow. I will create a revert patch for now and try to sort it out later.

I understood someone might be aiming for dead phys regs, but handling phys without recompute can be much more challenging, so probably a full recompute is not unreasonable under such scenario too.

Tue, Apr 9, 8:53 AM · Restricted Project
rampitec abandoned D60419: Avoid removing physreg uses in MachineDCE with LIS present.
Tue, Apr 9, 8:52 AM
rampitec added a comment to D60466: Revert LIS handling in MachineDCE.

FYI, changes in AMDGPU tests are just because of the liveness rebuild which results in a slightly different and non important allocation change.

Tue, Apr 9, 8:52 AM · Restricted Project
rampitec created D60466: Revert LIS handling in MachineDCE.
Tue, Apr 9, 8:51 AM · Restricted Project
rampitec added a comment to D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.

@bjope could you please check if D60419 solves your problem with physreg use removal?

Isn't it weird that this pass now is doing less optimizations when an analysis pass is provided? It might be OK for your use case, but it seems weird from a more general perspective.

FWIW, I'm not sure exactly why we are running this pass in a lot of places in our OOT target (after ISel, before coalescing, post-coalescing, etc). But we have done it for years (since before I started working with this target), and now some of those runs won't do as much cleanup as they used to do (when a LIS is found). Not sure, but maybe it is exactly these physreg removals we aim for, so I'm not sure that solving the LIS preservation problem by not eliminating certain dead instructions is good enough for us. It will take some time to analyse that, so at the moment it is easier for me to just make a work around patch downstream.

Tue, Apr 9, 8:39 AM · Restricted Project

Mon, Apr 8

rampitec added a comment to D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.

@bjope could you please check if D60419 solves your problem with physreg use removal?

Mon, Apr 8, 12:22 PM · Restricted Project
rampitec added a comment to D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.

@bjope could you please check if D60419 solves your problem with physreg use removal?

Mon, Apr 8, 12:18 PM · Restricted Project
rampitec created D60419: Avoid removing physreg uses in MachineDCE with LIS present.
Mon, Apr 8, 12:18 PM
rampitec added a comment to D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.
Mon, Apr 8, 11:32 AM · Restricted Project
rampitec added a comment to D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.
  • I get MachineVerifier problems because the LIS update does not update live ranges for physical registers. Consider an instruction like this dead %42:regs = COPY $a1, if that is removed and it is the last use of the physical register $a1 then we need to shrink the live range for $a1 , otherwise the MachineVerifier will complain about "Live segment doesn't end at a valid instruction".
Mon, Apr 8, 11:06 AM · Restricted Project
rampitec added a comment to D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.

did you enable MachineDCE in RA in some other target? So far it is only enabled in the AMDGPU, so I am curious where do you see these problems?

Mon, Apr 8, 10:29 AM · Restricted Project

Sat, Apr 6

rampitec committed rG5182302a3766: [AMDGPU] Sort out and rename multiple CI/VI predicates (authored by rampitec).
[AMDGPU] Sort out and rename multiple CI/VI predicates
Sat, Apr 6, 2:20 AM
rampitec committed rL357835: [AMDGPU] Sort out and rename multiple CI/VI predicates.
[AMDGPU] Sort out and rename multiple CI/VI predicates
Sat, Apr 6, 2:20 AM
rampitec closed D60346: [AMDGPU] Sort out and rename multiple CI/VI predicates.
Sat, Apr 6, 2:20 AM · Restricted Project

Fri, Apr 5

rampitec created D60346: [AMDGPU] Sort out and rename multiple CI/VI predicates.
Fri, Apr 5, 4:20 PM · Restricted Project
rampitec committed rGc8f78f8dd344: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs (authored by rampitec).
[AMDGPU] Add MachineDCE pass after RenameIndependentSubregs
Fri, Apr 5, 1:13 PM
rampitec committed rL357805: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.
[AMDGPU] Add MachineDCE pass after RenameIndependentSubregs
Fri, Apr 5, 1:13 PM
rampitec closed D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.
Fri, Apr 5, 1:13 PM · Restricted Project
rampitec updated the diff for D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.

Changed option description.

Fri, Apr 5, 1:06 PM · Restricted Project
rampitec updated the diff for D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.

Addressed review comments.

Fri, Apr 5, 11:36 AM · Restricted Project
rampitec committed rG1d9f286ecb81: [AMDGPU] rename vi-insts into gfx8-insts (authored by rampitec).
[AMDGPU] rename vi-insts into gfx8-insts
Fri, Apr 5, 11:27 AM
rampitec committed rG7895c0323293: [AMDGPU] predicate and feature refactoring (authored by rampitec).
[AMDGPU] predicate and feature refactoring
Fri, Apr 5, 11:23 AM
rampitec committed rC357792: [AMDGPU] rename vi-insts into gfx8-insts.
[AMDGPU] rename vi-insts into gfx8-insts
Fri, Apr 5, 11:23 AM
rampitec committed rL357792: [AMDGPU] rename vi-insts into gfx8-insts.
[AMDGPU] rename vi-insts into gfx8-insts
Fri, Apr 5, 11:23 AM
rampitec closed D60293: [AMDGPU] rename vi-insts into gfx8-insts.
Fri, Apr 5, 11:23 AM · Restricted Project
rampitec committed rL357791: [AMDGPU] predicate and feature refactoring.
[AMDGPU] predicate and feature refactoring
Fri, Apr 5, 11:23 AM
rampitec closed D60292: [AMDGPU] predicate and feature refactoring.
Fri, Apr 5, 11:23 AM · Restricted Project
rampitec added inline comments to D60292: [AMDGPU] predicate and feature refactoring.
Fri, Apr 5, 11:13 AM · Restricted Project
rampitec added inline comments to D60292: [AMDGPU] predicate and feature refactoring.
Fri, Apr 5, 11:02 AM · Restricted Project
rampitec updated the diff for D60292: [AMDGPU] predicate and feature refactoring.

Also renamed decoder namespaces.

Fri, Apr 5, 10:58 AM · Restricted Project

Thu, Apr 4

rampitec created D60293: [AMDGPU] rename vi-insts into gfx8-insts.
Thu, Apr 4, 4:34 PM · Restricted Project
rampitec created D60292: [AMDGPU] predicate and feature refactoring.
Thu, Apr 4, 4:30 PM · Restricted Project
rampitec added inline comments to D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.
Thu, Apr 4, 12:47 PM · Restricted Project
rampitec updated the diff for D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.

Moving pass insertion from common TargetPassConfig into AMDGPU, since most targets do not need it.
Changes to maintain liveness if available remain in the MachineDCE pass, but are a no-op for any other target until they decide to use it late in the RA.

Thu, Apr 4, 12:45 PM · Restricted Project
rampitec added inline comments to D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.
Thu, Apr 4, 11:04 AM · Restricted Project

Wed, Apr 3

rampitec added a comment to D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.

Hi @rampitec,

Are the dead instructions marked during the detect dead lanes pass or during the rename independent SubReg pass?

I would expect the former and in that case, it may be easier to just do the code deletion there.

Cheers,
-Quentin

Wed, Apr 3, 6:09 PM · Restricted Project
rampitec updated the diff for D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.

Skip this invocation for targets which do not have subregister liveness.

Wed, Apr 3, 3:47 PM · Restricted Project
rampitec added a comment to D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.

I don't think X86 needs this pass. Everything here looks like bad test construction or the issue in speculative load hardening.

Wed, Apr 3, 12:01 PM · Restricted Project
rampitec accepted D60109: AMDGPU: Split block for si_end_cf.

LGTM

Wed, Apr 3, 11:32 AM

Tue, Apr 2

rampitec added inline comments to D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.
Tue, Apr 2, 3:52 PM · Restricted Project
rampitec updated the diff for D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.

Rebased.

Tue, Apr 2, 3:52 PM · Restricted Project
rampitec committed rGea2e22792693: X86: regenerate speculative-load-hardening-indirect.ll tests. NFC. (authored by rampitec).
X86: regenerate speculative-load-hardening-indirect.ll tests. NFC.
Tue, Apr 2, 3:45 PM
rampitec committed rL357537: X86: regenerate speculative-load-hardening-indirect.ll tests. NFC..
X86: regenerate speculative-load-hardening-indirect.ll tests. NFC.
Tue, Apr 2, 3:43 PM
rampitec added inline comments to D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.
Tue, Apr 2, 3:34 PM · Restricted Project
rampitec accepted D60142: AMDGPU: Fix names for generation features.

LGTM

Tue, Apr 2, 11:07 AM
rampitec added inline comments to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..
Tue, Apr 2, 10:56 AM

Mon, Apr 1

rampitec added a comment to D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.

Ping.

Since the primary goal of the DCE here is to clean dead subregs I can bail this pass invocation if subreg liveness if not supported by target. What do you think?

Mon, Apr 1, 2:10 PM · Restricted Project
rampitec added reviewers for D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs: craig.topper, dfukalov, alex-t.
Mon, Apr 1, 2:05 PM · Restricted Project

Sun, Mar 31

rampitec added a comment to D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.

Since the primary goal of the DCE here is to clean dead subregs I can bail this pass invocation if subreg liveness if not supported by target. What do you think?

Sun, Mar 31, 1:53 PM · Restricted Project

Fri, Mar 29

rampitec added inline comments to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..
Fri, Mar 29, 4:51 PM
rampitec accepted D60002: AMDGPU: Remove dx10-clamp from subtarget features.

LGTM

Fri, Mar 29, 11:21 AM

Tue, Mar 26

rampitec accepted D59851: AMDGPU: Fix areLoadsFromSameBasePtr for DS atomics.

LGTM

Tue, Mar 26, 3:27 PM
rampitec accepted D59846: AMDGPU: wave_barrier is not isBarrier.

LGTM

Tue, Mar 26, 2:50 PM
rampitec added a comment to D59846: AMDGPU: wave_barrier is not isBarrier.

What if we call a real barrier in the same situation?

Tue, Mar 26, 2:40 PM
rampitec accepted D59844: Reapply "AMDGPU: Scavenge register instead of findUnusedReg".

LGTM

Tue, Mar 26, 2:38 PM
rampitec accepted D59840: AMDGPU: Enable the scavenger for large frames.

LGTM

Tue, Mar 26, 1:37 PM

Mon, Mar 25

rampitec updated the diff for D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.

Rebased.

Mon, Mar 25, 1:51 PM · Restricted Project
rampitec added inline comments to D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.
Mon, Mar 25, 1:49 PM · Restricted Project
rampitec accepted D59769: AMDGPU: Fix missing scc implicit def on s_andn2_b64_term.

LGTM

Mon, Mar 25, 12:24 PM
rampitec accepted D59778: AMDGPU: Make exec mask optimzations more resistant to block splits.

LGTM

Mon, Mar 25, 11:03 AM
rampitec accepted D59773: AMDGPU: Add additional MIR tests for exec mask optimizations.

LGTM

Mon, Mar 25, 11:02 AM
rampitec added inline comments to D59778: AMDGPU: Make exec mask optimzations more resistant to block splits.
Mon, Mar 25, 10:35 AM
rampitec added inline comments to D59778: AMDGPU: Make exec mask optimzations more resistant to block splits.
Mon, Mar 25, 10:29 AM
rampitec requested changes to D59773: AMDGPU: Add additional MIR tests for exec mask optimizations.
Mon, Mar 25, 10:24 AM
rampitec accepted D59771: AMDGPU: Skip debug_instr when collapsing end_cf.

LGTM

Mon, Mar 25, 10:22 AM
rampitec accepted D59772: AMDGPU: Remove unnecessary check for isFullCopy.

LGTM

Mon, Mar 25, 10:22 AM
rampitec accepted D59770: AMDGPU: Make collapse-endcf test more useful.

LGTM

Mon, Mar 25, 10:22 AM
rampitec added a comment to D59768: AMDGPU: Set hasSideEffects 0 on _term instructions.

One question though, did it pass precheckin? Make sure it did before submission.

Mon, Mar 25, 10:15 AM
rampitec accepted D59768: AMDGPU: Set hasSideEffects 0 on _term instructions.

LGTM

Mon, Mar 25, 10:15 AM

Fri, Mar 22

rampitec accepted D59718: AMDGPU: Make sram-ecc off by default for Vega20.

LGTM

Fri, Mar 22, 2:56 PM · Restricted Project
rampitec added a comment to D59717: AMDGPU: Ignore SRAM ECC feature when inlining.

This isn't correct, this does impact codegen. This changes whether d16 instruction can be used

Fri, Mar 22, 2:52 PM
rampitec accepted D59717: AMDGPU: Ignore SRAM ECC feature when inlining.

LGTM

Fri, Mar 22, 2:17 PM

Mar 21 2019

rampitec added a comment to D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.

I feel like we just keep spamming DCE runs to solve issues. Is there some reason it's difficult for RenameIndependentSubregs to clean up after itself?

Also, I think making more passes to have to care about LiveIntervals should generally be avoided

Mar 21 2019, 11:08 AM · Restricted Project

Mar 20 2019

rampitec created D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.
Mar 20 2019, 7:25 PM · Restricted Project
rampitec committed rG0a11829ab23a: Allow machine dce to remove uses in the same instruction (authored by rampitec).
Allow machine dce to remove uses in the same instruction
Mar 20 2019, 2:42 PM
rampitec committed rL356619: Allow machine dce to remove uses in the same instruction.
Allow machine dce to remove uses in the same instruction
Mar 20 2019, 2:41 PM
rampitec closed D59565: Allow machine dce to remove uses in the same instruction.
Mar 20 2019, 2:40 PM · Restricted Project

Mar 19 2019

rampitec added inline comments to D59565: Allow machine dce to remove uses in the same instruction.
Mar 19 2019, 3:12 PM · Restricted Project
rampitec added inline comments to D59565: Allow machine dce to remove uses in the same instruction.
Mar 19 2019, 3:09 PM · Restricted Project
rampitec updated the diff for D59565: Allow machine dce to remove uses in the same instruction.

Use use_nodbg_instructions().

Mar 19 2019, 3:09 PM · Restricted Project
rampitec created D59565: Allow machine dce to remove uses in the same instruction.
Mar 19 2019, 2:54 PM · Restricted Project
rampitec accepted D59517: AMDGPU: Add support for cross address space synchronization scopes.

LGTM

Mar 19 2019, 1:40 PM · Restricted Project
rampitec accepted D58511: AMDGPU: Don't look for constant in insert/extract_vector_elt regbankselect.

LGTM

Mar 19 2019, 1:26 PM

Mar 18 2019

rampitec accepted D59494: AMDGPU: Add support for cross address space synchronization scopes (clang).

LGTM

Mar 18 2019, 5:20 PM · Restricted Project
rampitec added inline comments to D59494: AMDGPU: Add support for cross address space synchronization scopes (clang).
Mar 18 2019, 4:31 PM · Restricted Project
rampitec accepted D59501: [AMDGPU] Enable code selection using `s_mul_hi_u32`/`s_mul_hi_i32`..

LGTM

Mar 18 2019, 1:31 PM · Restricted Project
rampitec added inline comments to D59501: [AMDGPU] Enable code selection using `s_mul_hi_u32`/`s_mul_hi_i32`..
Mar 18 2019, 1:10 PM · Restricted Project
rampitec added inline comments to D59501: [AMDGPU] Enable code selection using `s_mul_hi_u32`/`s_mul_hi_i32`..
Mar 18 2019, 11:24 AM · Restricted Project