rampitec (Stanislav Mekhanoshin)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 4 2014, 4:14 AM (406 w, 5 d)

Recent Activity

Today

rampitec accepted D117690: AMDGPU/GlobalISel: Directly diagnose return value use for FP atomics.

LGTM

Wed, Jan 19, 9:37 AM · Restricted Project
rampitec accepted D117682: AMDGPU/GlobalISel: Stop handling llvm.amdgcn.buffer.atomic.fadd.
Wed, Jan 19, 9:34 AM · Restricted Project
rampitec added inline comments to D117690: AMDGPU/GlobalISel: Directly diagnose return value use for FP atomics.
Wed, Jan 19, 9:33 AM · Restricted Project
rampitec added a comment to D117682: AMDGPU/GlobalISel: Stop handling llvm.amdgcn.buffer.atomic.fadd.

fp64-atomics-gfx90a.ll is failing.

Wed, Jan 19, 9:31 AM · Restricted Project
rampitec accepted D117678: AMDGPU/GlobalISel: Fix selection of gfx90a FP atomics.
Wed, Jan 19, 9:29 AM · Restricted Project

Yesterday

rampitec requested changes to D117562: [AMDGPU] Sink immediate VGPR defs if high RP.

The code as written does not guarantee that occupancy increases when a def is sunk. Moreover, it will sink defs even in regions with low RP. Essentially, it may easily pessimise the code without improving anything. You really need to check that sinking is beneficial.
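A minimal sketch of the kind of benefit check being asked for, in Python rather than the pass's C++. The `occupancy` function is a toy model (an assumption, not the real AMDGPU occupancy computation), and `should_sink` is a hypothetical helper name, not something from the patch:

```python
def occupancy(num_vgprs, total_vgprs=256, max_waves=10):
    """Toy occupancy model: waves that fit in the VGPR file, capped.

    The constants and the formula are illustrative assumptions only.
    """
    if num_vgprs <= 0:
        return max_waves
    return max(1, min(max_waves, total_vgprs // num_vgprs))


def should_sink(rp_before, rp_after):
    """Allow sinking only when the modelled occupancy strictly improves."""
    return occupancy(rp_after) > occupancy(rp_before)


# Low-pressure region: occupancy is already capped, so sinking gains nothing.
print(should_sink(20, 18))
# High-pressure region where the saved registers cross an occupancy step.
print(should_sink(130, 128))
# High-pressure region where the saving does not cross a step.
print(should_sink(100, 90))
```

Under this toy model only the second case qualifies, which is the point of the comment: a lower register pressure by itself does not imply a higher occupancy.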

Tue, Jan 18, 1:02 PM · Restricted Project
rampitec updated the diff for D117253: [AMDGPU] Select VGPR versions of MFMA if possible.
  • Check for calls along with inline asm instead of relying on a function attribute.
  • Moved -global-isel flag in tests.
Tue, Jan 18, 11:55 AM · Restricted Project

Mon, Jan 17

rampitec added inline comments to D117253: [AMDGPU] Select VGPR versions of MFMA if possible.
Mon, Jan 17, 4:58 PM · Restricted Project
rampitec added inline comments to D117253: [AMDGPU] Select VGPR versions of MFMA if possible.
Mon, Jan 17, 4:53 PM · Restricted Project
rampitec accepted D116819: [AMDGPU] Disable optimizeEndCf at -O0.
Mon, Jan 17, 10:06 AM · Restricted Project

Fri, Jan 14

rampitec accepted D117358: AMDGPU: Correct getMaxNumSGPR treatment of flat_scratch.
Fri, Jan 14, 3:24 PM · Restricted Project
rampitec accepted D117364: AMDGPU: Use module level register maximums for unknown callees.
Fri, Jan 14, 3:23 PM · Restricted Project
rampitec added a comment to D117358: AMDGPU: Correct getMaxNumSGPR treatment of flat_scratch.

Any tests?

Fri, Jan 14, 2:39 PM · Restricted Project
rampitec added a reviewer for D117358: AMDGPU: Correct getMaxNumSGPR treatment of flat_scratch: sebastian-ne.
Fri, Jan 14, 2:39 PM · Restricted Project
rampitec accepted D117313: [AMDGPU][GFX10][MC] Updated symbolic names of internal HW registers.
Fri, Jan 14, 11:35 AM · Restricted Project
rampitec accepted D117305: [AMDGPU][MC] Corrected disassembly of s_waitcnt.

LGTM

Fri, Jan 14, 11:32 AM · Restricted Project
rampitec added a comment to D117253: [AMDGPU] Select VGPR versions of MFMA if possible.

The same thing is possible even with calls, but it requires a full call graph examination. If the annotate kernel features pass moves to the attributor, it should be possible to analyze inline asm in calls and propagate this info.

Fri, Jan 14, 1:12 AM · Restricted Project

Thu, Jan 13

rampitec retitled D117253: [AMDGPU] Select VGPR versions of MFMA if possible from [AMDGPU] Select VGPR versions of MFMA is possible to [AMDGPU] Select VGPR versions of MFMA if possible.
Thu, Jan 13, 3:46 PM · Restricted Project
rampitec requested review of D117253: [AMDGPU] Select VGPR versions of MFMA if possible.
Thu, Jan 13, 2:32 PM · Restricted Project
rampitec added a comment to D117189: [AMDGPU] Fix error handling in asm constraint syntax.

Testcase?

Thu, Jan 13, 9:43 AM · Restricted Project
rampitec committed rGfc6af7e188cc: [AMDGPU] Fix error handling in asm constraint syntax (authored by rampitec).
[AMDGPU] Fix error handling in asm constraint syntax
Thu, Jan 13, 9:34 AM
rampitec closed D117189: [AMDGPU] Fix error handling in asm constraint syntax.
Thu, Jan 13, 9:34 AM · Restricted Project

Wed, Jan 12

rampitec requested review of D117189: [AMDGPU] Fix error handling in asm constraint syntax.
Wed, Jan 12, 11:51 PM · Restricted Project
rampitec committed rGd043822daab9: [AMDGPU] Fixed physreg asm constraint parsing (authored by rampitec).
[AMDGPU] Fixed physreg asm constraint parsing
Wed, Jan 12, 4:37 PM
rampitec closed D117055: [AMDGPU] Fixed physreg asm constraint parsing.
Wed, Jan 12, 4:37 PM · Restricted Project
rampitec updated the diff for D117055: [AMDGPU] Fixed physreg asm constraint parsing.

Fixed tuple parsing as well.
Removed the workaround in the 2 places which were using it.

Wed, Jan 12, 3:36 PM · Restricted Project

Tue, Jan 11

rampitec added a comment to D117057: [AMDGPU] Annotate functions with inline asm using agprs.

I guess I can run this code in the AMDGPUDAGToDAGISel::runOnMachineFunction or even SIMachineFunctionInfo constructor and skip the attribution.

The downside is that I have to scan all instructions for that, and AMDGPUAnnotateKernelFeatures already does that. Where are you planning to move its code?

AMDGPUAttributor. If this were to be an attribute, which is pretty ugly, it should be the inverse. Why can't you just select to AGPRs, and later we can adjust the register classes if necessary?

Tue, Jan 11, 6:00 PM · Restricted Project
rampitec abandoned D117057: [AMDGPU] Annotate functions with inline asm using agprs.
Tue, Jan 11, 4:37 PM · Restricted Project
rampitec added a comment to D117057: [AMDGPU] Annotate functions with inline asm using agprs.

I guess I can run this code in the AMDGPUDAGToDAGISel::runOnMachineFunction or even SIMachineFunctionInfo constructor and skip the attribution.

Tue, Jan 11, 3:03 PM · Restricted Project
rampitec added a comment to D117057: [AMDGPU] Annotate functions with inline asm using agprs.

I guess I can run this code in the AMDGPUDAGToDAGISel::runOnMachineFunction or even SIMachineFunctionInfo constructor and skip the attribution.

Tue, Jan 11, 2:55 PM · Restricted Project
rampitec added a comment to D117057: [AMDGPU] Annotate functions with inline asm using agprs.

Why does this specifically need to check for inline asm? Is this only checked in the kernel or functions too?

I'm trying to delete this pass and don't want to add more stuff to it. If it's just for kernels can't you check for AGPR physreg uses later?

Tue, Jan 11, 2:48 PM · Restricted Project
rampitec requested review of D117057: [AMDGPU] Annotate functions with inline asm using agprs.
Tue, Jan 11, 2:38 PM · Restricted Project
rampitec requested review of D117055: [AMDGPU] Fixed physreg asm constraint parsing.
Tue, Jan 11, 2:18 PM · Restricted Project

Mon, Jan 10

rampitec added a comment to D116819: [AMDGPU] Disable optimizeEndCf at -O0.

I think this is an extreme interpretation of optnone. This is a minor optimization which happens as part of lowering. The fact we do this as a separate step is just an artifact of how we happen to lower control flow. It's not strictly true that no optimizations occur at -O0, especially if they are cheap and provide benefit.

Mon, Jan 10, 11:05 AM · Restricted Project
rampitec accepted D116955: AMDGPU: Avoid enabling kernel workitem IDs with reqd_work_group_size.
Mon, Jan 10, 11:03 AM · Restricted Project
rampitec accepted D116954: AMDGPU: Select workitem ID intrinsics to 0 with req_work_group_size.
Mon, Jan 10, 11:02 AM · Restricted Project
rampitec accepted D116953: AMDGPU: Optimize outgoing workitem ID based on reqd_work_group_size.
Mon, Jan 10, 11:00 AM · Restricted Project
rampitec added inline comments to D104331: [AMDGPU] Use performOptimizedStructLayout for LDS sort.
Mon, Jan 10, 10:46 AM · Restricted Project

Fri, Jan 7

rampitec added inline comments to D116819: [AMDGPU] Disable optimizeEndCf at -O0.
Fri, Jan 7, 11:57 AM · Restricted Project
rampitec added inline comments to D116819: [AMDGPU] Disable optimizeEndCf at -O0.
Fri, Jan 7, 11:34 AM · Restricted Project
rampitec added a comment to D114126: [PatternMatch] Create match method to track uses complexity.

If we removed some strictness about uses, we probably wouldn't require this patch or the follow-ons for complicated bitwise logic optimizations. Those are only needed because we miss the simpler, intermediate folds in complex expressions with extra uses. I suppose we'd be more at risk for infinite combine loops if we allowed creating extra instructions, but it might be worth experimenting with removing some use checks on bitwise logic folds to see how that plays out.

Fri, Jan 7, 11:28 AM · Restricted Project

Thu, Jan 6

rampitec updated the diff for D114126: [PatternMatch] Create match method to track uses complexity.

Rebased.

Thu, Jan 6, 3:02 PM · Restricted Project
rampitec added a comment to D116714: AMDGPU: Fix LiveVariables error after optimizing VGPR ranges.

It seems to cause a real regression, block-should-not-be-in-alive-blocks.mir fails.

Thu, Jan 6, 11:07 AM · Restricted Project
rampitec added a comment to D116270: [AMDGPU] Enable divergence-driven XNOR selection.

This looks like a regression in xnor.ll :

	s_not_b32 s0, s0                        v_not_b32_e32 v0, v0
	v_xor_b32_e32 v0, s0, v0                v_xor_b32_e32 v0, s4, v0

but it is not really. All the nodes in the example are divergent and the divergent (xor x, -1) is selected to V_NOT_B32 now that https://reviews.llvm.org/D115884 has been committed.
S_NOT_B32 appears at the left because of the custom optimization that converts S_XNOR_B32 back to NOT (XOR) for the targets which have no V_XNOR. This optimization relies on the NOT operand being an SGPR and on V_XOR_B32_e32 accepting an SGPR as its first source operand.
I am not sure if it is always safe. The VALU instructions execution is controlled by the EXEC mask but SALU is not.

This is indeed a regression. It is always safe to keep s_not_b32 on SALU. Also note this effectively makes SIInstrInfo::lowerScalarXnor() useless. This is why XNOR was left behind by D111907.

SIInstrInfo::lowerScalarXnor() is exactly the part of the "manual" SALU to VALU lowering that I am trying to get rid of.
The divergent "not" must be selected to "V_NOT_B32_e32/64", otherwise we still have illegal VGPR to SGPR copies.
This happens because the divergent "not" node has divergent operands and their result will likely be in a VGPR.
Also, we should select everything correctly first and can apply some peephole optimizations after.
In other words: we should not "cheat ourselves" during the selection. The selection should be done fairly corresponding to the node divergence bit.
Then we can apply the optimization in case it is safe.
Note that this is not the only case when we would like to further optimize the code after selection.
I'm planning to further add a separate pass for that.

We cannot solve the problem in the custom selection procedure because NOT node operand has not yet been selected and we do not know if it is SGPR or VGPR.
The only way, for now, is to post-process not(xor)/xor(not) in SIFixSGPRCopies. This may be considered a temporary hack until we have a proper pass for that.
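The not(xor)/xor(not) rewrite discussed here rests on a plain bitwise identity: xnor(a, b) = ~(a ^ b) = (~a) ^ b. A small Python check over 32-bit values, as a sketch only; the function names are illustrative and say nothing about which register file each operation ends up in:

```python
MASK = 0xFFFFFFFF  # model 32-bit registers


def xnor(a, b):
    """NOT applied to the XOR result: not(xor a, b)."""
    return ~(a ^ b) & MASK


def xor_of_not(a, b):
    """NOT applied to one operand first: xor(not a, b)."""
    return ((~a & MASK) ^ b) & MASK


samples = [0, 1, 0xFFFFFFFF, 0x12345678, 0xDEADBEEF, 0x80000000]
for a in samples:
    for b in samples:
        # Both forms must agree, whichever operand is negated.
        assert xnor(a, b) == xor_of_not(a, b) == xor_of_not(b, a)
print("not(xor) and xor(not) agree on all samples")
```

The identity only shows the two forms compute the same value. Whether the NOT may stay on SALU is the separate question debated in the comments above.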

Thu, Jan 6, 11:05 AM · Restricted Project
rampitec committed rG0b5340acb758: [InstCombine] Factor out a common pattern match used 3 times. NFC. (authored by rampitec).
[InstCombine] Factor out a common pattern match used 3 times. NFC.
Thu, Jan 6, 10:24 AM
rampitec closed D116194: [InstCombine] Factor out a common pattern match used 3 times. NFC..
Thu, Jan 6, 10:24 AM · Restricted Project
rampitec added a comment to D116231: [InstCombine] (~a & ~b & c) | (~a & ~c & b) --> (b ^ c) & ~a.

I don't really have an opinion on the patch, but I'm curious.

Given (~a & ~b & c) | (~a & ~c & b), do we make any attempt to pull out the ~a like this: ~a & ((~b & c) | (~c & b)) ? If not, why not? Would that be a good thing to address? Then it's just a case of simplifying the two-variable (~b & c) | (~c & b) -> b ^ c which is easy.

InstCombine does try that transform via InstCombinerImpl::SimplifyAssociativeOrCommutative() and/or InstCombinerImpl::SimplifyUsingDistributiveLaws(). And that works as expected - for example if we alter the 1st modified test in this patch to be like this:

define i32 @not_and_not_and_and_or(i32 %a, i32 %b, i32 %c) {
  %nota = xor i32 %a, -1
  %notb = xor i32 %b, -1
  %and1 = and i32 %nota, %c
  %and2 = and i32 %and1, %notb
  %or1 = or i32 %a, %c
  %not1 = xor i32 %or1, -1
  %and3 = and i32 %not1, %b
  %or3 = or i32 %and2, %and3
  call void @use(i32 %nota)
  call void @use(i32 %notb)
  ret i32 %or3
}

Then it reduces because we find the 'b' and '~b' values directly in the operands of the 'and' instructions. So this might be a question for "-reassociate" - can we get that pass to arrange the operands such that -instcombine can fold this (without breaking some other pattern)?
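The fold in the patch title and the intermediate factoring suggested in the quoted question can both be checked with a truth table. A short Python sketch follows; the function names are illustrative, and because bitwise operators act on each bit independently, checking single-bit inputs is enough:

```python
from itertools import product


def original(a, b, c):
    return (~a & ~b & c) | (~a & ~c & b)


def factored(a, b, c):
    # Pull the common ~a out of both sides of the 'or'.
    return ~a & ((~b & c) | (~c & b))


def folded(a, b, c):
    return (b ^ c) & ~a


def equivalent(f, g):
    """Compare the low bit of f and g on all eight single-bit inputs."""
    return all(
        (f(a, b, c) & 1) == (g(a, b, c) & 1)
        for a, b, c in product((0, 1), repeat=3)
    )


print(equivalent(original, factored))
print(equivalent(original, folded))
```

Both comparisons hold, so the open question in the thread is about which pass gets the operands into a shape where the fold is found, not about the algebra.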

Thu, Jan 6, 10:11 AM · Restricted Project
rampitec added a comment to D116194: [InstCombine] Factor out a common pattern match used 3 times. NFC..

Making the caller pass a dummy seems awkward.
I don't have any suggestions to improve it, but adding some more potential reviewers.

I can pass a pointer with default null. Is that better?

Wouldn't it be a pointer to a pointer in that case?

Thu, Jan 6, 10:06 AM · Restricted Project

Wed, Jan 5

rampitec added a comment to D116194: [InstCombine] Factor out a common pattern match used 3 times. NFC..

Making the caller pass a dummy seems awkward.
I don't have any suggestions to improve it, but adding some more potential reviewers.

Wed, Jan 5, 2:47 PM · Restricted Project
rampitec added a comment to D114126: [PatternMatch] Create match method to track uses complexity.

ping

Wed, Jan 5, 9:55 AM · Restricted Project
rampitec added a comment to D116135: [InstCombine] ((~a & ~b & c) | ~(a | b | c) -> ~(a | b).

ping

Wed, Jan 5, 9:55 AM · Restricted Project
rampitec added a comment to D116194: [InstCombine] Factor out a common pattern match used 3 times. NFC..

ping

Wed, Jan 5, 9:55 AM · Restricted Project
rampitec accepted D116273: [AMDGPU] Iterate LoweredEndCf in the reverse order.

As discussed, it would be worth doing a separate change to skip the optimization at -O0 altogether.

Wed, Jan 5, 9:53 AM · Restricted Project
rampitec accepted D116640: AMDGPU: Avoid null check during addrspacecast lowering.

LGTM

Wed, Jan 5, 9:52 AM · Restricted Project

Tue, Jan 4

rampitec added inline comments to D116273: [AMDGPU] Iterate LoweredEndCf in the reverse order.
Tue, Jan 4, 9:26 AM · Restricted Project

Fri, Dec 31

rampitec added a comment to D116270: [AMDGPU] Enable divergence-driven XNOR selection.

This looks like a regression in xnor.ll :

	s_not_b32 s0, s0                        v_not_b32_e32 v0, v0
	v_xor_b32_e32 v0, s0, v0                v_xor_b32_e32 v0, s4, v0

but it is not really. All the nodes in the example are divergent and the divergent (xor x, -1) is selected to V_NOT_B32 now that https://reviews.llvm.org/D115884 has been committed.
S_NOT_B32 appears at the left because of the custom optimization that converts S_XNOR_B32 back to NOT (XOR) for the targets which have no V_XNOR. This optimization relies on the NOT operand being an SGPR and on V_XOR_B32_e32 accepting an SGPR as its first source operand.
I am not sure if it is always safe. The VALU instructions execution is controlled by the EXEC mask but SALU is not.

Fri, Dec 31, 10:43 AM · Restricted Project

Thu, Dec 23

rampitec accepted D116241: [AMDGPU] Changing S_AND_B32 to V_AND_B32_e64 in the divergent 'trunc' to i1 pattern.

LGTM

Thu, Dec 23, 3:18 PM · Restricted Project
rampitec requested review of D116254: [InstCombine] (~(a | b) & c) | ~(a | b | c) -> ~(a | b).
Thu, Dec 23, 2:55 PM · Restricted Project
rampitec added a comment to D116236: [InstCombine] (b & ~a & ~c) | ~(a | b) --> ~((b & c) | a).

A side note: this small change fully or partially simplifies 28 out of original 255 ternary cases.

Thu, Dec 23, 1:22 PM · Restricted Project
rampitec requested review of D116236: [InstCombine] (b & ~a & ~c) | ~(a | b) --> ~((b & c) | a).
Thu, Dec 23, 1:21 PM · Restricted Project
rampitec updated the diff for D116135: [InstCombine] ((~a & ~b & c) | ~(a | b | c) -> ~(a | b).

Fixed formatting.
It will go under the same LHS as above after D116231 (plus it will need some more tests in this case).

Thu, Dec 23, 11:59 AM · Restricted Project
rampitec requested review of D116231: [InstCombine] (~a & ~b & c) | (~a & ~c & b) --> (b ^ c) & ~a.
Thu, Dec 23, 11:51 AM · Restricted Project

Wed, Dec 22

rampitec requested review of D116194: [InstCombine] Factor out a common pattern match used 3 times. NFC..
Wed, Dec 22, 4:10 PM · Restricted Project
rampitec accepted D115996: [AMDGPU] Don't remove VGPR to AGPR dead spills from frame info.

Thanks! LGTM.

Wed, Dec 22, 3:12 PM · Restricted Project
rampitec accepted D116187: [AMDGPU] Select build_vector DAG nodes according to the divergence.

LGTM

Wed, Dec 22, 1:54 PM · Restricted Project
rampitec accepted D116053: [MachineSink] Allow sinking of constant or ignorable physreg uses.

LGTM given the new tests.

Wed, Dec 22, 11:28 AM · Restricted Project

Tue, Dec 21

rampitec added a comment to D116135: [InstCombine] ((~a & ~b & c) | ~(a | b | c) -> ~(a | b).

I will precommit tests if that is generally OK.

Tue, Dec 21, 3:08 PM · Restricted Project
rampitec requested review of D116135: [InstCombine] ((~a & ~b & c) | ~(a | b | c) -> ~(a | b).
Tue, Dec 21, 3:05 PM · Restricted Project
rampitec accepted D115755: [InstSimplify] Fold logic And to Zero.
Tue, Dec 21, 3:01 PM · Restricted Project
rampitec added inline comments to D115755: [InstSimplify] Fold logic And to Zero.
Tue, Dec 21, 2:37 PM · Restricted Project
rampitec added a comment to D116053: [MachineSink] Allow sinking of constant or ignorable physreg uses.

IR is essentially a single-threaded representation. The implicit exec use is our way to model multithreaded divergence. Consider this transformation, which shall now become legal:

int lid = get_local_id(0);      int lid = get_local_id(0);
int i = 0;                      int i = 0;
x = def();                      do {
do {                        =>    x = def();
  use1(x);                        use1(x);
} while(i++ < lid);             } while(i++ < lid);
use2(x);                        use2(x);

def dominates use2 in both cases, but in the second case not with every lane. All lanes except the first will use an undef.

We will not sink into a loop if the def is outside of the loop. In the test case loop_sink_fmac, the def was already in a loop, which is why it could be sunk.
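The exec concern in the quoted example can be made concrete by listing which lanes are still active on each trip through the divergent do/while loop. This is a toy Python model of lane activity only (an assumption for illustration); it does not model register allocation or what value an inactive lane ends up reading:

```python
def active_lanes_per_iteration(num_lanes):
    """Return the sorted active lane ids for each iteration of
    do { ... } while (i++ < lid), with lid equal to the lane id."""
    i = {lid: 0 for lid in range(num_lanes)}
    active = set(range(num_lanes))
    history = []
    while active:
        history.append(sorted(active))
        still_active = set()
        for lid in active:
            keep_going = i[lid] < lid  # the comparison in 'i++ < lid'
            i[lid] += 1                # the post-increment
            if keep_going:
                still_active.add(lid)
        active = still_active
    return history


history = active_lanes_per_iteration(4)
for trip, lanes in enumerate(history):
    print(f"iteration {trip}: active lanes {lanes}")
```

With four lanes the active set shrinks from all lanes to a single lane, so a def placed inside the loop last executes under a smaller exec than use2 after the loop, where every lane is active again.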

Tue, Dec 21, 12:59 PM · Restricted Project
rampitec added a comment to D116053: [MachineSink] Allow sinking of constant or ignorable physreg uses.

I am not sure I can prove to myself this is legal. For example, you are sinking a def into a loop with a divergent condition and this def is used after the loop. Can this happen? If so, a def might be done with an exec smaller than a use, which creates an undef. Hoisting was OK because the def was moved in a direction where exec is strictly not less than before. Did you run PSDB on it?

I believe the requirement for defs to dominate all uses prevents this from happening but I can add more test cases to check for this. This passed PSDB.

Tue, Dec 21, 10:45 AM · Restricted Project

Mon, Dec 20

rampitec added a comment to D115755: [InstSimplify] Fold logic And to Zero.

There are still 2 comments around tests.

Mon, Dec 20, 1:57 PM · Restricted Project
rampitec added a comment to D116053: [MachineSink] Allow sinking of constant or ignorable physreg uses.

I am not sure I can prove to myself this is legal. For example, you are sinking a def into a loop with a divergent condition and this def is used after the loop. Can this happen? If so, a def might be done with an exec smaller than a use, which creates an undef. Hoisting was OK because the def was moved in a direction where exec is strictly not less than before. Did you run PSDB on it?

Mon, Dec 20, 1:51 PM · Restricted Project
rampitec added inline comments to D115755: [InstSimplify] Fold logic And to Zero.
Mon, Dec 20, 11:53 AM · Restricted Project
rampitec added a comment to D115996: [AMDGPU] Don't remove VGPR to AGPR dead spills from frame info.

Will the slot remain in the final allocation?

Yes. At least in the example I'm seeing.

Mon, Dec 20, 11:51 AM · Restricted Project

Dec 20 2021

rampitec accepted D116044: [AMDGPU] Enable divergence predicates for ctlz/cttz.
Dec 20 2021, 9:21 AM · Restricted Project
rampitec added a comment to D115996: [AMDGPU] Don't remove VGPR to AGPR dead spills from frame info.

Will the slot remain in the final allocation?

Dec 20 2021, 9:19 AM · Restricted Project
rampitec added a comment to D115996: [AMDGPU] Don't remove VGPR to AGPR dead spills from frame info.

Can we rip out this special "spill" handling since D109301? @cdevadas

Dec 20 2021, 9:18 AM · Restricted Project

Dec 17 2021

rampitec added a comment to D115880: [AMDGPU] Generate assertions for lds-atomic-fadd.ll and local-atomics-fp.ll..

I do not believe it is really needed for D115881.

This is to show the change in SIInsertWaitcnts for local-atomics-fp.ll in D115881 at the moment. If we fix the problems in D115881, hopefully the regression should go away. In that case, this change is a mere test generation.

Dec 17 2021, 10:58 AM · Restricted Project
rampitec accepted D115954: [AMDGPU] Re-enabling divergence predicates for min/max.

LGTM

Dec 17 2021, 10:56 AM · Restricted Project
rampitec accepted D115884: [AMDGPU] Expand not pattern according to the XOR node divergence.

LGTM

Dec 17 2021, 9:58 AM · Restricted Project

Dec 16 2021

rampitec added inline comments to D115884: [AMDGPU] Expand not pattern according to the XOR node divergence.
Dec 16 2021, 11:14 AM · Restricted Project
rampitec requested changes to D115880: [AMDGPU] Generate assertions for lds-atomic-fadd.ll and local-atomics-fp.ll..

It also obscures what is being tested, and I do not believe it is really needed for D115881.

Dec 16 2021, 11:11 AM · Restricted Project

Dec 15 2021

rampitec added inline comments to D115755: [InstSimplify] Fold logic And to Zero.
Dec 15 2021, 2:17 PM · Restricted Project
rampitec added inline comments to D115755: [InstSimplify] Fold logic And to Zero.
Dec 15 2021, 2:16 PM · Restricted Project
rampitec added inline comments to D115755: [InstSimplify] Fold logic And to Zero.
Dec 15 2021, 1:15 PM · Restricted Project
rampitec updated the diff for D114126: [PatternMatch] Create match method to track uses complexity.

Rebased.
Added handling of newly added case (~A & B & C) | ~(A | B) --> (C | ~B) & ~A.

Dec 15 2021, 11:55 AM · Restricted Project
rampitec committed rGe6f694229696: [InstCombine] (~a & b & c) | ~(a | b) -> (c | ~b) & ~a (authored by rampitec).
[InstCombine] (~a & b & c) | ~(a | b) -> (c | ~b) & ~a
Dec 15 2021, 9:37 AM
rampitec closed D113037: [InstCombine] (~a & b & c) | ~(a | b) -> (c | ~b) & ~a.
Dec 15 2021, 9:37 AM · Restricted Project
rampitec added inline comments to D115755: [InstSimplify] Fold logic And to Zero.
Dec 15 2021, 9:20 AM · Restricted Project
rampitec accepted D115807: AMDGPU: Use v_accvgpr_mov_b32 when copying AGPR tuples on gfx90a.

LGTM, thank you!

Dec 15 2021, 9:11 AM · Restricted Project

Dec 14 2021

rampitec accepted D115762: AMDGPU: Remove AMDGPUFixFunctionBitcasts pass.
Dec 14 2021, 3:05 PM · Restricted Project
rampitec added inline comments to D115755: [InstSimplify] Fold logic And to Zero.
Dec 14 2021, 2:21 PM · Restricted Project
rampitec accepted D115753: [AMDGPU] Extract helper function in AsmParser. NFC.
Dec 14 2021, 12:50 PM · Restricted Project
rampitec added a comment to D113037: [InstCombine] (~a & b & c) | ~(a | b) -> (c | ~b) & ~a.

Ping

Dec 14 2021, 11:06 AM · Restricted Project

Dec 13 2021

rampitec accepted D115413: [amdgpu] Drop lowering of LDS used by global variables.

I am doubtful that this would be the correct strategy for initializers of LDS - I suspect that's going to have to be handled by injecting initialisation code into the prologue of kernels and changing the initialisers to undef - but even if it is, we can retrieve it from git at will. Right now it makes callgraph crawling to avoid unnecessary allocation of LDS much more complicated (or imprecise).

I definitely agree that LDS handling is under-tested at present but there's no value I can see to these specific tests. They'll need to be changed to be more useful at which point we're better off deciding what it is we want to check and writing tests that hit that.

Dec 13 2021, 3:08 PM · Restricted Project
rampitec added inline comments to D115675: AMDGPU: Fix assert on function argument as loop condition.
Dec 13 2021, 2:37 PM · Restricted Project
rampitec accepted D115675: AMDGPU: Fix assert on function argument as loop condition.

LGTM

Dec 13 2021, 2:34 PM · Restricted Project
rampitec accepted D115669: AMDGPU: Combine is.shared/is.private of null/undef.

LGTM

Dec 13 2021, 2:22 PM · Restricted Project