Page MenuHomePhabricator

sameerds (Sameer Sahasrabuddhe)
User

Projects

User does not belong to any projects.

User Details

User Since
Oct 17 2012, 11:25 AM (396 w, 6 d)

Working on ROCm at AMD

Recent Activity

Today

sameerds added a comment to D79037: [StructurizeCFG] Fix region nodes ordering.

This is a second take of the solution, using SCCs instead of LoopInfo.
This algorithm can also handle irreducible loops, and, generally, any kind of cycle in the graph.

Tue, May 26, 11:25 AM · Restricted Project

Fri, May 22

sameerds accepted D80091: AMDGPU/GlobalISel: Fix masked control flow with fallthrough blocks.

LGTM!

Fri, May 22, 7:28 AM · Restricted Project
sameerds added inline comments to D80399: [Local] Prevent `invertCondition` from creating a redundant instruction.
Fri, May 22, 4:45 AM · Restricted Project
sameerds added a comment to D80091: AMDGPU/GlobalISel: Fix masked control flow with fallthrough blocks.

I think the patch is okay, but have a couple of doubts about the tests.

Fri, May 22, 3:36 AM · Restricted Project

Thu, May 21

sameerds added a comment to D79037: [StructurizeCFG] Fix region nodes ordering.

TBH, I spent a small amount of time visualizing the new testcase, but couldn't figure out the problem, nor the solution. I see that the check for currentLoopSize is moved up the control flow a bit, but I didn't see why.

We need to move up the check for *CurrentLoopSize, as in the case it is zero, then we no longer need to skip nodes, and just continue to the next node/loop.
The simplified, new test case, demonstrate a case where it is needed.

Thu, May 21, 10:45 PM · Restricted Project
sameerds added a comment to D80399: [Local] Prevent `invertCondition` from creating a redundant instruction.

I support this enhancement. Thanks for doing it!

Thu, May 21, 9:38 PM · Restricted Project

Wed, May 20

sameerds accepted D78881: [FlattenCFG] Fix `MergeIfRegion` in case then-path is empty.

I am NOT familiar with this transformation. But I believe that the patch is well-formed, and I was able to follow the logic. The tests also demonstrate that it works as intended. LGTM.

Wed, May 20, 11:05 PM · Restricted Project

Mon, May 18

sameerds committed rG6c8488436644: [LoopSimplify] don't separate nested loops with convergent calls (authored by sameerds).
[LoopSimplify] don't separate nested loops with convergent calls
Mon, May 18, 10:13 PM
sameerds closed D80078: [LoopSimplify] don't separate nested loops with convergent calls.
Mon, May 18, 10:13 PM · Restricted Project
sameerds added inline comments to D80078: [LoopSimplify] don't separate nested loops with convergent calls.
Mon, May 18, 7:32 PM · Restricted Project
sameerds added inline comments to D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1.
Mon, May 18, 8:34 AM · Restricted Project, Restricted Project

Sun, May 17

sameerds added inline comments to D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1.
Sun, May 17, 10:52 PM · Restricted Project, Restricted Project
sameerds added inline comments to D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1.
Sun, May 17, 10:52 PM · Restricted Project, Restricted Project
sameerds added a comment to D79037: [StructurizeCFG] Fix region nodes ordering.

TBH, I spent a small amount of time visualizing the new testcase, but couldn't figure out the problem, nor the solution. I see that the check for currentLoopSize is moved up the control flow a bit, but I didn't see why. A few LLVM_DEBUG lines would be very helpful to trace what the structurizer is doing. Also, can the new test case be reduced further? There are a couple of simple chains, and a few branches. Are all of them necessary? The main problem seems to be an inner loop with just two blocks where both blocks exit. I don't know whether both these properties matter. For example, will the problem happen with a single block that loops on itself?

Sun, May 17, 9:16 PM · Restricted Project
sameerds updated the diff for D80078: [LoopSimplify] don't separate nested loops with convergent calls.

Test update: Moved the convergent call to where it is actually relevant.

Sun, May 17, 9:02 AM · Restricted Project
sameerds added reviewers for D80078: [LoopSimplify] don't separate nested loops with convergent calls: arsenm, nhaehnle.
Sun, May 17, 5:19 AM · Restricted Project
sameerds added a reviewer for D80078: [LoopSimplify] don't separate nested loops with convergent calls: chandlerc.
Sun, May 17, 5:19 AM · Restricted Project
sameerds created D80078: [LoopSimplify] don't separate nested loops with convergent calls.
Sun, May 17, 4:47 AM · Restricted Project

Sat, May 16

sameerds added inline comments to D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1.
Sat, May 16, 8:08 PM · Restricted Project, Restricted Project

Fri, May 15

sameerds added inline comments to D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1.
Fri, May 15, 9:44 AM · Restricted Project, Restricted Project

Mon, May 11

sameerds added inline comments to D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1.
Mon, May 11, 10:39 PM · Restricted Project, Restricted Project
sameerds added inline comments to D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1.
Mon, May 11, 10:07 PM · Restricted Project, Restricted Project
sameerds added inline comments to D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1.
Mon, May 11, 7:27 PM · Restricted Project, Restricted Project
sameerds added inline comments to D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1.
Mon, May 11, 6:55 PM · Restricted Project, Restricted Project

Sun, May 10

sameerds added inline comments to D78900: [HIP][AMDGPU] Enable structurizer workarounds.
Sun, May 10, 9:55 PM · Restricted Project, Restricted Project
sameerds accepted D79037: [StructurizeCFG] Fix region nodes ordering.

LGTM. It would be nice if you can move the following tests to the new file and demonstrate that that they are fixed. But that need not hold up this review any longer.

  • test/Transforms/StructurizeCFG/AMDGPU/backedge-id-bug.ll
  • test/Transforms/StructurizeCFG/AMDGPU/backedge-id-bug-xfail.ll
Sun, May 10, 7:41 PM · Restricted Project

Thu, May 7

sameerds added a comment to D79037: [StructurizeCFG] Fix region nodes ordering.

Actually, I take that back. Consider the following CFG:

A -> H1, H2
H1->B, C
B->H1
C->B, H2
H2->E
E->H2, F

A is the entry and F is the exit. The loops (H1, B, C) and (H2, E) are siblings with backedges B->H1 and E->H2 respectively. C is an exiting block in the first loop that branches to H2. One POT would be "B, F, E, H2, C, H1, A". When iterating in reverse, the loop with H1 is started first, but H2 from the second loop is encountered before the first loop is completed.

In this case, those loops are in different regions, and as this is a region pass, they won't appear in the same POT.

Thu, May 7, 5:37 AM · Restricted Project
sameerds added a comment to D79037: [StructurizeCFG] Fix region nodes ordering.

I see. It sure does look like PR25378. Then the only thing left is one new test with two sibling loops. The intention is to cover the case where CurrentLoop is not the parent of L.

I don't think it is possible, although I might be wrong.
Because the way I see it, for loop A be interfered with another loop B, while POT, there must be at least one block that belongs to loop A that is dependent on at least one block in loop B. But then loop B must be finished before loop A, and actually inside it. Otherwise, it won't be recognized as a separate loop.

I hope I explained it clear enough. Otherwise, can you please give an example?

I agree. That means that if L exists, then CurrentLoop must be its parent, right? So instead of checking for this condition, the code will have to assert it.

Thu, May 7, 2:48 AM · Restricted Project
sameerds added a comment to D79037: [StructurizeCFG] Fix region nodes ordering.

I see. It sure does look like PR25378. Then the only thing left is one new test with two sibling loops. The intention is to cover the case where CurrentLoop is not the parent of L.

I don't think it is possible, although I might be wrong.
Because the way I see it, for loop A be interfered with another loop B, while POT, there must be at least one block that belongs to loop A that is dependent on at least one block in loop B. But then loop B must be finished before loop A, and actually inside it. Otherwise, it won't be recognized as a separate loop.

I hope I explained it clear enough. Otherwise, can you please give an example?

Thu, May 7, 1:41 AM · Restricted Project

Wed, May 6

sameerds added a comment to D79037: [StructurizeCFG] Fix region nodes ordering.

The problem with those tests is something else, and not produced by the wrong ordering. You may refer to PR25378 for an example.
Still, this fix the ordering issue and fixes PR41509 (and its duplicates).
I think we should insert this patch, and attend to the other bug as a separate issue.

Wed, May 6, 8:35 AM · Restricted Project
sameerds added a comment to D79037: [StructurizeCFG] Fix region nodes ordering.

I have bad news. This fix does not help with the test needs-fr-ule.ll, i.e., the resulting CFG is incorret if I remove the unify-loop-exits pass. The fix-irreducible pass produces an inner loop with the header "irr.guard" and a latch "while.cond47". This is inside an outer loop with header "while.cond" and many backedges. When I run the structurizer with the current fix on this graph, the backedge of the inner loop is lost. Instead we see a spurious loop between two new blocks Flow5 and Flow4, which is governed by incorrect boolean constants when in fact they should have been %Pred variables. That information is lost along the edge Flow8 to Flow5.

Wed, May 6, 2:39 AM · Restricted Project

Tue, May 5

sameerds added a comment to D79037: [StructurizeCFG] Fix region nodes ordering.

The implementation looks okay to me. The new comments help a lot. Waiting for tests mentioned in the comments so far ... one new test (sibling loops where the DFS jumps from one to the other before finishing the first loop) and updates to existing tests.

Tue, May 5, 2:06 AM · Restricted Project

Sun, May 3

sameerds added inline comments to D79037: [StructurizeCFG] Fix region nodes ordering.
Sun, May 3, 9:18 PM · Restricted Project
sameerds added inline comments to D79037: [StructurizeCFG] Fix region nodes ordering.
Sun, May 3, 7:57 AM · Restricted Project

Sat, May 2

sameerds added a comment to D79037: [StructurizeCFG] Fix region nodes ordering.

It seems the new commit is an incremental update on the previous commit. Can you please submit a single squashed commit, so that the change can be compared against the original state of the trunk?

I am not sure what do you mean. In this page in the Phabricator, under [Revision Contents]->[History] you can choose which of the patch revisions to diff. The default is "Base" against the latest "Diff no.".

Sat, May 2, 11:06 AM · Restricted Project
sameerds added a comment to D79037: [StructurizeCFG] Fix region nodes ordering.

It seems the new commit is an incremental update on the previous commit. Can you please submit a single squashed commit, so that the change can be compared against the original state of the trunk?

Sat, May 2, 3:40 AM · Restricted Project

Wed, Apr 29

sameerds added a comment to D79037: [StructurizeCFG] Fix region nodes ordering.

This needs more tests. There are a bunch of decisions being taken during the reordering, but the test peresented simply exercises the case of one nested loop inside another. For example, what about two inner loops L1 and L2, where L1 has an exit that reaches the entry of L2? There might be other interesting cases too.

Wed, Apr 29, 10:43 AM · Restricted Project

Mon, Apr 27

sameerds added inline comments to D78900: [HIP][AMDGPU] Enable structurizer workarounds.
Mon, Apr 27, 8:29 PM · Restricted Project, Restricted Project
sameerds committed rG848876368235: [NFC] UnifyLoopExits: correctly skip expensive checks (authored by sameerds).
[NFC] UnifyLoopExits: correctly skip expensive checks
Mon, Apr 27, 3:11 AM
sameerds created D78900: [HIP][AMDGPU] Enable structurizer workarounds.
Mon, Apr 27, 12:29 AM · Restricted Project, Restricted Project
sameerds added reviewers for D78900: [HIP][AMDGPU] Enable structurizer workarounds: yaxunl, scchan, arsenm, rampitec, dstuttard.
Mon, Apr 27, 12:29 AM · Restricted Project, Restricted Project

Apr 26 2020

sameerds committed rG06bdffb2bb45: [AMDGPU] Expose llvm fence instruction as clang intrinsic (authored by saiislam).
[AMDGPU] Expose llvm fence instruction as clang intrinsic
Apr 26 2020, 9:15 PM
sameerds closed D75917: Expose llvm fence instruction as clang intrinsic.
Apr 26 2020, 9:15 PM · Restricted Project

Apr 23 2020

sameerds accepted D75917: Expose llvm fence instruction as clang intrinsic.

Thanks @saiislam ... this looks much better!

Apr 23 2020, 8:06 PM · Restricted Project

Apr 22 2020

sameerds added a comment to D75917: Expose llvm fence instruction as clang intrinsic.

Can you please submit a squashed single commit for review against the master branch? All the recent commits seem to be relative to previous commits, and I am having trouble looking at the change as a whole.

Apr 22 2020, 8:06 PM · Restricted Project

Apr 21 2020

sameerds committed rG5a7a6382bc06: FixIrreducible: don't crash when moving a child loop (authored by sameerds).
FixIrreducible: don't crash when moving a child loop
Apr 21 2020, 7:29 PM
sameerds closed D78544: FixIrreducible: don't crash when moving a child loop.
Apr 21 2020, 7:29 PM · Restricted Project
sameerds updated the diff for D78544: FixIrreducible: don't crash when moving a child loop.

review comments

Apr 21 2020, 9:10 AM · Restricted Project
sameerds added a comment to D77198: Introduce fix-irreducible pass.

@sameerds Please can you take a look at http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/23136 - I think you introduced a crash on the nested.ll test on the windows EXPENSIVE_CHECK build bot

Apr 21 2020, 3:44 AM · Restricted Project
sameerds updated the diff for D78544: FixIrreducible: don't crash when moving a child loop.

clang-format

Apr 21 2020, 2:07 AM · Restricted Project
sameerds created D78544: FixIrreducible: don't crash when moving a child loop.
Apr 21 2020, 1:35 AM · Restricted Project
sameerds updated the diff for D78544: FixIrreducible: don't crash when moving a child loop.

Added bug ID to the commit description

Apr 21 2020, 1:35 AM · Restricted Project
sameerds updated the summary of D78544: FixIrreducible: don't crash when moving a child loop.
Apr 21 2020, 1:35 AM · Restricted Project

Apr 15 2020

sameerds committed rG7bb9f500e24a: fix warning: specialization of template in different namespace (authored by sameerds).
fix warning: specialization of template in different namespace
Apr 15 2020, 3:47 AM
sameerds committed rG8c11bc0cd06f: Introduce fix-irreducible pass (authored by sameerds).
Introduce fix-irreducible pass
Apr 15 2020, 2:42 AM

Apr 14 2020

sameerds added a reverting change for rG2ada8e2525dd: Introduce fix-irreducible pass: rG44e09b59b869: Revert "Introduce fix-irreducible pass".
Apr 14 2020, 11:57 PM
sameerds committed rG44e09b59b869: Revert "Introduce fix-irreducible pass" (authored by sameerds).
Revert "Introduce fix-irreducible pass"
Apr 14 2020, 11:57 PM
sameerds committed rG2ada8e2525dd: Introduce fix-irreducible pass (authored by sameerds).
Introduce fix-irreducible pass
Apr 14 2020, 11:23 PM
sameerds closed D77198: Introduce fix-irreducible pass.
Apr 14 2020, 11:22 PM · Restricted Project
sameerds added inline comments to D77198: Introduce fix-irreducible pass.
Apr 14 2020, 11:22 PM · Restricted Project

Apr 8 2020

sameerds updated the diff for D77198: Introduce fix-irreducible pass.

added an ascii diagram illustrating the transformation

Apr 8 2020, 12:29 AM · Restricted Project

Apr 6 2020

sameerds added inline comments to D75917: Expose llvm fence instruction as clang intrinsic.
Apr 6 2020, 9:48 PM · Restricted Project
sameerds requested changes to D75917: Expose llvm fence instruction as clang intrinsic.
Apr 6 2020, 10:18 AM · Restricted Project
sameerds added inline comments to D75917: Expose llvm fence instruction as clang intrinsic.
Apr 6 2020, 8:06 AM · Restricted Project
sameerds added a reviewer for D75917: Expose llvm fence instruction as clang intrinsic: arsenm.
Apr 6 2020, 8:05 AM · Restricted Project

Apr 5 2020

sameerds updated the diff for D77198: Introduce fix-irreducible pass.

missing clang-format; sorry about that noise!

Apr 5 2020, 11:03 PM · Restricted Project
sameerds updated the diff for D77198: Introduce fix-irreducible pass.

removed unrelated change; refactored out a dubious nullptr assignment

Apr 5 2020, 10:56 PM · Restricted Project
sameerds requested changes to D75917: Expose llvm fence instruction as clang intrinsic.
Apr 5 2020, 9:21 PM · Restricted Project

Apr 4 2020

sameerds accepted D74139: AMDGPU: Remove dead paths for requiresUniformRegister.
Apr 4 2020, 12:52 AM · Restricted Project

Apr 2 2020

sameerds accepted D77228: [AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU; add support for -opt-bisect-limit; update tests..
Apr 2 2020, 8:04 PM · Restricted Project, Restricted Project

Apr 1 2020

sameerds added inline comments to D77228: [AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU; add support for -opt-bisect-limit; update tests..
Apr 1 2020, 7:37 PM · Restricted Project, Restricted Project
sameerds added inline comments to D77228: [AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU; add support for -opt-bisect-limit; update tests..
Apr 1 2020, 7:37 PM · Restricted Project, Restricted Project
sameerds added reviewers for D77198: Introduce fix-irreducible pass: arsenm, nhaehnle, tpr.
Apr 1 2020, 1:36 AM · Restricted Project
sameerds created D77198: Introduce fix-irreducible pass.
Apr 1 2020, 1:36 AM · Restricted Project

Mar 30 2020

sameerds committed rG3cbbded68c26: Introduce unify-loop-exits pass. (authored by sameerds).
Introduce unify-loop-exits pass.
Mar 30 2020, 10:50 AM
sameerds closed D75865: Introduce unify-loop-exits pass..
Mar 30 2020, 10:50 AM · Restricted Project
sameerds updated the diff for D75865: Introduce unify-loop-exits pass..

clang-format

Mar 30 2020, 3:13 AM · Restricted Project

Mar 29 2020

sameerds added inline comments to D75865: Introduce unify-loop-exits pass..
Mar 29 2020, 7:48 PM · Restricted Project
sameerds updated the diff for D75865: Introduce unify-loop-exits pass..

review comments: reorder loops and merge the predicate logic

Mar 29 2020, 7:48 PM · Restricted Project

Mar 23 2020

sameerds updated the diff for D75865: Introduce unify-loop-exits pass..

require and preserve lower-switch pass

Mar 23 2020, 1:06 AM · Restricted Project

Mar 19 2020

sameerds updated the diff for D75865: Introduce unify-loop-exits pass..

Added a comment about requiring direct branches; removed an empty string from debug output

Mar 19 2020, 7:45 PM · Restricted Project
sameerds added inline comments to D75865: Introduce unify-loop-exits pass..
Mar 19 2020, 7:45 PM · Restricted Project

Mar 17 2020

sameerds added inline comments to D75865: Introduce unify-loop-exits pass..
Mar 17 2020, 10:43 AM · Restricted Project
sameerds added a comment to D75865: Introduce unify-loop-exits pass..

Bump!

Mar 17 2020, 9:37 AM · Restricted Project
sameerds accepted D76263: AMDGPU: Initial, crude support for indirect calls.
Mar 17 2020, 9:06 AM

Mar 16 2020

sameerds added a comment to D76263: AMDGPU: Initial, crude support for indirect calls.

Here's what I understood what this change is doing, based on just interpreting the names in the code for their English meaning: with this change, it is possible to call a function pointer. You can see a valid swappc in the caller, and the in/out arguments will be setup properly.

Mar 16 2020, 10:00 PM
sameerds added a comment to D75917: Expose llvm fence instruction as clang intrinsic.
Mar 16 2020, 10:00 PM · Restricted Project
sameerds added a comment to D75917: Expose llvm fence instruction as clang intrinsic.

Well, there is a problem: The LangRef says that scopes are target-defined. This change says that scopes are defined by the high-level language and further assumes that OpenCL scopes make sense in all languages. Besides conflicting with the LangRef, this not seem to work with C++, which has no scopes and nor with CUDA or HIP, whose scopes are not represented in any AtomicScopeModel.

I don't follow. IR has a fence instruction. This builtin maps directly to it, passing whatever integer arguments were given to the intrinsic along unchanged. It's exactly as valid, or invalid, as said fence instruction.

Mar 16 2020, 5:31 PM · Restricted Project

Mar 14 2020

sameerds updated the diff for D75865: Introduce unify-loop-exits pass..

Fixed the ascii art example to be correct in terms of SSA. The extra
edge from In1 to Out2 would be nice to have, but it breaks the
pre-condition that Def dominates Use.

Mar 14 2020, 9:30 PM · Restricted Project
sameerds added inline comments to D75865: Introduce unify-loop-exits pass..
Mar 14 2020, 3:10 AM · Restricted Project
sameerds updated the diff for D75865: Introduce unify-loop-exits pass..

Addressed review comments:

  1. New ascii art.
  2. Split the large function CreateControlFlowHub into smaller functions.
Mar 14 2020, 3:10 AM · Restricted Project

Mar 11 2020

sameerds added a comment to D75917: Expose llvm fence instruction as clang intrinsic.

how this builtin fits in with the overall scheme of language-specific and target-specific details of an atomic operation. For example, is this meant only for OpenCL? Does it work with CUDA? Or HIP? What is the behaviour for scope in C++?

Identical to the fence instruction. Which is assumed well thought through already, given it's an IR instruction.

As far as I can tell, fence composes sensibly with other IR then generates the right thing at the back end. So it looks fit for purpose, just not currently available from clang.

Mar 11 2020, 8:31 AM · Restricted Project

Mar 10 2020

sameerds added a comment to D75917: Expose llvm fence instruction as clang intrinsic.

The commit summary needs improvement. The syntax is not really necessary there, but instead this needs a better explanation of how this builtin fits in with the overall scheme of language-specific and target-specific details of an atomic operation. For example, is this meant only for OpenCL? Does it work with CUDA? Or HIP? What is the behaviour for scope in C++?

Mar 10 2020, 7:42 PM · Restricted Project

Mar 9 2020

sameerds added reviewers for D75865: Introduce unify-loop-exits pass.: arsenm, nhaehnle, rampitec, tpr.
Mar 9 2020, 11:53 AM · Restricted Project
sameerds created D75865: Introduce unify-loop-exits pass..
Mar 9 2020, 11:52 AM · Restricted Project

Mar 4 2020

sameerds committed rG42febbab9138: StructurizeCFG: simplify phi nodes when possible (authored by sameerds).
StructurizeCFG: simplify phi nodes when possible
Mar 4 2020, 9:45 PM
sameerds closed D75500: StructurizeCFG: simplify phi nodes when possible.
Mar 4 2020, 9:45 PM · Restricted Project
sameerds updated the diff for D75500: StructurizeCFG: simplify phi nodes when possible.

addressed review comments

Mar 4 2020, 8:07 PM · Restricted Project

Mar 3 2020

sameerds updated the diff for D75500: StructurizeCFG: simplify phi nodes when possible.

address review comments

Mar 3 2020, 7:37 PM · Restricted Project
sameerds added inline comments to D75500: StructurizeCFG: simplify phi nodes when possible.
Mar 3 2020, 7:02 PM · Restricted Project