Page MenuHomePhabricator

sameerds (Sameer Sahasrabuddhe)
User

Projects

User does not belong to any projects.

User Details

User Since
Oct 17 2012, 11:25 AM (452 w, 2 d)

Working on ROCm at AMD

Recent Activity

Thu, Jun 17

sameerds updated subscribers of D104504: RFC: Update token semantics with default convergent attribute.
Thu, Jun 17, 11:18 PM · Restricted Project
sameerds updated subscribers of D104504: RFC: Update token semantics with default convergent attribute.
Thu, Jun 17, 8:03 PM · Restricted Project
sameerds added reviewers for D104504: RFC: Update token semantics with default convergent attribute: tra, jdoerfert, nhaehnle, Anastasia, simoll, jlebar, mehdi_amini.
Thu, Jun 17, 7:54 PM · Restricted Project
sameerds requested review of D104504: RFC: Update token semantics with default convergent attribute.
Thu, Jun 17, 7:46 PM · Restricted Project

Apr 30 2021

sameerds added a comment to D69498: IR: Invert convergent attribute handling.

I wonder if it is necessary for the exec mask to be implicit state, managed by a convergent/divergent abstraction.

We could reify it as an argument passed to kernels (where all bits are set), and adjusted by phi nodes at entry to basic blocks. Various intrinsics take and return that reified value. __ballot, a comparison op, possibly load/store.

At that point all the awkward control flow constraints are raised to data flow, and I think (but haven't really chased this into the dark corners) that solves the problems I know about. Notably, a uniform branch would be one where the exec mask value was unchanged, so the associated phi is constant folded.

Makes a mess of the human readable IR, but possibly at a lower overall complexity than continuing to handle divergence as control flow.

Apr 30 2021, 4:43 AM · Restricted Project

Apr 23 2021

sameerds added a comment to D69498: IR: Invert convergent attribute handling.

I don't have much to add to the conversation except to point out that this definition defines noconvergent in terms of divergent control flow, but the langref itself doesn't define what divergent control flow is, which makes it an incomplete spec. (Perhaps I'm just restating @arsenm's objections.) This seems unsatisfactory to me but I have no idea what to do about it. I agree with @sameerds that the current definition of convergent is too restrictive because in practice we really do want to be able to move convergent calls past uniform control flow.

That is one of the things that D85603 addresses. I suspect Sameer was assuming the language from there in the background.

I agree with Matt that it would be best to avoid too much TTI dependence. The existing situation with the intrinsics is already a bit of a strange one. I wonder if it is possible to move to a world where isSourceOfDivergence need not be target-dependent. (This requires e.g. new attributes on function arguments as well as new information about address spaces in the data layout, plus some new attributes to define intrinsic data flow. Likely beyond the scope of this patch.)

Apr 23 2021, 7:27 PM · Restricted Project

Apr 22 2021

sameerds added a comment to D69498: IR: Invert convergent attribute handling.
  1. It is actually okay add control dependences to a convergent function as long as the new branches are uniform.
  2. Some times we hack a transform to avoid doing things that cannot be described as "add new control dependence", and still talk about the function having the above named attribute.

There is another important bug that obfuscates the whole discussion: most places that use "isConvergent()" should in fact first check if it really matters to the surrounding control flow. There is too much loaded onto that one attribute, without sufficient context. The definition of "noconvergent" that I am proposing starts out by first fixing the definition of convergent itself. This definition is independent of target, and only talks about the properties of the control flow that reach the callsite. To repeat, this does not reinterpret the IR in a target-defined way. Like I said, in the new definition, all function calls are convergent even on CPUs, so I don't see where the target comes in. If you still insist on talking about interpretation of core attributes, please start by deconstructing the definition that I propose so I can see what I am missing.

Yes, the current concept of convergent is broken. I think this whole recent burst of conversation has been too focused on the current convergent situation and this patch in particular. I think we need to conceptually rebase this into the world that D85603 creates.

Apr 22 2021, 7:16 PM · Restricted Project
sameerds added a comment to D69498: IR: Invert convergent attribute handling.

You're still saying the same thing. This needs to be defined generically. Frontends don't *have* to to anything, they'll just get the assumed convergent behavior by default. Either frontends could always add noconvergent to avoid any possible optimization hit, or we could have a pass add noconvergent depending on the target. I don't want to change the interpretation of core attributes based on the target

Apr 22 2021, 10:04 AM · Restricted Project

Apr 21 2021

sameerds added a comment to D69498: IR: Invert convergent attribute handling.

I realize now that what @foad says above puts the idea in a clearer context. Essentially, any check for isConvergent() isn't happening in a vacuum. It is relevant only in the presence of divergent control flow, which in turn is relevant only when the target has divergence. Any standalone check for isConvergent() is merely making a pessimistic assumption that all the control flow incident on it is divergent. This has two consequences:

  1. The meaning of the convergent attribute has always been target-dependent.
  2. Dependence on TTI is not a real cost at all. We may eventually update every use of isConvergent() to depend on a check for divergence. The check for TTI is only the first step towards that.

The core IR semantics must *not* depend on target interpretation. convergent has never been target dependent, and was defined in an abstract way. Only certain targets will really care, but we cannot directly define it to mean what we want it to mean for particular targets

Apr 21 2021, 7:43 PM · Restricted Project
sameerds added a comment to D69498: IR: Invert convergent attribute handling.

I realize now that what @foad says above puts the idea in a clearer context. Essentially, any check for isConvergent() isn't happening in a vacuum. It is relevant only in the presence of divergent control flow, which in turn is relevant only when the target has divergence. Any standalone check for isConvergent() is merely making a pessimistic assumption that all the control flow incident on it is divergent. This has two consequences:

Apr 21 2021, 8:55 AM · Restricted Project
sameerds added a comment to D69498: IR: Invert convergent attribute handling.

This recasts the whole question to be one about combining optimizations with target-specific information. The only changes required are in transforms that check CallInst::isConvergent(). These should now also check TTI, possibly adding a dependency on the TTI analysis where it didn't exist earlier.

@sameerds I agree with your conclusions but I would describe the situation a little differently. As I understand it, the optimizations that check isConvergent really only care about moving convergent calls past control flow that might be divergent. !hasBranchDivergence is a promise that there are no possible sources of divergence for a target, so you can run a divergence analysis if you like but it will just tell you that everything is uniform, so all control flow is uniform, so it's OK to move isConvergent calls around.

In practice the optimizations that check isConvergent don't seem to use divergence analysis, they just pessimistically assume that any control flow might be divergent (if hasBranchDivergence). But they could and perhaps should use divergence analysis, and then it would all just fall out in the wash with no need for an explicit hasBranchDivergence test.

Apr 21 2021, 4:28 AM · Restricted Project

Apr 20 2021

sameerds added a comment to D69498: IR: Invert convergent attribute handling.

The way I see it, the notion of convergence is relevant only to a certain class of targets (usually represented by GPUs) and it only affects certain optimizations. Then why not have only these optimizations check TTI to see if convergence matters? TTI.hasBranchDivergence() seems like a sufficient proxy for this information.

Apr 20 2021, 3:59 PM · Restricted Project

Apr 15 2021

sameerds added a comment to D85603: IR: Add convergence control operand bundle and intrinsics.

This needs to be inverted in the spirit of D69498. I would propose the following tweak:

  1. By default, every call has an implicit convergencectrl bundle with a token returned by the @llvm.experimental.convergence.entry intrinsic from the entry block of the caller. This default is the most conservative setting within the semantics defined here.
  2. A more informed frontend or a suitable transformation can replace this conservative token with one of the following:
    1. A token returned by any of the other intrinsics, which provides more specific information about convergence at this callsite.
    2. A predefined constant token (say none), which indicates complete freedom. This would be equivalent to the noconvergent attribute proposed in D69498.

Such a rewording would invert how we approach the spec. Instead of a representation that explicitly talks about special intrinsics that "need" convergence, the new semantics applies to all function calls. The redefined default is conservative instead of free, and the presence of the bundles relaxes the default instead of adding constraints.

Sounds good. If that would be acceptable to the wider community it might help to simplify the frontend design and improve the user interface and the coherence of the interfaces within the compiler stack too.

Apr 15 2021, 10:51 AM · Restricted Project

Apr 11 2021

sameerds added a comment to D85603: IR: Add convergence control operand bundle and intrinsics.

Sorry for not being clear - I was talking about two separate threads here (1) generalizing convergent attribute to non-uniform CF that is addressed by this patch and (2) inverting convergent attribute that is addressed in https://reviews.llvm.org/D69498. Just to provide more details regarding (2) - right now in clang we have a logic that adds convergent to every single function because when we parse the function we don't know whether it will call any function in a call tree that would use convergent operations. Therefore we need to be conservative to prevent incorrect optimizations but this is not ideal for multiple reasons. The optimiser can undo all or some of those convergent decorations if it can prove they are not needed. And for the uniform CF convergent operations this was the only "broken" functionality to my memory.

Apr 11 2021, 9:38 PM · Restricted Project

Apr 9 2021

sameerds added a comment to D85603: IR: Add convergence control operand bundle and intrinsics.

Regarding the HLL and frontend side, I believe this could be represented fairly similarly in different C/C++-based languages - considering that we already follow the same implementation for existing convergent semantics at least between CUDA and OpenCL. However, it isn't yet in its optimal state and perhaps we can attempt to refine this topic holistically for example also addressing the following rework that removes the need to make everything convergent: https://reviews.llvm.org/D69498. Otherwise, we will likely have to generate the convergent intrinsics absolutely everywhere, which is not ideal!

Apr 9 2021, 12:53 AM · Restricted Project

Apr 5 2021

sameerds added a comment to D85603: IR: Add convergence control operand bundle and intrinsics.

Few questions below. Please bear with me as I try to grok the proposal and the long stream of comments...

I believe that with this proposal, we can model this with the attributes we have by saying that subgroupShuffle is convergent readnone, while __shfl_sync is inaccessiblememonly.

subgroupShuffle would require convergentctrl and __shfl_sync would not, correct?

Apr 5 2021, 8:18 AM · Restricted Project

Mar 25 2021

sameerds updated subscribers of D98958: [NewPM] Disable non-trivial loop-unswitch on targets with divergence.
Mar 25 2021, 9:59 PM · Restricted Project
sameerds committed rGb92c8c22b924: [NewPM] Disable non-trivial loop-unswitch on targets with divergence (authored by sameerds).
[NewPM] Disable non-trivial loop-unswitch on targets with divergence
Mar 25 2021, 4:29 AM
sameerds closed D98958: [NewPM] Disable non-trivial loop-unswitch on targets with divergence.
Mar 25 2021, 4:28 AM · Restricted Project

Mar 23 2021

sameerds added inline comments to D98958: [NewPM] Disable non-trivial loop-unswitch on targets with divergence.
Mar 23 2021, 10:08 PM · Restricted Project
sameerds added inline comments to D99128: [AMDGPU] Removed unnecessary cache invalidations..
Mar 23 2021, 8:18 AM · Restricted Project

Mar 22 2021

sameerds added inline comments to D98958: [NewPM] Disable non-trivial loop-unswitch on targets with divergence.
Mar 22 2021, 5:38 PM · Restricted Project

Mar 21 2021

sameerds added a comment to D98958: [NewPM] Disable non-trivial loop-unswitch on targets with divergence.

Improved the readability of a potentially triple-negative boolean condition

Mar 21 2021, 10:56 PM · Restricted Project
sameerds updated the diff for D98958: [NewPM] Disable non-trivial loop-unswitch on targets with divergence.

Improved the readability of a potentially triple-negative boolean condition

Mar 21 2021, 10:50 PM · Restricted Project

Mar 19 2021

sameerds added reviewers for D98958: [NewPM] Disable non-trivial loop-unswitch on targets with divergence: foad, arsenm, rampitec, aeubanks, tra, asbirlea.
Mar 19 2021, 8:57 AM · Restricted Project
sameerds abandoned D96754: [NewPM] Use stale divergence analysis with SimpleLoopUnswitch.

This needs a deeper design. For now, the quick way forward is to disable loop unswitching on divergent non-trivial branches:

Mar 19 2021, 8:56 AM · Restricted Project
sameerds requested review of D98958: [NewPM] Disable non-trivial loop-unswitch on targets with divergence.
Mar 19 2021, 8:52 AM · Restricted Project

Feb 16 2021

sameerds added a comment to D96754: [NewPM] Use stale divergence analysis with SimpleLoopUnswitch.

That is not true. The flow in the legacy pass manager is more involved. The function LoopPass::preparePassManager() actually makes sure that if a loop transform T running inside a loop pass manager M invalidates analyses used by other passes in M, then T is split out into a separate loop pass manager M'. Thus every instance of loop pass manager is responsible for making sure that the used analyses are recomputed before starting any passes. The divergence analysis is not computed in the middle of loop unswitching like you seem to be suggesting here.

But now that I looked at it more closely, this does invalidate my original assumption that the legacy PM is providing a stale analysis. In that sense, my patch should not be seen as reproducing existing behaviour ... it's turning out to be more of a hack because the new PM lacks some functionality. There doesn't seem to be a way for a transform in the new PM to isolate itself from the effects of other transforms that invalidate analyses in the outer analysis manager.

Feb 16 2021, 10:36 PM · Restricted Project
sameerds added a comment to D96754: [NewPM] Use stale divergence analysis with SimpleLoopUnswitch.

This is absolutely not the right way resolve this.
First, the restriction to not allow getting a stale analysis is very much intentional and part of the design of the new pass manager. The API being added here must not exist.
Second, it is not safe to use the stale DA.

Feb 16 2021, 9:28 PM · Restricted Project
sameerds added reviewers for D96754: [NewPM] Use stale divergence analysis with SimpleLoopUnswitch: foad, arsenm, rampitec, aeubanks, tra, asbirlea.
Feb 16 2021, 12:11 AM · Restricted Project

Feb 15 2021

sameerds requested review of D96754: [NewPM] Use stale divergence analysis with SimpleLoopUnswitch.
Feb 15 2021, 11:57 PM · Restricted Project
sameerds committed rG11bf7da64a11: [NewPM] Introduce (GPU)DivergenceAnalysis in the new pass manager (authored by sameerds).
[NewPM] Introduce (GPU)DivergenceAnalysis in the new pass manager
Feb 15 2021, 8:58 PM
sameerds closed D96615: [NewPM] Introduce (GPU)DivergenceAnalysis in the new pass manager.
Feb 15 2021, 8:57 PM · Restricted Project

Feb 12 2021

sameerds added a comment to D96615: [NewPM] Introduce (GPU)DivergenceAnalysis in the new pass manager.

lgtm, thanks for doing this!

Feb 12 2021, 6:19 PM · Restricted Project
sameerds added reviewers for D96615: [NewPM] Introduce (GPU)DivergenceAnalysis in the new pass manager: simoll, aeubanks, arsenm, tra.
Feb 12 2021, 9:41 AM · Restricted Project
sameerds requested review of D96615: [NewPM] Introduce (GPU)DivergenceAnalysis in the new pass manager.
Feb 12 2021, 9:36 AM · Restricted Project

Jan 20 2021

sameerds committed rGc540ce9900ff: [AMDGPU] pin lit test divergent-unswitch.ll to the old pass manager (authored by sameerds).
[AMDGPU] pin lit test divergent-unswitch.ll to the old pass manager
Jan 20 2021, 8:33 AM
sameerds closed D95051: [AMDGPU] pin lit test divergent-unswitch.ll to the old pass manager.
Jan 20 2021, 8:33 AM · Restricted Project
sameerds added reviewers for D95051: [AMDGPU] pin lit test divergent-unswitch.ll to the old pass manager: foad, arsenm, rampitec.
Jan 20 2021, 8:10 AM · Restricted Project
sameerds requested review of D95051: [AMDGPU] pin lit test divergent-unswitch.ll to the old pass manager.
Jan 20 2021, 8:09 AM · Restricted Project

Nov 7 2020

sameerds added a comment to D85603: IR: Add convergence control operand bundle and intrinsics.

Roughly speaking, there are subgroup ops with "implicit" thread set (what Vinod calls textually aligned, and what this proposal mostly focuses on, because they require the most additional explanation) and subgroup ops with "explicit" thread set (sm70+).

Nov 7 2020, 3:42 AM · Restricted Project

Nov 6 2020

sameerds added a comment to D85603: IR: Add convergence control operand bundle and intrinsics.
  • Will this paint us into a corner wrt CUDA, and specifically sm70+?

/me summons @wash, who is probably a better person to speak to this than me.

My understanding is that the semantics of <sm70 convergent are pretty similar to what is described in these examples. But starting in sm70+, each sync operation takes an arg specifying which threads in the warp participate in the instruction.

I admit I do not fully understand what the purpose of this is. At one point in time I thought it was to let humans write (or compilers generate) code like this, where the identity of the convergent instruction does not matter.

// Warning, does not seem to work on sm75
if (cond)
  __syncwarp(FULL_MASK);
else
  __syncwarp(FULL_MASK);

but my testcase, https://gist.github.com/50d1b5fedc926c879a64436229c1cc05, dies with an illegal-instruction error (715) when I make cond have different values within the warp. So, guess not?

Anyway, clearly I don't fully understand the sm70+ convergence semantics. I'd ideally like someone from nvidia (hi, @wash) to speak to whether we can represent their convergent instruction semantics using this proposal. Then we should also double-check that clang can in fact generate the relevant LLVM IR.

Nov 6 2020, 1:17 AM · Restricted Project

Nov 3 2020

sameerds added a comment to D85603: IR: Add convergence control operand bundle and intrinsics.

I believe what is described here about convergent, as best I can understand it, is the semantics of syncthreads in CUDA. This semantics is the same for <sm70 and sm70+. Not clear whether what is described here is a "textually aligned" semantics or unaligned. syncthreads is aligned, meaning that all threads in the threadblock must wait on the same lexical syncthreads().

Nov 3 2020, 1:03 AM · Restricted Project

Oct 8 2020

sameerds accepted D89051: [FixIrreducible][NewPM] Port -fix-irreducible to NPM.

LGTM. Thanks!

Oct 8 2020, 11:11 PM · Restricted Project

Sep 16 2020

sameerds added inline comments to D85603: IR: Add convergence control operand bundle and intrinsics.
Sep 16 2020, 4:51 AM · Restricted Project
sameerds added inline comments to D85603: IR: Add convergence control operand bundle and intrinsics.
Sep 16 2020, 4:43 AM · Restricted Project

Sep 13 2020

sameerds accepted D87548: [UnifyLoopExits] Fix non-deterministic iteration order.

LGTM!

Sep 13 2020, 5:26 PM · Restricted Project

Sep 7 2020

sameerds accepted D84413: [DA][SDA] SyncDependenceAnalysis re-write.

LGTM, to the extent of my high-level understanding of the analysis. I would recommend waiting a few days and give others a chance to finish the review. I do have some nitpicks, but feel free to decide which ones warrant a change.

Sep 7 2020, 3:03 AM · Restricted Project

Aug 13 2020

sameerds added inline comments to D85603: IR: Add convergence control operand bundle and intrinsics.
Aug 13 2020, 9:40 AM · Restricted Project
sameerds added inline comments to D85603: IR: Add convergence control operand bundle and intrinsics.
Aug 13 2020, 2:57 AM · Restricted Project

Aug 12 2020

sameerds added inline comments to D85603: IR: Add convergence control operand bundle and intrinsics.
Aug 12 2020, 9:59 PM · Restricted Project
sameerds added inline comments to D85603: IR: Add convergence control operand bundle and intrinsics.
Aug 12 2020, 12:03 AM · Restricted Project

Aug 11 2020

sameerds added inline comments to D85603: IR: Add convergence control operand bundle and intrinsics.
Aug 11 2020, 9:49 PM · Restricted Project
sameerds added inline comments to D85603: IR: Add convergence control operand bundle and intrinsics.
Aug 11 2020, 1:42 AM · Restricted Project
sameerds added inline comments to D85603: IR: Add convergence control operand bundle and intrinsics.
Aug 11 2020, 1:25 AM · Restricted Project

Aug 6 2020

sameerds committed rGc530539badd5: [AArch64][NFC] require aarch64 support for hwasan test (authored by sameerds).
[AArch64][NFC] require aarch64 support for hwasan test
Aug 6 2020, 8:56 PM
sameerds closed D85412: [AArch64][NFC] require aarch64 support for hwasan test.
Aug 6 2020, 8:56 PM · Restricted Project
sameerds added reviewers for D85412: [AArch64][NFC] require aarch64 support for hwasan test: aemerson, arsenm.
Aug 6 2020, 1:52 AM · Restricted Project
sameerds requested review of D85412: [AArch64][NFC] require aarch64 support for hwasan test.
Aug 6 2020, 1:50 AM · Restricted Project

Jul 27 2020

sameerds added inline comments to D84413: [DA][SDA] SyncDependenceAnalysis re-write.
Jul 27 2020, 10:23 PM · Restricted Project

Jul 12 2020

sameerds added a comment to D82087: AMDGPU/clang: Add builtins for llvm.amdgcn.ballot.

https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#warp-vote-and-ballot-functions

I think if the language interface insists on fixing the wave size, then I think the correct solution is to implement this in the header based on a wave size macro (which we're desperately missing). The library implementation should be responsible for inserting the extension to 64-bit for wave32

Jul 12 2020, 12:19 AM

Jul 11 2020

sameerds accepted D83584: [AMDGPU] Move LowerSwitch pass to CodeGenPrepare..

LGTM!

Jul 11 2020, 3:45 AM · Restricted Project

Jul 10 2020

sameerds added a comment to D83584: [AMDGPU] Move LowerSwitch pass to CodeGenPrepare..

From the description: "This patch inserts the Lowerswitch pass in an appropriate place to ensure any dead blocks resulting from the transformation will be removed"

Jul 10 2020, 9:44 PM · Restricted Project
sameerds added inline comments to D83584: [AMDGPU] Move LowerSwitch pass to CodeGenPrepare..
Jul 10 2020, 9:42 PM · Restricted Project
sameerds accepted D83562: [fix-irreducible] Skip unreachable predecessors..

LGTM!

Jul 10 2020, 9:33 PM · Restricted Project
sameerds updated subscribers of D83562: [fix-irreducible] Skip unreachable predecessors..

@cdevadas reported that other parts of the AMDGPU backend are also affected by the unreachable blocks being produced in the switch lowering. Instead of fixing each such pass separately, it seems the best way forward it to put the block elimination earlier in the flow. @cdevadas is already working on such a change.

Jul 10 2020, 9:26 AM · Restricted Project

Jul 9 2020

sameerds added a comment to D83394: [AMDGPU] Avoid splitting FLAT offsets in unsafe ways.

Rebase.
Fix silly mistake in checking for negative offsets.

Jul 9 2020, 7:47 PM · Restricted Project

Jul 8 2020

sameerds added a comment to D82087: AMDGPU/clang: Add builtins for llvm.amdgcn.ballot.

The documentation for HIP __ballot seems to indicate that the user does not have to explicitly specify the warp size. How is that achieved with these new builtins? Can this be captured in a lit test here?

Jul 8 2020, 8:51 PM

Jun 18 2020

sameerds committed rG7aad220795b5: [DA] conservatively mark the join of every divergent branch (authored by sameerds).
[DA] conservatively mark the join of every divergent branch
Jun 18 2020, 5:25 AM
sameerds closed D81806: [DA] conservatively mark the join of every divergent branch.
Jun 18 2020, 5:24 AM · Restricted Project

Jun 17 2020

sameerds retitled D81806: [DA] conservatively mark the join of every divergent branch from [DivergenceAnalysis] mark join of divergent loop exits to [DA] conservatively mark the join of every divergent branch.
Jun 17 2020, 9:02 PM · Restricted Project
sameerds updated the diff for D81806: [DA] conservatively mark the join of every divergent branch.

filed a bug; improved the description; added a failing testcase

Jun 17 2020, 8:31 PM · Restricted Project

Jun 16 2020

sameerds added a comment to D81806: [DA] conservatively mark the join of every divergent branch.

@simoll, I've been staring at the code for some time now, and I am no longer sure that I have a proper grasp on the problem. It seems my fix works, but not for the reasons that I thought it should. Running the unmodified analysis on "join-at-loop-exit.ll" shows the following debug output:

Jun 16 2020, 9:51 PM · Restricted Project
sameerds committed rGd3963b3a5f4f: [DA] propagate loop live-out values that get used in a branch (authored by sameerds).
[DA] propagate loop live-out values that get used in a branch
Jun 16 2020, 9:19 PM
sameerds closed D81822: [DA] propagate loop live-out values that get used in a branch.
Jun 16 2020, 9:19 PM · Restricted Project
sameerds added inline comments to D81806: [DA] conservatively mark the join of every divergent branch.
Jun 16 2020, 7:09 AM · Restricted Project
sameerds added a comment to D81806: [DA] conservatively mark the join of every divergent branch.
bb1: 
   %m.uni = <uni>
   %n.uni = <uni>
   br <uni>, %bb2, %bb3

bb2: [..]
   br <div>, %bb2, %bb3

bb3:
   %x.uni = phi [%m.uni, %bb2], [%n.uni %bb1]  <-- divergent loop exit that is not join divergent

I am new to this party. Why is %x marked as uni? Its value will differ amongst threads as branch in bb2 is divergent. Is my understanding incorrect?

Jun 16 2020, 7:09 AM · Restricted Project
sameerds updated the diff for D81822: [DA] propagate loop live-out values that get used in a branch.

eliminate DeferredTerminators; also a small refactor for readability

Jun 16 2020, 4:58 AM · Restricted Project
sameerds added a comment to D81806: [DA] conservatively mark the join of every divergent branch.

I am a bit surprised this is necessary. Actually, join divergence in bb3 should come about as a result of propagateLoopDivergence for the [bb1, bb2] loop and not immediately because it is a divergent exit.

Jun 16 2020, 1:36 AM · Restricted Project

Jun 15 2020

sameerds added inline comments to D81822: [DA] propagate loop live-out values that get used in a branch.
Jun 15 2020, 5:56 AM · Restricted Project

Jun 14 2020

sameerds created D81822: [DA] propagate loop live-out values that get used in a branch.
Jun 14 2020, 10:23 PM · Restricted Project
sameerds added reviewers for D81822: [DA] propagate loop live-out values that get used in a branch: simoll, nhaehnle, kerbowa, foad.
Jun 14 2020, 10:23 PM · Restricted Project
sameerds added reviewers for D81806: [DA] conservatively mark the join of every divergent branch: simoll, nhaehnle, kerbowa, foad.
Jun 14 2020, 9:04 AM · Restricted Project
sameerds created D81806: [DA] conservatively mark the join of every divergent branch.
Jun 14 2020, 8:32 AM · Restricted Project

Jun 9 2020

sameerds committed rGd8f651d3e8e2: [AMDGPU] Enable structurizer workarounds by default (authored by sameerds).
[AMDGPU] Enable structurizer workarounds by default
Jun 9 2020, 1:04 AM
sameerds closed D81211: [AMDGPU] Enable structurizer workarounds by default.
Jun 9 2020, 1:04 AM · Restricted Project

Jun 4 2020

sameerds added inline comments to D81211: [AMDGPU] Enable structurizer workarounds by default.
Jun 4 2020, 10:01 PM · Restricted Project
sameerds abandoned D78900: [HIP][AMDGPU] Enable structurizer workarounds.

Abandoned in favour of enabling the workarounds in the AMDGPU backend: https://reviews.llvm.org/D81211

Jun 4 2020, 7:51 PM · Restricted Project, Restricted Project
sameerds updated the summary of D81211: [AMDGPU] Enable structurizer workarounds by default.
Jun 4 2020, 7:51 PM · Restricted Project
sameerds created D81211: [AMDGPU] Enable structurizer workarounds by default.
Jun 4 2020, 7:19 PM · Restricted Project

Jun 3 2020

sameerds committed rGb3cff3c72092: Utility to dump .dot representation of SelectionDAG without firing viewer (authored by madhur13490).
Utility to dump .dot representation of SelectionDAG without firing viewer
Jun 3 2020, 11:25 PM
sameerds closed D80711: Utility to dump .dot representation of SelectionDAG without firing viewer.
Jun 3 2020, 11:25 PM · Restricted Project
sameerds added inline comments to D80996: [AMDGPU][OpenMP] Fix duplicate copies of arguments in commands.
Jun 3 2020, 7:49 PM · Restricted Project

May 31 2020

sameerds accepted D79037: [StructurizeCFG] Fix region nodes ordering.

This looks really good. Thanks a lot for bringing it this far! Please do check the #include for LoopInfo.h before submitting.

May 31 2020, 9:50 PM · Restricted Project

May 29 2020

sameerds added inline comments to D79037: [StructurizeCFG] Fix region nodes ordering.
May 29 2020, 10:50 PM · Restricted Project
sameerds accepted D80804: [AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics.

LGTM, but please wait for approval from @arsenm

May 29 2020, 10:50 PM · Restricted Project
sameerds added a comment to D80804: [AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics.

Actually, the question really is about why inc/dec are needed as separate operations either as IR intrinsics or Clang builtins. Why not just expose a __builtin_amdgcn_atomicrmw that takes a scope, and map it to the LLVM atomicrmw? That would be way cleaner. The language can provide convenience functions for inc/dec that internally call the rmw builtin.

At the moment, atomic inc/dec exist along with atomicrmw. This patch only aims to devise a way to make it (inc/dec) accessible from the language level.

May 29 2020, 10:50 PM · Restricted Project
sameerds added a comment to D80804: [AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics.

Actually, the question really is about why inc/dec are needed as separate operations either as IR intrinsics or Clang builtins. Why not just expose a __builtin_amdgcn_atomicrmw that takes a scope, and map it to the LLVM atomicrmw? That would be way cleaner. The language can provide convenience functions for inc/dec that internally call the rmw builtin.

May 29 2020, 10:21 AM · Restricted Project
sameerds added inline comments to D80804: [AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics.
May 29 2020, 9:45 AM · Restricted Project
sameerds added a comment to D80804: [AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics.

The commit description needs fixing. These are not "llvm instructions" they are "AMDGCN intrinsics". They don't exist in the LangRef. Also, I would recommend inverting the first line of the description: "Introduce Clang builtins that are mapped to AMDGCN intrinsics".

May 29 2020, 9:12 AM · Restricted Project