nhaehnle (Nicolai Hähnle)
User

Projects

User does not belong to any projects.

User Details

User Since
Oct 9 2015, 4:06 AM (276 w, 3 d)

Recent Activity

Tue, Dec 29

nhaehnle committed rGb76014a4f15a: RegionInfo: use a range-based for loop [NFCI] (authored by nhaehnle).
RegionInfo: use a range-based for loop [NFCI]
Tue, Dec 29, 7:01 AM
nhaehnle closed D92932: RegionInfo: use a range-based for loop [NFCI].
Tue, Dec 29, 7:01 AM · Restricted Project

Sun, Dec 27

nhaehnle added a comment to D93376: [LangRef] Clarify the semantics of lifetime intrinsics.

It's fair to say that this patch is a change of semantics by forbidding the use of the lifetime intrinsics in certain places where they were previously allowed.

They were allowed by the LangRef but not supported in the code

Sun, Dec 27, 1:45 AM · Restricted Project

Dec 23 2020

nhaehnle added inline comments to D93708: [AMDGPU] Add a new Clamp Pattern to the GlobalISel Path..
Dec 23 2020, 11:27 PM · Restricted Project, Restricted Project

Dec 21 2020

nhaehnle added a comment to D93376: [LangRef] Clarify the semantics of lifetime intrinsics.

Also, I did not suggest to change the semantics without an RFC and everything else that is needed. You suggest otherwise in your response. I did propose alternatives to fix your bug, but most of my responses show how the restrictions you want to put in place are making (potentially existing) optimizations invalid.

The thing is, in our view (aqjune, nlopes, and me) you *are* changing the semantics by allowing the lifetime intrinsics to be used on things like "malloc". From all we can tell, the intrinsics were never meant to be used with anything but alloca, so for all intents and purposes, currently, they only support alloca.

Dec 21 2020, 9:52 PM · Restricted Project
nhaehnle added inline comments to D93125: Update AMDGPU PAL usage documentation.
Dec 21 2020, 12:24 PM · Restricted Project

Dec 14 2020

nhaehnle added reviewers for D93125: Update AMDGPU PAL usage documentation: tpr, foad.
Dec 14 2020, 9:59 AM · Restricted Project
nhaehnle added a comment to D93125: Update AMDGPU PAL usage documentation.

Thanks for expanding on this documentation.

Dec 14 2020, 9:59 AM · Restricted Project

Dec 9 2020

nhaehnle added inline comments to D92661: [RFC] Fix TLS and Coroutine.
Dec 9 2020, 7:50 AM · Restricted Project, Restricted Project
nhaehnle abandoned D83088: Introduce CfgTraits abstraction.

Superseded by D92924, D92925, D92926

Dec 9 2020, 7:26 AM · Restricted Project, Restricted Project, Restricted Project
nhaehnle updated the diff for D92924: Add opaque Handle infrastructure.

Remove HandleWrapperFor which got de facto superseded by SsaContext
(subsequent patch). Update comment slightly to match.

Dec 9 2020, 7:24 AM · Restricted Project
nhaehnle added a reviewer for D92932: RegionInfo: use a range-based for loop [NFCI]: dblaikie.

Seems trivial enough, and there doesn't really seem to be an owner of this code...

Dec 9 2020, 4:31 AM · Restricted Project
nhaehnle requested review of D92932: RegionInfo: use a range-based for loop [NFCI].
Dec 9 2020, 4:30 AM · Restricted Project
nhaehnle updated the diff for D83094: Analysis: Add a GenericCycleInfo analysis.

v9:

  • rebase on factored-out Handle infrastructure
Dec 9 2020, 2:47 AM · Restricted Project
nhaehnle updated the diff for D83089: DomTree: Extract (mostly) read-only logic into type-erased base classes.

v9:

  • rebase on factored-out Handle infrastructure
Dec 9 2020, 2:46 AM · Restricted Project
nhaehnle reopened D83089: DomTree: Extract (mostly) read-only logic into type-erased base classes.
Dec 9 2020, 2:46 AM · Restricted Project
nhaehnle requested review of D92926: Add ISsaContext: Type-erased wrapper for SsaContext.
Dec 9 2020, 2:45 AM · Restricted Project
nhaehnle requested review of D92925: Add SsaContext: Template to access generic IR functionality.
Dec 9 2020, 2:44 AM · Restricted Project
nhaehnle requested review of D92924: Add opaque Handle infrastructure.
Dec 9 2020, 2:44 AM · Restricted Project

Nov 26 2020

nhaehnle added inline comments to D92086: Generalized PatternMatch & InstSimplify.
Nov 26 2020, 12:07 AM · Restricted Project

Nov 25 2020

nhaehnle added a comment to D91174: [Support] Introduce a new InstructionCost class.
(comparison being a total ordering)
/// This avoids having to add asserts the comparison operators that the states
/// are valid and users can test for validity of the cost explicitly.

Counterpoint: The asserts cost nothing in a release build, and for software maintainability, having the asserts once in the comparison operator is better than having them at every callsite. Users of this class will forget to put them there.

Nov 25 2020, 1:33 PM · Restricted Project
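
The counterpoint above (assert once inside the comparison operator rather than at every call site) can be illustrated with a minimal sketch. `SketchCost` and its members are hypothetical names made up for this illustration; this is not the InstructionCost class from D91174, only one way the idea could look.

```cpp
#include <cassert>

// Hypothetical stand-in for a cost type with a validity state.
// Not LLVM's InstructionCost; all names here are illustrative assumptions.
class SketchCost {
public:
  static SketchCost valid(int Value) { return SketchCost(Value, true); }
  static SketchCost invalid() { return SketchCost(0, false); }

  bool isValid() const { return Valid; }

  // The validity check lives here once, so call sites cannot forget it.
  // When NDEBUG is defined (a typical release build), assert expands to
  // nothing and the comparison is a plain integer compare.
  friend bool operator<(const SketchCost &LHS, const SketchCost &RHS) {
    assert(LHS.Valid && RHS.Valid && "comparing an invalid cost");
    return LHS.Value < RHS.Value;
  }

private:
  SketchCost(int Value, bool Valid) : Value(Value), Valid(Valid) {}
  int Value;
  bool Valid;
};
```

Callers that genuinely expect invalid costs can still test `isValid()` explicitly before comparing; the assert only catches the cases where they forgot.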
nhaehnle added a comment to D91086: AMDGPU: Document why we use (non-volatile) BUFFER_WBINVL1 in graphics.

After our internal discussion of this fizzled out, I am no longer actually sure whether this analysis is correct.

Nov 25 2020, 7:56 AM · Restricted Project

Nov 9 2020

nhaehnle updated the diff for D91086: AMDGPU: Document why we use (non-volatile) BUFFER_WBINVL1 in graphics.

It seems the MTYPE numbering was changed on gfx9, reflect that in the comment.

Nov 9 2020, 9:10 AM · Restricted Project
nhaehnle requested review of D91086: AMDGPU: Document why we use (non-volatile) BUFFER_WBINVL1 in graphics.
Nov 9 2020, 8:41 AM · Restricted Project

Nov 6 2020

nhaehnle added a comment to D85603: IR: Add convergence control operand bundle and intrinsics.
  • Will this paint us into a corner wrt CUDA, and specifically sm70+?

/me summons @wash, who is probably a better person to speak to this than me.

My understanding is that the semantics of <sm70 convergent are pretty similar to what is described in these examples. But starting in sm70+, each sync operation takes an arg specifying which threads in the warp participate in the instruction.

I admit I do not fully understand what the purpose of this is. At one point in time I thought it was to let humans write (or compilers generate) code like this, where the identity of the convergent instruction does not matter.

// Warning, does not seem to work on sm75
if (cond)
  __syncwarp(FULL_MASK);
else
  __syncwarp(FULL_MASK);

but my testcase, https://gist.github.com/50d1b5fedc926c879a64436229c1cc05, dies with an illegal-instruction error (715) when I make cond have different values within the warp. So, guess not?

Anyway, clearly I don't fully understand the sm70+ convergence semantics. I'd ideally like someone from nvidia (hi, @wash) to speak to whether we can represent their convergent instruction semantics using this proposal. Then we should also double-check that clang can in fact generate the relevant LLVM IR.

To extrapolate from Vinod's answer, I would say that we can represent sm70+ convergence semantics with this proposal. The situation seems to be covered by the examples in the section on hoisting and sinking. Consider the following example copied from the spec:

define void @example(...) convergent {
  %entry = call token @llvm.experimental.convergence.entry()
  %data = ...
  %id = ...
  if (condition) {
    %shuffled = call i32 @subgroupShuffle(i32 %data, i32 %id) [ "convergencectrl"(token %entry) ]
    ...
  }
}

Here, hoisting subgroupShuffle() is generally disallowed because it depends on the identity of active threads. A CUDA builtin with a mask argument similarly identifies specific threads that must be active at the set of textually unaligned calls that synchronize with each other. So any change in the control flow surrounding those calls is generally disallowed without more information. The new representation doesn't seem to restrict a more informed optimizer that can predict how the threads evolve.

Nov 6 2020, 8:44 AM · Restricted Project

Nov 5 2020

nhaehnle added a comment to D90708: [LangRef] Clarify GEP inbounds wrapping semantics.

This makes sense to me.

Nov 5 2020, 2:10 AM · Restricted Project

Nov 4 2020

nhaehnle accepted D90635: [TableGen] Add true and false literals to represent booleans.

LGTM

Nov 4 2020, 8:57 AM · Restricted Project
nhaehnle added inline comments to D90036: [AMDGPU] Emit stack frame size in metadata.
Nov 4 2020, 8:42 AM · Restricted Project
nhaehnle added a comment to D90039: [AsmWriter] Factor out mnemonic generation to accessible getMnemonic..

This seems reasonable to me.

Nov 4 2020, 8:32 AM · Restricted Project
nhaehnle added a comment to D88777: [AMDGPU] Add SI_EARLY_TERMINATE_SCC0 for early terminating shader.

LGTM modulo the inline comment.

Nov 4 2020, 8:26 AM · Restricted Project
nhaehnle added a comment to D90057: [TableGen] [test] Change integer ranges to use new '...' punctuation..

Does this mean it's time to remove the lo-hi syntax altogether?

Nov 4 2020, 8:19 AM · Restricted Project

Oct 30 2020

nhaehnle added a comment to D83088: Introduce CfgTraits abstraction.

I'm going to follow up with another RFC about this on llvm-dev.

Oct 30 2020, 11:56 AM · Restricted Project, Restricted Project, Restricted Project
nhaehnle added a comment to D85603: IR: Add convergence control operand bundle and intrinsics.

I'm going to try to give feedback, but with the caveat that there's a huge amount of discussion here, and with my apologies that I can't read the whole thread's worth of context. It's a lot. Sorry that I'm probably bringing up things that have already been discussed.

Oct 30 2020, 2:42 AM · Restricted Project
nhaehnle updated the diff for D85603: IR: Add convergence control operand bundle and intrinsics.

Address the comments from @jlebar that I indicated I'd address,
except for changes affecting the Verifier -- I'll do those later.

Oct 30 2020, 2:39 AM · Restricted Project

Oct 27 2020

nhaehnle added a reverting change for rG03a5f7ce12e2: Try to make GCC5 happy about the CfgTraits thing: rGe025d09b216d: Revert multiple patches based on "Introduce CfgTraits abstraction".
Oct 27 2020, 12:34 PM
nhaehnle added a reverting change for rGc0cdd22c72fa: Introduce CfgTraits abstraction: rGe025d09b216d: Revert multiple patches based on "Introduce CfgTraits abstraction".
Oct 27 2020, 12:34 PM
nhaehnle added a reverting change for rGf2a06875b604: Wrap CfgTraitsFor in namespace llvm to please GCC 5: rGe025d09b216d: Revert multiple patches based on "Introduce CfgTraits abstraction".
Oct 27 2020, 12:34 PM
nhaehnle added a reverting change for rGa74fc481588f: CfgInterface: rename interface() to getInterface(): rGe025d09b216d: Revert multiple patches based on "Introduce CfgTraits abstraction".
Oct 27 2020, 12:34 PM
nhaehnle committed rGe025d09b216d: Revert multiple patches based on "Introduce CfgTraits abstraction" (authored by nhaehnle).
Revert multiple patches based on "Introduce CfgTraits abstraction"
Oct 27 2020, 12:34 PM
nhaehnle added a reverting change for D83088: Introduce CfgTraits abstraction: rGe025d09b216d: Revert multiple patches based on "Introduce CfgTraits abstraction".
Oct 27 2020, 12:34 PM · Restricted Project, Restricted Project, Restricted Project
nhaehnle added a reverting change for rG848a68a032d1: DomTree: Extract (mostly) read-only logic into type-erased base classes: rGce6900c6cb51: Revert "DomTree: Extract (mostly) read-only logic into type-erased base classes".
Oct 27 2020, 12:34 PM
nhaehnle committed rGce6900c6cb51: Revert "DomTree: Extract (mostly) read-only logic into type-erased base classes" (authored by nhaehnle).
Revert "DomTree: Extract (mostly) read-only logic into type-erased base classes"
Oct 27 2020, 12:34 PM
nhaehnle added a reverting change for D83089: DomTree: Extract (mostly) read-only logic into type-erased base classes: rGce6900c6cb51: Revert "DomTree: Extract (mostly) read-only logic into type-erased base classes".
Oct 27 2020, 12:34 PM · Restricted Project

Oct 26 2020

nhaehnle added a comment to D89995: Make the post-commit review expectations more explicit with respect to revert.

Overall, I think the policy change proposed here isn't entirely unreasonable, but I do think it needs to be treated as a change of policy. Not everybody is necessarily aware of a 4-year-old email thread; heck, I've been working on upstream LLVM and frontends for 5 years and I wasn't aware of this supposed policy where, in reality, things weren't actually done according to what's written in the policy documents. If I don't know this after 5 years, how can anybody who joined in the last 4 years possibly be aware, other than through mere chance? What's written in the documents needs to matter, if only as part of making LLVM a welcoming and inclusive community. Hidden tribal knowledge is the opposite of those ideals.

Oct 26 2020, 10:38 AM · Restricted Project
nhaehnle added a comment to D85603: IR: Add convergence control operand bundle and intrinsics.

ping^2

Oct 26 2020, 8:23 AM · Restricted Project

Oct 24 2020

nhaehnle added a comment to D83088: Introduce CfgTraits abstraction.

I replied on llvm-dev.

Oct 24 2020, 3:14 PM · Restricted Project, Restricted Project, Restricted Project
nhaehnle requested changes to D89995: Make the post-commit review expectations more explicit with respect to revert.

Considering that you are doing this in response to D83088, I think we have pre-existing evidence that this isn't workable as-is. There has to be some obligation on the people "thinking there should be more review" to actually engage in review. In particular, if they have previously demonstrated that they cannot be relied on for this, then the rule should not apply.

Oct 24 2020, 1:14 PM · Restricted Project

Oct 23 2020

nhaehnle added a comment to D83088: Introduce CfgTraits abstraction.

Hi Mehdi, this is not an appropriate place for this discussion. Yes, we have a general rule that patches can be reverted if they're obviously broken (e.g. build bot problems) or clearly violate some other standard. This is a good rule, but it doesn't apply here. If you think it does, please state your case in the email thread that I've started on llvm-dev for this very purpose. Just one thing:

Oct 23 2020, 8:16 AM · Restricted Project, Restricted Project, Restricted Project
nhaehnle committed rGa74fc481588f: CfgInterface: rename interface() to getInterface() (authored by nhaehnle).
CfgInterface: rename interface() to getInterface()
Oct 23 2020, 7:52 AM

Oct 22 2020

nhaehnle added a comment to D89826: [FunctionAttrs][NPM] Fix handling of convergent.

Oh, and on the change itself: I'm not familiar enough with the attributor framework to judge the implementation, but the described reasoning (being able to make deductions from indirect calls) is sound.

Oct 22 2020, 12:05 PM · Restricted Project
nhaehnle added a comment to D89826: [FunctionAttrs][NPM] Fix handling of convergent.

At a high level, it should be fine to assume an indirect call without the "convergent" attribute isn't convergent. If the language rules for some language say the callee of an indirect call might be convergent, the frontend can add the convergent attribute to that call. (Unlike most attributes, it's a negative attribute: it restricts optimizations, and attribute inference optimizations would remove it.)

I don't object to that but I want us to remove the "may" from the lang ref and make it a "will" (or similar). It describes the semantics not the optimization we "may" perform.

Oct 22 2020, 12:04 PM · Restricted Project
nhaehnle added a comment to D89880: [AMDGPU] Reorder SIMemoryLegalizer functions to be consistent.

If we're nitpicking over such details, I would point out that it would be better to structure the whole thing differently, with a single legalizer class in which the insertAcquire etc. methods have different paths based on the hardware generation.

Oct 22 2020, 11:51 AM · Restricted Project
nhaehnle accepted D89962: [TableGen] Update documents to make them more complete.

LGTM

Oct 22 2020, 8:45 AM · Restricted Project
nhaehnle accepted D88081: [AMDGPU] Move WQM Pass after MI Scheduler.

LGTM

Oct 22 2020, 8:41 AM · Restricted Project
nhaehnle accepted D89814: [TableGen] Change !getop and !setop to !getdagop and !setdagop.

LGTM

Oct 22 2020, 8:33 AM · Restricted Project, Restricted Project
nhaehnle added inline comments to D88540: [AMDGPU] Add amdgpu_gfx calling convention.
Oct 22 2020, 7:59 AM · Restricted Project
nhaehnle added a comment to D88540: [AMDGPU] Add amdgpu_gfx calling convention.

I noticed what I believe is one more correctness issue, plus a few assorted comments.

Oct 22 2020, 7:58 AM · Restricted Project

Oct 21 2020

nhaehnle added a comment to D83088: Introduce CfgTraits abstraction.

David, I don't think this is appropriate here. Let's take the discussion to llvm-dev.

Oct 21 2020, 12:58 PM · Restricted Project, Restricted Project, Restricted Project

Oct 20 2020

nhaehnle committed rG848a68a032d1: DomTree: Extract (mostly) read-only logic into type-erased base classes (authored by nhaehnle).
DomTree: Extract (mostly) read-only logic into type-erased base classes
Oct 20 2020, 10:53 AM
nhaehnle closed D83089: DomTree: Extract (mostly) read-only logic into type-erased base classes.
Oct 20 2020, 10:53 AM · Restricted Project
nhaehnle committed rGc0cdd22c72fa: Introduce CfgTraits abstraction (authored by nhaehnle).
Introduce CfgTraits abstraction
Oct 20 2020, 4:51 AM
nhaehnle closed D83088: Introduce CfgTraits abstraction.
Oct 20 2020, 4:51 AM · Restricted Project, Restricted Project, Restricted Project

Oct 19 2020

nhaehnle added a comment to D89610: AMDGPU: Fix missing read/writelane cases to skip with exec=0.

Adding writelane absolutely makes sense to me, but:

Additionally, since we emit the final encoded instructions, this wasn't really matching readlane either.

:(

We have a bunch of places where we _always_ use real instructions, and presumably those are fine. But if we let real instructions creep into places where we usually use pseudos, we end up making the backend less efficient because over time, everybody will have to check for pseudos and reals (like in this particular case here). Can we instead add some sort of check that those real instructions aren't used during CodeGen? (The MI verifier should be able to check for this, for example.)

There is some reason we use the encoded versions for readlane/writelane, but I can never remember what it is.

Oct 19 2020, 10:42 AM · Restricted Project
nhaehnle added a comment to D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations.

LGTM modulo that inline question.

Oct 19 2020, 10:31 AM · Restricted Project
nhaehnle accepted D89644: [AMDGPU] Remove fix up operand from SI_ELSE.

Thanks, LGTM

Oct 19 2020, 10:21 AM · Restricted Project
nhaehnle added a comment to D89525: [amdgpu] Enhance AMDGPU AA..

This is fine for graphics.

Oct 19 2020, 10:14 AM · Restricted Project
nhaehnle added a comment to D89610: AMDGPU: Fix missing read/writelane cases to skip with exec=0.

Adding writelane absolutely makes sense to me, but:

Oct 19 2020, 10:05 AM · Restricted Project
nhaehnle added a comment to D89595: [AMDGPU] Update AMDGPUUsage.rst.

Duplicate of D89596?

Oct 19 2020, 10:01 AM · Restricted Project
nhaehnle added inline comments to D88081: [AMDGPU] Move WQM Pass after MI Scheduler.
Oct 19 2020, 9:57 AM · Restricted Project
nhaehnle added a comment to D18714: Add writeonly IR attribute.

That was very long ago. My best guess would be "oversight".

Oct 19 2020, 9:45 AM · Restricted Project

Oct 15 2020

nhaehnle added a comment to D85603: IR: Add convergence control operand bundle and intrinsics.

ping

Oct 15 2020, 6:19 AM · Restricted Project
nhaehnle added a comment to D83088: Introduce CfgTraits abstraction.

ping^2

Oct 15 2020, 6:19 AM · Restricted Project, Restricted Project, Restricted Project

Oct 14 2020

nhaehnle updated subscribers of D89424: [AMDGPU] Spilling using flat scratch.
Oct 14 2020, 4:35 PM · Restricted Project
nhaehnle accepted D89375: [AMDGPU] Add objdump invalid metadata testcase.

LGTM

Oct 14 2020, 4:31 PM · Restricted Project
nhaehnle added inline comments to D89399: [AMDGPU] Set rsrc1 flags for graphics shaders.
Oct 14 2020, 4:25 PM · Restricted Project
nhaehnle added a comment to D89388: [AMDGPU] Fix ieee mode default value.

Please make sure you give this a reasonable amount of testing, but it totally makes sense to me.

Oct 14 2020, 4:18 PM · Restricted Project
nhaehnle accepted D89331: [TableGen] Add the !not and !xor operators..
Oct 14 2020, 4:14 PM · Restricted Project
nhaehnle added a comment to D88774: Add disassembly counter after disasembly line.

Please work on removing this feature from the backend instead. It's long past time to stop handling this in the compiler

There are several shader debugger tools/situations dependent on this feature.
E.g. take one line like this: "s_andn2_b32 vcc_lo, vcc_lo, 0 ;000974: 8A6A806A"
1) We feed the disassembly count "0x000974" to the shader debugger tool to get this line of code's execution, register input and output.
2) There are also other shader dump tools, e.g. umr or windbg, which output a stream of hardware codes; we can use the hardware code to find the matching part of the disassembly lines.
3) We also have a shader replacement tool to drop and replace part of the ELF hardware code; with a full disassembly line that includes the counter and hardware code, it is easy to see where to edit the hardware code.

Hope it explains.

Oct 14 2020, 4:09 PM · Restricted Project
nhaehnle added inline comments to D88081: [AMDGPU] Move WQM Pass after MI Scheduler.
Oct 14 2020, 4:04 PM · Restricted Project
nhaehnle added a comment to D89187: [AMDGPU] Minimize number of s_mov generated by copyPhysReg.

How about splitting this change into two parts? Limit this change to generating mixes of 64- and 32-bit moves, and then separately consider what to do about those moves of registers where subregisters are overwritten immediately.

Oct 14 2020, 3:23 PM · Restricted Project
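
As a rough illustration of the first half of the suggestion above (emitting a mix of 64- and 32-bit moves for a wide copy), here is a minimal sketch. It is not the code from D89187: `Move` and `planCopy` are hypothetical names, registers are modeled as plain indices, and the rule that a 64-bit move needs even-aligned source and destination indices is an assumption of this sketch.

```cpp
#include <vector>

// Hypothetical description of one emitted move; Width is 32 or 64 (bits).
struct Move {
  unsigned Dst;
  unsigned Src;
  unsigned Width;
};

// Plan a copy of Count consecutive 32-bit registers from Src.. to Dst...
// Assumption: a 64-bit move is only usable when both the source and the
// destination index are even and at least two registers remain.
std::vector<Move> planCopy(unsigned Dst, unsigned Src, unsigned Count) {
  std::vector<Move> Moves;
  unsigned I = 0;
  while (I < Count) {
    bool CanPair =
        I + 1 < Count && (Dst + I) % 2 == 0 && (Src + I) % 2 == 0;
    if (CanPair) {
      Moves.push_back({Dst + I, Src + I, 64});
      I += 2; // one move covers two 32-bit registers
    } else {
      Moves.push_back({Dst + I, Src + I, 32});
      I += 1;
    }
  }
  return Moves;
}
```

The sketch deliberately ignores the second half of the suggestion (moves whose subregisters are overwritten immediately), which the comment proposes treating as a separate change.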

Oct 12 2020

nhaehnle added inline comments to D88081: [AMDGPU] Move WQM Pass after MI Scheduler.
Oct 12 2020, 11:19 AM · Restricted Project
nhaehnle added a comment to D89095: AMDGPU: Introduce a flag to control enable/disable instruction sink pass.

Why does this require a commit upstream?

Oct 12 2020, 10:47 AM · Restricted Project
nhaehnle added a comment to D88832: [TableGen] Add new getAllDerivedDefinitionsTwo function to RecordKeeper.

LGTM

Oct 12 2020, 10:22 AM · Restricted Project
nhaehnle added a comment to D88890: [AMDGPU] Add patterns for mad/mac legacy f32 instructions.

Looks reasonable to me.

Oct 12 2020, 10:14 AM · Restricted Project
nhaehnle added a comment to D89095: AMDGPU: Introduce a flag to control enable/disable instruction sink pass.

I agree with Matt here. You should be able to do experiments locally. Perhaps sinking should be disabled entirely, or perhaps sinking should be improved to take register liveness into account.

Oct 12 2020, 10:04 AM · Restricted Project
nhaehnle added a comment to D88774: Add disassembly counter after disasembly line.

I don't understand how this would help with shader replacement. Can you please explain how you plan to use this?

Oct 12 2020, 10:02 AM · Restricted Project

Oct 7 2020

nhaehnle accepted D89000: [AMDGPU][MC][GFX1030] Disabled v_mac_f32.

LGTM

Oct 7 2020, 12:12 PM · Restricted Project

Oct 5 2020

nhaehnle accepted D88775: [AMDGPU] SIInsertSkips: Refactor early exit block creation.

LGTM

Oct 5 2020, 8:40 AM · Restricted Project

Oct 1 2020

nhaehnle added a comment to D83088: Introduce CfgTraits abstraction.

ping

Oct 1 2020, 8:26 AM · Restricted Project, Restricted Project, Restricted Project

Sep 30 2020

nhaehnle added a comment to D87543: AMDGPU: Always split si_end_cf blocks.

I'm not trying to fully solve the live range splitting problem greedy regalloc hits. I'm trying to eliminate the isBasicBlockPrologue concept that fastregalloc trips over when inserting spills at the beginning of the block

Sep 30 2020, 8:01 AM · Restricted Project
nhaehnle requested changes to D87543: AMDGPU: Always split si_end_cf blocks.

It's unclear to me what this is trying to achieve. If it is to prevent

bb:
  <-- reload inserted here during live range splitting
  $exec = S_OR_B64 $exec, %other
  ... rest of code ...

... then this change only replaces it by:

bb:
  <-- reload inserted here during live range splitting
  $exec = S_OR_B64_term $exec, %other
  // fallthrough
Sep 30 2020, 7:55 AM · Restricted Project
nhaehnle added a comment to D67767: [AMDGPU] Add llvm.amdgcn.wqm.demote intrinsic and live mask tracking.

Hi Carl, what's the status of this?

Sep 30 2020, 6:21 AM · Restricted Project
nhaehnle added inline comments to D88291: [AMDGPU] Insert waterfall loops for divergent calls.
Sep 30 2020, 4:43 AM · Restricted Project

Sep 24 2020

nhaehnle added a comment to D87674: [AMDGPU] Insert waitcnt after returning from call.

The revert also highlights the problem that we don't have a representative test in llvm/test :(

Sep 24 2020, 12:22 AM · Restricted Project

Sep 23 2020

nhaehnle added a comment to D87674: [AMDGPU] Insert waitcnt after returning from call.

Yes, this commit is incorrect. It completely breaks code linking in Mesa OpenGL. s_waitcnt is required at the end of all global functions that return values.

Please revert. @nhaehnle

I don't understand why it would fail. This patch just moves s_waitcnt to the caller, so they would be executed anyway. I think I am missing something. It would help to root-cause this if we can isolate it to a small test case.

Shader returns aren't real returns and the "caller" doesn't wait

I see. So how should this be implemented? Maybe we conditionalize this patch just for compute?

Sep 23 2020, 10:02 AM · Restricted Project
nhaehnle added inline comments to D85603: IR: Add convergence control operand bundle and intrinsics.
Sep 23 2020, 9:27 AM · Restricted Project

Sep 15 2020

nhaehnle added inline comments to D85603: IR: Add convergence control operand bundle and intrinsics.
Sep 15 2020, 6:08 AM · Restricted Project

Sep 7 2020

nhaehnle added inline comments to D85604: SimplifyCFG: prevent certain transforms on convergent operations.
Sep 7 2020, 7:59 AM · Restricted Project
nhaehnle added inline comments to D85603: IR: Add convergence control operand bundle and intrinsics.
Sep 7 2020, 7:49 AM · Restricted Project
nhaehnle added inline comments to D83088: Introduce CfgTraits abstraction.
Sep 7 2020, 7:38 AM · Restricted Project, Restricted Project, Restricted Project