nhaehnle (Nicolai Hähnle)
User

Projects

User does not belong to any projects.

User Details

User Since
Oct 9 2015, 4:06 AM (206 w, 3 d)

Recent Activity

Tue, Aug 27

nhaehnle added inline comments to D66666: [AMDGPU] Remove unnecessary movs for v_fmac operands.
Tue, Aug 27, 8:56 AM · Restricted Project

Aug 7 2019

nhaehnle added a comment to D65813: Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0".

Hmm, those test changes were unexpected.

Aug 7 2019, 2:26 AM · Restricted Project

Aug 5 2019

nhaehnle accepted D65719: AMDGPU: Disambiguate v3f16 format in load/store tables.

LGTM

Aug 5 2019, 5:28 AM
nhaehnle accepted D65604: AMDGPU/GlobalISel: Alternative mappings for constants.

LGTM

Aug 5 2019, 5:25 AM
nhaehnle accepted D65601: AMDGPU/GlobalISel: Don't reject shader types.

Sure, why not.

Aug 5 2019, 5:22 AM
nhaehnle added a comment to D65238: AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.store.

Mostly seems reasonable to me, but two questions inline.

Aug 5 2019, 5:20 AM
nhaehnle accepted D65113: AMDGPU: Correct behavior of f16/i16 non-format store intrinsics.

LGTM

Aug 5 2019, 5:06 AM
nhaehnle committed rGe204786b6cc9: AMDGPU: add missing llvm.amdgcn.{raw,struct}.buffer.atomic.{inc,dec} (authored by nhaehnle).
Aug 5 2019, 2:37 AM

Jul 25 2019

nhaehnle created D65283: AMDGPU: add missing llvm.amdgcn.{raw,struct}.buffer.atomic.{inc,dec}.
Jul 25 2019, 8:38 AM · Restricted Project
nhaehnle added a comment to D65274: [SDA] Don't stop divergence propagation at the IPD..

This needs a test. It should be possible to extract one from the bug reports?

Jul 25 2019, 5:44 AM · Restricted Project
nhaehnle accepted D65141: [DivergenceAnalysis] Add methods for querying divergence at use.

I do prefer the explicit isDivergentUse!

Jul 25 2019, 5:41 AM · Restricted Project

Jul 24 2019

nhaehnle accepted D65027: AMDGPU/GlobalISel: RegBankSelect llvm.amdgcn.raw.buffer.{load|store}.

LGTM

Jul 24 2019, 1:47 AM
nhaehnle accepted D65074: AMDGPU: Don't assert on v4f16 arguments to shader calling conventions.

LGTM

Jul 24 2019, 1:35 AM
nhaehnle accepted D64935: [AMDGPU] Add llvm.amdgcn.softwqm intrinsic.

LGTM. Followup for WQM/WWM sounds good to me as well.

Jul 24 2019, 1:29 AM · Restricted Project
nhaehnle added inline comments to D64862: AMDGPU/GlobalISel: RegBankSelect interp intrinsics.
Jul 24 2019, 1:22 AM
nhaehnle added a comment to D65088: [AMDGPU][RFC] New llvm.amdgcn.ballot intrinsic.

Thanks for doing this. For the codegen quality question, I wonder if something like the following could be done:

Jul 24 2019, 1:12 AM · Restricted Project
nhaehnle accepted D65124: AMDGPU: Correct behavior of f16 buffer loads.

Also fixes support for targets without i16.

Jul 24 2019, 12:28 AM
nhaehnle added a comment to D65141: [DivergenceAnalysis] Add methods for querying divergence at use.

Nice work! It's missing tests though -- I'm not sure if we want to print this additional information in the divergence printer, but definitely a test for the atomic optimizer pass is required.

Jul 24 2019, 12:22 AM · Restricted Project

Jul 23 2019

nhaehnle accepted D65111: AMDGPU: Only allow FP types for format buffer intrinics.

Based on a quick check this should be safe for all graphics clients. LGTM.

Jul 23 2019, 5:58 AM
nhaehnle accepted D65126: AMDGPU/GlobalISel: Don't assume instruction can be erased when selecting exts.

LGTM

Jul 23 2019, 5:55 AM
nhaehnle accepted D64954: [IR][Verifier] Allow IntToPtrInst to be !dereferenceable.

LGTM, though I believe the hyperlinks in :ref: need to be without the underscore (and other examples of :ref: use seem to corroborate that).

Jul 23 2019, 2:35 AM · Restricted Project
nhaehnle added a comment to D64899: AMDGPU/GlobalISel: First pass at attempting to legalize load/stores.

Just to clarify, how is selection of global_load_ubyte and friends going to work? I assume similar to today where the load returns an s32 value, but instruction selection does matching based on the MemOperand remembering the size?

Jul 23 2019, 2:32 AM

Jul 22 2019

nhaehnle added a comment to D64726: AMDGPU/GlobalISel: Fix not constraining result reg of copies to VCC.

One more thought:

Jul 22 2019, 1:03 PM
nhaehnle added inline comments to D64954: [IR][Verifier] Allow IntToPtrInst to be !dereferenceable.
Jul 22 2019, 8:57 AM · Restricted Project
nhaehnle added inline comments to D64862: AMDGPU/GlobalISel: RegBankSelect interp intrinsics.
Jul 22 2019, 4:14 AM
nhaehnle added inline comments to D64862: AMDGPU/GlobalISel: RegBankSelect interp intrinsics.
Jul 22 2019, 2:38 AM
nhaehnle accepted D64901: [AMDGPU][NFC] Simplify test file for amdgcn intrinsics.

LGTM

Jul 22 2019, 2:28 AM · Restricted Project
nhaehnle accepted D64919: TableGen: Support physical register inputs > 255.

LGTM

Jul 22 2019, 2:25 AM
nhaehnle added a comment to D64726: AMDGPU/GlobalISel: Fix not constraining result reg of copies to VCC.

Okay, the possibility of an AssertZext is an interesting point. So let me try the other way around: What would the MIR at this stage look like to enforce an and?

Jul 22 2019, 2:11 AM
nhaehnle added a comment to D64935: [AMDGPU] Add llvm.amdgcn.softwqm intrinsic.

Okay thanks, I see the logic now.

Jul 22 2019, 1:48 AM · Restricted Project
nhaehnle added a comment to D64946: [AMDGPU] Fix trivial PHI into SI_END_CF..

How about the following simpler logic:

  • if the PHI is used by any basic-block prologue instruction (except other PHIs), then insert the COPY at the top of the basic block
  • otherwise, insert the COPY after the basic-block prologue

In this case, the COPY in the prologue should also be marked as a prologue instruction somehow.

Jul 22 2019, 1:48 AM · Restricted Project
nhaehnle added a comment to D64508: AMDGPU: Force s_waitcnt after GWS instructions.

My understanding is that this is mostly related to CWSR. The trap handler has to be able to "replay" the GWS instruction.

Jul 22 2019, 1:40 AM
nhaehnle added a comment to D63281: [TargetLowering] Add SimplifyMultipleUseDemandedBits.

The AMDGPU changes seem fine to me overall.

Jul 22 2019, 1:39 AM · Restricted Project
nhaehnle added a comment to D64954: [IR][Verifier] Allow IntToPtrInst to be !dereferenceable.

Thanks! Could you please also add a test to Analysis/ValueTracking/memory-dereferenceable.ll?

Jul 22 2019, 1:31 AM · Restricted Project

Jul 19 2019

nhaehnle added a comment to D64911: [AMDGPU] Extend the SI Load/Store optimizer.

I still think we should be handling these on the IR level

Jul 19 2019, 7:16 AM · Restricted Project
nhaehnle added a comment to D64946: [AMDGPU] Fix trivial PHI into SI_END_CF..

How about the following simpler logic:

Jul 19 2019, 5:22 AM · Restricted Project
nhaehnle added a comment to D64935: [AMDGPU] Add llvm.amdgcn.softwqm intrinsic.

Have you checked that this actually fixes the reported CTS failure?

Jul 19 2019, 5:16 AM · Restricted Project

Jul 18 2019

nhaehnle added a comment to D64726: AMDGPU/GlobalISel: Fix not constraining result reg of copies to VCC.

But then following this logic, I still think that by analogy with G_ZEXT, the operation of COPY from s1 into vcc should have the semantics of ignoring the high bits of the "s1 which is really an s32". Since there's nothing in the MIR test which guarantees that the incoming high bits of $sgpr0 are 0, the resulting code needs to have some form of masking.

Jul 18 2019, 7:20 AM
nhaehnle accepted D64490: AMDGPU/GlobalISel: Selection for fminnum/fmaxnum.

That seems reasonable to me.

Jul 18 2019, 7:12 AM

Jul 17 2019

nhaehnle committed rG8b7041a5c6f0: AMDGPU/GFX10: Apply the VMEM-to-scalar-write hazard also to writes to EXEC (authored by nhaehnle).
Jul 17 2019, 4:24 AM
nhaehnle committed rGa256b8b7d77c: AMDGPU: Improve alias analysis for GDS (authored by nhaehnle).
Jul 17 2019, 4:24 AM
nhaehnle added inline comments to D64114: AMDGPU: Add missing code for GDS.
Jul 17 2019, 4:23 AM · Restricted Project

Jul 16 2019

nhaehnle updated the diff for D64807: AMDGPU/GFX10: Apply the VMEM-to-scalar-write hazard also to writes to EXEC.

Add missing test changes

Jul 16 2019, 10:31 AM · Restricted Project
nhaehnle created D64807: AMDGPU/GFX10: Apply the VMEM-to-scalar-write hazard also to writes to EXEC.
Jul 16 2019, 9:59 AM · Restricted Project
nhaehnle updated the diff for D64114: AMDGPU: Add missing code for GDS.

Add test case and remove the legalizer part.

Jul 16 2019, 9:44 AM · Restricted Project
nhaehnle commandeered D64114: AMDGPU: Add missing code for GDS.

I'm taking this over.

Jul 16 2019, 6:43 AM · Restricted Project
nhaehnle added a comment to D64726: AMDGPU/GlobalISel: Fix not constraining result reg of copies to VCC.

This seems incorrect, doesn't it? The truncation disappeared.... (e.g., what if $sgpr0 is 0x10)

My current understanding of G_TRUNC is it's a no-op, and supposed to always be legal. This is supposed to be the legalized MIR, so theoretically this was generated by something that knew the original argument was zeroext from i1

Jul 16 2019, 6:42 AM
nhaehnle added inline comments to D63639: [AMDGPU] Prevent backend override of WGP when using PAL.
Jul 16 2019, 4:06 AM · Restricted Project
nhaehnle added a comment to D64726: AMDGPU/GlobalISel: Fix not constraining result reg of copies to VCC.

This seems incorrect, doesn't it? The truncation disappeared.... (e.g., what if $sgpr0 is 0x10)

Jul 16 2019, 2:56 AM
nhaehnle added a comment to D64490: AMDGPU/GlobalISel: Selection for fminnum/fmaxnum.

Oh, is that because the new node causes the intrinsics to be lowered to G_FMINNUM etc.? Why doesn't this affect any other targets?

Jul 16 2019, 2:45 AM
nhaehnle added a comment to D64490: AMDGPU/GlobalISel: Selection for fminnum/fmaxnum.

Why are you removing the testcases for the intrinsics?

Jul 16 2019, 2:44 AM
nhaehnle accepted D64725: AMDGPU/GlobalISel: Select G_SHL.

Yeah, let's take this.

Jul 16 2019, 2:42 AM
nhaehnle accepted D64344: AMDGPU: Add register classes to flat store patterns.

Would obviously be good to fix the underlying issue, but sure, this seems reasonable.

Jul 16 2019, 2:37 AM

Jul 12 2019

nhaehnle accepted D64186: [NewPM] Port MachineDominatorTree analysis to the new PM..

You should probably wait a bit in case somebody else wants to chime in, but this looks good to me.

Jul 12 2019, 6:09 AM · Restricted Project

Jul 2 2019

nhaehnle added inline comments to D62766: [Attributor] Deduce "nosync" function attribute..
Jul 2 2019, 12:19 AM · Restricted Project

Jul 1 2019

nhaehnle committed rG10c911db63ec: AMDGPU/GFX10: implement ds_ordered_count changes (authored by nhaehnle).
Jul 1 2019, 10:19 AM
nhaehnle committed rG4dc3b2bf95b0: AMDGPU: Support GDS atomics (authored by nhaehnle).
Jul 1 2019, 10:19 AM
nhaehnle added inline comments to D63452: AMDGPU: Support some GDS atomics.
Jul 1 2019, 9:16 AM · Restricted Project
nhaehnle updated the diff for D63452: AMDGPU: Support some GDS atomics.

Address review comments

Jul 1 2019, 9:16 AM · Restricted Project
nhaehnle committed rG7cfd99ab15d0: AMDGPU/GFX10: fix scratch resource descriptor (authored by nhaehnle).
Jul 1 2019, 8:46 AM
nhaehnle added a comment to D61494: AMDGPU: Write LDS objects out as global symbols in code generation.

https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1220

Jul 1 2019, 8:36 AM · Restricted Project
nhaehnle added a comment to D61494: AMDGPU: Write LDS objects out as global symbols in code generation.

I'm currently looking into it.

Jul 1 2019, 5:15 AM · Restricted Project
nhaehnle added inline comments to D62766: [Attributor] Deduce "nosync" function attribute..
Jul 1 2019, 3:10 AM · Restricted Project
nhaehnle accepted D63824: AMDGPU: Add pass to lower SGPR spills.

One nit, apart from that LGTM.

Jul 1 2019, 3:02 AM
nhaehnle accepted D63819: AMDGPU/GlobalISel: Improve icmp selection coverage..

LGTM

Jul 1 2019, 2:53 AM
nhaehnle accepted D63766: AMDGPU/GlobalISel: Use and instead of BFE with inline immediate.

Thanks, LGTM

Jul 1 2019, 2:52 AM
nhaehnle accepted D63799: AMDGPU/GlobalISel: Fix scc->vcc copy handling.

LGTM

Jul 1 2019, 2:48 AM
nhaehnle accepted D63798: AMDGPU/GlobalISel: Fix allowing non-boolean conditions for G_SELECT.

LGTM

Jul 1 2019, 2:45 AM
nhaehnle accepted D63413: AMDGPU/GlobalISel: RegBankSelect for WWM/WQM.

LGTM

Jul 1 2019, 2:41 AM
nhaehnle accepted D63408: AMDGPU/GlobalISel: Use vcc reg bank for amdgcn.wqm.vote.

LGTM

Jul 1 2019, 2:41 AM
nhaehnle added a comment to D63814: [TableGen] Allow DAG isel patterns to override default operands..

Would it be possible to make default operands overridable automatically iff they are at the end of the operand list? I.e., if you have a suffix of default operands, then those can be overridden?

Jul 1 2019, 2:06 AM · Restricted Project
nhaehnle accepted D63953: [AMDGPU] LCSSA pass added in preISel. .

Thanks for dealing with this. Matt's suggestion is reasonable to me, either way LGTM.

Jul 1 2019, 2:04 AM · Restricted Project
nhaehnle accepted D63980: [AMDGPU] Call isLoopExiting for blocks in the loop..

LGTM

Jul 1 2019, 2:04 AM · Restricted Project

Jun 28 2019

nhaehnle accepted D63412: AMDGPU/GlobalISel: RegBankSelect for DS ordered add/swap.

LGTM

Jun 28 2019, 8:55 AM
nhaehnle updated the diff for D63808: AMDGPU/GFX10: fix scratch resource descriptor.

Properly test based on wavefront size

Jun 28 2019, 6:37 AM · Restricted Project
nhaehnle updated the diff for D63808: AMDGPU/GFX10: fix scratch resource descriptor.

Add a test case

Jun 28 2019, 6:28 AM · Restricted Project
nhaehnle added a comment to D63452: AMDGPU: Support some GDS atomics.

ping

Jun 28 2019, 5:27 AM · Restricted Project

Jun 27 2019

nhaehnle committed rG32ef9292bea1: AMDGPU: Make fixing i1 copies robust against re-ordering (authored by nhaehnle).
Jun 27 2019, 9:58 AM
nhaehnle added a comment to D63520: AMDGPU: Use ReversePostOrder when fixing i1 copies.

I'd prefer the alternative fix at D63871, since it doesn't require RPOT.

Jun 27 2019, 4:43 AM · Restricted Project
nhaehnle created D63871: AMDGPU: Make fixing i1 copies robust against re-ordering.
Jun 27 2019, 4:43 AM · Restricted Project
nhaehnle requested changes to D62766: [Attributor] Deduce "nosync" function attribute..

This does seem useful, although the description is overly narrow (what does nosync on its own have to do with freeing memory?).

Jun 27 2019, 1:52 AM · Restricted Project

Jun 26 2019

nhaehnle committed rG806600987d39: llvm-objcopy: silence warning introduced in r364296 (authored by nhaehnle).
Jun 26 2019, 12:18 PM
nhaehnle added a comment to D61494: AMDGPU: Write LDS objects out as global symbols in code generation.

Oh... radv declares compute LDS as an LLVM global variable but doesn't use rtld yet, right? Sorry, I missed that.

Jun 26 2019, 11:57 AM · Restricted Project
nhaehnle created D63808: AMDGPU/GFX10: fix scratch resource descriptor.
Jun 26 2019, 12:53 AM · Restricted Project

Jun 25 2019

nhaehnle added inline comments to D63731: [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions.
Jun 25 2019, 5:45 AM · Restricted Project
nhaehnle accepted D63751: AMDGPU: Select G_SEXT/G_ZEXT/G_ANYEXT.

Trivial nitpick, but essentially LGTM.

Jun 25 2019, 5:14 AM
nhaehnle committed rG2710171a15e8: AMDGPU: Write LDS objects out as global symbols in code generation (authored by nhaehnle).
Jun 25 2019, 4:59 AM
nhaehnle committed rG08e8cb576021: AMDGPU/MC: Add .amdgpu_lds directive (authored by nhaehnle).
Jun 25 2019, 4:58 AM
nhaehnle closed D61493: AMDGPU/MC: Add .amdgpu_lds directive.
Jun 25 2019, 4:58 AM · Restricted Project
nhaehnle added inline comments to D63716: AMDGPU/GFX10: implement ds_ordered_count changes.
Jun 25 2019, 4:49 AM · Restricted Project

Jun 24 2019

nhaehnle accepted D63715: AMDGPU/GlobalISel: Split VALU s64 G_ZEXT/G_SEXT in RegBankSelect.

LGTM

Jun 24 2019, 10:28 AM
nhaehnle accepted D63414: AMDGPU/GlobalISel: Fix selecting G_IMPLICIT_DEF for s1.

LGTM

Jun 24 2019, 8:50 AM
nhaehnle added inline comments to D63484: AMDGPU/GlobalISel: Make s16 G_ICMP legal.
Jun 24 2019, 8:45 AM
nhaehnle accepted D63721: [AMDGPU] Remove unused variable AllSGPRSpilledToVGPRs. NFC.

LGTM

Jun 24 2019, 8:45 AM · Restricted Project
nhaehnle created D63716: AMDGPU/GFX10: implement ds_ordered_count changes.
Jun 24 2019, 7:18 AM · Restricted Project
nhaehnle added inline comments to D61493: AMDGPU/MC: Add .amdgpu_lds directive.
Jun 24 2019, 7:17 AM · Restricted Project
nhaehnle added inline comments to D63452: AMDGPU: Support some GDS atomics.
Jun 24 2019, 7:17 AM · Restricted Project
nhaehnle updated the diff for D63452: AMDGPU: Support some GDS atomics.

Address review

Jun 24 2019, 7:17 AM · Restricted Project
nhaehnle updated the diff for D61493: AMDGPU/MC: Add .amdgpu_lds directive.

Address review

Jun 24 2019, 6:48 AM · Restricted Project
nhaehnle added a comment to D61494: AMDGPU: Write LDS objects out as global symbols in code generation.

Thanks.

Jun 24 2019, 6:48 AM · Restricted Project