
nhaehnle (Nicolai Hähnle)
User

Projects

User does not belong to any projects.

User Details

User Since
Oct 9 2015, 4:06 AM (197 w, 3 d)

Recent Activity

Today

nhaehnle added inline comments to D64862: AMDGPU/GlobalISel: RegBankSelect interp intrinsics.
Mon, Jul 22, 4:14 AM
nhaehnle added inline comments to D64862: AMDGPU/GlobalISel: RegBankSelect interp intrinsics.
Mon, Jul 22, 2:38 AM
nhaehnle accepted D64901: [AMDGPU][NFC] Simplify test file for amdgcn intrinsics.

LGTM

Mon, Jul 22, 2:28 AM · Restricted Project
nhaehnle accepted D64919: TableGen: Support physical register inputs > 255.

LGTM

Mon, Jul 22, 2:25 AM
nhaehnle added a comment to D64726: AMDGPU/GlobalISel: Fix not constraining result reg of copies to VCC.

Okay, the possibility of an AssertZext is an interesting point. So let me try the other way around: What would the MIR at this stage look like to enforce an and?

Mon, Jul 22, 2:11 AM
nhaehnle added a comment to D64935: [AMDGPU] Add llvm.amdgcn.softwqm intrinsic.

Okay thanks, I see the logic now.

Mon, Jul 22, 1:48 AM · Restricted Project
nhaehnle added a comment to D64946: [AMDGPU] Fix trivial PHI into SI_END_CF..

How about the following simpler logic:

  • if the PHI is used by any basic-block prologue instruction (except other PHIs), then insert the COPY at the top of the basic block
  • otherwise, insert the COPY after the basic-block prologue

In this case, the COPY in the prologue would also have to be marked as a prologue instruction somehow.
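
The placement rule proposed in this comment can be sketched as a toy model. The `Instr` record and the `copy_insert_index` helper are hypothetical and are not LLVM APIs. The sketch assumes the block is laid out as PHIs, then prologue instructions, then the body, and it reads "the top of the basic block" as "directly after the PHIs".

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Instr:
    """Hypothetical stand-in for a machine instruction (not an LLVM type)."""
    name: str
    is_phi: bool = False
    is_prologue: bool = False
    uses: Set[str] = field(default_factory=set)


def copy_insert_index(block: List[Instr], phi_def: str) -> int:
    """Return the index in `block` before which the COPY would be placed."""
    # Assumption: "top of the block" means the first slot after all PHIs.
    first_non_phi = next(
        (i for i, ins in enumerate(block) if not ins.is_phi), len(block)
    )
    # The prologue ends at the first non-PHI instruction that is not
    # flagged as a prologue instruction.
    prologue_end = first_non_phi
    while prologue_end < len(block) and block[prologue_end].is_prologue:
        prologue_end += 1

    # Rule 1: a prologue instruction (other than a PHI) uses the PHI result.
    used_in_prologue = any(
        phi_def in ins.uses for ins in block[first_non_phi:prologue_end]
    )
    # Rule 2 otherwise: place the COPY after the prologue.
    return first_non_phi if used_in_prologue else prologue_end
```

For a block consisting of a PHI, an `SI_END_CF` that is flagged as prologue, and one body instruction, the helper returns the slot before `SI_END_CF` when `SI_END_CF` uses the PHI result, and the slot after it otherwise.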

Mon, Jul 22, 1:48 AM · Restricted Project
nhaehnle added a comment to D64508: AMDGPU: Force s_waitcnt after GWS instructions.

My understanding is that this is mostly related to CWSR. The trap handler has to be able to "replay" the GWS instruction.

Mon, Jul 22, 1:40 AM
nhaehnle added a comment to D63281: [TargetLowering] Add SimplifyMultipleUseDemandedBits.

The AMDGPU changes seem fine to me overall.

Mon, Jul 22, 1:39 AM · Restricted Project
nhaehnle added a comment to D64954: [IR][Verifier] Allow IntToPtrInst to be !dereferenceable.

Thanks! Could you please also add a test to Analysis/ValueTracking/memory-dereferenceable.ll?

Mon, Jul 22, 1:31 AM · Restricted Project

Fri, Jul 19

nhaehnle added a comment to D64911: [AMDGPU] Extend the SI Load/Store optimizer.

I still think we should be handling these on the IR level

Fri, Jul 19, 7:16 AM · Restricted Project
nhaehnle added a comment to D64946: [AMDGPU] Fix trivial PHI into SI_END_CF..

How about the following simpler logic:

Fri, Jul 19, 5:22 AM · Restricted Project
nhaehnle added a comment to D64935: [AMDGPU] Add llvm.amdgcn.softwqm intrinsic.

Have you checked that this actually fixes the reported CTS failure?

Fri, Jul 19, 5:16 AM · Restricted Project

Thu, Jul 18

nhaehnle added a comment to D64726: AMDGPU/GlobalISel: Fix not constraining result reg of copies to VCC.

But then following this logic, I still think that by analogy with G_ZEXT, the operation of COPY from s1 into vcc should have the semantics of ignoring the high bits of the "s1 which is really an s32". Since there's nothing in the MIR test which guarantees that the incoming high bits of $sgpr0 are 0, the resulting code needs to have some form of masking.
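
A small arithmetic illustration of why the masking matters, written as a toy model of the semantics under discussion rather than of what the backend emits. Both function names are hypothetical, and the value 0x10 is just an example of a register whose high bits are set while bit 0 is clear.

```python
def copy_s1_to_vcc_masked(sgpr_value: int) -> bool:
    """Read only bit 0 and ignore whatever sits in the high bits."""
    return (sgpr_value & 1) != 0


def copy_s1_to_vcc_unmasked(sgpr_value: int) -> bool:
    """Treat any non-zero 32-bit value as true (no masking)."""
    return sgpr_value != 0


# With garbage in the high bits the two interpretations disagree.
value = 0x10
masked = copy_s1_to_vcc_masked(value)
unmasked = copy_s1_to_vcc_unmasked(value)
```

If the incoming register holds 0x10, the masked reading gives false while the unmasked reading gives true, so the two only agree when something upstream guarantees that the high bits are zero.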

Thu, Jul 18, 7:20 AM
nhaehnle accepted D64490: AMDGPU/GlobalISel: Selection for fminnum/fmaxnum.

That seems reasonable to me.

Thu, Jul 18, 7:12 AM

Wed, Jul 17

nhaehnle committed rG8b7041a5c6f0: AMDGPU/GFX10: Apply the VMEM-to-scalar-write hazard also to writes to EXEC (authored by nhaehnle).
AMDGPU/GFX10: Apply the VMEM-to-scalar-write hazard also to writes to EXEC
Wed, Jul 17, 4:24 AM
nhaehnle committed rGa256b8b7d77c: AMDGPU: Improve alias analysis for GDS (authored by nhaehnle).
AMDGPU: Improve alias analysis for GDS
Wed, Jul 17, 4:24 AM
nhaehnle added inline comments to D64114: AMDGPU: Add missing code for GDS.
Wed, Jul 17, 4:23 AM · Restricted Project

Tue, Jul 16

nhaehnle updated the diff for D64807: AMDGPU/GFX10: Apply the VMEM-to-scalar-write hazard also to writes to EXEC.

Add missing test changes

Tue, Jul 16, 10:31 AM · Restricted Project
nhaehnle created D64807: AMDGPU/GFX10: Apply the VMEM-to-scalar-write hazard also to writes to EXEC.
Tue, Jul 16, 9:59 AM · Restricted Project
nhaehnle updated the diff for D64114: AMDGPU: Add missing code for GDS.

Add test case and remove the legalizer part.

Tue, Jul 16, 9:44 AM · Restricted Project
nhaehnle commandeered D64114: AMDGPU: Add missing code for GDS.

I'm taking this over.

Tue, Jul 16, 6:43 AM · Restricted Project
nhaehnle added a comment to D64726: AMDGPU/GlobalISel: Fix not constraining result reg of copies to VCC.

This seems incorrect, doesn't it? The truncation disappeared.... (e.g., what if $sgpr0 is 0x10)

My current understanding of G_TRUNC is that it's a no-op and is supposed to always be legal. This is supposed to be the legalized MIR, so theoretically this was generated by something that knew the original argument was zeroext from i1.

Tue, Jul 16, 6:42 AM
nhaehnle added inline comments to D63639: [AMDGPU] Prevent backend override of WGP when using PAL.
Tue, Jul 16, 4:06 AM · Restricted Project
nhaehnle added a comment to D64726: AMDGPU/GlobalISel: Fix not constraining result reg of copies to VCC.

This seems incorrect, doesn't it? The truncation disappeared.... (e.g., what if $sgpr0 is 0x10)

Tue, Jul 16, 2:56 AM
nhaehnle added a comment to D64490: AMDGPU/GlobalISel: Selection for fminnum/fmaxnum.

Oh, is that because the new node causes the intrinsics to be lowered to G_FMINNUM etc.? Why doesn't this affect any other targets?

Tue, Jul 16, 2:45 AM
nhaehnle added a comment to D64490: AMDGPU/GlobalISel: Selection for fminnum/fmaxnum.

Why are you removing the testcases for the intrinsics?

Tue, Jul 16, 2:44 AM
nhaehnle accepted D64725: AMDGPU/GlobalISel: Select G_SHL.

Yeah, let's take this.

Tue, Jul 16, 2:42 AM
nhaehnle accepted D64344: AMDGPU: Add register classes to flat store patterns.

Would obviously be good to fix the underlying issue, but sure, this seems reasonable.

Tue, Jul 16, 2:37 AM

Fri, Jul 12

nhaehnle accepted D64186: [NewPM] Port MachineDominatorTree analysis to the new PM..

You should probably wait a bit in case somebody else wants to chime in, but this looks good to me.

Fri, Jul 12, 6:09 AM · Restricted Project

Tue, Jul 2

nhaehnle added inline comments to D62766: [Attributor] Deduce "nosync" function attribute..
Tue, Jul 2, 12:19 AM · Restricted Project

Mon, Jul 1

nhaehnle committed rG10c911db63ec: AMDGPU/GFX10: implement ds_ordered_count changes (authored by nhaehnle).
AMDGPU/GFX10: implement ds_ordered_count changes
Mon, Jul 1, 10:19 AM
nhaehnle committed rG4dc3b2bf95b0: AMDGPU: Support GDS atomics (authored by nhaehnle).
AMDGPU: Support GDS atomics
Mon, Jul 1, 10:19 AM
nhaehnle added inline comments to D63452: AMDGPU: Support some GDS atomics.
Mon, Jul 1, 9:16 AM · Restricted Project
nhaehnle updated the diff for D63452: AMDGPU: Support some GDS atomics.

Address review comments

Mon, Jul 1, 9:16 AM · Restricted Project
nhaehnle committed rG7cfd99ab15d0: AMDGPU/GFX10: fix scratch resource descriptor (authored by nhaehnle).
AMDGPU/GFX10: fix scratch resource descriptor
Mon, Jul 1, 8:46 AM
nhaehnle added a comment to D61494: AMDGPU: Write LDS objects out as global symbols in code generation.

https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1220

Mon, Jul 1, 8:36 AM · Restricted Project
nhaehnle added a comment to D61494: AMDGPU: Write LDS objects out as global symbols in code generation.

I'm currently looking into it.

Mon, Jul 1, 5:15 AM · Restricted Project
nhaehnle added inline comments to D62766: [Attributor] Deduce "nosync" function attribute..
Mon, Jul 1, 3:10 AM · Restricted Project
nhaehnle accepted D63824: AMDGPU: Add pass to lower SGPR spills.

One nit, apart from that LGTM.

Mon, Jul 1, 3:02 AM
nhaehnle accepted D63819: AMDGPU/GlobalISel: Improve icmp selection coverage..

LGTM

Mon, Jul 1, 2:53 AM
nhaehnle accepted D63766: AMDGPU/GlobalISel: Use and instead of BFE with inline immediate.

Thanks, LGTM

Mon, Jul 1, 2:52 AM
nhaehnle accepted D63799: AMDGPU/GlobalISel: Fix scc->vcc copy handling.

LGTM

Mon, Jul 1, 2:48 AM
nhaehnle accepted D63798: AMDGPU/GlobalISel: Fix allowing non-boolean conditions for G_SELECT.

LGTM

Mon, Jul 1, 2:45 AM
nhaehnle accepted D63413: AMDGPU/GlobalISel: RegBankSelect for WWM/WQM.

LGTM

Mon, Jul 1, 2:41 AM
nhaehnle accepted D63408: AMDGPU/GlobalISel: Use vcc reg bank for amdgcn.wqm.vote.

LGTM

Mon, Jul 1, 2:41 AM
nhaehnle added a comment to D63814: [TableGen] Allow DAG isel patterns to override default operands..

Would it be possible to make default operands overridable automatically iff they are at the end of the operand list? I.e., if you have a suffix of default operands, then those can be overridden?
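
The "suffix of default operands" idea in this question can be sketched as follows. This is only an illustration of the proposed rule, not TableGen's actual behaviour, and `overridable_operands` is a hypothetical helper that takes one flag per operand saying whether that operand has a default.

```python
from typing import List


def overridable_operands(has_default: List[bool]) -> List[int]:
    """Return the indices of operands in the trailing run of defaults."""
    i = len(has_default)
    # Walk backwards while the operands still have defaults.
    while i > 0 and has_default[i - 1]:
        i -= 1
    return list(range(i, len(has_default)))
```

Under this rule, an operand with a default that is followed by an operand without one would not be overridable, because it is not part of the trailing suffix.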

Mon, Jul 1, 2:06 AM · Restricted Project
nhaehnle accepted D63953: [AMDGPU] LCSSA pass added in preISel. .

Thanks for dealing with this. Matt's suggestion is reasonable to me, either way LGTM.

Mon, Jul 1, 2:04 AM · Restricted Project
nhaehnle accepted D63980: [AMDGPU] Call isLoopExiting for blocks in the loop..

LGTM

Mon, Jul 1, 2:04 AM · Restricted Project

Fri, Jun 28

nhaehnle accepted D63412: AMDGPU/GlobalISel: RegBankSelect for DS ordered add/swap.

LGTM

Fri, Jun 28, 8:55 AM
nhaehnle updated the diff for D63808: AMDGPU/GFX10: fix scratch resource descriptor.

Properly test based on wavefront size

Fri, Jun 28, 6:37 AM · Restricted Project
nhaehnle updated the diff for D63808: AMDGPU/GFX10: fix scratch resource descriptor.

Add a test case

Fri, Jun 28, 6:28 AM · Restricted Project
nhaehnle added a comment to D63452: AMDGPU: Support some GDS atomics.

ping

Fri, Jun 28, 5:27 AM · Restricted Project

Thu, Jun 27

nhaehnle committed rG32ef9292bea1: AMDGPU: Make fixing i1 copies robust against re-ordering (authored by nhaehnle).
AMDGPU: Make fixing i1 copies robust against re-ordering
Thu, Jun 27, 9:58 AM
nhaehnle added a comment to D63520: AMDGPU: Use ReversePostOrder when fixing i1 copies.

I'd prefer the alternative fix at D63871, since it doesn't require RPOT.

Thu, Jun 27, 4:43 AM · Restricted Project
nhaehnle created D63871: AMDGPU: Make fixing i1 copies robust against re-ordering.
Thu, Jun 27, 4:43 AM · Restricted Project
nhaehnle requested changes to D62766: [Attributor] Deduce "nosync" function attribute..

This does seem useful, although the description is overly narrow (what does nosync on its own have to do with freeing memory?).

Thu, Jun 27, 1:52 AM · Restricted Project

Wed, Jun 26

nhaehnle committed rG806600987d39: llvm-objcopy: silence warning introduced in r364296 (authored by nhaehnle).
llvm-objcopy: silence warning introduced in r364296
Wed, Jun 26, 12:18 PM
nhaehnle added a comment to D61494: AMDGPU: Write LDS objects out as global symbols in code generation.

Oh... radv declares compute LDS as an LLVM global variable but doesn't use rtld yet, right? Sorry, I missed that.

Wed, Jun 26, 11:57 AM · Restricted Project
nhaehnle created D63808: AMDGPU/GFX10: fix scratch resource descriptor.
Wed, Jun 26, 12:53 AM · Restricted Project

Tue, Jun 25

nhaehnle added inline comments to D63731: [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions.
Tue, Jun 25, 5:45 AM
nhaehnle accepted D63751: AMDGPU: Select G_SEXT/G_ZEXT/G_ANYEXT.

Trivial nitpick, but essentially LGTM.

Tue, Jun 25, 5:14 AM
nhaehnle committed rG2710171a15e8: AMDGPU: Write LDS objects out as global symbols in code generation (authored by nhaehnle).
AMDGPU: Write LDS objects out as global symbols in code generation
Tue, Jun 25, 4:59 AM
nhaehnle committed rG08e8cb576021: AMDGPU/MC: Add .amdgpu_lds directive (authored by nhaehnle).
AMDGPU/MC: Add .amdgpu_lds directive
Tue, Jun 25, 4:58 AM
nhaehnle closed D61493: AMDGPU/MC: Add .amdgpu_lds directive.
Tue, Jun 25, 4:58 AM · Restricted Project
nhaehnle added inline comments to D63716: AMDGPU/GFX10: implement ds_ordered_count changes.
Tue, Jun 25, 4:49 AM · Restricted Project

Mon, Jun 24

nhaehnle accepted D63715: AMDGPU/GlobalISel: Split VALU s64 G_ZEXT/G_SEXT in RegBankSelect.

LGTM

Mon, Jun 24, 10:28 AM
nhaehnle accepted D63414: AMDGPU/GlobalISel: Fix selecting G_IMPLICIT_DEF for s1.

LGTM

Mon, Jun 24, 8:50 AM
nhaehnle added inline comments to D63484: AMDGPU/GlobalISel: Make s16 G_ICMP legal.
Mon, Jun 24, 8:45 AM
nhaehnle accepted D63721: [AMDGPU] Remove unused variable AllSGPRSpilledToVGPRs. NFC.

LGTM

Mon, Jun 24, 8:45 AM · Restricted Project
nhaehnle created D63716: AMDGPU/GFX10: implement ds_ordered_count changes.
Mon, Jun 24, 7:18 AM · Restricted Project
nhaehnle added inline comments to D61493: AMDGPU/MC: Add .amdgpu_lds directive.
Mon, Jun 24, 7:17 AM · Restricted Project
nhaehnle added inline comments to D63452: AMDGPU: Support some GDS atomics.
Mon, Jun 24, 7:17 AM · Restricted Project
nhaehnle updated the diff for D63452: AMDGPU: Support some GDS atomics.

Address review

Mon, Jun 24, 7:17 AM · Restricted Project
nhaehnle updated the diff for D61493: AMDGPU/MC: Add .amdgpu_lds directive.

Address review

Mon, Jun 24, 6:48 AM · Restricted Project
nhaehnle added a comment to D61494: AMDGPU: Write LDS objects out as global symbols in code generation.

Thanks.

Mon, Jun 24, 6:48 AM · Restricted Project
nhaehnle added a comment to D63420: AMDGPU: Fix s.buffer.load being marked as readnone.

What does this actually fix?

This is likely to pessimize codegen for graphics quite badly. Graphics APIs make fairly strong aliasing guarantees which we don't properly express in LLVM at the moment, and we kind of get by without it by having s.buffer.load be readnone.

When trying to do some global isel work, I ran into inconsistencies in the attributes (e.g. D63422), and I don't want to spread awareness of this hack to more places. We can't really fix this until we have fat pointers. I would rather have the declaration be accurate. If graphics users want to assume readnone as a performance hack until that is fixed, they can annotate every call site with readnone.

Mon, Jun 24, 5:56 AM

Jun 17 2019

nhaehnle added inline comments to D63452: AMDGPU: Support some GDS atomics.
Jun 17 2019, 1:00 PM · Restricted Project
nhaehnle updated the diff for D63452: AMDGPU: Support some GDS atomics.
  • region pointers cannot appear as flat pointers
  • expand the tests, handle the cmpxchg case
Jun 17 2019, 12:55 PM · Restricted Project
nhaehnle updated the diff for D61494: AMDGPU: Write LDS objects out as global symbols in code generation.

Address review comment

Jun 17 2019, 12:54 PM · Restricted Project
nhaehnle committed rGae4fcb97dde0: AMDGPU/GFX10: Don't generate s_code_end padding in the asm-printer (authored by nhaehnle).
AMDGPU/GFX10: Don't generate s_code_end padding in the asm-printer
Jun 17 2019, 12:27 PM
nhaehnle committed rG8af7198c6caa: AMDGPU: Explicitly define a triple for some tests (authored by nhaehnle).
AMDGPU: Explicitly define a triple for some tests
Jun 17 2019, 12:24 PM
nhaehnle created D63452: AMDGPU: Support some GDS atomics.
Jun 17 2019, 11:51 AM · Restricted Project
nhaehnle accepted D63431: AMDGPU: Fold readlane/readfirstlane calls.

LGTM

Jun 17 2019, 9:04 AM
nhaehnle added a comment to D63431: AMDGPU: Fold readlane/readfirstlane calls.

Good idea, but needs a fix I think.

Jun 17 2019, 8:10 AM
nhaehnle created D63427: AMDGPU/GFX10: Don't generate s_code_end padding in the asm-printer.
Jun 17 2019, 6:09 AM · Restricted Project
nhaehnle added a comment to D63420: AMDGPU: Fix s.buffer.load being marked as readnone.

I should clarify that I'm not opposed to this kind of change in the long run, but as-is it needs a careful look at the performance implications.

Jun 17 2019, 5:33 AM
nhaehnle requested changes to D63420: AMDGPU: Fix s.buffer.load being marked as readnone.

What does this actually fix?

Jun 17 2019, 5:33 AM
nhaehnle accepted D63406: AMDGPU: Mark exp/exp.compr as inaccessiblememonly.

I don't quite understand why you think this can't be writeonly? (The flip side is, I don't see how having this writeonly would help codegen)

Jun 17 2019, 5:26 AM
nhaehnle committed rG582f2692945a: AsmPrinter: add doc-string for EmitLinkage (authored by nhaehnle).
AsmPrinter: add doc-string for EmitLinkage
Jun 17 2019, 5:22 AM
nhaehnle added inline comments to D61650: [AsmPrinter] Make EmitLinkage and EmitVisibility public.
Jun 17 2019, 5:22 AM · Restricted Project

Jun 16 2019

nhaehnle added a parent revision for D61494: AMDGPU: Write LDS objects out as global symbols in code generation: D63392: AMDGPU: Explicitly define a triple for some tests.
Jun 16 2019, 6:06 PM · Restricted Project
nhaehnle added a child revision for D63392: AMDGPU: Explicitly define a triple for some tests: D61494: AMDGPU: Write LDS objects out as global symbols in code generation.
Jun 16 2019, 6:06 PM · Restricted Project
nhaehnle updated the diff for D61494: AMDGPU: Write LDS objects out as global symbols in code generation.

Wrap LDS TargetGlobalAddress nodes in an AMDGPUISD::LDS node that is
selected to S_MOV/V_MOV. This should address the concerns about having
more machine nodes during SelectionDAG, and as a nice side bonus it
actually improves the quality of known-bits information by adding
alignment information as well.

Jun 16 2019, 6:06 PM · Restricted Project
nhaehnle added inline comments to D61494: AMDGPU: Write LDS objects out as global symbols in code generation.
Jun 16 2019, 4:43 PM · Restricted Project
nhaehnle updated the diff for D61494: AMDGPU: Write LDS objects out as global symbols in code generation.

Rebase, address some of the comments.

Jun 16 2019, 4:39 PM · Restricted Project
nhaehnle created D63392: AMDGPU: Explicitly define a triple for some tests.
Jun 16 2019, 4:38 PM · Restricted Project
nhaehnle updated the diff for D61493: AMDGPU/MC: Add .amdgpu_lds directive.

Significant update to this patch to make .amdgpu_lds work more
like common symbols:

Jun 16 2019, 4:37 PM · Restricted Project
nhaehnle added a comment to D63225: AMDGPU: Fold readlane from copy of SGPR or imm.

Thanks, this is very helpful!

Jun 16 2019, 11:46 AM
nhaehnle accepted D63278: AMDGPU: Cleanup custom PseudoSourceValue definitions.

LGTM

Jun 16 2019, 11:34 AM