nhaehnle (Nicolai Hähnle)
User

Projects

User does not belong to any projects.

User Details

User Since
Oct 9 2015, 4:06 AM (85 w, 1 d)

Recent Activity

Thu, May 18

nhaehnle created D33319: AMDGPU: M0 operands to spill/restore opcodes are dead.
Thu, May 18, 8:10 AM

Mon, May 8

nhaehnle added a comment to D32862: [AMDGPU] add intrinsic for s_getpc.

Low & high bits and use cases aside, isn't this something that could be covered by adding support for PC in the read_register intrinsic?

Mon, May 8, 3:18 AM

Apr 24 2017

nhaehnle added a comment to D31762: AMDGPU: Add new amdgcn.init.exec intrinsics.

Is it at all possible to get merged shaders where either part has thread_count == 0? We might want a way to annotate branches so that the skip-jump for EXEC=0 is not introduced.

Yes, thread_count == 0 is possible, and it's explained here: https://patchwork.freedesktop.org/patch/152356/

Apr 24 2017, 10:24 AM
nhaehnle added inline comments to D32344: InstCombine/AMDGPU: Fix constant folding of llvm.amdgcn.{icmp,fcmp}.
Apr 24 2017, 10:20 AM
nhaehnle added a comment to D31762: AMDGPU: Add new amdgcn.init.exec intrinsics.

I don't see anything wrong with the code.

I agree that the design is a bit iffy. It's almost like these intrinsics are something that is part of the calling convention. But even these intrinsics cannot quite lead to optimal code for merged monolithic shaders, because there's an unnecessary initialization of EXEC in the first part of the shader.

Since what we need to do here in general really doesn't fit well into LLVM IR semantics, I suspect that no matter what we come up with, it's bound to be ugly. So we might as well go with this particular solution here.

Merged monolithic shaders set exec = -1 and then they use "if (tid < thread_count) ...". I think that's the only way to jump over the conditional code right now if the wave has thread_count == 0. If we don't want v_mbcnt+v_cmp, we could do something like "if (amdgcn.set.thread_count(n)) ..." that sets EXEC regardless of current EXEC and skips the conditional for thread_count == 0. The performance of that solution is unlikely to justify the implementation effort.

Apr 24 2017, 10:13 AM
nhaehnle updated the diff for D32344: InstCombine/AMDGPU: Fix constant folding of llvm.amdgcn.{icmp,fcmp}.

Address review comments and apply some more formatting fixes.

Apr 24 2017, 10:05 AM
nhaehnle updated the diff for D32343: AMDGPU: Move v_readlane lane select from VGPR to SGPR.

Address review comments.

Apr 24 2017, 10:04 AM
nhaehnle accepted D31762: AMDGPU: Add new amdgcn.init.exec intrinsics.

I don't see anything wrong with the code.

Apr 24 2017, 9:19 AM

Apr 21 2017

nhaehnle created D32345: AMDGPU: Fix crash when scheduling non-memory SMRD instructions.
Apr 21 2017, 3:44 AM
nhaehnle created D32344: InstCombine/AMDGPU: Fix constant folding of llvm.amdgcn.{icmp,fcmp}.
Apr 21 2017, 3:09 AM
nhaehnle created D32343: AMDGPU: Move v_readlane lane select from VGPR to SGPR.
Apr 21 2017, 2:21 AM

Apr 19 2017

nhaehnle accepted D31222: AMDPU: Change DivergenceAnalysis for function arguments.

LGTM

Apr 19 2017, 9:14 AM
nhaehnle accepted D31476: AMDGPU: Don't align callable functions to 256.

There might be a benefit to choosing a larger alignment, like 64 bytes, due to cache line alignment.

Apr 19 2017, 9:10 AM
nhaehnle accepted D30852: StructurizeCFG: Directly invert cmp instructions.

LGTM

Apr 19 2017, 9:07 AM

Apr 8 2017

nhaehnle added a comment to D31161: [AMDGPU] New Waitcnt Insertion Pass.

When applied to current trunk, this code can get into an infinite loop. Please see the sample LLVM IR at https://paste.debian.net/926533/

Apr 8 2017, 2:04 AM

Feb 15 2017

nhaehnle added inline comments to D26005: AMDGPU: Don't use stack space for SGPR->VGPR spills.
Feb 15 2017, 5:14 AM

Feb 10 2017

nhaehnle added a reviewer for D26005: AMDGPU: Don't use stack space for SGPR->VGPR spills: nhaehnle.
Feb 10 2017, 9:26 AM
nhaehnle added inline comments to D26005: AMDGPU: Don't use stack space for SGPR->VGPR spills.
Feb 10 2017, 9:25 AM

Feb 1 2017

nhaehnle added a comment to D28993: AMDGPU: Try to select SMEM opcodes for llvm.amdgcn.buffer.load.

I see the same problem as Tom here. Do those shaders use read-only SSBOs? If so, this could perhaps be done at the Mesa level. But even then, there'd be a problem if the same memory is bound to two different SSBOs, and one of them is written to, unless the SSBO is marked 'restrict'.

Can you be more specific about why it's incorrect? I only see an issue with L1 coherency (GLC=0).

Feb 1 2017, 4:04 AM
nhaehnle added a comment to D27586: AMDGPU/SI: Add llvm.amdgcn.s.buffer.load intrinsic.

I haven't looked in too much detail yet. I assume getelementptr doesn't work with these pointers, so it would be good to have a negative test which ensures that GEP use fails.

Feb 1 2017, 3:51 AM
nhaehnle added a comment to D28993: AMDGPU: Try to select SMEM opcodes for llvm.amdgcn.buffer.load.

I see the same problem as Tom here. Do those shaders use read-only SSBOs? If so, this could perhaps be done at the Mesa level. But even then, there'd be a problem if the same memory is bound to two different SSBOs, and one of them is written to, unless the SSBO is marked 'restrict'.

Feb 1 2017, 3:46 AM

Jan 30 2017

nhaehnle updated the diff for D28675: [DAGCombine] require UnsafeFPMath for re-association of addition.

Backing out the effectively dead FMF code, but leave the corresponding
FIXMEs in place.

Jan 30 2017, 1:27 AM

Jan 27 2017

nhaehnle added a comment to D28675: [DAGCombine] require UnsafeFPMath for re-association of addition.

Yeah, I personally feel more comfortable having the exact same change on the branch.

I meant that having getFlags() always return something dereferenceable seems safer.

Jan 27 2017, 12:59 PM
nhaehnle added a comment to D28675: [DAGCombine] require UnsafeFPMath for re-association of addition.

Yeah, I personally feel more comfortable having the exact same change on the branch.

Jan 27 2017, 12:45 PM
nhaehnle updated the diff for D28675: [DAGCombine] require UnsafeFPMath for re-association of addition.

Drop unnecessary getNode()s.

Jan 27 2017, 12:16 PM
nhaehnle added a comment to D28675: [DAGCombine] require UnsafeFPMath for re-association of addition.

But you can't because they'll crash [...]

Jan 27 2017, 12:13 PM
nhaehnle added a comment to D28675: [DAGCombine] require UnsafeFPMath for re-association of addition.

@arsenm, unfortunately I can't add flag-based tests, since as @hfinkel points out, you can't actually get the flag on an FMA or FMAD in the DAG. @hfinkel, the idea was to put the check in already anyway, since by the discussion with @spatel, the plan is to add FMF for all nodes eventually. That seems reasonable to me.

Jan 27 2017, 11:24 AM
nhaehnle updated the diff for D28675: [DAGCombine] require UnsafeFPMath for re-association of addition.

Formatting of FIXMEs

Jan 27 2017, 11:24 AM

Jan 23 2017

nhaehnle accepted D28351: AMDGPU: Remove spurious out branches after a kill.

LGTM.

Jan 23 2017, 5:30 AM

Jan 20 2017

nhaehnle updated the diff for D28675: [DAGCombine] require UnsafeFPMath for re-association of addition.

Added the DefaultFlags. It does fix @spatel's example for me. Anything else?

Jan 20 2017, 6:56 AM

Jan 19 2017

nhaehnle added a comment to D28675: [DAGCombine] require UnsafeFPMath for re-association of addition.

Obviously it'd be even easier to skip the getFlags() thing entirely now, since it can never evaluate to true. But the right way forward is to extend support for FMF, so...

Jan 19 2017, 1:32 PM
nhaehnle added a comment to D28675: [DAGCombine] require UnsafeFPMath for re-association of addition.

The patch will cause a crash [...]

Jan 19 2017, 1:30 PM
nhaehnle updated the diff for D28675: [DAGCombine] require UnsafeFPMath for re-association of addition.

Add FIXMEs. @hfinkel, do I understand your comment correctly that this patch is
good to go?

Jan 19 2017, 3:41 AM

Jan 17 2017

nhaehnle updated the diff for D28675: [DAGCombine] require UnsafeFPMath for re-association of addition.
  1. Take individual UnsafeAlgebra flags into account.
Jan 17 2017, 3:04 AM

Jan 13 2017

nhaehnle added a comment to D28351: AMDGPU: Remove spurious out branches after a kill.

Thanks! We generally use CHECK-LABEL only for the start of a function, not for labels inside a function. The idea is that the label helps FileCheck get less confused when there are many sub-tests in a single test file.

Jan 13 2017, 1:20 PM
nhaehnle added a comment to D28351: AMDGPU: Remove spurious out branches after a kill.

The approach is unfortunately not correct in general as far as I can tell. Say you have

Jan 13 2017, 6:15 AM
nhaehnle retitled D28675: [DAGCombine] require UnsafeFPMath for re-association of addition from to [DAGCombine] require UnsafeFPMath for re-association of addition.
Jan 13 2017, 5:38 AM

Jan 12 2017

nhaehnle accepted D28102: AMDGPU: Fold fneg into fmul_legacy.

LGTM

Jan 12 2017, 10:15 AM
nhaehnle accepted D28615: AMDGPU: Fold free fneg into sin.

LGTM

Jan 12 2017, 10:15 AM
nhaehnle accepted D28101: AMDGPU: Fold fneg into rcp.

I don't see the change that merges the FP_EXTEND / FP_ROUND cases. But the end result LGTM.

Jan 12 2017, 1:17 AM
nhaehnle accepted D28100: AMDGPU: Fold fneg into fp_round.

LGTM

Jan 12 2017, 1:15 AM
nhaehnle accepted D28099: AMDGPU: Fold fneg into fp_extend .

LGTM

Jan 12 2017, 1:14 AM

Jan 11 2017

nhaehnle added a comment to D28101: AMDGPU: Fold fneg into rcp.

This looks like it's missing a rebase onto the rest of the series.

Jan 11 2017, 2:49 PM
nhaehnle accepted D28098: AMDGPU: Fold fneg into fma or fmad .

LGTM

Jan 11 2017, 2:43 PM
nhaehnle accepted D28097: AMDGPU: Fold fneg into fmul if free.

LGTM

Jan 11 2017, 2:43 PM
nhaehnle accepted D28096: AMDGPU: Fold fneg into fadd if free.

LGTM

Jan 11 2017, 2:42 PM
nhaehnle accepted D28240: AMDGPU: Don't produce v_mov_b64_pseudo when folding operands.

A test case would be good. Apart from that, LGTM.

Jan 11 2017, 2:20 PM
nhaehnle accepted D28256: AMDGPU: Skip fneg/select combine if it can fold into other.

One minor comment, LGTM otherwise.

Jan 11 2017, 2:15 PM
nhaehnle accepted D27966: AMDGPU: Pull fneg/fabs out of a select if free.

LGTM

Jan 11 2017, 2:10 PM
nhaehnle accepted D28043: AMDGPU: Undo sub x, c -> add x, -c canonicalization.

LGTM

Jan 11 2017, 1:04 PM
nhaehnle accepted D28065: AMDGPU: Fix shrinking of addc/subb..

LGTM

Jan 11 2017, 12:53 PM
nhaehnle accepted D28062: AMDGPU: Fix breaking VOP3 v_add_i32s.

LGTM

Jan 11 2017, 12:51 PM
nhaehnle accepted D28509: AMDGPU: Fix folding immediates into mac src2.

LGTM

Jan 11 2017, 12:39 PM

Dec 16 2016

nhaehnle accepted D27844: Fixed '!NodePtr->isKnownSentinel()' assert caused by dereferencing end iterator when trying to const cast the iterator..

LGTM

Dec 16 2016, 5:01 AM · Restricted Project

Dec 12 2016

nhaehnle added a comment to D26348: Allow convergent attribute for function arguments.

Thank you for taking a look!

Dec 12 2016, 8:58 AM

Dec 9 2016

nhaehnle added a comment to D26348: Allow convergent attribute for function arguments.

We need to clarify, there are two parts in my comments:

  1. "divergence" is not common in LLVM. I stand on this point: the DivergenceAnalysis is a GPU specific pass and is hardly part of the core semantic of the IR.
Dec 9 2016, 3:59 AM

Dec 8 2016

nhaehnle accepted D27553: AMDGPU: Fix commuting v_sub_u16.

LGTM

Dec 8 2016, 6:30 AM
nhaehnle retitled D27572: AMDGPU: llvm.amdgcn.interp.mov is a source of divergence from to AMDGPU: llvm.amdgcn.interp.mov is a source of divergence.
Dec 8 2016, 6:22 AM
nhaehnle added a comment to D26348: Allow convergent attribute for function arguments.

Thank you for taking a look.

Dec 8 2016, 3:37 AM

Dec 7 2016

nhaehnle added inline comments to D26348: Allow convergent attribute for function arguments.
Dec 7 2016, 6:59 AM
nhaehnle added inline comments to D24956: [SelectionDAG] Add expansion and promotion of [US]MUL_LOHI.
Dec 7 2016, 6:52 AM
nhaehnle updated the diff for D24956: [SelectionDAG] Add expansion and promotion of [US]MUL_LOHI.
  • Use an enum instead of OnlyLegalOrCustom
  • Handle a corner case where shift amounts are too large for the native shift type
Dec 7 2016, 6:50 AM
nhaehnle updated the diff for D27344: AMDGPU: Properly implement SIRegisterInfo::isFrameOffsetLegal and needsFrameBaseReg.
  • Test case using alloca
  • Unify getting the MUBUF instruction offset
Dec 7 2016, 5:12 AM
nhaehnle abandoned D27347: X86: Add checks for fma_patterns_distributive[_wide].ll with -enable-no-infs-fp-math.

Obsoleted by changes to D27346.

Dec 7 2016, 4:14 AM
nhaehnle retitled D27346: X86: Add checks for fma_patterns[_wide].ll with -enable-no-infs-fp-math from X86: Split fma_pattern[_wide].ll to X86: Add checks for fma_patterns[_wide].ll with -enable-no-infs-fp-math.
Dec 7 2016, 4:13 AM
nhaehnle updated the diff for D27346: X86: Add checks for fma_patterns[_wide].ll with -enable-no-infs-fp-math.

Don't split the files, just double the RUN lines in the existing files.

Dec 7 2016, 4:12 AM

Dec 2 2016

nhaehnle added inline comments to D27344: AMDGPU: Properly implement SIRegisterInfo::isFrameOffsetLegal and needsFrameBaseReg.
Dec 2 2016, 10:29 AM
nhaehnle added a comment to D27346: X86: Add checks for fma_patterns[_wide].ll with -enable-no-infs-fp-math.

Why is this and D27347 better than just tweaking the check-prefixes?

Dec 2 2016, 9:30 AM
nhaehnle added a comment to D24956: [SelectionDAG] Add expansion and promotion of [US]MUL_LOHI.

Ping.

Dec 2 2016, 8:22 AM
nhaehnle added a dependent revision for D27346: X86: Add checks for fma_patterns[_wide].ll with -enable-no-infs-fp-math: D27347: X86: Add checks for fma_patterns_distributive[_wide].ll with -enable-no-infs-fp-math.
Dec 2 2016, 8:20 AM
nhaehnle added a dependency for D27347: X86: Add checks for fma_patterns_distributive[_wide].ll with -enable-no-infs-fp-math: D27346: X86: Add checks for fma_patterns[_wide].ll with -enable-no-infs-fp-math.
Dec 2 2016, 8:20 AM
nhaehnle retitled D27347: X86: Add checks for fma_patterns_distributive[_wide].ll with -enable-no-infs-fp-math from to X86: Add checks for fma_patterns_distributive[_wide].ll with -enable-no-infs-fp-math.
Dec 2 2016, 8:20 AM
nhaehnle retitled D27346: X86: Add checks for fma_patterns[_wide].ll with -enable-no-infs-fp-math from to X86: Split fma_pattern[_wide].ll.
Dec 2 2016, 8:19 AM
nhaehnle retitled D27344: AMDGPU: Properly implement SIRegisterInfo::isFrameOffsetLegal and needsFrameBaseReg from to AMDGPU: Properly implement SIRegisterInfo::isFrameOffsetLegal and needsFrameBaseReg.
Dec 2 2016, 7:27 AM

Dec 1 2016

nhaehnle updated the diff for D26602: [DAGCombiner] do not fold (fmul (fadd X, 1), Y) -> (fmad X, Y, Y) by default.

Rearrange the logic. It looks quite readable to me this way, and
clang-format-diff agrees with the formatting.

Dec 1 2016, 7:30 AM

Nov 30 2016

nhaehnle updated the diff for D26602: [DAGCombiner] do not fold (fmul (fadd X, 1), Y) -> (fmad X, Y, Y) by default.

Rebased on D27260.

Nov 30 2016, 8:52 AM
nhaehnle retitled D27260: [SelectionDAG] Rename and clarify visitFMULForFMADCombine (NFC) from to [SelectionDAG] Rename and clarify visitFMULForFMADCombine (NFC).
Nov 30 2016, 8:43 AM
nhaehnle added a comment to D26602: [DAGCombiner] do not fold (fmul (fadd X, 1), Y) -> (fmad X, Y, Y) by default.
  1. There's an unanswered question of whether we can do this with FPOpFusion::Fast with TargetOptions::NoInfsFPMath. That depends on whether x * y + x is always more precise than the original x * (y + 1).
Nov 30 2016, 8:33 AM

Nov 28 2016

nhaehnle accepted D27149: AMDGPU/SI: Allow using SGPRs 96-101 on VI.

LGTM

Nov 28 2016, 7:29 AM
nhaehnle accepted D26998: [StructurizeCFG] Refactor NearestCommonDominator..

I agree, this is a nice conceptual simplification. LGTM.

Nov 28 2016, 6:40 AM
nhaehnle updated the diff for D27064: [SelectionDAG] Refactor TargetLowering::expandMUL (NFC).

Get rid of DL to avoid possible confusion with dl.

Nov 28 2016, 6:27 AM

Nov 24 2016

nhaehnle updated the diff for D24822: [SelectionDAG] Enable division-by-constant optimization for wide types.

Rebase on latest version of D24956.

Nov 24 2016, 1:34 AM

Nov 23 2016

nhaehnle updated the diff for D27064: [SelectionDAG] Refactor TargetLowering::expandMUL (NFC).

Fix thinko switching signed/unsigned Has[US]MUL_LOHI and HasMULH[US] check
in MakeMUL_LOHI.

Nov 23 2016, 1:45 PM
nhaehnle added inline comments to D27064: [SelectionDAG] Refactor TargetLowering::expandMUL (NFC).
Nov 23 2016, 1:42 PM
nhaehnle updated the diff for D24956: [SelectionDAG] Add expansion and promotion of [US]MUL_LOHI.

Add assertion that HalfType is legal.

Nov 23 2016, 1:26 PM
nhaehnle added inline comments to D24956: [SelectionDAG] Add expansion and promotion of [US]MUL_LOHI.
Nov 23 2016, 1:15 PM
nhaehnle added a dependency for D27064: [SelectionDAG] Refactor TargetLowering::expandMUL (NFC): D27063: [SelectionDAG] Early-out in TargetLowering::expandMUL (NFC).
Nov 23 2016, 1:12 PM
nhaehnle added a dependent revision for D27063: [SelectionDAG] Early-out in TargetLowering::expandMUL (NFC): D27064: [SelectionDAG] Refactor TargetLowering::expandMUL (NFC).
Nov 23 2016, 1:12 PM
nhaehnle added a dependent revision for D27063: [SelectionDAG] Early-out in TargetLowering::expandMUL (NFC): D24956: [SelectionDAG] Add expansion and promotion of [US]MUL_LOHI.
Nov 23 2016, 1:12 PM
nhaehnle added a dependent revision for D27064: [SelectionDAG] Refactor TargetLowering::expandMUL (NFC): D24956: [SelectionDAG] Add expansion and promotion of [US]MUL_LOHI.
Nov 23 2016, 1:12 PM
nhaehnle added dependencies for D24956: [SelectionDAG] Add expansion and promotion of [US]MUL_LOHI: D27063: [SelectionDAG] Early-out in TargetLowering::expandMUL (NFC), D27064: [SelectionDAG] Refactor TargetLowering::expandMUL (NFC).
Nov 23 2016, 1:12 PM
nhaehnle updated the diff for D24956: [SelectionDAG] Add expansion and promotion of [US]MUL_LOHI.

Thank you for taking a look.

Nov 23 2016, 1:11 PM
nhaehnle retitled D27064: [SelectionDAG] Refactor TargetLowering::expandMUL (NFC) from to [SelectionDAG] Refactor TargetLowering::expandMUL (NFC).
Nov 23 2016, 1:09 PM
nhaehnle retitled D27063: [SelectionDAG] Early-out in TargetLowering::expandMUL (NFC) from to [SelectionDAG] Early-out in TargetLowering::expandMUL (NFC).
Nov 23 2016, 1:08 PM
nhaehnle updated the diff for D26348: Allow convergent attribute for function arguments.
  • Clarify that the set of threads is target-dependent (I'm still open to eliminating that paragraph outright if people find it confusing)
  • Talk about executions of functions
  • No mention of concurrency or simultaneous execution (except for the motivational explanation in the first paragraph), per @mehdi_amini
  • Add an explicit paragraph about calling functions containing call-sites with convergent function parameters
Nov 23 2016, 6:15 AM

Nov 21 2016

nhaehnle added inline comments to D26348: Allow convergent attribute for function arguments.
Nov 21 2016, 8:15 AM
nhaehnle added inline comments to D26348: Allow convergent attribute for function arguments.
Nov 21 2016, 7:43 AM
nhaehnle accepted D26874: AMDGPU: Remove m0 spilling code.

LGTM

Nov 21 2016, 1:44 AM
nhaehnle accepted D12067: AMDGPU: Refactor exp instructions.

LGTM

Nov 21 2016, 1:42 AM

Nov 18 2016

nhaehnle added inline comments to D26725: AMDGPU: Add llvm.amdgcn.interp.mov intrinsic.
Nov 18 2016, 12:01 PM