nhaehnle (Nicolai Hähnle)
User

Projects

User does not belong to any projects.

User Details

User Since
Oct 9 2015, 4:06 AM (188 w, 4 d)

Recent Activity

Thu, May 16

nhaehnle accepted D61313: [AMDGPU] detect WaW hazards when moving/merging load/store instructions.

Thanks!

Thu, May 16, 12:14 AM · Restricted Project

Wed, May 15

nhaehnle committed rGf672b6170ce8: [MachineOperand] Add a ChangeToGA method (authored by nhaehnle).
[MachineOperand] Add a ChangeToGA method
Wed, May 15, 10:46 AM
nhaehnle added a comment to D61492: AMDGPU: Prepare for explicit absolute relocations in code generation.

ping?

Wed, May 15, 10:34 AM · Restricted Project
nhaehnle added a comment to D61650: [AsmPrinter] Make EmitLinkage and EmitVisibility public.

ping?

Wed, May 15, 10:34 AM · Restricted Project
nhaehnle added a comment to D61491: AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0.

ping?

Wed, May 15, 10:33 AM · Restricted Project
nhaehnle committed rG664ceeda6857: RegAlloc: try to fail more gracefully when out of registers (authored by nhaehnle).
RegAlloc: try to fail more gracefully when out of registers
Wed, May 15, 10:30 AM
nhaehnle added inline comments to D61313: [AMDGPU] detect WaW hazards when moving/merging load/store instructions.
Wed, May 15, 9:19 AM · Restricted Project

Tue, May 14

nhaehnle added a comment to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

LGTM apart from a bunch of formatting issues. I haven't marked all of them, please just run clang-format or clang-format-diff.

Tue, May 14, 6:23 AM
nhaehnle added a comment to D60772: [AMDGPU] Add optional bounds checking for scratch accesses.

I don't understand the problem being solved here. Who/what is this intended to benefit? An out of bounds access is going to be undefined.

Tue, May 14, 6:23 AM · Restricted Project
nhaehnle accepted D61261: [AMDGPU] Increases available SGPR for Calling Convention.

LGTM

Tue, May 14, 6:04 AM · Restricted Project
nhaehnle accepted D61374: AMDGPU: Don't clobber VCC in MUBUF addr64 emulation.

LGTM

Tue, May 14, 6:00 AM
nhaehnle added a comment to D61313: [AMDGPU] detect WaW hazards when moving/merging load/store instructions.

Thanks! This looks good, except I still believe the test case can be simplified.

Tue, May 14, 5:58 AM · Restricted Project
nhaehnle accepted D61888: TableGen: support #ifndef in addition to #ifdef.

LGTM

Tue, May 14, 5:58 AM · Restricted Project

Mon, May 13

nhaehnle added a comment to D61812: [AMDGPU] Fixed handling of imemdiate i1 literals.

Why does this return false? A 1-bit immediate is either 0 or -1, both of which can be represented as inline constants everywhere.
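The arithmetic behind that remark can be checked with a small sketch. It assumes the integer inline-constant range of -16 to 64 given in the AMDGPU ISA documentation; the helper names `sign_extend` and `is_inline_int` are hypothetical and are not LLVM functions.

```python
# Assumed integer inline-constant range for AMDGPU: -16..64 inclusive.
INLINE_INT_MIN = -16
INLINE_INT_MAX = 64


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    value &= mask
    return (value ^ sign_bit) - sign_bit


def is_inline_int(value: int) -> bool:
    """True if `value` fits the assumed integer inline-constant range."""
    return INLINE_INT_MIN <= value <= INLINE_INT_MAX


# The only two 1-bit patterns, read as signed values.
i1_values = [sign_extend(bit, 1) for bit in (0, 1)]
```

Under these assumptions both possible i1 values fall inside the inline range, which is the point the comment is making.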

Mon, May 13, 7:22 AM · Restricted Project
nhaehnle added inline comments to D60457: [CodeGen] Fixed de-optimization of legalize subvector extract.
Mon, May 13, 5:21 AM · Restricted Project

Tue, May 7

nhaehnle added a parent revision for D61494: AMDGPU: Write LDS objects out as global symbols in code generation: D61650: [AsmPrinter] Make EmitLinkage and EmitVisibility public.
Tue, May 7, 1:22 PM · Restricted Project
nhaehnle added a child revision for D61650: [AsmPrinter] Make EmitLinkage and EmitVisibility public: D61494: AMDGPU: Write LDS objects out as global symbols in code generation.
Tue, May 7, 1:22 PM · Restricted Project
nhaehnle added a parent revision for D61494: AMDGPU: Write LDS objects out as global symbols in code generation: D61651: [MachineOperand] Add a ChangeToGA method.
Tue, May 7, 1:22 PM · Restricted Project
nhaehnle updated the diff for D61494: AMDGPU: Write LDS objects out as global symbols in code generation.

Address review comments.

Tue, May 7, 1:22 PM · Restricted Project
nhaehnle added a child revision for D61651: [MachineOperand] Add a ChangeToGA method: D61494: AMDGPU: Write LDS objects out as global symbols in code generation.
Tue, May 7, 1:22 PM · Restricted Project
nhaehnle created D61651: [MachineOperand] Add a ChangeToGA method.
Tue, May 7, 1:22 PM · Restricted Project
nhaehnle created D61650: [AsmPrinter] Make EmitLinkage and EmitVisibility public.
Tue, May 7, 1:22 PM · Restricted Project
nhaehnle added a comment to D61494: AMDGPU: Write LDS objects out as global symbols in code generation.

Is it still possible to make use of constant addresses in some cases? If there are no calls, for example.

Tue, May 7, 1:20 PM · Restricted Project
nhaehnle updated the diff for D61493: AMDGPU/MC: Add .amdgpu_lds directive.
  • Use '\n' instead of "\n"
  • Cleanup and extend tests
Tue, May 7, 5:58 AM · Restricted Project
nhaehnle updated the diff for D61492: AMDGPU: Prepare for explicit absolute relocations in code generation.

Add a test/CodeGen/MIR/AMDGPU test.

Tue, May 7, 4:48 AM · Restricted Project
nhaehnle added a comment to D61491: AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0.

But these are often -1?

Tue, May 7, 4:37 AM · Restricted Project
nhaehnle abandoned D61553: AMDGPU: Fix ds_{read,write}2_b64 on SI/gfx6.

After some more investigation, you're right.

Tue, May 7, 4:32 AM · Restricted Project
nhaehnle added a comment to D61313: [AMDGPU] detect WaW hazards when moving/merging load/store instructions.

That's an interesting catch. Basically a Write-after-Write hazard.

Though this seems like the wrong place to add this check, since the function is specifically called canMoveInstsAcrossMemOp.

I rather suspect that addToListsIfDependent needs to be fixed instead, so that when the scan hits the S_CMP_EQ_U32 it recognizes the WAW hazard and adds it to the instructions to be moved.

I also wonder if this is related to D60459.

How do I recognize the WAW when scanning S_CMP_EQ_U32? I'm not sure I follow you.

Tue, May 7, 4:24 AM · Restricted Project
nhaehnle committed rG79ea85c6afb5: AMDGPU: Verify that SOP2/SOPC instructions have at most one immediate operand (authored by nhaehnle).
AMDGPU: Verify that SOP2/SOPC instructions have at most one immediate operand
Tue, May 7, 2:19 AM
nhaehnle added a comment to D61490: AMDGPU: Verify that SOP2/SOPC instructions have at most one immediate operand.

Added a minimal verifier-only test before commit.

Tue, May 7, 2:19 AM · Restricted Project
nhaehnle added a comment to D61489: RegAlloc: try to fail more gracefully when out of registers.

I've changed the patch to not report the location in the non-inline case on the basis that it's an internal compiler error. Will keep it up for a while here and commit if there are no further comments.

Tue, May 7, 2:15 AM · Restricted Project
nhaehnle updated the diff for D61489: RegAlloc: try to fail more gracefully when out of registers.

Changing the error reporting not to report the instruction location
in the non-inline asm case.

Tue, May 7, 2:14 AM · Restricted Project
nhaehnle added a comment to D61489: RegAlloc: try to fail more gracefully when out of registers.

Given that, there needs to be some way for error reporting to distinguish the two errors. When we run out of registers due to a non-inline-asm construct in clang, we want to make it clear to the user that the issue is a compiler bug, not user error. This means we need to trigger the code in the clang driver that asks for a bug report and generates preprocessed source.

Of course, that isn't fundamentally tied to calling abort() from the register allocator; some alternate mechanism for reporting ICEs which doesn't immediately kill the process would be reasonable.

Tue, May 7, 1:49 AM · Restricted Project
nhaehnle added inline comments to D60457: [CodeGen] Fixed de-optimization of legalize subvector extract.
Tue, May 7, 1:27 AM · Restricted Project

Mon, May 6

nhaehnle added a reviewer for D61493: AMDGPU/MC: Add .amdgpu_lds directive: kzhuravl.
Mon, May 6, 9:19 AM · Restricted Project
nhaehnle added a reviewer for D61494: AMDGPU: Write LDS objects out as global symbols in code generation: kzhuravl.
Mon, May 6, 9:19 AM · Restricted Project
nhaehnle added a reviewer for D61492: AMDGPU: Prepare for explicit absolute relocations in code generation: kzhuravl.
Mon, May 6, 9:17 AM · Restricted Project
nhaehnle added a reviewer for D61491: AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0: kzhuravl.
Mon, May 6, 9:17 AM · Restricted Project

Sat, May 4

nhaehnle created D61553: AMDGPU: Fix ds_{read,write}2_b64 on SI/gfx6.
Sat, May 4, 9:48 AM · Restricted Project
nhaehnle added a comment to D61525: [AMDGPU] gfx1010: prefer V_MUL_LO_U32 over V_MUL_LO_I32.

What does the signedness of the multiply even mean?

It does not. There is a problem with the instruction itself.

Sat, May 4, 4:27 AM · Restricted Project

Fri, May 3

nhaehnle added a child revision for D61491: AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0: D61492: AMDGPU: Prepare for explicit absolute relocations in code generation.
Fri, May 3, 5:05 AM · Restricted Project
nhaehnle added a parent revision for D61492: AMDGPU: Prepare for explicit absolute relocations in code generation: D61491: AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0.
Fri, May 3, 5:05 AM · Restricted Project
nhaehnle added a child revision for D61492: AMDGPU: Prepare for explicit absolute relocations in code generation: D61493: AMDGPU/MC: Add .amdgpu_lds directive.
Fri, May 3, 5:05 AM · Restricted Project
nhaehnle added a parent revision for D61493: AMDGPU/MC: Add .amdgpu_lds directive: D61492: AMDGPU: Prepare for explicit absolute relocations in code generation.
Fri, May 3, 5:05 AM · Restricted Project
nhaehnle added a child revision for D61493: AMDGPU/MC: Add .amdgpu_lds directive: D61494: AMDGPU: Write LDS objects out as global symbols in code generation.
Fri, May 3, 5:05 AM · Restricted Project
nhaehnle added a parent revision for D61494: AMDGPU: Write LDS objects out as global symbols in code generation: D61493: AMDGPU/MC: Add .amdgpu_lds directive.
Fri, May 3, 5:05 AM · Restricted Project
nhaehnle created D61494: AMDGPU: Write LDS objects out as global symbols in code generation.
Fri, May 3, 5:04 AM · Restricted Project
nhaehnle created D61493: AMDGPU/MC: Add .amdgpu_lds directive.
Fri, May 3, 5:03 AM · Restricted Project
nhaehnle created D61492: AMDGPU: Prepare for explicit absolute relocations in code generation.
Fri, May 3, 5:02 AM · Restricted Project
nhaehnle created D61491: AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0.
Fri, May 3, 5:02 AM · Restricted Project
nhaehnle created D61490: AMDGPU: Verify that SOP2/SOPC instructions have at most one immediate operand.
Fri, May 3, 5:01 AM · Restricted Project
nhaehnle created D61489: RegAlloc: try to fail more gracefully when out of registers.
Fri, May 3, 5:01 AM · Restricted Project
nhaehnle added a comment to D60459: SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl.

I wonder if this is related to D61313?

Fri, May 3, 2:16 AM · Restricted Project
nhaehnle added a comment to D61313: [AMDGPU] detect WaW hazards when moving/merging load/store instructions.

That's an interesting catch. Basically a Write-after-Write hazard.

Fri, May 3, 2:12 AM · Restricted Project

Tue, Apr 23

nhaehnle committed rG7edae4c40387: AMDGPU: Fix LCSSA phi lowering in SILowerI1Copies (authored by nhaehnle).
AMDGPU: Fix LCSSA phi lowering in SILowerI1Copies
Tue, Apr 23, 6:11 AM
nhaehnle created D60999: AMDGPU: Fix LCSSA phi lowering in SILowerI1Copies.
Tue, Apr 23, 3:40 AM · Restricted Project

Apr 18 2019

nhaehnle added a comment to D59042: [SDA] Bug fix: Use IPD outside the loop as divergence bound.

Committed. I believe the procedure for SVN access is still that you send Chris Lattner an email to ask about it.

Apr 18 2019, 9:18 AM · Restricted Project
nhaehnle committed rG523f90a2bad9: [SDA] Bug fix: Use IPD outside the loop as divergence bound (authored by nhaehnle).
[SDA] Bug fix: Use IPD outside the loop as divergence bound
Apr 18 2019, 9:16 AM
nhaehnle accepted D60864: [AMDGPU] Ignore non-SUnits edges.

LGTM, with the understanding that the test case which we have in our internal branches will make it upstream.

Apr 18 2019, 6:38 AM · Restricted Project
nhaehnle added a comment to D60834: [AMDGPU] Uniform values being used outside loop marked non-divergent.

I think there are some misunderstandings here. None of the IR passes require LCSSA. The problem is in getting the divergence data into the SelectionDAG.

Apr 18 2019, 1:03 AM · Restricted Project

Apr 17 2019

nhaehnle accepted D60824: AMDGPU: Force skip over SMRD, VMEM and s_waitcnt instructions.

LGTM

Apr 17 2019, 7:32 AM · Restricted Project
nhaehnle added a comment to D60682: [AMDGPU] Fixed +DumpCode.

Mesa uses +DumpCode as well. Do we have a good solution for generating both ELF and textual output?

Apr 17 2019, 7:28 AM · Restricted Project

Apr 15 2019

nhaehnle accepted D59042: [SDA] Bug fix: Use IPD outside the loop as divergence bound.

Oh wow. That's an important fix indeed :) LGTM

Apr 15 2019, 8:49 AM · Restricted Project
nhaehnle added a comment to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

Okay, you've convinced me. I only hope we can move forward with GlobalISel and do it right there.

Apr 15 2019, 7:51 AM
nhaehnle added inline comments to D60462: [TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling.
Apr 15 2019, 12:54 AM · Restricted Project

Apr 7 2019

nhaehnle added a comment to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

From the point of view of the design of all these interfaces, it's too bad we can't fix this in post. From an overall standpoint, it's actually better to get the register classes from the beginning, so sure, let's go with this kind of approach.

Apr 7 2019, 6:40 AM

Mar 15 2019

nhaehnle accepted D58957: [AMDGPU] Add an experimental buffer fat pointer address space..

LGTM, whether you do the NoAlias between constant 32-bit and buffer address space here or separately. (Making NoAlias between constant 32-bit and the other existing address spaces should definitely be a different patch.)

Mar 15 2019, 1:32 PM · Restricted Project, Restricted Project
nhaehnle accepted D59312: AMDGPU: Fix a SIAnnotateControlFlow issue when there are multiple backedges..

LGTM

Mar 15 2019, 1:28 PM · Restricted Project
nhaehnle added a comment to D59312: AMDGPU: Fix a SIAnnotateControlFlow issue when there are multiple backedges..

Hmm, this is fragile -- I think your reasoning about domination is mostly sound, except when there's uniform control flow inside the loop itself, in which case I'm not sure. That said, it seems to me that the domination check is conservative, and your change shouldn't break anything that wasn't broken before, and a proper fix is potentially much more difficult.

Mar 15 2019, 1:28 PM · Restricted Project

Mar 7 2019

nhaehnle added inline comments to D58957: [AMDGPU] Add an experimental buffer fat pointer address space..
Mar 7 2019, 1:30 AM · Restricted Project, Restricted Project

Mar 5 2019

nhaehnle accepted D58895: [TableGen] Allow lists to be concatenated through '#'.

Awesome :)
LGTM

Mar 5 2019, 2:20 AM · Restricted Project

Feb 12 2019

Herald updated subscribers of D57511: [DebugInfo] Stop changing labels for register-described parameter DBG_VALUEs.
Feb 12 2019, 1:19 AM · Restricted Project, debug-info
nhaehnle accepted D57894: AMDGPU: Fix @llvm.amdgcn.wqm.vote implementation.

FWIW, the problem is a bit more involved than that. Consider

bool value = ...;
if (divergent condition) {
  use(value);
}

So the undefinedness of inactive bits in this case is not due to suboptimal lowering of NOT, but inherently due to the fact that we're in a different region of control flow.
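A toy model may make the point concrete. It is only a sketch under stated assumptions: a wave is modelled as an 8-lane bitmask, the bits of `value` belonging to lanes that are inactive inside the divergent region are treated as arbitrary garbage, and the names `naive_any` and `exec_aware_any` are hypothetical. It does not reproduce the actual lowering of llvm.amdgcn.wqm.vote.

```python
WAVE_SIZE = 8  # toy wave width, chosen for readability (assumption)


def naive_any(value_mask: int) -> bool:
    """Vote that trusts every bit of the lane mask, including inactive lanes."""
    return value_mask != 0


def exec_aware_any(value_mask: int, exec_mask: int) -> bool:
    """Vote that only looks at lanes that are active in the current region."""
    return (value_mask & exec_mask) != 0


# Inside `if (divergent condition)`, only lanes 0..3 are active.
exec_mask = 0b00001111

# `value` is false in every active lane, but the bits of lanes 5 and 7
# hold leftover garbage because those lanes are inactive in this region.
value_mask = 0b10100000

naive_result = naive_any(value_mask)
masked_result = exec_aware_any(value_mask, exec_mask)
```

In this model the naive vote reports true purely because of bits in inactive lanes, while the masked vote reports false, regardless of how NOT is lowered.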

Feb 12 2019, 1:19 AM · Restricted Project
Herald updated subscribers of D53765: [RFC prototype] Implementation of asm-goto support in LLVM.
Feb 12 2019, 1:14 AM · Restricted Project
nhaehnle added a comment to D58017: [DAG] Add SimplifyDemandedBits support for BSWAP.

Hmm, this is one of those cases where it'd be awesome to have a Godbolt for the tests.

Feb 12 2019, 1:13 AM · Restricted Project
Herald updated subscribers of D58026: LLD: Preserve ABI version during linking ELF.
Feb 12 2019, 1:10 AM · Restricted Project
nhaehnle accepted D58077: [tablegen] Add locations to many PrintFatalError() calls.

+1 for better error messages! LGTM.

Feb 12 2019, 1:09 AM · Restricted Project

Feb 8 2019

nhaehnle added a comment to D57825: IR: Add immarg attribute.

@nhaehnle can you look at the InstCombineSimplifyDemanded change? I was a bit confused by the assert you added

The background of this is that if TFE/LWE is enabled, the SimplifyDemanded logic won't work as-is; but it should also never be hit, because intrinsic calls with TFE/LWE should have a struct return type (e.g. {v4f32,i32}) and the SimplifyDemanded logic doesn't support looking through that. The assert double-checks that.

In hindsight, it would be possible for somebody to manually create malformed IR which calls image intrinsics with TFE/LWE enabled but with a vector return type. That would be an error, and one could argue that the code should produce an error instead of an assert. It would require a broken frontend or manually written IR, though.

I would expect TFE to be turned on by the usage of the struct type. We should probably add a custom verifier check for this

Feb 8 2019, 8:02 AM
nhaehnle added a comment to D57825: IR: Add immarg attribute.

@nhaehnle can you look at the InstCombineSimplifyDemanded change? I was a bit confused by the assert you added

Feb 8 2019, 7:46 AM
nhaehnle added a comment to D57825: IR: Add immarg attribute.

Looks reasonable to me. Maybe give a little more time for people to give feedback? Though this has already been on llvm-dev without opposition...

Feb 8 2019, 4:27 AM

Feb 7 2019

nhaehnle added a comment to D57748: AMDGPU: Add inverse ballot intrinsic.

Why can't we recognize this as a pattern? Basically, it's just (src & (1 << thread_idx)), and thread_idx can be matched as a sequence of mbcnt intrinsics.
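The pattern mentioned here can be simulated on the host as a sketch. The function names `ballot` and `inverse_ballot` and the explicit lane loop are illustrative assumptions; on real hardware the lane index would come from the mbcnt sequence rather than from a loop.

```python
def ballot(lane_bits: list[bool]) -> int:
    """Pack one boolean per lane into a wave-wide bitmask (lane 0 = bit 0)."""
    mask = 0
    for lane, bit in enumerate(lane_bits):
        if bit:
            mask |= 1 << lane
    return mask


def inverse_ballot(src: int, wave_size: int = 64) -> list[bool]:
    """Give each lane its own bit of `src`: (src & (1 << thread_idx)) != 0."""
    return [(src & (1 << thread_idx)) != 0 for thread_idx in range(wave_size)]


# Round trip on a small 8-lane example.
lanes = inverse_ballot(0b00001011, wave_size=8)
round_trip = ballot(lanes)
```

The round trip returns the original mask, which shows that the per-lane expression is exactly the inverse of packing the lanes into a mask.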

Feb 7 2019, 9:42 AM · Restricted Project
nhaehnle added a comment to D57737: [AMDGPU] Fix DPP sequence in atomic optimizer..

Did you actually test this? The shift-by-3 should be unnecessary.

Feb 7 2019, 3:54 AM · Restricted Project, Restricted Project
nhaehnle accepted D56496: [AMDGPU] Fix CS scratch setup on pre-GCN3 ASICs.

LGTM

Feb 7 2019, 3:32 AM · Restricted Project
nhaehnle added a comment to D55474: [AMDGPU] Extend constant folding for logical operations.

It looks like this was never committed. What's the next step here?

Feb 7 2019, 3:18 AM
nhaehnle abandoned D53161: Fix some cases where the index size was used instead of the pointer size.

I still find the code as-is a bit dubious, but we no longer need this change and the review process tends to be a bit of a pain, so I'm dropping this.

Feb 7 2019, 3:16 AM · Restricted Project
nhaehnle accepted D55444: AMDGPU: Fix DPP combiner.

LGTM

Feb 7 2019, 12:43 AM · Restricted Project, Restricted Project

Feb 4 2019

nhaehnle committed rGa69146e67eb7: [InstCombine] Cleanup the TFE/LWE check in AMDGPU SimplifyDemanded (authored by nhaehnle).
[InstCombine] Cleanup the TFE/LWE check in AMDGPU SimplifyDemanded
Feb 4 2019, 1:25 PM
nhaehnle added a comment to D57681: [InstCombine] Cleanup the TFE/LWE check in AMDGPU SimplifyDemanded.

Does it actually matter? I thought since this needs to be a constant, this just needs to not crash

Feb 4 2019, 1:12 PM · Restricted Project
nhaehnle created D57681: [InstCombine] Cleanup the TFE/LWE check in AMDGPU SimplifyDemanded.
Feb 4 2019, 5:03 AM · Restricted Project
nhaehnle added a reviewer for D57681: [InstCombine] Cleanup the TFE/LWE check in AMDGPU SimplifyDemanded: msearles.
Feb 4 2019, 5:03 AM · Restricted Project

Jan 15 2019

nhaehnle added a comment to D55444: AMDGPU: Fix DPP combiner.

Hi Valery, I really like the way the different cases are listed in the explanatory comment at the top of the file, and I believe those cases are correct. Would it be possible to restructure the code in a way that follows those cases? I think that would make it much easier to follow.

Jan 15 2019, 3:34 AM · Restricted Project, Restricted Project

Jan 2 2019

nhaehnle accepted D55179: AMDGPU: Remove v16i8 from register classes.

LGTM

Jan 2 2019, 1:04 AM

Dec 26 2018

nhaehnle added a comment to D56002: [AMDGPU] Fix a weird WWM intrinsic issue..

The only user of canReadVGPR is addUsersToMoveToVALUWorklist. Since the intended semantics of canReadVGPR aren't at all clear from the name, might I suggest folding it into its only user?

Dec 26 2018, 8:36 AM · Restricted Project

Dec 18 2018

nhaehnle accepted D55602: AMDGPU/InsertWaitcnts: Update VGPR/SGPR bounds when brackets are merged.

Thanks, LGTM

Dec 18 2018, 7:21 AM

Dec 12 2018

nhaehnle accepted D54042: [AMDGPU] Extend the SI Load/Store optimizer to combine more things..

LGTM

Dec 12 2018, 2:12 AM · Restricted Project

Dec 10 2018

nhaehnle accepted D55367: [AMDGPU] Change the l1 flush instruction for AMDPAL/MESA3D..

LGTM

Dec 10 2018, 7:44 AM · Restricted Project

Dec 7 2018

nhaehnle added inline comments to D55435: [AMDGPU] Fix discarded result of addAttribute.
Dec 7 2018, 8:11 AM
nhaehnle added inline comments to D55402: [AMDGPU] Simplify negated condition.
Dec 7 2018, 8:08 AM
nhaehnle added a comment to D55369: AMDGPU: Use an ABS32_LO relocation for SCRATCH_RSRC_DWORD1.

I thought mesa was moving to stop using the relocations at all for this?

Dec 7 2018, 7:56 AM
nhaehnle added a comment to D55367: [AMDGPU] Change the l1 flush instruction for AMDPAL/MESA3D..

Please make the change apply to Mesa3D as well.

Dec 7 2018, 7:54 AM · Restricted Project