nhaehnle (Nicolai Hähnle)
User

Projects

User does not belong to any projects.

User Details

User Since
Oct 9 2015, 4:06 AM (154 w, 1 d)

Recent Activity

Yesterday

nhaehnle added a comment to D51994: TableGen/ISel: Allow PatFrag predicate code to access captured operands.

ping

Fri, Sep 21, 3:03 AM
nhaehnle accepted D52221: [AMDGPU] lower-switch in preISel as a workaround for legacy DA.
Fri, Sep 21, 1:34 AM

Thu, Sep 20

nhaehnle added a comment to D52221: [AMDGPU] lower-switch in preISel as a workaround for legacy DA.

I changed the first line in the change description to make it explicit that this is a workaround for DA, but it is not showing up here. I just used "arc diff master" ... how do I get the web interface to show the updated description?

Thu, Sep 20, 10:17 AM
nhaehnle added a comment to D52221: [AMDGPU] lower-switch in preISel as a workaround for legacy DA.

Yes, thank you, this is clearer. I'm fine with the change, but please do address Matt's comments.

Thu, Sep 20, 8:51 AM
nhaehnle created D52291: AMDGPU: Future-proof {raw,struct}.buffer.atomic intrinsics.
Thu, Sep 20, 2:28 AM

Tue, Sep 18

nhaehnle added a comment to D52221: [AMDGPU] lower-switch in preISel as a workaround for legacy DA.

Good to hear the new DA can handle this. We need lowered switches for the control flow lowering anyway, though, so we may as well do it this way.

Tue, Sep 18, 4:10 AM
nhaehnle accepted D51947: [AMDGPU] Match udot8 pattern.

Okay, thanks for the explanation, that seems fair.

Tue, Sep 18, 3:57 AM

Mon, Sep 17

nhaehnle updated subscribers of D51223: Update tests for new YAMLIO polymorphic traits.

Adding @tpr as a subscriber due to the (admittedly maybe a bit indirect) MsgPack connection.

Mon, Sep 17, 8:16 AM
nhaehnle added a comment to D51947: [AMDGPU] Match udot8 pattern.

Thanks, this mostly looks good to me. Looks like this may be running into a serious limitation of the ISel infrastructure with commutativity / associativity, but it makes sense to land this patch without addressing it. I do have one last question.

Mon, Sep 17, 3:22 AM
nhaehnle accepted D52018: [AMDGPU] Add instruction selection for i1 to f16 conversion.

Why the detour via V_CVT_F16_F32 instead of selecting an fp16 1.0 constant directly?

It is not possible to select an fp16 constant with V_CNDMASK_B32.
In principle the VOP2 version could be used to select an inline literal, but only when the i1 is vcc.

Mon, Sep 17, 3:09 AM
nhaehnle added a comment to D51969: [AMDGPU] Add an AMDGPU specific atomic optimizer..

This still needs to be adjusted for D52087, and I have one minor comment, but mostly LGTM.

Mon, Sep 17, 3:09 AM · Restricted Project
nhaehnle added inline comments to D52087: [IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle..
Mon, Sep 17, 2:59 AM

Fri, Sep 14

nhaehnle added inline comments to D51969: [AMDGPU] Add an AMDGPU specific atomic optimizer..
Fri, Sep 14, 2:22 AM · Restricted Project
nhaehnle added inline comments to D51969: [AMDGPU] Add an AMDGPU specific atomic optimizer..
Fri, Sep 14, 1:52 AM · Restricted Project
nhaehnle added inline comments to D51947: [AMDGPU] Match udot8 pattern.
Fri, Sep 14, 1:33 AM

Thu, Sep 13

nhaehnle accepted D52022: [AMDGPU] Removed unused method.

LGTM

Thu, Sep 13, 3:47 AM
nhaehnle added inline comments to D51995: AMDGPU: Generate VALU ThreeOp Integer instructions.
Thu, Sep 13, 2:08 AM
nhaehnle updated the diff for D51995: AMDGPU: Generate VALU ThreeOp Integer instructions.
  • Cleaned up the pattern condition to take inline constants into account
  • Add tests for inline constants and uniform values ending up in VGPRs
  • update_llc-ify the tests
Thu, Sep 13, 2:02 AM
nhaehnle accepted D52011: DAG: Fix expansion of unaligned FP loads and stores.

LGTM

Thu, Sep 13, 1:57 AM
nhaehnle added a comment to D52018: [AMDGPU] Add instruction selection for i1 to f16 conversion.

What about sitofp?

Thu, Sep 13, 1:50 AM

Wed, Sep 12

nhaehnle added a comment to D51993: TableGen/CodeGenDAGPatterns: addPredicateFn only once.

Any perf numbers? For instance, on x86 -gen-dag-isel

Wed, Sep 12, 11:12 AM
nhaehnle added a dependency for D51995: AMDGPU: Generate VALU ThreeOp Integer instructions: D51994: TableGen/ISel: Allow PatFrag predicate code to access captured operands.
Wed, Sep 12, 9:55 AM
nhaehnle added a dependent revision for D51994: TableGen/ISel: Allow PatFrag predicate code to access captured operands: D51995: AMDGPU: Generate VALU ThreeOp Integer instructions.
Wed, Sep 12, 9:55 AM
nhaehnle added a dependent revision for D51993: TableGen/CodeGenDAGPatterns: addPredicateFn only once: D51994: TableGen/ISel: Allow PatFrag predicate code to access captured operands.
Wed, Sep 12, 9:54 AM
nhaehnle added a dependency for D51994: TableGen/ISel: Allow PatFrag predicate code to access captured operands: D51993: TableGen/CodeGenDAGPatterns: addPredicateFn only once.
Wed, Sep 12, 9:54 AM
nhaehnle created D51995: AMDGPU: Generate VALU ThreeOp Integer instructions.
Wed, Sep 12, 9:54 AM
nhaehnle created D51994: TableGen/ISel: Allow PatFrag predicate code to access captured operands.
Wed, Sep 12, 9:53 AM
nhaehnle created D51993: TableGen/CodeGenDAGPatterns: addPredicateFn only once.
Wed, Sep 12, 9:52 AM
nhaehnle added a comment to D51947: [AMDGPU] Match udot8 pattern.

As for the testcases, what about vectorized multiplication, i.e.:

%vec1 = load <8 x i4>, ...
%vec2 = load <8 x i4>, ...
%ext1 = zext <8 x i4> %vec1 to <8 x i32>
%ext2 = zext <8 x i4> %vec2 to <8 x i32>
%mul = mul nuw nsw <8 x i32> %ext1, %ext2
... then extractelement and add up the result
... or possibly the same thing without the zext

The TableGen itself looks good to me, except for one nitpick (inline).
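
As a reference for what such a vectorized test case should compute, here is a minimal Python sketch. It assumes the udot8 operation is a dot product over eight unsigned 4-bit lanes packed into two 32-bit words, added to a 32-bit accumulator. That reading is inferred from the patch title and the <8 x i4> test case above, not taken from the patch itself; the function name `udot8_reference` and the wrap-to-32-bits behavior are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF


def udot8_reference(a: int, b: int, acc: int = 0) -> int:
    """Dot product of eight unsigned 4-bit lanes of a and b, plus acc.

    Assumption: lane i occupies bits [4*i, 4*i + 3] of each 32-bit word,
    and the result wraps to 32 bits.
    """
    total = acc
    for lane in range(8):
        a_lane = (a >> (4 * lane)) & 0xF  # zero-extended 4-bit element
        b_lane = (b >> (4 * lane)) & 0xF
        total += a_lane * b_lane
    return total & MASK32


if __name__ == "__main__":
    # Every lane is 1 * 2, so the sum over eight lanes is 16.
    print(udot8_reference(0x11111111, 0x22222222))
```

Each lane product is at most 15 * 15 = 225, so the eight products alone cannot overflow 32 bits, which is consistent with the nuw nsw flags on the mul in the suggested test.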

Wed, Sep 12, 4:32 AM
nhaehnle added a comment to D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics.

Thanks! Some small issues left, and please also add a test case for the "simplifyDemanded" implementation.

Wed, Sep 12, 4:24 AM
nhaehnle accepted D51933: [AMDGPU] Ensure trig range reduction only used for subtargets that require it.

One nitpick, apart from that LGTM.

Wed, Sep 12, 3:42 AM

Mon, Sep 10

nhaehnle added inline comments to D51726: [AMDGPU] Remove non-instructions from GCNHazardRecognizer buffer.
Mon, Sep 10, 2:29 AM
nhaehnle added inline comments to D51726: [AMDGPU] Remove non-instructions from GCNHazardRecognizer buffer.
Mon, Sep 10, 2:25 AM
nhaehnle accepted D51793: AMDGPU: Stop reporting is-noop addrspacecast for constant 32-bit.

LGTM

Mon, Sep 10, 2:23 AM
nhaehnle accepted D50572: DAG: Handle odd vector sizes in calling conv splitting.

LGTM

Mon, Sep 10, 2:22 AM

Fri, Aug 31

nhaehnle added a comment to D51491: [DA] DivergenceAnalysis for unstructured, reducible CFGs.

This patch should already add the tests for the new analysis, shouldn't it?

Fri, Aug 31, 11:22 AM
nhaehnle added inline comments to D50572: DAG: Handle odd vector sizes in calling conv splitting.
Fri, Aug 31, 11:18 AM

Thu, Aug 30

nhaehnle added a comment to D46759: [RISCV] Support named operands for CSR instructions..

Thanks, the TableGen part looks good to me. I can't speak for the actual RISC-V changes.

Thu, Aug 30, 12:27 AM

Wed, Aug 29

nhaehnle accepted D51203: AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes.

Looking at SelectionDAGBuilder::visitGetElementPtr, nuw is set under certain conditions for inbounds getelementptr. I suspect we should be able to make most GEPs inbounds in Mesa - it just means that we never, not even temporarily, try to take addresses outside of properly allocated memory objects (buffers, arrays of descriptors).

Would the combination of:

  • check NUW here
  • create inbounds GEP

make good use of SMEM/SMRD immediates?

The code in SelectionDAGBuilder::visitGetElementPtr is unsafe for our case, because it doesn't know that the addition of 32-bit addresses is performed in 64 bits. Even x+4 can overflow in 32 bits but not 64 bits. The GEP can be "inbounds", but it doesn't change anything. However, we can hackishly use the inbounds flag to mean that the addition is safe, because GEP with inbounds and offset <= INT_MAX is converted to "add nuw".

Wed, Aug 29, 6:39 AM
nhaehnle added a comment to D50575: [AMDGPU] Add support for a16 modifier for gfx9.

It is interesting to note though that clang-format changed so much in this file.

Wed, Aug 29, 12:32 AM

Tue, Aug 28

nhaehnle updated the diff for D50629: AMDGPU: Fix getInstSizeInBytes.

Guard sanity check with EXPENSIVE_CHECKS

Tue, Aug 28, 3:52 AM
nhaehnle added a comment to D50434: [NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis.

Hi Simon, can you please confirm that this is still the latest version of this change? There were some edits to D50433 since the last comment here...

Tue, Aug 28, 3:18 AM
nhaehnle accepted D50575: [AMDGPU] Add support for a16 modifier for gfx9.

Thanks for the changes.

Tue, Aug 28, 3:11 AM
nhaehnle accepted D50461: DAG: Don't use ABI copies in some contexts.

I'm not very familiar with this code either, but the change looks reasonable to me.

Tue, Aug 28, 3:00 AM
nhaehnle added a comment to D51203: AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes.

So it sounds like the way to go is to test just the NUW bit here in SelectSMRDOffset?

Tue, Aug 28, 2:22 AM

Mon, Aug 27

nhaehnle added a comment to D51203: AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes.

That would be good, yeah. Note here it's not so much a case of preserving NUW as it is of deducing as much NUW as possible from getelementptrs.

Mon, Aug 27, 7:19 AM
nhaehnle added a comment to D51203: AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes.

Looking at SelectionDAGBuilder::visitGetElementPtr, nuw is set under certain conditions for inbounds getelementptr. I suspect we should be able to make most GEPs inbounds in Mesa - it just means that we never, not even temporarily, try to take addresses outside of properly allocated memory objects (buffers, arrays of descriptors).

Mon, Aug 27, 6:55 AM
nhaehnle added a comment to D51203: AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes.

Right, that matches my understanding: the SMRD/SMEM instruction does a 64-bit addition, so if the 32-bit (add X, imm) were to have an unsigned wraparound, moving it into the immediate of the SMRD/SMEM would remove the wraparound and therefore be incorrect.
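
The mismatch can be illustrated with plain integer arithmetic. The Python sketch below is only a model under stated assumptions: the 32-bit pointer is extended to 64 bits with a fixed high half (the hypothetical `high_half` parameter), and the function names are made up for illustration rather than taken from the backend.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def address_unfolded(x: int, imm: int, high_half: int = 0) -> int:
    """(add X, imm) is done in 32 bits first, then extended to 64 bits."""
    low = (x + imm) & MASK32  # unsigned 32-bit add, may wrap around
    return (high_half << 32) | low


def address_folded(x: int, imm: int, high_half: int = 0) -> int:
    """imm is moved into the instruction, which adds in 64 bits."""
    base = (high_half << 32) | (x & MASK32)
    return (base + imm) & MASK64  # the carry leaves the low 32 bits


def add_is_nuw(x: int, imm: int) -> bool:
    """True when the 32-bit add has no unsigned wraparound."""
    return (x & MASK32) + imm <= MASK32


if __name__ == "__main__":
    for x, imm in [(0x1000, 8), (0xFFFFFFFC, 8)]:
        print(hex(x), imm, add_is_nuw(x, imm),
              hex(address_unfolded(x, imm)), hex(address_folded(x, imm)))
```

In this model the two addresses agree exactly when `add_is_nuw` holds, which is why a no-unsigned-wrap guarantee on the add is enough to make the fold safe.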

Mon, Aug 27, 6:48 AM
nhaehnle accepted D51237: DAG: Check transformed type for forming fminnum/fmaxnum from vselect.

Nice catch. One suggestion for improvement, apart from that LGTM.

Mon, Aug 27, 6:33 AM

Fri, Aug 24

nhaehnle added inline comments to D51203: AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes.
Fri, Aug 24, 10:00 AM
nhaehnle accepted D51098: [AMDGPU] Add support for multi-dword s.buffer.load intrinsic.

Thanks. I don't see a test that actually sets glc, please add one before committing.

Fri, Aug 24, 4:08 AM
nhaehnle accepted D50614: DAG: Allow matching fminnum/fmaxnum from vselect.

LGTM

Fri, Aug 24, 4:01 AM
nhaehnle added a comment to D50433: A New Divergence Analysis for LLVM.

Also, maybe I missed it, but could you state clearly in a comment what the assumed preconditions are for correctness?

I will add comments to the class declarations of the DA and SDA - they require the CFG to be reducible.

Fri, Aug 24, 3:55 AM

Aug 23 2018

nhaehnle added a comment to D50575: [AMDGPU] Add support for a16 modifier for gfx9.

The operand name / encoding bits in TableGen should really be changed as well to indicate the overload between R128 and A16.

Aug 23 2018, 3:02 AM
nhaehnle added a comment to D50433: A New Divergence Analysis for LLVM.

Please do a style pass to capitalize all variable names according to the LLVM coding style.

Aug 23 2018, 2:40 AM
nhaehnle added inline comments to D51098: [AMDGPU] Add support for multi-dword s.buffer.load intrinsic.
Aug 23 2018, 1:16 AM
nhaehnle added inline comments to D50629: AMDGPU: Fix getInstSizeInBytes.
Aug 23 2018, 1:10 AM
nhaehnle added a comment to D50982: [AMDGPU] Legalize VGPR Rsrc operands for MUBUF instructions.

I find it a bit concerning that we seem to have different semantics for these instructions depending on the environment.

Aug 23 2018, 12:59 AM
nhaehnle added inline comments to D46759: [RISCV] Support named operands for CSR instructions..
Aug 23 2018, 12:42 AM
nhaehnle added a comment to D50629: AMDGPU: Fix getInstSizeInBytes.

Always do the sanity check in debug builds.

I was originally worried about how expensive this is, but it
doesn't show up in the running time of all the tests in
test/CodeGen/AMDGPU.

These are by design almost all small, so that makes sense. It might not be representative of the real world.

Aug 23 2018, 12:27 AM

Aug 22 2018

nhaehnle updated the diff for D50629: AMDGPU: Fix getInstSizeInBytes.

Always do the sanity check in debug builds.

Aug 22 2018, 6:58 AM
nhaehnle added inline comments to D48013: TableGen/SearchableTables: Support more generic enums and tables.
Aug 22 2018, 4:29 AM
nhaehnle created D51097: TableGen/SearchableTables: Cast enums to unsigned in generated code.
Aug 22 2018, 4:28 AM
nhaehnle accepted D50771: [clang-tblgen] Add -print-records and -dump-json modes..

LGTM

Aug 22 2018, 1:46 AM

Aug 13 2018

nhaehnle created D50629: AMDGPU: Fix getInstSizeInBytes.
Aug 13 2018, 3:31 AM

Aug 8 2018

nhaehnle added a comment to D49026: [AMDGPU] New tbuffer intrinsics.

Some smaller nitpicks, but mostly looks good to me!

Aug 8 2018, 6:18 AM
nhaehnle accepted D50434: [NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis.

LGTM

Aug 8 2018, 5:12 AM

Jul 31 2018

nhaehnle added inline comments to D50055: Update the coding standard about NFC changes and whitespace.
Jul 31 2018, 8:36 AM
nhaehnle accepted D49483: [AMDGPU] Optimize _L image intrinsic to _LZ when lod is zero.

Thanks! LGTM

Jul 31 2018, 8:08 AM
nhaehnle added inline comments to D50055: Update the coding standard about NFC changes and whitespace.
Jul 31 2018, 5:52 AM
nhaehnle accepted D46756: [AMDGPU] Reworked SIFixWWMLiveness.

Agreed that LCSSA doesn't help either.

Jul 31 2018, 5:34 AM
nhaehnle accepted D50041: AMDGPU: Fold undef fcanonicalize to qNaN.

LGTM

Jul 31 2018, 3:46 AM
nhaehnle accepted D49662: DAG: Enhance isKnownNeverNaN.

LGTM

Jul 31 2018, 3:40 AM
nhaehnle accepted D47383: [AMDGPU] Avoid using divergent value in mubuf addr64 descriptor.

Great, thanks!

Jul 31 2018, 3:25 AM
nhaehnle added inline comments to D48179: [AMDGPU] Emit MessagePack HSA Metadata for v3 code object.
Jul 31 2018, 3:24 AM
nhaehnle accepted D49995: [AMDGPU] Minor change to d16 buffer load implementation.

LGTM

Jul 31 2018, 3:12 AM

Jul 30 2018

nhaehnle accepted D49977: AMDGPU: Reduce code size with fcanonicalize (fneg x).

LGTM

Jul 30 2018, 4:36 AM
nhaehnle accepted D49976: AMDGPU: Make fneg combine handle fcanonicalize.

LGTM

Jul 30 2018, 4:35 AM
nhaehnle accepted D49026: [AMDGPU] New tbuffer intrinsics.

Thanks for doing this.

Jul 30 2018, 4:32 AM
nhaehnle added a comment to D49027: [TableGen] FixedLenDecoderEmitter: allow for dummy operand in MCInst.

As far as I can tell, there are four kinds of operands when it comes to fixed-length encoding / decoding:

Jul 30 2018, 4:24 AM
nhaehnle added a comment to D47261: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.

What's the status of this patch?

Jul 30 2018, 3:48 AM

Jul 27 2018

nhaehnle added inline comments to D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics.
Jul 27 2018, 7:14 AM
nhaehnle added inline comments to D47383: [AMDGPU] Avoid using divergent value in mubuf addr64 descriptor.
Jul 27 2018, 6:32 AM
nhaehnle accepted D49221: DAG: Add calling convention argument to calling convention funcs.

This makes sense to me. One nit-pick, apart from that LGTM.

Jul 27 2018, 6:11 AM
nhaehnle added a comment to D49483: [AMDGPU] Optimize _L image intrinsic to _LZ when lod is zero.

I don't think doing this as an IR pass has any advantage, so this is fine.

Jul 27 2018, 5:52 AM
nhaehnle accepted D49286: TableGen : Fix tablegen grammar documentation. NFC..

LGTM

Jul 27 2018, 5:20 AM

Jun 25 2018

nhaehnle updated the diff for D48431: AMDGPU: Force skip over s_sendmsg and exp instructions.

Factor out SIInstrInfo::hasUnwantedEffectsWhenEXECEmpty

Jun 25 2018, 3:45 AM
nhaehnle added inline comments to D47702: DAG: ComputeNumSignBits from load range metadata.
Jun 25 2018, 12:50 AM
nhaehnle added inline comments to D47702: DAG: ComputeNumSignBits from load range metadata.
Jun 25 2018, 12:49 AM
nhaehnle added a comment to D48431: AMDGPU: Force skip over s_sendmsg and exp instructions.

We really need to invert how this pass works.

Jun 25 2018, 12:21 AM

Jun 21 2018

nhaehnle created D48431: AMDGPU: Force skip over s_sendmsg and exp instructions.
Jun 21 2018, 7:14 AM

Jun 19 2018

nhaehnle updated the diff for D48165: InstCombine/AMDGPU: Add dimension-aware image intrinsics to SimplifyDemanded.

Rebased.

Jun 19 2018, 2:22 AM
nhaehnle updated the diff for D48017: AMDGPU: Select MIMG instructions manually in SITargetLowering.

Rebased.

Jun 19 2018, 2:21 AM
nhaehnle updated the diff for D48015: ARM,AArch64: Use generic tables instead of SearchableTable.

Rebased.

Jun 19 2018, 2:21 AM
nhaehnle updated the diff for D48014: AMDGPU: Use generic tables instead of SearchableTable.

Rebased.

Jun 19 2018, 2:20 AM
nhaehnle added inline comments to D48013: TableGen/SearchableTables: Support more generic enums and tables.
Jun 19 2018, 12:17 AM
nhaehnle updated the diff for D48013: TableGen/SearchableTables: Support more generic enums and tables.

Preserve the case in preprocessor guards for GenericEnum and GenericTable.

Jun 19 2018, 12:15 AM

Jun 18 2018

nhaehnle added a comment to D48013: TableGen/SearchableTables: Support more generic enums and tables.

Ping. Does this look alright?

Jun 18 2018, 2:38 AM

Jun 16 2018

nhaehnle added inline comments to D47909: Utilize new SDNode flag functionality to expand current support for fadd.
Jun 16 2018, 1:42 AM

Jun 15 2018

nhaehnle added a comment to D46756: [AMDGPU] Reworked SIFixWWMLiveness.

I've had some time to let this sink in now.

Jun 15 2018, 5:50 AM