arsenm (Matt Arsenault)
User

Projects

User does not belong to any projects.

User Details

User Since
Dec 5 2012, 4:53 PM (254 w, 3 d)

Recent Activity

Tue, Oct 17

arsenm added a comment to D39040: AMDGPU: Fix creating invalid copy when adjusting dmask.

There are some test crashes with this, and I think I made the wrong guess for dmask behavior.

Tue, Oct 17, 11:14 PM
arsenm created D39040: AMDGPU: Fix creating invalid copy when adjusting dmask.
Tue, Oct 17, 10:44 PM
arsenm created D39036: AMDGPU: Fix default range in non-kernel functions.
Tue, Oct 17, 9:23 PM
arsenm accepted D39014: AMDGPU : Fix an error for the llvm.cttz implementation.

LGTM

Tue, Oct 17, 1:58 PM

Mon, Oct 16

arsenm added a comment to D38959: AMDGPU: Don't use TargetStreamer if it has not been initialized.

That seems like a clang bug? I don't understand what the option is for, but isn't there some null streamer it's supposed to use?

Mon, Oct 16, 2:47 PM
arsenm added inline comments to D38914: AMDGPU: Select s_buffer_load_dword with a non-constant SGPR offset.
Mon, Oct 16, 2:45 PM
arsenm added inline comments to D38029: [AVR] Override ParseDirective.
Mon, Oct 16, 2:44 PM
arsenm accepted D38466: [ TargetLowering, AMDGPU] Use the return value of UpdateNodeOperands(); .

LGTM

Mon, Oct 16, 2:43 PM
arsenm added a comment to D38959: AMDGPU: Don't use TargetStreamer if it has not been initialized.

I also don't understand why this would be necessary. How can the streamer not be available?

Mon, Oct 16, 2:42 PM
arsenm accepted D38958: AMDGPU: Start generating metadata for MaxFlatWorkGroupSize.

LGTM

Mon, Oct 16, 2:41 PM
arsenm added inline comments to D38906: AMDGPU/SI: Implement d16 support buffer_load_format and tbuffer_load_format intrinsics.
Mon, Oct 16, 2:18 PM
arsenm added a comment to D38543: AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic.

Should the name just be WQM since it seems to map directly to the WQM instruction?

Mon, Oct 16, 2:05 PM
arsenm updated the diff for D38770: AMDGPU: Use stricter bounds for workitem builtins.

Use 1024 for OpenCL

Mon, Oct 16, 1:55 PM
arsenm added inline comments to D38967: [SelectionDAG] Don't subject ISD:Constant to the depth limit in TargetLowering::SimplifyDemandedBits.
Mon, Oct 16, 11:59 AM
arsenm added a comment to D36597: DAG: Fix creating select with wrong condition type.

ping

Mon, Oct 16, 11:59 AM

Fri, Oct 13

arsenm added inline comments to D38543: AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic.
Fri, Oct 13, 6:27 PM
arsenm committed rL315754: AMDGPU: Implement hasBitPreservingFPLogic.
Fri, Oct 13, 2:10 PM
arsenm closed D38519: AMDGPU: Implement hasBitPreservingFPLogic.

r315754

Fri, Oct 13, 2:10 PM
arsenm committed rL315748: AMDGPU: Look for src mods before fp_extend.
Fri, Oct 13, 1:46 PM
arsenm closed D38518: AMDGPU: Look for src mods before fp_extend.

r315748

Fri, Oct 13, 1:45 PM
arsenm added a comment to D36597: DAG: Fix creating select with wrong condition type.

ping

Fri, Oct 13, 1:31 PM
arsenm accepted D38667: AMDGPU: Parse r600 CPU name early and expose FMAF capability.

LGTM

Fri, Oct 13, 1:21 PM
arsenm committed rL315744: AMDGPU: Implement isFPExtFoldable.
Fri, Oct 13, 1:19 PM
arsenm closed D38510: AMDGPU: Implement isFPExtFree.

r315744

Fri, Oct 13, 1:19 PM
arsenm closed D36706: DAGCombiner: Add form of isFPExtFree to check uses.

r315740

Fri, Oct 13, 12:56 PM
arsenm committed rL315740: DAG: Add opcode and source type to isFPExtFree.
Fri, Oct 13, 12:56 PM
arsenm closed D38874: DAG: Add flags to dumps.

r315690

Fri, Oct 13, 8:42 AM
arsenm committed rL315690: DAG: Add flags to dumps.
Fri, Oct 13, 8:41 AM

Thu, Oct 12

arsenm added a comment to D36706: DAGCombiner: Add form of isFPExtFree to check uses.

It would appear to me that the PPC-only portion of this is NFC since it'll still return true for any floating point destination type. So I would say that the PPC back end is fine with those (as long as the assert gets a message :)).
However, shouldn't this patch have a test case?

Thu, Oct 12, 9:30 PM
arsenm updated the diff for D36706: DAGCombiner: Add form of isFPExtFree to check uses.

Add assert messages

Thu, Oct 12, 9:30 PM
arsenm added inline comments to D34077: DAGCombine: Combine BUILD_VECTOR to TRUNCATE.
Thu, Oct 12, 9:06 PM
arsenm accepted D38713: Fix assembler for alloca of multiple elements in non-zero addr space.

LGTM

Thu, Oct 12, 7:32 PM
arsenm created D38874: DAG: Add flags to dumps.
Thu, Oct 12, 7:30 PM
arsenm accepted D37348: Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.

LGTM

Thu, Oct 12, 10:39 AM

Wed, Oct 11

arsenm added inline comments to D37348: Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.
Wed, Oct 11, 4:24 PM
arsenm added inline comments to D37348: Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.
Wed, Oct 11, 11:05 AM

Tue, Oct 10

arsenm added inline comments to D38748: AMDGPU: Add support for isa version note.
Tue, Oct 10, 3:46 PM
arsenm created D38770: AMDGPU: Use stricter bounds for workitem builtins.
Tue, Oct 10, 3:21 PM
arsenm committed rL315361: AMDGPU: Fix missing skipFunction calls.
Tue, Oct 10, 1:48 PM
arsenm committed rL315360: AMDGPU: Fix failure to select branch with optnone.
Tue, Oct 10, 1:35 PM
arsenm closed D38710: AMDGPU: Fix failure to select branch with optnone.

r315360

Tue, Oct 10, 1:34 PM
arsenm closed D38709: AMDGPU: Fix incorrect selection of pseudo-branches.

r315357

Tue, Oct 10, 1:22 PM
arsenm committed rL315357: AMDGPU: Fix incorrect selection of pseudo-branches.
Tue, Oct 10, 1:22 PM
arsenm added inline comments to D38710: AMDGPU: Fix failure to select branch with optnone.
Tue, Oct 10, 10:38 AM

Mon, Oct 9

arsenm updated the diff for D38710: AMDGPU: Fix failure to select branch with optnone.

Right version of test

Mon, Oct 9, 4:19 PM
arsenm created D38710: AMDGPU: Fix failure to select branch with optnone.
Mon, Oct 9, 4:12 PM
arsenm created D38709: AMDGPU: Fix incorrect selection of pseudo-branches.
Mon, Oct 9, 4:10 PM
arsenm committed rL315238: AMDGPU: Add read_exec_lo/hi builtins.
Mon, Oct 9, 1:08 PM
arsenm closed D38698: AMDGPU: Add read_exec_lo/hi builtins.

r315238

Mon, Oct 9, 1:08 PM
arsenm added a reviewer for D38698: AMDGPU: Add read_exec_lo/hi builtins: kzhuravl.
Mon, Oct 9, 1:03 PM
arsenm created D38698: AMDGPU: Add read_exec_lo/hi builtins.
Mon, Oct 9, 11:34 AM
arsenm committed rL315219: AMDGPU: Fix missing declaration for __builtin_amdgcn_dispatch_ptr.
Mon, Oct 9, 10:46 AM
arsenm added a comment to D37989: InstCombine: Insert missing canonicalizes.

ping

Mon, Oct 9, 8:47 AM
arsenm added a comment to D37999: InstSimplify: Constant fold some canonicalizes.

ping

Mon, Oct 9, 8:42 AM

Sat, Oct 7

arsenm updated the diff for D36706: DAGCombiner: Add form of isFPExtFree to check uses.

Add source type and rename

Sat, Oct 7, 10:35 AM

Fri, Oct 6

arsenm added inline comments to D38610: [AMDGPU] Lower enqueued blocks and generate runtime metadata.
Fri, Oct 6, 2:12 PM
arsenm closed D38113: OpenCL: Assume functions are convergent.

r315094

Fri, Oct 6, 12:36 PM
arsenm committed rL315094: OpenCL: Assume functions are convergent.
Fri, Oct 6, 12:36 PM
arsenm added inline comments to D38610: [AMDGPU] Lower enqueued blocks and generate runtime metadata.
Fri, Oct 6, 11:05 AM
arsenm added inline comments to D38607: [AMDGPU] New 64 bit div/rem expansion.
Fri, Oct 6, 10:08 AM
arsenm added a comment to D36597: DAG: Fix creating select with wrong condition type.

ping

Fri, Oct 6, 10:04 AM
arsenm added a comment to D38607: [AMDGPU] New 64 bit div/rem expansion.

Is this a port of what SC emits? I thought it emitted control flow for this.

Fri, Oct 6, 10:01 AM

Thu, Oct 5

arsenm updated the diff for D38113: OpenCL: Assume functions are convergent.

Check noduplicate

Thu, Oct 5, 11:15 AM
arsenm committed rL314993: AMDGPU: Set v2i32 any_extend to expand.
Thu, Oct 5, 10:40 AM
arsenm updated the diff for D36706: DAGCombiner: Add form of isFPExtFree to check uses.

Also update fsub case

Thu, Oct 5, 9:48 AM

Wed, Oct 4

arsenm added inline comments to D38543: AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic.
Wed, Oct 4, 5:43 PM
arsenm committed rL314952: AMDGPU: Add comment about clamps.
Wed, Oct 4, 5:15 PM
arsenm closed D38173: AMDGPU: do not fold clamp instructions when sources are different.

r314951

Wed, Oct 4, 5:15 PM
arsenm committed rL314951: AMDGPU: Do not fold clamp instructions when sources are different.
Wed, Oct 4, 5:15 PM
arsenm accepted D38497: Minor refactoring regarding Cast::isNoopCast(), NFC.

LGTM

Wed, Oct 4, 5:00 PM
arsenm closed D38562: AMDGPU: Fix not accounting for instruction size in bundles.

r314944

Wed, Oct 4, 4:01 PM
arsenm committed rL314944: AMDGPU: Fix not accounting for instruction size in bundles.
Wed, Oct 4, 4:01 PM
arsenm added a comment to D37348: Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.

Needs more comprehensive check lines. Just checking the instructions won't demonstrate that the extra instructions you're trying to avoid aren't there.

Wed, Oct 4, 2:50 PM
arsenm added a comment to D35073: [RegisterCoalescer] Fix for subrange join unreachable.

ping, I think I ran into this same issue on another testcase

Wed, Oct 4, 2:37 PM
arsenm created D38562: AMDGPU: Fix not accounting for instruction size in bundles.
Wed, Oct 4, 2:27 PM
arsenm added inline comments to D38543: AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic.
Wed, Oct 4, 1:42 PM
arsenm added inline comments to D38543: AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic.
Wed, Oct 4, 10:40 AM
arsenm updated subscribers of D38544: AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1).
Wed, Oct 4, 10:26 AM
arsenm added inline comments to D38544: AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1).
Wed, Oct 4, 10:24 AM

Tue, Oct 3

arsenm accepted D38439: AMDGPU/GlobalISel: Mark 32-bit G_FADD as legal.

LGTM

Tue, Oct 3, 3:58 PM
arsenm added a comment to D37999: InstSimplify: Constant fold some canonicalizes.

ping

Tue, Oct 3, 3:55 PM
arsenm created D38519: AMDGPU: Implement hasBitPreservingFPLogic.
Tue, Oct 3, 3:49 PM
arsenm created D38518: AMDGPU: Look for src mods before fp_extend.
Tue, Oct 3, 3:21 PM
arsenm accepted D38508: AMDGPU: Expand setcc for v2i32 and v4i32.

LGTM. It shouldn't be any different for the f32 vectors.

Tue, Oct 3, 12:41 PM
arsenm added a dependent revision for D36706: DAGCombiner: Add form of isFPExtFree to check uses: D38510: AMDGPU: Implement isFPExtFree.
Tue, Oct 3, 12:38 PM
arsenm added a dependency for D38510: AMDGPU: Implement isFPExtFree: D36706: DAGCombiner: Add form of isFPExtFree to check uses.
Tue, Oct 3, 12:38 PM
arsenm created D38510: AMDGPU: Implement isFPExtFree.
Tue, Oct 3, 12:38 PM
arsenm added inline comments to D38508: AMDGPU: Expand setcc for v2i32 and v4i32.
Tue, Oct 3, 10:12 AM

Mon, Oct 2

arsenm added a comment to D36706: DAGCombiner: Add form of isFPExtFree to check uses.

Given its limited use I don't see why we shouldn't just update the existing isFPExtFree to take Opcode, SrcVT and DstVT - all of which may be useful.

Mon, Oct 2, 6:13 PM
arsenm added a comment to D36597: DAG: Fix creating select with wrong condition type.

ping

Mon, Oct 2, 5:11 PM
arsenm closed D38467: AMDGPU: Remove global isGCN predicates.

r314742

Mon, Oct 2, 5:08 PM
arsenm committed rL314742: AMDGPU: Remove global isGCN predicates.
Mon, Oct 2, 5:08 PM
arsenm added a comment to D38293: Avoid predicated execution of the basic blocks containing scalar instructions.

I want to replace this pass by always inserting the branches on execz, and have a new pass which optimizes out short jumps. Would that be easier than trying to analyze this?

Mon, Oct 2, 4:10 PM
arsenm committed rL314715: AMDGPU: Fix typos.
Mon, Oct 2, 1:33 PM
arsenm committed rL314714: AMDGPU: Fix potentially incorrectly matching check lines.
Mon, Oct 2, 1:33 PM
arsenm accepted D37850: AMDGPU: Split MUBUF offset into aligned components.

LGTM

Mon, Oct 2, 12:36 PM
arsenm updated the diff for D38467: AMDGPU: Remove global isGCN predicates.

Remove an unnecessary SubtargetPredicate block, fix a few more globally overwriting let Predicates = blocks

Mon, Oct 2, 12:25 PM
arsenm accepted D37894: [Lint] Avoid failed assertion by fetching the proper pointer type.

LGTM

Mon, Oct 2, 12:10 PM
arsenm added a comment to D37894: [Lint] Avoid failed assertion by fetching the proper pointer type.

Pass DataLayout to isNoopCast when called from lint.

Question:
In FastISel::hasTrivialKill there is:

// No-op casts are trivially coalesced by fast-isel.
if (const auto *Cast = dyn_cast<CastInst>(I))
  if (Cast->isNoopCast(DL.getIntPtrType(Cast->getContext())) &&

Do you know if it is ok to change this isNoopCast call to Cast->isNoopCast(DL)?

If so, then I can clean up the patch further by removing the IntPtrTy versions completely.

Mon, Oct 2, 12:10 PM
arsenm added a comment to D37989: InstCombine: Insert missing canonicalizes.

ping

Mon, Oct 2, 12:09 PM