arsenm (Matt Arsenault)
User

Projects

User does not belong to any projects.

User Details

User Since
Dec 5 2012, 4:53 PM (285 w, 2 d)

Recent Activity

Yesterday

arsenm accepted D47359: AMDGPU: Split AMDGPUTTI into GCNTTI and R600TTI.

LGTM

Fri, May 25, 10:23 PM
arsenm created D47386: AMDGPU: Use better alignment for kernarg lowering.
Fri, May 25, 12:23 PM
arsenm added inline comments to D47370: AMDGPU: Round up kernel argument allocation size.
Fri, May 25, 12:09 PM
arsenm added a comment to D47370: AMDGPU: Round up kernel argument allocation size.

Are we sure that is what RT(s) do?

Fri, May 25, 11:46 AM
arsenm added a comment to D47261: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.

Needs a test, preferably the full set of AA checks with 32 bit constant

Fri, May 25, 5:55 AM
arsenm created D47370: AMDGPU: Round up kernel argument allocation size.
Fri, May 25, 5:50 AM
arsenm created D47361: AMDGPU: Pass function directly instead of MachineFunction.
Fri, May 25, 12:35 AM
arsenm added inline comments to D47359: AMDGPU: Split AMDGPUTTI into GCNTTI and R600TTI.
Fri, May 25, 12:17 AM

Thu, May 24

arsenm created D47334: AMDGPU: Add nuw to add off of kernarg ptr.
Thu, May 24, 10:18 AM
arsenm added a comment to D46298: AMDGPU: Remove deadcode in isSDNodeSourceOfDivergence().

I agree isVGPR should be illegal to call for R600

So, how would you handle the bunch of r600 tests then?
We could claim the support of DA related stuff for GCN only but it should be explicitly agreed and documented somehow.

Thu, May 24, 3:08 AM

Wed, May 23

arsenm added inline comments to D46365: AMDGPU: Separate R600 and GCN TableGen files.
Wed, May 23, 11:57 PM
arsenm added a comment to D46298: AMDGPU: Remove deadcode in isSDNodeSourceOfDivergence().

Could you please clarify - why do you consider that check meaningless for r600?
I see that this line: "const SISubtarget &ST = MF->getSubtarget<SISubtarget>();" is misleading and in fact is not correct.
I'd better check and choose the R600Subtarget or SISubtarget.
If I understand correctly, we just need to check which subtarget to retrieve for the physregs check.

Since we consider any VGPR formal argument divergent, it does not matter whether it is R600 or SI.
We need to choose the right TargetRegisterInfo (r600 or SI again)

So, for virtual register : if (MRI.isLiveIn(Reg) && TRI.isVGPR(Reg) ) return true

Or I maybe don't know something substantial? :)

I would expect the TRI.isVGPR() call to crash if this was called from an R600 code path, so that made me think this was dead code.

Wed, May 23, 11:42 PM
arsenm accepted D47307: AMDGPU: Split R600 MCInst lowering into its own class.

LGTM

Wed, May 23, 11:41 PM
arsenm accepted D47264: AMDGPU: Remove AMDGPUMCInstLower.h.

LGTM

Wed, May 23, 2:28 PM
arsenm created D47279: DAG: Remove redundant version of getRegisterTypeForCallingConv.
Wed, May 23, 1:27 PM
arsenm requested changes to D47261: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.

Needs a test, preferably the full set of AA checks with 32 bit constant

Wed, May 23, 9:36 AM
arsenm added inline comments to D46968: Extending constant folding for float arithmetic to isFast IR flags.
Wed, May 23, 4:37 AM
arsenm accepted D46912: StructurizeCFG: Adjust the loop depth for a subregion to order the nodes correctly.

LGTM. I don't believe that there are reasons RPO doesn't work other than bugs in the structurizer. The sorting used now is clearly broken, and if this fixes the concept it was going for, this is a step in the right direction.

Wed, May 23, 4:35 AM
arsenm added inline comments to D47194: [AMDGPU] Fixed non-uniform addr64 MUBUF in shader.
Wed, May 23, 3:20 AM
arsenm added inline comments to D47148: [CodeGen] Always update divergence in SelectionDAG::UpdateNodeOperands.
Wed, May 23, 2:05 AM
arsenm accepted D47245: AMDGPU: Split R600 AsmPrinter code into its own class.

LGTM

Wed, May 23, 1:11 AM

Tue, May 22

arsenm committed rL333024: AMDGPU: Fix missing test coverage for some 16-bit and packed ops.
Tue, May 22, 1:46 PM
arsenm closed D47184: AMDGPU: Fix v2f16 fneg/fabs pattern.

r333019

Tue, May 22, 1:21 PM
arsenm committed rL333019: AMDGPU: Fix v2f16 fneg/fabs pattern.
Tue, May 22, 1:17 PM
arsenm updated the diff for D47215: DAG: Fix extract_subvector combine for a single element.

Fix missing test update

Tue, May 22, 12:15 PM
arsenm created D47215: DAG: Fix extract_subvector combine for a single element.
Tue, May 22, 12:11 PM
arsenm added a comment to D47207: DAG: Avoid bitcast/ext/build_vector combine.

please add regression tests

Tue, May 22, 10:59 AM
arsenm created D47207: DAG: Avoid bitcast/ext/build_vector combine.
Tue, May 22, 10:38 AM
arsenm added inline comments to D47194: [AMDGPU] Fixed non-uniform addr64 MUBUF in shader.
Tue, May 22, 8:49 AM
arsenm created D47184: AMDGPU: Fix v2f16 fneg/fabs pattern.
Tue, May 22, 1:33 AM

Mon, May 21

arsenm closed D47078: AMDGPU: Make v2i16/v2f16 legal on VI.

r332953

Mon, May 21, 11:36 PM
arsenm committed rL332953: AMDGPU: Make v2i16/v2f16 legal on VI.
Mon, May 21, 11:36 PM
arsenm accepted D47055: [LowerSwitch] Fixed faulty PHI node update.

LGTM

Mon, May 21, 11:36 PM
arsenm accepted D46596: [AMDGPU] Optimze old value of v_mov_b32_dpp.

LGTM

Mon, May 21, 11:23 PM
arsenm accepted D47181: AMDGPU/R600: Remove code for handling AMDGPUISD::CLAMP.

LGTM

Mon, May 21, 11:05 PM
arsenm accepted D47180: AMDGPU: Move AMDGPUTargetLowering::isFPExtFoldable() into SITargetLowering.

LGTM

Mon, May 21, 11:05 PM
arsenm added inline comments to D46992: [AMDGPU] Add perf hints to functions.
Mon, May 21, 2:50 PM
arsenm committed rL332874: AMDGPU: Update GCCBuiltin names for DS FP atomic intrinsics.
Mon, May 21, 12:47 PM
arsenm added a reviewer for D47154: Try to make builtin address space declarations not useless: dfukalov.
Mon, May 21, 12:17 PM
arsenm created D47154: Try to make builtin address space declarations not useless.
Mon, May 21, 12:06 PM

Fri, May 18

arsenm added inline comments to D46992: [AMDGPU] Add perf hints to functions.
Fri, May 18, 3:13 PM
arsenm committed rL332774: DAG: Fix crash on shift with large shift amounts.
Fri, May 18, 2:58 PM
arsenm accepted D47081: Fix evaluator for non-zero alloca addr space.

LGTM

Fri, May 18, 2:52 PM
arsenm closed D47009: AMDGPU: Add pass to optimize reqd_work_group_size.

r332771

Fri, May 18, 2:51 PM
arsenm committed rL332771: AMDGPU: Add pass to optimize reqd_work_group_size.
Fri, May 18, 2:38 PM
arsenm added a comment to D43281: [AMDGPU] fixes for lds f32 builtins.

I think the intent of the current code is for the address space to correspond to a "target address space" as if the user code used __attribute__((address_space(n))) to specify a pointer value. This is confusingly named, and different from the target address space selected for a LangAS. I think we need to add some mechanism for specifying the builtin is a LangAS ID. Since ideally this would also work for multiple languages (e.g. cuda_constant or opencl_constant for the same builtin), I think there needs to be some callback triggered for the address space value. This possibly needs to be distinct from the current pointer descriptor to avoid breaking the possibility of user defined address spaces. There aren't really any other users of builtins with address spaces. NVPTX has some, but the tests seem to not actually try to use the declared address space and pass generic pointers to them.

Fri, May 18, 12:37 PM · Restricted Project
arsenm added a comment to D45968: StackSlotColoring: Decide colors per stack ID.

ping

Fri, May 18, 12:24 PM
arsenm requested changes to D47081: Fix evaluator for non-zero alloca addr space.
Fri, May 18, 12:09 PM
arsenm added inline comments to D46769: [AMDGPU] Change llvm.debugtrap to be a debug breakpoint that can resume execution.
Fri, May 18, 12:06 PM
arsenm added a comment to D43281: [AMDGPU] fixes for lds f32 builtins.

I'm looking at how the address space mapping works for builtins, and I think what's there is just uselessly broken and needs to be fixed. It seems to be operating under the assumption that the address spaces the target defines are totally disjoint from the language address spaces

Fri, May 18, 11:15 AM · Restricted Project
arsenm added inline comments to D46754: [AMDGPU] Add intrinsics for 16 bit interpolation.
Fri, May 18, 10:55 AM
arsenm added a comment to D46754: [AMDGPU] Add intrinsics for 16 bit interpolation.

Corrected the ordering of operands to interp_p2_f16, added lowered
intrinsics to list of those that are a source of divergence, and
amended LIT test.

I have not overloaded the intrinsics as I don't believe it is possible
in this case as they have an additional operand, and apart from that
additional operand the interp_p1_f16 has the same types as the 32 bit
version, so there are no type differences to provide disambiguation.

Is the extra parameter you're referring to the high parameter, which changes whether the register is read from the high or low bits? That shouldn't be exposed in the intrinsic at all. Eliminating the high bit extraction is a codegen optimization pattern.

Fri, May 18, 10:52 AM
arsenm added a comment to D46754: [AMDGPU] Add intrinsics for 16 bit interpolation.

Corrected the ordering of operands to interp_p2_f16, added lowered
intrinsics to list of those that are a source of divergence, and
amended LIT test.

I have not overloaded the intrinsics as I don't believe it is possible
in this case as they have an additional operand, and apart from that
additional operand the interp_p1_f16 has the same types as the 32 bit
version, so there are no type differences to provide disambiguation.

Fri, May 18, 10:48 AM
arsenm created D47078: AMDGPU: Make v2i16/v2f16 legal on VI.
Fri, May 18, 10:39 AM
arsenm updated the diff for D47009: AMDGPU: Add pass to optimize reqd_work_group_size.

Account for difference between 1.2 and 2.0 wrt uniform-work-group-size

Fri, May 18, 1:52 AM
arsenm added inline comments to D46769: [AMDGPU] Change llvm.debugtrap to be a debug breakpoint that can resume execution.
Fri, May 18, 1:35 AM
arsenm added inline comments to D46871: [AMDGPU] Add interpolation builtins.
Fri, May 18, 1:33 AM
arsenm added inline comments to D46756: [AMDGPU] Reworked SIFixWWMLiveness.
Fri, May 18, 1:31 AM
arsenm added inline comments to D46992: [AMDGPU] Add perf hints to functions.
Fri, May 18, 1:25 AM
arsenm added inline comments to D46570: [AMDGPU] Optimzed old value for dpp if unused.
Fri, May 18, 1:08 AM
arsenm added inline comments to D46596: [AMDGPU] Optimze old value of v_mov_b32_dpp.
Fri, May 18, 1:05 AM

Thu, May 17

arsenm updated the diff for D47009: AMDGPU: Add pass to optimize reqd_work_group_size.

Also handle -cl-uniform-work-group-size attribute

Thu, May 17, 3:59 AM
arsenm created D47009: AMDGPU: Add pass to optimize reqd_work_group_size.
Thu, May 17, 3:45 AM
arsenm added inline comments to D46992: [AMDGPU] Add perf hints to functions.
Thu, May 17, 3:17 AM
arsenm added a comment to D40405: [AMDGPU] Fix handling of void types in isLegalAddressingMode.

Test should go in test/LoopStrengthReduce/AMDGPU and run just LSR

I can move the test, but I cannot reproduce the bug with opt.

Thu, May 17, 1:16 AM · Restricted Project

Wed, May 16

arsenm added a comment to D46968: Extending constant folding for float arithmetic to isFast IR flags.

What is the test actually testing? I don't think there should be any difference from this in the floor lowering, and I don't really want a separate test file duplicating all of the lowering testing

Wed, May 16, 12:38 PM
arsenm committed rL332453: AMDGPU: Custom lower v4i16/v4f16 vector operations.
Wed, May 16, 4:51 AM
arsenm closed D46828: AMDGPU: Custom lower v4i16/v4f16 vector operations.

r332453

Wed, May 16, 4:51 AM
arsenm added a comment to D40405: [AMDGPU] Fix handling of void types in isLegalAddressingMode.

Test should go in test/LoopStrengthReduce/AMDGPU and run just LSR

Wed, May 16, 1:10 AM · Restricted Project

Tue, May 15

arsenm added a comment to D46912: StructurizeCFG: Adjust the loop depth for a subregion to order the nodes correctly.

I still think this process of using the LoopDepth and sorting from anything other than RPO is fundamentally broken, but if everything is passing with this that's an improvement.

Tue, May 15, 11:56 PM
arsenm added a comment to D46172: AMDGPU/GlobalISel: Implement select() for 32-bit @llvm.minnun and @llvm.maxnum.

Also f64 tests?

Tue, May 15, 11:50 PM
arsenm updated the diff for D46828: AMDGPU: Custom lower v4i16/v4f16 vector operations.

Add more asserts

Tue, May 15, 1:24 PM
arsenm added inline comments to D46871: [AMDGPU] Add interpolation builtins.
Tue, May 15, 7:18 AM
arsenm added inline comments to D46871: [AMDGPU] Add interpolation builtins.
Tue, May 15, 7:17 AM
arsenm added inline comments to D46853: AMDGPU: Add disasm tests for deep learning instructions + fix v_fmac_f32 disasm.
Tue, May 15, 12:37 AM

Mon, May 14

arsenm requested changes to D46754: [AMDGPU] Add intrinsics for 16 bit interpolation.
Mon, May 14, 4:26 AM
arsenm accepted D46085: AMDGPU/SI: Don't promote alloca to vector for atomic load/store.

LGTM with the check lines enhanced

Mon, May 14, 4:23 AM
arsenm created D46828: AMDGPU: Custom lower v4i16/v4f16 vector operations.
Mon, May 14, 4:08 AM
arsenm requested changes to D46811: AMDGPU: Don't force the IEEE bit for Mesa compute shaders.

This needs a test (I'm pretty sure it already exists and should break with this change).

Mon, May 14, 12:13 AM
arsenm accepted D46722: [InstCombine] fix crash due to ignored addrspacecast.

LGTM

Mon, May 14, 12:07 AM

Sun, May 13

arsenm closed D46346: AMDGPU: rename OpenCL lowering pass to be R600 specific.

r332196

Sun, May 13, 3:09 AM · Restricted Project
arsenm committed rL332196: AMDGPU: Rename OpenCL lowering pass to be R600 specific.
Sun, May 13, 3:08 AM
arsenm closed D46744: AMDGPU: Make undef legal for v2i16/v2f16.

r332195

Sun, May 13, 3:08 AM
arsenm committed rL332195: AMDGPU: Make undef legal for v2i16/v2f16.
Sun, May 13, 3:08 AM

Fri, May 11

arsenm added inline comments to D46722: [InstCombine] fix crash due to ignored addrspacecast.
Fri, May 11, 1:44 AM
arsenm created D46744: AMDGPU: Make undef legal for v2i16/v2f16.
Fri, May 11, 1:22 AM

Thu, May 10

arsenm accepted D45993: AMDGPU/SI: Don't promote alloca to vector for AddrSpaceCast instruction.

LGTM

Thu, May 10, 5:14 AM
arsenm accepted D45994: AMDGPU/GlobalISel: Enable TableGen'd instruction selector.

LGTM. At this point I would be more interested in what source and dest modifier folding looks like in global isel

Thu, May 10, 5:09 AM
arsenm added inline comments to D46616: [AMDGPU][Waitcnt] Fix handling of flat instrs.
Thu, May 10, 4:53 AM · Restricted Project

Wed, May 9

arsenm closed D46644: AMDGPU: Ignore any_extend in mul24 combine.

r331919

Wed, May 9, 2:16 PM
arsenm committed rL331919: AMDGPU: Ignore any_extend in mul24 combine.
Wed, May 9, 2:15 PM
arsenm closed D46575: AMDGPU: Handle partial shift reduction for variable shifts.

r331917

Wed, May 9, 1:56 PM
arsenm closed D46573: AMDGPU: Partially shrink 64-bit shifts if reduced to 16-bit.

r331916

Wed, May 9, 1:56 PM
arsenm committed rL331917: AMDGPU: Handle partial shift reduction for variable shifts.
Wed, May 9, 1:56 PM
arsenm committed rL331916: AMDGPU: Partially shrink 64-bit shifts if reduced to 16-bit.
Wed, May 9, 1:56 PM
arsenm closed D46538: AMDGPU: Add combine for trunc of bitcast from build_vector.

r331909

Wed, May 9, 11:41 AM
arsenm committed rL331909: AMDGPU: Add combine for trunc of bitcast from build_vector.
Wed, May 9, 11:41 AM
arsenm closed D46532: AMDGPU: Stop special casing constant indexes of extract_vector_elt.

r331906

Wed, May 9, 11:33 AM
arsenm committed rL331906: AMDGPU: Stop special casing constant indexes of extract_vector_elt.
Wed, May 9, 11:33 AM
arsenm added inline comments to D45994: AMDGPU/GlobalISel: Enable TableGen'd instruction selector.
Wed, May 9, 11:03 AM