arsenm (Matt Arsenault)
User

Projects

User does not belong to any projects.

User Details

User Since
Dec 5 2012, 4:53 PM (293 w, 1 d)

Recent Activity

Today

arsenm added a comment to D47154: Try to make builtin address space declarations not useless.

ping

Thu, Jul 19, 2:30 PM
arsenm added inline comments to D49561: AMDGPU: Try to make isKnownNeverSNan more accurate.
Thu, Jul 19, 2:26 PM
arsenm added a comment to D49561: AMDGPU: Try to make isKnownNeverSNan more accurate.

I would tend to say that isKnownNeverSNan here tells us not that the value cannot be an sNaN, but rather that we do not care even if it is. At least we should not care if FP exceptions are off.

Thu, Jul 19, 2:26 PM
arsenm created D49561: AMDGPU: Try to make isKnownNeverSNan more accurate.
Thu, Jul 19, 11:14 AM
arsenm added a comment to D49448: [AMDGPU] Fix VGPR spills where offset doesn't fit in 12 bits.

Address some more feedback; use -stress-regalloc to cut down on the clobbers needed, add non-kernel tests, explicitly test the increment/decrement case, including the scratch offset SGPR.

I will try adding a MIR test to exercise the subregister condition and loop; with an IR test I don't know how to get a VGPR with subregs to survive "AMDGPU DAG->DAG Pattern Instruction Selection".

Thu, Jul 19, 10:30 AM
arsenm added inline comments to D49448: [AMDGPU] Fix VGPR spills where offset doesn't fit in 12 bits.
Thu, Jul 19, 10:10 AM
arsenm added inline comments to D49428: [LSV] Look through selects for consecutive addresses.
Thu, Jul 19, 5:41 AM
arsenm added inline comments to D49428: [LSV] Look through selects for consecutive addresses.
Thu, Jul 19, 5:41 AM
arsenm added inline comments to D49428: [LSV] Look through selects for consecutive addresses.
Thu, Jul 19, 5:23 AM
arsenm added a comment to D49448: [AMDGPU] Fix VGPR spills where offset doesn't fit in 12 bits.

Addressed feedback, and added at least one test to exercise the fix and the condition for putting the offset in an SGPR.

I would like to add more tests for Offset + Size - EltSize where Size != EltSize but I have had some trouble getting a ValueReg with subregisters to survive until the spill occurs. E.g. if I load and store a <2 x i32> it is spilled as two distinct i32.

Thu, Jul 19, 5:13 AM
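
The condition discussed in the D49448 comments above (the largest offset touched by a spill is Offset + Size - EltSize, and it must fit in the 12-bit field) can be illustrated with a minimal sketch. It assumes the offset field is an unsigned 12-bit immediate (0 to 4095); the function names are hypothetical and are not taken from the LLVM sources.

```python
def max_spill_offset(offset: int, size: int, elt_size: int) -> int:
    """Byte offset of the last element when `size` bytes are spilled
    in pieces of `elt_size` bytes, starting at `offset`."""
    if size <= 0 or elt_size <= 0 or size % elt_size != 0:
        raise ValueError("size must be a positive multiple of elt_size")
    return offset + size - elt_size


def fits_in_unsigned_bits(value: int, bits: int = 12) -> bool:
    """True if `value` is representable as an unsigned `bits`-bit immediate.
    The 12-bit unsigned width is an assumption made for this sketch."""
    return 0 <= value < (1 << bits)


def needs_offset_register(offset: int, size: int, elt_size: int, bits: int = 12) -> bool:
    """True if the last element's offset no longer fits in the immediate
    field, so the offset would have to be materialized in a register."""
    return not fits_in_unsigned_bits(max_spill_offset(offset, size, elt_size), bits)
```

With Size != EltSize the check depends on the last element rather than the first: a spill of 8 bytes in 4-byte pieces starting at offset 4092 has a first offset that fits, but its last element lands at 4096, which does not.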

Yesterday

arsenm added a comment to D49483: [AMDGPU] Optimize _L image intrinsic to _LZ when lod is zero.

Should this be done in an IR pass instead?

Wed, Jul 18, 7:46 AM
arsenm added inline comments to D49483: [AMDGPU] Optimize _L image intrinsic to _LZ when lod is zero.
Wed, Jul 18, 7:45 AM
arsenm added a comment to D49221: DAG: Add calling convention argument to calling convention funcs.

ping

Wed, Jul 18, 6:07 AM
arsenm added inline comments to D49448: [AMDGPU] Fix VGPR spills where offset doesn't fit in 12 bits.
Wed, Jul 18, 12:10 AM

Tue, Jul 17

arsenm added a reviewer for D49401: TII: Generalize X86's isSafeToClobberEFLAGs: MatzeB.
Tue, Jul 17, 7:20 AM

Mon, Jul 16

arsenm updated the diff for D49401: TII: Generalize X86's isSafeToClobberEFLAGs.

Fix handling of sub registers and live ins

Mon, Jul 16, 2:43 PM
arsenm added inline comments to D49342: [LSV] Refactoring + supporting bitcasts to a type of different size.
Mon, Jul 16, 2:25 PM
arsenm added inline comments to D49342: [LSV] Refactoring + supporting bitcasts to a type of different size.
Mon, Jul 16, 2:24 PM
arsenm created D49401: TII: Generalize X86's isSafeToClobberEFLAGs.
Mon, Jul 16, 1:50 PM
arsenm added a comment to D49262: [DAGCombiner] Call SimplifyDemandedVectorElts from EXTRACT_VECTOR_ELT.

LGTM

Mon, Jul 16, 1:14 PM

Fri, Jul 13

arsenm created D49308: AMDGPU: Use existing function to check for VGPRs.
Fri, Jul 13, 10:47 AM
arsenm added inline comments to D49146: [AMDGPU] Support a fdot2 pattern..
Fri, Jul 13, 10:25 AM
arsenm committed rL337022: AMDGPU: Properly handle shader inputs with split arguments.
AMDGPU: Properly handle shader inputs with split arguments
Fri, Jul 13, 9:45 AM
arsenm closed D49128: AMDGPU: Properly handle shader inputs with split arguments.

r337022

Fri, Jul 13, 9:45 AM
arsenm committed rL337021: AMDGPU: Fix handling of alignment padding in DAG argument lowering.
AMDGPU: Fix handling of alignment padding in DAG argument lowering
Fri, Jul 13, 9:45 AM
arsenm closed D48978: AMDGPU: Fix handling of alignment padding in DAG argument lowering.

r337021

Fri, Jul 13, 9:45 AM
arsenm added inline comments to D49288: [AMDGPU] run post-RA hazard recognizer pass late.
Fri, Jul 13, 5:33 AM · Restricted Project
arsenm added inline comments to D49288: [AMDGPU] run post-RA hazard recognizer pass late.
Fri, Jul 13, 5:00 AM · Restricted Project
arsenm created D49287: AMDGPU: Break 64-bit arguments into 32-bit pieces.
Fri, Jul 13, 4:47 AM

Thu, Jul 12

arsenm closed D49257: AMDGPU: Fix assert in truncate combine with vectors.

r336935

Thu, Jul 12, 12:49 PM
arsenm committed rL336935: AMDGPU: Fix assert in truncate combine with vectors.
AMDGPU: Fix assert in truncate combine with vectors
Thu, Jul 12, 12:45 PM
arsenm created D49257: AMDGPU: Fix assert in truncate combine with vectors.
Thu, Jul 12, 11:03 AM
arsenm added a dependent revision for D49254: AMDGPU: Scalarize vector argument types to calls: D49255: AMDGPU: Split wide vectors of i16/f16 into 32-bit regs on calls.
Thu, Jul 12, 10:37 AM
arsenm added a dependency for D49255: AMDGPU: Split wide vectors of i16/f16 into 32-bit regs on calls: D49254: AMDGPU: Scalarize vector argument types to calls.
Thu, Jul 12, 10:37 AM
arsenm created D49255: AMDGPU: Split wide vectors of i16/f16 into 32-bit regs on calls.
Thu, Jul 12, 10:37 AM
arsenm added dependencies for D49254: AMDGPU: Scalarize vector argument types to calls: D49221: DAG: Add calling convention argument to calling convention funcs, D49065: AMDGPU: Stop wasting argument registers with v3i32/v3f32, D49128: AMDGPU: Properly handle shader inputs with split arguments.
Thu, Jul 12, 10:36 AM
arsenm added a dependent revision for D49221: DAG: Add calling convention argument to calling convention funcs: D49254: AMDGPU: Scalarize vector argument types to calls.
Thu, Jul 12, 10:36 AM
arsenm added a dependent revision for D49065: AMDGPU: Stop wasting argument registers with v3i32/v3f32: D49254: AMDGPU: Scalarize vector argument types to calls.
Thu, Jul 12, 10:36 AM
arsenm added a dependent revision for D49128: AMDGPU: Properly handle shader inputs with split arguments: D49254: AMDGPU: Scalarize vector argument types to calls.
Thu, Jul 12, 10:36 AM
arsenm created D49254: AMDGPU: Scalarize vector argument types to calls.
Thu, Jul 12, 10:35 AM
arsenm created D49221: DAG: Add calling convention argument to calling convention funcs.
Thu, Jul 12, 1:41 AM
arsenm added a reviewer for D48978: AMDGPU: Fix handling of alignment padding in DAG argument lowering: rampitec.
Thu, Jul 12, 1:02 AM
arsenm added a comment to D48978: AMDGPU: Fix handling of alignment padding in DAG argument lowering.

ping

Thu, Jul 12, 1:01 AM
arsenm added a dependent revision for D49128: AMDGPU: Properly handle shader inputs with split arguments: D49065: AMDGPU: Stop wasting argument registers with v3i32/v3f32.
Thu, Jul 12, 12:36 AM
arsenm added a dependency for D49065: AMDGPU: Stop wasting argument registers with v3i32/v3f32: D49128: AMDGPU: Properly handle shader inputs with split arguments.
Thu, Jul 12, 12:36 AM
arsenm updated the diff for D49065: AMDGPU: Stop wasting argument registers with v3i32/v3f32.

Update for D49128

Thu, Jul 12, 12:36 AM
arsenm updated the diff for D49128: AMDGPU: Properly handle shader inputs with split arguments.

Complete change needed to handle other code doing the vector argument splitting

Thu, Jul 12, 12:21 AM

Wed, Jul 11

arsenm added a comment to D49128: AMDGPU: Properly handle shader inputs with split arguments.

This is a no-op change, right? Because the previous code also works.

Wed, Jul 11, 11:24 PM

Tue, Jul 10

arsenm added inline comments to D49146: [AMDGPU] Support a fdot2 pattern..
Tue, Jul 10, 2:33 PM
arsenm added a comment to D49146: [AMDGPU] Support a fdot2 pattern..

As far as I understand, it should also be legal with -mattr=-fp32-denormals,-fp64-fp16-denormals, i.e. when neither 32-bit nor 16-bit denormals are supported. Right? Not that it really helps in the real world.
Otherwise it should be legal if either the UnsafeAlgebra or the AllowContract flag is set on both FMA nodes.

Having the FMA node already guarantees that either the UnsafeAlgebra or the AllowContract flag is set on the FAdd/FMUL nodes. We don't need to check them again during the FMA combine, right?

Tue, Jul 10, 2:32 PM
arsenm committed rC336681: AMDGPU: Try to fix test again.
AMDGPU: Try to fix test again
Tue, Jul 10, 7:52 AM
arsenm committed rL336681: AMDGPU: Try to fix test again.
AMDGPU: Try to fix test again
Tue, Jul 10, 7:52 AM
arsenm committed rL336676: Update test for backend error message change.
Update test for backend error message change
Tue, Jul 10, 7:08 AM
arsenm committed rC336676: Update test for backend error message change.
Update test for backend error message change
Tue, Jul 10, 7:08 AM
arsenm committed rL336675: Reapply "AMDGPU: Force inlining if LDS global address is used".
Reapply "AMDGPU: Force inlining if LDS global address is used"
Tue, Jul 10, 7:08 AM
arsenm created D49128: AMDGPU: Properly handle shader inputs with split arguments.
Tue, Jul 10, 6:50 AM
arsenm added inline comments to D49096: AMDGPU: Make hidden argument metadata consistent with amdgpu-implicitarg-num-bytes attribute.
Tue, Jul 10, 1:21 AM

Mon, Jul 9

arsenm added inline comments to D49096: AMDGPU: Make hidden argument metadata consistent with amdgpu-implicitarg-num-bytes attribute.
Mon, Jul 9, 12:39 PM
arsenm added inline comments to D49096: AMDGPU: Make hidden argument metadata consistent with amdgpu-implicitarg-num-bytes attribute.
Mon, Jul 9, 12:35 PM
arsenm closed D49035: AMDGPU: Force inlining if LDS global address is used.

r336587

Mon, Jul 9, 12:30 PM
arsenm committed rL336587: AMDGPU: Force inlining if LDS global address is used.
AMDGPU: Force inlining if LDS global address is used
Mon, Jul 9, 12:27 PM
arsenm added inline comments to D49035: AMDGPU: Force inlining if LDS global address is used.
Mon, Jul 9, 10:55 AM
arsenm added inline comments to D47541: Allow creating llvm::Function in non-zero address spaces.
Mon, Jul 9, 9:07 AM
arsenm added a reviewer for D49035: AMDGPU: Force inlining if LDS global address is used: rampitec.
Mon, Jul 9, 9:03 AM
arsenm added a comment to D49065: AMDGPU: Stop wasting argument registers with v3i32/v3f32.

Typo in summary: v4i32/v4f32 -> v3i32/v3f32.

Mon, Jul 9, 8:09 AM
arsenm added inline comments to D47154: Try to make builtin address space declarations not useless.
Mon, Jul 9, 3:34 AM
arsenm updated the diff for D47154: Try to make builtin address space declarations not useless.

Add sema test for numbered address spaces

Mon, Jul 9, 3:32 AM
arsenm added a comment to D40183: [AMDGPU] Waitcnt pass. Add S_WAITCNT 0 if incomplete predecessor info.

Is this still necessary? I thought I saw a similar patch before

Mon, Jul 9, 3:15 AM
arsenm accepted D46871: [AMDGPU] Add interpolation builtins.

LGTM. Checking the full operands wouldn't hurt though.

Mon, Jul 9, 3:14 AM
arsenm accepted D45882: AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.exp.

LGTM

Mon, Jul 9, 3:10 AM
arsenm accepted D49052: RenameIndependentSubregs: Fix handling of undef tied operands.

LGTM

Mon, Jul 9, 3:09 AM · Restricted Project
arsenm accepted D47203: [LowerSwitch] Fixed faulty PHI node in switch default block.

LGTM with nit

Mon, Jul 9, 3:06 AM
arsenm accepted D49004: [CodeGen] Emit more precise AssertZext/AssertSext nodes..

LGTM

Mon, Jul 9, 3:05 AM
arsenm accepted D46172: AMDGPU/GlobalISel: Implement select() for 32-bit @llvm.minnum and @llvm.maxnum.

LGTM

Mon, Jul 9, 3:04 AM
arsenm added a comment to D48582: Reverse subregister saved loops in register usage info collector..

ping

Mon, Jul 9, 3:02 AM
arsenm created D49065: AMDGPU: Stop wasting argument registers with v3i32/v3f32.
Mon, Jul 9, 1:45 AM
arsenm created D49064: DAG: Add helper for creating shifts with correct type.
Mon, Jul 9, 1:22 AM
arsenm added inline comments to D49027: [TableGen] FixedLenDecoderEmitter: allow for dummy operand in MCInst.
Mon, Jul 9, 1:12 AM

Fri, Jul 6

arsenm accepted D49037: AMDGPU: Refactor Subtarget classes.

LGTM

Fri, Jul 6, 11:35 AM
arsenm accepted D48431: AMDGPU: Force skip over s_sendmsg and exp instructions.

LGTM

Fri, Jul 6, 11:31 AM
arsenm created D49035: AMDGPU: Force inlining if LDS global address is used.
Fri, Jul 6, 10:15 AM
arsenm added inline comments to D49027: [TableGen] FixedLenDecoderEmitter: allow for dummy operand in MCInst.
Fri, Jul 6, 9:26 AM
arsenm accepted D48979: AMDGPU: Fix UBSan error caused by r335942.

LGTM

Fri, Jul 6, 6:41 AM
arsenm added inline comments to D49004: [CodeGen] Emit more precise AssertZext/AssertSext nodes..
Fri, Jul 6, 5:14 AM

Thu, Jul 5

arsenm added inline comments to D48979: AMDGPU: Fix UBSan error caused by r335942.
Thu, Jul 5, 10:42 AM
arsenm added inline comments to D48979: AMDGPU: Fix UBSan error caused by r335942.
Thu, Jul 5, 10:24 AM
arsenm closed D48573: [AMDGPU] Add llvm.amdgcn.fmad.ftz intrinsic.

This was extended to f16 in r335866

Thu, Jul 5, 10:18 AM
arsenm closed D48556: Fix asserts in AMDGCN fmed3 folding by handling more cases of NaN.

r336375

Thu, Jul 5, 10:10 AM
arsenm committed rL336375: Fix asserts in AMDGCN fmed3 folding by handling more cases of NaN.
Fix asserts in AMDGCN fmed3 folding by handling more cases of NaN
Thu, Jul 5, 10:10 AM
arsenm committed rL336374: AMDGPU: Don't use spir_kernel in a test.
AMDGPU: Don't use spir_kernel in a test
Thu, Jul 5, 10:06 AM
arsenm committed rL336373: AMDGPU/GlobalISel: Implement custom kernel arg lowering.
AMDGPU/GlobalISel: Implement custom kernel arg lowering
Thu, Jul 5, 10:06 AM
arsenm closed D48819: AMDGPU/GlobalISel: Implement custom kernel arg lowering.

r336373

Thu, Jul 5, 10:06 AM
arsenm created D48978: AMDGPU: Fix handling of alignment padding in DAG argument lowering.
Thu, Jul 5, 9:35 AM

Mon, Jul 2

arsenm added inline comments to D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics.
Mon, Jul 2, 7:14 AM
arsenm added a comment to D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics.

I would expect the intrinsics to change for this. You can use a struct return type, something like { <4 x float>, i1 }. You could also use a 5-element vector; it would just require more work to deal with during lowering.

Mon, Jul 2, 7:10 AM
arsenm created D48819: AMDGPU/GlobalISel: Implement custom kernel arg lowering.
Mon, Jul 2, 12:56 AM

Fri, Jun 29

arsenm accepted D48772: [AMDGPU] Add VALU to V_INTERP Instructions.

LGTM

Fri, Jun 29, 11:01 AM
arsenm accepted D48777: AMDGPU/GlobalISel: Make IMPLICIT_DEF of all sizes < 512 legal..

LGTM

Fri, Jun 29, 10:51 AM
arsenm accepted D48772: [AMDGPU] Add VALU to V_INTERP Instructions.

LGTM with test using GCN-NEXT

Fri, Jun 29, 10:51 AM
arsenm committed rL335999: AMDGPU: Don't use struct type for argument layout.
AMDGPU: Don't use struct type for argument layout
Fri, Jun 29, 10:36 AM