arsenm (Matt Arsenault)
User

User Details

User Since
Dec 5 2012, 4:53 PM (302 w, 6 d)

Recent Activity

Yesterday

arsenm added a comment to D36224: [TwoAddressInstructionPass] Replace subregister uses when processing tied operands.

The tests may pass, but is the output exactly the same? I think this should break the test in the listed example. There can't be a copy in the case it's trying to fix

Sorry, I don't think I understand what you mean. Are you talking about the example in the summary, or the added mir-test?

Tue, Sep 25, 5:19 PM
arsenm added a comment to D50200: AMDGPU: Handle "uniform-work-group-size" attribute.

I think you need to split this into a separate loop before the propagate attributes function instead of adding the recursive call at the same time. This is different from the other attributes because it is inferred top down. You should have a first loop over the CallGraphSCC that adds this. Since the CallGraphSCC should have all of the nodes reachable from each other, this should be some set building / checks from there. You shouldn't need to look at the instructions inside the functions or search for specific call sites.

Tue, Sep 25, 5:02 PM · Restricted Project
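
To illustrate the top-down inference described in the D50200 comment above, here is a minimal sketch on a toy call graph. It is not the LLVM implementation and does not use LLVM's CallGraphSCC API. The CallGraph alias, reachableFrom, inferUniform, and all function names are hypothetical, and the rule it applies (a function counts as uniform only if it is reachable from a uniform kernel and from no other kernel) is an assumption made for the example.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy call graph: caller name -> names of its callees.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Collect every function reachable from the given roots, roots included.
// The Seen set also keeps recursive call chains from looping forever.
std::set<std::string> reachableFrom(const CallGraph &CG,
                                    const std::vector<std::string> &Roots) {
  std::set<std::string> Seen;
  std::vector<std::string> Worklist(Roots.begin(), Roots.end());
  while (!Worklist.empty()) {
    std::string F = Worklist.back();
    Worklist.pop_back();
    if (!Seen.insert(F).second)
      continue; // Already visited.
    auto It = CG.find(F);
    if (It == CG.end())
      continue; // Leaf function with no recorded callees.
    for (const std::string &Callee : It->second)
      Worklist.push_back(Callee);
  }
  return Seen;
}

// Assumed rule: start from everything reachable from a uniform kernel, then
// drop anything that is also reachable from a kernel without the attribute.
std::set<std::string> inferUniform(const CallGraph &CG,
                                   const std::vector<std::string> &UniformKernels,
                                   const std::vector<std::string> &OtherKernels) {
  std::set<std::string> Uniform = reachableFrom(CG, UniformKernels);
  for (const std::string &F : reachableFrom(CG, OtherKernels))
    Uniform.erase(F);
  return Uniform;
}
```

The sketch only builds and compares sets of function names. It never inspects function bodies or individual call sites, which is the property the comment asks for.
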
arsenm accepted D52469: AMDGPU/SI: Change predicate to isCIOnly for 32-bit imm s_buffer_load* patterns.

LGTM

Tue, Sep 25, 4:27 PM
arsenm added a comment to D52518: AMDGPU: Add Selection patterns to support add of one bit.

Also needs to support/test sub. I’m also concerned it’s not this simple with i1 to use a scalar operation. This needs some tests stressing SIFixSGPRCopies, and with control flow

Tue, Sep 25, 3:09 PM

Mon, Sep 24

arsenm added a comment to D52410: Use TRI->regsOverlap() in MachineBasicBlock::computeRegisterLiveness.

An instruction just needs to def one or the other and the live out would be vcc, since it needs to visit the super register

Hm, do you think any of the cases in test/CodeGen/AMDGPU/fold-immediate-operand-shrink.mir could be copied and modified to expose it, then? Any idea which one would be most appropriate?

Mon, Sep 24, 5:56 PM
arsenm committed rL342879: AMDGPU: Fix private handling for allowsMisalignedMemoryAccesses.
Mon, Sep 24, 7:24 AM
arsenm committed rL342878: Fix some missing opcodes in bcanalyzer.
Mon, Sep 24, 7:24 AM
arsenm committed rL342858: Fix asserts when linking wrong address space declarations.
Mon, Sep 24, 7:23 AM
arsenm committed rL342857: llvm-diff: Fix crash on anonymous functions.
Mon, Sep 24, 7:23 AM
arsenm added a dependency for D52416: Allow FP types for atomicrmw xchg: D52415: Add atomicrmw operation to error messages.
Mon, Sep 24, 7:20 AM
arsenm created D52416: Allow FP types for atomicrmw xchg.
Mon, Sep 24, 7:20 AM
arsenm added a dependent revision for D52415: Add atomicrmw operation to error messages: D52416: Allow FP types for atomicrmw xchg.
Mon, Sep 24, 7:20 AM
arsenm added a dependency for D52415: Add atomicrmw operation to error messages: D52414: IR: Move AtomicRMW string names into class.
Mon, Sep 24, 7:20 AM
arsenm added a dependent revision for D52414: IR: Move AtomicRMW string names into class: D52415: Add atomicrmw operation to error messages.
Mon, Sep 24, 7:20 AM
arsenm created D52415: Add atomicrmw operation to error messages.
Mon, Sep 24, 7:20 AM
arsenm created D52414: IR: Move AtomicRMW string names into class.
Mon, Sep 24, 7:20 AM
arsenm created D52413: AMDGPU: Expand atomicrmw nand in IR.
Mon, Sep 24, 7:20 AM
arsenm added a comment to D36224: [TwoAddressInstructionPass] Replace subregister uses when processing tied operands.

The tests may pass, but is the output exactly the same? I think this should break the test in the listed example. There can't be a copy in the case it's trying to fix

Mon, Sep 24, 7:20 AM
arsenm closed D52170: AMDGPU: Fix private handling for allowsMisalignedMemoryAccesses.

r342879

Mon, Sep 24, 7:20 AM
arsenm added a comment to D52060: AMDGPU: Add a fast path for icmp.i1(src, false, NE).

Should the instcombine part change also to allow creation of i1 uses?

I'm not sure what you mean by that.

Mon, Sep 24, 7:20 AM
arsenm added inline comments to D52320: AMDGPU: add __builtin_amdgcn_update_dpp.
Mon, Sep 24, 7:20 AM
arsenm created D52412: OpenCL: Mark printf format string argument.
Mon, Sep 24, 7:20 AM
arsenm added a comment to D52410: Use TRI->regsOverlap() in MachineBasicBlock::computeRegisterLiveness.

An instruction just needs to def one or the other and the live out would be vcc, since it needs to visit the super register

Mon, Sep 24, 7:20 AM
arsenm closed D52354: Fix some missing opcodes in bcanalyzer.

r342878

Mon, Sep 24, 7:20 AM
arsenm added a comment to D52353: Fix asserts when linking wrong address space declarations.

r342858

Mon, Sep 24, 7:20 AM
arsenm closed D52353: Fix asserts when linking wrong address space declarations.
Mon, Sep 24, 7:20 AM
arsenm closed D52352: llvm-diff: Fix crash on anonymous functions.

r342857

Mon, Sep 24, 7:20 AM

Sat, Sep 22

arsenm added inline comments to D51348: CodeGen: Make computeRegisterLiveness consider successors.
Sat, Sep 22, 7:09 AM
arsenm added inline comments to D52310: Set correct MMO offset on scalarized load pieces.
Sat, Sep 22, 6:16 AM

Fri, Sep 21

arsenm created D52355: AMDGPU: Always run AMDGPUAlwaysInline.
Fri, Sep 21, 5:24 AM
arsenm created D52354: Fix some missing opcodes in bcanalyzer.
Fri, Sep 21, 5:11 AM
arsenm created D52353: Fix asserts when linking wrong address space declarations.
Fri, Sep 21, 5:10 AM
arsenm created D52352: llvm-diff: Fix crash on anonymous functions.
Fri, Sep 21, 5:10 AM

Thu, Sep 20

arsenm added a comment to D52221: [AMDGPU] lower-switch in preISel as a workaround for legacy DA.

Needs a comment that it’s a workaround for DA

Thu, Sep 20, 12:47 AM
arsenm added inline comments to D52221: [AMDGPU] lower-switch in preISel as a workaround for legacy DA.
Thu, Sep 20, 12:38 AM

Mon, Sep 17

arsenm committed rL342443: AMDGPU: Don't form fmed3 if it will require materialization.
Mon, Sep 17, 7:36 PM
arsenm closed D52172: AMDGPU: Don't form fmed3 if it will require materialization.

r342443

Mon, Sep 17, 7:36 PM
arsenm accepted D52006: Copy utilities updated and added for MI flags.

LGTM

Mon, Sep 17, 7:26 PM
arsenm committed rL342442: LSV: Fix adjust alloca alignment trick for AMDGPU.
Mon, Sep 17, 7:07 PM
arsenm closed D52024: LSV: Fix adjust alloca alignment trick for AMDGPU.

r342442

Mon, Sep 17, 7:07 PM
arsenm committed rL342439: AMDGPU: Expand vector canonicalizes.
Mon, Sep 17, 6:53 PM
arsenm added reviewers for D48144: [Support] Teach YAMLIO about polymorphic types: kledzik, dblaikie.
Mon, Sep 17, 8:14 AM
arsenm created D52172: AMDGPU: Don't form fmed3 if it will require materialization.
Mon, Sep 17, 7:24 AM
arsenm created D52170: AMDGPU: Fix private handling for allowsMisalignedMemoryAccesses.
Mon, Sep 17, 7:05 AM
arsenm added inline comments to D51969: [AMDGPU] Add an AMDGPU specific atomic optimizer.
Mon, Sep 17, 6:58 AM · Restricted Project
arsenm added inline comments to D51969: [AMDGPU] Add an AMDGPU specific atomic optimizer.
Mon, Sep 17, 6:58 AM · Restricted Project
arsenm added inline comments to D51348: CodeGen: Make computeRegisterLiveness consider successors.
Mon, Sep 17, 6:52 AM
arsenm added a comment to D51932: [AMDGPU] Fix-up cases where writelane has 2 SGPR operands.

I don't actually understand why this code is where it is. Why is SIFixSGPRCopies doing this? To clarify, is this just an optimization? My initial reaction was that it was a fix, but looking at it again, it seems like an optimization to me.

Mon, Sep 17, 6:41 AM
arsenm committed rL342390: Fix vectorization of canonicalize.
Mon, Sep 17, 6:26 AM

Thu, Sep 13

arsenm added a comment to D51742: [AMDGPU] Fix regression with not maintaining MachineDominatorTree.

The regression was caught by the expensive_checks build, if that is sufficient for testing? I am actually not sure how to write a lit test for this, because opt -analyze will just re-calculate the DT; I don't know how to just print the DT after the legalize pass runs.

Thu, Sep 13, 10:09 PM
arsenm added a comment to D52060: AMDGPU: Add a fast path for icmp.i1(src, false, NE).

Should the instcombine part change also to allow creation of i1 uses?

Thu, Sep 13, 10:05 PM
arsenm added a comment to D51794: AMDGPU: Don't error on calls to null or undef.

Considering emitting traps requires fixing traps first, otherwise a program that should work will incorrectly trap

Thu, Sep 13, 5:19 AM
arsenm closed D52012: AMDGPU: Fix not preserving alignent in call setups.

r342133

Thu, Sep 13, 5:15 AM
arsenm committed rL342133: AMDGPU: Fix not preserving alignent in call setups.
Thu, Sep 13, 5:15 AM
arsenm committed rL342132: DAG: Fix expansion of unaligned FP loads and stores.
Thu, Sep 13, 5:15 AM
arsenm closed D52011: DAG: Fix expansion of unaligned FP loads and stores.

r342132

Thu, Sep 13, 5:15 AM
arsenm added inline comments to D51969: [AMDGPU] Add an AMDGPU specific atomic optimizer.
Thu, Sep 13, 5:15 AM · Restricted Project
arsenm added inline comments to D51969: [AMDGPU] Add an AMDGPU specific atomic optimizer.
Thu, Sep 13, 5:11 AM · Restricted Project
arsenm added a comment to D51969: [AMDGPU] Add an AMDGPU specific atomic optimizer.

Should include tests running only the IR pass

Thu, Sep 13, 5:10 AM · Restricted Project
arsenm committed rL342131: AMDGPU: Fix some outdated datalayouts in tests.
Thu, Sep 13, 4:57 AM
arsenm added a comment to D51925: [AMDGPU] Fix issue for zext of f16 to i32.

Looking again at the code - you're correct that it attempts to only do this transformation if the high bits are zero.
However, the code that checks this has the following telling comment:

// (i32 zext (i16 (bitcast f16:$src))) -> fp16_zext $src
// FIXME: It is not universally true that the high bits are zeroed on gfx9.
if (Src.getOpcode() == ISD::BITCAST) {
  SDValue BCSrc = Src.getOperand(0);
  if (BCSrc.getValueType() == MVT::f16 &&
      fp16SrcZerosHighBits(BCSrc.getOpcode()))
    return DCI.DAG.getNode(AMDGPUISD::FP16_ZEXT, SDLoc(N), VT, BCSrc);
}

In this particular case the BCSrc operation was an fptrunc which passes the fp16SrcZerosHighBits test - but that eventually ends up as v_mad_mixlo_f16 which doesn't ensure that the high bits are zero.

Any suggestions on how to proceed? I agree that it seems a shame to have to insert the extra AND operation blindly.

Thu, Sep 13, 4:40 AM
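
The concern in the D51925 comment above can be shown with a small bit-level model, which is not LLVM code. The helpers writeLow16 and zextLow16 are hypothetical names. The model assumes that an instruction such as v_mad_mixlo_f16 writes only the low 16 bits of its destination and leaves the upper half unchanged, so a zero-extension that skips the mask would carry over stale high bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an instruction that writes only the low half of a
// 32-bit register and keeps whatever was already in the high half.
constexpr uint32_t writeLow16(uint32_t OldReg, uint16_t Half) {
  return (OldReg & 0xFFFF0000u) | Half;
}

// Explicit zero-extension of the low 16 bits: the AND clears the high half
// regardless of what it held before.
constexpr uint32_t zextLow16(uint32_t Reg) {
  return Reg & 0xFFFFu;
}
```

With a destination that previously held 0xDEAD0000, writing the f16 bit pattern 0x3C00 through writeLow16 gives 0xDEAD3C00 in this model. Treating that register as already zero-extended would be wrong, while zextLow16 recovers 0x00003C00 at the cost of the extra AND that the comment would like to avoid.
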
arsenm added inline comments to D51933: [AMDGPU] Ensure trig range reduction only used for subtargets that require it.
Thu, Sep 13, 4:28 AM
arsenm added inline comments to D52024: LSV: Fix adjust alloca alignment trick for AMDGPU.
Thu, Sep 13, 4:26 AM
arsenm created D52024: LSV: Fix adjust alloca alignment trick for AMDGPU.
Thu, Sep 13, 4:24 AM
arsenm added inline comments to D52010: RegAllocFast: Rewrite and improve.
Thu, Sep 13, 2:45 AM
arsenm added inline comments to D51933: [AMDGPU] Ensure trig range reduction only used for subtargets that require it.
Thu, Sep 13, 2:01 AM
arsenm added a comment to D51932: [AMDGPU] Fix-up cases where writelane has 2 SGPR operands.

Should have a special check in the verifier

Thu, Sep 13, 1:55 AM
arsenm requested changes to D51932: [AMDGPU] Fix-up cases where writelane has 2 SGPR operands.
Thu, Sep 13, 1:54 AM
arsenm added inline comments to D52006: Copy utilities updated and added for MI flags.
Thu, Sep 13, 1:54 AM

Wed, Sep 12

arsenm added inline comments to D51933: [AMDGPU] Ensure trig range reduction only used for subtargets that require it.
Wed, Sep 12, 8:57 PM
arsenm created D52012: AMDGPU: Fix not preserving alignent in call setups.
Wed, Sep 12, 8:19 PM
arsenm added inline comments to D51975: [AMDGPU] Preliminary patch for divergence driven instruction selection. Load offset inlining pattern changed.
Wed, Sep 12, 8:19 PM
arsenm created D52011: DAG: Fix expansion of unaligned FP loads and stores.
Wed, Sep 12, 8:19 PM
arsenm added a comment to D51995: AMDGPU: Generate VALU ThreeOp Integer instructions.

Needs some edge case tests where the values are uniform but still need to be in a VGPR

Wed, Sep 12, 8:16 PM
arsenm added inline comments to D51947: [AMDGPU] Match udot8 pattern.
Wed, Sep 12, 8:14 PM

Tue, Sep 11

arsenm added inline comments to D51925: [AMDGPU] Fix issue for zext of f16 to i32.
Tue, Sep 11, 5:29 AM

Mon, Sep 10

arsenm added inline comments to D51474: Consider CSRs in computeRegisterLiveness.
Mon, Sep 10, 9:56 PM
arsenm committed rL341898: AMDGPU: Fix r600 test.
Mon, Sep 10, 9:40 PM
arsenm created D51909: AMDGPU: Fix annotate kernel features through casted calls.
Mon, Sep 10, 9:38 PM
arsenm closed D51845: AMDGPU: Remove leftovers from configurable address spaces.

r341895

Mon, Sep 10, 9:02 PM
arsenm committed rL341895: AMDGPU: Remove leftovers from configurable address spaces.
Mon, Sep 10, 9:02 PM
arsenm closed D51843: AMDGPU: Don't error on out of bounds address spaces.

r341894

Mon, Sep 10, 9:02 PM
arsenm committed rL341894: AMDGPU: Don't error on out of bounds address spaces.
Mon, Sep 10, 9:02 PM
arsenm updated the diff for D51736: DAG: Change behavior of fminnum/fmaxnum nodes.

Remove commented out code

Mon, Sep 10, 8:57 PM
arsenm added inline comments to D50200: AMDGPU: Handle "uniform-work-group-size" attribute.
Mon, Sep 10, 8:49 PM · Restricted Project
arsenm created D51857: HIP: Remove reference to irif.
Mon, Sep 10, 6:04 AM
arsenm closed D51796: AMDGPU: Remove function pointer type hack.

r341806

Mon, Sep 10, 5:17 AM
arsenm committed rL341806: AMDGPU: Remove function pointer type hack.
Mon, Sep 10, 5:17 AM
arsenm committed rL341803: AMDGPU: Stop reporting is-noop addrspacecast for constant 32-bit.
Mon, Sep 10, 5:02 AM
arsenm closed D51793: AMDGPU: Stop reporting is-noop addrspacecast for constant 32-bit.

r341803

Mon, Sep 10, 5:02 AM
arsenm closed D50572: DAG: Handle odd vector sizes in calling conv splitting.

r341801

Mon, Sep 10, 4:50 AM
arsenm committed rL341801: DAG: Handle odd vector sizes in calling conv splitting.
Mon, Sep 10, 4:50 AM

Sun, Sep 9

arsenm added a comment to D51737: DAG: Combine extract_vector_elt of concat_vectors.

Handling INSERT_SUBVECTOR might make sense as well?

Sun, Sep 9, 8:28 PM
arsenm added a dependency for D51845: AMDGPU: Remove leftovers from configurable address spaces: D51843: AMDGPU: Don't error on out of bounds address spaces.
Sun, Sep 9, 8:10 PM
arsenm added a dependent revision for D51843: AMDGPU: Don't error on out of bounds address spaces: D51845: AMDGPU: Remove leftovers from configurable address spaces.
Sun, Sep 9, 8:10 PM
arsenm created D51845: AMDGPU: Remove leftovers from configurable address spaces.
Sun, Sep 9, 8:09 PM
arsenm committed rL341770: AMDGPU: Fix tests using old number for constant address space.
Sun, Sep 9, 7:57 PM
arsenm closed D51797: AMDGPU: Fix tests using old number for constant address space.

r341770

Sun, Sep 9, 7:57 PM
arsenm committed rL341768: AMDGPU: Use GOT PSV since it has an address space now.
Sun, Sep 9, 7:26 PM
arsenm committed rL341767: AMDGPU: Don't abort on unknown addrspace argument.
Sun, Sep 9, 7:26 PM