Page MenuHomePhabricator

arsenm (Matt Arsenault)
User

Projects

User does not belong to any projects.

User Details

User Since
Dec 5 2012, 4:53 PM (411 w, 1 d)

Recent Activity

Yesterday

arsenm updated the diff for D89997: AMDGPU: Increase branch size estimate with offset bug.

Add another test

Thu, Oct 22, 7:57 PM · Restricted Project
arsenm requested review of D89997: AMDGPU: Increase branch size estimate with offset bug.
Thu, Oct 22, 5:30 PM · Restricted Project
arsenm accepted D88955: [AMDGPU] Add simplification/combines for llvm.amdgcn.fmul.legacy.
Thu, Oct 22, 11:54 AM · Restricted Project
arsenm committed rG549f326d3254: AMDGPU: Cleanup MIR test (authored by arsenm).
AMDGPU: Cleanup MIR test
Thu, Oct 22, 9:55 AM
arsenm added a comment to D89890: [Docs] Clarify that FunctionPasses can't add/remove declarations.

Ideally they'd be module passes, like in https://reviews.llvm.org/D86178.
It does look like some existing legacy and NPM function passes break this rule. Maybe I should revert and ask more explicitly on llvm-dev. WDYT?

Thu, Oct 22, 9:47 AM · Restricted Project
arsenm added inline comments to D89386: [AMDGPU] Fix access beyond the end of the basic block in execMayBeModifiedBeforeAnyUse..
Thu, Oct 22, 9:35 AM · Restricted Project
arsenm added a comment to D89890: [Docs] Clarify that FunctionPasses can't add/remove declarations.

How is this supposed to work practically? A huge number of functions passes want to add intrinsic declarations

Thu, Oct 22, 9:08 AM · Restricted Project
arsenm added inline comments to D86878: [AMDGPU] Fix a miscompile with S_ADD/S_SUB.
Thu, Oct 22, 9:02 AM · Restricted Project
arsenm accepted D89965: [AMDGPU] Fix expansion of i16 MULH.
Thu, Oct 22, 8:02 AM · Restricted Project
arsenm committed rGd5c056166798: AMDGPU: Fix not always reserving VGPRs used for SGPR spilling (authored by arsenm).
AMDGPU: Fix not always reserving VGPRs used for SGPR spilling
Thu, Oct 22, 7:30 AM
arsenm closed D89510: AMDGPU: Fix not always reserving VGPRs used for SGPR spilling.

d5c05616679894b3eb99194f1a3ffeef07c5cb19

Thu, Oct 22, 7:30 AM · Restricted Project
arsenm committed rGd3bcfe2a3602: AMDGPU: Implement getNoPreservedMask (authored by arsenm).
AMDGPU: Implement getNoPreservedMask
Thu, Oct 22, 7:18 AM
arsenm closed D89466: AMDGPU: Implement getNoPreservedMask.

d3bcfe2a3602e14606417d8bb9d6bbaf636d9a02

Thu, Oct 22, 7:17 AM · Restricted Project
arsenm closed D89482: ScheduleDAGInstrs: Skip debug instructions at end of scheduling region.

188df1742042610a4c9af1fff9943d3d2a2740c6

Thu, Oct 22, 7:17 AM · Restricted Project
arsenm committed rG188df1742042: ScheduleDAGInstrs: Skip debug instructions at end of scheduling region (authored by arsenm).
ScheduleDAGInstrs: Skip debug instructions at end of scheduling region
Thu, Oct 22, 7:17 AM
arsenm accepted D84779: [AMDGPU] Add amdgpu specific loop threshold metadata.
Thu, Oct 22, 7:00 AM · Restricted Project
arsenm added a comment to D89917: Revert "SimplifyCFG: Clean up optforfuzzing implementation".

Thank you for the patch!

I disagree with this change. SimplifyCFG should continue to pass in options for transform options as is done for the others. I don't think avoiding duplication between the new PM and code that will be deleted is worth it

I disagree here. Why is that beneficial, if we want to force-disable those transforms regardless of their previous configured status?

Because SimplifyCFG is a standalone utility, not just a pass. It's a pass's responsibility to ensure attributes are respected, not individual transformations. If I'm writing another control flow pass (e.g. a structurizer) and calling these utilities, the transformation options should not be limited based on this extremely poorly specified attribute.

Let's back-track here.
Are you saying that those folds should not disable themselves in presence of OptForFuzzing attribute,
or that they should be guarded by their options?
Because i agree with later, but disagree with former.

Thu, Oct 22, 6:59 AM · Restricted Project
arsenm added a comment to D89170: [AMDGPU] Use flat scratch instructions where available.

I also came to conclusion that the only robust way to have no failed scavenging during frame lowering is to always have an sp or fp. Otherwise it can fail regardless of the spilling method. The only other way is to have an instruction with full 32 bit immediate offset. I.e. it can fail in a kernel with MUBUF as well.

Thu, Oct 22, 6:35 AM · Restricted Project
arsenm added inline comments to D72939: [Schedule] Add a MultiHazardRecognizer.
Thu, Oct 22, 6:32 AM · Restricted Project
arsenm added inline comments to D72939: [Schedule] Add a MultiHazardRecognizer.
Thu, Oct 22, 6:32 AM · Restricted Project
arsenm added a comment to D83088: Introduce CfgTraits abstraction.

David, I don't think this is appropriate here. Let's take the discussion to llvm-dev.

Seems like David asked to revert in the meantime?

Thu, Oct 22, 6:28 AM · Restricted Project, Restricted Project, Restricted Project
arsenm added a comment to D89917: Revert "SimplifyCFG: Clean up optforfuzzing implementation".

Thank you for the patch!

I disagree with this change. SimplifyCFG should continue to pass in options for transform options as is done for the others. I don't think avoiding duplication between the new PM and code that will be deleted is worth it

I disagree here. Why is that beneficial, if we want to force-disable those transforms regardless of their previous configured status?

Thu, Oct 22, 6:24 AM · Restricted Project

Wed, Oct 21

arsenm requested changes to D89917: Revert "SimplifyCFG: Clean up optforfuzzing implementation".

I disagree with this change. SimplifyCFG should continue to pass in options for transform options as is done for the others. I don't think avoiding duplication between the new PM and code that will be deleted is worth it

Wed, Oct 21, 5:35 PM · Restricted Project
arsenm added inline comments to D87107: [AMDGPU] Target hook to apply target specific split constraint.
Wed, Oct 21, 2:47 PM · Restricted Project
arsenm added inline comments to D89900: [amdgpu] Enhance disjoint memory accesses checking..
Wed, Oct 21, 11:29 AM · Restricted Project
arsenm committed rG817cd3d191be: Fix missing c++ mode comment (authored by arsenm).
Fix missing c++ mode comment
Wed, Oct 21, 10:41 AM
arsenm accepted D89023: [NFC] Clean up always false variables.
Wed, Oct 21, 9:58 AM · Restricted Project
arsenm accepted D89599: [AMDGPU] Fixed v_swap_b32 match.
Wed, Oct 21, 9:57 AM · Restricted Project
arsenm added inline comments to D89599: [AMDGPU] Fixed v_swap_b32 match.
Wed, Oct 21, 9:08 AM · Restricted Project
arsenm closed D89805: AMDGPU: Lower the threshold reported for maximum stack size exceeded.

1ed4caff1d5cd49233c1ae7b9f6483a946ed5eea

Wed, Oct 21, 9:07 AM · Restricted Project
arsenm committed rG1ed4caff1d5c: AMDGPU: Lower the threshold reported for maximum stack size exceeded (authored by arsenm).
AMDGPU: Lower the threshold reported for maximum stack size exceeded
Wed, Oct 21, 9:07 AM
arsenm committed rG53c43431bc6a: AMDGPU: Propagate amdgpu-flat-work-group-size attributes (authored by arsenm).
AMDGPU: Propagate amdgpu-flat-work-group-size attributes
Wed, Oct 21, 9:07 AM
arsenm closed D89737: AMDGPU: Propagate amdgpu-flat-work-group-size attributes.

53c43431bc6a01ad1e29b9351450ac18d5270ab3

Wed, Oct 21, 9:06 AM · Restricted Project
arsenm added inline comments to D84779: [AMDGPU] Add amdgpu specific loop threshold metadata.
Wed, Oct 21, 8:03 AM · Restricted Project

Tue, Oct 20

arsenm accepted D89806: [AMDGPU] Remove getAllVGPR32() which cannot handle Accum VGPRs properly.
Tue, Oct 20, 9:58 AM · Restricted Project
arsenm requested review of D89805: AMDGPU: Lower the threshold reported for maximum stack size exceeded.
Tue, Oct 20, 9:40 AM · Restricted Project
arsenm added a comment to D89804: [AMDGPU] Fix off by one in assert.

Testcase?

Tue, Oct 20, 9:37 AM · Restricted Project

Mon, Oct 19

arsenm added inline comments to D89170: [AMDGPU] Use flat scratch instructions where available.
Mon, Oct 19, 3:30 PM · Restricted Project
arsenm accepted D89501: [AMDGPU] flat scratch ST addressing mode on gfx10.
Mon, Oct 19, 3:26 PM · Restricted Project
arsenm added inline comments to D89501: [AMDGPU] flat scratch ST addressing mode on gfx10.
Mon, Oct 19, 2:23 PM · Restricted Project
arsenm requested review of D89737: AMDGPU: Propagate amdgpu-flat-work-group-size attributes.
Mon, Oct 19, 2:04 PM · Restricted Project
arsenm committed rGae3625d7526f: Fix typo (authored by arsenm).
Fix typo
Mon, Oct 19, 12:37 PM
arsenm added inline comments to D89525: [amdgpu] Enhance AMDGPU AA..
Mon, Oct 19, 10:31 AM · Restricted Project
arsenm added a comment to D89610: AMDGPU: Fix missing read/writelane cases to skip with exec=0.

Adding writelane absolutely makes sense to me, but:

Additionally, since we emit the final encoded instructions, this wasn't really matching readlane either.

:(

We have a bunch of places where we _always_ use real instructions, and presumably those are fine. But if we let real instructions creep into places where we usually use pseudos, we end up making the backend less efficient because over time, everybody will have to check for pseudos and reals (like in this particular case here). Can we instead add some sort of check that those real instructions aren't used during CodeGen? (The MI verifier should be able to check for this, for example.)

Mon, Oct 19, 10:10 AM · Restricted Project
arsenm accepted D89706: [AMDGPU] Remove MUL_LOHI_U24/MUL_LOHI_I24.
Mon, Oct 19, 9:10 AM · Restricted Project
arsenm added inline comments to D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations.
Mon, Oct 19, 8:22 AM · Restricted Project

Sat, Oct 17

arsenm added a comment to D89599: [AMDGPU] Fixed v_swap_b32 match.

Couldn't this try to preserve the implicit operands?

Sat, Oct 17, 6:26 AM · Restricted Project

Fri, Oct 16

arsenm requested review of D89610: AMDGPU: Fix missing read/writelane cases to skip with exec=0.
Fri, Oct 16, 5:56 PM · Restricted Project
arsenm added a comment to D89582: clang/AMDGPU: Apply workgroup related attributes to all functions.

What if a device function is called by kernels with different work group sizes, will caller's work group size override callee's work group size?

It's user error to call a function with a larger range than the caller

The problem is that user can override default on a kernel with the attribute, but cannot do so on function. So a module can be compiled with a default smaller than requested on one of the kernels.

Fri, Oct 16, 12:35 PM
arsenm accepted D89568: [AMDGPU] Drop array size in AMDGCNGPUs and R600GPUs.
Fri, Oct 16, 12:13 PM · Restricted Project
arsenm added a comment to D89582: clang/AMDGPU: Apply workgroup related attributes to all functions.

What if a device function is called by kernels with different work group sizes, will caller's work group size override callee's work group size?

Fri, Oct 16, 12:12 PM
arsenm requested review of D89582: clang/AMDGPU: Apply workgroup related attributes to all functions.
Fri, Oct 16, 11:51 AM
arsenm added inline comments to D89568: [AMDGPU] Drop array size in AMDGCNGPUs and R600GPUs.
Fri, Oct 16, 11:39 AM · Restricted Project
arsenm added inline comments to D89568: [AMDGPU] Drop array size in AMDGCNGPUs and R600GPUs.
Fri, Oct 16, 11:03 AM · Restricted Project
arsenm closed D88973: IR: Fix issues with linking typed sret.

Merged into 0a7cd99a702595ccf73c957be0127af9f25fb9a2

Fri, Oct 16, 8:56 AM · Restricted Project
arsenm accepted D89558: [AMDGPU] Add new llvm.amdgcn.fma.legacy intrinsic.
Fri, Oct 16, 8:54 AM · Restricted Project
arsenm added a reverting change for rGeb9f7c28e5fe: Revert "OpaquePtr: Add type to sret attribute": rG0a7cd99a7025: Reapply "OpaquePtr: Add type to sret attribute".
Fri, Oct 16, 8:05 AM
arsenm committed rG0a7cd99a7025: Reapply "OpaquePtr: Add type to sret attribute" (authored by arsenm).
Reapply "OpaquePtr: Add type to sret attribute"
Fri, Oct 16, 8:05 AM
arsenm accepted D89077: [AMDGPU] Run hazard recognizer pass later.
Fri, Oct 16, 7:41 AM · Restricted Project
arsenm accepted D83088: Introduce CfgTraits abstraction.
Fri, Oct 16, 7:24 AM · Restricted Project, Restricted Project, Restricted Project
arsenm added a comment to D89480: [GlobalISel][Legalizer] Implement lower action for G_FSHL/G_FSHR.

Duplicate D76500

I somehow forgot about this open review. I'll abandon this then.
Is there anything we can help with to get D76500 integrated? Since a recent upstream change we are seeing G_FSHL's being generated, which our GISel-only backend cannot handle.

Fri, Oct 16, 7:23 AM · Restricted Project
arsenm accepted D89028: [GlobalISel] Add translation support for vector reduction intrinsics.
Fri, Oct 16, 7:22 AM · Restricted Project
arsenm added a comment to D89510: AMDGPU: Fix not always reserving VGPRs used for SGPR spilling.

Any tests?

Fri, Oct 16, 7:21 AM · Restricted Project
arsenm added a comment to D89388: [AMDGPU] Fix ieee mode default value.

Can you add a test that directly checks the mode setting in the output

Fri, Oct 16, 7:19 AM · Restricted Project
arsenm committed rGee6e25e4391a: llvm-reduce: Don't replace intrinsic calls with undef (authored by arsenm).
llvm-reduce: Don't replace intrinsic calls with undef
Fri, Oct 16, 7:11 AM
arsenm closed D88684: llvm-reduce: Don't replace intrinsic calls with undef.

ee6e25e4391a6d3ac0a3c89615474e512f44cda6

Fri, Oct 16, 7:11 AM · Restricted Project
arsenm added a comment to D88684: llvm-reduce: Don't replace intrinsic calls with undef.

Since we should catch these cases later on in reduceInstructionsDeltaPass() (please add a test), this should be fine i guess.

Fri, Oct 16, 6:55 AM · Restricted Project
arsenm committed rG952f43cb431c: llvm-reduce: Fix typo in status message (authored by arsenm).
llvm-reduce: Fix typo in status message
Fri, Oct 16, 6:48 AM
arsenm committed rGce16b6835bce: AMDGPU: Don't kill super-register with overlapping copy (authored by arsenm).
AMDGPU: Don't kill super-register with overlapping copy
Fri, Oct 16, 6:46 AM
arsenm closed D89502: AMDGPU: Don't kill super-register with overlapping copy.

ce16b6835bce18989e1dc0796305fe703e59ca4d

Fri, Oct 16, 6:46 AM · Restricted Project
arsenm accepted D89536: [AMDGPU] Do not generate S_CMP_LG_U64 on gfx7.

I don't see why this is emitting this compare in the first place, but this is just doing the same thing that's already done here

Fri, Oct 16, 6:41 AM · Restricted Project

Thu, Oct 15

arsenm added inline comments to D89525: [amdgpu] Enhance AMDGPU AA..
Thu, Oct 15, 8:01 PM · Restricted Project
arsenm requested review of D89510: AMDGPU: Fix not always reserving VGPRs used for SGPR spilling.
Thu, Oct 15, 4:13 PM · Restricted Project
arsenm added inline comments to D89502: AMDGPU: Don't kill super-register with overlapping copy.
Thu, Oct 15, 3:32 PM · Restricted Project
arsenm added inline comments to D89502: AMDGPU: Don't kill super-register with overlapping copy.
Thu, Oct 15, 3:18 PM · Restricted Project
arsenm requested review of D89502: AMDGPU: Don't kill super-register with overlapping copy.
Thu, Oct 15, 3:07 PM · Restricted Project
arsenm added a comment to D89424: [AMDGPU] Spilling using flat scratch.

A fantastic result that it has passed PSDB on gfx9 with flat scratch enabled. I did not expect it to start working from the first attempt.

Thu, Oct 15, 3:04 PM · Restricted Project
arsenm added a comment to D89170: [AMDGPU] Use flat scratch instructions where available.

This will change the ABI, so I don't think belongs as a subtarget property

Thu, Oct 15, 11:57 AM · Restricted Project
arsenm requested review of D89482: ScheduleDAGInstrs: Skip debug instructions at end of scheduling region.
Thu, Oct 15, 9:25 AM · Restricted Project
arsenm added a comment to D89480: [GlobalISel][Legalizer] Implement lower action for G_FSHL/G_FSHR.

Duplicate D76500

Thu, Oct 15, 9:23 AM · Restricted Project
arsenm requested review of D89466: AMDGPU: Implement getNoPreservedMask.
Thu, Oct 15, 7:19 AM · Restricted Project
arsenm closed D89434: AMDGPU: Fix verifier error on killed spill of partially undef register.

663f16684d1a5f129874b7be9e1f486682977bfa

Thu, Oct 15, 7:02 AM · Restricted Project
arsenm committed rG663f16684d1a: AMDGPU: Fix verifier error on killed spill of partially undef register (authored by arsenm).
AMDGPU: Fix verifier error on killed spill of partially undef register
Thu, Oct 15, 6:59 AM

Wed, Oct 14

arsenm requested review of D89434: AMDGPU: Fix verifier error on killed spill of partially undef register.
Wed, Oct 14, 4:31 PM · Restricted Project
arsenm added a comment to D88774: Add disassembly counter after disasembly line.

Please work on removing this feature from the backend instead. It's long past time to stop handling this in the compiler

There are several shader debugger tools/situation dependent on this feature.
e.g. one line like this "s_andn2_b32 vcc_lo, vcc_lo, 0 ;000974: 8A6A806A"
1). we feed the disassembly count "0x000974" to the shader debugger tool to get this line of code execution ,register input and output.
2).Also there is some other shader dump tool,e.g. umr. or windbg which output a stream of hardware codes, we can use the hardware code to find matched part of disassembly lines.
3). we also have one shader replacement tool , to drop and replace part of the elf hardware code, if we have full disassembly line with counter and hardware code, it would be handy where to edit hardware code.

Hope it explains.

Wed, Oct 14, 4:16 PM · Restricted Project
arsenm added inline comments to D88684: llvm-reduce: Don't replace intrinsic calls with undef.
Wed, Oct 14, 1:42 PM · Restricted Project
arsenm updated the diff for D88684: llvm-reduce: Don't replace intrinsic calls with undef.

Remove requires, and change FunctionCount

Wed, Oct 14, 1:27 PM · Restricted Project
arsenm added a comment to D88684: llvm-reduce: Don't replace intrinsic calls with undef.

The test tieing to AMDGPU seems less ideal for general functionality. I've seen llvm.dbg.value caused the same problem, maybe we could use that?

The debug intrinsics require a *lot* of metadata overhead simply to satsify the operations. The other metadata using intrinsics are experimental, so this seemed like the least fragile way to test it

Given the functionality isn't implemented in terms of metadata-using-intrinsics in particular, a test using some non-metadata-using intrinsic would be fine.

I think only the callee matters here. If anyone makes changes to llvm-reduce and not build AMDGPU, there is a chance (should be small I guess) to regress this.

Developers should just always build all targets. We also have buildbots for this

Not necessarily - https://llvm.org/docs/GettingStarted.html suggests narrowing the target list in a few places, to speed up iteration time. So making tests more portable across targets is valuable - also means buildbots will get to your tests sooner (as the test will be run on more configurations, increasing the chances it'll be run on a buildbot that's running faster/picking up your changes sooner).

I disagree with this (maybe it's OK for frontend work), and you should run all target tests before committing backend changes. For this test it doesn't actually require the target so I've just dropped the REQUIRES

Wed, Oct 14, 1:27 PM · Restricted Project
arsenm closed D88857: InstCombine: Fix losing load properties in copy-constant-to-alloca.

6a9484f4bf6c9136f6679ab64a18c11464fd20ca

Wed, Oct 14, 9:56 AM · Restricted Project
arsenm closed D88851: InstCombine: Fix infinite loop in copy-constant-to-alloca transform.

6da31fa4a61d68af21dfa1e144e726ed6d77903e

Wed, Oct 14, 9:55 AM · Restricted Project
arsenm committed rG6a9484f4bf6c: InstCombine: Fix losing load properties in copy-constant-to-alloca (authored by arsenm).
InstCombine: Fix losing load properties in copy-constant-to-alloca
Wed, Oct 14, 9:55 AM
arsenm committed rG6da31fa4a61d: InstCombine: Fix infinite loop in copy-constant-to-alloca transform (authored by arsenm).
InstCombine: Fix infinite loop in copy-constant-to-alloca transform
Wed, Oct 14, 9:55 AM
arsenm added a comment to D89392: [GlobalISel] Fold unary opcodes in CSEMIRBuilder.

Automatically folding legalization artifacts makes me a bit nervous. What happens if you want to legalize a constant by widening, and the legalizer wants to insert the trunc from a widened constant?

Wed, Oct 14, 7:11 AM · Restricted Project

Tue, Oct 13

arsenm added a comment to D89023: [NFC] Clean up always false variables.

Was there anything that ever set these?

Tue, Oct 13, 12:20 PM · Restricted Project
arsenm added inline comments to D88684: llvm-reduce: Don't replace intrinsic calls with undef.
Tue, Oct 13, 10:30 AM · Restricted Project
arsenm added a comment to D88684: llvm-reduce: Don't replace intrinsic calls with undef.

The test tieing to AMDGPU seems less ideal for general functionality. I've seen llvm.dbg.value caused the same problem, maybe we could use that?

The debug intrinsics require a *lot* of metadata overhead simply to satsify the operations. The other metadata using intrinsics are experimental, so this seemed like the least fragile way to test it

Given the functionality isn't implemented in terms of metadata-using-intrinsics in particular, a test using some non-metadata-using intrinsic would be fine.

I think only the callee matters here. If anyone makes changes to llvm-reduce and not build AMDGPU, there is a chance (should be small I guess) to regress this.

Developers should just always build all targets. We also have buildbots for this

Tue, Oct 13, 8:00 AM · Restricted Project
arsenm added a comment to D88684: llvm-reduce: Don't replace intrinsic calls with undef.

The test tieing to AMDGPU seems less ideal for general functionality. I've seen llvm.dbg.value caused the same problem, maybe we could use that?

The debug intrinsics require a *lot* of metadata overhead simply to satsify the operations. The other metadata using intrinsics are experimental, so this seemed like the least fragile way to test it

Given the functionality isn't implemented in terms of metadata-using-intrinsics in particular, a test using some non-metadata-using intrinsic would be fine.

I think only the callee matters here. If anyone makes changes to llvm-reduce and not build AMDGPU, there is a chance (should be small I guess) to regress this.

Tue, Oct 13, 7:58 AM · Restricted Project
arsenm added a comment to D88684: llvm-reduce: Don't replace intrinsic calls with undef.

The test tieing to AMDGPU seems less ideal for general functionality. I've seen llvm.dbg.value caused the same problem, maybe we could use that?

The debug intrinsics require a *lot* of metadata overhead simply to satsify the operations. The other metadata using intrinsics are experimental, so this seemed like the least fragile way to test it

Given the functionality isn't implemented in terms of metadata-using-intrinsics in particular, a test using some non-metadata-using intrinsic would be fine.

Tue, Oct 13, 7:54 AM · Restricted Project
arsenm added a comment to D89316: [AMDGPU][GlobalISel] Compute known bits for zero-extending loads.

This could do better by checking the range metadata on the memory operand, but I don't think that's wired into the IR->MIR lowering now (if we ever even attach this to intrinsic calls)

Tue, Oct 13, 7:38 AM · Restricted Project
arsenm accepted D89316: [AMDGPU][GlobalISel] Compute known bits for zero-extending loads.
Tue, Oct 13, 7:37 AM · Restricted Project