
arsenm (Matt Arsenault)
User

Projects

User does not belong to any projects.

User Details

User Since
Dec 5 2012, 4:53 PM (407 w, 3 d)

Recent Activity

Yesterday

arsenm added inline comments to D87936: [GISel] Add new combines for G_ADD.
Fri, Sep 25, 1:29 PM · Restricted Project
arsenm requested review of D88325: IR: Reject unsized sret in verifier.
Fri, Sep 25, 11:28 AM · Restricted Project
arsenm committed rG55c4ff91bd82: OpaquePtr: Add type to sret attribute (authored by arsenm).
OpaquePtr: Add type to sret attribute
Fri, Sep 25, 11:07 AM
arsenm closed D88241: OpaquePtr: Add type to sret attribute.

55c4ff91bd820d72014f63dcf7f3d5a0d3397986

Fri, Sep 25, 11:07 AM · Restricted Project
arsenm accepted D88246: [AMDGPU] Add bfi immediate pattern.
Fri, Sep 25, 10:57 AM · Restricted Project
arsenm added inline comments to D88246: [AMDGPU] Add bfi immediate pattern.
Fri, Sep 25, 10:56 AM · Restricted Project
arsenm added inline comments to D88315: [AMDGPU] Do not generate mul with 1 in AMDGPU Atomic Optimizer.
Fri, Sep 25, 10:55 AM · Restricted Project
arsenm added a reverting change for rG42bfa7c63b85: Revert rGe55410f8b260 : "AArch64/GlobalISel: Add testcase for bug 47619": rG6cb0d23f2ea6: AArch64/GlobalISel: Narrow stack passed argument access size.
Fri, Sep 25, 10:36 AM
arsenm committed rG6cb0d23f2ea6: AArch64/GlobalISel: Narrow stack passed argument access size (authored by arsenm).
AArch64/GlobalISel: Narrow stack passed argument access size
Fri, Sep 25, 10:36 AM
arsenm closed D88306: AArch64/GlobalISel: Narrow stack passed argument access size.

6cb0d23f2ea6fb25106b0380797ccbc2141d71e1

Fri, Sep 25, 10:35 AM · Restricted Project
arsenm added a reverting change for rG42bfa7c63b85: Revert rGe55410f8b260 : "AArch64/GlobalISel: Add testcase for bug 47619": D88306: AArch64/GlobalISel: Narrow stack passed argument access size.
Fri, Sep 25, 7:51 AM
arsenm requested review of D88306: AArch64/GlobalISel: Narrow stack passed argument access size.
Fri, Sep 25, 7:51 AM · Restricted Project

Thu, Sep 24

arsenm committed rGe55410f8b260: AArch64/GlobalISel: Add testcase for bug 47619 (authored by arsenm).
AArch64/GlobalISel: Add testcase for bug 47619
Thu, Sep 24, 12:44 PM
arsenm committed rGe75afc9acf9b: GlobalISel: Use unmerge when copying wide vectors to result registers (authored by arsenm).
GlobalISel: Use unmerge when copying wide vectors to result registers
Thu, Sep 24, 12:20 PM
arsenm closed D87699: GlobalISel: Use unmerge when copying wide vectors to result registers.

e75afc9acf9b6de511c0c90b8e8a06364de46e3e

Thu, Sep 24, 12:20 PM · Restricted Project
arsenm accepted D87847: [AMDGPU] global-isel support for RT.

LGTM. Code should get better when we start trying to optimize bit ops

Thu, Sep 24, 10:12 AM · Restricted Project
arsenm added inline comments to D88246: [AMDGPU] Add bfi immediate pattern.
Thu, Sep 24, 10:08 AM · Restricted Project
arsenm accepted D88245: [AMDGPU] Make bfi patterns divergence-aware.

LGTM (although I think readfirstlane is the same cost as a regular copy)

Thu, Sep 24, 10:07 AM · Restricted Project
arsenm accepted D88244: [AMDGPU] Split R600 and GCN bfi patterns.

LGTM, I've been meaning to do this. I do think some of these should be moved to combines and are missing hasOneUse checks though

Thu, Sep 24, 10:05 AM · Restricted Project
arsenm requested review of D88241: OpaquePtr: Add type to sret attribute.
Thu, Sep 24, 9:41 AM · Restricted Project
arsenm committed rGdc08185ca797: IR: Have byref imply dereferenceable (authored by arsenm).
IR: Have byref imply dereferenceable
Thu, Sep 24, 6:57 AM
arsenm closed D88165: IR: Have byref imply dereferenceable.

dc08185ca797a3bcd7721a0d55db876a6cc4de10

Thu, Sep 24, 6:57 AM · Restricted Project
arsenm committed rGd65a7003c435: OpaquePtr: Add helpers for sret to mirror byval (authored by arsenm).
OpaquePtr: Add helpers for sret to mirror byval
Thu, Sep 24, 6:57 AM
arsenm added inline comments to D88165: IR: Have byref imply dereferenceable.
Thu, Sep 24, 6:47 AM · Restricted Project
arsenm closed D88159: OpaquePtr: Add helpers for sret to mirror byval.

a07759d04ab0f44c462b362df4df4d9a3a2f2b89

Thu, Sep 24, 6:44 AM · Restricted Project
arsenm accepted D88206: [AMDGPU] Fix v3f16 handling for getresinfo.
Thu, Sep 24, 6:37 AM · Restricted Project

Wed, Sep 23

arsenm added inline comments to D88191: [AArch64][GlobalISel] Use custom legalization for G_TRUNC for v8i8 vectors.
Wed, Sep 23, 4:43 PM · Restricted Project
arsenm added inline comments to D88060: [GISel]: Few InsertVecElt combines.
Wed, Sep 23, 3:23 PM · Restricted Project
arsenm added inline comments to D88060: [GISel]: Few InsertVecElt combines.
Wed, Sep 23, 2:42 PM · Restricted Project
arsenm added inline comments to D84287: [SelectionDAG][GISel] Make LegalizeDAG lower FNEG using integer ops..
Wed, Sep 23, 1:53 PM · Restricted Project
arsenm requested changes to D84779: [AMDGPU] Add amdgpu specific loop threshold metadata.

With the current subject, I don't think this would have received the appropriate attention. I think it should be reposted with a more generic subject so that people more familiar with loop metadata will see it.

Wed, Sep 23, 10:42 AM · Restricted Project
arsenm requested review of D88165: IR: Have byref imply dereferenceable.
Wed, Sep 23, 10:05 AM · Restricted Project
arsenm added a comment to D87674: [AMDGPU] Insert waitcnt after returning from call.

This could also just check the specific return opcodes (we could maybe even remove the isReturn from the shader pseudo-return)

Wed, Sep 23, 10:04 AM · Restricted Project
arsenm added a comment to D87674: [AMDGPU] Insert waitcnt after returning from call.

Yes, this commit is incorrect. It completely breaks code linking in Mesa OpenGL. s_waitcnt is required at the end of all global functions that return values.

Please revert. @nhaehnle

I don't understand why it would fail. This patch just moves s_waitcnt to the caller, so it would be executed anyway. I think I am missing something. It would help to root-cause this if we can isolate it to a small test case.

Wed, Sep 23, 9:50 AM · Restricted Project
arsenm requested review of D88159: OpaquePtr: Add helpers for sret to mirror byval.
Wed, Sep 23, 9:23 AM · Restricted Project
arsenm accepted D85653: [GlobalISel][AMDGPU] Lower G_SMULH/G_UMULH.
Wed, Sep 23, 6:10 AM · Restricted Project
arsenm closed D88031: GlobalISel: Fix truncating shift amount in trunc (shl) combine.

c463fd136ec259ec269ee6741763ce595811da71

Wed, Sep 23, 6:09 AM · Restricted Project
arsenm closed D87864: AMDGPU: Check global FP atomics match default FP mode.

af0207f2bae8578c5283877a786e502ce6e33b14

Wed, Sep 23, 6:09 AM · Restricted Project
arsenm added a comment to D87699: GlobalISel: Use unmerge when copying wide vectors to result registers.

ping

Wed, Sep 23, 6:08 AM · Restricted Project
arsenm committed rGc463fd136ec2: GlobalISel: Fix truncating shift amount in trunc (shl) combine (authored by arsenm).
GlobalISel: Fix truncating shift amount in trunc (shl) combine
Wed, Sep 23, 6:08 AM
arsenm committed rGaf0207f2bae8: AMDGPU: Check global FP atomics match default FP mode (authored by arsenm).
AMDGPU: Check global FP atomics match default FP mode
Wed, Sep 23, 6:08 AM
arsenm added inline comments to D88122: [GlobalISel] Add widenScalar support for G_CONCAT_VECTORS and use for legalization v2s32 G_ICMPs.
Wed, Sep 23, 5:53 AM · Restricted Project
arsenm added inline comments to D87748: [AMDGPU] Consider all SGPR uses as unique in constant bus verify.
Wed, Sep 23, 5:48 AM · Restricted Project
arsenm added inline comments to D88060: [GISel]: Few InsertVecElt combines.
Wed, Sep 23, 5:48 AM · Restricted Project
arsenm accepted D87744: [RegisterCoalescer] passs Undefs to extendToIndices().

LGTM with test converted to generated checks

Wed, Sep 23, 5:46 AM · Restricted Project
arsenm added inline comments to D88120: [GlobalISel] Add artifact combine for trunc(concat_vectors(a, ...) -> concat_vectors(trunc(a), ...).
Wed, Sep 23, 5:44 AM · Restricted Project
arsenm accepted D84287: [SelectionDAG][GISel] Make LegalizeDAG lower FNEG using integer ops..
Wed, Sep 23, 5:42 AM · Restricted Project

Tue, Sep 22

arsenm added inline comments to D88060: [GISel]: Few InsertVecElt combines.
Tue, Sep 22, 5:11 PM · Restricted Project
arsenm added inline comments to D88060: [GISel]: Few InsertVecElt combines.
Tue, Sep 22, 2:28 PM · Restricted Project
arsenm added inline comments to D88060: [GISel]: Few InsertVecElt combines.
Tue, Sep 22, 2:25 PM · Restricted Project
arsenm added inline comments to D87748: [AMDGPU] Consider all SGPR uses as unique in constant bus verify.
Tue, Sep 22, 10:31 AM · Restricted Project
arsenm added inline comments to D85653: [GlobalISel][AMDGPU] Lower G_SMULH/G_UMULH.
Tue, Sep 22, 7:13 AM · Restricted Project
arsenm accepted D87939: [PeepholeOptimizer] Enhance the redundant COPY elimination..
Tue, Sep 22, 6:36 AM · Restricted Project
arsenm added inline comments to D85653: [GlobalISel][AMDGPU] Lower G_SMULH/G_UMULH.
Tue, Sep 22, 6:19 AM · Restricted Project

Mon, Sep 21

arsenm committed rG6daddc213fe5: AMDGPU: Don't add frame register to frame pseudos (authored by arsenm).
AMDGPU: Don't add frame register to frame pseudos
Mon, Sep 21, 1:23 PM
arsenm closed D87934: AMDGPU: Don't add frame register to frame pseudos.

6daddc213fe56dccf1e88de61065c7fee09deccf

Mon, Sep 21, 1:22 PM · Restricted Project
arsenm added a reverting change for rGdbd53a1f0c93: Temporarily Revert "RegAllocFast: Rewrite and improve": rG55f9f87da2c2: Reapply Revert "RegAllocFast: Rewrite and improve".
Mon, Sep 21, 12:45 PM
arsenm committed rG55f9f87da2c2: Reapply Revert "RegAllocFast: Rewrite and improve" (authored by arsenm).
Reapply Revert "RegAllocFast: Rewrite and improve"
Mon, Sep 21, 12:45 PM
arsenm added a comment to D88028: [AMDGPU] More codegen patterns for v2i16/v2f16 build_vector.

Can this appear later in codegen? It also does not cover GlobalISel, so the part in operand folding probably needs to remain in addition to the patterns.

Mon, Sep 21, 10:02 AM · Restricted Project
arsenm requested review of D88031: GlobalISel: Fix truncating shift amount in trunc (shl) combine.
Mon, Sep 21, 9:36 AM · Restricted Project
arsenm added a comment to D88028: [AMDGPU] More codegen patterns for v2i16/v2f16 build_vector.

All of this constant folding is really a DAG workaround

Mon, Sep 21, 8:46 AM · Restricted Project
arsenm added a comment to D87972: [OldPM] Pass manager: run SROA after (simple) loop unrolling.

I assume this makes 1f4e7463b5e3ff654c84371527767830e51db10d redundant?

Mon, Sep 21, 5:54 AM · Restricted Project, Restricted Project
arsenm added a reviewer for D87972: [OldPM] Pass manager: run SROA after (simple) loop unrolling: hliao.
Mon, Sep 21, 5:53 AM · Restricted Project, Restricted Project

Fri, Sep 18

arsenm accepted D87948: [LoopSimplifyCFG][NewPM] Rename simplify-cfg -> loop-simplifycfg.
Fri, Sep 18, 4:11 PM · Restricted Project
arsenm added inline comments to D87947: [AMDGPU] Make ds fp atomics overloadable.
Fri, Sep 18, 4:11 PM · Restricted Project, Restricted Project
arsenm added inline comments to D87936: [GISel] Add new combines for G_ADD.
Fri, Sep 18, 4:10 PM · Restricted Project
arsenm added inline comments to D87847: [AMDGPU] global-isel support for RT.
Fri, Sep 18, 4:09 PM · Restricted Project
arsenm added inline comments to D87947: [AMDGPU] Make ds fp atomics overloadable.
Fri, Sep 18, 4:07 PM · Restricted Project, Restricted Project
arsenm added inline comments to D87936: [GISel] Add new combines for G_ADD.
Fri, Sep 18, 1:46 PM · Restricted Project
arsenm added inline comments to D87870: [GISel] Add new combines for G_FMUL.
Fri, Sep 18, 1:38 PM · Restricted Project
arsenm requested review of D87934: AMDGPU: Don't add frame register to frame pseudos.
Fri, Sep 18, 1:06 PM · Restricted Project
arsenm added a comment to rGc3492a1aa1b9: [amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel..

They serve different purposes. A cross-register-bank COPY is purely for value propagation, as it must be materialized. A coalescable COPY (between compatible register classes), however, exists to satisfy architectural constraints on different instructions, and a better register assignment should be found to remove it if possible. For the former, redundant copies need to be eliminated; the latter needs to be skipped by most optimizations that lack comprehensive target-specific information. They need to be treated very differently.

Fri, Sep 18, 12:19 PM
arsenm closed D52010: RegAllocFast: Rewrite and improve.

c8757ff3aa7dd7a25a6343f6ef74a70c7be04325

Fri, Sep 18, 11:08 AM · Restricted Project
arsenm closed D87760: CodeGen: Move split block utility to MachineBasicBlock.

3105d0f84bfa6b765bb88cbf090f557e588764ea

Fri, Sep 18, 11:08 AM · Restricted Project
arsenm committed rG3105d0f84bfa: CodeGen: Move split block utility to MachineBasicBlock (authored by arsenm).
CodeGen: Move split block utility to MachineBasicBlock
Fri, Sep 18, 11:06 AM
arsenm committed rGc8757ff3aa7d: RegAllocFast: Rewrite and improve (authored by arsenm).
RegAllocFast: Rewrite and improve
Fri, Sep 18, 11:06 AM
arsenm added a reverting change for rGa21387c65470: Revert "RegAllocFast: Record internal state based on register units": rG870fd53e4f63: Reapply "RegAllocFast: Record internal state based on register units".
Fri, Sep 18, 11:06 AM
arsenm committed rG870fd53e4f63: Reapply "RegAllocFast: Record internal state based on register units" (authored by arsenm).
Reapply "RegAllocFast: Record internal state based on register units"
Fri, Sep 18, 11:05 AM
arsenm closed D87542: AMDGPU: Don't sometimes allow instructions before lowered si_end_cf.

0576f436e577cede25810729aef236ec8c649446

Fri, Sep 18, 10:43 AM · Restricted Project
arsenm committed rG0576f436e577: AMDGPU: Don't sometimes allow instructions before lowered si_end_cf (authored by arsenm).
AMDGPU: Don't sometimes allow instructions before lowered si_end_cf
Fri, Sep 18, 10:43 AM
arsenm added a comment to D87882: [AMDGPU] Fix merging m0 inits.

About a year ago I was working on eliminating this code completely, but I guess I won't be getting back to this any time soon.

Fri, Sep 18, 10:33 AM · Restricted Project
arsenm added inline comments to D87903: [CSInfo][GlobalISel] CallSiteInfo support when using GlobalISel.
Fri, Sep 18, 10:30 AM · debug-info, Restricted Project
arsenm added inline comments to D87903: [CSInfo][GlobalISel] CallSiteInfo support when using GlobalISel.
Fri, Sep 18, 10:29 AM · debug-info, Restricted Project
arsenm added inline comments to D86294: AMDGPU/GlobalISel: Add tablegen operator that looks through copies.
Fri, Sep 18, 10:15 AM · Restricted Project
arsenm added reviewers for D86294: AMDGPU/GlobalISel: Add tablegen operator that looks through copies: dsanders, paquette, aemerson, aditya_nandakumar.
Fri, Sep 18, 10:15 AM · Restricted Project
arsenm added a comment to rGc3492a1aa1b9: [amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel..

Can you revert this? I think that this transform itself is a workaround, and even if it were a good idea, I think it doesn't belong in another loop over the function in finalizeLowering.

Could you elaborate on why that would be a workaround? Basically, after instruction selection, the COPY from SGPR to VGPR should be lowered to a native instruction.

Because this should be done after register allocation, as is already done. Replacing a copy with something else would only interfere with generic optimizations.

That would be too late for MachineCSE and other optimizations to remove the redundant COPYs and reduce register usage. Moving that after RA won't reduce register pressure.

So you are working around something MachineCSE isn't doing on copies. You could just have MachineCSE do this for these copies.

MachineCSE and other optimizations are designed not to handle the target-independent COPY or the like, which is added with the intention that the source and destination operands may be coalesced and the COPY eventually removed. As SGPR and VGPR are different register banks and won't be coalesced anyway, a native instruction should be used instead.

This does not make sense; COPY is what generic optimizations do understand. If it's useful to CSE cross-bank copies, MachineCSE should handle them. As a representational choice, not-copy is worse than copy.

A target-independent cross-bank COPY is definitely useful, but the current COPY should not be used for that purpose, considering how it's used in RA-related passes, especially where only the architectural constraints change between the source and destination operands and propagation should be stopped.

Fri, Sep 18, 10:13 AM
arsenm added inline comments to D85653: [GlobalISel][AMDGPU] Lower G_SMULH/G_UMULH.
Fri, Sep 18, 9:40 AM · Restricted Project
arsenm added a comment to D87864: AMDGPU: Check global FP atomics match default FP mode.

I think you need to drop denorm checks and move the check outside of the address space check.

Fri, Sep 18, 8:55 AM · Restricted Project
arsenm added a comment to rGc3492a1aa1b9: [amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel..

Can you revert this? I think that this transform itself is a workaround, and even if it were a good idea, I think it doesn't belong in another loop over the function in finalizeLowering.

Could you elaborate on why that would be a workaround? Basically, after instruction selection, the COPY from SGPR to VGPR should be lowered to a native instruction.

Because this should be done after register allocation, as is already done. Replacing a copy with something else would only interfere with generic optimizations.

That would be too late for MachineCSE and other optimizations to remove the redundant COPYs and reduce register usage. Moving that after RA won't reduce register pressure.

So you are working around something MachineCSE isn't doing on copies. You could just have MachineCSE do this for these copies.

MachineCSE and other optimizations are designed not to handle the target-independent COPY or the like, which is added with the intention that the source and destination operands may be coalesced and the COPY eventually removed. As SGPR and VGPR are different register banks and won't be coalesced anyway, a native instruction should be used instead.

This does not make sense; COPY is what generic optimizations do understand. If it's useful to CSE cross-bank copies, MachineCSE should handle them. As a representational choice, not-copy is worse than copy.

I ran into this before and apparently peephole optimizer should handle this: https://groups.google.com/g/llvm-dev/c/a4jKBqCJIDM/m/BRu3sWopBAAJ

Fri, Sep 18, 7:48 AM
arsenm added a comment to D87542: AMDGPU: Don't sometimes allow instructions before lowered si_end_cf.

ping

Fri, Sep 18, 7:04 AM · Restricted Project
arsenm updated the diff for D87864: AMDGPU: Check global FP atomics match default FP mode.

Add comment

Fri, Sep 18, 7:02 AM · Restricted Project
arsenm committed rG751a6c5760b8: IR: Move denormal mode parsing from MachineFunction to Function (authored by arsenm).
IR: Move denormal mode parsing from MachineFunction to Function
Fri, Sep 18, 6:57 AM
arsenm closed D87866: IR: Move denormal mode parsing from MachineFunction to Function.

751a6c5760b8de591cf241effbdad1b1cae67814

Fri, Sep 18, 6:56 AM · Restricted Project
arsenm added a comment to D87902: [GlobalISel] Fix enumeration of entry basic blocks when using GlobalISel.

Why is a new entry block added in the first place?

Fri, Sep 18, 6:52 AM · debug-info, Restricted Project
arsenm committed rG05c02eda4552: emacs: Add nofree and willreturn to list of attributes (authored by arsenm).
emacs: Add nofree and willreturn to list of attributes
Fri, Sep 18, 6:49 AM
arsenm added a comment to rGc3492a1aa1b9: [amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel..

I've reverted this in 27df1652709ba83d6b07f313297e7c796e36dce1

Fri, Sep 18, 6:49 AM
arsenm added a reverting change for rGc3492a1aa1b9: [amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel.: rG27df1652709b: Revert "[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel.".
Fri, Sep 18, 6:49 AM
arsenm committed rG27df1652709b: Revert "[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel." (authored by arsenm).
Revert "[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel."
Fri, Sep 18, 6:49 AM
arsenm added a reverting change for D87556: [amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel.: rG27df1652709b: Revert "[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel.".
Fri, Sep 18, 6:49 AM · Restricted Project
arsenm added inline comments to D87866: IR: Move denormal mode parsing from MachineFunction to Function.
Fri, Sep 18, 6:16 AM · Restricted Project