Page MenuHomePhabricator

arsenm (Matt Arsenault)
User

Projects

User does not belong to any projects.

User Details

User Since
Dec 5 2012, 4:53 PM (314 w, 3 h)

Recent Activity

Today

arsenm added inline comments to D55568: AMDGPU: Don't peel of the offset if the resulting base could possibly be negative in Indirect addressing..
Wed, Dec 12, 7:21 PM
arsenm added a reviewer for D54615: RegisterCoalescer: Defer clearing implicit_def lanes: wmi.
Wed, Dec 12, 7:18 PM
arsenm added a reviewer for D54551: RegisterCoalescer: Assume CR_Replace for SubRangeJoin: wmi.
Wed, Dec 12, 7:17 PM
arsenm added inline comments to D55568: AMDGPU: Don't peel of the offset if the resulting base could possibly be negative in Indirect addressing..
Wed, Dec 12, 7:16 PM
arsenm created D55639: GlobalISel: Allow shift amount to be a different type.
Wed, Dec 12, 7:15 PM
arsenm added inline comments to D55570: [AMDGPU] Improve SDWA generation for V_OR_B32_E32..
Wed, Dec 12, 6:58 PM
arsenm accepted D55474: [AMDGPU] Extend constant folding for logical operations.

LGTM with another minor round of test reduction

Wed, Dec 12, 6:57 PM
arsenm created D55637: AMDGPU: Legalize/regbankselect frame_index.
Wed, Dec 12, 6:43 PM
arsenm created D55636: AMDGPU/GlobalISel: Legalize/regbankselect block_addr.
Wed, Dec 12, 6:42 PM
arsenm created D55635: AMDGPU: Legalize/regbankselect fma.
Wed, Dec 12, 6:42 PM
arsenm created D55634: AMDGPU/GlobalISel: Legalize/regbankselect fneg/fabs/fsub.
Wed, Dec 12, 6:41 PM
arsenm created D55633: AMDGPU/GlobalISel: Legalize f64 fadd/fmul.
Wed, Dec 12, 6:41 PM
arsenm created D55632: AMDGPU/GlobalISel: RegBankSelect some simple operations.
Wed, Dec 12, 6:41 PM
arsenm accepted D55402: [AMDGPU] Simplify negated condition.

LGTM. I think this is OK, but I think there still might be a more proper way to use the live intervals

Wed, Dec 12, 6:36 PM
arsenm added a comment to D55539: [AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. .

Can you try implementing the other approach first, and then applying this on top of it to show the difference more clearly?

Wed, Dec 12, 6:33 PM

Yesterday

arsenm added inline comments to D55402: [AMDGPU] Simplify negated condition.
Tue, Dec 11, 1:10 PM
arsenm added a comment to D55539: [AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. .

Why aren't these matched in the first place? These shouldn't have gotten this far

The offsets are promoted to immediate during the instruction selection if they are 13bit which is the allowed size for globals. This patch is trying to find a 13bit offset from base-address of the nearby instructions by recomputing the relative offset from nearby base address. So, it's more of a global optimization where it needs to traverse the whole program (currently it's limited to the basic block). The optimization also caches all the information to save the compile time. Are you suggesting to do this optimization during instruction selection phase? Wouldn't it be too complex to do during instruction selection?

Tue, Dec 11, 10:34 AM

Mon, Dec 10

arsenm added a comment to D55539: [AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. .

Why aren't these matched in the first place? These shouldn't have gotten this far

Mon, Dec 10, 9:20 PM
arsenm committed rC348809: Update test for instcombine change.
Update test for instcombine change
Mon, Dec 10, 3:06 PM
arsenm committed rL348809: Update test for instcombine change.
Update test for instcombine change
Mon, Dec 10, 3:06 PM
arsenm closed D55182: InstCombine: Scalarize single use icmp/fcmp.

r348801

Mon, Dec 10, 1:54 PM
arsenm committed rL348801: InstCombine: Scalarize single use icmp/fcmp.
InstCombine: Scalarize single use icmp/fcmp
Mon, Dec 10, 1:54 PM
arsenm accepted D50200: AMDGPU: Handle "uniform-work-group-size" attribute.

LGTM with the long line fixed

Mon, Dec 10, 12:17 PM · Restricted Project
arsenm added a comment to D54551: RegisterCoalescer: Assume CR_Replace for SubRangeJoin.

ping

Mon, Dec 10, 10:43 AM
arsenm added a comment to D54615: RegisterCoalescer: Defer clearing implicit_def lanes.

ping

Mon, Dec 10, 10:42 AM

Sun, Dec 9

arsenm added a comment to D55435: [AMDGPU] Fix discarded result of addAttribute.

The $ is supposed to be for check variables that are used globally and not cleared on a -LABEL

Sun, Dec 9, 8:21 AM
arsenm added a comment to D55435: [AMDGPU] Fix discarded result of addAttribute.

The $ is supposed to be for check variables that are used globally and not cleared on a -LABEL

Sun, Dec 9, 8:13 AM

Sat, Dec 8

arsenm added inline comments to D55474: [AMDGPU] Extend constant folding for logical operations.
Sat, Dec 8, 12:21 PM

Fri, Dec 7

arsenm accepted D55369: AMDGPU: Use an ABS32_LO relocation for SCRATCH_RSRC_DWORD1.

LGTM

Fri, Dec 7, 2:18 PM
arsenm added a comment to D55241: AMDGPU: Should always start from the first register in VGPR indexing..

No, it's not committed. Variable + constant is a common case in general.

Fri, Dec 7, 2:18 PM
arsenm committed rL348658: AMDGPU: Fix offsets for < 4-byte aggregate kernel arguments.
AMDGPU: Fix offsets for < 4-byte aggregate kernel arguments
Fri, Dec 7, 2:15 PM
arsenm closed D55453: AMDGPU: Fix offsets for < 4-byte aggregate kernel arguments.

r348658

Fri, Dec 7, 2:15 PM
arsenm committed rL348643: AMDGPU: Use gfx9 instead of gfx8 in a test.
AMDGPU: Use gfx9 instead of gfx8 in a test
Fri, Dec 7, 1:01 PM
arsenm created D55453: AMDGPU: Fix offsets for < 4-byte aggregate kernel arguments.
Fri, Dec 7, 12:55 PM
arsenm added inline comments to D55402: [AMDGPU] Simplify negated condition.
Fri, Dec 7, 11:21 AM
arsenm added a comment to D55179: AMDGPU: Remove v16i8 from register classes.

Can we also remove the setTruncStoreAction mention of v16i8 from SIISelLowering?

Fri, Dec 7, 10:48 AM
arsenm closed D55180: AMDGPU: Allow f32 types for llvm.amdgcn.s.buffer.load.

r348625

Fri, Dec 7, 10:44 AM
arsenm committed rL348625: AMDGPU: Allow f32 types for llvm.amdgcn.s.buffer.load.
AMDGPU: Allow f32 types for llvm.amdgcn.s.buffer.load
Fri, Dec 7, 10:44 AM
arsenm committed rL348619: AMDGPU: Remove llvm.SI.tbuffer.store.
AMDGPU: Remove llvm.SI.tbuffer.store
Fri, Dec 7, 10:06 AM
arsenm closed D55177: AMDGPU: Remove llvm.SI.tbuffer.store.

r348619

Fri, Dec 7, 10:06 AM
arsenm closed D55176: AMDGPU: Remove llvm.SI.buffer.load.dword.

r348616

Fri, Dec 7, 9:49 AM
arsenm committed rL348616: AMDGPU: Remove llvm.SI.buffer.load.dword.
AMDGPU: Remove llvm.SI.buffer.load.dword
Fri, Dec 7, 9:49 AM
arsenm committed rL348615: AMDGPU: Remove llvm.AMDGPU.kill.
AMDGPU: Remove llvm.AMDGPU.kill
Fri, Dec 7, 9:49 AM
arsenm closed D55175: AMDGPU: Remove llvm.AMDGPU.kill.

r348615

Fri, Dec 7, 9:49 AM

Thu, Dec 6

arsenm updated the diff for D53965: IR: Add fp operations to atomicrmw.

Move AtomicExpand change, make targets default to cmpxchg

Thu, Dec 6, 7:28 PM
arsenm added inline comments to D55402: [AMDGPU] Simplify negated condition.
Thu, Dec 6, 7:13 PM
arsenm accepted D55380: [AMDGPU] Shrink scalar AND, OR, XOR instructions.

LGTM

Thu, Dec 6, 6:55 PM
arsenm added inline comments to D55380: [AMDGPU] Shrink scalar AND, OR, XOR instructions.
Thu, Dec 6, 6:47 PM
arsenm added inline comments to D55380: [AMDGPU] Shrink scalar AND, OR, XOR instructions.
Thu, Dec 6, 12:19 PM
arsenm added inline comments to D55380: [AMDGPU] Shrink scalar AND, OR, XOR instructions.
Thu, Dec 6, 11:49 AM
arsenm added a comment to D55150: Emit warnings from the driver for use of -mllvm or -Xclang options..

I don't really see the point of this and think it will just be an inconvenience to llvm developers.

Thu, Dec 6, 10:53 AM
arsenm added inline comments to D53965: IR: Add fp operations to atomicrmw.
Thu, Dec 6, 10:14 AM
arsenm added inline comments to D53965: IR: Add fp operations to atomicrmw.
Thu, Dec 6, 10:02 AM
arsenm updated the diff for D52416: Allow FP types for atomicrmw xchg.

Add requested comment

Thu, Dec 6, 9:55 AM
arsenm added a comment to D52416: Allow FP types for atomicrmw xchg.

Looks reasonable.

Perhaps add support for pointer types too? Supporting the same set of types for load, store, xchg, and cmpxchg (eventually) would seem sensible.

Thu, Dec 6, 9:54 AM
arsenm added inline comments to D52416: Allow FP types for atomicrmw xchg.
Thu, Dec 6, 9:50 AM
arsenm added a comment to D52416: Allow FP types for atomicrmw xchg.

xxz

Thu, Dec 6, 9:44 AM
arsenm added a comment to D55301: RegAlloc: Allow targets to split register allocation.

Hi Matt,

Have you tried to use combined V+S register classes?
By describing such classes, when a S or V register would be split, they would eventually have constraints in that "super" class. Thus, inside of spilling, the splitting mechanism would naturally insert copies of the form [V|S] = copy V+S or V+S = copy [V|S], which seem to be what you are trying to achieve. The advantage of such approach is that we would not have to effectively split the allocation.

Cheers,
-Quentin

Thu, Dec 6, 9:24 AM
arsenm added inline comments to D52416: Allow FP types for atomicrmw xchg.
Thu, Dec 6, 9:05 AM
arsenm added a comment to D55369: AMDGPU: Use an ABS32_LO relocation for SCRATCH_RSRC_DWORD1.

I thought mesa was moving to stop using the relocations at all for this?

Thu, Dec 6, 9:03 AM

Wed, Dec 5

arsenm committed rL348456: InstCombine: Add some missing tests for scalarization.
InstCombine: Add some missing tests for scalarization
Wed, Dec 5, 7:37 PM
arsenm updated the diff for D55182: InstCombine: Scalarize single use icmp/fcmp.

Added more tests in r348456, use match

Wed, Dec 5, 7:37 PM
arsenm added a comment to D55067: [HIP] Fix offset of kernel argument for AMDGPU target.

I think if we can just declare something simple to follow that doesn't depend on the IR type alignment, we could pack any basic type and align any aggregates to 4

Wed, Dec 5, 2:36 PM
arsenm added inline comments to D52416: Allow FP types for atomicrmw xchg.
Wed, Dec 5, 12:51 PM
arsenm committed rL348385: AMDGPU: Fix using old address spaces in some tests.
AMDGPU: Fix using old address spaces in some tests
Wed, Dec 5, 9:37 AM
arsenm added a comment to D52416: Allow FP types for atomicrmw xchg.

ping

Wed, Dec 5, 9:25 AM
arsenm added a parent revision for D55333: VirtRegMap: Preserve LiveDebugVariables: D55301: RegAlloc: Allow targets to split register allocation.
Wed, Dec 5, 9:06 AM
arsenm added a child revision for D55301: RegAlloc: Allow targets to split register allocation: D55333: VirtRegMap: Preserve LiveDebugVariables.
Wed, Dec 5, 9:06 AM
arsenm created D55333: VirtRegMap: Preserve LiveDebugVariables.
Wed, Dec 5, 9:06 AM
arsenm accepted D55314: AMDGPU: Turn on the DPP combiner by default.

LGTM. Probably should remove the flag from the dpp_combine test

Wed, Dec 5, 7:13 AM

Tue, Dec 4

arsenm added parent revisions for D55301: RegAlloc: Allow targets to split register allocation: D55295: LiveIntervals: Add removePhysReg, D55238: MIR: Preserve incoming frame index numbers, D55287: VirtRegMap: Support partially allocated virtual registers, D55285: AMDGPU: Scavenge register instead of findUnusedReg, D55286: VirtRegMap: Add pass option to not clear virt regs, D55284: RegisterScavenger: Allow fail without spill, D55283: CodeGen: Refactor regallocator command line and target selection, D55282: CodeGen: Make RegAllocRegistry a template class.
Tue, Dec 4, 4:25 PM
arsenm added a child revision for D55282: CodeGen: Make RegAllocRegistry a template class: D55301: RegAlloc: Allow targets to split register allocation.
Tue, Dec 4, 4:25 PM
arsenm added a child revision for D55284: RegisterScavenger: Allow fail without spill: D55301: RegAlloc: Allow targets to split register allocation.
Tue, Dec 4, 4:25 PM
arsenm added a child revision for D55283: CodeGen: Refactor regallocator command line and target selection: D55301: RegAlloc: Allow targets to split register allocation.
Tue, Dec 4, 4:25 PM
arsenm added a child revision for D55286: VirtRegMap: Add pass option to not clear virt regs: D55301: RegAlloc: Allow targets to split register allocation.
Tue, Dec 4, 4:25 PM
arsenm added a child revision for D55285: AMDGPU: Scavenge register instead of findUnusedReg: D55301: RegAlloc: Allow targets to split register allocation.
Tue, Dec 4, 4:25 PM
arsenm added a child revision for D55295: LiveIntervals: Add removePhysReg: D55301: RegAlloc: Allow targets to split register allocation.
Tue, Dec 4, 4:25 PM
arsenm added a child revision for D55238: MIR: Preserve incoming frame index numbers: D55301: RegAlloc: Allow targets to split register allocation.
Tue, Dec 4, 4:25 PM
arsenm added a child revision for D55287: VirtRegMap: Support partially allocated virtual registers: D55301: RegAlloc: Allow targets to split register allocation.
Tue, Dec 4, 4:25 PM
arsenm created D55301: RegAlloc: Allow targets to split register allocation.
Tue, Dec 4, 4:24 PM
arsenm accepted D55102: [MachineLICM][X86][AMDGPU] Fix subtle bug in the updating of PhysRegClobbers in post-RA LICM.

LGTM

Tue, Dec 4, 2:21 PM
arsenm created D55295: LiveIntervals: Add removePhysReg.
Tue, Dec 4, 1:38 PM
arsenm updated the diff for D55238: MIR: Preserve incoming frame index numbers.

Attach correct patch

Tue, Dec 4, 12:35 PM
arsenm added a comment to D55067: [HIP] Fix offset of kernel argument for AMDGPU target.

I understand that it's copied into a properly-aligned local variable, but if it affects how the function is called, that's also part of the ABI, and it should be taken from the C alignment, not any vagaries of how struct types are translated into IR.

The other reason to stick with the C alignment value is that it's actually stable across compiler reasons, whereas IRGen does not promise to use the same IR type for a struct across compiler versions.

Now, you're a GPU compiler, so maybe you don't have any ABI requirements, or maybe they're weaker for kernel functions, in which case this is basically irrelevant. But if you do care about compatibility here, you should be aiming to communicate the alignment of this parameter correctly.

This is part of why we largely try to avoid using arbitrary first-class aggregate as parameters in IR.

byval isn't suitable for the purposes of the entry point here, so I don't see what the alternative is with the current constraints. We've defined the ABI input buffer as just the ABI allocation size/alignment of the IR type in order. Clang needs to get this to match what it wants.

What I have said, repeatedly, is that that is not a good ABI rule because the alignment of the IR type is not guaranteed to be meaningful or even stable. Maybe there are good reasons why you don't care about having a stable ABI, but I need you to at least acknowledge that that's what you're doing, because otherwise it is hard for me to approve a patch that I know has this problem.

Tue, Dec 4, 11:57 AM
arsenm updated the diff for D55182: InstCombine: Scalarize single use icmp/fcmp.

Rearrange code, apply on top of existing tests

Tue, Dec 4, 11:08 AM
arsenm added a comment to D55182: InstCombine: Scalarize single use icmp/fcmp.

I think this is the right change for IR (I'm trying to do something similar with D50992 but that's been held up while we try to get the backend in shape). .

  1. I see that the new code is copying the existing handling of binops around line 315, but can we update this to use 'match()' and reduce the duplication?
  2. There should be a test that shows the benefit of using 'isCheapToScalarize()'. IIUC, that would involve a sequence with multiple binops and/or compares between an insertelement and the trailing extract.
  3. Please commit the tests with baseline checks as a preliminary commit, so we only show the diffs in this patch.
Tue, Dec 4, 11:07 AM
arsenm committed rL348290: Move llc-start-stop-instance to x86.
Move llc-start-stop-instance to x86
Tue, Dec 4, 10:22 AM
arsenm created D55287: VirtRegMap: Support partially allocated virtual registers.
Tue, Dec 4, 10:10 AM
arsenm created D55286: VirtRegMap: Add pass option to not clear virt regs.
Tue, Dec 4, 10:04 AM
arsenm added a parent revision for D55285: AMDGPU: Scavenge register instead of findUnusedReg: D55284: RegisterScavenger: Allow fail without spill.
Tue, Dec 4, 10:02 AM
arsenm added a child revision for D55284: RegisterScavenger: Allow fail without spill: D55285: AMDGPU: Scavenge register instead of findUnusedReg.
Tue, Dec 4, 10:02 AM
arsenm created D55285: AMDGPU: Scavenge register instead of findUnusedReg.
Tue, Dec 4, 10:02 AM
arsenm created D55284: RegisterScavenger: Allow fail without spill.
Tue, Dec 4, 10:01 AM
arsenm created D55283: CodeGen: Refactor regallocator command line and target selection.
Tue, Dec 4, 10:00 AM
arsenm created D55282: CodeGen: Make RegAllocRegistry a template class.
Tue, Dec 4, 9:58 AM
arsenm closed D55178: AMDGPU: Add f32 vectors to SGPR register classes.

r348286

Tue, Dec 4, 9:54 AM
arsenm committed rL348286: AMDGPU: Add f32 vectors to SGPR register classes.
AMDGPU: Add f32 vectors to SGPR register classes
Tue, Dec 4, 9:54 AM
arsenm committed rL348285: MIR: Add method to stop after specific runs of passes.
MIR: Add method to stop after specific runs of passes
Tue, Dec 4, 9:48 AM
arsenm closed D55082: WIP: Try adding method to stop after specific runs of passes.

r348285

Tue, Dec 4, 9:48 AM
arsenm added a comment to D55067: [HIP] Fix offset of kernel argument for AMDGPU target.

I understand that it's copied into a properly-aligned local variable, but if it affects how the function is called, that's also part of the ABI, and it should be taken from the C alignment, not any vagaries of how struct types are translated into IR.

The other reason to stick with the C alignment value is that it's actually stable across compiler reasons, whereas IRGen does not promise to use the same IR type for a struct across compiler versions.

Now, you're a GPU compiler, so maybe you don't have any ABI requirements, or maybe they're weaker for kernel functions, in which case this is basically irrelevant. But if you do care about compatibility here, you should be aiming to communicate the alignment of this parameter correctly.

This is part of why we largely try to avoid using arbitrary first-class aggregate as parameters in IR.

Tue, Dec 4, 9:43 AM