Page MenuHomePhabricator
Feed Advanced Search

Yesterday

arsenm added a parent revision for D55695: AMDGPU/GlobalISel: RegBankSelect for add/sub with carry out: D55693: AMDGPU/GlobalISel: Fix ValueMapping tables for i1.
Fri, Dec 14, 12:36 AM
arsenm added a child revision for D55693: AMDGPU/GlobalISel: Fix ValueMapping tables for i1: D55695: AMDGPU/GlobalISel: RegBankSelect for add/sub with carry out.
Fri, Dec 14, 12:36 AM
arsenm created D55695: AMDGPU/GlobalISel: RegBankSelect for add/sub with carry out.
Fri, Dec 14, 12:36 AM
arsenm added a comment to D55539: [AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. .

Can you try implementing the other approach first, and then applying this on top of it to show the difference more clearly?

Sure, we can implement the other approach. But it will take some time.

And, I am not seeing any reason why this one needs to wait on the other one. If the benefit of this approach is not clear to you, I can give you an example extracted from a real world application. This approach takes into account of available base computations, therefore, can optimize away more computations.

Fri, Dec 14, 12:33 AM
arsenm created D55694: GlobalISel: Improve crash on invalid mapping.
Fri, Dec 14, 12:24 AM
arsenm added inline comments to D55693: AMDGPU/GlobalISel: Fix ValueMapping tables for i1.
Fri, Dec 14, 12:17 AM
arsenm created D55693: AMDGPU/GlobalISel: Fix ValueMapping tables for i1.
Fri, Dec 14, 12:16 AM

Thu, Dec 13

arsenm added a comment to D55652: [RegAllocGreedy] IMPLICIT_DEF values shouldn't prefer registers.

LGTM

Thu, Dec 13, 1:20 PM
arsenm committed rL349081: AMDGPU/GlobalISel: Legalize/regbankselect block_addr.
AMDGPU/GlobalISel: Legalize/regbankselect block_addr
Thu, Dec 13, 12:37 PM
arsenm closed D55636: AMDGPU/GlobalISel: Legalize/regbankselect block_addr.

r349081

Thu, Dec 13, 12:37 PM
arsenm updated the diff for D55634: AMDGPU/GlobalISel: Legalize/regbankselect fneg/fabs/fsub.

Fix missing copy in test

Thu, Dec 13, 12:17 PM
arsenm committed rL349014: AMDGPU/GlobalISel: Legalize f64 fadd/fmul.
AMDGPU/GlobalISel: Legalize f64 fadd/fmul
Thu, Dec 13, 12:31 AM
arsenm closed D55633: AMDGPU/GlobalISel: Legalize f64 fadd/fmul.

r349014

Thu, Dec 13, 12:30 AM
arsenm committed rL349013: Fix missing C++ mode comment in header.
Fix missing C++ mode comment in header
Thu, Dec 13, 12:27 AM
arsenm committed rL349012: AMDGPU/GlobalISel: RegBankSelect some simple operations.
AMDGPU/GlobalISel: RegBankSelect some simple operations
Thu, Dec 13, 12:27 AM
arsenm closed D55632: AMDGPU/GlobalISel: RegBankSelect some simple operations.

r349012

Thu, Dec 13, 12:27 AM
arsenm committed rL349011: AMDGPU/GlobalISel: Test cleanups.
AMDGPU/GlobalISel: Test cleanups
Thu, Dec 13, 12:15 AM

Wed, Dec 12

arsenm added a parent revision for D55645: AMDGPU/GlobalISel: Handle legality/regbanks for 32/64-bit shifts: D55639: GlobalISel: Allow shift amount to be a different type.
Wed, Dec 12, 11:12 PM
arsenm created D55645: AMDGPU/GlobalISel: Handle legality/regbanks for 32/64-bit shifts.
Wed, Dec 12, 11:12 PM
arsenm added inline comments to D55568: AMDGPU: Don't peel of the offset if the resulting base could possibly be negative in Indirect addressing..
Wed, Dec 12, 7:21 PM
arsenm added a reviewer for D54615: RegisterCoalescer: Defer clearing implicit_def lanes: wmi.
Wed, Dec 12, 7:18 PM
arsenm added a reviewer for D54551: RegisterCoalescer: Assume CR_Replace for SubRangeJoin: wmi.
Wed, Dec 12, 7:17 PM
arsenm added inline comments to D55568: AMDGPU: Don't peel of the offset if the resulting base could possibly be negative in Indirect addressing..
Wed, Dec 12, 7:16 PM
arsenm created D55639: GlobalISel: Allow shift amount to be a different type.
Wed, Dec 12, 7:15 PM
arsenm added inline comments to D55570: [AMDGPU] Improve SDWA generation for V_OR_B32_E32..
Wed, Dec 12, 6:58 PM
arsenm accepted D55474: [AMDGPU] Extend constant folding for logical operations.

LGTM with another minor round of test reduction

Wed, Dec 12, 6:57 PM
arsenm created D55637: AMDGPU: Legalize/regbankselect frame_index.
Wed, Dec 12, 6:43 PM
arsenm created D55636: AMDGPU/GlobalISel: Legalize/regbankselect block_addr.
Wed, Dec 12, 6:42 PM
arsenm created D55635: AMDGPU: Legalize/regbankselect fma.
Wed, Dec 12, 6:42 PM
arsenm created D55634: AMDGPU/GlobalISel: Legalize/regbankselect fneg/fabs/fsub.
Wed, Dec 12, 6:41 PM
arsenm created D55633: AMDGPU/GlobalISel: Legalize f64 fadd/fmul.
Wed, Dec 12, 6:41 PM
arsenm created D55632: AMDGPU/GlobalISel: RegBankSelect some simple operations.
Wed, Dec 12, 6:41 PM
arsenm accepted D55402: [AMDGPU] Simplify negated condition.

LGTM. I think this is OK, but I think there still might be a more proper way to use the live intervals

Wed, Dec 12, 6:36 PM
arsenm added a comment to D55539: [AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. .

Can you try implementing the other approach first, and then applying this on top of it to show the difference more clearly?

Wed, Dec 12, 6:33 PM

Tue, Dec 11

arsenm added inline comments to D55402: [AMDGPU] Simplify negated condition.
Tue, Dec 11, 1:10 PM
arsenm added a comment to D55539: [AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. .

Why aren't these matched in the first place? These shouldn't have gotten this far

The offsets are promoted to immediate during the instruction selection if they are 13bit which is the allowed size for globals. This patch is trying to find a 13bit offset from base-address of the nearby instructions by recomputing the relative offset from nearby base address. So, it's more of a global optimization where it needs to traverse the whole program (currently it's limited to the basic block). The optimization also caches all the information to save the compile time. Are you suggesting to do this optimization during instruction selection phase? Wouldn't it be too complex to do during instruction selection?

Tue, Dec 11, 10:34 AM

Mon, Dec 10

arsenm added a comment to D55539: [AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. .

Why aren't these matched in the first place? These shouldn't have gotten this far

Mon, Dec 10, 9:20 PM
arsenm committed rC348809: Update test for instcombine change.
Update test for instcombine change
Mon, Dec 10, 3:06 PM
arsenm committed rL348809: Update test for instcombine change.
Update test for instcombine change
Mon, Dec 10, 3:06 PM
arsenm closed D55182: InstCombine: Scalarize single use icmp/fcmp.

r348801

Mon, Dec 10, 1:54 PM
arsenm committed rL348801: InstCombine: Scalarize single use icmp/fcmp.
InstCombine: Scalarize single use icmp/fcmp
Mon, Dec 10, 1:54 PM
arsenm accepted D50200: AMDGPU: Handle "uniform-work-group-size" attribute.

LGTM with the long line fixed

Mon, Dec 10, 12:17 PM ยท Restricted Project
arsenm added a comment to D54551: RegisterCoalescer: Assume CR_Replace for SubRangeJoin.

ping

Mon, Dec 10, 10:43 AM
arsenm added a comment to D54615: RegisterCoalescer: Defer clearing implicit_def lanes.

ping

Mon, Dec 10, 10:42 AM

Sun, Dec 9

arsenm added a comment to D55435: [AMDGPU] Fix discarded result of addAttribute.

The $ is supposed to be for check variables that are used globally and not cleared on a -LABEL

Sun, Dec 9, 8:21 AM
arsenm added a comment to D55435: [AMDGPU] Fix discarded result of addAttribute.

The $ is supposed to be for check variables that are used globally and not cleared on a -LABEL

Sun, Dec 9, 8:13 AM

Sat, Dec 8

arsenm added inline comments to D55474: [AMDGPU] Extend constant folding for logical operations.
Sat, Dec 8, 12:21 PM

Fri, Dec 7

arsenm accepted D55369: AMDGPU: Use an ABS32_LO relocation for SCRATCH_RSRC_DWORD1.

LGTM

Fri, Dec 7, 2:18 PM
arsenm added a comment to D55241: AMDGPU: Should always start from the first register in VGPR indexing..

No, it's not committed. Variable + constant is a common case in general.

Fri, Dec 7, 2:18 PM
arsenm committed rL348658: AMDGPU: Fix offsets for < 4-byte aggregate kernel arguments.
AMDGPU: Fix offsets for < 4-byte aggregate kernel arguments
Fri, Dec 7, 2:15 PM
arsenm closed D55453: AMDGPU: Fix offsets for < 4-byte aggregate kernel arguments.

r348658

Fri, Dec 7, 2:15 PM
arsenm committed rL348643: AMDGPU: Use gfx9 instead of gfx8 in a test.
AMDGPU: Use gfx9 instead of gfx8 in a test
Fri, Dec 7, 1:01 PM
arsenm created D55453: AMDGPU: Fix offsets for < 4-byte aggregate kernel arguments.
Fri, Dec 7, 12:55 PM
arsenm added inline comments to D55402: [AMDGPU] Simplify negated condition.
Fri, Dec 7, 11:21 AM
arsenm added a comment to D55179: AMDGPU: Remove v16i8 from register classes.

Can we also remove the setTruncStoreAction mention of v16i8 from SIISelLowering?

Fri, Dec 7, 10:48 AM
arsenm closed D55180: AMDGPU: Allow f32 types for llvm.amdgcn.s.buffer.load.

r348625

Fri, Dec 7, 10:44 AM
arsenm committed rL348625: AMDGPU: Allow f32 types for llvm.amdgcn.s.buffer.load.
AMDGPU: Allow f32 types for llvm.amdgcn.s.buffer.load
Fri, Dec 7, 10:44 AM
arsenm committed rL348619: AMDGPU: Remove llvm.SI.tbuffer.store.
AMDGPU: Remove llvm.SI.tbuffer.store
Fri, Dec 7, 10:06 AM
arsenm closed D55177: AMDGPU: Remove llvm.SI.tbuffer.store.

r348619

Fri, Dec 7, 10:06 AM
arsenm closed D55176: AMDGPU: Remove llvm.SI.buffer.load.dword.

r348616

Fri, Dec 7, 9:49 AM
arsenm committed rL348616: AMDGPU: Remove llvm.SI.buffer.load.dword.
AMDGPU: Remove llvm.SI.buffer.load.dword
Fri, Dec 7, 9:49 AM
arsenm committed rL348615: AMDGPU: Remove llvm.AMDGPU.kill.
AMDGPU: Remove llvm.AMDGPU.kill
Fri, Dec 7, 9:49 AM
arsenm closed D55175: AMDGPU: Remove llvm.AMDGPU.kill.

r348615

Fri, Dec 7, 9:49 AM

Thu, Dec 6

arsenm updated the diff for D53965: IR: Add fp operations to atomicrmw.

Move AtomicExpand change, make targets default to cmpxchg

Thu, Dec 6, 7:28 PM
arsenm added inline comments to D55402: [AMDGPU] Simplify negated condition.
Thu, Dec 6, 7:13 PM
arsenm accepted D55380: [AMDGPU] Shrink scalar AND, OR, XOR instructions.

LGTM

Thu, Dec 6, 6:55 PM
arsenm added inline comments to D55380: [AMDGPU] Shrink scalar AND, OR, XOR instructions.
Thu, Dec 6, 6:47 PM
arsenm added inline comments to D55380: [AMDGPU] Shrink scalar AND, OR, XOR instructions.
Thu, Dec 6, 12:19 PM
arsenm added inline comments to D55380: [AMDGPU] Shrink scalar AND, OR, XOR instructions.
Thu, Dec 6, 11:49 AM
arsenm added a comment to D55150: Emit warnings from the driver for use of -mllvm or -Xclang options..

I don't really see the point of this and think it will just be an inconvenience to llvm developers.

Thu, Dec 6, 10:53 AM
arsenm added inline comments to D53965: IR: Add fp operations to atomicrmw.
Thu, Dec 6, 10:14 AM
arsenm added inline comments to D53965: IR: Add fp operations to atomicrmw.
Thu, Dec 6, 10:02 AM
arsenm updated the diff for D52416: Allow FP types for atomicrmw xchg.

Add requested comment

Thu, Dec 6, 9:55 AM
arsenm added a comment to D52416: Allow FP types for atomicrmw xchg.

Looks reasonable.

Perhaps add support for pointer types too? Supporting the same set of types for load, store, xchg, and cmpxchg (eventually) would seem sensible.

Thu, Dec 6, 9:54 AM
arsenm added inline comments to D52416: Allow FP types for atomicrmw xchg.
Thu, Dec 6, 9:50 AM
arsenm added a comment to D52416: Allow FP types for atomicrmw xchg.

xxz

Thu, Dec 6, 9:44 AM
arsenm added a comment to D55301: RegAlloc: Allow targets to split register allocation.

Hi Matt,

Have you tried to use combined V+S register classes?
By describing such classes, when a S or V register would be split, they would eventually have constraints in that "super" class. Thus, inside of spilling, the splitting mechanism would naturally insert copies of the form [V|S] = copy V+S or V+S = copy [V|S], which seem to be what you are trying to achieve. The advantage of such approach is that we would not have to effectively split the allocation.

Cheers,
-Quentin

Thu, Dec 6, 9:24 AM
arsenm added inline comments to D52416: Allow FP types for atomicrmw xchg.
Thu, Dec 6, 9:05 AM
arsenm added a comment to D55369: AMDGPU: Use an ABS32_LO relocation for SCRATCH_RSRC_DWORD1.

I thought mesa was moving to stop using the relocations at all for this?

Thu, Dec 6, 9:03 AM

Wed, Dec 5

arsenm committed rL348456: InstCombine: Add some missing tests for scalarization.
InstCombine: Add some missing tests for scalarization
Wed, Dec 5, 7:37 PM
arsenm updated the diff for D55182: InstCombine: Scalarize single use icmp/fcmp.

Added more tests in r348456, use match

Wed, Dec 5, 7:37 PM
arsenm added a comment to D55067: [HIP] Fix offset of kernel argument for AMDGPU target.

I think if we can just declare something simple to follow that doesn't depend on the IR type alignment, we could pack any basic type and align any aggregates to 4

Wed, Dec 5, 2:36 PM
arsenm added inline comments to D52416: Allow FP types for atomicrmw xchg.
Wed, Dec 5, 12:51 PM
arsenm committed rL348385: AMDGPU: Fix using old address spaces in some tests.
AMDGPU: Fix using old address spaces in some tests
Wed, Dec 5, 9:37 AM
arsenm added a comment to D52416: Allow FP types for atomicrmw xchg.

ping

Wed, Dec 5, 9:25 AM
arsenm added a parent revision for D55333: VirtRegMap: Preserve LiveDebugVariables: D55301: RegAlloc: Allow targets to split register allocation.
Wed, Dec 5, 9:06 AM
arsenm added a child revision for D55301: RegAlloc: Allow targets to split register allocation: D55333: VirtRegMap: Preserve LiveDebugVariables.
Wed, Dec 5, 9:06 AM
arsenm created D55333: VirtRegMap: Preserve LiveDebugVariables.
Wed, Dec 5, 9:06 AM
arsenm accepted D55314: AMDGPU: Turn on the DPP combiner by default.

LGTM. Probably should remove the flag from the dpp_combine test

Wed, Dec 5, 7:13 AM

Tue, Dec 4

arsenm added parent revisions for D55301: RegAlloc: Allow targets to split register allocation: D55295: LiveIntervals: Add removePhysReg, D55238: MIR: Preserve incoming frame index numbers, D55287: VirtRegMap: Support partially allocated virtual registers, D55285: AMDGPU: Scavenge register instead of findUnusedReg, D55286: VirtRegMap: Add pass option to not clear virt regs, D55284: RegisterScavenger: Allow fail without spill, D55283: CodeGen: Refactor regallocator command line and target selection, D55282: CodeGen: Make RegAllocRegistry a template class.
Tue, Dec 4, 4:25 PM
arsenm added a child revision for D55282: CodeGen: Make RegAllocRegistry a template class: D55301: RegAlloc: Allow targets to split register allocation.
Tue, Dec 4, 4:25 PM
arsenm added a child revision for D55284: RegisterScavenger: Allow fail without spill: D55301: RegAlloc: Allow targets to split register allocation.
Tue, Dec 4, 4:25 PM
arsenm added a child revision for D55283: CodeGen: Refactor regallocator command line and target selection: D55301: RegAlloc: Allow targets to split register allocation.
Tue, Dec 4, 4:25 PM
arsenm added a child revision for D55286: VirtRegMap: Add pass option to not clear virt regs: D55301: RegAlloc: Allow targets to split register allocation.
Tue, Dec 4, 4:25 PM
arsenm added a child revision for D55285: AMDGPU: Scavenge register instead of findUnusedReg: D55301: RegAlloc: Allow targets to split register allocation.
Tue, Dec 4, 4:25 PM
arsenm added a child revision for D55295: LiveIntervals: Add removePhysReg: D55301: RegAlloc: Allow targets to split register allocation.
Tue, Dec 4, 4:25 PM
arsenm added a child revision for D55238: MIR: Preserve incoming frame index numbers: D55301: RegAlloc: Allow targets to split register allocation.
Tue, Dec 4, 4:25 PM
arsenm added a child revision for D55287: VirtRegMap: Support partially allocated virtual registers: D55301: RegAlloc: Allow targets to split register allocation.
Tue, Dec 4, 4:25 PM
arsenm created D55301: RegAlloc: Allow targets to split register allocation.
Tue, Dec 4, 4:24 PM
arsenm accepted D55102: [MachineLICM][X86][AMDGPU] Fix subtle bug in the updating of PhysRegClobbers in post-RA LICM.

LGTM

Tue, Dec 4, 2:21 PM