
mareko (Marek Olšák)
User

Projects

User does not belong to any projects.

User Details

User Since
Aug 17 2015, 4:04 AM (174 w, 1 d)

Recent Activity

Wed, Nov 28

mareko added inline comments to D55030: [AMDGPU] Fold brcond (setcc zext(i1 x), 1, ne) -> brcond x.
Wed, Nov 28, 5:04 PM

Mon, Nov 26

mareko updated the diff for D52944: AMDGPU: Add llvm.amdgcn.ds.ordered.add & swap.

Use report_fatal_error, add SourceOfDivergence lines

Mon, Nov 26, 8:40 PM
mareko added a comment to D52944: AMDGPU: Add llvm.amdgcn.ds.ordered.add & swap.

Any comments?

Mon, Nov 26, 8:33 PM
mareko updated the diff for D52060: AMDGPU: Add a fast path for icmp.i1(src, false, NE).

Add InstCombine tests.

Mon, Nov 26, 8:30 PM

Tue, Nov 20

mareko added inline comments to D52060: AMDGPU: Add a fast path for icmp.i1(src, false, NE).
Tue, Nov 20, 3:18 PM

Nov 12 2018

mareko added a comment to D52944: AMDGPU: Add llvm.amdgcn.ds.ordered.add & swap.

If I make m0 integer, DS_ORDERED_COUNT won't be a mem node.

Nov 12 2018, 9:14 PM
mareko added a comment to D52944: AMDGPU: Add llvm.amdgcn.ds.ordered.add & swap.

I might change the intrinsic to add the option to insert "s_and_saveexec s[N:M], 1" and "s_mov_b64 exec, s[N:M]" around the intrinsic to get an optimal single-lane block.

Nov 12 2018, 7:48 PM
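
To illustrate the exec-mask bracketing mentioned in the comment above, here is a small Python model. It is only a sketch under assumptions: a 64-lane wave, `s_and_saveexec` taken to save the old EXEC value and then AND EXEC with its source operand, and hypothetical helper names (`s_and_saveexec`, `run_single_lane`) that are not part of LLVM or the patch.

```python
WAVE_SIZE = 64  # assumption: 64 lanes per wave


def s_and_saveexec(exec_mask, src):
    """Model of s_and_saveexec: return (saved EXEC, new EXEC = EXEC & src)."""
    saved = exec_mask
    return saved, exec_mask & src


def run_single_lane(exec_mask, body):
    """Run `body(lane)` with EXEC narrowed to lane 0, then restore EXEC.

    Hypothetical helper: mirrors wrapping an operation in
    "s_and_saveexec s[N:M], 1" ... "s_mov_b64 exec, s[N:M]".
    """
    saved, narrowed = s_and_saveexec(exec_mask, 1)
    for lane in range(WAVE_SIZE):
        if (narrowed >> lane) & 1:
            body(lane)
    # Models "s_mov_b64 exec, s[N:M]": the saved mask becomes EXEC again.
    return saved


if __name__ == "__main__":
    executed = []
    restored = run_single_lane(0xFFFF, executed.append)
    print(executed, hex(restored))
```

In this model the body runs at most once per wave, and not at all when lane 0 was already inactive, while the original mask is handed back unchanged afterwards.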
mareko added a comment to D52944: AMDGPU: Add llvm.amdgcn.ds.ordered.add & swap.

Does this need to be marked as isSourceOfDivergence?

Now that you mention it, yes, even though it's for stupid reasons: I believe the ds_ordered_count instruction executes only in a single lane, so it's intuitively a uniform operation; however, it returns its result only in lane 0, so it's formally non-uniform.

Nov 12 2018, 7:43 PM
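
The "formally non-uniform" point above can be made concrete with a toy Python model of per-lane values. Everything here is an assumption made for illustration: a 64-lane wave, a uniformity check that simply compares all lanes, and hypothetical function names; it does not reflect how LLVM's divergence analysis is implemented.

```python
WAVE_SIZE = 64  # assumption: 64 lanes per wave


def is_uniform(values):
    """A value is treated as uniform here if every lane holds the same value."""
    return all(v == values[0] for v in values)


def ordered_count_result(result, previous):
    """Model an instruction that writes its result to lane 0 only.

    All other lanes keep whatever the destination register held before.
    """
    return [result] + list(previous[1:])


def read_first_lane(values):
    """Model of v_readfirstlane: broadcast lane 0's value to every lane."""
    return [values[0]] * len(values)


if __name__ == "__main__":
    stale = list(range(WAVE_SIZE))  # arbitrary leftover register contents
    out = ordered_count_result(42, stale)
    print(is_uniform(out), is_uniform(read_first_lane(out)))
```

Under these assumptions the raw result differs between lanes even though only one lane did any work, and broadcasting lane 0 is what makes it uniform again.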
mareko abandoned D33865: Mark llvm.*annotation intrinsics as NoMem and Speculatable.
Nov 12 2018, 7:38 PM
mareko abandoned D52907: AMDGPU: Don't merge DS opcodes on SI to fix corruption in Hitman.

Why exactly? Is it possible the Mesa LDS allocation isn’t aligning properly?

Since the issue only occurs on SI, I don't think Mesa is doing anything bad. Unless there is some LDS hw difference on SI...

I do remember one bug we have that may be related. We try to use the ds_read2_b32 with 4-byte signed trick on SI, without checking that we can use the offsets if the base address isn't known to be positive.

Nov 12 2018, 7:37 PM

Nov 2 2018

mareko added a comment to D54042: [AMDGPU] Extend the SI Load/Store optimizer to combine more things..

I'm concerned that x8 and x16 loads will significantly increase SGPR usage and therefore SGPR spilling. We have a shader database with over 70 games and benchmarks and I guess the results will not be good after this is committed.

Nov 2 2018, 1:39 PM

Oct 8 2018

mareko added a comment to D52907: AMDGPU: Don't merge DS opcodes on SI to fix corruption in Hitman.

Why exactly? Is it possible the Mesa LDS allocation isn’t aligning properly?

Oct 8 2018, 6:25 PM
mareko added inline comments to D52944: AMDGPU: Add llvm.amdgcn.ds.ordered.add & swap.
Oct 8 2018, 2:59 PM
mareko added inline comments to D52944: AMDGPU: Add llvm.amdgcn.ds.ordered.add & swap.
Oct 8 2018, 2:56 PM
mareko added inline comments to D52944: AMDGPU: Add llvm.amdgcn.ds.ordered.add & swap.
Oct 8 2018, 8:14 AM

Oct 5 2018

mareko created D52944: AMDGPU: Add llvm.amdgcn.ds.ordered.add & swap.
Oct 5 2018, 1:34 PM
mareko added a comment to D52907: AMDGPU: Don't merge DS opcodes on SI to fix corruption in Hitman.

Multi-dword LDS opcodes seem to be the culprit.

Oct 5 2018, 1:34 PM
mareko updated the diff for D52060: AMDGPU: Add a fast path for icmp.i1(src, false, NE).

AMDGPU: Add a fast path for icmp.i1(src, false, NE)

Oct 5 2018, 1:31 PM
mareko added a comment to D52907: AMDGPU: Don't merge DS opcodes on SI to fix corruption in Hitman.

This isn't a correct fix. If there's an issue with 64-bit DS instructions, it's a lowering problem. If we can't use them for some reason, changing this here might be a helpful heuristic, but as-is this is not a real fix.

Oct 5 2018, 10:59 AM

Oct 4 2018

mareko created D52907: AMDGPU: Don't merge DS opcodes on SI to fix corruption in Hitman.
Oct 4 2018, 2:18 PM

Sep 29 2018

mareko accepted D52683: [AMDGPU] Fix for negative offsets in buffer/tbuffer intrinsics.
Sep 29 2018, 5:07 PM

Sep 27 2018

mareko added a comment to D52577: [AMDGPU] Fold copy (copy vgpr).

On a related note, another way to decrease VGPR usage is to fold immediates that have more than one use. The backend currently folds only immediates with a single use.

Sep 27 2018, 10:03 AM

Sep 24 2018

mareko added a comment to D51969: [AMDGPU] Add an AMDGPU specific atomic optimizer..

What happens if a shader already does "if (threadID == 0) { do_atomic(); }"? Is the optimization skipped in this case?

Sep 24 2018, 6:41 PM

Sep 21 2018

mareko added a comment to D52060: AMDGPU: Add a fast path for icmp.i1(src, false, NE).

Should the instcombine part change also to allow creation of i1 uses?

Sep 21 2018, 12:03 AM

Sep 13 2018

mareko created D52060: AMDGPU: Add a fast path for icmp.i1(src, false, NE).
Sep 13 2018, 2:30 PM

Aug 29 2018

mareko committed rL340959: AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes.
AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes
Aug 29 2018, 1:04 PM
mareko closed D51203: AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes.
Aug 29 2018, 1:03 PM

Aug 28 2018

mareko updated the diff for D51203: AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes.

This fixes GPU hangs with OpenGL bindless handle arithmetic.

Aug 28 2018, 11:50 PM
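
The wraparound problem named in the patch title comes down to simple arithmetic, sketched below in Python. The sketch assumes that a 32-bit pointer is widened to 64 bits by attaching a fixed high half and that an offset folded into the instruction is added at full 64-bit width; the function names and the example high half are hypothetical and are not taken from the patch.

```python
MASK32 = 0xFFFFFFFF


def addr_32bit_semantics(hi, base, offset):
    """Address the program expects: the add wraps within the low 32 bits."""
    return (hi << 32) | ((base + offset) & MASK32)


def addr_64bit_add(hi, base, offset):
    """Address obtained if the offset is added after widening to 64 bits."""
    return ((hi << 32) | base) + offset


def offset_fold_is_safe(base, offset):
    """Folding is only equivalent when the 32-bit add does not wrap."""
    return base + offset <= MASK32


if __name__ == "__main__":
    hi, base, offset = 0x8000, 0xFFFFFFF0, 0x20  # illustrative values
    print(hex(addr_32bit_semantics(hi, base, offset)))
    print(hex(addr_64bit_add(hi, base, offset)))
    print(offset_fold_is_safe(base, offset))
```

When the 32-bit add wraps, the two computations disagree in the high half, which is why knowing that the add cannot wrap (for example via a preserved NUW flag, as asked about in this review) matters for folding the offset.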
mareko added a comment to D51203: AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes.

@arsenm Where do you have the patches that preserve NUW?

Aug 28 2018, 8:34 PM

Aug 27 2018

mareko added a comment to D51203: AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes.

We can ignore old Mesa + new LLVM, because LLVM 7 is the first release to have 32-bit pointers, and I think we can fix that before release.

Aug 27 2018, 11:32 AM

Aug 24 2018

mareko added inline comments to D51203: AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes.
Aug 24 2018, 10:26 PM
mareko added inline comments to D51203: AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes.
Aug 24 2018, 9:20 AM

Aug 23 2018

mareko created D51203: AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes.
Aug 23 2018, 6:08 PM

Aug 22 2018

mareko added a comment to D51098: [AMDGPU] Add support for multi-dword s.buffer.load intrinsic.
In D51098#1209699, @tpr wrote:

Marek, I will correct the spelling of your name in the commit message when I land this. :-)

Aug 22 2018, 12:46 PM

Aug 21 2018

mareko added inline comments to D47261: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.
Aug 21 2018, 5:41 PM

Aug 20 2018

mareko accepted D50306: [AMDGPU] New buffer intrinsics.
Aug 20 2018, 4:23 PM
mareko added inline comments to D47261: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.
Aug 20 2018, 2:28 PM

Aug 11 2018

mareko accepted D50469: AMDGPU: Check NSZ MI flag when folding omod.

Accepted.

Aug 11 2018, 11:02 AM

Aug 10 2018

mareko accepted D50486: MachineScheduler: Refactor setPolicy() to limit computing remaining latency.
Aug 10 2018, 9:31 AM

Aug 1 2018

mareko accepted D50148: AMDGPU: Partially fix handling of packed amdgpu_ps arguments.
Aug 1 2018, 12:10 PM

Jul 30 2018

mareko added a comment to D47261: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.

The patch is missing a test. shader-db can reproduce it.

Jul 30 2018, 2:17 PM

Jul 13 2018

mareko added a comment to D49065: AMDGPU: Stop wasting argument registers with v3i32/v3f32.

This patch along with the prerequisite patch doesn't break Mesa.

What's the prerequisite patch?

Jul 13 2018, 11:36 AM

Jul 12 2018

mareko added a comment to D49065: AMDGPU: Stop wasting argument registers with v3i32/v3f32.

This patch along with the prerequisite patch doesn't break Mesa.

Jul 12 2018, 7:15 PM
mareko accepted D49128: AMDGPU: Properly handle shader inputs with split arguments.
Jul 12 2018, 7:12 PM

Jul 11 2018

mareko added a comment to D49128: AMDGPU: Properly handle shader inputs with split arguments.

This is a no-op change, right? Because the previous code also works.

Jul 11 2018, 11:52 AM

Jul 9 2018

mareko added inline comments to D49065: AMDGPU: Stop wasting argument registers with v3i32/v3f32.
Jul 9 2018, 6:34 PM

Jul 5 2018

mareko added a comment to D47261: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.

FYI, this bug is also reproducible on Mesa OpenGL now.

Jul 5 2018, 12:37 PM

May 23 2018

mareko accepted D47261: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.

LGTM.

May 23 2018, 7:59 AM

May 19 2018

mareko added a comment to D46992: [AMDGPU] Add perf hints to functions.

How does this pass affect shaders that use a lot of memory instructions but no pointers?

Can you give an example? What is a memory instruction without a pointer? As you can see, the pass processes anything that can be cast to a load, store, atomic, or memory intrinsic. Everything else is considered an ordinary instruction. For example, if you have an image in mind, it is conservatively not considered a memory instruction.

May 19 2018, 7:33 PM
mareko added a comment to D46992: [AMDGPU] Add perf hints to functions.

How does this pass affect shaders that use a lot of memory instructions but no pointers?

May 19 2018, 10:01 AM

May 16 2018

mareko added a comment to D46992: [AMDGPU] Add perf hints to functions.

How can UMDs disable this optimization?

May 16 2018, 9:56 PM

May 15 2018

mareko closed D46351: StructurizeCFG: fix inverting conditions.

Committed.

May 15 2018, 2:52 PM
mareko committed rL332404: AMDGPU: Add a missing test for the 128-bit local addr space option.
AMDGPU: Add a missing test for the 128-bit local addr space option
May 15 2018, 2:45 PM
mareko committed rL332403: StructurizeCFG: fix inverting conditions.
StructurizeCFG: fix inverting conditions
May 15 2018, 2:45 PM

Apr 10 2018

mareko committed rL329764: AMDGPU: enable 128-bit for local addr space under an option.
AMDGPU: enable 128-bit for local addr space under an option
Apr 10 2018, 3:53 PM

Apr 9 2018

mareko committed rL329591: AMDGPU: enable 128-bit for local addr space under an option.
AMDGPU: enable 128-bit for local addr space under an option
Apr 9 2018, 10:02 AM

Mar 16 2018

mareko added a comment to D44401: [AMDGPU] Always use IDX for load/store format intrinsics..

I think the unit change is a hint that we should use different intrinsics for this.

Mar 16 2018, 1:31 PM

Feb 21 2018

mareko added inline comments to D42647: AMDGPU: Track physreg uses in SILoadStoreOptimizer.
Feb 21 2018, 9:48 AM

Feb 15 2018

mareko added a comment to D41651: AMDGPU: Add 32-bit constant address space.

In fact, v_readfirstlane is inserted by ISel to glue the vector input to the unexpected scalar instruction.
This means that a compiler user writing valid IR will get unexpected behavior.
Is this documented somewhere?

My objections WRT implementation are:
Bypassing the normal way of processing value divergence is misleading. I was very much surprised to see "amdgpu.uniform" metadata already set at the point (AMDGPUAnnotateUniformValues) where it is expected to be queried from DA.
Moreover, it was set for a value that DA reports as divergent!

Feb 15 2018, 4:40 AM

Feb 14 2018

mareko added inline comments to D41651: AMDGPU: Add 32-bit constant address space.
Feb 14 2018, 3:18 PM

Feb 7 2018

mareko committed rL324487: AMDGPU: Add 32-bit constant address space.
AMDGPU: Add 32-bit constant address space
Feb 7 2018, 8:05 AM
mareko closed D41651: AMDGPU: Add 32-bit constant address space.
Feb 7 2018, 8:05 AM
mareko committed rL324486: AMDGPU: Remove the s_buffer workaround for GFX9 chips.
AMDGPU: Remove the s_buffer workaround for GFX9 chips
Feb 7 2018, 8:05 AM
mareko closed D42756: AMDGPU: Remove the s_buffer workaround for GFX9 chips.
Feb 7 2018, 8:05 AM

Feb 6 2018

mareko committed rL324353: AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU.
AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU
Feb 6 2018, 7:22 AM
mareko closed D42881: AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU..

Pushed, thanks.

Feb 6 2018, 7:22 AM
mareko added a comment to D41651: AMDGPU: Add 32-bit constant address space.

Ping

Feb 6 2018, 6:41 AM
mareko added a comment to D42756: AMDGPU: Remove the s_buffer workaround for GFX9 chips.

Ping

Feb 6 2018, 6:41 AM

Feb 5 2018

mareko added a comment to D42756: AMDGPU: Remove the s_buffer workaround for GFX9 chips.

Ping

Feb 5 2018, 7:20 AM
mareko added a comment to D41651: AMDGPU: Add 32-bit constant address space.

Ping

Feb 5 2018, 7:19 AM
mareko added a comment to D42885: [AMDGPU] intrintrics for byte/short load/store.

and, of course, the vdata type.

Feb 5 2018, 7:18 AM
mareko added a comment to D42885: [AMDGPU] intrintrics for byte/short load/store.

It should be easy to change the return type to i32.

Feb 5 2018, 7:18 AM

Feb 3 2018

mareko accepted D42881: AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU..
Feb 3 2018, 5:59 PM
mareko added a comment to D42881: AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU..

Is this supposed to just insert v_readfirstlane for the descriptor? Can you mention that in the comment before calling legalizeOperands? Can you add a test case? Thanks.

Feb 3 2018, 6:20 AM

Feb 1 2018

mareko updated the diff for D42756: AMDGPU: Remove the s_buffer workaround for GFX9 chips.

I checked the AMD closed source compiler and the workaround is only
needed when x3 is emulated as x4, which we don't do in LLVM.

Feb 1 2018, 12:39 PM
mareko added a comment to D42756: AMDGPU: Remove the s_buffer workaround for GFX9 chips.

We probably do want to do that optimization at some point, although in that case I would hope we would avoid producing them in the buggy case. Can you add more details to the comment here, and possibly leave it?

What details? Can you be more specific about what you're asking here?

Like you mentioned in the commit message that there is a problem with x3 loads only.

Feb 1 2018, 12:35 PM
mareko abandoned D42079: AMDGPU: Add a function attribute that shrinks buggy s_buffer opcodes on GFX9.

The workaround is not needed.

Feb 1 2018, 9:38 AM
mareko added a comment to D42756: AMDGPU: Remove the s_buffer workaround for GFX9 chips.

We probably do want to do that optimization at some point, although in that case I would hope we would avoid producing them in the buggy case. Can you add more details to the comment here, and possibly leave it?

Feb 1 2018, 9:38 AM

Jan 31 2018

mareko committed rL323913: [SeparateConstOffsetFromGEP] Fix up addrspace in the AMDGPU test.
[SeparateConstOffsetFromGEP] Fix up addrspace in the AMDGPU test
Jan 31 2018, 12:51 PM
mareko committed rL323908: AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}.
AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}
Jan 31 2018, 12:21 PM
mareko committed rL323909: AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9.
AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9
Jan 31 2018, 12:21 PM
mareko committed rL323907: [SeparateConstOffsetFromGEP] Preserve metadata when splitting GEPs.
[SeparateConstOffsetFromGEP] Preserve metadata when splitting GEPs
Jan 31 2018, 12:21 PM
mareko closed D42078: AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9.
Jan 31 2018, 12:21 PM
mareko closed D41663: AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}.
Jan 31 2018, 12:21 PM
mareko closed D42744: [SeparateConstOffsetFromGEP] Preserve metadata when splitting GEPs.
Jan 31 2018, 12:20 PM
mareko created D42756: AMDGPU: Remove the s_buffer workaround for GFX9 chips.
Jan 31 2018, 11:48 AM
mareko updated the diff for D41651: AMDGPU: Add 32-bit constant address space.

32-bit loads are always considered uniform and so are always translated
to s_loads with possible v_readfirstlane.

Jan 31 2018, 11:04 AM
mareko created D42744: [SeparateConstOffsetFromGEP] Preserve metadata when splitting GEPs.
Jan 31 2018, 9:46 AM

Jan 30 2018

mareko added inline comments to D42079: AMDGPU: Add a function attribute that shrinks buggy s_buffer opcodes on GFX9.
Jan 30 2018, 4:11 AM

Jan 29 2018

mareko added inline comments to D42078: AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9.
Jan 29 2018, 6:49 PM
mareko added inline comments to D42078: AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9.
Jan 29 2018, 6:49 PM
mareko added inline comments to D42078: AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9.
Jan 29 2018, 5:11 PM
mareko closed D42302: AMDGPU: Allow a SGPR for the conditional KILL operand..

Pushed.

Jan 29 2018, 4:16 PM
mareko committed rL323706: AMDGPU: Allow a SGPR for the conditional KILL operand.
AMDGPU: Allow a SGPR for the conditional KILL operand
Jan 29 2018, 3:22 PM
mareko added inline comments to D42647: AMDGPU: Track physreg uses in SILoadStoreOptimizer.
Jan 29 2018, 2:30 PM
mareko accepted D40343: AMDGPU: Do not combine loads/store across physreg defs.
Jan 29 2018, 2:26 PM

Jan 28 2018

mareko added inline comments to D42079: AMDGPU: Add a function attribute that shrinks buggy s_buffer opcodes on GFX9.
Jan 28 2018, 4:23 PM

Jan 25 2018

mareko added a comment to D41651: AMDGPU: Add 32-bit constant address space.

This is exactly why it's not OK? If it's dropped, you get a compile error or a miscompile.

Jan 25 2018, 4:49 PM
mareko added inline comments to D42078: AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9.
Jan 25 2018, 4:32 PM
mareko added inline comments to D42079: AMDGPU: Add a function attribute that shrinks buggy s_buffer opcodes on GFX9.
Jan 25 2018, 12:13 PM