mareko (Marek Olšák)
User

Projects

User does not belong to any projects.

User Details

User Since
Aug 17 2015, 4:04 AM (161 w, 4 d)

Recent Activity

Today

mareko added a comment to D52060: AMDGPU: Add a fast path for icmp.i1(src, false, NE).

Should the instcombine part change also to allow creation of i1 uses?

Fri, Sep 21, 12:03 AM

Thu, Sep 13

mareko created D52060: AMDGPU: Add a fast path for icmp.i1(src, false, NE).
Thu, Sep 13, 2:30 PM

Wed, Aug 29

mareko committed rL340959: AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes.
AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes
Wed, Aug 29, 1:04 PM
mareko closed D51203: AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes.
Wed, Aug 29, 1:03 PM

Tue, Aug 28

mareko updated the diff for D51203: AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes.

This fixes GPU hangs with OpenGL bindless handle arithmetic.

Tue, Aug 28, 11:50 PM
mareko added a comment to D51203: AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes.

@arsenm Where do you have the patches that preserve NUW?

Tue, Aug 28, 8:34 PM

Mon, Aug 27

mareko added a comment to D51203: AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes.

We can ignore old Mesa + new LLVM, because LLVM 7 is the first release to have 32-bit pointers, and I think we can fix that before release.

Mon, Aug 27, 11:32 AM

Fri, Aug 24

mareko added inline comments to D51203: AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes.
Fri, Aug 24, 10:26 PM
mareko added inline comments to D51203: AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes.
Fri, Aug 24, 9:20 AM

Thu, Aug 23

mareko created D51203: AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes.
Thu, Aug 23, 6:08 PM

Aug 22 2018

mareko added a comment to D51098: [AMDGPU] Add support for multi-dword s.buffer.load intrinsic.
In D51098#1209699, @tpr wrote:

Marek, I will correct the spelling of your name in the commit message when I land this. :-)

Aug 22 2018, 12:46 PM

Aug 21 2018

mareko added inline comments to D47261: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.
Aug 21 2018, 5:41 PM

Aug 20 2018

mareko accepted D50306: [AMDGPU] New buffer intrinsics.
Aug 20 2018, 4:23 PM
mareko added inline comments to D47261: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.
Aug 20 2018, 2:28 PM

Aug 11 2018

mareko accepted D50469: AMDGPU: Check NSZ MI flag when folding omod.

Accepted.

Aug 11 2018, 11:02 AM

Aug 10 2018

mareko accepted D50486: MachineScheduler: Refactor setPolicy() to limit computing remaining latency.
Aug 10 2018, 9:31 AM

Aug 1 2018

mareko accepted D50148: AMDGPU: Partially fix handling of packed amdgpu_ps arguments.
Aug 1 2018, 12:10 PM

Jul 30 2018

mareko added a comment to D47261: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.

The patch is missing a test. shader-db can reproduce it.

Jul 30 2018, 2:17 PM

Jul 13 2018

mareko added a comment to D49065: AMDGPU: Stop wasting argument registers with v3i32/v3f32.

This patch along with the prerequisite patch doesn't break Mesa.

What's the prerequisite patch?

Jul 13 2018, 11:36 AM

Jul 12 2018

mareko added a comment to D49065: AMDGPU: Stop wasting argument registers with v3i32/v3f32.

This patch along with the prerequisite patch doesn't break Mesa.

Jul 12 2018, 7:15 PM
mareko accepted D49128: AMDGPU: Properly handle shader inputs with split arguments.
Jul 12 2018, 7:12 PM

Jul 11 2018

mareko added a comment to D49128: AMDGPU: Properly handle shader inputs with split arguments.

This is a no-op change, right? Because the previous code also works.

Jul 11 2018, 11:52 AM

Jul 9 2018

mareko added inline comments to D49065: AMDGPU: Stop wasting argument registers with v3i32/v3f32.
Jul 9 2018, 6:34 PM

Jul 5 2018

mareko added a comment to D47261: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.

FYI, this bug is also reproducible on Mesa OpenGL now.

Jul 5 2018, 12:37 PM

May 23 2018

mareko accepted D47261: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.

LGTM.

May 23 2018, 7:59 AM

May 19 2018

mareko added a comment to D46992: [AMDGPU] Add perf hints to functions.

How does this pass affect shaders that use a lot of memory instructions but no pointers?

Can you give an example? What is a memory instruction without a pointer? As you may see, pass processes something which can cast to load, store, atomic or memory intrinsic. Everything else considered an ordinary instruction. For example if have an image in mind it is conservatively not considered memory instruction.

May 19 2018, 7:33 PM
mareko added a comment to D46992: [AMDGPU] Add perf hints to functions.

How does this pass affect shaders that use a lot of memory instructions but no pointers?

May 19 2018, 10:01 AM

May 16 2018

mareko added a comment to D46992: [AMDGPU] Add perf hints to functions.

How can UMDs disable this optimization?

May 16 2018, 9:56 PM

May 15 2018

mareko closed D46351: StructurizeCFG: fix inverting conditions.

Committed.

May 15 2018, 2:52 PM
mareko committed rL332404: AMDGPU: Add a missing test for the 128-bit local addr space option.
AMDGPU: Add a missing test for the 128-bit local addr space option
May 15 2018, 2:45 PM
mareko committed rL332403: StructurizeCFG: fix inverting conditions.
StructurizeCFG: fix inverting conditions
May 15 2018, 2:45 PM

Apr 10 2018

mareko committed rL329764: AMDGPU: enable 128-bit for local addr space under an option.
AMDGPU: enable 128-bit for local addr space under an option
Apr 10 2018, 3:53 PM

Apr 9 2018

mareko committed rL329591: AMDGPU: enable 128-bit for local addr space under an option.
AMDGPU: enable 128-bit for local addr space under an option
Apr 9 2018, 10:02 AM

Mar 16 2018

mareko added a comment to D44401: [AMDGPU] Always use IDX for load/store format intrinsics..

I think the unit change is a hint we should use different intrinsics for this

Mar 16 2018, 1:31 PM

Feb 21 2018

mareko added inline comments to D42647: AMDGPU: Track physreg uses in SILoadStoreOptimizer.
Feb 21 2018, 9:48 AM

Feb 15 2018

mareko added a comment to D41651: AMDGPU: Add 32-bit constant address space.

In fact v_readfirstlane is inserted by the ISel to glue vector input to the unexpected scalar instruction.
This means that compiler user writing valid IR will get unexpected behavior.
Is this documented somewhere?

My objections WRT implementation are:
Bypassing the normal way of processing values divergence is misleading. I was very much surprised to see "amdgpu.uniform" metadata already set at the point (AMDGPUAnnotateUniformValues) where they are expected to be queried from DA.
Moreover they were set for the value that is reported by DA as divergent!

Feb 15 2018, 4:40 AM

Feb 14 2018

mareko added inline comments to D41651: AMDGPU: Add 32-bit constant address space.
Feb 14 2018, 3:18 PM

Feb 7 2018

mareko committed rL324487: AMDGPU: Add 32-bit constant address space.
AMDGPU: Add 32-bit constant address space
Feb 7 2018, 8:05 AM
This revision was not accepted when it landed; it landed in state Needs Review.
Feb 7 2018, 8:05 AM
mareko committed rL324486: AMDGPU: Remove the s_buffer workaround for GFX9 chips.
AMDGPU: Remove the s_buffer workaround for GFX9 chips
Feb 7 2018, 8:05 AM
This revision was not accepted when it landed; it landed in state Needs Review.
Feb 7 2018, 8:05 AM

Feb 6 2018

mareko committed rL324353: AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU.
AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU
Feb 6 2018, 7:22 AM
mareko closed D42881: AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU..

Pushed, thanks.

Feb 6 2018, 7:22 AM
mareko added a comment to D41651: AMDGPU: Add 32-bit constant address space.

Ping

Feb 6 2018, 6:41 AM
mareko added a comment to D42756: AMDGPU: Remove the s_buffer workaround for GFX9 chips.

Ping

Feb 6 2018, 6:41 AM

Feb 5 2018

mareko added a comment to D42756: AMDGPU: Remove the s_buffer workaround for GFX9 chips.

Ping

Feb 5 2018, 7:20 AM
mareko added a comment to D41651: AMDGPU: Add 32-bit constant address space.

Ping

Feb 5 2018, 7:19 AM
mareko added a comment to D42885: [AMDGPU] intrintrics for byte/short load/store.

and, of course, the vdata type.

Feb 5 2018, 7:18 AM
mareko added a comment to D42885: [AMDGPU] intrintrics for byte/short load/store.

It should be easy to change the return type to i32.

Feb 5 2018, 7:18 AM

Feb 3 2018

mareko accepted D42881: AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU..
Feb 3 2018, 5:59 PM
mareko added a comment to D42881: AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU..

Is this supposed to just insert v_readfirstlane for the descriptor? Can you mention that in the comment before calling legalizeOperands? Can you add a test case? Thanks.

Feb 3 2018, 6:20 AM

Feb 1 2018

mareko updated the diff for D42756: AMDGPU: Remove the s_buffer workaround for GFX9 chips.

I checked the AMD closed source compiler and the workaround is only
needed when x3 is emulated as x4, which we don't do in LLVM.

Feb 1 2018, 12:39 PM
mareko added a comment to D42756: AMDGPU: Remove the s_buffer workaround for GFX9 chips.

We probably do want to do that optimization at some point, although in that case I would hope we would avoid producing them in the buggy case. Can you add more details to the comment here, and possibly leave it?

What details? Can you be more specific about what you're asking here?

Like you mentioned in the commit message that there is a problem with x3 loads only.

Feb 1 2018, 12:35 PM
mareko abandoned D42079: AMDGPU: Add a function attribute that shrinks buggy s_buffer opcodes on GFX9.

The workaround is not needed.

Feb 1 2018, 9:38 AM
mareko added a comment to D42756: AMDGPU: Remove the s_buffer workaround for GFX9 chips.

We probably do want to do that optimization at some point, although in that case I would hope we would avoid producing them in the buggy case. Can you add more details to the comment here, and possibly leave it?

Feb 1 2018, 9:38 AM

Jan 31 2018

mareko committed rL323913: [SeparateConstOffsetFromGEP] Fix up addrspace in the AMDGPU test.
[SeparateConstOffsetFromGEP] Fix up addrspace in the AMDGPU test
Jan 31 2018, 12:51 PM
mareko committed rL323908: AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}.
AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}
Jan 31 2018, 12:21 PM
mareko committed rL323909: AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9.
AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9
Jan 31 2018, 12:21 PM
mareko committed rL323907: [SeparateConstOffsetFromGEP] Preserve metadata when splitting GEPs.
[SeparateConstOffsetFromGEP] Preserve metadata when splitting GEPs
Jan 31 2018, 12:21 PM
mareko closed D42078: AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9.
Jan 31 2018, 12:21 PM
mareko closed D41663: AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}.
Jan 31 2018, 12:21 PM
mareko closed D42744: [SeparateConstOffsetFromGEP] Preserve metadata when splitting GEPs.
Jan 31 2018, 12:20 PM
mareko created D42756: AMDGPU: Remove the s_buffer workaround for GFX9 chips.
Jan 31 2018, 11:48 AM
mareko updated the diff for D41651: AMDGPU: Add 32-bit constant address space.

32-bit loads are always considered uniform and so are always translated
to s_loads with possible v_readfirstlane.

Jan 31 2018, 11:04 AM
mareko created D42744: [SeparateConstOffsetFromGEP] Preserve metadata when splitting GEPs.
Jan 31 2018, 9:46 AM

Jan 30 2018

mareko added inline comments to D42079: AMDGPU: Add a function attribute that shrinks buggy s_buffer opcodes on GFX9.
Jan 30 2018, 4:11 AM

Jan 29 2018

mareko added inline comments to D42078: AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9.
Jan 29 2018, 6:49 PM
mareko added inline comments to D42078: AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9.
Jan 29 2018, 6:49 PM
mareko added inline comments to D42078: AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9.
Jan 29 2018, 5:11 PM
mareko closed D42302: AMDGPU: Allow a SGPR for the conditional KILL operand..

Pushed.

Jan 29 2018, 4:16 PM
mareko committed rL323706: AMDGPU: Allow a SGPR for the conditional KILL operand.
AMDGPU: Allow a SGPR for the conditional KILL operand
Jan 29 2018, 3:22 PM
mareko added inline comments to D42647: AMDGPU: Track physreg uses in SILoadStoreOptimizer.
Jan 29 2018, 2:30 PM
mareko accepted D40343: AMDGPU: Do not combine loads/store across physreg defs.
Jan 29 2018, 2:26 PM

Jan 28 2018

mareko added inline comments to D42079: AMDGPU: Add a function attribute that shrinks buggy s_buffer opcodes on GFX9.
Jan 28 2018, 4:23 PM

Jan 25 2018

mareko added a comment to D41651: AMDGPU: Add 32-bit constant address space.

This is exactly why it's not OK? If it's dropped you get a compile error or miscompile

Jan 25 2018, 4:49 PM
mareko added inline comments to D42078: AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9.
Jan 25 2018, 4:32 PM
mareko added inline comments to D42079: AMDGPU: Add a function attribute that shrinks buggy s_buffer opcodes on GFX9.
Jan 25 2018, 12:13 PM
mareko updated the diff for D41651: AMDGPU: Add 32-bit constant address space.

Only scalar loads support 32-bit pointers. An address in a VGPR will
fail to compile. That's OK because the results of loads will only be used
in places where VGPRs are forbidden.

Jan 25 2018, 11:55 AM
mareko added a comment to D41651: AMDGPU: Add 32-bit constant address space.

This needs documentation in AMDGPUUsage.rst.

Relying on metadata for correctness is indeed not okay. We should either say that CONSTANT_ADDRESS_32BIT just assumes uniformness, and move the address to an SGPR (via v_readfirstlane) if required, *or* support this also with VMEM instructions.

Jan 25 2018, 11:08 AM

Jan 24 2018

mareko abandoned D41715: AMDGPU: Process amdgpu.uniform on loads.

I've abandoned this patch for now. Please re-open if you think this patch is useful.

Jan 24 2018, 9:51 AM

Jan 15 2018

mareko added inline comments to D42079: AMDGPU: Add a function attribute that shrinks buggy s_buffer opcodes on GFX9.
Jan 15 2018, 11:23 AM
mareko added inline comments to D42079: AMDGPU: Add a function attribute that shrinks buggy s_buffer opcodes on GFX9.
Jan 15 2018, 10:26 AM
mareko created D42079: AMDGPU: Add a function attribute that shrinks buggy s_buffer opcodes on GFX9.
Jan 15 2018, 9:41 AM
mareko created D42078: AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9.
Jan 15 2018, 9:41 AM

Jan 10 2018

mareko added inline comments to D41715: AMDGPU: Process amdgpu.uniform on loads.
Jan 10 2018, 1:56 PM
mareko added inline comments to D41715: AMDGPU: Process amdgpu.uniform on loads.
Jan 10 2018, 1:49 PM
mareko updated the diff for D41651: AMDGPU: Add 32-bit constant address space.

Only scalar loads support 32-bit pointers. An address in a VGPR will
fail to compile. That's OK because the results of loads will only be used
in places where VGPRs are forbidden.

Jan 10 2018, 8:50 AM

Jan 9 2018

mareko added a comment to D41651: AMDGPU: Add 32-bit constant address space.
Jan 9 2018, 4:41 PM
mareko added a comment to D41651: AMDGPU: Add 32-bit constant address space.

This needs to update AMDGPUAliasAnalysis. Also needs more test coverage. I don't see this testing unaligned access or some of the other places it was added.

Jan 9 2018, 4:38 PM

Jan 3 2018

mareko updated the diff for D41651: AMDGPU: Add 32-bit constant address space.

Version 3:

  • Much simpler.
  • Addrspacecast is never inserted.
  • Unaligned loads are unaffected (work fine).
  • Only scalar loads support 32-bit pointers. An address in a VGPR will fail to compile.
  • D41715 (amdgpu.uniform on loads) is required for enforce scalar loads in some cases.
Jan 3 2018, 1:47 PM
mareko created D41715: AMDGPU: Process amdgpu.uniform on loads.
Jan 3 2018, 1:39 PM

Jan 2 2018

mareko updated the diff for D41651: AMDGPU: Add 32-bit constant address space.

Version 2: As discussed on IRC.

Jan 2 2018, 7:42 PM
mareko updated the diff for D41663: AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}.

Switched the return type to v2i16.

Jan 2 2018, 4:13 PM
mareko abandoned D41652: [InstCombine] Add an option to disable addrspacecast folding into GEP.

Abandoning. The new plan is that we'll do it the right way in AMDGPU now.

Jan 2 2018, 12:29 PM
mareko added inline comments to D41663: AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}.
Jan 2 2018, 8:43 AM
mareko added a comment to D41652: [InstCombine] Add an option to disable addrspacecast folding into GEP.

I'm not sure having a cl::opt is the best option here.
If you really want to make such a change, can you at least thread this information through TargetTransformInfo rather than using a global option?

Yes.

Also, for my own curiosity, is this a temporary workaround until AMDGPU grows proper support for 32-bit GEPs or is this an inherent limitation?
If the former, maybe this hack should not go in (or at least we should consider what's the amount of work needed to finish implementing the support)

It's possible that this will be a permanent solution, because we don't plan to have full 32-bit support (it would be too much work in the backend for little benefit).

This is quite unfortunate. I'd like to point out I don't feel particularly comfortable to have this as a long term solution (but, maybe, OK for 6.0 & reverted in trunk).
The contract between the mid-level optimizer and the backend is that the latter should possibly accept everything produced by the former (or, FWIW, error out in some circumstances).
Maybe you can have something in your backend logic that recovers from the fact that AMDGPU doesn't know (and won't be taught) about 32-bit GEPs?
Have you considered something during legalization? [Apologies if I'm off, but I'm not really familiar with the way AMDGPU works, at least not in depth].

Jan 2 2018, 7:31 AM
mareko added a comment to D41652: [InstCombine] Add an option to disable addrspacecast folding into GEP.

I'm not sure having a cl::opt is the best option here.
If you really want to make such a change, can you at least thread this information through TargetTransformInfo rather than using a global option?

Jan 2 2018, 6:19 AM
mareko created D41663: AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}.
Jan 2 2018, 4:25 AM

Jan 1 2018

mareko added a comment to D41651: AMDGPU: Add 32-bit constant address space.

FYI, I'd like to get this into LLVM 6.0 if it's OK.

Jan 1 2018, 12:29 PM
mareko added reviewers for D41652: [InstCombine] Add an option to disable addrspacecast folding into GEP: arsenm, nhaehnle.

FYI, I'd like to get this into LLVM 6.0 if it's OK.

Jan 1 2018, 12:28 PM