mareko (Marek Olšák)
User

Projects

User does not belong to any projects.

User Details

User Since
Aug 17 2015, 4:04 AM (148 w, 18 h)

Recent Activity

Wed, May 23

mareko accepted D47261: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.

LGTM.

Wed, May 23, 7:59 AM

May 19 2018

mareko added a comment to D46992: [AMDGPU] Add perf hints to functions.

How does this pass affect shaders that use a lot of memory instructions but no pointers?

Can you give an example? What is a memory instruction without a pointer? As you may see, pass processes something which can cast to load, store, atomic or memory intrinsic. Everything else considered an ordinary instruction. For example if have an image in mind it is conservatively not considered memory instruction.

May 19 2018, 7:33 PM
mareko added a comment to D46992: [AMDGPU] Add perf hints to functions.

How does this pass affect shaders that use a lot of memory instructions but no pointers?

May 19 2018, 10:01 AM

May 16 2018

mareko added a comment to D46992: [AMDGPU] Add perf hints to functions.

How can UMDs disable this optimization?

May 16 2018, 9:56 PM

May 15 2018

mareko closed D46351: StructurizeCFG: fix inverting conditions.

Committed.

May 15 2018, 2:52 PM
mareko committed rL332404: AMDGPU: Add a missing test for the 128-bit local addr space option.
AMDGPU: Add a missing test for the 128-bit local addr space option
May 15 2018, 2:45 PM
mareko committed rL332403: StructurizeCFG: fix inverting conditions.
StructurizeCFG: fix inverting conditions
May 15 2018, 2:45 PM

Apr 10 2018

mareko committed rL329764: AMDGPU: enable 128-bit for local addr space under an option.
AMDGPU: enable 128-bit for local addr space under an option
Apr 10 2018, 3:53 PM

Apr 9 2018

mareko committed rL329591: AMDGPU: enable 128-bit for local addr space under an option.
AMDGPU: enable 128-bit for local addr space under an option
Apr 9 2018, 10:02 AM

Mar 16 2018

mareko added a comment to D44401: [AMDGPU] Always use IDX for load/store format intrinsics..

I think the unit change is a hint we should use different intrinsics for this

Mar 16 2018, 1:31 PM

Feb 21 2018

mareko added inline comments to D42647: AMDGPU: Track physreg uses in SILoadStoreOptimizer.
Feb 21 2018, 9:48 AM

Feb 15 2018

mareko added a comment to D41651: AMDGPU: Add 32-bit constant address space.

In fact v_readfirstlane is inserted by the ISel to glue vector input to the unexpected scalar instruction.
This means that compiler user writing valid IR will get unexpected behavior.
Is this documented somewhere?

My objections WRT implementation are:
Bypassing the normal way of processing values divergence is misleading. I was very much surprised to see "amdgpu.uniform" metadata already set at the point (AMDGPUAnnotateUniformValues) where they are expected to be queried from DA.
Moreover they were set for the value that is reported by DA as divergent!

Feb 15 2018, 4:40 AM

Feb 14 2018

mareko added inline comments to D41651: AMDGPU: Add 32-bit constant address space.
Feb 14 2018, 3:18 PM

Feb 7 2018

mareko committed rL324487: AMDGPU: Add 32-bit constant address space.
AMDGPU: Add 32-bit constant address space
Feb 7 2018, 8:05 AM
This revision was not accepted when it landed; it landed in state Needs Review.
Feb 7 2018, 8:05 AM
mareko committed rL324486: AMDGPU: Remove the s_buffer workaround for GFX9 chips.
AMDGPU: Remove the s_buffer workaround for GFX9 chips
Feb 7 2018, 8:05 AM
This revision was not accepted when it landed; it landed in state Needs Review.
Feb 7 2018, 8:05 AM

Feb 6 2018

mareko committed rL324353: AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU.
AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU
Feb 6 2018, 7:22 AM
mareko closed D42881: AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU..

Pushed, thanks.

Feb 6 2018, 7:22 AM
mareko added a comment to D41651: AMDGPU: Add 32-bit constant address space.

Ping

Feb 6 2018, 6:41 AM
mareko added a comment to D42756: AMDGPU: Remove the s_buffer workaround for GFX9 chips.

Ping

Feb 6 2018, 6:41 AM

Feb 5 2018

mareko added a comment to D42756: AMDGPU: Remove the s_buffer workaround for GFX9 chips.

Ping

Feb 5 2018, 7:20 AM
mareko added a comment to D41651: AMDGPU: Add 32-bit constant address space.

Ping

Feb 5 2018, 7:19 AM
mareko added a comment to D42885: [AMDGPU] intrintrics for byte/short load/store.

and, of course, the vdata type.

Feb 5 2018, 7:18 AM
mareko added a comment to D42885: [AMDGPU] intrintrics for byte/short load/store.

It should be easy to change the return type to i32.

Feb 5 2018, 7:18 AM

Feb 3 2018

mareko accepted D42881: AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU..
Feb 3 2018, 5:59 PM
mareko added a comment to D42881: AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU..

Is this supposed to just insert v_readfirstlane for the descriptor? Can you mention that in the comment before calling legalizeOperands? Can you add a test case? Thanks.

Feb 3 2018, 6:20 AM

Feb 1 2018

mareko updated the diff for D42756: AMDGPU: Remove the s_buffer workaround for GFX9 chips.

I checked the AMD closed source compiler and the workaround is only
needed when x3 is emulated as x4, which we don't do in LLVM.

Feb 1 2018, 12:39 PM
mareko added a comment to D42756: AMDGPU: Remove the s_buffer workaround for GFX9 chips.

We probably do want to do that optimization at some point, although in that case I would hope we would avoid producing them in the buggy case. Can you add more details to the comment here, and possibly leave it?

What details? Can you be more specific about what you're asking here?

Like you mentioned in the commit message that there is a problem with x3 loads only.

Feb 1 2018, 12:35 PM
mareko abandoned D42079: AMDGPU: Add a function attribute that shrinks buggy s_buffer opcodes on GFX9.

The workaround is not needed.

Feb 1 2018, 9:38 AM
mareko added a comment to D42756: AMDGPU: Remove the s_buffer workaround for GFX9 chips.

We probably do want to do that optimization at some point, although in that case I would hope we would avoid producing them in the buggy case. Can you add more details to the comment here, and possibly leave it?

Feb 1 2018, 9:38 AM

Jan 31 2018

mareko committed rL323913: [SeparateConstOffsetFromGEP] Fix up addrspace in the AMDGPU test.
[SeparateConstOffsetFromGEP] Fix up addrspace in the AMDGPU test
Jan 31 2018, 12:51 PM
mareko committed rL323908: AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}.
AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}
Jan 31 2018, 12:21 PM
mareko committed rL323909: AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9.
AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9
Jan 31 2018, 12:21 PM
mareko committed rL323907: [SeparateConstOffsetFromGEP] Preserve metadata when splitting GEPs.
[SeparateConstOffsetFromGEP] Preserve metadata when splitting GEPs
Jan 31 2018, 12:21 PM
mareko closed D42078: AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9.
Jan 31 2018, 12:21 PM
mareko closed D41663: AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}.
Jan 31 2018, 12:21 PM
mareko closed D42744: [SeparateConstOffsetFromGEP] Preserve metadata when splitting GEPs.
Jan 31 2018, 12:20 PM
mareko created D42756: AMDGPU: Remove the s_buffer workaround for GFX9 chips.
Jan 31 2018, 11:48 AM
mareko updated the diff for D41651: AMDGPU: Add 32-bit constant address space.

32-bit loads are always considered uniform and so are always translated
to s_loads with possible v_readfirstlane.

Jan 31 2018, 11:04 AM
mareko created D42744: [SeparateConstOffsetFromGEP] Preserve metadata when splitting GEPs.
Jan 31 2018, 9:46 AM

Jan 30 2018

mareko added inline comments to D42079: AMDGPU: Add a function attribute that shrinks buggy s_buffer opcodes on GFX9.
Jan 30 2018, 4:11 AM

Jan 29 2018

mareko added inline comments to D42078: AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9.
Jan 29 2018, 6:49 PM
mareko added inline comments to D42078: AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9.
Jan 29 2018, 6:49 PM
mareko added inline comments to D42078: AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9.
Jan 29 2018, 5:11 PM
mareko closed D42302: AMDGPU: Allow a SGPR for the conditional KILL operand..

Pushed.

Jan 29 2018, 4:16 PM
mareko committed rL323706: AMDGPU: Allow a SGPR for the conditional KILL operand.
AMDGPU: Allow a SGPR for the conditional KILL operand
Jan 29 2018, 3:22 PM
mareko added inline comments to D42647: AMDGPU: Track physreg uses in SILoadStoreOptimizer.
Jan 29 2018, 2:30 PM
mareko accepted D40343: AMDGPU: Do not combine loads/store across physreg defs.
Jan 29 2018, 2:26 PM

Jan 28 2018

mareko added inline comments to D42079: AMDGPU: Add a function attribute that shrinks buggy s_buffer opcodes on GFX9.
Jan 28 2018, 4:23 PM

Jan 25 2018

mareko added a comment to D41651: AMDGPU: Add 32-bit constant address space.

This is exactly why it's not OK? If it's dropped you get a compile error or miscompile

Jan 25 2018, 4:49 PM
mareko added inline comments to D42078: AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9.
Jan 25 2018, 4:32 PM
mareko added inline comments to D42079: AMDGPU: Add a function attribute that shrinks buggy s_buffer opcodes on GFX9.
Jan 25 2018, 12:13 PM
mareko updated the diff for D41651: AMDGPU: Add 32-bit constant address space.

Only scalar loads support 32-bit pointers. An address in a VGPR will
fail to compile. That's OK because the results of loads will only be used
in places where VGPRs are forbidden.

Jan 25 2018, 11:55 AM
mareko added a comment to D41651: AMDGPU: Add 32-bit constant address space.

This needs documentation in AMDGPUUsage.rst.

Relying on metadata for correctness is indeed not okay. We should either say that CONSTANT_ADDRESS_32BIT just assumes uniformness, and move the address to an SGPR (via v_readfirstlane) if required, *or* support this also with VMEM instructions.

Jan 25 2018, 11:08 AM

Jan 24 2018

mareko abandoned D41715: AMDGPU: Process amdgpu.uniform on loads.

I've abandoned this patch for now. Please re-open if you think this patch is useful.

Jan 24 2018, 9:51 AM

Jan 15 2018

mareko added inline comments to D42079: AMDGPU: Add a function attribute that shrinks buggy s_buffer opcodes on GFX9.
Jan 15 2018, 11:23 AM
mareko added inline comments to D42079: AMDGPU: Add a function attribute that shrinks buggy s_buffer opcodes on GFX9.
Jan 15 2018, 10:26 AM
mareko created D42079: AMDGPU: Add a function attribute that shrinks buggy s_buffer opcodes on GFX9.
Jan 15 2018, 9:41 AM
mareko created D42078: AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9.
Jan 15 2018, 9:41 AM

Jan 10 2018

mareko added inline comments to D41715: AMDGPU: Process amdgpu.uniform on loads.
Jan 10 2018, 1:56 PM
mareko added inline comments to D41715: AMDGPU: Process amdgpu.uniform on loads.
Jan 10 2018, 1:49 PM
mareko updated the diff for D41651: AMDGPU: Add 32-bit constant address space.

Only scalar loads support 32-bit pointers. An address in a VGPR will
fail to compile. That's OK because the results of loads will only be used
in places where VGPRs are forbidden.

Jan 10 2018, 8:50 AM

Jan 9 2018

mareko added a comment to D41651: AMDGPU: Add 32-bit constant address space.
Jan 9 2018, 4:41 PM
mareko added a comment to D41651: AMDGPU: Add 32-bit constant address space.

This needs to update AMDGPUAliasAnalysis. Also needs more test coverage. I don't see this testing unaligned access or some of the other places it was added.

Jan 9 2018, 4:38 PM

Jan 3 2018

mareko updated the diff for D41651: AMDGPU: Add 32-bit constant address space.

Version 3:

  • Much simpler.
  • Addrspacecast is never inserted.
  • Unaligned loads are unaffected (work fine).
  • Only scalar loads support 32-bit pointers. An address in a VGPR will fail to compile.
  • D41715 (amdgpu.uniform on loads) is required for enforce scalar loads in some cases.
Jan 3 2018, 1:47 PM
mareko created D41715: AMDGPU: Process amdgpu.uniform on loads.
Jan 3 2018, 1:39 PM

Jan 2 2018

mareko updated the diff for D41651: AMDGPU: Add 32-bit constant address space.

Version 2: As discussed on IRC.

Jan 2 2018, 7:42 PM
mareko updated the diff for D41663: AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}.

Switched the return type to v2i16.

Jan 2 2018, 4:13 PM
mareko abandoned D41652: [InstCombine] Add an option to disable addrspacecast folding into GEP.

Abandoning. The new plan is that we'll do it the right way in AMDGPU now.

Jan 2 2018, 12:29 PM
mareko added inline comments to D41663: AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}.
Jan 2 2018, 8:43 AM
mareko added a comment to D41652: [InstCombine] Add an option to disable addrspacecast folding into GEP.

I'm not sure having a cl::opt is the best option here.
If you really want to make such a change, can you at least thread this information through TargetTransformInfo rather than using a global option?

Yes.

Also, for my own curiosity, is this a temporary workaround until AMDGPU grows proper support for 32-bit GEPs or is this an inherent limitation?
If the former, maybe this hack should not go in (or at least we should consider what's the amount of work needed to finish implementing the support)

It's possible that this will be a permanent solution, because we don't plan to have full 32-bit support (it would be too much work in the backend for little benefit).

This is quite unfortunate. I'd like to point out I don't feel particularly comfortable to have this as a long term solution (but, maybe, OK for 6.0 & reverted in trunk).
The contract between the mid-level optimizer and the backend is that the latter should possibly accept everything produced by the former (or, FWIW, error out in some circumstances).
Maybe you can have something in your backend logic that recovers from the fact that AMDGPU doesn't know (and won't be taught) about 32-bit GEPs?
Have you considered something during legalization? [Apologies if I'm off, but I'm not really familiar with the way AMDGPU works, at least not in depth].

Jan 2 2018, 7:31 AM
mareko added a comment to D41652: [InstCombine] Add an option to disable addrspacecast folding into GEP.

I'm not sure having a cl::opt is the best option here.
If you really want to make such a change, can you at least thread this information through TargetTransformInfo rather than using a global option?

Jan 2 2018, 6:19 AM
mareko created D41663: AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}.
Jan 2 2018, 4:25 AM

Jan 1 2018

mareko added a comment to D41651: AMDGPU: Add 32-bit constant address space.

FYI, I'd like to get this into LLVM 6.0 if it's OK.

Jan 1 2018, 12:29 PM
mareko added reviewers for D41652: [InstCombine] Add an option to disable addrspacecast folding into GEP: arsenm, nhaehnle.

FYI, I'd like to get this into LLVM 6.0 if it's OK.

Jan 1 2018, 12:28 PM

Dec 31 2017

mareko created D41652: [InstCombine] Add an option to disable addrspacecast folding into GEP.
Dec 31 2017, 8:21 PM
mareko created D41651: AMDGPU: Add 32-bit constant address space.
Dec 31 2017, 8:09 PM
mareko added a comment to D40343: AMDGPU: Do not combine loads/store across physreg defs.

We need this fix in LLVM 6.0, right?

Dec 31 2017, 11:05 AM

Dec 28 2017

mareko accepted D41468: AMDGPU: Implement getTgtMemIntrinsic for images.
Dec 28 2017, 6:55 AM

Dec 26 2017

mareko accepted D41470: AMDGPU: Use unique PSVs for buffer resources.

Accepted, though I added suggestions for tbuffer intrinsics.

Dec 26 2017, 7:07 AM
mareko accepted D41469: AMDGPU: Remove mayLoad/hasSideEffects from MIMG stores.
Dec 26 2017, 6:56 AM
mareko added inline comments to D41468: AMDGPU: Implement getTgtMemIntrinsic for images.
Dec 26 2017, 6:54 AM

Dec 8 2017

mareko accepted D40982: AMDGPU: image_getlod and image_getresinfo do not read memory.

LGTM.

Dec 8 2017, 7:15 AM

Dec 4 2017

mareko added a comment to D39040: AMDGPU: Fix creating invalid copy when adjusting dmask.

There are no piglit regressions.

Dec 4 2017, 1:17 PM

Nov 21 2017

mareko accepted D40303: AMDGPU: Consider memory dependencies with moved instructions in SILoadStoreOptimizer.

LGTM.

Nov 21 2017, 8:57 AM

Nov 16 2017

mareko added a comment to D40047: AMDGPU/GCN: Remove xnack from 801 and 810.

Did notice that for Mesa3D XNACK is being forcibly disabled for all targets <gfx9 and forcibly enabled for all targets >=gfx9. Is that the best choice? It seems it will likely not match how the driver is configuring the hardware. For example, how does the driver configure the APUs? Is XNACK always being enabled for gfx9 (it is not for compute)?

Mesa can't modify SH_MEM_CONFIG to enable/disable XNACK. What is hardcoded in the kernel is what we get. XNACK is only enabled on compute rings on gfx8 APUs and on all rings on gfx9. In practice, Mesa should never access an unmapped page. I don't know if setting -xnack on all chips is a good idea in that case. We might also have suboptimal performance on gfx9 due to XNACK being always enabled by the KMD.

If Mesa always guarantees that the shaders will never access non-resident memory, and so will never have an XNACK, then it can always generate shaders that have XNACK disabled regardless of whether the kernel enables XNACK replay. In other words, enabling XNACK replay does not affect performance unless the shader chooses to generate XNACK compatible code. And the shader does not need to generate XNACK compatible code if it will never access a non-resident page.

For example, OpenCL 1.2 runtime always ensures all buffers are resident, and so compilers all shaders with XNACK disabled, regardless of whther the kernel has enabled XNACK replay.

So, does Mesa runtime always ensure all data accessed will be resident? Even for APUs? If so it would likely be a performance gain to always request no-XNACK unless page migration may also be active on the data accessed.

Nov 16 2017, 8:16 AM

Nov 15 2017

mareko added a comment to D40047: AMDGPU/GCN: Remove xnack from 801 and 810.

Did notice that for Mesa3D XNACK is being forcibly disabled for all targets <gfx9 and forcibly enabled for all targets >=gfx9. Is that the best choice? It seems it will likely not match how the driver is configuring the hardware. For example, how does the driver configure the APUs? Is XNACK always being enabled for gfx9 (it is not for compute)?

Nov 15 2017, 6:42 PM

Nov 14 2017

mareko added a comment to D40047: AMDGPU/GCN: Remove xnack from 801 and 810.

Does this change break backwards compatibility with the gfx801 target? If so, which ROCm version will I need to use with these changes?

Nov 14 2017, 6:19 PM
mareko added a comment to D40047: AMDGPU/GCN: Remove xnack from 801 and 810.

Looks good to me.

Nov 14 2017, 5:51 PM

Nov 8 2017

mareko committed rL317754: AMDGPU: Lower buffer store and atomic intrinsics manually.
AMDGPU: Lower buffer store and atomic intrinsics manually
Nov 8 2017, 5:53 PM
mareko committed rL317755: AMDGPU: Merge BUFFER_STORE_DWORD_OFFEN/OFFSET into x2, x4.
AMDGPU: Merge BUFFER_STORE_DWORD_OFFEN/OFFSET into x2, x4
Nov 8 2017, 5:53 PM
mareko closed D39060: AMDGPU: Lower buffer store and atomic intrinsics manually by committing rL317754: AMDGPU: Lower buffer store and atomic intrinsics manually.
Nov 8 2017, 5:53 PM
mareko closed D39012: AMDGPU: Merge BUFFER_STORE_DWORD_OFFEN/OFFSET into x2, x4 by committing rL317755: AMDGPU: Merge BUFFER_STORE_DWORD_OFFEN/OFFSET into x2, x4.
Nov 8 2017, 5:53 PM
mareko committed rL317753: AMDGPU: Merge BUFFER_LOAD_DWORD_OFFSET into x2, x4.
AMDGPU: Merge BUFFER_LOAD_DWORD_OFFSET into x2, x4
Nov 8 2017, 5:52 PM
mareko committed rL317752: AMDGPU: Merge BUFFER_LOAD_DWORD_OFFEN into x2, x4.
AMDGPU: Merge BUFFER_LOAD_DWORD_OFFEN into x2, x4
Nov 8 2017, 5:52 PM
mareko closed D38950: AMDGPU: Merge BUFFER_LOAD_DWORD_OFFEN into x2, x4 by committing rL317752: AMDGPU: Merge BUFFER_LOAD_DWORD_OFFEN into x2, x4.
Nov 8 2017, 5:52 PM
mareko closed D38951: AMDGPU: Merge BUFFER_LOAD_DWORD_OFFSET into x2, x4 by committing rL317753: AMDGPU: Merge BUFFER_LOAD_DWORD_OFFSET into x2, x4.
Nov 8 2017, 5:52 PM
mareko committed rL317751: AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4.
AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4
Nov 8 2017, 5:52 PM
mareko committed rL317750: AMDGPU: Fold immediate offset into BUFFER_LOAD_DWORD lowered from SMEM.
AMDGPU: Fold immediate offset into BUFFER_LOAD_DWORD lowered from SMEM
Nov 8 2017, 5:52 PM