mareko (Marek Olšák)
User

Projects

User does not belong to any projects.
User Since
Aug 17 2015, 4:04 AM (84 w, 1 d)

Recent Activity

Tue, Mar 21

mareko added a comment to D31216: AMDGPU: Remove hasSideEffects from SI_RETURN_TO_EPILOG.

LGTM.

Tue, Mar 21, 2:58 PM
mareko added a comment to D31209: AMDGPU: Rename SI_RETURN.

Other than the "from->to" typo, this looks good to me, though my preferred name for the instruction is SI_RETURN_TO_EPILOG.

Tue, Mar 21, 2:35 PM
mareko committed rL298397: AMDGPU: Buffer descriptor changes for GFX9.
AMDGPU: Buffer descriptor changes for GFX9
Tue, Mar 21, 10:13 AM
mareko committed rL298396: AMDGPU: Always use VGPR indexing on GFX9.
AMDGPU: Always use VGPR indexing on GFX9
Tue, Mar 21, 10:13 AM
mareko closed D31158: AMDGPU: Buffer descriptor changes for GFX9 by committing rL298397: AMDGPU: Buffer descriptor changes for GFX9.
Tue, Mar 21, 10:13 AM
mareko closed D31157: AMDGPU: Always use VGPR indexing on GFX9 by committing rL298396: AMDGPU: Always use VGPR indexing on GFX9.
Tue, Mar 21, 10:13 AM

Mon, Mar 20

mareko created D31158: AMDGPU: Buffer descriptor changes for GFX9.
Mon, Mar 20, 4:15 PM
mareko created D31157: AMDGPU: Always use VGPR indexing on GFX9.
Mon, Mar 20, 4:15 PM

Feb 24 2017

mareko added inline comments to D30209: AMDGPU: Fold omod into instructions.
Feb 24 2017, 1:42 AM

Feb 23 2017

mareko added a comment to D30209: AMDGPU: Fold omod into instructions.

Other than that, nice work.

Feb 23 2017, 6:57 AM

Feb 22 2017

mareko added a comment to D30204: AMDGPU: Add replacement bfe intrinsics.

LGTM

Feb 22 2017, 1:59 PM
mareko added a comment to D30203: AMDGPU: Add another BFE pattern.

LGTM

Feb 22 2017, 1:47 PM
mareko added a comment to D30205: AMDGPU: Convert image intrinsic uses in tests.

Acked.

Feb 22 2017, 1:01 PM
mareko added a comment to D30202: AMDGPU: Don't look at chain users when adjusting writemask.

LGTM.

Feb 22 2017, 12:59 PM
mareko added a comment to D29584: AMDGPU: Replace disabled exp inputs with undef.

LGTM, but maybe there should be parentheses around "2 * I".

Feb 22 2017, 12:47 PM

Feb 21 2017

mareko added a comment to D30217: AMDGPU: Fix asserting on 0 dmask for image intrinsics.

LGTM. (other than the fact I don't know why MERGE_VALUES is needed here)

Feb 21 2017, 12:20 PM
mareko added a comment to D30134: AMDGPU: Fold FP clamp as modifier bit.

LGTM.

Feb 21 2017, 12:05 PM
mareko added a comment to D30134: AMDGPU: Fold FP clamp as modifier bit.

I only know that exceptions won't occur with the clamp modifier. No idea about denormals.

Also, shouldn't this handle MIN as well?

There's no practical reason to handle min. The higher level operation minnum(x, x) is folded to x in the IR, so this should only be appearing when we emit this pattern for the clamp operation, where max was arbitrarily chosen.

Feb 21 2017, 9:02 AM
mareko added a comment to D30198: AMDGPU: Add cvt.pkrtz intrinsic.

LGTM.

Feb 21 2017, 8:30 AM
mareko added a comment to D30197: AMDGPU: Remove llvm.AMDGPU.flbit intrinsic.

LGTM.

Feb 21 2017, 7:56 AM
mareko added a comment to D30196: AMDGPU: Remove some uses of llvm.SI.export in tests.

LGTM.

Feb 21 2017, 7:55 AM
mareko added a comment to D30195: AMDGPU: Remove clamp intrinsic.

LGTM.

Feb 21 2017, 7:52 AM

Feb 19 2017

mareko added a comment to D30134: AMDGPU: Fold FP clamp as modifier bit.

I only know that exceptions won't occur with the clamp modifier. No idea about denormals.

Feb 19 2017, 3:50 PM

Feb 16 2017

mareko added a comment to D30020: AMDGPU: Remove llvm.AMDGPU.cube intrinsic.

LGTM.

Feb 16 2017, 1:54 AM

Feb 13 2017

mareko added inline comments to D29584: AMDGPU: Replace disabled exp inputs with undef.
Feb 13 2017, 2:47 PM

Feb 1 2017

mareko added a comment to D27586: AMDGPU/SI: Add llvm.amdgcn.s.buffer.load intrinsic.

Would you please describe the purpose of this patch? It's not obvious why it's useful.

The main reason it is useful is that it tells the compiler that this is a load from a constant value without needing any more analysis. It's also useful because s_buffer_load_* instructions have a much simpler resource descriptor, so if they do end up getting selected to MUBUF you don't have to worry about swizzled addressing. It is true, however, that you could just use a single llvm.amdgcn.buffer.load.i32 intrinsic for everything, but you may end up with worse code if you are unable to do the analysis required to select it to SMRD instructions.

Feb 1 2017, 11:21 AM
mareko added a comment to D27586: AMDGPU/SI: Add llvm.amdgcn.s.buffer.load intrinsic.

Would you please describe the purpose of this patch? It's not obvious why it's useful.

Feb 1 2017, 5:53 AM
mareko added a comment to D28993: AMDGPU: Try to select SMEM opcodes for llvm.amdgcn.buffer.load.

I see the same problem as Tom here. Do those shaders use read-only SSBOs? If so, this could perhaps be done at the Mesa level. But even then, there'd be a problem if the same memory is bound to two different SSBOs, and one of them is written to, unless the SSBO is marked 'restrict'.

Feb 1 2017, 3:56 AM

Jan 31 2017

mareko added a comment to D27586: AMDGPU/SI: Add llvm.amdgcn.s.buffer.load intrinsic.

How is this different from using amdgcn.buffer.load if D28993 lands (which is not certain)?

I don't think it's legal to select amdgcn.buffer.load to SMRD unless you can prove that it is uniform. llvm.amdgcn.s.buffer.load is known to always be uniform.

Jan 31 2017, 5:42 PM
mareko added a comment to D27586: AMDGPU/SI: Add llvm.amdgcn.s.buffer.load intrinsic.

How is this different from using amdgcn.buffer.load if D28993 lands (which is not certain)?

Jan 31 2017, 4:01 PM

Jan 30 2017

mareko committed rL293477: AMDGPU: Remove a useless VI SMRD pattern.
AMDGPU: Remove a useless VI SMRD pattern
Jan 30 2017, 4:36 AM
mareko closed D28995: AMDGPU: Remove a useless VI SMRD pattern by committing rL293477: AMDGPU: Remove a useless VI SMRD pattern.
Jan 30 2017, 4:36 AM
mareko committed rL293476: AMDGPU: Fix assembler encoding for EXP instructions on VI.
AMDGPU: Fix assembler encoding for EXP instructions on VI
Jan 30 2017, 4:36 AM
mareko closed D28992: AMDGPU: Fix assembler encoding for EXP instructions on VI by committing rL293476: AMDGPU: Fix assembler encoding for EXP instructions on VI.
Jan 30 2017, 4:36 AM

Jan 22 2017

mareko added a comment to D28993: AMDGPU: Try to select SMEM opcodes for llvm.amdgcn.buffer.load.

I wonder if the improvement comes from the fact that the intrinsics can use SMEM now, or the fact I fixed smrd#_SGPR to accept a non-constant offset.

Jan 22 2017, 3:48 PM
mareko created D28995: AMDGPU: Remove a useless VI SMRD pattern.
Jan 22 2017, 2:17 PM
mareko created D28994: AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns.
Jan 22 2017, 2:16 PM
mareko created D28993: AMDGPU: Try to select SMEM opcodes for llvm.amdgcn.buffer.load.
Jan 22 2017, 2:16 PM
mareko created D28992: AMDGPU: Fix assembler encoding for EXP instructions on VI.
Jan 22 2017, 2:15 PM

Jan 11 2017

mareko added a comment to D27682: AMDGPU: Add replacement export intrinsics.

LGTM.

Jan 11 2017, 1:42 PM
mareko added a comment to D27682: AMDGPU: Add replacement export intrinsics.

LGTM if I can just bitcast from i32 to v2i16.

Jan 11 2017, 6:02 AM
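
For readers unfamiliar with the i32 to v2i16 bitcast mentioned in the comment above, here is a minimal host-side sketch of what such a reinterpretation does to the bits. It assumes a little-endian layout in which vector element 0 occupies the low 16 bits; the helper names are hypothetical and are not part of LLVM or Mesa.

```python
import struct

def bitcast_i32_to_v2i16(value):
    """Reinterpret a 32-bit integer as two 16-bit lanes (assumed little-endian)."""
    # Pack as one unsigned 32-bit word, then read the same bytes as two 16-bit words.
    return struct.unpack("<2H", struct.pack("<I", value & 0xFFFFFFFF))

def bitcast_v2i16_to_i32(lo, hi):
    """Inverse reinterpretation: two 16-bit lanes back into one 32-bit integer."""
    return struct.unpack("<I", struct.pack("<2H", lo & 0xFFFF, hi & 0xFFFF))[0]

lanes = bitcast_i32_to_v2i16(0x12345678)
print([hex(lane) for lane in lanes])      # low half first, then high half
print(hex(bitcast_v2i16_to_i32(*lanes)))  # round-trips to the original word
```

No bits change in either direction; only the type used to view them does, which is why a caller holding packed 16-bit data in an i32 can pass it through unchanged.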

Dec 28 2016

mareko added inline comments to D25428: AMDGPU add support for spilling to a user sgpr pointed buffers.
Dec 28 2016, 3:07 PM

Dec 14 2016

mareko added inline comments to D27586: AMDGPU/SI: Add llvm.amdgcn.s.buffer.load intrinsic.
Dec 14 2016, 11:20 AM

Dec 13 2016

mareko added inline comments to D27682: AMDGPU: Add replacement export intrinsics.
Dec 13 2016, 6:55 AM

Dec 9 2016

mareko committed rL289262: AMDGPU/SI: Don't reserve XNACK when it's disabled.
AMDGPU/SI: Don't reserve XNACK when it's disabled
Dec 9 2016, 12:00 PM
mareko committed rL289263: AMDGPU/SI: Remove XNACK feature from CI.
AMDGPU/SI: Remove XNACK feature from CI
Dec 9 2016, 12:00 PM
mareko closed D27151: AMDGPU/SI: Don't reserve XNACK when it's disabled by committing rL289262: AMDGPU/SI: Don't reserve XNACK when it's disabled.
Dec 9 2016, 12:00 PM
mareko closed D27175: AMDGPU/SI: Remove XNACK feature from CI by committing rL289263: AMDGPU/SI: Remove XNACK feature from CI.
Dec 9 2016, 12:00 PM
mareko committed rL289261: AMDGPU/SI: Don't reserve FLAT_SCR on non-HSA targets & without stack objects.
AMDGPU/SI: Don't reserve FLAT_SCR on non-HSA targets & without stack objects
Dec 9 2016, 12:00 PM
mareko committed rL289260: AMDGPU/SI: Allow using SGPRs 96-101 on VI.
AMDGPU/SI: Allow using SGPRs 96-101 on VI
Dec 9 2016, 12:00 PM
mareko closed D27150: AMDGPU/SI: Don't reserve FLAT_SCRATCH on non-HSA targets by committing rL289261: AMDGPU/SI: Don't reserve FLAT_SCR on non-HSA targets & without stack objects.
Dec 9 2016, 12:00 PM
mareko closed D27149: AMDGPU/SI: Allow using SGPRs 96-101 on VI by committing rL289260: AMDGPU/SI: Allow using SGPRs 96-101 on VI.
Dec 9 2016, 12:00 PM

Dec 8 2016

mareko added a comment to D27593: AMDGPU/SI: Don't mark VINTRP instructions as mayLoad.

The portion of LDS dedicated to VINTRP instructions is read-only immutable memory. LDS instructions can't access it, because the VINTRP portion of LDS memory is stolen from the shader. I don't see a reason to mark VINTRP as read-only memory instructions.

Dec 8 2016, 2:27 PM

Nov 30 2016

mareko updated the diff for D27150: AMDGPU/SI: Don't reserve FLAT_SCRATCH on non-HSA targets.

Update: Use FlatScratchInit, also reserve FLAT_SCRATCH on CI.

Nov 30 2016, 2:33 PM

Nov 29 2016

mareko added a comment to D27150: AMDGPU/SI: Don't reserve FLAT_SCRATCH on non-HSA targets.

hasStackObjects is a good enough approximation for now. Ideally it would be enabled only if it has an alloca that is addrspacecasted or has a captured private pointer. hasStackObjects would ideally only be alloca-derived stack objects, not anything inserted by lowering (which should exist by the time reserved registers are frozen).

Nov 29 2016, 3:10 AM

Nov 28 2016

mareko retitled D27175: AMDGPU/SI: Remove XNACK feature from CI from to AMDGPU/SI: Remove XNACK feature from CI.
Nov 28 2016, 1:56 PM
mareko updated the diff for D27150: AMDGPU/SI: Don't reserve FLAT_SCRATCH on non-HSA targets.

Update: Add SIMachineFunctionInfo::flatUsesScratch()

Nov 28 2016, 1:54 PM
mareko added inline comments to D27150: AMDGPU/SI: Don't reserve FLAT_SCRATCH on non-HSA targets.
Nov 28 2016, 9:11 AM

Nov 27 2016

mareko retitled D27151: AMDGPU/SI: Don't reserve XNACK when it's disabled from to AMDGPU/SI: Don't reserve XNACK when it's disabled.
Nov 27 2016, 11:12 AM
mareko retitled D27150: AMDGPU/SI: Don't reserve FLAT_SCRATCH on non-HSA targets from to AMDGPU/SI: Don't reserve FLAT_SCRATCH on non-HSA targets.
Nov 27 2016, 11:12 AM
mareko retitled D27149: AMDGPU/SI: Allow using SGPRs 96-101 on VI from to AMDGPU/SI: Allow using SGPRs 96-101 on VI.
Nov 27 2016, 11:12 AM

Nov 25 2016

mareko committed rL287942: AMDGPU/SI: Add back reverted SGPR spilling code, but disable it.
AMDGPU/SI: Add back reverted SGPR spilling code, but disable it
Nov 25 2016, 9:47 AM
mareko abandoned D27121: AMDGPU/SI: Prevent "s_buffer_store_dword exec" from occuring due to WQM.

Abandoned for now, but can be re-used if SGPR spilling via scalar stores ever comes back.

Nov 25 2016, 8:15 AM
mareko abandoned D27120: AMDGPU/SI: Move EXEC_LO/HI to SReg_32.

Abandoned for now, but can be re-used if SGPR spilling via scalar stores ever comes back.

Nov 25 2016, 8:15 AM
mareko committed rL287936: Revert "AMDGPU: Implement SGPR spilling with scalar stores".
Revert "AMDGPU: Implement SGPR spilling with scalar stores"
Nov 25 2016, 8:13 AM
mareko committed rL287935: Revert "AMDGPU: Fix MMO when splitting spill".
Revert "AMDGPU: Fix MMO when splitting spill"
Nov 25 2016, 8:13 AM
mareko committed rL287934: Revert "AMDGPU: Fix adding extra implicit def of register".
Revert "AMDGPU: Fix adding extra implicit def of register"
Nov 25 2016, 8:13 AM
mareko committed rL287933: Revert "AMDGPU: Fix not setting kill flag on temp reg when spilling".
Revert "AMDGPU: Fix not setting kill flag on temp reg when spilling"
Nov 25 2016, 8:13 AM
mareko committed rL287932: Revert "AMDGPU: Make m0 unallocatable".
Revert "AMDGPU: Make m0 unallocatable"
Nov 25 2016, 8:13 AM
mareko committed rL287930: Revert "AMDGPU: Preserve m0 value when spilling".
Revert "AMDGPU: Preserve m0 value when spilling"
Nov 25 2016, 8:13 AM
mareko committed rL287931: Revert "AMDGPU: Remove m0 spilling code".
Revert "AMDGPU: Remove m0 spilling code"
Nov 25 2016, 8:13 AM
mareko added a comment to D27120: AMDGPU/SI: Move EXEC_LO/HI to SReg_32.

This series fixes some issues, but in general the compiler is still in a very bad shape. There are errors like this:

Nov 25 2016, 6:35 AM

Nov 24 2016

mareko retitled D27121: AMDGPU/SI: Prevent "s_buffer_store_dword exec" from occuring due to WQM from to AMDGPU/SI: Prevent "s_buffer_store_dword exec" from occuring due to WQM.
Nov 24 2016, 4:46 PM
mareko retitled D27120: AMDGPU/SI: Move EXEC_LO/HI to SReg_32 from to AMDGPU/SI: Move EXEC_LO/HI to SReg_32.
Nov 24 2016, 4:46 PM

Nov 21 2016

mareko added inline comments to D26725: AMDGPU: Add llvm.amdgcn.interp.mov intrinsic.
Nov 21 2016, 5:32 PM
mareko added inline comments to D26725: AMDGPU: Add llvm.amdgcn.interp.mov intrinsic.
Nov 21 2016, 5:05 PM

Nov 18 2016

mareko added a comment to D12067: AMDGPU: Refactor exp instructions.

FYI, I'm working on a pass that moves all EXP instructions to the end of shaders after the machine scheduler. We don't want any scheduling optimizations for those.

Nov 18 2016, 10:54 AM

Aug 25 2016

mareko added a comment to D23688: AMDGPU/SI: Implement a custom MachineSchedStrategy.

For shader type = VS, please don't take the LDS size into account. The LDS size is unknown at compile time and it's almost always much less than what's declared.

This is only accounting for the known static LDS, there's no reason to special case this

What is the "known static LDS"?

The declared size. If you mean these shaders are creating a huge LDS array and don't intend to use it, they should switch to passing in a local pointer argument.

Aug 25 2016, 12:09 PM
mareko added a comment to D23688: AMDGPU/SI: Implement a custom MachineSchedStrategy.

For shader type = VS, please don't take the LDS size into account. The LDS size is unknown at compile time and it's almost always much less than what's declared.

This is only accounting for the known static LDS, there's no reason to special case this

Aug 25 2016, 12:05 PM
mareko added a comment to D23688: AMDGPU/SI: Implement a custom MachineSchedStrategy.

Please also ignore the LDS size for shader type = GS. It's the same reason as VS.

Aug 25 2016, 11:49 AM
mareko added a comment to D23688: AMDGPU/SI: Implement a custom MachineSchedStrategy.

For shader type = VS, please don't take the LDS size into account. The LDS size is unknown at compile time and it's almost always much less than what's declared.

Aug 25 2016, 11:47 AM

Aug 17 2016

mareko added a comment to D22898: AMDGPU: Fix ffloor for SI.

Yeah I know about the CI bug, but it's not important for OpenGL.

Aug 17 2016, 1:28 PM
mareko added a comment to D22898: AMDGPU: Fix ffloor for SI.

I don't understand.

Aug 17 2016, 5:03 AM

Aug 12 2016

mareko added a comment to D22898: AMDGPU: Fix ffloor for SI.

Is the MIN needed for correctness at all? Looking at the workaround docs, I see the explanation that "[FRACT] is outputting 1.0 for very small negative inputs". Sounds to me like v_fract is correctly in the range [0, 1.0), except for those very small negative inputs, where it returns 1.0 (which happens to be correct for the ffloor lowering).

I guess so? I don't know the details of the bug but this passes conformance now

Thinking about it more this makes sense. 1.0 will skip the fract at exactly 1.0. up to 0.99... fract is used

How does it make any sense? fract should return values in [0, 1). SI has a bug where it incorrectly returns 1 in one case. Doing min(x, 1) will have no effect on the result of the buggy fract. That min() is a no-op.

Yes, by clamping to exactly 1 it skips the broken 1 value. 0.999999... needs to be passed through fract

Aug 12 2016, 11:38 AM
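
The disagreement in the thread above can be reproduced with ordinary host floats. The sketch below is an illustration only: it assumes the hardware behaviour resembles a naive x - floor(x) that rounds up to exactly 1.0 for very small negative inputs, it uses Python doubles instead of 32-bit GPU floats, and every function name is hypothetical. It shows that min(fract, 1.0) changes nothing, that a fract result of 1.0 still yields the right answer when floor is lowered as x - fract(x), and that clamping fract to just below 1.0 fixes fract itself but not that floor lowering.

```python
import math

# Largest double strictly below 1.0, standing in for a "just under one" clamp constant.
ONE_MINUS_ULP = 1.0 - 2.0 ** -53

def fract_rounding_to_one(x):
    """Naive fract. For tiny negative x the sum rounds up to exactly 1.0."""
    return x - math.floor(x)

def fract_clamped(x):
    """Fract clamped into [0, 1) by taking the minimum with the value just below 1.0."""
    return min(fract_rounding_to_one(x), ONE_MINUS_ULP)

def floor_via_fract(x, fract):
    """Floor lowered as x - fract(x), using whichever fract is supplied."""
    return x - fract(x)

x = -1e-20
raw = fract_rounding_to_one(x)
print(raw)                                        # outside the ideal range [0, 1)
print(min(raw, 1.0) == raw)                       # min with 1.0 leaves it unchanged
print(floor_via_fract(x, fract_rounding_to_one))  # floor lowering with the raw fract
print(floor_via_fract(x, fract_clamped))          # floor lowering with the clamped fract
```

Under these assumptions the raw fract is the better input for the floor lowering, while the clamp only matters when the fract value itself has to stay below 1.0.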

Aug 11 2016

mareko added a comment to D23286: AMDGPU/SI: Propose to redefine image load/store intrinsics.

The changes look good to me. Existing users (Mesa) will need to be fixed manually, but we can take care of that.

Aug 11 2016, 3:39 PM

Aug 10 2016

mareko added a comment to D23286: AMDGPU/SI: Propose to redefine image load/store intrinsics.

"unorm" and "da" must be exposed as parameters. They don't change the type, but they change the behavior of the TA hardware block. In all cases, the type is always floating-point.

"r128" doesn't have to be exposed and it's kinda useless. We don't have any use case for it and I think the next-gen hardware (after Polaris) doesn't have it either.

This patch also considers image load and image store. For image store, the unorm bit must be 1. I haven't seen any restriction regarding image load.
Are you sure the coordinate type is always floating-point? I know that is the case for image_sample, but I'm not sure about image load and image store.

Aug 10 2016, 1:35 PM
mareko added a comment to D23286: AMDGPU/SI: Propose to redefine image load/store intrinsics.

if you use v8i32 as rsrc type, then r128 is true!

Also, the quoted sentence is nonsense. In 100% of our cases, r128 must be 0.

Yes, the quoted comment is wrong. Do you think I can drop r128 bit support?
i.e. set resource type to be v8i32 and r128 bit to be 0 all the time?

I heard GL somehow uses R128 bit.

Aug 10 2016, 1:28 PM
mareko added a comment to D23286: AMDGPU/SI: Propose to redefine image load/store intrinsics.

if you use v8i32 as rsrc type, then r128 is true!

Aug 10 2016, 8:22 AM
mareko added a comment to D23286: AMDGPU/SI: Propose to redefine image load/store intrinsics.

"unorm" and "da" must be exposed as parameters. They don't change the type, but they change the behavior of the TA hardware block. In all cases, the type is always floating-point.

Aug 10 2016, 8:00 AM

Aug 5 2016

mareko committed rL277867: AMDGPU/SI: Increase SGPR limit to 96 on Tonga/Iceland.
AMDGPU/SI: Increase SGPR limit to 96 on Tonga/Iceland
Aug 5 2016, 2:31 PM
mareko closed D23034: AMDGPU/SI: Increase SGPR limit to 96 on Tonga/Iceland by committing rL277867: AMDGPU/SI: Increase SGPR limit to 96 on Tonga/Iceland.
Aug 5 2016, 2:31 PM

Aug 1 2016

mareko added a comment to D23020: [ValueTracking] bitreverse, sin, cos are safe to speculatively execute.

I'd also like to mark UDiv, URem, SDiv, SRem, exp2, log2, pow, and at least 14 AMDGPU intrinsics as speculatively-executable.

Aug 1 2016, 2:26 PM
mareko retitled D23034: AMDGPU/SI: Increase SGPR limit to 96 on Tonga/Iceland from to AMDGPU/SI: Increase SGPR limit to 96 on Tonga/Iceland.
Aug 1 2016, 1:59 PM
mareko updated subscribers of D23020: [ValueTracking] bitreverse, sin, cos are safe to speculatively execute.
Aug 1 2016, 9:54 AM
mareko retitled D23020: [ValueTracking] bitreverse, sin, cos are safe to speculatively execute from to [ValueTracking] bitreverse, sin, cos are safe to speculatively execute.
Aug 1 2016, 9:47 AM
mareko added a comment to D22898: AMDGPU: Fix ffloor for SI.

Is the MIN needed for correctness at all? Looking at the workaround docs, I see the explanation that "[FRACT] is outputting 1.0 for very small negative inputs". Sounds to me like v_fract is correctly in the range [0, 1.0), except for those very small negative inputs, where it returns 1.0 (which happens to be correct for the ffloor lowering).

I guess so? I don't know the details of the bug but this passes conformance now

Thinking about it more this makes sense. 1.0 will skip the fract at exactly 1.0. up to 0.99... fract is used

Aug 1 2016, 3:21 AM
mareko added a comment to D22838: AMDGPU/SI: Implement amdgcn image intrinsics.

BTW, SLC and GLC bits definitely SHOULD be exposed. LLVM isn't high-level enough to be able to make any assumptions about cache strategies.

Aug 1 2016, 3:02 AM

Jul 26 2016

mareko added a comment to D22821: AMDGPU: Use rcp for fdiv 1, x with fpmath metadata.

BTW, while this fixes the regression, doing x*(1/y) in Mesa generates slightly better code.

Jul 26 2016, 1:21 PM
mareko added a comment to D22821: AMDGPU: Use rcp for fdiv 1, x with fpmath metadata.

BTW, while this fixes the regression, doing x*(1/y) in Mesa generates slightly better code.

Jul 26 2016, 1:20 PM

Jul 19 2016

mareko added a comment to D22032: AMDGPU/SI: Don't use reserved VGPRs for SGPR spilling.

FYI, this is a high-priority bug. It prevents DiRT Showdown from working. Phoronix can't test the game because of it.

Jul 19 2016, 5:42 AM