mareko (Marek Olšák)
User

Projects

User does not belong to any projects.

User Details

User Since
Aug 17 2015, 4:04 AM (92 w, 4 d)

Recent Activity

Wed, May 24

mareko committed rL303754: Revert "AMDGPU: Fold CI-specific complex SMRD patterns into existing complex….
Revert "AMDGPU: Fold CI-specific complex SMRD patterns into existing complex…
Wed, May 24, 7:54 AM

Tue, May 23

mareko committed rL303658: AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns.
AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns
Tue, May 23, 10:14 AM
mareko closed D28994: AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns by committing rL303658: AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns.
Tue, May 23, 10:14 AM

Fri, May 19

mareko updated the diff for D28994: AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns.

add one more use of has32BitLiteralSMRDOffset.

Fri, May 19, 11:13 AM
mareko updated the diff for D28994: AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns.

This is just a cleanup. Also, it adds checking that ByteCount is aligned to 4.

Fri, May 19, 11:05 AM

Thu, May 4

mareko committed rL302200: AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5.
AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5
Thu, May 4, 3:38 PM
mareko closed D32645: AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5 by committing rL302200: AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5.
Thu, May 4, 3:38 PM
mareko added a comment to D32645: AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5.

Yeah, we could have a simpler test, but my approach is usually to copy an existing test and modify it, so it would help if we had such a test that screams "hey I'm a simple scratch buffer test!"

Thu, May 4, 12:21 PM

Wed, May 3

mareko abandoned D28993: AMDGPU: Try to select SMEM opcodes for llvm.amdgcn.buffer.load.

Another possible issue is that SMEM instructions ignore bits of the resource descriptor. So you would need some way to tell the compiler that it wouldn't be ignoring some relevant resource bits by selecting to SMEM.

Doesn't this make this change unworkable? Presumably the front-end would need to annotate in some way to indicate that this is a legitimate transformation, in which case you might as well use a different intrinsic anyway. Are there any circumstances where you can determine if this is definitely the case?

I've got a situation that benefits from this change, but equally could use the solution in D27586. Perhaps that change could be enhanced with the non-const offset change in this review?

Wed, May 3, 9:30 AM

Tue, May 2

mareko updated the diff for D32645: AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5.

Directly set SGPR5 in the SIMachineFunctionInfo constructor.

Tue, May 2, 12:13 PM
mareko abandoned D23020: [ValueTracking] bitreverse, sin, cos are safe to speculatively execute.
Tue, May 2, 11:18 AM
mareko committed rL301930: AMDGPU: Add AMDGPU_HS calling convention.
AMDGPU: Add AMDGPU_HS calling convention
Tue, May 2, 8:56 AM
mareko closed D32644: AMDGPU: Add AMDGPU_HS calling convention by committing rL301930: AMDGPU: Add AMDGPU_HS calling convention.
Tue, May 2, 8:56 AM
mareko added a comment to D32645: AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5.

Ping

Tue, May 2, 7:44 AM

Fri, Apr 28

mareko committed rL301677: AMDGPU: Add new amdgcn.init.exec intrinsics.
AMDGPU: Add new amdgcn.init.exec intrinsics
Fri, Apr 28, 1:35 PM
mareko closed D31762: AMDGPU: Add new amdgcn.init.exec intrinsics by committing rL301677: AMDGPU: Add new amdgcn.init.exec intrinsics.
Fri, Apr 28, 1:35 PM
mareko updated the diff for D32645: AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5.

Use isPhysRegUsed.

Fri, Apr 28, 12:28 PM
mareko added inline comments to D32645: AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5.
Fri, Apr 28, 11:34 AM
mareko created D32645: AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5.
Fri, Apr 28, 7:24 AM
mareko created D32644: AMDGPU: Add AMDGPU_HS calling convention.
Fri, Apr 28, 7:23 AM

Apr 24 2017

mareko added a comment to D31762: AMDGPU: Add new amdgcn.init.exec intrinsics.

Is it at all possible to get merged shaders where either part has thread_count == 0? We might want a way to annotate branches so that the skip-jump for EXEC=0 is not introduced.

Yes, thread_count == 0 is possible, and it's explained here: https://patchwork.freedesktop.org/patch/152356/

Ah, so that's why the barrier instruction is needed between shader parts? Interesting.

Apr 24 2017, 10:35 AM
mareko added a comment to D31762: AMDGPU: Add new amdgcn.init.exec intrinsics.

I don't see anything wrong with the code.

I agree that the design is a bit iffy. It's almost like these intrinsics are something that is part of the calling convention. But even these intrinsics cannot quite lead to optimal code for merged monolithic shaders, because there's an unnecessary initialization of EXEC in the first part of the shader.

Since what we need to do here in general really doesn't fit well into LLVM IR semantics, I suspect that no matter what we come up with, it's bound to be ugly. So we might as well go with this particular solution here.

Merged monolithic shaders set exec = -1 and then they use "if (tid < thread_count) ...". I think that's the only way to jump over the conditional code right now if the wave has thread_count == 0. If we don't want v_mbcnt+v_cmp, we could do something like "if (amdgcn.set.thread_count(n)) ..." that sets EXEC regardless of current EXEC and skips the conditional for thread_count == 0. The performance of that solution is unlikely to justify the implementation effort.

Is it at all possible to get merged shaders where either part has thread_count == 0? We might want a way to annotate branches so that the skip-jump for EXEC=0 is not introduced.

Apr 24 2017, 10:16 AM
mareko added a comment to D31762: AMDGPU: Add new amdgcn.init.exec intrinsics.

I don't see anything wrong with the code.

I agree that the design is a bit iffy. It's almost like these intrinsics are something that is part of the calling convention. But even these intrinsics cannot quite lead to optimal code for merged monolithic shaders, because there's an unnecessary initialization of EXEC in the first part of the shader.

Since what we need to do here in general really doesn't fit well into LLVM IR semantics, I suspect that no matter what we come up with, it's bound to be ugly. So we might as well go with this particular solution here.

Apr 24 2017, 9:42 AM

Apr 23 2017

mareko added a comment to D31762: AMDGPU: Add new amdgcn.init.exec intrinsics.

Ping

Apr 23 2017, 2:01 PM

Apr 18 2017

mareko added a comment to D31762: AMDGPU: Add new amdgcn.init.exec intrinsics.

Ping

Apr 18 2017, 6:39 AM

Apr 14 2017

mareko updated the diff for D31762: AMDGPU: Add new amdgcn.init.exec intrinsics.

Bug fixes, cosmetic changes, more tests.

Apr 14 2017, 2:18 PM
mareko added a comment to D32036: AMDGPU: SimplifyDemandedElts for image intrinsics.

GETLOD should be OK.

Apr 14 2017, 12:55 PM

Apr 13 2017

mareko added inline comments to D32036: AMDGPU: SimplifyDemandedElts for image intrinsics.
Apr 13 2017, 12:46 PM

Apr 7 2017

mareko added a comment to D31271: AMDGPU: Refactor SIMachineFunctionInfo slightly.

LGTM.

Apr 7 2017, 2:28 PM
mareko added a comment to D31268: AMDGPU: Refactor argument lowering.

LGTM.

Apr 7 2017, 1:58 PM
mareko added inline comments to D31762: AMDGPU: Add new amdgcn.init.exec intrinsics.
Apr 7 2017, 6:41 AM

Apr 6 2017

mareko added inline comments to D31762: AMDGPU: Add new amdgcn.init.exec intrinsics.
Apr 6 2017, 10:08 AM
mareko added inline comments to D31762: AMDGPU: Add new amdgcn.init.exec intrinsics.
Apr 6 2017, 10:06 AM
mareko added a comment to D31250: AMDGPU: Stop using CCAssignToRegWithShadow.

LGTM.

Apr 6 2017, 8:32 AM
mareko created D31762: AMDGPU: Add new amdgcn.init.exec intrinsics.
Apr 6 2017, 7:37 AM

Apr 4 2017

mareko added a comment to D31612: AMDGPU: Remove legacy export intrinsic.

LGTM.

Apr 4 2017, 7:17 AM
mareko added a comment to D31611: AMDGPU: Remove legacy image intrinsics.

LGTM.

Apr 4 2017, 7:16 AM
mareko abandoned D31631: AMDGPU: Add a function attribute that sets EXEC as the first thing done.
Apr 4 2017, 1:19 AM

Apr 3 2017

mareko created D31631: AMDGPU: Add a function attribute that sets EXEC as the first thing done.
Apr 3 2017, 4:40 PM

Mar 21 2017

mareko added a comment to D31216: AMDGPU: Remove hasSideEffects from SI_RETURN_TO_EPILOG.

LGTM.

Mar 21 2017, 2:58 PM
mareko added a comment to D31209: AMDGPU: Rename SI_RETURN.

Other than the "from->to" typo, this looks good to me, though my preferred name for the instruction is SI_RETURN_TO_EPILOG.

Mar 21 2017, 2:35 PM
mareko committed rL298397: AMDGPU: Buffer descriptor changes for GFX9.
AMDGPU: Buffer descriptor changes for GFX9
Mar 21 2017, 10:13 AM
mareko committed rL298396: AMDGPU: Always use VGPR indexing on GFX9.
AMDGPU: Always use VGPR indexing on GFX9
Mar 21 2017, 10:13 AM
mareko closed D31158: AMDGPU: Buffer descriptor changes for GFX9 by committing rL298397: AMDGPU: Buffer descriptor changes for GFX9.
Mar 21 2017, 10:13 AM
mareko closed D31157: AMDGPU: Always use VGPR indexing on GFX9 by committing rL298396: AMDGPU: Always use VGPR indexing on GFX9.
Mar 21 2017, 10:13 AM

Mar 20 2017

mareko created D31158: AMDGPU: Buffer descriptor changes for GFX9.
Mar 20 2017, 4:15 PM
mareko created D31157: AMDGPU: Always use VGPR indexing on GFX9.
Mar 20 2017, 4:15 PM

Feb 24 2017

mareko added inline comments to D30209: AMDGPU: Fold omod into instructions.
Feb 24 2017, 1:42 AM

Feb 23 2017

mareko added a comment to D30209: AMDGPU: Fold omod into instructions.

Other than that, nice work.

Feb 23 2017, 6:57 AM

Feb 22 2017

mareko added a comment to D30204: AMDGPU: Add replacement bfe intrinsics.

LGTM

Feb 22 2017, 1:59 PM
mareko added a comment to D30203: AMDGPU: Add another BFE pattern.

LGTM

Feb 22 2017, 1:47 PM
mareko added a comment to D30205: AMDGPU: Convert image intrinsic uses in tests.

Acked.

Feb 22 2017, 1:01 PM
mareko added a comment to D30202: AMDGPU: Don't look at chain users when adjusting writemask.

LGTM.

Feb 22 2017, 12:59 PM
mareko added a comment to D29584: AMDGPU: Replace disabled exp inputs with undef.

LGTM, but maybe there should be parentheses around "2 * I".

Feb 22 2017, 12:47 PM

Feb 21 2017

mareko added a comment to D30217: AMDGPU: Fix asserting on 0 dmask for image intrinsics.

LGTM. (other than the fact I don't know why MERGE_VALUES is needed here)

Feb 21 2017, 12:20 PM
mareko added a comment to D30134: AMDGPU: Fold FP clamp as modifier bit.

LGTM.

Feb 21 2017, 12:05 PM
mareko added a comment to D30134: AMDGPU: Fold FP clamp as modifier bit.

I only know that exceptions won't occur with the clamp modifier. No idea about denormals.

Also, shouldn't this handle MIN as well?

There's no practical reason to handle min. The higher level operation minnum(x, x) is folded to x in the IR, so this should only be appearing when we emit this pattern for the clamp operation, where max was arbitrarily chosen.

Feb 21 2017, 9:02 AM
mareko added a comment to D30198: AMDGPU: Add cvt.pkrtz intrinsic.

LGTM.

Feb 21 2017, 8:30 AM
mareko added a comment to D30197: AMDGPU: Remove llvm.AMDGPU.flbit intrinsic.

LGTM.

Feb 21 2017, 7:56 AM
mareko added a comment to D30196: AMDGPU: Remove some uses of llvm.SI.export in tests.

LGTM.

Feb 21 2017, 7:55 AM
mareko added a comment to D30195: AMDGPU: Remove clamp intrinsic.

LGTM.

Feb 21 2017, 7:52 AM

Feb 19 2017

mareko added a comment to D30134: AMDGPU: Fold FP clamp as modifier bit.

I only know that exceptions won't occur with the clamp modifier. No idea about denormals.

Feb 19 2017, 3:50 PM

Feb 16 2017

mareko added a comment to D30020: AMDGPU: Remove llvm.AMDGPU.cube intrinsic.

LGTM.

Feb 16 2017, 1:54 AM

Feb 13 2017

mareko added inline comments to D29584: AMDGPU: Replace disabled exp inputs with undef.
Feb 13 2017, 2:47 PM

Feb 1 2017

mareko added a comment to D27586: AMDGPU/SI: Add llvm.amdgcn.s.buffer.load intrinsic.

Would you please describe the purpose of this patch? It's not obvious why it's useful.

The main reason it is useful is because it tells the compiler that this is a load from a constant value without neededing any more analysis. It's also useful because s_buffer_load_* instructions have a much more simplified resource descriptor, so if then do end up getting selected to MUBUF you don't have to worry about swizzled addressing. It is true however, that you could just use a single llvm.amdgcn.buffer.load.i32 intrinsic for everything, but you may end up with worse code if you are unable to do the analysis required to select it to SMRD instructions.

Feb 1 2017, 11:21 AM
mareko added a comment to D27586: AMDGPU/SI: Add llvm.amdgcn.s.buffer.load intrinsic.

Would you please describe the purpose of this patch? It's not obvious why it's useful.

Feb 1 2017, 5:53 AM
mareko added a comment to D28993: AMDGPU: Try to select SMEM opcodes for llvm.amdgcn.buffer.load.

I see the same problem as Tom here. Do those shaders use read-only SSBOs? If so, this could perhaps be done at the Mesa level. But even then, there'd be a problem if the same memory is bound to two different SSBOs, and one of them is written to, unless the SSBO is marked 'restrict'.

Feb 1 2017, 3:56 AM

Jan 31 2017

mareko added a comment to D27586: AMDGPU/SI: Add llvm.amdgcn.s.buffer.load intrinsic.

How is this different from using amdgcn.buffer.load if D28993 lands (which is not certain)?

I don't think it's legal to select amdgcn.buffer.load to SMRD unless you can prove that it is uniform. llvm.amdgcn.s.buffer.load is known to always be uniform.

Jan 31 2017, 5:42 PM
mareko added a comment to D27586: AMDGPU/SI: Add llvm.amdgcn.s.buffer.load intrinsic.

How is this different from using amdgcn.buffer.load if D28993 lands (which is not certain)?

Jan 31 2017, 4:01 PM

Jan 30 2017

mareko committed rL293477: AMDGPU: Remove a useless VI SMRD pattern.
AMDGPU: Remove a useless VI SMRD pattern
Jan 30 2017, 4:36 AM
mareko closed D28995: AMDGPU: Remove a useless VI SMRD pattern by committing rL293477: AMDGPU: Remove a useless VI SMRD pattern.
Jan 30 2017, 4:36 AM
mareko committed rL293476: AMDGPU: Fix assembler encoding for EXP instructions on VI.
AMDGPU: Fix assembler encoding for EXP instructions on VI
Jan 30 2017, 4:36 AM
mareko closed D28992: AMDGPU: Fix assembler encoding for EXP instructions on VI by committing rL293476: AMDGPU: Fix assembler encoding for EXP instructions on VI.
Jan 30 2017, 4:36 AM

Jan 22 2017

mareko added a comment to D28993: AMDGPU: Try to select SMEM opcodes for llvm.amdgcn.buffer.load.

I wonder if the improvement comes from the fact that the intrinsics can use SMEM now, or the fact I fixed smrd#_SGPR to accept a non-constant offset.

Jan 22 2017, 3:48 PM
mareko created D28995: AMDGPU: Remove a useless VI SMRD pattern.
Jan 22 2017, 2:17 PM
mareko created D28994: AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns.
Jan 22 2017, 2:16 PM
mareko created D28993: AMDGPU: Try to select SMEM opcodes for llvm.amdgcn.buffer.load.
Jan 22 2017, 2:16 PM
mareko created D28992: AMDGPU: Fix assembler encoding for EXP instructions on VI.
Jan 22 2017, 2:15 PM

Jan 11 2017

mareko added a comment to D27682: AMDGPU: Add replacement export intrinsics.

LGTM.

Jan 11 2017, 1:42 PM
mareko added a comment to D27682: AMDGPU: Add replacement export intrinsics.

LGTM if I can just bitcast from i32 to v2i16.

Jan 11 2017, 6:02 AM

Dec 28 2016

mareko added inline comments to D25428: AMDGPU add support for spilling to a user sgpr pointed buffers.
Dec 28 2016, 3:07 PM

Dec 14 2016

mareko added inline comments to D27586: AMDGPU/SI: Add llvm.amdgcn.s.buffer.load intrinsic.
Dec 14 2016, 11:20 AM

Dec 13 2016

mareko added inline comments to D27682: AMDGPU: Add replacement export intrinsics.
Dec 13 2016, 6:55 AM

Dec 9 2016

mareko committed rL289262: AMDGPU/SI: Don't reserve XNACK when it's disabled.
AMDGPU/SI: Don't reserve XNACK when it's disabled
Dec 9 2016, 12:00 PM
mareko committed rL289263: AMDGPU/SI: Remove XNACK feature from CI.
AMDGPU/SI: Remove XNACK feature from CI
Dec 9 2016, 12:00 PM
mareko closed D27151: AMDGPU/SI: Don't reserve XNACK when it's disabled by committing rL289262: AMDGPU/SI: Don't reserve XNACK when it's disabled.
Dec 9 2016, 12:00 PM
mareko closed D27175: AMDGPU/SI: Remove XNACK feature from CI by committing rL289263: AMDGPU/SI: Remove XNACK feature from CI.
Dec 9 2016, 12:00 PM
mareko committed rL289261: AMDGPU/SI: Don't reserve FLAT_SCR on non-HSA targets & without stack objects.
AMDGPU/SI: Don't reserve FLAT_SCR on non-HSA targets & without stack objects
Dec 9 2016, 12:00 PM
mareko committed rL289260: AMDGPU/SI: Allow using SGPRs 96-101 on VI.
AMDGPU/SI: Allow using SGPRs 96-101 on VI
Dec 9 2016, 12:00 PM
mareko closed D27150: AMDGPU/SI: Don't reserve FLAT_SCRATCH on non-HSA targets by committing rL289261: AMDGPU/SI: Don't reserve FLAT_SCR on non-HSA targets & without stack objects.
Dec 9 2016, 12:00 PM
mareko closed D27149: AMDGPU/SI: Allow using SGPRs 96-101 on VI by committing rL289260: AMDGPU/SI: Allow using SGPRs 96-101 on VI.
Dec 9 2016, 12:00 PM

Dec 8 2016

mareko added a comment to D27593: AMDGPU/SI: Don't mark VINTRP instructions as mayLoad.

The portion of LDS dedicated to VINTRP instructions is read-only immutable memory. LDS instruction can't access it, because the VINTRP portion of LDS memory is stolen from the shader. I don't see a reason to mark VINTRP as read-only memory instructions.

Dec 8 2016, 2:27 PM

Nov 30 2016

mareko updated the diff for D27150: AMDGPU/SI: Don't reserve FLAT_SCRATCH on non-HSA targets.

Update: Use FlatScratchInit, also reserve FLAT_SCRATCH on CI.

Nov 30 2016, 2:33 PM

Nov 29 2016

mareko added a comment to D27150: AMDGPU/SI: Don't reserve FLAT_SCRATCH on non-HSA targets.

hasStackObjects is a good enough approximation for now. Ideally it would be enabled only if it has an alloca that is addrspacecasted or has a captured private pointer. hasStackObjects would ideally only be alloca derived stack objects, not anything inserted by lowering (which should exist by the time reserved registers are frozen)

Nov 29 2016, 3:10 AM

Nov 28 2016

mareko retitled D27175: AMDGPU/SI: Remove XNACK feature from CI from to AMDGPU/SI: Remove XNACK feature from CI.
Nov 28 2016, 1:56 PM
mareko updated the diff for D27150: AMDGPU/SI: Don't reserve FLAT_SCRATCH on non-HSA targets.

Update: Add SIMachineFunctionInfo::flatUsesScratch()

Nov 28 2016, 1:54 PM
mareko added inline comments to D27150: AMDGPU/SI: Don't reserve FLAT_SCRATCH on non-HSA targets.
Nov 28 2016, 9:11 AM

Nov 27 2016

mareko retitled D27151: AMDGPU/SI: Don't reserve XNACK when it's disabled from to AMDGPU/SI: Don't reserve XNACK when it's disabled.
Nov 27 2016, 11:12 AM
mareko retitled D27150: AMDGPU/SI: Don't reserve FLAT_SCRATCH on non-HSA targets from to AMDGPU/SI: Don't reserve FLAT_SCRATCH on non-HSA targets.
Nov 27 2016, 11:12 AM
mareko retitled D27149: AMDGPU/SI: Allow using SGPRs 96-101 on VI from to AMDGPU/SI: Allow using SGPRs 96-101 on VI.
Nov 27 2016, 11:12 AM