mareko (Marek Olšák)
User

Projects

User does not belong to any projects.

User Details

User Since
Aug 17 2015, 4:04 AM (109 w, 1 d)

Recent Activity

Wed, Aug 30

mareko added a comment to D22898: AMDGPU: Fix ffloor for SI.

Is the MIN needed for correctness at all? Looking at the workaround docs, I see the explanation that "[FRACT] is outputting 1.0 for very small negative inputs). Sounds to me like v_fract is correctly in the range [0, 1.0), except for those very small negative inputs, where it returns 1.0 (which happens to be correct for the ffloor lowering).

I guess so? I don't know the details of the bug but this passes conformance now

Thinking about it more this makes sense. 1.0 will skip the fract at exactly 1.0. up to 0.99... fract is used

How does it make any sense? fract should return values in [0, 1). SI has a bug that it returns 1 incorrectly in one case. Doing min(x, 1) will have no effect on the result of buggy fract. That min() is a no-op operation.

It's not actually a no-op, it's a canonicalize.

What does that mean?

IEEE canonicalize. http://www.llvm.org/docs/LangRef.html#llvm-canonicalize-intrinsic

NaNs are quieted, denormals may be flushed

Wed, Aug 30, 2:53 AM

Tue, Aug 29

mareko added a comment to D22898: AMDGPU: Fix ffloor for SI.

Is the MIN needed for correctness at all? Looking at the workaround docs, I see the explanation that "[FRACT] is outputting 1.0 for very small negative inputs). Sounds to me like v_fract is correctly in the range [0, 1.0), except for those very small negative inputs, where it returns 1.0 (which happens to be correct for the ffloor lowering).

I guess so? I don't know the details of the bug but this passes conformance now

Thinking about it more this makes sense. 1.0 will skip the fract at exactly 1.0. up to 0.99... fract is used

How does it make any sense? fract should return values in [0, 1). SI has a bug that it returns 1 incorrectly in one case. Doing min(x, 1) will have no effect on the result of buggy fract. That min() is a no-op operation.

It's not actually a no-op, it's a canonicalize.

Tue, Aug 29, 1:18 PM

Jul 25 2017

mareko committed rL309028: AMDGPU/SI: Fix Depth and Height computation for SI scheduler.
AMDGPU/SI: Fix Depth and Height computation for SI scheduler
Jul 25 2017, 1:37 PM
mareko closed D34967: AMDGPU/SI: Fix Depth and Height computation for SI scheduler by committing rL309028: AMDGPU/SI: Fix Depth and Height computation for SI scheduler.
Jul 25 2017, 1:37 PM · Restricted Project
mareko committed rL309027: AMDGPU/SI: Force exports at the end for SI scheduler.
AMDGPU/SI: Force exports at the end for SI scheduler
Jul 25 2017, 1:37 PM
mareko closed D34965: AMDGPU/SI: Force exports at the end for SI scheduler by committing rL309027: AMDGPU/SI: Force exports at the end for SI scheduler.
Jul 25 2017, 1:37 PM · Restricted Project

Jul 4 2017

mareko closed D34190: [AMDGPU] Fix latency of MIMG instructions.

Committed manually. Closing.

Jul 4 2017, 7:44 AM
mareko committed rL307081: [AMDGPU] Fix latency of MIMG instructions.
[AMDGPU] Fix latency of MIMG instructions
Jul 4 2017, 7:44 AM

Jun 28 2017

mareko added a comment to D34767: AMDGPU: Remove SITypeRewriter.

I approve this. I don't know if Dave does.

Jun 28 2017, 1:59 PM

Jun 22 2017

mareko added a comment to D27586: AMDGPU/SI: Add llvm.amdgcn.s.buffer.load intrinsic.

I've got a scenario that will benefit from this change. It seems to me that this might be a more workable solution than D28993 (although a more generic approach like that is attractive). This change doesn't necessarily preclude something like D28993 at a later stage does it?

Is it possible to make the non-const offset change that mareko talks about as well?

Jun 22 2017, 11:02 AM

Jun 20 2017

mareko added a comment to D33865: Mark llvm.*annotation intrinsics as NoMem and Speculatable.

How do you use these integer annotations? If you're using them solely as optimization information (i.e. dropping them would be correct, but perhaps undesirable), it may be better to use metadata here.

Jun 20 2017, 3:42 AM

Jun 19 2017

mareko added a comment to D33865: Mark llvm.*annotation intrinsics as NoMem and Speculatable.

Do we need to add more reviewers?

JFYI: so far you haven't added any reviewers. :)

However, if you want to make this change, at the very least you also need to edit the language reference entry for llvm.var.annotation to state that it isn't allowed to express control flow dependent properties. Can you also please give some examples of how you use llvm.var.annotate (for discussion)?

Jun 19 2017, 9:05 AM

Jun 16 2017

mareko added a comment to D33865: Mark llvm.*annotation intrinsics as NoMem and Speculatable.

Do we need to add more reviewers?

Jun 16 2017, 5:18 AM
mareko added a comment to D33865: Mark llvm.*annotation intrinsics as NoMem and Speculatable.

Ping

Jun 16 2017, 5:17 AM

Jun 3 2017

mareko created D33865: Mark llvm.*annotation intrinsics as NoMem and Speculatable.
Jun 3 2017, 8:51 AM

May 24 2017

mareko committed rL303754: Revert "AMDGPU: Fold CI-specific complex SMRD patterns into existing complex….
Revert "AMDGPU: Fold CI-specific complex SMRD patterns into existing complex…
May 24 2017, 7:54 AM

May 23 2017

mareko committed rL303658: AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns.
AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns
May 23 2017, 10:14 AM
mareko closed D28994: AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns by committing rL303658: AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns.
May 23 2017, 10:14 AM

May 19 2017

mareko updated the diff for D28994: AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns.

add one more use of has32BitLiteralSMRDOffset.

May 19 2017, 11:13 AM
mareko updated the diff for D28994: AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns.

This is just a cleanup. Also, it adds checking that ByteCount is aligned to 4.

May 19 2017, 11:05 AM

May 4 2017

mareko committed rL302200: AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5.
AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5
May 4 2017, 3:38 PM
mareko closed D32645: AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5 by committing rL302200: AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5.
May 4 2017, 3:38 PM
mareko added a comment to D32645: AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5.

Yeah, we could have a simpler test, but my approach is usually to copy an existing test and modify it, so it would help if we had such a test that screams "hey I'm a simple scratch buffer test!"

May 4 2017, 12:21 PM

May 3 2017

mareko abandoned D28993: AMDGPU: Try to select SMEM opcodes for llvm.amdgcn.buffer.load.

Another possible issue is that SMEM instructions ignore bits of the resource descriptor. So you would need some way to tell the compiler that it wouldn't be ignoring some relevant resource bits by selecting to SMEM.

Doesn't this make this change unworkable? Presumably the front-end would need to annotate in some way to indicate that this is a legitimate transformation, in which case you might as well use a different intrinsic anyway. Are there any circumstances where you can determine if this is definitely the case?

I've got a situation that benefits from this change, but equally could use the solution in D27586. Perhaps that change could be enhanced with the non-const offset change in this review?

May 3 2017, 9:30 AM

May 2 2017

mareko updated the diff for D32645: AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5.

Directly set SGPR5 in the SIMachineFunctionInfo constructor.

May 2 2017, 12:13 PM
mareko abandoned D23020: [ValueTracking] bitreverse, sin, cos are safe to speculatively execute.
May 2 2017, 11:18 AM
mareko committed rL301930: AMDGPU: Add AMDGPU_HS calling convention.
AMDGPU: Add AMDGPU_HS calling convention
May 2 2017, 8:56 AM
mareko closed D32644: AMDGPU: Add AMDGPU_HS calling convention by committing rL301930: AMDGPU: Add AMDGPU_HS calling convention.
May 2 2017, 8:56 AM
mareko added a comment to D32645: AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5.

Ping

May 2 2017, 7:44 AM

Apr 28 2017

mareko committed rL301677: AMDGPU: Add new amdgcn.init.exec intrinsics.
AMDGPU: Add new amdgcn.init.exec intrinsics
Apr 28 2017, 1:35 PM
mareko closed D31762: AMDGPU: Add new amdgcn.init.exec intrinsics by committing rL301677: AMDGPU: Add new amdgcn.init.exec intrinsics.
Apr 28 2017, 1:35 PM
mareko updated the diff for D32645: AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5.

Use isPhysRegUsed.

Apr 28 2017, 12:28 PM
mareko added inline comments to D32645: AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5.
Apr 28 2017, 11:34 AM
mareko created D32645: AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5.
Apr 28 2017, 7:24 AM
mareko created D32644: AMDGPU: Add AMDGPU_HS calling convention.
Apr 28 2017, 7:23 AM

Apr 24 2017

mareko added a comment to D31762: AMDGPU: Add new amdgcn.init.exec intrinsics.

Is it at all possible to get merged shaders where either part has thread_count == 0? We might want a way to annotate branches so that the skip-jump for EXEC=0 is not introduced.

Yes, thread_count == 0 is possible, and it's explained here: https://patchwork.freedesktop.org/patch/152356/

Ah, so that's why the barrier instruction is needed between shader parts? Interesting.

Apr 24 2017, 10:35 AM
mareko added a comment to D31762: AMDGPU: Add new amdgcn.init.exec intrinsics.

I don't see anything wrong with the code.

I agree that the design is a bit iffy. It's almost like these intrinsics are something that is part of the calling convention. But even these intrinsics cannot quite lead to optimal code for merged monolithic shaders, because there's an unnecessary initialization of EXEC in the first part of the shader.

Since what we need to do here in general really doesn't fit well into LLVM IR semantics, I suspect that no matter what we come up with, it's bound to be ugly. So we might as well go with this particular solution here.

Merged monolithic shaders set exec = -1 and then they use "if (tid < thread_count) ...". I think that's the only way to jump over the conditional code right now if the wave has thread_count == 0. If we don't want v_mbcnt+v_cmp, we could do something like "if (amdgcn.set.thread_count(n)) ..." that sets EXEC regardless of current EXEC and skips the conditional for thread_count == 0. The performance of that solution is unlikely to justify the implementation effort.

Is it at all possible to get merged shaders where either part has thread_count == 0? We might want a way to annotate branches so that the skip-jump for EXEC=0 is not introduced.

Apr 24 2017, 10:16 AM
mareko added a comment to D31762: AMDGPU: Add new amdgcn.init.exec intrinsics.

I don't see anything wrong with the code.

I agree that the design is a bit iffy. It's almost like these intrinsics are something that is part of the calling convention. But even these intrinsics cannot quite lead to optimal code for merged monolithic shaders, because there's an unnecessary initialization of EXEC in the first part of the shader.

Since what we need to do here in general really doesn't fit well into LLVM IR semantics, I suspect that no matter what we come up with, it's bound to be ugly. So we might as well go with this particular solution here.

Apr 24 2017, 9:42 AM

Apr 23 2017

mareko added a comment to D31762: AMDGPU: Add new amdgcn.init.exec intrinsics.

Ping

Apr 23 2017, 2:01 PM

Apr 18 2017

mareko added a comment to D31762: AMDGPU: Add new amdgcn.init.exec intrinsics.

Ping

Apr 18 2017, 6:39 AM

Apr 14 2017

mareko updated the diff for D31762: AMDGPU: Add new amdgcn.init.exec intrinsics.

Bug fixes, cosmetic changes, more tests.

Apr 14 2017, 2:18 PM
mareko added a comment to D32036: AMDGPU: SimplifyDemandedElts for image intrinsics.

GETLOD should be OK.

Apr 14 2017, 12:55 PM

Apr 13 2017

mareko added inline comments to D32036: AMDGPU: SimplifyDemandedElts for image intrinsics.
Apr 13 2017, 12:46 PM

Apr 7 2017

mareko added a comment to D31271: AMDGPU: Refactor SIMachineFunctionInfo slightly.

LGTM.

Apr 7 2017, 2:28 PM
mareko added a comment to D31268: AMDGPU: Refactor argument lowering.

LGTM.

Apr 7 2017, 1:58 PM
mareko added inline comments to D31762: AMDGPU: Add new amdgcn.init.exec intrinsics.
Apr 7 2017, 6:41 AM

Apr 6 2017

mareko added inline comments to D31762: AMDGPU: Add new amdgcn.init.exec intrinsics.
Apr 6 2017, 10:08 AM
mareko added inline comments to D31762: AMDGPU: Add new amdgcn.init.exec intrinsics.
Apr 6 2017, 10:06 AM
mareko added a comment to D31250: AMDGPU: Stop using CCAssignToRegWithShadow.

LGTM.

Apr 6 2017, 8:32 AM
mareko created D31762: AMDGPU: Add new amdgcn.init.exec intrinsics.
Apr 6 2017, 7:37 AM

Apr 4 2017

mareko added a comment to D31612: AMDGPU: Remove legacy export intrinsic.

LGTM.

Apr 4 2017, 7:17 AM
mareko added a comment to D31611: AMDGPU: Remove legacy image intrinsics.

LGTM.

Apr 4 2017, 7:16 AM
mareko abandoned D31631: AMDGPU: Add a function attribute that sets EXEC as the first thing done.
Apr 4 2017, 1:19 AM

Apr 3 2017

mareko created D31631: AMDGPU: Add a function attribute that sets EXEC as the first thing done.
Apr 3 2017, 4:40 PM

Mar 21 2017

mareko added a comment to D31216: AMDGPU: Remove hasSideEffects from SI_RETURN_TO_EPILOG.

LGTM.

Mar 21 2017, 2:58 PM
mareko added a comment to D31209: AMDGPU: Rename SI_RETURN.

Other than the "from->to" typo, this looks good to me, though my preferred name for the instruction is SI_RETURN_TO_EPILOG.

Mar 21 2017, 2:35 PM
mareko committed rL298397: AMDGPU: Buffer descriptor changes for GFX9.
AMDGPU: Buffer descriptor changes for GFX9
Mar 21 2017, 10:13 AM
mareko committed rL298396: AMDGPU: Always use VGPR indexing on GFX9.
AMDGPU: Always use VGPR indexing on GFX9
Mar 21 2017, 10:13 AM
mareko closed D31158: AMDGPU: Buffer descriptor changes for GFX9 by committing rL298397: AMDGPU: Buffer descriptor changes for GFX9.
Mar 21 2017, 10:13 AM
mareko closed D31157: AMDGPU: Always use VGPR indexing on GFX9 by committing rL298396: AMDGPU: Always use VGPR indexing on GFX9.
Mar 21 2017, 10:13 AM

Mar 20 2017

mareko created D31158: AMDGPU: Buffer descriptor changes for GFX9.
Mar 20 2017, 4:15 PM
mareko created D31157: AMDGPU: Always use VGPR indexing on GFX9.
Mar 20 2017, 4:15 PM

Feb 24 2017

mareko added inline comments to D30209: AMDGPU: Fold omod into instructions.
Feb 24 2017, 1:42 AM

Feb 23 2017

mareko added a comment to D30209: AMDGPU: Fold omod into instructions.

Other than that, nice work.

Feb 23 2017, 6:57 AM

Feb 22 2017

mareko added a comment to D30204: AMDGPU: Add replacement bfe intrinsics.

LGTM

Feb 22 2017, 1:59 PM
mareko added a comment to D30203: AMDGPU: Add another BFE pattern.

LGTM

Feb 22 2017, 1:47 PM
mareko added a comment to D30205: AMDGPU: Convert image intrinsic uses in tests.

Acked.

Feb 22 2017, 1:01 PM
mareko added a comment to D30202: AMDGPU: Don't look at chain users when adjusting writemask.

LGTM.

Feb 22 2017, 12:59 PM
mareko added a comment to D29584: AMDGPU: Replace disabled exp inputs with undef.

LGTM, but maybe there should be parentheses around "2 * I".

Feb 22 2017, 12:47 PM

Feb 21 2017

mareko added a comment to D30217: AMDGPU: Fix asserting on 0 dmask for image intrinsics.

LGTM. (other than the fact I don't know why MERGE_VALUES is needed here)

Feb 21 2017, 12:20 PM
mareko added a comment to D30134: AMDGPU: Fold FP clamp as modifier bit.

LGTM.

Feb 21 2017, 12:05 PM
mareko added a comment to D30134: AMDGPU: Fold FP clamp as modifier bit.

I only know that exceptions won't occur with the clamp modifier. No idea about denormals.

Also, shouldn't this handle MIN as well?

There's no practical reason to handle min. The higher level operation minnum(x, x) is folded to x in the IR, so this should only be appearing when we emit this pattern for the clamp operation, where max was arbitrarily chosen.

Feb 21 2017, 9:02 AM
mareko added a comment to D30198: AMDGPU: Add cvt.pkrtz intrinsic.

LGTM.

Feb 21 2017, 8:30 AM
mareko added a comment to D30197: AMDGPU: Remove llvm.AMDGPU.flbit intrinsic.

LGTM.

Feb 21 2017, 7:56 AM
mareko added a comment to D30196: AMDGPU: Remove some uses of llvm.SI.export in tests.

LGTM.

Feb 21 2017, 7:55 AM
mareko added a comment to D30195: AMDGPU: Remove clamp intrinsic.

LGTM.

Feb 21 2017, 7:52 AM

Feb 19 2017

mareko added a comment to D30134: AMDGPU: Fold FP clamp as modifier bit.

I only know that exceptions won't occur with the clamp modifier. No idea about denormals.

Feb 19 2017, 3:50 PM

Feb 16 2017

mareko added a comment to D30020: AMDGPU: Remove llvm.AMDGPU.cube intrinsic.

LGTM.

Feb 16 2017, 1:54 AM

Feb 13 2017

mareko added inline comments to D29584: AMDGPU: Replace disabled exp inputs with undef.
Feb 13 2017, 2:47 PM

Feb 1 2017

mareko added a comment to D27586: AMDGPU/SI: Add llvm.amdgcn.s.buffer.load intrinsic.

Would you please describe the purpose of this patch? It's not obvious why it's useful.

The main reason it is useful is because it tells the compiler that this is a load from a constant value without neededing any more analysis. It's also useful because s_buffer_load_* instructions have a much more simplified resource descriptor, so if then do end up getting selected to MUBUF you don't have to worry about swizzled addressing. It is true however, that you could just use a single llvm.amdgcn.buffer.load.i32 intrinsic for everything, but you may end up with worse code if you are unable to do the analysis required to select it to SMRD instructions.

Feb 1 2017, 11:21 AM
mareko added a comment to D27586: AMDGPU/SI: Add llvm.amdgcn.s.buffer.load intrinsic.

Would you please describe the purpose of this patch? It's not obvious why it's useful.

Feb 1 2017, 5:53 AM
mareko added a comment to D28993: AMDGPU: Try to select SMEM opcodes for llvm.amdgcn.buffer.load.

I see the same problem as Tom here. Do those shaders use read-only SSBOs? If so, this could perhaps be done at the Mesa level. But even then, there'd be a problem if the same memory is bound to two different SSBOs, and one of them is written to, unless the SSBO is marked 'restrict'.

Feb 1 2017, 3:56 AM

Jan 31 2017

mareko added a comment to D27586: AMDGPU/SI: Add llvm.amdgcn.s.buffer.load intrinsic.

How is this different from using amdgcn.buffer.load if D28993 lands (which is not certain)?

I don't think it's legal to select amdgcn.buffer.load to SMRD unless you can prove that it is uniform. llvm.amdgcn.s.buffer.load is known to always be uniform.

Jan 31 2017, 5:42 PM
mareko added a comment to D27586: AMDGPU/SI: Add llvm.amdgcn.s.buffer.load intrinsic.

How is this different from using amdgcn.buffer.load if D28993 lands (which is not certain)?

Jan 31 2017, 4:01 PM

Jan 30 2017

mareko committed rL293477: AMDGPU: Remove a useless VI SMRD pattern.
AMDGPU: Remove a useless VI SMRD pattern
Jan 30 2017, 4:36 AM
mareko closed D28995: AMDGPU: Remove a useless VI SMRD pattern by committing rL293477: AMDGPU: Remove a useless VI SMRD pattern.
Jan 30 2017, 4:36 AM
mareko committed rL293476: AMDGPU: Fix assembler encoding for EXP instructions on VI.
AMDGPU: Fix assembler encoding for EXP instructions on VI
Jan 30 2017, 4:36 AM
mareko closed D28992: AMDGPU: Fix assembler encoding for EXP instructions on VI by committing rL293476: AMDGPU: Fix assembler encoding for EXP instructions on VI.
Jan 30 2017, 4:36 AM

Jan 22 2017

mareko added a comment to D28993: AMDGPU: Try to select SMEM opcodes for llvm.amdgcn.buffer.load.

I wonder if the improvement comes from the fact that the intrinsics can use SMEM now, or the fact I fixed smrd#_SGPR to accept a non-constant offset.

Jan 22 2017, 3:48 PM
mareko created D28995: AMDGPU: Remove a useless VI SMRD pattern.
Jan 22 2017, 2:17 PM
mareko created D28994: AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns.
Jan 22 2017, 2:16 PM
mareko created D28993: AMDGPU: Try to select SMEM opcodes for llvm.amdgcn.buffer.load.
Jan 22 2017, 2:16 PM
mareko created D28992: AMDGPU: Fix assembler encoding for EXP instructions on VI.
Jan 22 2017, 2:15 PM

Jan 11 2017

mareko added a comment to D27682: AMDGPU: Add replacement export intrinsics.

LGTM.

Jan 11 2017, 1:42 PM
mareko added a comment to D27682: AMDGPU: Add replacement export intrinsics.

LGTM if I can just bitcast from i32 to v2i16.

Jan 11 2017, 6:02 AM

Dec 28 2016

mareko added inline comments to D25428: AMDGPU add support for spilling to a user sgpr pointed buffers.
Dec 28 2016, 3:07 PM

Dec 14 2016

mareko added inline comments to D27586: AMDGPU/SI: Add llvm.amdgcn.s.buffer.load intrinsic.
Dec 14 2016, 11:20 AM

Dec 13 2016

mareko added inline comments to D27682: AMDGPU: Add replacement export intrinsics.
Dec 13 2016, 6:55 AM

Dec 9 2016

mareko committed rL289262: AMDGPU/SI: Don't reserve XNACK when it's disabled.
AMDGPU/SI: Don't reserve XNACK when it's disabled
Dec 9 2016, 12:00 PM
mareko committed rL289263: AMDGPU/SI: Remove XNACK feature from CI.
AMDGPU/SI: Remove XNACK feature from CI
Dec 9 2016, 12:00 PM