mareko (Marek Olšák)
User

Projects

User does not belong to any projects.

User Details

User Since
Aug 17 2015, 4:04 AM (126 w, 2 d)

Recent Activity

Mon, Jan 15

mareko added inline comments to D42079: AMDGPU: Add a function attribute that shrinks buggy s_buffer opcodes on GFX9.
Mon, Jan 15, 11:23 AM
mareko added inline comments to D42079: AMDGPU: Add a function attribute that shrinks buggy s_buffer opcodes on GFX9.
Mon, Jan 15, 10:26 AM
mareko created D42079: AMDGPU: Add a function attribute that shrinks buggy s_buffer opcodes on GFX9.
Mon, Jan 15, 9:41 AM
mareko created D42078: AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9.
Mon, Jan 15, 9:41 AM

Wed, Jan 10

mareko added inline comments to D41715: AMDGPU: Process amdgpu.uniform on loads.
Wed, Jan 10, 1:56 PM
mareko added inline comments to D41715: AMDGPU: Process amdgpu.uniform on loads.
Wed, Jan 10, 1:49 PM
mareko updated the diff for D41651: AMDGPU: Add 32-bit constant address space.

Only scalar loads support 32-bit pointers. An address in a VGPR will
fail to compile. That's OK because the results of loads will only be used
in places where VGPRs are forbidden.

Wed, Jan 10, 8:50 AM

Tue, Jan 9

mareko added a comment to D41651: AMDGPU: Add 32-bit constant address space.
Tue, Jan 9, 4:41 PM
mareko added a comment to D41651: AMDGPU: Add 32-bit constant address space.

This needs to update AMDGPUAliasAnalysis. Also needs more test coverage. I don't see this testing unaligned access or some of the other places it was added.

Tue, Jan 9, 4:38 PM

Wed, Jan 3

mareko updated the diff for D41651: AMDGPU: Add 32-bit constant address space.

Version 3:

  • Much simpler.
  • Addrspacecast is never inserted.
  • Unaligned loads are unaffected (work fine).
  • Only scalar loads support 32-bit pointers. An address in a VGPR will fail to compile.
  • D41715 (amdgpu.uniform on loads) is required to enforce scalar loads in some cases.
Wed, Jan 3, 1:47 PM
mareko created D41715: AMDGPU: Process amdgpu.uniform on loads.
Wed, Jan 3, 1:39 PM

Tue, Jan 2

mareko updated the diff for D41651: AMDGPU: Add 32-bit constant address space.

Version 2: As discussed on IRC.

Tue, Jan 2, 7:42 PM
mareko updated the diff for D41663: AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}.

Switched the return type to v2i16.

Tue, Jan 2, 4:13 PM
mareko abandoned D41652: [InstCombine] Add an option to disable addrspacecast folding into GEP.

Abandoning. The new plan is that we'll do it the right way in AMDGPU now.

Tue, Jan 2, 12:29 PM
mareko added inline comments to D41663: AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}.
Tue, Jan 2, 8:43 AM
mareko added a comment to D41652: [InstCombine] Add an option to disable addrspacecast folding into GEP.

I'm not sure having a cl::opt is the best option here.
If you really want to make such a change, can you at least thread this information through TargetTransformInfo rather than using a global option?

Yes.

Also, for my own curiosity, is this a temporary workaround until AMDGPU grows proper support for 32-bit GEPs or is this an inherent limitation?
If the former, maybe this hack should not go in (or at least we should consider what's the amount of work needed to finish implementing the support)

It's possible that this will be a permanent solution, because we don't plan to have full 32-bit support (it would be too much work in the backend for little benefit).

This is quite unfortunate. I'd like to point out I don't feel particularly comfortable to have this as a long term solution (but, maybe, OK for 6.0 & reverted in trunk).
The contract between the mid-level optimizer and the backend is that the latter should possibly accept everything produced by the former (or, FWIW, error out in some circumstances).
Maybe you can have something in your backend logic that recovers from the fact that AMDGPU doesn't know (and won't be taught) about 32-bit GEPs?
Have you considered something during legalization? [Apologies if I'm off, but I'm not really familiar with the way AMDGPU works, at least not in depth].

Tue, Jan 2, 7:31 AM
mareko added a comment to D41652: [InstCombine] Add an option to disable addrspacecast folding into GEP.

I'm not sure having a cl::opt is the best option here.
If you really want to make such a change, can you at least thread this information through TargetTransformInfo rather than using a global option?

Tue, Jan 2, 6:19 AM
mareko created D41663: AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}.
Tue, Jan 2, 4:25 AM

Mon, Jan 1

mareko added a comment to D41651: AMDGPU: Add 32-bit constant address space.

FYI, I'd like to get this into LLVM 6.0 if it's OK.

Mon, Jan 1, 12:29 PM
mareko added reviewers for D41652: [InstCombine] Add an option to disable addrspacecast folding into GEP: arsenm, nhaehnle.

FYI, I'd like to get this into LLVM 6.0 if it's OK.

Mon, Jan 1, 12:28 PM

Sun, Dec 31

mareko created D41652: [InstCombine] Add an option to disable addrspacecast folding into GEP.
Sun, Dec 31, 8:21 PM
mareko created D41651: AMDGPU: Add 32-bit constant address space.
Sun, Dec 31, 8:09 PM
mareko added a comment to D40343: AMDGPU: Do not combine loads/store across physreg defs.

We need this fix in LLVM 6.0, right?

Sun, Dec 31, 11:05 AM

Thu, Dec 28

mareko accepted D41468: AMDGPU: Implement getTgtMemIntrinsic for images.
Thu, Dec 28, 6:55 AM

Tue, Dec 26

mareko accepted D41470: AMDGPU: Use unique PSVs for buffer resources.

Accepted, though I added suggestions for tbuffer intrinsics.

Tue, Dec 26, 7:07 AM
mareko accepted D41469: AMDGPU: Remove mayLoad/hasSideEffects from MIMG stores.
Tue, Dec 26, 6:56 AM
mareko added inline comments to D41468: AMDGPU: Implement getTgtMemIntrinsic for images.
Tue, Dec 26, 6:54 AM

Dec 8 2017

mareko accepted D40982: AMDGPU: image_getlod and image_getresinfo do not read memory.

LGTM.

Dec 8 2017, 7:15 AM

Dec 4 2017

mareko added a comment to D39040: AMDGPU: Fix creating invalid copy when adjusting dmask.

There are no piglit regressions.

Dec 4 2017, 1:17 PM

Nov 21 2017

mareko accepted D40303: AMDGPU: Consider memory dependencies with moved instructions in SILoadStoreOptimizer.

LGTM.

Nov 21 2017, 8:57 AM

Nov 16 2017

mareko added a comment to D40047: AMDGPU/GCN: Remove xnack from 801 and 810.

Did notice that for Mesa3D XNACK is being forcibly disabled for all targets <gfx9 and forcibly enabled for all targets >=gfx9. Is that the best choice? It seems it will likely not match how the driver is configuring the hardware. For example, how does the driver configure the APUs? Is XNACK always being enabled for gfx9 (it is not for compute)?

Mesa can't modify SH_MEM_CONFIG to enable/disable XNACK. What is hardcoded in the kernel is what we get. XNACK is only enabled on compute rings on gfx8 APUs and on all rings on gfx9. In practice, Mesa should never access an unmapped page. I don't know if setting -xnack on all chips is a good idea in that case. We might also have suboptimal performance on gfx9 due to XNACK being always enabled by the KMD.

If Mesa always guarantees that the shaders will never access non-resident memory, and so will never have an XNACK, then it can always generate shaders that have XNACK disabled regardless of whether the kernel enables XNACK replay. In other words, enabling XNACK replay does not affect performance unless the shader chooses to generate XNACK compatible code. And the shader does not need to generate XNACK compatible code if it will never access a non-resident page.

For example, the OpenCL 1.2 runtime always ensures all buffers are resident, and so compiles all shaders with XNACK disabled, regardless of whether the kernel has enabled XNACK replay.

So, does Mesa runtime always ensure all data accessed will be resident? Even for APUs? If so it would likely be a performance gain to always request no-XNACK unless page migration may also be active on the data accessed.

Nov 16 2017, 8:16 AM

Nov 15 2017

mareko added a comment to D40047: AMDGPU/GCN: Remove xnack from 801 and 810.

Did notice that for Mesa3D XNACK is being forcibly disabled for all targets <gfx9 and forcibly enabled for all targets >=gfx9. Is that the best choice? It seems it will likely not match how the driver is configuring the hardware. For example, how does the driver configure the APUs? Is XNACK always being enabled for gfx9 (it is not for compute)?

Nov 15 2017, 6:42 PM

Nov 14 2017

mareko added a comment to D40047: AMDGPU/GCN: Remove xnack from 801 and 810.

Does this change break backwards compatibility with the gfx801 target? If so, which ROCm version will I need to use with these changes?

Nov 14 2017, 6:19 PM
mareko added a comment to D40047: AMDGPU/GCN: Remove xnack from 801 and 810.

Looks good to me.

Nov 14 2017, 5:51 PM

Nov 8 2017

mareko committed rL317754: AMDGPU: Lower buffer store and atomic intrinsics manually.
Nov 8 2017, 5:53 PM
mareko committed rL317755: AMDGPU: Merge BUFFER_STORE_DWORD_OFFEN/OFFSET into x2, x4.
Nov 8 2017, 5:53 PM
mareko closed D39060: AMDGPU: Lower buffer store and atomic intrinsics manually by committing rL317754: AMDGPU: Lower buffer store and atomic intrinsics manually.
Nov 8 2017, 5:53 PM
mareko closed D39012: AMDGPU: Merge BUFFER_STORE_DWORD_OFFEN/OFFSET into x2, x4 by committing rL317755: AMDGPU: Merge BUFFER_STORE_DWORD_OFFEN/OFFSET into x2, x4.
Nov 8 2017, 5:53 PM
mareko committed rL317753: AMDGPU: Merge BUFFER_LOAD_DWORD_OFFSET into x2, x4.
Nov 8 2017, 5:52 PM
mareko committed rL317752: AMDGPU: Merge BUFFER_LOAD_DWORD_OFFEN into x2, x4.
Nov 8 2017, 5:52 PM
mareko closed D38950: AMDGPU: Merge BUFFER_LOAD_DWORD_OFFEN into x2, x4 by committing rL317752: AMDGPU: Merge BUFFER_LOAD_DWORD_OFFEN into x2, x4.
Nov 8 2017, 5:52 PM
mareko closed D38951: AMDGPU: Merge BUFFER_LOAD_DWORD_OFFSET into x2, x4 by committing rL317753: AMDGPU: Merge BUFFER_LOAD_DWORD_OFFSET into x2, x4.
Nov 8 2017, 5:52 PM
mareko committed rL317751: AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4.
Nov 8 2017, 5:52 PM
mareko committed rL317750: AMDGPU: Fold immediate offset into BUFFER_LOAD_DWORD lowered from SMEM.
Nov 8 2017, 5:52 PM
mareko closed D38949: AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4 by committing rL317751: AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4.
Nov 8 2017, 5:52 PM
mareko closed D38915: AMDGPU: Fold immediate offset into BUFFER_LOAD_DWORD lowered from SMEM by committing rL317750: AMDGPU: Fold immediate offset into BUFFER_LOAD_DWORD lowered from SMEM.
Nov 8 2017, 5:52 PM

Oct 31 2017

mareko added a comment to D38951: AMDGPU: Merge BUFFER_LOAD_DWORD_OFFSET into x2, x4.

Ping

Oct 31 2017, 2:10 PM
mareko committed rL317038: AMDGPU: Select s_buffer_load_dword with a non-constant SGPR offset.
Oct 31 2017, 2:07 PM
mareko closed D38914: AMDGPU: Select s_buffer_load_dword with a non-constant SGPR offset by committing rL317038: AMDGPU: Select s_buffer_load_dword with a non-constant SGPR offset.
Oct 31 2017, 2:07 PM
mareko updated the diff for D39012: AMDGPU: Merge BUFFER_STORE_DWORD_OFFEN/OFFSET into x2, x4.
  • don't ignore MOVolatile
Oct 31 2017, 2:04 PM
mareko updated the diff for D39060: AMDGPU: Lower buffer store and atomic intrinsics manually.
  • don't set MOVolatile
  • this precedes the buffer store merging patch
Oct 31 2017, 2:04 PM
mareko added inline comments to D39060: AMDGPU: Lower buffer store and atomic intrinsics manually.
Oct 31 2017, 1:49 PM
mareko added a comment to D39012: AMDGPU: Merge BUFFER_STORE_DWORD_OFFEN/OFFSET into x2, x4.

One minor comment inline.

The volatile / hasOrderedMemoryRef issue is worrying, and I don't think the change should go in as-is. Besides, it's probably not as big a win as the other patches, especially due to the added VGPR spills. Do you have all these patches in a branch somewhere?

Oct 31 2017, 12:43 PM
mareko added inline comments to D38949: AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4.
Oct 31 2017, 12:05 PM
mareko updated the diff for D38915: AMDGPU: Fold immediate offset into BUFFER_LOAD_DWORD lowered from SMEM.
  • address feedback
  • Offset == 0 also means that the operand is not an immediate.
Oct 31 2017, 11:53 AM
mareko added a comment to D38915: AMDGPU: Fold immediate offset into BUFFER_LOAD_DWORD lowered from SMEM.

Another improvement that can be done here is that LLVM sometimes generates OR instead of ADD, and I don't know how to tell when OR means ADD.
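
One common criterion, offered here as an assumption rather than something the patch states, is that OR and ADD give the same result whenever the two operands have no set bits in common, because no carry can occur. A minimal Python sketch of that check follows; both helper names are hypothetical and only illustrate the arithmetic, not any LLVM API.

```python
def or_is_add(a: int, b: int) -> bool:
    """Return True when a | b == a + b, i.e. the operands share no set bits."""
    return (a & b) == 0


def or_is_add_known_bits(known_zero_a: int, known_zero_b: int, width: int = 32) -> bool:
    """Conservative check when only known-zero bit masks are available.

    Every bit position must be known to be zero in at least one operand.
    """
    mask = (1 << width) - 1
    return ((known_zero_a | known_zero_b) & mask) == mask


# A 16-byte-aligned base OR'ed with an offset below 16 behaves like an add.
base, offset = 0x1230, 0xC
assert or_is_add(base, offset)
assert (base | offset) == (base + offset)

# Overlapping bits break the equivalence.
assert not or_is_add(0x1238, 0xC)
assert (0x1238 | 0xC) != (0x1238 + 0xC)
```

The second helper reflects the situation in a compiler, where the operand values are unknown and only per-bit facts (for example, alignment) can be used.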

Oct 31 2017, 11:46 AM

Oct 26 2017

mareko committed rL316666: AMDGPU: Handle s_buffer_load_dword hazard on SI.
Oct 26 2017, 7:43 AM
mareko closed D39171: AMDGPU: Handle s_buffer_load_dword hazard on SI by committing rL316666: AMDGPU: Handle s_buffer_load_dword hazard on SI.
Oct 26 2017, 7:43 AM

Oct 25 2017

mareko added a comment to D39040: AMDGPU: Fix creating invalid copy when adjusting dmask.

In case it's still confusing: the number of components returned by image opcodes is popcount(dmask). The code could just do popcount(dmask) instead of computing BitsSet.

Oct 25 2017, 7:33 AM
mareko added a comment to D39040: AMDGPU: Fix creating invalid copy when adjusting dmask.

With this, when dmask = 0x2, we get "image_get_lod v[0:1], ...". Based on what Marek said, wouldn't it only be returning a single value to v0? What would v1 get set to?

In comparison, for image_sample with dmask = 0x2 I see only a single destination register specified. That's also what I see on the proprietary driver for image_get_lod.

Oct 25 2017, 7:30 AM

Oct 24 2017

mareko committed rL316427: AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1).
Oct 24 2017, 3:27 AM
mareko closed D38544: AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1) by committing rL316427: AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1).
Oct 24 2017, 3:27 AM
mareko committed rL316426: AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic.
Oct 24 2017, 3:27 AM
mareko closed D38543: AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic by committing rL316426: AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic.
Oct 24 2017, 3:27 AM
mareko updated the diff for D39171: AMDGPU: Handle s_buffer_load_dword hazard on SI.

Cleanups.

Oct 24 2017, 2:49 AM
mareko added inline comments to D39171: AMDGPU: Handle s_buffer_load_dword hazard on SI.
Oct 24 2017, 2:47 AM
mareko accepted D39040: AMDGPU: Fix creating invalid copy when adjusting dmask.

LGTM.

Oct 24 2017, 2:16 AM

Oct 22 2017

mareko created D39171: AMDGPU: Handle s_buffer_load_dword hazard on SI.
Oct 22 2017, 2:13 PM

Oct 18 2017

mareko added a comment to D39040: AMDGPU: Fix creating invalid copy when adjusting dmask.

Each bit of dmask determines whether that component is enabled. Image opcodes return 4 components if dmask == 0xf. If dmask == 0x2, image opcodes only return the 2nd component in <1 x float>. If dmask == 0x5, image opcodes return the 1st and 3rd component in <2 x float>. If dmask == 0xa, image opcodes return the 2nd and 4th component in <2 x float>.
Gather4 opcodes are an exception and always return 4 components.
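
The rule above can be sketched in a few lines of Python. This is only an illustration of the bit arithmetic; the function names are hypothetical and do not correspond to anything in the patch.

```python
def enabled_components(dmask: int) -> list:
    """Zero-based indices of the components selected by a 4-bit dmask."""
    return [i for i in range(4) if dmask & (1 << i)]


def num_returned_values(dmask: int, is_gather4: bool = False) -> int:
    """Number of values written back: popcount(dmask), except for gather4."""
    if is_gather4:
        return 4  # gather4 always returns 4 values
    return bin(dmask & 0xF).count("1")


assert enabled_components(0x2) == [1]      # 2nd component only
assert enabled_components(0x5) == [0, 2]   # 1st and 3rd components
assert enabled_components(0xA) == [1, 3]   # 2nd and 4th components
assert num_returned_values(0xF) == 4
assert num_returned_values(0x2) == 1
assert num_returned_values(0x1, is_gather4=True) == 4
```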

Oct 18 2017, 9:57 AM
mareko created D39060: AMDGPU: Lower buffer store and atomic intrinsics manually.
Oct 18 2017, 9:49 AM

Oct 17 2017

mareko created D39012: AMDGPU: Merge BUFFER_STORE_DWORD_OFFEN/OFFSET into x2, x4.
Oct 17 2017, 10:29 AM
mareko updated the diff for D38951: AMDGPU: Merge BUFFER_LOAD_DWORD_OFFSET into x2, x4.

fix up merge-stores.ll

Oct 17 2017, 9:16 AM
mareko updated the diff for D38915: AMDGPU: Fold immediate offset into BUFFER_LOAD_DWORD lowered from SMEM.

Rebase.

Oct 17 2017, 8:46 AM
mareko updated the diff for D38914: AMDGPU: Select s_buffer_load_dword with a non-constant SGPR offset.

Preserve GLC.

Oct 17 2017, 8:45 AM
mareko added inline comments to D38914: AMDGPU: Select s_buffer_load_dword with a non-constant SGPR offset.
Oct 17 2017, 8:30 AM

Oct 16 2017

mareko added a comment to D38543: AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic.

My understanding is that the existing llvm.amdgcn.wqm() intrinsic annotates "expression trees" for the WQM pass and might not insert any instructions at the call site. This new wqm.vote intrinsic does translate to S_WQM_B{wavesize} directly.
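
For readers unfamiliar with S_WQM, here is a rough Python emulation. It assumes the usual description of the instruction (for each group of 4 lanes, if any bit is set, all 4 bits of that quad are set), which the comment itself does not spell out. The function name and the `wavesize` parameter are illustrative only.

```python
def s_wqm(mask: int, wavesize: int = 64) -> int:
    """Emulate whole-quad-mode expansion of a lane mask.

    For each group of 4 lanes (a quad), if any bit is set, all 4 bits
    of that quad are set in the result.
    """
    result = 0
    for quad in range(wavesize // 4):
        quad_bits = 0xF << (4 * quad)
        if mask & quad_bits:
            result |= quad_bits
    return result


assert s_wqm(0x1) == 0xF        # one lane in quad 0 enables the whole quad
assert s_wqm(0x101) == 0xF0F    # quads 0 and 2
assert s_wqm(0x0) == 0x0        # empty quads stay empty
```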

Oct 16 2017, 3:01 PM
mareko created D38951: AMDGPU: Merge BUFFER_LOAD_DWORD_OFFSET into x2, x4.
Oct 16 2017, 6:05 AM
mareko created D38950: AMDGPU: Merge BUFFER_LOAD_DWORD_OFFEN into x2, x4.
Oct 16 2017, 6:05 AM
mareko created D38949: AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4.
Oct 16 2017, 6:04 AM

Oct 15 2017

mareko updated the diff for D38543: AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic.

fix the undef test.

Oct 15 2017, 8:09 AM
mareko updated the diff for D38543: AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic.

also test undef.

Oct 15 2017, 7:47 AM

Oct 13 2017

mareko created D38915: AMDGPU: Fold immediate offset into BUFFER_LOAD_DWORD lowered from SMEM.
Oct 13 2017, 5:50 PM
mareko created D38914: AMDGPU: Select s_buffer_load_dword with a non-constant SGPR offset.
Oct 13 2017, 5:50 PM
mareko added a comment to D38544: AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1).

Nicolai, that's a good point, though let's just merge this intrinsic replacement for now.

Oct 13 2017, 12:26 PM
mareko updated the diff for D38544: AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1).

Address feedback.

Oct 13 2017, 12:22 PM
mareko updated the diff for D38543: AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic.

Address feedback.

Oct 13 2017, 10:01 AM

Oct 5 2017

mareko added inline comments to D38543: AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic.
Oct 5 2017, 10:47 AM

Oct 4 2017

mareko added inline comments to D38543: AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic.
Oct 4 2017, 3:42 PM
mareko added inline comments to D38544: AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1).
Oct 4 2017, 3:15 PM
mareko added inline comments to D38543: AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic.
Oct 4 2017, 11:46 AM
mareko created D38544: AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1).
Oct 4 2017, 8:05 AM
mareko created D38543: AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic.
Oct 4 2017, 8:04 AM

Aug 30 2017

mareko added a comment to D22898: AMDGPU: Fix ffloor for SI.

Is the MIN needed for correctness at all? Looking at the workaround docs, I see the explanation that "[FRACT] is outputting 1.0 for very small negative inputs". Sounds to me like v_fract is correctly in the range [0, 1.0), except for those very small negative inputs, where it returns 1.0 (which happens to be correct for the ffloor lowering).

I guess so? I don't know the details of the bug but this passes conformance now

Thinking about it more this makes sense. 1.0 will skip the fract at exactly 1.0. up to 0.99... fract is used

How does it make any sense? fract should return values in [0, 1). SI has a bug that it returns 1 incorrectly in one case. Doing min(x, 1) will have no effect on the result of buggy fract. That min() is a no-op operation.

It's not actually a no-op, it's a canonicalize.

What does that mean?

IEEE canonicalize. http://www.llvm.org/docs/LangRef.html#llvm-canonicalize-intrinsic

NaNs are quieted, denormals may be flushed
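
The numeric side of this exchange can be checked with a small Python sketch. It assumes the lowering floor(x) = x - fract(x) that the thread refers to, and it uses `x - math.floor(x)` as a stand-in that happens to round up to exactly 1.0 for a tiny negative input. That stand-in is not a model of the SI hardware, and the sketch does not cover the canonicalization effects (quieting NaNs, flushing denormals) raised in the reply.

```python
import math


def fract_naive(x: float) -> float:
    """x - floor(x); mathematically in [0, 1) but can round up to exactly 1.0."""
    return x - math.floor(x)


def floor_via_fract(x: float) -> float:
    """floor(x) computed as x - fract(x), with the min(fract, 1.0) under discussion."""
    f = min(fract_naive(x), 1.0)  # leaves every value in [0, 1] unchanged
    return x - f


tiny_negative = -1e-20
assert fract_naive(tiny_negative) == 1.0
assert min(fract_naive(tiny_negative), 1.0) == fract_naive(tiny_negative)
assert floor_via_fract(tiny_negative) == math.floor(tiny_negative) == -1.0
```

A fract result of exactly 1.0 still yields the right floor here, and min(f, 1.0) does not change any value in [0, 1], which matches the "no-op" argument as far as the numeric value goes.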

Aug 30 2017, 2:53 AM

Aug 29 2017

mareko added a comment to D22898: AMDGPU: Fix ffloor for SI.

Is the MIN needed for correctness at all? Looking at the workaround docs, I see the explanation that "[FRACT] is outputting 1.0 for very small negative inputs". Sounds to me like v_fract is correctly in the range [0, 1.0), except for those very small negative inputs, where it returns 1.0 (which happens to be correct for the ffloor lowering).

I guess so? I don't know the details of the bug but this passes conformance now

Thinking about it more this makes sense. 1.0 will skip the fract at exactly 1.0. up to 0.99... fract is used

How does it make any sense? fract should return values in [0, 1). SI has a bug that it returns 1 incorrectly in one case. Doing min(x, 1) will have no effect on the result of buggy fract. That min() is a no-op operation.

It's not actually a no-op, it's a canonicalize.

Aug 29 2017, 1:18 PM

Jul 25 2017

mareko committed rL309028: AMDGPU/SI: Fix Depth and Height computation for SI scheduler.
Jul 25 2017, 1:37 PM
mareko closed D34967: AMDGPU/SI: Fix Depth and Height computation for SI scheduler by committing rL309028: AMDGPU/SI: Fix Depth and Height computation for SI scheduler.
Jul 25 2017, 1:37 PM
mareko committed rL309027: AMDGPU/SI: Force exports at the end for SI scheduler.
Jul 25 2017, 1:37 PM
mareko closed D34965: AMDGPU/SI: Force exports at the end for SI scheduler by committing rL309027: AMDGPU/SI: Force exports at the end for SI scheduler.
Jul 25 2017, 1:37 PM

Jul 4 2017

mareko closed D34190: [AMDGPU] Fix latency of MIMG instructions.

Committed manually. Closing.

Jul 4 2017, 7:44 AM
mareko committed rL307081: [AMDGPU] Fix latency of MIMG instructions.
Jul 4 2017, 7:44 AM